[00:00:33] (PS2) Kimberly Sarabia: References new fragment in scroll [schemas/event/secondary] - https://gerrit.wikimedia.org/r/916625 (https://phabricator.wikimedia.org/T335309)
[00:19:30] Data-Engineering, SRE, Security: Use user-specific passwords for accessing EventLogging database - https://phabricator.wikimedia.org/T120532 (Dzahn)
[00:38:28] (SystemdUnitFailed) firing: (19) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:38:32] (SystemdUnitFailed) firing: (18) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:38:32] (SystemdUnitFailed) firing: (18) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:04:22] Data-Engineering, Product-Analytics (Kanban): Model impact of User-Agent deprecation on top line metrics - https://phabricator.wikimedia.org/T336084 (Milimetric) @Mayakp.wiki: you would be able to get the synthetic actor signature using the user_agent_map we retain in pageview_hourly, but we discard the...
[09:54:28] Data-Engineering, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 12): Bring stat1009 into service - https://phabricator.wikimedia.org/T336036 (Stevemunene) a:Stevemunene
[10:26:12] Data-Engineering, Data-Platform-SRE, LDAP-Access-Requests, SRE, and 2 others: Grant temporary access to web based Data Engineering tools to Bishop Fox - https://phabricator.wikimedia.org/T336357 (BTullis) As the #WMF-Legal project tag was added to this task, some general information to avoid wron...
[11:23:33] Data-Engineering, Data-Platform-SRE, LDAP-Access-Requests, SRE, and 2 others: Grant temporary access to web based Data Engineering tools to Bishop Fox - https://phabricator.wikimedia.org/T336357 (BTullis) I have created a Wikitech account to be used for this purpose. {F36990981,width=70%} Wikit...
[11:23:48] Data-Engineering, Data-Platform-SRE, LDAP-Access-Requests, SRE, and 2 others: Grant temporary access to web based Data Engineering tools to Bishop Fox - https://phabricator.wikimedia.org/T336357 (BTullis)
[12:38:33] (SystemdUnitFailed) firing: (18) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:48:55] Data-Engineering, SRE, Security: Use user-specific passwords for accessing Analytics MariaDB replica databases - https://phabricator.wikimedia.org/T120532 (Ottomata)
[13:22:54] Data-Engineering, Event-Platform Value Stream (Sprint 13): Fix eventutillites_python stream_manager error_sink configuration - https://phabricator.wikimedia.org/T335591 (lbowmaker)
[13:24:00] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 13): jsonschema-tools tests - ensure that array items type is set - https://phabricator.wikimedia.org/T329515 (lbowmaker)
[13:24:39] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 13): Define Service Level Objective (SLO) for mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T333833 (lbowmaker)
[13:28:05] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 13): Event Catalog: Standardize Options Handling - https://phabricator.wikimedia.org/T333795 (lbowmaker)
[13:28:45] Data-Engineering-Planning, Event-Platform Value Stream: Document Flink job deployment to k8s - https://phabricator.wikimedia.org/T329629 (lbowmaker)
[13:29:08] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 13): Document Flink job deployment to k8s - https://phabricator.wikimedia.org/T329629 (lbowmaker)
[13:33:51] FYI, as part of the migration of Kerberos servers to Bullseye (along with a related server replacement) I have switched the kadmin server (notably the part of Kerberos which allows for password changes) to krb2002 running Bullseye
[13:34:13] if there should be any issues with hadoop or other kerberized services, let me know (or revert https://gerrit.wikimedia.org/r/c/operations/puppet/+/917359)
[13:34:33] moritzm: Great, thanks. Will keep an eye out for anything odd.
[13:35:56] PROBLEM - Kerberos Kpropd daemon on krb2002 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/sbin/kpropd https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos%23Daemons_and_their_roles
[13:36:08] PROBLEM - Kerberos KAdmin daemon on krb1001 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/sbin/kadmind https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos%23Daemons_and_their_roles
[13:37:02] that's kind of expected, the Icinga checks need to pick up the new exported resource, gonna force some puppet runs
[13:37:13] Ack, thanks.
[13:37:25] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: AQS 2.0: Geo Analytics Service - https://phabricator.wikimedia.org/T288305 (JArguello-WMF)
[13:38:08] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: AQS 2.0: Media Analytics Service - https://phabricator.wikimedia.org/T288303 (SGupta-WMF)
[13:42:06] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: AQS 2.0: Geo Analytics Service - https://phabricator.wikimedia.org/T288305 (JArguello-WMF)
[13:42:53] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: AQS 2.0: Geo Analytics Service - https://phabricator.wikimedia.org/T288305 (JArguello-WMF)
[13:44:56] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: AQS 2.0: Media Analytics service - Unit testing - https://phabricator.wikimedia.org/T336383 (SGupta-WMF)
[13:46:12] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: AQS 2.0 : Media Analytics - Manual integration testing - https://phabricator.wikimedia.org/T336386 (SGupta-WMF)
[13:46:51] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: AQS 2.0: Device Analytics service - https://phabricator.wikimedia.org/T288298 (Atieno)
[13:47:20] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: AQS 2.0: Media Analytics Service - In-service testing split (test vs itest) - https://phabricator.wikimedia.org/T336389 (SGupta-WMF)
[13:48:13] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: AQS 2.0: Page Analytics Service - https://phabricator.wikimedia.org/T288296 (Sfaci)
[13:48:29] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: AQS 2.0: Media Analytics Service - Metrics dashboards - https://phabricator.wikimedia.org/T336392 (SGupta-WMF)
[13:49:14] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: Geo Analytics Service pipeline deployent - https://phabricator.wikimedia.org/T336393 (JArguello-WMF)
[13:49:28] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: AQS 2.0: Media Analytics Service - Configure routing in staging and production - https://phabricator.wikimedia.org/T336396 (SGupta-WMF)
[13:50:31] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: Geo Analytics Service deployment pipeline integration - https://phabricator.wikimedia.org/T336393 (JArguello-WMF)
[13:52:26] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: AQS 2.0: Page Analytics Service - https://phabricator.wikimedia.org/T288296 (Sfaci)
[13:53:05] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: Geo Analytics Service Deploy to Staging and production - https://phabricator.wikimedia.org/T336400 (JArguello-WMF)
[13:53:13] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: AQS 2.0: Page Analytics Service - https://phabricator.wikimedia.org/T288296 (Sfaci)
[13:54:17] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: Geo Analytics In-service testing split (test vs itest) - https://phabricator.wikimedia.org/T336402 (JArguello-WMF)
[13:55:15] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: AQS 2.0: Device Analytics service - https://phabricator.wikimedia.org/T288298 (Atieno)
[13:56:53] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: Geo Analytics create unit tests - https://phabricator.wikimedia.org/T336406 (JArguello-WMF)
[13:58:06] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: Geo Analytics Manual integration testing - https://phabricator.wikimedia.org/T336409 (JArguello-WMF)
[14:00:47] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: Geo Analytics service Configure routing in staging and production - https://phabricator.wikimedia.org/T336411 (JArguello-WMF)
[14:18:33] Data-Engineering, Data-Platform-SRE, LDAP-Access-Requests, SRE, and 2 others: Grant temporary access to web based Data Engineering tools to Bishop Fox - https://phabricator.wikimedia.org/T336357 (bchoo) WMF Legal reviewed the contract on file for Bishop Fox and their employees should be covered u...
[14:23:46] (CR) Snwachukwu: Migrate pageview druid load hql queries to Airflow (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/910520 (https://phabricator.wikimedia.org/T334104) (owner: Snwachukwu)
[14:24:10] (CR) Snwachukwu: [V: +2 C: +2] Migrate pageview druid load hql queries to Airflow [analytics/refinery] - https://gerrit.wikimedia.org/r/910520 (https://phabricator.wikimedia.org/T334104) (owner: Snwachukwu)
[14:39:02] (CR) Sergio Gimeno: "(cleaning my dashboard)" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/821711 (https://phabricator.wikimedia.org/T306018) (owner: Sergio Gimeno)
[15:02:42] Data-Engineering, Data-Platform-SRE, LDAP-Access-Requests, SRE, and 2 others: Grant temporary access to web based Data Engineering tools to Bishop Fox - https://phabricator.wikimedia.org/T336357 (BTullis)
[15:06:41] Data-Engineering-Planning, SRE-swift-storage, Event-Platform Value Stream (Sprint 12): Storage request: swift s3 bucket for mediawiki-page-content-change-enrichment checkpointing - https://phabricator.wikimedia.org/T330693 (Eevans) Per a discussion with @gmodena on IRC, I'll create an account named !...
[15:09:37] Data-Engineering-Planning, Equity-Landscape: Load language data - https://phabricator.wikimedia.org/T315886 (ntsako)
[15:18:21] Data-Engineering, API Platform (AQS 2.0 Roadmap), Epic, Platform Engineering Roadmap, User-Eevans: AQS 2.0: Page Analytics Service - https://phabricator.wikimedia.org/T288296 (Sfaci)
[15:25:17] Data-Engineering, Anti-Harassment, Data-Persistence, IP Masking, Patch-For-Review: Adding user_is_temp to the user table - https://phabricator.wikimedia.org/T333223 (Tchanders) I've added a couple of patches for adding user.user_is_temp, in case the discussions on {T336176} don't get anywhere...
[15:30:27] Data-Engineering, Data-Platform-SRE, LDAP-Access-Requests, SRE, and 2 others: Grant temporary access to web based Data Engineering tools to Bishop Fox - https://phabricator.wikimedia.org/T336357 (BTullis)
[15:32:36] (CR) Milimetric: [V: +2 C: +2] "No problem at all with down-sizing retention." [analytics/refinery] - https://gerrit.wikimedia.org/r/904660 (owner: Krinkle)
[15:36:11] (CR) Milimetric: [V: +2 C: +2] "These new metrics add more differentiation between one data point and another in this dataset. But the dataset has nothing I can see that" [analytics/refinery] - https://gerrit.wikimedia.org/r/902571 (owner: Phedenskog)
[15:38:00] (CR) Milimetric: [V: +2 C: +2] "Nice, we love fewer datasets. I'll confirm with you all via email if it's ok to delete the existing data we've kept so far." [analytics/refinery] - https://gerrit.wikimedia.org/r/917855 (https://phabricator.wikimedia.org/T334550) (owner: Barakat Ajadi)
[15:43:02] (CR) Milimetric: [V: +2 C: +2] Adapt virtualpageview druid scripts to spark [analytics/refinery] - https://gerrit.wikimedia.org/r/912360 (https://phabricator.wikimedia.org/T334105) (owner: Milimetric)
[15:50:04] !log killed oozie job mobile_apps-uniques-monthly-coord because of https://phabricator.wikimedia.org/T329310
[15:50:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:51:48] !log stopped Airflow DAG mobile_app_session_metrics_weekly because of https://phabricator.wikimedia.org/T329310
[15:51:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:57:49] Data-Engineering-Planning, Data Pipelines (Sprint 12): Deprecate old mobile datasets - https://phabricator.wikimedia.org/T329310 (mforns) This is the MR that will remove the DAG from airflow-dags: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/379
[16:36:26] !log installing airflow 2.6.0 on an-test-client1001 for T336286
[16:36:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:36:29] T336286: Upgrade Airflow to version 2.6.0 - https://phabricator.wikimedia.org/T336286
[16:38:33] (SystemdUnitFailed) firing: (18) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:50:44] !log deploy conda-analytics v0.0.13 T335721
[16:50:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:50:47] T335721: Add support for Iceberg in Spark - https://phabricator.wikimedia.org/T335721
[17:24:47] no real view on this but it came up in my feed and thought it may interest folks here https://github.com/ipyflow/ipyflow
[17:24:50] "a next-generation Python kernel for Jupyter"
[17:41:31] thanks jbond
[17:43:31] np
[17:50:17] jbond: That looks really interesting. I might try it out and if it works well, we could add it to our conda-analytics environment so that users get it by default.
[17:51:21] awesome glad it looks useful :)
[20:00:40] Data-Engineering, Data-Platform-SRE, LDAP-Access-Requests, SRE, and 3 others: Grant temporary access to web based Data Engineering tools to Bishop Fox - https://phabricator.wikimedia.org/T336357 (Dzahn) Open→In progress p:Triage→High
[20:01:10] hm, some failures trying to deploy refinery to an-airflow1001:
[20:01:11] 20:00:31 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'analytics/refinery', '-g', 'default', 'fetch', '--refresh-config'] (ran as analytics-deploy@an-airflow1001.eqiad.wmnet) returned [255]: ssh: Could not resolve hostname an-airflow1001.eqiad.wmnet: Name or service not known
[20:02:22] Data-Engineering, Data-Platform-SRE, LDAP-Access-Requests, SRE, and 3 others: Grant temporary access to web based Data Engineering tools to Bishop Fox - https://phabricator.wikimedia.org/T336357 (Dzahn) Not sure if a manager has to say approved on ticket for this or not. +1 to the patch but waiti...
[20:37:32] !log deployed refinery (except to an-airflow1001)
[20:37:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:38:33] (SystemdUnitFailed) firing: (18) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:55:11] Data-Engineering-Planning, Data Pipelines (Sprint 12): Setup config to allow lineage instrumentation - https://phabricator.wikimedia.org/T333004 (Antoine_Quhen) Update: I'm emitting metadata to Kafka from an ad-hoc Airflow data lineage task. The configuration is setting up the communication with Kafka an...
[22:56:11] milimetric: an-airflow1001 has been decommissioned now. T333697
[22:56:12] T333697: decom an-airflow1001 - https://phabricator.wikimedia.org/T333697