[04:28:35] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#9947879 (Marostegui)
[04:57:54] Data-Engineering, Dumps-Generation, SRE, Data Products (Data Products Sprint 15), and 2 others: Dumps generation without prefetch cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#9947893 (Marostegui) >>! In T368098#9946355, @xcollazo wrote: >>>! In T36809...
[05:06:27] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#9947905 (Marostegui)
[05:23:09] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#9947950 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=8806bd66-4c8a-4047-9bba-1c5cb25125be) set by marostegui@cumin1002 for 1 day, 0:00:...
[06:16:33] Analytics, Data-Engineering, Pageviews-API: Track page views by page ID rather than title (handles moved pages) - https://phabricator.wikimedia.org/T159046#9947993 (stjn) Declined→Open Closed without reason, re-opening.
[09:26:12] Analytics, Data-Engineering-Icebox, ContentTranslation, Language-analytics, and 3 others: Special:ContentTranslationStats is slow and getting crowded - https://phabricator.wikimedia.org/T325790#9948589 (Pginer-WMF)
[10:44:40] (CR) Milimetric: [C:+2] Add wikilambda_zobject_join query to sqoop [analytics/refinery] - https://gerrit.wikimedia.org/r/1041815 (https://phabricator.wikimedia.org/T363434) (owner: David Martin)
[10:46:56] (CR) Milimetric: [C:+2] Create wikilambda_zobject_join table in HQL [analytics/refinery] - https://gerrit.wikimedia.org/r/1043205 (https://phabricator.wikimedia.org/T363436) (owner: Ecarg)
[11:20:22] Data-Engineering, Dumps-Generation, SRE, Data Products (Data Products Sprint 15), and 2 others: Dumps generation without prefetch cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#9948995 (Ladsgroup) The explain: ` *************************** 1. row ******...
[11:38:15] Data-Engineering, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#9949035 (BTullis) a:BTullis→Jclark-ctr Hi @Jclark-ctr - apologies for the delay. I've updated the required files, so please feel free to reimag...
[11:38:17] Data-Engineering, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#9949037 (BTullis)
[12:25:46] Data-Engineering, Dumps-Generation, SRE, Data Products (Data Products Sprint 15), and 2 others: Dumps generation without prefetch cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#9949210 (Ladsgroup) The prefetch has been done now so these are causing issu...
[12:45:05] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: add approvers to analytics-research-admins - https://phabricator.wikimedia.org/T368435#9949268 (Dzahn) @Miriam Would you be ok with becoming a formal "group approver" for the group "analytics-research-admins"? That would mean we'd ask...
[12:55:26] Data-Engineering, Temporary accounts, Data-Platform-SRE (2024.06.17 - 2024.07.07): Generate a list of Superset users affected by changes to IP masking/temp users - https://phabricator.wikimedia.org/T347510#9949406 (lbowmaker) @kostajh we have changes to make to a lot of our data pipelines once you de...
[13:26:26] Data-Engineering, Dumps-Generation, SRE, Data Products (Data Products Sprint 15), and 2 others: Dumps generation without prefetch cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#9949572 (xcollazo) >>! In T368098#9949210, @Ladsgroup wrote: > The prefetch...
[13:28:06] Data-Engineering, Dumps-Generation, SRE, Data Products (Data Products Sprint 15), and 2 others: Dumps generation without prefetch cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#9949574 (xcollazo) Ok I am going to postpone re-enabling the Commons RDF/JSO...
[13:29:51] Data-Engineering, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#9949577 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host an-conf1004.eqiad.wmnet with OS bookworm
[13:29:52] Data-Engineering, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#9949578 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host an-conf1005.eqiad.wmnet with OS bookworm
[13:29:58] Data-Engineering, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#9949579 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host an-conf1006.eqiad.wmnet with OS bookworm
[13:42:56] Data-Engineering, Data-Platform-SRE (2024.06.17 - 2024.07.07), Temporary accounts (Blockers to pilot wiki deployment): Generate a list of Superset users affected by changes to IP masking/temp users - https://phabricator.wikimedia.org/T347510#9949633 (kostajh)
[14:40:40] Data-Engineering, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#9949855 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host an-conf1004.eqiad.wmnet with OS bookworm execute...
[14:40:44] Data-Engineering, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#9949856 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host an-conf1005.eqiad.wmnet with OS bookworm execute...
[14:40:48] Data-Engineering, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#9949857 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host an-conf1006.eqiad.wmnet with OS bookworm execute...
[15:04:59] Quarry, superset.wmcloud.org: Analysis and metrics collection for quarry and superset adoption - https://phabricator.wikimedia.org/T369150#9949976 (JJMC89)
[16:05:32] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#9950285 (Marostegui)
[16:45:36] Analytics, Data-Engineering, AQS2.0, Data Products, Pageviews-API: Track page views by page ID rather than title (handles moved pages) - https://phabricator.wikimedia.org/T159046#9950551 (Ottomata)
[16:50:16] Hey folks, there is an incredible amount of network traffic being generated and consumed by the an-worker nodes. In total it's around 40Gbps.
[16:54:10] Analytics-Data-Problem, Data-Engineering, Data-Engineering-Dashiki, Data Products (Data Products Sprint 16), and 2 others: Investigate surprising "10% Other" portion of Analytics Browsers report - https://phabricator.wikimedia.org/T342267#9950622 (WDoranWMF)
[16:54:27] Data-Engineering, Metrics Platform Backlog, Data Products (Data Products Sprint 16): [MPIC] Analyse risk of potential performance issues with static approach to stream configuration - https://phabricator.wikimedia.org/T366627#9950629 (WDoranWMF)
[16:56:45] Data-Engineering, Dumps-Generation, SRE, Data Products (Data Products Sprint 16), and 2 others: Dumps generation without prefetch cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#9950632 (WDoranWMF)
[16:57:43] cwhite: That's not unusual for Hadoop. See: https://grafana.wikimedia.org/d/ZvSPbGOnz/hadoop-server-utilization-btullis?orgId=1&viewPanel=23&from=now-30d&to=now
[16:58:22] Data-Engineering, Metrics Platform Backlog, Data Products (Data Products Sprint 16): [MPIC] Analyse risk of potential performance issues with static approach to stream configuration - https://phabricator.wikimedia.org/T366627#9950681 (Ottomata) We haven't done any number crunching, but I think we can...
[16:58:27] Quarry, Data-Services, cloud-services-team (FY2023/2024-Q3-Q4): Allow Quarry to query ToolsDB public databases - https://phabricator.wikimedia.org/T348407#9950680 (fnegri) Additional clean-up: I removed the grant for `heartbeat_p` as that is already implied in the grant for `%\_p`. ` MariaDB [(n...
[16:58:59] I'm currently running some sqoop jobs that take MariaDB tables and update files in HDFS. They normally run sequentially at the beginning of the month, but I'm running two at once at the moment, because of a pipeline failure that I'm trying to backfill.
[16:59:45] btullis: thanks for the graph, much better than the one I hacked together
[17:00:03] btullis: that spike at 15:58 paged
[17:00:20] Oh I see. Sorry.
[17:01:42] there's a corresponding drop in edits and total http request volume around the same time: https://grafana-rw.wikimedia.org/d/O_OXJyTVk/home-w-wiki-status?orgId=1&refresh=5m&from=now-3h&to=now
[17:02:09] Data-Engineering, Dumps-Generation, SRE, Data Products (Data Products Sprint 16), and 2 others: Dumps generation without prefetch cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#9950686 (xcollazo) I played with the offending SQL statements from T368098#9...
[17:02:22] I could cancel one of the jobs, but it would be a little inconvenient. They're consuming data from dbstore100[7-9] and an-redacteddb1001 - then writing this to an-worker1* and analytics1* (which are the hadoop workers).
[17:03:24] That's only the manual pipeline re-runs that I am doing. There will be other production pipelines running and also possibly user jobs.
[17:07:23] Do you think that this is enough evidence of causation, or does it just correlate?
[17:08:11] Traffic seems to be slowing a bit now. It's unclear if these events are connected, that is the mediawiki load and backend response time
[17:08:41] that's unclear: mediawiki load connected to the an-worker network utilization
[17:09:55] Understood.
[17:11:11] Graphs look to be within thresholds at the moment. I'll reach back out if it changes. :)
[17:11:21] Thanks for looking into it with me!
[17:23:12] Thanks. You're welcome.
[18:17:23] Data-Engineering (Q4 2024 April 1st - June 30th): [Spike] [Maintenance] Define late arrival event strategy and idem-potent backfilling concept. - https://phabricator.wikimedia.org/T361503#9951176 (Ottomata)
[18:17:25] Data-Engineering, Epic: [Iceberg Migration] Apache Iceberg Migration - https://phabricator.wikimedia.org/T333013#9951177 (Ottomata)
[18:17:44] Data-Engineering (Q4 2024 April 1st - June 30th): [Refine Refactoring] Define and implement a automated testing / comparison tool for config store configured datasets - https://phabricator.wikimedia.org/T361502#9951181 (Ottomata)
[18:17:47] Data-Engineering, Data Pipelines: Refine jobs should be scheduled by Airflow - https://phabricator.wikimedia.org/T307505#9951182 (Ottomata)
[18:19:21] Data-Engineering (Q4 2024 April 1st - June 30th): [Refine Refactoring] Define and implement a automated testing / comparison tool for config store configured datasets - https://phabricator.wikimedia.org/T361502#9951189 (Ottomata) @Antoine_Quhen @Ahoelzl should https://gerrit.wikimedia.org/r/c/analytics/refin...
[18:29:29] Analytics-Canonical-Data, Data-Engineering, Movement-Insights: Automate the loading of canonical data tables to the Data Lake - https://phabricator.wikimedia.org/T339928#9951241 (OSefu-WMF)
[18:29:31] Analytics-Canonical-Data, Movement-Insights, Product-Analytics: Create a structured list of Wikimedia projects' creation and closure dates - https://phabricator.wikimedia.org/T336999#9951242 (OSefu-WMF)
[18:31:28] Analytics, AQS2.0, Tech-Docs-Team, Data Products (Epics Timeline), and 2 others: AQS 2.0 user documentation - https://phabricator.wikimedia.org/T288664#9951244 (apaskulin)
[18:35:42] (CR) Mforns: [C:+1] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/1043205 (https://phabricator.wikimedia.org/T363436) (owner: Ecarg)
[18:39:00] (CR) Mforns: [C:+1] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/1041815 (https://phabricator.wikimedia.org/T363434) (owner: David Martin)
[19:08:43] !log deploying airflow dags
[19:08:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:29:30] (CR) Mforns: [V:+2 C:+1] Create wikilambda_zobject_join table in HQL [analytics/refinery] - https://gerrit.wikimedia.org/r/1043205 (https://phabricator.wikimedia.org/T363436) (owner: Ecarg)
[19:31:12] (CR) Mforns: [V:+2 C:+1] Add wikilambda_zobject_join query to sqoop [analytics/refinery] - https://gerrit.wikimedia.org/r/1041815 (https://phabricator.wikimedia.org/T363434) (owner: David Martin)
[19:36:00] (PS1) Milimetric: Fix improperly padded year/month in query [analytics/refinery] - https://gerrit.wikimedia.org/r/1051830
[19:37:57] (PS2) Milimetric: Fix improperly padded year/month in query [analytics/refinery] - https://gerrit.wikimedia.org/r/1051830
[19:39:14] (PS3) Milimetric: Fix improperly padded year/month in query [analytics/refinery] - https://gerrit.wikimedia.org/r/1051830
[19:41:16] (CR) Milimetric: [C:+1] "tested working in prod" [analytics/refinery] - https://gerrit.wikimedia.org/r/1051830 (owner: Milimetric)
[19:41:42] (CR) Ladsgroup: [C:+1] Fix improperly padded year/month in query [analytics/refinery] - https://gerrit.wikimedia.org/r/1051830 (owner: Milimetric)
[19:42:03] (CR) Milimetric: [V:+2 C:+2] Fix improperly padded year/month in query [analytics/refinery] - https://gerrit.wikimedia.org/r/1051830 (owner: Milimetric)
[19:51:04] Analytics-Data-Problem, Data-Engineering, Data-Engineering-Dashiki, Data Products (Data Products Sprint 16), and 2 others: Investigate surprising "10% Other" portion of Analytics Browsers report - https://phabricator.wikimedia.org/T342267#9951527 (Milimetric)
[19:54:05] Data-Engineering, Dumps-Generation, SRE, Data Products (Data Products Sprint 16), and 2 others: Dumps generation without prefetch cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#9951531 (xcollazo) In {T29112} they modified the code to `ORDER BY page_id A...
[19:56:04] Data-Engineering, Web-Team-Backlog, Event-Platform: Deprecate use of desktop- and mobilewebuiactions in Event Platform - https://phabricator.wikimedia.org/T368678#9951543 (Jdlrobson) p:Triage→Medium