[03:25:30] (SystemdUnitFailed) firing: (9) monitor_refine_event_sanitized_analytics_immediate.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:21:19] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:24:16] (SystemdUnitFailed) firing: (9) monitor_refine_event_sanitized_analytics_immediate.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:42:59] Hi team! I’ll be in later than usual today due to single dadding. I should be in around 10am CET [08:25:30] (SystemdUnitFailed) firing: (8) user-runtime-dir@24065.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:48:52] 10Data-Engineering (Sprint 7), 10Data-Platform-SRE, 10Patch-For-Review: [Data Platform] Deploy Spark History Service - https://phabricator.wikimedia.org/T330176 (10brouberol) [08:48:58] 10Data-Engineering (Sprint 7), 10Data-Platform-SRE (2023/24 Q3 Milestone 1), 10Patch-For-Review: Investigate Spark History Server silent errors when downloading some files from HDFS - https://phabricator.wikimedia.org/T354777 (10brouberol) 05Open→03Resolved After having redeployed the spark history serve... 
[08:49:07] 10Data-Engineering (Sprint 7), 10Data-Platform-SRE, 10Patch-For-Review: [Data Platform] Deploy Spark History Service - https://phabricator.wikimedia.org/T330176 (10brouberol) [08:50:59] 10Data-Engineering (Sprint 7), 10Data-Platform-SRE, 10Patch-For-Review: [Data Platform] Deploy Spark History Service - https://phabricator.wikimedia.org/T330176 (10brouberol) We have fixed the last [[ https://phabricator.wikimedia.org/T354777 | remaining issue ]] with the Spark History Server. It is now cons... [08:51:07] 10Data-Engineering (Sprint 7), 10Data-Platform-SRE, 10Patch-For-Review: [Data Platform] Deploy Spark History Service - https://phabricator.wikimedia.org/T330176 (10brouberol) 05Open→03Resolved [10:29:52] btullis: Hey! Would you have time to jump in a call to check on https://phabricator.wikimedia.org/T354452 ? [10:30:14] btullis: [10:30:17] btullis: meet.google.com/wyr-qcdd-sjs [10:30:30] Yep, jumping in now. [11:34:15] (SystemdUnitFailed) firing: (9) mariadb.service Failed on an-mariadb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [11:35:09] ^ this is me, working with superset. It has already recovered and is not service-affecting. Apologies for the noise. [12:48:50] 10Data-Platform-SRE (2023/24 Q3 Milestone 1), 10DBA, 10Infrastructure-Foundations, 10Puppet-Core, and 3 others: Revert dbstore migration from puppet7 to puppet5 - https://phabricator.wikimedia.org/T354411 (10MoritzMuehlenhoff) @Marostegui @ABran-WMF With https://gerrit.wikimedia.org/r/c/operations/puppet/+... [13:02:33] 10Data-Platform-SRE (2023/24 Q3 Milestone 1), 10DBA, 10Infrastructure-Foundations, 10Puppet-Core, and 3 others: Revert dbstore migration from puppet7 to puppet5 - https://phabricator.wikimedia.org/T354411 (10Marostegui) 05Stalled→03Declined Good to decline! We can always reopen if needed. Thank you Ben... 
[13:35:03] Hi btullis - Would you have a minute for regarding an archiva issue I'm facing? [13:35:15] for me sorry [13:51:00] 10Data-Engineering, 10Wikidata, 10Wmfdata-Python, 10Wikidata Analytics (Kanban): Add linter and formatter to wmfdata-python (and link check) - https://phabricator.wikimedia.org/T348999 (10mpopov) [13:55:00] 10Data-Platform-SRE (2023/24 Q3 Milestone 1), 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Search (Current work): Create 3 microsites for wdqs full graph, main graph, & scholarly articles - https://phabricator.wikimedia.org/T354658 (10Gehel) a:03RKemper [13:56:01] 10Data-Engineering, 10Data Products: Data quality issue in wmf.edit_hourly - https://phabricator.wikimedia.org/T355182 (10mpopov) [13:56:48] 10Data-Engineering, 10Data Products: Data quality issue in wmf.edit_hourly - https://phabricator.wikimedia.org/T355182 (10mpopov) @Isaac has shared an idea > my first guess would be oversighted edits? these are deleted but also suppressed from public logs so possibly not accounted for in our data (even though... [14:18:14] 10Data-Engineering, 10MediaWiki-Core-Tests, 10MediaWiki-extensions-CentralNotice, 10MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), 10ci-test-error: CentralNotice failing in browser test on master - https://phabricator.wikimedia.org/T354977 (10Ejegg) Well, fixing the EventLogging test cleans up the logging a... [14:25:54] 10Data-Engineering, 10Data Products: Data quality issue in wmf.edit_hourly - https://phabricator.wikimedia.org/T355182 (10mpopov) By the way, that's just an example. It's actually more than just 4 edits: `lang=sql WITH edit_counts AS ( SELECT user_is_bot, is_deleted, is_reverted, SUM(IF(snapshot = '... 
[14:27:08] 10Data-Platform-SRE (2023/24 Q3 Milestone 1), 10Infrastructure-Foundations, 10Puppet-Core, 10SRE, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (10MoritzMuehlenhoff) [14:29:41] 10Data-Engineering, 10Data Products: Data quality issue in wmf.edit_hourly - https://phabricator.wikimedia.org/T355182 (10mpopov) The delta between snapshots for bot-made edit counts is due to accounts being marked as bot or not changes over time, so edits may be considered as having been made by bots in one s... [14:35:49] 10Data-Engineering, 10Data Products: Minor data quality issue in wmf.edit_hourly - https://phabricator.wikimedia.org/T355182 (10mpopov) [14:37:03] 10Data-Engineering, 10Data Products: Minor data quality issue in wmf.edit_hourly - https://phabricator.wikimedia.org/T355182 (10WDoranWMF) hey @mpopov, thanks for looping us in. As a plug, we have a [[ https://phabricator.wikimedia.org/T355182 | fancy intake process for #data_products ]] and a more [[ https:/... [14:40:35] 10Data-Engineering, 10Data Products: Minor data quality issue in wmf.edit_hourly - https://phabricator.wikimedia.org/T355182 (10mpopov) p:05Triage→03Low [14:40:45] 10Data-Engineering, 10Data Products: Minor data quality issue in wmf.edit_hourly - https://phabricator.wikimedia.org/T355182 (10mpopov) [14:42:37] 10Data-Engineering, 10MediaWiki-Core-Tests, 10MediaWiki-extensions-CentralNotice, 10MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), and 2 others: CentralNotice failing in browser test on master - https://phabricator.wikimedia.org/T354977 (10Ejegg) Core CI runs the saveOptions tests under mediawiki-quibble-vendo... [14:45:33] 10Data-Engineering, 10Data Products: Minor data quality issue in wmf.edit_hourly - https://phabricator.wikimedia.org/T355182 (10mpopov) @WDoranWMF: Sorry about that! Yes, I will use https://phabricator.wikimedia.org/maniphest/task/edit/form/121/ going forward and update this ticket to use that template. I miss... 
[14:46:58] 10Data-Platform-SRE (2023/24 Q3 Milestone 1), 10Infrastructure-Foundations, 10Puppet-Core, 10SRE, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (10MoritzMuehlenhoff) [14:47:29] 10Data-Engineering, 10Data Products: Minor data quality issue in wmf.edit_hourly - https://phabricator.wikimedia.org/T355182 (10mpopov) [14:48:42] 10Data-Engineering, 10Data Products: Minor data quality issue in wmf.edit_hourly - https://phabricator.wikimedia.org/T355182 (10mpopov) [14:53:18] 10Data-Engineering, 10Data Products: Past edits increase in wmf.edit_hourly with every new snapshot - https://phabricator.wikimedia.org/T355182 (10mpopov) [14:54:50] (03CR) 10Snwachukwu: Migration of browser General table to iceberg format. (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/988711 (https://phabricator.wikimedia.org/T352670) (owner: 10Snwachukwu) [14:55:39] 10Data-Engineering, 10Data Products: Past edits increase in wmf.edit_hourly with every new snapshot - https://phabricator.wikimedia.org/T355182 (10WDoranWMF) @mpopov yeah, I'm sorry about that I need to fix that link we had some issues around tasks - we need them to appear in the add/edit list but that is rest... [14:59:38] (03CR) 10Joal: [C: 03+1] "LGTM!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/988711 (https://phabricator.wikimedia.org/T352670) (owner: 10Snwachukwu) [15:26:06] 10Data-Platform-SRE, 10Patch-For-Review: Ensure Elastic stack works on bookworm - https://phabricator.wikimedia.org/T353392 (10Gehel) @MoritzMuehlenhoff : we're not working on the Bookworm upgrade yet. We're probably going to need Java 11 on Bookworm when we do, but we'll confirm once we get started. 
[15:29:02] 10Data-Engineering, 10Fundraising-Backlog, 10MediaWiki-Core-Tests, 10MediaWiki-extensions-CentralNotice, and 3 others: CentralNotice failing in browser test on master - https://phabricator.wikimedia.org/T354977 (10Ejegg) Running the qunit tests locally in a wiki with CN and its dependencies (plus a few ext... [15:35:31] (SystemdUnitFailed) firing: (8) user-runtime-dir@24065.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [15:45:21] 10Data-Platform-SRE: Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2088.codfw.wmnet with OS bullseye [15:50:43] 10Data-Platform-SRE: Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2089.codfw.wmnet with OS bullseye [16:04:22] 10Data-Platform-SRE: Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2091.codfw.wmnet with OS bullseye [16:06:42] 10Data-Platform-SRE: Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2092.codfw.wmnet with OS bullseye [16:08:57] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10bking) p:05High→03Medium a:03bking [16:09:54] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 
(10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2093.codfw.wmnet with OS bullseye [16:12:31] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2094.codfw.wmnet with OS bullseye [16:15:28] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2095.codfw.wmnet with OS bullseye [16:18:37] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2096.codfw.wmnet with OS bullseye [16:22:03] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2097.codfw.wmnet with OS bullseye [16:27:33] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2098.codfw.wmnet with OS bullseye [16:35:13] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2099.codfw.wmnet with OS bullseye [16:36:34] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - 
https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2090.codfw.wmnet with OS bullseye completed: - elastic2090 (**PASS**) - Do... [16:42:27] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2100.codfw.wmnet with OS bullseye [16:49:14] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2101.codfw.wmnet with OS bullseye [16:54:06] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2102.codfw.wmnet with OS bullseye [17:04:31] joal: I'm so sorry that I missed the ping. [17:04:49] btullis: no problem - will you have time in say 1/2h maybe? [17:05:01] Yes, will do. [17:05:28] 10Data-Engineering, 10Movement-Insights, 10Traffic, 10Patch-For-Review: Identify and label prefetch proxy data in our traffic - https://phabricator.wikimedia.org/T346463 (10dr0ptp4kt) It's live and looking good in `kafkacat`. Now we wait a little for stuff to show up in the analytics tables. Thanks @Vgutie... [17:05:36] thanks a lot btullis [17:06:16] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2088.codfw.wmnet with OS bullseye executed with errors: - elastic2088 (**FAI... 
[17:11:37] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2089.codfw.wmnet with OS bullseye executed with errors: - elastic2089 (**FAI... [17:21:03] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2098.codfw.wmnet with OS bullseye executed with errors: - elastic2098 (**FAI... [17:21:27] 10Data-Engineering (Sprint 7), 10Data-Platform-SRE, 10Patch-For-Review: [Data Platform] Deploy Spark History Service - https://phabricator.wikimedia.org/T330176 (10xcollazo) >I hope it's useful for y'all! Just passing by to thank you for this work! It will definitely make debugging easier, and also will mak... [17:25:48] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2091.codfw.wmnet with OS bullseye executed with errors: - elastic2091 (**FAI... [17:27:40] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2092.codfw.wmnet with OS bullseye executed with errors: - elastic2092 (**FAI... [17:28:44] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2099.codfw.wmnet with OS bullseye executed with errors: - elastic2099 (**FAI... 
[17:30:49] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2093.codfw.wmnet with OS bullseye executed with errors: - elastic2093 (**FAI... [17:31:40] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2102.codfw.wmnet with OS bullseye completed: - elastic2102 (**PASS**) - Do... [17:31:48] joal: Ready any time. [17:33:28] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2094.codfw.wmnet with OS bullseye executed with errors: - elastic2094 (**FAI... [17:36:07] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2100.codfw.wmnet with OS bullseye executed with errors: - elastic2100 (**FAI... [17:36:31] 10Data-Engineering, 10Data-Platform-SRE: Send a critical alert to data-engineering if produce_canary_events isn't running correctly - https://phabricator.wikimedia.org/T337055 (10xcollazo) There was another repro of this situation on 2024-01-17. TL;DR: `event.mediawiki_page_content_change_v1` and `event.media... [17:36:33] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2095.codfw.wmnet with OS bullseye executed with errors: - elastic2095 (**FAI... 
[17:39:29] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2096.codfw.wmnet with OS bullseye executed with errors: - elastic2096 (**FAI... [17:42:45] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2101.codfw.wmnet with OS bullseye executed with errors: - elastic2101 (**FAI... [17:43:09] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2097.codfw.wmnet with OS bullseye executed with errors: - elastic2097 (**FAI... [17:53:49] btullis: sorry I got caught in time-space void [17:53:59] btullis: available now, or am I too late? [17:54:16] I'm here, batcave?
[17:54:29] 10Data-Engineering, 10Movement-Insights, 10Traffic, 10Patch-For-Review: Identify and label prefetch proxy data in our traffic - https://phabricator.wikimedia.org/T346463 (10dr0ptp4kt) Documentation updated: https://wikitech.wikimedia.org/w/index.php?title=X-Analytics&diff=2140528&oldid=2028273 [17:55:05] joal: https://meet.google.com/rxb-bjxn-nip [18:13:42] 10Data-Engineering, 10Event-Platform: ProduceCanaryEvents job should be scheduled by Airflow - https://phabricator.wikimedia.org/T341229 (10Ottomata) [18:25:17] 10Data-Engineering, 10Event-Platform: ProduceCanaryEvents job should be scheduled by Airflow and/or a k8s service - https://phabricator.wikimedia.org/T341229 (10Ottomata) [18:25:41] 10Data-Engineering, 10Event-Platform: ProduceCanaryEvents job should be scheduled by Airflow and/or a k8s service - https://phabricator.wikimedia.org/T341229 (10Ottomata) [18:25:44] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2088.codfw.wmnet with OS bullseye [18:28:47] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2089.codfw.wmnet with OS bullseye [18:29:04] 10Data-Engineering (Sprint 7), 10Data-Platform-SRE: Users in archiva-deployer group can't upload artifacts anymore. - https://phabricator.wikimedia.org/T355352 (10JAllemandou) [18:29:17] 10Data-Engineering (Sprint 7), 10Data-Platform-SRE: Users in archiva-deployer group can't upload artifacts anymore. - https://phabricator.wikimedia.org/T355352 (10JAllemandou) [18:29:39] 10Data-Engineering (Sprint 7), 10Data-Platform-SRE: Users in archiva-deployer group can't upload artifacts anymore. 
- https://phabricator.wikimedia.org/T355352 (10JAllemandou) [18:30:17] 10Data-Engineering (Sprint 7), 10Data Pipelines, 10Discovery-Search, 10Java-Scala-Standardization, 10Patch-For-Review: [Maintenance] We should have a top level maven parent pom based on wikimedia-discovery-discovery-parent-pom, - https://phabricator.wikimedia.org/T309097 (10JAllemandou) Blocked on https:... [18:34:49] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2091.codfw.wmnet with OS bullseye [18:43:08] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2092.codfw.wmnet with OS bullseye [18:47:34] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2093.codfw.wmnet with OS bullseye [19:06:42] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2089.codfw.wmnet with OS bullseye completed: - elastic2089 (**PASS**) - Re... [19:11:30] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2091.codfw.wmnet with OS bullseye completed: - elastic2091 (**PASS**) - Re... 
[19:19:49] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2092.codfw.wmnet with OS bullseye completed: - elastic2092 (**PASS**) - Re... [19:23:54] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2094.codfw.wmnet with OS bullseye [19:24:16] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2093.codfw.wmnet with OS bullseye completed: - elastic2093 (**PASS**) - Re... [19:27:14] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2095.codfw.wmnet with OS bullseye [19:35:32] (SystemdUnitFailed) firing: (8) user-runtime-dir@24065.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:50:45] 10Data-Platform-SRE (2023/24 Q3 Milestone 1), 10Wikidata, 10Discovery-Search (Current work), 10Patch-For-Review: Create DNS records for 3 new WDQS endpoints - https://phabricator.wikimedia.org/T354662 (10RKemper) Deployed the following changes via `sudo -i authdns-update`: ` diff --git templates/wikidata.... 
[19:59:50] 10Data-Engineering, 10Movement-Insights, 10Traffic, 10Patch-For-Review: Identify and label prefetch proxy data in our traffic - https://phabricator.wikimedia.org/T346463 (10dr0ptp4kt) It's entering the analytics system based on the following query: ` select http_status, hour, x_analytics_map['prefetch_sec... [20:03:45] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2095.codfw.wmnet with OS bullseye completed: - elastic2095 (**PASS**) - Re... [20:30:07] 10Data-Engineering (Sprint 7), 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Users in archiva-deployer group can't upload artifacts anymore. - https://phabricator.wikimedia.org/T355352 (10Gehel) p:05Triage→03High [20:37:00] 10Data-Engineering, 10Movement-Insights, 10Traffic, 10Patch-For-Review: Identify and label prefetch proxy data in our traffic - https://phabricator.wikimedia.org/T346463 (10fkaelin) Nice! ` pa = spark.table("wmf.pageview_actor").where("""year=2024 and month=1 and day=18 and hour=16""") prefetch_fields = [... [20:44:09] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2094.codfw.wmnet with OS bullseye executed with errors: - elastic2094 (**FAI... 
[21:23:09] (03PS3) 10Mforns: Add query to load MediaWiki snapshot to Cassandra AQS config table [analytics/refinery] - 10https://gerrit.wikimedia.org/r/989558 (https://phabricator.wikimedia.org/T352948) [21:23:25] (03PS4) 10Mforns: Add query to load (set) properties to Cassandra AQS config table [analytics/refinery] - 10https://gerrit.wikimedia.org/r/989558 (https://phabricator.wikimedia.org/T352948) [21:24:09] (03PS5) 10Mforns: Add query to load (set) properties to Cassandra AQS config table [analytics/refinery] - 10https://gerrit.wikimedia.org/r/989558 (https://phabricator.wikimedia.org/T352948) [21:59:50] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2088.codfw.wmnet with OS bullseye executed with errors: - elastic2088 (**FAI... [22:00:29] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2088.codfw.wmnet with OS bullseye [22:07:39] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Migrate Search Platform-owned hosts to Puppet 7 - https://phabricator.wikimedia.org/T354959 (10bking) a:03bking [22:30:25] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2094.codfw.wmnet with OS bullseye [22:56:28] 10Data-Engineering (Sprint 7): [Data Quality] Implement basic data quality metrics for MW history - https://phabricator.wikimedia.org/T354692 (10Ahoelzl) a:05tchin→03Snwachukwu [23:25:22] 10Data-Engineering, 10Fundraising-Backlog, 10MediaWiki-Core-Tests, 10MediaWiki-extensions-CentralNotice, and 3 others: 
CentralNotice failing in browser test on master - https://phabricator.wikimedia.org/T354977 (10Umherirrender) When skipping all qunit tests from CentralNotice it works, even with the Event... [23:35:32] (SystemdUnitFailed) firing: (8) user-runtime-dir@24065.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:42:01] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Migrate Search Platform-owned hosts to Puppet 7 - https://phabricator.wikimedia.org/T354959 (10bking) `elastic2086` is the first Elastic host to successfully migrate from Puppet 5 to Puppet 7. We also have some net-new hosts on Puppet 7 that didn't require a migratio... [23:48:02] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2088.codfw.wmnet with OS bullseye executed with errors: - elastic2088 (**FAI... [23:49:24] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2098.codfw.wmnet with OS bullseye [23:50:40] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host elastic2094.codfw.wmnet with OS bullseye executed with errors: - elastic2094 (**FAI... 
[23:57:24] 10Data-Platform-SRE (2023/24 Q3 Milestone 1): Service implementation for elastic2087-2109 - https://phabricator.wikimedia.org/T353878 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host elastic2099.codfw.wmnet with OS bullseye