[00:34:40] (SystemdUnitFailed) firing: (7) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status?orgId=1&forceLogin&editPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:15:30] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:18:22] (SystemdUnitFailed) firing: (7) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status?orgId=1&forceLogin&editPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:46:23] Data-Engineering: Run maintain-views on zhwiki, newiki - https://phabricator.wikimedia.org/T334041 (MusikAnimal)
[02:48:39] Data-Engineering, XTools: Run maintain-views on zhwiki, newiki - https://phabricator.wikimedia.org/T334041 (MusikAnimal)
[05:19:40] (SystemdUnitFailed) firing: (6) wmf_auto_restart_envoyproxy.service Failed on an-test-ui1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status?orgId=1&forceLogin&editPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:00:33] Data-Engineering, Data-Persistence, Infrastructure-Foundations, Machine-Learning-Team, and 7 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (Marostegui)
[07:00:53] Data-Engineering, DBA, Infrastructure-Foundations, Machine-Learning-Team, and 7 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (Marostegui)
[07:08:20] Data-Engineering, DBA, Infrastructure-Foundations, Machine-Learning-Team, and 8 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (Marostegui)
[07:10:11] Data-Engineering, DBA, Infrastructure-Foundations, Machine-Learning-Team, and 8 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (Marostegui) @jcrespo could you double check the backup-related hosts? Thanks!
[07:10:26] Data-Engineering, Data-Persistence, Discovery-Search, Infrastructure-Foundations, and 8 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (ayounsi) p:Triage→Medium
[07:12:34] Data-Engineering, DBA, Infrastructure-Foundations, Machine-Learning-Team, and 8 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (Marostegui)
[07:13:00] Data-Engineering, Data-Persistence, Discovery-Search, Infrastructure-Foundations, and 8 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (ayounsi)
[07:13:20] Data-Engineering, DBA, Infrastructure-Foundations, Machine-Learning-Team, and 8 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (Marostegui) m3-master and m5-master have been failed over.
[07:13:50] Data-Engineering, DBA, Discovery-Search, Infrastructure-Foundations, and 8 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (Marostegui)
[07:21:15] Data-Engineering, DBA, Discovery-Search, Infrastructure-Foundations, and 8 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (Marostegui) @ayounsi to confirm, codfw will be depooled before this maintenance right? @akosiaris @Joe ?
[07:24:30] Hi folks, I am reimaging the kafka test nodes to bullseye
[07:24:35] if you see errors it is my fault
[07:30:41] Data-Engineering, DBA, Discovery-Search, Infrastructure-Foundations, and 9 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (akosiaris) Yes, we'll have to depool codfw.
[07:30:59] Data-Engineering, DBA, Discovery-Search, Infrastructure-Foundations, and 9 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (ayounsi) >>! In T334049#8757732, @Marostegui wrote: > @ayounsi to confirm, codfw will be depooled before this maintenance right? @akosiaris...
[07:31:27] Data-Engineering, DBA, Discovery-Search, Infrastructure-Foundations, and 9 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (MoritzMuehlenhoff)
[07:37:55] Data-Engineering, DBA, Infrastructure-Foundations, Machine-Learning-Team, and 7 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (MoritzMuehlenhoff)
[07:45:40] Data-Engineering, DBA, Infrastructure-Foundations, Machine-Learning-Team, and 7 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (jcrespo)
[07:46:31] Data-Engineering, DBA, Infrastructure-Foundations, Machine-Learning-Team, and 7 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (jcrespo) >>! In T333377#8757686, @Marostegui wrote: > @jcrespo could you double check the backup-related hosts? Thanks! Documented- mi...
[07:47:38] Data-Engineering, DBA, Infrastructure-Foundations, Machine-Learning-Team, and 7 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (jcrespo)
[08:01:59] Data-Engineering, DBA, Discovery-Search, Infrastructure-Foundations, and 9 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (Marostegui)
[08:03:31] Data-Engineering, DBA, Discovery-Search, Infrastructure-Foundations, and 9 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (Marostegui) @jcrespo kindly check backup servers needs. Thanks
[08:15:01] Data-Engineering-Planning, SRE-swift-storage, Event-Platform Value Stream (Sprint 11): Storage request: swift s3 bucket for mediawiki-page-content-change-enrichment checkpointing - https://phabricator.wikimedia.org/T330693 (dcausse) >>! In T330693#8756120, @Ottomata wrote: > Generally implementers wo...
[08:19:07] Data-Engineering, DBA, Discovery-Search, Infrastructure-Foundations, and 9 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (jcrespo)
[08:21:24] Data-Engineering-Planning, Shared-Data-Infrastructure: Replace db1108 with db1208 - https://phabricator.wikimedia.org/T334055 (Marostegui)
[08:21:59] Data-Engineering-Planning, Shared-Data-Infrastructure: Upgrade db1108 to Bullseye - https://phabricator.wikimedia.org/T304492 (Marostegui) Open→Declined Closing this in favour of: T334055
[08:58:06] Data-Engineering, SRE, ops-eqiad: Degraded RAID on an-worker1132 - https://phabricator.wikimedia.org/T333960 (Peachey88)
[08:58:09] Data-Engineering, SRE, ops-eqiad: Degraded RAID on an-worker1132 - https://phabricator.wikimedia.org/T333091 (Peachey88)
[09:19:40] (SystemdUnitFailed) firing: (6) wmf_auto_restart_envoyproxy.service Failed on an-test-ui1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status?orgId=1&forceLogin&editPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:43:22] (SystemdUnitFailed) firing: (6) wmf_auto_restart_envoyproxy.service Failed on an-test-ui1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status?orgId=1&forceLogin&editPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:31:38] (PS2) Aqu: Use a disallow list to filter top articles sent to Cassandra [analytics/refinery] - https://gerrit.wikimedia.org/r/905701 (https://phabricator.wikimedia.org/T333940)
[10:33:22] (SystemdUnitFailed) firing: (6) jupyter-dsaez-singleuser-conda-analytics.service Failed on stat1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status?orgId=1&forceLogin&editPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:27:25] (PS1) AikoChou: Add event schema for ML classification change on current page state [schemas/event/primary] - https://gerrit.wikimedia.org/r/905965 (https://phabricator.wikimedia.org/T331401)
[11:28:31] (PS2) AikoChou: Add event schema for ML classification change on current page state [schemas/event/primary] - https://gerrit.wikimedia.org/r/905965 (https://phabricator.wikimedia.org/T331401)
[11:32:40] Data-Engineering, DBA, Infrastructure-Foundations, Machine-Learning-Team, and 7 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (hnowlan)
[11:33:36] Data-Engineering, DBA, Discovery-Search, Infrastructure-Foundations, and 9 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (hnowlan)
[12:11:24] Data-Engineering, DBA, Infrastructure-Foundations, Machine-Learning-Team, and 8 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (ayounsi)
[13:00:15] (CR) Mforns: [C: +2] "+2 LGTM! I left a typo comment, if you want to fix, otherwise, it's ok to me!" [analytics/refinery] - https://gerrit.wikimedia.org/r/905701 (https://phabricator.wikimedia.org/T333940) (owner: Aqu)
[13:04:40] (SystemdUnitFailed) firing: (7) nagios-nrpe-server.service Failed on kafka-test1008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status?orgId=1&forceLogin&editPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:31:36] Data-Engineering-Planning: Data Engineering Pairing system - https://phabricator.wikimedia.org/T327790 (JArguello-WMF)
[13:57:36] Data-Engineering, serviceops-radar, Event-Platform Value Stream (Sprint 11): Store Flink HA metadata in Zookeeper - https://phabricator.wikimedia.org/T331283 (JMeybohm) After chatting with @elukey about this it seems it's a good option to move away from kubernetes configmaps to zookeeper for Flink st...
[13:58:22] (SystemdUnitFailed) firing: (7) ifup@ens13.service Failed on kafka-test1010:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status?orgId=1&forceLogin&editPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:58:41] Data-Engineering, SRE, ops-eqiad, Patch-For-Review: Degraded RAID on an-worker1132 - https://phabricator.wikimedia.org/T333091 (Jclark-ctr) Open→Resolved updated backplane firmware looks like errors have resolved
[13:59:13] steve_munene: --^
[13:59:27] so maybe an-worker1132 doesn't need to be removed from hdfs
[14:00:48] !log powercycle an-worker1132
[14:00:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:11:35] PROBLEM - eventgate-analytics-external validation error rate too high on alert2001 is CRITICAL: 2.001 gt 2 https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos
[14:12:24] Data-Engineering, Machine-Learning-Team, Research, Event-Platform Value Stream (Sprint 11), Patch-For-Review: Design event schema for ML scores/recommendations on current page state - https://phabricator.wikimedia.org/T331401 (Ottomata) Nice! I'll add some comments there, but ask another que...
[14:16:56] elukey interesting, what are your thoughts?
[14:21:22] Data-Engineering, serviceops-radar, Event-Platform Value Stream (Sprint 11): Store Flink HA metadata in Zookeeper - https://phabricator.wikimedia.org/T331283 (Ottomata) > It would be nice to know how easy it is to switch between the two HA Service implementations. IIUC, it should be easy. We'd chan...
[14:26:59] (CR) Ottomata: Add event schema for ML classification change on current page state (2 comments) [schemas/event/primary] - https://gerrit.wikimedia.org/r/905965 (https://phabricator.wikimedia.org/T331401) (owner: AikoChou)
[14:30:36] Data-Engineering, SRE, ops-eqiad, Patch-For-Review: Degraded RAID on an-worker1132 - https://phabricator.wikimedia.org/T333091 (elukey) Resolved→Open @Jclark-ctr hi! I tried to reboot the node and it gets blocked when checking the hard drives, telling me about possible preserved cache et...
[14:32:56] steve_munene: I tried to check in the remote mgmt console and the host is still stuck while booting, I asked John to check if anything is needed.. Hopefully the host can be unblocked and we don't need to decom, let's wait a bit more
[14:33:00] does it make sense?
[14:36:36] Data-Engineering, serviceops, Event-Platform Value Stream (Sprint 11), Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (Ottomata) > Could you please share resource requirements for the operator from your experiments on DSE here so that...
[14:40:44] Data-Engineering, serviceops, Event-Platform Value Stream (Sprint 11), Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (Ottomata) BTW, IIUC {T331283} is an app specific configuration. The flink app (not the flink k8s operator) stores th...
[14:40:53] Data-Engineering, serviceops, Event-Platform Value Stream (Sprint 11), Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (Ottomata)
[14:41:21] Data-Engineering-Planning, serviceops, Event-Platform Value Stream (Sprint 11), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (Ottomata)
[14:43:21] (PS3) Aqu: Use a disallow list to filter top articles sent to Cassandra [analytics/refinery] - https://gerrit.wikimedia.org/r/905701 (https://phabricator.wikimedia.org/T333940)
[14:43:46] (PS4) Aqu: Use a disallow list to filter articles sent to Cassandra [analytics/refinery] - https://gerrit.wikimedia.org/r/905701 (https://phabricator.wikimedia.org/T333940)
[14:45:29] Yes it does, thanks elukey
[14:46:09] PROBLEM - eventgate-analytics-external validation error rate too high on alert2001 is CRITICAL: 2.105 gt 2 https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos
[14:48:22] (SystemdUnitFailed) firing: (7) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status?orgId=1&forceLogin&editPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:49:27] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:01:03] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:03:22] (SystemdUnitFailed) firing: (7) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status?orgId=1&forceLogin&editPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:05:54] Data-Engineering, DBA, Discovery-Search, Infrastructure-Foundations, and 9 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (colewhite)
[15:10:15] (EventgateValidationErrors) firing: ...
[15:10:16] Eventgate-analytics-external stream eventlogging_VisualEditorFeatureUse validation errors detected in past 15 min - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos - https://alerts.wikimedia.org/?q=alertname%3DEventgateValidationErrors
[15:25:15] (EventgateValidationErrors) resolved: ...
[15:25:16] Eventgate-analytics-external stream eventlogging_VisualEditorFeatureUse validation errors detected in past 15 min - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos - https://alerts.wikimedia.org/?q=alertname%3DEventgateValidationErrors
[15:34:46] Data-Engineering-Planning, Event-Platform Value Stream: Q4 eventutilities-python should bundle java deps. - https://phabricator.wikimedia.org/T327251 (Ottomata) Most likely we should: During packaging: - download the .jar files we need (via make?) - make eventutilities-python setup.cfg use [[ https://do...
[15:35:58] Data-Engineering-Planning, Event-Platform Value Stream: Q4 eventutilities-python should bundle java deps. - https://phabricator.wikimedia.org/T327251 (Ottomata) In the future, if/when we move wikimedia-event-utilities (java) to gitlab, we should move eventutilities-python into that repo, and do all the bu...
[15:36:59] Data-Engineering-Planning, Event-Platform Value Stream: Q4 eventutilities-python should bundle java deps. - https://phabricator.wikimedia.org/T327251 (Ottomata) a:tchin
[15:37:39] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 11): Q4 eventutilities-python should bundle java deps. - https://phabricator.wikimedia.org/T327251 (Ottomata)
[15:52:13] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 11), Patch-For-Review: Streaming services errors should be routed to an error event topic. - https://phabricator.wikimedia.org/T326536 (Ottomata) @gmodena @tchin and I talked today and decided on the convention of: `.error` Wh...
[15:59:53] RECOVERY - eventgate-analytics-external validation error rate too high on alert2001 is OK: (C)2 gt (W)1 gt 0.9859 https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos
[16:24:45] !log kafka test cluster migrated to bullseye
[16:24:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:11:26] (PS1) Snwachukwu: Add referer_name field to pageview_hourly table in hive. [analytics/refinery] - https://gerrit.wikimedia.org/r/906073 (https://phabricator.wikimedia.org/T334120)
[19:01:55] (CR) Mforns: [V: +2 C: +2] "+2 On my side, I've tested the monthly job and it works well." [analytics/refinery] - https://gerrit.wikimedia.org/r/905701 (https://phabricator.wikimedia.org/T333940) (owner: Aqu)
[19:04:40] (SystemdUnitFailed) firing: (6) jupyter-dsaez-singleuser-conda-analytics.service Failed on stat1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status?orgId=1&forceLogin&editPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:16:03] (PS5) Mforns: Use a disallow list to filter articles sent to Cassandra [analytics/refinery] - https://gerrit.wikimedia.org/r/905701 (https://phabricator.wikimedia.org/T333940) (owner: Aqu)
[19:16:40] (CR) Mforns: [V: +2 C: +2] "LGTM! Deploying :-)" [analytics/refinery] - https://gerrit.wikimedia.org/r/905701 (https://phabricator.wikimedia.org/T333940) (owner: Aqu)
[19:18:09] !log starting refinery deployment to fix aqs rankings
[19:18:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:35:58] !log finished refinery deployment to fix aqs rankings
[19:35:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:51:04] (PS1) Mforns: Add disallowed_cassandra_articles list and fix pageview allowlist. [analytics/refinery] - https://gerrit.wikimedia.org/r/906100 (https://phabricator.wikimedia.org/T333950)
[19:52:31] (CR) Aqu: [V: +2 C: +2] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/906100 (https://phabricator.wikimedia.org/T333950) (owner: Mforns)
[19:54:28] !log starting second refinery deployment to fix aqs rankings
[19:54:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:06:27] Data-Engineering, SRE, ops-eqiad, Patch-For-Review: Degraded RAID on an-worker1132 - https://phabricator.wikimedia.org/T333091 (Jclark-ctr) @elukey so the foreign drives have affected both os drives it will need to be reimaged and is not letting me clear it. I did open the box and did find a...
[20:08:52] !log finished second refinery deployment to fix aqs rankings
[20:08:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:14:04] Data-Engineering, SRE, ops-eqiad, Patch-For-Review: Degraded RAID on an-worker1132 - https://phabricator.wikimedia.org/T333091 (Jclark-ctr) @elukey I was able to clear foreign status but will still need to be reimaged.
[20:17:52] !log deployed airflow to fix aqs pageview ranks
[20:17:54] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:40:01] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 11), Patch-For-Review: Streaming services errors should be routed to an error event topic. - https://phabricator.wikimedia.org/T326536 (Ottomata)
[23:04:40] (SystemdUnitFailed) firing: (6) jupyter-dsaez-singleuser-conda-analytics.service Failed on stat1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status?orgId=1&forceLogin&editPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:41:45] Hi! What is the process for changes such as https://gerrit.wikimedia.org/r/c/analytics/refinery/+/902571/ ? I checked the README, https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Cluster/Deploy/Refinery, and the Deployments calendar but could not find what best to do to move this forward.
[23:42:09] I noticed Reviewer-bot auto-tagged a few people but I don't know if that's the main process used or not.