[00:16:42] <jinxer-wm>	 (SystemdUnitFailed) firing: hardsync-published.service Failed on an-web1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:19:32] <icinga-wm>	 PROBLEM - Check systemd state on an-web1001 is CRITICAL: CRITICAL - degraded: The following units failed: hardsync-published.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:30:18] <icinga-wm>	 RECOVERY - Check systemd state on an-web1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:36:42] <jinxer-wm>	 (SystemdUnitFailed) resolved: hardsync-published.service Failed on an-web1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:42:58] <icinga-wm>	 PROBLEM - puppet last run on an-worker1145 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[04:53:42] <jinxer-wm>	 (SystemdUnitFailed) firing: kube-controller-manager.service Failed on dse-k8s-ctrl1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:58:42] <jinxer-wm>	 (SystemdUnitFailed) resolved: kube-controller-manager.service Failed on dse-k8s-ctrl1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:04:12] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) kube-controller-manager.service Failed on dse-k8s-ctrl1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:09:12] <jinxer-wm>	 (SystemdUnitFailed) resolved: (2) kube-controller-manager.service Failed on dse-k8s-ctrl1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:02:42] <jinxer-wm>	 (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:04:50] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:15:38] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:17:42] <jinxer-wm>	 (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:08:25] <wikibugs>	 Data-Platform-SRE, decommission-hardware: decommission analytics1069.eqiad.wmnet - https://phabricator.wikimedia.org/T341209 (Stevemunene)
[08:29:25] <wikibugs>	 Data-Engineering, Content-Transform-Team, Event-Platform: [session length] Investigate slight drop at sessions of 30 minutes or more - https://phabricator.wikimedia.org/T280254 (Aklapper)
[08:29:48] <wikibugs>	 Data-Engineering, Content-Transform-Team, Event-Platform: [session length] Change domain of event collection to avoid ad-blocker issue - https://phabricator.wikimedia.org/T280256 (Aklapper)
[08:34:35] <wikibugs>	 Data-Engineering-Planning, Data Engineering and Event Platform Team, Data Pipelines: [Iceberg] Migrate event_sanitized_iceberg to event_sanitized - https://phabricator.wikimedia.org/T311737 (Aklapper) Please do add also codebase project tags to tasks and not only team tags as WMF loves to change teams.
[08:42:37] <moritzm>	 FYI: I'm doing a rolling restart of aqs to pick up the c-ares security updates
[08:43:06] <btullis>	 Ack, many thanks. Of the cassandra services?
[08:43:22] <moritzm>	 no, just the node-based aqs service itself
[08:43:31] <btullis>	 OK, thanks.
[08:43:34] <moritzm>	 node uses c-ares heavily under the hood
[08:45:49] <btullis>	 Gotcha, thanks.
[08:53:27] <wikibugs>	 Data-Platform-SRE, Patch-For-Review: Upgrade Hadoop test cluster to Bullseye - https://phabricator.wikimedia.org/T329363 (BTullis)
[09:24:05] <moritzm>	 dse-k8s-etcd1001 will briefly go down for a Ganeti reboot
[09:24:27] <btullis>	 moritzm: 👍 thx
[09:32:42] <jinxer-wm>	 (SystemdUnitFailed) firing: nagios-nrpe-server.service Failed on dse-k8s-etcd1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:37:42] <jinxer-wm>	 (SystemdUnitFailed) resolved: nagios-nrpe-server.service Failed on dse-k8s-etcd1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:38:31] <wikibugs>	 Data-Platform-SRE, Patch-For-Review: Upgrade Datahub to v0.10.4 - https://phabricator.wikimedia.org/T329514 (BTullis) I've noticed that the production and staging instances of datahub share a single schema registry, namely `karapace1001.eqiad.wmnet:8081`  I think that this is likely to cause issue for us...
[10:30:33] <wikibugs>	 Data-Engineering, Event-Platform (Sprint 14 B), Patch-For-Review: mediawiki-event-enrichment taskmanager crashes at startup - https://phabricator.wikimedia.org/T341096 (CodeReviewBot) gmodena merged https://gitlab.wikimedia.org/repos/data-engineering/eventutilities-python/-/merge_requests/77  Add sch...
[10:42:33] <elukey>	 btullis: o/
[10:42:40] <elukey>	 I am going to deploy eventgate-main
[10:43:16] <btullis>	 elukey: OK, is this a new schema?
[10:45:18] <btullis>	 elukey: Because if it is, this step might not be necessary any more, since Andrew did this: https://phabricator.wikimedia.org/T340166
[10:46:06] <elukey>	 btullis: ah no no, sorry, it is a kafka queueing settings change
[10:46:34] <btullis>	 elukey: OK, still cool by me :-)
[10:47:42] <elukey>	 ack thanks! :)
[10:49:48] <wikibugs>	 Data-Platform-SRE, Patch-For-Review: Upgrade Datahub to v0.10.4 - https://phabricator.wikimedia.org/T329514 (BTullis) Oh, now we have a really useful error from the kafka-setup job. ` Error while executing config command with args '--command-config /tmp/connection.properties --bootstrap-server kafka-test...
[10:53:46] <btullis>	 I'm about to fail back the hadoop namenode service from an-master1002 to an-master1001
[10:53:51] <btullis>	 https://www.irccloud.com/pastebin/04zCSN4K/
[10:55:56] <btullis>	 !log `sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet` on an-master1001
[10:55:58] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:56:47] <btullis>	 https://www.irccloud.com/pastebin/dKmh3ews/
[11:15:14] <wikibugs>	 Analytics, Data-Engineering-Icebox: Create a tool checking HDFS data size - https://phabricator.wikimedia.org/T256644 (JAllemandou)
[11:16:17] <wikibugs>	 Data-Engineering: spark3 in yarn master mode exhibits warnings when the HDFS namenodes are in the failed over state - https://phabricator.wikimedia.org/T338137 (ntsako) I also experienced similar issues to the one above.  I was running my Spark application on `stat1004` via Airflow using the `analytics-priva...
[11:16:22] <wikibugs>	 Analytics, Data-Engineering-Icebox: Create a tool checking HDFS data size - https://phabricator.wikimedia.org/T256644 (JAllemandou) Hi @Gopavasanth, I updated the task description and title. Let me know if you wish more details!
[11:46:51] <jinxer-wm>	 (HdfsFSImageAge) firing: The HDFS FSImage on analytics-hadoop:an-master1001:10080 is too old. - https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Alerts#HDFS_FSImage_too_old - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&viewPanel=129&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsFSImageAge
[11:51:51] <jinxer-wm>	 (HdfsFSImageAge) resolved: The HDFS FSImage on analytics-hadoop:an-master1001:10080 is too old. - https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Alerts#HDFS_FSImage_too_old - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&viewPanel=129&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsFSImageAge
[11:55:44] <wikibugs>	 Data-Engineering, Event-Platform (Sprint 14 B), Patch-For-Review: mediawiki-event-enrichment taskmanager crashes at startup - https://phabricator.wikimedia.org/T341096 (CodeReviewBot) gmodena opened https://gitlab.wikimedia.org/repos/data-engineering/mediawiki-event-enrichment/-/merge_requests/66  ev...
[12:21:48] <btullis>	 moritzm: Are you happy for me to go ahead and create a new karapace VM: https://phabricator.wikimedia.org/T341464
[12:43:16] <wikibugs>	 Data-Engineering, Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B): jsonschema-tools test should fail if fields are removed in new (non major) version - https://phabricator.wikimedia.org/T340765 (tchin)
[12:55:53] <moritzm>	 btullis: sure thing, can you use group A?
[12:56:02] <moritzm>	 it's the least used currently
[13:10:17] <btullis>	 Great, thanks.
[13:11:09] <wikibugs>	 Data-Platform-SRE, SRE, vm-requests: eqiad: 1 VM requested for karapace in support of datahub in staging - https://phabricator.wikimedia.org/T341464 (BTullis)
[13:19:43] <wikibugs>	 Data-Platform-SRE, Patch-For-Review: Upgrade Datahub to v0.10.4 - https://phabricator.wikimedia.org/T329514 (BTullis) Initial testing of the internal schema registry for datahub didn't work very well, so rather than proceeding with that right now I'm going to create a second karapace instance in {T341464...
[13:20:36] <wikibugs>	 Data-Engineering, Event-Platform (Sprint 14 B), Patch-For-Review: mediawiki-event-enrichment taskmanager crashes at startup - https://phabricator.wikimedia.org/T341096 (CodeReviewBot) gmodena merged https://gitlab.wikimedia.org/repos/data-engineering/mediawiki-event-enrichment/-/merge_requests/66  ev...
[13:26:00] <wikibugs>	 Data-Platform-SRE, decommission-hardware: decommission analytics1058.eqiad.wmnet - https://phabricator.wikimedia.org/T338227 (Stevemunene)
[13:26:32] <wikibugs>	 Data-Platform-SRE, decommission-hardware: decommission analytics1059.eqiad.wmnet - https://phabricator.wikimedia.org/T338408 (Stevemunene)
[13:26:47] <wikibugs>	 Data-Platform-SRE, decommission-hardware: decommission analytics1060.eqiad.wmnet - https://phabricator.wikimedia.org/T338409 (Stevemunene)
[13:27:05] <wikibugs>	 Data-Platform-SRE, decommission-hardware: decommission analytics1061.eqiad.wmnet - https://phabricator.wikimedia.org/T339199 (Stevemunene)
[13:27:08] <wikibugs>	 Data-Platform-SRE, SRE, vm-requests: eqiad: 1 VM requested for karapace in support of datahub in staging - https://phabricator.wikimedia.org/T341464 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host karapace1002.eqiad.wmnet with OS bullseye
[13:27:17] <wikibugs>	 Data-Platform-SRE, decommission-hardware: decommission analytics1062.eqiad.wmnet - https://phabricator.wikimedia.org/T339200 (Stevemunene)
[13:27:31] <wikibugs>	 Data-Platform-SRE, decommission-hardware: decommission analytics1063.eqiad.wmnet - https://phabricator.wikimedia.org/T339201 (Stevemunene)
[13:27:52] <wikibugs>	 Data-Platform-SRE, decommission-hardware: decommission analytics1064.eqiad.wmnet - https://phabricator.wikimedia.org/T341204 (Stevemunene)
[13:28:27] <wikibugs>	 Data-Platform-SRE, decommission-hardware: decommission analytics1066.eqiad.wmnet - https://phabricator.wikimedia.org/T341206 (Stevemunene)
[13:28:57] <wikibugs>	 Data-Platform-SRE, decommission-hardware: decommission analytics1067.eqiad.wmnet - https://phabricator.wikimedia.org/T341207 (Stevemunene)
[13:29:25] <wikibugs>	 Data-Platform-SRE, decommission-hardware: decommission analytics1068.eqiad.wmnet - https://phabricator.wikimedia.org/T341208 (Stevemunene)
[13:29:40] <wikibugs>	 Data-Platform-SRE, decommission-hardware: decommission analytics1069.eqiad.wmnet - https://phabricator.wikimedia.org/T341209 (Stevemunene)
[13:53:03] <wikibugs>	 Data-Platform-SRE, SRE, vm-requests, Patch-For-Review: eqiad: 1 VM requested for karapace in support of datahub in staging - https://phabricator.wikimedia.org/T341464 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host karapace1002.eqiad.wmnet with OS b...
[13:54:43] <wikibugs>	 Data-Platform-SRE, SRE, vm-requests, Patch-For-Review: eqiad: 1 VM requested for karapace in support of datahub in staging - https://phabricator.wikimedia.org/T341464 (BTullis) Open→Resolved
[14:00:34] <wikibugs>	 Data-Platform-SRE: an-worker1145 has a problem - https://phabricator.wikimedia.org/T341481 (BTullis)
[14:00:50] <wikibugs>	 Data-Platform-SRE: an-worker1145 has a problem - https://phabricator.wikimedia.org/T341481 (BTullis) p:Triage→High a:BTullis
[14:01:37] <wikibugs>	 Data-Platform-SRE: an-worker1145 has a problem - https://phabricator.wikimedia.org/T341481 (BTullis) I've logged in via the SOL console and I can see that there is a problem with the storage controller. This kind of thing is scrolling past on the console. ` [14278134.771367] systemd[22798]: confd.service: Fa...
[14:02:33] <btullis>	 !log powered off an-worker1145 for T341481
[14:02:36] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:02:36] <stashbot>	 T341481: an-worker1145 has a problem - https://phabricator.wikimedia.org/T341481
[14:03:07] <btullis>	 Giving it a few minutes to think about what it's done.
[14:03:58] <icinga-wm>	 PROBLEM - Host an-worker1145 is DOWN: PING CRITICAL - Packet loss = 100%
[14:04:22] <btullis>	 !log powered on an-worker1145
[14:04:24] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:05:14] <icinga-wm>	 ACKNOWLEDGEMENT - SSH on an-worker1145 is CRITICAL: CRITICAL - Socket timeout after 10 seconds Btullis Cold booted for T341481 https://wikitech.wikimedia.org/wiki/SSH/monitoring
[14:05:14] <icinga-wm>	 ACKNOWLEDGEMENT - Hadoop DataNode on an-worker1145 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode Btullis Cold booted for T341481 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23HDFS_Datanode_process
[14:05:14] <icinga-wm>	 ACKNOWLEDGEMENT - Host an-worker1145 is DOWN: PING CRITICAL - Packet loss = 100% Btullis Cold booted for T341481
[14:06:09] <wikibugs>	 Data-Platform-SRE: an-worker1145 has a problem - https://phabricator.wikimedia.org/T341481 (BTullis) Cold booted the host. We'll see if this reinitializes the storage system, or whether it fails to boot.
[14:07:26] <icinga-wm>	 RECOVERY - Host an-worker1145 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[14:07:36] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1145 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:07:38] <icinga-wm>	 RECOVERY - Hadoop NodeManager on an-worker1145 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[14:09:34] <icinga-wm>	 RECOVERY - Hadoop DataNode on an-worker1145 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23HDFS_Datanode_process
[14:11:12] <wikibugs>	 Data-Platform-SRE: an-worker1145 has a problem - https://phabricator.wikimedia.org/T341481 (BTullis) Server appears to have booted correctly and all services are recovering.
[14:14:48] <icinga-wm>	 RECOVERY - puppet last run on an-worker1145 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[15:14:50] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work): Deployment of the Search Update Pipeline on Flink / k8s - https://phabricator.wikimedia.org/T340548 (bking)
[15:15:08] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work): Determine whether or not to change CPU frequency governor on Search Platform-owned hosts - https://phabricator.wikimedia.org/T340554 (bking)
[15:19:38] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work): Unmanaged envoyproxy installation on wdqs1009 and wdqs1010 - https://phabricator.wikimedia.org/T341042 (bking)
[15:21:30] <wikibugs>	 Data-Platform-SRE, serviceops-radar, Discovery-Search (Current work): Test version compatibility between production Kafka and newer ZooKeeper - https://phabricator.wikimedia.org/T341137 (bking)
[15:29:53] <wikibugs>	 Data-Platform-SRE, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): qlever dblp endpoint for wikidata federated query nomination - https://phabricator.wikimedia.org/T339347 (bking)
[15:32:51] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work), KaiOS-Wikipedia-app (Discovery), Patch-For-Review: Implement depool (source only) and keep-downtime options on data-transfer cookbook - https://phabricator.wikimedia.org/T340793 (bking)
[15:36:30] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work): Document SRE steps for deploying a new WDQS (and WCQS) host - https://phabricator.wikimedia.org/T330714 (bking)
[15:37:29] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work), Patch-For-Review: Diagnose and fix WDQS deployment process - https://phabricator.wikimedia.org/T341290 (bking) Open→Declined a:bking
[15:42:01] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work), Epic: [EPIC] Deployment of the Search Update Pipeline on Flink / k8s - https://phabricator.wikimedia.org/T340548 (TJones)
[15:42:05] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work): Determine whether or not to change CPU frequency governor on Search Platform-owned hosts - https://phabricator.wikimedia.org/T340554 (bking) Open→Invalid a:bking
[15:44:08] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work): Unmanaged envoyproxy installation on wdqs1009 and wdqs1010 - https://phabricator.wikimedia.org/T341042 (bking)
[17:08:34] <wikibugs>	 (CR) Ebernhardson: [C: +1] "do we need to decide anything else before merging this?" [schemas/event/primary] - https://gerrit.wikimedia.org/r/935697 (https://phabricator.wikimedia.org/T325565) (owner: DCausse)
[17:10:39] <wikibugs>	 (CR) DCausse: "@Sam adding you as a reviewer per https://wikitech.wikimedia.org/wiki/Event_Platform/Maintainers. I don't think anyone in the search platf" [schemas/event/primary] - https://gerrit.wikimedia.org/r/935697 (https://phabricator.wikimedia.org/T325565) (owner: DCausse)
[17:46:20] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work), Sustainability (Incident Followup): Create Turnilo/Superset dashboards for identifying users w/ excessive WDQS queries - https://phabricator.wikimedia.org/T338159 (EBernhardson) It looks like we added only the link, could we add a paragraph about ho...
[19:38:14] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work): Reimage WDQS servers to Bullseye - https://phabricator.wikimedia.org/T328325 (bking) This is complete. Closing...  `   ansible codfw_tbd -i wdqs.hosts -m shell -a "cat /etc/debian_version"  wdqs2017.codfw.wmnet | CHANGED | rc=0 >> 11.7 wdqs2016.codfw.wm...
[19:43:48] <wikibugs>	 Data-Engineering, Data-Engineering-Wikistats: no view data by country for the last month (June 2023) - https://phabricator.wikimedia.org/T341523 (Rtfroot)
[21:23:50] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work): Ensure WCQS/WDQS stack works on Bullseye - https://phabricator.wikimedia.org/T331300 (bking)
[21:23:53] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work), KaiOS-Wikipedia-app (Discovery), Patch-For-Review: Implement depool (source only) and keep-downtime options on data-transfer cookbook - https://phabricator.wikimedia.org/T340793 (bking)
[21:38:58] <wikibugs>	 Data-Platform-SRE, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Configure new WDQS servers in codfw (wdqs20[13-22]) - https://phabricator.wikimedia.org/T332314 (bking) Update: I forgot to target 2013 in my last command, here is the latest list of hosts that need a data trans...
[21:54:01] <wikibugs>	 Data-Platform-SRE, Patch-For-Review: Upgrade Datahub to v0.10.4 - https://phabricator.wikimedia.org/T329514 (BTullis) This is a first! I've successfully ingested sample data to the staging deployment of datahub. This is great because it shows that end-to-end ingestion works with 0.10.4. {F37135199,width=...
[22:30:45] <wikibugs>	 Data-Platform-SRE, Patch-For-Review: Upgrade Datahub to v0.10.4 - https://phabricator.wikimedia.org/T329514 (BTullis) I'm going to aim for an upgrade of the production deployments tomorrow at approximately 10:00 UTC.  I'll take a `mydumper` backup of the database on an-coord1001 before I start, in case I...