[01:49:52] <wikibugs>	 Analytics-Radar, Data-Engineering-Icebox, MediaWiki-Core-AuthManager, MediaWiki-Platform-Team, Privacy Engineering: Clear site data on MediaWiki log out - https://phabricator.wikimedia.org/T179752 (Krinkle)
[03:28:47] <jinxer-wm>	 (MediawikiPageContentChangeEnrichAvailability) firing: ...
[03:28:47] <jinxer-wm>	 Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[07:17:43] <wikibugs>	 Data-Platform-SRE, DBA, cloud-services-team: Migrate wiki replicas (clouddb*) hosts to MariaDB 10.6 - https://phabricator.wikimedia.org/T334651 (Marostegui)
[07:18:01] <wikibugs>	 Data-Platform-SRE, DBA, cloud-services-team: Migrate wiki replicas (clouddb*) hosts to MariaDB 10.6 - https://phabricator.wikimedia.org/T334651 (Marostegui) clouddb1015 migrated to 10.6. Leaving it for a few days before going for the last wikireplica of this section
[07:33:30] <jinxer-wm>	 (MediawikiPageContentChangeEnrichAvailability) firing: ...
[07:33:31] <jinxer-wm>	 Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[09:47:42] <jinxer-wm>	 (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:49:54] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:00:26] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:02:42] <jinxer-wm>	 (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:33:31] <jinxer-wm>	 (MediawikiPageContentChangeEnrichAvailability) firing: ...
[11:33:31] <jinxer-wm>	 Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[12:49:23] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work): Document SRE steps for deploying a new WDQS (and WCQS) host - https://phabricator.wikimedia.org/T330714 (bking) Open→Resolved
[12:49:34] <wikibugs>	 Data-Platform-SRE, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work), Patch-For-Review: Configure new WDQS servers in codfw (wdqs20[13-22]) - https://phabricator.wikimedia.org/T332314 (bking)
[12:49:51] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work): Document SRE steps for deploying a new WDQS (and WCQS) host - https://phabricator.wikimedia.org/T330714 (bking)
[12:50:21] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work): Document SRE steps for deploying a new WDQS (and WCQS) host - https://phabricator.wikimedia.org/T330714 (bking) I noticed that I forgot to complete one of the acceptance criteria:     - Create a script or process that verifies "deployment worthiness"...
[12:59:00] <wikibugs>	 Data-Platform-SRE: WDQS/WCQS: Create a script or process that verifies "deployment worthiness" - https://phabricator.wikimedia.org/T343712 (bking)
[13:27:37] <wikibugs>	 Data-Platform-SRE, Infrastructure-Foundations, SRE, vm-requests: codfw: 3 VMs requested for Zookeeper - https://phabricator.wikimedia.org/T343715 (bking)
[13:29:00] <wikibugs>	 Data-Platform-SRE, Infrastructure-Foundations, SRE, vm-requests: codfw: 3 VMs requested for Zookeeper - https://phabricator.wikimedia.org/T343715 (bking)
[13:29:05] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work), Patch-For-Review: Provision Zookeeper Cluster for storing Flink HA data - https://phabricator.wikimedia.org/T341792 (bking)
[13:50:45] <wikibugs>	 Data-Platform-SRE, sre-alert-triage: Alert triage: overdue alert [warning] - https://phabricator.wikimedia.org/T343318 (BTullis) @gmodena and the rest of #event-platform will probably want to know about this. https://wikitech.wikimedia.org/wiki/MediaWiki_Event_Enrichment/SLO/Mediawiki_Page_Content_Change...
[13:51:25] <wikibugs>	 Data-Platform-SRE, sre-alert-triage: Alert: Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - https://phabricator.wikimedia.org/T343318 (BTullis)
[13:56:18] <wikibugs>	 Data-Engineering, Data-Platform-SRE, DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install stat1011.eqiad.wmnet - https://phabricator.wikimedia.org/T342454 (Jclark-ctr)
[13:57:46] <wikibugs>	 Data-Engineering, Data-Platform-SRE, DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install an-master100[3-4] - https://phabricator.wikimedia.org/T342291 (Jclark-ctr)
[13:58:40] <wikibugs>	 Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host an-worker1078.eqiad.wmnet with OS bullseye
[14:07:30] <wikibugs>	 Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host an-worker1078.eqiad.wmnet with OS bullseye executed with errors: - an-worker1078 (**FAIL**)   - Downtimed on Icinga...
[14:14:54] <wikibugs>	 Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host an-worker1078.eqiad.wmnet with OS bullseye
[14:29:00] <wikibugs>	 Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (BTullis) I've started work on the next Hadoop worker, but again it's not straightforward. Firstly, the host didn't boot into PXE mode, so I had to do that manually. Secondly, it stopped at the familiar 'load fi...
[14:38:47] <wikibugs>	 Quarry, superset.wmcloud.org, cloud-services-team (FY2023/2024-Q1): Replace Quarry with an installation of Superset - https://phabricator.wikimedia.org/T169452 (fnegri)
[14:50:48] <wikibugs>	 Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (BTullis) The reason for this seems to be that the disk containing the root and boot filesystems has been detected as `/dev/sdb` whereas the recipe expects it to be `/dev/sda`. {F37329505,width=50%} I'll see if...
[14:52:11] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work), sre-alert-triage: Alert triage: overdue alert [warning] - https://phabricator.wikimedia.org/T343319 (bking) a:bking
[14:53:31] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work), sre-alert-triage: Alert triage: overdue alert [warning] - https://phabricator.wikimedia.org/T343319 (bking) [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/454788 | This is how we provisioned the certificate in 2018 ]] . Will check with more...
[14:54:11] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work), sre-alert-triage: search.svc.eqiad.wmnet, search.svc.codfw.wmnet certs about to expire - https://phabricator.wikimedia.org/T343319 (bking)
[15:00:02] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work), sre-alert-triage: search.svc.eqiad.wmnet, search.svc.codfw.wmnet certs about to expire - https://phabricator.wikimedia.org/T343319 (bking) Checking the Puppet repo... `modules/profile/files/ssl/search.discovery.wmnet.crt` is valid for `search.discov...
[15:11:54] <wikibugs>	 Data-Engineering, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Set data permission on new snapshot generation (discovery.wikibase_rdf) - https://phabricator.wikimedia.org/T342416 (Gehel)
[15:20:59] <wikibugs>	 Data-Platform-SRE, sre-alert-triage: search.svc.eqiad.wmnet, search.svc.codfw.wmnet certs about to expire - https://phabricator.wikimedia.org/T343319 (Gehel)
[15:24:41] <wikibugs>	 Data-Engineering, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Set data permission on new snapshot generation (discovery.wikibase_rdf) - https://phabricator.wikimedia.org/T342416 (Gehel)
[15:32:36] <wikibugs>	 Data-Platform-SRE, serviceops, Discovery-Search (Current work): Requesting permission to use kafka-main cluster to transport CirrusSearch updates - https://phabricator.wikimedia.org/T341625 (bking)
[15:33:31] <jinxer-wm>	 (MediawikiPageContentChangeEnrichAvailability) firing: ...
[15:33:31] <jinxer-wm>	 Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[15:35:18] <wikibugs>	 Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host an-worker1078.eqiad.wmnet with OS bullseye executed with errors: - an-worker1078 (**FAIL**)   - Removed from Puppet...
[15:35:44] <wikibugs>	 Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host an-worker1078.eqiad.wmnet with OS bullseye
[15:38:07] <wikibugs>	 Data-Platform-SRE, sre-alert-triage: search.svc.eqiad.wmnet, search.svc.codfw.wmnet certs about to expire - https://phabricator.wikimedia.org/T343319 (bking) T162037 might also have more context.
[15:49:46] <wikibugs>	 Data-Platform-SRE: Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (BTullis) Added the two buckets for the rows in use. ` btullis@cephosd1001:~$ sudo ceph osd crush add-bucket eqiad-e row added bucket eqiad-e type row to crush map btullis@cephosd1001:~$ sudo ce...
[15:50:25] <wikibugs>	 Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host an-worker1078.eqiad.wmnet with OS bullseye executed with errors: - an-worker1078 (**FAIL**)   - Removed from Puppet...
[15:53:35] <wikibugs>	 Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host an-worker1078.eqiad.wmnet with OS bullseye
[16:22:13] <wikibugs>	 Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host an-worker1078.eqiad.wmnet with OS bullseye completed: - an-worker1078 (**PASS**)   - Removed from Puppet and Puppet...
[16:40:56] <wikibugs>	 Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host an-worker1079.eqiad.wmnet with OS bullseye
[17:03:29] <wikibugs>	 Data-Platform-SRE, superset.wikimedia.org: Superset annotation text overlaps illegibly - https://phabricator.wikimedia.org/T279738 (BTullis) Hi @nettrom_WMF - I've checked [[https://superset.wikimedia.org/r/825|your test case]] again and I can see that the issue remains even in version 1.5.3 of Superset....
[17:09:20] <btullis>	 !log deploying new mediawiki_history snapshot to AQS
[17:09:22] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:22:46] <wikibugs>	 Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host an-worker1079.eqiad.wmnet with OS bullseye completed: - an-worker1079 (**PASS**)   - Downtimed on Icinga/Alertmanag...
[17:30:56] <wikibugs>	 (CR) Ebernhardson: [C: +1] Add mediawiki/cirrussearch/page_rerender [schemas/event/primary] - https://gerrit.wikimedia.org/r/935697 (https://phabricator.wikimedia.org/T325565) (owner: DCausse)
[17:31:09] <wikibugs>	 Data-Platform-SRE: Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (BTullis) Looking at it, I think it's going to be better to continue to use the `root=default` bucket at the top of the hierarchy. So now we have `root=default, row=eqiad=e, rack=e1, host=cephos...
[17:31:31] <wikibugs>	 Data-Platform-SRE: Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (BTullis)
[17:41:21] <wikibugs>	 Data-Platform-SRE, superset.wikimedia.org: Superset annotation text overlaps illegibly - https://phabricator.wikimedia.org/T279738 (nettrom_WMF) @BTullis : Thank you for the update! While I no longer maintain the chart where this was an issue, I think switching the chart type to an Echarts type is a perf...
[18:44:11] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work), Epic: [EPIC] Deployment of the Search Update Pipeline on Flink / k8s - https://phabricator.wikimedia.org/T340548 (Gehel)
[18:44:13] <wikibugs>	 Data-Platform-SRE, serviceops, Discovery-Search (Current work): Requesting permission to use kafka-main cluster to transport CirrusSearch updates - https://phabricator.wikimedia.org/T341625 (Gehel)
[18:59:16] <wikibugs>	 Data-Platform-SRE, sre-alert-triage: search.svc.eqiad.wmnet, search.svc.codfw.wmnet certs about to expire - https://phabricator.wikimedia.org/T343319 (bking) After some help from #wikimedia-sre , I was able to get this solved. Basically, the alert is from a check that runs locally on the puppetmaster. Th...
[18:59:44] <wikibugs>	 Data-Platform-SRE, sre-alert-triage: search.svc.eqiad.wmnet, search.svc.codfw.wmnet certs about to expire - https://phabricator.wikimedia.org/T343319 (bking) Open→Resolved
[19:33:31] <jinxer-wm>	 (MediawikiPageContentChangeEnrichAvailability) firing: ...
[19:33:31] <jinxer-wm>	 Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[20:00:55] <wikibugs>	 Data-Platform-SRE, Epic: [Epic] Migrate all Search Platform servers to Debian Bullseye - https://phabricator.wikimedia.org/T323921 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin1001 for host wcqs2003.codfw.wmnet with OS bullseye
[20:21:07] <wikibugs>	 Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host an-worker1080.eqiad.wmnet with OS bullseye
[20:39:58] <wikibugs>	 Data-Platform-SRE, superset.wikimedia.org: Superset annotation text overlaps illegibly - https://phabricator.wikimedia.org/T279738 (BTullis) Great! Thanks @nettrom_WMF - In case it's of any use, I first came across the nvd3 deprecation notice when working on this ticket with @Mayakp.wiki: T301895#7890845...
[20:40:19] <wikibugs>	 Data-Platform-SRE, superset.wikimedia.org: Superset annotation text overlaps illegibly - https://phabricator.wikimedia.org/T279738 (BTullis) Open→Resolved a:BTullis
[20:53:15] <wikibugs>	 Data-Platform-SRE, Patch-For-Review: Upgrade Datahub to v0.10.4 - https://phabricator.wikimedia.org/T329514 (CodeReviewBot) btullis opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/473  Bump the version of the datahub packaged environment
[20:58:21] <wikibugs>	 Data-Engineering, Data-Platform-SRE, Patch-For-Review: Add checksumming of miniconda installer - https://phabricator.wikimedia.org/T337271 (CodeReviewBot) btullis merged https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics/-/merge_requests/30  Add checksumming of the miniconda installer
[21:00:05] <wikibugs>	 Data-Engineering, Data-Platform-SRE, Patch-For-Review: Add checksumming of miniconda installer - https://phabricator.wikimedia.org/T337271 (BTullis) Open→Resolved
[21:00:47] <wikibugs>	 Data-Engineering, Data-Platform-SRE, Patch-For-Review: Add checksumming of miniconda installer - https://phabricator.wikimedia.org/T337271 (BTullis) I've merged [[https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics/-/merge_requests/30?commit_id=4767a329a15756117bc3e2c138a9cc643fff704e...
[21:03:21] <wikibugs>	 Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host an-worker1080.eqiad.wmnet with OS bullseye completed: - an-worker1080 (**PASS**)   - Downtimed on Icinga/Alertmanag...
[21:03:41] <wikibugs>	 Data-Platform-SRE, Epic: [Epic] Migrate all Search Platform servers to Debian Bullseye - https://phabricator.wikimedia.org/T323921 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin1001 for host wcqs2003.codfw.wmnet with OS bullseye executed with errors: - wcqs2003 (**FAIL**...
[21:03:58] <wikibugs>	 Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host an-worker1081.eqiad.wmnet with OS bullseye
[21:07:37] <wikibugs>	 (CR) Ebernhardson: [C: +1] Provide internal schema for CirrusSearch update-pipeline updates. (2 comments) [schemas/event/primary] - https://gerrit.wikimedia.org/r/856507 (https://phabricator.wikimedia.org/T317202) (owner: Peter Fischer)
[21:09:58] <wikibugs>	 Data-Platform-SRE, Patch-For-Review: Upgrade Datahub to v0.10.4 - https://phabricator.wikimedia.org/T329514 (CodeReviewBot) btullis merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/473  Bump the version of the datahub packaged environment
[21:37:43] <wikibugs>	 Data-Platform-SRE: Create conda .deb and docker image - https://phabricator.wikimedia.org/T304450 (BTullis) I think that we can close this ticket now. We're now using GitLab-CI to build conda-analytics here: https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics We currently build the conda-anal...
[21:42:21] <wikibugs>	 Data-Platform-SRE, sre-alert-triage: search.svc.eqiad.wmnet, search.svc.codfw.wmnet certs about to expire - https://phabricator.wikimedia.org/T343319 (RKemper) Just some investigation we did to understand where the metrics come from: `probe_ssl_earliest_cert_expiry` comes from the blackbox exporter. That...
[21:43:49] <wikibugs>	 Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host an-worker1081.eqiad.wmnet with OS bullseye completed: - an-worker1081 (**PASS**)   - Downtimed on Icinga/Alertmanag...
[21:45:26] <wikibugs>	 Data-Platform-SRE, Discovery-Search: Confirm TLS certificate monitoring is in place for Search Platform-owned domains - https://phabricator.wikimedia.org/T343761 (bking)
[21:47:42] <jinxer-wm>	 (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:49:21] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:51:04] <wikibugs>	 Data-Platform-SRE: Bring Hadoop workers an-worker11[49-56] into service - https://phabricator.wikimedia.org/T343762 (BTullis)
[21:53:20] <wikibugs>	 Data-Platform-SRE: Decommission analytics10[70-77] - https://phabricator.wikimedia.org/T343763 (BTullis)
[21:54:22] <wikibugs>	 Data-Platform-SRE: Bring an-mariadb100[12] into service - https://phabricator.wikimedia.org/T284150 (BTullis)
[22:01:15] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:02:42] <jinxer-wm>	 (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:12:19] <wikibugs>	 Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (BTullis) 12 of 79 Hadoop workers have now been upgraded. ` btullis@cumin1001:~$ sudo cumin A:hadoop-worker 'cat /etc/debian_version' 79 hosts will be targeted: an-worker[1078-1148].eqiad.wmnet,analytics[1070-10...
[22:14:20] <wikibugs>	 Data-Platform-SRE, Data-Services, cloud-services-team: Drop several views from ptwikisource - https://phabricator.wikimedia.org/T332596 (BTullis) a:BTullis Claiming this ticket. Apologies for the delay. I will look into it.
[22:23:06] <wikibugs>	 (CR) Shay Nowick: [C: +2] "Thanks for feedback - we are using the structure of a different Growth schema: https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/even" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/940266 (owner: Sharvaniharan)
[22:24:21] <wikibugs>	 (Merged) jenkins-bot: Android: New schema for image recommendations feature [schemas/event/secondary] - https://gerrit.wikimedia.org/r/940266 (owner: Sharvaniharan)
[22:29:44] <wikibugs>	 (PS1) Sharvaniharan: Revert "Android: New schema for image recommendations feature" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/945804
[22:31:55] <wikibugs>	 (CR) Shay Nowick: [C: +2] Android: New schema for image recommendations feature (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/940266 (owner: Sharvaniharan)
[22:32:06] <wikibugs>	 (Abandoned) Sharvaniharan: Revert "Android: New schema for image recommendations feature" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/945804 (owner: Sharvaniharan)
[23:33:31] <jinxer-wm>	 (MediawikiPageContentChangeEnrichAvailability) firing: ...
[23:33:31] <jinxer-wm>	 Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability