[01:49:52] Analytics-Radar, Data-Engineering-Icebox, MediaWiki-Core-AuthManager, MediaWiki-Platform-Team, Privacy Engineering: Clear site data on MediaWiki log out - https://phabricator.wikimedia.org/T179752 (Krinkle)
[03:28:47] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[03:28:47] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[07:17:43] Data-Platform-SRE, DBA, cloud-services-team: Migrate wiki replicas (clouddb*) hosts to MariaDB 10.6 - https://phabricator.wikimedia.org/T334651 (Marostegui)
[07:18:01] Data-Platform-SRE, DBA, cloud-services-team: Migrate wiki replicas (clouddb*) hosts to MariaDB 10.6 - https://phabricator.wikimedia.org/T334651 (Marostegui) clouddb1015 migrated to 10.6. Leaving it for a few days before going for the last wikireplica of this section
[07:33:30] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[07:33:31] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[09:47:42] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:49:54] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:00:26] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:02:42] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:33:31] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[11:33:31] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[12:49:23] Data-Platform-SRE, Discovery-Search (Current work): Document SRE steps for deploying a new WDQS (and WCQS) host - https://phabricator.wikimedia.org/T330714 (bking) Open→Resolved
[12:49:34] Data-Platform-SRE, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work), Patch-For-Review: Configure new WDQS servers in codfw (wdqs20[13-22]) - https://phabricator.wikimedia.org/T332314 (bking)
[12:49:51] Data-Platform-SRE, Discovery-Search (Current work): Document SRE steps for deploying a new WDQS (and WCQS) host - https://phabricator.wikimedia.org/T330714 (bking)
[12:50:21] Data-Platform-SRE, Discovery-Search (Current work): Document SRE steps for deploying a new WDQS (and WCQS) host - https://phabricator.wikimedia.org/T330714 (bking) I noticed that I forgot to complete one of the acceptance criteria: - Create a script or process that verifies "deployment worthiness"...
[12:59:00] Data-Platform-SRE: WDQS/WCQS: Create a script or process that verifies "deployment worthiness" - https://phabricator.wikimedia.org/T343712 (bking)
[13:27:37] Data-Platform-SRE, Infrastructure-Foundations, SRE, vm-requests: codfw: 3 VMs requested for Zookeeper - https://phabricator.wikimedia.org/T343715 (bking)
[13:29:00] Data-Platform-SRE, Infrastructure-Foundations, SRE, vm-requests: codfw: 3 VMs requested for Zookeeper - https://phabricator.wikimedia.org/T343715 (bking)
[13:29:05] Data-Platform-SRE, Discovery-Search (Current work), Patch-For-Review: Provision Zookeeper Cluster for storing Flink HA data - https://phabricator.wikimedia.org/T341792 (bking)
[13:50:45] Data-Platform-SRE, sre-alert-triage: Alert triage: overdue alert [warning] - https://phabricator.wikimedia.org/T343318 (BTullis) @gmodena and the rest of #event-platform will probably want to know about this. https://wikitech.wikimedia.org/wiki/MediaWiki_Event_Enrichment/SLO/Mediawiki_Page_Content_Change...
[13:51:25] Data-Platform-SRE, sre-alert-triage: Alert: Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - https://phabricator.wikimedia.org/T343318 (BTullis)
[13:56:18] Data-Engineering, Data-Platform-SRE, DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install stat1011.eqiad.wmnet - https://phabricator.wikimedia.org/T342454 (Jclark-ctr)
[13:57:46] Data-Engineering, Data-Platform-SRE, DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install an-master100[3-4] - https://phabricator.wikimedia.org/T342291 (Jclark-ctr)
[13:58:40] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host an-worker1078.eqiad.wmnet with OS bullseye
[14:07:30] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host an-worker1078.eqiad.wmnet with OS bullseye executed with errors: - an-worker1078 (**FAIL**) - Downtimed on Icinga...
[14:14:54] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host an-worker1078.eqiad.wmnet with OS bullseye
[14:29:00] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (BTullis) I've started work on the next Hadoop worker, but again it's not straightforward. Firstly, the host didn't boot into PXE mode, so I had to do that manually. Secondly, it stopped at the familiar 'load fi...
[14:38:47] Quarry, superset.wmcloud.org, cloud-services-team (FY2023/2024-Q1): Replace Quarry with an installation of Superset - https://phabricator.wikimedia.org/T169452 (fnegri)
[14:50:48] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (BTullis) The reason for this seems to be that the disk containing the root and boot filesystems has been detected as `/dev/sdb` whereas the recipe expects it to be `/dev/sda`. {F37329505,width=50%} I'll see if...
[14:52:11] Data-Platform-SRE, Discovery-Search (Current work), sre-alert-triage: Alert triage: overdue alert [warning] - https://phabricator.wikimedia.org/T343319 (bking) a:bking
[14:53:31] Data-Platform-SRE, Discovery-Search (Current work), sre-alert-triage: Alert triage: overdue alert [warning] - https://phabricator.wikimedia.org/T343319 (bking) [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/454788 | This is how we provisioned the certificate in 2018 ]] . Will check with more...
[14:54:11] Data-Platform-SRE, Discovery-Search (Current work), sre-alert-triage: search.svc.eqiad.wmnet, search.svc.codfw.wmnet certs about to expire - https://phabricator.wikimedia.org/T343319 (bking)
[15:00:02] Data-Platform-SRE, Discovery-Search (Current work), sre-alert-triage: search.svc.eqiad.wmnet, search.svc.codfw.wmnet certs about to expire - https://phabricator.wikimedia.org/T343319 (bking) Checking the Puppet repo... `modules/profile/files/ssl/search.discovery.wmnet.crt` is valid for `search.discov...
[15:11:54] Data-Engineering, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Set data permission on new snapshot generation (discovery.wikibase_rdf) - https://phabricator.wikimedia.org/T342416 (Gehel)
[15:20:59] Data-Platform-SRE, sre-alert-triage: search.svc.eqiad.wmnet, search.svc.codfw.wmnet certs about to expire - https://phabricator.wikimedia.org/T343319 (Gehel)
[15:24:41] Data-Engineering, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Set data permission on new snapshot generation (discovery.wikibase_rdf) - https://phabricator.wikimedia.org/T342416 (Gehel)
[15:32:36] Data-Platform-SRE, serviceops, Discovery-Search (Current work): Requesting permission to use kafka-main cluster to transport CirrusSearch updates - https://phabricator.wikimedia.org/T341625 (bking)
[15:33:31] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[15:33:31] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[15:35:18] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host an-worker1078.eqiad.wmnet with OS bullseye executed with errors: - an-worker1078 (**FAIL**) - Removed from Puppet...
[15:35:44] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host an-worker1078.eqiad.wmnet with OS bullseye
[15:38:07] Data-Platform-SRE, sre-alert-triage: search.svc.eqiad.wmnet, search.svc.codfw.wmnet certs about to expire - https://phabricator.wikimedia.org/T343319 (bking) T162037 might also have more context.
[15:49:46] Data-Platform-SRE: Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (BTullis) Added the two buckets for the rows in use. ` btullis@cephosd1001:~$ sudo ceph osd crush add-bucket eqiad-e row added bucket eqiad-e type row to crush map btullis@cephosd1001:~$ sudo ce...
[15:50:25] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host an-worker1078.eqiad.wmnet with OS bullseye executed with errors: - an-worker1078 (**FAIL**) - Removed from Puppet...
[15:53:35] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host an-worker1078.eqiad.wmnet with OS bullseye
[16:22:13] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host an-worker1078.eqiad.wmnet with OS bullseye completed: - an-worker1078 (**PASS**) - Removed from Puppet and Puppet...
[16:40:56] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host an-worker1079.eqiad.wmnet with OS bullseye
[17:03:29] Data-Platform-SRE, superset.wikimedia.org: Superset annotation text overlaps illegibly - https://phabricator.wikimedia.org/T279738 (BTullis) Hi @nettrom_WMF - I've checked [[https://superset.wikimedia.org/r/825|your test case]] again and I can see that the issue remains even in version 1.5.3 of Superset....
[17:09:20] !log deploying new mediawiki_history snapshot to AQS
[17:09:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:22:46] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host an-worker1079.eqiad.wmnet with OS bullseye completed: - an-worker1079 (**PASS**) - Downtimed on Icinga/Alertmanag...
[17:30:56] (CR) Ebernhardson: [C: +1] Add mediawiki/cirrussearch/page_rerender [schemas/event/primary] - https://gerrit.wikimedia.org/r/935697 (https://phabricator.wikimedia.org/T325565) (owner: DCausse)
[17:31:09] Data-Platform-SRE: Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (BTullis) Looking at it, I think it's going to be better to continue to use the `root=default` bucket at the top of the hierarchy. So now we have `root=default, row=eqiad=e, rack=e1, host=cephos...
[17:31:31] Data-Platform-SRE: Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (BTullis)
[17:41:21] Data-Platform-SRE, superset.wikimedia.org: Superset annotation text overlaps illegibly - https://phabricator.wikimedia.org/T279738 (nettrom_WMF) @BTullis : Thank you for the update! While I no longer maintain the chart where this was an issue, I think switching the chart type to an Echarts type is a perf...
[18:44:11] Data-Platform-SRE, Discovery-Search (Current work), Epic: [EPIC] Deployment of the Search Update Pipeline on Flink / k8s - https://phabricator.wikimedia.org/T340548 (Gehel)
[18:44:13] Data-Platform-SRE, serviceops, Discovery-Search (Current work): Requesting permission to use kafka-main cluster to transport CirrusSearch updates - https://phabricator.wikimedia.org/T341625 (Gehel)
[18:59:16] Data-Platform-SRE, sre-alert-triage: search.svc.eqiad.wmnet, search.svc.codfw.wmnet certs about to expire - https://phabricator.wikimedia.org/T343319 (bking) After some help from #wikimedia-sre , I was able to get this solved. Basically, the alert is from a check that runs locally on the puppetmaster. Th...
[18:59:44] Data-Platform-SRE, sre-alert-triage: search.svc.eqiad.wmnet, search.svc.codfw.wmnet certs about to expire - https://phabricator.wikimedia.org/T343319 (bking) Open→Resolved
[19:33:31] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[19:33:31] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[20:00:55] Data-Platform-SRE, Epic: [Epic] Migrate all Search Platform servers to Debian Bullseye - https://phabricator.wikimedia.org/T323921 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin1001 for host wcqs2003.codfw.wmnet with OS bullseye
[20:21:07] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host an-worker1080.eqiad.wmnet with OS bullseye
[20:39:58] Data-Platform-SRE, superset.wikimedia.org: Superset annotation text overlaps illegibly - https://phabricator.wikimedia.org/T279738 (BTullis) Great! Thanks @nettrom_WMF - In case it's of any use, I first came across the nvd3 deprecation notice when working on this ticket with @Mayakp.wiki: T301895#7890845...
[20:40:19] Data-Platform-SRE, superset.wikimedia.org: Superset annotation text overlaps illegibly - https://phabricator.wikimedia.org/T279738 (BTullis) Open→Resolved a:BTullis
[20:53:15] Data-Platform-SRE, Patch-For-Review: Upgrade Datahub to v0.10.4 - https://phabricator.wikimedia.org/T329514 (CodeReviewBot) btullis opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/473 Bump the version of the datahub packaged environment
[20:58:21] Data-Engineering, Data-Platform-SRE, Patch-For-Review: Add checksumming of miniconda installer - https://phabricator.wikimedia.org/T337271 (CodeReviewBot) btullis merged https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics/-/merge_requests/30 Add checksumming of the miniconda installer
[21:00:05] Data-Engineering, Data-Platform-SRE, Patch-For-Review: Add checksumming of miniconda installer - https://phabricator.wikimedia.org/T337271 (BTullis) Open→Resolved
[21:00:47] Data-Engineering, Data-Platform-SRE, Patch-For-Review: Add checksumming of miniconda installer - https://phabricator.wikimedia.org/T337271 (BTullis) I've merged [[https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics/-/merge_requests/30?commit_id=4767a329a15756117bc3e2c138a9cc643fff704e...
[21:03:21] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host an-worker1080.eqiad.wmnet with OS bullseye completed: - an-worker1080 (**PASS**) - Downtimed on Icinga/Alertmanag...
[21:03:41] Data-Platform-SRE, Epic: [Epic] Migrate all Search Platform servers to Debian Bullseye - https://phabricator.wikimedia.org/T323921 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin1001 for host wcqs2003.codfw.wmnet with OS bullseye executed with errors: - wcqs2003 (**FAIL**...
[21:03:58] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host an-worker1081.eqiad.wmnet with OS bullseye
[21:07:37] (CR) Ebernhardson: [C: +1] Provide internal schema for CirrusSearch update-pipeline updates. (2 comments) [schemas/event/primary] - https://gerrit.wikimedia.org/r/856507 (https://phabricator.wikimedia.org/T317202) (owner: Peter Fischer)
[21:09:58] Data-Platform-SRE, Patch-For-Review: Upgrade Datahub to v0.10.4 - https://phabricator.wikimedia.org/T329514 (CodeReviewBot) btullis merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/473 Bump the version of the datahub packaged environment
[21:37:43] Data-Platform-SRE: Create conda .deb and docker image - https://phabricator.wikimedia.org/T304450 (BTullis) I think that we can close this ticket now. We're now using GitLab-CI to build conda-analytics here: https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics We currently build the conda-anal...
[21:42:21] Data-Platform-SRE, sre-alert-triage: search.svc.eqiad.wmnet, search.svc.codfw.wmnet certs about to expire - https://phabricator.wikimedia.org/T343319 (RKemper) Just some investigation we did to understand where the metrics come from: `probe_ssl_earliest_cert_expiry` comes from the blackbox exporter. That...
[21:43:49] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host an-worker1081.eqiad.wmnet with OS bullseye completed: - an-worker1081 (**PASS**) - Downtimed on Icinga/Alertmanag...
[21:45:26] Data-Platform-SRE, Discovery-Search: Confirm TLS certificate monitoring is in place for Search Platform-owned domains - https://phabricator.wikimedia.org/T343761 (bking)
[21:47:42] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:49:21] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:51:04] Data-Platform-SRE: Bring Hadoop workers an-worker11[49-56] into service - https://phabricator.wikimedia.org/T343762 (BTullis)
[21:53:20] Data-Platform-SRE: Decommission analytics10[70-77] - https://phabricator.wikimedia.org/T343763 (BTullis)
[21:54:22] Data-Platform-SRE: Bring an-mariadb100[12] into service - https://phabricator.wikimedia.org/T284150 (BTullis)
[22:01:15] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:02:42] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:12:19] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (BTullis) 12 of 79 Hadoop workers have now been upgraded. ` btullis@cumin1001:~$ sudo cumin A:hadoop-worker 'cat /etc/debian_version' 79 hosts will be targeted: an-worker[1078-1148].eqiad.wmnet,analytics[1070-10...
[22:14:20] Data-Platform-SRE, Data-Services, cloud-services-team: Drop several views from ptwikisource - https://phabricator.wikimedia.org/T332596 (BTullis) a:BTullis Claiming this ticket. Apologies for the delay. I will look into it.
[22:23:06] (CR) Shay Nowick: [C: +2] "Thanks for feedback - we are using the structure of a different Growth schema: https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/even" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/940266 (owner: Sharvaniharan)
[22:24:21] (Merged) jenkins-bot: Android: New schema for image recommendations feature [schemas/event/secondary] - https://gerrit.wikimedia.org/r/940266 (owner: Sharvaniharan)
[22:29:44] (PS1) Sharvaniharan: Revert "Android: New schema for image recommendations feature" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/945804
[22:31:55] (CR) Shay Nowick: [C: +2] Android: New schema for image recommendations feature (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/940266 (owner: Sharvaniharan)
[22:32:06] (Abandoned) Sharvaniharan: Revert "Android: New schema for image recommendations feature" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/945804 (owner: Sharvaniharan)
[23:33:31] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[23:33:31] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability