[00:34:44] <jinxer-wm>	 (SystemdUnitFailed) firing: jupyter-dsaez-singleuser-conda-analytics.service Failed on stat1005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:49:34] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:49:44] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:00:12] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:04:44] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:53:27] <jinxer-wm>	 (MediawikiPageContentChangeEnrichAvailability) firing: ...
[02:53:28] <jinxer-wm>	 Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[06:04:44] <jinxer-wm>	 (SystemdUnitFailed) firing: jupyter-dsaez-singleuser-conda-analytics.service Failed on stat1005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:53:28] <jinxer-wm>	 (MediawikiPageContentChangeEnrichAvailability) firing: ...
[06:53:28] <jinxer-wm>	 Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[07:47:44] <wikibugs>	 Data-Engineering, Advanced-Search, All-and-every-Wikisource, ArticlePlaceholder, and 60 others: Remove unnecessary targets definitions - https://phabricator.wikimedia.org/T328497 (karapayneWMDE)
[09:29:44] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) jupyter-dsaez-singleuser-conda-analytics.service Failed on stat1005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:30:59] <wikibugs>	 Data-Engineering, MediaWiki-extensions-EventLogging, Metrics Platform Icebox, Epic: [EPIC] Deprecate mw.eventLog.logEvent() - https://phabricator.wikimedia.org/T317874 (phuedx)
[09:31:03] <wikibugs>	 Data-Engineering, MediaWiki-extensions-EventLogging, Metrics-Platform-Planning, Patch-Needs-Improvement: Deprecate/delete the mw.eventLog.Schema class - https://phabricator.wikimedia.org/T305491 (phuedx)
[09:34:49] <wikibugs>	 Data-Engineering, MediaWiki-extensions-EventLogging, Metrics Platform Icebox, Epic: [EPIC] Deprecate mw.eventLog.logEvent() - https://phabricator.wikimedia.org/T317874 (phuedx)
[10:53:28] <jinxer-wm>	 (MediawikiPageContentChangeEnrichAvailability) firing: ...
[10:53:28] <jinxer-wm>	 Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[11:56:30] <wikibugs>	 Data-Platform-SRE, Patch-For-Review: [Airflow] Setup Airflow instance for WMDE - https://phabricator.wikimedia.org/T340648 (Stevemunene) >>! In T340648#9034367, @Manuel wrote: > Could you please add @karapayneWMDE to the parent group? If not, what would be required to do so?  > (see {T284308} for referen...
[12:12:32] <wikibugs>	 Data-Platform-SRE, Patch-For-Review: Deploy ceph osd processes to data-engineering cluster - https://phabricator.wikimedia.org/T330151 (BTullis) The first of the cephosd hosts now has all OSDs active. There are 20 OSDs, numbered 0 to 19. ` btullis@cephosd1001:~$ sudo ceph osd tree ID  CLASS  WEIGHT     T...
[12:19:48] <wikibugs>	 Data-Platform-SRE, DBA, cloud-services-team: Migrate wiki replicas (clouddb*) hosts to MariaDB 10.6 - https://phabricator.wikimedia.org/T334651 (Marostegui) @BTullis I was planning to depool the other via the normal haproxy puppet change. But I am happy to try other approaches if you want me to
[13:29:44] <jinxer-wm>	 (SystemdUnitFailed) firing: jupyter-dsaez-singleuser-conda-analytics.service Failed on stat1005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:34:58] <wikibugs>	 Data-Platform-SRE, Patch-For-Review: [Airflow] Setup Airflow instance for WMDE - https://phabricator.wikimedia.org/T340648 (karapayneWMDE) Request made: https://phabricator.wikimedia.org/T342546
[13:39:59] <wikibugs>	 Data-Platform-SRE, DBA, cloud-services-team: Migrate wiki replicas (clouddb*) hosts to MariaDB 10.6 - https://phabricator.wikimedia.org/T334651 (BTullis) >>! In T334651#9037480, @Marostegui wrote: > @BTullis I was planning to depool the other via the normal haproxy puppet change. But I am happy to tr...
[13:42:13] <wikibugs>	 Data-Platform-SRE, Release Pipeline, ci-test-error: Limitations on CI fetching files from the wikimedia public datasets archive - https://phabricator.wikimedia.org/T341582 (BTullis) Adding #data-platform-sre because I think that this might be something to do with us and an-web1001, from where these f...
[13:43:46] <wikibugs>	 Data-Platform-SRE, Patch-For-Review: Deploy ceph osd processes to data-engineering cluster - https://phabricator.wikimedia.org/T330151 (BTullis) All 100 OSD daemons are installed and running. `lines=10 root@cephosd1002:~# ceph osd tree ID   CLASS  WEIGHT      TYPE NAME             STATUS  REWEIGHT  PRI-A...
[13:48:29] <wikibugs>	 Data-Platform-SRE, Patch-For-Review: Deploy ceph osd processes to data-engineering cluster - https://phabricator.wikimedia.org/T330151 (BTullis) Noting down some things to fix, while I think about it:  * Need to install manually: `ceph-osd`, `ceph-volume`, `hdparm` * Need to take a copy of `/var/lib/ceph/...
[14:30:53] <wikibugs>	 (CR) Phuedx: [C: +1] Add mediawiki/cirrussearch/page-rerender [schemas/event/primary] - https://gerrit.wikimedia.org/r/935697 (https://phabricator.wikimedia.org/T325565) (owner: DCausse)
[14:35:43] <wikibugs>	 Data-Platform-SRE, DBA, cloud-services-team: Migrate wiki replicas (clouddb*) hosts to MariaDB 10.6 - https://phabricator.wikimedia.org/T334651 (fnegri) > #cloud-services-team any objections from your side with this migration?  I don't think we have any objections, cc @aborrero
[14:53:28] <jinxer-wm>	 (MediawikiPageContentChangeEnrichAvailability) firing: ...
[14:53:28] <jinxer-wm>	 Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[15:15:00] <wikibugs>	 Data-Platform-SRE, SRE, vm-requests, Discovery-Search (Current work): eqiad: 3 VMs requested for Zookeeper - https://phabricator.wikimedia.org/T341705 (Gehel) a:bking
[15:20:39] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work): Reimage WDQS servers to Bullseye - https://phabricator.wikimedia.org/T328325 (bking) @MoritzMuehlenhoff Sorry for the delayed response. Some of these will be decommissioned per hardware refresh, see [[ https://docs.google.com/spreadsheets/d/1y3kh8JAYlb3...
[15:20:52] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work): Reimage WDQS servers to Bullseye - https://phabricator.wikimedia.org/T328325 (Gehel) a:bking
[15:22:12] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work): Reimage wdqs20[13-22] servers to Bullseye - https://phabricator.wikimedia.org/T328325 (bking)
[15:23:14] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work): Reimage wdqs20[13-22] servers to Bullseye - https://phabricator.wikimedia.org/T328325 (bking) Updated ticket message to make the AC more clear...now moving to "Needs Reporting"
[15:26:43] <wikibugs>	 Data-Platform-SRE, Discovery-Search (Current work): Ensure WCQS/WDQS stack works on Bullseye - https://phabricator.wikimedia.org/T331300 (Gehel) Manual steps (see above) need to be documented on wiki before we close this task.
[15:27:20] <wikibugs>	 Data-Platform-SRE, Discovery-Search: Test flink operations/failure scenarios relevant to Search Update Pipeline - https://phabricator.wikimedia.org/T342010 (bking) a:bking
[15:29:42] <wikibugs>	 Data-Engineering, Data-Platform-SRE, Discovery-Search, Event-Platform: Test common operations in the flink operator/k8s/Flink ZK environment - https://phabricator.wikimedia.org/T342149 (Gehel)
[15:29:47] <wikibugs>	 Data-Platform-SRE, Discovery-Search: Test flink operations/failure scenarios relevant to Search Update Pipeline - https://phabricator.wikimedia.org/T342010 (bking) Open→Invalid
[15:30:24] <wikibugs>	 Data-Engineering, Data-Platform-SRE, Discovery-Search, Event-Platform: Test common operations in the flink operator/k8s/Flink ZK environment - https://phabricator.wikimedia.org/T342149 (Gehel) p:Triage→High
[15:30:32] <wikibugs>	 Data-Platform-SRE, Discovery-Search: Test flink operations/failure scenarios relevant to Search Update Pipeline - https://phabricator.wikimedia.org/T342010 (bking) p:Triage→Low
[15:30:44] <wikibugs>	 Data-Engineering, Data-Platform-SRE, Discovery-Search (Current work), Event-Platform: Test common operations in the flink operator/k8s/Flink ZK environment - https://phabricator.wikimedia.org/T342149 (Gehel)
[15:31:16] <wikibugs>	 Data-Platform-SRE, Release-Engineering-Team, Scap: "scap deploy"'s config-deploy should check for broken symlinks - https://phabricator.wikimedia.org/T342162 (Gehel)
[15:32:01] <wikibugs>	 Data-Platform-SRE: Examine/refactor WDQS categories update scripts - https://phabricator.wikimedia.org/T342361 (Gehel)
[15:33:42] <wikibugs>	 Data-Engineering, Discovery-Search, Wikidata, Wikidata-Query-Service: Set data permission on new snapshot generation (discovery.wikibase_rdf) - https://phabricator.wikimedia.org/T342416 (Gehel)
[15:35:04] <wikibugs>	 Data-Platform-SRE: Write new partman recipe for cloudelastic (jbod) - https://phabricator.wikimedia.org/T342463 (Gehel)
[15:37:08] <wikibugs>	 Data-Platform-SRE: Write new partman recipe for cloudelastic (jbod) and update relevant Elastic config - https://phabricator.wikimedia.org/T342463 (bking)
[15:43:50] <wikibugs>	 Data-Engineering, Data-Platform-SRE, Discovery-Search (Current work), Event-Platform: Test common operations in the flink operator/k8s/Flink ZK environment - https://phabricator.wikimedia.org/T342149 (Gehel)
[17:22:18] <wikibugs>	 Data-Engineering, Movement-Insights, Product-Analytics, Wmfdata-Python: Enable wmfdata-py to access MariaDB replicas on the cluster - https://phabricator.wikimedia.org/T340467 (mpopov)
[17:24:21] <wikibugs>	 Data-Engineering, Product-Analytics, Wmfdata-Python: Enable wmfdata-py to access MariaDB replicas on the cluster - https://phabricator.wikimedia.org/T340467 (mpopov) @nshahquinn-wmf: Did you want to keep tabs on this on the Movement Insights board?
[17:29:44] <jinxer-wm>	 (SystemdUnitFailed) firing: jupyter-dsaez-singleuser-conda-analytics.service Failed on stat1005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:44:00] <wikibugs>	 Quarry: Autocomplete for the database field shows invalid databases - https://phabricator.wikimedia.org/T342569 (Novem_Linguae)
[18:46:31] <wikibugs>	 Quarry: [bug] "Internal Server Error" when logging into Quarry - https://phabricator.wikimedia.org/T333043 (Novem_Linguae) I've had this bug for a couple weeks. Usually when opening quarry for the first time during that browsing session. A refresh fixes it. If I recall correctly, I've never had it happen twi...
[18:52:13] <wikibugs>	 (CR) Milimetric: [C: -1] "-1 only because of the name (guideline is to use _ instead of -), everything else follows guidelines." [schemas/event/primary] - https://gerrit.wikimedia.org/r/935697 (https://phabricator.wikimedia.org/T325565) (owner: DCausse)
[18:53:28] <jinxer-wm>	 (MediawikiPageContentChangeEnrichAvailability) firing: ...
[18:53:28] <jinxer-wm>	 Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[19:08:51] <jinxer-wm>	 (HdfsRpcQueueLength) firing: RPC call queue length on the analytics-hadoop cluster is too high. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Namenode_RPC_length_queue - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsRpcQueueLength
[19:14:51] <jinxer-wm>	 (HdfsRpcQueueLength) firing: RPC call queue length on the analytics-hadoop cluster is too high. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Namenode_RPC_length_queue - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsRpcQueueLength
[19:19:23] <wikibugs>	 Quarry: Autocomplete for the database field shows invalid databases - https://phabricator.wikimedia.org/T342569 (Novem_Linguae)
[19:28:51] <jinxer-wm>	 (HdfsRpcQueueLength) resolved: RPC call queue length on the analytics-hadoop cluster is too high. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Namenode_RPC_length_queue - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsRpcQueueLength
[19:29:51] <jinxer-wm>	 (HdfsRpcQueueLength) resolved: RPC call queue length on the analytics-hadoop cluster is too high. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Namenode_RPC_length_queue - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsRpcQueueLength
[19:54:51] <jinxer-wm>	 (HdfsRpcQueueLength) firing: RPC call queue length on the analytics-hadoop cluster is too high. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Namenode_RPC_length_queue - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsRpcQueueLength
[19:58:38] <btullis>	 xcollazo: Just a guess, but might these HDFS RPC call queue length alarms be something to do with you? You've got a pretty hefty job running here: https://yarn.wikimedia.org/cluster/app/application_1688722260742_86888
[20:02:40] <btullis>	 I'm not too concerned by it, because the graph shows that it's not hugely overwhelmed, just that there's much more activity than usual.
[20:19:33] <wikibugs>	 Data-Platform-SRE: Ensure Data Platform SREs have a contact group in puppet/alerting - https://phabricator.wikimedia.org/T342578 (bking)
[20:19:51] <jinxer-wm>	 (HdfsRpcQueueLength) resolved: RPC call queue length on the analytics-hadoop cluster is too high. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Namenode_RPC_length_queue - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsRpcQueueLength
[20:20:51] <jinxer-wm>	 (HdfsRpcQueueLength) firing: RPC call queue length on the analytics-hadoop cluster is too high. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Namenode_RPC_length_queue - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsRpcQueueLength
[20:25:06] <jinxer-wm>	 (HdfsRpcQueueLength) resolved: RPC call queue length on the analytics-hadoop cluster is too high. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Namenode_RPC_length_queue - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsRpcQueueLength
[21:25:46] <wikibugs>	 Quarry: Quarry suggests invalid database names, and doesn't suggest some valid database names - https://phabricator.wikimedia.org/T289943 (rook)
[21:25:55] <wikibugs>	 Quarry: Autocomplete for the database field shows invalid databases - https://phabricator.wikimedia.org/T342569 (rook)
[21:29:44] <jinxer-wm>	 (SystemdUnitFailed) firing: jupyter-dsaez-singleuser-conda-analytics.service Failed on stat1005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:46:14] <wikibugs>	 Data-Platform-SRE, Patch-For-Review: Deploy ceph osd processes to data-engineering cluster - https://phabricator.wikimedia.org/T330151 (BTullis) I've made two small CRs to suggest fixes for the errors mentioned in T330151#9037871 but there is one more fix that will require a little more thinking about. T...
[21:47:23] <wikibugs>	 (PS1) Tsevener: Update schemas for iOS diff view changes [schemas/event/secondary] - https://gerrit.wikimedia.org/r/941012 (https://phabricator.wikimedia.org/T341896)
[22:01:48] <wikibugs>	 Data-Platform-SRE, Epic: Install Ceph Cluster for Data Engineering - https://phabricator.wikimedia.org/T324660 (BTullis)
[22:01:51] <jinxer-wm>	 (HdfsTotalFilesHeap) firing: Total files on the analytics-hadoop HDFS cluster are more than the heap can support. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_total_files_and_heap_size - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=28&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsTotalFilesHeap
[22:09:14] <wikibugs>	 Data-Engineering, Product-Analytics, User-Iflorez: Use Hive/Spark timestamps in Refined event data - https://phabricator.wikimedia.org/T278467 (Iflorez)
[22:11:10] <wikibugs>	 Data-Platform-SRE: Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (BTullis) Moving this task to in-progress, so that I can use it to record the pool creation and the related crush rules.
[22:15:18] <wikibugs>	 Data-Platform-SRE: Alert: Total files on the analytics-hadoop HDFS cluster are more than the heap can support. - https://phabricator.wikimedia.org/T342587 (BTullis)
[22:31:59] <wikibugs>	 Data-Platform-SRE: Alert: Total files on the analytics-hadoop HDFS cluster are more than the heap can support. - https://phabricator.wikimedia.org/T342587 (BTullis) p:Triage→Medium This is not super-urgent to fix. It might be related to some work on the Iceberg migration by @xcollazo. I know that rec...
[22:53:28] <jinxer-wm>	 (MediawikiPageContentChangeEnrichAvailability) firing: ...
[22:53:28] <jinxer-wm>	 Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[23:30:49] <wikibugs>	 Data-Platform-SRE: Decommission wdqs200[4-6] - https://phabricator.wikimedia.org/T342035 (RKemper)