[03:03:28] (MediawikiPageContentChangeEnrichAvailability) firing: ... [03:03:28] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability [04:24:44] (SystemdUnitFailed) firing: kube-controller-manager.service Failed on dse-k8s-ctrl1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:29:44] (SystemdUnitFailed) resolved: kube-controller-manager.service Failed on dse-k8s-ctrl1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:34:44] (SystemdUnitFailed) firing: (2) kube-controller-manager.service Failed on dse-k8s-ctrl1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:39:44] (SystemdUnitFailed) resolved: (2) kube-controller-manager.service Failed on dse-k8s-ctrl1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:53:48] 10Data-Engineering, 10Data Engineering and Event Platform Team (Sprint 1), 10Event-Platform, 10Patch-For-Review: mediawiki page_content_change should generate new meta.id field - https://phabricator.wikimedia.org/T341277 (10CodeReviewBot) tchin opened 
https://gitlab.wikimedia.org/repos/data-engineering/med... [08:43:38] 10Data-Platform-SRE, 10sre-alert-triage: Alert triage: overdue warning alert - https://phabricator.wikimedia.org/T342762 (10dcausse) There was a stale `/srv/query_service/aliases.map` file with some content in it (that I copied to `/root/aliases.map.T342762`) which I believe was confusing nginx causing it to r... [08:54:56] (03CR) 10DCausse: Provide internal schema for CirrusSearch update-pipeline updates. (034 comments) [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/856507 (https://phabricator.wikimedia.org/T317202) (owner: 10Peter Fischer) [10:00:18] (03PS1) 10Nmaphophe: GDI Equity Landscape Tables [analytics/refinery] - 10https://gerrit.wikimedia.org/r/941911 [10:29:58] 10Data-Engineering, 10Growth-Team, 10MediaWiki-extensions-EventLogging, 10Metrics Platform Icebox, and 4 others: [EPIC] Deprecate EventLogging::logEvent() - https://phabricator.wikimedia.org/T318263 (10phuedx) [11:03:28] (MediawikiPageContentChangeEnrichAvailability) firing: ... [11:03:28] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability [11:08:03] 10Data-Engineering, 10Anti-Harassment, 10Growth-Team, 10MediaWiki-extensions-EventLogging, and 5 others: [EPIC] Deprecate EventLogging::logEvent() - https://phabricator.wikimedia.org/T318263 (10Dreamy_Jazz) [11:34:58] (03PS9) 10Peter Fischer: Provide internal schema for CirrusSearch update-pipeline updates. [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/856507 (https://phabricator.wikimedia.org/T317202) [11:40:35] (03CR) 10Peter Fischer: "Thanks for the review! I followed your suggestions. 
I'm not completely happy with the names though. Do we aim for consistency with other s" [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/856507 (https://phabricator.wikimedia.org/T317202) (owner: 10Peter Fischer) [12:37:38] 10Data-Engineering, 10Data-Platform-SRE, 10DC-Ops, 10ops-eqiad: Q1:rack/setup/install dbstore100[89] - https://phabricator.wikimedia.org/T342862 (10RobH) [12:38:10] 10Data-Engineering, 10Data-Platform-SRE, 10DC-Ops, 10ops-eqiad: Q1:rack/setup/install dbstore100[89] - https://phabricator.wikimedia.org/T342862 (10RobH) [12:49:44] (SystemdUnitFailed) firing: hdfs_rsync_analytics_hadoop_published.service Failed on an-web1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:49:50] PROBLEM - Check systemd state on an-web1001 is CRITICAL: CRITICAL - degraded: The following units failed: hdfs_rsync_analytics_hadoop_published.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:57:41] 10Data-Platform-SRE, 10serviceops, 10Discovery-Search (Current work): Requesting permission to use kafka-main cluster to transport CirrusSearch updates - https://phabricator.wikimedia.org/T341625 (10elukey) The stability of the kafka main cluster is now way better, they are not totally rebalanced but this ca... 
[13:01:01] RECOVERY - Check systemd state on an-web1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:04:44] (SystemdUnitFailed) resolved: hdfs_rsync_analytics_hadoop_published.service Failed on an-web1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:18:45] 10Data-Platform-SRE, 10sre-alert-triage: 404 from nginx on wcqs2001 - https://phabricator.wikimedia.org/T342762 (10bking) [13:43:14] 10Data-Engineering, 10Infrastructure-Foundations, 10Puppet-Infrastructure: GeoIP2-Anonymous-IP Subscription expired - https://phabricator.wikimedia.org/T342878 (10jbond) p:05Triage→03Medium [13:46:56] 10Data-Engineering, 10Infrastructure-Foundations, 10Puppet-Infrastructure: GeoIP2-Anonymous-IP Subscription expired - https://phabricator.wikimedia.org/T342878 (10jbond) p:05Medium→03High [13:47:30] hi all could someone take a look at T342878 and make sure the correct people are looped in. tl;dr subscription expired for GeoIP2-Anonymous-IP [13:47:31] T342878: GeoIP2-Anonymous-IP Subscription expired - https://phabricator.wikimedia.org/T342878 [13:51:52] 10Data-Engineering, 10Data-Platform-SRE, 10DC-Ops, 10SRE, and 2 others: Q1:rack/setup/install dbstore100[89] - https://phabricator.wikimedia.org/T342862 (10Marostegui) I have assigned the recipe already with the above patch. [13:51:54] 10Data-Engineering, 10Data-Platform-SRE, 10DC-Ops, 10SRE, and 2 others: Q1:rack/setup/install dbstore100[89] - https://phabricator.wikimedia.org/T342862 (10Marostegui) @BTullis any reason why this needs AAAA records? The other hosts do not have them and it will likely give some headaches with the m... 
[13:58:28] RECOVERY - Zookeeper Server on flink-zk1003 is OK: PROCS OK: 1 process with command name java, args org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/zookeeper/conf/zoo.cfg https://wikitech.wikimedia.org/wiki/Zookeeper [15:03:28] (MediawikiPageContentChangeEnrichAvailability) firing: ... [15:03:28] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability [15:05:08] 10Data-Engineering, 10Data-Platform-SRE, 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install dbstore100[89] - https://phabricator.wikimedia.org/T342862 (10BTullis) >>! In T342862#9048212, @Marostegui wrote: > @BTullis any reason why this needs AAAA records? The other hosts do not have them and it... [15:05:22] 10Data-Engineering, 10Data-Platform-SRE, 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install dbstore100[89] - https://phabricator.wikimedia.org/T342862 (10BTullis) [15:10:45] jbond: I'll make sure that Olja knows about it and find out if there's any way that we can expedite this and stop it happening again in future. [15:11:31] btullis: thanks [15:19:35] 10Data-Engineering, 10Infrastructure-Foundations, 10Puppet-Infrastructure: GeoIP2-Anonymous-IP Subscription expired - https://phabricator.wikimedia.org/T342878 (10odimitrijevic) This dataset is no longer subscribed to. We should remove the download of the database. 
[15:21:04] 10Data-Engineering, 10Data-Platform-SRE, 10Infrastructure-Foundations, 10Puppet-Infrastructure: GeoIP2-Anonymous-IP Subscription expired - https://phabricator.wikimedia.org/T342878 (10odimitrijevic) [15:41:35] 10Data-Platform-SRE, 10Patch-For-Review: Deploy ceph osd processes to data-engineering cluster - https://phabricator.wikimedia.org/T330151 (10BTullis) I did some standard benchmarks with `rados bench` as per the guidance [[https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/1.3/html/administrati... [15:43:18] 10Data-Platform-SRE: Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (10MatthewVernon) >>! In T326945#9045370, @BTullis wrote: >>>! In T326945#9045226, @MatthewVernon wrote: >> Apropos your CRUSH rules, it might be worth adding rack/row as well? We have the equival... [15:45:54] 10Data-Platform-SRE: Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (10BTullis) >>! In T326945#9048873, @MatthewVernon wrote: > > In which case, the time to add those to the CRUSH rules is now - adjusting the CRUSH rule later often ends up involving a lot of data... [15:53:48] 10Data-Platform-SRE: Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (10BTullis) I did some standard benchmarks with `rados bench` as per the guidance [[https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/1.3/html/administration_guide/benchmarking_pe... [15:56:22] 10Data-Platform-SRE: Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (10BTullis) I then did a little bit of testing with `rbd bench` First writing 10 GB in 4 MB chunks and using 16 threads to the SSDs. ` btullis@cephosd1001:~$ sudo rbd bench --io-type write --io-t... 
[15:59:55] 10Data-Engineering, 10Data-Platform-SRE, 10Infrastructure-Foundations, 10Puppet-Infrastructure, 10Patch-For-Review: GeoIP2-Anonymous-IP Subscription expired - https://phabricator.wikimedia.org/T342878 (10jbond) 05Open→03Resolved a:03jbond >>! In T342878#9048742, @odimitrijevic wrote: > This dataset... [16:23:58] 10Data-Platform-SRE, 10Discovery-Search (Current work): Provision Zookeeper Cluster for storing Flink HA data - https://phabricator.wikimedia.org/T341792 (10bking) The cluster is up and all nodes appear to have joined correctly; my compliments to whoever wrote the puppet code. The next step is to get metrics... [16:26:52] (03CR) 10Sharvaniharan: "Hi Dmitry... Please review when you get a chance :-)" [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/940266 (owner: 10Sharvaniharan) [17:07:39] 10Data-Platform-SRE, 10Data Engineering and Event Platform Team, 10GitLab (Project Migration), 10Patch-For-Review, 10Release-Engineering-Team (Priority Backlog 📥): Migrate analytics/datahub pipeline to GitLab - https://phabricator.wikimedia.org/T341194 (10CodeReviewBot) btullis merged https://gitlab.wiki... [17:17:54] 10Data-Platform-SRE: Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (10BTullis) [17:20:57] 10Data-Platform-SRE, 10Data Engineering and Event Platform Team, 10GitLab (Project Migration), 10Patch-For-Review, 10Release-Engineering-Team (Priority Backlog 📥): Migrate analytics/datahub pipeline to GitLab - https://phabricator.wikimedia.org/T341194 (10CodeReviewBot) dancy opened https://gitlab.wikime... [17:21:30] 10Data-Platform-SRE, 10Data Engineering and Event Platform Team, 10GitLab (Project Migration), 10Patch-For-Review, 10Release-Engineering-Team (Priority Backlog 📥): Migrate analytics/datahub pipeline to GitLab - https://phabricator.wikimedia.org/T341194 (10CodeReviewBot) dancy merged https://gitlab.wikime... 
[17:32:47] 10Data-Platform-SRE, 10Data Engineering and Event Platform Team, 10GitLab (Project Migration), 10Release-Engineering-Team (Priority Backlog 📥): Migrate analytics/datahub pipeline to GitLab - https://phabricator.wikimedia.org/T341194 (10BTullis) Thank you for adding trusted runners to this repo. On first ru... [17:36:00] 10Data-Platform-SRE, 10Data Engineering and Event Platform Team, 10GitLab (Project Migration), 10Release-Engineering-Team (Priority Backlog 📥): Migrate analytics/datahub pipeline to GitLab - https://phabricator.wikimedia.org/T341194 (10dancy) >>! In T341194#9049353, @BTullis wrote: > # It looks like there... [18:28:07] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Create a wiki list for Wikifunctions' call to sqoop-mediawiki-tables (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/941985 (https://phabricator.wikimedia.org/T342199) (owner: 10David Martin) [18:48:48] !log done deploying some simple stuff to refinery (static files and script comment updates) [18:48:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:03:28] (MediawikiPageContentChangeEnrichAvailability) firing: ... 
[19:03:28] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability [19:14:44] (SystemdUnitFailed) firing: hadoop-yarn-nodemanager.service Failed on an-worker1085:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:16:40] PROBLEM - Hadoop NodeManager on an-worker1085 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [19:17:12] PROBLEM - Check systemd state on an-worker1085 is CRITICAL: CRITICAL - degraded: The following units failed: hadoop-yarn-nodemanager.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:18:37] PROBLEM - Hadoop NodeManager on an-worker1093 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [19:19:44] (SystemdUnitFailed) firing: (3) hadoop-yarn-nodemanager.service Failed on an-worker1083:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:24:44] (SystemdUnitFailed) firing: (4) hadoop-yarn-nodemanager.service Failed on an-worker1083:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - 
https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:27:59] RECOVERY - Hadoop NodeManager on an-worker1093 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [19:29:44] (SystemdUnitFailed) firing: (5) hadoop-yarn-nodemanager.service Failed on an-worker1083:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:30:41] PROBLEM - Check systemd state on an-worker1091 is CRITICAL: CRITICAL - degraded: The following units failed: hadoop-yarn-nodemanager.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:32:29] PROBLEM - Hadoop NodeManager on an-worker1091 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [19:34:31] PROBLEM - Check systemd state on an-worker1129 is CRITICAL: CRITICAL - degraded: The following units failed: hadoop-yarn-nodemanager.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:34:41] PROBLEM - Hadoop NodeManager on an-worker1129 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [19:34:44] (SystemdUnitFailed) firing: (6) hadoop-yarn-nodemanager.service Failed on an-worker1083:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed 
[19:35:19] RECOVERY - Hadoop NodeManager on an-worker1085 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [19:36:01] RECOVERY - Check systemd state on an-worker1085 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:36:41] RECOVERY - Check systemd state on an-worker1091 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:37:23] RECOVERY - Hadoop NodeManager on an-worker1091 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [19:39:01] PROBLEM - Hadoop NodeManager on an-worker1115 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [19:39:35] RECOVERY - Check systemd state on an-worker1129 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:39:44] (SystemdUnitFailed) firing: (6) hadoop-yarn-nodemanager.service Failed on an-worker1085:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:39:45] RECOVERY - Hadoop NodeManager on an-worker1129 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [19:44:44] (SystemdUnitFailed) firing: (6) 
hadoop-yarn-nodemanager.service Failed on an-worker1085:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:51:03] PROBLEM - Check systemd state on an-worker1115 is CRITICAL: CRITICAL - degraded: The following units failed: hadoop-yarn-nodemanager.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:56:05] RECOVERY - Hadoop NodeManager on an-worker1115 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [19:56:55] RECOVERY - Check systemd state on an-worker1115 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:59:44] (SystemdUnitFailed) resolved: (2) hadoop-yarn-nodemanager.service Failed on an-worker1115:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [20:36:44] 10Data-Engineering: Don't pollute skein logs. Part II. - https://phabricator.wikimedia.org/T342926 (10xcollazo) [23:03:28] (MediawikiPageContentChangeEnrichAvailability) firing: ... [23:03:28] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability