[00:18:28] <jinxer-wm>	 (RdfStreamingUpdaterNotEnoughTaskSlots) firing: (4) The flink session cluster rdf-streaming-updater in codfw (k8s) does not have enough task slots - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterNotEnoughTaskSlots
[00:38:21] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/983234
[00:38:27] <wikibugs>	 (CR) TrainBranchBot: [C: +2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/983234 (owner: TrainBranchBot)
[00:59:49] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/983234 (owner: TrainBranchBot)
[01:08:28] <jinxer-wm>	 (KeyholderUnarmed) firing: 1 unarmed Keyholder key(s) on acmechief1002:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[01:56:23] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin1001 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[01:58:39] <icinga-wm>	 PROBLEM - Check systemd state on cumin1001 is CRITICAL: CRITICAL - degraded: The following units failed: httpbb_kubernetes_mw-api-int_hourly.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:03:45] <icinga-wm>	 PROBLEM - cassandra-b service on restbase2028 is CRITICAL: CRITICAL - Expecting active but unit cassandra-b is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[02:03:59] <icinga-wm>	 PROBLEM - Check systemd state on restbase2028 is CRITICAL: CRITICAL - degraded: The following units failed: cassandra-b.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:04:05] <icinga-wm>	 PROBLEM - cassandra-b SSL 10.192.16.238:7000 on restbase2028 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[02:04:23] <icinga-wm>	 PROBLEM - cassandra-b CQL 10.192.16.238:9042 on restbase2028 is CRITICAL: connect to address 10.192.16.238 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[02:15:49] <icinga-wm>	 RECOVERY - cassandra-b service on restbase2028 is OK: OK - cassandra-b is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[02:16:01] <icinga-wm>	 RECOVERY - Check systemd state on restbase2028 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:16:07] <icinga-wm>	 RECOVERY - cassandra-b SSL 10.192.16.238:7000 on restbase2028 is OK: SSL OK - Certificate restbase2028-b valid until 2025-12-03 21:33:01 +0000 (expires in 717 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[02:16:25] <icinga-wm>	 RECOVERY - cassandra-b CQL 10.192.16.238:9042 on restbase2028 is OK: TCP OK - 0.082 second response time on 10.192.16.238 port 9042 https://phabricator.wikimedia.org/T93886
[02:36:53] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:54:31] <icinga-wm>	 RECOVERY - Check systemd state on cumin1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:59:23] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin1001 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[03:08:28] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:33:28] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[03:49:23] <icinga-wm>	 PROBLEM - cassandra-b SSL 10.192.16.241:7000 on restbase2029 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[03:49:29] <icinga-wm>	 PROBLEM - Check systemd state on restbase2029 is CRITICAL: CRITICAL - degraded: The following units failed: cassandra-b.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:49:29] <icinga-wm>	 PROBLEM - cassandra-b service on restbase2029 is CRITICAL: CRITICAL - Expecting active but unit cassandra-b is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[03:49:45] <icinga-wm>	 PROBLEM - cassandra-b CQL 10.192.16.241:9042 on restbase2029 is CRITICAL: connect to address 10.192.16.241 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[03:55:31] <icinga-wm>	 RECOVERY - Check systemd state on restbase2029 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:55:31] <icinga-wm>	 RECOVERY - cassandra-b service on restbase2029 is OK: OK - cassandra-b is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[03:56:53] <icinga-wm>	 RECOVERY - cassandra-b SSL 10.192.16.241:7000 on restbase2029 is OK: SSL OK - Certificate restbase2029-b valid until 2025-12-05 16:11:13 +0000 (expires in 719 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[03:57:17] <icinga-wm>	 RECOVERY - cassandra-b CQL 10.192.16.241:9042 on restbase2029 is OK: TCP OK - 0.032 second response time on 10.192.16.241 port 9042 https://phabricator.wikimedia.org/T93886
[04:18:28] <jinxer-wm>	 (RdfStreamingUpdaterNotEnoughTaskSlots) firing: (4) The flink session cluster rdf-streaming-updater in codfw (k8s) does not have enough task slots - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterNotEnoughTaskSlots
[05:08:28] <jinxer-wm>	 (KeyholderUnarmed) firing: 1 unarmed Keyholder key(s) on acmechief1002:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[05:24:14] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to restricted production access and analytics-privatedata-users for Riddy Khan - https://phabricator.wikimedia.org/T353370 (Aklapper) @Himejijo: Hi, if you work for or provide services for WMF, please state so on https://phabricator.wikimedia.org/p/Himejijo/ an...
[06:30:06] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to restricted production access and analytics-privatedata-users for Riddy Khan - https://phabricator.wikimedia.org/T353370 (Himejijo) @Aklapper Done and done.
[06:59:07] <icinga-wm>	 PROBLEM - BGP status on cr1-esams is CRITICAL: BGP CRITICAL - ASunknown/IPv4: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[07:01:09] <icinga-wm>	 PROBLEM - Check systemd state on cumin2002 is CRITICAL: CRITICAL - degraded: The following units failed: httpbb_kubernetes_mw-web_hourly.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:04:17] <jinxer-wm>	 (PoolcounterFullQueues) firing: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:08:45] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[07:09:17] <jinxer-wm>	 (PoolcounterFullQueues) resolved: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:36:54] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[07:46:05] <icinga-wm>	 PROBLEM - SSH on wdqs1023 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[07:58:05] <icinga-wm>	 RECOVERY - Check systemd state on cumin2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231217T0800)
[08:01:13] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[08:08:51] <icinga-wm>	 PROBLEM - SSH on wdqs1022 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[08:18:28] <jinxer-wm>	 (RdfStreamingUpdaterNotEnoughTaskSlots) firing: (4) The flink session cluster rdf-streaming-updater in codfw (k8s) does not have enough task slots - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterNotEnoughTaskSlots
[08:22:43] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1022 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:24:43] <jinxer-wm>	 (SystemdUnitFailed) firing: systemd-timedated.service Failed on wdqs1022:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:29:37] <icinga-wm>	 PROBLEM - SSH on wdqs1024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[08:29:43] <jinxer-wm>	 (SystemdUnitFailed) resolved: systemd-timedated.service Failed on wdqs1022:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:31:11] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1023 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:31:45] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1022 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:34:09] <icinga-wm>	 RECOVERY - Check systemd state on wdqs1023 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:34:11] <icinga-wm>	 RECOVERY - SSH on wdqs1023 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[08:34:43] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) systemd-timedated.service Failed on wdqs1022:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:44:43] <jinxer-wm>	 (SystemdUnitFailed) resolved: (2) systemd-timedated.service Failed on wdqs1022:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:48:19] <icinga-wm>	 RECOVERY - Check systemd state on wdqs1022 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:58:17] <icinga-wm>	 PROBLEM - Check systemd state on puppetserver1001 is CRITICAL: CRITICAL - degraded: The following units failed: upload_puppet_facts.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:00:51] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1024 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service,user@499.service,wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-categories.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:01:25] <icinga-wm>	 RECOVERY - SSH on wdqs1022 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[09:06:05] <icinga-wm>	 PROBLEM - SSH on wdqs1022 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[09:08:28] <jinxer-wm>	 (KeyholderUnarmed) firing: 1 unarmed Keyholder key(s) on acmechief1002:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[09:14:23] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1024 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service,user@499.service,wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-categories.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:14:35] <icinga-wm>	 RECOVERY - SSH on wdqs1024 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[09:16:43] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1022 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:19:12] <jinxer-wm>	 (SystemdUnitFailed) firing: systemd-timedated.service Failed on wdqs1022:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:20:57] <icinga-wm>	 RECOVERY - SSH on wdqs1022 is OK: SSH OK - OpenSSH_8.4p1 Debian-5+deb11u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[09:28:43] <icinga-wm>	 RECOVERY - Check systemd state on wdqs1022 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:31:57] <jinxer-wm>	 (SystemdUnitFailed) resolved: systemd-timedated.service Failed on wdqs1022:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:36:54] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[11:58:11] <icinga-wm>	 PROBLEM - Check systemd state on restbase2030 is CRITICAL: CRITICAL - degraded: The following units failed: cassandra-a.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:58:13] <icinga-wm>	 PROBLEM - cassandra-a service on restbase2030 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[11:58:15] <icinga-wm>	 PROBLEM - cassandra-a SSL 10.192.16.243:7000 on restbase2030 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[11:59:17] <icinga-wm>	 PROBLEM - cassandra-a CQL 10.192.16.243:9042 on restbase2030 is CRITICAL: connect to address 10.192.16.243 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[11:59:43] <icinga-wm>	 RECOVERY - Check systemd state on restbase2030 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:59:43] <icinga-wm>	 RECOVERY - cassandra-a service on restbase2030 is OK: OK - cassandra-a is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[12:00:45] <icinga-wm>	 RECOVERY - cassandra-a CQL 10.192.16.243:9042 on restbase2030 is OK: TCP OK - 0.093 second response time on 10.192.16.243 port 9042 https://phabricator.wikimedia.org/T93886
[12:01:15] <icinga-wm>	 RECOVERY - cassandra-a SSL 10.192.16.243:7000 on restbase2030 is OK: SSL OK - Certificate restbase2030-a valid until 2025-12-06 17:50:13 +0000 (expires in 720 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[12:18:30] <jinxer-wm>	 (RdfStreamingUpdaterNotEnoughTaskSlots) firing: (4) The flink session cluster rdf-streaming-updater in codfw (k8s) does not have enough task slots - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterNotEnoughTaskSlots
[12:59:39] <elukey>	 !log restart kubelet on ml-serve1001 (errors while syncing old containers)
[12:59:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:08:28] <jinxer-wm>	 (KeyholderUnarmed) firing: 1 unarmed Keyholder key(s) on acmechief1002:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[13:16:05] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[13:17:25] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.310 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[13:41:16] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - appserver - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[13:46:16] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - appserver - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[13:56:17] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - appserver - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[14:36:53] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:56:57] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:08:28] <wikibugs>	 (CR) Anzx: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/972473 (https://phabricator.wikimedia.org/T350739) (owner: Strainu)
[15:38:30] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[16:18:30] <jinxer-wm>	 (RdfStreamingUpdaterNotEnoughTaskSlots) firing: (4) The flink session cluster rdf-streaming-updater in codfw (k8s) does not have enough task slots - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterNotEnoughTaskSlots
[17:08:28] <jinxer-wm>	 (KeyholderUnarmed) firing: 1 unarmed Keyholder key(s) on acmechief1002:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[17:50:15] <icinga-wm>	 PROBLEM - BGP status on cr1-esams is CRITICAL: BGP CRITICAL - No response from remote host 185.15.59.128 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:38:30] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[20:09:06] <wikibugs>	 (PS4) Strainu: [namespaces] Use correct diacritics in Romanian [mediawiki-config] - https://gerrit.wikimedia.org/r/972473 (https://phabricator.wikimedia.org/T350739)
[20:11:42] <wikibugs>	 (PS5) Strainu: [namespaces] Use correct diacritics in Romanian [mediawiki-config] - https://gerrit.wikimedia.org/r/972473 (https://phabricator.wikimedia.org/T350739)
[20:18:30] <jinxer-wm>	 (RdfStreamingUpdaterNotEnoughTaskSlots) firing: (4) The flink session cluster rdf-streaming-updater in codfw (k8s) does not have enough task slots - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterNotEnoughTaskSlots
[21:08:29] <jinxer-wm>	 (KeyholderUnarmed) firing: 1 unarmed Keyholder key(s) on acmechief1002:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[21:30:09] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[21:31:59] <icinga-wm>	 PROBLEM - OSPF status on cr2-drmrs is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 1/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[21:33:31] <icinga-wm>	 RECOVERY - OSPF status on cr2-drmrs is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[21:35:09] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:38:31] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure