[00:09:19] <icinga-wm>	 PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[00:11:33] <icinga-wm>	 RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:12:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[00:20:42] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1003-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[00:26:41] <jinxer-wm>	 (CertManagerCertNotReady) firing: Certificate istio-system/knative-serving is not in a ready state - https://wikitech.wikimedia.org/wiki/Kubernetes/cert-manager  - https://alerts.wikimedia.org/?q=alertname%3DCertManagerCertNotReady
[00:28:09] <icinga-wm>	 PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift_dispersion_stats_lowlatency.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:36:16] <jinxer-wm>	 (CirrusSearchJVMGCOldPoolFlatlined) firing: Elasticsearch instance elastic1049-production-search-psi-eqiad is showing memory pressure in the old pool - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchJVMGCOldPoolFlatlined
[00:39:57] <icinga-wm>	 RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:51:29] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 6 day(s) (Sat 25 Jun 2022 07:55:09 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:54:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: rsyslog on ml-serve-ctrl2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[00:55:20] <wikibugs>	 10SRE, 10Data-Engineering, 10Traffic, 10Patch-For-Review, 10User-zeljkofilipin: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (10Ottomata) Thanks ben!
[00:56:13] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Wed 24 Aug 2022 07:48:40 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:05:42] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[01:17:47] <icinga-wm>	 PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift_ring_manager.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:25:19] <icinga-wm>	 PROBLEM - SSH on wtp1036.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:11:53] <icinga-wm>	 RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:26:33] <icinga-wm>	 RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:30:45] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 6 day(s) (Sat 25 Jun 2022 07:55:09 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[02:33:07] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Wed 24 Aug 2022 07:48:40 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[03:03:41] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:11:15] <icinga-wm>	 RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:38:33] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 6 day(s) (Sat 25 Jun 2022 07:55:09 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[03:43:15] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Wed 24 Aug 2022 07:48:40 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[03:53:09] <icinga-wm>	 PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift_dispersion_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:02:23] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: / (spec from root) is CRITICAL: Test spec from root returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Citoid
[04:04:47] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[04:12:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[04:25:31] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1003-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[04:26:41] <jinxer-wm>	 (CertManagerCertNotReady) firing: Certificate istio-system/knative-serving is not in a ready state - https://wikitech.wikimedia.org/wiki/Kubernetes/cert-manager  - https://alerts.wikimedia.org/?q=alertname%3DCertManagerCertNotReady
[04:36:16] <jinxer-wm>	 (CirrusSearchJVMGCOldPoolFlatlined) firing: Elasticsearch instance elastic1049-production-search-psi-eqiad is showing memory pressure in the old pool - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchJVMGCOldPoolFlatlined
[04:54:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: rsyslog on ml-serve-ctrl2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[05:05:42] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[05:18:13] <icinga-wm>	 PROBLEM - k8s API server requests latencies on ml-serve-ctrl1002 is CRITICAL: instance=10.64.48.64 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[05:20:45] <icinga-wm>	 RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:27:49] <icinga-wm>	 PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift_dispersion_stats.service,swift_dispersion_stats_lowlatency.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:37:09] <icinga-wm>	 RECOVERY - k8s API server requests latencies on ml-serve-ctrl1002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[05:53:57] <icinga-wm>	 PROBLEM - SSH on cp5012.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:07:29] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:10:07] <icinga-wm>	 RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:16:57] <icinga-wm>	 PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift_ring_manager.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:39:05] <icinga-wm>	 PROBLEM - k8s API server requests latencies on ml-serve-ctrl2002 is CRITICAL: instance=10.192.48.41 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[06:55:13] <icinga-wm>	 RECOVERY - SSH on cp5012.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:00:04] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220619T0700)
[07:02:43] <icinga-wm>	 RECOVERY - k8s API server requests latencies on ml-serve-ctrl2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[07:18:45] <icinga-wm>	 PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:52:27] <icinga-wm>	 PROBLEM - k8s API server requests latencies on ml-serve-ctrl2002 is CRITICAL: instance=10.192.48.41 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[07:57:09] <icinga-wm>	 RECOVERY - k8s API server requests latencies on ml-serve-ctrl2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[08:12:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[08:25:42] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1003-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[08:26:41] <jinxer-wm>	 (CertManagerCertNotReady) firing: Certificate istio-system/knative-serving is not in a ready state - https://wikitech.wikimedia.org/wiki/Kubernetes/cert-manager  - https://alerts.wikimedia.org/?q=alertname%3DCertManagerCertNotReady
[08:28:01] <icinga-wm>	 PROBLEM - SSH on wtp1025.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:36:16] <jinxer-wm>	 (CirrusSearchJVMGCOldPoolFlatlined) firing: Elasticsearch instance elastic1049-production-search-psi-eqiad is showing memory pressure in the old pool - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchJVMGCOldPoolFlatlined
[08:54:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: rsyslog on ml-serve-ctrl2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[08:56:07] <icinga-wm>	 PROBLEM - k8s API server requests latencies on ml-serve-ctrl2002 is CRITICAL: instance=10.192.48.41 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[09:05:42] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[09:11:41] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:20:23] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:29:15] <icinga-wm>	 RECOVERY - SSH on wtp1025.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:29:59] <icinga-wm>	 PROBLEM - k8s API server requests latencies on ml-serve-ctrl1002 is CRITICAL: instance=10.64.48.64 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[09:32:13] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:43:23] <icinga-wm>	 RECOVERY - k8s API server requests latencies on ml-serve-ctrl2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[09:58:21] <icinga-wm>	 RECOVERY - k8s API server requests latencies on ml-serve-ctrl1002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[10:05:18] <jinxer-wm>	 (ProbeDown) firing: Service api-https:443 has failed probes (http_api-https_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:05:19] <jinxer-wm>	 (ProbeDown) firing: Service api-https:443 has failed probes (http_api-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:05:43] <icinga-wm>	 PROBLEM - Apache HTTP on mw1423 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:05:45] <icinga-wm>	 PROBLEM - Apache HTTP on mw1426 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:05:45] <icinga-wm>	 PROBLEM - Apache HTTP on mw1383 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:05:47] <icinga-wm>	 PROBLEM - Apache HTTP on mw1348 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:05:49] <icinga-wm>	 PROBLEM - Apache HTTP on mw1388 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:05:50] <TheresNoTime>	 heck
[10:05:51] <icinga-wm>	 PROBLEM - Apache HTTP on mw1340 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:05:51] <icinga-wm>	 PROBLEM - Apache HTTP on mw1344 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:07] <icinga-wm>	 PROBLEM - Apache HTTP on mw1443 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:07] <icinga-wm>	 PROBLEM - Apache HTTP on mw1449 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:09] <icinga-wm>	 PROBLEM - Apache HTTP on mw1346 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:17] <icinga-wm>	 PROBLEM - Apache HTTP on mw1396 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:17] <jinxer-wm>	 (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki api_appserver at eqiad #page - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=54&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad prometheus/ops&var-cluster=api_appserver - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[10:06:19] <icinga-wm>	 PROBLEM - Apache HTTP on mw1408 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:19] <icinga-wm>	 PROBLEM - Apache HTTP on mw1361 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:21] <icinga-wm>	 PROBLEM - Apache HTTP on mw1394 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:21] <icinga-wm>	 PROBLEM - Apache HTTP on mw1347 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:23] <icinga-wm>	 PROBLEM - Apache HTTP on mw1345 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:26] <taavi>	 umh
[10:06:31] <icinga-wm>	 PROBLEM - Apache HTTP on mw1428 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:33] <icinga-wm>	 PROBLEM - Apache HTTP on mw1362 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:33] <icinga-wm>	 PROBLEM - Apache HTTP on mw1359 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:33] <icinga-wm>	 PROBLEM - Apache HTTP on mw1357 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:35] <icinga-wm>	 PROBLEM - Apache HTTP on mw1379 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:35] <icinga-wm>	 PROBLEM - Apache HTTP on mw1386 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:35] <icinga-wm>	 PROBLEM - Apache HTTP on mw1312 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:47] <icinga-wm>	 PROBLEM - Apache HTTP on mw1382 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:06:53] <icinga-wm>	 PROBLEM - Apache HTTP on mw1358 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:07:01] <icinga-wm>	 PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/RESTBase
[10:07:03] <icinga-wm>	 PROBLEM - Apache HTTP on mw1375 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:07:03] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2018 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:07:05] <icinga-wm>	 PROBLEM - Apache HTTP on mw1378 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:07:07] <icinga-wm>	 PROBLEM - Apache HTTP on mw1363 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:07:07] <icinga-wm>	 PROBLEM - Apache HTTP on mw1343 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:07:09] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1026 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:07:09] <icinga-wm>	 PROBLEM - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is CRITICAL: /api/rest_v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[10:07:09] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1024 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:07:10] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1027 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:07:10] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1023 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:07:11] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2015 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:07:11] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2022 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:07:11] <icinga-wm>	 PROBLEM - Apache HTTP on mw1381 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:07:17] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2017 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:07:17] <icinga-wm>	 PROBLEM - Apache HTTP on mw1376 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:07:17] <icinga-wm>	 PROBLEM - Apache HTTP on mw1425 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:07:19] <icinga-wm>	 PROBLEM - Apache HTTP on mw1313 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:07:25] <icinga-wm>	 PROBLEM - wikifeeds codfw on wikifeeds.svc.codfw.wmnet is CRITICAL: /{domain}/v1/feed/onthisday/{type}/{month}/{day} (retrieve selected events on January 15) timed out before a response was received: /{domain}/v1/page/featured/{year}/{month}/{day} (retrieve title of the featured article for April 29, 2016) timed out before a response was received: /{domain}/v1/media/image/featured/{year}/{month}/{day} (retrieve featured image data for April 29, 2016) timed out before a response was received: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most read articles for January 1, 2016) timed out before a response was received: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most-read articles for January 1, 2016 (with aggregated=true)) timed out before a response was received: /{domain}/v1/page/random/title (retrieve a random article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Wikifeeds
[10:07:25] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1028 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:07:25] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2021 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:07:26] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2013 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:07:27] <TheresNoTime>	 It's already #page d out twice
[10:07:29] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2014 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:07:33] <icinga-wm>	 PROBLEM - Apache HTTP on mw1360 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:07:33] <icinga-wm>	 PROBLEM - Apache HTTP on mw1402 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:07:35] <icinga-wm>	 PROBLEM - Restbase edge drmrs on text-lb.drmrs.wikimedia.org is CRITICAL: /api/rest_v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[10:07:35] <icinga-wm>	 PROBLEM - proton LVS eqiad on proton.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Respond file not fou
[10:07:35] <icinga-wm>	  nonexistent title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Proton
[10:07:39] <icinga-wm>	 PROBLEM - Apache HTTP on mw1398 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:07:39] <icinga-wm>	 PROBLEM - Apache HTTP on mw1314 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:07:40] <icinga-wm>	 PROBLEM - wikifeeds eqiad on wikifeeds.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/page/featured/{year}/{month}/{day} (retrieve title of the featured article for April 29, 2016) timed out before a response was received: /{domain}/v1/media/image/featured/{year}/{month}/{day} (retrieve featured image data for April 29, 2016) timed out before a response was received: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most read articles for January 1, 2016) timed out before a response was received: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most-read articles for January 1, 2016 (with aggregated=true)) timed out before a response was received https://wikitech.wikimedia.org/wiki/Wikifeeds
[10:07:41] <icinga-wm>	 PROBLEM - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/data/css/mobile/site (Get site-specific CSS) timed out before a response was received: /{domain}/v1/page/description/{title} (Get description for test page) timed out before a response was received: /{domain}/v1/page/media-list/{title} (Get media list from test page) timed out before a response was received: /{domain}/v1/page/mobile-html/{title} (Get pa
[10:07:41] <icinga-wm>	 nt HTML for test page) timed out before a response was received: /{domain}/v1/page/mobile-sections/{title} (retrieve test page via mobile-sections) timed out before a response was received: /{domain}/v1/page/summary/{title} (Get summary for test page) timed out before a response was received: /{domain}/v1/transform/html/to/mobile-html/{title} (Get preview mobile HTML for test page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[10:07:49] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on alert1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[10:07:51] <icinga-wm>	 PROBLEM - Apache HTTP on mw1317 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:07:51] <icinga-wm>	 PROBLEM - Apache HTTP on mw1377 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:07:51] <icinga-wm>	 PROBLEM - Apache HTTP on mw1341 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:07:54] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: s1 #page on db1132 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[10:07:57] <icinga-wm>	 PROBLEM - termbox codfw on termbox.svc.codfw.wmnet is CRITICAL: /termbox (get rendered termbox) is CRITICAL: Test get rendered termbox returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service
[10:08:07] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1030 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:08:07] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2025 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:08:09] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is CRITICAL: 2699 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:08:11] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for appserver on alert1001 is CRITICAL: 2730 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:08:13] <icinga-wm>	 PROBLEM - proton LVS codfw on proton.svc.codfw.wmnet is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Respond file not fou
[10:08:13] <icinga-wm>	  nonexistent title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Proton
[10:08:15] <icinga-wm>	 PROBLEM - High average POST latency for mw requests on api_appserver in eqiad on alert1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=POS
[10:08:29] <icinga-wm>	 PROBLEM - Restbase edge eqsin on text-lb.eqsin.wikimedia.org is CRITICAL: /api/rest_v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[10:08:40] <taavi>	 looks like current impact is limited to the api cluster?
[10:08:53] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - api-https_443: Servers mw1422.eqiad.wmnet, mw1426.eqiad.wmnet, mw1346.eqiad.wmnet, mw1380.eqiad.wmnet, mw1404.eqiad.wmnet, mw1448.eqiad.wmnet, mw1392.eqiad.wmnet, mw1374.eqiad.wmnet, mw1344.eqiad.wmnet, mw1402.eqiad.wmnet, mw1358.eqiad.wmnet, mw1386.eqiad.wmnet, mw1348.eqiad.wmnet, mw1342.eqiad.wmnet, mw1396.eqiad.wmnet, mw1390.eqiad.wmnet, mw1362.eq
[10:08:53] <icinga-wm>	 t, mw1381.eqiad.wmnet, mw1450.eqiad.wmnet, mw1340.eqiad.wmnet, mw1449.eqiad.wmnet, mw1443.eqiad.wmnet, mw1343.eqiad.wmnet, mw1421.eqiad.wmnet, mw1347.eqiad.wmnet, mw1377.eqiad.wmnet, mw1314.eqiad.wmnet, mw1447.eqiad.wmnet, mw1424.eqiad.wmnet, mw1412.eqiad.wmnet, mw1408.eqiad.wmnet, mw1398.eqiad.wmnet, mw1363.eqiad.wmnet, mw1357.eqiad.wmnet, mw1423.eqiad.wmnet, mw1317.eqiad.wmnet, mw1425.eqiad.wmnet, mw1444.eqiad.wmnet, mw1316.eqiad.wmnet,
[10:08:53] <icinga-wm>	 eqiad.wmnet, mw1312.eqiad.wmnet, mw1394.eqiad.wmnet, mw1383.eqiad.wmnet, mw1427.eqiad.wmnet, mw1406.eqiad.wmnet, mw1375.eqiad.wmnet, mw1400.eqiad.wmnet, mw1382.eqiad.wmnet, mw1376.eqiad https://wikitech.wikimedia.org/wiki/PyBal
[10:08:53] <icinga-wm>	 PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[10:08:53] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2024 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:09:03] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1021 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:09:03] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1022 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:09:03] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:09:03] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1018 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:09:05] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1017 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:09:07] <icinga-wm>	 PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v2/translate/{from}/{to}/{provider} (Machine translate an HTML fragment using TestClient, adapt the links to target language wiki.) timed out before a response was received: /v2/suggest/source/{title}/{to} (Suggest a source title to use for translation) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
[10:09:09] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:09:11] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2027 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:09:11] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1375 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:09:11] <icinga-wm>	 PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/data/css/mobile/site (Get site-specific CSS) timed out before a response was received: /{domain}/v1/page/description/{title} (Get description for test page) timed out before a response was received: /{domain}/v1/page/media-list/{title} (Get media list from test page) timed out before a response was received: /{domain}/v1/page/mobile-html/{title} (Get pa
[10:09:11] <icinga-wm>	 nt HTML for test page) timed out before a response was received: /{domain}/v1/page/mobile-sections/{title} (retrieve test page via mobile-sections) is CRITICAL: Test retrieve test page via mobile-sections returned the unexpected status 503 (expecting: 200): /{domain}/v1/page/summary/{title} (Get summary for test page) timed out before a response was received: /{domain}/v1/transform/html/to/mobile-html/{title} (Get preview mobile HTML for 
[10:09:11] <icinga-wm>	 e) timed out before a response was received https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[10:09:11] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:09:11] <TheresNoTime>	 Moving to sre
[10:09:13] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[10:09:13] <icinga-wm>	 PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.9677 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[10:09:15] <icinga-wm>	 PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[10:09:15] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2023 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:09:15] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1361 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:09:15] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2026 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:09:15] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2016 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:09:15] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:09:17] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2019 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:09:21] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1379 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:09:25] <icinga-wm>	 PROBLEM - Apache HTTP on mw1424 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:09:29] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - api-https_443: Servers mw1426.eqiad.wmnet, mw1362.eqiad.wmnet, mw1380.eqiad.wmnet, mw1448.eqiad.wmnet, mw1374.eqiad.wmnet, mw1344.eqiad.wmnet, mw1428.eqiad.wmnet, mw1348.eqiad.wmnet, mw1386.eqiad.wmnet, mw1378.eqiad.wmnet, mw1390.eqiad.wmnet, mw1388.eqiad.wmnet, mw1449.eqiad.wmnet, mw1345.eqiad.wmnet, mw1424.eqiad.wmnet, mw1408.eqiad.wmnet, mw1398.eq
[10:09:29] <icinga-wm>	 t, mw1357.eqiad.wmnet, mw1423.eqiad.wmnet, mw1317.eqiad.wmnet, mw1425.eqiad.wmnet, mw1316.eqiad.wmnet, mw1312.eqiad.wmnet, mw1394.eqiad.wmnet, mw1427.eqiad.wmnet, mw1406.eqiad.wmnet, mw1342.eqiad.wmnet, mw1382.eqiad.wmnet, mw1341.eqiad.wmnet, mw1360.eqiad.wmnet, mw1356.eqiad.wmnet, mw1313.eqiad.wmnet, mw1422.eqiad.wmnet, mw1346.eqiad.wmnet, mw1447.eqiad.wmnet, mw1361.eqiad.wmnet, mw1443.eqiad.wmnet, mw1314.eqiad.wmnet, mw1412.eqiad.wmnet,
[10:09:29] <icinga-wm>	 eqiad.wmnet, mw1404.eqiad.wmnet, mw1381.eqiad.wmnet, mw1450.eqiad.wmnet, mw1340.eqiad.wmnet, mw1343.eqiad.wmnet, mw1421.eqiad.wmnet, mw1347.eqiad.wmnet, mw1377.eqiad.wmnet, mw1358.eqiad https://wikitech.wikimedia.org/wiki/PyBal
[10:09:29] <icinga-wm>	 PROBLEM - Apache HTTP on mw1400 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:09:29] <icinga-wm>	 PROBLEM - Apache HTTP on mw1447 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:09:30] <icinga-wm>	 PROBLEM - Apache HTTP on mw1404 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:09:30] <icinga-wm>	 PROBLEM - Apache HTTP on mw1406 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:09:31] <icinga-wm>	 PROBLEM - Apache HTTP on mw1315 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:09:31] <icinga-wm>	 PROBLEM - Apache HTTP on mw1390 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:09:32] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1020 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:09:33] <icinga-wm>	 PROBLEM - Apache HTTP on mw1450 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:09:33] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1347 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:09:33] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1343 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:09:45] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1450 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:09:45] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1394 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:09:45] <icinga-wm>	 PROBLEM - Apache HTTP on mw1380 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:09:45] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1381 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:09:47] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1377 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:09:47] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1344 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:09:47] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1425 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:09:47] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1341 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:09:49] <icinga-wm>	 PROBLEM - Apache HTTP on mw1392 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:09:49] <icinga-wm>	 PROBLEM - Apache HTTP on mw1356 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:09:52] <jinxer-wm>	 (FrontendUnavailable) firing: varnish-text has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/Varnish#Diagnosing_Varnish_alerts - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DFrontendUnavailable
[10:09:55] <icinga-wm>	 PROBLEM - Apache HTTP on mw1427 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:09:57] <jinxer-wm>	 (FrontendUnavailable) firing: HAProxy (cache_text) has reduced HTTP availability #page - TODO - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DFrontendUnavailable
[10:09:59] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1363 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:10:00] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1313 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:10:03] <icinga-wm>	 PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in appserver at eqiad on alert1001 is CRITICAL: 0.5068 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[10:10:11] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1388 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:10:19] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1360 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:10:20] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s1 #page on db1132 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[10:10:28] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[10:10:29] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1362 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:10:30] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1383 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:10:33] <icinga-wm>	 PROBLEM - termbox eqiad on termbox.svc.eqiad.wmnet is CRITICAL: /termbox (get rendered termbox) is CRITICAL: Test get rendered termbox returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service
[10:10:43] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1312 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:10:43] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1378 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:10:43] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1340 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:10:43] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1382 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:10:43] <icinga-wm>	 PROBLEM - Restbase edge codfw on text-lb.codfw.wikimedia.org is CRITICAL: /api/rest_v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[10:10:45] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1358 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:10:53] <icinga-wm>	 PROBLEM - Restbase edge ulsfo on text-lb.ulsfo.wikimedia.org is CRITICAL: /api/rest_v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[10:10:53] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1404 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:11:07] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1346 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:11:11] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1029 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:11:13] <icinga-wm>	 PROBLEM - Apache HTTP on mw1374 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:11:13] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1357 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:11:13] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1443 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:11:13] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1448 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:11:15] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2012 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:11:17] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1396 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:11:17] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1376 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:11:17] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1315 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:11:33] <icinga-wm>	 PROBLEM - Apache HTTP on mw1444 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:11:37] <icinga-wm>	 PROBLEM - Apache HTTP on mw1422 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:11:43] <icinga-wm>	 PROBLEM - Apache HTTP on mw1342 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:12:05] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 5 day(s) (Sat 25 Jun 2022 07:55:09 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[10:12:13] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1392 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:12:13] <icinga-wm>	 PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/RESTBase
[10:12:17] <icinga-wm>	 PROBLEM - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is CRITICAL: /v2/translate/{from}/{to} (Machine translate an HTML fragment using TestClient, adapt the links to target language wiki.) timed out before a response was received: /v2/suggest/source/{title}/{to} (Suggest a source title to use for translation) timed out before a response was received: /v2/suggest/sections/{title}/{from}/{to} (Suggest source sections to translate) timed 
[10:12:17] <icinga-wm>	 re a response was received https://wikitech.wikimedia.org/wiki/CX
[10:12:33] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1424 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:12:33] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1386 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:12:41] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2020 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:12:41] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1390 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:12:41] <icinga-wm>	 RECOVERY - Apache HTTP on mw1317 is OK: HTTP OK: HTTP/1.1 302 Found - 547 bytes in 8.494 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:12:51] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1400 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:12:51] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1428 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:12:53] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1427 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:12:53] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1421 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:12:55] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1423 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:12:55] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1406 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:12:55] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1374 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:12:55] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1019 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:12:55] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1348 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:12:56] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1426 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:12:56] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1345 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:12:57] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1408 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:12:57] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1402 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:12:58] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1316 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:12:59] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1359 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:13:07] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1412 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:13:09] <icinga-wm>	 PROBLEM - Apache HTTP on mw1316 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:13:17] <icinga-wm>	 PROBLEM - Apache HTTP on mw1448 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:13:19] <icinga-wm>	 PROBLEM - Apache HTTP on mw1421 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[10:13:23] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1317 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:13:25] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1380 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:13:49] <jinxer-wm>	 (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable
[10:13:59] <icinga-wm>	 PROBLEM - MariaDB read only s1 on db1132 is CRITICAL: Could not connect to localhost:3306 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[10:14:21] <icinga-wm>	 RECOVERY - Apache HTTP on mw1447 is OK: HTTP OK: HTTP/1.1 302 Found - 547 bytes in 9.213 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:14:21] <icinga-wm>	 RECOVERY - Apache HTTP on mw1406 is OK: HTTP OK: HTTP/1.1 302 Found - 547 bytes in 9.288 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:14:34] <jinxer-wm>	 (FrontendUnavailable) resolved: varnish-text has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/Varnish#Diagnosing_Varnish_alerts - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DFrontendUnavailable
[10:14:37] <logmsgbot>	 !log ayounsi@cumin1001 dbctl commit (dc=all): 'depool', diff saved to https://phabricator.wikimedia.org/P29910 and previous config saved to /var/cache/conftool/dbconfig/20220619-101436-ayounsi.json
[10:14:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:14:45] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1422 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:14:47] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1444 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:14:47] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1314 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:14:51] <icinga-wm>	 RECOVERY - Apache HTTP on mw1402 is OK: HTTP OK: HTTP/1.1 302 Found - 547 bytes in 9.588 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:14:53] <icinga-wm>	 RECOVERY - Apache HTTP on mw1314 is OK: HTTP OK: HTTP/1.1 302 Found - 547 bytes in 6.667 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:14:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: (2) Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag  - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[10:14:59] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1388 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 6.646 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:03] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1356 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:05] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1390 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 8.079 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:07] <icinga-wm>	 RECOVERY - Apache HTTP on mw1341 is OK: HTTP OK: HTTP/1.1 302 Found - 547 bytes in 7.601 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:13] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1421 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 5.216 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:13] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1428 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 7.710 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:13] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1427 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 6.069 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:15] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1406 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 4.878 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:15] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1423 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 5.666 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:17] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1348 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 4.736 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:17] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1408 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 4.837 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:17] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1402 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 4.365 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:17] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1345 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 5.452 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:17] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1426 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 5.562 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:19] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1362 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 6.858 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:19] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1316 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 5.165 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:21] <icinga-wm>	 RECOVERY - Apache HTTP on mw1423 is OK: HTTP OK: HTTP/1.1 302 Found - 547 bytes in 4.924 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:21] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1383 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 9.166 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:21] <icinga-wm>	 RECOVERY - Apache HTTP on mw1426 is OK: HTTP OK: HTTP/1.1 302 Found - 547 bytes in 4.661 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:23] <icinga-wm>	 RECOVERY - Apache HTTP on mw1348 is OK: HTTP OK: HTTP/1.1 302 Found - 547 bytes in 3.759 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:25] <icinga-wm>	 RECOVERY - Apache HTTP on mw1383 is OK: HTTP OK: HTTP/1.1 302 Found - 547 bytes in 7.353 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:25] <icinga-wm>	 RECOVERY - Apache HTTP on mw1388 is OK: HTTP OK: HTTP/1.1 302 Found - 547 bytes in 3.270 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:27] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1412 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 3.531 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:27] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1312 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 3.896 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:29] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1340 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 3.599 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:29] <icinga-wm>	 RECOVERY - Apache HTTP on mw1340 is OK: HTTP OK: HTTP/1.1 302 Found - 547 bytes in 3.612 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:29] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1030 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:15:30] <icinga-wm>	 RECOVERY - Apache HTTP on mw1316 is OK: HTTP OK: HTTP/1.1 302 Found - 547 bytes in 3.029 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:30] <icinga-wm>	 RECOVERY - Apache HTTP on mw1344 is OK: HTTP OK: HTTP/1.1 302 Found - 547 bytes in 4.405 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:31] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1382 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 6.015 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:31] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1378 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 7.373 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:31] <icinga-wm>	 RECOVERY - proton LVS codfw on proton.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Proton
[10:15:33] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1358 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 5.564 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:35] <icinga-wm>	 RECOVERY - Apache HTTP on mw1448 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.060 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:35] <icinga-wm>	 RECOVERY - Apache HTTP on mw1421 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.044 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:35] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1404 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 1.148 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:39] <icinga-wm>	 RECOVERY - Apache HTTP on mw1443 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.043 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:39] <icinga-wm>	 RECOVERY - Apache HTTP on mw1449 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.051 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:40] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1317 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.042 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:41] <icinga-wm>	 RECOVERY - Apache HTTP on mw1346 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.047 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:43] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1380 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.049 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:47] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1346 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.046 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:49] <icinga-wm>	 RECOVERY - Apache HTTP on mw1396 is OK: HTTP OK: HTTP/1.1 302 Found - 546 bytes in 0.392 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:55] <icinga-wm>	 RECOVERY - Apache HTTP on mw1408 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.056 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:55] <icinga-wm>	 RECOVERY - Apache HTTP on mw1361 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.048 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:55] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1448 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.047 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:55] <icinga-wm>	 RECOVERY - Apache HTTP on mw1394 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.058 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:55] <icinga-wm>	 RECOVERY - Apache HTTP on mw1345 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.055 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:56] <icinga-wm>	 RECOVERY - Apache HTTP on mw1374 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.056 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:56] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1443 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.090 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:57] <icinga-wm>	 RECOVERY - Apache HTTP on mw1347 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.104 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:15:57] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1357 is OK: HTTP OK: HTTP/1.1 302 Found - 561 bytes in 1.107 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:58] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1315 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.041 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:58] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1396 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.053 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:15:59] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1376 is OK: HTTP OK: HTTP/1.1 302 Found - 560 bytes in 0.710 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:16:05] <icinga-wm>	 RECOVERY - Apache HTTP on mw1428 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.037 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:07] <icinga-wm>	 RECOVERY - Apache HTTP on mw1362 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.045 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:07] <icinga-wm>	 RECOVERY - Apache HTTP on mw1359 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.090 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:07] <icinga-wm>	 RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[10:16:07] <icinga-wm>	 RECOVERY - Apache HTTP on mw1357 is OK: HTTP OK: HTTP/1.1 302 Found - 546 bytes in 0.956 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:07] <icinga-wm>	 RECOVERY - Apache HTTP on mw1379 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.050 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:09] <icinga-wm>	 RECOVERY - Apache HTTP on mw1386 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.045 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:09] <icinga-wm>	 RECOVERY - Apache HTTP on mw1312 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.044 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:17] <icinga-wm>	 RECOVERY - Apache HTTP on mw1444 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.042 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:19] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1025 is CRITICAL: /en.wikipedia.org/v1/transform/wikitext/to/html/{title} (Transform wikitext to html) is CRITICAL: Could not fetch url http://10.64.48.125:7231/en.wikipedia.org/v1/transform/wikitext/to/html/User%3ABSitzmann_%28WMF%29%2FMCS%2FTest%2FFrankenstein: Timeout on connection while downloading http://10.64.48.125:7231/en.wikipedia.org/v1/transform/wikitext/to/html/User%3ABSitzmann_%2
[10:16:19] <icinga-wm>	 2FMCS%2FTest%2FFrankenstein https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:16:19] <icinga-wm>	 RECOVERY - Apache HTTP on mw1422 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.049 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:20] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1375 is OK: HTTP OK: HTTP/1.1 302 Found - 560 bytes in 0.109 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:16:23] <icinga-wm>	 RECOVERY - Apache HTTP on mw1382 is OK: HTTP OK: HTTP/1.1 302 Found - 546 bytes in 0.261 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:23] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1361 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.049 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:16:25] <icinga-wm>	 RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[10:16:25] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[10:16:27] <icinga-wm>	 RECOVERY - Apache HTTP on mw1342 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.045 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:27] <icinga-wm>	 RECOVERY - MariaDB read only s1 on db1132 is OK: Version 10.6.8-MariaDB-log, Uptime 2166102s, read_only: True, event_scheduler: True, 10.90 QPS, connection latency: 0.011391s, query latency: 0.000681s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[10:16:29] <icinga-wm>	 RECOVERY - Apache HTTP on mw1358 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.068 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:31] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1379 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.048 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:16:33] <icinga-wm>	 RECOVERY - Apache HTTP on mw1424 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.037 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:39] <icinga-wm>	 RECOVERY - Apache HTTP on mw1404 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.033 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:39] <icinga-wm>	 RECOVERY - Apache HTTP on mw1400 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.040 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:39] <icinga-wm>	 RECOVERY - Apache HTTP on mw1375 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.055 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:39] <icinga-wm>	 RECOVERY - Apache HTTP on mw1315 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.042 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:40] <icinga-wm>	 RECOVERY - Apache HTTP on mw1378 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.054 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:40] <icinga-wm>	 RECOVERY - Apache HTTP on mw1390 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.052 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:41] <icinga-wm>	 RECOVERY - Apache HTTP on mw1450 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.043 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:41] <icinga-wm>	 RECOVERY - Apache HTTP on mw1363 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.049 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:41] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1343 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.042 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:16:42] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1347 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.068 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:16:43] <icinga-wm>	 RECOVERY - Apache HTTP on mw1343 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.045 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:47] <icinga-wm>	 RECOVERY - Apache HTTP on mw1381 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.045 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:47] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1020 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:16:49] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1027 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:16:49] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2022 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:16:51] <icinga-wm>	 RECOVERY - Apache HTTP on mw1376 is OK: HTTP OK: HTTP/1.1 302 Found - 546 bytes in 0.343 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:53] <icinga-wm>	 RECOVERY - Apache HTTP on mw1425 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.049 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:53] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1394 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.041 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:16:53] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1450 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.050 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:16:53] <icinga-wm>	 RECOVERY - Apache HTTP on mw1380 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.057 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:53] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1381 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.057 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:16:55] <icinga-wm>	 RECOVERY - Apache HTTP on mw1313 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.047 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:55] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1392 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.047 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:16:55] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1344 is OK: HTTP OK: HTTP/1.1 302 Found - 560 bytes in 0.116 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:16:55] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1377 is OK: HTTP OK: HTTP/1.1 302 Found - 560 bytes in 0.424 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:16:56] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1341 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.041 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:16:56] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1425 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.051 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:16:57] <icinga-wm>	 RECOVERY - Apache HTTP on mw1392 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.050 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:57] <icinga-wm>	 RECOVERY - Apache HTTP on mw1356 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.049 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:16:58] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Wed 24 Aug 2022 07:48:40 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[10:17:03] <icinga-wm>	 RECOVERY - Apache HTTP on mw1427 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.039 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:17:03] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1422 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.045 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:17:03] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1444 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.046 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:17:03] <icinga-wm>	 RECOVERY - wikifeeds codfw on wikifeeds.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Wikifeeds
[10:17:05] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1314 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.045 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:17:05] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2013 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:17:05] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1028 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:17:09] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1363 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.048 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:17:09] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1313 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.084 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:17:09] <icinga-wm>	 RECOVERY - Apache HTTP on mw1360 is OK: HTTP OK: HTTP/1.1 302 Found - 546 bytes in 0.361 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:17:10] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2014 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:17:11] <icinga-wm>	 RECOVERY - Restbase LVS codfw on restbase.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[10:17:13] <icinga-wm>	 RECOVERY - Apache HTTP on mw1398 is OK: HTTP OK: HTTP/1.1 302 Found - 545 bytes in 0.046 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:17:13] <icinga-wm>	 RECOVERY - Restbase edge drmrs on text-lb.drmrs.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[10:17:15] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1424 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.042 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:17:15] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1386 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.044 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:17:17] <icinga-wm>	 RECOVERY - proton LVS eqiad on proton.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Proton
[10:17:17] <icinga-wm>	 RECOVERY - wikifeeds eqiad on wikifeeds.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Wikifeeds
[10:17:19] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1356 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.041 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:17:19] <icinga-wm>	 RECOVERY - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[10:17:21] <icinga-wm>	 RECOVERY - Some MediaWiki servers are running out of idle PHP-FPM workers in appserver at eqiad on alert1001 is OK: All metrics within thresholds. https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[10:17:27] <icinga-wm>	 RECOVERY - Apache HTTP on mw1377 is OK: HTTP OK: HTTP/1.1 302 Found - 546 bytes in 0.191 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:17:27] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1360 is OK: HTTP OK: HTTP/1.1 302 Found - 560 bytes in 0.671 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:17:30] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2020 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:17:33] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1400 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.046 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:17:37] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1374 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.053 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:17:40] <icinga-wm>	 RECOVERY - MariaDB Replica SQL: s1 #page on db1132 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[10:17:41] <icinga-wm>	 RECOVERY - MariaDB Replica IO: s1 #page on db1132 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[10:17:45] <icinga-wm>	 RECOVERY - termbox codfw on termbox.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service
[10:17:45] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1359 is OK: HTTP OK: HTTP/1.1 302 Found - 559 bytes in 0.045 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[10:17:45] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1019 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:17:49] <jinxer-wm>	 (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in eqiad (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh
[10:17:53] <icinga-wm>	 RECOVERY - termbox eqiad on termbox.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service
[10:17:53] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2025 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:17:55] <jinxer-wm>	 (LogstashIngestSpike) firing: (2) Logstash rate of ingestion percent change compared to yesterday - https://phabricator.wikimedia.org/T202307 - https://grafana.wikimedia.org/d/000000561/logstash?orgId=1&panelId=2&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DLogstashIngestSpike
[10:17:59] <icinga-wm>	 RECOVERY - Restbase edge codfw on text-lb.codfw.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[10:17:59] <icinga-wm>	 RECOVERY - High average POST latency for mw requests on api_appserver in eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=POST
[10:18:05] <icinga-wm>	 RECOVERY - Restbase edge ulsfo on text-lb.ulsfo.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[10:18:11] <icinga-wm>	 RECOVERY - Restbase edge eqsin on text-lb.eqsin.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[10:18:29] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1029 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:18:30] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2012 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:18:33] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2024 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:18:37] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[10:18:43] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:18:43] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1025 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:18:43] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1021 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:18:43] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:18:43] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1022 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:18:47] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1017 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:18:51] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:18:51] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:18:53] <icinga-wm>	 RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[10:18:55] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2027 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:18:55] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:18:55] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:18:55] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2026 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:18:59] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2019 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:18:59] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2023 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:19:13] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[10:19:13] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1023 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:19:13] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1026 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:19:13] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1024 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:19:13] <icinga-wm>	 RECOVERY - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[10:19:14] <icinga-wm>	 RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[10:19:15] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2015 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:19:15] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:19:21] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2017 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:19:26] <jinxer-wm>	 (RdfStreamingUpdaterFlinkJobUnstable) firing: (2) WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable
[10:19:31] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2021 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:19:56] <jinxer-wm>	 (FrontendUnavailable) resolved: HAProxy (cache_text) has reduced HTTP availability #page - TODO - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DFrontendUnavailable
[10:19:59] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[10:20:15] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: (2) Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag  - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[10:20:19] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:20:21] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for appserver on alert1001 is OK: (C)100 gt (W)50 gt 1 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:21:06] <jinxer-wm>	 (ProbeDown) resolved: Service api-https:443 has failed probes (http_api-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:21:11] <jinxer-wm>	 (ProbeDown) resolved: (2) Service api-https:443 has failed probes (http_api-https_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:21:28] <jinxer-wm>	 (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki api_appserver at eqiad #page - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=54&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad prometheus/ops&var-cluster=api_appserver - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[10:22:19] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:22:49] <jinxer-wm>	 (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: (2) Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh
[10:22:55] <jinxer-wm>	 (LogstashIngestSpike) resolved: (2) Logstash rate of ingestion percent change compared to yesterday - https://phabricator.wikimedia.org/T202307 - https://grafana.wikimedia.org/d/000000561/logstash?orgId=1&panelId=2&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DLogstashIngestSpike
[10:23:47] <icinga-wm>	 RECOVERY - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is OK: (C)0.3 gt (W)0.1 gt 0.08065 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[10:23:49] <jinxer-wm>	 (RdfStreamingUpdaterFlinkJobUnstable) resolved: (2) WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable
[10:24:22] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s1 #page on db1132 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1342.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[10:24:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) resolved: (2) Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag  - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[10:27:49] <jinxer-wm>	 (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: (2) Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh
[10:28:27] <icinga-wm>	 RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
[10:28:36] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: depooled
[10:28:38] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: depooled
[10:28:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:31:29] <icinga-wm>	 RECOVERY - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
[10:31:51] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:33:40] <wikibugs>	 SRE, Beta-Cluster-Infrastructure, Quality-and-Test-Engineering-Team (QTE), Traffic, and 2 others: [epic] The SSL certificate for Beta cluster domains fails to properly renew & deploy - https://phabricator.wikimedia.org/T293585 (TheresNoTime)
[11:21:41] <icinga-wm>	 PROBLEM - k8s API server requests latencies on ml-serve-ctrl1002 is CRITICAL: instance=10.64.48.64 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[11:46:21] <Bsadowski1>	 O_O
[11:48:05] <RhinosF1>	 hey Bsadowski1 
[11:58:18] <jinxer-wm>	 (ProbeDown) firing: Service thumbor:8800 has failed probes (http_thumbor_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:59:31] <icinga-wm>	 RECOVERY - k8s API server requests latencies on ml-serve-ctrl1002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[12:08:19] <jinxer-wm>	 (ProbeDown) resolved: Service thumbor:8800 has failed probes (http_thumbor_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:12:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[12:15:41] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:25:42] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1003-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[12:26:41] <jinxer-wm>	 (CertManagerCertNotReady) firing: Certificate istio-system/knative-serving is not in a ready state - https://wikitech.wikimedia.org/wiki/Kubernetes/cert-manager  - https://alerts.wikimedia.org/?q=alertname%3DCertManagerCertNotReady
[12:36:16] <jinxer-wm>	 (CirrusSearchJVMGCOldPoolFlatlined) firing: Elasticsearch instance elastic1049-production-search-psi-eqiad is showing memory pressure in the old pool - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchJVMGCOldPoolFlatlined
[12:54:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: rsyslog on ml-serve-ctrl2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[13:05:42] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[13:29:23] <icinga-wm>	 RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:49:39] <icinga-wm>	 PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift-account-stats_swift:dispersion.service,swift_dispersion_stats_lowlatency.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:01:01] <icinga-wm>	 RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:06:53] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 5 day(s) (Sat 25 Jun 2022 07:55:09 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:09:15] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Wed 24 Aug 2022 07:48:40 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:15:13] <icinga-wm>	 PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift_dispersion_stats.service,swift_ring_manager.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:29:37] <icinga-wm>	 RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:07:27] <icinga-wm>	 PROBLEM - SSH on cp5012.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:12:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[16:24:27] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 5 day(s) (Sat 25 Jun 2022 07:55:09 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:25:42] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1003-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[16:26:41] <jinxer-wm>	 (CertManagerCertNotReady) firing: Certificate istio-system/knative-serving is not in a ready state - https://wikitech.wikimedia.org/wiki/Kubernetes/cert-manager  - https://alerts.wikimedia.org/?q=alertname%3DCertManagerCertNotReady
[16:26:49] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Wed 24 Aug 2022 07:48:40 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:36:16] <jinxer-wm>	 (CirrusSearchJVMGCOldPoolFlatlined) firing: Elasticsearch instance elastic1049-production-search-psi-eqiad is showing memory pressure in the old pool - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchJVMGCOldPoolFlatlined
[16:37:38] <wikibugs>	 (PS1) Majavah: P:wmcs::metricsinfra::prometheus: enable thanos sidecar [puppet] - https://gerrit.wikimedia.org/r/806551 (https://phabricator.wikimedia.org/T286301)
[16:37:40] <wikibugs>	 (PS1) Majavah: P:metricsinfra: add thanos query [puppet] - https://gerrit.wikimedia.org/r/806552 (https://phabricator.wikimedia.org/T286301)
[16:37:42] <wikibugs>	 (PS1) Majavah: P:metricsinfra::haproxy: add thanos routing [puppet] - https://gerrit.wikimedia.org/r/806553 (https://phabricator.wikimedia.org/T286301)
[16:39:32] <wikibugs>	 (PS2) Majavah: P:metricsinfra::haproxy: add thanos routing [puppet] - https://gerrit.wikimedia.org/r/806553 (https://phabricator.wikimedia.org/T286301)
[16:54:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: rsyslog on ml-serve-ctrl2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[17:05:42] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[17:08:43] <icinga-wm>	 RECOVERY - SSH on cp5012.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:09:25] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 5 day(s) (Sat 25 Jun 2022 07:55:09 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:14:07] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Wed 24 Aug 2022 07:48:40 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:49:37] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 5 day(s) (Sat 25 Jun 2022 07:55:09 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:51:59] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Wed 24 Aug 2022 07:48:40 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:12:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on ml-staging-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[20:25:42] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1003-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[20:26:41] <jinxer-wm>	 (CertManagerCertNotReady) firing: Certificate istio-system/knative-serving is not in a ready state - https://wikitech.wikimedia.org/wiki/Kubernetes/cert-manager  - https://alerts.wikimedia.org/?q=alertname%3DCertManagerCertNotReady
[20:36:16] <jinxer-wm>	 (CirrusSearchJVMGCOldPoolFlatlined) firing: Elasticsearch instance elastic1049-production-search-psi-eqiad is showing memory pressure in the old pool - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchJVMGCOldPoolFlatlined
[20:54:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: rsyslog on ml-serve-ctrl2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[21:05:42] <jinxer-wm>	 (NodeTextfileStale) firing: Stale textfile for ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[23:14:13] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 5 day(s) (Sat 25 Jun 2022 07:55:09 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[23:16:35] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Wed 24 Aug 2022 07:48:40 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[23:36:59] <wikibugs>	 SRE, Traffic, Patch-For-Review: per-backend-service concurrency limits in ATS-BE - https://phabricator.wikimedia.org/T306223 (CDanis) Open→Stalled p:Triage→Medium