[00:04:44] (HaproxyUnavailable) firing: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable [00:05:07] (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:05:07] (ProbeDown) firing: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:06:43] (VarnishUnavailable) firing: varnish-text has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/Varnish#Diagnosing_Varnish_alerts - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DVarnishUnavailable [00:06:54] PROBLEM - BGP status on cr2-esams is CRITICAL: BGP CRITICAL - ASunknown/IPv6: Idle https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [00:07:16] (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:08:16] (MediaWikiLatencyExceeded) firing: (3) Average latency high: eqiad appserver GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=appserver&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:09:44] (HaproxyUnavailable) resolved: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable [00:10:07] (ProbeDown) resolved: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:10:07] (ProbeDown) resolved: (17) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:11:43] (VarnishUnavailable) resolved: varnish-text has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/Varnish#Diagnosing_Varnish_alerts - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DVarnishUnavailable [00:13:16] (MediaWikiLatencyExceeded) resolved: (4) Average latency high: eqiad appserver GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=appserver&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:13:46] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad appserver GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=appserver&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:14:13] Just got 503 errors on enwiki [00:15:07] (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:15:33] (ProbeDown) firing: (17) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:15:36] PROBLEM - PyBal backends health check on lvs3007 is CRITICAL: PYBAL CRITICAL - CRITICAL - testlb_443: Servers cp3064.esams.wmnet are marked down but pooled: textlb_443: Servers cp3064.esams.wmnet are marked down but pooled: textlb6_443: Servers cp3064.esams.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [00:15:37] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [00:16:14] (HaproxyUnavailable) firing: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable [00:17:06] RECOVERY - PyBal backends health check on lvs3007 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [00:18:12] (LVSHighRX) firing: Excessive RX traffic on lvs3005:9100 (enp175s0f0np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX [00:18:46] (MediaWikiLatencyExceeded) firing: (4) Average latency high: eqiad appserver GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=appserver&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:19:09] online, looking [00:20:07] (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:20:33] (ProbeDown) firing: (17) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:21:14] (HaproxyUnavailable) resolved: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable [00:23:12] (LVSHighRX) resolved: Excessive RX traffic on lvs3005:9100 (enp175s0f0np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX [00:23:46] (MediaWikiLatencyExceeded) firing: (3) Average latency high: eqiad appserver GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=appserver&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:25:07] (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:25:33] (ProbeDown) firing: (17) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:27:14] (HaproxyUnavailable) firing: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable [00:28:02] Still occasionally hitting 503 errors (if that's helpful) [00:28:46] (MediaWikiLatencyExceeded) firing: (3) Average latency high: eqiad appserver GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=appserver&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:29:43] grafana.wikimedia.org is also seeing 503 errors [00:30:02] PROBLEM - PyBal backends health check on lvs3005 is CRITICAL: PYBAL CRITICAL - CRITICAL - testlb_443: Servers cp3060.esams.wmnet, cp3050.esams.wmnet, cp3064.esams.wmnet, cp3052.esams.wmnet, cp3056.esams.wmnet are marked down but pooled: textlb_443: Servers cp3060.esams.wmnet, cp3050.esams.wmnet, cp3054.esams.wmnet, cp3062.esams.wmnet, cp3064.esams.wmnet, cp3058.esams.wmnet, cp3052.esams.wmnet, cp3056.esams.wmnet are marked down but pooled [00:30:02] 6_443: Servers cp3050.esams.wmnet, cp3054.esams.wmnet, cp3058.esams.wmnet, cp3062.esams.wmnet, cp3052.esams.wmnet, cp3064.esams.wmnet, cp3056.esams.wmnet are marked down but pooled: textlb6_443: Servers cp3060.esams.wmnet, cp3050.esams.wmnet, cp3064.esams.wmnet, cp3052.esams.wmnet, cp3056.esams.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [00:30:06] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - appservers-https_443: Servers mw1367.eqiad.wmnet, mw1498.eqiad.wmnet, mw1477.eqiad.wmnet, mw1433.eqiad.wmnet, mw1414.eqiad.wmnet, mw1371.eqiad.wmnet, mw1451.eqiad.wmnet, mw1473.eqiad.wmnet, mw1475.eqiad.wmnet, mw1434.eqiad.wmnet, mw1411.eqiad.wmnet, mw1432.eqiad.wmnet, mw1384.eqiad.wmnet, mw1454.eqiad.wmnet, mw1387.eqiad.wmnet, mw1456.eqiad.wmnet, mw [00:30:06] ad.wmnet, mw1430.eqiad.wmnet, mw1415.eqiad.wmnet, mw1476.eqiad.wmnet, mw1480.eqiad.wmnet, mw1351.eqiad.wmnet, mw1391.eqiad.wmnet, mw1352.eqiad.wmnet, mw1399.eqiad.wmnet, mw1368.eqiad.wmnet, mw1435.eqiad.wmnet, mw1420.eqiad.wmnet, mw1472.eqiad.wmnet, mw1355.eqiad.wmnet, mw1395.eqiad.wmnet, mw1481.eqiad.wmnet, mw1366.eqiad.wmnet, mw1487.eqiad.wmnet, mw1372.eqiad.wmnet, mw1403.eqiad.wmnet, mw1389.eqiad.wmnet, mw1418.eqiad.wmnet, mw1496.eqiad [00:30:06] mw1401.eqiad.wmnet, mw1397.eqiad.wmnet, mw1365.eqiad.wmnet, mw1409.eqiad.wmnet, mw1385.eqiad.wmnet, mw1417.eqiad.wmnet, mw1455.eqiad.wmnet, mw1429.eqiad.wmnet, mw1441.eqiad.wmnet, mw141 https://wikitech.wikimedia.org/wiki/PyBal [00:30:07] (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:30:33] (ProbeDown) firing: (17) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:30:38] PROBLEM - PyBal backends health check on lvs3007 is CRITICAL: PYBAL CRITICAL - CRITICAL - testlb_443: Servers cp3060.esams.wmnet, cp3050.esams.wmnet, cp3062.esams.wmnet, cp3058.esams.wmnet, cp3052.esams.wmnet, cp3056.esams.wmnet are marked down but pooled: textlb_443: Servers cp3050.esams.wmnet, cp3062.esams.wmnet, cp3064.esams.wmnet, cp3058.esams.wmnet, cp3052.esams.wmnet, cp3056.esams.wmnet are marked down but pooled: testlb6_443: Serve [00:30:38] 0.esams.wmnet, cp3062.esams.wmnet, cp3064.esams.wmnet, cp3052.esams.wmnet, cp3056.esams.wmnet are marked down but pooled: textlb6_443: Servers cp3060.esams.wmnet, cp3050.esams.wmnet, cp3062.esams.wmnet, cp3064.esams.wmnet, cp3056.esams.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [00:30:58] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - appservers-https_443: Servers mw1477.eqiad.wmnet, mw1433.eqiad.wmnet, mw1365.eqiad.wmnet, mw1367.eqiad.wmnet, mw1475.eqiad.wmnet, mw1442.eqiad.wmnet, mw1434.eqiad.wmnet, mw1432.eqiad.wmnet, mw1478.eqiad.wmnet, mw1349.eqiad.wmnet, mw1384.eqiad.wmnet, mw1387.eqiad.wmnet, mw1430.eqiad.wmnet, mw1415.eqiad.wmnet, mw1476.eqiad.wmnet, mw1480.eqiad.wmnet, mw [00:30:58] ad.wmnet, mw1399.eqiad.wmnet, mw1435.eqiad.wmnet, mw1419.eqiad.wmnet, mw1393.eqiad.wmnet, mw1488.eqiad.wmnet, mw1481.eqiad.wmnet, mw1454.eqiad.wmnet, mw1372.eqiad.wmnet, mw1370.eqiad.wmnet, mw1389.eqiad.wmnet, mw1496.eqiad.wmnet, mw1395.eqiad.wmnet, mw1397.eqiad.wmnet, mw1385.eqiad.wmnet, mw1369.eqiad.wmnet, mw1455.eqiad.wmnet, mw1373.eqiad.wmnet, mw1436.eqiad.wmnet, mw1452.eqiad.wmnet, mw1498.eqiad.wmnet, mw1414.eqiad.wmnet, mw1417.eqiad [00:30:58] mw1371.eqiad.wmnet, mw1420.eqiad.wmnet, mw1473.eqiad.wmnet, mw1453.eqiad.wmnet, mw1413.eqiad.wmnet, mw1456.eqiad.wmnet, mw1351.eqiad.wmnet, mw1391.eqiad.wmnet, mw1352.eqiad.wmnet, mw144 https://wikitech.wikimedia.org/wiki/PyBal [00:31:34] PROBLEM - PHP7 rendering on mw1488 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:31:38] PROBLEM - PHP7 rendering on mw1368 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:31:38] PROBLEM - PHP7 rendering on mw1472 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:31:40] PROBLEM - PHP7 rendering on mw1350 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:31:40] PROBLEM - PHP7 rendering on mw1384 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:31:52] PROBLEM - PHP7 rendering on mw1370 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:31:54] PROBLEM - PHP7 rendering on mw1352 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:31:56] PROBLEM - PHP7 rendering on mw1367 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:31:58] PROBLEM - PHP7 rendering on mw1354 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:00] PROBLEM - PHP7 rendering on mw1349 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:00] PROBLEM - PHP7 rendering on mw1498 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:00] PROBLEM - PHP7 rendering on mw1478 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:00] PROBLEM - PHP7 rendering on mw1474 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:06] PROBLEM - PHP7 rendering on mw1371 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:12] (LVSHighRX) firing: Excessive RX traffic on lvs3005:9100 (enp175s0f0np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX [00:32:12] PROBLEM - PHP7 rendering on mw1480 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:12] PROBLEM - PHP7 rendering on mw1481 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:12] PROBLEM - PHP7 rendering on mw1475 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:12] PROBLEM - PHP7 rendering on mw1496 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:14] PROBLEM - PHP7 rendering on mw1365 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:16] (AppserversUnreachable) firing: Appserver unavailable for cluster appserver at eqiad - https://wikitech.wikimedia.org/wiki/Application_servers - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&var-site=eqiad&var-cluster=appserver - https://alerts.wikimedia.org/?q=alertname%3DAppserversUnreachable [00:32:16] PROBLEM - PHP7 rendering on mw1351 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:16] PROBLEM - PHP7 rendering on mw1366 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:18] PROBLEM - PHP7 rendering on mw1372 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:18] PROBLEM - PHP7 rendering on mw1373 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:18] PROBLEM - PHP7 rendering on mw1355 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:20] PROBLEM - PHP7 rendering on mw1369 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:30] PROBLEM - PHP7 rendering on mw1477 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:32] PROBLEM - PHP7 rendering on mw1353 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:32:32] (JobUnavailable) firing: Reduced availability for job probes/swagger in ops@esams - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [00:33:02] PROBLEM - PHP7 rendering on mw1479 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:33:02] PROBLEM - PHP7 rendering on mw1476 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:33:08] PROBLEM - PHP7 rendering on mw1497 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:33:08] PROBLEM - PHP7 rendering on mw1487 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:33:08] PROBLEM - PHP7 rendering on mw1473 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:33:09] hi [00:33:16] (PHPFPMTooBusy) firing: Not enough idle php7.4-fpm.service workers for Mediawiki appserver at eqiad #page - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?from=now-3h&orgId=1&to=now&var-cluster=appserver&var-site=eqiad&viewPanel=64 - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [00:35:33] (ProbeDown) firing: (20) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:35:49] (ProbeDown) firing: (20) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:37:36] RECOVERY - PHP7 rendering on mw1487 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 6.829 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:37:36] RECOVERY - PHP7 rendering on mw1497 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 6.833 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:37:52] RECOVERY - PHP7 rendering on mw1370 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 9.061 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:37:54] RECOVERY - PHP7 rendering on mw1367 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 7.842 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:37:56] RECOVERY - PHP7 rendering on mw1474 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 4.550 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:37:56] RECOVERY - PHP7 rendering on mw1498 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 4.728 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:02] RECOVERY - PHP7 rendering on mw1478 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 9.840 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:06] RECOVERY - PHP7 rendering on mw1371 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 7.511 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:06] RECOVERY - PHP7 rendering on mw1496 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 3.310 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:18] RECOVERY - PHP7 rendering on mw1480 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 4.532 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:18] RECOVERY - PHP7 rendering on mw1481 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 3.734 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:18] RECOVERY - PHP7 rendering on mw1475 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 5.349 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:18] RECOVERY - PHP7 rendering on mw1351 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 4.422 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:19] RECOVERY - PHP7 rendering on mw1365 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 5.956 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:19] RECOVERY - PHP7 rendering on mw1355 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 3.594 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:19] RECOVERY - PHP7 rendering on mw1366 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 7.066 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:20] RECOVERY - PHP7 rendering on mw1373 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 6.874 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:20] RECOVERY - PHP7 rendering on mw1372 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 7.067 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:21] RECOVERY - PHP7 rendering on mw1369 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 4.293 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:24] RECOVERY - PHP7 rendering on mw1477 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 2.484 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:25] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/940213 [00:38:26] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [00:38:30] RECOVERY - PHP7 rendering on mw1353 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 5.557 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:31] (03CR) 10TrainBranchBot: [C: 03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/940213 (owner: 10TrainBranchBot) [00:38:56] RECOVERY - PHP7 rendering on mw1476 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.047 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:56] RECOVERY - PHP7 rendering on mw1479 is OK: HTTP OK: HTTP/1.1 302 Found - 517 bytes in 0.376 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:38:56] RECOVERY - PHP7 rendering on mw1488 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.045 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:39:00] RECOVERY - PyBal backends health check on lvs3005 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [00:39:02] RECOVERY - PHP7 rendering on mw1472 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 1.296 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:39:02] RECOVERY - PHP7 rendering on mw1368 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 2.406 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:39:02] RECOVERY - PHP7 rendering on mw1473 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 2.001 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:39:02] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [00:39:06] RECOVERY - PHP7 rendering on mw1350 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 3.595 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:39:06] RECOVERY - PHP7 rendering on mw1384 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 3.948 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:39:20] RECOVERY - PHP7 rendering on mw1352 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 3.949 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:39:26] RECOVERY - PHP7 rendering on mw1354 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 6.025 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:39:30] RECOVERY - PHP7 rendering on mw1349 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 6.821 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:40:07] (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:40:33] (ProbeDown) firing: (19) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:41:55] (DDoSDetected) firing: FastNetMon has detected an attack on esams #page - https://bit.ly/wmf-fastnetmon - https://w.wiki/8oU - https://alerts.wikimedia.org/?q=alertname%3DDDoSDetected [00:42:12] (LVSHighRX) resolved: Excessive RX traffic on lvs3005:9100 (enp175s0f0np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX [00:42:16] (AppserversUnreachable) resolved: Appserver unavailable for cluster appserver at eqiad - https://wikitech.wikimedia.org/wiki/Application_servers - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&var-site=eqiad&var-cluster=appserver - https://alerts.wikimedia.org/?q=alertname%3DAppserversUnreachable [00:42:21] ww/win 14 [00:42:32] (JobUnavailable) resolved: Reduced availability for job probes/swagger in ops@esams - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [00:42:56] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - appservers-https_443: Servers mw1477.eqiad.wmnet, mw1433.eqiad.wmnet, mw1365.eqiad.wmnet, mw1455.eqiad.wmnet, mw1475.eqiad.wmnet, mw1442.eqiad.wmnet, mw1434.eqiad.wmnet, mw1432.eqiad.wmnet, mw1478.eqiad.wmnet, mw1349.eqiad.wmnet, mw1384.eqiad.wmnet, mw1387.eqiad.wmnet, mw1430.eqiad.wmnet, mw1415.eqiad.wmnet, mw1476.eqiad.wmnet, mw1480.eqiad.wmnet, mw [00:42:56] ad.wmnet, mw1399.eqiad.wmnet, mw1435.eqiad.wmnet, mw1420.eqiad.wmnet, mw1393.eqiad.wmnet, mw1488.eqiad.wmnet, mw1481.eqiad.wmnet, mw1454.eqiad.wmnet, mw1372.eqiad.wmnet, mw1370.eqiad.wmnet, mw1389.eqiad.wmnet, mw1496.eqiad.wmnet, mw1395.eqiad.wmnet, mw1397.eqiad.wmnet, mw1385.eqiad.wmnet, mw1417.eqiad.wmnet, mw1367.eqiad.wmnet, mw1409.eqiad.wmnet, mw1436.eqiad.wmnet, mw1452.eqiad.wmnet, mw1498.eqiad.wmnet, mw1414.eqiad.wmnet, mw1369.eqiad [00:42:56] mw1371.eqiad.wmnet, mw1419.eqiad.wmnet, mw1473.eqiad.wmnet, mw1453.eqiad.wmnet, mw1413.eqiad.wmnet, mw1456.eqiad.wmnet, mw1351.eqiad.wmnet, mw1391.eqiad.wmnet, mw1352.eqiad.wmnet, mw144 https://wikitech.wikimedia.org/wiki/PyBal [00:43:32] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - appservers-https_443: Servers mw1367.eqiad.wmnet, mw1498.eqiad.wmnet, mw1389.eqiad.wmnet, mw1433.eqiad.wmnet, mw1414.eqiad.wmnet, mw1369.eqiad.wmnet, mw1371.eqiad.wmnet, mw1419.eqiad.wmnet, mw1365.eqiad.wmnet, mw1473.eqiad.wmnet, mw1453.eqiad.wmnet, mw1395.eqiad.wmnet, mw1434.eqiad.wmnet, mw1432.eqiad.wmnet, mw1478.eqiad.wmnet, mw1349.eqiad.wmnet, mw [00:43:32] ad.wmnet, mw1387.eqiad.wmnet, mw1456.eqiad.wmnet, mw1415.eqiad.wmnet, mw1480.eqiad.wmnet, mw1351.eqiad.wmnet, mw1353.eqiad.wmnet, mw1405.eqiad.wmnet, mw1497.eqiad.wmnet, mw1352.eqiad.wmnet, mw1441.eqiad.wmnet, mw1391.eqiad.wmnet, mw1435.eqiad.wmnet, mw1420.eqiad.wmnet, mw1454.eqiad.wmnet, mw1431.eqiad.wmnet, mw1355.eqiad.wmnet, mw1393.eqiad.wmnet, mw1481.eqiad.wmnet, mw1366.eqiad.wmnet, mw1487.eqiad.wmnet, mw1370.eqiad.wmnet, mw1429.eqiad [00:43:32] mw1451.eqiad.wmnet, mw1418.eqiad.wmnet, mw1496.eqiad.wmnet, mw1401.eqiad.wmnet, mw1403.eqiad.wmnet, mw1373.eqiad.wmnet, mw1411.eqiad.wmnet, mw1417.eqiad.wmnet, mw1385.eqiad.wmnet, mw141 https://wikitech.wikimedia.org/wiki/PyBal [00:43:34] PROBLEM - PyBal backends health check on lvs3005 is CRITICAL: PYBAL CRITICAL - CRITICAL - testlb_443: Servers cp3050.esams.wmnet, cp3062.esams.wmnet, cp3058.esams.wmnet, cp3052.esams.wmnet, cp3056.esams.wmnet are marked down but pooled: textlb_443: Servers cp3060.esams.wmnet, cp3050.esams.wmnet, cp3062.esams.wmnet, cp3064.esams.wmnet, cp3058.esams.wmnet, cp3052.esams.wmnet, cp3056.esams.wmnet are marked down but pooled: testlb6_443: Serve [00:43:34] 0.esams.wmnet, cp3058.esams.wmnet, cp3062.esams.wmnet, cp3052.esams.wmnet, cp3064.esams.wmnet, cp3056.esams.wmnet are marked down but pooled: textlb6_443: Servers cp3060.esams.wmnet, cp3054.esams.wmnet, cp3062.esams.wmnet, cp3064.esams.wmnet, cp3058.esams.wmnet, cp3056.esams.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [00:43:46] (MediaWikiLatencyExceeded) firing: (2) Average latency high: eqiad appserver GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=appserver&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:44:48] PROBLEM - PHP7 rendering on mw1351 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:44:50] PROBLEM - PHP7 rendering on mw1372 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:44:50] PROBLEM - PHP7 rendering on mw1355 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:45:04] PROBLEM - PHP7 rendering on mw1353 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:45:07] (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:45:16] (AppserversUnreachable) firing: Appserver unavailable for cluster appserver at eqiad - https://wikitech.wikimedia.org/wiki/Application_servers - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&var-site=eqiad&var-cluster=appserver - https://alerts.wikimedia.org/?q=alertname%3DAppserversUnreachable [00:45:33] (ProbeDown) firing: (18) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:45:34] PROBLEM - PHP7 rendering on mw1476 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:45:34] PROBLEM - PHP7 rendering on mw1479 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:45:36] PROBLEM - PHP7 rendering on mw1488 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:45:38] PROBLEM - PHP7 rendering on mw1487 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:45:38] PROBLEM - PHP7 rendering on mw1472 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:45:38] PROBLEM - PHP7 rendering on mw1497 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:45:38] PROBLEM - PHP7 rendering on mw1368 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:45:40] PROBLEM - PHP7 rendering on mw1473 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:45:42] PROBLEM - PHP7 rendering on mw1384 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:45:42] PROBLEM - PHP7 rendering on mw1350 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:45:54] PROBLEM - PHP7 rendering on mw1370 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:45:54] PROBLEM - PHP7 rendering on mw1352 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:45:58] PROBLEM - PHP7 rendering on mw1354 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:45:58] PROBLEM - PHP7 rendering on mw1367 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:46:00] PROBLEM - PHP7 rendering on mw1474 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:46:02] PROBLEM - PHP7 rendering on mw1498 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:46:02] PROBLEM - PHP7 rendering on mw1478 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:46:02] PROBLEM - PHP7 rendering on mw1349 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:46:06] PROBLEM - PHP7 rendering on mw1371 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:46:10] PROBLEM - PHP7 rendering on mw1480 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:46:12] (LVSHighRX) firing: Excessive RX traffic on lvs3005:9100 (enp175s0f0np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX [00:46:14] PROBLEM - PHP7 rendering on mw1475 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:46:14] PROBLEM - PHP7 rendering on mw1496 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:46:14] PROBLEM - PHP7 rendering on mw1481 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:46:16] PROBLEM - PHP7 rendering on mw1365 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:46:18] PROBLEM - PHP7 rendering on mw1366 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:46:20] PROBLEM - PHP7 rendering on mw1373 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:46:22] PROBLEM - PHP7 rendering on mw1369 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:46:32] PROBLEM - PHP7 rendering on mw1477 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:46:33] (JobUnavailable) firing: Reduced availability for job probes/swagger in ops@esams - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [00:46:50] PROBLEM - BGP status on cr2-esams is CRITICAL: BGP CRITICAL - AS64600/IPv4: Active - PyBal https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [00:50:33] (ProbeDown) firing: (18) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:52:32] RECOVERY - PHP7 rendering on mw1477 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 7.844 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:52:34] RECOVERY - PyBal backends health check on lvs3005 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [00:53:46] (MediaWikiLatencyExceeded) firing: (2) Average latency high: eqiad appserver GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=appserver&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:55:07] (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:55:25] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/940213 (owner: 10TrainBranchBot) [00:56:12] (LVSHighRX) resolved: Excessive RX traffic on lvs3005:9100 (enp175s0f0np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX [00:56:33] (JobUnavailable) resolved: Reduced availability for job probes/swagger in ops@esams - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [00:57:04] PROBLEM - PyBal backends health check on lvs3005 is CRITICAL: PYBAL CRITICAL - CRITICAL - testlb_443: Servers cp3060.esams.wmnet, cp3062.esams.wmnet, cp3064.esams.wmnet, cp3058.esams.wmnet, cp3052.esams.wmnet, cp3056.esams.wmnet are marked down but pooled: textlb_443: Servers cp3060.esams.wmnet, cp3050.esams.wmnet, cp3062.esams.wmnet, cp3064.esams.wmnet, cp3058.esams.wmnet, cp3052.esams.wmnet, cp3056.esams.wmnet are marked down but pooled [00:57:04] 6_443: Servers cp3050.esams.wmnet, cp3058.esams.wmnet, cp3052.esams.wmnet, cp3064.esams.wmnet, cp3056.esams.wmnet are marked down but pooled: textlb6_443: Servers cp3060.esams.wmnet, cp3050.esams.wmnet, cp3062.esams.wmnet, cp3064.esams.wmnet, cp3058.esams.wmnet, cp3052.esams.wmnet, cp3056.esams.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [00:57:12] (LVSHighRX) firing: Excessive RX traffic on lvs3005:9100 (enp175s0f0np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX [00:58:40] PROBLEM - Host lvs3005 is DOWN: PING CRITICAL - Packet loss = 100% [00:58:46] (MediaWikiLatencyExceeded) firing: (2) Average latency high: eqiad appserver GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=appserver&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:59:00] RECOVERY - Host lvs3005 is UP: PING OK - Packet loss = 0%, RTA = 81.06 ms [00:59:04] PROBLEM - PHP7 rendering on mw1477 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:59:32] (JobUnavailable) firing: Reduced availability for job probes/swagger in ops@esams - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [01:00:07] (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [01:00:08] PROBLEM - PyBal backends health check on lvs3005 is CRITICAL: PYBAL CRITICAL - CRITICAL - testlb_443: Servers cp3060.esams.wmnet, cp3054.esams.wmnet, cp3062.esams.wmnet, cp3064.esams.wmnet, cp3058.esams.wmnet, cp3052.esams.wmnet, cp3056.esams.wmnet are marked down but pooled: textlb_443: Servers cp3060.esams.wmnet, cp3050.esams.wmnet, cp3062.esams.wmnet, cp3064.esams.wmnet, cp3058.esams.wmnet, cp3052.esams.wmnet, cp3056.esams.wmnet are ma [01:00:08] n but pooled: testlb6_443: Servers cp3060.esams.wmnet, cp3050.esams.wmnet, cp3058.esams.wmnet, cp3052.esams.wmnet, cp3064.esams.wmnet, cp3056.esams.wmnet are marked down but pooled: ncredirlb_443: Servers ncredir3001.esams.wmnet are marked down but pooled: textlb6_443: Servers cp3060.esams.wmnet, cp3050.esams.wmnet, cp3054.esams.wmnet, cp3062.esams.wmnet, cp3064.esams.wmnet, cp3058.esams.wmnet, cp3052.esams.wmnet, cp3056.esams.wmnet are m [01:00:08] wn but pooled https://wikitech.wikimedia.org/wiki/PyBal [01:00:10] hi, is it expected that phabricator.wikimedia.org not to be working in certain locations? [01:00:21] yes, ongoing incident, thanks [01:00:26] ah, ok, thanks! [01:00:26] PROBLEM - BGP status on cr2-esams is CRITICAL: BGP CRITICAL - AS64600/IPv4: Active - PyBal https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [01:00:33] (ProbeDown) firing: (19) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [01:02:12] (LVSHighRX) firing: (2) Excessive RX traffic on lvs3005:9100 (enp175s0f0np0) #page - https://bit.ly/wmf-lvsrx - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX [01:05:33] (ProbeDown) firing: (18) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [01:07:12] (LVSHighRX) firing: (2) Excessive RX traffic on lvs3005:9100 (enp175s0f0np0) #page - https://bit.ly/wmf-lvsrx - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX [01:08:46] (MediaWikiLatencyExceeded) firing: (2) Average latency high: eqiad appserver GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=appserver&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [01:10:56] PROBLEM - BGP status on cr2-esams is CRITICAL: BGP CRITICAL - AS64600/IPv4: Active - PyBal https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [01:15:03] (03PS1) 10Ssingh: prepend esams and knams [homer/public] - 10https://gerrit.wikimedia.org/r/940491 [01:15:07] (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [01:15:33] (ProbeDown) firing: (16) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [01:15:42] !log sukhe@cumin2002 START - Cookbook sre.network.cf [01:15:42] !log sukhe@cumin2002 END (PASS) - Cookbook sre.network.cf (exit_code=0) [01:16:32] (03CR) 10Ssingh: [C: 03+2] prepend esams and knams [homer/public] - 10https://gerrit.wikimedia.org/r/940491 (owner: 10Ssingh) [01:17:12] (LVSHighRX) resolved: (2) Excessive RX traffic on lvs3005:9100 (enp175s0f0np0) #page - https://bit.ly/wmf-lvsrx - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX [01:18:46] (MediaWikiLatencyExceeded) firing: (2) Average latency high: eqiad appserver GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=appserver&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [01:20:33] (ProbeDown) firing: (16) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [01:21:55] (DDoSDetected) resolved: FastNetMon has detected an attack on esams #page - https://bit.ly/wmf-fastnetmon - https://w.wiki/8oU - https://alerts.wikimedia.org/?q=alertname%3DDDoSDetected [01:23:42] (LVSHighRX) firing: Excessive RX traffic on lvs3005:9100 (enp175s0f0np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX [01:23:46] (MediaWikiLatencyExceeded) firing: (2) Average latency high: eqiad appserver GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=appserver&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [01:25:07] (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [01:28:56] PROBLEM - Check systemd state on cumin1001 is CRITICAL: CRITICAL - degraded: The following units failed: httpbb_hourly_appserver.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:30:07] (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [01:32:32] PROBLEM - Check unit status of httpbb_hourly_appserver on cumin1001 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_hourly_appserver https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [01:45:07] (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [01:50:07] (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [01:54:38] RECOVERY - Check systemd state on cumin1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:00:07] (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [02:04:02] RECOVERY - Check unit status of httpbb_hourly_appserver on cumin1001 is OK: OK: Status of the systemd unit httpbb_hourly_appserver https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [02:05:07] (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [02:09:32] (JobUnavailable) firing: (3) Reduced availability for job probes/swagger in ops@esams - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:10:07] (ProbeDown) firing: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [02:12:02] RECOVERY - PHP7 rendering on mw1472 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 7.554 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:02] RECOVERY - PHP7 rendering on mw1497 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 7.647 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:02] RECOVERY - PHP7 rendering on mw1487 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 8.119 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:02] RECOVERY - PHP7 rendering on mw1473 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 8.596 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:08] RECOVERY - PHP7 rendering on mw1370 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 1.190 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:08] RECOVERY - PHP7 rendering on mw1352 is OK: HTTP OK: HTTP/1.1 302 Found - 518 bytes in 1.582 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:10] RECOVERY - PHP7 rendering on mw1498 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.030 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:10] RECOVERY - PHP7 rendering on mw1474 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.031 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:10] RECOVERY - PHP7 rendering on mw1349 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.030 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:10] RECOVERY - PHP7 rendering on mw1367 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.031 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:10] RECOVERY - PHP7 rendering on mw1478 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.034 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:12] RECOVERY - PHP7 rendering on mw1354 is OK: HTTP OK: HTTP/1.1 302 Found - 517 bytes in 0.195 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:16] RECOVERY - PHP7 rendering on mw1371 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.030 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:22] RECOVERY - PHP7 rendering on mw1480 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.036 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:24] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [02:12:28] RECOVERY - PHP7 rendering on mw1365 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.030 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:28] RECOVERY - PHP7 rendering on mw1475 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.028 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:28] RECOVERY - PHP7 rendering on mw1496 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.028 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:28] RECOVERY - PHP7 rendering on mw1481 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.028 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:30] RECOVERY - PHP7 rendering on mw1351 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.030 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:30] RECOVERY - PHP7 rendering on mw1366 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.038 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:30] RECOVERY - PHP7 rendering on mw1373 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.037 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:34] RECOVERY - PHP7 rendering on mw1355 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.035 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:34] RECOVERY - PHP7 rendering on mw1372 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.081 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:36] RECOVERY - PHP7 rendering on mw1369 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.042 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:12:44] RECOVERY - PyBal backends health check on lvs3005 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [02:12:46] RECOVERY - PHP7 rendering on mw1353 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.077 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:13:12] RECOVERY - PHP7 rendering on mw1488 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.028 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:13:12] RECOVERY - PHP7 rendering on mw1477 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.026 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:13:12] RECOVERY - PHP7 rendering on mw1476 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.027 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:13:12] RECOVERY - PHP7 rendering on mw1479 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.024 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:13:18] RECOVERY - PyBal backends health check on lvs3007 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [02:13:18] we should be seeing recoveries soon but I know I will jinx it by saying it, because c'est la vie [02:13:20] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [02:13:24] RECOVERY - PHP7 rendering on mw1368 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.033 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:13:24] RECOVERY - PHP7 rendering on mw1384 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.031 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:13:24] RECOVERY - PHP7 rendering on mw1350 is OK: HTTP OK: HTTP/1.1 302 Found - 516 bytes in 0.041 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [02:13:42] (LVSHighRX) resolved: Excessive RX traffic on lvs3005:9100 (enp175s0f0np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX [02:14:32] (JobUnavailable) firing: (3) Reduced availability for job probes/swagger in ops@esams - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:15:07] (ProbeDown) resolved: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) #page - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [02:15:16] (AppserversUnreachable) resolved: Appserver unavailable for cluster appserver at eqiad - https://wikitech.wikimedia.org/wiki/Application_servers - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&var-site=eqiad&var-cluster=appserver - https://alerts.wikimedia.org/?q=alertname%3DAppserversUnreachable [02:15:34] (ProbeDown) resolved: (13) Service appservers-https:443 has failed probes (http_appservers-https_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [02:15:42] PROBLEM - Check systemd state on cumin1001 is CRITICAL: CRITICAL - degraded: The following units failed: httpbb_hourly_appserver.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:16:00] PROBLEM - Check unit status of httpbb_hourly_appserver on cumin1001 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_hourly_appserver https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [02:17:14] (HaproxyUnavailable) resolved: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable [02:18:16] (PHPFPMTooBusy) resolved: Not enough idle php7.4-fpm.service workers for Mediawiki appserver at eqiad #page - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?from=now-3h&orgId=1&to=now&var-cluster=appserver&var-site=eqiad&viewPanel=64 - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [02:18:46] (MediaWikiLatencyExceeded) resolved: (2) Average latency high: eqiad appserver GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=appserver&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [02:19:08] PROBLEM - Check systemd state on gitlab1003 is CRITICAL: CRITICAL - degraded: The following units failed: sync-gitlab-group-with-ldap.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:19:28] PROBLEM - Check systemd state on gitlab2002 is CRITICAL: CRITICAL - degraded: The following units failed: sync-gitlab-group-with-ldap.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:29:32] (JobUnavailable) firing: (3) Reduced availability for job probes/swagger in ops@esams - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:30:02] RECOVERY - Check systemd state on gitlab2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:34:32] (JobUnavailable) resolved: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:46:30] RECOVERY - Check systemd state on gitlab1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:55:02] RECOVERY - Check systemd state on cumin1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:57:58] RECOVERY - Check unit status of httpbb_hourly_appserver on cumin1001 is OK: OK: Status of the systemd unit httpbb_hourly_appserver https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [03:30:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [03:40:16] (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [03:48:30] PROBLEM - Check systemd state on netbox1002 is CRITICAL: CRITICAL - degraded: The following units failed: netbox_report_accounting_run.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [03:49:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [04:14:36] RECOVERY - Check systemd state on netbox1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:15:37] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [04:29:16] (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [04:35:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [06:20:37] (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [06:21:07] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [06:22:56] PROBLEM - Check systemd state on build2001 is CRITICAL: CRITICAL - degraded: The following units failed: debian-weekly-rebuild.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [06:31:07] (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [07:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230723T0700) [08:28:34] 10ops-eqiad: Inbound interface errors - https://phabricator.wikimedia.org/T342502 (10phaultfinder) [08:35:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [08:44:17] (NELHigh) firing: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh [08:49:17] (NELHigh) resolved: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh [08:58:17] (NELHigh) firing: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh [09:03:18] (NELHigh) resolved: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh [09:19:17] (NELHigh) firing: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh [09:21:38] (KubernetesAPILatency) firing: High Kubernetes API latency (LIST services) on k8s-mlserve@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-mlserve - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [09:24:17] (NELHigh) resolved: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh [09:26:38] (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST services) on k8s-mlserve@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-mlserve - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [09:50:16] (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [09:51:38] (KubernetesAPILatency) firing: High Kubernetes API latency (LIST services) on k8s-mlserve@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-mlserve - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [09:56:38] (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST services) on k8s-mlserve@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-mlserve - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [11:11:32] 10SRE, 10Traffic, 10Wikimedia-Incident: 2023-07-23 unavailable with 503 error - https://phabricator.wikimedia.org/T342503 (10taavi) 05Open→03Resolved Please don't open such tasks several hours after the issue was fixed, especially if you're missing context on what caused the issue or how it was resolved. [11:22:03] (ProbeDown) firing: Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip6) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [11:27:03] (ProbeDown) resolved: Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip6) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [12:06:01] 10SRE-swift-storage, 10Commons: Server error 500 after uploading chunk - https://phabricator.wikimedia.org/T340917 (10Yann) This still happens on Wikisource, and there is no file on Special:UploadStash. This is weird, as I was able to upload a 1 GB PDF file on Commons recently. [12:49:12] PROBLEM - WDQS SPARQL on wdqs1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook [12:49:30] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:53:38] RECOVERY - WDQS SPARQL on wdqs1005 is OK: HTTP OK: HTTP/1.1 200 OK - 691 bytes in 0.154 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook [13:00:14] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:07:32] (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:17:32] (JobUnavailable) resolved: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:07:27] 10SRE-swift-storage, 10Commons: Server error 500 after uploading chunk - https://phabricator.wikimedia.org/T340917 (10Midleading) Why not upload all these files on Commons so that they can be used on any Wikimedia project? I have been able to upload PDFs from 1GB to 3GB on Commons 100% successful with the meth... [16:36:38] (03PS7) 10Aklapper: sdwiki: set 'wgTranslateNumerals' to false [mediawiki-config] - 10https://gerrit.wikimedia.org/r/937922 (https://phabricator.wikimedia.org/T268203) (owner: 10Kaleem Bhatti) [16:36:57] (03CR) 10Aklapper: "See https://www.mediawiki.org/wiki/Gerrit/Code_review/Getting_reviews" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/937922 (https://phabricator.wikimedia.org/T268203) (owner: 10Kaleem Bhatti) [17:20:50] PROBLEM - Router interfaces on cr3-esams is CRITICAL: CRITICAL: host 91.198.174.245, interfaces up: 83, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [17:22:08] PROBLEM - Router interfaces on cr2-drmrs is CRITICAL: CRITICAL: host 185.15.58.129, interfaces up: 61, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [18:26:03] what's up with wikipedia is russia? [18:26:16] is it blocked by Wikimedia or by ISPs? [18:38:25] There are a lot of reports about connection issues (from users and major media). What is the problem? [18:43:18] (KubernetesAPILatency) firing: High Kubernetes API latency (POST nodes) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [18:48:18] (KubernetesAPILatency) resolved: High Kubernetes API latency (POST nodes) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [19:09:22] (03PS1) 10Vgutierrez: learn.wiki: Update ALB CNAME records [dns] - 10https://gerrit.wikimedia.org/r/940503 (https://phabricator.wikimedia.org/T342509) [19:10:02] (03CR) 10Vgutierrez: [C: 04-2] "Do not merge till 2024-07-24 10:00:00 UTC" [dns] - 10https://gerrit.wikimedia.org/r/940503 (https://phabricator.wikimedia.org/T342509) (owner: 10Vgutierrez) [19:36:47] CyberTailor, Iluvatar: are you getting specific errors now are you talking about previously? [19:39:13] Users are reporting about issues en masse right now. [19:40:33] vgutierrez: are widespread connectivity issues known? The graph on wikimediastatus.net is elavated but I'm not sure if that's related [19:40:39] And is there anywhere to point users [19:40:52] Or any other SRE around? [19:41:06] sukhe, Emperor: ^ [19:41:31] Iluvatar: I pinged some people around today. Hopefully we'll have an answer soon. [19:41:35] widespread? :) [19:41:53] vgutierrez: read a few messages up. At least across Russia. [19:41:56] Discussion: https://ru.wikipedia.org/wiki/Википедия:Форум/Новости#Сообщения_о_блокировке_Википедии_в_России (+report in a social media, media, messengers, etc) [19:42:02] 19:38:24 There are a lot of reports about connection issues (from users and major media). What is the problem? [19:43:57] it's a known side-effect of the applied countermeasures [19:44:31] vgutierrez: is there anything that can be said on wiki? [19:44:45] RhinosF1: tricky, we are discussing [19:44:53] and not many of us are around today [19:45:25] sukhe: I don't blame you. I can imagine it's hard balance. [19:47:28] (03PS1) 10Ssingh: Revert "prepend esams and knams" [homer/public] - 10https://gerrit.wikimedia.org/r/940459 [19:48:43] (03CR) 10Ssingh: [C: 03+2] Revert "prepend esams and knams" [homer/public] - 10https://gerrit.wikimedia.org/r/940459 (owner: 10Ssingh) [19:53:36] !log sukhe@cumin2002 START - Cookbook sre.network.cf [19:53:37] !log sukhe@cumin2002 END (PASS) - Cookbook sre.network.cf (exit_code=0) [19:55:21] thanks for reporting, please let us know if the issues persist, they should be resolving [19:55:27] Iluvatar: CyberTailor: RhinosF1: ^ [19:55:47] sukhe: thank you for the quick reply [19:56:06] I will go back to making random guesses in my head about what happened [20:20:08] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:30:58] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:38:22] sukhe: yes, people report that wikipedia works now [21:18:58] CyberTailor: thanks [22:55:53] (03PS1) 10Lucas Werkmeister: tools-static: Hide more Cloudflare response headers [puppet] - 10https://gerrit.wikimedia.org/r/940506 [22:56:23] (03PS2) 10Lucas Werkmeister: tools-static: Hide more Cloudflare response headers [puppet] - 10https://gerrit.wikimedia.org/r/940506 [22:56:44] (03PS3) 10Lucas Werkmeister: tools-static: Hide more Cloudflare response headers [puppet] - 10https://gerrit.wikimedia.org/r/940506 [23:00:27] (03PS4) 10Lucas Werkmeister: tools-static: Hide more Cloudflare response headers [puppet] - 10https://gerrit.wikimedia.org/r/940506 [23:04:08] (03CR) 10Majavah: [C: 04-1] "please add matching rules to the /cdnjs block too" [puppet] - 10https://gerrit.wikimedia.org/r/940506 (owner: 10Lucas Werkmeister) [23:05:20] (03CR) 10Lucas Werkmeister: tools-static: Hide more Cloudflare response headers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/940506 (owner: 10Lucas Werkmeister) [23:06:16] (03CR) 10Lucas Werkmeister: tools-static: Hide more Cloudflare response headers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/940506 (owner: 10Lucas Werkmeister) [23:06:22] (03PS5) 10Lucas Werkmeister: tools-static: Hide more Cloudflare response headers [puppet] - 10https://gerrit.wikimedia.org/r/940506 [23:07:07] (03CR) 10Lucas Werkmeister: tools-static: Hide more Cloudflare response headers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/940506 (owner: 10Lucas Werkmeister) [23:15:50] PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Idle - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status