[00:14:48] PROBLEM - Host fasw-c-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:15:02] PROBLEM - Host ps1-f5-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:15:02] PROBLEM - Host ps1-f2-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:15:02] PROBLEM - Host ps1-e5-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:15:02] PROBLEM - Host ps1-e2-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:15:12] PROBLEM - Host ps1-e1-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:15:32] PROBLEM - Host asw2-a-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:15:32] PROBLEM - Host asw2-b-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:15:44] PROBLEM - Host asw2-c-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:16:00] PROBLEM - Host ps1-e3-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:16:04] PROBLEM - Host asw2-d-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:16:08] PROBLEM - Host ps1-f1-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:16:12] PROBLEM - Host ps1-e6-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:16:20] PROBLEM - Host ps1-e7-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:16:20] PROBLEM - Host ps1-e8-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:16:26] PROBLEM - Host ps1-f8-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:16:26] PROBLEM - Host ps1-f4-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:16:26] PROBLEM - Host ps1-f7-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:16:28] PROBLEM - Host ps1-f6-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:16:32] PROBLEM - Host ps1-f3-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:16:38] PROBLEM - Host ps1-e4-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[00:17:56] PROBLEM - Host mr1-eqiad IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[00:23:43] (JobUnavailable) firing: Reduced availability for job pdu_sentry4 in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[00:27:06] PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[00:38:49] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/968978
[00:38:55] (CR) TrainBranchBot: [C: +2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/968978 (owner: TrainBranchBot)
[00:42:42] RECOVERY - Check systemd state on logstash2026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:57:06] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/968978 (owner: TrainBranchBot)
[02:26:57] (KeyholderUnarmed) firing: (2) 1 unarmed Keyholder key(s) on acmechief2002:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[02:36:26] PROBLEM - Check systemd state on aqs1010 is CRITICAL: CRITICAL - degraded: The following units failed: aqs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:38:43] (JobUnavailable) firing: (2) Reduced availability for job pdu_sentry4 in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:01:44] RECOVERY - Check systemd state on aqs1010 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:03:43] (JobUnavailable) firing: (2) Reduced availability for job pdu_sentry4 in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:40:18] RECOVERY - Host ps1-e6-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.18 ms
[03:40:18] RECOVERY - Host ps1-e1-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.00 ms
[03:40:18] RECOVERY - Host ps1-e5-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.19 ms
[03:40:18] RECOVERY - Host ps1-f2-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.43 ms
[03:40:18] RECOVERY - Host ps1-e8-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.13 ms
[03:40:19] RECOVERY - Host ps1-f4-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.17 ms
[03:40:19] RECOVERY - Host ps1-e2-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.41 ms
[03:40:20] RECOVERY - Host ps1-f8-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.99 ms
[03:40:20] RECOVERY - Host ps1-f3-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.58 ms
[03:40:21] RECOVERY - Host ps1-e7-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.65 ms
[03:40:21] RECOVERY - Host ps1-f1-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.01 ms
[03:40:22] RECOVERY - Host ps1-f6-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.67 ms
[03:40:22] RECOVERY - Host ps1-f5-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.10 ms
[03:40:23] RECOVERY - Host ps1-e3-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.26 ms
[03:40:23] RECOVERY - Host ps1-f7-eqiad is UP: PING OK - Packet loss = 0%, RTA = 0.97 ms
[03:40:24] RECOVERY - Host ps1-e4-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.08 ms
[03:40:24] RECOVERY - Host asw2-c-eqiad is UP: PING OK - Packet loss = 0%, RTA = 0.80 ms
[03:40:25] RECOVERY - Host asw2-a-eqiad is UP: PING OK - Packet loss = 0%, RTA = 0.80 ms
[03:40:28] RECOVERY - Host fasw-c-eqiad is UP: PING OK - Packet loss = 0%, RTA = 0.64 ms
[03:40:44] RECOVERY - Host asw2-d-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.35 ms
[03:41:18] RECOVERY - Host asw2-b-eqiad is UP: PING OK - Packet loss = 0%, RTA = 0.82 ms
[03:42:50] RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 0.64 ms
[03:43:43] (JobUnavailable) resolved: Reduced availability for job pdu_sentry4 in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:44:34] RECOVERY - Host mr1-eqiad IPv6 is UP: PING OK - Packet loss = 0%, RTA = 0.67 ms
[03:51:16] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[06:31:41] (KeyholderUnarmed) firing: (2) 1 unarmed Keyholder key(s) on acmechief2002:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[06:33:42] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:46:10] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231029T0700)
[07:51:16] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[08:19:36] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[08:19:46] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[08:19:56] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[08:22:22] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50715 bytes in 6.757 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[08:22:28] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8572 bytes in 3.657 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[08:22:34] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Sun 17 Dec 2023 03:07:37 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[08:34:20] PROBLEM - Check systemd state on aqs1010 is CRITICAL: CRITICAL - degraded: The following units failed: aqs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:00:36] RECOVERY - Check systemd state on aqs1010 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:17:24] (CR) CI reject: [V: -1] namespaces:mediawiki: add Extensions as alias of NS_EXTENSION [mediawiki-config] - https://gerrit.wikimedia.org/r/969353 (https://phabricator.wikimedia.org/T349970) (owner: RhinosF1)
[10:18:34] (CirrusSearchJobQueueLagTooHigh) firing: CirrusSearch job cirrusSearchLinksUpdate lag is too high: 7h 22m 24s - TODO - https://grafana.wikimedia.org/d/CbmStnlGk/jobqueue-job?orgId=1&var-dc=codfw%20prometheus/k8s&var-job=cirrusSearchLinksUpdate - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchJobQueueLagTooHigh
[10:19:08] (PS3) RhinosF1: namespaces:mediawiki: add Extensions as alias of NS_EXTENSION [mediawiki-config] - https://gerrit.wikimedia.org/r/969353 (https://phabricator.wikimedia.org/T349970)
[10:20:18] (CR) CI reject: [V: -1] namespaces:mediawiki: add Extensions as alias of NS_EXTENSION [mediawiki-config] - https://gerrit.wikimedia.org/r/969353 (https://phabricator.wikimedia.org/T349970) (owner: RhinosF1)
[10:24:16] (PS4) RhinosF1: namespaces:mediawiki: add Extensions/Skins as alias of Extension/Skin (+ tallk) [mediawiki-config] - https://gerrit.wikimedia.org/r/969353 (https://phabricator.wikimedia.org/T349970)
[10:25:28] (CR) CI reject: [V: -1] namespaces:mediawiki: add Extensions/Skins as alias of Extension/Skin (+ tallk) [mediawiki-config] - https://gerrit.wikimedia.org/r/969353 (https://phabricator.wikimedia.org/T349970) (owner: RhinosF1)
[10:25:30] (PS5) RhinosF1: namespaces:mediawiki: add Extensions/Skins as alias of Extension/Skin (+ tallk) [mediawiki-config] - https://gerrit.wikimedia.org/r/969353 (https://phabricator.wikimedia.org/T349970)
[10:28:34] (CirrusSearchJobQueueLagTooHigh) resolved: CirrusSearch job cirrusSearchLinksUpdate lag is too high: 7h 7m 59s - TODO - https://grafana.wikimedia.org/d/CbmStnlGk/jobqueue-job?orgId=1&var-dc=codfw%20prometheus/k8s&var-job=cirrusSearchLinksUpdate - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchJobQueueLagTooHigh
[10:31:57] (KeyholderUnarmed) firing: (2) 1 unarmed Keyholder key(s) on acmechief2002:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[11:51:16] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[14:08:40] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:09:20] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:09:54] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.323 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:10:38] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50715 bytes in 3.751 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:17:06] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:17:46] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:18:26] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8572 bytes in 7.080 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:19:00] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50713 bytes in 0.102 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:31:58] (KeyholderUnarmed) firing: (2) 1 unarmed Keyholder key(s) on acmechief2002:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[14:38:43] (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:53:43] (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:49:24] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:51:16] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[15:54:52] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[15:55:18] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[15:56:12] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8572 bytes in 6.379 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[15:56:32] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50713 bytes in 0.080 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:00:32] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:34:08] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:45:18] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:58:04] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:59:20] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.381 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:04:54] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:30:02] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:31:58] (KeyholderUnarmed) firing: (2) 1 unarmed Keyholder key(s) on acmechief2002:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[19:51:16] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[20:06:26] PROBLEM - Check systemd state on cumin2002 is CRITICAL: CRITICAL - degraded: The following units failed: httpbb_kubernetes_mw-web_hourly.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:06:32] PROBLEM - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[20:44:12] SRE, Maps: Allow Wikimedia Maps usage on wikiworld.sidl-corporation.fr - https://phabricator.wikimedia.org/T349985 (SIDLCorporation)
[21:02:24] RECOVERY - Check systemd state on cumin2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:04:58] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:08:52] RECOVERY - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[21:30:08] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:34:20] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:42:38] (PS1) Pppery: Avoid trailing newline in qqq.json [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/969515 (https://phabricator.wikimedia.org/T294754)
[21:43:06] (CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance cloudelastic1005-cloudelastic-psi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[21:45:28] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:05:28] PROBLEM - Router interfaces on cr2-drmrs is CRITICAL: CRITICAL: host 185.15.58.129, interfaces up: 61, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[22:06:14] PROBLEM - Router interfaces on cr1-esams is CRITICAL: CRITICAL: host 185.15.59.128, interfaces up: 77, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[22:06:34] (PS1) Pppery: Update source strings from Phabricaotr source [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/969517
[22:09:04] RECOVERY - Router interfaces on cr1-esams is OK: OK: host 185.15.59.128, interfaces up: 78, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[22:09:40] RECOVERY - Router interfaces on cr2-drmrs is OK: OK: host 185.15.58.129, interfaces up: 62, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[22:31:58] (KeyholderUnarmed) firing: (2) 1 unarmed Keyholder key(s) on acmechief2002:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[22:38:06] (CirrusSearchHighOldGCFrequency) resolved: Elasticsearch instance cloudelastic1005-cloudelastic-psi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[23:02:01] (PS1) Pppery: Update source strings from Phrabricator [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/969518
[23:04:47] (PS2) Pppery: Update source strings from Phrabricator [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/969518
[23:06:26] (PS3) Pppery: Update source strings from Phrabricator [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/969518 (https://phabricator.wikimedia.org/T969518)
[23:08:08] (PS4) Pppery: Update source strings from Phrabricator [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/969518 (https://phabricator.wikimedia.org/T318763)
[23:51:16] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[23:53:33] (PS1) Pppery: Update arcanist translations too [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/969520 (https://phabricator.wikimedia.org/T318763)