[00:00:37] <icinga-wm>	 RECOVERY - Check systemd state on grafana1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:06:05] <icinga-wm>	 PROBLEM - Check systemd state on grafana1002 is CRITICAL: CRITICAL - degraded: The following units failed: grafana-ldap-users-sync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:42:47] <icinga-wm>	 RECOVERY - Check systemd state on logstash2026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:18:33] <wikibugs>	 (03CR) 10Superpes15: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/886416 (https://phabricator.wikimedia.org/T328834) (owner: 10Superpes15)
[02:10:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:20:45] <jinxer-wm>	 (JobUnavailable) resolved: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:09:11] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[03:09:59] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[03:11:41] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 49567 bytes in 2.800 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[03:12:41] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.197 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:46:37] <icinga-wm>	 RECOVERY - BFD status on cr2-codfw is OK: OK: UP: 18 AdminDown: 1 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:47:21] <icinga-wm>	 RECOVERY - BFD status on cr2-eqiad is OK: OK: UP: 17 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:55:35] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 226, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:57:25] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 227, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:59:51] <icinga-wm>	 PROBLEM - Backup freshness on backup1001 is CRITICAL: Stale: 1 (gerrit1001), Fresh: 116 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[06:05:01] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:08:33] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1091 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[06:10:47] <icinga-wm>	 PROBLEM - Swift https frontend on ms-fe2012 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 309 bytes in 7.419 second response time https://wikitech.wikimedia.org/wiki/Swift
[06:12:29] <icinga-wm>	 RECOVERY - Swift https frontend on ms-fe2012 is OK: HTTP OK: HTTP/1.1 200 OK - 245 bytes in 0.140 second response time https://wikitech.wikimedia.org/wiki/Swift
[06:15:55] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:19:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1091 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[06:23:51] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[06:25:33] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.195 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[06:28:35] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 226, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:32:17] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 227, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:38:01] <icinga-wm>	 PROBLEM - BFD status on cr2-codfw is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:40:29] <icinga-wm>	 PROBLEM - BFD status on cr2-eqiad is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:58:07] <icinga-wm>	 RECOVERY - BFD status on cr2-codfw is OK: OK: UP: 19 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:58:43] <icinga-wm>	 RECOVERY - BFD status on cr2-eqiad is OK: OK: UP: 17 AdminDown: 2 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:24:07] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1091 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:00:04] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230205T0800)
[09:02:13] <icinga-wm>	 RECOVERY - Backup freshness on backup1001 is OK: Fresh: 117 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[09:12:13] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1091 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[09:44:39] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1091 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[10:27:55] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1091 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[10:58:17] <jinxer-wm>	 (NELHigh) firing: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh
[11:00:17] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1091 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[11:03:17] <jinxer-wm>	 (NELHigh) resolved: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh
[11:36:17] <jinxer-wm>	 (NELHigh) firing: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh
[11:41:17] <jinxer-wm>	 (NELHigh) resolved: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh
[11:49:49] <topranks>	 !log Manually deactivating peering to Telecom Italia / Seabone at AMS-IX on cr2-esams as they are having issues
[11:49:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1091 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[12:26:45] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1091 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[12:30:55] <icinga-wm>	 PROBLEM - Check systemd state on cumin1001 is CRITICAL: CRITICAL - degraded: The following units failed: httpbb_kubernetes_hourly.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:36:31] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_hourly on cumin1001 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[12:48:19] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1091 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[12:57:55] <wikibugs>	 (03PS1) 10Andrew Bogott: Backy2 backup jobs: don't email on failure [puppet] - 10https://gerrit.wikimedia.org/r/886470 (https://phabricator.wikimedia.org/T328868)
[12:58:45] <wikibugs>	 (03CR) 10Andrew Bogott: "...is this the right way to do this? I don't see any existing examples like this in the code." [puppet] - 10https://gerrit.wikimedia.org/r/886470 (https://phabricator.wikimedia.org/T328868) (owner: 10Andrew Bogott)
[13:00:02] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] hieradata: drop ldap-labtest acme-chier cert [puppet] - 10https://gerrit.wikimedia.org/r/885026 (owner: 10Majavah)
[13:05:44] <wikibugs>	 10SRE, 10Traffic-Icebox, 10WMF-General-or-Unknown, 10Browser-Support-Internet-Explorer, 10Upstream: Visting [[c:File:FEZ_trial_gameplay_HD.webm]] in IE11 shows errors in developer console about insecure data:image/png;base64 "URL" - https://phabricator.wikimedia.org/T148595 (10Aklapper) 05Invalid→03De...
[13:09:36] <wikibugs>	 (03PS1) 10Andrew Bogott: Bump pontoon default openstack version to 'zed' [puppet] - 10https://gerrit.wikimedia.org/r/886471
[13:09:38] <wikibugs>	 (03PS1) 10Andrew Bogott: Remove unused files for OpenStack version 'xena' [puppet] - 10https://gerrit.wikimedia.org/r/886472
[13:10:09] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Remove unused files for OpenStack version 'xena' [puppet] - 10https://gerrit.wikimedia.org/r/886472 (owner: 10Andrew Bogott)
[13:25:29] <icinga-wm>	 RECOVERY - Check systemd state on cumin1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:30:35] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_hourly on cumin1001 is OK: OK: Status of the systemd unit httpbb_kubernetes_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[13:31:35] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1091 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[14:18:17] <jinxer-wm>	 (NELHigh) firing: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh
[14:23:17] <jinxer-wm>	 (NELHigh) resolved: Elevated Network Error Logging events (tcp.timed_out) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#NEL_alerts - https://logstash.wikimedia.org/goto/5c8f4ca1413eda33128e5c5a35da7e28 - https://alerts.wikimedia.org/?q=alertname%3DNELHigh
[14:37:33] <wikibugs>	 10SRE, 10Observability-Alerting, 10observability: alertmanager silence confirmation page links to localhost - https://phabricator.wikimedia.org/T328869 (10CDanis)
[14:39:00] <cdanis>	 !log silenced NELHigh alert for 20 hours: Telecom Italy issues; alertmanager silence id 3fb3b999-9756-44af-a1e8-fd1faae8b9bf
[14:39:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:02:59] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1091 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[16:30:07] <wikibugs>	 (03PS1) 10Zabe: stop setting checkuser actor/comment migartion variables [mediawiki-config] - 10https://gerrit.wikimedia.org/r/886476 (https://phabricator.wikimedia.org/T233004)
[16:30:26] <wikibugs>	 (03CR) 10Zabe: [C: 04-2] "not quite yet" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/886476 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[16:46:15] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1091 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[18:01:09] <wikibugs>	 (03CR) 10Dreamy Jazz: stop setting checkuser actor/comment migartion variables (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/886476 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[18:29:38] <wikibugs>	 (03PS2) 10Zabe: stop setting checkuser actor/comment migration variables [mediawiki-config] - 10https://gerrit.wikimedia.org/r/886476 (https://phabricator.wikimedia.org/T233004)
[18:29:50] <wikibugs>	 (03CR) 10Zabe: stop setting checkuser actor/comment migration variables (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/886476 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[18:34:27] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1091 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[18:38:56] <wikibugs>	 (03PS1) 10Majavah: mediawiki: drop pybal-check user [puppet] - 10https://gerrit.wikimedia.org/r/886478 (https://phabricator.wikimedia.org/T111899)
[19:06:53] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1091 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[19:15:05] <icinga-wm>	 PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v2/translate/{from}/{to}/{provider} (Machine translate an HTML fragment using TestClient, adapt the links to target language wiki.) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
[19:16:49] <icinga-wm>	 RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
[19:17:43] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1091 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[19:50:11] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1091 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[21:09:31] <wikibugs>	 (03PS1) 10PipelineBot: mathoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/885482
[22:28:50] <topranks>	 !log Re-enabling peering to Seabone/Telecom Italit AS 6762 on cr2-esams at AMS-IX 
[22:28:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:48:21] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1091 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring