[00:39:04] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/971428
[00:39:10] (CR) TrainBranchBot: [C: +2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/971428 (owner: TrainBranchBot)
[00:47:40] (LVSHighRX) firing: Excessive RX traffic on lvs3008:9100 (eno12399np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3008 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[00:52:40] (LVSHighRX) resolved: Excessive RX traffic on lvs3008:9100 (eno12399np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3008 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[00:57:15] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/971428 (owner: TrainBranchBot)
[01:35:57] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_event.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:45:27] (SwiftObjectCountSiteDisparity) firing: MediaWiki swift object counts site diffs - https://wikitech.wikimedia.org/wiki/Swift/How_To - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift - https://alerts.wikimedia.org/?q=alertname%3DSwiftObjectCountSiteDisparity
[02:38:46] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:41:05] (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[02:47:03] (ProbeDown) firing: Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:52:03] (ProbeDown) resolved: Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[03:04:33] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:11:05] (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[03:51:17] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[05:45:27] (SwiftObjectCountSiteDisparity) firing: MediaWiki swift object counts site diffs - https://wikitech.wikimedia.org/wiki/Swift/How_To - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift - https://alerts.wikimedia.org/?q=alertname%3DSwiftObjectCountSiteDisparity
[06:34:21] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[06:35:15] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[06:39:23] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50860 bytes in 0.066 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[06:39:55] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.267 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[07:08:46] (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:28:51] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[07:30:05] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50860 bytes in 0.063 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[07:51:17] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[08:07:03] (ProbeDown) firing: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:12:03] (ProbeDown) resolved: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:43:23] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[08:44:09] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[08:44:25] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[08:49:49] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8572 bytes in 9.257 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[08:50:01] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Sun 17 Dec 2023 03:07:37 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[08:50:25] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50862 bytes in 7.970 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:45:27] (SwiftObjectCountSiteDisparity) firing: MediaWiki swift object counts site diffs - https://wikitech.wikimedia.org/wiki/Swift/How_To - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift - https://alerts.wikimedia.org/?q=alertname%3DSwiftObjectCountSiteDisparity
[09:47:35] (PS1) RhinosF1: [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530
[09:47:40] (CR) RhinosF1: [C: -1] [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[09:50:55] (PS2) RhinosF1: [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530
[09:50:59] (CR) RhinosF1: [C: -1] [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[09:51:40] (CR) CI reject: [V: -1] [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[09:51:52] (PS3) RhinosF1: [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530
[09:51:57] (CR) RhinosF1: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[09:52:27] (CR) CI reject: [V: -1] [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[09:54:01] (PS4) RhinosF1: [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530
[09:54:09] (CR) RhinosF1: [C: -1] "check experimental" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[09:55:37] (PS5) RhinosF1: [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530
[09:55:43] (CR) RhinosF1: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[09:55:55] (CR) RhinosF1: [C: -1] [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[10:01:49] (PS6) RhinosF1: [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530
[10:01:59] (CR) RhinosF1: [C: -1] "experimental" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[10:02:32] (CR) CI reject: [V: -1] [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[10:02:40] (CR) RhinosF1: [C: -1] "check experimental" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[10:03:25] (PS7) RhinosF1: [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530
[10:03:29] (CR) RhinosF1: [C: -1] "check experimental" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[10:04:01] (CR) CI reject: [V: -1] [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[10:07:35] (PS8) RhinosF1: [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530
[10:08:02] (CR) CI reject: [V: -1] [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[10:09:12] (PS9) RhinosF1: [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530
[10:09:42] (CR) CI reject: [V: -1] [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[10:11:22] (PS10) RhinosF1: [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530
[10:11:48] (CR) CI reject: [V: -1] [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[10:13:41] This Variable has no effect. A value was produced and then forgotten (one or more preceding expressions may have the wrong form) (file: /srv/workspace/puppet/modules/wikistats/manifests/job/update.pp, line: 13, column: 7)
[10:13:49] why do you hate me, it is on line 27
[10:13:59] s/27/23
[10:14:10] oh
[10:14:17] (PS11) RhinosF1: [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530
[10:15:37] (CR) RhinosF1: [C: -1] "check experimental" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[10:17:27] (PS12) RhinosF1: [DNM] Revert "wikistats:wikia: pause updates while changes are made to table" [puppet] - https://gerrit.wikimedia.org/r/971530
[10:18:19] (CR) RhinosF1: [C: -1] "check experimental" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[10:21:09] (CR) RhinosF1: [C: -1] "PCC SUCCESS DIFF 1" [puppet] - https://gerrit.wikimedia.org/r/971530 (owner: RhinosF1)
[11:08:46] (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:51:18] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[13:45:28] (SwiftObjectCountSiteDisparity) firing: MediaWiki swift object counts site diffs - https://wikitech.wikimedia.org/wiki/Swift/How_To - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift - https://alerts.wikimedia.org/?q=alertname%3DSwiftObjectCountSiteDisparity
[14:38:46] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:53:46] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:51:18] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[16:13:44] (ThanosQueryInstantLatencyHigh) firing: Thanos Query Frontend has high latency for queries. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/aa7Rx0oMk/thanos-query-frontend - https://alerts.wikimedia.org/?q=alertname%3DThanosQueryInstantLatencyHigh
[16:18:44] (ThanosQueryInstantLatencyHigh) resolved: Thanos Query Frontend has high latency for queries. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/aa7Rx0oMk/thanos-query-frontend - https://alerts.wikimedia.org/?q=alertname%3DThanosQueryInstantLatencyHigh
[17:45:28] (SwiftObjectCountSiteDisparity) firing: MediaWiki swift object counts site diffs - https://wikitech.wikimedia.org/wiki/Swift/How_To - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift - https://alerts.wikimedia.org/?q=alertname%3DSwiftObjectCountSiteDisparity
[18:53:46] (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[19:51:18] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[21:45:28] (SwiftObjectCountSiteDisparity) firing: MediaWiki swift object counts site diffs - https://wikitech.wikimedia.org/wiki/Swift/How_To - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift - https://alerts.wikimedia.org/?q=alertname%3DSwiftObjectCountSiteDisparity
[22:53:46] (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[23:09:55] PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Idle - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[23:16:09] PROBLEM - BGP status on cr3-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[23:51:18] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure