[00:18:25] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[00:38:49] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/977232
[00:38:55] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/977232 (owner: 10TrainBranchBot)
[00:57:22] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/977232 (owner: 10TrainBranchBot)
[01:19:00] <wikibugs>	 (03PS1) 10Andrew Bogott: wmcs-backup: don't try to clean up protected snapshots [puppet] - 10https://gerrit.wikimedia.org/r/977279
[01:24:13] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] wmcs-backup: don't try to clean up protected snapshots [puppet] - 10https://gerrit.wikimedia.org/r/977279 (owner: 10Andrew Bogott)
[02:33:25] <jinxer-wm>	 (ProbeDown) firing: (4) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:38:25] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:08:25] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:24:26] <jinxer-wm>	 (KubernetesAPINotScrapable) firing: (2) k8s-staging@eqiad is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[04:18:25] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[06:33:25] <jinxer-wm>	 (ProbeDown) firing: (4) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:24:26] <jinxer-wm>	 (KubernetesAPINotScrapable) firing: (2) k8s-staging@eqiad is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[08:00:06] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231126T0800)
[08:05:23] <wikibugs>	 (03PS3) 10Gergő Tisza: CentralAuth: Fix wikisource.org cookie handling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/976864 (https://phabricator.wikimedia.org/T351685)
[08:06:17] <wikibugs>	 (03CR) 10Gergő Tisza: CentralAuth: Fix wikisource.org cookie handling (035 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/976864 (https://phabricator.wikimedia.org/T351685) (owner: 10Gergő Tisza)
[08:18:25] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[08:34:52] <jinxer-wm>	 (ProbeDown) firing: (5) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4)   - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:38:25] <jinxer-wm>	 (ProbeDown) firing: (5) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4)   - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:53:20] <wikibugs>	 10SRE, 10Traffic-Icebox, 10WMF-General-or-Unknown: Varnish: Mobile site redirect interferes with OAuth authorization process - https://phabricator.wikimedia.org/T74186 (10Tgr) After re-reading the comments, I'm not really sure about the current state of the task. Varnish [[https://gerrit.wikimedia.org/r/plug...
[09:05:45] <wikibugs>	 10SRE, 10Traffic-Icebox, 10WMF-General-or-Unknown: Varnish: Mobile site redirect interferes with OAuth authorization process - https://phabricator.wikimedia.org/T74186 (10Tgr) >>! In T74186#8716436, @Tgr wrote: > As pointed out in T332650#8715452, if the user is not logged in, they get redirected to the logi...
[09:51:03] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:51:17] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:53:07] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:54:25] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:58:39] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50862 bytes in 8.598 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:59:47] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.894 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[10:30:45] <icinga-wm>	 PROBLEM - WDQS SPARQL on wdqs1006 is CRITICAL: HTTP CRITICAL: HTTP/1.1 429 Too Many Requests - string http://www.w3.org/2001/XML... not found on https://query.wikidata.org:443/bigdata/namespace/wdq/sparql?query=SELECT%20*%20WHERE%20%7Bwikibase%3ADump%20schema%3AdateModified%20%3Fy%7D%20LIMIT%201 - 754 bytes in 0.087 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[11:24:27] <jinxer-wm>	 (KubernetesAPINotScrapable) firing: (2) k8s-staging@eqiad is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[12:18:25] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[12:38:25] <jinxer-wm>	 (ProbeDown) firing: (4) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:13:16] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 5.726215854993238s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[13:20:17] <wikibugs>	 10SRE-swift-storage, 10Commons: Incomplete files uploaded (10 MB interruption) - https://phabricator.wikimedia.org/T350917 (10MatthewVernon) Looking at recent uploads, there are definitely >10MB files being uploaded: https://commons.wikimedia.org/wiki/File:Ambito_veneziano,_Martirio_di_san_Tommaso_Becket,_1260...
[13:20:47] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[13:20:51] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[13:23:57] <wikibugs>	 10SRE-swift-storage, 10Commons: Incomplete files uploaded (10 MB interruption) - https://phabricator.wikimedia.org/T350917 (10MatthewVernon) Perhaps worth noting that some of these images predate the move to envoy too (e.g. https://commons.wikimedia.org/wiki/File:Youngtimer_Trophy_Schwedenkreutz_2023_15.jpg )...
[13:25:03] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50861 bytes in 0.486 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[13:25:05] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.277 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:38:25] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:53:25] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:02:46] <wikibugs>	 (03PS4) 10D3r1ck01: ClusterConfig: Rename `isTest()` to `isDebug()` for consistency [mediawiki-config] - 10https://gerrit.wikimedia.org/r/976252 (https://phabricator.wikimedia.org/T347366)
[15:21:26] <wikibugs>	 (03PS5) 10D3r1ck01: ClusterConfig: Rename `isTest()` to `isDebug()` for consistency [mediawiki-config] - 10https://gerrit.wikimedia.org/r/976252 (https://phabricator.wikimedia.org/T347366)
[15:21:36] <wikibugs>	 (03CR) 10D3r1ck01: ClusterConfig: Rename `isTest()` to `isDebug()` for consistency (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/976252 (https://phabricator.wikimedia.org/T347366) (owner: 10D3r1ck01)
[15:22:10] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] ClusterConfig: Rename `isTest()` to `isDebug()` for consistency [mediawiki-config] - 10https://gerrit.wikimedia.org/r/976252 (https://phabricator.wikimedia.org/T347366) (owner: 10D3r1ck01)
[15:23:49] <wikibugs>	 (03PS6) 10D3r1ck01: ClusterConfig: Rename `isTest()` to `isDebug()` for consistency [mediawiki-config] - 10https://gerrit.wikimedia.org/r/976252 (https://phabricator.wikimedia.org/T347366)
[15:24:27] <jinxer-wm>	 (KubernetesAPINotScrapable) firing: (2) k8s-staging@eqiad is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[15:28:05] <wikibugs>	 (03CR) 10D3r1ck01: [C: 04-2] "Pending completion of work on T336004 before complete cleanup." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/974508 (https://phabricator.wikimedia.org/T336004) (owner: 10D3r1ck01)
[16:19:52] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[16:38:25] <jinxer-wm>	 (ProbeDown) firing: (4) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:50:47] <wikibugs>	 (03PS2) 10Phuedx: Remove eventlogging_FeaturePolicyViolation and _SpecialMuteSubmit event streams [mediawiki-config] - 10https://gerrit.wikimedia.org/r/977075 (https://phabricator.wikimedia.org/T329718)
[17:09:23] <icinga-wm>	 PROBLEM - BGP status on cr3-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[17:13:31] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 5.619115035593177s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[17:15:51] <icinga-wm>	 PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[18:38:16] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: codfw parsoid GET/200: 4.124450535881516s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[18:39:46] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 5.499799976997294s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[19:24:27] <jinxer-wm>	 (KubernetesAPINotScrapable) firing: (2) k8s-staging@eqiad is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[20:23:26] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[20:28:31] <icinga-wm>	 PROBLEM - Query Service HTTP Port on wdqs1006 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 298 bytes in 0.002 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[20:38:25] <jinxer-wm>	 (ProbeDown) firing: (4) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[21:06:01] <wikibugs>	 (03PS1) 10Andrew Bogott: rbd2backy2: default value for new snapshot attribute 'protected' [puppet] - 10https://gerrit.wikimedia.org/r/977295
[21:11:23] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] rbd2backy2: default value for new snapshot attribute 'protected' [puppet] - 10https://gerrit.wikimedia.org/r/977295 (owner: 10Andrew Bogott)
[22:03:47] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[22:04:45] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[22:05:09] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8572 bytes in 2.414 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[22:06:03] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50860 bytes in 0.105 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[22:40:01] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 5.541067114181549s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:04:46] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: codfw parsoid GET/200: 4.0644654514338345s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:05:46] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 3.4909274130964563s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:24:27] <jinxer-wm>	 (KubernetesAPINotScrapable) firing: (2) k8s-staging@eqiad is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[23:46:01] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: codfw parsoid GET/200: 4.6455259792201415s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:50:46] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 5.080825044753017s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded