[00:39:23] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/906695 [00:39:29] (03CR) 10TrainBranchBot: [C: 03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/906695 (owner: 10TrainBranchBot) [00:55:15] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/906695 (owner: 10TrainBranchBot) [02:06:32] (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:26:32] (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [03:49:41] (NodeTextfileStale) firing: (2) Stale textfile for puppetdb1003:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale [07:49:41] (NodeTextfileStale) firing: (2) Stale textfile for puppetdb1003:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale [08:42:51] PROBLEM - Check systemd state on mirror1001 is CRITICAL: CRITICAL - degraded: The following units failed: update-ubuntu-mirror.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [08:54:33] (03PS1) 10Majavah: O:wmcs::nfs: remove primary_backup::misc and related classes [puppet] - 10https://gerrit.wikimedia.org/r/906783 (https://phabricator.wikimedia.org/T301280) [09:13:47] PROBLEM - WDQS SPARQL on wdqs1007 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook [09:36:58] (RdfStreamingUpdaterHighConsumerUpdateLag) firing: wdqs2004:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [09:46:58] (RdfStreamingUpdaterHighConsumerUpdateLag) resolved: wdqs2004:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [11:40:33] 10SRE, 10Commons: Persistent 404 errors for some Commons files, fixable by overwriting - https://phabricator.wikimedia.org/T334346 (10bjh21) [11:49:41] (NodeTextfileStale) firing: (2) Stale textfile for puppetdb1003:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale [12:36:51] PROBLEM - Host mw2448 is DOWN: PING CRITICAL - Packet loss = 100% [12:49:03] 10SRE, 10SRE-swift-storage, 10Commons: Persistent 404 errors for some Commons files, fixable by overwriting - https://phabricator.wikimedia.org/T334346 (10Peachey88) [13:01:51] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [13:03:23] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.345 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [13:34:46] (03PS1) 10Andrew Bogott: cinder-backups: consolidate backup jobs on one host [puppet] - 10https://gerrit.wikimedia.org/r/906789 [13:36:03] (03CR) 10Andrew Bogott: [C: 03+2] cinder-backups: consolidate backup jobs on one host [puppet] - 10https://gerrit.wikimedia.org/r/906789 (owner: 10Andrew Bogott) [13:48:03] (ProbeDown) firing: (2) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog1002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [13:53:03] (ProbeDown) resolved: (2) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog1002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [13:54:53] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [13:55:27] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [14:21:31] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 49853 bytes in 3.869 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [14:21:51] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8572 bytes in 2.211 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [14:41:58] (03PS1) 10Jon Harald Søby: Add blkwiki to wgSitename [mediawiki-config] - 10https://gerrit.wikimedia.org/r/906793 (https://phabricator.wikimedia.org/T334351) [15:10:37] Hi! Why is [[wikitech:Deployments]] not updated? I wanted to schedule a patch for next week (hopefully tomorrow) [15:21:05] PROBLEM - Ubuntu mirror in sync with upstream on mirror1001 is CRITICAL: /srv/mirrors/ubuntu is over 14 hours old. https://wikitech.wikimedia.org/wiki/Mirrors [15:49:41] (NodeTextfileStale) firing: (2) Stale textfile for puppetdb1003:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale [17:39:03] (ProbeDown) firing: Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip6) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [17:44:03] (ProbeDown) resolved: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [19:11:12] 10SRE-Sprint-Week-Sustainability-March2023, 10DynamicPageList (Wikimedia), 10serviceops-radar, 10Patch-Needs-Improvement, and 7 others: Limit concurrency of DPL queries - https://phabricator.wikimedia.org/T263220 (10Pppery) [19:49:41] (NodeTextfileStale) firing: (2) Stale textfile for puppetdb1003:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale [21:52:21] PROBLEM - Query Service HTTP Port on wdqs1007 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 298 bytes in 0.005 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service [23:49:41] (NodeTextfileStale) firing: (2) Stale textfile for puppetdb1003:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale