[00:08:34] <wikibugs>	 (CR) BBlack: [C: +2] prometheus6001: add to global node list [puppet] - https://gerrit.wikimedia.org/r/748225 (https://phabricator.wikimedia.org/T282787) (owner: BBlack)
[00:22:18] <wikibugs>	 (PS1) BBlack: Add prometheus.svc.drmrs.wmnet alias [dns] - https://gerrit.wikimedia.org/r/748227 (https://phabricator.wikimedia.org/T282787)
[00:25:41] <wikibugs>	 (CR) BBlack: [C: +2] Add prometheus.svc.drmrs.wmnet alias [dns] - https://gerrit.wikimedia.org/r/748227 (https://phabricator.wikimedia.org/T282787) (owner: BBlack)
[00:26:07] <wikibugs>	 (PS1) BBlack: Add drmrs prometheus to various global config [puppet] - https://gerrit.wikimedia.org/r/748228 (https://phabricator.wikimedia.org/T282787)
[00:28:23] <wikibugs>	 (CR) BBlack: [C: +2] Add drmrs prometheus to various global config [puppet] - https://gerrit.wikimedia.org/r/748228 (https://phabricator.wikimedia.org/T282787) (owner: BBlack)
[00:37:34] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=pdu_sentry4 site=drmrs https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:39:44] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:44:43] <bblack>	 just for the record, anything that mentions "drmrs" is non-critical if it alerts.  The site isn't active.
[00:44:55] <bblack>	 it's just hard to control for all possible spam fallouts as things are being initially configured
[00:50:40] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=sidekiq site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:52:52] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:01:16] <wikibugs>	 (CR) Cwhite: [C: +2] add and enable subset filters [software/ecs] - https://gerrit.wikimedia.org/r/747641 (https://phabricator.wikimedia.org/T294581) (owner: Cwhite)
[01:01:49] <wikibugs>	 (Merged) jenkins-bot: add and enable subset filters [software/ecs] - https://gerrit.wikimedia.org/r/747641 (https://phabricator.wikimedia.org/T294581) (owner: Cwhite)
[01:06:11] <wikibugs>	 (PS1) Cwhite: profile: upgrade to ecs 1.11.0-2 [puppet] - https://gerrit.wikimedia.org/r/748230 (https://phabricator.wikimedia.org/T294581)
[01:17:27] <wikibugs>	 SRE: Allow Wikimedia Maps usage on bbcrewind.co.uk - https://phabricator.wikimedia.org/T297968 (AntiCompositeNumber) I don't believe this request meets the criteria in the [[https://foundation.wikimedia.org/wiki/Maps_Terms_of_Use#Using_maps_in_third-party_services|Maps Terms of Use]]. > Wikimedia Maps may no...
[01:18:59] <wikibugs>	 (CR) Samwilson: [C: +2] Move horizontal/vertical layout to CSS only [extensions/ProofreadPage] (wmf/1.38.0-wmf.13) - https://gerrit.wikimedia.org/r/748095 (https://phabricator.wikimedia.org/T297339) (owner: Inductiveload)
[01:46:12] <icinga-wm>	 RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:04:06] <icinga-wm>	 PROBLEM - SSH on kubernetes1002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:05:08] <icinga-wm>	 RECOVERY - SSH on kubernetes1002.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:43:39] <wikibugs>	 SRE: Allow Wikimedia Maps usage on bbcrewind.co.uk - https://phabricator.wikimedia.org/T297968 (Ed6767) Ignoring that this proposal contradicts the Maps Terms of Service at time of writing, will bbcrewind.co.uk support and benefit Wikimedia projects, other than through providing historical references? We can...
[04:02:09] <wikibugs>	 (PS2) RLazarus: Add a pod_name column to ActiveContainerImage [docker-images/imagecatalog] - https://gerrit.wikimedia.org/r/747881 (https://phabricator.wikimedia.org/T287130)
[04:02:11] <wikibugs>	 (PS1) RLazarus: Fix --cluster command line parsing and add tests [docker-images/imagecatalog] - https://gerrit.wikimedia.org/r/748232 (https://phabricator.wikimedia.org/T287130)
[04:04:18] <wikibugs>	 (CR) jerkins-bot: [V: -1] Fix --cluster command line parsing and add tests [docker-images/imagecatalog] - https://gerrit.wikimedia.org/r/748232 (https://phabricator.wikimedia.org/T287130) (owner: RLazarus)
[04:05:11] <wikibugs>	 (PS2) RLazarus: Fix --clusters command line parsing and add tests [docker-images/imagecatalog] - https://gerrit.wikimedia.org/r/748232 (https://phabricator.wikimedia.org/T287130)
[04:06:58] <wikibugs>	 (CR) jerkins-bot: [V: -1] Fix --clusters command line parsing and add tests [docker-images/imagecatalog] - https://gerrit.wikimedia.org/r/748232 (https://phabricator.wikimedia.org/T287130) (owner: RLazarus)
[04:11:01] <wikibugs>	 (PS3) RLazarus: Fix --clusters command line parsing and add tests [docker-images/imagecatalog] - https://gerrit.wikimedia.org/r/748232 (https://phabricator.wikimedia.org/T287130)
[05:35:56] <icinga-wm>	 PROBLEM - Check systemd state on sodium is CRITICAL: CRITICAL - degraded: The following units failed: update-ubuntu-mirror.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:14:28] <icinga-wm>	 PROBLEM - SSH on kubernetes1004.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:15:34] <icinga-wm>	 RECOVERY - SSH on kubernetes1004.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:32:28] <icinga-wm>	 RECOVERY - Check systemd state on sodium is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:16:04] <icinga-wm>	 PROBLEM - SSH on kubernetes1002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:09:03] <wikibugs>	 SRE: Allow Wikimedia Maps usage on bbcrewind.co.uk - https://phabricator.wikimedia.org/T297968 (LWyatt) For those commenting with concerns about 'slippery slope' and 'mission alignment' - I should clarify some context here:  - The Maps API //used// to be available for anyone to use for any purpose, but was r...
[13:17:10] <icinga-wm>	 RECOVERY - SSH on kubernetes1002.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:57:41] <dcausse>	 !log restarting blazegraph on wdqs1013 (jvm stuck for 10 hours)
[13:57:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:36:58] <icinga-wm>	 PROBLEM - At least one CPU core of an LVS is saturated- packet drops are likely on lvs3005 is CRITICAL: cpu={1,11,13,15,3,5,7,9} https://bit.ly/wmf-lvscpu https://grafana.wikimedia.org/d/000000377/host-overview?var-server=lvs3005&var-datasource=esams+prometheus/ops
[15:39:14] <icinga-wm>	 RECOVERY - At least one CPU core of an LVS is saturated- packet drops are likely on lvs3005 is OK: All metrics within thresholds. https://bit.ly/wmf-lvscpu https://grafana.wikimedia.org/d/000000377/host-overview?var-server=lvs3005&var-datasource=esams+prometheus/ops
[17:23:14] <icinga-wm>	 PROBLEM - SSH on rdb1006.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:24:14] <icinga-wm>	 RECOVERY - SSH on rdb1006.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:44:03] <wikibugs>	 (PS1) Zabe: Add towiki.ru to the wgCopyUploadsDomains allowlist of Wikimedia Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/748305 (https://phabricator.wikimedia.org/T294190)