[00:03:24] <icinga-wm>	 PROBLEM - Maps tiles generation on alert1001 is CRITICAL: CRITICAL: 100.00% of data under the critical threshold [5.0] https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=8&fullscreen&orgId=1
[00:03:55] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P20951 and previous config saved to /var/cache/conftool/dbconfig/20220217-000355-marostegui.json
[00:03:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:06:40] <icinga-wm>	 PROBLEM - Check systemd state on grafana1002 is CRITICAL: CRITICAL - degraded: The following units failed: grafana-ldap-users-sync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:18:28] <wikibugs>	 (PS8) JHathaway: Remove ordered_yaml function [puppet] - https://gerrit.wikimedia.org/r/763362
[00:18:30] <icinga-wm>	 RECOVERY - Check systemd state on doh6001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:19:00] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1111 (T300381)', diff saved to https://phabricator.wikimedia.org/P20952 and previous config saved to /var/cache/conftool/dbconfig/20220217-001859-marostegui.json
[00:19:01] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[00:19:03] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[00:19:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:19:07] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[00:19:07] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1099:3318 (T300381)', diff saved to https://phabricator.wikimedia.org/P20953 and previous config saved to /var/cache/conftool/dbconfig/20220217-001907-marostegui.json
[00:19:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:19:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:19:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:21:20] <wikibugs>	 (CR) jerkins-bot: [V: -1] Remove ordered_yaml function [puppet] - https://gerrit.wikimedia.org/r/763362 (owner: JHathaway)
[00:23:14] <wikibugs>	 (PS9) JHathaway: Remove ordered_yaml function [puppet] - https://gerrit.wikimedia.org/r/763362
[00:26:46] <wikibugs>	 SRE, DynamicPageList (Wikimedia), MW-1.37-notes (1.37.0-wmf.16; 2021-07-26), Sustainability (Incident Followup): Decide on the future of DPL - https://phabricator.wikimedia.org/T287380 (Matthiasb) I urge to investigate whether the Russian issue can be minimized if DPL is not used on category pages...
[00:38:02] <icinga-wm>	 PROBLEM - BGP status on cr3-eqsin is CRITICAL: BGP CRITICAL - No response from remote host 103.102.166.131 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[00:38:34] <AntiComposite>	 getting intermittent failures from US east coast
[00:38:41] <AntiComposite>	 upstream connect error or disconnect/reset before headers. reset reason: overflow
[00:38:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad - https://alerts.wikimedia.org
[00:39:00] <AntiComposite>	 a few other reports of the same in Discord
[00:39:22] <rzl>	 AntiComposite: thanks, looking
[00:39:47] <icinga-wm>	 PROBLEM - Not enough idle PHP-FPM workers for Mediawiki appserver at eqiad #page on alert1001 is CRITICAL: 0.2685 lt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=54&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver
[00:40:07] <icinga-wm>	 PROBLEM - ATS TLS has reduced HTTP availability #page on alert1001 is CRITICAL: cluster=cache_text layer=tls https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=13&fullscreen&refresh=1m&orgId=1
[00:40:11] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on alert1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[00:41:08] <jhathaway>	 online, just got the page
[00:41:24] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3125 on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:41:26] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs3005 is CRITICAL: PYBAL CRITICAL - CRITICAL - testlb_443: Servers cp3060.esams.wmnet, cp3050.esams.wmnet, cp3058.esams.wmnet are marked down but pooled: textlb_443: Servers cp3060.esams.wmnet, cp3050.esams.wmnet, cp3058.esams.wmnet are marked down but pooled: testlb6_443: Servers cp3060.esams.wmnet, cp3050.esams.wmnet, cp3058.esams.wmnet are marked down but pooled: textlb6_443: Servers cp3060.es
[00:41:26] <icinga-wm>	 t, cp3050.esams.wmnet, cp3058.esams.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[00:41:34] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 80 on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:41:40] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3124 on cp3060 is CRITICAL: HTTP CRITICAL - No data received from host https://wikitech.wikimedia.org/wiki/Varnish
[00:41:52] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3124 on cp3058 is CRITICAL: HTTP CRITICAL - No data received from host https://wikitech.wikimedia.org/wiki/Varnish
[00:41:52] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3125 on cp3050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:42:11] <icinga-wm>	 RECOVERY - Not enough idle PHP-FPM workers for Mediawiki appserver at eqiad #page on alert1001 is OK: (C)0.3 lt (W)0.5 lt 0.8909 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=54&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver
[00:42:26] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3123 on cp3050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:42:26] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 80 on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:42:26] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 80 on cp3050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:42:32] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at esams on alert1001 is CRITICAL: 43.77 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[00:42:34] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[00:42:40] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs3007 is CRITICAL: PYBAL CRITICAL - CRITICAL - testlb_443: Servers cp3060.esams.wmnet, cp3050.esams.wmnet, cp3058.esams.wmnet are marked down but pooled: textlb_443: Servers cp3060.esams.wmnet, cp3050.esams.wmnet, cp3058.esams.wmnet are marked down but pooled: testlb6_443: Servers cp3060.esams.wmnet, cp3058.esams.wmnet are marked down but pooled: textlb6_443: Servers cp3060.esams.wmnet, cp3050.es
[00:42:40] <icinga-wm>	 t are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[00:42:44] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3122 on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:42:48] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3122 on cp3050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:43:00] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3122 on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:43:00] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 80 on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:43:01] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3127 on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:43:12] <icinga-wm>	 PROBLEM - LVS text esams port 80/tcp - Main wiki platform LVS service- text.eqiad.wikimedia.org -Varnish- IPv4 #page on text-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[00:43:32] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3126 on cp3050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:43:46] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3121 on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:43:50] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3120 on cp3050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:43:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad - https://alerts.wikimedia.org
[00:44:04] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 80 on cp1089 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:44:04] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3123 on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:44:14] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3125 on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:44:14] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3120 on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:44:14] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3120 on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:44:34] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3127 on cp3050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:44:34] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3121 on cp3050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:44:42] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3123 on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:44:42] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3127 on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:45:02] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [400.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[00:45:06] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs3007 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[00:45:14] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3124 on cp3050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:45:20] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3126 on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[00:45:27] <icinga-wm>	 RECOVERY - LVS text esams port 80/tcp - Main wiki platform LVS service- text.eqiad.wikimedia.org -Varnish- IPv4 #page on text-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 301 TLS Redirect - 610 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[00:45:38] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3125 on cp3058 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.171 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:46:58] <icinga-wm>	 PROBLEM - Number of messages locally queued by purged for processing on cp3060 is CRITICAL: cluster=cache_text instance=cp3060 job=purged layer=frontend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3060
[00:47:14] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3122 on cp3058 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:47:14] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3127 on cp3058 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:47:18] <icinga-wm>	 RECOVERY - ATS TLS has reduced HTTP availability #page on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=13&fullscreen&refresh=1m&orgId=1
[00:47:54] <icinga-wm>	 PROBLEM - Number of messages locally queued by purged for processing on cp3050 is CRITICAL: cluster=cache_text instance=cp3050 job=purged layer=frontend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3050
[00:48:18] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3123 on cp3058 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.316 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:48:28] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3120 on cp3058 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:48:28] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3125 on cp3060 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:48:28] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3120 on cp3060 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.164 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:48:40] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs3005 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[00:48:54] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3123 on cp3060 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.163 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:48:54] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3127 on cp3060 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.163 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:49:32] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3126 on cp3060 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:49:42] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at esams on alert1001 is OK: (C)60 le (W)70 le 72.25 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[00:50:16] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3124 on cp3060 is OK: HTTP OK: HTTP/1.1 200 OK - 474 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:50:28] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3124 on cp3058 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:51:00] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 80 on cp3058 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:51:20] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3122 on cp3060 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:51:34] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 80 on cp3060 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:51:41] <icinga-wm>	 RECOVERY - Number of messages locally queued by purged for processing on cp3060 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3060
[00:52:22] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3121 on cp3060 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:55:01] <icinga-wm>	 RECOVERY - Number of messages locally queued by purged for processing on cp3050 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3050
[00:55:32] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3123 on cp3050 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 8.769 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:55:32] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 80 on cp3050 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 9.002 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:55:44] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3122 on cp3050 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:56:34] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3126 on cp3050 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 7.900 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:56:46] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3120 on cp3050 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:57:31] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3127 on cp3050 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:57:31] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3121 on cp3050 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:58:10] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3124 on cp3050 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Varnish
[00:59:12] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3125 on cp3050 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Varnish
[01:00:05] <jouncebot>	 twentyafterfour: (Dis)respected human, time to deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220217T0100). Please do the needful.
[01:00:52] <rzl>	 AntiComposite: we were discussing in a private channel but to follow up here -- thanks for the advance heads up, appreciate you being faster than the automatic alerts :)
[01:01:23] <AntiComposite>	 not the first time, probably won't be the last :)
[01:07:38] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 80 on cp1087 is OK: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[01:18:56] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 80 on cp1089 is OK: HTTP OK: HTTP/1.1 200 OK - 474 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[01:20:41] <icinga-wm>	 RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [200.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[01:36:07] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300381)', diff saved to https://phabricator.wikimedia.org/P20954 and previous config saved to /var/cache/conftool/dbconfig/20220217-013607-marostegui.json
[01:36:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:36:16] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[01:38:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job mjolnir in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[01:40:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job mjolnir in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[01:51:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P20955 and previous config saved to /var/cache/conftool/dbconfig/20220217-015111-marostegui.json
[01:51:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:06:17] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P20956 and previous config saved to /var/cache/conftool/dbconfig/20220217-020616-marostegui.json
[02:06:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:14:57] <wikibugs>	 SRE, User-Ladsgroup, Wikimedia-Incident: upstream connect error or disconnect/reset before headers. reset reason: overflow - https://phabricator.wikimedia.org/T301505 (MZMcBride) Resolved→Open This issue is still happening.
[02:17:04] <Oona>	 rzl: Hi. I reopened https://phabricator.wikimedia.org/T301505 just now. Should this ticket be assigned to you?
[02:17:55] <rzl>	 Oona: sure -- nothing to share on it yet but I can claim the task
[02:18:03] <rzl>	 sorry for the trouble, will have more to share soon
[02:18:25] <wikibugs>	 SRE, User-Ladsgroup, Wikimedia-Incident: upstream connect error or disconnect/reset before headers. reset reason: overflow - https://phabricator.wikimedia.org/T301505 (RLazarus) a:Ladsgroup→RLazarus
[02:18:32] <Oona>	 Awesome, thanks so much.
[02:21:22] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300381)', diff saved to https://phabricator.wikimedia.org/P20957 and previous config saved to /var/cache/conftool/dbconfig/20220217-022121-marostegui.json
[02:21:23] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
[02:21:24] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
[02:21:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:21:28] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[02:21:29] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1178 (T300381)', diff saved to https://phabricator.wikimedia.org/P20958 and previous config saved to /var/cache/conftool/dbconfig/20220217-022128-marostegui.json
[02:21:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:21:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:21:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:00:36] <icinga-wm>	 RECOVERY - SSH on dns5001.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:32:00] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300381)', diff saved to https://phabricator.wikimedia.org/P20959 and previous config saved to /var/cache/conftool/dbconfig/20220217-033159-marostegui.json
[03:32:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:32:07] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[03:47:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P20960 and previous config saved to /var/cache/conftool/dbconfig/20220217-034704-marostegui.json
[03:47:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:02:09] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P20961 and previous config saved to /var/cache/conftool/dbconfig/20220217-040208-marostegui.json
[04:02:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:17:14] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300381)', diff saved to https://phabricator.wikimedia.org/P20962 and previous config saved to /var/cache/conftool/dbconfig/20220217-041713-marostegui.json
[04:17:15] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
[04:17:17] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
[04:17:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:17:21] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[04:17:21] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1104 (T300381)', diff saved to https://phabricator.wikimedia.org/P20963 and previous config saved to /var/cache/conftool/dbconfig/20220217-041721-marostegui.json
[04:17:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:17:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:17:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:53:53] <wikibugs>	 (PS1) 4nn1l2: InitialiseSettings: General cleanup, wgAddGroups (R-Z) [mediawiki-config] - https://gerrit.wikimedia.org/r/763398 (https://phabricator.wikimedia.org/T301647)
[05:41:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300381)', diff saved to https://phabricator.wikimedia.org/P20964 and previous config saved to /var/cache/conftool/dbconfig/20220217-054154-marostegui.json
[05:42:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:42:02] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[05:43:00] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job mjolnir in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[05:56:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P20965 and previous config saved to /var/cache/conftool/dbconfig/20220217-055659-marostegui.json
[05:57:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:12:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P20966 and previous config saved to /var/cache/conftool/dbconfig/20220217-061203-marostegui.json
[06:12:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:12:49] <wikibugs>	 (PS2) Andrew Bogott: backy2: don't back up shelved instances [puppet] - https://gerrit.wikimedia.org/r/763345
[06:12:51] <wikibugs>	 (PS1) Andrew Bogott: backy2: initialize backy2 database if necessary [puppet] - https://gerrit.wikimedia.org/r/763401
[06:15:14] <wikibugs>	 (CR) jerkins-bot: [V: -1] backy2: don't back up shelved instances [puppet] - https://gerrit.wikimedia.org/r/763345 (owner: Andrew Bogott)
[06:21:27] <wikibugs>	 (PS2) Andrew Bogott: backy2: initialize backy2 database if necessary [puppet] - https://gerrit.wikimedia.org/r/763401
[06:21:29] <wikibugs>	 (PS3) Andrew Bogott: backy2: don't back up shelved instances [puppet] - https://gerrit.wikimedia.org/r/763345
[06:27:09] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300381)', diff saved to https://phabricator.wikimedia.org/P20967 and previous config saved to /var/cache/conftool/dbconfig/20220217-062708-marostegui.json
[06:27:10] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
[06:27:11] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
[06:27:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:27:14] <stashbot>	 T300381: Make page_props.pp_page unsigned on wmf wikis - https://phabricator.wikimedia.org/T300381
[06:27:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:27:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:30:36] <icinga-wm>	 PROBLEM - SSH on kubernetes1004.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:19:17] <wikibugs>	 (03PS3) 10BrandonXLF: wiki replicas: Only hide log_params when bit 0 is on in log_delete [puppet] - 10https://gerrit.wikimedia.org/r/758081 (https://phabricator.wikimedia.org/T301943)
[07:32:00] <icinga-wm>	 RECOVERY - SSH on kubernetes1004.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:46:33] <wikibugs>	 (03CR) 10Elukey: [V: 03+1 C: 03+2] logstash::input::kafka: allow a custom truststore path [puppet] - 10https://gerrit.wikimedia.org/r/763110 (https://phabricator.wikimedia.org/T300130) (owner: 10Elukey)
[07:53:05] <wikibugs>	 (03PS1) 10ArielGlenn: add Hannah Okwelum to platform-engineering group [puppet] - 10https://gerrit.wikimedia.org/r/763456 (https://phabricator.wikimedia.org/T301876)
[08:00:04] <jouncebot>	 Amir1 and apergos: It is that lovely time of the day again! You are hereby commanded to deploy UTC early backport and config training. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220217T0800).
[08:00:04] <jouncebot>	 kart_: A patch you scheduled for UTC early backport and config training is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:18] * kart_ is here..
[08:00:20] <apergos>	 oh? woops
[08:00:28] <apergos>	 lemme see whether we have any trainees for the session
[08:00:41] <apergos>	 nope!
[08:00:48] <apergos>	 let me look at the patches for today's window
[08:00:57] <kart_>	 OK. Then, I can self deploy.
[08:01:04] <apergos>	 oh. I forgot... I am not here, because Code Jam this week, heh
[08:01:08] <apergos>	 anyways lemme just look
[08:02:09] <apergos>	 yours is the lone patch, looks reasonable to me, I see it already has a +1 (thank you!), feel free to go ahead
[08:02:22] <kart_>	 Thanks! :)
[08:02:44] <wikibugs>	 (03PS4) 10KartikMistry: Enable SectionTranslation in Occitan and Luganda [mediawiki-config] - 10https://gerrit.wikimedia.org/r/761626 (https://phabricator.wikimedia.org/T301443)
[08:04:29] <wikibugs>	 (03CR) 10KartikMistry: [C: 03+2] "Config deployment." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/761626 (https://phabricator.wikimedia.org/T301443) (owner: 10KartikMistry)
[08:05:11] <wikibugs>	 (03Merged) 10jenkins-bot: Enable SectionTranslation in Occitan and Luganda [mediawiki-config] - 10https://gerrit.wikimedia.org/r/761626 (https://phabricator.wikimedia.org/T301443) (owner: 10KartikMistry)
[08:06:07] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM, + what Cole said" [puppet] - 10https://gerrit.wikimedia.org/r/763172 (https://phabricator.wikimedia.org/T300130) (owner: 10Elukey)
[08:06:35] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] remove deprecated piechart plugin [debs/grafana-plugins] - 10https://gerrit.wikimedia.org/r/763334 (https://phabricator.wikimedia.org/T282863) (owner: 10Cwhite)
[08:06:52] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] update grafana-image-renderer to 3.3.0 [debs/grafana-plugins] - 10https://gerrit.wikimedia.org/r/763335 (https://phabricator.wikimedia.org/T282863) (owner: 10Cwhite)
[08:07:00] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] update grafana-simple-json-datasource to 1.4.2 [debs/grafana-plugins] - 10https://gerrit.wikimedia.org/r/763337 (https://phabricator.wikimedia.org/T282863) (owner: 10Cwhite)
[08:08:37] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] grafana-next: set grafana codfw base domain to grafana next [puppet] - 10https://gerrit.wikimedia.org/r/763329 (https://phabricator.wikimedia.org/T282863) (owner: 10Cwhite)
[08:08:53] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 03+2 C: 03+2] am: link alerts to their Icinga web page [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/763197 (https://phabricator.wikimedia.org/T300859) (owner: 10Filippo Giunchedi)
[08:09:43] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[08:09:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:56] <wikibugs>	 10SRE-swift-storage: Set up Misc Object Storage Service (moss) - https://phabricator.wikimedia.org/T279621 (10elukey) Hi everybody, is there a timeline for MOSS? The ML-Team is currently using the Thanos Swift cluster to store objects/models, we don't require a lot of space but at the same time we are not a grea...
[08:10:35] <logmsgbot>	 !log kartik@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:761626|Enable SectionTranslation in Occitan and Luganda WPs + CX out-of-Beta for Luganda WP (T301443)]] (duration: 00m 51s)
[08:10:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:40] <stashbot>	 T301443: Enable Flores for Occitan and Luganda - https://phabricator.wikimedia.org/T301443
[08:10:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[08:10:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[08:11:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:11:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:11:11] <kart_>	 (Although the log message is wrong; I took it from the Phab task)
[08:11:39] <wikibugs>	 (03CR) 10Elukey: "ping :)" [deployment-charts] - 10https://gerrit.wikimedia.org/r/741937 (https://phabricator.wikimedia.org/T295956) (owner: 10Hnowlan)
[08:12:10] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[08:12:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:13:14] <kart_>	 apergos: I'm done with config patch.
[08:13:48] <apergos>	 all tested and happy? fabulous!
[08:14:17] <apergos>	 anyone else with a patch they'd like to add last minute, since there's still plenty of time?
[08:19:08] <kart_>	 apergos: Yes. All good :)
[08:19:33] <urbanecm>	 apergos: I've a patch
[08:19:43] <urbanecm>	 should i self-service or do we have a trainee?
[08:20:07] <urbanecm>	 this: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/763171
[08:20:31] <wikibugs>	 (03PS1) 10Filippo Giunchedi: prometheus: pass extinfo-url to icinga-exporter [puppet] - 10https://gerrit.wikimedia.org/r/763457 (https://phabricator.wikimedia.org/T300859)
[08:20:36] <hashar>	 good morning
[08:21:28] <apergos>	 morning.
[08:21:31] <apergos>	 well in that case...
[08:21:32] <taavi>	 morning!
[08:21:34] <apergos>	 !log UTC early B&C window completed
[08:21:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:46] <urbanecm>	 apergos: you must've missed my message above :)
[08:21:51] <apergos>	 dangit!
[08:22:03] <apergos>	 !log UTC early B&C window NOT completed, woops.
[08:22:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:11] <apergos>	 no trainees, self deploy!
[08:22:14] <wikibugs>	 (03PS2) 10Urbanecm: Deploy Growth features to 100% of newcomers on most Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/763171 (https://phabricator.wikimedia.org/T301820)
[08:22:17] <urbanecm>	 doing :)
[08:22:21] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Deploy Growth features to 100% of newcomers on most Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/763171 (https://phabricator.wikimedia.org/T301820) (owner: 10Urbanecm)
[08:23:04] <wikibugs>	 (03Merged) 10jenkins-bot: Deploy Growth features to 100% of newcomers on most Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/763171 (https://phabricator.wikimedia.org/T301820) (owner: 10Urbanecm)
[08:26:32] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: c0cbd3048f9d288b40dbde09506fe212de176f19: Deploy Growth features to 100% of newcomers on most Wikipedias (T301820) (duration: 00m 50s)
[08:26:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:26:37] * urbanecm done
[08:26:38] <stashbot>	 T301820: Scale: enable Growth features for 100% of new accounts on most Wikipedias - https://phabricator.wikimedia.org/T301820
[08:26:48] <urbanecm>	 !log UTC early B&C now really done
[08:26:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:20] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[08:27:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:28:08] <wikibugs>	 (03PS1) 10Filippo Giunchedi: am: remove Icinga/ prefix and add 'source' label [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/763459 (https://phabricator.wikimedia.org/T300951)
[08:28:10] <wikibugs>	 (03PS1) 10Filippo Giunchedi: am: add 'host' label and add port to 'instance' [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/763460 (https://phabricator.wikimedia.org/T300951)
[08:28:39] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[08:28:40] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[08:28:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:28:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:29:54] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[08:29:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:34:26] <apergos>	 thanks for actually closing the window, urbane cm  :-D
[08:34:31] <apergos>	 see everyone next time!
[08:37:45] <_joe_>	 jelto: the output from helmfile is much better now, thanks
[08:43:23] <wikibugs>	 (03PS1) 10Majavah: hieradata: pcc: add clouddb-services-puppetmaster-01 key [puppet] - 10https://gerrit.wikimedia.org/r/763461
[08:43:53] <hashar>	 about mediawiki train, I have filed a few tasks here and there but nothing really concerning
[08:44:09] <hashar>	 so I will roll the train to all wikis
[08:45:06] <hashar>	 though I will delay it a bit since I have a quick meeting at 9:00 UTC
[08:45:07] <apergos>	 \o/
[08:51:24] <jelto>	 _joe_: thanks! I also think the new output is more helpful now
[08:55:42] <apergos>	 hey urbanecm I notice you didn't add your patch to the deployment calendar, please don't forget to do that so we have a record.  :-) 
[08:55:57] <urbanecm>	 good point, let me do that now
[08:56:46] <urbanecm>	 apergos: {{done}}
[08:57:21] <apergos>	 ty!
[09:00:05] <jouncebot>	 hashar and jeena: May I have your attention please! MediaWiki train - Utc-0+Utc-7 Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220217T0900)
[09:14:15] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+1] "As said on IRC:" [puppet] - 10https://gerrit.wikimedia.org/r/763277 (https://phabricator.wikimedia.org/T289131) (owner: 10Elukey)
[09:19:20] <wikibugs>	 (03CR) 10David Caro: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/763345 (owner: 10Andrew Bogott)
[09:28:17] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] role::ml_k8s::master: enable Priority plugin [puppet] - 10https://gerrit.wikimedia.org/r/763277 (https://phabricator.wikimedia.org/T289131) (owner: 10Elukey)
[09:31:31] <hashar>	 ok train time
[09:31:56] <apergos>	 woo hoo!
[09:36:50] <wikibugs>	 (03PS1) 10Hashar: all wikis to 1.38.0-wmf.22  refs T300198 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/763473
[09:36:52] <wikibugs>	 (03CR) 10Hashar: [C: 03+2] all wikis to 1.38.0-wmf.22  refs T300198 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/763473 (owner: 10Hashar)
[09:37:50] <wikibugs>	 (03Merged) 10jenkins-bot: all wikis to 1.38.0-wmf.22  refs T300198 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/763473 (owner: 10Hashar)
[09:39:07] <logmsgbot>	 !log hashar@deploy1002 rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.22  refs T300198
[09:39:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:14] <stashbot>	 T300198: 1.38.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T300198
[09:40:42] <hashar>	 Houston we are LIVE!
[09:40:49] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[09:40:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:41:58] <urbanecm>	 hashar: so, first morning train ever successfully finished? That's perfect 🙂
[09:42:09] <hashar>	 yes! we are lucky :]
[09:42:11] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[09:42:12] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[09:42:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:44] <urbanecm>	 hashar: i see T301936 opened under the blockers task though. Dunno if you saw it and assessed it though.
[09:42:45] <stashbot>	 T301936: Interwiki prefix "wikipedia" not working on multilingual wikimedia projects  - https://phabricator.wikimedia.org/T301936
[09:43:00] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job mjolnir in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[09:43:30] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[09:43:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:34] <hashar>	 oops
[09:44:38] <hashar>	 totally missed that one
[09:45:42] <hashar>	 ah it got added as a blocker one hour ago :/
[09:45:48] <hashar>	 so after I have checked the list of blockers
[09:46:15] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1017.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
[09:46:17] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1017.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
[09:46:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:47:41] <wikibugs>	 10SRE, 10ops-eqiad: Installation issues on PowerEdge R440 eqiad Ganeti servers with buster / firmware update needed - https://phabricator.wikimedia.org/T299527 (10MoritzMuehlenhoff)
[09:47:52] <wikibugs>	 10SRE, 10ops-eqiad: Installation issues on PowerEdge R440 eqiad Ganeti servers with buster / firmware update needed - https://phabricator.wikimedia.org/T299527 (10MoritzMuehlenhoff) One more server is ready and downtimed; ganeti1017
[09:48:30] <hashar>	 so the wikipedia: interwiki got broken ? :\
[09:50:32] <moritzm>	 !log migrate instances off ganeti1012
[09:50:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:52:29] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10CAS-SSO, 10Patch-For-Review: Update CAS to 6.2 - https://phabricator.wikimedia.org/T265857 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff This has been resolved for a long time, closing.
[09:59:59] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Traffic, 10netops, 10Patch-For-Review: Enable IPv6 for Wikidough - https://phabricator.wikimedia.org/T301165 (10cmooney) 05In progress→03Resolved
[10:00:27] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Traffic, 10netops, 10Patch-For-Review: Enable IPv6 for Wikidough - https://phabricator.wikimedia.org/T301165 (10cmooney)
[10:00:58] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Traffic, 10netops, 10Patch-For-Review: Enable IPv6 for Wikidough - https://phabricator.wikimedia.org/T301165 (10cmooney) Ok gonna close this one, range announced and doh working on IPv6 from all our POPs now.  I've a separate task - T301900 - to validate the route p...
[10:05:14] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/763456 (https://phabricator.wikimedia.org/T301876) (owner: 10ArielGlenn)
[10:11:53] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: allow using elasticsearch v6.8 [puppet] - 10https://gerrit.wikimedia.org/r/763477
[10:12:48] <wikibugs>	 (03PS2) 10Gehel: elasticsearch: allow using elasticsearch v6.8 [puppet] - 10https://gerrit.wikimedia.org/r/763477 (https://phabricator.wikimedia.org/T295666)
[10:14:43] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: upgrade deployment-prep to elasticsearch 6.8 [puppet] - 10https://gerrit.wikimedia.org/r/763478 (https://phabricator.wikimedia.org/T301954)
[10:15:47] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: upgrade deployment-prep to elasticsearch 6.8 [puppet] - 10https://gerrit.wikimedia.org/r/763479 (https://phabricator.wikimedia.org/T301955)
[10:16:05] <hashar>	 apergos: Reedy: hi, any clue who might have the knowledge about `wikipedia:` interwikis being broken?  https://phabricator.wikimedia.org/T301936
[10:16:28] <hashar>	 I am pretty sure I once understood how interwikis worked or were defined but that was several years ago
[10:16:39] <wikibugs>	 (03PS1) 10Kevin Bazira: ml-services: add cswiki & dewiki editquality isvcs [deployment-charts] - 10https://gerrit.wikimedia.org/r/763480 (https://phabricator.wikimedia.org/T301415)
[10:16:41] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: upgrade deployment-prep to elasticsearch 6.8 [puppet] - 10https://gerrit.wikimedia.org/r/763481 (https://phabricator.wikimedia.org/T301956)
[10:16:49] <apergos>	 my knowledge is as out of date as yours
[10:17:10] <apergos>	 I can ask in our team channel (or so can you), gotta think about timezones
[10:17:13] <hashar>	 ah good to know I am not the only one :D
[10:17:27] <hashar>	 will do
[10:17:50] <apergos>	 mention it's a train blocker and if it's ubn mention that too
[10:18:15] <apergos>	 our team is on "Code Jam" this week so we are supposed to not do anything else, obviously if it's ubn/train blocker then we stop and look at that
[10:18:28] <wikibugs>	 10SRE, 10SRE-Access-Requests: saisuman ssh production public keys reused for WMCS - https://phabricator.wikimedia.org/T300708 (10SCherukuwada) Just did. All good, thank you and sorry for the trouble!
[10:18:50] <wikibugs>	 (03PS2) 10Gehel: elasticsearch: upgrade cloudelastic to elasticsearch 6.8 [puppet] - 10https://gerrit.wikimedia.org/r/763481 (https://phabricator.wikimedia.org/T301956)
[10:19:00] <wikibugs>	 (03PS2) 10Gehel: elasticsearch: upgrade relforge to elasticsearch 6.8 [puppet] - 10https://gerrit.wikimedia.org/r/763479 (https://phabricator.wikimedia.org/T301955)
[10:20:08] <wikibugs>	 (03PS1) 10Majavah: toolsdb primary: come back in read only mode [puppet] - 10https://gerrit.wikimedia.org/r/763482
[10:20:47] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: upgrade codfw to elasticsearch 6.8 [puppet] - 10https://gerrit.wikimedia.org/r/763483 (https://phabricator.wikimedia.org/T301958)
[10:20:49] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: upgrade eqiad to elasticsearch 6.8 [puppet] - 10https://gerrit.wikimedia.org/r/763484 (https://phabricator.wikimedia.org/T301959)
[10:22:12] <wikibugs>	 10SRE, 10SRE-Access-Requests: saisuman ssh production public keys reused for WMCS - https://phabricator.wikimedia.org/T300708 (10MMandere) 05In progress→03Resolved a:03MMandere Thank you @SCherukuwada  for confirming.  We'll have the task marked as resolved for now, please reopen if you experience any ne...
[10:25:15] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] ml-services: add cswiki & dewiki editquality isvcs [deployment-charts] - 10https://gerrit.wikimedia.org/r/763480 (https://phabricator.wikimedia.org/T301415) (owner: 10Kevin Bazira)
[10:26:53] <wikibugs>	 (03PS1) 10EJoseph: Upgrade to elasticsearch 7.10.2 [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/763485 (https://phabricator.wikimedia.org/T299226)
[10:32:16] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
[10:32:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:32:48] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
[10:32:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:16] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review: Deploy Wikidough: Experimental DNS-over-HTTPS (DoH) public resolver - https://phabricator.wikimedia.org/T252132 (10ssingh)
[10:38:21] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Traffic: Anycast: Add IPv6 support to bird and anycast-healthchecker (Puppet) - https://phabricator.wikimedia.org/T292737 (10ssingh) 05Open→03Resolved IPv6 support for Wikidough and durum was finalized in T301165.  Thanks to Arzhel, Cathal, and John Bond for all the...
[10:39:07] <wikibugs>	 (03PS2) 10Btullis: Remove the old AQS nodes from the aqs cluster [puppet] - 10https://gerrit.wikimedia.org/r/761884 (https://phabricator.wikimedia.org/T297803)
[10:40:49] <wikibugs>	 (03CR) 10Btullis: [C: 03+2] Remove the old AQS nodes from the aqs cluster [puppet] - 10https://gerrit.wikimedia.org/r/761884 (https://phabricator.wikimedia.org/T297803) (owner: 10Btullis)
[10:41:26] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: conftool: add request-actions / request-patterns [puppet] - 10https://gerrit.wikimedia.org/r/763486
[10:42:02] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] conftool: add request-actions / request-patterns [puppet] - 10https://gerrit.wikimedia.org/r/763486 (owner: 10Giuseppe Lavagetto)
[10:46:39] <kormat>	 !log running schema change against s5 T300774
[10:46:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:46:45] <stashbot>	 T300774: Drop fr_img_* columns - https://phabricator.wikimedia.org/T300774
[10:46:47] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[10:46:48] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[10:46:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:46:53] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Depooling db1144:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20968 and previous config saved to /var/cache/conftool/dbconfig/20220217-104653-kormat.json
[10:46:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:46:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:58:11] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: conftool: add request-actions / request-patterns [puppet] - 10https://gerrit.wikimedia.org/r/763486
[10:58:14] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me" [dns] - 10https://gerrit.wikimedia.org/r/763323 (https://phabricator.wikimedia.org/T300076) (owner: 10Jbond)
[10:58:47] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] conftool: add request-actions / request-patterns [puppet] - 10https://gerrit.wikimedia.org/r/763486 (owner: 10Giuseppe Lavagetto)
[11:00:05] <jouncebot>	 mvolz: #bothumor I � Unicode. All rise for Services – Citoid / Zotero deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220217T1100).
[11:01:30] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] wikimedia.org: Add MS O365 txt record [dns] - 10https://gerrit.wikimedia.org/r/763323 (https://phabricator.wikimedia.org/T300076) (owner: 10Jbond)
[11:01:34] <moritzm>	 !log installing python3.5 security updates
[11:01:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:06:27] <wikibugs>	 10SRE, 10DNS, 10Traffic, 10Patch-For-Review, 10WMSE (IT): Need Assistance adding DNS records to claim domain - https://phabricator.wikimedia.org/T300076 (10jbond) this is in place now, notice the list TXT line below  ` lang=console $ dig txt wikimedia.org @ns0.wikimedia.org...
[11:11:12] <wikibugs>	 10SRE, 10DNS, 10Traffic, 10WMSE (IT): Need Assistance adding DNS records to claim domain - https://phabricator.wikimedia.org/T300076 (10jbond) 05Open→03Stalled
[11:13:43] <hashar>	 I have confirmed the wikipedia: interwiki is broken due to https://gerrit.wikimedia.org/r/c/mediawiki/core/+/760695
[11:14:47] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20969 and previous config saved to /var/cache/conftool/dbconfig/20220217-111447-kormat.json
[11:14:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:14:52] <stashbot>	 T300774: Drop fr_img_* columns - https://phabricator.wikimedia.org/T300774
[11:21:12] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 04-1] "one slightly misleading task ID (I think), LGTM otherwise" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/763398 (https://phabricator.wikimedia.org/T301647) (owner: 104nn1l2)
[11:23:03] <wikibugs>	 (03PS1) 10Hashar: Revert "Optimise Skin::getLanguages()" [core] (wmf/1.38.0-wmf.22) - 10https://gerrit.wikimedia.org/r/763294 (https://phabricator.wikimedia.org/T301936)
[11:27:11] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: elastic1043.eqiad.wmnet
[11:27:12] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: elastic1043.eqiad.wmnet
[11:27:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:27:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:27:38] <wikibugs>	 (03PS7) 10Minato826: Enable RelatedArticles for desktop (non-mobile) view at zhwikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/762761 (https://phabricator.wikimedia.org/T299856)
[11:28:39] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: elastic1046.eqiad.wmnet
[11:28:40] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: elastic1046.eqiad.wmnet
[11:28:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:52] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P20970 and previous config saved to /var/cache/conftool/dbconfig/20220217-112951-kormat.json
[11:29:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:36:33] <wikibugs>	 (03CR) 10Jbond: R:varnish:instance: Add hiere key to control cloud ratelimits (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/740828 (https://phabricator.wikimedia.org/T224891) (owner: 10Jbond)
[11:36:41] <wikibugs>	 (03PS6) 10Jbond: R:varnish:instance: Add genral public cloud rate limiting [puppet] - 10https://gerrit.wikimedia.org/r/740818 (https://phabricator.wikimedia.org/T224891)
[11:36:43] <wikibugs>	 (03PS11) 10Jbond: R:varnish:instance: Add hiere key to control cloud ratelimits [puppet] - 10https://gerrit.wikimedia.org/r/740828 (https://phabricator.wikimedia.org/T224891)
[11:36:47] <wikibugs>	 (03CR) 10ZPapierski: [C: 03+1] elasticsearch: allow using elasticsearch v6.8 [puppet] - 10https://gerrit.wikimedia.org/r/763477 (https://phabricator.wikimedia.org/T295666) (owner: 10Gehel)
[11:37:37] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] R:varnish:instance: Add hiere key to control cloud ratelimits [puppet] - 10https://gerrit.wikimedia.org/r/740828 (https://phabricator.wikimedia.org/T224891) (owner: 10Jbond)
[11:43:24] <wikibugs>	 (03PS12) 10Jbond: R:varnish:instance: Add hiere key to control cloud ratelimits [puppet] - 10https://gerrit.wikimedia.org/r/740828 (https://phabricator.wikimedia.org/T224891)
[11:44:55] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolsdb primary: come back in read only mode [puppet] - 10https://gerrit.wikimedia.org/r/763482 (owner: 10Majavah)
[11:44:57] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P20971 and previous config saved to /var/cache/conftool/dbconfig/20220217-114456-kormat.json
[11:45:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:57:09] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Superset/Turnilo for Kinneretgordon - https://phabricator.wikimedia.org/T301098 (10MMandere) >>! In T301098#7714249, @gerritbot wrote: > Change 763200 **merged** by MMandere: > %%%[operations/puppet@production] admin: Change Kinneret username%%% > https://gerr...
[12:00:01] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20972 and previous config saved to /var/cache/conftool/dbconfig/20220217-120001-kormat.json
[12:00:03] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[12:00:05] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[12:00:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:06] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[12:00:07] <stashbot>	 T300774: Drop fr_img_* columns - https://phabricator.wikimedia.org/T300774
[12:00:10] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[12:00:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:14] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Depooling db1161 (T300774)', diff saved to https://phabricator.wikimedia.org/P20973 and previous config saved to /var/cache/conftool/dbconfig/20220217-120014-kormat.json
[12:00:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:01:26] <wikibugs>	 (PS1) Majavah: prometheus: add heartbeat collection on mysqld_exporter [puppet] - https://gerrit.wikimedia.org/r/763490
[12:03:53] <wikibugs>	 (CR) Jbond: R:varnish:instance: Add hiere key to control cloud ratelimits (1 comment) [puppet] - https://gerrit.wikimedia.org/r/740828 (https://phabricator.wikimedia.org/T224891) (owner: Jbond)
[12:13:49] <wikibugs>	 SRE, Infrastructure-Foundations, puppet-compiler: compiler1003.puppet-diffs.eqiad1.wikimedia.cloud out of disk space - https://phabricator.wikimedia.org/T295253 (Majavah) Open→Resolved a:Majavah
[12:18:48] <wikibugs>	 (PS1) MMandere: admin: Change Kinneret username [puppet] - https://gerrit.wikimedia.org/r/763498 (https://phabricator.wikimedia.org/T301098)
[12:19:15] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/763498 (https://phabricator.wikimedia.org/T301098) (owner: MMandere)
[12:20:07] <wikibugs>	 (CR) MMandere: [C: +2] admin: Change Kinneret username [puppet] - https://gerrit.wikimedia.org/r/763498 (https://phabricator.wikimedia.org/T301098) (owner: MMandere)
[12:25:58] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300774)', diff saved to https://phabricator.wikimedia.org/P20974 and previous config saved to /var/cache/conftool/dbconfig/20220217-122557-kormat.json
[12:26:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:26:04] <stashbot>	 T300774: Drop fr_img_* columns - https://phabricator.wikimedia.org/T300774
[12:30:23] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to Superset/Turnilo for Kinneretgordon - https://phabricator.wikimedia.org/T301098 (MMandere) In progress→Resolved a:MMandere Marking this task as resolved. @KinneretG, please feel free to reopen it if you encounter any new iss...
[12:41:03] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P20975 and previous config saved to /var/cache/conftool/dbconfig/20220217-124102-kormat.json
[12:41:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:43:30] <wikibugs>	 SRE, SRE-tools, DC-Ops, Infrastructure-Foundations, and 2 others: Allow idrac tftp fetching of firmware updates (either to existing tftp or new solution) - https://phabricator.wikimedia.org/T283771 (jbond) i ran a script yesterday which has collected all the current drac and bios versions.  Sorry...
[12:44:45] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Michael.hay - https://phabricator.wikimedia.org/T301782 (MMandere)
[12:53:03] <wikibugs>	 (CR) Jbond: conftool: add request-actions / request-patterns (3 comments) [puppet] - https://gerrit.wikimedia.org/r/763486 (owner: Giuseppe Lavagetto)
[12:56:07] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P20976 and previous config saved to /var/cache/conftool/dbconfig/20220217-125607-kormat.json
[12:56:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:01:56] <moritzm>	 !log installing expat security updates
[13:02:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:12] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300774)', diff saved to https://phabricator.wikimedia.org/P20977 and previous config saved to /var/cache/conftool/dbconfig/20220217-131111-kormat.json
[13:11:13] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[13:11:15] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[13:11:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:18] <stashbot>	 T300774: Drop fr_img_* columns - https://phabricator.wikimedia.org/T300774
[13:11:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:13:31] <wikibugs>	 (PS1) Muehlenhoff: Add Cumin alias for durum [puppet] - https://gerrit.wikimedia.org/r/763509
[13:13:43] <wikibugs>	 (PS2) Muehlenhoff: Add Cumin alias for durum [puppet] - https://gerrit.wikimedia.org/r/763509
[13:16:41] <icinga-wm>	 RECOVERY - BFD status on cr1-eqiad is OK: OK: UP: 20 AdminDown: 1 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[13:18:27] <moritzm>	 !log installing zsh security updates
[13:18:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:08] <wikibugs>	 (PS2) 4nn1l2: InitialiseSettings: General cleanup, wgAddGroups (R-Z) [mediawiki-config] - https://gerrit.wikimedia.org/r/763398 (https://phabricator.wikimedia.org/T301647)
[13:19:41] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[13:23:24] <wikibugs>	 (CR) 4nn1l2: InitialiseSettings: General cleanup, wgAddGroups (R-Z) (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/763398 (https://phabricator.wikimedia.org/T301647) (owner: 4nn1l2)
[13:35:51] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[13:35:53] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[13:35:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:43:00] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job mjolnir in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[13:43:20] <moritzm>	 !log installing paramiko security updates
[13:43:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:52:08] <wikibugs>	 (CR) David Caro: backy2: on Bullseye, hack around a silly package name mismatch (1 comment) [puppet] - https://gerrit.wikimedia.org/r/763336 (https://phabricator.wikimedia.org/T301909) (owner: Andrew Bogott)
[13:57:02] <wikibugs>	 (CR) Ssingh: [C: +1] "Thank you for this patch!" [puppet] - https://gerrit.wikimedia.org/r/763509 (owner: Muehlenhoff)
[13:58:25] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[13:58:26] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
[13:58:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:31] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Depooling db1113:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20979 and previous config saved to /var/cache/conftool/dbconfig/20220217-135831-kormat.json
[13:58:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:38] <stashbot>	 T300774: Drop fr_img_* columns - https://phabricator.wikimedia.org/T300774
[14:00:04] <jouncebot>	 brennen: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for UTC evening backport and config training . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220217T1400).
[14:00:04] <jouncebot>	 nn1l2 and anoop: A patch you scheduled for UTC evening backport and config training is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:08] <nn1l2>	 hi
[14:00:14] <anoop>	 Hello
[14:00:21] <wikibugs>	 (CR) Jbond: "just noticed i never pressed send on this comment :P" [puppet] - https://gerrit.wikimedia.org/r/722599 (https://phabricator.wikimedia.org/T261966) (owner: Jbond)
[14:00:26] <wikibugs>	 (PS6) Jbond: C:cassandra: add optional java_package variable [puppet] - https://gerrit.wikimedia.org/r/722599 (https://phabricator.wikimedia.org/T261966)
[14:00:36] <Lucas_WMDE>	 o/
[14:00:44] <Lucas_WMDE>	 so the timing of this window turns out to be a bit unclear…
[14:01:05] <wikibugs>	 (CR) Jbond: C:package_builder: Add Script for building debian packages from git (1 comment) [puppet] - https://gerrit.wikimedia.org/r/681445 (owner: Jbond)
[14:01:36] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +1] InitialiseSettings: General cleanup, wgAddGroups (R-Z) [mediawiki-config] - https://gerrit.wikimedia.org/r/763398 (https://phabricator.wikimedia.org/T301647) (owner: 4nn1l2)
[14:02:16] <wikibugs>	 (PS3) Giuseppe Lavagetto: conftool: add request-actions / request-patterns [puppet] - https://gerrit.wikimedia.org/r/763486
[14:03:01] <Lucas_WMDE>	 but since there’s nothing else in the calendar at the moment, I assume it’s okay to do deployments unless brennen or someone else disagrees
[14:03:05] <Lucas_WMDE>	 (I’ll wait for a few minutes)
[14:06:17] <wikibugs>	 (CR) Hashar: ci: Qemu image and snapshot creation (3 comments) [puppet] - https://gerrit.wikimedia.org/r/758514 (https://phabricator.wikimedia.org/T284774) (owner: Hashar)
[14:06:26] <wikibugs>	 (PS8) Jbond: exim: add the ability to silently drop senders [puppet] - https://gerrit.wikimedia.org/r/748884 (https://phabricator.wikimedia.org/T298038) (owner: JHathaway)
[14:06:28] <wikibugs>	 (PS14) Hashar: ci: Qemu image and snapshot creation [puppet] - https://gerrit.wikimedia.org/r/758514 (https://phabricator.wikimedia.org/T284774)
[14:06:39] <wikibugs>	 (CR) Jbond: [C: +1] "i think this looks good to go?" [puppet] - https://gerrit.wikimedia.org/r/748884 (https://phabricator.wikimedia.org/T298038) (owner: JHathaway)
[14:07:15] <wikibugs>	 (CR) Jbond: [C: +1] Remove ArgparseFormatter as it's now unused [cookbooks] - https://gerrit.wikimedia.org/r/762860 (owner: Volans)
[14:07:53] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:11:02] <Lucas_WMDE>	 alright, let’s do the config changes then
[14:11:23] <wikibugs>	 (PS3) Lucas Werkmeister (WMDE): InitialiseSettings: General cleanup, wgAddGroups (R-Z) [mediawiki-config] - https://gerrit.wikimedia.org/r/763398 (https://phabricator.wikimedia.org/T301647) (owner: 4nn1l2)
[14:11:58] <Lucas_WMDE>	 (waiting for the diffConfig build to finish before +2ing)
[14:12:24] <hashar>	 jouncebot: now
[14:12:24] <jouncebot>	 For the next 0 hour(s) and 47 minute(s): UTC evening backport and config training (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220217T1400)
[14:12:42] <Lucas_WMDE>	 hashar: I moved the window in the calendar yesterday, assuming that this is where it was supposed to be
[14:12:50] <Lucas_WMDE>	 per the discussion in Gerrit that assumption might have been wrong
[14:12:57] <Lucas_WMDE>	 but I’m assuming I can still do deployments now
[14:13:17] <hashar>	 Lucas_WMDE: yes it is all good :)
[14:13:19] <Lucas_WMDE>	 but if you disagree I can also hold off (nothing merged yet)
[14:13:21] <Lucas_WMDE>	 ok :)
[14:13:34] <hashar>	 I think the original intent for those windows was:
[14:13:40] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +1] Enable RelatedArticles for desktop (non-mobile) view at zhwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/762761 (https://phabricator.wikimedia.org/T299856) (owner: Minato826)
[14:13:52] <hashar>	 1) ensure someone knowing how to deploy stuff is available as a service to other developers that don't know much about scap
[14:13:57] <wikibugs>	 (CR) Jbond: [C: -1] "see comment" [cookbooks] - https://gerrit.wikimedia.org/r/762934 (owner: Volans)
[14:14:04] <hashar>	 2) avoid concurrent conflicting deployments
[14:14:06] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +2] "diffConfig empty, good to go" [mediawiki-config] - https://gerrit.wikimedia.org/r/763398 (https://phabricator.wikimedia.org/T301647) (owner: 4nn1l2)
[14:14:14] <wikibugs>	 (CR) Jbond: [C: +1] add Hannah Okwelum to platform-engineering group [puppet] - https://gerrit.wikimedia.org/r/763456 (https://phabricator.wikimedia.org/T301876) (owner: ArielGlenn)
[14:14:28] <hashar>	 I am quite happy to let folks deploy out of window or adjust the window if needed :)
[14:14:30] <wikibugs>	 (CR) Volans: [C: +2] Remove ArgparseFormatter as it's now unused [cookbooks] - https://gerrit.wikimedia.org/r/762860 (owner: Volans)
[14:15:03] <wikibugs>	 (Merged) jenkins-bot: InitialiseSettings: General cleanup, wgAddGroups (R-Z) [mediawiki-config] - https://gerrit.wikimedia.org/r/763398 (https://phabricator.wikimedia.org/T301647) (owner: 4nn1l2)
[14:15:05] <icinga-wm>	 PROBLEM - Host prometheus6001 is DOWN: PING CRITICAL - Packet loss = 100%
[14:15:09] <icinga-wm>	 PROBLEM - Host bast6001 is DOWN: PING CRITICAL - Packet loss = 100%
[14:15:13] <icinga-wm>	 PROBLEM - Host ncredir6002 is DOWN: PING CRITICAL - Packet loss = 100%
[14:15:15] <icinga-wm>	 PROBLEM - Host install6001 is DOWN: PING CRITICAL - Packet loss = 100%
[14:15:29] <icinga-wm>	 PROBLEM - Host netflow6001 is DOWN: PING CRITICAL - Packet loss = 100%
[14:15:33] <Lucas_WMDE>	 nn1l2: change is on mwdebug1001, let’s test
[14:15:39] <Lucas_WMDE>	 ^ something going on in dc 6?
[14:15:45] <Lucas_WMDE>	 (don’t remember which one that is)
[14:15:53] <icinga-wm>	 PROBLEM - Host ncredir6001 is DOWN: PING CRITICAL - Packet loss = 100%
[14:15:58] <Lucas_WMDE>	 drmrs
[14:15:59] <taavi>	 Lucas_WMDE: drmrs, please ignore
[14:16:02] <Lucas_WMDE>	 ok
[14:16:41] <wikibugs>	 (CR) Jbond: [C: +2] "LGTM thanks" [puppet] - https://gerrit.wikimedia.org/r/763461 (owner: Majavah)
[14:17:07] <wikibugs>	 (PS1) Hashar: Stop excluding the 'wikipedia' interwiki prefix [extensions/WikimediaMaintenance] (wmf/1.38.0-wmf.22) - https://gerrit.wikimedia.org/r/763298 (https://phabricator.wikimedia.org/T301936)
[14:17:23] <hashar>	 ^^ I am adding this patch to the current window
[14:17:31] <Lucas_WMDE>	 ah, very good
[14:17:56] <Lucas_WMDE>	 I’ll sync the config change and then let you do that one, ok?
[14:18:26] <nn1l2>	 wikidata still works :)
[14:18:33] <Lucas_WMDE>	 syncing (I tested simplewiki ^^)
[14:18:52] <wikibugs>	 (Merged) jenkins-bot: Remove ArgparseFormatter as it's now unused [cookbooks] - https://gerrit.wikimedia.org/r/762860 (owner: Volans)
[14:19:10] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:763398|InitialiseSettings: General cleanup, wgAddGroups (R-Z) (T301647)]] (no-op) (duration: 00m 50s)
[14:19:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:19:15] <stashbot>	 T301647: Clean up InitialiseSettings - https://phabricator.wikimedia.org/T301647
[14:19:30] <Lucas_WMDE>	 hashar: want to self-service?
[14:19:32] <wikibugs>	 (Abandoned) Hashar: Revert "Optimise Skin::getLanguages()" [core] (wmf/1.38.0-wmf.22) - https://gerrit.wikimedia.org/r/763294 (https://phabricator.wikimedia.org/T301936) (owner: Hashar)
[14:19:58] <hashar>	 Lucas_WMDE: sure I can self deploy ;)
[14:20:04] <hashar>	 will do whenever the other patches are complete
[14:20:04] <Lucas_WMDE>	 ok, over to you then
[14:20:15] <Lucas_WMDE>	 I’d say unbreak the train before the remaining config change
[14:20:22] <hashar>	 zabe: i am going to deploy the interwiki config change ;)
[14:20:44] <hashar>	 well the interwiki has been broken since yesterday so there is no rush
[14:20:57] <Lucas_WMDE>	 ok fine
[14:21:05] <Lucas_WMDE>	 but I think you can start the gate-and-submit for yours at least
[14:21:08] <wikibugs>	 (PS8) Lucas Werkmeister (WMDE): Enable RelatedArticles for desktop (non-mobile) view at zhwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/762761 (https://phabricator.wikimedia.org/T299856) (owner: Minato826)
[14:21:11] <wikibugs>	 (CR) Hashar: [C: +2] Stop excluding the 'wikipedia' interwiki prefix [extensions/WikimediaMaintenance] (wmf/1.38.0-wmf.22) - https://gerrit.wikimedia.org/r/763298 (https://phabricator.wikimedia.org/T301936) (owner: Hashar)
[14:21:13] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +2] Enable RelatedArticles for desktop (non-mobile) view at zhwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/762761 (https://phabricator.wikimedia.org/T299856) (owner: Minato826)
[14:21:15] <hashar>	 true! done
[14:21:25] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[14:21:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:36] <zabe>	 nice
[14:22:15] <wikibugs>	 (Merged) jenkins-bot: Enable RelatedArticles for desktop (non-mobile) view at zhwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/762761 (https://phabricator.wikimedia.org/T299856) (owner: Minato826)
[14:22:37] <Lucas_WMDE>	 anoop: your zhwikinews change is on mwdebug1001, can you test it?
[14:22:40] <hashar>	 zabe: I am inclined toward a tiny logic change in the code but I have been too lazy to investigate! Daniel Kinzler hinted at a configuration issue and I am more than happy to remove a hack from dumpInterwiki.php :] We will see how it behaves once pulled on mwdebug1001
[14:22:51] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[14:22:52] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:22:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:12] <wikibugs>	 (Merged) jenkins-bot: Stop excluding the 'wikipedia' interwiki prefix [extensions/WikimediaMaintenance] (wmf/1.38.0-wmf.22) - https://gerrit.wikimedia.org/r/763298 (https://phabricator.wikimedia.org/T301936) (owner: Hashar)
[14:23:18] <anoop>	 ok, working fine
[14:23:26] <Lucas_WMDE>	 yay
[14:23:51] <icinga-wm>	 RECOVERY - Host netflow6001 is UP: PING OK - Packet loss = 0%, RTA = 101.23 ms
[14:23:53] <icinga-wm>	 RECOVERY - Host ncredir6002 is UP: PING OK - Packet loss = 0%, RTA = 101.65 ms
[14:23:53] <icinga-wm>	 RECOVERY - Host bast6001 is UP: PING OK - Packet loss = 0%, RTA = 105.57 ms
[14:23:53] <icinga-wm>	 RECOVERY - Host ncredir6001 is UP: PING OK - Packet loss = 0%, RTA = 101.69 ms
[14:23:56] <Lucas_WMDE>	 I also see something that looks like related pages at the bottom of zhwikinews
[14:24:00] <Lucas_WMDE>	 though I can’t read Chinese ^^
[14:24:03] <icinga-wm>	 RECOVERY - Host install6001 is UP: PING OK - Packet loss = 0%, RTA = 101.66 ms
[14:24:05] <Lucas_WMDE>	 sync running
[14:24:13] <hashar>	 Amir1: I am going to actually use your deploy-commands site ( https://deploy-commands.toolforge.org/bacc/763298 ). It is really a blessing ;]
[14:24:13] <icinga-wm>	 RECOVERY - Host prometheus6001 is UP: PING OK - Packet loss = 0%, RTA = 101.92 ms
[14:24:18] <Lucas_WMDE>	 and drmrs is coming back, how nice
[14:24:19] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:24:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:27] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20980 and previous config saved to /var/cache/conftool/dbconfig/20220217-142427-kormat.json
[14:24:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:32] <stashbot>	 T300774: Drop fr_img_* columns - https://phabricator.wikimedia.org/T300774
[14:24:33] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:762761|Enable RelatedArticles for desktop (non-mobile) view at zhwikinews (T299856)]] (duration: 00m 49s)
[14:24:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:37] <Amir1>	 hashar: ^^
[14:24:37] <stashbot>	 T299856: Enable RelatedArticles for desktop (non-mobile) view at zhwikinews - https://phabricator.wikimedia.org/T299856
[14:24:40] <Lucas_WMDE>	 hashar: all yours
[14:25:01] <Lucas_WMDE>	 looks like it already merged, nice
[14:25:03] <Amir1>	 the sync summary is the most useful part for me
[14:25:05] * hashar copy paste
[14:25:49] <hashar>	 hmm
[14:25:59] <hashar>	 somehow that rebased Echo and GrowthExperiments
[14:26:21] <hashar>	 ah local patches
[14:26:39] <Lucas_WMDE>	 mhm
[14:26:44] <hashar>	 patch on mwdebug1001
[14:28:03] <hashar>	 I think due to security patches
[14:28:16] <wikibugs>	 SRE, observability, Patch-For-Review: Move Kafka logging to the new intermediate PKI - https://phabricator.wikimedia.org/T300130 (elukey) Very interesting use case in https://gerrit.wikimedia.org/r/c/operations/puppet/+/763113, namely Beta/deployment-prep. We have two sets of VMs:  * Kafka logging (c...
[14:29:22] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[14:29:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:06] <hashar>	 zabe: patch is on mwdebug1001. I am testing it
[14:30:26] <hashar>	 oh no
[14:30:31] <hashar>	 I have to regenerate the interwikidump
[14:30:37] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[14:30:38] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:30:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:47] <zabe>	 yep
[14:31:15] <zabe>	 also syncing out the wikimediamaintenance patch should do nothing, so there is no real risk
[14:31:46] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:31:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:32:10] <logmsgbot>	 !log hashar@deploy1002 Synchronized php-1.38.0-wmf.22/extensions/WikimediaMaintenance/dumpInterwiki.php: Backport: [[gerrit:763298|Stop excluding the 'wikipedia' interwiki prefix (T301936)]] (duration: 00m 48s)
[14:32:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:32:15] <stashbot>	 T301936: Interwiki prefix "wikipedia" not working on multilingual wikimedia projects  - https://phabricator.wikimedia.org/T301936
[14:34:54] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Add Cumin alias for durum [puppet] - https://gerrit.wikimedia.org/r/763509 (owner: Muehlenhoff)
[14:35:41] <wikibugs>	 (PS1) Hashar: Regen interwiki cache to drop erroneous 'wikipedia' [mediawiki-config] - https://gerrit.wikimedia.org/r/763516 (https://phabricator.wikimedia.org/T301936)
[14:35:47] <hashar>	 zabe: ^ ;)
[14:37:03] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +1] Regen interwiki cache to drop erroneous 'wikipedia' [mediawiki-config] - https://gerrit.wikimedia.org/r/763516 (https://phabricator.wikimedia.org/T301936) (owner: Hashar)
[14:37:26] <wikibugs>	 (CR) Hashar: [C: +2] Regen interwiki cache to drop erroneous 'wikipedia' [mediawiki-config] - https://gerrit.wikimedia.org/r/763516 (https://phabricator.wikimedia.org/T301936) (owner: Hashar)
[14:37:36] <hashar>	 going to copy paste from https://deploy-commands.toolforge.org/bacc/763516
[14:37:39] <hashar>	 thx Lucas_WMDE !
[14:37:41] <wikibugs>	 (CR) Zabe: [C: +1] Regen interwiki cache to drop erroneous 'wikipedia' [mediawiki-config] - https://gerrit.wikimedia.org/r/763516 (https://phabricator.wikimedia.org/T301936) (owner: Hashar)
[14:38:08] <wikibugs>	 (Merged) jenkins-bot: Regen interwiki cache to drop erroneous 'wikipedia' [mediawiki-config] - https://gerrit.wikimedia.org/r/763516 (https://phabricator.wikimedia.org/T301936) (owner: Hashar)
[14:38:43] <hashar>	 testing on mwdebug1001
[14:39:32] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P20981 and previous config saved to /var/cache/conftool/dbconfig/20220217-143931-kormat.json
[14:39:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:44] <jinxer-wm>	 (ProbeHttpFailed) firing: URL did not return HTTP 2xx or 3xx response (or probe/connection failed) - https://wikitech.wikimedia.org/wiki/Prometheus#Watchrat_Non-23xx_HTTP_response - https://grafana.wikimedia.org/d/GYciEga7z/watchrat - https://alerts.wikimedia.org
[14:39:48] <jinxer-wm>	 (ProbeHttpFailed) firing: (22) URL did not return HTTP 2xx or 3xx response (or probe/connection failed) - https://wikitech.wikimedia.org/wiki/Prometheus#Watchrat_Non-23xx_HTTP_response - https://grafana.wikimedia.org/d/GYciEga7z/watchrat - https://alerts.wikimedia.org
[14:41:20] <wikibugs>	 (CR) JHathaway: [C: +1] "looks good to me" [puppet] - https://gerrit.wikimedia.org/r/740818 (https://phabricator.wikimedia.org/T224891) (owner: Jbond)
[14:41:50] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[14:41:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:18] <logmsgbot>	 !log dcausse@deploy1002 Started deploy [wikimedia/discovery/analytics@3a25565]: (no justification provided)
[14:42:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:29] <icinga-wm>	 PROBLEM - Host mr1-drmrs is DOWN: PING CRITICAL - Packet loss = 100%
[14:42:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[14:42:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:43:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:08] <wikibugs>	 SRE, ops-codfw, DC-Ops, Machine-Learning-Team: Q3:(Need By: TBD) rack/setup/install ml-cache200[1-3] - https://phabricator.wikimedia.org/T299433 (Papaul) @elukey can you please get me the Partitioning/Raid information?  Thanks
[14:44:06] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:44:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:13] <zabe>	 hashar: lgtm, after purging the interwiki links are working for me
[14:44:22] <logmsgbot>	 !log dcausse@deploy1002 Finished deploy [wikimedia/discovery/analytics@3a25565]: (no justification provided) (duration: 02m 04s)
[14:44:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:42] <hashar>	 zabe: yes indeed
[14:44:51] <wikibugs>	 (CR) Volans: "replies inline, follow up PS coming shortly" [cookbooks] - https://gerrit.wikimedia.org/r/762934 (owner: Volans)
[14:44:55] <hashar>	 I have replied on the task with all the tests I have made
[14:45:01] <hashar>	 some pages will have to be purged I believe
[14:45:14] <hashar>	 thank you very much for the patch to WikimediaMaintenance and the analysis!
[14:45:23] <logmsgbot>	 !log hashar@deploy1002 Synchronized wmf-config/interwiki.php: Config: [[gerrit:763516|Regen interwiki cache to drop erroneous 'wikipedia' (T301936)]] (duration: 00m 48s)
[14:45:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:45:28] <stashbot>	 T301936: Interwiki prefix "wikipedia" not working on multilingual wikimedia projects  - https://phabricator.wikimedia.org/T301936
[14:45:33] <icinga-wm>	 PROBLEM - Host asw1-b12-drmrs.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[14:45:34] <wikibugs>	 SRE, ops-codfw, DC-Ops, Machine-Learning-Team: Q3:(Need By: TBD) rack/setup/install ml-cache200[1-3] - https://phabricator.wikimedia.org/T299433 (Papaul)
[14:45:35] <icinga-wm>	 PROBLEM - Host asw1-b13-drmrs.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[14:45:39] <wikibugs>	 SRE, ops-codfw, DC-Ops, Machine-Learning-Team: Q3:(Need By: TBD) rack/setup/install ml-cache200[1-3] - https://phabricator.wikimedia.org/T299433 (elukey) @Papaul Hi! IIRC these nodes have two 2TB disks, so I'd go for the standard raid1 recipe: `echo partman/standard.cfg partman/raid1-2dev`  Lemme...
[14:46:55] <zabe>	 yw
[14:47:20] <hashar>	 jouncebot: now
[14:47:20] <jouncebot>	 For the next 0 hour(s) and 12 minute(s): UTC evening backport and config training (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220217T1400)
[14:47:23] <hashar>	 {{success}}
[14:47:36] <hashar>	 !log UTC evening backport and config training has completed.
[14:47:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:48:05] <moritzm>	 imported openjdk-8 8u322-b06-1~deb11u1 for bullseye-wikimedia (forward port of latest Java 8 security fixes)
[14:48:13] <wikibugs>	 (CR) JHathaway: R:varnish:instance: Add hiere key to control cloud ratelimits (1 comment) [puppet] - https://gerrit.wikimedia.org/r/740828 (https://phabricator.wikimedia.org/T224891) (owner: Jbond)
[14:48:31] <icinga-wm>	 PROBLEM - Host asw1-b13-drmrs.wikimedia.org IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[14:48:57] <icinga-wm>	 PROBLEM - Host asw1-b12-drmrs.wikimedia.org IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[14:49:15] <wikibugs>	 (PS1) Elukey: install_server: add partman recipe for ml-cache nodes [puppet] - https://gerrit.wikimedia.org/r/763518 (https://phabricator.wikimedia.org/T299433)
[14:49:29] <icinga-wm>	 PROBLEM - Host mr1-drmrs IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[14:49:44] <wikibugs>	 (PS9) JHathaway: exim: add the ability to silently drop senders [puppet] - https://gerrit.wikimedia.org/r/748884 (https://phabricator.wikimedia.org/T298038)
[14:50:39] <wikibugs>	 (CR) Elukey: [C: +2] install_server: add partman recipe for ml-cache nodes [puppet] - https://gerrit.wikimedia.org/r/763518 (https://phabricator.wikimedia.org/T299433) (owner: Elukey)
[14:51:08] <wikibugs>	 (CR) JHathaway: "kindly review" [puppet] - https://gerrit.wikimedia.org/r/763362 (owner: JHathaway)
[14:51:17] <wikibugs>	 (CR) JHathaway: "kindly review" [puppet] - https://gerrit.wikimedia.org/r/763370 (owner: JHathaway)
[14:51:27] <wikibugs>	 (CR) JHathaway: "kindly review" [puppet] - https://gerrit.wikimedia.org/r/763309 (owner: JHathaway)
[14:51:41] <wikibugs>	 (CR) JHathaway: "kindly review" [puppet] - https://gerrit.wikimedia.org/r/763311 (owner: JHathaway)
[14:52:20] <wikibugs>	 SRE, ops-codfw, DC-Ops, Machine-Learning-Team, Patch-For-Review: Q3:(Need By: TBD) rack/setup/install ml-cache200[1-3] - https://phabricator.wikimedia.org/T299433 (elukey) Went ahead and merged the change, I've also run puppet across install nodes, so you can install the os whenever you want :)
[14:52:30] <wikibugs>	 (PS4) Volans: sre.hosts.provision: check password correctness [cookbooks] - https://gerrit.wikimedia.org/r/762934
[14:53:36] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job mjolnir in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[14:53:49] <wikibugs>	 (03CR) 10Volans: "addressed comment" [cookbooks] - 10https://gerrit.wikimedia.org/r/762934 (owner: 10Volans)
[14:53:53] <jinxer-wm>	 (ProbeHttpFailed) resolved: URL did not return HTTP 2xx or 3xx response (or probe/connection failed) - https://wikitech.wikimedia.org/wiki/Prometheus#Watchrat_Non-23xx_HTTP_response - https://grafana.wikimedia.org/d/GYciEga7z/watchrat - https://alerts.wikimedia.org
[14:53:58] <jinxer-wm>	 (ProbeHttpFailed) resolved: (22) URL did not return HTTP 2xx or 3xx response (or probe/connection failed) - https://wikitech.wikimedia.org/wiki/Prometheus#Watchrat_Non-23xx_HTTP_response - https://grafana.wikimedia.org/d/GYciEga7z/watchrat - https://alerts.wikimedia.org
[14:54:37] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P20982 and previous config saved to /var/cache/conftool/dbconfig/20220217-145436-kormat.json
[14:54:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:47] <wikibugs>	 (03PS4) 10Giuseppe Lavagetto: conftool: add request-actions / request-patterns [puppet] - 10https://gerrit.wikimedia.org/r/763486
[14:59:37] <icinga-wm>	 RECOVERY - Host mr1-drmrs is UP: PING OK - Packet loss = 0%, RTA = 85.64 ms
[14:59:39] <icinga-wm>	 RECOVERY - Host asw1-b12-drmrs.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 85.51 ms
[15:00:19] <icinga-wm>	 RECOVERY - Host asw1-b13-drmrs.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 85.51 ms
[15:01:13] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.dns.netbox
[15:01:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:29] <icinga-wm>	 RECOVERY - Host asw1-b13-drmrs.wikimedia.org IPv6 is UP: PING OK - Packet loss = 0%, RTA = 85.56 ms
[15:01:55] <icinga-wm>	 RECOVERY - Host asw1-b12-drmrs.wikimedia.org IPv6 is UP: PING OK - Packet loss = 0%, RTA = 85.47 ms
[15:05:25] <wikibugs>	 10SRE-swift-storage: Storage request for datasets published by research team - https://phabricator.wikimedia.org/T294380 (10MatthewVernon) Hi, sorry for the slow response.  The swift and `S3` endpoints are not available externally; instead the edge caches reverse-proxy e.g. upload.wikimedia.org to swift.discover...
[15:06:03] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:06:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:07:36] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Machine-Learning-Team, 10Patch-For-Review: Q3:(Need By: TBD) rack/setup/install ml-cache200[1-3] - https://phabricator.wikimedia.org/T299433 (10Papaul) @elukey thanks
[15:09:41] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20983 and previous config saved to /var/cache/conftool/dbconfig/20220217-150941-kormat.json
[15:09:46] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
[15:09:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:47] <stashbot>	 T300774: Drop fr_img_* columns - https://phabricator.wikimedia.org/T300774
[15:09:47] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
[15:09:49] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
[15:09:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:55] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
[15:09:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:15] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[15:10:16] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[15:10:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:21] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Depooling db1096:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20984 and previous config saved to /var/cache/conftool/dbconfig/20220217-151021-kormat.json
[15:10:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:11:55] <wikibugs>	 10SRE, 10observability, 10Patch-For-Review: Move Kafka logging to the new intermediate PKI - https://phabricator.wikimedia.org/T300130 (10jbond) >>! In T300130#7718276, @elukey wrote: >  but in theory it should be possible to point the logging project hosts to it via `profile::pki::client`. that and adding t...
[15:14:35] <icinga-wm>	 RECOVERY - Uncommitted DNS changes in Netbox on netbox1001 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[15:15:39] <icinga-wm>	 RECOVERY - Host mr1-drmrs IPv6 is UP: PING OK - Packet loss = 0%, RTA = 85.69 ms
[15:20:30] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1012.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
[15:20:32] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1012.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
[15:20:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:20:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:22:06] <wikibugs>	 10SRE, 10ops-eqiad: Installation issues on PowerEdge R440 eqiad Ganeti servers with buster / firmware update needed - https://phabricator.wikimedia.org/T299527 (10MoritzMuehlenhoff)
[15:23:09] <wikibugs>	 10SRE, 10ops-eqiad: Installation issues on PowerEdge R440 eqiad Ganeti servers with buster / firmware update needed - https://phabricator.wikimedia.org/T299527 (10MoritzMuehlenhoff) One more server is ready and downtimed; ganeti1012
[15:23:41] <moritzm>	 !log imported openjdk-8 8u322-b06-1~deb11u1 for bullseye-wikimedia (forward port of latest Java 8 security fixes)
[15:23:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:24:49] <icinga-wm>	 PROBLEM - Disk space on thanos-be2001 is CRITICAL: DISK CRITICAL - free space: / 2118 MB (3% inode=98%): /tmp 2118 MB (3% inode=98%): /var/tmp 2118 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=thanos-be2001&var-datasource=codfw+prometheus/ops
[15:25:35] <wikibugs>	 (03PS9) 10Cathal Mooney: Base config additions and updated templates to configure EVPN ASW [homer/public] - 10https://gerrit.wikimedia.org/r/759709 (https://phabricator.wikimedia.org/T299758)
[15:26:18] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Base config additions and updated templates to configure EVPN ASW [homer/public] - 10https://gerrit.wikimedia.org/r/759709 (https://phabricator.wikimedia.org/T299758) (owner: 10Cathal Mooney)
[15:26:19] <elukey>	 moritzm: we cannot let go of Java 8 :D
[15:26:21] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 1:00:00 on testvm[2001-2003].codfw.wmnet with reason: Instance restarts
[15:26:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:26:27] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on testvm[2001-2003].codfw.wmnet with reason: Instance restarts
[15:26:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:27:25] <moritzm>	 yeah :-)
[15:33:10] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM, assuming nothing weird in pcc" [puppet] - 10https://gerrit.wikimedia.org/r/763362 (owner: 10JHathaway)
[15:34:56] <wikibugs>	 10SRE-swift-storage, 10Observability-Logging, 10User-fgiunchedi: Missed swift log rotation can lead to full root filesystem - https://phabricator.wikimedia.org/T301657 (10MatthewVernon) Having a quick look at the logrotate configuration, it has ` /srv/log/swift/*.log { [...]     postrotate         service rs...
[15:35:42] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20986 and previous config saved to /var/cache/conftool/dbconfig/20220217-153542-kormat.json
[15:35:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:35:48] <stashbot>	 T300774: Drop fr_img_* columns - https://phabricator.wikimedia.org/T300774
[15:36:02] <wikibugs>	 (03CR) 10BBlack: conftool: add request-actions / request-patterns (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/763486 (owner: 10Giuseppe Lavagetto)
[15:37:01] <wikibugs>	 (03CR) 10Jbond: "LGTM but let's also check with pcc" [puppet] - 10https://gerrit.wikimedia.org/r/763370 (owner: 10JHathaway)
[15:39:17] <wikibugs>	 (03CR) 10BBlack: conftool: add request-actions / request-patterns (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/763486 (owner: 10Giuseppe Lavagetto)
[15:41:36] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "pcc: https://puppet-compiler.wmflabs.org/pcc-worker1001/33833/" [puppet] - 10https://gerrit.wikimedia.org/r/763362 (owner: 10JHathaway)
[15:41:39] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.reboot-vm for VM testvm2002.codfw.wmnet
[15:41:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:41:56] <wikibugs>	 10SRE, 10Beta-Cluster-Infrastructure, 10Quality-and-Test-Engineering-Team (QTE), 10Traffic, and 2 others: [epic] The SSL certificate for Beta cluster domains fails to properly renew & deploy - https://phabricator.wikimedia.org/T293585 (10AlexisJazz)
[15:42:13] <wikibugs>	 10SRE-swift-storage, 10Observability-Logging, 10User-fgiunchedi: Missed swift log rotation can lead to full root filesystem - https://phabricator.wikimedia.org/T301657 (10MatthewVernon) ` swift (2.26.0-7) unstable; urgency=medium    * Fix logging and logrotate to do like all the other OpenStack daemons. `  F...
[15:45:30] <wikibugs>	 (03CR) 10Jbond: "see inline" [puppet] - 10https://gerrit.wikimedia.org/r/763309 (owner: 10JHathaway)
[15:46:03] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33835/console" [puppet] - 10https://gerrit.wikimedia.org/r/763309 (owner: 10JHathaway)
[15:47:07] <wikibugs>	 (03PS1) 10Elukey: install_server: set new partman recipe for kubestage1003 [puppet] - 10https://gerrit.wikimedia.org/r/763539 (https://phabricator.wikimedia.org/T300744)
[15:47:09] <wikibugs>	 (03PS1) 10Elukey: Add overlayfs settings for kubestage1003 [puppet] - 10https://gerrit.wikimedia.org/r/763540 (https://phabricator.wikimedia.org/T300744)
[15:49:23] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2002.codfw.wmnet
[15:49:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:49:29] <wikibugs>	 (03CR) 10JHathaway: Remove ordered_json function (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/763309 (owner: 10JHathaway)
[15:50:47] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P20987 and previous config saved to /var/cache/conftool/dbconfig/20220217-155047-kormat.json
[15:50:50] <wikibugs>	 (03CR) 10Andrew Bogott: backy2: on Bullseye, hack around a silly package name mismatch (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/763336 (https://phabricator.wikimedia.org/T301909) (owner: 10Andrew Bogott)
[15:50:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:54:06] <wikibugs>	 (03PS2) 10David Caro: mcrouter::monitoring: remove module [puppet] - 10https://gerrit.wikimedia.org/r/751136 (https://phabricator.wikimedia.org/T272559)
[15:54:08] <wikibugs>	 (03CR) 10David Caro: mcrouter::monitoring: remove module (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/751136 (https://phabricator.wikimedia.org/T272559) (owner: 10David Caro)
[15:54:10] <wikibugs>	 (03PS15) 10Hashar: ci: Qemu image and snapshot creation [puppet] - 10https://gerrit.wikimedia.org/r/758514 (https://phabricator.wikimedia.org/T284774)
[15:54:24] <wikibugs>	 (03PS1) 10MVernon: swift: use rsyslog-rotate to get rsyslog to close old files [puppet] - 10https://gerrit.wikimedia.org/r/763541 (https://phabricator.wikimedia.org/T301657)
[15:55:04] <wikibugs>	 (03PS3) 10David Caro: parsoid: remove unused module [puppet] - 10https://gerrit.wikimedia.org/r/751163 (https://phabricator.wikimedia.org/T272559)
[15:56:27] <wikibugs>	 (03CR) 10Hashar: "I have made the last qemu-img create to preallocate disk space with:" [puppet] - 10https://gerrit.wikimedia.org/r/758514 (https://phabricator.wikimedia.org/T284774) (owner: 10Hashar)
[15:56:31] <wikibugs>	 (03PS3) 10Andrew Bogott: backy2: initialize backy2 database if necessary [puppet] - 10https://gerrit.wikimedia.org/r/763401
[15:56:33] <wikibugs>	 (03PS4) 10Andrew Bogott: backy2: don't back up shelved instances [puppet] - 10https://gerrit.wikimedia.org/r/763345
[15:56:35] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33836/console" [puppet] - 10https://gerrit.wikimedia.org/r/763362 (owner: 10JHathaway)
[16:00:07] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: conftool: add request-actions / request-patterns (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/763486 (owner: 10Giuseppe Lavagetto)
[16:03:09] <wikibugs>	 (03PS2) 10Elukey: install_server: set new partman recipe for kubestage2001 [puppet] - 10https://gerrit.wikimedia.org/r/763539 (https://phabricator.wikimedia.org/T300744)
[16:03:11] <wikibugs>	 (03PS2) 10Elukey: Add overlayfs settings for kubestage2001 [puppet] - 10https://gerrit.wikimedia.org/r/763540 (https://phabricator.wikimedia.org/T300744)
[16:05:08] <wikibugs>	 10SRE, 10Beta-Cluster-Infrastructure, 10Traffic, 10HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on February 16, 2022. - https://phabricator.wikimedia.org/T301995 (10Zabe) Since there seems to be a valid certificate, I did the same mitigation as in T271808 and T293070. ` root@deployment-cache-up...
[16:05:52] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P20988 and previous config saved to /var/cache/conftool/dbconfig/20220217-160551-kormat.json
[16:05:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:34] <wikibugs>	 10SRE, 10Beta-Cluster-Infrastructure, 10Traffic, 10HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on February 16, 2022. - https://phabricator.wikimedia.org/T301995 (10AlexisJazz) >>! In T301995#7718593, @Zabe wrote: > Since there seems to be a valid certificate, I did the same mitigat...
[16:08:23] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+1] install_server: set new partman recipe for kubestage2001 [puppet] - 10https://gerrit.wikimedia.org/r/763539 (https://phabricator.wikimedia.org/T300744) (owner: 10Elukey)
[16:09:36] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] install_server: set new partman recipe for kubestage2001 [puppet] - 10https://gerrit.wikimedia.org/r/763539 (https://phabricator.wikimedia.org/T300744) (owner: 10Elukey)
[16:09:39] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+1] Add overlayfs settings for kubestage2001 [puppet] - 10https://gerrit.wikimedia.org/r/763540 (https://phabricator.wikimedia.org/T300744) (owner: 10Elukey)
[16:11:00] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33837/console" [puppet] - 10https://gerrit.wikimedia.org/r/763311 (owner: 10JHathaway)
[16:17:43] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Add overlayfs settings for kubestage2001 [puppet] - 10https://gerrit.wikimedia.org/r/763540 (https://phabricator.wikimedia.org/T300744) (owner: 10Elukey)
[16:18:08] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33839/console" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/763311 (owner: 10JHathaway)
[16:18:35] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+1] "not blocking but would also be nice to have tests for these (perhaps can be handled when moving to namespaced functions)" [puppet] - 10https://gerrit.wikimedia.org/r/763311 (owner: 10JHathaway)
[16:18:42] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+1] "PCC SUCCESS (NOOP 11): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33838/console" [puppet] - 10https://gerrit.wikimedia.org/r/763311 (owner: 10JHathaway)
[16:19:44] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+1] Remove ordered_json function (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/763309 (owner: 10JHathaway)
[16:20:55] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.hosts.reimage for host kubestage2001.codfw.wmnet with OS bullseye
[16:20:56] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20989 and previous config saved to /var/cache/conftool/dbconfig/20220217-162056-kormat.json
[16:20:58] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
[16:20:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:59] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
[16:21:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:21:04] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Depooling db1110 (T300774)', diff saved to https://phabricator.wikimedia.org/P20990 and previous config saved to /var/cache/conftool/dbconfig/20220217-162104-kormat.json
[16:21:05] <stashbot>	 T300774: Drop fr_img_* columns - https://phabricator.wikimedia.org/T300774
[16:21:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:21:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:21:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:21:26] <wikibugs>	 (03CR) 10EllenR: [C: 03+1] "lgtm" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/762881 (https://phabricator.wikimedia.org/T297629) (owner: 10Eigyan)
[16:21:34] <wikibugs>	 (03PS1) 10Ayounsi: Add drmrs routers [homer/public] - 10https://gerrit.wikimedia.org/r/763551 (https://phabricator.wikimedia.org/T300277)
[16:21:56] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/762934 (owner: 10Volans)
[16:22:24] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add drmrs routers [homer/public] - 10https://gerrit.wikimedia.org/r/763551 (https://phabricator.wikimedia.org/T300277) (owner: 10Ayounsi)
[16:22:49] <wikibugs>	 (03PS7) 10Eigyan: [wmf-config]: Deploy the fawiki test safety survey to production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/762881 (https://phabricator.wikimedia.org/T297629)
[16:23:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[16:23:56] <wikibugs>	 (03CR) 10Volans: [C: 03+2] sre.hosts.provision: check password correctness [cookbooks] - 10https://gerrit.wikimedia.org/r/762934 (owner: 10Volans)
[16:25:17] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] mcrouter::monitoring: remove module [puppet] - 10https://gerrit.wikimedia.org/r/751136 (https://phabricator.wikimedia.org/T272559) (owner: 10David Caro)
[16:25:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[16:25:31] <wikibugs>	 (03PS2) 10Ayounsi: Add drmrs routers [homer/public] - 10https://gerrit.wikimedia.org/r/763551 (https://phabricator.wikimedia.org/T300277)
[16:26:46] <wikibugs>	 (03Merged) 10jenkins-bot: sre.hosts.provision: check password correctness [cookbooks] - 10https://gerrit.wikimedia.org/r/762934 (owner: 10Volans)
[16:27:51] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.dns.netbox
[16:27:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job calico-felix in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[16:30:26] <jinxer-wm>	 (KubernetesCalicoDown) firing: kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org
[16:31:06] <wikibugs>	 (03PS2) 10MVernon: swift: use rsyslog-rotate to get rsyslog to close old files [puppet] - 10https://gerrit.wikimedia.org/r/763541 (https://phabricator.wikimedia.org/T301657)
[16:31:55] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata for Skye Berghel - https://phabricator.wikimedia.org/T301581 (10JBennett) Approved.
[16:32:50] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Michael.hay - https://phabricator.wikimedia.org/T301782 (10JBennett) approved
[16:32:52] <wikibugs>	 (03PS1) 10Accraze: ml-services: add elwiki, enwiktionary, eswikibooks [deployment-charts] - 10https://gerrit.wikimedia.org/r/763556 (https://phabricator.wikimedia.org/T301415)
[16:33:17] <wikibugs>	 (03PS12) 10AGueyte: Update Event Stream for IPInfo events [mediawiki-config] - 10https://gerrit.wikimedia.org/r/756635 (https://phabricator.wikimedia.org/T296415)
[16:33:46] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Update Event Stream for IPInfo events [mediawiki-config] - 10https://gerrit.wikimedia.org/r/756635 (https://phabricator.wikimedia.org/T296415) (owner: 10AGueyte)
[16:35:18] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] mcrouter::monitoring: remove module [puppet] - 10https://gerrit.wikimedia.org/r/751136 (https://phabricator.wikimedia.org/T272559) (owner: 10David Caro)
[16:35:22] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[16:38:00] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[16:38:14] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] pybal::testing: remove unused role/profile [puppet] - 10https://gerrit.wikimedia.org/r/751709 (https://phabricator.wikimedia.org/T272559) (owner: 10David Caro)
[16:39:26] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] pybal::testing: remove unused role/profile [puppet] - 10https://gerrit.wikimedia.org/r/751709 (https://phabricator.wikimedia.org/T272559) (owner: 10David Caro)
[16:39:43] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
[16:39:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:40:22] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[16:40:26] <jinxer-wm>	 (KubernetesCalicoDown) resolved: kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org
[16:40:56] <jinxer-wm>	 (KubernetesCalicoDown) firing: kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org
[16:41:15] <wikibugs>	 10Puppet, 10SRE, 10Infrastructure-Foundations, 10Patch-For-Review: Unused puppet resources audit, 2021 - https://phabricator.wikimedia.org/T272559 (10dcaro)
[16:42:11] <logmsgbot>	 !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
[16:42:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:42:15] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.ganeti.makevm for new host datahubsearch1002.eqiad.wmnet
[16:42:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:42:18] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] profile::nutcracker: remove unused profile [puppet] - 10https://gerrit.wikimedia.org/r/751701 (https://phabricator.wikimedia.org/T272559) (owner: 10David Caro)
[16:43:09] <wikibugs>	 10Puppet, 10SRE, 10Infrastructure-Foundations, 10Patch-For-Review: Unused puppet resources audit, 2021 - https://phabricator.wikimedia.org/T272559 (10dcaro)
[16:45:22] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[16:45:56] <jinxer-wm>	 (KubernetesCalicoDown) resolved: kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org
[16:46:56] <jinxer-wm>	 (KubernetesCalicoDown) firing: kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org
[16:47:06] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:47:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:48:47] <wikibugs>	 (03PS1) 10Ladsgroup: Revert "db1146: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/763302
[16:49:05] <wikibugs>	 (03PS2) 10Ladsgroup: Revert "db1146: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/763302
[16:49:08] <wikibugs>	 (03CR) 10Ladsgroup: [V: 03+2 C: 03+2] Revert "db1146: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/763302 (owner: 10Ladsgroup)
[16:50:13] <wikibugs>	 (03PS5) 10Giuseppe Lavagetto: conftool: add request-actions / request-patterns [puppet] - 10https://gerrit.wikimedia.org/r/763486
[16:50:15] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: [draft] varnish/frontend: consume etcd data for dynamic banning of requests. [puppet] - 10https://gerrit.wikimedia.org/r/763557
[16:50:22] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[16:51:24] <wikibugs>	 (03CR) 10MVernon: "Hi," [puppet] - 10https://gerrit.wikimedia.org/r/763541 (https://phabricator.wikimedia.org/T301657) (owner: 10MVernon)
[16:51:49] <wikibugs>	 10SRE, 10ops-eqiad: 8 x SMF Patches between cages Eqiad - LVS & WMCS - https://phabricator.wikimedia.org/T301419 (10Jclark-ctr) @wiki_willy @RobH we need cables for old cage to finish connection: 4x 20m SC/LC fibers, 8x 40GBaseLR optics, 8x 10GBase-LR, 4x 15m SC/LC fibers
[16:51:56] <jinxer-wm>	 (KubernetesCalicoDown) resolved: kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org
[16:52:05] <elukey>	 this is me reimaging --^
[16:52:56] <jinxer-wm>	 (KubernetesCalicoDown) firing: kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org
[16:53:00] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[16:53:56] <wikibugs>	 (03PS1) 10David Caro: Remove unused module xvfb [puppet] - 10https://gerrit.wikimedia.org/r/763561 (https://phabricator.wikimedia.org/T272559)
[16:55:22] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[16:58:44] <wikibugs>	 10SRE-swift-storage, 10Observability-Logging, 10Patch-For-Review, 10User-fgiunchedi: Missed swift log rotation can lead to full root filesystem - https://phabricator.wikimedia.org/T301657 (10MatthewVernon) To answer my own question, the bullseye version in the package is using `copytruncate`, which copies...
[17:00:04] <jouncebot>	 jbond and rzl: That opportune time is upon us again. Time for a Puppet request window deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220217T1700).
[17:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[17:01:26] <jinxer-wm>	 (KubernetesRsyslogDown) firing: rsyslog on kubestage2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org
[17:02:17] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] grafana-next: set grafana codfw base domain to grafana next [puppet] - 10https://gerrit.wikimedia.org/r/763329 (https://phabricator.wikimedia.org/T282863) (owner: 10Cwhite)
[17:05:25] <wikibugs>	 (03PS1) 10Elukey: Change docker package name for kubestage2001 [puppet] - 10https://gerrit.wikimedia.org/r/763562 (https://phabricator.wikimedia.org/T300744)
[17:09:27] <XioNoX>	 !log stop advertising drmrs from esams
[17:09:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:52] <wikibugs>	 (03PS1) 10Vgutierrez: prometheus: Aggregation rules for HAProxy TTFB [puppet] - 10https://gerrit.wikimedia.org/r/763566 (https://phabricator.wikimedia.org/T290005)
[17:11:31] <logmsgbot>	 !log razzi@cumin1001 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host datahubsearch1002.eqiad.wmnet
[17:11:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:57] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.ganeti.makevm for new host datahubsearch1002.eqiad.wmnet
[17:12:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:12:07] <wikibugs>	 (CR) Elukey: [C: +2] Change docker package name for kubestage2001 [puppet] - https://gerrit.wikimedia.org/r/763562 (https://phabricator.wikimedia.org/T300744) (owner: Elukey)
[17:12:55] <wikibugs>	 (PS1) Ladsgroup: db1105: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/763568 (https://phabricator.wikimedia.org/T300510)
[17:14:04] <wikibugs>	 (PS2) Ladsgroup: db1105: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/763568 (https://phabricator.wikimedia.org/T300510)
[17:16:10] <wikibugs>	 (PS3) Ayounsi: Add drmrs routers [homer/public] - https://gerrit.wikimedia.org/r/763551 (https://phabricator.wikimedia.org/T300277)
[17:17:56] <jinxer-wm>	 (KubernetesCalicoDown) resolved: kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org
[17:18:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job calico-felix in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[17:19:22] <logmsgbot>	 !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2001.codfw.wmnet with OS bullseye
[17:19:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:19:45] <wikibugs>	 (CR) Ladsgroup: [V: +2 C: +2] db1105: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/763568 (https://phabricator.wikimedia.org/T300510) (owner: Ladsgroup)
[17:19:47] <wikibugs>	 (CR) Ayounsi: [C: +2] "Already pushed from my laptop, merging to not have diff with esams. Feel free to leave post-merge comments if needed." [homer/public] - https://gerrit.wikimedia.org/r/763551 (https://phabricator.wikimedia.org/T300277) (owner: Ayounsi)
[17:19:52] <logmsgbot>	 !log elukey@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes-staging,service=kubesvc
[17:19:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:25] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300774)', diff saved to https://phabricator.wikimedia.org/P20991 and previous config saved to /var/cache/conftool/dbconfig/20220217-172124-kormat.json
[17:21:26] <jinxer-wm>	 (KubernetesRsyslogDown) resolved: rsyslog on kubestage2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org
[17:21:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:31] <stashbot>	 T300774: Drop fr_img_* columns - https://phabricator.wikimedia.org/T300774
[17:21:57] <wikibugs>	 (Merged) jenkins-bot: Add drmrs routers [homer/public] - https://gerrit.wikimedia.org/r/763551 (https://phabricator.wikimedia.org/T300277) (owner: Ayounsi)
[17:24:41] <wikibugs>	 (CR) David Caro: backy2: don't back up shelved instances (1 comment) [puppet] - https://gerrit.wikimedia.org/r/763345 (owner: Andrew Bogott)
[17:24:58] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[17:24:59] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[17:25:00] <wikibugs>	 (PS7) Razzi: analytics_cluster::datahub::opensearch: start of puppet role [puppet] - https://gerrit.wikimedia.org/r/762957 (https://phabricator.wikimedia.org/T301382)
[17:25:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:25:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1105:3311 (T300510)', diff saved to https://phabricator.wikimedia.org/P20992 and previous config saved to /var/cache/conftool/dbconfig/20220217-172504-ladsgroup.json
[17:25:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:25:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:25:11] <stashbot>	 T300510: Upgrade s2 to Bullseye - https://phabricator.wikimedia.org/T300510
[17:25:20] <wikibugs>	 (CR) Razzi: [V: +1] "Updated the patch, still going with a 1-node cluster as I spin up the other machines, let me know if that's alright for now." [puppet] - https://gerrit.wikimedia.org/r/762957 (https://phabricator.wikimedia.org/T301382) (owner: Razzi)
[17:26:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1105:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20993 and previous config saved to /var/cache/conftool/dbconfig/20220217-172650-ladsgroup.json
[17:26:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:27:28] <wikibugs>	 (CR) Btullis: [C: +1] "Looks good to me." [puppet] - https://gerrit.wikimedia.org/r/762957 (https://phabricator.wikimedia.org/T301382) (owner: Razzi)
[17:29:59] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.reimage for host db1105.eqiad.wmnet with OS bullseye
[17:30:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:30:22] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[17:33:00] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[17:35:12] <wikibugs>	 (PS1) Ebernhardson: cirrus: Alert when the rate of pages fixed by Saneitizer is too high [puppet] - https://gerrit.wikimedia.org/r/763573 (https://phabricator.wikimedia.org/T295365)
[17:36:30] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P20994 and previous config saved to /var/cache/conftool/dbconfig/20220217-173630-kormat.json
[17:36:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:36:46] <wikibugs>	 (CR) jerkins-bot: [V: -1] cirrus: Alert when the rate of pages fixed by Saneitizer is too high [puppet] - https://gerrit.wikimedia.org/r/763573 (https://phabricator.wikimedia.org/T295365) (owner: Ebernhardson)
[17:39:17] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1105.eqiad.wmnet with reason: host reimage
[17:39:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:40:57] <wikibugs>	 SRE, ops-drmrs, DC-Ops, Infrastructure-Foundations, and 2 others: Q3:(Need By: ASAP) rack/setup/install cr[12]-drmrs - https://phabricator.wikimedia.org/T300277 (ayounsi) Current status: * Physical work left (I'll give the details tomorrow): ** Planned: move Telia's link to the routers now that w...
[17:41:01] <wikibugs>	 (CR) Razzi: [C: +2] analytics_cluster::datahub::opensearch: start of puppet role [puppet] - https://gerrit.wikimedia.org/r/762957 (https://phabricator.wikimedia.org/T301382) (owner: Razzi)
[17:42:05] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1105.eqiad.wmnet with reason: host reimage
[17:42:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:42:17] <wikibugs>	 (CR) Herron: [C: +1] remove deprecated piechart plugin [debs/grafana-plugins] - https://gerrit.wikimedia.org/r/763334 (https://phabricator.wikimedia.org/T282863) (owner: Cwhite)
[17:42:34] <wikibugs>	 (CR) Herron: [C: +1] update grafana-simple-json-datasource to 1.4.2 [debs/grafana-plugins] - https://gerrit.wikimedia.org/r/763337 (https://phabricator.wikimedia.org/T282863) (owner: Cwhite)
[17:42:50] <wikibugs>	 (CR) Herron: [C: +1] update grafana-image-renderer to 3.3.0 [debs/grafana-plugins] - https://gerrit.wikimedia.org/r/763335 (https://phabricator.wikimedia.org/T282863) (owner: Cwhite)
[17:43:08] <wikibugs>	 (PS1) Razzi: datahub::opensearch: Fix sdd typo to be ssd [puppet] - https://gerrit.wikimedia.org/r/763575 (https://phabricator.wikimedia.org/T301382)
[17:44:34] <wikibugs>	 (CR) Razzi: [C: +2] datahub::opensearch: Fix sdd typo to be ssd [puppet] - https://gerrit.wikimedia.org/r/763575 (https://phabricator.wikimedia.org/T301382) (owner: Razzi)
[17:45:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[17:50:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[17:50:51] <wikibugs>	 ops-eqiad, DC-Ops: cloudvirt1017.mgmt/SSH - https://phabricator.wikimedia.org/T302016 (mdipietro)
[17:51:35] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P20995 and previous config saved to /var/cache/conftool/dbconfig/20220217-175135-kormat.json
[17:51:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:00] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[17:53:28] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-test-coord1001.eqiad.wmnet with reason: Still troubleshooting mariadb issues
[17:53:28] <logmsgbot>	 !log razzi@cumin1001 END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on an-test-coord1001.eqiad.wmnet with reason: Still troubleshooting mariadb issues
[17:53:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:44] <wikibugs>	 (PS2) Elukey: ml-services: add elwiki, enwiktionary, eswikibooks [deployment-charts] - https://gerrit.wikimedia.org/r/763556 (https://phabricator.wikimedia.org/T301415) (owner: Accraze)
[17:53:46] <wikibugs>	 (PS1) Elukey: kserve-inference: improve the revscoring_inference_service config [deployment-charts] - https://gerrit.wikimedia.org/r/763580
[17:54:10] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on datahubsearch1001.eqiad.wmnet with reason: Node is being set up for first time and puppet run failed
[17:54:12] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on datahubsearch1001.eqiad.wmnet with reason: Node is being set up for first time and puppet run failed
[17:54:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:38] <wikibugs>	 (PS13) AGueyte: Update Event Stream for IPInfo events [mediawiki-config] - https://gerrit.wikimedia.org/r/756635 (https://phabricator.wikimedia.org/T296415)
[17:54:52] <wikibugs>	 (CR) Andrew Bogott: [C: +2] backy2: initialize backy2 database if necessary [puppet] - https://gerrit.wikimedia.org/r/763401 (owner: Andrew Bogott)
[17:55:22] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[17:56:13] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1105.eqiad.wmnet with OS bullseye
[17:56:13] <wikibugs>	 (CR) jerkins-bot: [V: -1] Update Event Stream for IPInfo events [mediawiki-config] - https://gerrit.wikimedia.org/r/756635 (https://phabricator.wikimedia.org/T296415) (owner: AGueyte)
[17:56:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:56:24] <wikibugs>	 (PS3) Elukey: ml-services: add elwiki, enwiktionary, eswikibooks [deployment-charts] - https://gerrit.wikimedia.org/r/763556 (https://phabricator.wikimedia.org/T301415) (owner: Accraze)
[17:57:23] <wikibugs>	 (PS5) Andrew Bogott: backy2: don't back up shelved instances [puppet] - https://gerrit.wikimedia.org/r/763345
[18:00:04] <jouncebot>	 chrisalbon and accraze: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Services – Graphoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220217T1800).
[18:06:40] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300774)', diff saved to https://phabricator.wikimedia.org/P20997 and previous config saved to /var/cache/conftool/dbconfig/20220217-180639-kormat.json
[18:06:41] <logmsgbot>	 !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
[18:06:43] <logmsgbot>	 !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
[18:06:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:06:46] <stashbot>	 T300774: Drop fr_img_* columns - https://phabricator.wikimedia.org/T300774
[18:06:47] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Depooling db1100 (T300774)', diff saved to https://phabricator.wikimedia.org/P20998 and previous config saved to /var/cache/conftool/dbconfig/20220217-180647-kormat.json
[18:06:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:06:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:06:56] <wikibugs>	 (PS14) AGueyte: Update Event Stream for IPInfo events [mediawiki-config] - https://gerrit.wikimedia.org/r/756635 (https://phabricator.wikimedia.org/T296415)
[18:06:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:08:23] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [400.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[18:09:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300510)', diff saved to https://phabricator.wikimedia.org/P20999 and previous config saved to /var/cache/conftool/dbconfig/20220217-180900-ladsgroup.json
[18:09:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:09:05] <stashbot>	 T300510: Upgrade s2 to Bullseye - https://phabricator.wikimedia.org/T300510
[18:10:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[18:11:01] <wikibugs>	 (PS10) JHathaway: Remove ordered_yaml function [puppet] - https://gerrit.wikimedia.org/r/763362
[18:11:45] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:13:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[18:16:16] <wikibugs>	 (PS2) ArielGlenn: add Hannah Okwelum to platform-engineering group [puppet] - https://gerrit.wikimedia.org/r/763456 (https://phabricator.wikimedia.org/T301876)
[18:17:09] <wikibugs>	 (CR) Accraze: [C: +1] kserve-inference: improve the revscoring_inference_service config [deployment-charts] - https://gerrit.wikimedia.org/r/763580 (owner: Elukey)
[18:18:18] <wikibugs>	 (CR) ArielGlenn: [C: +2] add Hannah Okwelum to platform-engineering group [puppet] - https://gerrit.wikimedia.org/r/763456 (https://phabricator.wikimedia.org/T301876) (owner: ArielGlenn)
[18:18:44] <wikibugs>	 (PS2) Elukey: kserve-inference: improve the revscoring_inference_service config [deployment-charts] - https://gerrit.wikimedia.org/r/763580
[18:18:46] <wikibugs>	 (PS4) Elukey: ml-services: add elwiki, enwiktionary, eswikibooks [deployment-charts] - https://gerrit.wikimedia.org/r/763556 (https://phabricator.wikimedia.org/T301415) (owner: Accraze)
[18:23:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[18:23:59] <wikibugs>	 (CR) Elukey: [C: +2] kserve-inference: improve the revscoring_inference_service config [deployment-charts] - https://gerrit.wikimedia.org/r/763580 (owner: Elukey)
[18:24:04] <wikibugs>	 (CR) Elukey: [C: +2] ml-services: add elwiki, enwiktionary, eswikibooks [deployment-charts] - https://gerrit.wikimedia.org/r/763556 (https://phabricator.wikimedia.org/T301415) (owner: Accraze)
[18:24:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21000 and previous config saved to /var/cache/conftool/dbconfig/20220217-182405-ladsgroup.json
[18:24:05] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21001 and previous config saved to /var/cache/conftool/dbconfig/20220217-182405-kormat.json
[18:24:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:24:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:25:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[18:31:35] <logmsgbot>	 !log accraze@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
[18:31:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:33:06] <wikibugs>	 SRE, ops-drmrs, DC-Ops, Infrastructure-Foundations, netops: Q3:(Need By: ASAP) rack/setup/install cr[12]-drmrs - https://phabricator.wikimedia.org/T300277 (RobH) I can put in a followup ticket for them to correct the 'unplanned' items but I'll wait until you finish your setup or give the go a...
[18:33:14] <wikibugs>	 SRE, ops-drmrs, DC-Ops, Infrastructure-Foundations, netops: Q3:(Need By: ASAP) rack/setup/install cr[12]-drmrs - https://phabricator.wikimedia.org/T300277 (RobH) a:RobH→ayounsi
[18:34:39] <logmsgbot>	 !log accraze@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
[18:34:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:35:12] <wikibugs>	 SRE, ops-drmrs, DC-Ops, Infrastructure-Foundations, netops: Q3:(Need By: ASAP) rack/setup/install cr[12]-drmrs - https://phabricator.wikimedia.org/T300277 (RobH) No packing slips in box according to the Interxion engineer who did our remote hands work, so I've requested a copy from Myriad so...
[18:36:30] <wikibugs>	 (PS1) Papaul: Add restbase-dev200[1-3] to site.pp and netboot [puppet] - https://gerrit.wikimedia.org/r/763583 (https://phabricator.wikimedia.org/T299437)
[18:36:55] <wikibugs>	 (CR) Andrew Bogott: [C: +2] backy2: don't back up shelved instances [puppet] - https://gerrit.wikimedia.org/r/763345 (owner: Andrew Bogott)
[18:37:41] <wikibugs>	 (PS1) Majavah: toolsdb: enable pt-heartbeat [puppet] - https://gerrit.wikimedia.org/r/763584
[18:38:35] <wikibugs>	 (CR) Papaul: [C: +2] Add restbase-dev200[1-3] to site.pp and netboot [puppet] - https://gerrit.wikimedia.org/r/763583 (https://phabricator.wikimedia.org/T299437) (owner: Papaul)
[18:39:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21002 and previous config saved to /var/cache/conftool/dbconfig/20220217-183909-ladsgroup.json
[18:39:10] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21003 and previous config saved to /var/cache/conftool/dbconfig/20220217-183910-kormat.json
[18:39:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:39:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:39:19] <wikibugs>	 (CR) Majavah: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33841/console" [puppet] - https://gerrit.wikimedia.org/r/763584 (owner: Majavah)
[18:49:05] <wikibugs>	 (PS1) Razzi: datahub::opensearch: Change curator version to 5.8.1-1 for [puppet] - https://gerrit.wikimedia.org/r/763587 (https://phabricator.wikimedia.org/T301382)
[18:49:19] <wikibugs>	 (PS2) Razzi: datahub::opensearch: Change curator version to 5.8.1-1 [puppet] - https://gerrit.wikimedia.org/r/763587 (https://phabricator.wikimedia.org/T301382)
[18:50:11] <wikibugs>	 (CR) jerkins-bot: [V: -1] datahub::opensearch: Change curator version to 5.8.1-1 [puppet] - https://gerrit.wikimedia.org/r/763587 (https://phabricator.wikimedia.org/T301382) (owner: Razzi)
[18:50:52] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host restbase-dev2001.codfw.wmnet with OS buster
[18:50:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:51:00] <wikibugs>	 SRE, ops-codfw, DC-Ops, Platform Engineering, and 2 others: Q3:(Need By: TBD) rack/setup/install restbase-dev200[123].codfw.wmnet - https://phabricator.wikimedia.org/T299437 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host restbase-dev2001.codfw.w...
[18:52:11] <wikibugs>	 SRE, ops-eqiad, DC-Ops, RESTBase, Platform Team Workboards (Platform Engineering Reliability): Q2:(Need By: TBD) rack/setup/install restbase103[123].eqiad.wmnet - https://phabricator.wikimedia.org/T294372 (Cmjohnson) @hnowlan I am having issues with partman for these servers. Can you verify t...
[18:52:39] <icinga-wm>	 RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [200.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[18:54:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300510)', diff saved to https://phabricator.wikimedia.org/P21004 and previous config saved to /var/cache/conftool/dbconfig/20220217-185414-ladsgroup.json
[18:54:15] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300774)', diff saved to https://phabricator.wikimedia.org/P21005 and previous config saved to /var/cache/conftool/dbconfig/20220217-185414-kormat.json
[18:54:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:54:21] <stashbot>	 T300510: Upgrade s2 to Bullseye - https://phabricator.wikimedia.org/T300510
[18:54:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:54:26] <stashbot>	 T300774: Drop fr_img_* columns - https://phabricator.wikimedia.org/T300774
[18:59:06] <wikibugs>	 SRE, ops-eqiad, DC-Ops: cloudvirt1017.mgmt/SSH - https://phabricator.wikimedia.org/T302016 (Cmjohnson) This will require either a hard reboot/power off or replacing the cable. I will attempt the cable first.
[18:59:48] <wikibugs>	 SRE, ops-eqiad: Installation issues on PowerEdge R440 eqiad Ganeti servers with buster / firmware update needed - https://phabricator.wikimedia.org/T299527 (Cmjohnson)
[19:00:04] <jouncebot>	 hashar and jeena: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220217T1900).
[19:02:46] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.dns.netbox
[19:02:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:03:02] <wikibugs>	 SRE, SRE-swift-storage, ops-eqiad, DC-Ops: Q2:(Need By: TBD) rack/setup/install ms-fe1009-1012 - https://phabricator.wikimedia.org/T294137 (Cmjohnson) @cmooney don't forget that 1012 is in the new cage, it could take awhile to get that going.
[19:04:55] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host restbase-dev2002.codfw.wmnet with OS buster
[19:05:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:05:04] <wikibugs>	 SRE, ops-codfw, DC-Ops, Platform Engineering, and 2 others: Q3:(Need By: TBD) rack/setup/install restbase-dev200[123].codfw.wmnet - https://phabricator.wikimedia.org/T299437 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host restbase-dev2002.codfw.w...
[19:07:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P21006 and previous config saved to /var/cache/conftool/dbconfig/20220217-190748-ladsgroup.json
[19:07:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:07:53] <stashbot>	 T300510: Upgrade s2 to Bullseye - https://phabricator.wikimedia.org/T300510
[19:08:07] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on restbase-dev2001.codfw.wmnet with reason: host reimage
[19:08:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:08:20] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:08:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:11:20] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase-dev2001.codfw.wmnet with reason: host reimage
[19:11:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:13:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[19:15:22] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[19:18:00] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[19:20:55] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase-dev2001.codfw.wmnet with OS buster
[19:20:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:21:01] <wikibugs>	 SRE, ops-codfw, DC-Ops, Platform Engineering, RESTBase: Q3:(Need By: TBD) rack/setup/install restbase-dev200[123].codfw.wmnet - https://phabricator.wikimedia.org/T299437 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host restbase-dev2001.codfw.wmnet...
[19:21:51] <wikibugs>	 (PS3) Cathal Mooney: New function and changes to wmf-netbox plugin to support EVPN config. [software/homer/deploy] - https://gerrit.wikimedia.org/r/760566 (https://phabricator.wikimedia.org/T299758)
[19:22:33] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on restbase-dev2002.codfw.wmnet with reason: host reimage
[19:22:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:22:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P21007 and previous config saved to /var/cache/conftool/dbconfig/20220217-192252-ladsgroup.json
[19:22:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:23:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[19:24:37] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host restbase-dev2003.codfw.wmnet with OS buster
[19:24:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:24:43] <wikibugs>	 SRE, ops-codfw, DC-Ops, Platform Engineering, RESTBase: Q3:(Need By: TBD) rack/setup/install restbase-dev200[123].codfw.wmnet - https://phabricator.wikimedia.org/T299437 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host restbase-dev2003.codfw.w...
[19:25:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[19:26:01] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase-dev2002.codfw.wmnet with reason: host reimage
[19:26:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[19:30:43] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Discovery-Search (Current work): Q3:(Need By: TBD) rack/setup/install elastic20[73-86] - https://phabricator.wikimedia.org/T299608 (10Papaul)
[19:33:00] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[19:35:22] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[19:35:47] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase-dev2002.codfw.wmnet with OS buster
[19:35:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:35:58] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Platform Engineering, 10RESTBase: Q3:(Need By: TBD) rack/setup/install restbase-dev200[123].codfw.wmnet - https://phabricator.wikimedia.org/T299437 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host restbase-dev2002.codfw.wmnet...
[19:37:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P21008 and previous config saved to /var/cache/conftool/dbconfig/20220217-193757-ladsgroup.json
[19:38:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:41:37] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on restbase-dev2003.codfw.wmnet with reason: host reimage
[19:41:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:45:05] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase-dev2003.codfw.wmnet with reason: host reimage
[19:45:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P21009 and previous config saved to /var/cache/conftool/dbconfig/20220217-195302-ladsgroup.json
[19:53:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:09] <stashbot>	 T300510: Upgrade s2 to Bullseye - https://phabricator.wikimedia.org/T300510
[19:54:21] <wikibugs>	 (03CR) 10JHathaway: Remove ordered_yaml function (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/763362 (owner: 10JHathaway)
[19:54:53] <wikibugs>	 (03CR) 10JHathaway: [C: 03+2] Remove ordered_yaml function [puppet] - 10https://gerrit.wikimedia.org/r/763362 (owner: 10JHathaway)
[19:54:54] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase-dev2003.codfw.wmnet with OS buster
[19:54:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:55:00] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Platform Engineering, 10RESTBase: Q3:(Need By: TBD) rack/setup/install restbase-dev200[123].codfw.wmnet - https://phabricator.wikimedia.org/T299437 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host restbase-dev2003.codfw.wmnet...
[19:57:00] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Platform Engineering, 10RESTBase: Q3:(Need By: TBD) rack/setup/install restbase-dev200[123].codfw.wmnet - https://phabricator.wikimedia.org/T299437 (10Papaul)
[20:02:28] <logmsgbot>	 !log dcausse@deploy1002 Started deploy [wikimedia/discovery/analytics@66350a9]: (no justification provided)
[20:02:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:04:31] <logmsgbot>	 !log dcausse@deploy1002 Finished deploy [wikimedia/discovery/analytics@66350a9]: (no justification provided) (duration: 02m 02s)
[20:04:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:05:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[20:06:35] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Platform Engineering, 10RESTBase: Q3:(Need By: TBD) rack/setup/install restbase-dev200[123].codfw.wmnet - https://phabricator.wikimedia.org/T299437 (10Papaul) 05Open→03Resolved @hnowlan this is complete
[20:07:24] <wikibugs>	 (03PS3) 10JHathaway: Remove ordered_json function [puppet] - 10https://gerrit.wikimedia.org/r/763309
[20:09:51] <icinga-wm>	 PROBLEM - Widespread puppet agent failures on alert1001 is CRITICAL: 0.01239 ge 0.01 https://puppetboard.wikimedia.org/nodes?status=failed https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[20:10:22] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[20:11:01] <wikibugs>	 (03CR) 10Cwhite: [C: 04-1] "One of two ways is better, IMHO:" [puppet] - 10https://gerrit.wikimedia.org/r/763587 (https://phabricator.wikimedia.org/T301382) (owner: 10Razzi)
[20:15:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[20:17:31] <wikibugs>	 (03PS2) 10JHathaway: Remove puppet:///files and move files to modules [puppet] - 10https://gerrit.wikimedia.org/r/763370
[20:20:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[20:23:26] <icinga-wm>	 PROBLEM - Check systemd state on doc1001 is CRITICAL: CRITICAL - degraded: The following units failed: rsync-doc-doc1002.eqiad.wmnet.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:34:15] <wikibugs>	 (03CR) 10JHathaway: [C: 03+2] Remove ordered_json function [puppet] - 10https://gerrit.wikimedia.org/r/763309 (owner: 10JHathaway)
[20:34:33] <wikibugs>	 (03CR) 10JHathaway: [C: 03+2] Remove ordered_json function (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/763309 (owner: 10JHathaway)
[20:38:30] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[20:43:03] <wikibugs>	 (03PS1) 10Ssingh: dnsrecursor: allow outgoing IPv6 queries [puppet] - 10https://gerrit.wikimedia.org/r/763593
[20:43:52] <wikibugs>	 (03CR) 10Ssingh: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33845/console" [puppet] - 10https://gerrit.wikimedia.org/r/763593 (owner: 10Ssingh)
[20:44:47] <wikibugs>	 (03CR) 10Ssingh: [V: 03+1] "NOOP on existing hosts, as expected." [puppet] - 10https://gerrit.wikimedia.org/r/763593 (owner: 10Ssingh)
[20:45:01] <wikibugs>	 (03CR) 10Ssingh: [V: 03+1 C: 03+2] dnsrecursor: allow outgoing IPv6 queries [puppet] - 10https://gerrit.wikimedia.org/r/763593 (owner: 10Ssingh)
[20:51:13] <wikibugs>	 (03PS1) 10Ssingh: P:wikidough: enable IPv6 in backend recursor [puppet] - 10https://gerrit.wikimedia.org/r/763595
[20:51:54] <wikibugs>	 (03CR) 10Ssingh: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33846/console" [puppet] - 10https://gerrit.wikimedia.org/r/763595 (owner: 10Ssingh)
[20:53:51] <wikibugs>	 (03CR) 10Ssingh: [V: 03+1 C: 03+2] P:wikidough: enable IPv6 in backend recursor [puppet] - 10https://gerrit.wikimedia.org/r/763595 (owner: 10Ssingh)
[20:55:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[20:55:52] <icinga-wm>	 RECOVERY - Widespread puppet agent failures on alert1001 is OK: (C)0.01 ge (W)0.006 ge 0.005382 https://puppetboard.wikimedia.org/nodes?status=failed https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[20:58:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[20:59:08] <wikibugs>	 (03PS3) 10JHathaway: Remove puppet:///files and move files to modules [puppet] - 10https://gerrit.wikimedia.org/r/763370
[21:00:05] <jouncebot>	 RoanKattouw, Lucas_WMDE, and Urbanecm: How many deployers does it take to do UTC late backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220217T2100).
[21:00:05] <jouncebot>	 eigyan: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:46] <eigyan>	 greetings everyone
[21:03:30] <wikibugs>	 (03CR) 10JHathaway: [C: 03+2] Remove puppet:///files and move files to modules [puppet] - 10https://gerrit.wikimedia.org/r/763370 (owner: 10JHathaway)
[21:04:54] <icinga-wm>	 PROBLEM - SSH on dns5001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:05:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[21:08:52] <wikibugs>	 (03PS1) 10Jbond: Rakefile: Add sperate rake jobs for static/unit tests [puppet] - 10https://gerrit.wikimedia.org/r/763597
[21:09:13] <wikibugs>	 (03PS3) 10Razzi: opensearch: make curator version bullseye compatible [puppet] - 10https://gerrit.wikimedia.org/r/763587 (https://phabricator.wikimedia.org/T301382)
[21:10:12] <icinga-wm>	 PROBLEM - Widespread puppet agent failures on alert1001 is CRITICAL: 0.01561 ge 0.01 https://puppetboard.wikimedia.org/nodes?status=failed https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[21:14:11] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10serviceops, 10Kubernetes: (Need By: TBD) rack/setup/install kubernetes20[19|2(012)] - https://phabricator.wikimedia.org/T299470 (10Papaul)
[21:15:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[21:15:37] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] "I clearly have missed some cleaning up steps. Thank you!" [puppet] - 10https://gerrit.wikimedia.org/r/763561 (https://phabricator.wikimedia.org/T272559) (owner: 10David Caro)
[21:19:34] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Machine-Learning-Team: Q3:(Need By: TBD) rack/setup/install ml-cache200[1-3] - https://phabricator.wikimedia.org/T299433 (10Papaul)
[21:19:51] <logmsgbot>	 !log razzi@cumin1001 END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=93) for new host datahubsearch1002.eqiad.wmnet
[21:19:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:20:05] <icinga-wm>	 RECOVERY - Check systemd state on doc1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:21:27] <wikibugs>	 (03CR) 10Razzi: "Updated the patch." [puppet] - 10https://gerrit.wikimedia.org/r/763587 (https://phabricator.wikimedia.org/T301382) (owner: 10Razzi)
[21:23:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[21:25:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[21:27:30] <wikibugs>	 (03CR) 10Cwhite: [C: 03+1] "PCC checks out: https://puppet-compiler.wmflabs.org/pcc-worker1003/33848/" [puppet] - 10https://gerrit.wikimedia.org/r/763587 (https://phabricator.wikimedia.org/T301382) (owner: 10Razzi)
[21:28:24] <wikibugs>	 10SRE, 10ops-eqiad: Installation issues on PowerEdge R440 eqiad Ganeti servers with buster / firmware update needed - https://phabricator.wikimedia.org/T299527 (10Cmjohnson) all 3 are completed
[21:34:59] <wikibugs>	 10ops-eqiad, 10DC-Ops: eqiad: Unrack wmf3570 & wmf4579 - https://phabricator.wikimedia.org/T302034 (10wiki_willy)
[21:38:52] <icinga-wm>	 RECOVERY - Widespread puppet agent failures on alert1001 is OK: (C)0.01 ge (W)0.006 ge 0.002691 https://puppetboard.wikimedia.org/nodes?status=failed https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[21:50:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[21:51:43] <eigyan>	 greetings team, any updates on https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/762881
[21:53:00] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[21:54:53] <RhinosF1>	 eigyan: it doesn't look like Jon's CR was fixed
[21:55:22] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[21:55:37] <RhinosF1>	 "would suggest setting coverage to 1 as coverage is all users, not % of users who meet the other criteria.
[21:55:37] <RhinosF1>	 10% of users is likely too low given you are targetting minEdits of 5 on Farsi Wikipedia."
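The review comment quoted above turns on how coverage combines with the other targeting criteria. A minimal sketch of that arithmetic, assuming coverage samples from all users first and criteria such as minEdits then filter the sample; the function name and every number below are hypothetical and are not taken from the patch:

```python
def expected_audience(total_users: int, eligible_share: float, coverage: float) -> float:
    """Estimate how many users can see the survey.

    Assumes coverage samples from ALL users first, and the other
    criteria (such as a minimum edit count) then filter that sample.
    """
    sampled = total_users * coverage   # users picked by coverage alone
    return sampled * eligible_share    # of those, the ones meeting the criteria


# Hypothetical numbers, for illustration only.
users = 100_000
eligible = 0.02  # assumed share of users with at least 5 edits

low = expected_audience(users, eligible, coverage=0.1)
full = expected_audience(users, eligible, coverage=1.0)
print(f"coverage=0.1 -> about {low:.0f} users; coverage=1 -> about {full:.0f} users")
```

Under these assumptions a coverage of 0.1 shrinks an already small eligible group by a further factor of ten, which is the concern the comment raises.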
[22:00:22] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[22:01:02] <eigyan>	 Thank you RhinosF1, the coverage is mandated to that specific audience
[22:01:21] <eigyan>	 by request of Trust and Safety stakeholders
[22:01:31] <eigyan>	 whom I am making this edit for
[22:02:15] <eigyan>	 Does that make sense?
[22:05:57] <eigyan>	 RhinosF1: is your suggestion that I get approval from Trust and Safety stakeholders to change coverage and resubmit?
[22:05:57] <AntiComposite>	 the concern is that it may not do what you expect
[22:06:03] <RhinosF1>	 eigyan: you still need to respond to Jon
[22:06:08] <RhinosF1>	 You can't simply ignore it
[22:06:31] <RhinosF1>	 i don't care who told you to do it, I care that someone has raised a concern that hasn't been addressed
[22:07:19] <eigyan>	 I am only 6 months with the foundation and learning new things each deploy, my apologies, I thought that was more context, not a mandate
[22:07:33] <wikibugs>	 (03CR) 10RhinosF1: [C: 04-1] "please answer Jon's suggested change" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/762881 (https://phabricator.wikimedia.org/T297629) (owner: 10Eigyan)
[22:07:49] <RhinosF1>	 Jdlrobson: fyi ^
[22:08:23] <RhinosF1>	 eigyan: I don't know of any organisation ever that's going to allow you to simply ignore a code review without further comment because so and so said
[22:08:57] <eigyan>	 Sure thing
[22:09:16] <icinga-wm>	 PROBLEM - Check systemd state on thanos-be2001 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:10:30] <RhinosF1>	 eigyan: if you are confused about a code review, you need to make that clear on the change. We're all here to help but we can only do that if you're honest and upfront with us.
[22:11:43] <RhinosF1>	 I do apologise for the fact that no one answered during the window though
[22:11:47] <RhinosF1>	 Sometimes people get busy
[22:11:48] <wikibugs>	 (03PS1) 10Andrew Bogott: Openstack Cinder and Nova: tweak cgroup kernel settings on Bullseye [puppet] - 10https://gerrit.wikimedia.org/r/763605 (https://phabricator.wikimedia.org/T281276)
[22:11:51] <wikibugs>	 (03CR) 10Eigyan: [wmf-config]: Deploy the fawiki test safety survey to production (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/762881 (https://phabricator.wikimedia.org/T297629) (owner: 10Eigyan)
[22:12:00] <RhinosF1>	 I think you missed the jouncebot ping by a few seconds
[22:12:25] <mepps>	 hey RhinosF1 and eigyan :)
[22:12:29] <RhinosF1>	 you can always ping a deployer if no one shows to double check
[22:12:32] <RhinosF1>	 Hey mepps
[22:12:54] <wikibugs>	 (03PS3) 10JHathaway: ini(), php_ini(): convert to modern Ruby function API [puppet] - 10https://gerrit.wikimedia.org/r/763311 (https://phabricator.wikimedia.org/T265138)
[22:13:01] <mepps>	 yeah the call for coverage set to 0.1 was a stakeholder decision
[22:13:12] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[22:13:32] <RhinosF1>	 mepps: can we name stakeholders
[22:13:47] <RhinosF1>	 Or are they able to explain why
[22:14:20] <mepps>	 in the patch Rhinosf1 or here?  i believe it would be TAndic, let me make sure she's on gerrit too
[22:14:32] <RhinosF1>	 mepps: on the patch is best
[22:14:37] <RhinosF1>	 For transparency
[22:15:05] <mepps>	 sounds good RhinosF1, i'm also trying to see if it was documented in phab
[22:15:08] <RhinosF1>	 You'll have to reschedule the change anyway because a) its C-1'd and b) no one was around to deploy
[22:15:17] <wikibugs>	 10SRE, 10ops-eqiad: 8 x SMF Patches between cages Eqiad - LVS & WMCS - https://phabricator.wikimedia.org/T301419 (10cmooney)
[22:15:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[22:16:02] <icinga-wm>	 PROBLEM - SSH on mw2258.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:16:38] * RhinosF1 is not seeing a gerrit account
[22:17:25] <wikibugs>	 (03CR) 10Dzahn: "yea, eh,, see my original comment on https://gerrit.wikimedia.org/r/c/operations/puppet/+/597016  I wasn't sure back then either but have " [puppet] - 10https://gerrit.wikimedia.org/r/763561 (https://phabricator.wikimedia.org/T272559) (owner: 10David Caro)
[22:18:47] <eigyan>	 Ok team, thanks for the feedback. Is it ok to sign off now?
[22:19:20] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/763605 (https://phabricator.wikimedia.org/T281276) (owner: 10Andrew Bogott)
[22:20:15] <RhinosF1>	 eigyan: no
[22:20:20] <wikibugs>	 (03CR) 10Mepps: [wmf-config]: Deploy the fawiki test safety survey to production (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/762881 (https://phabricator.wikimedia.org/T297629) (owner: 10Eigyan)
[22:20:21] <RhinosF1>	 Not until jon responds
[22:20:31] <eigyan>	 Cool
[22:20:32] <mutante>	 mepps: hi! checked the user name, she has "TAndic"
[22:20:32] <RhinosF1>	 there'll be no deploys until Monday now
[22:20:40] <eigyan>	 Cool
[22:20:41] <mepps>	 RhinosF1 Is jon around?
[22:20:48] <RhinosF1>	 mutante: I tried to request their review and couldn't find them
[22:20:48] <mutante>	 mepps: or .. I mean.. she could use that to login without having to register anything
[22:20:59] <mutante>	 it's the same as the wikitech user
[22:21:02] <urbanecm>	 mepps: he was pinged here, so...likely not at IRC at least
[22:21:10] <mepps>	 thanks mutante
[22:21:18] <mutante>	 yw
[22:21:19] <RhinosF1>	 mepps: I have pinged him earlier up, he'll show if he is. I also brought the patch to his attention on gerrit.
[22:21:41] <mepps>	 thanks RhinosF1, sounds like we need to pause for the night and reschedule for another deploy window
[22:21:48] <RhinosF1>	 Yes
[22:21:59] <RhinosF1>	 The window closed anyway a bit ago
[22:22:03] <urbanecm>	 the window's officially over too (sorry, didn't get here earlier)
[22:22:10] <RhinosF1>	 It would have to be Monday as we don't deploy on Fridays
[22:22:29] <mepps>	 Thanks RhinosF1 and urbanecm
[22:22:57] <RhinosF1>	 Np, happy to give advice
[22:23:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[22:24:59] <eigyan>	 Can I help with anything else? I understand the deploy is not happening.
[22:25:22] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[22:25:42] <RhinosF1>	 I'm around a bit longer if you have questions eigyan
[22:25:45] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.dns.netbox
[22:25:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:25:51] <RhinosF1>	 In general about deploys
[22:26:02] <icinga-wm>	 RECOVERY - Check systemd state on thanos-be2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:26:27] <mutante>	 the "no deploys on Friday rule" is not just us but more like a general industry thing
[22:27:37] <eigyan>	 Thanks RhinosF1, I am going to sign off for the night. Thanks for everyone's help.
[22:27:44] <RhinosF1>	 Okay
[22:27:48] <RhinosF1>	 Have a good weekend
[22:27:58] <mutante>	 cheers eigyan 
[22:28:41] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[22:28:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:30:13] <papaul>	 fyi i just merged a patch on cookbook to remove datahubsearche1002
[22:30:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[22:31:57] <mutante>	 razzi: ^ papaul's message is for you ... I guess
[22:32:55] <RhinosF1>	 papaul: I imagine that's expected because of the spelling error
[22:33:12] <RhinosF1>	 It should be datahubsearch without the last E I think
[22:33:51] <wikibugs>	 (03PS1) 10JHathaway: WIP: Deprecated types :( [puppet] - 10https://gerrit.wikimedia.org/r/763611
[22:34:01] <papaul>	 RhinosF1: indeed without the last E
[22:34:40] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] WIP: Deprecated types :( [puppet] - 10https://gerrit.wikimedia.org/r/763611 (owner: 10JHathaway)
[22:34:46] <wikibugs>	 (03CR) 10JHathaway: [C: 03+2] ini(), php_ini(): convert to modern Ruby function API [puppet] - 10https://gerrit.wikimedia.org/r/763311 (https://phabricator.wikimedia.org/T265138) (owner: 10JHathaway)
[22:35:05] <wikibugs>	 (03CR) 10JHathaway: [C: 03+2] ini(), php_ini(): convert to modern Ruby function API (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/763311 (https://phabricator.wikimedia.org/T265138) (owner: 10JHathaway)
[22:35:34] <RhinosF1>	 papaul: I left a message in their channel too
[22:36:05] <wikibugs>	 (03PS1) 10Andrew Bogott: dnsrecursor: change webserver listening address [puppet] - 10https://gerrit.wikimedia.org/r/763612 (https://phabricator.wikimedia.org/T300254)
[22:38:10] <icinga-wm>	 PROBLEM - Check systemd state on thanos-be2001 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:40:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[22:40:34] <icinga-wm>	 RECOVERY - Check systemd state on thanos-be2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:42:41] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] dnsrecursor: change webserver listening address [puppet] - 10https://gerrit.wikimedia.org/r/763612 (https://phabricator.wikimedia.org/T300254) (owner: 10Andrew Bogott)
[22:43:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[22:47:46] <icinga-wm>	 PROBLEM - Check systemd state on thanos-be2001 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:48:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[22:50:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[22:51:24] <icinga-wm>	 RECOVERY - Uncommitted DNS changes in Netbox on netbox1001 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[22:51:47] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Openstack Cinder and Nova: tweak cgroup kernel settings on Bullseye [puppet] - 10https://gerrit.wikimedia.org/r/763605 (https://phabricator.wikimedia.org/T281276) (owner: 10Andrew Bogott)
[22:53:00] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[22:53:15] <RhinosF1>	 papaul: razzi confirmed it fine
[22:55:13] <razzi>	 yep, sorry for any confusion, tried to create a datahubsearch1002 earlier but the command froze, and I never circled back around to clean up
[22:57:26] <razzi>	 The extra "e" in the name confuses me however, looking in my command history I never had that typo in the name
[23:00:08] <mutante>	 razzi: I think the typo was just here on IRC. but thing is..it's somehow not in DNS with either variant ..it seems
[23:00:45] <mutante>	 even though we would expect it to be now after papaul merged that 
[23:01:22] <mutante>	 And I saw your change at  https://phabricator.wikimedia.org/rONED7016edd1945493000dcc877db6f2f56509d5cdf5  and copied it from there
[23:01:58] <mutante>	 razzi: nevermind, it works when I try from another host
[23:02:07] <mutante>	 datahubsearch1002.eqiad.wmnet has address 10.64.16.38
[23:02:07] <mutante>	 Host datahubsearch1002.eqiad.wmnet not found: 3(NXDOMAIN)
[23:02:07] <mutante>	 Host datahubsearch1002.eqiad.wmnet not found: 3(NXDOMAIN)
[23:02:10] <mutante>	 well.. partially
[23:02:16] <mutante>	 as if syncing was interrupted
[23:02:55] <razzi>	 Hm, ok thanks for that context mutante 
[23:03:39] <mutante>	 try:   dig datahubsearch1002.eqiad.wmnet @ns0.wikimedia.org     and then replace ns0 with ns1 and ns2   
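The per-nameserver check suggested above can be scripted. A minimal sketch, assuming `dig` is installed locally; the helper names `dig_command`, `summarize` and `query_all` are hypothetical, and an empty answer is treated here as "no record" (for example NXDOMAIN):

```python
import subprocess

# The three authoritative servers named in the message above.
NAMESERVERS = ["ns0.wikimedia.org", "ns1.wikimedia.org", "ns2.wikimedia.org"]


def dig_command(name, server):
    """Build the dig invocation that asks one specific server for a name."""
    return ["dig", "+short", name, f"@{server}"]


def summarize(answers):
    """Given {server: answer}, report whether every server returned the same answer."""
    return "consistent" if len(set(answers.values())) == 1 else "inconsistent"


def query_all(name, servers=NAMESERVERS):
    """Run dig against each server; an empty string means the server gave no record."""
    answers = {}
    for server in servers:
        result = subprocess.run(
            dig_command(name, server), capture_output=True, text=True, timeout=10
        )
        answers[server] = result.stdout.strip()
    return answers
```

Calling `summarize(query_all("datahubsearch1002.eqiad.wmnet"))` would report "inconsistent" in a situation like the one pasted earlier, where one server answers and the others do not.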
[23:03:58] <mutante>	 hmm.. maybe best to ask traffic to check the sync status
[23:04:08] <mutante>	 before messing with it
[23:04:33] <mutante>	 or run the DNS cookbook one more time
[23:04:47] <razzi>	 I never even installed an os on the vm, can definitely destroy it and start again with datahubsearch1003
[23:04:56] <mutante>	 this could match what you said about the process hanging
[23:05:00] <icinga-wm>	 PROBLEM - Ensure legal html en.wp on en.wikipedia.org is CRITICAL: Text\sis\savailable\sunder\sthe\sa\srel=license\s+href=(https:)?\/\/en.wikipedia.org\/wiki\/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_LicenseCreative\sCommons\sAttribution-ShareAlike\sLicense/aa\srel=license\shref=\/\/creativecommons.org\/licenses\/by-sa\/3\.0/ html not found https://phabricator.wikimedia.org/project/members/28/
[23:05:15] <mutante>	 ^ duh.. those are content checks on wiki
[23:05:57] <mutante>	 somebody edited a footer maybe
[23:06:10] <icinga-wm>	 RECOVERY - Check systemd state on thanos-be2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:06:11] <mutante>	 or the tickets had a 1 year downtime that just expired because those used to be open tickets
[23:06:48] <mutante>	 razzi: that's worth a try since it includes the DNS cookbook as well and good test if it happens again
[23:06:48] <wikibugs>	 (03PS2) 10JHathaway: Add nagios_core & mailalias_core modules [puppet] - 10https://gerrit.wikimedia.org/r/763611 (https://phabricator.wikimedia.org/T265138)
[23:07:10] <wikibugs>	 (03CR) 10JHathaway: "kindly review" [puppet] - 10https://gerrit.wikimedia.org/r/763611 (https://phabricator.wikimedia.org/T265138) (owner: 10JHathaway)
[23:07:30] <icinga-wm>	 RECOVERY - SSH on dns5001.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:07:30] <wikibugs>	 10SRE, 10DNS, 10Traffic, 10WMSE (IT): Need Assistance adding DNS records to claim domain - https://phabricator.wikimedia.org/T300076 (10MRamirez_WMF) @jbond I have the records to finalize the transition  TXT name Copy record‎@‎ (or skip if not supported by provider) TXT value MS=ms70322281 TTL ‎3600‎ (or y...
[23:07:34] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add nagios_core & mailalias_core modules [puppet] - 10https://gerrit.wikimedia.org/r/763611 (https://phabricator.wikimedia.org/T265138) (owner: 10JHathaway)
[23:10:01] <AntiComposite>	 yup, this is what caused the legal html alert https://en.wikipedia.org/w/index.php?title=MediaWiki:Wikimedia-copyright&diff=1072378717&oldid=861624479&diffmode=source
[23:10:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[23:11:26] <RhinosF1>	 AntiComposite: should it be reverted?
[23:11:38] <AntiComposite>	 probably not
[23:11:55] <AntiComposite>	 CC license statements are supposed to have the version in them
[23:12:44] <mutante>	 ah, Thanks for that AntiComposite 
[23:12:53] <mutante>	 then the monitoring needs to be adjusted
[23:13:08] <wikibugs>	 (PS3) JHathaway: Add nagios_core & mailalias_core modules [puppet] - https://gerrit.wikimedia.org/r/763611 (https://phabricator.wikimedia.org/T265138)
[23:13:14] <mutante>	 hardcoding a version isn't ideal for a check like that
[23:13:44] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add nagios_core & mailalias_core modules [puppet] - https://gerrit.wikimedia.org/r/763611 (https://phabricator.wikimedia.org/T265138) (owner: JHathaway)
[23:14:33] <wikibugs>	 (CR) Subramanya Sastry: [C: +1] parsoid: remove unused module [puppet] - https://gerrit.wikimedia.org/r/751163 (https://phabricator.wikimedia.org/T272559) (owner: David Caro)
[23:15:22] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[23:17:16] <icinga-wm>	 RECOVERY - SSH on mw2258.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:17:33] <wikibugs>	 SRE, WMF-Legal, observability: monitoring alert: legal footer change on en.wikipedia - due to creative commons license version change - https://phabricator.wikimedia.org/T302045 (Dzahn) As the #WMF-Legal project tag was added to this task, some general information to avoid wrong expectations: Please...
[23:18:00] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[23:18:05] <wikibugs>	 (CR) Jdlrobson: [wmf-config]: Deploy the fawiki test safety survey to production (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/762881 (https://phabricator.wikimedia.org/T297629) (owner: Eigyan)
[23:18:45] <wikibugs>	 SRE, WMF-Legal, observability: monitoring alert: legal footer change on en.wikipedia - due to creative commons license version change - https://phabricator.wikimedia.org/T302045 (Dzahn) @Herald well, understood, but this is an alert that Legal once requested to be notified on and it links to a Phab...
[23:20:10] <icinga-wm>	 PROBLEM - Check systemd state on thanos-be2001 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:20:10] <wikibugs>	 SRE, Icinga, WMF-Legal, observability: monitoring alert: legal footer change on en.wikipedia - due to creative commons license version change - https://phabricator.wikimedia.org/T302045 (Dzahn)
[23:20:22] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[23:21:06] <AntiComposite>	 mutante, don't know what complaining at Herald's going to do for you :)
[23:21:12] <wikibugs>	 SRE, Icinga, WMF-Legal, observability: monitoring alert: legal footer change on en.wikipedia - due to creative commons license version change - https://phabricator.wikimedia.org/T302045 (Dzahn)
[23:21:43] <mutante>	 AntiComposite: :) gets it off my chest
[23:21:57] <mutante>	 but I take my comment back that "hardcoding a version is bad"
[23:22:08] <mutante>	 for a "legal check" like this maybe that is EXACTLY right
[23:24:18] <mutante>	 it was just a way to ask whether legal still wants that check and that phab workboard 
[23:24:29] <mutante>	 since they have that herald rule
[23:27:13] <AntiComposite>	 yeah, it looks like it's working as intended, as long as the alert is actually acted on
[23:27:16] <icinga-wm>	 RECOVERY - Check systemd state on thanos-be2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:28:00] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[23:28:49] <wikibugs>	 SRE, Icinga, WMF-Legal, observability: monitoring alert: legal footer change on en.wikipedia - due to creative commons license version change - https://phabricator.wikimedia.org/T302045 (Dzahn)
[23:32:14] <icinga-wm>	 ACKNOWLEDGEMENT - Ensure legal html en.wp on en.wikipedia.org is CRITICAL: Text\sis\savailable\sunder\sthe\sa\srel=license\s+href=(https:)?\/\/en.wikipedia.org\/wiki\/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_LicenseCreative\sCommons\sAttribution-ShareAlike\sLicense/aa\srel=license\shref=\/\/creativecommons.org\/licenses\/by-sa\/3\.0/ html not found daniel_zahn https://phabricator.wikimedia.org/T302045 https://phabricator.wikimedia.org/project/members/28/
[23:34:12] <icinga-wm>	 PROBLEM - Check systemd state on thanos-be2001 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:34:41] <wikibugs>	 SRE, observability: "ensure legal html" footer monitoring turned CRIT - https://phabricator.wikimedia.org/T119456 (Dzahn) another one today because CC license version was changed to 3.0   created T302045
[23:35:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[23:37:01] <wikibugs>	 SRE, Icinga, WMF-Legal, observability: monitoring alert: legal footer change on en.wikipedia - due to creative commons license version change - https://phabricator.wikimedia.org/T302045 (Dzahn) related tickets: T108081 T119456
[23:40:56] <wikibugs>	 (CR) Dzahn: "adding Krinkle because of 09e84e65363e8e7c69ba28e8" [puppet] - https://gerrit.wikimedia.org/r/763561 (https://phabricator.wikimedia.org/T272559) (owner: David Caro)
[23:44:47] <wikibugs>	 (CR) Dzahn: "general comment: Just because wmcs itself does not use a specific class does not mean nobody in cloud VPS is using classes though. It can " [puppet] - https://gerrit.wikimedia.org/r/751725 (https://phabricator.wikimedia.org/T272559) (owner: David Caro)
[23:55:22] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org