[00:00:07] <jouncebot>	 twentyafterfour: I, the Bot under the Fountain, allow thee, The Deployer, to do Phabricator update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210701T0000).
[00:02:00] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:03:32] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:07:24] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:07:44] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:09:40] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:24:48] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:26:42] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:32:30] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:34:28] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:34:46] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:36:02] <wikibugs>	 SRE, Traffic: Image load failing with 429 from varnish - https://phabricator.wikimedia.org/T285875 (Legoktm) p:Triage→Unbreak! In general we've made some changes recently to rate limiting after repeated abuse/DDoS attacks. Could you please clarify what software (e.g. Firefox, Chrome, some other t...
[00:36:42] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:46:20] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:48:14] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:53:44] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:55:18] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:57:08] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:57:24] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:03:55] <wikibugs>	 SRE, Traffic: Image load failing with 429 from varnish - https://phabricator.wikimedia.org/T285875 (RoySmith) > Could you please clarify what software (e.g. Firefox, Chrome, some other tool) you're using to access pages/images that is returning 429s?  I'm not sure who that was intended for, but I get sim...
[01:10:03] <wikibugs>	 SRE, Traffic: Image load failing with 429 from varnish - https://phabricator.wikimedia.org/T285875 (Legoktm) >>! In T285875#7188943, @RoySmith wrote: >> Could you please clarify what software (e.g. Firefox, Chrome, some other tool) you're using to access pages/images that is returning 429s? >  > I'm not...
[01:11:48] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[01:13:42] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[01:15:54] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:17:48] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:22:51] <wikibugs>	 SRE, Traffic: Image load failing with 429 from varnish - https://phabricator.wikimedia.org/T285875 (RoySmith) If you could generate the URLs for the other size images, I'd be happy to give them a try from here.
[01:23:40] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:25:36] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:04:40] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:06:30] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:26:21] <wikibugs>	 SRE, ops-codfw: Degraded RAID on mw2380 - https://phabricator.wikimedia.org/T285603 (Papaul) @Dzahn @jijiki @Joe I received the disk today, I will be replacing it tomorrow Thursday at 10:00am CT. If you need to do anything on this server before I replace the disk please let me know or you can just de-poo...
[02:35:46] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:37:56] <wikibugs>	 SRE, Thumbor: Image load failing with 429 from varnish - https://phabricator.wikimedia.org/T285875 (Legoktm) p:Unbreak!→High @colewhite and I dug into this, it appears to be an issue with Thumbor:  {P16748}  Looking at https://logstash.wikimedia.org/goto/b74fa1ac65d1c96d08666f798a7f1fad we found...
[02:39:26] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:41:45] <wikibugs>	 SRE, serviceops, Patch-For-Review: Delay spinner showing for graphs for 1s - https://phabricator.wikimedia.org/T256641 (Seddon) a:Seddon→None
[02:45:13] <wikibugs>	 SRE, Thumbor: Image load failing with 429 from varnish - https://phabricator.wikimedia.org/T285875 (Legoktm) T226318#5282215 suggests that the 429 vs 500 may be a red herring in that thumbor will refuse to re-render a file it failed to render previously given that it's not going to make a difference.
[02:45:21] <wikibugs>	 SRE, Thumbor: Thumbor fails to render PNG with "Failed to convert image convert: IDAT: invalid distance too far back", returns 429 "Too Many Requests" - https://phabricator.wikimedia.org/T285875 (Legoktm)
[02:54:08] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:04:25] <icinga-wm>	 PROBLEM - LVS zotero codfw port 4969/tcp - Zotero- zotero.svc.codfw.wmnet IPv4 #page on zotero.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[03:04:57] <legoktm>	 Bleh 
[03:06:15] <icinga-wm>	 RECOVERY - LVS zotero codfw port 4969/tcp - Zotero- zotero.svc.codfw.wmnet IPv4 #page on zotero.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 196 bytes in 1.189 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[03:06:35] <legoktm>	 still looking
[03:09:18] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:15:41] <legoktm>	 all the errors are like "Error: Could not parse CSS stylesheet"
[03:21:06] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:26:56] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:27:42] <legoktm>	 the zotero spikes seem normal, I'm not looking anymore unless it pages again
[03:29:00] <icinga-wm>	 PROBLEM - Check systemd state on cumin1001 is CRITICAL: CRITICAL - degraded: The following units failed: database-backups-snapshots.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:30:30] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:32:18] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:32:30] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:36:08] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:37:48] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:41:30] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:50:44] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:52:34] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:00:04] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:01:55] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:23:58] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:25:48] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:27:32] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:29:22] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:29:40] <wikibugs>	 (PS2) ArielGlenn: Cleanup old mediainfo dumps [puppet] - https://gerrit.wikimedia.org/r/702413 (https://phabricator.wikimedia.org/T273266) (owner: Matthias Mullie)
[04:35:07] <wikibugs>	 (CR) ArielGlenn: [C: +2] Cleanup old mediainfo dumps [puppet] - https://gerrit.wikimedia.org/r/702413 (https://phabricator.wikimedia.org/T273266) (owner: Matthias Mullie)
[04:48:59] <marostegui>	 !log Disconnect eqiad -> codfw replication from s1-s8
[04:49:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:57:30] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:59:25] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:59:34] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:03:28] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:05:12] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:11:02] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:22:41] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Set core sections to unidir replication. [puppet] - https://gerrit.wikimedia.org/r/702255 (owner: Marostegui)
[05:25:12] <wikibugs>	 (PS1) Marostegui: db1122,db1129,db1104: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/702505
[05:25:56] <wikibugs>	 (CR) Marostegui: [C: +2] db1122,db1129,db1104: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/702505 (owner: Marostegui)
[05:27:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P16749 and previous config saved to /var/cache/conftool/dbconfig/20210701-052702-marostegui.json
[05:27:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:28:48] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:30:44] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:40:14] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:44:08] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:50:00] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:51:23] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:52:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1129', diff saved to https://phabricator.wikimedia.org/P16750 and previous config saved to /var/cache/conftool/dbconfig/20210701-055243-marostegui.json
[05:52:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:55:52] <marostegui>	 !log Deploy schema change on s6 eqiad master (db1173) T277123
[05:55:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:56:00] <stashbot>	 T277123: Extend iwlinks.iwl_prefix to VARBINARY(32) - https://phabricator.wikimedia.org/T277123
[05:57:34] <marostegui>	 !log Deploy schema change on s5 eqiad master (db1130) T277123
[05:57:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:57:49] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:58:53] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:59:03] <elukey>	 the flapping of OSPF --^ is related to Lumen IIUC right?
[05:59:15] <elukey>	 if it keeps going we may need to downtime it, very spammy
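Editor's note: to judge how "spammy" a flapping check is before deciding on a downtime, the icinga-wm lines in a log like this one can be tallied per check and host. The following is a minimal sketch that assumes the line layout seen in this channel; `count_flaps`, `noisy_checks`, and the threshold value are hypothetical names and choices, not part of Icinga or any Wikimedia tooling.

```python
import re
from collections import Counter

# Assumed layout of icinga-wm lines in this log:
# "[HH:MM:SS] <icinga-wm> PROBLEM - <check> on <host> is <state>: ..."
ALERT_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\] <icinga-wm>\s+(PROBLEM|RECOVERY) - (.+?) on (\S+) is "
)


def count_flaps(lines):
    """Count PROBLEM notifications per (check, host) pair."""
    flaps = Counter()
    for line in lines:
        match = ALERT_RE.match(line)
        if match and match.group(2) == "PROBLEM":
            flaps[(match.group(3), match.group(4))] += 1
    return flaps


def noisy_checks(flaps, threshold=5):
    """Return the (check, host) pairs at or above a hypothetical noise threshold."""
    return sorted(key for key, count in flaps.items() if count >= threshold)


if __name__ == "__main__":
    sample = [
        "[00:03:32] <icinga-wm>\t PROBLEM - Router interfaces on cr2-esams is CRITICAL: ...",
        "[00:07:24] <icinga-wm>\t RECOVERY - Router interfaces on cr2-esams is OK: ...",
        "[00:07:44] <icinga-wm>\t PROBLEM - OSPF status on cr2-esams is CRITICAL: ...",
    ]
    for (check, host), count in count_flaps(sample).items():
        print(f"{check} on {host}: {count} PROBLEM notification(s)")
```

Counting only PROBLEM lines keeps one increment per flap cycle; RECOVERY lines are matched by the same pattern but deliberately ignored.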
[06:02:13] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:03:29] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:03:43] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:06:23] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:15:21] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:18:47] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:23:12] <XioNoX>	 looking
[06:24:22] <XioNoX>	 yep, lumen, opening a ticket
[06:26:47] <XioNoX>	  NEEDS ATTENTION
[06:26:47] <XioNoX>	 Initial service diagnostics have detected that current optical light levels are outside manufacturer recommendations at 957 STATION RD, BELLPORT, NY.
[06:26:47] <XioNoX>	 Recommended range is Min: -6 dBm; Max: -1 dBm. Measured light level is -55 dBm.
[06:26:47] <XioNoX>	 Next Steps: Please open a Repair Ticket for review by a Lumen technician.
[06:27:23] <XioNoX>	 reminds me of https://img.ifunny.co/images/fefe8365fc0b2abc23aa87fe276113f1cd7c4fdcc6f430322c3fe76d5704a240_1.jpg
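Editor's note: the Lumen diagnostic above quotes a recommended range of -6 dBm to -1 dBm and a measured level of -55 dBm. As a rough worked example, the sketch below converts dBm to milliwatts and reports how far the reading sits below the minimum. The function names are hypothetical; only the three numbers come from the pasted diagnostic.

```python
def dbm_to_mw(dbm: float) -> float:
    """Convert optical power from dBm to milliwatts (0 dBm is 1 mW)."""
    return 10 ** (dbm / 10)


def in_recommended_range(measured: float, low: float = -6.0, high: float = -1.0) -> bool:
    """True if the measured level lies within the quoted range (defaults from the diagnostic)."""
    return low <= measured <= high


def shortfall_db(measured: float, low: float = -6.0) -> float:
    """How many dB the reading sits below the minimum, or 0 if it is not below it."""
    return max(0.0, low - measured)


if __name__ == "__main__":
    measured = -55.0  # value quoted in the diagnostic
    print(f"in range: {in_recommended_range(measured)}")
    print(f"shortfall: {shortfall_db(measured)} dB")
    print(f"power: {dbm_to_mw(measured):.3e} mW")
```

A reading 49 dB below the minimum corresponds to tens of thousands of times less power than the low end of the range, which is consistent with treating the link as effectively dark.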
[06:29:17] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:31:29] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:31:51] <marostegui>	 !log Deploy schema change on s2,s8 eqiad masters T277123
[06:31:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:32:01] <stashbot>	 T277123: Extend iwlinks.iwl_prefix to VARBINARY(32) - https://phabricator.wikimedia.org/T277123
[06:34:42] <marostegui>	 !log Deploy schema change on s7 eqiad (db1136) master T277123
[06:34:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:56:23] <wikibugs>	 (PS1) Marostegui: db1110,db1180: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/702578
[06:57:50] <wikibugs>	 (CR) Marostegui: [C: +2] db1110,db1180: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/702578 (owner: Marostegui)
[06:59:11] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:05:07] <wikibugs>	 SRE, serviceops, Datacenter-Switchover: Document communication expectations around planning a DC switchover - https://phabricator.wikimedia.org/T285806 (Legoktm) a:Legoktm I've tried to summarize a combination of what I did and the feedback here into https://wikitech.wikimedia.org/wiki/Switch_Dat...
[07:06:03] <marostegui>	 !log Deploy schema change on s4 eqiad (db1138) master T277123
[07:06:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:06:11] <stashbot>	 T277123: Extend iwlinks.iwl_prefix to VARBINARY(32) - https://phabricator.wikimedia.org/T277123
[07:07:22] <wikibugs>	 SRE, MW-on-K8s, Shellbox, serviceops, and 3 others: RFC: PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (Samwilson) The 1.36 release notes say that "Command::execute() now returns a Shellbox\Command\UnboxedResult instead of a MediaWiki\Shell\Result....
[07:08:17] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:13:44] <wikibugs>	 SRE, serviceops, Datacenter-Switchover: Document communication expectations around planning a DC switchover - https://phabricator.wikimedia.org/T285806 (Joe) After talking off-phabricator with a few people, I think what we have seen is more of a failure of coordination between affected SRE teams than...
[07:14:41] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:15:41] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:17:37] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:18:31] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:22:47] <wikibugs>	 (PS1) Muehlenhoff: elastic: Switch to nginx-light [puppet] - https://gerrit.wikimedia.org/r/702580 (https://phabricator.wikimedia.org/T164456)
[07:25:59] <wikibugs>	 (CR) DCausse: "> Patch Set 22: Code-Review+1" [deployment-charts] - https://gerrit.wikimedia.org/r/671204 (https://phabricator.wikimedia.org/T264006) (owner: Mstyles)
[07:27:27] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/702580 (https://phabricator.wikimedia.org/T164456) (owner: Muehlenhoff)
[07:29:51] <icinga-wm>	 PROBLEM - SSH on analytics1069.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:30:33] <wikibugs>	 (CR) JMeybohm: "This change is ready for review." [puppet] - https://gerrit.wikimedia.org/r/701530 (https://phabricator.wikimedia.org/T264209) (owner: JMeybohm)
[07:32:19] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:46:05] <wikibugs>	 SRE, Traffic, User-notice: Rate limit requests in violation of User-Agent policy more aggressively - https://phabricator.wikimedia.org/T224891 (ayounsi) p:Medium→High
[08:04:26] <wikibugs>	 (CR) Filippo Giunchedi: "Code LGTM although please publish/rebase the patch against the upstream-21.4.0 branch which is what is deployed (yes it is confusing, mast" [software/librenms] - https://gerrit.wikimedia.org/r/702438 (https://phabricator.wikimedia.org/T229542) (owner: Cathal Mooney)
[08:06:58] <wikibugs>	 (CR) Ayounsi: "Thanks!" (4 comments) [software/librenms] - https://gerrit.wikimedia.org/r/702438 (https://phabricator.wikimedia.org/T229542) (owner: Cathal Mooney)
[08:09:26] <wikibugs>	 SRE, Infrastructure-Foundations, SRE-tools: Broken disk on thanos-be1003 but not reported / task not opened - https://phabricator.wikimedia.org/T285662 (fgiunchedi) Thank you for the context, now I also recall a similar failure mode where we were wishing to have the number of expected disks! Indeed I...
[08:11:26] <logmsgbot>	 !log jelto@cumin1001 conftool action : set/pooled=no; selector: name=mw1261.eqiad.wmnet
[08:11:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:13:07] <logmsgbot>	 !log jelto@cumin1001 conftool action : set/pooled=inactive; selector: name=mw1261.eqiad.wmnet
[08:13:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:13:18] <mutante>	 ^ depools are us, working together in a session on how to decom old eqiad hardware
[08:19:12] <wikibugs>	 (CR) Muehlenhoff: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/697605 (owner: Ottomata)
[08:22:03] <logmsgbot>	 !log jelto@cumin1001 conftool action : set/pooled=no; selector: name=mw126[2-6].eqiad.wmnet
[08:22:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:07] <logmsgbot>	 !log jelto@cumin1001 conftool action : set/pooled=inactive; selector: name=mw126[2-6].eqiad.wmnet
[08:23:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
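Editor's note: the selector `name=mw126[2-6].eqiad.wmnet` in the depool entries above covers five hosts with a single pattern. The sketch below illustrates this under the assumption that the value is interpreted as a regular expression matched against the whole host name. `matching_hosts` and the candidate list are hypothetical and do not call conftool itself.

```python
import re


def matching_hosts(selector_value, candidates):
    """Return the candidates whose full name matches the selector value as a regex."""
    pattern = re.compile(selector_value)
    return [host for host in candidates if pattern.fullmatch(host)]


if __name__ == "__main__":
    # Hypothetical candidate list spanning the hosts named in the log entries.
    candidates = [f"mw{n}.eqiad.wmnet" for n in range(1260, 1268)]
    for host in matching_hosts(r"mw126[2-6].eqiad.wmnet", candidates):
        print(host)
```

Note that an unescaped `.` in such a pattern matches any character, so it is looser than a literal dot; that does not change the result for host names like these.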
[08:27:42] <wikibugs>	 (CR) Effie Mouzeli: tegola-vector-tiles: add helmfile.d config (8 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/701138 (https://phabricator.wikimedia.org/T283159) (owner: Effie Mouzeli)
[08:27:52] <wikibugs>	 (CR) Effie Mouzeli: tegola-vector-tiles: add caching support (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/701369 (owner: Jgiannelos)
[08:28:00] <wikibugs>	 (CR) Effie Mouzeli: [C: +2] tegola-vector-tiles: add caching support [deployment-charts] - https://gerrit.wikimedia.org/r/701369 (owner: Jgiannelos)
[08:28:33] <logmsgbot>	 !log jelto@cumin1001 START - Cookbook sre.hosts.decommission for hosts mw1261.eqiad.wmnet
[08:28:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:25] <wikibugs>	 (Merged) jenkins-bot: tegola-vector-tiles: add caching support [deployment-charts] - https://gerrit.wikimedia.org/r/701369 (owner: Jgiannelos)
[08:31:03] <wikibugs>	 (CR) Ayounsi: [C: -1] "Indeed! I'm also trying to think on how to not "hard-code" interface names." (4 comments) [homer/public] - https://gerrit.wikimedia.org/r/702446 (https://phabricator.wikimedia.org/T285461) (owner: Cathal Mooney)
[08:31:39] <icinga-wm>	 RECOVERY - Check systemd state on cumin1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:31:50] <wikibugs>	 (CR) Filippo Giunchedi: "LGTM overall, see inline" (1 comment) [alerts] - https://gerrit.wikimedia.org/r/702477 (owner: Dave Pifke)
[08:36:41] <wikibugs>	 (CR) Effie Mouzeli: [C: +2] tegola-vector-tiles: add helmfile.d config [deployment-charts] - https://gerrit.wikimedia.org/r/701138 (https://phabricator.wikimedia.org/T283159) (owner: Effie Mouzeli)
[08:36:56] <wikibugs>	 (PS13) Effie Mouzeli: tegola-vector-tiles: add helmfile.d config [deployment-charts] - https://gerrit.wikimedia.org/r/701138 (https://phabricator.wikimedia.org/T283159)
[08:46:08] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] "I see jbond already made most of the comments I had ready, so they've been amended. LGTM!" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/701530 (https://phabricator.wikimedia.org/T264209) (owner: JMeybohm)
[08:49:22] <wikibugs>	 (PS1) Effie Mouzeli: hieradata: enable TLS on memcached eqiad hosts [puppet] - https://gerrit.wikimedia.org/r/702590 (https://phabricator.wikimedia.org/T271967)
[08:50:57] <logmsgbot>	 !log jelto@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1261.eqiad.wmnet
[08:51:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:51:07] <wikibugs>	 SRE, decommission-hardware, serviceops, Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13  (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jelto@cumin1001 for hosts: `mw1261.eqiad.wmnet` - m...
[08:52:41] <marostegui>	 !log Deploy schema change on s1 eqiad (db1163) master T277123
[08:52:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:52:52] <stashbot>	 T277123: Extend iwlinks.iwl_prefix to VARBINARY(32) - https://phabricator.wikimedia.org/T277123
[08:53:33] <wikibugs>	 (03PS1) 10Effie Mouzeli: hieradata: replace mcrouter proxies in with eqiad hosts [puppet] - 10https://gerrit.wikimedia.org/r/702592 (https://phabricator.wikimedia.org/T271967)
[08:53:37] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:54:41] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:55:54] <wikibugs>	 10SRE, 10Thumbor: Thumbor fails to render PNG with "Failed to convert image convert: IDAT: invalid distance too far back", returns 429 "Too Many Requests" - https://phabricator.wikimedia.org/T285875 (10ema) >>! In T285875#7188988, @Legoktm wrote: > @colewhite and I dug into this, it appears to be an issue with...
[08:57:41] <icinga-wm>	 RECOVERY - Uncommitted DNS changes in Netbox on netbox1001 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[09:01:11] <wikibugs>	 10SRE, 10Traffic: Preserve Server response header when generating custom error page via VCL - https://phabricator.wikimedia.org/T285926 (10ema)
[09:04:20] <wikibugs>	 (03PS1) 10Filippo Giunchedi: Report subprocess stdout/stderr as strings [alerts] - 10https://gerrit.wikimedia.org/r/702593
[09:05:08] <marostegui>	 !log Deploy schema change on s1 eqiad (db1157) master T277123
[09:05:10] <wikibugs>	 (03PS1) 10Lucas Werkmeister (WMDE): Stop using legacy entityNamespaces setting in onSetupAfterCache hook [extensions/Wikibase] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702400 (https://phabricator.wikimedia.org/T285472)
[09:05:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:16] <stashbot>	 T277123: Extend iwlinks.iwl_prefix to VARBINARY(32) - https://phabricator.wikimedia.org/T277123
[09:05:57] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw1265 is CRITICAL: Host mw1265 is not in mediawiki-installation dsh group https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[09:07:24] <wikibugs>	 (03PS2) 10Filippo Giunchedi: Report subprocess stdout/stderr as strings [alerts] - 10https://gerrit.wikimedia.org/r/702593
[09:12:16] <wikibugs>	 (03CR) 10David Caro: puppet.refresh_certs: don't fail if resources changed (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/701876 (owner: 10David Caro)
[09:12:41] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw1264 is CRITICAL: Host mw1264 is not in mediawiki-installation dsh group https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[09:13:05] <wikibugs>	 (03CR) 10David Caro: puppet.refresh_certs: don't fail if resources changed (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/701876 (owner: 10David Caro)
[09:13:47] <wikibugs>	 (03CR) 10David Caro: "This requires https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/701876" [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/702082 (https://phabricator.wikimedia.org/T274498) (owner: 10David Caro)
[09:19:58] <wikibugs>	 (03CR) 10Jforrester: "> Patch Set 1:" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702421 (https://phabricator.wikimedia.org/T260297) (owner: 10Legoktm)
[09:26:33] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw1263 is CRITICAL: Host mw1263 is not in mediawiki-installation dsh group https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[09:28:21] <wikibugs>	 (03PS1) 10Filippo Giunchedi: prometheus: don't deploy alerts to 'global' instance by default [puppet] - 10https://gerrit.wikimedia.org/r/702599 (https://phabricator.wikimedia.org/T284810)
[09:28:53] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+1] "> Patch Set 6:" (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/701938 (owner: 10Jgiannelos)
[09:30:25] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw1262 is CRITICAL: Host mw1262 is not in mediawiki-installation dsh group https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[09:30:28] <wikibugs>	 (03Abandoned) 10Cathal Mooney: Modified version of LibreNMS Prometheus.php to add prefix [software/librenms] - 10https://gerrit.wikimedia.org/r/702438 (https://phabricator.wikimedia.org/T229542) (owner: 10Cathal Mooney)
[09:30:33] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw1266 is CRITICAL: Host mw1266 is not in mediawiki-installation dsh group https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[09:31:08] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30064/console" [puppet] - 10https://gerrit.wikimedia.org/r/702599 (https://phabricator.wikimedia.org/T284810) (owner: 10Filippo Giunchedi)
[09:35:34] <logmsgbot>	 !log jiji@deploy1002 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[09:35:35] <logmsgbot>	 !log jiji@deploy1002 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[09:35:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:35:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:01] <logmsgbot>	 !log jiji@deploy1002 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[09:36:04] <logmsgbot>	 !log jiji@deploy1002 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[09:36:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:37:36] <wikibugs>	 (03PS4) 10JMeybohm: dragonfly: Add dragonfly supernode and client (dfdaemon) modules [puppet] - 10https://gerrit.wikimedia.org/r/701530 (https://phabricator.wikimedia.org/T264209)
[09:38:01] <wikibugs>	 (03CR) 10JMeybohm: dragonfly: Add dragonfly supernode and client (dfdaemon) modules (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/701530 (https://phabricator.wikimedia.org/T264209) (owner: 10JMeybohm)
[09:41:34] <wikibugs>	 10SRE, 10Machine-Learning-Team, 10serviceops: Add the possibility to deploy calico on kubernetes master nodes - https://phabricator.wikimedia.org/T285927 (10elukey)
[09:41:56] <wikibugs>	 10SRE, 10Machine-Learning-Team, 10serviceops: Add the possibility to deploy calico on kubernetes master nodes - https://phabricator.wikimedia.org/T285927 (10elukey)
[09:42:46] <wikibugs>	 10SRE, 10Machine-Learning-Team, 10serviceops, 10Kubernetes: Add the possibility to deploy calico on kubernetes master nodes - https://phabricator.wikimedia.org/T285927 (10JMeybohm)
[09:44:41] <wikibugs>	 10SRE, 10Machine-Learning-Team, 10serviceops, 10Kubernetes: Add the possibility to deploy calico on kubernetes master nodes - https://phabricator.wikimedia.org/T285927 (10JMeybohm)
[09:47:12] <logmsgbot>	 !log jiji@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
[09:47:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:52:14] <wikibugs>	 10SRE, 10Machine-Learning-Team, 10serviceops, 10Kubernetes: Add the possibility to deploy calico on kubernetes master nodes - https://phabricator.wikimedia.org/T285927 (10JMeybohm) I don't like the idea of having another way of how calico-node is run (it's already complex enough). Because of that I'll sugg...
[09:55:07] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+2] maps: Switch buster nodes to nginx-light [puppet] - 10https://gerrit.wikimedia.org/r/702114 (https://phabricator.wikimedia.org/T164456) (owner: 10Muehlenhoff)
[09:55:33] <Amir1>	 !log start of clean up of autoreview logs in ruwiki (T285608)
[09:55:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:41] <stashbot>	 T285608: Stop logging and clean up auto review logs - https://phabricator.wikimedia.org/T285608
[09:56:02] <moritzm>	 !log installing remaining gnutls28 security updates
[09:56:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:56:42] <wikibugs>	 (03PS1) 10David Caro: ceph.keyring: Add requirement for the ceph-common package [puppet] - 10https://gerrit.wikimedia.org/r/702602
[09:57:08] <wikibugs>	 (03PS2) 10David Caro: ceph.keyring: Add requirement for the ceph-common package [puppet] - 10https://gerrit.wikimedia.org/r/702602
[09:58:35] <wikibugs>	 10SRE, 10Machine-Learning-Team, 10serviceops, 10Kubernetes: Add the possibility to deploy calico on kubernetes master nodes - https://phabricator.wikimedia.org/T285927 (10elukey) Definitely, it seems a good way to proceed. The only concern that I have is that our kube masters are lightweight VMs (1 virtual...
[09:58:50] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] ceph.keyring: Add requirement for the ceph-common package [puppet] - 10https://gerrit.wikimedia.org/r/702602 (owner: 10David Caro)
[09:59:59] <Amir1>	 marostegui: buckle up, it's 40M rows being deleted from ruwiki
[10:00:04] <jouncebot>	 mvolz: Time to snap out of that daydream and deploy Services – Citoid /  Zotero. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210701T1000).
[10:05:29] <moritzm>	 !log installing remaining libgcrypt20 security updates
[10:05:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:26] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+2] Switch remaining (stretch) maps hosts to nginx-light [puppet] - 10https://gerrit.wikimedia.org/r/702347 (https://phabricator.wikimedia.org/T164456) (owner: 10Muehlenhoff)
[10:11:44] <apergos>	 40 million??
[10:11:44] <wikibugs>	 (03PS4) 10Ayounsi: Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806)
[10:12:02] <apergos>	 oooohhh wow thank goodness
[10:12:25] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806) (owner: 10Ayounsi)
[10:13:47] <wikibugs>	 (03CR) 10Ayounsi: Move RPKI alerts to Prometheus/AM (032 comments) [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806) (owner: 10Ayounsi)
[10:14:04] <marostegui>	 Amir1: sweeeet
[10:16:22] <wikibugs>	 (03PS1) 10Effie Mouzeli: tegola-vector-tiles: fix values [deployment-charts] - 10https://gerrit.wikimedia.org/r/702604
[10:19:00] <wikibugs>	 10SRE, 10Machine-Learning-Team, 10serviceops, 10Kubernetes: Add the possibility to deploy calico on kubernetes master nodes - https://phabricator.wikimedia.org/T285927 (10JMeybohm) Yeah, maybe. Calico-node runs with a memory limit of 400Mi and CPU requests if 350m but the other components will also take up...
[10:19:35] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] Report subprocess stdout/stderr as strings [alerts] - 10https://gerrit.wikimedia.org/r/702593 (owner: 10Filippo Giunchedi)
[10:20:29] <wikibugs>	 (03PS5) 10Ayounsi: Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806)
[10:20:36] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] tegola-vector-tiles: fix values [deployment-charts] - 10https://gerrit.wikimedia.org/r/702604 (owner: 10Effie Mouzeli)
[10:21:11] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806) (owner: 10Ayounsi)
[10:21:49] <wikibugs>	 (03PS1) 10Filippo Giunchedi: Ship a minimal README.md [alerts] - 10https://gerrit.wikimedia.org/r/702606
[10:22:04] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 03+2 C: 03+2] Ship a minimal README.md [alerts] - 10https://gerrit.wikimedia.org/r/702606 (owner: 10Filippo Giunchedi)
[10:22:19] <effie>	 Amir1: then he will realise that he will never take this space back 
[10:22:26] <effie>	 and cry himself to sleep 
[10:22:27] <effie>	 :p
[10:23:16] <wikibugs>	 (03Merged) 10jenkins-bot: tegola-vector-tiles: fix values [deployment-charts] - 10https://gerrit.wikimedia.org/r/702604 (owner: 10Effie Mouzeli)
[10:24:34] <Amir1>	 effie: honestly, the main problem is that these changes are so massive that if I go too slow, it'll take a month to finish, if I go too fast, it'll bring down replication. This is okay (around 20GB), the image table in commons is around 300GB, that'll take a good month at least
[10:25:18] <wikibugs>	 (03PS5) 10Muehlenhoff: Don't show Kerberos ticket info in general [puppet] - 10https://gerrit.wikimedia.org/r/701512 (https://phabricator.wikimedia.org/T244840)
[10:25:20] <Amir1>	 I don't know why rows read in s6 is high, is the query not very optimized :/
[10:25:46] <effie>	 we know who we're gonna call either way 
[10:25:47] <effie>	 :p
[10:26:40] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Don't show Kerberos ticket info in general [puppet] - 10https://gerrit.wikimedia.org/r/701512 (https://phabricator.wikimedia.org/T244840) (owner: 10Muehlenhoff)
[10:27:07] <logmsgbot>	 !log jiji@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
[10:27:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:36] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/701512 (https://phabricator.wikimedia.org/T244840) (owner: 10Muehlenhoff)
[10:31:25] <icinga-wm>	 RECOVERY - SSH on analytics1069.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:33:11] <wikibugs>	 (03CR) 10Muehlenhoff: "Updated the patch, the CI failure is some unrelated breakage." [puppet] - 10https://gerrit.wikimedia.org/r/701512 (https://phabricator.wikimedia.org/T244840) (owner: 10Muehlenhoff)
[10:33:48] <wikibugs>	 10SRE, 10serviceops, 10Sustainability: Jobrunner on Buster occasional timeout on codfw file upload - https://phabricator.wikimedia.org/T275752 (10fgiunchedi) Another data point, as expected post-switchover the high latency uploads from jobrunners moved from codfw to eqiad since codfw is now active.
[10:35:09] <wikibugs>	 10SRE, 10serviceops, 10Sustainability: Jobrunner on Buster occasional timeout on codfw file upload - https://phabricator.wikimedia.org/T275752 (10fgiunchedi) Also to avoid confusion I'd like to clarify that on the swift side I can't find anything obviously wrong though I don't have the bandwidth to investiga...
[10:35:58] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10SRE-tools: Broken disk on thanos-be1003 but not reported / task not opened - https://phabricator.wikimedia.org/T285662 (10Volans) p:05Triage→03Medium Ack, let's keep it around for now to explore what options we have.
[10:52:55] <wikibugs>	 (03PS6) 10Ayounsi: Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806)
[10:53:38] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806) (owner: 10Ayounsi)
[10:55:23] <wikibugs>	 (03PS1) 10Effie Mouzeli: tegola-vector-tiles: fix values for postgres [deployment-charts] - 10https://gerrit.wikimedia.org/r/702609
[10:57:07] <wikibugs>	 (03PS3) 10Zabe: Avoid using MWNamespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/697851
[10:57:09] <wikibugs>	 (03CR) 10Jgiannelos: [C: 03+1] tegola-vector-tiles: fix values for postgres [deployment-charts] - 10https://gerrit.wikimedia.org/r/702609 (owner: 10Effie Mouzeli)
[10:58:33] <wikibugs>	 (03PS1) 10Gergő Tisza: Welcome tour: Mark as complete when notice is shown [extensions/GrowthExperiments] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/702401 (https://phabricator.wikimedia.org/T284800)
[10:59:09] <wikibugs>	 (03PS7) 10Ayounsi: Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806)
[10:59:51] <wikibugs>	 (03PS1) 10Gergő Tisza: Welcome tour: Mark as complete when notice is shown [extensions/GrowthExperiments] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702402 (https://phabricator.wikimedia.org/T284800)
[10:59:57] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806) (owner: 10Ayounsi)
[11:00:05] <jouncebot>	 Amir1, Lucas_WMDE, apergos, and duesen: It is that lovely time of the day again! You are hereby commanded to deploy EU Backport and Config training. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210701T1100).
[11:00:05] <jouncebot>	 Lucas_WMDE and zabe: A patch you scheduled for EU Backport and Config training is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:06] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] tegola-vector-tiles: fix values for postgres [deployment-charts] - 10https://gerrit.wikimedia.org/r/702609 (owner: 10Effie Mouzeli)
[11:00:10] <Lucas_WMDE>	 o/
[11:00:14] <apergos>	 hello
[11:00:19] <zabe>	 o/
[11:00:21] <apergos>	 no one is signed up for the EU training slot.
[11:00:29] <apergos>	 we will have someone present for the US slot though!  
[11:00:31] <Lucas_WMDE>	 was just about to ask, thanks
[11:00:33] <Lucas_WMDE>	 oh cool!
[11:00:35] <apergos>	 there is only one patch in the window
[11:00:37] <effie>	 I have a sticker if you need one 
[11:00:40] <apergos>	 and you know who :-P
[11:00:50] <Lucas_WMDE>	 I see two patches
[11:00:55] <apergos>	 wut
[11:01:00] <apergos>	 who snuck one in last minute?
[11:01:08] <zabe>	 I did
[11:01:09] <Lucas_WMDE>	 zabe did!
[11:01:15] <Lucas_WMDE>	 looking at it now
[11:01:17] <apergos>	 oh three now!
[11:01:21] <Lucas_WMDE>	 might do this before my own backport to speed it up
[11:01:29] <Lucas_WMDE>	 four!
[11:01:38] <apergos>	 ok, it's nice to have a tiny warning so we can spend some time to read these and make sure they are ok to go
[11:01:40] <Lucas_WMDE>	 (four people, three patches)
[11:01:44] <apergos>	 just saying :-P
[11:01:49] <Lucas_WMDE>	 (the other way around)
[11:02:04] <tgr>	 we didn't want you to feel neglected apergos 
[11:02:27] <apergos>	 thanks soooo much :-P
[11:02:32] <Lucas_WMDE>	 oh, I already reviewed this config patch a month ago :D
[11:02:33] <apergos>	 these all look pretty straight forward
[11:02:41] <Lucas_WMDE>	 rebasing and deploying the MWNamespace one
[11:02:41] <apergos>	 as far as deployment goes.
[11:02:52] <tgr>	 anyway I can do the backports, they will take a while to get through CI though
[11:02:53] <apergos>	 are we all self serve here or who's doing what?
[11:02:58] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Avoid using MWNamespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/697851 (owner: 10Zabe)
[11:03:00] <wikibugs>	 (03Merged) 10jenkins-bot: tegola-vector-tiles: fix values for postgres [deployment-charts] - 10https://gerrit.wikimedia.org/r/702609 (owner: 10Effie Mouzeli)
[11:03:05] <apergos>	 ok tgr you got those
[11:03:20] <wikibugs>	 (03PS1) 10Hnowlan: maps: make maps2010 a buster replica of maps2009 [puppet] - 10https://gerrit.wikimedia.org/r/702615 (https://phabricator.wikimedia.org/T269582)
[11:03:36] <apergos>	 zabe: are you self serve or would you like someone to deploy for you?
[11:03:46] <wikibugs>	 (03PS8) 10Ayounsi: Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806)
[11:04:03] <wikibugs>	 (03Merged) 10jenkins-bot: Avoid using MWNamespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/697851 (owner: 10Zabe)
[11:04:20] <apergos>	 oic. I guess Lucas is doing yours then :-D
[11:04:26] <zabe>	 I can't self serve, I don't have access. But it looks like Lucas is already doing it.
[11:04:32] <Lucas_WMDE>	 yeah I can do it
[11:04:41] <apergos>	 okey dokey!
[11:04:41] <Lucas_WMDE>	 (looked at the deployers list in puppet and didn’t see a zabe there)
[11:04:49] <tgr>	 I might even sneak in two more
[11:04:50] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806) (owner: 10Ayounsi)
[11:05:18] <Lucas_WMDE>	 hm, I just pulled the change to mwdebug1001
[11:05:22] <Lucas_WMDE>	 but we probably want to test on codfw?
[11:05:29] <apergos>	 ah sigh
[11:05:32] <Lucas_WMDE>	 I think I had an outdated version of my `deploy` script that opens all the terminals
[11:06:23] <wikibugs>	 (03PS2) 10Hnowlan: maps: reimage maps2008 as buster replica in new cluster [puppet] - 10https://gerrit.wikimedia.org/r/702099
[11:06:25] <apergos>	 but does the extension have any setting for codfw?
[11:06:29] <Lucas_WMDE>	 ok, now pulled to mwdebug2001
[11:07:20] <logmsgbot>	 !log jiji@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
[11:07:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:07:28] * apergos sees that it does, oh well, shoulda looked before asking
[11:08:07] <Lucas_WMDE>	 looks okay to merge from my side
[11:08:18] <Lucas_WMDE>	 apergos: not sure what you mean by having settings for codfw…
[11:08:21] <apergos>	 zabe, can you test please?
[11:08:27] <zabe>	 Lucas_WMDE: for me everything does look like before (which should be the case), so I think we are good.
[11:08:34] <Lucas_WMDE>	 ok
[11:08:36] <apergos>	 oh just have the mwdebug hosts in the dropdown, Lucas_WMDE, and indeed it does
[11:08:47] <Lucas_WMDE>	 ah, the *browser* extension ^^
[11:08:51] <Lucas_WMDE>	 I thought you meant Wikibase
[11:08:52] <apergos>	 yeah, sorry :-D
[11:08:56] <apergos>	 no no no!
[11:09:10] <Lucas_WMDE>	 syncing
[11:09:38] <Lucas_WMDE>	 (I also looked at the effective excludeNamespaces setting in shell.php and it looked fine, still including tons of odd numbers = talk namespaces)
[11:10:12] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/Wikibase.php: Config: [[gerrit:697851|Avoid using MWNamespace]] (duration: 01m 06s)
[11:10:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:10:22] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Welcome tour: Mark as complete when notice is shown [extensions/GrowthExperiments] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/702401 (https://phabricator.wikimedia.org/T284800) (owner: 10Gergő Tisza)
[11:10:25] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Welcome tour: Mark as complete when notice is shown [extensions/GrowthExperiments] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702402 (https://phabricator.wikimedia.org/T284800) (owner: 10Gergő Tisza)
[11:10:47] <apergos>	 someone's on a roll :-)
[11:10:50] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Stop using legacy entityNamespaces setting in onSetupAfterCache hook [extensions/Wikibase] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702400 (https://phabricator.wikimedia.org/T285472) (owner: 10Lucas Werkmeister (WMDE))
[11:10:53] <icinga-wm>	 PROBLEM - Disk space on releases1002 is CRITICAL: DISK CRITICAL - free space: /srv/docker 5040 MB (3% inode=80%): /srv/docker/overlay2/e08bd827952d234ff75ff3917e9fb0f2e8bf6358f44d847a205186531165ca73/merged 5040 MB (3% inode=80%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=releases1002&var-datasource=eqiad+prometheus/ops
[11:10:55] <Lucas_WMDE>	 just +2ed all the backports
[11:11:04] <Lucas_WMDE>	 I think it’s okay to deploy them in whatever order they merge in
[11:11:17] <Lucas_WMDE>	 Wikibase will probably be slower than GrowthExperiments
[11:11:38] <Lucas_WMDE>	 tgr: do you want to deploy them when they’re merged or should I?
[11:13:23] <tgr>	 Lucas_WMDE: I can deploy them, I have two more coming (but I should probably wait until these are merged, otherwise Zuul will restart)
[11:13:29] <Lucas_WMDE>	 ok
[11:14:05] <tgr>	 I can just wait until you are done with the Wikibase one. There is no window after this so no rush.
[11:14:38] <Lucas_WMDE>	 sounds good
[11:15:10] <Lucas_WMDE>	 I think I’ll take my time testing the Wikibase change on mwdebug, but we can +2 your other backports as soon as the Wikibase change merges, that still gives me ca. 15 minutes ^^
[11:16:20] <apergos>	 um a clarifying question, if we deploy them in a different order than they are merged, won't that mean some rebase shuffling during this window and then again for whoever gets the next patch after these? 
[11:16:34] <apergos>	 oh it's scap. it's rsync, nm
[11:16:42] <hashar>	 Lucas_WMDE: tgr: you can just +2 all the changes and let CI process/merge them. BUT you will have to be careful when you fetch on the deployment server
[11:17:12] <Lucas_WMDE>	 rebase ~~ALL the changes~~ just some of the changes
[11:17:35] <hashar>	 although it might be a bit of a mental burden to have all the patches merged but only fetch the one you want to deploy on the server
[11:17:39] <Lucas_WMDE>	 apergos: it looks like Zuul is enforcing that they’re merged in the same order that they were +2ed, and then we’ll deploy in that order too
[11:17:46] <apergos>	 right
[11:17:50] <Lucas_WMDE>	 I thought it might do them in parallel but they’re chained
[11:17:52] <hashar>	 more or less yes
[11:17:56] <apergos>	 convenient
[11:17:58] <hashar>	 assuming the jobs pass
[11:18:22] <hashar>	 if you +2 A then B then C they will be merged in that order cause they all depend on each other in the CI queue
[11:18:29] <hashar>	 but if A fails for some reason, it is dropped from the CI queue
[11:18:29] <tgr>	 hashar: right, but when Zuul is in the middle of testing a set of patches that have been +2-ed, and you +2 another one, won't it discard the process and start a new CI job for the new set of +2-ed patches?
[11:18:43] <hashar>	 and B and C have all their jobs cancelled and retriggered so they no longer take A into account
[11:18:52] <hashar>	 so you end up with B and C merged in that order but A left out
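The queue behaviour hashar describes above can be sketched as a toy model. This is a minimal sketch, not Zuul's code: `gate` and `passes` are hypothetical names, and the loop runs sequentially, whereas Zuul tests the queued changes speculatively in parallel and retriggers jobs when a change ahead of them fails. Under that simplification the end result should match the description: each change is tested on top of the changes ahead of it that are still in the queue, and a failing change is dropped.

```python
from typing import Callable, List, Sequence, Tuple


def gate(
    queue: Sequence[str],
    passes: Callable[[str, Tuple[str, ...]], bool],
) -> Tuple[List[str], List[str]]:
    """Toy model of a gating queue (hypothetical helper, not a Zuul API).

    `queue` lists changes in the order they were +2ed.
    `passes(change, ahead)` stands in for a CI run of `change` on top of
    the changes in `ahead`.
    Returns (merged, dropped), both in queue order.
    """
    merged: List[str] = []
    dropped: List[str] = []
    for change in queue:
        # Test the change as if every surviving change ahead of it
        # had already been merged.
        if passes(change, tuple(merged)):
            merged.append(change)
        else:
            # A failing change leaves the queue; the changes behind it
            # are then tested without it.
            dropped.append(change)
    return merged, dropped
```

With a queue of A, B, C where only A fails, `gate(["A", "B", "C"], lambda change, ahead: change != "A")` returns `(["B", "C"], ["A"])`: B and C merge in that order and A is left out.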
[11:18:58] <Lucas_WMDE>	 *nod*
[11:19:09] <logmsgbot>	 !log jiji@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
[11:19:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:19:23] <hashar>	 tgr: it puts the +2ed patch on top of the existing queue
[11:19:37] <hashar>	 B is tested in CI as if A had been merged already
[11:20:01] <apergos>	 oh that's nice!
[11:20:04] <apergos>	 TIL
[11:20:08] <tgr>	 I see. Not much point in waiting then, I guess.
[11:20:17] <hashar>	 I should really really do a training about zuul again
[11:20:30] <apergos>	 it is definitely time for a refresher
[11:20:33] <hashar>	 I used to do those years ago but definitely missed that and there is a bit of confusion
[11:20:34] <hashar>	 yeah
[11:20:46] <apergos>	 plus lots of new folks since then 
[11:21:01] <hashar>	 yeah I can't say I am any good at having folks onboarded :^D
[11:21:05] <apergos>	 :-D
[11:21:12] <hashar>	 anyway, the doc is https://zuul-ci.org/docs/zuul/discussion/gating.html 
[11:21:31] <hashar>	 from upstream, which is like 2 major versions above the one we use but that doc still applies
[11:21:36] <hashar>	 (i wrote part of it)
[11:22:01] <apergos>	 👀
[11:22:08] <hashar>	 the bulk of the idea is: one +2s a change A made to mediawiki, then +2s a change B for Wikibase
[11:22:10] <wikibugs>	 10SRE: Please add btullis@wikimedia.org to the analytics-alerts mailing list - https://phabricator.wikimedia.org/T285936 (10BTullis)
[11:22:42] <hashar>	 when it starts processing B, Zuul creates a merge commit of A against master for mediawiki and creates a git ref like refs/zuul/master/B
[11:22:57] <hashar>	 CI then clones Wikibase, fetches change B
[11:23:12] <hashar>	 and clones mediawiki/core, then attempts to fetch refs/zuul/master/B, which has the change A ahead in the queue
[11:23:16] <hashar>	 then run the tests
[11:23:31] <hashar>	 thus the jobs running for B run for code that contains the change A
[11:23:36] <apergos>	 nice!
[11:23:48] <hashar>	 and it does that for the whole chain of changes that are in the gate-and-submit queue
[11:24:09] <hashar>	 adding a Depends-On header in the commit message triggers the same logic
[11:24:22] <hashar>	 so one can test a change as if a change from another repo already got merged
[11:24:33] <wikibugs>	 10SRE, 10Gerrit-Privilege-Requests: Grant Access to mediawiki gerrit group for divec - https://phabricator.wikimedia.org/T285931 (10Jdforrester-WMF)
[11:24:35] <hashar>	 but yeah point taken I should do a presentation
[11:24:45] <apergos>	 I've only added those headers in order to tell devs DON'T MERGE THIS YET
[11:24:47] <apergos>	 :-D
[11:24:54] <hashar>	 maybe to all the engineering folks
[11:24:58] <hashar>	 yeah
[11:24:58] <apergos>	 please do, I will be showing up!
[11:25:11] <hashar>	 so if your change B has a Depends-On: A   and you +2 B
[11:25:31] <hashar>	 Zuul fetches the metadata for change B, notices it depends-on A, and checks whether A got merged or had already been +2ed
[11:25:35] <hashar>	 and if not, it bails out
[11:25:49] <apergos>	 gonna have to start using that a lot more often
[11:26:02] <hashar>	 so essentially B is blocked until either A got merged or is ahead in the queue (ie A received a +2  before B got a +2)
[11:26:25] <hashar>	 it does not apply to operations/dns or operations/puppet though, they use a slightly different workflow
[11:26:28] <Lucas_WMDE>	 yeah, Depends-on is super cool
[11:26:42] <hashar>	 (there is no gate-and-submit , patches are just directly submitted bypassing ci)
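The Depends-On rule hashar describes (B is held until A is either merged or ahead of it in the queue) can be sketched as a single check. This is a minimal sketch under stated assumptions: `can_enter_gate` and its parameters are hypothetical names for illustration, not part of Zuul, and real change identifiers are replaced by plain strings.

```python
from typing import Iterable, Set


def can_enter_gate(
    depends_on: Iterable[str],
    merged: Set[str],
    ahead_in_queue: Iterable[str],
) -> bool:
    """Hypothetical check: may a freshly +2ed change enter the gate?

    `depends_on` holds the changes named in its Depends-On headers.
    The change is allowed in only if every dependency is already merged
    or was +2ed earlier and is therefore ahead of it in the queue.
    """
    ahead = set(ahead_in_queue)
    return all(dep in merged or dep in ahead for dep in depends_on)
```

For example, `can_enter_gate(["A"], merged=set(), ahead_in_queue=[])` is false, so B bails out, while the same call with `ahead_in_queue=["A"]` or `merged={"A"}` is true.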
[11:27:07] <hashar>	 yeah depends-on and the whole gating system ( maybe for a tech department update )  is the killer feature of Zuul
[11:27:22] <Lucas_WMDE>	 gate-and-submit-wmf is nearing completion
[11:27:37] <Lucas_WMDE>	 I think I’ll try to `git fetch` on the deployment host after both GrowthExperiments changes merge, but before Wikibase
[11:27:45] <Lucas_WMDE>	 so then `scap pull` on mwdebug gets the fix on all wikis
[11:27:50] <Lucas_WMDE>	 for easy testin
[11:27:52] <Lucas_WMDE>	 *testing
[11:27:57] <Lucas_WMDE>	 (and then sync twice if it works)
[11:28:30] <hashar>	 I think that even if you pull both changes in mediawiki/core  you can solely submodule update GrowthExperiments and deploy that
[11:28:36] <hashar>	 then submodule update Wikibase and deploy that
[11:28:45] <Lucas_WMDE>	 true, good point
[11:28:48] <hashar>	 should be fine unless you trigger a full sync in which case both repos will be updated
[11:28:54] <Lucas_WMDE>	 (also I forgot that I was going to let tgr deploy these, sorry ^^)
[11:29:05] <hashar>	 that is why I guess folks +2  <wait> ,  pull , submodule update, deploy
[11:29:14] <hashar>	 then +2 <wait again sigh>, pull etc
[11:29:32] <apergos>	 yes exactly
[11:29:33] <tgr>	 Lucas_WMDE: I can wait until you are done, I have two more backports to set up in the meanwhile
[11:29:41] <hashar>	 which is slow but gives a guarantee that the state on the deploy server is always fine / deployable
[11:29:48] <apergos>	 (please do add them to the deployment page for the record!)
[11:29:51] <Lucas_WMDE>	 ok then I’ll set up everything until mwdebug and let you test
[11:29:55] <Lucas_WMDE>	 +1 apergos
[11:30:01] <hashar>	 relying on not updating submodules is a good hack, but might be surprising if something goes south or someone else steps in after
[11:30:41] <apergos>	 yeah you do not want to have to be thinking about anything extra if something is broken
[11:30:46] <hashar>	 and of course, making those tests dramatically faster would help. I did some investigation but have yet to write a problem statement
[11:31:08] <apergos>	 I wonder if some things can be split and run in parallel for a cheap speedup
[11:31:24] <hashar>	 so yeah +2 ing everything and relying on NOT updating the submodules is not documented, cause it is way too fragile
[11:31:51] <hashar>	 the tests we run are overkill, we simply run too many of them and some should only trigger for the repository they actually test
[11:32:02] <tgr>	 you can do `git submodule update extensions/Wikibase` to only update that, regardless of the patches merged.
[11:32:19] <tgr>	 but I don't think there's much drawback to updating all.
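A small Python sketch of the choice tgr describes above: updating a single submodule versus all of them. The helper name `submodule_update_argv` is hypothetical; it only builds the argument list for `git submodule update` and does not run anything or involve scap.

```python
def submodule_update_argv(path=None):
    """Build the argv for `git submodule update`.

    With `path` set (e.g. "extensions/Wikibase"), only that submodule
    is updated; with no path, git updates every submodule.
    """
    argv = ["git", "submodule", "update"]
    if path is not None:
        argv.append(path)
    return argv
```

Passing the result to `subprocess.run(..., check=True)` from inside the checkout would execute it; that step is left out here so the sketch has no side effects.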
[11:32:27] <hashar>	 like we run the whole wikibase test suite for any repos participating in the  wmf-quibble jobs (aka Vector, or Flow or CirrusSearch)
[11:32:39] <tgr>	 the GE patches are all frontend, they won't interfere with the test.
[11:32:48] <hashar>	 so in short, gotta split the tests so that they don't all trigger for any patches
[11:33:13] <logmsgbot>	 !log jelto@cumin1001 START - Cookbook sre.hosts.decommission for hosts mw1262.eqiad.wmnet
[11:33:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:33:34] <wikibugs>	 (03PS1) 10Hnowlan: maps: reimage maps1010 as buster replica of maps1009 [puppet] - 10https://gerrit.wikimedia.org/r/702619 (https://phabricator.wikimedia.org/T269582)
[11:33:43] <elukey>	 !log reboot ml-serve-ctrl100[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
[11:33:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:34:04] <wikibugs>	 (03PS1) 10Gergő Tisza: SuggestedEdits: Return default JS data as 'noresults' [extensions/GrowthExperiments] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/702403 (https://phabricator.wikimedia.org/T285906)
[11:34:19] <wikibugs>	 (03PS1) 10Gergő Tisza: SuggestedEdits: Return default JS data as 'noresults' [extensions/GrowthExperiments] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702404 (https://phabricator.wikimedia.org/T285906)
[11:34:31] <James_F>	 You can mark phpunit tests as @standalone and they'll run in commits on your repo but not on other repos' patches.
[11:34:58] <James_F>	 Unfortunately only Cirrus and Scribunto are using the tag so far. 
[11:35:10] <Lucas_WMDE>	 oh I didn’t know that
[11:35:12] <James_F>	 Moving 90% of Wikibase's tests into @standalone would be so nice.
[11:35:26] <Lucas_WMDE>	 that sounds like something we should do at least for most of our proper unit tests
[11:35:27] <James_F>	 Lucas_WMDE: New as of ~ 15 months ago. 
[11:35:33] <marostegui>	 !log Deploy schema change on s8 eqiad master T276150
[11:35:41] <Lucas_WMDE>	 (but probably not for our integration tests? sometimes we have legitimate issues due to core changes)
[11:35:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:43] <stashbot>	 T276150: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150
[11:35:46] <elukey>	 !log reboot ml-serve-ctrl200[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
[11:35:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:59] <James_F>	 Lucas_WMDE: That'd be great, though right now the main issue is Wikibase's endless selenium tests. Speeding up one of the jobs but not the other won't make anything merge faster.
[11:36:35] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] SuggestedEdits: Return default JS data as 'noresults' [extensions/GrowthExperiments] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/702403 (https://phabricator.wikimedia.org/T285906) (owner: 10Gergő Tisza)
[11:36:39] <James_F>	 Lucas_WMDE: It depends on the nature of the integration test. Unit tests are ultra-fast anyway so there's no point improving that, really.
[11:36:39] <Lucas_WMDE>	 hm, I don’t find any @standalone with codesearch
[11:36:40] <wikibugs>	 (03PS2) 10Hnowlan: maps: make maps1008 a buster replica of maps1009 [puppet] - 10https://gerrit.wikimedia.org/r/702102 (https://phabricator.wikimedia.org/T269582)
[11:36:42] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] SuggestedEdits: Return default JS data as 'noresults' [extensions/GrowthExperiments] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702404 (https://phabricator.wikimedia.org/T285906) (owner: 10Gergő Tisza)
[11:37:43] <Lucas_WMDE>	 ah, @group Standalone?
[11:37:53] <James_F>	 Sorry, yes.
[11:38:01] <James_F>	 PHPunit group, not phpdoc tag.
[11:38:14] <Lucas_WMDE>	 *files away for later*
[11:38:21] <icinga-wm>	 PROBLEM - Check systemd state on ml-serve-ctrl1002 is CRITICAL: CRITICAL - degraded: The following units failed: ifup@ens5.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:38:47] <James_F>	 We reduced Scribunto's test suite from ~ 3 mins to ~ 5 seconds IIRC.
[11:39:13] <icinga-wm>	 RECOVERY - k8s API server requests latencies on ml-serve-ctrl1002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[11:39:42] <wikibugs>	 (03Merged) 10jenkins-bot: Welcome tour: Mark as complete when notice is shown [extensions/GrowthExperiments] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/702401 (https://phabricator.wikimedia.org/T284800) (owner: 10Gergő Tisza)
[11:39:44] <wikibugs>	 (03Merged) 10jenkins-bot: Welcome tour: Mark as complete when notice is shown [extensions/GrowthExperiments] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702402 (https://phabricator.wikimedia.org/T284800) (owner: 10Gergő Tisza)
[11:40:19] <Lucas_WMDE>	 okay, fetched both those backports onto deploy1002
[11:41:07] <Lucas_WMDE>	 tgr: okay, both GrowthExperiments backports should be on mwdebug2001 now
[11:41:11] <Lucas_WMDE>	 can you test them?
[11:41:42] <tgr>	 Lucas_WMDE: there are two more merging now, I'll wait for those
[11:41:51] <tgr>	 unless you need to do a sync-world?
[11:42:09] <icinga-wm>	 PROBLEM - k8s API server requests latencies on ml-serve-ctrl1001 is CRITICAL: instance=10.64.16.202 verb=CREATE https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[11:42:18] <wikibugs>	 10SRE, 10decommission-hardware: decommission maps2002.codfw.wmnet - https://phabricator.wikimedia.org/T285938 (10MoritzMuehlenhoff)
[11:42:20] <Lucas_WMDE>	 no, but I’d still prefer to deploy these in the order they were merged…
[11:42:28] <Lucas_WMDE>	 I didn’t realize you wanted to wait before deploying them
[11:43:36] <tgr>	 I don't think the order matters whatsoever for scap
[11:45:43] <Lucas_WMDE>	 maybe not for scap…
[11:45:59] <tgr>	 is mwmaint2002 the active maintenance server?
[11:46:04] <tgr>	 https://wikitech.wikimedia.org/wiki/Mwmaint2001 seems outdated
[11:46:11] <Lucas_WMDE>	 I think so… I’m on it, at least
[11:46:25] <logmsgbot>	 !log jelto@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1262.eqiad.wmnet
[11:46:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:46:35] <wikibugs>	 10SRE, 10decommission-hardware, 10serviceops, 10Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13  (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jelto@cumin1001 for hosts: `mw1262.eqiad.wmnet` - m...
[11:46:38] <James_F>	 Yes, 1002 in eqiad and 2002 in codfw IIRC.
[11:46:52] <Lucas_WMDE>	 so should I pull, test, sync the Wikibase change? while keeping in mind that the GrowthExperiments changes are still outstanding?
[11:47:47] <wikibugs>	 (03Merged) 10jenkins-bot: Stop using legacy entityNamespaces setting in onSetupAfterCache hook [extensions/Wikibase] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702400 (https://phabricator.wikimedia.org/T285472) (owner: 10Lucas Werkmeister (WMDE))
[11:48:24] <tgr>	 I can test the GE ones in a second.
[11:48:26] <Lucas_WMDE>	 Wikibase change pulled to mwdebug2001, testing
[11:49:10] <tgr>	 But yeah, you follow the exact same process. As long as you only sync the Wikibase directory, other patches won't matter.
[11:49:29] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove Hiera settings for maps2002 [puppet] - 10https://gerrit.wikimedia.org/r/702624 (https://phabricator.wikimedia.org/T285938)
[11:49:41] <tgr>	 The worst that can happen is unrelated errors from the other patches, but in this case there's no risk of that.
[11:50:50] <apergos>	 (note that for the US window, where there is expected to be a trainee, it would be best to do the simpler deploy-as-merged process)
[11:51:57] <Lucas_WMDE>	 okay, I think the Wikibase change is working correctly. syncing
[11:53:44] <tgr>	 apergos: on one hand it's easier to follow, on the other hand "...and now we wait half an hour for the patch to merge" is not super engaging training
[11:54:02] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized php-1.37.0-wmf.12/extensions/Wikibase/: Backport: [[gerrit:702400|Stop using legacy entityNamespaces setting in onSetupAfterCache hook (T285472)]] (duration: 01m 15s)
[11:54:09] <apergos>	 no but that's when you talk about all the things to keep in mind, review git commands together and so on :)
[11:54:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:12] <stashbot>	 T285472: Remove entityNamespaces settings - https://phabricator.wikimedia.org/T285472
[11:55:22] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] SuggestedEdits: Return default JS data as 'noresults' [extensions/GrowthExperiments] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/702403 (https://phabricator.wikimedia.org/T285906) (owner: 10Gergő Tisza)
[11:58:34] <wikibugs>	 (03Merged) 10jenkins-bot: SuggestedEdits: Return default JS data as 'noresults' [extensions/GrowthExperiments] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/702403 (https://phabricator.wikimedia.org/T285906) (owner: 10Gergő Tisza)
[11:58:54] <logmsgbot>	 !log jelto@cumin1001 START - Cookbook sre.hosts.decommission for hosts mw1263.eqiad.wmnet
[11:58:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:59:03] <Lucas_WMDE>	 tgr: do you want to take over or should I pull those changes?
[11:59:04] <tgr>	 file_put_contents(/cache/composer/repo/https---repo.packagist.org/provider-wikimedia$textcat.json): failed to open stream: No space left on device
[11:59:18] <tgr>	 I can take over, thanks
[11:59:24] <Lucas_WMDE>	 ok
[11:59:26] <tgr>	 if CI is willing
[11:59:29] <wikibugs>	 (03Merged) 10jenkins-bot: SuggestedEdits: Return default JS data as 'noresults' [extensions/GrowthExperiments] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702404 (https://phabricator.wikimedia.org/T285906) (owner: 10Gergő Tisza)
[12:00:34] <Lucas_WMDE>	 jouncebot: now
[12:00:34] <jouncebot>	 No deployments scheduled for the next 3 hour(s) and 59 minute(s)
[12:00:39] <Lucas_WMDE>	 ok cool
[12:00:45] <wikibugs>	 (03CR) 10Gergő Tisza: [V: 03+2 C: 03+2] "Oops, meant to do that for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/702403/" [extensions/GrowthExperiments] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/702403 (https://phabricator.wikimedia.org/T285906) (owner: 10Gergő Tisza)
[12:00:50] <apergos>	 oh right. window supposedly over. heh
[12:01:26] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: toolforge: install jobs-framework-cli [puppet] - 10https://gerrit.wikimedia.org/r/702639
[12:01:43] <wikibugs>	 (03CR) 10Gergő Tisza: "...which is this patch. I'm confused, how did this even merge?" [extensions/GrowthExperiments] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/702403 (https://phabricator.wikimedia.org/T285906) (owner: 10Gergő Tisza)
[12:01:59] <Lucas_WMDE>	 tgr: the CI failure was in the test build, not the gate-and-submit build
[12:02:12] <Lucas_WMDE>	 only gate-and-submit matters for merging
[12:02:20] <tgr>	 don't both need to succeed though?
[12:02:27] <Lucas_WMDE>	 not as far as I know
[12:02:46] <Lucas_WMDE>	 gate-and-submit can even complete and merge the change before the regular test build finishes running
[12:02:47] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] toolforge: install jobs-framework-cli [puppet] - 10https://gerrit.wikimedia.org/r/702639 (owner: 10Arturo Borrero Gonzalez)
[12:03:22] <Lucas_WMDE>	 (probably not very common, but I’ve seen it happen)
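A tiny Python sketch of the merge rule as Lucas_WMDE understands it above: only the gate-and-submit result decides whether a change merges, regardless of the separate test build. The function name `will_merge` and the result strings are assumptions for illustration, not Zuul's or Gerrit's actual data model.

```python
def will_merge(pipeline_results):
    """Decide merging from per-pipeline results.

    `pipeline_results` maps a pipeline name to "SUCCESS", "FAILURE",
    or None while it is still running. Only "gate-and-submit" is
    consulted; the "test" pipeline is ignored in this model.
    """
    return pipeline_results.get("gate-and-submit") == "SUCCESS"
```

This matches the confusing case in the log: a change with a failed test build (V: -1) can still merge if gate-and-submit passes, even while the test build is still running.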
[12:09:21] <logmsgbot>	 !log jelto@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1263.eqiad.wmnet
[12:09:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:09:33] <wikibugs>	 10SRE, 10decommission-hardware, 10serviceops, 10Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13  (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jelto@cumin1001 for hosts: `mw1263.eqiad.wmnet` - m...
[12:12:09] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [V: 03+2 C: 03+2] "Overriding jenkins because the failure it reports is not related to this patch." [puppet] - 10https://gerrit.wikimedia.org/r/702639 (owner: 10Arturo Borrero Gonzalez)
[12:15:37] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: toolforge: bastion: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/702642
[12:16:17] <apergos>	 going to wander off for awhile since the actual window is over and waiting for zuul is mind-numbing, as was pointed out earlier :-P  
[12:16:34] <icinga-wm>	 PROBLEM - Check systemd state on stat1007 is CRITICAL: CRITICAL - degraded: The following units failed: performance-asoranking.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:16:55] <Lucas_WMDE>	 are we waiting for zuul? I thought everything’s merged
[12:17:07] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] toolforge: bastion: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/702642 (owner: 10Arturo Borrero Gonzalez)
[12:19:24] <logmsgbot>	 !log jelto@cumin1001 START - Cookbook sre.hosts.decommission for hosts mw[1264-1265].eqiad.wmnet
[12:19:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:20:19] <tgr>	 Lucas_WMDE: no, it just took a while to test
[12:20:30] <Lucas_WMDE>	 ok, no problem
[12:20:40] <logmsgbot>	 !log tgr@deploy1002 Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/: Backport: [[gerrit:702401|Welcome tour: Mark as complete when notice is shown (T284800)]] [[gerrit:702403|SuggestedEdits: Return default JS data as 'noresults' (T285906)]] (duration: 01m 09s)
[12:20:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:20:50] <stashbot>	 T284800: Donors to newcomers: URL parameters - https://phabricator.wikimedia.org/T284800
[12:20:50] <stashbot>	 T285906: [wmf.12-regression] mobile - Suggested edits initial load is not functional - https://phabricator.wikimedia.org/T285906
[12:20:53] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [V: 03+2 C: 03+2] "overriding jenkins, the error it reports is not related to this patch." [puppet] - 10https://gerrit.wikimedia.org/r/702642 (owner: 10Arturo Borrero Gonzalez)
[12:22:44] <wikibugs>	 (03PS9) 10Filippo Giunchedi: Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806) (owner: 10Ayounsi)
[12:22:56] <logmsgbot>	 !log tgr@deploy1002 Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/: Backport: [[gerrit:702402|Welcome tour: Mark as complete when notice is shown (T284800)]] [[gerrit:702404|SuggestedEdits: Return default JS data as 'noresults' (T285906)]] (duration: 01m 08s)
[12:23:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:23:16] <tgr>	 !log EU deploys done
[12:23:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:23:26] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806) (owner: 10Ayounsi)
[12:23:28] <Lucas_WMDE>	 nice, thanks
[12:26:55] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] Switch ncredir to profile::nginx [puppet] - 10https://gerrit.wikimedia.org/r/697799 (owner: 10Muehlenhoff)
[12:27:23] <wikibugs>	 10SRE: Integrate Buster 10.10 point update - https://phabricator.wikimedia.org/T285206 (10MoritzMuehlenhoff)
[12:29:58] <logmsgbot>	 !log jelto@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1264-1265].eqiad.wmnet
[12:30:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:30:08] <wikibugs>	 10SRE, 10decommission-hardware, 10serviceops, 10Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13  (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jelto@cumin1001 for hosts: `mw[1264-1265].eqiad.wmn...
[12:37:44] <logmsgbot>	 !log jiji@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
[12:37:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:38:37] <DannyS712>	 the irc feed for recent changes on en.wikiquote.org is missing a lot of edits (page creation by spambots) though it does detect me deleting the pages. Is this the correct place to report this issue?
[12:39:27] <logmsgbot>	 !log jelto@cumin1001 START - Cookbook sre.hosts.decommission for hosts mw1266.eqiad.wmnet
[12:39:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:49:49] <logmsgbot>	 !log jelto@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1266.eqiad.wmnet
[12:49:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:49:59] <wikibugs>	 10SRE, 10decommission-hardware, 10serviceops, 10Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13  (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jelto@cumin1001 for hosts: `mw1266.eqiad.wmnet` - m...
[12:50:32] <wikibugs>	 (03PS1) 10Elukey: ml_k8s::master: add profile::kubernetes::node [puppet] - 10https://gerrit.wikimedia.org/r/702645 (https://phabricator.wikimedia.org/T285927)
[12:54:59] <wikibugs>	 (03CR) 10Jelto: [V: 03+1] "I decommissioned mw126[1-6].eqiad.wmnet in Rack A5 using the sre.hosts.decommission cookbook." [puppet] - 10https://gerrit.wikimedia.org/r/679527 (https://phabricator.wikimedia.org/T280203) (owner: 10Dzahn)
[12:56:03] <wikibugs>	 (03PS2) 10Elukey: ml_k8s::master: add profile::kubernetes::node [puppet] - 10https://gerrit.wikimedia.org/r/702645 (https://phabricator.wikimedia.org/T285927)
[12:57:50] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+1] Remove Hiera settings for maps2002 [puppet] - 10https://gerrit.wikimedia.org/r/702624 (https://phabricator.wikimedia.org/T285938) (owner: 10Muehlenhoff)
[13:00:55] <wikibugs>	 (03PS1) 10Elukey: Add dummy tokens to ML server master nodes [labs/private] - 10https://gerrit.wikimedia.org/r/702646
[13:01:13] <wikibugs>	 (03CR) 10Elukey: [V: 03+2 C: 03+2] Add dummy tokens to ML server master nodes [labs/private] - 10https://gerrit.wikimedia.org/r/702646 (owner: 10Elukey)
[13:02:18] <wikibugs>	 (03PS3) 10Elukey: ml_k8s::master: add profile::kubernetes::node [puppet] - 10https://gerrit.wikimedia.org/r/702645 (https://phabricator.wikimedia.org/T285927)
[13:02:59] <marostegui>	 !log Deploy schema change on s2 eqiad master T276150
[13:03:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:10] <stashbot>	 T276150: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150
[13:05:11] <wikibugs>	 (03PS1) 10Ottomata: Require python3-pandas for performance asoranking [puppet] - 10https://gerrit.wikimedia.org/r/702647 (https://phabricator.wikimedia.org/T275786)
[13:07:06] <wikibugs>	 (03PS4) 10Elukey: ml_k8s::master: add profile::kubernetes::node [puppet] - 10https://gerrit.wikimedia.org/r/702645 (https://phabricator.wikimedia.org/T285927)
[13:07:40] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ml_k8s::master: add profile::kubernetes::node [puppet] - 10https://gerrit.wikimedia.org/r/702645 (https://phabricator.wikimedia.org/T285927) (owner: 10Elukey)
[13:08:48] <icinga-wm>	 PROBLEM - SSH on logstash2021.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:09:03] <wikibugs>	 (03PS1) 10Ema: varnish: Server response header in custom error pages [puppet] - 10https://gerrit.wikimedia.org/r/702648 (https://phabricator.wikimedia.org/T285926)
[13:09:43] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] Require python3-pandas for performance asoranking [puppet] - 10https://gerrit.wikimedia.org/r/702647 (https://phabricator.wikimedia.org/T275786) (owner: 10Ottomata)
[13:10:09] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Require python3-pandas for performance asoranking [puppet] - 10https://gerrit.wikimedia.org/r/702647 (https://phabricator.wikimedia.org/T275786) (owner: 10Ottomata)
[13:11:46] <icinga-wm>	 PROBLEM - SSH on mw1279.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:15:48] <icinga-wm>	 RECOVERY - Check systemd state on stat1007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:16:13] <wikibugs>	 (03PS5) 10Elukey: ml_k8s::master: add profile::kubernetes::node [puppet] - 10https://gerrit.wikimedia.org/r/702645 (https://phabricator.wikimedia.org/T285927)
[13:17:33] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ml_k8s::master: add profile::kubernetes::node [puppet] - 10https://gerrit.wikimedia.org/r/702645 (https://phabricator.wikimedia.org/T285927) (owner: 10Elukey)
[13:18:01] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Remove Hiera settings for maps2002 [puppet] - 10https://gerrit.wikimedia.org/r/702624 (https://phabricator.wikimedia.org/T285938) (owner: 10Muehlenhoff)
[13:18:10] <wikibugs>	 (03PS1) 10David Caro: backy2: add missing ceph::common dependency to tests [puppet] - 10https://gerrit.wikimedia.org/r/702652
[13:18:12] <wikibugs>	 (03PS1) 10David Caro: wmcs.ceph: remove unused backup role [puppet] - 10https://gerrit.wikimedia.org/r/702653
[13:19:06] <elukey>	 dcaro: o/ from jenkins I see errors like "profile::wmcs::backy2 on debian-10-x86_64 is expected to compile into a catalogue without dependency cycles", something WIP?
[13:20:52] <wikibugs>	 (03PS6) 10Elukey: ml_k8s::master: add profile::kubernetes::node [puppet] - 10https://gerrit.wikimedia.org/r/702645 (https://phabricator.wikimedia.org/T285927)
[13:21:54] <wikibugs>	 (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30072/console" [puppet] - 10https://gerrit.wikimedia.org/r/702645 (https://phabricator.wikimedia.org/T285927) (owner: 10Elukey)
[13:22:13] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ml_k8s::master: add profile::kubernetes::node [puppet] - 10https://gerrit.wikimedia.org/r/702645 (https://phabricator.wikimedia.org/T285927) (owner: 10Elukey)
[13:22:45] <dcaro>	 elukey: yep, got this to fix it https://gerrit.wikimedia.org/r/c/operations/puppet/+/702652
[13:23:00] <dcaro>	 not sure why it did not break when I first merged the previous patch
[13:23:37] <dcaro>	 elukey: do you know if the jenkins job tries to be smart when running the puppet tests and skips some?
[13:25:06] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove maps2002 from conftool [puppet] - 10https://gerrit.wikimedia.org/r/702654 (https://phabricator.wikimedia.org/T285938)
[13:25:43] <elukey>	 dcaro: thanks! No idea :(
[13:28:53] <wikibugs>	 (03CR) 10Elukey: [V: 03+1] "The jenkins failures should be separate, currently being fixed by wmcs :)" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/702645 (https://phabricator.wikimedia.org/T285927) (owner: 10Elukey)
[13:31:45] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review: Preserve Server response header when generating custom error page via VCL - https://phabricator.wikimedia.org/T285926 (10ema) p:05Triage→03Medium
[13:33:28] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Remove maps2002 from conftool [puppet] - 10https://gerrit.wikimedia.org/r/702654 (https://phabricator.wikimedia.org/T285938) (owner: 10Muehlenhoff)
[13:33:55] <wikibugs>	 (03PS1) 10David Caro: wmcs.ceph: Add the new 17, 19 and 20 OSDs [puppet] - 10https://gerrit.wikimedia.org/r/702655 (https://phabricator.wikimedia.org/T285858)
[13:34:39] <wikibugs>	 (03PS3) 10MSantos: maps: fix osm sync directory path [puppet] - 10https://gerrit.wikimedia.org/r/701558
[13:35:33] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.decommission for hosts maps2002.codfw.wmnet
[13:35:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:54] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] wmcs.ceph: Add the new 17, 19 and 20 OSDs [puppet] - 10https://gerrit.wikimedia.org/r/702655 (https://phabricator.wikimedia.org/T285858) (owner: 10David Caro)
[13:36:00] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] maps: fix osm sync directory path [puppet] - 10https://gerrit.wikimedia.org/r/701558 (owner: 10MSantos)
[13:37:19] <wikibugs>	 (03PS4) 10MSantos: maps: fix osm sync directory path [puppet] - 10https://gerrit.wikimedia.org/r/701558
[13:37:36] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: mwdebug: bump mediawiki version [deployment-charts] - 10https://gerrit.wikimedia.org/r/702657
[13:38:43] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] maps: fix osm sync directory path [puppet] - 10https://gerrit.wikimedia.org/r/701558 (owner: 10MSantos)
[13:42:24] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "thanks! looks good. merging. will follow-up with a change for the yaml in hieradata/hosts" [puppet] - 10https://gerrit.wikimedia.org/r/679527 (https://phabricator.wikimedia.org/T280203) (owner: 10Dzahn)
[13:43:15] <wikibugs>	 10SRE, 10Maps, 10Product-Infrastructure-Team-Backlog, 10decommission-hardware, 10Patch-For-Review: decommission maps2002.codfw.wmnet - https://phabricator.wikimedia.org/T285938 (10MSantos)
[13:44:03] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove DHCP record for maps2002 [puppet] - 10https://gerrit.wikimedia.org/r/702658
[13:45:09] <wikibugs>	 (03PS1) 10Dzahn: remove hieradata/hosts files for former eqiad canaries [puppet] - 10https://gerrit.wikimedia.org/r/702659 (https://phabricator.wikimedia.org/T280203)
[13:48:38] <wikibugs>	 (03CR) 10Jelto: [C: 03+1] "lgtm. We just have to remember to recreate these files for the new canaries (and maybe merge them)" [puppet] - 10https://gerrit.wikimedia.org/r/702659 (https://phabricator.wikimedia.org/T280203) (owner: 10Dzahn)
[13:49:01] <wikibugs>	 (03PS1) 10Jgiannelos: Make production images lighter [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/702661
[13:49:08] <wikibugs>	 (03PS1) 10Effie Mouzeli: tegola-vector-tiles: disable probes and enable debugging [deployment-charts] - 10https://gerrit.wikimedia.org/r/702662
[13:49:15] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] remove hieradata/hosts files for former eqiad canaries [puppet] - 10https://gerrit.wikimedia.org/r/702659 (https://phabricator.wikimedia.org/T280203) (owner: 10Dzahn)
[13:49:31] <wikibugs>	 (03PS2) 10Jgiannelos: Reduce production image size [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/702661
[13:50:08] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts maps2002.codfw.wmnet
[13:50:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:50:18] <wikibugs>	 10SRE, 10Maps, 10Product-Infrastructure-Team-Backlog, 10decommission-hardware, 10Patch-For-Review: decommission maps2002.codfw.wmnet - https://phabricator.wikimedia.org/T285938 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: `maps2002.codfw.wmnet` - maps2002.c...
[13:50:23] <wikibugs>	 (03PS2) 10Muehlenhoff: Remove DHCP record for maps2002 [puppet] - 10https://gerrit.wikimedia.org/r/702658
[13:50:48] <wikibugs>	 10SRE, 10decommission-hardware, 10serviceops, 10Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13  (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (10Dzahn)
[13:51:08] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Remove DHCP record for maps2002 [puppet] - 10https://gerrit.wikimedia.org/r/702658 (owner: 10Muehlenhoff)
[13:52:54] <wikibugs>	 (03CR) 10Jgiannelos: [C: 03+1] tegola-vector-tiles: disable probes and enable debugging [deployment-charts] - 10https://gerrit.wikimedia.org/r/702662 (owner: 10Effie Mouzeli)
[13:53:07] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] tegola-vector-tiles: disable probes and enable debugging [deployment-charts] - 10https://gerrit.wikimedia.org/r/702662 (owner: 10Effie Mouzeli)
[13:53:56] <wikibugs>	 10SRE, 10Maps, 10Product-Infrastructure-Team-Backlog, 10decommission-hardware, 10Patch-For-Review: decommission maps2002.codfw.wmnet - https://phabricator.wikimedia.org/T285938 (10MoritzMuehlenhoff)
[13:54:10] <wikibugs>	 10SRE, 10Maps, 10Product-Infrastructure-Team-Backlog, 10decommission-hardware, 10Patch-For-Review: decommission maps2002.codfw.wmnet - https://phabricator.wikimedia.org/T285938 (10MoritzMuehlenhoff) a:03Papaul
[13:54:43] <wikibugs>	 10SRE, 10decommission-hardware, 10serviceops, 10Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13  (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (10Dzahn) @Jclark-ctr @wiki_willy The 6 servers at the bottom of rack A5 (mw1261 through mw1266) have been decomed and...
[13:55:41] <wikibugs>	 (03Merged) 10jenkins-bot: tegola-vector-tiles: disable probes and enable debugging [deployment-charts] - 10https://gerrit.wikimedia.org/r/702662 (owner: 10Effie Mouzeli)
[13:55:56] <wikibugs>	 (03PS2) 10Ema: varnish: Server response header in custom error pages [puppet] - 10https://gerrit.wikimedia.org/r/702648 (https://phabricator.wikimedia.org/T285926)
[13:57:34] <wikibugs>	 10SRE, 10decommission-hardware, 10serviceops, 10Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13  (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (10Jelto)
[13:59:13] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops-radar: (Need By: TBD) rack/setup/install mw14[14-56] - https://phabricator.wikimedia.org/T273915 (10Dzahn) In case it helps here, today we shut down 6 servers in A5 (T280203#7190053), you can replace those with new servers.
[13:59:30] <wikibugs>	 (03PS3) 10Ema: varnish: Server response header in custom error pages [puppet] - 10https://gerrit.wikimedia.org/r/702648 (https://phabricator.wikimedia.org/T285926)
[14:00:28] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] mwdebug: bump mediawiki version [deployment-charts] - 10https://gerrit.wikimedia.org/r/702657 (owner: 10Giuseppe Lavagetto)
[14:00:44] <wikibugs>	 10SRE, 10serviceops: bring 43 new mediawiki appserver in eqiad into production - https://phabricator.wikimedia.org/T279309 (10Jelto)
[14:00:46] <icinga-wm>	 PROBLEM - Check systemd state on deploy1002 is CRITICAL: CRITICAL - degraded: The following units failed: git_pull_charts.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:00:59] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] wmcs.ceph: Add the new 17, 19 and 20 OSDs [puppet] - 10https://gerrit.wikimedia.org/r/702655 (https://phabricator.wikimedia.org/T285858) (owner: 10David Caro)
[14:01:12] <wikibugs>	 (03PS2) 10David Caro: wmcs.ceph: Add the new 17, 19 and 20 OSDs [puppet] - 10https://gerrit.wikimedia.org/r/702655 (https://phabricator.wikimedia.org/T285858)
[14:01:21] <wikibugs>	 (03PS10) 10Ayounsi: Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806)
[14:01:47] <wikibugs>	 (03PS1) 10Effie Mouzeli: tegola-vector-tiles: bump chart version [deployment-charts] - 10https://gerrit.wikimedia.org/r/702663
[14:02:54] <wikibugs>	 (03Merged) 10jenkins-bot: mwdebug: bump mediawiki version [deployment-charts] - 10https://gerrit.wikimedia.org/r/702657 (owner: 10Giuseppe Lavagetto)
[14:07:28] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] tegola-vector-tiles: bump chart version [deployment-charts] - 10https://gerrit.wikimedia.org/r/702663 (owner: 10Effie Mouzeli)
[14:09:36] <icinga-wm>	 RECOVERY - SSH on logstash2021.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:10:13] <wikibugs>	 (03Merged) 10jenkins-bot: tegola-vector-tiles: bump chart version [deployment-charts] - 10https://gerrit.wikimedia.org/r/702663 (owner: 10Effie Mouzeli)
[14:12:34] <icinga-wm>	 RECOVERY - SSH on mw1279.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:12:38] <wikibugs>	 10SRE, 10Dumps-Generation: Create new group for root access to snapshot*, dumpsdata* and labstore1006,7 with holger in it - https://phabricator.wikimedia.org/T277629 (10ArielGlenn) 05Stalled→03Resolved Hey this is now verified and we're closing. Thanks for your patience, everybody!
[14:17:34] <icinga-wm>	 PROBLEM - Varnish frontend child restarted on cp3059 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/Varnish https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=66&fullscreen&orgId=1&var-server=cp3059&var-datasource=esams+prometheus/ops
[14:17:42] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch ncredir to profile::nginx [puppet] - 10https://gerrit.wikimedia.org/r/697799 (owner: 10Muehlenhoff)
[14:23:34] <wikibugs>	 10SRE, 10Traffic: cp3059 Varnish child crash: Worker Pool Queue does not move - https://phabricator.wikimedia.org/T285953 (10ema)
[14:25:28] <wikibugs>	 (03CR) 10Herron: [C: 03+1] Add btullis to the ops security group [puppet] - 10https://gerrit.wikimedia.org/r/702424 (https://phabricator.wikimedia.org/T285754) (owner: 10Btullis)
[14:27:20] <wikibugs>	 (03PS1) 10Muehlenhoff: Default nginx::profile to light flavour [puppet] - 10https://gerrit.wikimedia.org/r/702669 (https://phabricator.wikimedia.org/T164456)
[14:31:57] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Default nginx::profile to light flavour [puppet] - 10https://gerrit.wikimedia.org/r/702669 (https://phabricator.wikimedia.org/T164456) (owner: 10Muehlenhoff)
[14:33:23] <wikibugs>	 (03PS2) 10Muehlenhoff: Default nginx::profile to light flavour [puppet] - 10https://gerrit.wikimedia.org/r/702669 (https://phabricator.wikimedia.org/T164456)
[14:34:18] <papaul>	 can someone from service ops please respond to https://phabricator.wikimedia.org/T285603
[14:35:33] <wikibugs>	 (03PS1) 10Ema: varnish: do not set reason for 428, 429, 431 and 511 [puppet] - 10https://gerrit.wikimedia.org/r/702671 (https://phabricator.wikimedia.org/T285926)
[14:39:53] <wikibugs>	 (03PS2) 10Ema: varnish: do not set reason for 428, 429, 431 and 511 [puppet] - 10https://gerrit.wikimedia.org/r/702671 (https://phabricator.wikimedia.org/T285926)
[14:41:43] <wikibugs>	 10SRE, 10Platform Engineering, 10SRE-Access-Requests, 10Patch-For-Review: Root access to AQS cluster - https://phabricator.wikimedia.org/T285899 (10herron) Looks reasonable to me, and thanks much for writing the patch!  Typically group changes involving full root access are reviewed/approved during the SRE...
[14:43:00] <wikibugs>	 10SRE, 10Platform Engineering, 10SRE-Access-Requests, 10Patch-For-Review: Root access to AQS cluster - https://phabricator.wikimedia.org/T285899 (10herron) p:05Triage→03Medium
[14:43:15] <wikibugs>	 10SRE, 10Traffic: cp3059 Varnish child crash: Worker Pool Queue does not move - https://phabricator.wikimedia.org/T285953 (10ema) Relevant upstream issues:  - https://github.com/varnishcache/varnish-cache/issues/2814 - https://github.com/varnishcache/varnish-cache/issues/2862  Related patch to look into: https...
[14:43:28] <wikibugs>	 (03PS7) 10Elukey: ml_k8s::master: add profile::kubernetes::node [puppet] - 10https://gerrit.wikimedia.org/r/702645 (https://phabricator.wikimedia.org/T285927)
[14:44:50] <wikibugs>	 10SRE, 10Traffic: cp3059 Varnish child crash: Worker Pool Queue does not move - https://phabricator.wikimedia.org/T285953 (10ema) p:05Triage→03Medium
[14:44:51] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ml_k8s::master: add profile::kubernetes::node [puppet] - 10https://gerrit.wikimedia.org/r/702645 (https://phabricator.wikimedia.org/T285927) (owner: 10Elukey)
[14:45:10] <moritzm>	 !log installing glib2.0 security updates on buster
[14:45:14] <wikibugs>	 (03PS2) 10Herron: add fgoodwin (uid=frankie) to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/702439 (https://phabricator.wikimedia.org/T285580)
[14:45:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:46:36] <icinga-wm>	 RECOVERY - Check systemd state on deploy1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:46:57] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] varnish: do not set reason for 428, 429, 431 and 511 [puppet] - 10https://gerrit.wikimedia.org/r/702671 (https://phabricator.wikimedia.org/T285926) (owner: 10Ema)
[14:51:14] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[14:51:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:24] <logmsgbot>	 !log jiji@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
[14:51:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:52:46] <wikibugs>	 (03CR) 10Herron: [C: 03+2] add fgoodwin (uid=frankie) to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/702439 (https://phabricator.wikimedia.org/T285580) (owner: 10Herron)
[14:53:06] <effie>	 !log depool mw2380 for disk repair - T285603
[14:53:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:14] <stashbot>	 T285603: Degraded RAID on mw2380 - https://phabricator.wikimedia.org/T285603
[14:55:09] <wikibugs>	 10SRE, 10ops-codfw: Degraded RAID on mw2380 - https://phabricator.wikimedia.org/T285603 (10jijiki) @Papaul sorry for the delay, the server can be turned off any time
[14:56:01] <wikibugs>	 10SRE, 10ops-codfw: Degraded RAID on mw2380 - https://phabricator.wikimedia.org/T285603 (10Papaul) Thank you.
[14:57:37] <logmsgbot>	 !log jiji@cumin1001 conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
[14:57:38] <wikibugs>	 10SRE, 10LDAP-Access-Requests, 10Patch-For-Review: Grant Access to ldap/wmf for fgoodwin - https://phabricator.wikimedia.org/T285580 (10herron) 05Open→03Resolved Hi @FGoodwin, your ldap account has been added to group `wmf`.  I'll transition this to resolved now, but please don't hesitate to reopen if an...
[14:57:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:57:54] <wikibugs>	 10SRE: Integrate Buster 10.10 point update - https://phabricator.wikimedia.org/T285206 (10MoritzMuehlenhoff)
[14:58:39] <papaul>	 !log poweroff mw2380 for disk replacement
[14:58:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:02] <wikibugs>	 (03PS2) 10Elukey: backy2: add missing ceph::common dependency to tests [puppet] - 10https://gerrit.wikimedia.org/r/702652 (owner: 10David Caro)
[15:01:06] <icinga-wm>	 PROBLEM - Host mw2380 is DOWN: PING CRITICAL - Packet loss = 100%
[15:01:54] <wikibugs>	 (03PS1) 10David Caro: ceph.keyring: ensure that the bootstrap dir exists [puppet] - 10https://gerrit.wikimedia.org/r/702677
[15:02:14] <wikibugs>	 (03CR) 10Elukey: "Just rebasing to see if jenkins is happy with this change. In case, can we merge to smooth out a bit the current puppet validation checks?" [puppet] - 10https://gerrit.wikimedia.org/r/702652 (owner: 10David Caro)
[15:02:30] <wikibugs>	 (03PS2) 10David Caro: ceph.keyring: ensure that the bootstrap dir exists [puppet] - 10https://gerrit.wikimedia.org/r/702677 (https://phabricator.wikimedia.org/T285858)
[15:02:34] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ceph.keyring: ensure that the bootstrap dir exists [puppet] - 10https://gerrit.wikimedia.org/r/702677 (https://phabricator.wikimedia.org/T285858) (owner: 10David Caro)
[15:03:07] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ceph.keyring: ensure that the bootstrap dir exists [puppet] - 10https://gerrit.wikimedia.org/r/702677 (https://phabricator.wikimedia.org/T285858) (owner: 10David Caro)
[15:03:58] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks fine, but the creation of  a new access group will need discussion/signoff in next SRE meeting (12th of July)." [puppet] - 10https://gerrit.wikimedia.org/r/702452 (https://phabricator.wikimedia.org/T285899) (owner: 10Eevans)
[15:04:04] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] backy2: add missing ceph::common dependency to tests [puppet] - 10https://gerrit.wikimedia.org/r/702652 (owner: 10David Caro)
[15:04:36] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "Some nits inline but overall LGTM, feel free to merge (alerts will auto-deploy)" (033 comments) [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806) (owner: 10Ayounsi)
[15:05:46] <wikibugs>	 (03CR) 10David Caro: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/702652 (owner: 10David Caro)
[15:06:04] <wikibugs>	 (03PS3) 10David Caro: ceph.keyring: ensure that the bootstrap dir exists [puppet] - 10https://gerrit.wikimedia.org/r/702677 (https://phabricator.wikimedia.org/T285858)
[15:06:43] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ceph.keyring: ensure that the bootstrap dir exists [puppet] - 10https://gerrit.wikimedia.org/r/702677 (https://phabricator.wikimedia.org/T285858) (owner: 10David Caro)
[15:07:01] <wikibugs>	 (03PS4) 10David Caro: ceph.keyring: ensure that the bootstrap dir exists [puppet] - 10https://gerrit.wikimedia.org/r/702677 (https://phabricator.wikimedia.org/T285858)
[15:07:02] <icinga-wm>	 RECOVERY - Host mw2380 is UP: PING OK - Packet loss = 0%, RTA = 31.57 ms
[15:08:08] <dcaro>	 elukey: merged the test fix, let me know if it helps
[15:08:37] <elukey>	 <3
[15:09:20] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/702669 (https://phabricator.wikimedia.org/T164456) (owner: 10Muehlenhoff)
[15:09:32] <wikibugs>	 10SRE, 10ops-codfw: Degraded RAID on mw2380 - https://phabricator.wikimedia.org/T285603 (10Papaul) 05Open→03Resolved Disk replaced. Please go ahead and re-image the server.  thanks
[15:09:34] <icinga-wm>	 PROBLEM - PHP7 jobrunner on mw2380 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[15:10:17] <wikibugs>	 (03PS8) 10Elukey: ml_k8s::master: add profile::kubernetes::node [puppet] - 10https://gerrit.wikimedia.org/r/702645 (https://phabricator.wikimedia.org/T285927)
[15:10:46] <icinga-wm>	 PROBLEM - Check systemd state on snapshot1008 is CRITICAL: CRITICAL - degraded: The following units failed: cirrussearch-dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:11:44] <icinga-wm>	 PROBLEM - PHP7 rendering on mw2380 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[15:12:38] <icinga-wm>	 PROBLEM - SSH on mw2380 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[15:15:08] <icinga-wm>	 PROBLEM - Host mw2380 is DOWN: PING CRITICAL - Packet loss = 100%
[15:15:53] <wikibugs>	 (03CR) 10Bstorm: "Are there cases where the path does not have a directory to create? I only ask because this seems more clever than explicit. It might be a" [puppet] - 10https://gerrit.wikimedia.org/r/702677 (https://phabricator.wikimedia.org/T285858) (owner: 10David Caro)
[15:19:46] <icinga-wm>	 RECOVERY - Device not healthy -SMART- on mw2380 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=mw2380&var-datasource=codfw+prometheus/ops
[15:25:08] <wikibugs>	 10SRE, 10Maps, 10Product-Infrastructure-Team-Backlog, 10decommission-hardware, 10Patch-For-Review: decommission maps2002.codfw.wmnet - https://phabricator.wikimedia.org/T285938 (10Papaul)
[15:25:50] <wikibugs>	 10SRE, 10Maps, 10Product-Infrastructure-Team-Backlog, 10decommission-hardware, 10Patch-For-Review: decommission maps2002.codfw.wmnet - https://phabricator.wikimedia.org/T285938 (10Papaul) 05Open→03Resolved complete
[15:32:49] <wikibugs>	 (03CR) 10Ema: [C: 03+2] varnish: do not set reason for 428, 429, 431 and 511 [puppet] - 10https://gerrit.wikimedia.org/r/702671 (https://phabricator.wikimedia.org/T285926) (owner: 10Ema)
[15:33:07] <wikibugs>	 (03PS2) 10Jdlrobson: Use Vue.js for QuickSurveys on available wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702434 (https://phabricator.wikimedia.org/T285890)
[15:37:50] <wikibugs>	 (03CR) 10Ayounsi: "Thanks!" (033 comments) [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806) (owner: 10Ayounsi)
[15:38:04] <wikibugs>	 (03PS11) 10Ayounsi: Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806)
[15:38:47] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806) (owner: 10Ayounsi)
[15:47:59] <wikibugs>	 (03CR) 10MSantos: [C: 03+2] Unify production server and pregeneration images [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/701529 (owner: 10Jgiannelos)
[15:48:14] <wikibugs>	 (03CR) 10Filippo Giunchedi: Move RPKI alerts to Prometheus/AM (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806) (owner: 10Ayounsi)
[15:49:11] <wikibugs>	 (03PS12) 10Ayounsi: Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806)
[15:49:16] <wikibugs>	 (03Merged) 10jenkins-bot: Unify production server and pregeneration images [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/701529 (owner: 10Jgiannelos)
[15:49:31] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:49:43] <wikibugs>	 (03CR) 10Ayounsi: Move RPKI alerts to Prometheus/AM (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806) (owner: 10Ayounsi)
[15:50:45] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806) (owner: 10Ayounsi)
[15:51:07] <icinga-wm>	 RECOVERY - Check systemd state on ml-serve-ctrl1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:51:23] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:51:35] <wikibugs>	 (03PS13) 10Ayounsi: Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806)
[15:52:31] <wikibugs>	 (03PS3) 10Hnowlan: maps: reimage maps2008 as buster replica in new cluster [puppet] - 10https://gerrit.wikimedia.org/r/702099
[15:52:33] <wikibugs>	 (03PS2) 10Hnowlan: maps: make maps2010 a buster replica of maps2009 [puppet] - 10https://gerrit.wikimedia.org/r/702615 (https://phabricator.wikimedia.org/T269582)
[15:52:35] <wikibugs>	 (03PS1) 10Hnowlan: maps: standardise the maps2.0 config in codfw, remove old nodes [puppet] - 10https://gerrit.wikimedia.org/r/702687 (https://phabricator.wikimedia.org/T269582)
[15:53:29] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] Move RPKI alerts to Prometheus/AM [alerts] - 10https://gerrit.wikimedia.org/r/700649 (https://phabricator.wikimedia.org/T282806) (owner: 10Ayounsi)
[15:54:59] <wikibugs>	 (03PS1) 10Reedy: Revert "Replace depricating method IContextSource::getWikiPage && IContextSource::canUseWikiPage" [extensions/ConfirmEdit] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702707
[15:55:50] <wikibugs>	 (03PS4) 10Hnowlan: maps: reimage maps2008 as buster replica in new cluster [puppet] - 10https://gerrit.wikimedia.org/r/702099
[15:56:27] <wikibugs>	 10SRE, 10ops-eqiad, 10DBA: Degraded RAID on db1129 - https://phabricator.wikimedia.org/T285715 (10Cmjohnson) Ticket opened with Dell
[15:56:53] <wikibugs>	 (03PS2) 10Reedy: Revert "Replace depricating method IContextSource::getWikiPage && IContextSource::canUseWikiPage" [extensions/ConfirmEdit] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702707 (https://phabricator.wikimedia.org/T285959)
[15:57:25] <wikibugs>	 (03PS5) 10Hnowlan: maps: reimage maps2008 as buster replica in new cluster [puppet] - 10https://gerrit.wikimedia.org/r/702099
[15:57:27] <wikibugs>	 (03PS3) 10Hnowlan: maps: make maps2010 a buster replica of maps2009 [puppet] - 10https://gerrit.wikimedia.org/r/702615 (https://phabricator.wikimedia.org/T269582)
[15:57:35] <wikibugs>	 (03CR) 10Reedy: [C: 03+2] Revert "Replace depricating method IContextSource::getWikiPage && IContextSource::canUseWikiPage" [extensions/ConfirmEdit] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702707 (https://phabricator.wikimedia.org/T285959) (owner: 10Reedy)
[15:58:23] <wikibugs>	 (03PS1) 10Ayounsi: Remove old RPKI Grafana alerts [puppet] - 10https://gerrit.wikimedia.org/r/702688 (https://phabricator.wikimedia.org/T282806)
[15:58:57] <wikibugs>	 (03PS1) 10Razzi: Make analytics-hive temporarily point to an-coord1002 [dns] - 10https://gerrit.wikimedia.org/r/702689
[16:00:04] <jouncebot>	 jbond42 and cdanis: Dear deployers, time to do the Puppet request window deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210701T1600).
[16:01:24] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] Make analytics-hive temporarily point to an-coord1002 [dns] - 10https://gerrit.wikimedia.org/r/702689 (owner: 10Razzi)
[16:02:04] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] Remove old RPKI Grafana alerts [puppet] - 10https://gerrit.wikimedia.org/r/702688 (https://phabricator.wikimedia.org/T282806) (owner: 10Ayounsi)
[16:02:19] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] Remove old RPKI Grafana alerts [puppet] - 10https://gerrit.wikimedia.org/r/702688 (https://phabricator.wikimedia.org/T282806) (owner: 10Ayounsi)
[16:05:50] <wikibugs>	 (03CR) 10David Caro: "> Patch Set 4:" [puppet] - 10https://gerrit.wikimedia.org/r/702677 (https://phabricator.wikimedia.org/T285858) (owner: 10David Caro)
[16:06:21] <wikibugs>	 (03PS2) 10Razzi: Make analytics-hive temporarily point to an-coord1002 [dns] - 10https://gerrit.wikimedia.org/r/702689
[16:06:25] <wikibugs>	 (03CR) 10Volans: puppet.refresh_certs: don't fail if resources changed (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/701876 (owner: 10David Caro)
[16:06:31] <wikibugs>	 (03PS6) 10Hnowlan: maps: reimage maps2008 as buster replica in new cluster [puppet] - 10https://gerrit.wikimedia.org/r/702099
[16:07:52] <wikibugs>	 10SRE, 10ops-eqiad, 10User-fgiunchedi: Disk failed on thanos-be1003 - https://phabricator.wikimedia.org/T285664 (10Cmjohnson) A ticket has been created with Dell  You have successfully submitted request SR1063937753.
[16:09:46] <wikibugs>	 (03CR) 10David Caro: puppet.refresh_certs: don't fail if resources changed (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/701876 (owner: 10David Caro)
[16:11:16] <wikibugs>	 (03CR) 10Volans: puppet.refresh_certs: don't fail if resources changed (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/701876 (owner: 10David Caro)
[16:11:54] <vgutierrez>	 !log restart varnish-fe on cp3059 - T285953
[16:12:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:12:04] <stashbot>	 T285953: cp3059 Varnish child crash: Worker Pool Queue does not move - https://phabricator.wikimedia.org/T285953
[16:13:36] <wikibugs>	 (03CR) 10Razzi: [C: 03+2] Make analytics-hive temporarily point to an-coord1002 [dns] - 10https://gerrit.wikimedia.org/r/702689 (owner: 10Razzi)
[16:14:25] <icinga-wm>	 RECOVERY - Varnish frontend child restarted on cp3059 is OK: (C)2 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Varnish https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=66&fullscreen&orgId=1&var-server=cp3059&var-datasource=esams+prometheus/ops
[16:15:25] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[16:18:57] <wikibugs>	 (03CR) 10Btullis: [C: 03+2] Add btullis to the ops security group [puppet] - 10https://gerrit.wikimedia.org/r/702424 (https://phabricator.wikimedia.org/T285754) (owner: 10Btullis)
[16:19:18] <wikibugs>	 (03PS5) 10Btullis: Add btullis to the ops security group [puppet] - 10https://gerrit.wikimedia.org/r/702424 (https://phabricator.wikimedia.org/T285754)
[16:19:24] <wikibugs>	 (03CR) 10Btullis: [V: 03+2 C: 03+2] Add btullis to the ops security group [puppet] - 10https://gerrit.wikimedia.org/r/702424 (https://phabricator.wikimedia.org/T285754) (owner: 10Btullis)
[16:20:02] <brennen>	 !jouncebot now
[16:20:02] <wm-bot>	 a Python reminder bot for deployments. see https://wikitech.wikimedia.org/wiki/Tool:Jouncebot
[16:20:08] <brennen>	 jouncebot now
[16:20:08] <jouncebot>	 For the next 0 hour(s) and 39 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210701T1600)
[16:20:24] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Replace depricating method IContextSource::getWikiPage && IContextSource::canUseWikiPage" [extensions/ConfirmEdit] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702707 (https://phabricator.wikimedia.org/T285959) (owner: 10Reedy)
[16:23:01] <logmsgbot>	 !log reedy@deploy1002 Synchronized php-1.37.0-wmf.12/extensions/ConfirmEdit/SimpleCaptcha/SimpleCaptcha.php: T285959 (duration: 01m 20s)
[16:23:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:23:08] <stashbot>	 T285959: Captcha interface is not shown to unregistered users, page save is not possible - https://phabricator.wikimedia.org/T285959
[16:23:42] <brennen>	 Reedy: all clear?  thinking of rolling back to group0 for T285951, per Krinkle.
[16:23:43] <stashbot>	 T285951: Some section links in search results are redlinks - https://phabricator.wikimedia.org/T285951
[16:23:56] <Reedy>	 Yup
[16:25:06] <brennen>	 cool, thx.
[16:25:37] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:27:33] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:27:57] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.dns.netbox
[16:28:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:56] <logmsgbot>	 !log brennen@deploy1002 rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.12"
[16:29:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:13] <wikibugs>	 (03PS1) 10Brennen Bearnes: Revert "group1 wikis to 1.37.0-wmf.12" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702694
[16:30:15] <wikibugs>	 (03CR) 10Brennen Bearnes: [C: 03+2] Revert "group1 wikis to 1.37.0-wmf.12" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702694 (owner: 10Brennen Bearnes)
[16:30:55] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "group1 wikis to 1.37.0-wmf.12" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702694 (owner: 10Brennen Bearnes)
[16:33:34] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:33:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:47] <icinga-wm>	 RECOVERY - Uncommitted DNS changes in Netbox on netbox1001 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[17:00:05] <jouncebot>	 chrisalbon and accraze: Dear deployers, time to do the Services – Graphoid / ORES deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210701T1700).
[17:00:27] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: EventDispatcher: Ensure we fetch page content from the primary database [extensions/DiscussionTools] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/702708 (https://phabricator.wikimedia.org/T285895)
[17:00:36] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: EventDispatcher: Ensure we fetch page content from the primary database [extensions/DiscussionTools] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702709 (https://phabricator.wikimedia.org/T285895)
[17:13:45] <wikibugs>	 10SRE, 10serviceops, 10Patch-For-Review: Delay spinner showing for graphs for 1s - https://phabricator.wikimedia.org/T256641 (10herron) p:05Triage→03Medium
[17:15:35] <wikibugs>	 10SRE, 10observability: mtail testing infrastructure does not report Runtime errors - https://phabricator.wikimedia.org/T285533 (10herron) p:05Triage→03Medium
[17:16:02] <wikibugs>	 10SRE, 10observability, 10good first task: mtail testing infrastructure prints python deprecation warnings - https://phabricator.wikimedia.org/T285534 (10herron) p:05Triage→03Medium
[17:17:36] <wikibugs>	 10SRE, 10observability, 10User-fgiunchedi: Thanos bucket operations sporadic errors - https://phabricator.wikimedia.org/T285835 (10herron) p:05Triage→03High
[17:18:36] <wikibugs>	 10SRE, 10Machine-Learning-Team, 10serviceops, 10Kubernetes, 10Patch-For-Review: Add the possibility to deploy calico on kubernetes master nodes - https://phabricator.wikimedia.org/T285927 (10herron) p:05Triage→03Medium
[17:18:44] <wikibugs>	 (03CR) 10Ayounsi: "This change is ready for review." [homer/public] - 10https://gerrit.wikimedia.org/r/701347 (owner: 10Ayounsi)
[17:19:41] <wikibugs>	 10SRE, 10Commons, 10MediaWiki-File-management, 10SRE-swift-storage, and 4 others: Re-deleting a Commons file: "Error deleting file: The file "mwstore://local-multiwrite/local-deleted/..." is in an inconsistent state within the internal storage backends". - https://phabricator.wikimedia.org/T270994 (10aaron...
[17:20:32] <wikibugs>	 (03PS4) 10Ayounsi: Port labs-in4/6 to Capirca [homer/public] - 10https://gerrit.wikimedia.org/r/701347 (https://phabricator.wikimedia.org/T285461)
[17:22:06] <wikibugs>	 (03CR) 10Majavah: Port labs-in4/6 to Capirca (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/701347 (https://phabricator.wikimedia.org/T285461) (owner: 10Ayounsi)
[17:23:40] <wikibugs>	 (03PS1) 10Bstorm: cloud nfs: set up cloudstore1009 for DRBD [puppet] - 10https://gerrit.wikimedia.org/r/702701 (https://phabricator.wikimedia.org/T224747)
[17:26:26] <wikibugs>	 10SRE: Please add btullis@wikimedia.org to the analytics-alerts mailing list - https://phabricator.wikimedia.org/T285936 (10herron) Hi @BTullis, sure, I've just added you to analytics-alerts and you should be receiving these emails now.  For analytics-announce, a subscription request via https://lists.wikimedia....
[17:27:11] <wikibugs>	 10SRE: Please add btullis@wikimedia.org to the analytics-alerts mailing list - https://phabricator.wikimedia.org/T285936 (10herron) p:05Triage→03Medium
[17:28:00] <wikibugs>	 10SRE, 10Gerrit-Privilege-Requests: Grant Access to mediawiki gerrit group for divec - https://phabricator.wikimedia.org/T285931 (10herron) p:05Triage→03Medium
[17:29:09] <wikibugs>	 10SRE, 10SRE-OnFire, 10observability: Ensure SRE team has a good understanding of how & when to declare an outage on the status page; & it is easy to do so - https://phabricator.wikimedia.org/T285769 (10herron) p:05Triage→03Medium
[17:29:41] <wikibugs>	 10SRE, 10SRE-OnFire, 10observability, 10Patch-For-Review: Automated uploads of minimal & comprehensible timeseries metrics for statuspage display - https://phabricator.wikimedia.org/T285569 (10herron) p:05Triage→03Medium
[17:32:29] <wikibugs>	 10SRE, 10Gerrit-Privilege-Requests: Grant Access to mediawiki gerrit group for divec - https://phabricator.wikimedia.org/T285931 (10Legoktm) @Jdforrester-WMF access to the "mediawiki" group is handled by #mediawiki-gerrit-group-requests per <https://www.mediawiki.org/wiki/Gerrit/Privilege_policy#Requesting_Ger...
[17:34:43] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Redirect https://lists.wikimedia.org/pipermail/foobar/ to https://lists.wikimedia.org/hyperkitty/list/foobar@lists.wikimedia.org/ - https://phabricator.wikimedia.org/T285949 (10herron) p:05Triage→03Medium
[17:34:57] <icinga-wm>	 PROBLEM - Check systemd state on grafana2001 is CRITICAL: CRITICAL - degraded: The following units failed: rsync-var-lib-grafana.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:40:41] <icinga-wm>	 RECOVERY - Check systemd state on grafana2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:40:53] <wikibugs>	 10SRE, 10MediaWiki-Gerrit-Group-Requests: Grant Access to mediawiki gerrit group for divec - https://phabricator.wikimedia.org/T285931 (10Jdforrester-WMF)
[17:42:47] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] cloud nfs: set up cloudstore1009 for DRBD [puppet] - 10https://gerrit.wikimedia.org/r/702701 (https://phabricator.wikimedia.org/T224747) (owner: 10Bstorm)
[17:48:31] <wikibugs>	 10SRE, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to analytics cluster for Ben Tullis - https://phabricator.wikimedia.org/T285754 (10BTullis) Thanks. I can confirm that I've now been able to access puppetmasters and other servers requiring `ops` group membership.  One thing that doesn't...
[17:52:18] <wikibugs>	 (03PS1) 10Ahmon Dancy: collect both version and tag from wikiversions output [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702704
[17:55:04] <wikibugs>	 (03PS5) 10Ayounsi: Port labs-in4/6 to Capirca [homer/public] - 10https://gerrit.wikimedia.org/r/701347 (https://phabricator.wikimedia.org/T285461)
[17:55:23] <wikibugs>	 10SRE, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to analytics cluster for Ben Tullis - https://phabricator.wikimedia.org/T285754 (10BTullis) Also LibreNMS and Logstash authentication don't seem to let me in. Neither is urgent, just thought I'd let you know in case there is anything els...
[17:56:15] <icinga-wm>	 PROBLEM - SSH on mw1284.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: #bothumor I � Unicode. All rise for Morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210701T1800).
[18:00:05] <jouncebot>	 Jdlrobson and MatmaRex: A patch you scheduled for Morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[18:00:12] <MatmaRex>	 hiii
[18:00:15] <urbanecm>	 I can deploy today
[18:00:26] <urbanecm>	 Jdlrobson: around?
[18:01:05] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] EventDispatcher: Ensure we fetch page content from the primary database [extensions/DiscussionTools] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/702708 (https://phabricator.wikimedia.org/T285895) (owner: 10Bartosz Dziewoński)
[18:01:14] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] EventDispatcher: Ensure we fetch page content from the primary database [extensions/DiscussionTools] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702709 (https://phabricator.wikimedia.org/T285895) (owner: 10Bartosz Dziewoński)
[18:02:28] <urbanecm>	 MatmaRex: i'll ping you once it's ready to be tested
[18:02:46] <MatmaRex>	 thanks
[18:03:24] <MatmaRex>	 urbanecm: i can test the happy path, but the real verification will be in whether the exceptions stop (and no new ones appear in their place)
[18:03:56] <urbanecm>	 ack. The point of the test in this case is whether it's not _worse_, i guess
[18:04:44] <MatmaRex>	 yeah, just in case we all somehow missed a typo or something
[18:04:56] <MatmaRex>	 i found the "mediawiki-new-errors" logstash dashboard and i'll watch that afterwards
[18:05:55] <wikibugs>	 (03CR) 10Cathal Mooney: [C: 03+1] "LGTM, as discussed the install and cloudcontrol terms could be left out (they'd hit the default allow as the IPs don't match private4 or p" [homer/public] - 10https://gerrit.wikimedia.org/r/701347 (https://phabricator.wikimedia.org/T285461) (owner: 10Ayounsi)
[18:07:32] <wikibugs>	 (03Merged) 10jenkins-bot: EventDispatcher: Ensure we fetch page content from the primary database [extensions/DiscussionTools] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/702708 (https://phabricator.wikimedia.org/T285895) (owner: 10Bartosz Dziewoński)
[18:07:35] <wikibugs>	 (03Merged) 10jenkins-bot: EventDispatcher: Ensure we fetch page content from the primary database [extensions/DiscussionTools] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702709 (https://phabricator.wikimedia.org/T285895) (owner: 10Bartosz Dziewoński)
[18:09:15] <urbanecm>	 MatmaRex: excellent
[18:09:40] <urbanecm>	 Jdlrobson: ping?
[18:10:42] <urbanecm>	 MatmaRex: pulled to mwdebug2001, please have a look.
[18:10:53] <MatmaRex>	 yup
[18:13:18] <MatmaRex>	 urbanecm: seems good, i got a notification for this comment: https://test2.wikipedia.org/wiki/Talk:Main_Page#c-Matma_Rex_test_2021-07-01-2021-07-01T18%3A12%3A00.000Z-Matma_Rex-2021-07-01T18%3A11%3A00.000Z
[18:14:02] <urbanecm>	 MatmaRex: i see `Expectation (writes <=) 0 by MediaWiki::restInPeace not met (actual: 2): query-m: DELETE FROM `echo_unread_wikis` WHERE euw_user = N AND euw_wiki = 'X'` but that's probably caused by Echo, right?
[18:14:44] <MatmaRex>	 hmm, yeah, looks unrelated
[18:15:18] <urbanecm>	 Let me just check if it appears in logs before, and if it does, i'll sync
[18:15:38] <MatmaRex>	 urbanecm: looks like this bug: https://phabricator.wikimedia.org/T219592
[18:16:44] <urbanecm>	 yeah, happens quite a lot
[18:16:46] <urbanecm>	 syncing :)
[18:17:37] <wikibugs>	 (03CR) 10Zfilipin: [C: 03+1] "@Mukunda Modell: feel free to merge!" [phabricator/deployment] (wmf/stable) - 10https://gerrit.wikimedia.org/r/697069 (https://phabricator.wikimedia.org/T274579) (owner: 10Sahilgrewalhere)
[18:18:44] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized php-1.37.0-wmf.11/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: 6d9043087ec421e1321cd6797934928e2651b1c1: EventDispatcher: Ensure we fetch page content from the primary database (T285895) (duration: 01m 14s)
[18:18:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:18:54] <stashbot>	 T285895: ApiUsageException: There is no revision with ID [REDACTED]. - https://phabricator.wikimedia.org/T285895
[18:20:17] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: 654877f92fa18ae766d693630025c69400cad3ac: EventDispatcher: Ensure we fetch page content from the primary database (T285895) (duration: 01m 12s)
[18:20:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:20:27] <urbanecm>	 MatmaRex: here you go. Anything else I can help with?
[18:20:41] <MatmaRex>	 thanks. hopefully nothing else :D
[18:21:04] <urbanecm>	 great :)
[18:21:26] <urbanecm>	 and by the way, thanks for the toolset your team creates. I like them a lot.
[18:22:05] <icinga-wm>	 PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The following units failed: hive-server2.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:22:07] <icinga-wm>	 PROBLEM - Hive Server on an-coord1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hive.service.server.HiveServer2 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive
[18:23:41] <MatmaRex>	 :D
[18:23:59] <icinga-wm>	 RECOVERY - Check systemd state on an-coord1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:24:01] <icinga-wm>	 RECOVERY - Hive Server on an-coord1001 is OK: PROCS OK: 1 process with command name java, args org.apache.hive.service.server.HiveServer2 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive
[18:36:45] <Jdlrobson>	 urbanecm: hey present now
[18:37:05] <urbanecm>	 hey Jdlrobson 
[18:37:22] <urbanecm>	 can you test it in prod somehow?
[18:37:40] <Jdlrobson>	 yep
[18:37:42] <Jdlrobson>	 if you enable it
[18:37:44] <urbanecm>	 great
[18:37:46] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Use Vue.js for QuickSurveys on available wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702434 (https://phabricator.wikimedia.org/T285890) (owner: 10Jdlrobson)
[18:37:52] <urbanecm>	 enable what? the config patch?
[18:38:18] <Jdlrobson>	 yep that one
[18:38:27] <urbanecm>	 good
[18:38:31] <wikibugs>	 (03Merged) 10jenkins-bot: Use Vue.js for QuickSurveys on available wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702434 (https://phabricator.wikimedia.org/T285890) (owner: 10Jdlrobson)
[18:39:03] <urbanecm>	 Jdlrobson: available at mwdebug2001, please have a look
[18:39:13] <Jdlrobson>	 looking
[18:41:02] <Jdlrobson>	 urbanecm: feel free to sync!
[18:41:21] <urbanecm>	 syncing
[18:42:14] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10SecTeam-Processed, 10Security: New production ssh key for sbassett - https://phabricator.wikimedia.org/T285877 (10sbassett) @herron - thanks, confirmed it's working.  I'll make this task public now.
[18:42:26] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10SecTeam-Processed, 10Security: New production ssh key for sbassett - https://phabricator.wikimedia.org/T285877 (10sbassett)
[18:43:03] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 7995f7abe3b94eb0326064cbbd1d3111f8f21365: Use Vue.js for QuickSurveys on available wikis (T285890) (duration: 01m 09s)
[18:43:09] <urbanecm>	 Jdlrobson: should be live!
[18:43:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:13] <stashbot>	 T285890: Remove OOUI surveys and default to Vue.js - https://phabricator.wikimedia.org/T285890
[18:43:13] <urbanecm>	 anything else i can help with?
[18:43:25] <Jdlrobson>	 thanks urbanecm, nothing else I need
[18:43:32] <urbanecm>	 good!
[18:44:02] <Jdlrobson>	 sorry again for the lateness. Obviously need to check my IRC notification settings as something is going wrong there..
[18:44:57] <urbanecm>	 or maybe your browser just blocks irccloud notifications?
[18:45:05] <urbanecm>	 (according to your whois, you're on irccloud)
[18:49:20] <wikibugs>	 10SRE: Redundant bootloaders for software RAID - https://phabricator.wikimedia.org/T215183 (10RobH)
[18:49:32] <wikibugs>	 10SRE: Redundant bootloaders for software RAID - https://phabricator.wikimedia.org/T215183 (10RobH)
[18:49:34] <wikibugs>	 10SRE, 10DC-Ops: documented procedure for replacing disks in software RAID servers - https://phabricator.wikimedia.org/T220842 (10RobH) 05Resolved→03Open
[18:49:36] <wikibugs>	 10SRE, 10DC-Ops: documented procedure for replacing disks in software RAID servers - https://phabricator.wikimedia.org/T220842 (10RobH) 05Open→03Resolved This is now documented on https://wikitech.wikimedia.org/wiki/SRE/Dc-operations/Sw_raid_rebuild_directions  The responsibility of it belonging to the ser...
[18:50:02] <logmsgbot>	 !log otto@deploy1002 Started deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883]
[18:50:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:50:35] <wikibugs>	 (03PS1) 10Ahmon Dancy: Trigger update-train-versions job at end of wmf-publish pipeline [core] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/702710
[18:50:47] <wikibugs>	 (03CR) 10Ahmon Dancy: [C: 03+2] Trigger update-train-versions job at end of wmf-publish pipeline [core] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/702710 (owner: 10Ahmon Dancy)
[18:50:59] <wikibugs>	 (03PS1) 10Razzi: Point analytics-hive to an-coord1001.eqiad.wmnet once again [dns] - 10https://gerrit.wikimedia.org/r/702731
[18:51:24] <wikibugs>	 10SRE, 10DC-Ops: documented procedure for replacing disks in software RAID servers - https://phabricator.wikimedia.org/T220842 (10RobH) a:05RobH→03None
[18:51:37] <wikibugs>	 (03PS2) 10Legoktm: mysql_legacy: Re-add x2 and properly support active/active sections [software/spicerack] - 10https://gerrit.wikimedia.org/r/701474 (https://phabricator.wikimedia.org/T285519)
[18:53:06] <wikibugs>	 (03PS3) 10Legoktm: mysql_legacy: Re-add x2 and properly support active/active sections [software/spicerack] - 10https://gerrit.wikimedia.org/r/701474 (https://phabricator.wikimedia.org/T285519)
[18:53:38] <wikibugs>	 (03Abandoned) 10Legoktm: sre.switchdc.mediawiki: Handle x2 specially [cookbooks] - 10https://gerrit.wikimedia.org/r/701475 (https://phabricator.wikimedia.org/T285519) (owner: 10Legoktm)
[18:54:27] <wikibugs>	 (03CR) 10Razzi: [C: 03+2] Point analytics-hive to an-coord1001.eqiad.wmnet once again [dns] - 10https://gerrit.wikimedia.org/r/702731 (owner: 10Razzi)
[18:55:22] <logmsgbot>	 !log otto@deploy1002 Finished deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883] (duration: 05m 19s)
[18:55:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:57:50] <wikibugs>	 (03PS1) 10Brennen Bearnes: Consistently normalize Title::mFragment before setting [core] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702711 (https://phabricator.wikimedia.org/T285951)
[18:57:56] <wikibugs>	 (03PS1) 10Andrew Bogott: Added a dummy password for profile::openstack::eqiad1::ldap_user_pass [labs/private] - 10https://gerrit.wikimedia.org/r/702732
[18:58:26] <wikibugs>	 (03CR) 10Andrew Bogott: [V: 03+2 C: 03+2] Added a dummy password for profile::openstack::eqiad1::ldap_user_pass [labs/private] - 10https://gerrit.wikimedia.org/r/702732 (owner: 10Andrew Bogott)
[18:58:35] <wikibugs>	 (03CR) 10Ppchelko: [C: 03+1] Consistently normalize Title::mFragment before setting [core] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702711 (https://phabricator.wikimedia.org/T285951) (owner: 10Brennen Bearnes)
[18:58:41] <wikibugs>	 (03PS1) 10Bstorm: cloud nfs: commit solidly to the drbd setup step 1 [puppet] - 10https://gerrit.wikimedia.org/r/702733 (https://phabricator.wikimedia.org/T224747)
[18:59:01] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mysql_legacy: Re-add x2 and properly support active/active sections [software/spicerack] - 10https://gerrit.wikimedia.org/r/701474 (https://phabricator.wikimedia.org/T285519) (owner: 10Legoktm)
[18:59:07] <brennen>	 Pchelolo: i'll go ahead and deploy the above
[18:59:15] <wikibugs>	 10SRE, 10DBA, 10Datacenter-Switchover, 10Patch-For-Review: Figure out how x2 should be handled in DC switchover - https://phabricator.wikimedia.org/T285519 (10Legoktm) Most of the spicerack confusion and trouble is that x2 matches `A:db-core` even  though it's more like parsercache. If it didn't match that...
[18:59:39] <wikibugs>	 (03CR) 10Brennen Bearnes: [C: 03+2] Consistently normalize Title::mFragment before setting [core] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702711 (https://phabricator.wikimedia.org/T285951) (owner: 10Brennen Bearnes)
[18:59:52] <Pchelolo>	 brennen: hopefully it will work
[19:00:04] <jouncebot>	 brennen and marxarelli: How many deployers does it take to do MediaWiki train - American Version deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210701T1900).
[19:00:06] <brennen>	 testable on mwdebug?
[19:00:15] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[19:00:40] <wikibugs>	 (03CR) 10Legoktm: "Not sure what to do about "mccabe: MC0001 / MysqlLegacy.get_core_dbs is too complex (11)"" [software/spicerack] - 10https://gerrit.wikimedia.org/r/701474 (https://phabricator.wikimedia.org/T285519) (owner: 10Legoktm)
[19:02:09] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[19:04:43] <wikibugs>	 (03PS1) 10Andrew Bogott: Added a dummy password for profile::openstack::eqiad1::ldap_user_pass again [labs/private] - 10https://gerrit.wikimedia.org/r/702735
[19:04:50] <wikibugs>	 (03CR) 10Andrew Bogott: [V: 03+2 C: 03+2] Added a dummy password for profile::openstack::eqiad1::ldap_user_pass again [labs/private] - 10https://gerrit.wikimedia.org/r/702735 (owner: 10Andrew Bogott)
[19:05:50] <MatmaRex>	 if this is reported more broadly, it may be a wmf.12 deployment blocker: https://phabricator.wikimedia.org/T285966 - some pages were displaying with missing styles on group1 wikis
[19:06:23] <MatmaRex>	 can anyone have a look and try to reproduce? i reproduced it once, but it fixed itself after refreshing the page.
[19:07:19] <icinga-wm>	 RECOVERY - Disk space on releases1002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=releases1002&var-datasource=eqiad+prometheus/ops
[19:07:38] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] "PCC is correct https://puppet-compiler.wmflabs.org/compiler1002/30081/" [puppet] - 10https://gerrit.wikimedia.org/r/702733 (https://phabricator.wikimedia.org/T224747) (owner: 10Bstorm)
[19:07:54] <brennen>	 MatmaRex: hmm - i do see some weirdness on commons
[19:08:06] <brennen>	 but noting that wmf.12 isn't on group1 at the moment
[19:08:27] <MatmaRex>	 oh. heh
[19:08:45] <MatmaRex>	 well, probably not a blocker then. i'm silly for not checking
[19:09:12] <brennen>	 it did make it to group1 earlier; rolled back around...  hrm, 16:28
[19:09:16] <brennen>	 (UTC)
[19:09:37] <wikibugs>	 (03Merged) 10jenkins-bot: Trigger update-train-versions job at end of wmf-publish pipeline [core] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/702710 (owner: 10Ahmon Dancy)
[19:10:05] <wikibugs>	 (03PS2) 10Andrew Bogott: toolforge: add a profile for installing the disable_tool script [puppet] - 10https://gerrit.wikimedia.org/r/701928 (https://phabricator.wikimedia.org/T170355)
[19:14:53] <wikibugs>	 (03PS3) 10Andrew Bogott: toolforge: add a profile for installing the disable_tool script [puppet] - 10https://gerrit.wikimedia.org/r/701928 (https://phabricator.wikimedia.org/T170355)
[19:16:42] <wikibugs>	 (03PS1) 10Bstorm: cloud nfs: cleaning up the non-drbd setup [puppet] - 10https://gerrit.wikimedia.org/r/702738 (https://phabricator.wikimedia.org/T224747)
[19:16:56] <wikibugs>	 (03PS1) 10Btullis: Grant icinga permissions to btullis [puppet] - 10https://gerrit.wikimedia.org/r/702739 (https://phabricator.wikimedia.org/T285754)
[19:17:55] <wikibugs>	 (03PS2) 10Btullis: Grant icinga permissions to btullis [puppet] - 10https://gerrit.wikimedia.org/r/702739 (https://phabricator.wikimedia.org/T285754)
[19:18:04] <RhinosF1>	 brennen: could it be a cache issue then?
[19:18:16] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] toolforge: add a profile for installing the disable_tool script [puppet] - 10https://gerrit.wikimedia.org/r/701928 (https://phabricator.wikimedia.org/T170355) (owner: 10Andrew Bogott)
[19:18:17] <logmsgbot>	 !log brennen@deploy1002 Synchronized php-1.37.0-wmf.12/.pipeline/config.yaml: Backport: [[gerrit:702168|Trigger update-train-versions job at end of wmf-publish pipeline]] (duration: 01m 08s)
[19:18:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:18:35] <wikibugs>	 10SRE, 10serviceops, 10Datacenter-Switchover: Document communication expectations around planning a DC switchover - https://phabricator.wikimedia.org/T285806 (10wkandek) Thanks everybody for the feedback on the communications for the DC switchover process. We will spend some time this quarter (Q1) in working...
[19:18:39] <brennen>	 RhinosF1: i don't know enough to rule that out
[19:19:58] <wikibugs>	 (03Merged) 10jenkins-bot: Consistently normalize Title::mFragment before setting [core] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702711 (https://phabricator.wikimedia.org/T285951) (owner: 10Brennen Bearnes)
[19:21:18] <brennen>	 (that is, yeah, it feels like something caching related, but i'm not sure how that is or isn't interacting with train.)
[19:23:45] <RhinosF1>	 Jdlrobson: Krinkle ^
[19:24:06] <RhinosF1>	 You both have Minerva changes
[19:24:42] <RhinosF1>	 brennen: my guess is some change causes css to be rendered that is invalid on the previous version
[19:24:51] <RhinosF1>	 So rolling forward is fine but not back
[19:24:58] <brennen>	 that seems plausible.
[19:25:21] <brennen>	 rolling forward ought to provide a test of that, at any rate.
[19:25:25] <RhinosF1>	 There used to be a file somewhere to invalidate cache from a specific time
[19:25:37] <RhinosF1>	 But very quick searches do not find it
[19:25:50] <Krinkle>	 Which patch are we talking about, and how far is it deployed?
[19:26:20] <brennen>	 .12 is currently only on group1, seeing broken link styling on https://commons.m.wikimedia.org/wiki/Main_Page
[19:26:22] <RhinosF1>	 Krinkle: mobile website seems to have styling issues on group 1 wikis post rollback
[19:26:25] <brennen>	 sorry:  only on group0
[19:27:06] <Krinkle>	 is there a task? define broken. It looks ok at a glance
[19:27:20] <RhinosF1>	 Krinkle: https://phabricator.wikimedia.org/T285966
[19:27:42] <Krinkle>	 The first screenshot there is in VisualEditor
[19:27:53] <wikibugs>	 (03CR) 10Ottomata: [C: 03+1] Grant icinga permissions to btullis [puppet] - 10https://gerrit.wikimedia.org/r/702739 (https://phabricator.wikimedia.org/T285754) (owner: 10Btullis)
[19:28:51] <Krinkle>	 what error do you see at https://commons.m.wikimedia.org/wiki/Main_Page ?
[19:29:06] <brennen>	 Krinkle: uploaded a screenshot to the task
[19:29:36] <Krinkle>	 OK, it only happens when logged-out it seems
[19:29:41] <Krinkle>	 I can repro
[19:29:43] <RhinosF1>	 Yeah
[19:29:49] <RhinosF1>	 I can too logged out
[19:30:22] <Krinkle>	 I defer to Jdlrobson. It looks like there are missing styles indeed. It is adding a link background image, but not setting any background position or no-repeat rules, so it just repeats
[19:30:26] <Krinkle>	 and lots of other styles missing as well
[19:30:56] <RhinosF1>	 Krinkle: commons ain't on .12 though yet
[19:30:58] <RhinosF1>	 It was
[19:31:04] <RhinosF1>	 But rolled back
[19:31:12] <RhinosF1>	 So cache must be blameable then
[19:31:14] <Krinkle>	 the cached copy uses wmf.12 indeed
[19:31:24] <Krinkle>	 When I add ?snlala it renders fine logged-out
[19:31:36] <Krinkle>	 so I guess this means Minerva made breaking changes to its style modules
[19:31:44] <RhinosF1>	 Krinkle: do we know how to ban cache
[19:31:54] <Krinkle>	 loading a new module that didn't exist before, with styles that aren't covered by the same module names as before
[19:32:09] <Krinkle>	 usually that needs to be primed first in a separate deployment 7 days ahead
[19:32:36] <wikibugs>	 (03PS1) 10Andrew Bogott: profile::toolforge::disable_tool: fix typos [puppet] - 10https://gerrit.wikimedia.org/r/702745 (https://phabricator.wikimedia.org/T170355)
[19:32:37] <Krinkle>	 wmf.12 cache: <link rel="stylesheet" href="/w/load.php?lang=en&amp;modules=ext.discussionTools.init.styles%7Cext.wikimediaBadges%7Cmediawiki.hlist%7Cmediawiki.ui.button%2Cicon%7Cmobile.init.styles%7Cskins.minerva.base.styles%7Cskins.minerva.content.styles.images%7Cskins.minerva.icons.wikimedia%7Cskins.minerva.mainMenu.icons%2Cstyles%7Cskins.minerva.mainPage.styles&amp;only=styles&amp;skin=minerva"/>
[19:32:44] <RhinosF1>	 Krinkle: there used to be a file to ban a certain time
[19:32:50] <Krinkle>	 wmf.11 query bypass: <link rel="stylesheet" href="/w/load.php?lang=en&amp;modules=ext.discussionTools.init.styles%7Cext.wikimediaBadges%7Cmediawiki.hlist%7Cmediawiki.ui.button%2Cicon%7Cmobile.init.styles%7Cskins.minerva.base.styles%7Cskins.minerva.content.styles%7Cskins.minerva.content.styles.images%7Cskins.minerva.icons.wikimedia%7Cskins.minerva.mainMenu.icons%2Cstyles%7Cskins.minerva.mainPage.styles&amp;only=styles&amp;skin=minerva"/>
[19:32:52] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile::toolforge::disable_tool: fix typos [puppet] - 10https://gerrit.wikimedia.org/r/702745 (https://phabricator.wikimedia.org/T170355) (owner: 10Andrew Bogott)
[19:33:03] <Krinkle>	 RhinosF1: these are not static files, the purge logic for /static as for site logos does not apply here
[19:33:05] <Krinkle>	 this is about HTML caches
[19:33:08] <Krinkle>	 which are part of parser cache etc.
[19:33:21] <RhinosF1>	 Ah
[19:33:25] <Krinkle>	 the HTML loads a stylesheet, but the problem isn't with the stylesheet itself
[19:33:47] <Krinkle>	 OK, so "skins.minerva.content.styles" is missing from the wmf.12 entry
[19:33:52] <Krinkle>	 I may be to blame for this.
[19:34:02] <RhinosF1>	 https://gerrit.wikimedia.org/r/q/0d61c78f
[19:34:06] <RhinosF1>	 You will be
[19:34:14] <logmsgbot>	 !log brennen@deploy1002 Synchronized php-1.37.0-wmf.12/includes/Title.php: Backport: [[gerrit:702711|Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)
[19:34:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:34:21] <stashbot>	 T285951: Some section links in search results are redlinks - https://phabricator.wikimedia.org/T285951
[19:34:34] <wikibugs>	 (03CR) 10Btullis: [C: 03+2] Grant icinga permissions to btullis [puppet] - 10https://gerrit.wikimedia.org/r/702739 (https://phabricator.wikimedia.org/T285754) (owner: 10Btullis)
[19:34:59] <Krinkle>	 Yeah, I should have kept skins.minerva.content.styles defined as-is
[19:35:19] <Krinkle>	 I made it forward-compat but not back-compat, which only surfaces during rollback
[19:35:32] <Krinkle>	 depending on whether roll forward happens quicker than me patching it, I can patch it.
[19:35:52] <wikibugs>	 (03PS2) 10Andrew Bogott: profile::toolforge::disable_tool: standardize on the singular 'disable_tool' [puppet] - 10https://gerrit.wikimedia.org/r/702745 (https://phabricator.wikimedia.org/T170355)
[19:35:53] <Krinkle>	 easy enough to keep it defined, then the new stylesheet urls that are in some HTML caches now will automatically start working
[19:35:55] <brennen>	 Krinkle: other blockers are resolved as soon as a sync finishes here momentarily.
[19:35:57] <logmsgbot>	 !log brennen@deploy1002 Synchronized php-1.37.0-wmf.12/tests/phpunit/includes/TitleMethodsTest.php: Backport: [[gerrit:702711|Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)
[19:36:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:36:07] <Krinkle>	 brennen: ok, I'll leave it then, but as a lesson for next time.
[19:36:26] <brennen>	 k, thanks for investigating all.
[19:36:41] <brennen>	 rolling forward momentarily.
[19:37:26] <wikibugs>	 (03PS1) 10Brennen Bearnes: group1 wikis to 1.37.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702747
[19:37:28] <wikibugs>	 (03CR) 10Brennen Bearnes: [C: 03+2] group1 wikis to 1.37.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702747 (owner: 10Brennen Bearnes)
[19:37:44] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] profile::toolforge::disable_tool: standardize on the singular 'disable_tool' [puppet] - 10https://gerrit.wikimedia.org/r/702745 (https://phabricator.wikimedia.org/T170355) (owner: 10Andrew Bogott)
[19:38:19] <wikibugs>	 (03Merged) 10jenkins-bot: group1 wikis to 1.37.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702747 (owner: 10Brennen Bearnes)
[19:39:51] <logmsgbot>	 !log brennen@deploy1002 rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
[19:39:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:41:04] <logmsgbot>	 !log brennen@deploy1002 Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 12s)
[19:41:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:41:58] <wikibugs>	 10SRE, 10Commons, 10MediaWiki-File-management, 10SRE-swift-storage, and 4 others: Re-deleting a Commons file: "Error deleting file: The file "mwstore://local-multiwrite/local-deleted/..." is in an inconsistent state within the internal storage backends". - https://phabricator.wikimedia.org/T270994 (10Ezarat...
[19:42:48] <wikibugs>	 (03PS1) 10Brennen Bearnes: all wikis to 1.37.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702749
[19:42:50] <wikibugs>	 (03CR) 10Brennen Bearnes: [C: 03+2] all wikis to 1.37.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702749 (owner: 10Brennen Bearnes)
[19:43:31] <wikibugs>	 (03Merged) 10jenkins-bot: all wikis to 1.37.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702749 (owner: 10Brennen Bearnes)
[19:43:37] <dduvall>	 brennen: here and log watching
[19:45:04] <logmsgbot>	 !log brennen@deploy1002 rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.12
[19:45:05] <dduvall>	 saw a strange pulse of UserIdentityValue deprecation errors
[19:45:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:45:14] <dduvall>	 perhaps transient
[19:46:39] <RhinosF1>	 Krinkle: looks perfect again now
[19:46:59] <Krinkle>	 RhinosF1: details at https://phabricator.wikimedia.org/T266361#7191087
[19:47:18] <Krinkle>	 the wmf.12 code was already backwards compatible, so all caches work again now both new and old.
[19:47:33] <Krinkle>	 I forgot to test the rollback scenario, would have been a 1 line fix.
[19:48:54] <RhinosF1>	 Krinkle: no problem, thanks for helping work it out
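
The comparison Krinkle makes at 19:32 to 19:33 (which style module is absent from the cached wmf.12 HTML) can be reproduced mechanically. Below is a minimal Python sketch, not the tooling anyone in the channel used. It assumes ResourceLoader's packed `modules` format, where groups are separated by `|` and comma-separated suffixes share the prefix before the last dot. The helper names `expand_modules` and `modules_from_href` are hypothetical; the two hrefs are the ones pasted in the log, split only to avoid repeating the shared parts.

```python
from html import unescape
from urllib.parse import parse_qs, urlsplit

# Shared head and tail of the two stylesheet hrefs pasted in the log.
HEAD = (
    "/w/load.php?lang=en&amp;modules=ext.discussionTools.init.styles"
    "%7Cext.wikimediaBadges%7Cmediawiki.hlist%7Cmediawiki.ui.button%2Cicon"
    "%7Cmobile.init.styles%7Cskins.minerva.base.styles%7C"
)
TAIL = (
    "skins.minerva.content.styles.images%7Cskins.minerva.icons.wikimedia"
    "%7Cskins.minerva.mainMenu.icons%2Cstyles%7Cskins.minerva.mainPage.styles"
    "&amp;only=styles&amp;skin=minerva"
)
WMF12_HREF = HEAD + TAIL  # href found in the cached wmf.12 HTML
WMF11_HREF = HEAD + "skins.minerva.content.styles%7C" + TAIL  # wmf.11 query bypass


def expand_modules(packed):
    """Expand a packed module list (assumed format, see the text above)."""
    modules = []
    for group in packed.split("|"):
        if "," not in group:
            modules.append(group)
            continue
        # "a.b.c,d" means "a.b.c" and "a.b.d": split on the last dot.
        prefix, _, suffixes = group.rpartition(".")
        for suffix in suffixes.split(","):
            modules.append(f"{prefix}.{suffix}" if prefix else suffix)
    return modules


def modules_from_href(href):
    """Return the module names requested by an HTML-escaped load.php href."""
    query = urlsplit(unescape(href)).query
    return expand_modules(parse_qs(query)["modules"][0])


cached = set(modules_from_href(WMF12_HREF))
missing = [m for m in modules_from_href(WMF11_HREF) if m not in cached]
print(missing)
```

Run against the two hrefs above, the only name in `missing` is `skins.minerva.content.styles`, which matches what Krinkle reports at 19:33:47.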
[20:00:36] <icinga-wm>	 RECOVERY - SSH on mw1284.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:14:40] <icinga-wm>	 PROBLEM - SSH on logstash2021.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:17:13] <wikibugs>	 10SRE: Please add btullis@wikimedia.org to the analytics-alerts mailing list - https://phabricator.wikimedia.org/T285936 (10BTullis) 05Open→03Resolved a:03BTullis Many thanks.
[20:17:32] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:21:37] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] "thanks!  Pretty sure I've created that dir by hand a few times" [puppet] - 10https://gerrit.wikimedia.org/r/702677 (https://phabricator.wikimedia.org/T285858) (owner: 10David Caro)
[20:22:20] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:23:17] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] "> Patch Set 4:" [puppet] - 10https://gerrit.wikimedia.org/r/702677 (https://phabricator.wikimedia.org/T285858) (owner: 10David Caro)
[20:25:30] <wikibugs>	 10SRE, 10Performance-Team, 10serviceops, 10MW-1.36-notes, and 3 others: Enable "/*/mw-with-onhost-tier/" route for MediaWiki where safe - https://phabricator.wikimedia.org/T264604 (10Krinkle) So the problem appears to be bad interactions between WANCache's "pre-emptive regeneration" feature (as prompted by...
[20:51:32] <Amir1>	 I don't know where to ask this but where can I find node12 docker images in WMF's docker-registry? catalog (https://docker-registry.wikimedia.org/v2/_catalog) didn't have any node12 images
[20:57:23] <Amir1>	 found it, it seems it's not in production namespace but I could use the releng namespace for my usecase
[21:04:14] <wikibugs>	 10SRE, 10Thumbor: Thumbor fails to render PNG with "Failed to convert image convert: IDAT: invalid distance too far back", returns 429 "Too Many Requests" - https://phabricator.wikimedia.org/T285875 (10TheDJ) This is a libpng error (via image magick). Likely these images were always problematic, but the proble...
[21:06:44] <wikibugs>	 (03PS2) 10Ahmon Dancy: Use train-versions.json to map from version to image tag [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702704 (https://phabricator.wikimedia.org/T282824)
[21:09:24] <wikibugs>	 (03PS3) 10Ahmon Dancy: Use train-versions.json to map from version to image tag [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702704 (https://phabricator.wikimedia.org/T282824)
[21:13:08] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:15:02] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:23:22] <legoktm>	 Amir1: there's no production node12 image since no one has started moving services to bullseye yet
[21:23:50] <Amir1>	 noted
[21:23:53] <Amir1>	 thanks
[21:24:30] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site={codfw,eqiad} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:26:26] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:27:55] <RhinosF1>	 brennen: still around?
[21:28:50] <brennen>	 RhinosF1: here
[21:29:14] <RhinosF1>	 brennen: does https://en.wikipedia.org/wiki/Special:EmailUser/RhinosF1 say $1 for you
[21:29:20] <RhinosF1>	 Can reproduce for all users
[21:29:45] <brennen>	 say $1 where?
[21:29:50] <legoktm>	 RhinosF1: screenshot please
[21:30:15] <RhinosF1>	 https://usercontent.irccloud-cdn.com/file/Dy8Dwc7T/1625175010.JPG
[21:30:31] <RhinosF1>	 brennen: everywhere that should be my username in the notice
[21:30:34] <RhinosF1>	 legoktm: ^
[21:31:00] <legoktm>	 I see your name properly substituted in the message
[21:31:23] <brennen>	 yeah, i can't repro while logged in
[21:32:39] <thcipriani>	 happening for other users besides yourself?
[21:32:50] <thcipriani>	 (ftr: I can't reproduce either)
[21:33:16] <RhinosF1>	 thcipriani: by the lack of reports I'd guess no
[21:33:23] <RhinosF1>	 legoktm: very strange
[21:33:43] <legoktm>	 RhinosF1: what does ?uselang=qqx say?
[21:33:58] <legoktm>	 it should be "(emailpagetext: RhinosF1)"
[21:34:41] <RhinosF1>	 legoktm: enter the username in https://en.wikipedia.org/wiki/Special:EmailUser
[21:34:46] <wikibugs>	 (03PS4) 10Ahmon Dancy: Use train-versions.json to map from version to image tag [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702704 (https://phabricator.wikimedia.org/T282824)
[21:34:51] <RhinosF1>	 I don't get /username on the end
[21:34:55] <RhinosF1>	 Didn't that used to happen
[21:35:21] <legoktm>	 the URL is off yes, but the top message is still correct
[21:35:41] <wikibugs>	 (03PS1) 10Ryan Kemper: cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754
[21:36:08] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (owner: 10Ryan Kemper)
[21:36:15] <legoktm>	 https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+log/refs/heads/master/includes/specials/SpecialEmailUser.php seems unlikely to be a recent regression in any case
[21:36:22] <RhinosF1>	 legoktm: (emailpagetext: RhinosF1) when I add the /RhinosF1
[21:36:28] <RhinosF1>	 Just takes me to enter username otherwise
[21:36:45] <RhinosF1>	 $1 still on /RhinosF1 too
[21:37:30] <legoktm>	 err. to clarify, when visiting https://en.wikipedia.org/wiki/Special:EmailUser/RhinosF1 you see $1, and when visiting https://en.wikipedia.org/wiki/Special:EmailUser/RhinosF1?uselang=qqx you see (emailpagetext: RhinosF1) ?
[21:37:38] <RhinosF1>	 Yes
[21:38:28] <legoktm>	 and the $1 shows up if you just visit the link?
[21:38:53] <wikibugs>	 (03PS1) 10Ahmon Dancy: Temporarily disable notification for security patch failures [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702755
[21:39:23] <wikibugs>	 (03CR) 10Ahmon Dancy: [C: 03+2] Temporarily disable notification for security patch failures [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702755 (owner: 10Ahmon Dancy)
[21:40:05] <wikibugs>	 (03Merged) 10jenkins-bot: Temporarily disable notification for security patch failures [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702755 (owner: 10Ahmon Dancy)
[21:40:24] <dancy>	 jouncebot help
[21:40:24] <jouncebot>	 **** JounceBot Help ****
[21:40:24] <jouncebot>	 JounceBot is a deployment helper bot for the Wikimedia Foundation.
[21:40:24] <jouncebot>	 You can find my source at https://github.com/mattofak/jouncebot
[21:40:24] <jouncebot>	 Available commands:
[21:40:24] <jouncebot>	  HELP    Print all commands known to the server.
[21:40:25] <jouncebot>	  NEXT    Get the next deployment event(s if they happen at the same time).
[21:40:25] <jouncebot>	  NOW     Get the current deployment event(s) or the time until the next.
[21:40:25] <RhinosF1>	 legoktm: ye
[21:40:25] <jouncebot>	  REFRESH Refresh my knowledge about deployments.
[21:40:32] <dancy>	 jouncebot hnow
[21:40:34] <dancy>	 jouncebot now
[21:40:35] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 19 minute(s)
[21:40:41] <dancy>	 Excellent.
[21:41:46] <legoktm>	 RhinosF1: I'm pretty stumped. file a bug? 
[21:42:32] <legoktm>	 it doesn't seem blocker-worthy unless other people experience it too, but still weird
[21:43:08] <logmsgbot>	 !log dancy@deploy1002 Synchronized .pipeline/config.yaml: Config: [[gerrit:702755|Temporarily disable notification for security patch failures]] (duration: 00m 57s)
[21:43:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:44:28] <RhinosF1>	 https://phabricator.wikimedia.org/T285985
[21:44:59] * bd808 sees the outdated bits of `jouncebot help` and winces
[21:47:31] <wikibugs>	 (03PS5) 10Ahmon Dancy: Use train-versions.json to map from version to image tag [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702704 (https://phabricator.wikimedia.org/T282824)
[21:51:40] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): cloudvirt1038: PCIe error - https://phabricator.wikimedia.org/T276922 (10wiki_willy) Just got off the phone with Dell.  It's escalated on their side, and they're going to sync up tomorrow in figuring out a solution for this, which could very w...
[21:53:45] <wikibugs>	 (03PS1) 10BryanDavis: Update `help` message [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/702758
[21:55:09] <wikibugs>	 (03CR) 10Legoktm: [C: 03+2] Update `help` message [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/702758 (owner: 10BryanDavis)
[21:55:39] <wikibugs>	 (03Merged) 10jenkins-bot: Update `help` message [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/702758 (owner: 10BryanDavis)
[21:58:48] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:58:50] <wikibugs>	 (03PS6) 10Ahmon Dancy: Use train-versions.json to map from version to image tag [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702704 (https://phabricator.wikimedia.org/T282824)
[22:00:26] <wikibugs>	 (03PS7) 10Ahmon Dancy: Use train-versions.json to map from version to image tag [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702704 (https://phabricator.wikimedia.org/T282824)
[22:00:40] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:05:24] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] d/changelog: Prepare for 0.75 release [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/700095 (owner: 10Bstorm)
[22:06:52] <wikibugs>	 (03Merged) 10jenkins-bot: d/changelog: Prepare for 0.75 release [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/700095 (owner: 10Bstorm)
[22:11:23] <wikibugs>	 (03PS8) 10Ahmon Dancy: Use train-versions.json to map from version to image tag [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702704 (https://phabricator.wikimedia.org/T282824)
[22:12:52] <wikibugs>	 (03PS9) 10Ahmon Dancy: Use train-versions.json to map from version to image tag [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702704 (https://phabricator.wikimedia.org/T282824)
[22:14:18] <wikibugs>	 (03PS10) 10Ahmon Dancy: Use train-versions.json to map from version to image tag [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702704 (https://phabricator.wikimedia.org/T282824)
[22:15:19] <wikibugs>	 (03PS11) 10Ahmon Dancy: Use train-versions.json to map from version to image tag [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702704 (https://phabricator.wikimedia.org/T282824)
[22:17:56] <wikibugs>	 (03PS12) 10Ahmon Dancy: Use train-versions.json to map from version to image tag [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702704 (https://phabricator.wikimedia.org/T282824)
[22:24:12] <bd808>	 jouncebot: help
[22:24:12] <jouncebot>	 **** JounceBot Help ****
[22:24:12] <jouncebot>	 JounceBot is a deployment helper bot for the Wikimedia movement.
[22:24:12] <jouncebot>	 Source at: https://gerrit.wikimedia.org/g/wikimedia/bots/jouncebot
[22:24:13] <jouncebot>	 Available commands:
[22:24:13] <jouncebot>	  HELP    Print all commands known to the server.
[22:24:13] <jouncebot>	  NEXT    Get the next deployment event(s if they happen at the same time).
[22:24:13] <jouncebot>	  NOW     Get the current deployment event(s) or the time until the next.
[22:24:14] <jouncebot>	  REFRESH Refresh my knowledge about deployments.
[22:24:23] <bd808>	 that looks a bit better :)
[22:24:28] <dancy>	 Nice work
[22:27:06] <dancy>	 jouncebot now
[22:27:06] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 32 minute(s)
[22:27:26] <wikibugs>	 (03PS1) 10Zabe: Add 'editautoreviewprotected' protection level to hewikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702761 (https://phabricator.wikimedia.org/T275076)
[22:27:35] <urbanecm>	 !log Start server-side upload for 1 video file (T285682)
[22:27:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:27:44] <stashbot>	 T285682: Server side upload for Victorgrigas - https://phabricator.wikimedia.org/T285682
[22:28:07] <wikibugs>	 (03CR) 10Ahmon Dancy: [C: 03+2] "Tested at https://releases-jenkins.wikimedia.org/job/mediawiki-config-pipeline-wmf-publish/197/" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702704 (https://phabricator.wikimedia.org/T282824) (owner: 10Ahmon Dancy)
[22:29:30] <wikibugs>	 (03Merged) 10jenkins-bot: Use train-versions.json to map from version to image tag [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702704 (https://phabricator.wikimedia.org/T282824) (owner: 10Ahmon Dancy)
[22:31:12] <logmsgbot>	 !log dancy@deploy1002 Synchronized .pipeline: Config: [[gerrit:702704|Use train-versions.json to map from version to image tag (T282824)]] (duration: 00m 57s)
[22:31:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:31:20] <stashbot>	 T282824: MW container image build workflow vs docker-registry caching - https://phabricator.wikimedia.org/T282824
[22:33:02] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Redirect https://lists.wikimedia.org/pipermail/foobar/ to https://lists.wikimedia.org/hyperkitty/list/foobar@lists.wikimedia.org/ - https://phabricator.wikimedia.org/T285949 (10Legoktm) a:03Legoktm
[22:36:00] <urbanecm>	 !log Start server-side upload for 1 video file (T285789)
[22:36:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:36:08] <stashbot>	 T285789: Server side upload for 고려 - https://phabricator.wikimedia.org/T285789
[22:37:35] <urbanecm>	 !log Start server-side upload for 1 video file (T285182)
[22:37:41] <wikibugs>	 (03PS3) 10Cwhite: logstash: add ECS transition support for Oslo structured logs [puppet] - 10https://gerrit.wikimedia.org/r/695563 (https://phabricator.wikimedia.org/T234565)
[22:37:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:37:43] <stashbot>	 T285182: Server side upload for PantheraLeo1359531 - https://phabricator.wikimedia.org/T285182
[22:38:15] <wikibugs>	 (03PS2) 10Ryan Kemper: cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754
[22:38:19] <wikibugs>	 (03PS1) 10Legoktm: mailman3: Redirect pipermail list archive index to hyperkitty [puppet] - 10https://gerrit.wikimedia.org/r/702767 (https://phabricator.wikimedia.org/T285949)
[22:39:00] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (owner: 10Ryan Kemper)
[22:39:29] <wikibugs>	 (03PS3) 10Ryan Kemper: cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053)
[22:40:13] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053) (owner: 10Ryan Kemper)
[22:41:02] <wikibugs>	 (03CR) 10Legoktm: [C: 03+2] mailman3: Redirect pipermail list archive index to hyperkitty [puppet] - 10https://gerrit.wikimedia.org/r/702767 (https://phabricator.wikimedia.org/T285949) (owner: 10Legoktm)
[22:41:08] <wikibugs>	 (03CR) 10Ryan Kemper: "PCC looks good. Thanks for all your work on this @Muehlenhoff" [puppet] - 10https://gerrit.wikimedia.org/r/702580 (https://phabricator.wikimedia.org/T164456) (owner: 10Muehlenhoff)
[22:41:12] <wikibugs>	 (03CR) 10Ryan Kemper: [C: 03+2] elastic: Switch to nginx-light [puppet] - 10https://gerrit.wikimedia.org/r/702580 (https://phabricator.wikimedia.org/T164456) (owner: 10Muehlenhoff)
[22:47:01] <wikibugs>	 (03PS4) 10Ryan Kemper: cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053)
[22:47:13] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists, 10Patch-For-Review: Redirect https://lists.wikimedia.org/pipermail/foobar/ to https://lists.wikimedia.org/hyperkitty/list/foobar@lists.wikimedia.org/ - https://phabricator.wikimedia.org/T285949 (10Legoktm) ` km@cashew ~> curl -I "https://lists.wikimedia.org/pipermail/xtools/...
[22:47:38] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053) (owner: 10Ryan Kemper)
[22:50:52] <wikibugs>	 (03PS5) 10Ryan Kemper: cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053)
[22:51:20] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053) (owner: 10Ryan Kemper)
[22:55:20] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] logstash: add ECS transition support for Oslo structured logs [puppet] - 10https://gerrit.wikimedia.org/r/695563 (https://phabricator.wikimedia.org/T234565) (owner: 10Cwhite)
[22:56:13] <wikibugs>	 10SRE, 10Services, 10Wikibase-Quality-Constraints, 10Wikidata, and 3 others: Deploy Shellbox instance (shellbox-constraints) for Wikidata constraint regexes - https://phabricator.wikimedia.org/T285104 (10Addshore) Any idea on a timeline for being able to get this ticket moving? It's blocking T176312 which...
[23:00:05] <jouncebot>	 brennen: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for US Backport and Config training deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210701T2300).
[23:00:05] <jouncebot>	 zabe: A patch you scheduled for US Backport and Config training is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:16] <zabe>	 o/
[23:05:48] <thcipriani>	 hey zabe I was reading the back-and-forth on the task and I can't quite tell what's going on: would it be OK to move this patch to a later window after folks have had some time to review?
[23:06:45] <zabe>	 ok, sounds fair
[23:07:29] <wikibugs>	 (03PS6) 10Ryan Kemper: cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053)
[23:07:58] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053) (owner: 10Ryan Kemper)
[23:08:15] <wikibugs>	 (03PS1) 10Thcipriani: deployment training: readme whitespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702774
[23:08:46] <thcipriani>	 thanks for understanding zabe <3
[23:09:31] <wikibugs>	 (03CR) 10Thcipriani: [C: 03+2] deployment training: readme whitespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702774 (owner: 10Thcipriani)
[23:09:54] <wikibugs>	 (03PS7) 10Ryan Kemper: cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053)
[23:10:14] <wikibugs>	 (03Merged) 10jenkins-bot: deployment training: readme whitespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702774 (owner: 10Thcipriani)
[23:10:29] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053) (owner: 10Ryan Kemper)
[23:14:15] <wikibugs>	 (03PS8) 10Ryan Kemper: cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053)
[23:15:02] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053) (owner: 10Ryan Kemper)
[23:17:16] <icinga-wm>	 RECOVERY - SSH on logstash2021.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:18:32] <wikibugs>	 (03PS9) 10Ryan Kemper: cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053)
[23:19:54] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053) (owner: 10Ryan Kemper)
[23:20:50] <wikibugs>	 (03PS10) 10Ryan Kemper: cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053)
[23:21:38] <logmsgbot>	 !log thcipriani@deploy1002 Synchronized README: Config: [[gerrit:702774|deployment training: readme whitespace]] (duration: 00m 57s)
[23:21:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:22:14] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053) (owner: 10Ryan Kemper)
[23:22:29] <wikibugs>	 (03CR) 10Ryan Kemper: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053) (owner: 10Ryan Kemper)
[23:25:47] <wikibugs>	 (03PS1) 10Thcipriani: Revert "deployment training: readme whitespace" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702777
[23:27:05] <wikibugs>	 (03CR) 10Thcipriani: [C: 03+2] Revert "deployment training: readme whitespace" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702777 (owner: 10Thcipriani)
[23:27:44] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "deployment training: readme whitespace" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702777 (owner: 10Thcipriani)
[23:29:47] <logmsgbot>	 !log thcipriani@deploy1002 Synchronized README: Config: [[gerrit:702777|Revert "deployment training: readme whitespace"]] (duration: 00m 56s)
[23:29:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:35:47] <wikibugs>	 (03CR) 10Cwhite: [C: 03+1] prometheus: don't deploy alerts to 'global' instance by default [puppet] - 10https://gerrit.wikimedia.org/r/702599 (https://phabricator.wikimedia.org/T284810) (owner: 10Filippo Giunchedi)
[23:37:19] <wikibugs>	 (03PS11) 10Ryan Kemper: cirrus: systemd timer for readahead script [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053)
[23:38:38] <icinga-wm>	 PROBLEM - NFS Share Volume Space /srv/scratch on cloudstore1008 is CRITICAL: DISK CRITICAL - free space: /srv/scratch 595580 MB (15% inode=99%): https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage%23NFS_volume_cleanup https://grafana.wikimedia.org/d/50z0i4XWz/tools-overall-nfs-storage-utilization?orgId=1
[23:40:50] <wikibugs>	 (03CR) 10Ryan Kemper: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053) (owner: 10Ryan Kemper)
[23:57:33] <wikibugs>	 (03CR) 10Ebernhardson: [C: 03+1] "Seems overall reasonable. Wish we had a better place for the binary, but i think this will do." [puppet] - 10https://gerrit.wikimedia.org/r/702754 (https://phabricator.wikimedia.org/T264053) (owner: 10Ryan Kemper)