[01:16:04] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:34:56] Bulk output? Again?
[01:35:56] Now mass input
[01:48:16] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: / (spec from root) is CRITICAL: Test spec from root returned the unexpected status 503 (expecting: 200): /_info (retrieve service info) is CRITICAL: Test retrieve service info returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Citoid
[01:52:26] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[02:07:16] (PS1) TrainBranchBot: Branch commit for wmf/1.38.0-wmf.4 [core] (wmf/1.38.0-wmf.4) - https://gerrit.wikimedia.org/r/730138
[02:07:18] (CR) TrainBranchBot: [C: +2] Branch commit for wmf/1.38.0-wmf.4 [core] (wmf/1.38.0-wmf.4) - https://gerrit.wikimedia.org/r/730138 (owner: TrainBranchBot)
[02:22:45] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[02:22:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:28:04] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[02:28:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:36:11] (Merged) jenkins-bot: Branch commit for wmf/1.38.0-wmf.4 [core] (wmf/1.38.0-wmf.4) - https://gerrit.wikimedia.org/r/730138 (owner: TrainBranchBot)
[02:41:46] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[02:41:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:44:24] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[02:44:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:10:26] RECOVERY - dump of s5 in codfw on alert1001 is OK: Last dump for s5 at codfw (db2101.codfw.wmnet:3315) taken on 2021-10-12 00:00:02 (79 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[03:34:46] RECOVERY - dump of s5 in eqiad on alert1001 is OK: Last dump for s5 at eqiad (db1150.eqiad.wmnet:3315) taken on 2021-10-12 00:00:01 (79 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[03:44:49] SRE, Language-Team (Language-2021-October-December): Remove Matxin Key from Production - https://phabricator.wikimedia.org/T292635 (KartikMistry) a:KartikMistry
[03:45:09] !log kartik@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
[03:45:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:48:17] !log kartik@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
[03:48:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:54:09] !log kartik@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
[03:54:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:56:32] !log cxserver: Remove Matxin Key from Production (T292635)
[03:56:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:56:37] T292635: Remove Matxin Key from Production - https://phabricator.wikimedia.org/T292635
[03:57:03] SRE, Language-Team (Language-2021-October-December): Remove Matxin Key from Production - https://phabricator.wikimedia.org/T292635 (KartikMistry)
[03:57:11] SRE, Language-Team (Language-2021-October-December): Remove Matxin Key from Production - https://phabricator.wikimedia.org/T292635 (KartikMistry) p:Triage→Medium
[04:06:58] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 235, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:07:06] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:09:00] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 236, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:09:08] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 132, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:15:06] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 235, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:15:14] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:19:06] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 236, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:19:20] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 132, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:08:40] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 235, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:08:50] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:29:12] SRE, serviceops, Sustainability: Jobrunner on Buster occasional timeout on codfw file upload - https://phabricator.wikimedia.org/T275752 (Joe) p:Medium→High So, this problem clearly hasn't been solved. We need to isolate where the problem is; the easiest way to test is imho as follows: * Tem...
[06:13:07] (CR) Elukey: "Added Filippo to the code review, LGTM but I am a little ignorant about this part of rsyslog :)" [puppet] - https://gerrit.wikimedia.org/r/729957 (https://phabricator.wikimedia.org/T288348) (owner: Btullis)
[06:13:34] (Abandoned) Elukey: Move stat100[5,8] to AMD ROCm 4.3.1 [puppet] - https://gerrit.wikimedia.org/r/726578 (https://phabricator.wikimedia.org/T287267) (owner: Elukey)
[06:13:48] SRE, LDAP-Access-Requests: Grant Access to for - https://phabricator.wikimedia.org/T293053 (Aklapper) Open→Stalled Hi @Jacquelinechen, thanks for taking the time to report this and welcome to Wikimedia Phabricator! Please fill in *all* fields above, and also connect y...
[06:17:14] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 236, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:17:26] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 132, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:20:21] (CR) Elukey: [C: +1] "The change LGTM! On one side, I'd prefer to remember people that they have to insert a password every week (as opposed to every two), but " [puppet] - https://gerrit.wikimedia.org/r/727349 (https://phabricator.wikimedia.org/T268985) (owner: Btullis)
[06:27:52] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:29:42] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 235, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:04:48] (CR) Muehlenhoff: [C: +1] "Looks good to me, I haven't found the time to work on https://phabricator.wikimedia.org/T287763, but we can go ahead with bumping this." [puppet] - https://gerrit.wikimedia.org/r/727349 (https://phabricator.wikimedia.org/T268985) (owner: Btullis)
[07:17:10] (PS1) Majavah: P::toolforge: force remove /srv/composer on buster [puppet] - https://gerrit.wikimedia.org/r/730143
[07:19:36] SRE, LDAP-Access-Requests: Grant Access to for - https://phabricator.wikimedia.org/T293053 (Jacquelinechen) Dear Aklapper, Thank you for your email. I have connected Mediawiki user staff account to the phabricator account. Re missing information, is this what you require? Th...
[07:20:56] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 197 probes of 630 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:22:12] !log installing RT security updates
[07:22:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:24:39] (CR) Giuseppe Lavagetto: "Overall LGTM, I'm just not sure of why you removed a volumemount declaration." [deployment-charts] - https://gerrit.wikimedia.org/r/726933 (https://phabricator.wikimedia.org/T289224) (owner: Majavah)
[07:25:54] jouncebot: nowandnext
[07:25:54] No deployments scheduled for the next 3 hour(s) and 34 minute(s)
[07:25:54] In 3 hour(s) and 34 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211012T1100)
[07:26:29] SRE, LDAP-Access-Requests: Grant Access to for - https://phabricator.wikimedia.org/T293053 (Aklapper) Stalled→Open
[07:27:02] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 37 probes of 630 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:29:44] (CR) Urbanecm: [C: +2] Revert "Mentee overview: Truncate long usernames" [extensions/GrowthExperiments] (wmf/1.38.0-wmf.3) - https://gerrit.wikimedia.org/r/728247 (https://phabricator.wikimedia.org/T292224) (owner: Urbanecm)
[07:29:55] (CR) Urbanecm: [C: +2] UncachedMenteeOverviewDataProvider::getFilteredMenteesForMentor: Cast IDs to ints [extensions/GrowthExperiments] (wmf/1.38.0-wmf.3) - https://gerrit.wikimedia.org/r/729914 (https://phabricator.wikimedia.org/T290609) (owner: Urbanecm)
[07:29:59] (CR) Urbanecm: [C: +2] updateMenteeData: Collect more profiling data [extensions/GrowthExperiments] (wmf/1.38.0-wmf.3) - https://gerrit.wikimedia.org/r/729915 (https://phabricator.wikimedia.org/T290609) (owner: Urbanecm)
[07:33:57] !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host ganeti2025.codfw.wmnet with OS buster
[07:34:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:40:23] !log run kafka preferred-replica-election on kafka-main2001 to rebalance partition leaders after the last topic moves - T288825
[07:40:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:40:29] T288825: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825
[07:52:36] (CR) ZPapierski: Added spicerack.kafka with offset transfer function (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469) (owner: ZPapierski)
[07:53:02] we might get a page soon, cf. https://librenms.wikimedia.org/graphs/to=1634025000/id=8198/type=port_bits/from=1633938600/
[07:53:20] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/728380 (https://phabricator.wikimedia.org/T283076) (owner: Jelto)
[07:53:21] XioNoX: need any help?
[07:53:44] volans: sure, need to track down what's going on, about to look at netflow
[07:54:24] (Merged) jenkins-bot: Revert "Mentee overview: Truncate long usernames" [extensions/GrowthExperiments] (wmf/1.38.0-wmf.3) - https://gerrit.wikimedia.org/r/728247 (https://phabricator.wikimedia.org/T292224) (owner: Urbanecm)
[07:54:27] (Merged) jenkins-bot: UncachedMenteeOverviewDataProvider::getFilteredMenteesForMentor: Cast IDs to ints [extensions/GrowthExperiments] (wmf/1.38.0-wmf.3) - https://gerrit.wikimedia.org/r/729914 (https://phabricator.wikimedia.org/T290609) (owner: Urbanecm)
[07:54:30] (Merged) jenkins-bot: updateMenteeData: Collect more profiling data [extensions/GrowthExperiments] (wmf/1.38.0-wmf.3) - https://gerrit.wikimedia.org/r/729915 (https://phabricator.wikimedia.org/T290609) (owner: Urbanecm)
[07:55:11] it's not clear cut, maybe just a brief spike
[07:55:32] I'm not seeing anything related in a couple of graphs I'm looking at
[07:56:05] oh, it's for eqiad<->codfw, I thought it was transit
[07:56:21] and the spike is gone, so we're fine
[07:56:31] it's a noticeable increase of traffic though
[07:57:14] ohh, the telia link is down
[07:57:24] https://librenms.wikimedia.org/device/device=1/tab=port/port=6815/
[07:57:48] we need to get the ball rolling to add capacity to those links
[07:58:08] !log urbanecm@deploy1002 Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: 17dc3aa, e0ca905, c0f4f4e: GrowthExperiments backports (T292224, T290609, T290609) (duration: 00m 59s)
[07:58:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:16] T290609: Make mentee overview module's updateMenteeData.php scale better - https://phabricator.wikimedia.org/T290609
[07:58:16] T292224: Mentee overview: Truncate long usernames to 15 characters - https://phabricator.wikimedia.org/T292224
[07:58:30] Service window end: 2021-10-12 12:00 UTC
[08:00:38] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[08:00:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:03:16] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[08:03:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:05:38] ack
[08:10:24] PROBLEM - k8s API server requests latencies on kubemaster2001 is CRITICAL: instance=10.192.0.56 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27
[08:11:58] RECOVERY - k8s API server requests latencies on kubemaster2001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27
[08:12:03] (PS1) Kormat: db1127: Re-enable notifications [puppet] - https://gerrit.wikimedia.org/r/730146 (https://phabricator.wikimedia.org/T292956)
[08:14:09] (CR) Giuseppe Lavagetto: [C: +2] scaffold: bump common templates version [deployment-charts] - https://gerrit.wikimedia.org/r/730002 (owner: Giuseppe Lavagetto)
[08:17:50] RECOVERY - haproxy failover on dbproxy1019 is OK: OK check_failover servers up 16 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[08:18:08] (PS3) David Caro: puppet.PuppetHost.get_ca_server: use only the last line [software/spicerack] - https://gerrit.wikimedia.org/r/729990 (https://phabricator.wikimedia.org/T292465)
[08:18:14] (CR) David Caro: [V: +2 C: +2] puppet.PuppetHost.get_ca_server: use only the last line [software/spicerack] - https://gerrit.wikimedia.org/r/729990 (https://phabricator.wikimedia.org/T292465) (owner: David Caro)
[08:18:19] (Merged) jenkins-bot: scaffold: bump common templates version [deployment-charts] - https://gerrit.wikimedia.org/r/730002 (owner: Giuseppe Lavagetto)
[08:19:30] (CR) Kormat: [C: +2] db1127: Re-enable notifications [puppet] - https://gerrit.wikimedia.org/r/730146 (https://phabricator.wikimedia.org/T292956) (owner: Kormat)
[08:25:33] SRE, SRE-Access-Requests: Requesting access to Analytic Cluster for Muniza - https://phabricator.wikimedia.org/T292955 (MunizaA)
[08:28:04] (CR) Jelto: [V: +1 C: +2] hiera::deployment_server add helm3 deploy user to deployment server [puppet] - https://gerrit.wikimedia.org/r/730022 (https://phabricator.wikimedia.org/T251305) (owner: Jelto)
[08:31:04] !log kormat@cumin1001 dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: repool db1127 T292956', diff saved to https://phabricator.wikimedia.org/P17453 and previous config saved to /var/cache/conftool/dbconfig/20211012-083103-kormat.json
[08:31:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:10] T292956: Bring db1127 back into service - https://phabricator.wikimedia.org/T292956
[08:39:51] (CR) Giuseppe Lavagetto: [C: -1] "I get why you want to make declaring specific volumes easier, but I would not remove the ability for whoever writes charts to declare volu" [deployment-charts] - https://gerrit.wikimedia.org/r/570137 (owner: Jeena Huneidi)
[08:43:12] PROBLEM - SSH on thumbor1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:46:07] !log kormat@cumin1001 dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: repool db1127 T292956', diff saved to https://phabricator.wikimedia.org/P17454 and previous config saved to /var/cache/conftool/dbconfig/20211012-084607-kormat.json
[08:46:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:14] T292956: Bring db1127 back into service - https://phabricator.wikimedia.org/T292956
[08:51:40] (CR) Lucas Werkmeister (WMDE): [C: +1] "Deployment notes: purge both URLs from the cache afterwards. https://wikitech.wikimedia.org/wiki/Backport_windows/Deployers#Purging" [mediawiki-config] - https://gerrit.wikimedia.org/r/727497 (https://phabricator.wikimedia.org/T292742) (owner: Juan90264)
[09:01:11] !log kormat@cumin1001 dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: repool db1127 T292956', diff saved to https://phabricator.wikimedia.org/P17455 and previous config saved to /var/cache/conftool/dbconfig/20211012-090111-kormat.json
[09:01:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:01:17] T292956: Bring db1127 back into service - https://phabricator.wikimedia.org/T292956
[09:03:28] (CR) Jgiannelos: [C: +2] tile-pregeneration: Exit envoy sidecar gracefully [software/tegola] (wmf/v0.14.x) - https://gerrit.wikimedia.org/r/729959 (https://phabricator.wikimedia.org/T283159) (owner: Jgiannelos)
[09:04:24] (CR) Kormat: [C: +1] mariadb: Add easy-to-use wrapper for pt-kill [puppet] - https://gerrit.wikimedia.org/r/726857 (owner: Jcrespo)
[09:04:35] (Merged) jenkins-bot: tile-pregeneration: Exit envoy sidecar gracefully [software/tegola] (wmf/v0.14.x) - https://gerrit.wikimedia.org/r/729959 (https://phabricator.wikimedia.org/T283159) (owner: Jgiannelos)
[09:11:24] (CR) Effie Mouzeli: tile-pregeneration: Exit envoy sidecar gracefully (1 comment) [software/tegola] (wmf/v0.14.x) - https://gerrit.wikimedia.org/r/729959 (https://phabricator.wikimedia.org/T283159) (owner: Jgiannelos)
[09:16:15] !log kormat@cumin1001 dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: repool db1127 T292956', diff saved to https://phabricator.wikimedia.org/P17456 and previous config saved to /var/cache/conftool/dbconfig/20211012-091614-kormat.json
[09:16:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:21] T292956: Bring db1127 back into service - https://phabricator.wikimedia.org/T292956
[09:16:47] SRE, Release-Engineering-Team: Reduce latency of new Scap releases - https://phabricator.wikimedia.org/T292646 (hashar)
[09:17:29] SRE, Release-Engineering-Team (Doing): Reduce latency of new Scap releases - https://phabricator.wikimedia.org/T292646 (hashar)
[09:20:34] (CR) Effie Mouzeli: Rename main cluster to services (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/725003 (owner: Alexandros Kosiaris)
[09:25:36] PROBLEM - Check systemd state on search-loader2001 is CRITICAL: CRITICAL - degraded: The following units failed: wmf_auto_restart_mjolnir-kafka-bulk-daemon.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:26:49] (PS2) David Caro: cumin::manageable: Move the reboot-host script there [puppet] - https://gerrit.wikimedia.org/r/730025 (https://phabricator.wikimedia.org/T292465)
[09:26:51] (CR) David Caro: cumin::manageable: Move the reboot-host script there (1 comment) [puppet] - https://gerrit.wikimedia.org/r/730025 (https://phabricator.wikimedia.org/T292465) (owner: David Caro)
[09:34:14] (PS3) David Caro: base::sysctl::core_dumps: move core_dumps to their own class [puppet] - https://gerrit.wikimedia.org/r/728457
[09:34:16] (CR) David Caro: base::sysctl::core_dumps: move core_dumps to their own class (2 comments) [puppet] - https://gerrit.wikimedia.org/r/728457 (owner: David Caro)
[09:40:30] (PS1) Jgiannelos: tile-pregeneration: Fix wording about envoy [software/tegola] (wmf/v0.14.x) - https://gerrit.wikimedia.org/r/730149
[09:44:28] (PS1) Jgiannelos: tegola-vector-tiles: Bump image to latest version [deployment-charts] - https://gerrit.wikimedia.org/r/730153
[09:45:23] (PS1) Muehlenhoff: Restrict the use of component/ganeti216 to Stretch hosts [puppet] - https://gerrit.wikimedia.org/r/730154
[09:45:55] (PS2) Muehlenhoff: Restrict the use of component/ganeti216 to Stretch hosts [puppet] - https://gerrit.wikimedia.org/r/730154
[09:49:43] (CR) Muehlenhoff: cumin::manageable: Move the reboot-host script there (1 comment) [puppet] - https://gerrit.wikimedia.org/r/730025 (https://phabricator.wikimedia.org/T292465) (owner: David Caro)
[09:50:17] (CR) Filippo Giunchedi: "LGTM but letting Cole comment/vote" [puppet] - https://gerrit.wikimedia.org/r/729957 (https://phabricator.wikimedia.org/T288348) (owner: Btullis)
[09:51:06] (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/728457 (owner: David Caro)
[09:51:15] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/730154 (owner: Muehlenhoff)
[09:52:02] PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on alert1001 is CRITICAL: 50.87 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[09:53:56] RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[09:54:04] (PS1) Filippo Giunchedi: pontoon: add lb to observability [puppet] - https://gerrit.wikimedia.org/r/730156
[09:54:37] (CR) Filippo Giunchedi: [C: +2] pontoon: add lb to observability [puppet] - https://gerrit.wikimedia.org/r/730156 (owner: Filippo Giunchedi)
[09:59:49] (PS3) ZPapierski: [WIP] Add kafka position transfer to wdqs cookbooks [cookbooks] - https://gerrit.wikimedia.org/r/727021 (https://phabricator.wikimedia.org/T276469)
[10:01:31] (PS9) Jelto: services: deploy services with helm3 [deployment-charts] - https://gerrit.wikimedia.org/r/721301 (https://phabricator.wikimedia.org/T251305)
[10:01:49] (CR) David Caro: cumin::manageable: Move the reboot-host script there (1 comment) [puppet] - https://gerrit.wikimedia.org/r/730025 (https://phabricator.wikimedia.org/T292465) (owner: David Caro)
[10:04:40] (CR) jerkins-bot: [V: -1] services: deploy services with helm3 [deployment-charts] - https://gerrit.wikimedia.org/r/721301 (https://phabricator.wikimedia.org/T251305) (owner: Jelto)
[10:04:46] (CR) jerkins-bot: [V: -1] [WIP] Add kafka position transfer to wdqs cookbooks [cookbooks] - https://gerrit.wikimedia.org/r/727021 (https://phabricator.wikimedia.org/T276469) (owner: ZPapierski)
[10:06:55] (PS3) David Caro: cumin::manageable: Move the reboot-host script there [puppet] - https://gerrit.wikimedia.org/r/730025 (https://phabricator.wikimedia.org/T292465)
[10:07:02] (CR) David Caro: cumin::manageable: Move the reboot-host script there (1 comment) [puppet] - https://gerrit.wikimedia.org/r/730025 (https://phabricator.wikimedia.org/T292465) (owner: David Caro)
[10:09:04] (CR) Muehlenhoff: [C: +2] Restrict the use of component/ganeti216 to Stretch hosts [puppet] - https://gerrit.wikimedia.org/r/730154 (owner: Muehlenhoff)
[10:09:12] (PS4) David Caro: cumin::target: Move the reboot-host script there [puppet] - https://gerrit.wikimedia.org/r/730025 (https://phabricator.wikimedia.org/T292465)
[10:10:11] (CR) David Caro: cumin::target: Move the reboot-host script there (1 comment) [puppet] - https://gerrit.wikimedia.org/r/730025 (https://phabricator.wikimedia.org/T292465) (owner: David Caro)
[10:11:51] (CR) Jgiannelos: [C: +2] tegola-vector-tiles: Bump image to latest version [deployment-charts] - https://gerrit.wikimedia.org/r/730153 (owner: Jgiannelos)
[10:12:03] !log jmm@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti2025.codfw.wmnet with OS buster
[10:12:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:13:01] (PS36) ZPapierski: Added spicerack.kafka with offset transfer function [software/spicerack] - https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469)
[10:13:12] (PS37) ZPapierski: Added spicerack.kafka with offset transfer function [software/spicerack] - https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469)
[10:13:30] PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/page/media-list/{title} (Get media list from test page) is CRITICAL: Test Get media list from test page returned the unexpected status 503 (expecting: 200): /{domain}/v1/page/metadata/{title} (retrieve extended metadata for Video article on English Wikipedia) is CRITICAL: Test retrieve extended metadata for Video article on English Wikipedia returned th
[10:13:30] cted status 503 (expecting: 200): /{domain}/v1/page/mobile-sections/{title} (retrieve test page via mobile-sections) is CRITICAL: Test retrieve test page via mobile-sections returned the unexpected status 503 (expecting: 200): /{domain}/v1/page/summary/{title} (Get summary for test page) is CRITICAL: Test Get summary for test page returned the unexpected status 503 (expecting: 200): /{domain}/v1/transform/html/to/mobile-html/{title} (Get
[10:13:30] mobile HTML for test page) is CRITICAL: Test Get preview mobile HTML for test page returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[10:13:40] PROBLEM - restbase endpoints health on restbase2018 is CRITICAL: /en.wikipedia.org/v1/transform/wikitext/to/html/{title} (Transform wikitext to html) is CRITICAL: Test Transform wikitext to html returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:13:58] (CR) Ema: [C: +2] cache: exclude single backend experiment from pooled ATS backends [puppet] - https://gerrit.wikimedia.org/r/726912 (https://phabricator.wikimedia.org/T288106) (owner: Ema)
[10:15:44] RECOVERY - restbase endpoints health on restbase2018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:16:00] PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v2/suggest/sections/{title}/{from}/{to} (Suggest source sections to translate) is CRITICAL: Test Suggest source sections to translate returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/CX
[10:16:03] (Merged) jenkins-bot: tegola-vector-tiles: Bump image to latest version [deployment-charts] - https://gerrit.wikimedia.org/r/730153 (owner: Jgiannelos)
[10:16:28] !log cp4027: enable and run puppet to test https://gerrit.wikimedia.org/r/726912 T288106
[10:16:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:16:34] T288106: Experiment with single backend CDN nodes - https://phabricator.wikimedia.org/T288106
[10:17:38] RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[10:20:08] RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
[10:22:05] (CR) Volans: [C: +1] "LGTM, ship it! :)" [software/spicerack] - https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469) (owner: ZPapierski)
[10:22:34] !log ema@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp4028.ulsfo.wmnet,service=ats-be
[10:22:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:23:15] !log jgiannelos@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
[10:23:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:23:42] !log depool/repool ats-be on cp4028 to verify updates to /etc/varnish/directors.frontend.vcl on cp4027 keep on working fine T288106
[10:23:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:23:49] T288106: Experiment with single backend CDN nodes - https://phabricator.wikimedia.org/T288106
[10:24:30] !log ema@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp4028.ulsfo.wmnet,service=ats-be
[10:24:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:13] (CR) Volans: [C: +2] Added spicerack.kafka with offset transfer function [software/spicerack] - https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469) (owner: ZPapierski)
[10:26:24] PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v2/translate/{from}/{to}{/provider} (Machine translate an HTML fragment using TestClient, adapt the links to target language wiki.) timed out before a response was received: /v2/suggest/sections/{title}/{from}/{to} (Suggest source sections to translate) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
[10:30:02] !log apply https://gerrit.wikimedia.org/r/726912 to all A:cp nodes T288106
[10:30:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:30:09] T288106: Experiment with single backend CDN nodes - https://phabricator.wikimedia.org/T288106
[10:30:18] SRE, User-herron: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 (elukey) All topics done except `__consumer_offsets`, seemed a little delicate to move and not super traffic-intensive. Proposal: I'd bump the partitions of the topmost traffic intensi...
[10:30:28] RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
[10:32:20] (Merged) jenkins-bot: Added spicerack.kafka with offset transfer function [software/spicerack] - https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469) (owner: ZPapierski)
[10:35:48] SRE, User-herron: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 (Joe) >>! In T288825#7419227, @elukey wrote: > All topics done except `__consumer_offsets`, seemed a little delicate to move and not super traffic-intensive. > > Proposal: I'd bump the...
[10:40:04] (CR) David Caro: toolforge: wheel of misfortune: dry run on buster (1 comment) [puppet] - https://gerrit.wikimedia.org/r/729593 (https://phabricator.wikimedia.org/T282949) (owner: Majavah)
[10:43:27] (PS1) Jgiannelos: tegola-vector-tiles: Debug staging pregeneration [deployment-charts] - https://gerrit.wikimedia.org/r/730159
[10:45:12] (PS3) Majavah: toolforge: wheel of misfortune: dry run on buster [puppet] - https://gerrit.wikimedia.org/r/729593 (https://phabricator.wikimedia.org/T282949)
[10:45:31] (CR) Majavah: toolforge: wheel of misfortune: dry run on buster (1 comment) [puppet] - https://gerrit.wikimedia.org/r/729593 (https://phabricator.wikimedia.org/T282949) (owner: Majavah)
[10:45:48] SRE, Release-Engineering-Team (Doing): Reduce latency of new Scap releases - https://phabricator.wikimedia.org/T292646 (jijiki) @Legoktm is working on a cookbook to speed up packaging of scap https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/727605. The rollout process has to stay as it is though (...
[10:46:44] <_joe_> jouncebot: next [10:46:45] In 0 hour(s) and 13 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211012T1100) [10:46:51] (03CR) 10Effie Mouzeli: [C: 03+1] tegola-vector-tiles: Debug staging pregeneration [deployment-charts] - 10https://gerrit.wikimedia.org/r/730159 (owner: 10Jgiannelos) [10:47:29] (03PS2) 10Volans: dhcp: add support for MAC address based config [software/spicerack] - 10https://gerrit.wikimedia.org/r/730030 (https://phabricator.wikimedia.org/T269855) [10:48:05] (03CR) 10Giuseppe Lavagetto: static.php: correctly report a bad request (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/728552 (owner: 10Giuseppe Lavagetto) [10:50:39] (03CR) 10Giuseppe Lavagetto: [C: 03+1] "I like the simplified version." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/728553 (https://phabricator.wikimedia.org/T285232) (owner: 10Giuseppe Lavagetto) [10:52:20] (03PS4) 10Bartosz Dziewoński: Remove NS_MAIN from wgExtraSignatureNamespaces on most 'special' wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/725015 (https://phabricator.wikimedia.org/T291630) [10:53:46] !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests [10:53:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:53:51] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests [10:53:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:54:28] RECOVERY - Ensure hosts are not performing a change on every puppet run on cumin2002 is OK: OK: all nodes running as expected https://wikitech.wikimedia.org/wiki/Puppet%23check_puppet_run_changes [10:58:03] !log jmm@cumin2002 START - Cookbook sre.ganeti.makevm for new host testvm2003.codfw.wmnet [10:58:06] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor My software never has bugs. It just develops random features. Rise for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211012T1100). [11:00:04] Juan_90264, cormacparle, _joe_, _joe_, and MatmaRex: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:19] I can deploy today [11:00:22] <_joe_> o/ [11:00:22] hello [11:00:29] congrats to _joe_ for being listed twice [11:00:46] <_joe_> urbanecm: my bad, copy-pasta at the last minute :D [11:00:51] Juan's not around, skipping (for now) [11:01:03] o/ [11:01:03] can't find cormacparle either [11:01:13] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/730025 (https://phabricator.wikimedia.org/T292465) (owner: 10David Caro) [11:01:34] _joe_: do you want to self-service, or should I do it for you? 
[11:01:52] <_joe_> urbanecm: I'd prefer to get an independent +2, but ofc I can deploy myself if needed :) [11:01:58] cparle’s change also hasn’t been cherry-picked to a wmf branch yet afaict [11:02:17] oh, listed as config, but is a backport in fact [11:02:41] !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2003.codfw.wmnet [11:02:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:46] (03PS1) 10Volans: CHANGELOG: add changelogs for release v1.0.5 [software/spicerack] - 10https://gerrit.wikimedia.org/r/730162 [11:03:49] (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v1.0.5 [software/spicerack] - 10https://gerrit.wikimedia.org/r/730162 (owner: 10Volans) [11:03:51] (03CR) 10Urbanecm: [C: 03+2] static.php: correctly report a bad request [mediawiki-config] - 10https://gerrit.wikimedia.org/r/728552 (owner: 10Giuseppe Lavagetto) [11:04:08] _joe_: it looks ^^ can't be reasonably tested, right? [11:04:25] (03PS1) 10Volans: sre.ganeti.makevm: add rollback support [cookbooks] - 10https://gerrit.wikimedia.org/r/730163 [11:04:29] <_joe_> urbanecm: I can craft a fcgi request to test it, but I don't think it's worth doing [11:04:37] (03Merged) 10jenkins-bot: static.php: correctly report a bad request [mediawiki-config] - 10https://gerrit.wikimedia.org/r/728552 (owner: 10Giuseppe Lavagetto) [11:04:52] ack [11:04:56] i'll just sync it [11:05:10] <_joe_> it's really a zero-risk patch afaict, but let's just test it doesn't break static.php first on mwdebug1002, I can do it [11:05:40] <_joe_> lmk when the patch is on the deployment server :) [11:06:01] too late, i already ran scap :D [11:06:16] <_joe_> ahah [11:06:18] <_joe_> ok [11:06:34] !log urbanecm@deploy1002 Synchronized w/static.php: e77ae17efb34723598fc69e87109944384df442a: static.php: correctly report a bad request (duration: 00m 57s) [11:06:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log 
[11:06:43] canaries passed at least 🙂 [11:07:03] <_joe_> I can see the wikis on an empty cache while being logged in, so I assume it works :D [11:07:20] <_joe_> the second patch is slightly riskier, I'd like to only deploy it to mwdebug1002 first [11:07:31] sure [11:08:12] (03CR) 10Urbanecm: [C: 03+2] static.php: Add support for /static/current rewrites [mediawiki-config] - 10https://gerrit.wikimedia.org/r/728553 (https://phabricator.wikimedia.org/T285232) (owner: 10Giuseppe Lavagetto) [11:08:24] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [11:08:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:08:29] (03CR) 10Jgiannelos: [C: 03+2] tegola-vector-tiles: Debug staging pregeneration [deployment-charts] - 10https://gerrit.wikimedia.org/r/730159 (owner: 10Jgiannelos) [11:09:00] (03Merged) 10jenkins-bot: static.php: Add support for /static/current rewrites [mediawiki-config] - 10https://gerrit.wikimedia.org/r/728553 (https://phabricator.wikimedia.org/T285232) (owner: 10Giuseppe Lavagetto) [11:09:38] _joe_: pulled to mwdebug1001 [11:09:50] <_joe_> urbanecm: ok gimme a few minutes to test it [11:09:51] (03Merged) 10jenkins-bot: CHANGELOG: add changelogs for release v1.0.5 [software/spicerack] - 10https://gerrit.wikimedia.org/r/730162 (owner: 10Volans) [11:09:54] sure [11:11:00] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[11:11:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:12:16] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 236, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [11:13:02] (03Merged) 10jenkins-bot: tegola-vector-tiles: Debug staging pregeneration [deployment-charts] - 10https://gerrit.wikimedia.org/r/730159 (owner: 10Jgiannelos) [11:13:46] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 132, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [11:13:46] (03PS1) 10Volans: Upstream release v1.0.5 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/730187 [11:14:58] !log jgiannelos@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . [11:15:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:16:10] <_joe_> urbanecm: uhm let's rollback, something isn't right [11:16:23] <_joe_> yeah definitely, let me revert [11:17:16] (03PS1) 10Giuseppe Lavagetto: Revert "static.php: Add support for /static/current rewrites" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730178 [11:17:42] <_joe_> urbanecm: should I just merge it? [11:19:07] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Revert "static.php: Add support for /static/current rewrites" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730178 (owner: 10Giuseppe Lavagetto) [11:19:26] <_joe_> I'll figure out what's wrong with the patch later [11:19:32] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[11:19:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:19:51] (03Merged) 10jenkins-bot: Revert "static.php: Add support for /static/current rewrites" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730178 (owner: 10Giuseppe Lavagetto) [11:19:55] (03CR) 10Volans: [C: 03+2] Upstream release v1.0.5 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/730187 (owner: 10Volans) [11:21:04] <_joe_> ok, done, you can proceed with the rest of deployments urbanecm [11:22:08] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [11:22:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:27:39] _joe_: thanks for the rv, got distracted a bit [11:27:59] (03PS5) 10Urbanecm: Remove NS_MAIN from wgExtraSignatureNamespaces on most 'special' wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/725015 (https://phabricator.wikimedia.org/T291630) (owner: 10Bartosz Dziewoński) [11:28:04] (03CR) 10Urbanecm: [C: 03+2] Remove NS_MAIN from wgExtraSignatureNamespaces on most 'special' wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/725015 (https://phabricator.wikimedia.org/T291630) (owner: 10Bartosz Dziewoński) [11:28:50] (03Merged) 10jenkins-bot: Remove NS_MAIN from wgExtraSignatureNamespaces on most 'special' wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/725015 (https://phabricator.wikimedia.org/T291630) (owner: 10Bartosz Dziewoński) [11:29:22] MatmaRex: your patch is at mwdebug1001, can you test? [11:29:36] (03PS3) 10Hashar: Split canary jobrunner to their own role [puppet] - 10https://gerrit.wikimedia.org/r/724694 (https://phabricator.wikimedia.org/T291870) [11:29:41] looking [11:29:52] thanks [11:30:40] urbanecm: looks good [11:30:44] thanks, syncing [11:30:45] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[11:30:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:32:06] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 860ea0944d6dc1e6b5061eb84eec378eb5ac8441: Remove NS_MAIN from wgExtraSignatureNamespaces on most special wikis (T291630) (duration: 00m 57s) [11:32:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:32:12] T291630: Main namespace should not be listed in $wgExtraSignatureNamespaces on most wikis (Commons, MediaWiki.org) - https://phabricator.wikimedia.org/T291630 [11:32:20] MatmaRex: should be live [11:32:49] thanks! [11:33:01] np! [11:33:23] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [11:33:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:33:51] Cormac PM'ed me via slack, saying his patch is not needed anymore. [11:33:54] So, we're done. [11:34:16] !log UTC morning B&C window done [11:34:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:40:52] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet [11:40:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:42:02] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [11:42:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:44:50] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[11:44:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:46:37] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet [11:46:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:47:28] (03PS10) 10Jelto: services: deploy services with helm3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/721301 (https://phabricator.wikimedia.org/T251305) [11:47:48] 10SRE, 10SRE-swift-storage, 10User-fgiunchedi: Put ms-be10[64-67] in service - https://phabricator.wikimedia.org/T290546 (10fgiunchedi) 05Open→03Resolved a:03fgiunchedi This is complete, dispersion is at 100% and replication cycle will be finished in 1-2 days. Cluster is now at ~70% average fs utilization [11:47:51] 10SRE, 10ops-eqiad, 10DC-Ops: Q1:(Need By: ASAP) rack/setup/install ms-be10[64-67] - https://phabricator.wikimedia.org/T285808 (10fgiunchedi) [11:49:12] (03CR) 10jerkins-bot: [V: 04-1] services: deploy services with helm3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/721301 (https://phabricator.wikimedia.org/T251305) (owner: 10Jelto) [11:49:41] !log `kafka topics --alter --topic eqiad.resource-purge --partitions 5` on kafka-main2001 - T288825 [11:49:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:49:47] T288825: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 [11:49:51] (03CR) 10Majavah: apple-search: New chart (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/726933 (https://phabricator.wikimedia.org/T289224) (owner: 10Majavah) [11:50:34] (03PS1) 10Giuseppe Lavagetto: Revert "Revert "static.php: Add support for /static/current rewrites"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730182 [11:50:36] _joe_: ^ got a chart question for you, if you have a moment [11:52:30] (03PS2) 10Giuseppe Lavagetto: Revert "Revert "static.php: Add support for /static/current 
rewrites"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730182 [11:53:14] <_joe_> majavah: in a bit, I was about to go afk for lunch, I'll get to it afterwards :) [11:53:35] <_joe_> urbanecm: I found the culprit btw, a brown paper bag mistake on my part https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/730182/1..2/w/static.php [11:53:49] thanks! [11:54:02] _joe_: nice! [11:54:28] <_joe_> majavah: oh I think you declared the volumemount but not the volume? [11:54:46] <_joe_> but I'll take a better look later [11:58:24] !log `kafka topics --alter --topic codfw.resource-purge --partitions 5` on kafka-main2001 - T288825 [11:58:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:58:30] T288825: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 [12:09:24] !log [12:09:24] elukey: Message missing. Nothing logged. [12:09:27] 10SRE, 10ops-eqiad, 10DBA: Bad ram on db1127 - https://phabricator.wikimedia.org/T292366 (10Kormat) [12:09:58] !log `kafka topics --alter --topic eqiad.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - T288825 [12:10:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:10:03] T288825: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 [12:10:18] !log `kafka topics --alter --topic codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - T288825 [12:10:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:12:21] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 97 probes of 631 (alerts on 65) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [12:15:22] !log `kafka topics --alter --topic 
eqiad.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - T288825 [12:15:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:15:30] !log `kafka topics --alter --topic codfw.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - T288825 [12:15:31] T288825: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 [12:15:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:18:15] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 41 probes of 631 (alerts on 65) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [12:18:49] 10SRE, 10Traffic, 10User-ema: Create runbook for VarnishTrafficDrop alert, change dashboard link - https://phabricator.wikimedia.org/T292820 (10ema) Runbook created: https://wikitech.wikimedia.org/wiki/Monitoring/VarnishTrafficDrop [12:20:35] 10SRE, 10Infrastructure-Foundations: Integrate Buster 10.11 point update - https://phabricator.wikimedia.org/T292838 (10MoritzMuehlenhoff) [12:21:45] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for KCVelaga (WMF) - https://phabricator.wikimedia.org/T292992 (10TAndic) Access for @KCVelaga_WMF approved from the manager side, @JAnstee_WMF can chime in as well if needed [12:21:51] (03CR) 10EllenR: [C: 03+1] "Looks good, only question is the list of Merge conflicts and if that is something that needs to be reviewed (probably newbie lack of knowl" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/728490 (https://phabricator.wikimedia.org/T292459) (owner: 10Jhernandez) [12:23:59] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [12:24:27] (03PS1) 10Ema: VarnishTrafficDrop: add runbook and change dashboard link [alerts] - 10https://gerrit.wikimedia.org/r/730193 
(https://phabricator.wikimedia.org/T292820) [12:25:45] !log uploaded spicerack_1.0.5 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia [12:25:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:26:33] (03CR) 10jerkins-bot: [V: 04-1] VarnishTrafficDrop: add runbook and change dashboard link [alerts] - 10https://gerrit.wikimedia.org/r/730193 (https://phabricator.wikimedia.org/T292820) (owner: 10Ema) [12:30:07] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.68 ms [12:44:30] (03PS1) 10Volans: kafka: docstrings minor improvements [software/spicerack] - 10https://gerrit.wikimedia.org/r/730195 [12:46:56] (03PS1) 10Muehlenhoff: Extend globbing for testvm* [puppet] - 10https://gerrit.wikimedia.org/r/730196 [12:47:39] RECOVERY - Ensure hosts are not performing a change on every puppet run on cumin1001 is OK: OK: all nodes running as expected https://wikitech.wikimedia.org/wiki/Puppet%23check_puppet_run_changes [12:51:06] (03CR) 10Muehlenhoff: [C: 03+2] Extend globbing for testvm* [puppet] - 10https://gerrit.wikimedia.org/r/730196 (owner: 10Muehlenhoff) [12:51:18] (03PS2) 10Muehlenhoff: Extend globbing for testvm* [puppet] - 10https://gerrit.wikimedia.org/r/730196 [12:51:43] (03PS2) 10Volans: kafka: docstrings minor improvements [software/spicerack] - 10https://gerrit.wikimedia.org/r/730195 [12:51:45] (03PS1) 10Volans: changelog: fix typo [software/spicerack] - 10https://gerrit.wikimedia.org/r/730198 [12:51:59] (03Abandoned) 10Jelto: hiera:kubernetes:deployment_server add deploy users for helm3 [puppet] - 10https://gerrit.wikimedia.org/r/725014 (https://phabricator.wikimedia.org/T251305) (owner: 10Jelto) [12:52:19] (03CR) 10Filippo Giunchedi: [C: 03+2] o11y: port alertmanager alerts [alerts] - 10https://gerrit.wikimedia.org/r/724761 (https://phabricator.wikimedia.org/T288726) (owner: 10Filippo Giunchedi) [12:52:25] (03PS2) 10Filippo Giunchedi: o11y: port alertmanager alerts [alerts] - 
10https://gerrit.wikimedia.org/r/724761 (https://phabricator.wikimedia.org/T288726) [12:54:25] (03CR) 10Filippo Giunchedi: [V: 03+2 C: 03+2] o11y: port alertmanager alerts [alerts] - 10https://gerrit.wikimedia.org/r/724761 (https://phabricator.wikimedia.org/T288726) (owner: 10Filippo Giunchedi) [12:54:27] (03PS1) 10Jelto: hiera::deployment_server add missing mathoid helm3 deploy user [puppet] - 10https://gerrit.wikimedia.org/r/730199 (https://phabricator.wikimedia.org/T251305) [12:54:35] (03CR) 10Filippo Giunchedi: [C: 03+2] icinga: remove alertmanager::alerts [puppet] - 10https://gerrit.wikimedia.org/r/724771 (https://phabricator.wikimedia.org/T288726) (owner: 10Filippo Giunchedi) [12:54:42] (03PS2) 10Filippo Giunchedi: icinga: remove alertmanager::alerts [puppet] - 10https://gerrit.wikimedia.org/r/724771 (https://phabricator.wikimedia.org/T288726) [13:01:06] 10SRE, 10User-herron: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 (10elukey) The main-codfw cluster looks way better now, even if in some places is still unbalanced, like the kafka bytes out graph. One of the reason is that the topic partition leaders f... [13:04:59] 10SRE, 10User-herron: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 (10elukey) Example for the highest traffic volume: ` Topic:eqiad.resource-purge PartitionCount:5 ReplicationFactor:3 Configs: Topic: eqiad.resource-purge Part... 
[13:05:14] (03CR) 10Volans: [C: 03+2] changelog: fix typo [software/spicerack] - 10https://gerrit.wikimedia.org/r/730198 (owner: 10Volans) [13:05:50] !log upgraded spicerack to 1.0.5 on cumin hosts [13:05:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:09:14] (03PS4) 10Ottomata: EventBus - Enable x_client_ip_forwarding_enabled for analytics purposes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/724380 (https://phabricator.wikimedia.org/T288853) [13:10:37] (03CR) 10ZPapierski: "recheck" [cookbooks] - 10https://gerrit.wikimedia.org/r/727021 (https://phabricator.wikimedia.org/T276469) (owner: 10ZPapierski) [13:11:40] !log btullis@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power T291732 [13:11:42] !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power T291732 [13:11:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:11:47] T291732: analytics1069 mgmt interface intermittently goes up and down - https://phabricator.wikimedia.org/T291732 [13:11:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:12:01] (03CR) 10Ottomata: [C: 03+2] EventBus - Enable x_client_ip_forwarding_enabled for analytics purposes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/724380 (https://phabricator.wikimedia.org/T288853) (owner: 10Ottomata) [13:12:39] (03CR) 10Ayounsi: [C: 03+1] "Overall LGTM, should it verify the format and correctness (no typos) of the MAC address?" 
[software/spicerack] - 10https://gerrit.wikimedia.org/r/730030 (https://phabricator.wikimedia.org/T269855) (owner: 10Volans) [13:13:38] !log otto@deploy1002 Synchronized wmf-config/CommonSettings.php: Enable x_client_ip_forwarding_enabled for eventgate-analytics and eventgate-analytics-external - T288853 (duration: 00m 56s) [13:13:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:13:44] T288853: Migrated Server-side EventLogging events recording http.client_ip as 127.0.0.1 - https://phabricator.wikimedia.org/T288853 [13:13:51] (03CR) 10Volans: "reply inline" [software/spicerack] - 10https://gerrit.wikimedia.org/r/730030 (https://phabricator.wikimedia.org/T269855) (owner: 10Volans) [13:14:03] !log add 50G to prometheus/k8s in eqiad [13:14:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:15:31] 10SRE, 10User-herron: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 (10elukey) In theory we can use the following json file to hint Kafka: ` {"partitions": [ {"topic": "eqiad.resource-purge", "partition": 0, "replicas": [2001,2005,2003]}, {"topic... [13:16:45] 10SRE, 10Analytics, 10Analytics-Kanban, 10Data-Engineering, and 6 others: Migrated Server-side EventLogging events recording http.client_ip as 127.0.0.1 - https://phabricator.wikimedia.org/T288853 (10Ottomata) Okay, deployed, and I see HomepageVisit events with real client IPs. @nettrom_WMF, f it looks ok... 
[13:17:45] (03PS1) 10Filippo Giunchedi: o11y: tweak threshold for AlertManagerNoAlerts [alerts] - 10https://gerrit.wikimedia.org/r/730202 [13:17:55] (03CR) 10Filippo Giunchedi: [C: 03+2] o11y: tweak threshold for AlertManagerNoAlerts [alerts] - 10https://gerrit.wikimedia.org/r/730202 (owner: 10Filippo Giunchedi) [13:19:17] (03CR) 10Ayounsi: [C: 03+1] dhcp: add support for MAC address based config (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/730030 (https://phabricator.wikimedia.org/T269855) (owner: 10Volans) [13:19:19] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [13:19:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:20:11] (03Merged) 10jenkins-bot: o11y: tweak threshold for AlertManagerNoAlerts [alerts] - 10https://gerrit.wikimedia.org/r/730202 (owner: 10Filippo Giunchedi) [13:21:59] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [13:22:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:25:58] !log jmm@cumin2002 START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet [13:26:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:27:30] 10SRE, 10User-herron: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 (10elukey) It worked! Going to do the same for other topics: ` Topic:codfw.resource-purge PartitionCount:5 ReplicationFactor:3 Configs: Topic: codfw.resource-purg... [13:29:43] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good!" 
[cookbooks] - 10https://gerrit.wikimedia.org/r/730163 (owner: 10Volans) [13:31:17] 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.1 point update - https://phabricator.wikimedia.org/T292844 (10MoritzMuehlenhoff) [13:34:47] (03CR) 10Ottomata: [C: 03+1] Release 2020.02~wmf6 [debs/anaconda-wmf] (debian) - 10https://gerrit.wikimedia.org/r/728557 (https://phabricator.wikimedia.org/T292699) (owner: 10Elukey) [13:37:50] 10SRE, 10Traffic: DNS Discovery for active/passive failover within a data centre - https://phabricator.wikimedia.org/T287584 (10Ottomata) Hahah, I think declining this is fine for now, but intra DC failover is probably something our traffic infrastructure should support, ya? I'm not opposed to Ben's corosync/... [13:38:40] 10SRE, 10User-herron: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 (10elukey) @Joe Let me know if the current status of main-codfw is good in your opinion, it is not perfectly balanced in the incoming traffic but I think the procedure should be good to b... [13:39:36] (03CR) 10Ottomata: "Nice!" 
[puppet] - 10https://gerrit.wikimedia.org/r/729957 (https://phabricator.wikimedia.org/T288348) (owner: 10Btullis) [13:40:11] (03CR) 10Elukey: [V: 03+2 C: 03+2] Add extra include search path to {CPP,C,CXX,FORTRAN}FLAGS (031 comment) [debs/anaconda-wmf] (debian) - 10https://gerrit.wikimedia.org/r/727352 (https://phabricator.wikimedia.org/T292699) (owner: 10Elukey) [13:40:17] !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2006.codfw.wmnet [13:40:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:46] (03CR) 10Elukey: [V: 03+2 C: 03+2] Release 2020.02~wmf6 [debs/anaconda-wmf] (debian) - 10https://gerrit.wikimedia.org/r/728557 (https://phabricator.wikimedia.org/T292699) (owner: 10Elukey) [13:42:05] 10SRE, 10ops-eqiad, 10Analytics-Clusters: analytics1069 mgmt interface intermittently goes up and down - https://phabricator.wikimedia.org/T291732 (10BTullis) Host is not booting cleanly. We get an error from /dev/sdc on boot and it required the root password for maintenance. `dmesg` shows this. ` [ 105.195... [13:44:58] (03PS4) 10ZPapierski: [WIP] Add kafka position transfer to wdqs cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/727021 (https://phabricator.wikimedia.org/T276469) [13:46:43] 10SRE, 10ops-eqiad, 10Analytics-Clusters: analytics1069 mgmt interface intermittently goes up and down - https://phabricator.wikimedia.org/T291732 (10BTullis) I quit out of the maintenance prompt with Ctrl-D but it failed at fsck again. ` Reloading system manager configuration Starting default target [ 1167.... 
[13:48:06] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Add kafka position transfer to wdqs cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/727021 (https://phabricator.wikimedia.org/T276469) (owner: 10ZPapierski) [13:49:11] (03PS11) 10Jelto: services: deploy services with helm3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/721301 (https://phabricator.wikimedia.org/T251305) [13:49:26] 10SRE, 10ops-eqiad, 10Analytics-Clusters: analytics1069 mgmt interface intermittently goes up and down - https://phabricator.wikimedia.org/T291732 (10elukey) One thing that I do when this happens is to enter the root password and comment the disk in /etc/fstab, and then powercycle. In theory the OS should bo... [13:49:53] !log jgiannelos@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . [13:49:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:50:31] 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.1 point update - https://phabricator.wikimedia.org/T292844 (10MoritzMuehlenhoff) [13:51:48] (03PS5) 10ZPapierski: [WIP] Add kafka position transfer to wdqs cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/727021 (https://phabricator.wikimedia.org/T276469) [13:52:51] 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.1 point update - https://phabricator.wikimedia.org/T292844 (10MoritzMuehlenhoff) [13:52:55] 10SRE, 10ops-eqiad, 10Analytics-Clusters: analytics1069 mgmt interface intermittently goes up and down - https://phabricator.wikimedia.org/T291732 (10BTullis) Great, thanks @elukey - I had got as far as looking at various megacli commands, but as far as the RAID controller was concerned everything is fine. I... 
[13:54:08] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Add kafka position transfer to wdqs cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/727021 (https://phabricator.wikimedia.org/T276469) (owner: 10ZPapierski) [13:55:29] RECOVERY - Ensure hosts are not performing a change on every puppet run on cumin2001 is OK: OK: all nodes running as expected https://wikitech.wikimedia.org/wiki/Puppet%23check_puppet_run_changes [13:55:38] (03PS12) 10JMeybohm: services: deploy services with helm3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/721301 (https://phabricator.wikimedia.org/T251305) (owner: 10Jelto) [13:55:40] (03PS1) 10JMeybohm: Rakefile: Print stderr in case helmfile build fails [deployment-charts] - 10https://gerrit.wikimedia.org/r/730204 [13:57:10] (03PS6) 10ZPapierski: [WIP] Add kafka position transfer to wdqs cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/727021 (https://phabricator.wikimedia.org/T276469) [13:57:13] (03CR) 10jerkins-bot: [V: 04-1] services: deploy services with helm3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/721301 (https://phabricator.wikimedia.org/T251305) (owner: 10Jelto) [13:59:37] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Add kafka position transfer to wdqs cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/727021 (https://phabricator.wikimedia.org/T276469) (owner: 10ZPapierski) [14:01:41] (03PS2) 10Ema: VarnishTrafficDrop: add runbook and change dashboard link [alerts] - 10https://gerrit.wikimedia.org/r/730193 (https://phabricator.wikimedia.org/T292820) [14:02:37] (03CR) 10Giuseppe Lavagetto: apple-search: New chart (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/726933 (https://phabricator.wikimedia.org/T289224) (owner: 10Majavah) [14:03:28] (03PS1) 10Filippo Giunchedi: thanos: deploy global alerts only on hosts running thanos-rule [puppet] - 10https://gerrit.wikimedia.org/r/730206 (https://phabricator.wikimedia.org/T288726) [14:03:59] (03CR) 10jerkins-bot: [V: 04-1] thanos: deploy 
global alerts only on hosts running thanos-rule [puppet] - 10https://gerrit.wikimedia.org/r/730206 (https://phabricator.wikimedia.org/T288726) (owner: 10Filippo Giunchedi) [14:06:41] (03PS2) 10Filippo Giunchedi: thanos: deploy global alerts only on hosts running thanos-rule [puppet] - 10https://gerrit.wikimedia.org/r/730206 (https://phabricator.wikimedia.org/T288726) [14:06:55] (03CR) 10Giuseppe Lavagetto: [C: 03+1] Rakefile: Print stderr in case helmfile build fails [deployment-charts] - 10https://gerrit.wikimedia.org/r/730204 (owner: 10JMeybohm) [14:06:59] (03CR) 10David Caro: [C: 03+2] cumin::target: Move the reboot-host script there [puppet] - 10https://gerrit.wikimedia.org/r/730025 (https://phabricator.wikimedia.org/T292465) (owner: 10David Caro) [14:07:34] (03CR) 10David Caro: [C: 03+2] cumin::target: Move the reboot-host script there (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/730025 (https://phabricator.wikimedia.org/T292465) (owner: 10David Caro) [14:08:14] (03PS7) 10ZPapierski: [WIP] Add kafka position transfer to wdqs cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/727021 (https://phabricator.wikimedia.org/T276469) [14:09:01] (03PS1) 10Jgiannelos: Revert "tegola-vector-tiles: Debug staging pregeneration" [deployment-charts] - 10https://gerrit.wikimedia.org/r/730184 [14:11:22] (03PS2) 10JMeybohm: Rakefile: Print stderr in case helmfile build fails [deployment-charts] - 10https://gerrit.wikimedia.org/r/730204 [14:11:24] (03PS13) 10JMeybohm: services: deploy services with helm3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/721301 (https://phabricator.wikimedia.org/T251305) (owner: 10Jelto) [14:12:25] (03PS6) 10Majavah: apple-search: New chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/726933 (https://phabricator.wikimedia.org/T289224) [14:13:16] (03CR) 10jerkins-bot: [V: 04-1] services: deploy services with helm3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/721301 
(https://phabricator.wikimedia.org/T251305) (owner: 10Jelto) [14:13:35] (03CR) 10Filippo Giunchedi: [V: 03+1 C: 03+2] "PCC SUCCESS (NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31617/console" [puppet] - 10https://gerrit.wikimedia.org/r/730156 (owner: 10Filippo Giunchedi) [14:13:38] (03CR) 10Majavah: apple-search: New chart (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/726933 (https://phabricator.wikimedia.org/T289224) (owner: 10Majavah) [14:15:14] 10SRE, 10ops-eqiad, 10Analytics-Clusters: analytics1069 mgmt interface intermittently goes up and down - https://phabricator.wikimedia.org/T291732 (10Jclark-ctr) performed flea power drained, power off, remove power cables, unseat power supplies, hold the power button for 20-30 seconds and plug it all back i... [14:17:11] 10SRE, 10Traffic: DNS Discovery for active/passive failover within a data centre - https://phabricator.wikimedia.org/T287584 (10BBlack) I think (but I'm sure it can be debated!) that from the Traffic POV, a service's resiliency/failover within a DC shouldn't be managed via DNS automations like the discovery se... [14:18:25] (03PS14) 10Jelto: services: deploy services with helm3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/721301 (https://phabricator.wikimedia.org/T251305) [14:19:12] (03PS9) 10BBlack: interface: update rps script to also set the number of queues via ethtool [puppet] - 10https://gerrit.wikimedia.org/r/662688 (https://phabricator.wikimedia.org/T236208) (owner: 10Jbond) [14:19:14] (03PS1) 10BBlack: interface-rps.py: no-op format/comment fixups [puppet] - 10https://gerrit.wikimedia.org/r/730210 (https://phabricator.wikimedia.org/T236208) [14:22:37] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:23:01] (03CR) 10BBlack: [C: 03+1] "Nice work! I've split off the no-op formatting/commentary changes to a separate pre-patch just to satisfy my own OCD. 
The change itself l" [puppet] - 10https://gerrit.wikimedia.org/r/662688 (https://phabricator.wikimedia.org/T236208) (owner: 10Jbond) [14:24:35] (03PS3) 10Filippo Giunchedi: thanos: deploy global alerts only on hosts running thanos-rule [puppet] - 10https://gerrit.wikimedia.org/r/730206 (https://phabricator.wikimedia.org/T288726) [14:26:34] (03CR) 10Filippo Giunchedi: [C: 03+1] "Very nice re: runbook!" [alerts] - 10https://gerrit.wikimedia.org/r/730193 (https://phabricator.wikimedia.org/T292820) (owner: 10Ema) [14:28:45] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.68 ms [14:28:48] (03PS1) 10Zabe: Pre-format comments for non-local files too [core] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730226 (https://phabricator.wikimedia.org/T292570) [14:32:20] (03PS3) 10Volans: dhcp: add support for MAC address based config [software/spicerack] - 10https://gerrit.wikimedia.org/r/730030 (https://phabricator.wikimedia.org/T269855) [14:32:33] (03CR) 10Volans: "addressed comment" [software/spicerack] - 10https://gerrit.wikimedia.org/r/730030 (https://phabricator.wikimedia.org/T269855) (owner: 10Volans) [14:34:00] (03PS4) 10Filippo Giunchedi: thanos: deploy global alerts only on hosts running thanos-rule [puppet] - 10https://gerrit.wikimedia.org/r/730206 (https://phabricator.wikimedia.org/T288726) [14:36:33] (03PS5) 10Filippo Giunchedi: thanos: deploy global alerts only on hosts running thanos-rule [puppet] - 10https://gerrit.wikimedia.org/r/730206 (https://phabricator.wikimedia.org/T288726) [14:38:04] (03CR) 10jerkins-bot: [V: 04-1] thanos: deploy global alerts only on hosts running thanos-rule [puppet] - 10https://gerrit.wikimedia.org/r/730206 (https://phabricator.wikimedia.org/T288726) (owner: 10Filippo Giunchedi) [14:38:46] (03PS6) 10Filippo Giunchedi: thanos: deploy global alerts only on hosts running thanos-rule [puppet] - 10https://gerrit.wikimedia.org/r/730206 (https://phabricator.wikimedia.org/T288726) [14:39:56] (03CR) 
10Paladox: "Is this still needed or can it be abandoned?" [puppet] - 10https://gerrit.wikimedia.org/r/451206 (https://phabricator.wikimedia.org/T196968) (owner: 10Dzahn) [14:40:13] (03CR) 10jerkins-bot: [V: 04-1] thanos: deploy global alerts only on hosts running thanos-rule [puppet] - 10https://gerrit.wikimedia.org/r/730206 (https://phabricator.wikimedia.org/T288726) (owner: 10Filippo Giunchedi) [14:41:29] (03CR) 10David Caro: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/729593 (https://phabricator.wikimedia.org/T282949) (owner: 10Majavah) [14:41:59] (03PS7) 10Filippo Giunchedi: thanos: deploy global alerts only on hosts running thanos-rule [puppet] - 10https://gerrit.wikimedia.org/r/730206 (https://phabricator.wikimedia.org/T288726) [14:42:01] (03CR) 10Majavah: toolforge: wheel of misfortune: dry run on buster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/729593 (https://phabricator.wikimedia.org/T282949) (owner: 10Majavah) [14:43:11] (03CR) 10Filippo Giunchedi: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31624/console" [puppet] - 10https://gerrit.wikimedia.org/r/730206 (https://phabricator.wikimedia.org/T288726) (owner: 10Filippo Giunchedi) [14:48:39] 10SRE, 10SRE-swift-storage: Spontaneous reboot of ms-be2045 - https://phabricator.wikimedia.org/T290881 (10Papaul) [14:49:15] RECOVERY - SSH on thumbor1001.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [14:50:36] !log volans@cumin2002 START - Cookbook sre.dns.netbox [14:50:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:58:28] (03PS1) 10Giuseppe Lavagetto: mediawiki: actually pass requests for /static/current to static.php [deployment-charts] - 10https://gerrit.wikimedia.org/r/730212 (https://phabricator.wikimedia.org/T285232) [14:58:47] 10SRE, 10Patch-For-Review: migrate services from cumin2001 to 
cumin2002 - https://phabricator.wikimedia.org/T276589 (10LSobanski) As discussed in the meeting, cumin2001 will remain in service until DB tooling is packaged for Bullseye (ETA. mid-Q3 FY21/22). At that point cumin1001 will also be re-imagined to Bu... [15:02:19] !log volans@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [15:02:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:04:12] (03PS3) 10Ema: VarnishTrafficDrop: add runbook and change dashboard link [alerts] - 10https://gerrit.wikimedia.org/r/730193 (https://phabricator.wikimedia.org/T292820) [15:04:35] 10SRE, 10Traffic, 10Patch-For-Review: Deploy Wikidough: Experimental DNS-over-HTTPS (DoH) public resolver - https://phabricator.wikimedia.org/T252132 (10ssingh) [15:04:42] 10SRE, 10Traffic, 10Patch-For-Review: Deploy durum: check service for Wikidough - https://phabricator.wikimedia.org/T289536 (10ssingh) 05Open→03Resolved Thanks to everyone for helping with the task. We just discussed this in IRC but for those following along: we have decided to go with managing the recor... [15:06:27] 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.1 point update - https://phabricator.wikimedia.org/T292844 (10MoritzMuehlenhoff) [15:10:02] (03CR) 10Ema: [C: 03+2] VarnishTrafficDrop: add runbook and change dashboard link [alerts] - 10https://gerrit.wikimedia.org/r/730193 (https://phabricator.wikimedia.org/T292820) (owner: 10Ema) [15:12:30] 10SRE, 10Infrastructure-Foundations, 10Traffic: Anycast: Add IPv6 support to bird and anycast-healthchecker (Puppet) - https://phabricator.wikimedia.org/T292737 (10ssingh) >>! In T292737#7415977, @ayounsi wrote: > Thanks that's great! > > Could you update the doc to reflect the new config knobs? Thanks for... 
[15:21:24] 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.1 point update - https://phabricator.wikimedia.org/T292844 (10MoritzMuehlenhoff) [15:24:29] (03CR) 10Volans: [C: 03+2] sre.hosts.reimage: check installed OS version [cookbooks] - 10https://gerrit.wikimedia.org/r/730008 (owner: 10Volans) [15:26:49] (03CR) 10Ayounsi: [C: 03+1] sre.hosts.reimage: update Netbox status [cookbooks] - 10https://gerrit.wikimedia.org/r/730023 (owner: 10Volans) [15:26:59] (03Merged) 10jenkins-bot: sre.hosts.reimage: check installed OS version [cookbooks] - 10https://gerrit.wikimedia.org/r/730008 (owner: 10Volans) [15:27:05] (03CR) 10Volans: [C: 03+2] sre.hosts.reimage: update Netbox status [cookbooks] - 10https://gerrit.wikimedia.org/r/730023 (owner: 10Volans) [15:27:12] (03CR) 10Volans: [C: 03+2] sre.ganeti.makevm: add rollback support [cookbooks] - 10https://gerrit.wikimedia.org/r/730163 (owner: 10Volans) [15:29:37] (03Merged) 10jenkins-bot: sre.hosts.reimage: update Netbox status [cookbooks] - 10https://gerrit.wikimedia.org/r/730023 (owner: 10Volans) [15:29:46] (03Merged) 10jenkins-bot: sre.ganeti.makevm: add rollback support [cookbooks] - 10https://gerrit.wikimedia.org/r/730163 (owner: 10Volans) [15:31:44] 10SRE-swift-storage: Consider swift ring management automation - https://phabricator.wikimedia.org/T265117 (10MatthewVernon) a:03MatthewVernon I think we have an outline of how to make this work, so I'll take ownership of this item. 
[15:35:25] (03PS1) 10Majavah: Fix wrong var being passed [extensions/SecurePoll] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/730230 (https://phabricator.wikimedia.org/T289950) [15:36:03] (03PS1) 10Majavah: Fix wrong var being passed [extensions/SecurePoll] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730231 (https://phabricator.wikimedia.org/T289950) [15:36:05] (03CR) 10Volans: "small improvement suggestion inline" [cookbooks] - 10https://gerrit.wikimedia.org/r/718936 (https://phabricator.wikimedia.org/T285803) (owner: 10RLazarus) [15:37:35] (03PS1) 10Legoktm: Include generated styles before Mediawiki overrides [extensions/SyntaxHighlight_GeSHi] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/730233 (https://phabricator.wikimedia.org/T292736) [15:37:49] dancy, (cc zabe): hi! https://gerrit.wikimedia.org/r/c/mediawiki/extensions/SecurePoll/+/730215 is a patch for a train blocker, I made cherry-picks for wmf.3 too because it's broken in prod and we have a securepoll election in process at the moment [15:38:02] not /exactly/ sure about impact, but at least logspam but likely some logic issues too [15:38:04] Thanks majavah. [15:38:16] jounebot now [15:38:22] jouncebot now [15:38:22] No deployments scheduled for the next 0 hour(s) and 21 minute(s) [15:38:32] jouncebot next [15:38:32] In 0 hour(s) and 21 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211012T1600) [15:39:25] I'll merge the wmf.3 one. 
[15:39:58] (after the merge to master completes) [15:41:41] !log btullis@cumin1001 START - Cookbook sre.hosts.remove-downtime for analytics1069.eqiad.wmnet [15:41:41] !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for analytics1069.eqiad.wmnet [15:41:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:41:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:46:00] (03PS1) 10Jgiannelos: tegola-vector-tiles: Debug codfw pregeneration [deployment-charts] - 10https://gerrit.wikimedia.org/r/730219 [15:47:02] dancy: merged on master now [15:47:30] Thx. Hitting +2 on https://gerrit.wikimedia.org/r/c/mediawiki/extensions/SecurePoll/+/730230 now [15:47:42] (03CR) 10Ahmon Dancy: [C: 03+2] Fix wrong var being passed [extensions/SecurePoll] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/730230 (https://phabricator.wikimedia.org/T289950) (owner: 10Majavah) [15:48:08] !log volans@cumin2002 START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet [15:48:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:48:27] !log volans@cumin2002 END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet [15:48:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:49:15] * volans testing the rollback capabilities of the sre.ganeti.makevm sorry for spamming a bit [15:49:22] !log volans@cumin2002 START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet [15:49:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:49:57] thanks! I can test on mwdebug once ready, I think I know what it should unbreak [15:50:55] ok. I'll notify you [15:51:03] !log jgiannelos@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . 
[15:51:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:51:30] puppet request window is a no-op so feel free to extend into the next hour [15:51:42] thx rzl! [15:52:00] (03Merged) 10jenkins-bot: Fix wrong var being passed [extensions/SecurePoll] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/730230 (https://phabricator.wikimedia.org/T289950) (owner: 10Majavah) [15:54:07] majavah: Deployed to mwdebug1002 [15:54:40] (03CR) 10BryanDavis: [C: 03+2] toolhub: Crawler CronJob concurrencyPolicy back to Forbid [deployment-charts] - 10https://gerrit.wikimedia.org/r/729891 (https://phabricator.wikimedia.org/T292861) (owner: 10BryanDavis) [15:55:41] dancy: it works, feel free to sync [15:55:47] excellent. [15:57:09] (03PS2) 10Jforrester: Pre-format comments for non-local files too [core] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730226 (https://phabricator.wikimedia.org/T292570) (owner: 10Zabe) [15:57:13] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[15:57:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:57:42] !log volans@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2009.codfw.wmnet [15:57:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:58:15] !log dancy@deploy1002 Synchronized php-1.38.0-wmf.3/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: [[gerrit:730230|Fix wrong var being passed (T289950 T293102)]] (duration: 02m 13s) [15:58:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:58:21] T293102: PHP Warning: substr() expects parameter 1 to be string, object given - https://phabricator.wikimedia.org/T293102 [15:58:22] T289950: SecurePoll should use the new hook runner system for its own hooks - https://phabricator.wikimedia.org/T289950 [15:59:07] (03Merged) 10jenkins-bot: toolhub: Crawler CronJob concurrencyPolicy back to Forbid [deployment-charts] - 10https://gerrit.wikimedia.org/r/729891 (https://phabricator.wikimedia.org/T292861) (owner: 10BryanDavis) [15:59:24] (03CR) 10Ahmon Dancy: [C: 03+2] Fix wrong var being passed [extensions/SecurePoll] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730231 (https://phabricator.wikimedia.org/T289950) (owner: 10Majavah) [15:59:49] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [15:59:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:00:05] !log volans@cumin2002 START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet [16:00:05] jbond and rzl: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Puppet request window . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211012T1600). [16:00:05] No Gerrit patches in the queue for this window AFAICS. 
[16:00:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:02:36] (03Merged) 10jenkins-bot: Fix wrong var being passed [extensions/SecurePoll] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730231 (https://phabricator.wikimedia.org/T289950) (owner: 10Majavah) [16:03:26] (03CR) 10Brennen Bearnes: [C: 03+2] Pre-format comments for non-local files too [core] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730226 (https://phabricator.wikimedia.org/T292570) (owner: 10Zabe) [16:03:50] (03CR) 10Cwhite: [C: 03+1] "Yep, this should DTRT." [puppet] - 10https://gerrit.wikimedia.org/r/729957 (https://phabricator.wikimedia.org/T288348) (owner: 10Btullis) [16:06:08] !log dancy@deploy1002 Synchronized php-1.38.0-wmf.4/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: [[gerrit:730231|Fix wrong var being passed (T289950 T293102)]] (duration: 00m 57s) [16:06:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:06:15] T293102: PHP Warning: substr() expects parameter 1 to be string, object given - https://phabricator.wikimedia.org/T293102 [16:06:16] T289950: SecurePoll should use the new hook runner system for its own hooks - https://phabricator.wikimedia.org/T289950 [16:08:09] (03PS1) 10BryanDavis: toolhub: Bump container version to 2021-10-12-152757-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/730221 (https://phabricator.wikimedia.org/T292861) [16:08:29] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[16:08:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:09:41] !log volans@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet [16:09:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:10:48] !log volans@cumin2002 START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet [16:10:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:11:10] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [16:11:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:12:32] 10SRE, 10ops-eqiad, 10Analytics-Clusters: analytics1069 mgmt interface intermittently goes up and down - https://phabricator.wikimedia.org/T291732 (10BTullis) Created follow-up task: {T293111} [16:14:42] (03CR) 10Bstorm: "Won't helm-diff installed from buster/wikimedia outside the component require the helm3 package from buster/wikimedia? 
It's going to do th" [puppet] - 10https://gerrit.wikimedia.org/r/729577 (https://phabricator.wikimedia.org/T292771) (owner: 10Majavah) [16:15:12] (03CR) 10Bstorm: kubeadm: add helm-diff (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/729577 (https://phabricator.wikimedia.org/T292771) (owner: 10Majavah) [16:16:09] !log volans@cumin2002 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2009.codfw.wmnet [16:16:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:17:50] !log volans@cumin2002 START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet [16:17:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:19:37] (03CR) 10Majavah: kubeadm: add helm-diff (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/729577 (https://phabricator.wikimedia.org/T292771) (owner: 10Majavah) [16:22:06] (03CR) 10Dzahn: "please let me know whether IF approved this" [puppet] - 10https://gerrit.wikimedia.org/r/728648 (owner: 10Dzahn) [16:22:23] (03Merged) 10jenkins-bot: Pre-format comments for non-local files too [core] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730226 (https://phabricator.wikimedia.org/T292570) (owner: 10Zabe) [16:26:28] !log dancy@deploy1002 Synchronized php-1.38.0-wmf.4/includes: Backport: [[gerrit:730226|Pre-format comments for non-local files too (T292570)]] (duration: 01m 15s) [16:26:30] !log volans@cumin2002 END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet [16:26:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:26:34] T292570: ImageHistoryPseudoPager: PHP Notice: Undefined offset: [n] - https://phabricator.wikimedia.org/T292570 [16:26:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:28:22] (03CR) 10Ahmon Dancy: Revert "Revert "static.php: Add support for /static/current rewrites"" (031 comment) [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/730182 (owner: 10Giuseppe Lavagetto) [16:29:53] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [16:29:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:30:38] !log volans@cumin2002 START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet [16:30:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:30:52] !log volans@cumin2002 END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts testvm2009.codfw.wmnet [16:30:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:29] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [16:32:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:33:22] (03CR) 10BryanDavis: [C: 03+2] toolhub: Bump container version to 2021-10-12-152757-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/730221 (https://phabricator.wikimedia.org/T292861) (owner: 10BryanDavis) [16:38:08] (03Merged) 10jenkins-bot: toolhub: Bump container version to 2021-10-12-152757-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/730221 (https://phabricator.wikimedia.org/T292861) (owner: 10BryanDavis) [16:41:02] (03PS1) 10Brennen Bearnes: Fix history page iteration in backwards mode [core] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/730235 (https://phabricator.wikimedia.org/T292791) [16:41:21] (03PS1) 10Brennen Bearnes: Fix history page iteration in backwards mode [core] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730236 (https://phabricator.wikimedia.org/T292791) [16:41:44] jouncebot now [16:41:44] For the next 0 hour(s) and 18 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211012T1600) [16:41:48] jouncebot next [16:41:48] In 0 hour(s) and 18 
minute(s): Services – Graphoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211012T1700) [16:45:46] dancy: hiiii! are you doing the train deploy today? [16:45:56] I am. [16:46:15] ah cool! I have a small favour to ask but only if it's really very much no trouble... [16:46:31] I forgot to merge a minor CentralNotice deploy in time for the branch cut yesterday [16:46:56] if there's any chance of still putting it on the train this week, again only if it's no trouble, that'd be fantastic [16:47:07] (03PS1) 10Volans: sre.ganeti.makevm: add Netbox sync on rollback [cookbooks] - 10https://gerrit.wikimedia.org/r/730247 [16:47:12] otherwise we can do a backport deploy later this week of course [16:47:16] yeah, go ahead and make a cherry-pick for the wmf.4 branch [16:47:18] 10SRE, 10Analytics, 10Analytics-Kanban, 10Data-Engineering, and 6 others: Migrated Server-side EventLogging events recording http.client_ip as 127.0.0.1 - https://phabricator.wikimedia.org/T288853 (10nettrom_WMF) 05Open→03Resolved I've verified that there are now events in the Data Lake with client IPs... [16:47:53] dancy: fantastic, thanks so much!!! [16:48:12] No problem [16:48:15] :) :) [16:48:44] !log volans@cumin2002 START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet [16:48:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:50:25] !log bd808@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' . 
[16:50:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:52:13] PROBLEM - ganeti-wconfd running on ganeti2026 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 114 (gnt-masterd), command name ganeti-wconfd https://wikitech.wikimedia.org/wiki/Ganeti [16:53:29] !log failed over ganeti master for test cluster to ganeti2025 [16:53:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:53:48] ^ the error is expected and a monitoring glitch from the failover [16:55:32] (03CR) 10Ahmon Dancy: [C: 03+2] Fix history page iteration in backwards mode [core] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/730235 (https://phabricator.wikimedia.org/T292791) (owner: 10Brennen Bearnes) [16:55:36] (03CR) 10Ahmon Dancy: [C: 03+2] Fix history page iteration in backwards mode [core] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730236 (https://phabricator.wikimedia.org/T292791) (owner: 10Brennen Bearnes) [16:55:55] !log volans@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet [16:55:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:56:09] !log bd808@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' . [16:56:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:57:14] dancy: could you ping me once you're done deploying? or if I could get https://gerrit.wikimedia.org/r/c/mediawiki/extensions/SyntaxHighlight_GeSHi/+/730233 added to your queue of backports [16:57:33] I'll add that to my list. [16:57:50] ty :) [17:00:05] chrisalbon and accraze: It is that lovely time of the day again! You are hereby commanded to deploy Services – Graphoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211012T1700). 
[17:05:29] PROBLEM - Host mr1-ulsfo.oob is DOWN: PING CRITICAL - Packet loss = 100% [17:09:21] !log bd808@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' . [17:09:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:09:40] 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.1 point update - https://phabricator.wikimedia.org/T292844 (10MoritzMuehlenhoff) [17:11:29] RECOVERY - Host mr1-ulsfo.oob is UP: PING OK - Packet loss = 0%, RTA = 74.09 ms [17:11:45] 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.1 point update - https://phabricator.wikimedia.org/T292844 (10MoritzMuehlenhoff) [17:12:41] !log installing rsync bugfix updates [17:12:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:13:30] (03Merged) 10jenkins-bot: Fix history page iteration in backwards mode [core] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/730235 (https://phabricator.wikimedia.org/T292791) (owner: 10Brennen Bearnes) [17:16:13] !log dancy@deploy1002 Synchronized php-1.38.0-wmf.3/includes/actions/pagers/HistoryPager.php: Backport: [[gerrit:730235|Fix history page iteration in backwards mode (T292791)]] (duration: 00m 57s) [17:16:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:16:19] T292791: Oldest revisions not listed in the correct order - https://phabricator.wikimedia.org/T292791 [17:16:30] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[17:16:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:17:31] (03Merged) 10jenkins-bot: Fix history page iteration in backwards mode [core] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730236 (https://phabricator.wikimedia.org/T292791) (owner: 10Brennen Bearnes) [17:19:06] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [17:19:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:23:31] !log dancy@deploy1002 Synchronized php-1.38.0-wmf.4/includes/actions/pagers/HistoryPager.php: Backport: [[gerrit:730236|Fix history page iteration in backwards mode (T292791)]] (duration: 00m 57s) [17:23:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:23:36] T292791: Oldest revisions not listed in the correct order - https://phabricator.wikimedia.org/T292791 [17:24:53] (03CR) 10MSantos: [C: 03+2] Revert "tegola-vector-tiles: Debug staging pregeneration" [deployment-charts] - 10https://gerrit.wikimedia.org/r/730184 (owner: 10Jgiannelos) [17:25:04] (03CR) 10Ahmon Dancy: [C: 03+2] Include generated styles before Mediawiki overrides [extensions/SyntaxHighlight_GeSHi] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/730233 (https://phabricator.wikimedia.org/T292736) (owner: 10Legoktm) [17:25:17] (03CR) 10MSantos: [C: 03+2] tegola-vector-tiles: Debug codfw pregeneration [deployment-charts] - 10https://gerrit.wikimedia.org/r/730219 (owner: 10Jgiannelos) [17:27:43] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[17:27:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:28:56] (03Merged) 10jenkins-bot: Revert "tegola-vector-tiles: Debug staging pregeneration" [deployment-charts] - 10https://gerrit.wikimedia.org/r/730184 (owner: 10Jgiannelos) [17:29:18] (03Merged) 10jenkins-bot: tegola-vector-tiles: Debug codfw pregeneration [deployment-charts] - 10https://gerrit.wikimedia.org/r/730219 (owner: 10Jgiannelos) [17:29:29] (03PS1) 10AndyRussG: Localisation updates from https://translatewiki.net. [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730253 [17:29:32] (03PS1) 10AndyRussG: Localisation updates from https://translatewiki.net. [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730254 [17:29:37] (03PS1) 10AndyRussG: Localisation updates from https://translatewiki.net. [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730255 [17:29:43] (03PS1) 10AndyRussG: Localisation updates from https://translatewiki.net. [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730256 [17:29:49] (03PS1) 10AndyRussG: Localisation updates from https://translatewiki.net. [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730257 [17:29:55] (03PS1) 10AndyRussG: build: Updating npm dependencies [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730258 [17:30:01] (03PS1) 10AndyRussG: Localisation updates from https://translatewiki.net. 
[extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730259 [17:30:07] (03PS1) 10AndyRussG: Only filter for user if the given username is valid [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730260 (https://phabricator.wikimedia.org/T292639) [17:30:11] (03Merged) 10jenkins-bot: Include generated styles before Mediawiki overrides [extensions/SyntaxHighlight_GeSHi] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/730233 (https://phabricator.wikimedia.org/T292736) (owner: 10Legoktm) [17:30:17] (03PS1) 10AndyRussG: Localisation updates from https://translatewiki.net. [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730261 [17:30:20] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [17:30:23] (03PS1) 10AndyRussG: Avoid writing empty prioritylangs rows to translate_metadata [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730262 (https://phabricator.wikimedia.org/T204026) [17:30:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:30:30] (03PS1) 10AndyRussG: Merge single use dependencies into ext.centralNotice.adminUi [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730263 (https://phabricator.wikimedia.org/T221805) [17:30:36] (03PS1) 10AndyRussG: Localisation updates from https://translatewiki.net. 
[extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730264 [17:30:42] (03PS1) 10AndyRussG: MediaWikiTestCase -> MediaWikiIntegrationTestCase [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730265 (https://phabricator.wikimedia.org/T293043) [17:30:46] (03PS1) 10Volans: sre.hosts.reimage: fix netbox client call [cookbooks] - 10https://gerrit.wikimedia.org/r/730266 [17:31:11] dancy: aaaargh many apologies, I think I'm making a mess of this :( [17:31:36] (03PS2) 10Volans: sre.hosts.reimage: fix netbox client call [cookbooks] - 10https://gerrit.wikimedia.org/r/730266 [17:31:38] CentralNotice is unusual in that it has its own deploy branch, wmf_deploy [17:31:53] ooh, interesting. I haven't run across that yet [17:32:02] !log dancy@deploy1002 Synchronized php-1.38.0-wmf.3/extensions/SyntaxHighlight_GeSHi/includes/ResourceLoaderPygmentsModule.php: Backport: [[gerrit:730233|Include generated styles before Mediawiki overrides (T292736)]] (duration: 00m 57s) [17:32:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:32:07] T292736: SyntaxHighlight always highlights syntax errors in red - https://phabricator.wikimedia.org/T292736 [17:32:18] yeah interesting and often a pain :| [17:32:35] legoktm: Your commit has been deployed. Will you be making a .4 cherry pick? [17:32:47] Here is the merge commit that I used to bring master into the wmf_deploy branch: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralNotice/+/730141 [17:32:53] (03CR) 10Volans: [C: 03+2] "Trivial call fix, self merging to not leave the cookbook broken." 
[cookbooks] - 10https://gerrit.wikimedia.org/r/730266 (owner: 10Volans) [17:33:08] I tried cherry-picking that onto the wmf.4 branch in CentralNotice, but it didn't work, so I rebased instead [17:33:16] but then Gerrit wouldn't let me submit it for review [17:33:21] dancy: I believe the original commit made it into wmf.4, let me double check [17:33:37] yep, it did [17:33:39] thanks! [17:33:39] So I used the -F option on git review, but that created all those extra Gerrit changes ^ [17:33:44] legoktm: Great! [17:34:55] normally it's the branch cut automated stuff that takes what's in wmf_deploy and puts it in the new version branch in the CentralNotice repo [17:35:26] (03Merged) 10jenkins-bot: sre.hosts.reimage: fix netbox client call [cookbooks] - 10https://gerrit.wikimedia.org/r/730266 (owner: 10Volans) [17:35:38] not sure what I'm doing wrong here, but definitely all those CN Gerrit changes ^ are wrong [17:35:43] AndyRussG: I'll take a look at the branch cut script to see what it does. [17:35:56] dancy ah thanks so so much, apologies for the bother!! [17:36:06] I'll abandon all of the above wrong Gerrit changes [17:36:33] I don't think we want to create a new commit on the wmf.4 branch in CN [17:36:40] Hmm maybe I could just do a hard reset [17:36:58] (03Abandoned) 10AndyRussG: MediaWikiTestCase -> MediaWikiIntegrationTestCase [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730265 (https://phabricator.wikimedia.org/T293043) (owner: 10AndyRussG) [17:37:06] (03Abandoned) 10AndyRussG: Localisation updates from https://translatewiki.net. 
[extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730264 (owner: 10AndyRussG) [17:37:12] (03Abandoned) 10AndyRussG: Merge single use dependencies into ext.centralNotice.adminUi [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730263 (https://phabricator.wikimedia.org/T221805) (owner: 10AndyRussG) [17:37:18] (03Abandoned) 10AndyRussG: Avoid writing empty prioritylangs rows to translate_metadata [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730262 (https://phabricator.wikimedia.org/T204026) (owner: 10AndyRussG) [17:37:25] (03Abandoned) 10AndyRussG: Localisation updates from https://translatewiki.net. [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730261 (owner: 10AndyRussG) [17:37:31] (03Abandoned) 10AndyRussG: Only filter for user if the given username is valid [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730260 (https://phabricator.wikimedia.org/T292639) (owner: 10AndyRussG) [17:37:38] (03Abandoned) 10AndyRussG: Localisation updates from https://translatewiki.net. [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730259 (owner: 10AndyRussG) [17:37:44] (03Abandoned) 10AndyRussG: build: Updating npm dependencies [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730258 (owner: 10AndyRussG) [17:37:49] (03Abandoned) 10AndyRussG: Localisation updates from https://translatewiki.net. [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730257 (owner: 10AndyRussG) [17:38:00] (03Abandoned) 10AndyRussG: Localisation updates from https://translatewiki.net. [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730256 (owner: 10AndyRussG) [17:38:05] (03Abandoned) 10AndyRussG: Localisation updates from https://translatewiki.net. 
[extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730255 (owner: 10AndyRussG) [17:38:10] (03Abandoned) 10AndyRussG: Localisation updates from https://translatewiki.net. [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730254 (owner: 10AndyRussG) [17:38:16] (03Abandoned) 10AndyRussG: Localisation updates from https://translatewiki.net. [extensions/CentralNotice] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/730253 (owner: 10AndyRussG) [17:38:56] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [17:39:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:40:12] dancy: yea normal git review after git reset --hard wmf_deploy gives the same error, "No changes between HEAD and origin/wmf/1.38.0-wmf.4. Submitting for review would be pointless." [17:40:20] I suppose I could just push? [17:40:25] sorry for the bother!!!!!!!!!!!!!!! [17:40:45] hold off for a bit while I poke around. [17:40:51] ok thanks!!!! [17:41:39] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [17:41:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:41:45] I don't expect we want to have a different cherry-picked commit on the CN wmf.4 branch, since normally those point to the same commits as the ones in the deploy branch [17:41:46] !log jgiannelos@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . 
[17:41:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:42:11] PROBLEM - k8s API server requests latencies on kubemaster1002 is CRITICAL: instance=10.64.32.116 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [17:42:14] AndyRussG: Agreed [17:43:54] !log jgiannelos@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . [17:43:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:45:31] RECOVERY - k8s API server requests latencies on kubemaster1002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [17:46:55] !log volans@cumin2002 START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster [17:46:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:47:08] AndyRussG: So it looks like you want the wmf.4 branch of CN to point to commit 46062bbd32515dfb6ed69a073080dc1fb61d827d, correct? [17:47:29] dancy: yep! [17:47:36] (03PS2) 10Volans: sre.ganeti.makevm: add Netbox sync on rollback [cookbooks] - 10https://gerrit.wikimedia.org/r/730247 [17:49:57] PROBLEM - k8s API server requests latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [17:51:17] RECOVERY - k8s API server requests latencies on kubemaster2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [17:55:23] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[17:55:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:56:29] !log dancy@deploy1002 Synchronized php-1.38.0-wmf.4/extensions/CentralNotice: Backport: [[gerrit:730141]] (duration: 00m 59s) [17:56:31] AndyRussG: hacking completed. [17:56:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:58:02] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [17:58:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:59:36] dancy: ahhh fantastic thanks so much!!!!!! [18:00:05] Deploy window Pre MediaWiki train break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211012T1800) [18:00:48] (03PS1) 10Ahmon Dancy: testwikis wikis to 1.38.0-wmf.4 refs T281168 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730269 [18:00:50] (03CR) 10Ahmon Dancy: [C: 03+2] testwikis wikis to 1.38.0-wmf.4 refs T281168 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730269 (owner: 10Ahmon Dancy) [18:01:31] (03Merged) 10jenkins-bot: testwikis wikis to 1.38.0-wmf.4 refs T281168 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730269 (owner: 10Ahmon Dancy) [18:01:36] !log dancy@deploy1002 Started scap: testwikis wikis to 1.38.0-wmf.4 refs T281168 [18:01:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:01:42] T281168: 1.38.0-wmf.4 deployment blockers - https://phabricator.wikimedia.org/T281168 [18:03:19] PROBLEM - k8s API server requests latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [18:06:31] RECOVERY - k8s API server requests latencies on kubemaster2002 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [18:08:40] (03PS1) 10David Caro: remote: use only the last line for the uptime [software/spicerack] - 10https://gerrit.wikimedia.org/r/730270 (https://phabricator.wikimedia.org/T292465) [18:09:14] (03PS1) 10Jgiannelos: tegola-vector-tiles: Use hardcoded IP for codfw main kafka [deployment-charts] - 10https://gerrit.wikimedia.org/r/730271 [18:10:16] (03CR) 10Ottomata: tegola-vector-tiles: Use hardcoded IP for codfw main kafka (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/730271 (owner: 10Jgiannelos) [18:11:31] (03CR) 10Ottomata: tegola-vector-tiles: Use hardcoded IP for codfw main kafka (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/730271 (owner: 10Jgiannelos) [18:12:09] !log volans@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster [18:12:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:14:45] (03PS4) 10Dzahn: static-bugzilla: compress bug HTML with gzip and add 10k more bugs [container/miscweb] - 10https://gerrit.wikimedia.org/r/728668 (https://phabricator.wikimedia.org/T281538) [18:14:55] (03CR) 10jerkins-bot: [V: 04-1] remote: use only the last line for the uptime [software/spicerack] - 10https://gerrit.wikimedia.org/r/730270 (https://phabricator.wikimedia.org/T292465) (owner: 10David Caro) [18:18:30] (03CR) 10Dzahn: [C: 03+2] static-bugzilla: compress bug HTML with gzip and add 10k more bugs [container/miscweb] - 10https://gerrit.wikimedia.org/r/728668 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [18:19:19] (03CR) 10Volans: "question inline" [software/spicerack] - 10https://gerrit.wikimedia.org/r/730270 (https://phabricator.wikimedia.org/T292465) (owner: 10David Caro) [18:19:53] dancy: looks great btw, I'm here for quite a while yet today, so pls don't hesitate to reach out if there's 
anything I should do or if any questions come up... thanks again!! [18:20:02] OK. Will do! [18:20:26] (03Merged) 10jenkins-bot: static-bugzilla: compress bug HTML with gzip and add 10k more bugs [container/miscweb] - 10https://gerrit.wikimedia.org/r/728668 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [18:23:20] (03PS2) 10Jgiannelos: tegola-vector-tiles: Use hardcoded IP for codfw main kafka cluster [deployment-charts] - 10https://gerrit.wikimedia.org/r/730271 [18:24:29] 10SRE, 10serviceops, 10Release-Engineering-Team (Doing): Reduce latency of new Scap releases - https://phabricator.wikimedia.org/T292646 (10Legoktm) >>! In T292646#7419245, @jijiki wrote: > The rollout process has to stay as it is though (upgrade on canaries first, and roll out to all hosts after 1-2 days)... [18:26:42] (03CR) 10Jgiannelos: tegola-vector-tiles: Use hardcoded IP for codfw main kafka cluster (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/730271 (owner: 10Jgiannelos) [18:27:56] 10SRE, 10Infrastructure-Foundations, 10netops: Eqiad Expansion - LVS Connectivity Options - https://phabricator.wikimedia.org/T292630 (10RobH) [18:34:09] (03PS1) 10Dzahn: static-bugzilla: add compressed bug content, 20000 - 30000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730275 (https://phabricator.wikimedia.org/T281538) [18:34:32] (03PS1) 10BryanDavis: toolhub: add values.yaml setting for crawler concurency [deployment-charts] - 10https://gerrit.wikimedia.org/r/730276 (https://phabricator.wikimedia.org/T292861) [18:42:32] 10SRE, 10Traffic, 10Performance-Team (Radar), 10User-ema: Package and deploy Varnish 6.0.8 - https://phabricator.wikimedia.org/T292290 (10dpifke) [18:47:13] !log dancy@deploy1002 Finished scap: testwikis wikis to 1.38.0-wmf.4 refs T281168 (duration: 45m 36s) [18:47:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:47:19] T281168: 1.38.0-wmf.4 deployment blockers - https://phabricator.wikimedia.org/T281168 
[18:48:43] (03CR) 10Dzahn: [C: 03+2] static-bugzilla: add compressed bug content, 20000 - 30000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730275 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [18:49:40] 10SRE, 10Infrastructure-Foundations, 10netops: Eqiad Expansion - LVS Connectivity Options - https://phabricator.wikimedia.org/T292630 (10RobH) [18:49:41] (03CR) 10BryanDavis: [C: 03+2] toolhub: add values.yaml setting for crawler concurency [deployment-charts] - 10https://gerrit.wikimedia.org/r/730276 (https://phabricator.wikimedia.org/T292861) (owner: 10BryanDavis) [18:53:59] (03Merged) 10jenkins-bot: toolhub: add values.yaml setting for crawler concurency [deployment-charts] - 10https://gerrit.wikimedia.org/r/730276 (https://phabricator.wikimedia.org/T292861) (owner: 10BryanDavis) [18:54:05] (03PS1) 10BryanDavis: toolhub: Set concurrencyPolicy=Replace temporarily [deployment-charts] - 10https://gerrit.wikimedia.org/r/730278 (https://phabricator.wikimedia.org/T292861) [18:54:07] (03Abandoned) 10Jgiannelos: tegola-vector-tiles: Use hardcoded IP for codfw main kafka cluster [deployment-charts] - 10https://gerrit.wikimedia.org/r/730271 (owner: 10Jgiannelos) [19:00:05] dancy and brennen: How many deployers does it take to do MediaWiki train - Utc-7 Version deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211012T1900). 
[19:00:34] (03PS1) 10Ahmon Dancy: group0 wikis to 1.38.0-wmf.4 refs T281168 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730280 [19:00:36] (03CR) 10Ahmon Dancy: [C: 03+2] group0 wikis to 1.38.0-wmf.4 refs T281168 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730280 (owner: 10Ahmon Dancy) [19:01:00] (03PS2) 10Dzahn: static-bugzilla: add compressed bug content, 20000 - 30000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730275 (https://phabricator.wikimedia.org/T281538) [19:01:13] o/ [19:01:21] (03Merged) 10jenkins-bot: group0 wikis to 1.38.0-wmf.4 refs T281168 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730280 (owner: 10Ahmon Dancy) [19:02:56] !log dancy@deploy1002 rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.4 refs T281168 [19:03:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:03] T281168: 1.38.0-wmf.4 deployment blockers - https://phabricator.wikimedia.org/T281168 [19:05:18] (03PS1) 10Dzahn: static-bugzilla: add compressed bug content, 30000 - 40000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730281 (https://phabricator.wikimedia.org/T281538) [19:08:45] !log pt1979@cumin2002 START - Cookbook sre.dns.netbox [19:08:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:09:35] (03PS1) 10Dzahn: static-bugzilla: add compressed bug content, 40000 - 50000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730285 (https://phabricator.wikimedia.org/T281538) [19:13:30] 10SRE, 10serviceops, 10Release-Engineering-Team (Doing): Reduce latency of new Scap releases - https://phabricator.wikimedia.org/T292646 (10jijiki) @Legoktm we may debdeploy scap everywhere, and then for whatever reason we need to push change Y fast due to issue X. If scap fails everywhere because of a bug w... 
[19:13:53] !log pt1979@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [19:13:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:14:33] (03PS1) 10Dzahn: static-bugzilla: add compressed bug content, 50000 - 60000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730287 (https://phabricator.wikimedia.org/T281538) [19:15:28] (03CR) 10Dzahn: [V: 03+1 C: 03+2] static-bugzilla: add compressed bug content, 20000 - 30000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730275 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [19:19:08] (03Merged) 10jenkins-bot: static-bugzilla: add compressed bug content, 20000 - 30000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730275 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [19:22:40] (03PS2) 10Dzahn: static-bugzilla: add compressed bug content, 50000 - 60000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730287 (https://phabricator.wikimedia.org/T281538) [19:22:42] (03PS1) 10Dzahn: static-bugzilla: add compressed bug content, 60000 - 70000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730288 (https://phabricator.wikimedia.org/T281538) [19:24:51] (03CR) 10Dzahn: [C: 03+2] static-bugzilla: add compressed bug content, 30000 - 40000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730281 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [19:24:53] (03PS5) 10Jforrester: Add new config names for CentralAuth denylist controls [mediawiki-config] - 10https://gerrit.wikimedia.org/r/720362 (https://phabricator.wikimedia.org/T277932) [19:24:57] (03PS2) 10Dzahn: static-bugzilla: add compressed bug content, 30000 - 40000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730281 (https://phabricator.wikimedia.org/T281538) [19:28:55] (03PS1) 10Dzahn: static-bugzilla: add compressed bug content, 60001 - 73681 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730292 (https://phabricator.wikimedia.org/T281538) 
[19:30:40] (03CR) 10Dzahn: [V: 03+1 C: 03+2] static-bugzilla: add compressed bug content, 30000 - 40000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730281 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [19:31:36] !log dancy@deploy1002 Pruned MediaWiki: 1.38.0-wmf.1 (duration: 04m 02s) [19:31:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:34:17] (03PS1) 10Dzahn: static-bugzilla: add compressed activity content, 1 - 20000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730314 (https://phabricator.wikimedia.org/T281538) [19:38:23] (03Merged) 10jenkins-bot: static-bugzilla: add compressed bug content, 30000 - 40000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730281 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [19:39:08] (03PS2) 10Dzahn: static-bugzilla: add compressed bug content, 40000 - 50000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730285 (https://phabricator.wikimedia.org/T281538) [19:39:15] (03CR) 10Dzahn: [C: 03+2] static-bugzilla: add compressed bug content, 40000 - 50000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730285 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [19:39:27] (03CR) 10Ottomata: tegola-vector-tiles: Use hardcoded IP for codfw main kafka cluster (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/730271 (owner: 10Jgiannelos) [19:42:32] (03PS4) 10Jforrester: Drop old config names for CentralAuth denylist controls [mediawiki-config] - 10https://gerrit.wikimedia.org/r/720363 (https://phabricator.wikimedia.org/T277932) [19:43:51] (03Merged) 10jenkins-bot: static-bugzilla: add compressed bug content, 40000 - 50000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730285 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [19:44:40] (03PS3) 10Dzahn: static-bugzilla: add compressed bug content, 50000 - 60000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730287 
(https://phabricator.wikimedia.org/T281538) [19:45:24] (03CR) 10Dzahn: [C: 03+2] static-bugzilla: add compressed bug content, 50000 - 60000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730287 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [19:46:03] (03PS1) 10Dzahn: static-bugzilla: add compressed activity content, 20001 - 40000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730316 (https://phabricator.wikimedia.org/T281538) [19:52:18] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for KCVelaga (WMF) - https://phabricator.wikimedia.org/T292992 (10CDanis) a:03Ottomata Andrew, can you approve? Thanks! [19:52:40] 10SRE, 10SRE-Access-Requests: Requesting access to Analytic Cluster for Muniza - https://phabricator.wikimedia.org/T292955 (10CDanis) a:03Ottomata Andrew, can you approve? Thanks! [19:53:23] 10SRE, 10SRE-Access-Requests, 10Campaign-Tools: Request access to private data group for Ldelench - https://phabricator.wikimedia.org/T292841 (10CDanis) [19:54:54] (03Merged) 10jenkins-bot: static-bugzilla: add compressed bug content, 50000 - 60000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730287 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [19:55:34] (03CR) 10Dzahn: [C: 03+2] static-bugzilla: add compressed bug content, 60000 - 70000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730288 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [19:55:45] (03PS2) 10Dzahn: static-bugzilla: add compressed bug content, 60000 - 70000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730288 (https://phabricator.wikimedia.org/T281538) [19:55:52] (03CR) 10Dzahn: [V: 03+1 C: 03+2] static-bugzilla: add compressed bug content, 60000 - 70000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730288 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [20:02:33] (03PS1) 10Dzahn: static-bugzilla: add compressed activity content, 40001 - 60000 [container/miscweb] - 
10https://gerrit.wikimedia.org/r/730330 (https://phabricator.wikimedia.org/T281538) [20:02:42] (03PS1) 10CDanis: add no-ssh user ldelench to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/730331 (https://phabricator.wikimedia.org/T292841) [20:05:21] (03CR) 10CDanis: [C: 03+2] add no-ssh user ldelench to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/730331 (https://phabricator.wikimedia.org/T292841) (owner: 10CDanis) [20:06:11] 10SRE, 10SRE-Access-Requests, 10Campaign-Tools, 10Patch-For-Review: Request access to private data group for Ldelench - https://phabricator.wikimedia.org/T292841 (10CDanis) 05Open→03Resolved a:03CDanis Access granted! [20:07:23] (03CR) 10CDanis: [C: 03+2] NEL alert is empirically high-signal & should page SRE [puppet] - 10https://gerrit.wikimedia.org/r/727594 (https://phabricator.wikimedia.org/T292792) (owner: 10CDanis) [20:09:10] (03Merged) 10jenkins-bot: static-bugzilla: add compressed bug content, 60000 - 70000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730288 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [20:09:34] (03PS1) 10Dzahn: static-bugzilla: add compressed activity content, 60001 - 73681 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730334 (https://phabricator.wikimedia.org/T281538) [20:11:55] (03CR) 10BryanDavis: [C: 03+2] toolhub: Set concurrencyPolicy=Replace temporarily [deployment-charts] - 10https://gerrit.wikimedia.org/r/730278 (https://phabricator.wikimedia.org/T292861) (owner: 10BryanDavis) [20:13:08] (03PS1) 10CDanis: add cjming to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/730335 (https://phabricator.wikimedia.org/T292782) [20:13:49] (03PS1) 10Dzahn: static-bugzilla: add all_bugs, all_activies pages, gzip index [container/miscweb] - 10https://gerrit.wikimedia.org/r/730336 (https://phabricator.wikimedia.org/T281538) [20:14:33] (03CR) 10Dzahn: [C: 03+2] static-bugzilla: add compressed bug content, 60001 - 
73681 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730292 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [20:14:41] (03PS2) 10Dzahn: static-bugzilla: add compressed bug content, 60001 - 73681 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730292 (https://phabricator.wikimedia.org/T281538) [20:14:46] (03CR) 10Dzahn: [V: 03+1 C: 03+2] static-bugzilla: add compressed bug content, 60001 - 73681 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730292 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [20:15:41] (03CR) 10CDanis: [C: 03+2] add cjming to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/730335 (https://phabricator.wikimedia.org/T292782) (owner: 10CDanis) [20:16:05] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for Clare Ming - https://phabricator.wikimedia.org/T292782 (10CDanis) 05Open→03Resolved a:03CDanis Access granted! [20:18:51] (03Merged) 10jenkins-bot: static-bugzilla: add compressed bug content, 60001 - 73681 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730292 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [20:19:02] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-access for NRodriguez - https://phabricator.wikimedia.org/T291508 (10CDanis) Hi Natalia, just a friendly reminder that this ticket is still awaiting an SSH public key from you. Please see https://wikitech.wikimedia.org/wiki/SRE/Productio... [20:19:39] (03Merged) 10jenkins-bot: toolhub: Set concurrencyPolicy=Replace temporarily [deployment-charts] - 10https://gerrit.wikimedia.org/r/730278 (https://phabricator.wikimedia.org/T292861) (owner: 10BryanDavis) [20:21:20] !log bd808@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' . 
[20:21:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:29:06] (03PS2) 10Dzahn: static-bugzilla: add compressed activity content, 1 - 20000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730314 (https://phabricator.wikimedia.org/T281538) [20:29:21] (03CR) 10Dzahn: [C: 03+2] static-bugzilla: add compressed activity content, 1 - 20000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730314 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [20:29:27] (03PS1) 10Dzahn: static-bugzilla: add skins directory [container/miscweb] - 10https://gerrit.wikimedia.org/r/730339 (https://phabricator.wikimedia.org/T281538) [20:31:28] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for KCVelaga (WMF) - https://phabricator.wikimedia.org/T292992 (10Ottomata) Approved. [20:34:02] 10SRE, 10SRE-Access-Requests: Requesting access to Analytic Cluster for Muniza - https://phabricator.wikimedia.org/T292955 (10Ottomata) @diego, does Leila need to approve this too? I'll let @odimitrijevic officially approve this one from our end, but it is ok by me. Olja, this is a case of a non WMF employee... 
[20:38:30] (03CR) 10Dzahn: [V: 03+1 C: 03+2] static-bugzilla: add compressed activity content, 1 - 20000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730314 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [20:38:43] (03PS2) 10Dzahn: static-bugzilla: add compressed activity content, 20001 - 40000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730316 (https://phabricator.wikimedia.org/T281538) [20:38:57] (03Merged) 10jenkins-bot: static-bugzilla: add compressed activity content, 1 - 20000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730314 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [20:39:35] PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) is CRITICAL: Test Retrieve announcements returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [20:41:11] RECOVERY - restbase endpoints health on restbase-dev1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [20:45:46] (03CR) 10Dzahn: [C: 03+2] static-bugzilla: add compressed activity content, 20001 - 40000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730316 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [20:50:59] (03Merged) 10jenkins-bot: static-bugzilla: add compressed activity content, 20001 - 40000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730316 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [20:52:29] (03PS2) 10Dzahn: static-bugzilla: add compressed activity content, 40001 - 60000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730330 (https://phabricator.wikimedia.org/T281538) [20:52:45] (03CR) 10Dzahn: [V: 03+1 C: 03+2] static-bugzilla: add compressed activity content, 20001 - 40000 [container/miscweb] - 10https://gerrit.wikimedia.org/r/730316 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) 
[20:53:13] (CR) Dzahn: [C: +2] static-bugzilla: add compressed activity content, 40001 - 60000 [container/miscweb] - https://gerrit.wikimedia.org/r/730330 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn) [20:55:28] SRE, SRE-Access-Requests: Requesting access to Analytic Cluster for Muniza - https://phabricator.wikimedia.org/T292955 (odimitrijevic) Approved [20:56:52] urbanecm: Hello, I will be present in the "UTC late backport window" backport, and sorry for forgetting to put my change to the last backport [20:57:18] (Merged) jenkins-bot: static-bugzilla: add compressed activity content, 40001 - 60000 [container/miscweb] - https://gerrit.wikimedia.org/r/730330 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn) [20:57:28] Juan_90264: ack. Please make sure to appear during your backport windows -- it makes things more convenient for everyone :). [20:58:19] (PS2) Dzahn: static-bugzilla: add compressed activity content, 60001 - 73681 [container/miscweb] - https://gerrit.wikimedia.org/r/730334 (https://phabricator.wikimedia.org/T281538) [20:58:21] Okay [20:58:26] (CR) Dzahn: [C: +2] static-bugzilla: add compressed activity content, 60001 - 73681 [container/miscweb] - https://gerrit.wikimedia.org/r/730334 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn) [21:02:18] (Merged) jenkins-bot: static-bugzilla: add compressed activity content, 60001 - 73681 [container/miscweb] - https://gerrit.wikimedia.org/r/730334 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn) [21:03:26] (PS2) Dzahn: static-bugzilla: add all_bugs, all_activies pages, gzip index [container/miscweb] - https://gerrit.wikimedia.org/r/730336 (https://phabricator.wikimedia.org/T281538) [21:11:32] (CR) Dzahn: [C: +2] static-bugzilla: add all_bugs, all_activies pages, gzip index [container/miscweb] - https://gerrit.wikimedia.org/r/730336 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn) [21:12:01] (PS1)
CDanis: kcv-wikimf shell access & analytics-privatedata-userswq [puppet] - https://gerrit.wikimedia.org/r/730342 [21:13:32] (PS1) Andrew Bogott: Added mark_tool commandline to cloudcontrol nodes [puppet] - https://gerrit.wikimedia.org/r/730344 (https://phabricator.wikimedia.org/T170355) [21:14:07] (CR) jerkins-bot: [V: -1] Added mark_tool commandline to cloudcontrol nodes [puppet] - https://gerrit.wikimedia.org/r/730344 (https://phabricator.wikimedia.org/T170355) (owner: Andrew Bogott) [21:14:43] (PS2) CDanis: kcv-wikimf shell access & analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/730342 (https://phabricator.wikimedia.org/T292992) [21:14:58] (Merged) jenkins-bot: static-bugzilla: add all_bugs, all_activies pages, gzip index [container/miscweb] - https://gerrit.wikimedia.org/r/730336 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn) [21:15:28] (PS2) Andrew Bogott: Added mark_tool commandline to cloudcontrol nodes [puppet] - https://gerrit.wikimedia.org/r/730344 (https://phabricator.wikimedia.org/T170355) [21:16:00] (CR) jerkins-bot: [V: -1] kcv-wikimf shell access & analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/730342 (https://phabricator.wikimedia.org/T292992) (owner: CDanis) [21:16:10] (PS2) Dzahn: static-bugzilla: add skins directory [container/miscweb] - https://gerrit.wikimedia.org/r/730339 (https://phabricator.wikimedia.org/T281538) [21:16:22] (PS3) CDanis: kcv-wikimf shell access & analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/730342 (https://phabricator.wikimedia.org/T292992) [21:17:27] (CR) Dzahn: [C: +2] static-bugzilla: add skins directory [container/miscweb] - https://gerrit.wikimedia.org/r/730339 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn) [21:18:43] (CR) CDanis: [C: +2] kcv-wikimf shell access & analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/730342
(https://phabricator.wikimedia.org/T292992) (owner: CDanis) [21:20:01] SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for KCVelaga (WMF) - https://phabricator.wikimedia.org/T292992 (CDanis) Open→Resolved Access granted! [21:21:03] (CR) Andrew Bogott: [C: +2] Added mark_tool commandline to cloudcontrol nodes [puppet] - https://gerrit.wikimedia.org/r/730344 (https://phabricator.wikimedia.org/T170355) (owner: Andrew Bogott) [21:22:33] SRE, SRE-Access-Requests: Requesting access to Analytic Cluster for Muniza - https://phabricator.wikimedia.org/T292955 (CDanis) a:Ottomata→MunizaA @MunizaA can you please sign the L3 agreement by clicking on L3 ? thanks! [21:25:36] (Merged) jenkins-bot: static-bugzilla: add skins directory [container/miscweb] - https://gerrit.wikimedia.org/r/730339 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn) [21:35:44] (CR) Dzahn: "recheck" [container/miscweb] - https://gerrit.wikimedia.org/r/730339 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn) [21:37:22] (PS1) Andrew Bogott: mark_tool.py: fix a couple of runtime errors [puppet] - https://gerrit.wikimedia.org/r/730347 [21:40:27] (CR) Andrew Bogott: [C: +2] mark_tool.py: fix a couple of runtime errors [puppet] - https://gerrit.wikimedia.org/r/730347 (owner: Andrew Bogott) [21:41:04] SRE, Infrastructure-Foundations, Traffic-Icebox, netops: externally-hosted NEL report forwarders for more timely report reception - https://phabricator.wikimedia.org/T292870 (CDanis) >>! In T292870#7415867, @ayounsi wrote: > I'd wary of the complexity of the setup. Yeah, a fair criticism. > As...
[21:47:38] (PS1) Andrew Bogott: mark_tool.py: fix password/dn swap [puppet] - https://gerrit.wikimedia.org/r/730353 [21:48:55] (CR) jerkins-bot: [V: -1] mark_tool.py: fix password/dn swap [puppet] - https://gerrit.wikimedia.org/r/730353 (owner: Andrew Bogott) [21:50:00] (PS2) Andrew Bogott: mark_tool.py: fix password/dn swap [puppet] - https://gerrit.wikimedia.org/r/730353 [21:51:33] (CR) Andrew Bogott: [C: +2] mark_tool.py: fix password/dn swap [puppet] - https://gerrit.wikimedia.org/r/730353 (owner: Andrew Bogott) [22:03:03] (CR) Dzahn: [C: +1] modules::gitlab::ssh explicitly add git user with fixed id [puppet] - https://gerrit.wikimedia.org/r/728380 (https://phabricator.wikimedia.org/T283076) (owner: Jelto) [22:08:14] (PS1) Dzahn: miscweb: bump staging version to 2021-10-12-213721-production [deployment-charts] - https://gerrit.wikimedia.org/r/730356 [22:13:33] (CR) Dzahn: [C: +2] "this is the built by the job manually restarted in CI here: https://integration.wikimedia.org/ci/job/miscweb-pipeline-publish/34/" [deployment-charts] - https://gerrit.wikimedia.org/r/730356 (owner: Dzahn) [22:13:52] (CR) Dzahn: [C: +2] "s/built/build built/s :)" [deployment-charts] - https://gerrit.wikimedia.org/r/730356 (owner: Dzahn) [22:17:46] (Merged) jenkins-bot: miscweb: bump staging version to 2021-10-12-213721-production [deployment-charts] - https://gerrit.wikimedia.org/r/730356 (owner: Dzahn) [22:47:18] SRE, Toolhub, Traffic: Toolhub API requests with PATCH verbs blocked by CDN - https://phabricator.wikimedia.org/T293157 (bd808) The timing of this is on me, but I just found the block of PATCH today and the app is planned to be announced to the community tomorrow. I can hold the announce if necessary... [22:47:56] SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Clare Ming - https://phabricator.wikimedia.org/T292782 (cjming) thank you!
[22:51:10] !log dzahn@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' . [22:51:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:53:32] !log [urbanecm@labweb1001 ~]$ mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=labswiki Jamesmontalvo3 # [22:53:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:54:16] Waiting 23:00 UTC... [22:54:33] jouncebot: next [22:54:33] In 0 hour(s) and 5 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211012T2300) [22:56:58] Juan_90264: changing Asturian logo, eh [22:57:20] Yes [22:57:40] (CR) Cwhite: graphite: disable tags support (1 comment) [puppet] - https://gerrit.wikimedia.org/r/729968 (https://phabricator.wikimedia.org/T247963) (owner: Filippo Giunchedi) [22:58:22] (CR) Cwhite: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/729975 (https://phabricator.wikimedia.org/T247963) (owner: Filippo Giunchedi) [22:58:25] (CR) Dzahn: [C: +1] Change logo in astwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/727497 (https://phabricator.wikimedia.org/T292742) (owner: Juan90264) [22:59:30] (CR) Cwhite: [C: +2] schemas - metrics: Add puppet keys to the metrics name space [software/ecs] - https://gerrit.wikimedia.org/r/722873 (https://phabricator.wikimedia.org/T222826) (owner: Jbond) [23:00:00] I will now make another change and try to add it to the backport now https://phabricator.wikimedia.org/T291479 [23:00:04] RoanKattouw and Urbanecm: #bothumor My software never has bugs. It just develops random features. Rise for UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211012T2300). [23:00:04] Huji, Juan_90264, and zabe: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process.
Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:05] (Merged) jenkins-bot: schemas - metrics: Add puppet keys to the metrics name space [software/ecs] - https://gerrit.wikimedia.org/r/722873 (https://phabricator.wikimedia.org/T222826) (owner: Jbond) [23:00:30] (CR) Zabe: "I would recomend using the script by legoktm for this: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/refs/h" [mediawiki-config] - https://gerrit.wikimedia.org/r/727497 (https://phabricator.wikimedia.org/T292742) (owner: Juan90264) [23:00:36] o/ [23:01:01] (CR) Cwhite: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/730206 (https://phabricator.wikimedia.org/T288726) (owner: Filippo Giunchedi) [23:01:39] Okay Zabe [23:02:36] urbanecm: Online? [23:04:35] I kinda hoped someone else will do the window...but it doesn't seem so [23:04:40] I can deploy today then. [23:04:59] !log dzahn@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
[23:05:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:05:14] (PS8) Urbanecm: Change logo in astwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/727497 (https://phabricator.wikimedia.org/T292742) (owner: Juan90264) [23:05:18] (CR) Urbanecm: [C: +2] Change logo in astwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/727497 (https://phabricator.wikimedia.org/T292742) (owner: Juan90264) [23:05:36] Juan_90264: starting with your patch [23:05:45] Okay [23:06:07] (Merged) jenkins-bot: Change logo in astwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/727497 (https://phabricator.wikimedia.org/T292742) (owner: Juan90264) [23:06:26] Don't forget to purge the cache later [23:08:04] Juan_90264: you can test it at mwdebug1001 [23:08:24] (I'm quite aware of what needs to be done with logo changes, don't worry :)) [23:08:40] I tested it now and approve the change [23:10:07] syncing [23:10:53] It seems that I won't be able to make the change yet, the logo provided is png and not svg, it will end up missing version 1.5x and 2x. https://phabricator.wikimedia.org/T291479 [23:11:06] yes, a svg is definitely needed [23:11:15] and try to use the new yaml-based approach ;) [23:12:41] !log urbanecm@deploy1002 Synchronized static/images/project-logos/: 59c31d9046a68e73b07d8179ac569425d18dcf73: Change logo in astwiki (T292742) (duration: 02m 09s) [23:12:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:12:47] T292742: Change logo in astwiki - https://phabricator.wikimedia.org/T292742 [23:13:06] Ok I will use the new approach in yaml [23:13:28] zabe: your patch is quite large, although it looks fairly simple, I don't feel comfortable deploying it now. Ideally, this should have a +1 from someone else prior to deployment.
[23:13:51] sure, going to reschedule [23:13:56] (I'm also wondering about a deployment plan; it looks syncing it all in an arbitrary order should work, but...not 100% sure) [23:13:58] thanks zabe [23:14:52] the libs are dev-only, so it should be a no-op [23:15:17] !log urbanecm@deploy1002 Synchronized wmf-config/logos.php: 59c31d9046a68e73b07d8179ac569425d18dcf73: Change logo in astwiki (T292742) (duration: 01m 04s) [23:15:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:15:36] zabe: libs yes, but the files like w/static.php are actually heavily used :)) [23:16:29] !log UTC late B&C window done [23:16:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:16:43] declaring the window done [23:19:03] SRE, SRE-Access-Requests: Requesting access to Analytic Cluster for Muniza - https://phabricator.wikimedia.org/T292955 (diego) >>! In T292955#7421619, @Ottomata wrote: > @diego, does Leila need to approve this too? @Ottomata, I don't think so, I'm Muniza's hiring manager, and Muniza has already signed a...
[23:20:34] Thank you urbanecm, the logos https://ast.wikipedia.org/static/images/project-logos/astwiki-1.5x.png and https://ast.wikipedia.org/static/images/project-logos/astwiki-2x.png, are working [23:22:25] SRE, Toolhub, Traffic: Toolhub API requests with PATCH verbs blocked by CDN - https://phabricator.wikimedia.org/T293157 (bd808) [23:24:13] (PS1) Dzahn: miscweb: set staging version to production to debug issue pulling from registry [deployment-charts] - https://gerrit.wikimedia.org/r/730362 [23:29:38] (CR) Dzahn: [C: +2] miscweb: set staging version to production to debug issue pulling from registry [deployment-charts] - https://gerrit.wikimedia.org/r/730362 (owner: Dzahn) [23:34:20] (PS1) BryanDavis: cache: Allow PATCH method to be passed to backend services [puppet] - https://gerrit.wikimedia.org/r/730363 (https://phabricator.wikimedia.org/T293157) [23:43:06] (Merged) jenkins-bot: miscweb: set staging version to production to debug issue pulling from registry [deployment-charts] - https://gerrit.wikimedia.org/r/730362 (owner: Dzahn) [23:48:17] !log dzahn@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' . [23:48:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log