[00:00:25] (03CR) 10Dzahn: "also need this: https://gerrit.wikimedia.org/r/c/operations/puppet/+/726094" [puppet] - 10https://gerrit.wikimedia.org/r/725390 (owner: 10Dzahn) [00:01:30] RECOVERY - Check systemd state on ms-be1059 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:02:14] (03PS6) 10Dzahn: geoip: replace maxmind update cron with system timer and config [puppet] - 10https://gerrit.wikimedia.org/r/721595 (https://phabricator.wikimedia.org/T273673) [00:02:18] RECOVERY - Check systemd state on deploy1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:02:55] (03CR) 10jerkins-bot: [V: 04-1] geoip: replace maxmind update cron with system timer and config [puppet] - 10https://gerrit.wikimedia.org/r/721595 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [00:05:47] (03PS2) 10Cwhite: logstash: set log field aside prior to parsing k8s logs [puppet] - 10https://gerrit.wikimedia.org/r/726129 (https://phabricator.wikimedia.org/T292099) [00:06:11] (03PS7) 10Dzahn: geoip: replace maxmind update cron with system timer and config [puppet] - 10https://gerrit.wikimedia.org/r/721595 (https://phabricator.wikimedia.org/T273673) [00:08:40] PROBLEM - Check systemd state on deploy1002 is CRITICAL: CRITICAL - degraded: The following units failed: deploy_to_mwdebug.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:47:37] Hello folks! I'm Juan90264 on WikiTech and Wikipedia [01:48:17] Someone online? [01:52:35] what's up? [01:53:00] Hello [01:54:13] Is Lucas_WMDE on this channel? 
[01:58:36] He might be when it wasn't 4am for him :) [01:58:37] (03PS1) 10Ladsgroup: Don't fail job if subscribed wiki is unknown [extensions/Wikibase] (wmf/1.38.0-wmf.2) - 10https://gerrit.wikimedia.org/r/725923 (https://phabricator.wikimedia.org/T292446) [01:59:36] Okay Reedy [02:00:04] Deploy window Branching MediaWiki, extensions, skins, and vendor – See Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211005T0200) [02:04:46] (03CR) 10Cwhite: [C: 03+1] "LGTM modulo what Keith said" [puppet] - 10https://gerrit.wikimedia.org/r/725838 (owner: 10Filippo Giunchedi) [02:05:30] Does anyone know what is the most active Backport user here? What if yesterday was? [02:06:56] (03PS1) 10TrainBranchBot: Branch commit for wmf/1.38.0-wmf.3 [core] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/726225 [02:07:04] (03CR) 10TrainBranchBot: [C: 03+2] Branch commit for wmf/1.38.0-wmf.3 [core] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/726225 (owner: 10TrainBranchBot) [02:07:24] (03CR) 10Cwhite: [C: 03+1] "LGTM" [alerts] - 10https://gerrit.wikimedia.org/r/725884 (https://phabricator.wikimedia.org/T288726) (owner: 10Filippo Giunchedi) [02:07:34] (03CR) 10Cwhite: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/725885 (https://phabricator.wikimedia.org/T288726) (owner: 10Filippo Giunchedi) [02:08:15] (03CR) 10Cwhite: [C: 03+2] git - schema: Add new schema for adding git information [software/ecs] - 10https://gerrit.wikimedia.org/r/722580 (https://phabricator.wikimedia.org/T222826) (owner: 10Jbond) [02:09:03] (03Merged) 10jenkins-bot: git - schema: Add new schema for adding git information [software/ecs] - 10https://gerrit.wikimedia.org/r/722580 (https://phabricator.wikimedia.org/T222826) (owner: 10Jbond) [02:09:08] (03CR) 10Cwhite: [C: 03+2] move version handling and template rendering to build step [software/ecs] - 10https://gerrit.wikimedia.org/r/722965 (owner: 10Cwhite) [02:09:26] (03CR) 10Cwhite: 
[C: 03+2] add dynamic_templates template rendering [software/ecs] - 10https://gerrit.wikimedia.org/r/722966 (https://phabricator.wikimedia.org/T291647) (owner: 10Cwhite) [02:09:40] (03Merged) 10jenkins-bot: move version handling and template rendering to build step [software/ecs] - 10https://gerrit.wikimedia.org/r/722965 (owner: 10Cwhite) [02:10:00] (03Merged) 10jenkins-bot: add dynamic_templates template rendering [software/ecs] - 10https://gerrit.wikimedia.org/r/722966 (https://phabricator.wikimedia.org/T291647) (owner: 10Cwhite) [02:15:54] Juan_90264, welcome back [02:16:36] I'm still getting used to IRC :) [02:17:59] (03PS1) 10Cwhite: bump changelog 1.7.0-5 [software/ecs] - 10https://gerrit.wikimedia.org/r/726226 [02:18:22] (03PS2) 10Cwhite: schemas - metrics: Add puppet keys to the metrics name space [software/ecs] - 10https://gerrit.wikimedia.org/r/722873 (https://phabricator.wikimedia.org/T222826) (owner: 10Jbond) [02:18:32] I'm not sure who's the most active deployer (it's probably martin. they're also asleep right now.) [02:20:12] (03PS2) 10Cwhite: bump changelog 1.7.0-5 [software/ecs] - 10https://gerrit.wikimedia.org/r/726226 [02:22:16] Thanks AntiComposite [02:22:36] Okay, I'll come back to this channel at 11:00 UTC, and I hope that at least the deployers will let me know the sole reason for not deploying, if they come to ignore them. 
[02:23:12] so far I haven't seen anything other than you not being here [02:23:17] oh they've gone [02:24:24] PROBLEM - Check systemd state on deneb is CRITICAL: CRITICAL - degraded: The following units failed: docker-reporter-releng-images.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:29:12] (03Merged) 10jenkins-bot: Branch commit for wmf/1.38.0-wmf.3 [core] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/726225 (owner: 10TrainBranchBot) [02:29:24] (03PS4) 10MacFan4000: ExtensionDistributor: Add 1.37 as preview branch; remove 1.31 as it's EOL [mediawiki-config] - 10https://gerrit.wikimedia.org/r/725912 [03:12:18] PROBLEM - dump of s5 in codfw on alert1001 is CRITICAL: Last dump for s5 at codfw (db2101.codfw.wmnet:3315) taken on 2021-10-05 00:00:02 is 79 GB, but previous one was 94 GB, a change of 15.4% https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting [03:36:46] PROBLEM - dump of s5 in eqiad on alert1001 is CRITICAL: Last dump for s5 at eqiad (db1150.eqiad.wmnet:3315) taken on 2021-10-05 00:00:02 is 79 GB, but previous one was 94 GB, a change of 15.4% https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting [04:55:52] (03PS3) 10KartikMistry: Update cxserver to use nodejs12 [deployment-charts] - 10https://gerrit.wikimedia.org/r/725866 (https://phabricator.wikimedia.org/T290754) [05:09:51] (03CR) 10Giuseppe Lavagetto: [C: 03+2] admin: add Shari to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/724946 (https://phabricator.wikimedia.org/T292069) (owner: 10Giuseppe Lavagetto) [05:12:10] 10SRE, 10SRE-Access-Requests, 10Product-Analytics: Requesting access to Superset for Swakiyama - https://phabricator.wikimedia.org/T292069 (10Joe) 05Open→03Resolved After getting Carol's approval via email, I merged the patches. It means that Shari should be able to access superset in the next hour or so... 
[05:30:00] PROBLEM - Backup freshness on backup1001 is CRITICAL: Stale: 1 (gerrit1001), Fresh: 102 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring [05:33:44] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "I think it would be easier to add a check in the command we run." [puppet] - 10https://gerrit.wikimedia.org/r/725390 (owner: 10Dzahn) [05:35:16] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Not sure if my message was clear, but I was suggesting to just add a safeguard to the cron, instead of absenting everything." [puppet] - 10https://gerrit.wikimedia.org/r/725390 (owner: 10Dzahn) [06:25:13] (03PS1) 10Elukey: Set new AMD ROCm version for an-worker1096 [puppet] - 10https://gerrit.wikimedia.org/r/726389 (https://phabricator.wikimedia.org/T287267) [06:25:24] PROBLEM - SSH on bast5001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [06:36:02] RECOVERY - dump of m5 in eqiad on alert1001 is OK: Last dump for m5 at eqiad (db1117.eqiad.wmnet:3325) taken on 2021-10-05 06:08:56 (20 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting [06:38:23] !log reboot an-worker1096 after installing new GPU drivers [06:38:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:42:33] (03CR) 10Filippo Giunchedi: [C: 03+2] o11y: port Icinga checks [alerts] - 10https://gerrit.wikimedia.org/r/725884 (https://phabricator.wikimedia.org/T288726) (owner: 10Filippo Giunchedi) [06:42:37] (03PS2) 10Filippo Giunchedi: o11y: port Icinga checks [alerts] - 10https://gerrit.wikimedia.org/r/725884 (https://phabricator.wikimedia.org/T288726) [06:42:48] (03CR) 10Filippo Giunchedi: [C: 03+2] alerts: remove icinga overload alert, moved to AM [puppet] - 10https://gerrit.wikimedia.org/r/725885 (https://phabricator.wikimedia.org/T288726) (owner: 10Filippo Giunchedi) [06:42:54] (03PS2) 10Filippo Giunchedi: alerts: remove icinga overload alert, moved to AM [puppet] - 
10https://gerrit.wikimedia.org/r/725885 (https://phabricator.wikimedia.org/T288726) [06:45:48] 10SRE, 10serviceops, 10Patch-For-Review: Rebuild production Stretch images with GNUTLS/OpenSSL updates for LE issue chain update - https://phabricator.wikimedia.org/T291458 (10Joe) 05In progress→03Resolved [06:45:56] 10SRE, 10Infrastructure-Foundations, 10Traffic, 10Patch-For-Review: OpenSSL < 1.1.0 compatibility issues with new LE issuance chain - https://phabricator.wikimedia.org/T283165 (10Joe) [06:48:18] (03CR) 10Filippo Giunchedi: [C: 03+2] alerts: move alerts-deploy to systemd units [puppet] - 10https://gerrit.wikimedia.org/r/725840 (https://phabricator.wikimedia.org/T292303) (owner: 10Filippo Giunchedi) [06:58:21] (03PS1) 10Filippo Giunchedi: alerts: fix newline escape in unit description [puppet] - 10https://gerrit.wikimedia.org/r/726505 [06:58:37] (03PS5) 10Jgiannelos: Add script to send tile invalidation events [puppet] - 10https://gerrit.wikimedia.org/r/722825 (https://phabricator.wikimedia.org/T270175) [06:59:30] (03CR) 10Filippo Giunchedi: [C: 03+2] alerts: fix newline escape in unit description [puppet] - 10https://gerrit.wikimedia.org/r/726505 (owner: 10Filippo Giunchedi) [07:01:49] (03PS2) 10Elukey: Set new AMD ROCm version for an-worker1096 [puppet] - 10https://gerrit.wikimedia.org/r/726389 (https://phabricator.wikimedia.org/T287267) [07:01:51] (03PS1) 10Elukey: amd_rocm: update settings/packages for ROCm 4.3.1 [puppet] - 10https://gerrit.wikimedia.org/r/726507 (https://phabricator.wikimedia.org/T287267) [07:02:52] (03CR) 10Elukey: [C: 03+2] amd_rocm: update settings/packages for ROCm 4.3.1 [puppet] - 10https://gerrit.wikimedia.org/r/726507 (https://phabricator.wikimedia.org/T287267) (owner: 10Elukey) [07:03:06] (03CR) 10Elukey: [C: 03+2] Set new AMD ROCm version for an-worker1096 [puppet] - 10https://gerrit.wikimedia.org/r/726389 (https://phabricator.wikimedia.org/T287267) (owner: 10Elukey) [07:13:55] (03PS1) 10Ladsgroup: eventgate-main: Increase number 
of replicas to five (from 3) [deployment-charts] - 10https://gerrit.wikimedia.org/r/726513 (https://phabricator.wikimedia.org/T292048) [07:18:23] (03CR) 10Giuseppe Lavagetto: [C: 03+2] "There are various levels at which this problem possibly builds up, see also T288825, but this surely won't hurt." [deployment-charts] - 10https://gerrit.wikimedia.org/r/726513 (https://phabricator.wikimedia.org/T292048) (owner: 10Ladsgroup) [07:20:50] (03PS1) 10Filippo Giunchedi: alerts: fix resource ordering [puppet] - 10https://gerrit.wikimedia.org/r/726514 (https://phabricator.wikimedia.org/T292303) [07:21:58] (03CR) 10Filippo Giunchedi: [C: 03+2] alerts: fix resource ordering [puppet] - 10https://gerrit.wikimedia.org/r/726514 (https://phabricator.wikimedia.org/T292303) (owner: 10Filippo Giunchedi) [07:22:28] (03PS1) 10Jgiannelos: tile-pregeneration: Exclude canary test events [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/726522 [07:22:47] (03Merged) 10jenkins-bot: eventgate-main: Increase number of replicas to five (from 3) [deployment-charts] - 10https://gerrit.wikimedia.org/r/726513 (https://phabricator.wikimedia.org/T292048) (owner: 10Ladsgroup) [07:26:20] !log elukey@puppetmaster1001 conftool action : set/pooled=yes; selector: name=wdqs1004.wmnet [07:26:20] !log ladsgroup@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [07:26:22] RECOVERY - SSH on bast5001.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [07:26:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:26:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:27:40] !log ladsgroup@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . 
[07:27:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:28:56] (03PS2) 10Jgiannelos: tile-pregeneration: Exclude canary test events [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/726522 (https://phabricator.wikimedia.org/T270175) [07:35:43] 10SRE, 10SRE-Access-Requests: Grant Access to wmf, analytics-privatedata-users for TTaylor - https://phabricator.wikimedia.org/T292299 (10Joe) [07:38:19] (03PS1) 10Elukey: Upgrade all an-workers with GPUs to ROCm 4.3.1 [puppet] - 10https://gerrit.wikimedia.org/r/726539 (https://phabricator.wikimedia.org/T287267) [07:38:56] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM, this is obsolete since the creation of profile::debdeploy::client" [puppet] - 10https://gerrit.wikimedia.org/r/726026 (owner: 10Dzahn) [07:40:05] (03CR) 10Elukey: [C: 03+2] Upgrade all an-workers with GPUs to ROCm 4.3.1 [puppet] - 10https://gerrit.wikimedia.org/r/726539 (https://phabricator.wikimedia.org/T287267) (owner: 10Elukey) [07:43:56] (03PS6) 10Jgiannelos: Add script to send tile invalidation events [puppet] - 10https://gerrit.wikimedia.org/r/722825 (https://phabricator.wikimedia.org/T270175) [07:49:10] RECOVERY - dump of m5 in codfw on alert1001 is OK: Last dump for m5 at codfw (db2078.codfw.wmnet:3325) taken on 2021-10-05 07:27:07 (20 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting [07:50:03] (03PS2) 10Jelto: Enable Content-Security-Policy reporting [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/725900 (https://phabricator.wikimedia.org/T285363) (owner: 10Hashar) [07:52:16] (03CR) 10Jelto: [C: 03+1] "I just added some commas and a curly bracket. 
I've done a test deployement on gitlab2001 (gitlab-replica.wikimedia.org) and looks good to " [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/725900 (https://phabricator.wikimedia.org/T285363) (owner: 10Hashar) [07:54:07] jelto: ;))) [07:54:09] (03CR) 10Jelto: [C: 04-1] gitlab: enable Content-Security-Policy reporting (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/725012 (https://phabricator.wikimedia.org/T285363) (owner: 10Hashar) [07:55:05] thank you for the typo fixes, can you state on the task that you have rolled it to the gitlab-replica instance please? Then I guess Security can check stuff is properly setup [07:57:18] !log upgrade GPU drivers (AMD ROCm 4.3.1) on an-worker1[096-101] [07:57:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:57:46] * hashar backs to ooo [07:58:32] !log installing apache security updates [07:58:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:59:32] hashar: done, added the information to the task [08:00:20] PROBLEM - Check systemd state on an-worker1099 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_amd_rocm_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [08:01:44] PROBLEM - Check systemd state on an-worker1101 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_amd_rocm_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [08:02:28] RECOVERY - Check systemd state on an-worker1099 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [08:03:52] RECOVERY - Check systemd state on an-worker1101 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [08:06:30] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 235, down: 1, dormant: 0, excluded: 0, unused: 0: 
https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [08:06:48] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [08:15:39] (03PS7) 10Jgiannelos: Add script to send tile invalidation events [puppet] - 10https://gerrit.wikimedia.org/r/722825 (https://phabricator.wikimedia.org/T270175) [08:19:02] (03PS2) 10Vgutierrez: acme_chief,api: Provide .alt.chained.crt.key.ocsp [software/acme-chief] - 10https://gerrit.wikimedia.org/r/725983 (https://phabricator.wikimedia.org/T290249) [08:19:04] (03PS1) 10Vgutierrez: acme_chief: Adopt f-strings [software/acme-chief] - 10https://gerrit.wikimedia.org/r/726556 [08:19:36] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 132, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [08:21:24] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 236, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [08:31:00] (03PS8) 10Jgiannelos: Add script to send tile invalidation events [puppet] - 10https://gerrit.wikimedia.org/r/722825 (https://phabricator.wikimedia.org/T270175) [08:32:43] (03PS1) 10Vgutierrez: Release 0.32 [software/acme-chief] - 10https://gerrit.wikimedia.org/r/726559 (https://phabricator.wikimedia.org/T290249) [08:34:01] (03CR) 10Volans: "Looks good, most comments are documentation-related and couple of nits. 
Just one question about dry-run and a comment regarding the certif" [software/spicerack] - 10https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469) (owner: 10ZPapierski) [08:41:54] jouncebot: nowandnext [08:41:54] No deployments scheduled for the next 2 hour(s) and 18 minute(s) [08:41:54] In 2 hour(s) and 18 minute(s): European mid-day backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211005T1100) [08:41:59] (03CR) 10Giuseppe Lavagetto: [C: 03+2] mediawiki: Set "mwversion" for Logstash entries from php-wmerrors [puppet] - 10https://gerrit.wikimedia.org/r/722483 (https://phabricator.wikimedia.org/T253781) (owner: 10Krinkle) [08:48:38] (03CR) 10Jbond: [C: 03+2] "thanks lgtm will merge" [puppet] - 10https://gerrit.wikimedia.org/r/726026 (owner: 10Dzahn) [08:51:55] !log installing openssl security updates for stretch (buster/bullseye already fixed) [08:51:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:52:22] (03CR) 10Giuseppe Lavagetto: [C: 03+2] mediawiki: Stop wikidata dispatching via systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/725673 (https://phabricator.wikimedia.org/T48643) (owner: 10Ladsgroup) [08:53:04] (03CR) 10Jgiannelos: "I moved everything under OSM from tilerator to make things easy in case we drop tilerator in the future." 
[puppet] - 10https://gerrit.wikimedia.org/r/722825 (https://phabricator.wikimedia.org/T270175) (owner: 10Jgiannelos) [08:53:05] ACKNOWLEDGEMENT - Check systemd state on deploy1002 is CRITICAL: CRITICAL - degraded: The following units failed: deploy_to_mwdebug.service Effie Mouzeli Running tests https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [08:53:15] (03CR) 10Vgutierrez: [C: 03+2] acme_chief: Adopt f-strings [software/acme-chief] - 10https://gerrit.wikimedia.org/r/726556 (owner: 10Vgutierrez) [08:53:42] (03CR) 10Vgutierrez: [C: 03+2] acme_chief,api: Provide .alt.chained.crt.key.ocsp [software/acme-chief] - 10https://gerrit.wikimedia.org/r/725983 (https://phabricator.wikimedia.org/T290249) (owner: 10Vgutierrez) [08:53:50] (03CR) 10Vgutierrez: [C: 03+2] Release 0.32 [software/acme-chief] - 10https://gerrit.wikimedia.org/r/726559 (https://phabricator.wikimedia.org/T290249) (owner: 10Vgutierrez) [08:55:01] (03CR) 10Jbond: "see comment" [puppet] - 10https://gerrit.wikimedia.org/r/726094 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [08:55:53] (03CR) 10Jbond: mediawiki/geoip: add option to also pull new MaxMind databases from master (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/726094 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [08:56:40] (03Merged) 10jenkins-bot: acme_chief: Adopt f-strings [software/acme-chief] - 10https://gerrit.wikimedia.org/r/726556 (owner: 10Vgutierrez) [08:57:18] (03Merged) 10jenkins-bot: acme_chief,api: Provide .alt.chained.crt.key.ocsp [software/acme-chief] - 10https://gerrit.wikimedia.org/r/725983 (https://phabricator.wikimedia.org/T290249) (owner: 10Vgutierrez) [08:57:20] (03Merged) 10jenkins-bot: Release 0.32 [software/acme-chief] - 10https://gerrit.wikimedia.org/r/726559 (https://phabricator.wikimedia.org/T290249) (owner: 10Vgutierrez) [08:59:20] PROBLEM - Check for large files in client bucket on mwmaint1002 is CRITICAL: WARNING: large files in client bucket 
https://wikitech.wikimedia.org/wiki/Puppet%23check_client_bucket_large_file [08:59:24] !log depool and restart blazegraph on wdqs1007 [08:59:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:02:29] (03PS1) 10Jgiannelos: tegola-vector-tiles: Configure kafka for eqiad/codfw main [deployment-charts] - 10https://gerrit.wikimedia.org/r/726561 [09:03:36] 10Puppet, 10Cloud-VPS, 10Infrastructure-Foundations, 10Patch-For-Review, and 3 others: Audit usages or the realm variable with a view to drop it - https://phabricator.wikimedia.org/T289661 (10dcaro) [09:04:48] 10Puppet, 10Cloud-VPS, 10Infrastructure-Foundations, 10User-dcaro, and 2 others: Add more rspec test to the puppet code - https://phabricator.wikimedia.org/T289668 (10dcaro) [09:05:54] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: / (spec from root) is CRITICAL: Test spec from root returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Citoid [09:05:55] (03PS1) 10Jgiannelos: tegola-vector-tiles: Enable pregeneration cronjob to all envs [deployment-charts] - 10https://gerrit.wikimedia.org/r/726562 [09:06:00] (03PS1) 10Kormat: admin: Add ttaylor to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/726563 (https://phabricator.wikimedia.org/T292299) [09:06:43] (03CR) 10Kormat: [C: 03+2] admin: Add ttaylor to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/726563 (https://phabricator.wikimedia.org/T292299) (owner: 10Kormat) [09:07:48] ^ looking at citoid [09:08:00] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid [09:08:49] (03PS1) 10Arturo Borrero Gonzalez: openstack: drop manila code [puppet] - 10https://gerrit.wikimedia.org/r/726564 (https://phabricator.wikimedia.org/T291257) [09:08:56] (03PS1) 10Vgutierrez: acme_chief: Adopt f-strings [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/726565 [09:08:58] (03PS1) 
10Vgutierrez: acme_chief,api: Provide .alt.chained.crt.key.ocsp [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/726566 (https://phabricator.wikimedia.org/T290249) [09:09:00] (03PS1) 10Vgutierrez: Release 0.32 [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/726567 (https://phabricator.wikimedia.org/T290249) [09:09:02] (03PS1) 10Vgutierrez: debian: Add release 0.32 to the changelog [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/726568 (https://phabricator.wikimedia.org/T290249) [09:09:34] !log updating routinator on rpki2001 (T291543) [09:09:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:11:07] (03CR) 10Muehlenhoff: "If this gets merged, releases::reprepro and aptrepo::distribution are also unused and could be removed." [puppet] - 10https://gerrit.wikimedia.org/r/725670 (owner: 10Muehlenhoff) [09:11:31] (03CR) 10Jgiannelos: [C: 04-1] "Block until we are ready to start pregenerating tiles" [deployment-charts] - 10https://gerrit.wikimedia.org/r/726562 (owner: 10Jgiannelos) [09:12:18] (03PS2) 10Arturo Borrero Gonzalez: openstack: drop manila code [puppet] - 10https://gerrit.wikimedia.org/r/726564 (https://phabricator.wikimedia.org/T291257) [09:12:59] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] Remove NS_MAIN from wgExtraSignatureNamespaces on most 'special' wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/725015 (https://phabricator.wikimedia.org/T291630) (owner: 10Bartosz Dziewoński) [09:13:49] 10SRE, 10LDAP-Access-Requests: Add Deniz Erdogan to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T292301 (10Deniz_WMDE) @KFrancis I signed it, thank you! 
[09:14:42] PROBLEM - RPKI Validator RTR port on rpki2001 is CRITICAL: connect to address 10.192.0.103 and port 3323: Connection refused https://wikitech.wikimedia.org/wiki/RPKI%23RPKI_to_router_port [09:15:15] I guess that's topranks [09:16:48] RECOVERY - RPKI Validator RTR port on rpki2001 is OK: TCP OK - 0.032 second response time on 10.192.0.103 port 3323 https://wikitech.wikimedia.org/wiki/RPKI%23RPKI_to_router_port [09:18:06] vgutierrez: indeed yes, should be coming back now. [09:19:19] (03CR) 10Alexandros Kosiaris: [C: 03+1] Add docs about template, label, and canary conventions [deployment-charts] - 10https://gerrit.wikimedia.org/r/724489 (https://phabricator.wikimedia.org/T291848) (owner: 10Ottomata) [09:23:01] (03CR) 10David Caro: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/726564 (https://phabricator.wikimedia.org/T291257) (owner: 10Arturo Borrero Gonzalez) [09:23:12] (03CR) 10Michael Große: [C: 03+1] Don't fail job if subscribed wiki is unknown [extensions/Wikibase] (wmf/1.38.0-wmf.2) - 10https://gerrit.wikimedia.org/r/725923 (https://phabricator.wikimedia.org/T292446) (owner: 10Ladsgroup) [09:24:30] PROBLEM - Routinator process on rpki2001 is CRITICAL: PROCS CRITICAL: 0 processes with command name routinator https://wikitech.wikimedia.org/wiki/RPKI%23Process [09:25:18] PROBLEM - RPKI Validator RTR port on rpki2001 is CRITICAL: connect to address 10.192.0.103 and port 3323: Connection refused https://wikitech.wikimedia.org/wiki/RPKI%23RPKI_to_router_port [09:26:38] RECOVERY - Routinator process on rpki2001 is OK: PROCS OK: 1 process with command name routinator https://wikitech.wikimedia.org/wiki/RPKI%23Process [09:27:26] RECOVERY - RPKI Validator RTR port on rpki2001 is OK: TCP OK - 0.032 second response time on 10.192.0.103 port 3323 https://wikitech.wikimedia.org/wiki/RPKI%23RPKI_to_router_port [09:28:36] (03CR) 10Vgutierrez: [C: 03+1] Release 6.0.8-1wm1 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/725307 
(https://phabricator.wikimedia.org/T268736) (owner: 10Ema) [09:31:47] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: drop manila code [puppet] - 10https://gerrit.wikimedia.org/r/726564 (https://phabricator.wikimedia.org/T291257) (owner: 10Arturo Borrero Gonzalez) [09:32:55] (03CR) 10Vgutierrez: [C: 03+2] acme_chief: Adopt f-strings [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/726565 (owner: 10Vgutierrez) [09:33:00] (03CR) 10Vgutierrez: [C: 03+2] acme_chief,api: Provide .alt.chained.crt.key.ocsp [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/726566 (https://phabricator.wikimedia.org/T290249) (owner: 10Vgutierrez) [09:33:04] (03CR) 10Vgutierrez: [C: 03+2] Release 0.32 [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/726567 (https://phabricator.wikimedia.org/T290249) (owner: 10Vgutierrez) [09:33:35] (03CR) 10Vgutierrez: [C: 03+2] debian: Add release 0.32 to the changelog [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/726568 (https://phabricator.wikimedia.org/T290249) (owner: 10Vgutierrez) [09:33:56] (03PS1) 10Arturo Borrero Gonzalez: hieradata: drop openstack manila keys [labs/private] - 10https://gerrit.wikimedia.org/r/726569 (https://phabricator.wikimedia.org/T291257) [09:34:09] 10SRE, 10SRE-Access-Requests: Grant Access to wmf, analytics-privatedata-users for TTaylor - https://phabricator.wikimedia.org/T292299 (10Kormat) 05Open→03Resolved a:03Kormat Hi @ttaylor, this is now done. You can confirm it by searching for 'ttaylor' on https://contact.toolforge.org/. Cheers. 
[09:34:34] (03CR) 10Arturo Borrero Gonzalez: [V: 03+2 C: 03+2] hieradata: drop openstack manila keys [labs/private] - 10https://gerrit.wikimedia.org/r/726569 (https://phabricator.wikimedia.org/T291257) (owner: 10Arturo Borrero Gonzalez) [09:36:05] (03Merged) 10jenkins-bot: acme_chief: Adopt f-strings [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/726565 (owner: 10Vgutierrez) [09:36:20] (03Merged) 10jenkins-bot: acme_chief,api: Provide .alt.chained.crt.key.ocsp [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/726566 (https://phabricator.wikimedia.org/T290249) (owner: 10Vgutierrez) [09:36:38] (03Merged) 10jenkins-bot: Release 0.32 [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/726567 (https://phabricator.wikimedia.org/T290249) (owner: 10Vgutierrez) [09:37:13] 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (10ayounsi) @aborrero is it possible to have more information on this new service? Design doc or similar. I can't find anything on Wik... [09:37:15] (03Merged) 10jenkins-bot: debian: Add release 0.32 to the changelog [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/726568 (https://phabricator.wikimedia.org/T290249) (owner: 10Vgutierrez) [09:46:04] !log generated cassandra certificate using FQDN for restbase2023 [09:46:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:52:04] 10SRE, 10Infrastructure-Foundations, 10netops: Rebuild Routinator (rpki) VMs with larger disk - https://phabricator.wikimedia.org/T292503 (10MoritzMuehlenhoff) https://packages.nlnetlabs.nl/ also provides the routinator debs for bullseye (plus it's a static Go binary anyway), so if we're recreating the VMs a... 
[09:54:57] 10SRE, 10Infrastructure-Foundations, 10netops: Rebuild Routinator (rpki) VMs with larger disk - https://phabricator.wikimedia.org/T292503 (10cmooney) @MoritzMuehlenhoff yes smart thinking we'll do that :) [09:58:25] 10SRE, 10wikitech.wikimedia.org, 10Sustainability (Incident Followup): Incident response tools operational readiness review - https://phabricator.wikimedia.org/T290130 (10LSobanski) [10:06:15] (03PS1) 10Jbond: profile::contact: add role contacts based on current mappings /modules/profile/files/sre/owners.yaml [puppet] - 10https://gerrit.wikimedia.org/r/726572 [10:06:43] !log upload acme-chief 0.32 to apt.wm.o (buster) - T290249 [10:06:45] (03CR) 10jerkins-bot: [V: 04-1] profile::contact: add role contacts based on current mappings /modules/profile/files/sre/owners.yaml [puppet] - 10https://gerrit.wikimedia.org/r/726572 (owner: 10Jbond) [10:06:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:06:50] T290249: Support OCSP stapling from prefetched responses in HAProxy - https://phabricator.wikimedia.org/T290249 [10:07:27] 10SRE, 10Infrastructure-Foundations, 10netops: Rebuild Routinator (rpki) VMs with larger disk - https://phabricator.wikimedia.org/T292503 (10cmooney) @ayounsi Riccardo suggested maybe using a separate disk/partition for the routinator data? That was partly to just do a quick dirty job and not rebuild, but w... 
[10:09:16] !log update acme-chief to version 0.32 on acmechief-test hosts - T290249 [10:09:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:09:36] (03PS2) 10Jbond: profile::contact: add role contacts based on current mappings [puppet] - 10https://gerrit.wikimedia.org/r/726572 [10:10:22] 10SRE, 10Infrastructure-Foundations, 10netops: Rebuild Routinator (rpki) VMs with larger disk - https://phabricator.wikimedia.org/T292503 (10cmooney) p:05Triage→03Low [10:11:51] !log update acme-chief to version 0.32 on acmechief hosts - T290249 [10:11:54] (03PS3) 10Jbond: profile::contact: add role contacts based on current mappings [puppet] - 10https://gerrit.wikimedia.org/r/726572 [10:11:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:11:57] T290249: Support OCSP stapling from prefetched responses in HAProxy - https://phabricator.wikimedia.org/T290249 [10:13:33] (03PS4) 10Jbond: profile::contact: add role contacts based on current mappings [puppet] - 10https://gerrit.wikimedia.org/r/726572 [10:17:18] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31485/console" [puppet] - 10https://gerrit.wikimedia.org/r/726572 (owner: 10Jbond) [10:17:22] RECOVERY - WDQS high update lag on wdqs1006 is OK: (C)4.32e+04 ge (W)2.16e+04 ge 2.14e+04 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [10:18:23] 10SRE, 10Infrastructure-Foundations, 10netops: Rebuild Routinator (rpki) VMs with larger disk - https://phabricator.wikimedia.org/T292503 (10MoritzMuehlenhoff) >>! In T292503#7401527, @cmooney wrote: > @ayounsi Riccardo suggested maybe using a separate disk/partition for the routinator data? That was partly... 
[10:18:27] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31487/console" [puppet] - 10https://gerrit.wikimedia.org/r/726572 (owner: 10Jbond) [10:19:20] 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10Vgutierrez) [10:19:57] 10SRE, 10Acme-chief, 10Traffic, 10Patch-For-Review: Support OCSP stapling from prefetched responses in HAProxy - https://phabricator.wikimedia.org/T290249 (10Vgutierrez) 05Open→03Resolved a:03Vgutierrez [10:26:59] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Overall seems simple and feature complete; I have a few doubts about specific details of the implementation." [docker-images/imagecatalog] - 10https://gerrit.wikimedia.org/r/723663 (https://phabricator.wikimedia.org/T287130) (owner: 10RLazarus) [10:28:02] 10Puppet, 10Cloud-VPS, 10Infrastructure-Foundations, 10Patch-For-Review, and 3 others: Refactor puppet:base module to reducs unneeded shared code paths - https://phabricator.wikimedia.org/T289661 (10jbond) [10:29:13] 10Puppet, 10Cloud-VPS, 10Infrastructure-Foundations, 10Patch-For-Review, and 3 others: Refactor puppet:base module to reducs unneeded shared code paths - https://phabricator.wikimedia.org/T289661 (10jbond) [10:33:51] 10Puppet, 10Cloud-VPS, 10Infrastructure-Foundations, 10puppet-compiler, and 2 others: Improve PCC support for cloud VPS environments - https://phabricator.wikimedia.org/T289666 (10jbond) [10:51:28] jouncebot: now [10:51:28] No deployments scheduled for the next 0 hour(s) and 8 minute(s) [10:51:37] jouncebot: next [10:51:37] In 0 hour(s) and 8 minute(s): European mid-day backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211005T1100) [10:58:39] (03CR) 10Jbond: [C: 03+2] nagios_common: add SSL certificate validation to remaining http checks [puppet] - 10https://gerrit.wikimedia.org/r/725766 
(owner: 10Jbond) [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for European mid-day backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211005T1100). [11:00:05] Juan_90264 and Dereckson: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:13] I can deploy today [11:00:24] (unless effie has something urgent that needs to go first?) [11:00:29] (03PS30) 10ZPapierski: Added spicerack.kafka with offset transfer function [software/spicerack] - 10https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469) [11:00:36] (03PS1) 10Ladsgroup: mediawiki: Remove absented systemd timer jobs [puppet] - 10https://gerrit.wikimedia.org/r/726577 [11:00:39] Hi Dereckson, around? [11:00:48] urbanecm: thank you, I decided to do it after the deployment [11:00:56] no need to torture you [11:01:00] ack -- I'll ping you when done :) [11:01:04] (03CR) 10ZPapierski: Added spicerack.kafka with offset transfer function (0310 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469) (owner: 10ZPapierski) [11:01:39] (03PS2) 10Giuseppe Lavagetto: mediawiki: Add rsyslog sidecar [deployment-charts] - 10https://gerrit.wikimedia.org/r/725892 (https://phabricator.wikimedia.org/T288851) [11:02:14] (03CR) 10jerkins-bot: [V: 04-1] mediawiki: Add rsyslog sidecar [deployment-charts] - 10https://gerrit.wikimedia.org/r/725892 (https://phabricator.wikimedia.org/T288851) (owner: 10Giuseppe Lavagetto) [11:03:50] there might be easily none deployment apparently [11:04:29] effie: go ahead, no B&C customers are present [11:05:11] Hello [11:05:12] (03CR) 10Giuseppe Lavagetto: [C: 03+2] mediawiki: Remove absented systemd timer jobs [puppet] - 
10https://gerrit.wikimedia.org/r/726577 (owner: 10Ladsgroup) [11:05:25] urbanecm: sure? [11:05:32] urbanecm: I've one change to deploy, for tcywiki [11:05:50] effie: well it looked so 🙂 [11:05:57] may i still go ahead? :D [11:06:00] sorry [11:06:06] (03PS1) 10Elukey: Move stat100[5,8] to AMD ROCm 4.3.1 [puppet] - 10https://gerrit.wikimedia.org/r/726578 (https://phabricator.wikimedia.org/T287267) [11:06:22] urbanecm: sure sure [11:06:26] appreciated [11:06:32] (03PS6) 10Urbanecm: Enable local uploads for tcywiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/390303 (https://phabricator.wikimedia.org/T166763) (owner: 10TerraCodes) [11:06:32] thanks [11:06:43] (03CR) 10Urbanecm: [C: 03+2] Enable local uploads for tcywiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/390303 (https://phabricator.wikimedia.org/T166763) (owner: 10TerraCodes) [11:06:51] Dereckson: nice to see you again, btw [11:07:24] :) [11:07:38] (03Merged) 10jenkins-bot: Enable local uploads for tcywiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/390303 (https://phabricator.wikimedia.org/T166763) (owner: 10TerraCodes) [11:07:56] Hi anoop, we're deploying your change by the way [11:08:05] hi AntiComposite [11:08:07] *anoop [11:08:17] Dereckson: your change is at mwdebug1001 [11:08:43] testing [11:10:03] Thanks Dereckson , hi urbanecm [11:10:19] 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (10aborrero) >>! In T289882#7401435, @ayounsi wrote: > @aborrero is it possible to have more information on this new service? Design d... [11:10:25] It's a progress, but we'll need another change apparently. [11:10:37] But that one will be on wiki. [11:10:48] Dereckson: so, ok to sync i guess? 
[11:10:51] so for the current deployment, it looks good, you can sync yes [11:10:56] thanks, doing [11:11:02] anoop: you'll need to put licenses on the wiki [11:11:23] anoop: https://tcy.wikipedia.org/wiki/%E0%B2%AE%E0%B2%BE%E0%B2%A6%E0%B3%8D%E0%B2%AF%E0%B2%AE%E0%B3%8A_%E0%B2%B5%E0%B2%BF%E0%B2%95%E0%B2%BF:Licenses should contain a list of available license template, one for each EDP normally [11:11:37] ok, I will add it [11:11:37] as long as it's not done, people will have a error message pointing to that link ^ [11:11:46] Thanks [11:12:14] !log urbanecm@deploy1002 Synchronized dblists/commonsuploads.dblist: 04524992865b0ae5750eb6fb0a374aa74a65b383: Enable local uploads for tcywiki (T166763) (duration: 00m 59s) [11:12:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:12:21] T166763: Enable local file upload in Tulu Wikipedia - https://phabricator.wikimedia.org/T166763 [11:12:25] Dereckson: anoop: should be live [11:12:33] thanks [11:12:50] np [11:12:55] effie: sorry for the delay and thanks for your green light [11:13:25] Juan90264 / Juan_90264 appears to not even be connected to libera [11:13:29] so...I'm now actually done :-) [11:14:27] ok, I am going ahead [11:15:08] !log upgrade scap to 4.0.2 - T291095 [11:15:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:15:16] T291095: Deploy Scap version 4.0.2 - https://phabricator.wikimedia.org/T291095 [11:17:47] !log disabling puppet on cassandra nodes for rollout of 724061 - defaulting to cn=fqdn certificates [11:17:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:21:14] (03CR) 10Hnowlan: [V: 03+1 C: 03+2] cassandra: use FQDN in CN name for future instances [puppet] - 10https://gerrit.wikimedia.org/r/724061 (https://phabricator.wikimedia.org/T141541) (owner: 10Hnowlan) [11:24:58] (03PS1) 10Elukey: Revert "Move analytics-hive to an-coord1002" [dns] - 10https://gerrit.wikimedia.org/r/725925 [11:27:54] (03CR) 10Elukey: [C: 
03+2] Revert "Move analytics-hive to an-coord1002" [dns] - 10https://gerrit.wikimedia.org/r/725925 (owner: 10Elukey) [11:28:13] !log hnowlan@cumin1001 START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001 [11:28:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:29:36] (03PS1) 10Effie Mouzeli: mwdebug: bump namespace limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/726580 (https://phabricator.wikimedia.org/T280497) [11:32:30] (03CR) 10jerkins-bot: [V: 04-1] mwdebug: bump namespace limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/726580 (https://phabricator.wikimedia.org/T280497) (owner: 10Effie Mouzeli) [11:32:42] PROBLEM - Check systemd state on ms-be1040 is CRITICAL: CRITICAL - degraded: The following units failed: session-206973.scope https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [11:34:06] (03PS2) 10Effie Mouzeli: mwdebug: bump namespace limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/726580 (https://phabricator.wikimedia.org/T280497) [11:37:13] (03CR) 10jerkins-bot: [V: 04-1] mwdebug: bump namespace limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/726580 (https://phabricator.wikimedia.org/T280497) (owner: 10Effie Mouzeli) [11:37:31] !log hnowlan@cumin1001 END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001 [11:37:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:37:40] (03PS1) 10Joal: Update AQS druid datasources [puppet] - 10https://gerrit.wikimedia.org/r/726582 [11:39:34] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Apart from the yaml error, lgtm." 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/726580 (https://phabricator.wikimedia.org/T280497) (owner: 10Effie Mouzeli) [11:41:06] (03PS3) 10Effie Mouzeli: mwdebug: bump namespace limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/726580 (https://phabricator.wikimedia.org/T280497) [11:44:07] (03CR) 10jerkins-bot: [V: 04-1] mwdebug: bump namespace limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/726580 (https://phabricator.wikimedia.org/T280497) (owner: 10Effie Mouzeli) [11:48:23] (03PS4) 10Effie Mouzeli: mwdebug: bump namespace limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/726580 (https://phabricator.wikimedia.org/T280497) [11:52:03] (03CR) 10jerkins-bot: [V: 04-1] mwdebug: bump namespace limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/726580 (https://phabricator.wikimedia.org/T280497) (owner: 10Effie Mouzeli) [11:53:04] (03PS1) 10Kormat: admin: Move ifried from ldap-only to posix [puppet] - 10https://gerrit.wikimedia.org/r/726583 (https://phabricator.wikimedia.org/T292118) [11:53:06] (03PS1) 10Hnowlan: Revert "cassandra: use FQDN in CN name for future instances" [puppet] - 10https://gerrit.wikimedia.org/r/726588 [11:53:21] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Request access to private data group for ifried - https://phabricator.wikimedia.org/T292118 (10Kormat) [11:54:57] (03CR) 10Kormat: [C: 03+2] admin: Move ifried from ldap-only to posix [puppet] - 10https://gerrit.wikimedia.org/r/726583 (https://phabricator.wikimedia.org/T292118) (owner: 10Kormat) [11:55:44] (03CR) 10Hnowlan: [C: 03+2] Revert "cassandra: use FQDN in CN name for future instances" [puppet] - 10https://gerrit.wikimedia.org/r/726588 (owner: 10Hnowlan) [11:55:48] (03CR) 10Ema: [V: 03+2 C: 03+2] Release 6.0.8-1wm1 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/725307 (https://phabricator.wikimedia.org/T268736) (owner: 10Ema) [11:57:07] !log hnowlan@cumin1001 START - Cookbook sre.cassandra.roll-restart for 
nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001 [11:57:08] !log hnowlan@cumin1001 END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001 [11:57:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:57:16] RECOVERY - Backup freshness on backup1001 is OK: Fresh: 103 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring [11:57:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:57:56] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Request access to private data group for ifried - https://phabricator.wikimedia.org/T292118 (10Kormat) 05Open→03Resolved a:03Kormat Hey @ifried, this is done now. You were already in the 'wmf' LDAP group, but i've created a posix account for you, and ad... [11:58:34] !log reverted restbase2023 to use CN=hostname certificate due to loading errors [11:58:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:02:09] 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Christina Macholan - https://phabricator.wikimedia.org/T292515 (10CMacholan) [12:02:58] (03PS5) 10Effie Mouzeli: mwdebug: bump namespace limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/726580 (https://phabricator.wikimedia.org/T280497) [12:06:11] (03CR) 10jerkins-bot: [V: 04-1] mwdebug: bump namespace limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/726580 (https://phabricator.wikimedia.org/T280497) (owner: 10Effie Mouzeli) [12:07:52] (03PS6) 10Effie Mouzeli: mwdebug: bump namespace limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/726580 (https://phabricator.wikimedia.org/T280497) [12:10:42] (03CR) 10jerkins-bot: [V: 04-1] mwdebug: bump namespace limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/726580 (https://phabricator.wikimedia.org/T280497) 
(owner: 10Effie Mouzeli) [12:11:48] RECOVERY - Check systemd state on ms-be1040 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:11:59] (03PS7) 10Effie Mouzeli: mwdebug: bump namespace limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/726580 (https://phabricator.wikimedia.org/T280497) [12:14:52] (03CR) 10jerkins-bot: [V: 04-1] mwdebug: bump namespace limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/726580 (https://phabricator.wikimedia.org/T280497) (owner: 10Effie Mouzeli) [12:16:49] (03PS8) 10Effie Mouzeli: mwdebug: bump namespace limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/726580 (https://phabricator.wikimedia.org/T280497) [12:21:59] (03PS5) 10MacFan4000: ExtensionDistributor: Add 1.37 as preview branch; remove 1.31 as it's EOL [mediawiki-config] - 10https://gerrit.wikimedia.org/r/725912 [12:23:20] 10SRE, 10Infrastructure-Foundations, 10netops, 10Documentation: Document our OOB - https://phabricator.wikimedia.org/T292504 (10Aklapper) [12:24:17] !log deployment-cache-text06: upgrade varnish to 6.0.8-1wm1 T292290 [12:24:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:24:24] T292290: Package and deploy Varnish 6.0.8 - https://phabricator.wikimedia.org/T292290 [12:28:06] (03PS2) 10MVernon: alertmanager: route data-persistence team alerts [puppet] - 10https://gerrit.wikimedia.org/r/723220 (https://phabricator.wikimedia.org/T257056) [12:28:44] (03PS3) 10MVernon: alertmanager: route data-persistence team alerts [puppet] - 10https://gerrit.wikimedia.org/r/723220 (https://phabricator.wikimedia.org/T257056) [12:29:48] (03PS1) 10Majavah: acme_chief: add openstack certs [puppet] - 10https://gerrit.wikimedia.org/r/726585 (https://phabricator.wikimedia.org/T267194) [12:30:52] (03PS1) 10Elukey: amd_rocm: add support for ROCm 4.2 [puppet] - 10https://gerrit.wikimedia.org/r/726606 (https://phabricator.wikimedia.org/T287267) 
[12:31:10] (03CR) 10MVernon: alertmanager: route data-persistence team alerts (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/723220 (https://phabricator.wikimedia.org/T257056) (owner: 10MVernon) [12:31:52] (03CR) 10jerkins-bot: [V: 04-1] amd_rocm: add support for ROCm 4.2 [puppet] - 10https://gerrit.wikimedia.org/r/726606 (https://phabricator.wikimedia.org/T287267) (owner: 10Elukey) [12:33:20] (03PS2) 10Elukey: amd_rocm: add support for ROCm 4.2 [puppet] - 10https://gerrit.wikimedia.org/r/726606 (https://phabricator.wikimedia.org/T287267) [12:35:06] (03CR) 10Elukey: [C: 03+2] amd_rocm: add support for ROCm 4.2 [puppet] - 10https://gerrit.wikimedia.org/r/726606 (https://phabricator.wikimedia.org/T287267) (owner: 10Elukey) [12:36:35] (03PS1) 10Ema: ATS: make backend ram cache size configurable [puppet] - 10https://gerrit.wikimedia.org/r/726607 (https://phabricator.wikimedia.org/T286502) [12:39:29] (03PS2) 10Ema: ATS: make backend ram cache size configurable [puppet] - 10https://gerrit.wikimedia.org/r/726607 (https://phabricator.wikimedia.org/T286502) [12:41:37] (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/726607 (https://phabricator.wikimedia.org/T286502) (owner: 10Ema) [12:42:14] (03PS1) 10Muehlenhoff: Add repo sync definition and repo component for Routinator [puppet] - 10https://gerrit.wikimedia.org/r/726610 (https://phabricator.wikimedia.org/T292503) [12:42:25] (03PS3) 10Vgutierrez: haproxy: Allow configuring timeouts [puppet] - 10https://gerrit.wikimedia.org/r/719479 (https://phabricator.wikimedia.org/T290005) [12:42:41] (03PS1) 10Elukey: Downgrade AMD ROCm to 4.2 (from 4.3.1) on an-worker1096 [puppet] - 10https://gerrit.wikimedia.org/r/726611 (https://phabricator.wikimedia.org/T287267) [12:43:25] !log import AMD ROCm 4.2 to buster-wikimedia's thirdparty/amd-rocm42 - T287267 [12:43:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:43:31] T287267: Update ROCm version on GPU instances. 
- https://phabricator.wikimedia.org/T287267 [12:43:57] (03CR) 10Elukey: [C: 03+2] Downgrade AMD ROCm to 4.2 (from 4.3.1) on an-worker1096 [puppet] - 10https://gerrit.wikimedia.org/r/726611 (https://phabricator.wikimedia.org/T287267) (owner: 10Elukey) [12:44:19] (03PS1) 10Filippo Giunchedi: graphite: add Bullseye support [puppet] - 10https://gerrit.wikimedia.org/r/726612 (https://phabricator.wikimedia.org/T247963) [12:44:21] (03PS1) 10Filippo Giunchedi: graphite: add Bullseye version of graphite auth/index [puppet] - 10https://gerrit.wikimedia.org/r/726613 (https://phabricator.wikimedia.org/T247963) [12:44:23] (03PS1) 10Filippo Giunchedi: graphite: stop using LVM for /srv in labs [puppet] - 10https://gerrit.wikimedia.org/r/726614 (https://phabricator.wikimedia.org/T247963) [12:46:27] (03PS3) 10Hashar: Enable Content-Security-Policy reporting [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/725900 (https://phabricator.wikimedia.org/T285363) [12:46:31] (03PS3) 10Ema: ATS: make backend ram cache size configurable [puppet] - 10https://gerrit.wikimedia.org/r/726607 (https://phabricator.wikimedia.org/T286502) [12:46:45] (03PS1) 10Muehlenhoff: On bullseye and later install routinator from thirdparty/routinator [puppet] - 10https://gerrit.wikimedia.org/r/726615 [12:46:52] (03CR) 10Hashar: "Tried on https://gitlab-replica.wikimedia.org and Firefox emitted a few warnings which I have amended in the next patchset. 
May you deploy" [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/725900 (https://phabricator.wikimedia.org/T285363) (owner: 10Hashar) [12:47:08] (03PS3) 10Vgutierrez: haproxy: Add H2 performance tuning settings [puppet] - 10https://gerrit.wikimedia.org/r/719974 (https://phabricator.wikimedia.org/T290005) [12:48:08] (03CR) 10Vgutierrez: [C: 03+1] ATS: make backend ram cache size configurable [puppet] - 10https://gerrit.wikimedia.org/r/726607 (https://phabricator.wikimedia.org/T286502) (owner: 10Ema) [12:49:04] (03PS2) 10Filippo Giunchedi: graphite: add Bullseye support [puppet] - 10https://gerrit.wikimedia.org/r/726612 (https://phabricator.wikimedia.org/T247963) [12:49:06] (03PS2) 10Filippo Giunchedi: graphite: add Bullseye version of graphite auth/index [puppet] - 10https://gerrit.wikimedia.org/r/726613 (https://phabricator.wikimedia.org/T247963) [12:49:08] (03PS2) 10Filippo Giunchedi: graphite: stop using LVM for /srv in labs [puppet] - 10https://gerrit.wikimedia.org/r/726614 (https://phabricator.wikimedia.org/T247963) [12:49:10] (03PS3) 10MVernon: data-protection: add alerting for prometheus-mysqld-exporter failing [alerts] - 10https://gerrit.wikimedia.org/r/723223 (https://phabricator.wikimedia.org/T257056) [12:49:46] (03CR) 10Ayounsi: [C: 03+1] Add repo sync definition and repo component for Routinator [puppet] - 10https://gerrit.wikimedia.org/r/726610 (https://phabricator.wikimedia.org/T292503) (owner: 10Muehlenhoff) [12:50:37] 10Puppet, 10Infrastructure-Foundations, 10User-jbond: Upgrade puppetboard to the latest version - https://phabricator.wikimedia.org/T292522 (10jbond) 05Open→03In progress a:03jbond [12:51:49] (03CR) 10Ayounsi: [C: 03+1] On bullseye and later install routinator from thirdparty/routinator [puppet] - 10https://gerrit.wikimedia.org/r/726615 (owner: 10Muehlenhoff) [12:52:01] 10Puppet, 10Infrastructure-Foundations, 10Packaging, 10User-jbond: create puppetboard debian package - https://phabricator.wikimedia.org/T292523 (10jbond) [12:52:06] 
PROBLEM - Check systemd state on an-worker1096 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_amd_rocm_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:52:16] 10Puppet, 10Infrastructure-Foundations, 10User-jbond: Upgrade puppetboard to the latest version - https://phabricator.wikimedia.org/T292522 (10jbond) [12:52:52] (03CR) 10David Caro: [C: 03+1] "LGTM, we have no VMs using this profile, only cloudmetrics bare metal (thus for `$::realm == 'prod'`)." [puppet] - 10https://gerrit.wikimedia.org/r/726614 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [12:52:55] 10Puppet, 10Infrastructure-Foundations, 10Packaging, 10User-jbond: create puppetboard debian package - https://phabricator.wikimedia.org/T292523 (10jbond) The repo linked above now has a version of puppet board which builds correctly and has been tested in a labs instance. [12:53:12] 10SRE, 10Datacenter-Switchover, 10Patch-For-Review, 10User-notice: September 2021 Datacenter switchover (codfw -> eqiad) - https://phabricator.wikimedia.org/T287539 (10Trizek-WMF) [12:53:22] 10SRE, 10Traffic: Package and deploy Varnish 6.0.8 - https://phabricator.wikimedia.org/T292290 (10ema) Preliminary testing in beta looks good, uploading the package to the archive. [12:53:34] !log upload varnish 6.0.8-1wm1 to apt.wikimedia.org T292290 [12:53:36] 10SRE, 10CommRel-Specialists-Support (Jul-Sep-2021), 10Datacenter-Switchover: CommRel support for September 2021 Switchover - https://phabricator.wikimedia.org/T287546 (10Trizek-WMF) 05Open→03Resolved We debriefed and we are going to improve the processes so that they will be less time consuming, more re... 
[12:53:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:41] T292290: Package and deploy Varnish 6.0.8 - https://phabricator.wikimedia.org/T292290 [12:54:28] 10Puppet, 10Infrastructure-Foundations, 10Packaging, 10User-jbond: Update python3-pypuppetdb package to 2.4.0 - https://phabricator.wikimedia.org/T292525 (10jbond) [12:54:36] 10Puppet, 10Infrastructure-Foundations, 10Packaging, 10User-jbond: Update python3-pypuppetdb package to 2.4.0 - https://phabricator.wikimedia.org/T292525 (10jbond) p:05Triage→03Medium [12:55:18] (03PS2) 10Vgutierrez: haproxy: Add PROXY protocol support [puppet] - 10https://gerrit.wikimedia.org/r/720021 (https://phabricator.wikimedia.org/T290005) [12:55:34] (03PS1) 10Elukey: Downgrade AMD ROCm to 4.2 on all GPU-based Hadoop workers [puppet] - 10https://gerrit.wikimedia.org/r/726619 (https://phabricator.wikimedia.org/T287267) [12:58:07] 10Puppet, 10Infrastructure-Foundations, 10Packaging, 10User-jbond: Update python3-pypuppetdb package to 2.4.0 - https://phabricator.wikimedia.org/T292525 (10jbond) I have created an update package for 2.4.0 https://salsa.debian.org/python-team/packages/pypuppetdb/-/merge_requests/4 [12:58:22] (03PS3) 10Vgutierrez: haproxy: Allow adding/removing HTTP headers [puppet] - 10https://gerrit.wikimedia.org/r/720272 (https://phabricator.wikimedia.org/T290005) [12:58:53] (03PS1) 10Ppchelko: uppercaseTitlesForUnicodeTransition: improve userlist format [core] (wmf/1.38.0-wmf.2) - 10https://gerrit.wikimedia.org/r/726589 (https://phabricator.wikimedia.org/T219279) [12:59:30] (03CR) 10Ppchelko: [C: 03+2] uppercaseTitlesForUnicodeTransition: improve userlist format [core] (wmf/1.38.0-wmf.2) - 10https://gerrit.wikimedia.org/r/726589 (https://phabricator.wikimedia.org/T219279) (owner: 10Ppchelko) [13:00:47] (03CR) 10Elukey: [C: 03+2] Downgrade AMD ROCm to 4.2 on all GPU-based Hadoop workers [puppet] - 10https://gerrit.wikimedia.org/r/726619 (https://phabricator.wikimedia.org/T287267) 
(owner: 10Elukey) [13:09:14] 10SRE, 10Infrastructure-Foundations, 10netops, 10Documentation: Document our OOB - https://phabricator.wikimedia.org/T292504 (10ayounsi) 05Open→03Resolved https://wikitech.wikimedia.org/wiki/OOB [13:15:47] (03CR) 10Jelto: Enable Content-Security-Policy reporting (033 comments) [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/725900 (https://phabricator.wikimedia.org/T285363) (owner: 10Hashar) [13:19:26] (03Merged) 10jenkins-bot: uppercaseTitlesForUnicodeTransition: improve userlist format [core] (wmf/1.38.0-wmf.2) - 10https://gerrit.wikimedia.org/r/726589 (https://phabricator.wikimedia.org/T219279) (owner: 10Ppchelko) [13:19:56] (03CR) 10Vgutierrez: [C: 03+1] acmechief: Remove mx2002 [puppet] - 10https://gerrit.wikimedia.org/r/723422 (https://phabricator.wikimedia.org/T286911) (owner: 10Muehlenhoff) [13:23:54] !log ppchelko@deploy1002 Synchronized php-1.38.0-wmf.2/maintenance/uppercaseTitlesForUnicodeTransition.php: Backport uppercaseTitlesForUnicodeTransition.php maintenance script improvements T219279 (duration: 00m 58s) [13:24:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:24:04] T219279: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 [13:26:43] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good (reviewing this change made me notice a few errors in the owners.yaml (mostly related to outdated entries), but let's merge the" [puppet] - 10https://gerrit.wikimedia.org/r/726572 (owner: 10Jbond) [13:28:01] elukey or ottomata - could one of you merge and deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/726582? 
[13:29:30] (03CR) 10MVernon: "Hi," [alerts] - 10https://gerrit.wikimedia.org/r/723223 (https://phabricator.wikimedia.org/T257056) (owner: 10MVernon) [13:30:44] RECOVERY - Check systemd state on cumin2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:33:14] (03PS1) 10Ppchelko: Remove mb_strtoupper overrides for HHVM [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726623 (https://phabricator.wikimedia.org/T219279) [13:34:02] (03CR) 10Ppchelko: [C: 04-2] "-2 until the scripts actually run to prevent accidental merging." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726623 (https://phabricator.wikimedia.org/T219279) (owner: 10Ppchelko) [13:35:46] (03CR) 10Elukey: [C: 03+2] Update AQS druid datasources [puppet] - 10https://gerrit.wikimedia.org/r/726582 (owner: 10Joal) [13:35:54] (03PS2) 10Elukey: Update AQS druid datasources [puppet] - 10https://gerrit.wikimedia.org/r/726582 (owner: 10Joal) [13:39:20] !log elukey@cumin1001 START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - elukey@cumin1001 [13:39:24] !log elukey@cumin1001 END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. 
- elukey@cumin1001 [13:39:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:39:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:08] mmmm [13:40:53] 10SRE, 10wikitech.wikimedia.org, 10Sustainability (Incident Followup): Incident response tools operational readiness review - https://phabricator.wikimedia.org/T290130 (10LSobanski) [13:41:40] (03PS1) 10Muehlenhoff: Update a few stale/obsoleted entries [puppet] - 10https://gerrit.wikimedia.org/r/726624 [13:43:08] (03PS1) 10Jbond: facter: interface_primary consider tokenize slaac addresses [puppet] - 10https://gerrit.wikimedia.org/r/726625 [13:43:50] (03CR) 10jerkins-bot: [V: 04-1] facter: interface_primary consider tokenize slaac addresses [puppet] - 10https://gerrit.wikimedia.org/r/726625 (owner: 10Jbond) [13:44:38] jouncebot: now [13:44:38] No deployments scheduled for the next 2 hour(s) and 15 minute(s) [13:44:40] jouncebot: next [13:44:40] In 2 hour(s) and 15 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211005T1600) [13:45:05] (03PS2) 10Jbond: facter: interface_primary consider tokenize slaac addresses [puppet] - 10https://gerrit.wikimedia.org/r/726625 [13:45:16] Juan_90264: hi [13:45:23] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/726624 (owner: 10Muehlenhoff) [13:45:27] (I just saw in the logs that you pinged me earlier) [13:45:37] unfortunately the backport window was a while ago [13:45:38] Hi [13:45:59] you need to be around during the deployment window, we don’t (generally) deploy changes without the people who brought them [13:46:21] (03CR) 10Jbond: [V: 03+1 C: 03+2] profile::contact: add role contacts based on current mappings (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/726572 (owner: 10Jbond) [13:46:27] !log run renameInvalidUsernames.php --wiki loginwiki --list /tmp/rename_users_for_uppercase_all.txt T219279 [13:46:33] Logged the message 
at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:46:34] T219279: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 [13:47:07] (03PS4) 10Elukey: Workaround quote escaping bug [cookbooks] - 10https://gerrit.wikimedia.org/r/711249 (owner: 10Razzi) [13:47:44] Thank you Lucas for the answer. [13:48:15] (03CR) 10Reedy: [C: 03+2] ExtensionDistributor: Add 1.37 as preview branch; remove 1.31 as it's EOL [mediawiki-config] - 10https://gerrit.wikimedia.org/r/725912 (owner: 10MacFan4000) [13:48:28] if 11:00 UTC isn’t a good time for you, you can use one of the other two windows [13:48:48] (this week they’re still “morning” and “evening” in the calendar, starting next week they’re “UTC evening” and “UTC late”) [13:49:03] (03Merged) 10jenkins-bot: ExtensionDistributor: Add 1.37 as preview branch; remove 1.31 as it's EOL [mediawiki-config] - 10https://gerrit.wikimedia.org/r/725912 (owner: 10MacFan4000) [13:49:09] usually someone is around during those windows as well, I believe (but not me ^^) [13:49:54] I will be available for today's "Evening backport window" backport [13:50:30] ok, then please add your changes to that window on Wikitech, and hopefully somebody will deploy them [13:50:50] (the bot will also ping you if you’re online when the window starts) [13:51:00] !log reedy@deploy1002 Synchronized wmf-config/CommonSettings.php: ExtensionDistributor - Drop REL1_31, start REL1_37 (duration: 00m 57s) [13:51:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:52:43] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/726612 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [13:57:23] (03PS1) 10Joal: Update AQS druid datasource [puppet] - 10https://gerrit.wikimedia.org/r/726627 [13:57:37] elukey: --^ [13:57:54] (03CR) 10jerkins-bot: [V: 04-1] Update AQS druid datasource [puppet] - 
10https://gerrit.wikimedia.org/r/726627 (owner: 10Joal) [13:58:11] arf [13:59:37] (03PS2) 10Joal: Update AQS druid datasource [puppet] - 10https://gerrit.wikimedia.org/r/726627 [14:02:12] (03CR) 10Elukey: [C: 03+2] Update AQS druid datasource [puppet] - 10https://gerrit.wikimedia.org/r/726627 (owner: 10Joal) [14:02:36] thanks Lucas_WMDE for talking with Juan [14:03:06] RECOVERY - Check systemd state on an-worker1096 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:03:08] np [14:05:34] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM! Feel free to merge, will be auto-deployed (or I can too)" [puppet] - 10https://gerrit.wikimedia.org/r/723220 (https://phabricator.wikimedia.org/T257056) (owner: 10MVernon) [14:06:11] 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10serviceops-radar: SVC DNS zonefiles and source of truth - https://phabricator.wikimedia.org/T270071 (10ayounsi) [14:06:56] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM, feel free to merge and it'll be auto-deployed by puppet at the next run" [alerts] - 10https://gerrit.wikimedia.org/r/723223 (https://phabricator.wikimedia.org/T257056) (owner: 10MVernon) [14:07:16] 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10serviceops-radar: SVC DNS zonefiles and source of truth - https://phabricator.wikimedia.org/T270071 (10ayounsi) I pinged a few persons on the task description to know if those records are needed (and document their usecase if so). 
[14:09:24] PROBLEM - Check systemd state on an-worker1096 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_amd_rocm_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:15:29] (03PS1) 10Volans: icinga-status: fix non-unicode chars [puppet] - 10https://gerrit.wikimedia.org/r/726634 [14:15:33] elukey: ^^^ [14:16:09] (03PS2) 10Volans: icinga-status: fix non-unicode chars [puppet] - 10https://gerrit.wikimedia.org/r/726634 [14:16:11] sorry, amended [14:18:47] (03CR) 10Volans: "question inline" [puppet] - 10https://gerrit.wikimedia.org/r/726625 (owner: 10Jbond) [14:20:17] (03PS3) 10Filippo Giunchedi: graphite: stop using LVM for /srv in labs [puppet] - 10https://gerrit.wikimedia.org/r/726614 (https://phabricator.wikimedia.org/T247963) [14:21:21] (03Abandoned) 10Elukey: Workaround quote escaping bug [cookbooks] - 10https://gerrit.wikimedia.org/r/711249 (owner: 10Razzi) [14:22:01] 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10serviceops-radar: SVC DNS zonefiles and source of truth - https://phabricator.wikimedia.org/T270071 (10akosiaris) Thanks for chasing that info down! So for ` CNAMEs: swift -> ms-fe kubestagemaster -> kubestagemaster1001 / kubestagemaster2001 staging ->... [14:22:17] (03CR) 10Filippo Giunchedi: [C: 03+2] graphite: stop using LVM for /srv in labs [puppet] - 10https://gerrit.wikimedia.org/r/726614 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [14:23:42] (03CR) 10Ayounsi: "Looks cleaner like that. Obviously I didn't test it but the logic looks good to me." [puppet] - 10https://gerrit.wikimedia.org/r/726625 (owner: 10Jbond) [14:25:03] (03CR) 10Elukey: [C: 03+1] icinga-status: fix non-unicode chars [puppet] - 10https://gerrit.wikimedia.org/r/726634 (owner: 10Volans) [14:26:12] 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10serviceops-radar: SVC DNS zonefiles and source of truth - https://phabricator.wikimedia.org/T270071 (10Volans) >>! 
In T270071#7402357, @akosiaris wrote: > As far as > > ` > SVC records that point to host IPs outside of the SVC subnets. Are those really... [14:27:45] 10SRE, 10MediaWiki-General, 10serviceops, 10MW-1.35-notes (1.35.0-wmf.28; 2020-04-14), and 5 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Pchelolo) Renaming users completed. There's 4 accounts across all sites... [14:28:07] (03CR) 10Jbond: facter: interface_primary consider tokenize slaac addresses (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/726625 (owner: 10Jbond) [14:28:22] (03CR) 10MVernon: [C: 03+2] alertmanager: route data-persistence team alerts [puppet] - 10https://gerrit.wikimedia.org/r/723220 (https://phabricator.wikimedia.org/T257056) (owner: 10MVernon) [14:28:33] (03PS4) 10MVernon: alertmanager: route data-persistence team alerts [puppet] - 10https://gerrit.wikimedia.org/r/723220 (https://phabricator.wikimedia.org/T257056) [14:29:08] (03CR) 10MVernon: [C: 03+2] data-protection: add alerting for prometheus-mysqld-exporter failing [alerts] - 10https://gerrit.wikimedia.org/r/723223 (https://phabricator.wikimedia.org/T257056) (owner: 10MVernon) [14:29:13] (03PS4) 10MVernon: data-protection: add alerting for prometheus-mysqld-exporter failing [alerts] - 10https://gerrit.wikimedia.org/r/723223 (https://phabricator.wikimedia.org/T257056) [14:30:04] (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/726634 (owner: 10Volans) [14:30:07] !log run foreachwiki uppercaseTitlesForUnicodeTransition.php --charmap current_to_php7_overrides.php T219279 [14:30:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:30:15] T219279: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 [14:30:35] (03CR) 10Volans: [C: 03+2] icinga-status: fix non-unicode chars [puppet] - 
10https://gerrit.wikimedia.org/r/726634 (owner: 10Volans) [14:31:04] (03CR) 10Filippo Giunchedi: [C: 03+2] graphite: add Bullseye support [puppet] - 10https://gerrit.wikimedia.org/r/726612 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [14:31:30] Emperor: there is also a change from you to puppet-merge [14:31:34] can I go ahead? [14:31:38] MVernon: alertmanager: route data-persistence team alerts (d7b42e5f99) [14:32:01] volans: sure, yes (I had only just clicked go on gerrit, sorry if caused you hassle by bad timing) [14:32:10] no prob [14:32:30] we have a check in puppet-merge for that :) [14:32:32] (03CR) 10Filippo Giunchedi: [C: 03+1] logstash: set log field aside prior to parsing k8s logs [puppet] - 10https://gerrit.wikimedia.org/r/726129 (https://phabricator.wikimedia.org/T292099) (owner: 10Cwhite) [14:36:36] RECOVERY - Check systemd state on an-worker1096 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:39:29] 10SRE: Migrate puppetboard to Buster - https://phabricator.wikimedia.org/T264276 (10jbond) [14:39:33] 10Puppet, 10Infrastructure-Foundations, 10User-jbond: Upgrade puppetboard to the latest version - https://phabricator.wikimedia.org/T292522 (10jbond) [14:39:42] 10SRE: Migrate puppetboard to Bullseye - https://phabricator.wikimedia.org/T264276 (10jbond) 05Open→03In progress p:05Triage→03Medium [14:39:46] 10Puppet, 10Infrastructure-Foundations, 10User-jbond: Upgrade puppetboard to the latest version - https://phabricator.wikimedia.org/T292522 (10jbond) [14:40:19] (03PS1) 10Jbond: linux-host-entries: update puppetboard[12]002 servers to bullseye [puppet] - 10https://gerrit.wikimedia.org/r/726637 (https://phabricator.wikimedia.org/T264276) [14:40:40] !log depool mw1455 and mw1422 [14:40:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:41:43] (03CR) 10Jbond: [C: 03+2] linux-host-entries: update puppetboard[12]002 servers to 
bullseye [puppet] - 10https://gerrit.wikimedia.org/r/726637 (https://phabricator.wikimedia.org/T264276) (owner: 10Jbond) [14:45:09] 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10serviceops-radar: SVC DNS zonefiles and source of truth - https://phabricator.wikimedia.org/T270071 (10aborrero) > * SVC records that point to host IPs outside of the SVC subnets. Are those really required or just tech debt that should be fixed? > ** nfs... [14:46:45] 10SRE, 10Fundraising-Backlog, 10Traffic, 10fr-donorservices, and 2 others: SSL cert for links.email.wikimedia.org - https://phabricator.wikimedia.org/T188561 (10Jgreen) >>! In T188561#7264108, @DStrine wrote: > @JBennett @BBlack @Dwisehaupt @Jgreen I'm hearing that the email service provider (now branded a... [14:49:19] (03CR) 10Arturo Borrero Gonzalez: "Is the `dns-01` challenge configured to work with the wikimediacloud.org domain?" [puppet] - 10https://gerrit.wikimedia.org/r/726585 (https://phabricator.wikimedia.org/T267194) (owner: 10Majavah) [14:50:13] (03CR) 10Vgutierrez: haproxy: Allow adding/removing HTTP headers (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/720272 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez) [14:51:35] 10SRE, 10DNS, 10Traffic: Additional DNS entries for Wikilearn project (Community Development) - https://phabricator.wikimedia.org/T292537 (10Vgutierrez) 05Open→03In progress p:05Triage→03Medium [14:51:40] (03CR) 10Majavah: acme_chief: add openstack certs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/726585 (https://phabricator.wikimedia.org/T267194) (owner: 10Majavah) [14:55:13] (03CR) 10Muehlenhoff: [C: 03+2] Update a few stale/obsoleted entries [puppet] - 10https://gerrit.wikimedia.org/r/726624 (owner: 10Muehlenhoff) [14:56:34] (03CR) 10Arturo Borrero Gonzalez: acme_chief: add openstack certs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/726585 (https://phabricator.wikimedia.org/T267194) (owner: 10Majavah) [14:56:46] 10SRE, 10DNS, 
10Traffic: Additional DNS entries for Wikilearn project (Community Development) - https://phabricator.wikimedia.org/T292537 (10Vgutierrez) @Ijon could you confirm that you want forum.dev.learn.wiki. pointing to a private class C IP (192.168.193.13)? [14:57:36] (03CR) 10Muehlenhoff: [C: 03+2] On bullseye and later install routinator from thirdparty/routinator [puppet] - 10https://gerrit.wikimedia.org/r/726615 (owner: 10Muehlenhoff) [14:57:51] (03CR) 10Muehlenhoff: [C: 03+2] Add repo sync definition and repo component for Routinator [puppet] - 10https://gerrit.wikimedia.org/r/726610 (https://phabricator.wikimedia.org/T292503) (owner: 10Muehlenhoff) [14:58:06] 10SRE, 10DNS, 10Traffic: Additional DNS entries for Wikilearn project (Community Development) - https://phabricator.wikimedia.org/T292537 (10Vgutierrez) 05In progress→03Stalled [14:58:12] !log reimage puppetboard1002 [14:58:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:00:36] (03CR) 10Muehlenhoff: [C: 03+2] "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/726615 (owner: 10Muehlenhoff) [15:02:46] (03CR) 10Vgutierrez: [C: 04-1] acme_chief: add openstack certs (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/726585 (https://phabricator.wikimedia.org/T267194) (owner: 10Majavah) [15:03:29] (03PS2) 10Majavah: acme_chief: add openstack certs [puppet] - 10https://gerrit.wikimedia.org/r/726585 (https://phabricator.wikimedia.org/T267194) [15:03:48] (03CR) 10Majavah: acme_chief: add openstack certs (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/726585 (https://phabricator.wikimedia.org/T267194) (owner: 10Majavah) [15:04:45] (03CR) 10Vgutierrez: [C: 03+1] "that was quick! 
LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/726585 (https://phabricator.wikimedia.org/T267194) (owner: 10Majavah) [15:06:19] (03PS3) 10Jbond: facter: interface_primary consider tokenize slaac addresses [puppet] - 10https://gerrit.wikimedia.org/r/726625 [15:06:33] (03CR) 10Jbond: "updated" [puppet] - 10https://gerrit.wikimedia.org/r/726625 (owner: 10Jbond) [15:10:22] !log imported routinator 0.10.1-1bullseye to thirdparty/routinator for bullseye-wikimedia T292503 [15:10:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:10:28] T292503: Rebuild Routinator (rpki) VMs with larger disk - https://phabricator.wikimedia.org/T292503 [15:12:02] (03PS1) 10Volans: icinga-status: replace UTF-8 errors with ? [puppet] - 10https://gerrit.wikimedia.org/r/726641 [15:12:29] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/726625 (owner: 10Jbond) [15:15:23] !log jbond@cumin1001 START - Cookbook sre.puppet.renew-cert for puppetboard1002.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001 [15:15:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:15:38] !log jbond@cumin1001 END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for puppetboard1002.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001 [15:15:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:16:07] (03PS6) 10Bearloga: statistics::product_analytics: create and prepare [puppet] - 10https://gerrit.wikimedia.org/r/724497 (https://phabricator.wikimedia.org/T291957) [15:16:51] (03CR) 10Vgutierrez: [C: 04-1] "Sorry, but wikimediacloud.org is missing CAA records:" [puppet] - 10https://gerrit.wikimedia.org/r/726585 (https://phabricator.wikimedia.org/T267194) (owner: 10Majavah) [15:23:08] RECOVERY - Check systemd state on deploy1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:25:28] 10SRE, 10SRE-tools, 
10Infrastructure-Foundations, 10serviceops-radar: SVC DNS zonefiles and source of truth - https://phabricator.wikimedia.org/T270071 (10akosiaris) >>! In T270071#7402401, @Volans wrote: >>>! In T270071#7402357, @akosiaris wrote: >> As far as >> >> ` >> SVC records that point to host IP... [15:25:44] (03CR) 10Vgutierrez: [C: 04-1] acme_chief: add openstack certs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/726585 (https://phabricator.wikimedia.org/T267194) (owner: 10Majavah) [15:26:55] (03PS1) 10Jbond: P:monitoring: change ordering [puppet] - 10https://gerrit.wikimedia.org/r/726642 [15:28:24] (03CR) 10jerkins-bot: [V: 04-1] P:monitoring: change ordering [puppet] - 10https://gerrit.wikimedia.org/r/726642 (owner: 10Jbond) [15:28:53] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31488/console" [puppet] - 10https://gerrit.wikimedia.org/r/726642 (owner: 10Jbond) [15:29:08] PROBLEM - Check systemd state on deploy1002 is CRITICAL: CRITICAL - degraded: The following units failed: deploy_to_mwdebug.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:30:22] (03PS2) 10Jbond: P:monitoring: change ordering [puppet] - 10https://gerrit.wikimedia.org/r/726642 [15:31:25] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31489/console" [puppet] - 10https://gerrit.wikimedia.org/r/726642 (owner: 10Jbond) [15:32:45] (03CR) 10MSantos: Add script to send tile invalidation events (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/722825 (https://phabricator.wikimedia.org/T270175) (owner: 10Jgiannelos) [15:34:06] (03PS1) 10Arlolra: [WIP] Enable legacy media dom on metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726643 (https://phabricator.wikimedia.org/T292498) [15:34:10] (03PS1) 10Kormat: admin: Add sgimeno to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/726644 
(https://phabricator.wikimedia.org/T292541) [15:34:46] (03CR) 10Subramanya Sastry: [C: 03+1] "As long as https://releases.wikimedia.org/parsoid/ is not going away, works for me." [puppet] - 10https://gerrit.wikimedia.org/r/725670 (owner: 10Muehlenhoff) [15:35:14] (03CR) 10Kormat: [C: 03+2] admin: Add sgimeno to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/726644 (https://phabricator.wikimedia.org/T292541) (owner: 10Kormat) [15:36:28] (03CR) 10Jbond: [V: 03+1 C: 03+2] P:monitoring: change ordering [puppet] - 10https://gerrit.wikimedia.org/r/726642 (owner: 10Jbond) [15:38:55] !log reimage puppetboard2002 [15:39:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:42:16] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Christina Macholan - https://phabricator.wikimedia.org/T292515 (10Kormat) p:05Triage→03Medium [15:45:58] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Christina Macholan - https://phabricator.wikimedia.org/T292515 (10Kormat) [15:49:30] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Christina Macholan - https://phabricator.wikimedia.org/T292515 (10Kormat) [15:53:46] (03PS19) 10Dave Pifke: webperf: connect to Kafka using TLS [puppet] - 10https://gerrit.wikimedia.org/r/721047 (https://phabricator.wikimedia.org/T290131) [15:55:59] (03CR) 10Jbond: [C: 03+1] icinga-status: replace UTF-8 errors with ? [puppet] - 10https://gerrit.wikimedia.org/r/726641 (owner: 10Volans) [15:57:41] !log jbond@cumin2002 START - Cookbook sre.puppet.renew-cert for puppetboard2002.codfw.wmnet: Renew puppet certificate - jbond@cumin2002 [15:57:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:57:51] (03CR) 10Volans: [C: 03+2] icinga-status: replace UTF-8 errors with ? 
[puppet] - 10https://gerrit.wikimedia.org/r/726641 (owner: 10Volans) [15:57:54] (03PS3) 10Vgutierrez: haproxy: Allow loading lua scripts [puppet] - 10https://gerrit.wikimedia.org/r/720273 (https://phabricator.wikimedia.org/T290005) [15:57:56] !log jbond@cumin2002 END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for puppetboard2002.codfw.wmnet: Renew puppet certificate - jbond@cumin2002 [15:58:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:58:09] (03CR) 10Vgutierrez: haproxy: Allow loading lua scripts (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/720273 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez) [15:58:20] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Christina Macholan - https://phabricator.wikimedia.org/T292515 (10Kormat) This needs approval from @Ottomata and @Sbodington. [15:58:37] (03PS1) 10Volans: uwsgi: restore unicode output for NRPE check [puppet] - 10https://gerrit.wikimedia.org/r/726649 [15:58:56] jbond: ^^ as agreed :) [16:00:04] jbond and rzl: My dear minions, it's time we take the moon! Just kidding. Time for Puppet request window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211005T1600). [16:00:04] dpifke: A patch you scheduled for Puppet request window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [16:01:51] Here. [16:04:13] (03CR) 10MSantos: [C: 03+1] "LGTM. You can merge when we're ready." 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/726562 (owner: 10Jgiannelos) [16:04:37] (03CR) 10MSantos: [C: 03+1] tile-pregeneration: Exclude canary test events [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/726522 (https://phabricator.wikimedia.org/T270175) (owner: 10Jgiannelos) [16:06:09] (03CR) 10Elukey: [C: 03+1] uwsgi: restore unicode output for NRPE check [puppet] - 10https://gerrit.wikimedia.org/r/726649 (owner: 10Volans) [16:06:45] (03CR) 10MSantos: [C: 03+1] tegola-vector-tiles: Configure kafka for eqiad/codfw main [deployment-charts] - 10https://gerrit.wikimedia.org/r/726561 (owner: 10Jgiannelos) [16:06:54] (03CR) 10Volans: [C: 03+2] uwsgi: restore unicode output for NRPE check [puppet] - 10https://gerrit.wikimedia.org/r/726649 (owner: 10Volans) [16:07:35] (03CR) 10Alexandros Kosiaris: logstash: set log field aside prior to parsing k8s logs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/726129 (https://phabricator.wikimedia.org/T292099) (owner: 10Cwhite) [16:07:57] (03PS1) 10Jbond: P:base: include monitoring earlier so that the plugins directory is created [puppet] - 10https://gerrit.wikimedia.org/r/726651 [16:08:44] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31490/console" [puppet] - 10https://gerrit.wikimedia.org/r/726651 (owner: 10Jbond) [16:09:04] (03CR) 10MSantos: [C: 04-1] "Per our sync earlier. Let's see if makes sense to change the timestamp of last parsed files for this iteration and then merge it." 
[puppet] - 10https://gerrit.wikimedia.org/r/722825 (https://phabricator.wikimedia.org/T270175) (owner: 10Jgiannelos) [16:13:53] (03PS2) 10Jbond: P:base: include monitoring earlier so that the plugins directory is created [puppet] - 10https://gerrit.wikimedia.org/r/726651 [16:14:19] (03PS3) 10Jbond: P:base: include monitoring earlier so that the plugins directory is created [puppet] - 10https://gerrit.wikimedia.org/r/726651 [16:15:28] (03PS1) 10Volans: quotereviewer: set logging level [software] - 10https://gerrit.wikimedia.org/r/726652 [16:15:45] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Bump the main ones instead. No need to duplicate just for mwdebug and we have hit this enough in the past to warrant a bumping to 5G for m" [deployment-charts] - 10https://gerrit.wikimedia.org/r/726580 (https://phabricator.wikimedia.org/T280497) (owner: 10Effie Mouzeli) [16:16:37] (03CR) 10Alexandros Kosiaris: [C: 03+1] "If this works, let's move it to the chart too!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/726000 (https://phabricator.wikimedia.org/T280497) (owner: 10Effie Mouzeli) [16:17:01] (03CR) 10Cwhite: logstash: set log field aside prior to parsing k8s logs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/726129 (https://phabricator.wikimedia.org/T292099) (owner: 10Cwhite) [16:18:11] 10SRE, 10LDAP-Access-Requests: Add Deniz Erdogan to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T292301 (10KFrancis) Hi all, I am confirming receipt of the signed NDA. Please proceed with the access request. Thanks! 
[16:20:07] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/726651 (owner: 10Jbond) [16:20:32] (03CR) 10Volans: [C: 03+2] quotereviewer: set logging level [software] - 10https://gerrit.wikimedia.org/r/726652 (owner: 10Volans) [16:24:17] 10SRE, 10Infrastructure-Foundations, 10Traffic, 10Patch-For-Review: OpenSSL < 1.1.0 compatibility issues with new LE issuance chain - https://phabricator.wikimedia.org/T283165 (10akosiaris) >>! In T283165#7365637, @MoritzMuehlenhoff wrote: > For production: > * OpenSSL in Buster and Bullseye is not affect... [16:28:08] (03PS1) 10Muehlenhoff: owners.yaml: Update a few entries [puppet] - 10https://gerrit.wikimedia.org/r/726657 [16:29:16] (03CR) 10Jbond: [C: 03+2] P:base: include monitoring earlier so that the plugins directory is created [puppet] - 10https://gerrit.wikimedia.org/r/726651 (owner: 10Jbond) [16:30:09] 10SRE, 10Infrastructure-Foundations, 10Traffic, 10Patch-For-Review: OpenSSL < 1.1.0 compatibility issues with new LE issuance chain - https://phabricator.wikimedia.org/T283165 (10akosiaris) >>! In T283165#7402880, @akosiaris wrote: >>>! In T283165#7365637, @MoritzMuehlenhoff wrote: >> For production: >> *... [16:31:02] 10SRE, 10Infrastructure-Foundations, 10Traffic, 10Patch-For-Review: OpenSSL < 1.1.0 compatibility issues with new LE issuance chain - https://phabricator.wikimedia.org/T283165 (10MoritzMuehlenhoff) > With T291458 done, I 've already rebuilt bullseye (which was not affected) and buster main images (with lib... 
[16:33:43] (03PS1) 10Dduvall: Move Redis server definitions to services files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726660 [16:34:21] (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/726657 (owner: 10Muehlenhoff) [16:35:11] (03CR) 10jerkins-bot: [V: 04-1] Move Redis server definitions to services files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726660 (owner: 10Dduvall) [16:35:55] 10SRE, 10Infrastructure-Foundations, 10netops: Rebuild Routinator (rpki) VMs with larger disk - https://phabricator.wikimedia.org/T292503 (10MoritzMuehlenhoff) I've added routinator to apt.wikimedia.org at "thirdparty/routinator" for bullseye-wikimedia and adapted the Puppet code, so that when the these get... [16:36:29] 10Puppet, 10Infrastructure-Foundations, 10GitLab (Infrastructure), 10Patch-For-Review, and 3 others: Puppetise gitlab-ansible playbook - https://phabricator.wikimedia.org/T283076 (10Jelto) I'd like to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/724430 tomorrow on `gitlab2001` (replica). For t... [16:37:37] 10SRE, 10SRE-swift-storage, 10ops-codfw: Spontaneous reboot of ms-be2045 - https://phabricator.wikimedia.org/T290881 (10Papaul) 05Open→03Resolved I checked the server today all looking good. 
closing this task [16:39:19] (03CR) 10Ssingh: [C: 03+1] haproxy: Allow loading lua scripts [puppet] - 10https://gerrit.wikimedia.org/r/720273 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez) [16:39:31] (03CR) 10Ssingh: [C: 03+1] haproxy: Allow adding/removing HTTP headers [puppet] - 10https://gerrit.wikimedia.org/r/720272 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez) [16:45:31] (03CR) 10Jgiannelos: [C: 03+2] tile-pregeneration: Exclude canary test events [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/726522 (https://phabricator.wikimedia.org/T270175) (owner: 10Jgiannelos) [16:46:45] (03Merged) 10jenkins-bot: tile-pregeneration: Exclude canary test events [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/726522 (https://phabricator.wikimedia.org/T270175) (owner: 10Jgiannelos) [16:47:29] !log coordinated with deployment backup and starting train prep for 1.38.0-wmf.3 (T281167), branched at 65279490f82c785181b8b6961e40901a4aaafca4 [16:47:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:47:37] T281167: 1.38.0-wmf.3 deployment blockers - https://phabricator.wikimedia.org/T281167 [16:53:44] (03CR) 10Jgiannelos: [C: 03+2] tegola-vector-tiles: Configure kafka for eqiad/codfw main [deployment-charts] - 10https://gerrit.wikimedia.org/r/726561 (owner: 10Jgiannelos) [16:56:15] !log successfully applied security patches for 1.38.0-wmf.3 train (T281167) [16:56:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:56:22] T281167: 1.38.0-wmf.3 deployment blockers - https://phabricator.wikimedia.org/T281167 [16:57:18] 10SRE, 10MediaWiki-General, 10serviceops, 10MW-1.35-notes (1.35.0-wmf.28; 2020-04-14), and 5 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Pchelolo) Script run finished. We've has some AbuseFilter violations: -... 
[16:57:32] urbanecm: is it possible that you can help me with this https://phabricator.wikimedia.org/T219279#7403003 ? [16:57:52] Daimona: ^^ [16:58:17] (03Merged) 10jenkins-bot: tegola-vector-tiles: Configure kafka for eqiad/codfw main [deployment-charts] - 10https://gerrit.wikimedia.org/r/726561 (owner: 10Jgiannelos) [16:59:06] Pchelolo: Daimona is likely a better PoC for AF issues than I am, yeah. [16:59:31] oki, thank you [16:59:57] urbanecm: Though it looks like it needs someone to possibly alter the filters... :D [17:00:00] (03PS2) 10Ppchelko: Remove mb_strtoupper overrides for HHVM [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726623 (https://phabricator.wikimedia.org/T219279) [17:00:04] chrisalbon and accraze: May I have your attention please! Services – Graphoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211005T1700) [17:00:15] Well, yeah, I can do that (but a lot of others can too :)) [17:00:54] 10SRE, 10MediaWiki-General, 10serviceops, 10MW-1.35-notes (1.35.0-wmf.28; 2020-04-14), and 5 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Pchelolo) In any case it seems like the pages we've failed to rename are... [17:01:50] Pchelolo: hi! May I have a bit of context without going through the whole task? :P [17:02:04] !log btullis@cumin1001 START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001 [17:02:08] Daimona: abuse filter is preventing a maint script from running [17:02:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:02:17] !log btullis@cumin1001 END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. 
- btullis@cumin1001 [17:02:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:02:35] I'm mass renaming some pages on various wikis that will be inaccessible due to changes in unicode capitalization [17:03:01] !log btullis@cumin1001 START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001 [17:03:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:03:12] Ohhhhhh I understand now [17:03:22] AbuseFilter is preventing the script from editing, right? [17:03:22] like ꞓ was not capitalized before, now it is, so a page starting with ꞓ will be inaccessible, and has to be renamed to Ꞓ [17:03:30] Daimona: ye [17:03:42] It's not that e.g. some AF-related title or username must be updated [17:03:47] cswiki 14 and ruwiki 16/11 [17:03:50] Daimona: yeah, only the 3 rules on those specific wikis [17:04:00] Got it [17:04:13] So... I think there are two ways to do this [17:04:24] (03CR) 10Ppchelko: "A little bit of a footgun, but if CommonSettings is deployed before Php72Overrides, we should be fine." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726623 (https://phabricator.wikimedia.org/T219279) (owner: 10Ppchelko) [17:04:58] First one is to edit the filters. People who can do this are local admins, global AF maintainers and staff. If it's just a temporary thing (for a few minutes), all of them are probably fine. 
[17:05:21] The alternative is a config change to permanently exclude the Maintenance Script user from filters, which I think might be a good idea regardless [17:05:35] (03CR) 10Giuseppe Lavagetto: [C: 03+1] Remove mb_strtoupper overrides for HHVM [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726623 (https://phabricator.wikimedia.org/T219279) (owner: 10Ppchelko) [17:05:47] This one is easy to implement via the AbuseFilterShouldFilterAction hook, but it requires a deployment ofc [17:06:06] I think the latter might make more sense in future if someone is willing to do it [17:08:14] I'm willing to do both...in few minutes. [17:08:52] It's also possible to live hack config on mwmaint and disable AF as a whole [17:08:52] thank you urbanecm. disabling them for a few minutes will be preferable, cause permanent thing will likely need some discussion [17:09:00] Kk [17:09:05] (03PS1) 10Brennen Bearnes: testwikis wikis to 1.38.0-wmf.3 refs T281167 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726664 [17:09:07] (03CR) 10Brennen Bearnes: [C: 03+2] testwikis wikis to 1.38.0-wmf.3 refs T281167 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726664 (owner: 10Brennen Bearnes) [17:09:47] brennen: I see you're doing the train. Can you please ping me when it's safe to make a mw-config deployment for T219279 ? 
[17:09:48] T219279: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 [17:09:52] (03Merged) 10jenkins-bot: testwikis wikis to 1.38.0-wmf.3 refs T281167 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726664 (owner: 10Brennen Bearnes) [17:09:54] !log brennen@deploy1002 Started scap: testwikis wikis to 1.38.0-wmf.3 refs T281167 [17:09:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:10:00] T281167: 1.38.0-wmf.3 deployment blockers - https://phabricator.wikimedia.org/T281167 [17:10:04] Was just gonna say until scap runs you'll be blocked [17:10:56] (03PS9) 10Jgiannelos: Add script to send tile invalidation events [puppet] - 10https://gerrit.wikimedia.org/r/722825 (https://phabricator.wikimedia.org/T270175) [17:10:59] (I'm off for dinner. If you need me, please ping me) [17:11:18] bon apetit [17:11:24] Pchelolo: try again please and lmk once done [17:12:18] !log btullis@cumin1001 END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001 [17:12:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:14:00] (03PS3) 10Dduvall: Move Redis server definitions to services files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726660 [17:14:07] Pchelolo: will ping. [17:14:11] thank you [17:14:44] 10SRE, 10MediaWiki-General, 10serviceops, 10MW-1.35-notes (1.35.0-wmf.28; 2020-04-14), and 5 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10AntiCompositeNumber) >>! In T219279#7402408, @Pchelolo wrote: > Renaming... [17:15:20] Pchelolo: did you see my comment? 
[17:15:48] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Christina Macholan - https://phabricator.wikimedia.org/T292515 (10Ottomata) Approved, this will require ssh access too. [17:16:02] urbanecm: yeah, running. [17:16:09] it takes a little while [17:16:32] Ack, just wanted to make sure you saw it :) [17:19:15] 10SRE, 10MediaWiki-General, 10serviceops, 10MW-1.35-notes (1.35.0-wmf.28; 2020-04-14), and 5 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Pchelolo) > Should they be renamed to something else, like was done with... [17:19:31] 10SRE, 10MediaWiki-General, 10serviceops, 10MW-1.35-notes (1.35.0-wmf.28; 2020-04-14), and 5 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Tamzin) Was just about to ask this. While rare, there //are// cases of a... [17:23:06] (03PS1) 10Jbond: P:contacts::get_owners: return a hash mapping roles to their owners [puppet] - 10https://gerrit.wikimedia.org/r/726667 [17:23:30] (03CR) 10Ahmon Dancy: [C: 03+2] train-dev: Remove hardcoding of datacenters in redis configuration [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/725392 (owner: 10Dduvall) [17:24:02] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31491/console" [puppet] - 10https://gerrit.wikimedia.org/r/726667 (owner: 10Jbond) [17:24:11] (03CR) 10Dzahn: mediawiki/geoip: add option to also pull new MaxMind databases from master (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/726094 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [17:24:31] (03PS3) 10Dzahn: mediawiki/geoip: add option to also pull new MaxMind databases from master [puppet] - 10https://gerrit.wikimedia.org/r/726094 (https://phabricator.wikimedia.org/T288844) [17:25:03] 
(03Merged) 10jenkins-bot: train-dev: Remove hardcoding of datacenters in redis configuration [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/725392 (owner: 10Dduvall) [17:25:26] urbanecm: done, thank you a lot. [17:26:06] Great. Re-enabling. [17:26:42] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31492/console" [puppet] - 10https://gerrit.wikimedia.org/r/726667 (owner: 10Jbond) [17:27:22] 10SRE, 10MediaWiki-General, 10serviceops, 10MW-1.35-notes (1.35.0-wmf.28; 2020-04-14), and 5 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Pchelolo) > with usertalk notes pointing them to Special:GlobalRenameReq... [17:30:52] (03PS2) 10Jbond: P:contacts::get_owners: return a hash mapping roles to their owners [puppet] - 10https://gerrit.wikimedia.org/r/726667 [17:31:04] Pchelolo: did the user renaming do a _global_ rename? [17:31:11] and will the users know of their new username? [17:31:12] 10SRE, 10MediaWiki-General, 10serviceops, 10MW-1.35-notes (1.35.0-wmf.28; 2020-04-14), and 5 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Legoktm) >>! In T219279#7403003, @Pchelolo wrote: > If someone could be... [17:31:42] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31493/console" [puppet] - 10https://gerrit.wikimedia.org/r/726667 (owner: 10Jbond) [17:31:56] urbanecm: once we deploy a capitalization change there will be no difference for not-renamed and renamed usernames [17:32:04] ack [17:32:18] except 4 users who's capitalized name is taken [17:32:38] and for the global/local thing? 
[17:32:56] 10SRE, 10MediaWiki-General, 10serviceops, 10MW-1.35-notes (1.35.0-wmf.28; 2020-04-14), and 5 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Tamzin) >>! In T219279#7403097, @Pchelolo wrote: >> with usertalk notes... [17:33:08] (03CR) 10Jbond: [V: 03+1 C: 03+2] P:contacts::get_owners: return a hash mapping roles to their owners [puppet] - 10https://gerrit.wikimedia.org/r/726667 (owner: 10Jbond) [17:33:44] (currently about 1/3 through syncing testwiki) [17:35:04] urbanecm: not sure I understand what's the 'global/local thing'? [17:35:33] WMF uses CentralAuth for accounts. I'm trying to make sure the renames were done consistently on all wikis _and_ CentralAuth's own tables [17:35:39] (03CR) 10Dzahn: "Alright, I guess then it makes sense to first convert the cron to timer since that means the current cron command has to become a template" [puppet] - 10https://gerrit.wikimedia.org/r/725390 (owner: 10Dzahn) [17:36:49] 10SRE, 10MediaWiki-General, 10serviceops, 10MW-1.35-notes (1.35.0-wmf.28; 2020-04-14), and 5 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Urbanecm) >>! In T219279#7403110, @Tamzin wrote: >>>! In T219279#7403097... [17:38:47] urbanecm: the renameInvalidUsernames script uses the same machinery as global rename - posts jobs, renames for all wikis one-by-one [17:38:55] ah, good [17:38:58] just takes a few shotcuts. 
[17:39:05] thanks for checking that [17:40:05] i was asking, as there are no new gblrename entries at https://meta.wikimedia.org/wiki/Special:Log/login/Maintenance_script [17:40:14] but i found what happened [17:40:31] *you ran it at loginwiki, so [17:40:33] *https://meta.wikimedia.org/wiki/Special:Log/gblrename/Maintenance_script [17:40:52] but yeah, script ran at loginwiki, and https://login.wikimedia.org/wiki/Special:Log/gblrename has the logs [17:41:21] we'd need to do this whole thing again soon for php 7.2 -> 7.3 [17:41:43] would it be better to run renames on metawiki? [17:42:15] Yeah renames are usually from meta [17:42:19] or in general, if you have ideas for improvements please put them on the task. [17:42:39] 10SRE, 10MediaWiki-General, 10serviceops, 10MW-1.35-notes (1.35.0-wmf.28; 2020-04-14), and 5 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Tamzin) Ah. That's a shame. You'd think after two renames I'd know that.... [17:43:19] Pchelolo: yes, do future renames from metawiki please -- that's where they're generally expected. [17:43:33] ok. updated the instructions. [17:44:25] Appreciated [17:55:53] !log brennen@deploy1002 Finished scap: testwikis wikis to 1.38.0-wmf.3 refs T281167 (duration: 45m 59s) [17:56:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:56:02] T281167: 1.38.0-wmf.3 deployment blockers - https://phabricator.wikimedia.org/T281167 [17:57:17] Pchelolo: you're clear if you still need to deploy. plz ping back when finished. [17:57:28] brennen: thank you. doing. 
[17:57:37] (03CR) 10Ppchelko: [C: 03+2] Remove mb_strtoupper overrides for HHVM [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726623 (https://phabricator.wikimedia.org/T219279) (owner: 10Ppchelko) [17:58:19] (03Merged) 10jenkins-bot: Remove mb_strtoupper overrides for HHVM [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726623 (https://phabricator.wikimedia.org/T219279) (owner: 10Ppchelko) [18:00:04] Deploy window Pre MediaWiki train break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211005T1800) [18:04:24] !log ppchelko@deploy1002 Synchronized wmf-config/CommonSettings.php: Remove mb_strtoupper overrides for HHVM T219279 CS.php (duration: 01m 06s) [18:04:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:04:31] T219279: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 [18:06:40] brennen: I'm half-done, but how do I deploy a deletion of a file from mw-config? scap sync-file is not usable for deleted file [18:06:46] should I do a full sync? [18:08:40] Pchelolo: Sync the parent folder [18:08:47] (03PS1) 10Cwhite: logstash: test moving the k8s parsing to earlier in the pipeline [puppet] - 10https://gerrit.wikimedia.org/r/726671 (https://phabricator.wikimedia.org/T292099) [18:08:47] so scap sync-dir wmf-config [18:08:55] gotcha [18:09:12] thx Reedy. [18:11:06] !log ppchelko@deploy1002 Synchronized wmf-config: Remove mb_strtoupper overrides for HHVM T219279 Php72ToUpper.php removal (duration: 01m 06s) [18:11:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:11:13] T219279: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 [18:11:21] brennen: done. 
thank you for leaving me a window [18:11:45] sure thing [18:16:30] 10SRE, 10MediaWiki-General, 10serviceops, 10MW-1.35-notes (1.35.0-wmf.28; 2020-04-14), and 5 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Pchelolo) Ok, we're done with removal of overrides. Now production upper... [18:19:28] (03CR) 10Ahmon Dancy: [C: 03+1] Move Redis server definitions to services files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726660 (owner: 10Dduvall) [18:21:20] !log 1.38.0-wmf.3 (T281167): pruning old branches, starting with 1.37.0-wmf.21, proceeeding to 1.37.0-wmf.23 if time allows [18:21:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:21:27] T281167: 1.38.0-wmf.3 deployment blockers - https://phabricator.wikimedia.org/T281167 [18:23:33] !log brennen@deploy1002 Pruned MediaWiki: 1.37.0-wmf.21 (duration: 04m 20s) [18:23:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:26:33] !log brennen@deploy1002 Pruned MediaWiki: 1.37.0-wmf.23 (duration: 01m 57s) [18:26:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:28:13] our tab complete thanks you :) [18:30:04] :) [18:30:24] 10SRE, 10MW-on-K8s, 10Traffic, 10serviceops, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (10Krinkle) [18:34:39] 10SRE, 10SRE-swift-storage, 10ops-eqiad: swift - ms-be1059 - device sdi:3 unavailable - https://phabricator.wikimedia.org/T292486 (10Cmjohnson) These are just out of warranty by a few months. I do have spare disks on-site, my guess is a disk went bad on you. Is there any way you can tell me which disk slot... 
[18:35:24] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/31494/" [puppet] - 10https://gerrit.wikimedia.org/r/722950 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [18:38:51] (03PS1) 10Dzahn: etcd: remove absented cron code [puppet] - 10https://gerrit.wikimedia.org/r/726673 (https://phabricator.wikimedia.org/T273673) [18:39:41] (03CR) 10Dzahn: "[conf1004:~] $ sudo systemctl status etcd-backup" [puppet] - 10https://gerrit.wikimedia.org/r/722950 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [18:45:26] (03CR) 10Dzahn: [C: 03+2] "confirmed with cumin it's gone from conf*" [puppet] - 10https://gerrit.wikimedia.org/r/726673 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [18:50:59] 10SRE, 10DNS, 10Traffic: Additional DNS entries for Wikilearn project (Community Development) - https://phabricator.wikimedia.org/T292537 (10Ijon) Oh, thanks for catching this silly mistake! Indeed, the dev record should be to 52.44.207.59, not the private IP. [18:54:13] 10SRE, 10SRE-swift-storage, 10ops-eqiad: swift - ms-be1059 - device sdi:3 unavailable - https://phabricator.wikimedia.org/T292486 (10Dzahn) @Cmjohnson The failed disk should have an amber light on it when you're onsite. [18:55:35] 10SRE, 10SRE-swift-storage, 10ops-eqiad: swift - ms-be1059 - device sdi:3 unavailable - https://phabricator.wikimedia.org/T292486 (10Dzahn) serial PEYHD0DRHC617F [19:00:04] brennen and jeena: Time to snap out of that daydream and deploy MediaWiki train - American Version. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211005T1900). [19:00:19] no blockers, logs clean, proceeding. 
[19:00:54] (03PS1) 10Brennen Bearnes: group0 wikis to 1.38.0-wmf.3 refs T281167 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726676 [19:00:56] (03CR) 10Brennen Bearnes: [C: 03+2] group0 wikis to 1.38.0-wmf.3 refs T281167 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726676 (owner: 10Brennen Bearnes) [19:01:43] (03Merged) 10jenkins-bot: group0 wikis to 1.38.0-wmf.3 refs T281167 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726676 (owner: 10Brennen Bearnes) [19:02:07] (03PS8) 10Dzahn: geoip: replace maxmind update cron with system timer and config [puppet] - 10https://gerrit.wikimedia.org/r/721595 (https://phabricator.wikimedia.org/T273673) [19:03:05] !log brennen@deploy1002 rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.3 refs T281167 [19:03:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:13] T281167: 1.38.0-wmf.3 deployment blockers - https://phabricator.wikimedia.org/T281167 [19:04:31] seeing a few ImageHistoryPseudoPager errors [19:05:54] yeah, rolling back. [19:06:20] Pchelolo: do those look like action api errors? 
[19:09:13] !log brennen@deploy1002 rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.38.0-wmf.2 [19:09:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:10:49] brennen: I'm filing a task for the error [19:11:26] (03CR) 10Cwhite: [C: 03+2] logstash: set log field aside prior to parsing k8s logs [puppet] - 10https://gerrit.wikimedia.org/r/726129 (https://phabricator.wikimedia.org/T292099) (owner: 10Cwhite) [19:12:00] jeena: thanks, you beat me to it [19:15:36] (action api seems irrelevant; i hadn't read that task closely) [19:20:28] (03PS1) 10Brennen Bearnes: Revert "group0 wikis to 1.38.0-wmf.3 refs T281167" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726679 [19:20:30] (03CR) 10Brennen Bearnes: [C: 03+2] Revert "group0 wikis to 1.38.0-wmf.3 refs T281167" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726679 (owner: 10Brennen Bearnes) [19:21:36] (03Merged) 10jenkins-bot: Revert "group0 wikis to 1.38.0-wmf.3 refs T281167" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726679 (owner: 10Brennen Bearnes) [19:25:14] RECOVERY - Check systemd state on deploy1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:27:04] 10SRE: Restore amire80 home directory on mwmaint1002 - https://phabricator.wikimedia.org/T292573 (10Amire80) [19:27:38] (03PS2) 10Cwhite: logstash: test moving the k8s parsing to earlier in the pipeline [puppet] - 10https://gerrit.wikimedia.org/r/726671 (https://phabricator.wikimedia.org/T292099) [19:30:54] !log restoring /home/amire80 from and to mwmaint2002 via Bacula bconsole (T292573) [19:30:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:31:00] T292573: Restore amire80 home directory on mwmaint1002 - https://phabricator.wikimedia.org/T292573 [19:31:32] PROBLEM - Check systemd state on deploy1002 is CRITICAL: CRITICAL - degraded: The following units failed: 
deploy_to_mwdebug.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:36:02] 10SRE, 10MediaWiki-General, 10serviceops, 10MW-1.35-notes (1.35.0-wmf.28; 2020-04-14), and 5 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Pchelolo) 05Open→03Resolved a:03Pchelolo Ok, re-run the script. Al... [19:36:14] 10SRE: Restore amire80 home directory on mwmaint1002 - https://phabricator.wikimedia.org/T292573 (10Dzahn) Hi @Amire80 this is because mwmaint2002 had the OS reinstalled. This happened shortly after the switch you describe but the switch of the active server did not make the files go away. They are separate in... [19:36:39] 10SRE: Restore amire80 home directory on mwmaint1002 - https://phabricator.wikimedia.org/T292573 (10Dzahn) a:03Dzahn [19:49:55] (03PS9) 10Dzahn: geoip: replace maxmind update cron with system timer and config [puppet] - 10https://gerrit.wikimedia.org/r/721595 (https://phabricator.wikimedia.org/T273673) [19:50:25] (03PS1) 10Brennen Bearnes: Pre-format comments for non-local files too [core] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/726594 (https://phabricator.wikimedia.org/T292570) [19:55:21] (03CR) 10Brennen Bearnes: [C: 03+2] Pre-format comments for non-local files too [core] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/726594 (https://phabricator.wikimedia.org/T292570) (owner: 10Brennen Bearnes) [19:55:57] Pchelolo: waiting on CI, will sling that out. 
[19:58:50] ok, tell me if it works [20:06:46] !log cumin 'puppetmaster*' "disable-puppet 'T288844 - T273673 - gerrit:721595 - ${USER}'" [20:06:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:06:54] T288844: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 [20:06:54] T273673: replace all puppet crons with systemd timers - https://phabricator.wikimedia.org/T273673 [20:10:50] (03CR) 10Dzahn: [C: 03+2] geoip: replace maxmind update cron with system timer and config [puppet] - 10https://gerrit.wikimedia.org/r/721595 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [20:18:26] (03Merged) 10jenkins-bot: Pre-format comments for non-local files too [core] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/726594 (https://phabricator.wikimedia.org/T292570) (owner: 10Brennen Bearnes) [20:18:59] !log puppetmaster1003 et al - converting maxmind geoip database fetching from cron to timers [20:19:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:26:51] (03PS2) 10Arlolra: Enable legacy media dom on metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726643 (https://phabricator.wikimedia.org/T292498) [20:27:35] (03CR) 10Legoktm: Enable legacy media dom on metawiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726643 (https://phabricator.wikimedia.org/T292498) (owner: 10Arlolra) [20:27:54] (03CR) 10Arlolra: "The alternative to this is to always ship the styles, regardless of the flag. That's in https://gerrit.wikimedia.org/r/c/mediawiki/core/+" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726643 (https://phabricator.wikimedia.org/T292498) (owner: 10Arlolra) [20:28:50] Pchelolo: no error generated loading https://test.wikipedia.org/wiki/File:Symbol_LED.svg so i think we're good [20:29:05] cool! [20:29:06] er... 
dammit, i think i might be wrong [20:29:20] 10SRE-swift-storage, 10TimedMediaHandler-Transcode: Intermittent transcode failure 'An unknown error occurred in storage backend "local-swift-codfw".' - https://phabricator.wikimedia.org/T201090 (10TheDJ) [20:29:29] (03CR) 10Arlolra: Enable legacy media dom on metawiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726643 (https://phabricator.wikimedia.org/T292498) (owner: 10Arlolra) [20:30:42] 10SRE-swift-storage, 10TimedMediaHandler-Transcode: Intermittent transcode failure 'An unknown error occurred in storage backend "local-swift-codfw".' - https://phabricator.wikimedia.org/T201090 (10TheDJ) @Yann that seems like a different error that should be filed as a separate ticket ? [20:31:08] heh, i guess the question is did i trigger those with mwdebug enabled or did somebody else click... [20:31:18] (03CR) 10Legoktm: Enable legacy media dom on metawiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726643 (https://phabricator.wikimedia.org/T292498) (owner: 10Arlolra) [20:32:04] it looks like they are increasing? [20:35:46] (03CR) 10Subramanya Sastry: Enable legacy media dom on metawiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726643 (https://phabricator.wikimedia.org/T292498) (owner: 10Arlolra) [20:36:42] (03CR) 10Subramanya Sastry: Enable legacy media dom on metawiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726643 (https://phabricator.wikimedia.org/T292498) (owner: 10Arlolra) [20:37:52] 10SRE, 10SRE-Access-Requests: Update ssh key for lmata - https://phabricator.wikimedia.org/T292583 (10herron) p:05Triage→03Medium [20:38:35] (03PS1) 10Herron: admin: update lmata ssh key [puppet] - 10https://gerrit.wikimedia.org/r/726682 (https://phabricator.wikimedia.org/T292583) [20:41:04] jeena: i think we're good. tried it with a few different files, pretty sure it's no longer happening with the patch. 
going ahead with sync and rolling back to group0. [20:42:45] ok [20:43:55] (03CR) 10Arlolra: Enable legacy media dom on metawiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726643 (https://phabricator.wikimedia.org/T292498) (owner: 10Arlolra) [20:44:48] !log brennen@deploy1002 Synchronized php-1.38.0-wmf.3/includes/page: Backport: [[gerrit:726594|Pre-format comments for non-local files too]] (T292570) (duration: 01m 04s) [20:44:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:44:55] T292570: ImageHistoryPseudoPager: PHP Notice: Undefined offset: [n] - https://phabricator.wikimedia.org/T292570 [20:45:37] (03PS1) 10Brennen Bearnes: group0 wikis to 1.38.0-wmf.3 refs T281167 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726683 [20:45:39] (03CR) 10Brennen Bearnes: [C: 03+2] group0 wikis to 1.38.0-wmf.3 refs T281167 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726683 (owner: 10Brennen Bearnes) [20:46:27] (03Merged) 10jenkins-bot: group0 wikis to 1.38.0-wmf.3 refs T281167 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726683 (owner: 10Brennen Bearnes) [20:47:52] !log brennen@deploy1002 rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.3 refs T281167 [20:47:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:47:58] T281167: 1.38.0-wmf.3 deployment blockers - https://phabricator.wikimedia.org/T281167 [21:00:15] 10SRE, 10ops-eqiad, 10DC-Ops, 10fundraising-tech-ops: Q1:(Need By: TBD) rack/setup (4) fundraising hosts - https://phabricator.wikimedia.org/T289812 (10Jclark-ctr) a:05Jclark-ctr→03Cmjohnson frauth1002 Rack C1 U29 port32 Cableid# 23000015 /23000049 frpm1002 Rack C1 U30 port33 Cableid# 2300003... 
[21:04:01] (03PS1) 10Dzahn: geoip: remove absented cron code for maxmind update [puppet] - 10https://gerrit.wikimedia.org/r/726684 (https://phabricator.wikimedia.org/T273673) [21:04:19] (03CR) 10jerkins-bot: [V: 04-1] geoip: remove absented cron code for maxmind update [puppet] - 10https://gerrit.wikimedia.org/r/726684 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [21:04:33] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Update ssh key for lmata - https://phabricator.wikimedia.org/T292583 (10lmata) Yes thank you @herron [21:04:52] (03PS2) 10Dzahn: geoip: remove absented cron code for maxmind update [puppet] - 10https://gerrit.wikimedia.org/r/726684 (https://phabricator.wikimedia.org/T273673) [21:05:45] (03CR) 10LMata: [C: 03+1] "Thank you!" [puppet] - 10https://gerrit.wikimedia.org/r/726682 (https://phabricator.wikimedia.org/T292583) (owner: 10Herron) [21:06:04] RECOVERY - Check systemd state on deploy1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [21:07:21] (03CR) 10Dzahn: [C: 03+2] "all root crontabs on all puppet masters are empty now" [puppet] - 10https://gerrit.wikimedia.org/r/726684 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [21:10:36] PROBLEM - Check systemd state on deploy1002 is CRITICAL: CRITICAL - degraded: The following units failed: deploy_to_mwdebug.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [21:16:31] (03PS1) 10Dzahn: geoip::maxmind: use ensure_resource to ensure data_directory [puppet] - 10https://gerrit.wikimedia.org/r/726687 [21:17:12] (03CR) 10jerkins-bot: [V: 04-1] geoip::maxmind: use ensure_resource to ensure data_directory [puppet] - 10https://gerrit.wikimedia.org/r/726687 (owner: 10Dzahn) [21:18:07] (03PS2) 10Dzahn: geoip::maxmind: use ensure_resource to ensure data_directory [puppet] - 10https://gerrit.wikimedia.org/r/726687 [21:19:23] (03CR) 10Dzahn: [C: 03+2] geoip::maxmind: use ensure_resource 
to ensure data_directory [puppet] - 10https://gerrit.wikimedia.org/r/726687 (owner: 10Dzahn) [21:21:46] (03PS1) 10Dduvall: train-dev: Fix array_map call in wmf-config/redis.php [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/726689 [21:22:13] (03CR) 10Cwhite: [C: 03+1] "LGTM, confirmed elsewhere" [puppet] - 10https://gerrit.wikimedia.org/r/726682 (https://phabricator.wikimedia.org/T292583) (owner: 10Herron) [21:23:17] (03CR) 10Ahmon Dancy: [C: 03+2] train-dev: Fix array_map call in wmf-config/redis.php [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/726689 (owner: 10Dduvall) [21:23:58] (03Merged) 10jenkins-bot: train-dev: Fix array_map call in wmf-config/redis.php [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/726689 (owner: 10Dduvall) [21:25:17] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops: Q2: (Need By: TBD) rack/setup/install kubestage100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T290894 (10Jclark-ctr) a:05Jclark-ctr→03Cmjohnson kubestage1003 B1 U21 Port29 Cableid# 201333910847 kubestage1004 D3 U37 Port 32 Cableid#1943 [21:25:31] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops: Q2: (Need By: TBD) rack/setup/install kubestage100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T290894 (10Jclark-ctr) [21:26:28] 10SRE, 10ops-eqiad, 10DC-Ops, 10fundraising-tech-ops: Q1:(Need By: TBD) rack/setup (4) fundraising hosts - https://phabricator.wikimedia.org/T289812 (10Jclark-ctr) [21:27:51] (03CR) 10Cwhite: [C: 03+2] logstash: clean up unused cache clear script [puppet] - 10https://gerrit.wikimedia.org/r/725105 (https://phabricator.wikimedia.org/T144396) (owner: 10Cwhite) [21:32:00] (03CR) 10Cwhite: [V: 03+2 C: 03+2] "Boldly +2-ing for the related patches. We'll need it to set up new plugins with for OpenSearch/Loki." 
[software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/552486 (https://phabricator.wikimedia.org/T217340) (owner: 10Filippo Giunchedi) [21:32:23] (03CR) 10Cwhite: [V: 03+2 C: 03+2] Set up build on production builder host [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/688418 (owner: 10Cwhite) [21:32:36] (03PS2) 10Cwhite: Add logstash-output-opensearch plugin [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/713713 [21:40:38] (03CR) 10Dzahn: puppetmaster/geoip: do not duplicate pulling of maxmind on all servers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/725390 (owner: 10Dzahn) [21:53:18] (03PS1) 10Dzahn: puppetmaster/geoip: add puppet CA server name to geoip config pt.1 [puppet] - 10https://gerrit.wikimedia.org/r/726696 (https://phabricator.wikimedia.org/T288844) [21:53:56] (03CR) 10jerkins-bot: [V: 04-1] puppetmaster/geoip: add puppet CA server name to geoip config pt.1 [puppet] - 10https://gerrit.wikimedia.org/r/726696 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [21:55:23] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q1:(Need By: TBD) rack/setup/install cloudmetrics100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T289888 (10Bstorm) @nskaggs and @aborrero Just checking on this, do we want to do a straight refresh of exactly as it is? The cloud-suppor... 
[22:00:12] (03PS2) 10Dzahn: puppetmaster/geoip: add puppet CA server name to geoip config pt.1 [puppet] - 10https://gerrit.wikimedia.org/r/726696 (https://phabricator.wikimedia.org/T288844) [22:03:07] (03PS3) 10Dzahn: puppetmaster/geoip: add puppet CA server name to geoip config pt.1 [puppet] - 10https://gerrit.wikimedia.org/r/726696 (https://phabricator.wikimedia.org/T288844) [22:06:37] (03PS3) 10Arlolra: Enable legacy media dom on metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726643 (https://phabricator.wikimedia.org/T292498) [22:09:05] (03CR) 10Dzahn: [C: 03+2] puppetmaster/geoip: add puppet CA server name to geoip config pt.1 [puppet] - 10https://gerrit.wikimedia.org/r/726696 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [22:09:41] (03CR) 10Dzahn: [V: 03+1 C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1003/31498/" [puppet] - 10https://gerrit.wikimedia.org/r/726696 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [22:10:22] PROBLEM - Backup freshness on backup1001 is CRITICAL: Stale: 1 (gerrit1001), Fresh: 102 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring [22:17:26] Pchelolo: i see patches for both of the current UBNs, but it's getting somewhat late in the day. evaluating whether to roll train back to testwikis 'til morning. thoughts? [22:17:56] up to you. I'm pretty much done for the day [22:19:19] rolling back seems like the life-simplifying choice for all involved. doing so. [22:20:09] !log 1.38.0-wmf.3 (T281167) rolling back to testwikis for the day; will revisit in US-morning [22:20:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:20:16] T281167: 1.38.0-wmf.3 deployment blockers - https://phabricator.wikimedia.org/T281167 [22:22:48] Pchelolo: thanks for the assistance on all of the above. 
[22:24:11] !log brennen@deploy1002 rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.38.0-wmf.2 [22:24:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:24:29] (03PS1) 10Dzahn: geoip: skip pulling of maxmind files if not on the puppet CA server [puppet] - 10https://gerrit.wikimedia.org/r/726699 (https://phabricator.wikimedia.org/T288844) [22:25:26] (03PS1) 10Brennen Bearnes: Revert "group0 wikis to 1.38.0-wmf.3 refs T281167" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726700 (https://phabricator.wikimedia.org/T281167) [22:25:28] (03CR) 10Brennen Bearnes: [C: 03+2] Revert "group0 wikis to 1.38.0-wmf.3 refs T281167" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726700 (https://phabricator.wikimedia.org/T281167) (owner: 10Brennen Bearnes) [22:26:30] (03Merged) 10jenkins-bot: Revert "group0 wikis to 1.38.0-wmf.3 refs T281167" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/726700 (https://phabricator.wikimedia.org/T281167) (owner: 10Brennen Bearnes) [22:27:07] (03CR) 10Dzahn: [C: 03+2] geoip: skip pulling of maxmind files if not on the puppet CA server [puppet] - 10https://gerrit.wikimedia.org/r/726699 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [22:31:50] Welcome Bsadowski1! 
[22:50:00] (03PS1) 10Dduvall: train-dev: Fix servers by DC in wmf-config/redis.php [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/726701 [22:51:49] (03CR) 10Dduvall: [C: 03+2] train-dev: Fix servers by DC in wmf-config/redis.php [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/726701 (owner: 10Dduvall) [22:52:44] (03Merged) 10jenkins-bot: train-dev: Fix servers by DC in wmf-config/redis.php [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/726701 (owner: 10Dduvall) [22:58:03] (03PS1) 10Dzahn: geoip: use ISO-8601 date format in logs and fix log shell redirection [puppet] - 10https://gerrit.wikimedia.org/r/726703 (https://phabricator.wikimedia.org/T288844) [22:58:21] (03CR) 10jerkins-bot: [V: 04-1] geoip: use ISO-8601 date format in logs and fix log shell redirection [puppet] - 10https://gerrit.wikimedia.org/r/726703 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [22:59:36] (03PS2) 10Dzahn: geoip: use ISO-8601 date format in logs and fix log shell redirection [puppet] - 10https://gerrit.wikimedia.org/r/726703 (https://phabricator.wikimedia.org/T288844) [23:00:05] RoanKattouw, Niharika, and Urbanecm: Your horoscope predicts another unfortunate Evening backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211005T2300). [23:00:05] tgr and Juan_90264: A patch you scheduled for Evening backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:10] o/ [23:00:34] * urbanecm waves [23:00:41] tgr: can you deploy please? 
[23:00:50] sure [23:00:56] Thanks [23:02:10] !log deleting old stretch docker images from the registry for T292485 [23:02:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:02:17] T292485: docker-reporter-releng-images => docker registry: status=3/NOTIMPLEMENTED - https://phabricator.wikimedia.org/T292485 [23:02:54] (03PS2) 10Gergő Tisza: Add image_suggestion_interaction event stream [mediawiki-config] - 10https://gerrit.wikimedia.org/r/725386 [23:05:30] PROBLEM - Check systemd state on ms-be1028 is CRITICAL: CRITICAL - degraded: The following units failed: swift-drive-audit.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [23:06:41] (03CR) 10Gergő Tisza: [C: 03+2] Add image_suggestion_interaction event stream [mediawiki-config] - 10https://gerrit.wikimedia.org/r/725386 (owner: 10Gergő Tisza) [23:07:34] (03Merged) 10jenkins-bot: Add image_suggestion_interaction event stream [mediawiki-config] - 10https://gerrit.wikimedia.org/r/725386 (owner: 10Gergő Tisza) [23:11:58] (03PS1) 10Jforrester: Replace deprecated ParserOptions::getUser with ::getUserIdentity [extensions/Scribunto] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/726596 (https://phabricator.wikimedia.org/T292589) [23:12:17] (03PS1) 10Jforrester: Replace deprecated ParserOptions::getUser with ::getUserIdentity [extensions/Scribunto] (wmf/1.38.0-wmf.2) - 10https://gerrit.wikimedia.org/r/726597 (https://phabricator.wikimedia.org/T292589) [23:14:57] (03PS1) 10Arlolra: Add a separate config for content.media.less [core] (wmf/1.38.0-wmf.2) - 10https://gerrit.wikimedia.org/r/726707 (https://phabricator.wikimedia.org/T292498) [23:15:35] (03PS13) 10Gergő Tisza: Adding and use wordmark in azwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/704376 (https://phabricator.wikimedia.org/T284877) (owner: 10Juan90264) [23:16:06] (03PS1) 10Arlolra: Add a separate config for content.media.less [core] (wmf/1.38.0-wmf.3) - 
10https://gerrit.wikimedia.org/r/726709 (https://phabricator.wikimedia.org/T292498) [23:16:09] !log tgr@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725386|Add image_suggestion_interaction event stream]] (duration: 01m 12s) [23:16:12] Juan_90264: around? [23:16:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:17:56] (03CR) 10jerkins-bot: [V: 04-1] Adding and use wordmark in azwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/704376 (https://phabricator.wikimedia.org/T284877) (owner: 10Juan90264) [23:22:52] I guess these are simple enough to deploy in absentia. [23:26:38] RECOVERY - Check systemd state on deneb is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [23:27:12] (03CR) 10Gergő Tisza: "CI error:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/704376 (https://phabricator.wikimedia.org/T284877) (owner: 10Juan90264) [23:27:20] (03CR) 10Gergő Tisza: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/704376 (https://phabricator.wikimedia.org/T284877) (owner: 10Juan90264) [23:27:44] if systemd on deneb fails again, that's me and somewhat expected [23:31:07] (03CR) 10Gergő Tisza: [C: 03+2] Adding and use wordmark in azwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/704376 (https://phabricator.wikimedia.org/T284877) (owner: 10Juan90264) [23:32:50] (03CR) 10Dzahn: [C: 03+2] geoip: use ISO-8601 date format in logs and fix log shell redirection [puppet] - 10https://gerrit.wikimedia.org/r/726703 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [23:35:14] Hello [23:35:48] (03Merged) 10jenkins-bot: Adding and use wordmark in azwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/704376 (https://phabricator.wikimedia.org/T284877) (owner: 10Juan90264) [23:39:54] Hello? "Evening backport window" deployers? 
[23:40:22] tgr: ^ [23:40:24] jouncebot: now [23:40:24] For the next 0 hour(s) and 19 minute(s): Evening backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211005T2300) [23:41:12] (03CR) 10RLazarus: [C: 03+2] webperf: connect to Kafka using TLS [puppet] - 10https://gerrit.wikimedia.org/r/721047 (https://phabricator.wikimedia.org/T290131) (owner: 10Dave Pifke) [23:42:32] Thanks TGR for merged [23:42:36] (03PS1) 10Dzahn: geoip: also limit pulling of legacy databases to CA server [puppet] - 10https://gerrit.wikimedia.org/r/726718 (https://phabricator.wikimedia.org/T288844) [23:44:23] !log tgr@deploy1002 Synchronized static/images/mobile/copyright/wikipedia-wordmark-az.svg: Config: [[gerrit:704376|Adding and use wordmark in azwiki (T284877)]] (duration: 01m 23s) [23:44:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:44:29] T284877: Create translated mobile wordmark for Azerbaijani Wikipedia - https://phabricator.wikimedia.org/T284877 [23:44:51] (03PS2) 10Dzahn: geoip: also limit pulling of legacy databases to CA server [puppet] - 10https://gerrit.wikimedia.org/r/726718 (https://phabricator.wikimedia.org/T288844) [23:46:00] (03CR) 10Dzahn: [C: 03+2] geoip: also limit pulling of legacy databases to CA server [puppet] - 10https://gerrit.wikimedia.org/r/726718 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [23:47:27] !log tgr@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704376|Adding and use wordmark in azwiki (T284877)]] (duration: 01m 04s) [23:47:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:49:16] (03CR) 10Gergő Tisza: [C: 03+2] Wikiversity Logo Update for 2017 Logo Version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/725413 (https://phabricator.wikimedia.org/T292109) (owner: 10Juan90264) [23:49:18] 10SRE, 10NavigationTiming, 10Performance-Team, 10Patch-For-Review: Switch to encrypted kafka for coal/navtiming/statsv 
- https://phabricator.wikimedia.org/T290131 (10dpifke) 05Open→03Resolved Confirmed all three services are now talking TLS to Kafka on webperf1001 and webperf2001. [23:49:20] Okay, thanks tgr for deploying. And as there are two more patches that still need to be deployed, I will try to put in the backport "European mid-day backport window" [23:49:58] (03Merged) 10jenkins-bot: Wikiversity Logo Update for 2017 Logo Version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/725413 (https://phabricator.wikimedia.org/T292109) (owner: 10Juan90264) [23:50:13] we have enough time for those too [23:50:52] Perfect, thanks again [23:50:54] (03PS8) 10Gergő Tisza: Add WN as an alias to project namespace in Polish Wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/725132 (https://phabricator.wikimedia.org/T291344) (owner: 10Juan90264) [23:51:14] (03PS1) 10Dzahn: puppetmaster/geoip: add puppet CA server name to geoip config pt.2 [puppet] - 10https://gerrit.wikimedia.org/r/726720 (https://phabricator.wikimedia.org/T288844) [23:52:16] tgr: whenever you're done, arlolra and I are going to deploy some core backports [23:54:12] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/31499/puppetmaster1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/726720 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [23:54:14] (03CR) 10Gergő Tisza: [C: 03+2] Add WN as an alias to project namespace in Polish Wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/725132 (https://phabricator.wikimedia.org/T291344) (owner: 10Juan90264) [23:54:56] !log tgr@deploy1002 Synchronized static/images/mobile/copyright/wikiversity.svg: Config: [[gerrit:725413|Wikiversity Logo Update for 2017 Logo Version (T292109)]] (duration: 01m 03s) [23:55:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:55:05] T292109: Wikiversity logo - https://phabricator.wikimedia.org/T292109 [23:56:15] (03Merged) 
10jenkins-bot: Add WN as an alias to project namespace in Polish Wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/725132 (https://phabricator.wikimedia.org/T291344) (owner: 10Juan90264) [23:58:26] (03PS1) 10Bstorm: toolforge postgres: drop database tuning [puppet] - 10https://gerrit.wikimedia.org/r/726723 (https://phabricator.wikimedia.org/T267616)