[00:00:04] RoanKattouw and Urbanecm: (Dis)respected human, time to deploy UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211111T0000). Please do the needful.
[00:00:04] MatmaRex: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:00:18] sup, anyone around?
[00:10:55] it's an easy one
[00:11:08] (CR) Reedy: [C: +2] Configure upload dialog on officewiki to upload locally [mediawiki-config] - https://gerrit.wikimedia.org/r/738021 (https://phabricator.wikimedia.org/T295510) (owner: Bartosz Dziewoński)
[00:11:54] (Merged) jenkins-bot: Configure upload dialog on officewiki to upload locally [mediawiki-config] - https://gerrit.wikimedia.org/r/738021 (https://phabricator.wikimedia.org/T295510) (owner: Bartosz Dziewoński)
[00:16:33] thanks Reedy
[00:18:19] !log reedy@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Set wgForeignUploadTargets on officewiki T295510 (duration: 00m 56s)
[00:18:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:18:23] T295510: Change default file upload config for officewiki to local - https://phabricator.wikimedia.org/T295510
[00:18:27] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[00:18:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:22:09] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[00:22:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:31:03] PROBLEM - k8s API server requests latencies on kubemaster2001 is CRITICAL: instance=10.192.0.56 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27
[00:33:11] RECOVERY - k8s API server requests latencies on kubemaster2001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27
[01:00:04] twentyafterfour: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Phabricator update . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211111T0100).
[01:00:11] RECOVERY - MariaDB Replica Lag: s1 on db2141 is OK: OK slave_sql_lag Replication lag: 0.13 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[02:24:55] PROBLEM - Check systemd state on db1108 is CRITICAL: CRITICAL - degraded: The following units failed: wmf_auto_restart_prometheus-mysqld-exporter@analytics_meta.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:30:08] SRE, LDAP-Access-Requests: Grant Access to ldap/wmf for David Martin - https://phabricator.wikimedia.org/T295264 (RLazarus) Done, thanks.
[02:43:27] SRE, wikitech.wikimedia.org, cloud-services-team (Kanban): wikitech-static down - https://phabricator.wikimedia.org/T295266 (Reedy) >>! In T295266#7488312, @RLazarus wrote: > A few suspicious excerpts, not sure how much of this is normal background noise: > > ` > Nov 8 04:20:50 wikitech-static impo...
[02:47:04] SRE, wikitech.wikimedia.org, cloud-services-team (Kanban): wikitech-static down - https://phabricator.wikimedia.org/T295266 (Reedy) >>! In T295266#7491492, @Andrew wrote: > I've seen that host struggle with memory issues in the past, so we may just be seeing organic growth of mediawiki resource needs...
[05:13:51] !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 31 hosts with reason: Primary switchover s8 T294321
[05:13:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:13:55] T294321: Switchover s8 from db1104 to db1109 - https://phabricator.wikimedia.org/T294321
[05:14:14] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 31 hosts with reason: Primary switchover s8 T294321
[05:14:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:14:50] In 45 minutes we'll switchover s8 master
[05:20:10] (PS2) Marostegui: wmnet: Update s8-master [dns] - https://gerrit.wikimedia.org/r/737832 (https://phabricator.wikimedia.org/T294321)
[05:20:20] (PS2) Marostegui: mariadb: Promote db1109 as s8 master [puppet] - https://gerrit.wikimedia.org/r/737831 (https://phabricator.wikimedia.org/T294321)
[05:32:38] (CR) Marostegui: [C: +2] mariadb: Promote db1109 as s8 master [puppet] - https://gerrit.wikimedia.org/r/737831 (https://phabricator.wikimedia.org/T294321) (owner: Marostegui)
[05:46:28] (PS1) Marostegui: db1104: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/738057 (https://phabricator.wikimedia.org/T290868)
[06:00:04] kormat and marostegui: (Dis)respected human, time to deploy Database primary switchover for s8 (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211111T0600). Please do the needful.
[06:00:07] let's go for it?
[06:00:16] 👍
[06:00:22] !log Starting s8 eqiad failover from db1104 to db1109 - T294321
[06:00:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:00:26] T294321: Switchover s8 from db1104 to db1109 - https://phabricator.wikimedia.org/T294321
[06:00:31] !log marostegui@cumin1001 dbctl commit (dc=all): 'Set s8 eqiad as read-only for maintenance - T294321', diff saved to https://phabricator.wikimedia.org/P17721 and previous config saved to /var/cache/conftool/dbconfig/20211111-060031-marostegui.json
[06:00:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:00:42] RO confirmed
[06:01:02] !log marostegui@cumin1001 dbctl commit (dc=all): 'Promote db1109 to s8 primary and set section read-write T294321', diff saved to https://phabricator.wikimedia.org/P17722 and previous config saved to /var/cache/conftool/dbconfig/20211111-060102-marostegui.json
[06:01:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:01:05] all done
[06:01:19] kormat: can you clean up heartbeat?
[06:01:29] topology looking good
[06:01:45] on it
[06:01:48] I can see new changes
[06:02:24] heartbeat cleaned
[06:02:42] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1104 (old master) T294321', diff saved to https://phabricator.wikimedia.org/P17723 and previous config saved to /var/cache/conftool/dbconfig/20211111-060242-marostegui.json
[06:02:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:03:10] (CR) Marostegui: [C: +2] db1104: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/738057 (https://phabricator.wikimedia.org/T290868) (owner: Marostegui)
[06:03:18] marostegui: orchestrator looks happy.
[06:03:22] yep
[06:03:25] tendril too?
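The dbctl commits at 06:00:31, 06:01:02 and 06:02:42 follow a fixed order. First the section is made read-only. Next, one commit promotes the new primary and sets the section read-write. Finally, the old primary is depooled. The toy model below is a minimal sketch of that ordering constraint only. The `Section` class and its methods are hypothetical and are not dbctl's or conftool's API.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """Toy model of a database section's state (hypothetical, not dbctl's API)."""
    name: str
    primary: str
    pooled: set = field(default_factory=set)
    read_only: bool = False

    def set_read_only(self) -> None:
        self.read_only = True

    def promote(self, new_primary: str) -> None:
        # Assumption mirrored from the log: the primary only changes while
        # the section is read-only, and the same step sets it read-write again.
        if not self.read_only:
            raise RuntimeError("refusing to switch primary while section is read-write")
        self.primary = new_primary
        self.read_only = False

    def depool(self, host: str) -> None:
        # The old primary can only be depooled once it is no longer primary.
        if host == self.primary:
            raise RuntimeError("cannot depool the current primary")
        self.pooled.discard(host)


s8 = Section("s8", primary="db1104", pooled={"db1104", "db1109"})
s8.set_read_only()    # 06:00:31 'Set s8 eqiad as read-only for maintenance'
s8.promote("db1109")  # 06:01:02 'Promote db1109 to s8 primary and set section read-write'
s8.depool("db1104")   # 06:02:42 'Depooling db1104 (old master)'
print(s8.primary, s8.read_only, sorted(s8.pooled))  # db1109 False ['db1109']
```

The heartbeat cleanup, stopping replication on db1104 and the reimage happen outside dbctl, so they are left out of this sketch.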
[06:03:28] (CR) Marostegui: [C: +2] wmnet: Update s8-master [dns] - https://gerrit.wikimedia.org/r/737832 (https://phabricator.wikimedia.org/T294321) (owner: Marostegui)
[06:03:47] who even looks at tendril any more
[06:04:01] (i haven't opened it in weeks)
[06:04:22] hahaha
[06:04:23] tendril seems fine
[06:06:32] !log Stop replication on db1104 (old master) T294321
[06:06:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:06:35] T294321: Switchover s8 from db1104 to db1109 - https://phabricator.wikimedia.org/T294321
[06:09:10] (CR) Marostegui: "Switchover done, if this patch is correct, it can be merged anytime" [puppet] - https://gerrit.wikimedia.org/r/736946 (https://phabricator.wikimedia.org/T290868) (owner: Marostegui)
[06:10:21] !log marostegui@cumin1001 START - Cookbook sre.hosts.reimage for host db1104.eqiad.wmnet with OS buster
[06:10:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:35:16] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 84 probes of 640 (alerts on 65) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[06:37:54] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1104.eqiad.wmnet with OS buster
[06:37:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:40:46] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 45 probes of 640 (alerts on 65) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[06:56:41] !log `systemctl start prometheus-mysqld-exporter@analytics_meta` on db1108
[06:56:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:10:41] (CR) Marostegui: [C: +1] "Good for me, but I would like this to get signed off by Moritz just in case." [puppet] - https://gerrit.wikimedia.org/r/737926 (https://phabricator.wikimedia.org/T265990) (owner: Ladsgroup)
[07:11:55] (CR) Elukey: [C: +1] constants: add new drmrs datacenter [software/pywmflib] - https://gerrit.wikimedia.org/r/737989 (https://phabricator.wikimedia.org/T282787) (owner: Volans)
[07:12:19] (CR) Elukey: [C: +1] interactive: change input prefix to ==> [software/pywmflib] - https://gerrit.wikimedia.org/r/737990 (owner: Volans)
[07:13:04] RECOVERY - Check systemd state on db1108 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:15:02] (CR) Elukey: [C: +1] "<3" [software/pywmflib] - https://gerrit.wikimedia.org/r/737991 (owner: Volans)
[07:15:10] (CR) Elukey: [C: +1] constants: add CORE_DATACENTERS constant [software/pywmflib] - https://gerrit.wikimedia.org/r/737992 (owner: Volans)
[07:16:32] (PS3) Majavah: P::kubernetes: allow disabling kafka ipv6 on hiera [puppet] - https://gerrit.wikimedia.org/r/737200 (https://phabricator.wikimedia.org/T281986)
[07:16:50] (CR) Majavah: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/737200 (https://phabricator.wikimedia.org/T281986) (owner: Majavah)
[07:37:15] SRE, Analytics-Radar, Data-Engineering, Event-Platform, Patch-For-Review: Allow kafka clients to verify brokers hostnames when using SSL - https://phabricator.wikimedia.org/T291905 (elukey) First thing to follow up - deployment-prep: 1) we have a kafka cluster in there (for example, to test...
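The RIPE Atlas alert for esams IPv6 flips from CRITICAL at 84 failed probes to OK at 45 failed probes, against an "alerts on 65" threshold. Below is a minimal sketch of that verdict. It assumes the check compares the raw failed-probe count to the threshold with a strict greater-than. The log does not show what happens at exactly 65. `atlas_status` is a hypothetical helper, not the real check script.

```python
def atlas_status(failed: int, total: int, alert_on: int) -> str:
    """Hypothetical reconstruction of the RIPE Atlas ping check's verdict."""
    # Assumption: CRITICAL once the failed-probe count exceeds the threshold;
    # the total number of probes is reported but not used in the decision.
    state = "CRITICAL" if failed > alert_on else "OK"
    return f"{state} - failed {failed} probes of {total} (alerts on {alert_on})"


# Values reported for ripe-atlas-esams IPv6 at 06:35:16 and 06:40:46
print(atlas_status(84, 640, 65))  # CRITICAL - failed 84 probes of 640 (alerts on 65)
print(atlas_status(45, 640, 65))  # OK - failed 45 probes of 640 (alerts on 65)
```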
[08:01:07] (PS17) Giuseppe Lavagetto: mediawiki::php: support multiple php version in monitoring too [puppet] - https://gerrit.wikimedia.org/r/736949 (https://phabricator.wikimedia.org/T293450)
[08:01:09] (PS14) Giuseppe Lavagetto: mediawiki: add support for multiple versions in the web configuration [puppet] - https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450)
[08:01:11] (PS5) Giuseppe Lavagetto: mediawiki::php: report prometheus metrics for all php versions [puppet] - https://gerrit.wikimedia.org/r/737929
[08:01:49] (PS1) Arturo Borrero Gonzalez: Revert "Revert "openstack: monitor: cmd-checklist-runner: exit with a different return code"" [puppet] - https://gerrit.wikimedia.org/r/737957
[08:03:55] (CR) Arturo Borrero Gonzalez: [C: +2] Revert "Revert "openstack: monitor: cmd-checklist-runner: exit with a different return code"" [puppet] - https://gerrit.wikimedia.org/r/737957 (owner: Arturo Borrero Gonzalez)
[08:04:14] (PS15) Giuseppe Lavagetto: mediawiki: add support for multiple versions in the web configuration [puppet] - https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450)
[08:04:16] (PS6) Giuseppe Lavagetto: mediawiki::php: report prometheus metrics for all php versions [puppet] - https://gerrit.wikimedia.org/r/737929
[08:07:09] (PS16) Giuseppe Lavagetto: mediawiki: add support for multiple versions in the web configuration [puppet] - https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450)
[08:07:11] (PS7) Giuseppe Lavagetto: mediawiki::php: report prometheus metrics for all php versions [puppet] - https://gerrit.wikimedia.org/r/737929
[08:07:51] (CR) Arturo Borrero Gonzalez: "Thanks for the PoC. Really appreciated." [puppet] - https://gerrit.wikimedia.org/r/737774 (owner: Jbond)
[08:08:38] (PS17) Giuseppe Lavagetto: mediawiki: add support for multiple versions in the web configuration [puppet] - https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450)
[08:08:40] (PS8) Giuseppe Lavagetto: mediawiki::php: report prometheus metrics for all php versions [puppet] - https://gerrit.wikimedia.org/r/737929
[08:09:11] (PS11) Ideophagous: Bug:T291737 Squashed two commits into one, previous commit comments follow: Bug:T291737 Change-Id: Ib263a5419c6ace911a597d025b28d6ef13549c10 [mediawiki-config] - https://gerrit.wikimedia.org/r/735713
[08:10:44] (PS18) Giuseppe Lavagetto: mediawiki: add support for multiple versions in the web configuration [puppet] - https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450)
[08:10:45] (PS9) Giuseppe Lavagetto: mediawiki::php: report prometheus metrics for all php versions [puppet] - https://gerrit.wikimedia.org/r/737929
[08:13:06] !log Restart db1132 T288720
[08:13:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:13:10] T288720: Failover m5 master (db1128) to db1132 to upgrade its kernel - https://phabricator.wikimedia.org/T288720
[08:17:10] !log Upgrade db2078 T288720
[08:17:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:16] (PS1) Majavah: aptrepo: init kubeadm 1.21 [puppet] - https://gerrit.wikimedia.org/r/738065 (https://phabricator.wikimedia.org/T282942)
[08:33:55] (PS1) Arturo Borrero Gonzalez: openstack: networktests: add systemd timer job to run the test suite [puppet] - https://gerrit.wikimedia.org/r/738068 (https://phabricator.wikimedia.org/T294955)
[08:38:50] (CR) Thiemo Kreuz (WMDE): Remove unused code from StaticSiteConfiguration class (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/737858 (owner: Thiemo Kreuz (WMDE))
[08:42:48] (CR) Jcrespo: simplelamp2: ensure httpd::mpm comes before httpd, revert previous change (1 comment) [puppet] - https://gerrit.wikimedia.org/r/733086 (owner: Dzahn)
[08:45:00] (CR) Jcrespo: [C: +2] Add section and host to error log message [software/wmfbackups] - https://gerrit.wikimedia.org/r/736652 (https://phabricator.wikimedia.org/T293975) (owner: Volans)
[08:49:02] (PS1) Arturo Borrero Gonzalez: openstack: networktests: run as dedicated user [puppet] - https://gerrit.wikimedia.org/r/738070 (https://phabricator.wikimedia.org/T294955)
[08:52:34] PROBLEM - haproxy failover on dbproxy2004 is CRITICAL: CRITICAL check_failover servers up 2 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[08:53:14] ^ me
[08:53:44] (CR) Arturo Borrero Gonzalez: [C: +2] "PCC: https://puppet-compiler.wmflabs.org/compiler1003/32367/" [puppet] - https://gerrit.wikimedia.org/r/738070 (https://phabricator.wikimedia.org/T294955) (owner: Arturo Borrero Gonzalez)
[08:54:36] RECOVERY - haproxy failover on dbproxy2004 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[08:55:58] (CR) ArielGlenn: "There were a few places where the name is still jobs, which affected the diffs. I think I marked all of them but can't be sure until a PCC" [puppet] - https://gerrit.wikimedia.org/r/736074 (owner: Dzahn)
[08:56:32] (PS2) Arturo Borrero Gonzalez: openstack: networktests: add systemd timer job to run the test suite [puppet] - https://gerrit.wikimedia.org/r/738068 (https://phabricator.wikimedia.org/T294955)
[08:57:34] (CR) Giuseppe Lavagetto: mediawiki::php: support multiple php version in monitoring too (2 comments) [puppet] - https://gerrit.wikimedia.org/r/736949 (https://phabricator.wikimedia.org/T293450) (owner: Giuseppe Lavagetto)
[08:58:08] (CR) Arturo Borrero Gonzalez: [C: +2] aptrepo: init kubeadm 1.21 [puppet] - https://gerrit.wikimedia.org/r/738065 (https://phabricator.wikimedia.org/T282942) (owner: Majavah)
[08:59:25] (CR) Jcrespo: [C: +2] dbbackups: Switch s8 backup generation from db1116 to db1171 [puppet] - https://gerrit.wikimedia.org/r/736946 (https://phabricator.wikimedia.org/T290868) (owner: Marostegui)
[09:01:09] (PS3) Arturo Borrero Gonzalez: openstack: networktests: add systemd timer job to run the test suite [puppet] - https://gerrit.wikimedia.org/r/738068 (https://phabricator.wikimedia.org/T294955)
[09:03:16] !log pull all packages for buster-wikimedia/thirdparty/kubeadm-k8s-1-21 (T282942)
[09:03:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:03:20] T282942: Upgrade Toolforge Kubernetes to latest 1.21 - https://phabricator.wikimedia.org/T282942
[09:05:56] (CR) Arturo Borrero Gonzalez: [C: +2] openstack: networktests: add systemd timer job to run the test suite [puppet] - https://gerrit.wikimedia.org/r/738068 (https://phabricator.wikimedia.org/T294955) (owner: Arturo Borrero Gonzalez)
[09:06:12] (CR) Arturo Borrero Gonzalez: [C: +2] "PCC: https://puppet-compiler.wmflabs.org/compiler1003/32368/" [puppet] - https://gerrit.wikimedia.org/r/738068 (https://phabricator.wikimedia.org/T294955) (owner: Arturo Borrero Gonzalez)
[09:08:04] (CR) Jelto: [C: +1] "lgtm, thanks for submitting the change" [deployment-charts] - https://gerrit.wikimedia.org/r/737974 (owner: JMeybohm)
[09:08:45] (PS1) Vgutierrez: site: Reimage cp3065 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/738177 (https://phabricator.wikimedia.org/T290005)
[09:10:42] !log depool cp3065 to be reimaged as cache::upload_haproxy - T290005
[09:10:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:45] T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[09:12:11] (CR) Vgutierrez: [C: +2] site: Reimage cp3065 as cache::upload_haproxy [puppet] - https://gerrit.wikimedia.org/r/738177 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[09:13:54] !log vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp3065.esams.wmnet with OS buster
[09:13:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:14:04] SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp3065.esams.wmnet with OS buster
[09:17:56] (CR) Volans: "Couple of minor comments inline, feel free to create the file on netbox-next to test it live, just remember to delete it to allow puppet t" [software/netbox-extras] - https://gerrit.wikimedia.org/r/737914 (owner: Jbond)
[09:25:29] !log marostegui@cumin1001 dbctl commit (dc=all): 'Remove contributions from s5 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P17725 and previous config saved to /var/cache/conftool/dbconfig/20211111-092528-marostegui.json
[09:25:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:33] T263127: Remove groups from db configs - https://phabricator.wikimedia.org/T263127
[09:27:26] SRE, Traffic, Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (Volans) I have a question regarding the Ganeti setup, what will be the final clustering? I'm asking in particular to update the spicerack config for ganeti: http...
[09:28:59] (PS1) Majavah: kubeadm: Upgrade Calico to v3.21.0 [puppet] - https://gerrit.wikimedia.org/r/738179 (https://phabricator.wikimedia.org/T292698)
[09:29:03] (PS1) Arturo Borrero Gonzalez: openstack: networktests: fix timer job interval specification [puppet] - https://gerrit.wikimedia.org/r/738180 (https://phabricator.wikimedia.org/T294955)
[09:29:41] (PS1) Jcrespo: backups: Disable notifications of dbprov2001 and dbprov1001 [puppet] - https://gerrit.wikimedia.org/r/738181 (https://phabricator.wikimedia.org/T280979)
[09:31:26] (PS4) Thiemo Kreuz (WMDE): Use more compact PHP7 syntax where possible [mediawiki-config] - https://gerrit.wikimedia.org/r/737859
[09:32:27] (PS5) Thiemo Kreuz (WMDE): Use more compact PHP7 syntax where possible [mediawiki-config] - https://gerrit.wikimedia.org/r/737859
[09:32:53] (CR) Thiemo Kreuz (WMDE): Use more compact PHP7 syntax where possible (8 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/737859 (owner: Thiemo Kreuz (WMDE))
[09:33:59] (PS1) JMeybohm: cfssl-issuer: Allow to install issuers via chart [deployment-charts] - https://gerrit.wikimedia.org/r/738182 (https://phabricator.wikimedia.org/T294560)
[09:35:14] (CR) Arturo Borrero Gonzalez: [C: +2] openstack: networktests: fix timer job interval specification [puppet] - https://gerrit.wikimedia.org/r/738180 (https://phabricator.wikimedia.org/T294955) (owner: Arturo Borrero Gonzalez)
[09:40:01] (CR) JMeybohm: [C: +2] admin_ng: Fix templates being rendered as string [deployment-charts] - https://gerrit.wikimedia.org/r/737974 (owner: JMeybohm)
[09:44:15] (Merged) jenkins-bot: admin_ng: Fix templates being rendered as string [deployment-charts] - https://gerrit.wikimedia.org/r/737974 (owner: JMeybohm)
[09:53:11] (PS2) Thiemo Kreuz (WMDE): Streamline/modernize code in MWConfigCacheGenerator [mediawiki-config] - https://gerrit.wikimedia.org/r/737857
[09:57:07] (PS3) Thiemo Kreuz (WMDE): Streamline/modernize code in MWConfigCacheGenerator [mediawiki-config] - https://gerrit.wikimedia.org/r/737857
[09:58:17] (CR) Thiemo Kreuz (WMDE): Streamline/modernize code in MWConfigCacheGenerator (3 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/737857 (owner: Thiemo Kreuz (WMDE))
[10:07:01] (CR) JMeybohm: "This change is ready for review." [deployment-charts] - https://gerrit.wikimedia.org/r/738182 (https://phabricator.wikimedia.org/T294560) (owner: JMeybohm)
[10:08:03] (PS1) Arturo Borrero Gonzalez: openstack: networktests: introduce in eqiad1 [puppet] - https://gerrit.wikimedia.org/r/738191 (https://phabricator.wikimedia.org/T294955)
[10:12:35] (CR) Arturo Borrero Gonzalez: [C: +2] "PCC: https://puppet-compiler.wmflabs.org/compiler1002/32370/" [puppet] - https://gerrit.wikimedia.org/r/738191 (https://phabricator.wikimedia.org/T294955) (owner: Arturo Borrero Gonzalez)
[10:13:10] (CR) Jelto: [C: +1] "lgtm" [deployment-charts] - https://gerrit.wikimedia.org/r/737975 (https://phabricator.wikimedia.org/T295385) (owner: JMeybohm)
[10:15:13] !log pool cp3065 running haproxy - T290005
[10:15:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:16] T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[10:15:28] (CR) Jcrespo: [C: +1] "https://puppet-compiler.wmflabs.org/compiler1001/32369/dbprov1001.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/738181 (https://phabricator.wikimedia.org/T280979) (owner: Jcrespo)
[10:15:33] (PS2) Jcrespo: backups: Disable notifications of dbprov2001 and dbprov1001 [puppet] - https://gerrit.wikimedia.org/r/738181 (https://phabricator.wikimedia.org/T280979)
[10:15:55] !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: working on network tests
[10:15:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:57] !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: working on network tests
[10:15:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:44] are EU folks allowed to do regular config+backport deploys today? (despite the US holiday)
[10:18:22] !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3065.esams.wmnet with OS buster
[10:18:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:18:33] SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp3065.esams.wmnet with OS buster c...
[10:19:56] SRE, Traffic, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (Vgutierrez)
[10:21:40] (CR) Jcrespo: [C: +2] backups: Disable notifications of dbprov2001 and dbprov1001 [puppet] - https://gerrit.wikimedia.org/r/738181 (https://phabricator.wikimedia.org/T280979) (owner: Jcrespo)
[10:24:52] !log jynus@cumin2002 START - Cookbook sre.hosts.reimage for host dbprov2001.codfw.wmnet with OS buster
[10:24:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:22] (CR) Muehlenhoff: [C: +1] "Looks good to me. This was only limited to cn=ops since last year when Stevie built the initial install, the whole setup was still evolvin" [puppet] - https://gerrit.wikimedia.org/r/737926 (https://phabricator.wikimedia.org/T265990) (owner: Ladsgroup)
[10:31:03] (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/737847 (https://phabricator.wikimedia.org/T294580) (owner: Jelto)
[10:31:37] (CR) Jbond: "LGTM however you still have a use of `os.path.expanduser` in mysql.py:#76" [software/spicerack] - https://gerrit.wikimedia.org/r/737993 (owner: Volans)
[10:32:22] (CR) Jbond: [C: +1] constants: add new drmrs datacenter [software/pywmflib] - https://gerrit.wikimedia.org/r/737989 (https://phabricator.wikimedia.org/T282787) (owner: Volans)
[10:32:38] (CR) Jbond: [C: +1] interactive: change input prefix to ==> [software/pywmflib] - https://gerrit.wikimedia.org/r/737990 (owner: Volans)
[10:32:58] (CR) Jbond: [C: +1] docs: add examples to all modules [software/pywmflib] - https://gerrit.wikimedia.org/r/737991 (owner: Volans)
[10:33:32] (CR) Jbond: [C: +1] constants: add CORE_DATACENTERS constant [software/pywmflib] - https://gerrit.wikimedia.org/r/737992 (owner: Volans)
[10:37:12] !log updated routinator in thirdparty/routinator for bullseye-wikimedia to 0.10.12 T292503
[10:37:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:37:16] T292503: Rebuild Routinator (rpki) VMs with larger disk - https://phabricator.wikimedia.org/T292503
[10:38:08] SRE, Infrastructure-Foundations, netops: Rebuild Routinator (rpki) VMs with larger disk - https://phabricator.wikimedia.org/T292503 (MoritzMuehlenhoff) >>! In T292503#7495257, @cmooney wrote: > A security update is now available which means we need to upgrade again: > > https://www.nlnetlabs.nl/news...
[10:40:00] PROBLEM - MediaWiki exceptions and fatals per minute for jobrunner on alert1001 is CRITICAL: 118 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:42:26] RECOVERY - MediaWiki exceptions and fatals per minute for jobrunner on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:46:03] the job runner exceptions was "TypeError: Argument 2 passed to Parser::preSaveTransform() must implement interface MediaWiki\Page\PageReference, null given, called in /srv/mediawiki/php-1.38.0-wmf.7/includes/preferences/DefaultPreferencesFactory.php on line"
[10:46:58] I wonder if I should file a ticket about it, as it only happened for a while?
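The jobrunner alert at 10:40:00 reports "118 gt 100". That means the per-minute count of exceptions and fatals was compared against 100 with a greater-than test, and the alert recovered at 10:42:26. Below is a minimal sketch of that comparison. `exceptions_alert` is a hypothetical name, and the real alert definition does not appear in the log. Only the threshold of 100 and the message wording come from it.

```python
def exceptions_alert(per_minute: int, threshold: int = 100) -> str:
    """Hypothetical sketch of the jobrunner exceptions/fatals-per-minute alert."""
    # Assumption: a strict "gt" comparison, as the alert text "118 gt 100" suggests.
    if per_minute > threshold:
        return f"CRITICAL: {per_minute} gt {threshold}"
    return "OK: All metrics within thresholds."


print(exceptions_alert(118))  # CRITICAL: 118 gt 100
```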
[10:48:14] ah, it happened before and I cannot see it on phabricator, I think I will create one
[10:48:30] (CR) Jbond: [C: -1] P:openstack::base::cloudgw: drop unneeded profiles (1 comment) [puppet] - https://gerrit.wikimedia.org/r/737774 (owner: Jbond)
[10:56:40] !log jynus@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov2001.codfw.wmnet with OS buster
[10:56:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:57:29] (PS1) Giuseppe Lavagetto: noc: move template used only by NOC to it [puppet] - https://gerrit.wikimedia.org/r/738193
[11:00:05] mvolz: #bothumor I � Unicode. All rise for Services – Citoid / Zotero deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211111T1100).
[11:03:14] (PS18) Giuseppe Lavagetto: mediawiki::php: support multiple php version in monitoring too [puppet] - https://gerrit.wikimedia.org/r/736949 (https://phabricator.wikimedia.org/T293450)
[11:03:16] (PS19) Giuseppe Lavagetto: mediawiki: add support for multiple versions in the web configuration [puppet] - https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450)
[11:03:18] (PS10) Giuseppe Lavagetto: mediawiki::php: report prometheus metrics for all php versions [puppet] - https://gerrit.wikimedia.org/r/737929
[11:03:20] (PS1) Giuseppe Lavagetto: deployment-prep: install php 7.4 on a server [puppet] - https://gerrit.wikimedia.org/r/738194
[11:04:06] !log jynus@cumin1001 START - Cookbook sre.hosts.reimage for host dbprov1001.eqiad.wmnet with OS buster
[11:04:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:05:06] (PS1) Jcrespo: Revert "backups: Disable notifications of dbprov2001 and dbprov1001" [puppet] - https://gerrit.wikimedia.org/r/737959
[11:07:33] (CR) Muehlenhoff: "This would work, but this is an edge package (it seems we're just adding a python3-flask-foo package here, right?), which doesn't conflict" [puppet] - https://gerrit.wikimedia.org/r/737856 (https://phabricator.wikimedia.org/T295234) (owner: Majavah)
[11:10:13] (PS1) Kosta Harlan: CreateAccountCampaign: Show/hide new HTML based on query param [extensions/GrowthExperiments] (wmf/1.38.0-wmf.7) - https://gerrit.wikimedia.org/r/737960 (https://phabricator.wikimedia.org/T295068)
[11:10:24] (PS1) Kosta Harlan: LoginSignup: Add function for overriding benefits container [core] (wmf/1.38.0-wmf.7) - https://gerrit.wikimedia.org/r/737961 (https://phabricator.wikimedia.org/T295068)
[11:10:31] (CR) jerkins-bot: [V: -1] CreateAccountCampaign: Show/hide new HTML based on query param [extensions/GrowthExperiments] (wmf/1.38.0-wmf.7) - https://gerrit.wikimedia.org/r/737960 (https://phabricator.wikimedia.org/T295068) (owner: Kosta Harlan)
[11:13:13] (PS1) Jcrespo: dbbackups: Reimage db1116, db1139, db2097, db2100 to buster [puppet] - https://gerrit.wikimedia.org/r/738195 (https://phabricator.wikimedia.org/T290865)
[11:14:05] \o urbanecm are there special instructions for doing backports with a Depends-On? I mean, logically you'd do the patch that is needed by the other one first, then do the one that has Depends-On, but wondering if there's anything else to keep in mind
[11:14:46] kostajh: nothing special, just pay extra attention to the order in which you're deploying
[11:15:12] (both files within one commit and the commits themselves)
[11:15:37] (CR) Jcrespo: "FYI" [puppet] - https://gerrit.wikimedia.org/r/738195 (https://phabricator.wikimedia.org/T290865) (owner: Jcrespo)
[11:15:58] In theory Gerrit should prevent you from merging in an incorrect order, but that's all -- everything else must be manually checked.
[11:16:17] Feel free to ping me during the upcoming window if you need any help. EOM kostajh :)
[11:16:25] (PS1) Vgutierrez: prometheus:ops: Take role cache::upload_haproxy into account [puppet] - https://gerrit.wikimedia.org/r/738197 (https://phabricator.wikimedia.org/T290005)
[11:16:32] urbanecm: cheers, thanks
[11:17:23] "This change depends on a change that failed to merge" -- I guess there is nothing to do about that until the core patch is merged, right? https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/737960
[11:17:36] (CR) Awight: Streamline/modernize code in MWConfigCacheGenerator (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/737857 (owner: Thiemo Kreuz (WMDE))
[11:18:44] (CR) Vgutierrez: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32371/console" [puppet] - https://gerrit.wikimedia.org/r/738197 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[11:20:07] kostajh: likely.
[11:20:10] I filed T295543 for the job queue exceptions, hopefully I did it right
[11:20:11] T295543: TypeError: Argument 2 passed to Parser::preSaveTransform() must implement interface MediaWiki\Page\PageReference, null given, called in /srv/mediawiki/php-1.38.0-wmf.7/includes/preferences/DefaultPreferencesFactory.php on line - https://phabricator.wikimedia.org/T295543
[11:22:28] (PS5) Jbond: hiera: create script endpoint for exporting hiera data [software/netbox-extras] - https://gerrit.wikimedia.org/r/737914
[11:23:09] (PS6) Jbond: hiera: create script endpoint for exporting hiera data [software/netbox-extras] - https://gerrit.wikimedia.org/r/737914
[11:25:27] (PS7) Jbond: hiera: create script endpoint for exporting hiera data [software/netbox-extras] - https://gerrit.wikimedia.org/r/737914
[11:27:53] (CR) Vgutierrez: [V: +1 C: +2] prometheus:ops: Take role cache::upload_haproxy into account [puppet] - https://gerrit.wikimedia.org/r/738197 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[11:27:56] (PS8) Jbond: hiera: create script endpoint for exporting hiera data [software/netbox-extras] - https://gerrit.wikimedia.org/r/737914
[11:28:33] !log jynus@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov1001.eqiad.wmnet with OS buster
[11:28:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:30] (PS2) Giuseppe Lavagetto: deployment-prep: install php 7.4 on a mw appserver [puppet] - https://gerrit.wikimedia.org/r/738194
[11:29:44] (PS2) Jcrespo: Revert "backups: Disable notifications of dbprov2001 and dbprov1001" [puppet] - https://gerrit.wikimedia.org/r/737959
[11:30:03] (PS1) MMandere: site: Add drmrs cache instances [puppet] - https://gerrit.wikimedia.org/r/738199 (https://phabricator.wikimedia.org/T282787)
[11:30:39] (CR) jerkins-bot: [V: -1] site: Add drmrs cache instances [puppet] - https://gerrit.wikimedia.org/r/738199 (https://phabricator.wikimedia.org/T282787) (owner: MMandere)
[11:31:06] (CR) Jcrespo: [C: +2] Revert "backups: Disable notifications of dbprov2001 and dbprov1001" [puppet] - https://gerrit.wikimedia.org/r/737959 (owner: Jcrespo)
[11:31:59] !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: working on network tests
[11:32:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:32:00] !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: working on network tests
[11:32:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:13] (PS3) JMeybohm: cfssl-issuer: Allow to install issuers via chart [deployment-charts] - https://gerrit.wikimedia.org/r/738182 (https://phabricator.wikimedia.org/T294560)
[11:35:17] (PS1) Vgutierrez: cumin: Add cache::upload_haproxy to cp aliases [puppet] - 
10https://gerrit.wikimedia.org/r/738200 (https://phabricator.wikimedia.org/T290005) [11:35:32] (03CR) 10Muehlenhoff: Switch eqiad labsldapconfig to the read-only replicas (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/525220 (https://phabricator.wikimedia.org/T46722) (owner: 10Muehlenhoff) [11:38:03] (03CR) 10Jbond: "thanks" [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/737914 (owner: 10Jbond) [11:38:18] (03PS9) 10Jbond: hiera: create script endpoint for exporting hiera data [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/737914 [11:38:54] (03CR) 10jerkins-bot: [V: 04-1] hiera: create script endpoint for exporting hiera data [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/737914 (owner: 10Jbond) [11:39:28] (03CR) 10Jcrespo: [C: 03+2] dbbackups: Reimage db1116, db1139, db2097, db2100 to buster [puppet] - 10https://gerrit.wikimedia.org/r/738195 (https://phabricator.wikimedia.org/T290865) (owner: 10Jcrespo) [11:40:43] (03PS1) 10Jcrespo: dbbackups: Reenable notifications for db1116, db1139, db2097, db2100 [puppet] - 10https://gerrit.wikimedia.org/r/737963 [11:41:33] (03PS2) 10Jcrespo: dbbackups: Reenable notifications for db1116, db1139, db2097, db2100 [puppet] - 10https://gerrit.wikimedia.org/r/737963 [11:46:09] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/738200 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez) [11:46:22] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/738200 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez) [11:46:45] (03CR) 10Vgutierrez: [C: 03+2] cumin: Add cache::upload_haproxy to cp aliases [puppet] - 10https://gerrit.wikimedia.org/r/738200 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez) [11:47:39] (03CR) 10Awight: [C: 04-2] "Doing this in core first." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/737209 (owner: 10Awight) [11:52:34] (03CR) 10Awight: [C: 03+1] Use more compact PHP7 syntax where possible [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737859 (owner: 10Thiemo Kreuz (WMDE)) [11:53:36] (03PS3) 10JMeybohm: admin_ng: Add helmfile for cert-manager and cfssl-issuer [deployment-charts] - 10https://gerrit.wikimedia.org/r/737939 (https://phabricator.wikimedia.org/T294560) [11:53:38] (03PS3) 10JMeybohm: admin_ng: Create Certificates for ingressgateway [deployment-charts] - 10https://gerrit.wikimedia.org/r/737975 (https://phabricator.wikimedia.org/T295385) [11:56:14] (03PS2) 10MMandere: site: Add drmrs cache instances [puppet] - 10https://gerrit.wikimedia.org/r/738199 (https://phabricator.wikimedia.org/T282787) [11:56:16] (03PS4) 10JMeybohm: cfssl-issuer: Allow to install issuers via chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/738182 (https://phabricator.wikimedia.org/T294560) [11:58:24] (03CR) 10Majavah: [C: 04-1] "I think these should be insetup_noferm instead" [puppet] - 10https://gerrit.wikimedia.org/r/738199 (https://phabricator.wikimedia.org/T282787) (owner: 10MMandere) [11:59:35] (03PS3) 10Giuseppe Lavagetto: deployment-prep: install php 7.4 on a mw appserver [puppet] - 10https://gerrit.wikimedia.org/r/738194 [11:59:57] (03PS2) 10Volans: Adopt pathlib.Path [software/spicerack] - 10https://gerrit.wikimedia.org/r/737993 [11:59:58] I was hoping for a nice and quiet backport window :S [12:00:04] Amir1, Lucas_WMDE, and apergos: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC morning backport and config training. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211111T1200). [12:00:04] Lucas_WMDE, kostajh, and awight: A patch you scheduled for UTC morning backport and config training is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[12:00:07] nope
[12:00:08] o/
[12:00:12] (CR) Volans: "Thanks for the review, replies inline" [software/spicerack] - https://gerrit.wikimedia.org/r/737993 (owner: Volans)
[12:00:18] no trainees have signed up for the window but there are more than 6 patches listed
[12:00:19] \o
[12:00:25] I can deploy
[12:00:32] Lucas_WMDE: fwiw my patches are nice-to-have and can be ignored if we run out of time.
[12:00:40] I have not had time to look at them at all carefully.
[12:00:45] kostajh: do you want to self-serve?
[12:01:00] I think we can skip my change for a bit
[12:01:05] Lucas_WMDE: sure, I can do that
[12:01:12] ok, go ahead
[12:01:24] (CR) Kosta Harlan: [C: +2] "backport" [core] (wmf/1.38.0-wmf.7) - https://gerrit.wikimedia.org/r/737961 (https://phabricator.wikimedia.org/T295068) (owner: Kosta Harlan)
[12:01:35] Lucas_WMDE: I can deploy my own too, once you're done.
[12:01:51] awight, as I look at yours, they are all independent, yes?
[12:02:00] apergos: They should be.
[12:02:05] Lucas_WMDE: since my GrowthExperiments patch depends on the one I just +2'ed, I should wait until the core patch is fully deployed before +2'ing the GrowthExperiments one?
[12:02:07] 👍
[12:02:24] kostajh: I think you can already +2 the GrowthExperiments one
[12:02:32] Zuul *should* ensure it’s not merged too early
[12:02:42] (we can also check that they chain properly on the Zuul dashboard)
[12:03:17] (CR) Kosta Harlan: [C: +2] "backport" [extensions/GrowthExperiments] (wmf/1.38.0-wmf.7) - https://gerrit.wikimedia.org/r/737960 (https://phabricator.wikimedia.org/T295068) (owner: Kosta Harlan)
[12:03:31] yup, looks like they’re properly chained
[12:04:03] (CR) Kosta Harlan: [C: +2] "recheck" [extensions/GrowthExperiments] (wmf/1.38.0-wmf.7) - https://gerrit.wikimedia.org/r/737960 (https://phabricator.wikimedia.org/T295068) (owner: Kosta Harlan)
[12:04:15] Zuul says the MediaWiki change will take 20 minutes, perhaps we should do some of awight’s changes in the meantime?
[12:04:26] Lucas_WMDE: sure, I can start on those.
[12:04:28] (my change is risky and I want plenty of time to check it, so I’d prefer to wait with that until the end of the window)
[12:04:52] (PS2) Awight: Anchor relative import [mediawiki-config] - https://gerrit.wikimedia.org/r/737187
[12:05:03] (CR) Awight: [C: +2] "deploy" [mediawiki-config] - https://gerrit.wikimedia.org/r/737187 (owner: Awight)
[12:05:25] thank you btw Lucas_WMDE for commenting on the patch itself that it's risky, two thumbs up from me for that
[12:06:11] (Merged) jenkins-bot: Anchor relative import [mediawiki-config] - https://gerrit.wikimedia.org/r/737187 (owner: Awight)
[12:06:19] heh, I almost forgot I did that ^^
[12:06:49] (PS3) MMandere: site: Add drmrs cache instances [puppet] - https://gerrit.wikimedia.org/r/738199 (https://phabricator.wikimedia.org/T282787)
[12:08:12] (CR) MMandere: site: Add drmrs cache instances (1 comment) [puppet] - https://gerrit.wikimedia.org/r/738199 (https://phabricator.wikimedia.org/T282787) (owner: MMandere)
[12:08:31] (CR) jerkins-bot: [V: -1] Adopt pathlib.Path [software/spicerack] - https://gerrit.wikimedia.org/r/737993 (owner: Volans)
[12:08:33] !log awight@deploy1002 Synchronized multiversion/buildConfigCache.php: Config: [[gerrit:737187|Anchor relative import]] (duration: 00m 56s)
[12:08:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:08:43] (PS2) Awight: Avoid error suppression [mediawiki-config] - https://gerrit.wikimedia.org/r/737192
[12:08:52] (CR) Awight: [C: +2] "deploy" [mediawiki-config] - https://gerrit.wikimedia.org/r/737192 (owner: Awight)
[12:09:59] Anyone have an idea how I can remove a file belonging to www-data on mwdebug1002, /tmp/mw-cache-1.38.0-wmf.7/conf2-enwiki.json
[12:10:17] I'd like to smoke-test the config cache generator.
[12:10:18] sudo -u www-data unlink /tmp/mw-cache-1.38.0-wmf.7/conf2-enwiki.json should work, I think?
[12:10:24] (Merged) jenkins-bot: Avoid error suppression [mediawiki-config] - https://gerrit.wikimedia.org/r/737192 (owner: Awight)
[12:10:33] !log jynus@cumin1001 START - Cookbook sre.hosts.reimage for host db1116.eqiad.wmnet with OS buster
[12:10:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:10:37] * awight is surprised at having this sudopower
[12:10:40] +1 thanks!
[12:10:47] (sudo -u www-data who am I seems to work, at least ^^)
[12:10:57] !log jynus@cumin2002 START - Cookbook sre.hosts.reimage for host db2097.codfw.wmnet with OS buster
[12:10:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:02] I think we’re not allowed to sudo as root, but www-data is allowed
[12:12:27] (PS2) Awight: Don't need to keep all config in memory [mediawiki-config] - https://gerrit.wikimedia.org/r/737189
[12:12:34] * majavah points towards `sudo -l`
[12:12:50] (CR) Awight: [C: +2] "deploy" [mediawiki-config] - https://gerrit.wikimedia.org/r/737189 (owner: Awight)
[12:13:05] oooooh, I did not know that
[12:13:09] !log awight@deploy1002 Synchronized multiversion/MWConfigCacheGenerator.php: Config: [[gerrit:737192|Avoid error suppression]] (duration: 00m 55s)
[12:13:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:13:11] majavah: ty!
[12:13:36] (Merged) jenkins-bot: Don't need to keep all config in memory [mediawiki-config] - https://gerrit.wikimedia.org/r/737189 (owner: Awight)
[12:13:37] I thought since the sudoers file isn’t readable, we’re not allowed to know what we can do :D
[12:13:47] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[12:13:48] :-D
[12:13:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:13:49] (unless I could be bothered to find it in puppet config)
[12:14:49] (or ask a friendly neighborhood root :-P)
[12:15:20] !log awight@deploy1002 Synchronized multiversion/buildConfigCache.php: Config: [[gerrit:737189|Don't need to keep all config in memory]] (duration: 00m 55s)
[12:15:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:16:14] I'm going to postpone the last patch, it's too scary and involves more entrypoints than I know how to test.
[12:16:19] Lucas_WMDE: all yours!
[12:16:22] 👍
[12:16:45] I’ll let kostajh take over, don’t think 8 minutes is enough for me to test the Wikibase change
[12:16:47] Lucas_WMDE: you want to do yours after the core/GrowthExperiments one, right?
[12:16:49] yeah
[12:16:52] k
[12:17:31] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[12:17:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:21:02] !log imported openjdk-8 8u312-b07-1~deb10u1 to component/jdk8 for buster-wikimedia (rebuild of latest Java 8 security release for Buster)
[12:21:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:21:13] (Merged) jenkins-bot: LoginSignup: Add function for overriding benefits container [core] (wmf/1.38.0-wmf.7) - https://gerrit.wikimedia.org/r/737961 (https://phabricator.wikimedia.org/T295068) (owner: Kosta Harlan)
[12:21:16] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:22:06] ^ I don't know what that means exactly, but it sounds bad, do I need to wait before deploying? (Logstash seems available to me)
[12:22:49] !log jgiannelos@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
[12:22:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:22:52] I think you can proceed
[12:23:02] (Merged) jenkins-bot: CreateAccountCampaign: Show/hide new HTML based on query param [extensions/GrowthExperiments] (wmf/1.38.0-wmf.7) - https://gerrit.wikimedia.org/r/737960 (https://phabricator.wikimedia.org/T295068) (owner: Kosta Harlan)
[12:23:12] not sure if codfw logstash hosts are used for anything while eqiad is the active DC
[12:23:20] alright
[12:25:31] (PS1) Jgiannelos: tegola-vector-tiles: Bump image to latest version [deployment-charts] - https://gerrit.wikimedia.org/r/738210
[12:26:21] I assume you need to sync the MediaWiki core file first, then HomepageHooks, then SpecialCreateAccountCampaign
[12:26:39] since the special page seems to use the new constants from HomepageHooks
[12:26:42] Lucas_WMDE: yes, I am starting with the mediawiki/core file
[12:26:45] ok
[12:27:39] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[12:27:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:28:58] !log kharlan@deploy1002 Synchronized php-1.38.0-wmf.7/includes/specialpage/LoginSignupSpecialPage.php: Backport: [[gerrit:737961|LoginSignup: Add function for overriding benefits container (T295068)]] (duration: 00m 57s)
[12:29:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:29:01] T295068: Donors to newcomers: recurring donor control page - https://phabricator.wikimedia.org/T295068
[12:29:24] Moving on to the GrowthExperiments patch now
[12:30:18] Lucas_WMDE: I just realized I didn't run "git submodule update" for the core patch. But I think that is OK?
[12:30:29] yup, core's not a submodule
[12:30:31] !log jynus@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2097.codfw.wmnet with OS buster
[12:30:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:30:35] should be okay yeah
[12:31:01] (you can always look to /srv/mediawiki at mwdebug100x if you want to be super-sure)
[12:31:17] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[12:31:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:31:19] what would one be looking for there, urbanecm ?
[12:32:01] whether the file contents look as expected after a `scap pull`, I guess?
[12:32:19] after scap sync-file. You obviously need to look at the debug srv you normally don't use to test :)
[12:32:26] (CR) Awight: [C: +1] "I balked at deploying this today, because I don't know how to test or monitor many of the endpoints. Maybe I'll split this into changes t" [mediawiki-config] - https://gerrit.wikimedia.org/r/737859 (owner: Thiemo Kreuz (WMDE))
[12:32:45] awight: +1 for splitting that change
[12:32:59] apergos: basically if you want to make sure you synced your change properly, you can see that by logging in to a debug server you don't use for testing with scap pull (or to an appserver, doesn't matter). In /srv/mediawiki, your file should be changed
[12:33:04] if it's not, you did something incorrectly
[12:33:16] and now `git log -p HEAD..@{u}` doesn't show any output, but I guess that is to be expected since it did show both patches when I did `git fetch` for the first patch?
[12:33:16] oic, good idea
[12:33:28] kostajh: I think so, yeah
[12:33:42] so now is the point where you run git submodule update for GrowthExperiments
[12:33:43] (where something is "forgot to rebase" or "forgot to do git submodule update" or "forgot to fetch" or "synced incorrect file")
[12:33:58] right, just did `git submodule update`
[12:34:08] so verifying on mwdebug1002
[12:37:16] (PS1) Arturo Borrero Gonzalez: openstack: networktests: update eqiad1 bastion [puppet] - https://gerrit.wikimedia.org/r/738211 (https://phabricator.wikimedia.org/T294955)
[12:37:40] !log jynus@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1116.eqiad.wmnet with OS buster
[12:37:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:38:05] OK, I think I am at the part where I need to verify my files are on mwdebug, because I don't see what I am expecting
[12:38:49] wait, nevermind
[12:38:58] too many parameters, sigh...
[12:39:27] (CR) Volans: [C: +2] constants: add new drmrs datacenter [software/pywmflib] - https://gerrit.wikimedia.org/r/737989 (https://phabricator.wikimedia.org/T282787) (owner: Volans)
[12:39:28] uhh, something else though
[12:39:34] (CR) Volans: [C: +2] interactive: change input prefix to ==> [software/pywmflib] - https://gerrit.wikimedia.org/r/737990 (owner: Volans)
[12:39:36] (CR) Arturo Borrero Gonzalez: [C: +2] openstack: networktests: update eqiad1 bastion [puppet] - https://gerrit.wikimedia.org/r/738211 (https://phabricator.wikimedia.org/T294955) (owner: Arturo Borrero Gonzalez)
[12:39:40] (CR) Volans: [C: +2] docs: add examples to all modules [software/pywmflib] - https://gerrit.wikimedia.org/r/737991 (owner: Volans)
[12:39:42] awight: is it possible you forgot to `git rebase` after fetching the “don’t keep all config in memory” change?
[12:39:45] (CR) Volans: [C: +2] constants: add CORE_DATACENTERS constant [software/pywmflib] - https://gerrit.wikimedia.org/r/737992 (owner: Volans)
[12:39:49] because my prompt now says git is at u-1, behind upstream
[12:40:09] (in /srv/mediawiki-staging, not in a wmf. branch)
[12:40:22] (shouldn’t affect you kostajh, we can sort that out afterwards)
[12:40:47] ack
[12:41:26] I would like to squeeze in a small config patch, if possible, after this sync. That's why I was having issues verifying my patch earlier.
[12:41:42] ok
[12:42:06] (Merged) jenkins-bot: constants: add new drmrs datacenter [software/pywmflib] - https://gerrit.wikimedia.org/r/737989 (https://phabricator.wikimedia.org/T282787) (owner: Volans)
[12:42:37] (CR) Jelto: [C: +1] "lgtm" [deployment-charts] - https://gerrit.wikimedia.org/r/738182 (https://phabricator.wikimedia.org/T294560) (owner: JMeybohm)
[12:42:47] so, i'm going to run these two commands `scap sync-file php-1.38.0-wmf.7/extensions/GrowthExperiments/includes/HomepageHooks.php 'Backport: [[gerrit:737960|CreateAccountCampaign: Show/hide new HTML based on query param (T295068) (HomepageHooks.php)]]'`
[12:42:47] T295068: Donors to newcomers: recurring donor control page - https://phabricator.wikimedia.org/T295068
[12:42:52] (Merged) jenkins-bot: interactive: change input prefix to ==> [software/pywmflib] - https://gerrit.wikimedia.org/r/737990 (owner: Volans)
[12:42:54] (Merged) jenkins-bot: docs: add examples to all modules [software/pywmflib] - https://gerrit.wikimedia.org/r/737991 (owner: Volans)
[12:42:56] (Merged) jenkins-bot: constants: add CORE_DATACENTERS constant [software/pywmflib] - https://gerrit.wikimedia.org/r/737992 (owner: Volans)
[12:43:27] followed by `scap sync-file php-1.38.0-wmf.7/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php 'Backport: [[gerrit:737960|CreateAccountCampaign: Show/hide new HTML based on query param (T295068) (SpecialCreateAccountCampaign.php)]]'`
[12:43:36] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 45.09 ms
[12:43:37] does that look right? HomepageHooks.php, then SpecialCreateAccountCampaign.php.
[12:43:48] I usually add (1/2) or (2/2) to the message in such cases, after the [[wikilink]]
[12:43:52] apart from that, looks good to me
[12:43:55] ok
[12:44:04] oh, wait
[12:44:09] I just noticed you already put the file name in the message
[12:44:14] so I guess that would be okay too
[12:44:18] ok
[12:46:04] !log kharlan@deploy1002 Synchronized php-1.38.0-wmf.7/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: [[gerrit:737960|CreateAccountCampaign: Show/hide new HTML based on query param (T295068) (1/2 HomepageHooks.php)]] (duration: 00m 54s)
[12:46:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:47:19] !log kharlan@deploy1002 Synchronized php-1.38.0-wmf.7/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: Backport: [[gerrit:737960|CreateAccountCampaign: Show/hide new HTML based on query param (T295068) (2/2 SpecialCreateAccountCampaign.php)]] (duration: 00m 55s)
[12:47:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:47:29] 👍
[12:47:34] and here comes the config patch
[12:47:44] but we definitely need to fix that missing rebase first
[12:47:50] before doing another config patch
[12:47:52] can I take over?
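The two-file sync order agreed on here (HomepageHooks.php first, because SpecialCreateAccountCampaign.php uses its new constants) can be sketched as a small Python helper. It builds one `scap sync-file` argument list per file in the order given, and places the (i/n) counter and file name inside the wikilink, as in the 12:46:04 and 12:47:19 !log lines. This is a hypothetical helper, not a scap feature: `build_sync_commands` and its parameters are invented for illustration. It only prints shell-quoted commands, and the caller is still responsible for listing dependencies first.

```python
import shlex


def build_sync_commands(paths, change_number, subject, task=None):
    """Build one `scap sync-file` argv per path, numbered (i/n) in the order given.

    Hypothetical helper: the caller must list dependencies first
    (e.g. the file defining new constants before the file that uses them).
    """
    total = len(paths)
    task_part = f" ({task})" if task else ""
    commands = []
    for index, path in enumerate(paths, start=1):
        filename = path.rsplit("/", 1)[-1]
        # Counter and file name go inside the wikilink, matching the !log lines.
        message = (
            f"Backport: [[gerrit:{change_number}|{subject}{task_part} "
            f"({index}/{total} {filename})]]"
        )
        commands.append(["scap", "sync-file", path, message])
    return commands


if __name__ == "__main__":
    base = "php-1.38.0-wmf.7/extensions/GrowthExperiments/includes/"
    for argv in build_sync_commands(
        [base + "HomepageHooks.php", base + "Specials/SpecialCreateAccountCampaign.php"],
        737960,
        "CreateAccountCampaign: Show/hide new HTML based on query param",
        task="T295068",
    ):
        # Print a copy-pasteable, shell-quoted command; nothing is executed.
        print(shlex.join(argv))
```

Printing rather than running keeps a human in the loop, so the deployer can still verify on mwdebug between the two syncs as was done above.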
[12:48:29] (PS1) Arturo Borrero Gonzalez: openstack: networktests: replace curl silent argument [puppet] - https://gerrit.wikimedia.org/r/738212 (https://phabricator.wikimedia.org/T294955)
[12:48:32] jynus, urbanecm: see https://phabricator.wikimedia.org/T295543#7498559
[12:48:33] (PS1) Kosta Harlan: GrowthExperiments: Add campaign pattern for control group [mediawiki-config] - https://gerrit.wikimedia.org/r/738213 (https://phabricator.wikimedia.org/T295068)
[12:48:35] Lucas_WMDE: yes please do
[12:48:36] Lucas_WMDE: youch, yes I think I missed the rebase.
[12:48:42] ok
[12:48:50] I’ll do it now
[12:49:01] RhinosF1: I'll check quickly
[12:49:15] Ty urbanecm
[12:49:16] (CR) Arturo Borrero Gonzalez: [C: +2] openstack: networktests: replace curl silent argument [puppet] - https://gerrit.wikimedia.org/r/738212 (https://phabricator.wikimedia.org/T294955) (owner: Arturo Borrero Gonzalez)
[12:49:30] Lucas_WMDE: confirmed by looking at bash history :-/
[12:49:30] (CR) Thiemo Kreuz (WMDE): [C: +1] Only load static configs once [mediawiki-config] - https://gerrit.wikimedia.org/r/737188 (owner: Awight)
[12:50:44] !log lucaswerkmeister-wmde@deploy1002 Synchronized multiversion/buildConfigCache.php: Config: [[gerrit:737189|Don't need to keep all config in memory]] (resync, previous deploy for this file was missing `git rebase`) (duration: 00m 55s)
[12:50:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:50:51] thank you, RhinosF1
[12:51:15] alright, that should be done now
[12:51:23] kostajh: do you want to do your config change now?
[12:51:44] Lucas_WMDE: sure
[12:51:50] ok, over to you then
[12:52:06] (PS1) Arturo Borrero Gonzalez: openstack: networktests: fix toolforge.org IP address [puppet] - https://gerrit.wikimedia.org/r/738214 (https://phabricator.wikimedia.org/T294955)
[12:52:15] and I’ll just do my change afterwards, there’s a 4h break before the puppet window :)
[12:52:19] Lucas_WMDE: do you mind reviewing https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/738213 quickly?
[12:52:49] I had a look but I have no idea what that pattern does or if it’s correct to add the control group there
[12:53:00] jynus: just to make sure, does the T295543 issue cause any "significant issues"? In other words, would you prefer me reverting the change that caused the error, or can that wait for Wikipedia Library team to fix the issue in their code?
[12:53:01] T295543: TypeError: Argument 2 passed to Parser::preSaveTransform() must implement interface MediaWiki\Page\PageReference, null given, called in /srv/mediawiki/php-1.38.0-wmf.7/includes/preferences/DefaultPreferencesFactory.php on line - https://phabricator.wikimedia.org/T295543
[12:53:13] Lucas_WMDE: Thanks! I need to get out the oil can and deploy more often.
[12:53:16] urbanecm, no issues other than monitoring spam
[12:53:29] although I have not checked user impact
[12:53:43] urbanecm, definitely can wait
[12:53:44] it's a new feature, worst-case scenario is "users don't see it" :)
[12:53:44] Lucas_WMDE: the idea is you can go to https://en.wikipedia.org/wiki/Special:CreateAccount?campaign=growth-recurring-english-2021&geNewLandingHtml=1&geEnabled=1, and with the new config flag, https://en.wikipedia.org/wiki/Special:CreateAccount?campaign=growth-recurring-english-control-2021&geNewLandingHtml=0&geEnabled=1
[12:53:51] (CR) Arturo Borrero Gonzalez: [C: +2] openstack: networktests: fix toolforge.org IP address [puppet] - https://gerrit.wikimedia.org/r/738214 (https://phabricator.wikimedia.org/T294955) (owner: Arturo Borrero Gonzalez)
[12:54:00] great, I'll leave it enabled then :)
[12:54:06] anyway, the change should be pretty straightforward, going to deploy it now
[12:54:17] ok, go ahead
[12:54:25] thanks jynus
[12:54:28] as a code change it looks fine, no syntax errors at least ;)
[12:54:42] (CR) Kosta Harlan: [C: +2] "backport/config" [mediawiki-config] - https://gerrit.wikimedia.org/r/738213 (https://phabricator.wikimedia.org/T295068) (owner: Kosta Harlan)
[12:54:47] urbanecm, note that while I agree with you, that train of thought would work for reverting too :-D
[12:55:24] sure, but having it enabled makes it easier for the maintainers to fix (if it can't be reproduced elsewhere for some reason, for instance)
[12:55:27] (Merged) jenkins-bot: GrowthExperiments: Add campaign pattern for control group [mediawiki-config] - https://gerrit.wikimedia.org/r/738213 (https://phabricator.wikimedia.org/T295068) (owner: Kosta Harlan)
[12:55:34] yes, that is a valid reason
[12:55:48] plus more work sometimes
[12:55:58] (CR) Cparle: [C: +1] Explicitly disable references support on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/737370 (https://phabricator.wikimedia.org/T230315) (owner: Matthias Mullie)
[12:56:30] plus in general jobqueue issues are not as impacting as user request ones
[12:57:18] (CR) Jelto: [C: +1] "lgtm, let me know when you want to roll this out. I would force a puppet run on GitLab in codfw (replica) first to check if everything is " [puppet] - https://gerrit.wikimedia.org/r/737968 (https://phabricator.wikimedia.org/T285363) (owner: Hashar)
[12:58:30] Lucas_WMDE: it seems like the config change is not on mwdebug
[12:58:42] kostajh: I think you forgot to git rebase as well? :P
[12:58:45] my prompt says u-1 again
[12:58:47] oh dear
[12:59:11] there it is
[12:59:22] \o/
[12:59:29] in one minute the window is formally over. no pressure :-P
[12:59:35] go scap go
[12:59:56] the little scap that could
[13:00:06] n't.
[13:00:15] jouncebot: now
[13:00:15] No deployments scheduled for the next 3 hour(s) and 59 minute(s)
[13:00:18] bah
[13:00:24] PROBLEM - MariaDB Replica IO: s1 on db1139 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:00:25] !log kharlan@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:738213|GrowthExperiments: Add campaign pattern for control group (T295068)]] (duration: 00m 55s)
[13:00:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:29] T295068: Donors to newcomers: recurring donor control page - https://phabricator.wikimedia.org/T295068
[13:00:31] just barely
[13:00:40] ^that is me, but it wasn't supposed to alert
[13:00:54] whew ok as long as it ain't us
[13:00:58] and it's known
[13:00:59] alright. all done from me. thanks for your help, and patience ;)
[13:01:29] !log UTC morning backport+config window formally over (I’ll do one more config change shortly)
[13:01:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:01:33] I wonder what we can do to help future deployers not miss the rebase step. hrm
[13:01:42] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[13:01:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:02:50] (PS2) Lucas Werkmeister (WMDE): Load Wikibase Client before other Wikibase extensions [mediawiki-config] - https://gerrit.wikimedia.org/r/735367 (https://phabricator.wikimedia.org/T294224)
[13:04:09] icinga didn't apply well the service disabling :-(, had to do it manually
[13:04:14] (CR) Lucas Werkmeister (WMDE): [C: +2] Load Wikibase Client before other Wikibase extensions [mediawiki-config] - https://gerrit.wikimedia.org/r/735367 (https://phabricator.wikimedia.org/T294224) (owner: Lucas Werkmeister (WMDE))
[13:05:03] (Merged) jenkins-bot: Load Wikibase Client before other Wikibase extensions [mediawiki-config] - https://gerrit.wikimedia.org/r/735367 (https://phabricator.wikimedia.org/T294224) (owner: Lucas Werkmeister (WMDE))
[13:05:27] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[13:05:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:35] pulled to 1001, testing…
[13:05:38] *mwdebug1001
[13:06:29] T294224 gets fixed by the change (at least the Commons), that’s good
[13:06:30] T294224: mw.wikibase.lexeme is nil on Beta Wikidata, and mw.wikibase.mediainfo is nil on Commons (beta+prod) - https://phabricator.wikimedia.org/T294224
[13:06:37] now testing that everything else still seems to work
[13:09:00] jynus: happy to help anytime!
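One possible answer to the 13:01:33 question about not missing the rebase step is a guard that runs after `git fetch` and before any `scap sync-file`. A minimal Python sketch follows. It asks git how many commits the checkout (e.g. /srv/mediawiki-staging) is behind its upstream, which corresponds to the "u-1" state the shell prompt showed twice in this window, and it refuses to continue when that count is non-zero. `parse_ahead_behind` and `assert_not_behind` are hypothetical names and are not part of scap. Only the `git rev-list --left-right --count HEAD...@{u}` invocation is standard git, which prints the ahead and behind counts separated by a tab.

```python
import subprocess


def parse_ahead_behind(output):
    """Parse `git rev-list --left-right --count HEAD...@{u}` output.

    git prints two tab-separated counts: commits only on HEAD (ahead)
    and commits only on the upstream branch (behind).
    """
    ahead, behind = output.split()
    return int(ahead), int(behind)


def assert_not_behind(repo_dir):
    """Hypothetical pre-sync guard: fail if repo_dir lags its upstream.

    Run after `git fetch`; a non-zero "behind" count is the "u-1" state
    that means a `git rebase` was missed.
    """
    result = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", "HEAD...@{u}"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    ahead, behind = parse_ahead_behind(result.stdout)
    if behind:
        raise RuntimeError(
            f"{repo_dir} is {behind} commit(s) behind upstream; "
            "run `git rebase` before syncing"
        )
    return ahead, behind
```

For example, calling `assert_not_behind("/srv/mediawiki-staging")` right before a sync would have raised in both of the cases seen above. The guard does not cover the other mistakes listed at 12:33:43, such as a forgotten `git submodule update` or syncing the wrong file, so checking the file on an mwdebug host remains necessary.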
[13:10:55] (CR) Jbond: [C: +1] Adopt pathlib.Path [software/spicerack] - https://gerrit.wikimedia.org/r/737993 (owner: Volans)
[13:11:20] Wikibase, WikibaseLexeme and WikibaseMediaInfo are all still working as far as I can tell
[13:11:54] PROBLEM - k8s API server requests latencies on kubemaster2001 is CRITICAL: instance=10.192.0.56 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27
[13:12:04] I think I’m comfortable syncing the config change
[13:12:27] what are the risks just so we know?
[13:12:30] and then you should go ahead
[13:12:42] it’s possible that Wikibase will behave weirdly for some reason
[13:12:46] due to load order issues
[13:13:00] but the load order *with* the change is closer to what we test in CI and in most other places
[13:13:15] (WikibaseRepo, WikibaseClient, *then* other extensions, rather than WikibaseLexeme between repo and client)
[13:13:17] so it should be fine
[13:13:51] syncing
[13:13:53] (CR) JMeybohm: admin_ng: Create Certificates for ingressgateway (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/737975 (https://phabricator.wikimedia.org/T295385) (owner: JMeybohm)
[13:13:59] hoping for the best :-)
[13:14:44] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/Wikibase.php: Config: [[gerrit:735367|Load Wikibase Client before other Wikibase extensions (T294224)]] (duration: 00m 55s)
[13:14:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:14:47] T294224: mw.wikibase.lexeme is nil on Beta Wikidata, and mw.wikibase.mediainfo is nil on Commons (beta+prod) - https://phabricator.wikimedia.org/T294224
[13:15:28] RECOVERY - k8s API server requests latencies on kubemaster2001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27
[13:15:35] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[13:15:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:30] (CR) JMeybohm: [C: +2] cfssl-issuer: Allow to install issuers via chart [deployment-charts] - https://gerrit.wikimedia.org/r/738182 (https://phabricator.wikimedia.org/T294560) (owner: JMeybohm)
[13:19:16] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[13:19:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:21:22] alright, I think I’m done with deploying for now :)
[13:22:04] (Merged) jenkins-bot: cfssl-issuer: Allow to install issuers via chart [deployment-charts] - https://gerrit.wikimedia.org/r/738182 (https://phabricator.wikimedia.org/T294560) (owner: JMeybohm)
[13:23:27] SRE-Access-Requests: Requesting access to deployment for SCherukuwada - https://phabricator.wikimedia.org/T295550 (SCherukuwada)
[13:23:46] yay!
[13:25:13] SRE-Access-Requests: Requesting access to deployment for SCherukuwada - https://phabricator.wikimedia.org/T295550 (SCherukuwada) a:SCherukuwada Adding direct manager for approval.
[13:26:57] (CR) EllenR: Set up beta test environment for QuickSurvey (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/737503 (https://phabricator.wikimedia.org/T293798) (owner: EllenR)
[13:27:34] PROBLEM - k8s API server requests latencies on kubemaster2001 is CRITICAL: instance=10.192.0.56 verb={LIST,UPDATE} https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27
[13:28:43] (Abandoned) EllenR: Set up beta test environment for QuickSurvey [mediawiki-config] - https://gerrit.wikimedia.org/r/737987 (https://phabricator.wikimedia.org/T293798) (owner: EllenR)
[13:29:04] (Abandoned) EllenR: Set up beta test environment for QuickSurvey [mediawiki-config] - https://gerrit.wikimedia.org/r/737971 (https://phabricator.wikimedia.org/T293798) (owner: EllenR)
[13:29:20] (PS4) JMeybohm: admin_ng: Create Certificates for ingressgateway [deployment-charts] - https://gerrit.wikimedia.org/r/737975 (https://phabricator.wikimedia.org/T295385)
[13:29:42] RECOVERY - k8s API server requests latencies on kubemaster2001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27
[13:30:50] (CR) Jelto: [C: +2] aptrepo::files::updates Update gitlab-ce and gitlab-runner to 14.4 [puppet] - https://gerrit.wikimedia.org/r/737847 (https://phabricator.wikimedia.org/T294580) (owner: Jelto)
[13:31:17] (PS2) Jelto: aptrepo::files::updates Update gitlab-ce and gitlab-runner to 14.4 [puppet] - https://gerrit.wikimedia.org/r/737847 (https://phabricator.wikimedia.org/T294580)
[13:31:50] (CR) Vgutierrez: [C: +1] site: Add drmrs cache instances [puppet] - https://gerrit.wikimedia.org/r/738199 (https://phabricator.wikimedia.org/T282787) (owner: MMandere)
[13:32:46] (CR) jerkins-bot: [V: -1] admin_ng: Create Certificates for ingressgateway [deployment-charts] - https://gerrit.wikimedia.org/r/737975 (https://phabricator.wikimedia.org/T295385) (owner: JMeybohm)
[13:32:49] (PS20) Giuseppe Lavagetto: mediawiki: add support for multiple versions in the web configuration [puppet] - https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450)
[13:32:51] (PS11) Giuseppe Lavagetto: mediawiki::php: report prometheus metrics for all php versions [puppet] - https://gerrit.wikimedia.org/r/737929
[13:32:53] (PS4) Giuseppe Lavagetto: deployment-prep: install php 7.4 on a mw appserver [puppet] - https://gerrit.wikimedia.org/r/738194
[13:38:37] SRE-Access-Requests: Requesting access to analytics-privatedata-users for SCherukuwada - https://phabricator.wikimedia.org/T295552 (SCherukuwada)
[13:38:38] !log root@cumin1001 START - Cookbook sre.hosts.ipmi-password-reset
[13:38:39] !log root@cumin1001 END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
[13:38:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:39:13] (CR) JMeybohm: Implement CFSSL API signer (1 comment) [software/cfssl-issuer] - https://gerrit.wikimedia.org/r/736808 (https://phabricator.wikimedia.org/T294560) (owner: JMeybohm)
[13:45:17] !log jynus@cumin2002 START - Cookbook sre.hosts.reimage for host db2100.codfw.wmnet with OS buster
[13:45:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:46:36] (PS6) EllenR: Set up beta test environment for QuickSurvey [mediawiki-config] - https://gerrit.wikimedia.org/r/737503 (https://phabricator.wikimedia.org/T293798)
[13:47:50] (CR) Dzahn: "yea, sorry, I kind of knew this wasn't ready yet, will amend and compile" [puppet] - https://gerrit.wikimedia.org/r/736074 (owner: Dzahn)
[13:48:06] !log jynus@cumin1001 START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
[13:48:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:27] (CR) EllenR: Set up beta test environment for QuickSurvey (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/737503 (https://phabricator.wikimedia.org/T293798) (owner: EllenR)
[13:55:50] (CR) ArielGlenn: snapshot: replace the word cron everywhere (1 comment) [puppet] - https://gerrit.wikimedia.org/r/736074 (owner: Dzahn)
[13:57:04] (PS1) Volans: CHANGELOG: add changelogs for release v1.0.0 [software/pywmflib] - https://gerrit.wikimedia.org/r/738217
[13:59:00] !log installing bind9 security updates (only client-side-tools/libs)
[13:59:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:59:32] (PS2) Volans: CHANGELOG: add changelogs for release v1.0.0 [software/pywmflib] - https://gerrit.wikimedia.org/r/738217
[14:01:35] (PS19) Giuseppe Lavagetto: mediawiki::php: support multiple php version in monitoring too [puppet] - https://gerrit.wikimedia.org/r/736949 (https://phabricator.wikimedia.org/T293450)
[14:01:37] (PS21) Giuseppe Lavagetto: mediawiki: add support for multiple versions in the web configuration [puppet] - https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450)
[14:01:39] (PS12) Giuseppe Lavagetto: mediawiki::php: report prometheus metrics for all php versions [puppet] - https://gerrit.wikimedia.org/r/737929
[14:01:41] (PS5) Giuseppe Lavagetto: deployment-prep: install php 7.4 on a mw appserver [puppet] - https://gerrit.wikimedia.org/r/738194
[14:03:03] (CR) Volans: [C: +2] CHANGELOG: add changelogs for release v1.0.0 [software/pywmflib] - https://gerrit.wikimedia.org/r/738217 (owner: Volans)
[14:05:49] !log btullis@cumin1001 START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
[14:05:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:45] (Merged) jenkins-bot: CHANGELOG: add changelogs for release v1.0.0 [software/pywmflib] - https://gerrit.wikimedia.org/r/738217 (owner: Volans)
[14:10:51] !log jynus@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2100.codfw.wmnet with OS buster
[14:10:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:39] !log jynus@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
[14:12:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:00] (PS1) Volans: Upstream release v1.0.0 [software/pywmflib] (debian) - https://gerrit.wikimedia.org/r/738219
[14:13:19] (CR) Volans: [C: +2] Upstream release v1.0.0 [software/pywmflib] (debian) - https://gerrit.wikimedia.org/r/738219 (owner: Volans)
[14:14:32] (CR) Btullis: [V: +1 C: +2] Enable refine_sanitize_delayed jobs in test [puppet] - https://gerrit.wikimedia.org/r/737658 (owner: Btullis)
[14:15:24] volans: 1.0.0 \o/
[14:15:36] !log jynus@cumin1001 START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
[14:15:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:45] elukey: I decided it was time for real semver :D
[14:15:55] (PS3) Elukey: profile::base::certificates: vary trusted_certs on realm [puppet] - https://gerrit.wikimedia.org/r/737983 (https://phabricator.wikimedia.org/T291905)
[14:15:57] (PS5) Elukey: Move coal, navtiming and statsv to the new CA bundle [puppet] - https://gerrit.wikimedia.org/r/737970 (https://phabricator.wikimedia.org/T291905)
[14:17:02] (Merged) jenkins-bot: Upstream release v1.0.0 [software/pywmflib] (debian) - https://gerrit.wikimedia.org/r/738219 (owner: Volans)
[14:17:18] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:17:26] (CR) Elukey: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32372/console" [puppet] - https://gerrit.wikimedia.org/r/737983 (https://phabricator.wikimedia.org/T291905) (owner: Elukey)
[14:17:56] (PS2) Btullis: Enable refine_sanitize_delayed jobs in test [puppet] - https://gerrit.wikimedia.org/r/737658
[14:21:58] (PS3) Btullis: Enable refine_sanitize_delayed jobs in test [puppet] - https://gerrit.wikimedia.org/r/737658
[14:21:59] !log uploaded python3-wmflib_1.0.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
[14:22:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:25] (CR) Volans: "recheck" [software/spicerack] - https://gerrit.wikimedia.org/r/737993 (owner: Volans)
[14:24:05] (CR) Btullis: [C: +2] Enable refine_sanitize_delayed jobs in test [puppet] - https://gerrit.wikimedia.org/r/737658 (owner: Btullis)
[14:24:25] (PS1) Majavah: dynamicproxy: add tls to api [puppet] - https://gerrit.wikimedia.org/r/738220
[14:24:48] (PS2) Majavah: dynamicproxy: add tls to api [puppet] - https://gerrit.wikimedia.org/r/738220
[14:26:07] (CR) Btullis: [C: +2] Revert the temporary change that was made for transfer.py [homer/public] - https://gerrit.wikimedia.org/r/737906 (https://phabricator.wikimedia.org/T295312) (owner: Btullis)
[14:26:55] (Merged) jenkins-bot: Revert the temporary change that was made for transfer.py [homer/public] - https://gerrit.wikimedia.org/r/737906 (https://phabricator.wikimedia.org/T295312) (owner: Btullis)
[14:28:04] (CR) Ladsgroup: [V: +2 C: +2] "deploying" [puppet] - https://gerrit.wikimedia.org/r/737926 (https://phabricator.wikimedia.org/T265990) (owner: Ladsgroup)
[14:30:27] (CR) Volans: [C: +2] Adopt pathlib.Path [software/spicerack] - https://gerrit.wikimedia.org/r/737993 (owner: Volans)
[14:31:14] !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[14:31:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:32:33] !log jynus@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
[14:32:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:27] !log btullis@cumin1001 END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
[14:33:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:37:26] (PS1) 4nn1l2: Change votewiki language back to English [mediawiki-config] - https://gerrit.wikimedia.org/r/738222 (https://phabricator.wikimedia.org/T292685)
[14:37:46] (PS1) Ladsgroup: orchestrator: Open idp to nda and wmf [puppet] - https://gerrit.wikimedia.org/r/738223
[14:37:57] (Merged) jenkins-bot: Adopt pathlib.Path [software/spicerack] - https://gerrit.wikimedia.org/r/737993 (owner: Volans)
[14:38:08] (PS2) Ladsgroup: orchestrator: Open idp to nda and wmf [puppet] - https://gerrit.wikimedia.org/r/738223
[14:38:11] !log btullis@cumin1001 START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - btullis@cumin1001
[14:38:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:15] (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/738223 (owner: Ladsgroup)
[14:40:58] (CR) Ladsgroup: [V: +2 C: +2] "PCC https://puppet-compiler.wmflabs.org/compiler1003/32373/dborch1001.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/738223 (owner: Ladsgroup)
[14:41:36] !log installing libxstream-java security updates
[14:41:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:26] (PS4) Elukey: profile::base::certificates: vary trusted_certs on realm [puppet] - https://gerrit.wikimedia.org/r/737983 (https://phabricator.wikimedia.org/T291905)
[14:42:28] (PS6) Elukey: Move coal, navtiming and statsv to the new CA bundle [puppet] - https://gerrit.wikimedia.org/r/737970 (https://phabricator.wikimedia.org/T291905)
[14:42:30] !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:42:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:32] (CR) Elukey: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32374/console" [puppet] - https://gerrit.wikimedia.org/r/737983 (https://phabricator.wikimedia.org/T291905) (owner: Elukey)
[14:44:46] (CR) Alexandros Kosiaris: [C: -1] admin_ng: Create Certificates for ingressgateway (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/737975 (https://phabricator.wikimedia.org/T295385) (owner: JMeybohm)
[14:46:05] (PS12) Ideophagous: Bug:T291737 Squashed two commits into one, previous commit comments follow: Bug:T291737 Change-Id: Ib263a5419c6ace911a597d025b28d6ef13549c10 [mediawiki-config] - https://gerrit.wikimedia.org/r/735713
[14:46:13] !log installing sqlalchemy security updates on stretch
[14:46:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:46:56] (PS22) Giuseppe Lavagetto: mediawiki: add support for multiple versions in the web configuration [puppet] - https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450)
[14:46:58] (PS13) Giuseppe Lavagetto: mediawiki::php: report prometheus metrics for all php versions [puppet] - https://gerrit.wikimedia.org/r/737929
[14:47:00] (PS6) Giuseppe Lavagetto: deployment-prep: install php 7.4 on a mw appserver [puppet] - https://gerrit.wikimedia.org/r/738194
[14:47:47] (CR) jerkins-bot: [V: -1] mediawiki: add support for multiple versions in the web configuration [puppet] - https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450) (owner: Giuseppe Lavagetto)
[14:49:06] (CR) MMandere: [C: +2] site: Add drmrs cache instances [puppet] - https://gerrit.wikimedia.org/r/738199 (https://phabricator.wikimedia.org/T282787) (owner: MMandere)
[14:49:52] (CR) jerkins-bot: [V: -1] mediawiki::php: report prometheus metrics for all php versions [puppet] - https://gerrit.wikimedia.org/r/737929 (owner: Giuseppe Lavagetto)
[14:50:11] !log btullis@cumin1001 END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - btullis@cumin1001
[14:50:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:44] (PS3) Ssingh: dnsrecursor: add support for enabling EDNS padding [puppet] - https://gerrit.wikimedia.org/r/736776 (https://phabricator.wikimedia.org/T274431)
[14:52:06] !log jynus@cumin1001 START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
[14:52:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:15] (CR) Ssingh: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32375/console" [puppet] - https://gerrit.wikimedia.org/r/736776 (https://phabricator.wikimedia.org/T274431) (owner: Ssingh)
[14:54:05] (CR) Ssingh: [V: +1] "Thanks for the review! Rebased, no code change." [puppet] - https://gerrit.wikimedia.org/r/736776 (https://phabricator.wikimedia.org/T274431) (owner: Ssingh)
[14:54:15] (CR) Ssingh: [V: +1 C: +2] dnsrecursor: add support for enabling EDNS padding [puppet] - https://gerrit.wikimedia.org/r/736776 (https://phabricator.wikimedia.org/T274431) (owner: Ssingh)
[14:55:08] !log installing PHP 7.0 security updates
[14:55:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:59:00] (PS1) Ssingh: wikidough: enable support for EDNS padding [puppet] - https://gerrit.wikimedia.org/r/738251 (https://phabricator.wikimedia.org/T274431)
[14:59:49] !log installing krb5 security updates on buster/bullseye (client-side libs/tools only, KDCs already fixed)
[14:59:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:02] (CR) Ssingh: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32376/console" [puppet] - https://gerrit.wikimedia.org/r/738251 (https://phabricator.wikimedia.org/T274431) (owner: Ssingh)
[15:01:55] (PS20) Giuseppe Lavagetto: mediawiki::php: support multiple php version in monitoring too [puppet] - https://gerrit.wikimedia.org/r/736949 (https://phabricator.wikimedia.org/T293450)
[15:01:57] (PS23) Giuseppe Lavagetto: mediawiki: add support for multiple versions in the web configuration [puppet] - https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450)
[15:01:59] (PS14) Giuseppe Lavagetto: mediawiki::php: report prometheus metrics for all php versions [puppet] - https://gerrit.wikimedia.org/r/737929
[15:02:01] (PS7) Giuseppe Lavagetto: deployment-prep: install php 7.4 on a mw appserver [puppet] - https://gerrit.wikimedia.org/r/738194
[15:03:12] (CR) Giuseppe Lavagetto: [C: +1] mediawiki::php: support multiple php version in monitoring too [puppet] - https://gerrit.wikimedia.org/r/736949 (https://phabricator.wikimedia.org/T293450) (owner: Giuseppe Lavagetto)
[15:03:30] (CR) Ssingh: [V: +1 C: +2] wikidough: enable support for EDNS padding [puppet] - https://gerrit.wikimedia.org/r/738251 (https://phabricator.wikimedia.org/T274431) (owner: Ssingh)
[15:04:30] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 45.62 ms
[15:06:59] SRE, Traffic, Patch-For-Review: Wikidough: Support EDNS(0) Padding: RFC 7830 and RFC 8467 - https://phabricator.wikimedia.org/T274431 (ssingh) Responses are being padded to 468 bytes, as expected and per the RFC: ` kdig @185.71.138.138 +tls-ca +tls-host=wikimedia-dns.org wikipedia.org ;; TLS session...
[15:08:20] SRE, Traffic, Patch-For-Review: Wikidough: Support EDNS(0) Padding: RFC 7830 and RFC 8467 - https://phabricator.wikimedia.org/T274431 (ssingh) An example of an incorrectly padded response (and as we reported to dnsdist developers): ` kdig @185.71.138.138 +tls-ca +tls-host=wikimedia-dns.org example....
[15:16:22] !log jynus@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
[15:16:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:17:21] (CR) Alexandros Kosiaris: [C: +2] "Thanks! That definitely is better than the alternative of using if $::realm clauses" [puppet] - https://gerrit.wikimedia.org/r/737200 (https://phabricator.wikimedia.org/T281986) (owner: Majavah)
[15:18:27] !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS buster
[15:18:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:18:37] SRE, Traffic, Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host cp6001.drmrs.wmnet with OS buster
[15:32:03] (CR) Jbond: [C: +1] "this lgtm" [puppet] - https://gerrit.wikimedia.org/r/737983 (https://phabricator.wikimedia.org/T291905) (owner: Elukey)
[15:33:20] (CR) Giuseppe Lavagetto: [C: +1] "I tested the change in the beta cluster, and everything seems to work properly." [puppet] - https://gerrit.wikimedia.org/r/736949 (https://phabricator.wikimedia.org/T293450) (owner: Giuseppe Lavagetto)
[15:38:01] (CR) Giuseppe Lavagetto: [C: +1] "Tested on the beta cluster." [puppet] - https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450) (owner: Giuseppe Lavagetto)
[15:39:16] (CR) Giuseppe Lavagetto: [C: +1] "tested on the beta cluster." [puppet] - https://gerrit.wikimedia.org/r/737929 (owner: Giuseppe Lavagetto)
[15:44:10] !log mmandere@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp6001.drmrs.wmnet with OS buster
[15:44:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:44:20] SRE, Traffic, Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp6001.drmrs.wmnet with OS buster executed with errors: - cp6001...
[15:49:21] !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS buster
[15:49:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:53:12] (PS1) Jcrespo: install_server: Wipe db1139 and db2100 instead of maintaining data [puppet] - https://gerrit.wikimedia.org/r/738259 (https://phabricator.wikimedia.org/T280979)
[15:53:58] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:57:25] (CR) Jcrespo: [C: +2] install_server: Wipe db1139 and db2100 instead of maintaining data [puppet] - https://gerrit.wikimedia.org/r/738259 (https://phabricator.wikimedia.org/T280979) (owner: Jcrespo)
[15:59:18] SRE, SRE-Access-Requests: Requesting access to deployment for SCherukuwada - https://phabricator.wikimedia.org/T295550 (RhinosF1) Note: {T295552} also open
[15:59:46] SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for SCherukuwada - https://phabricator.wikimedia.org/T295552 (RhinosF1) Note: {T295550} also open
[16:00:26] SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for SCherukuwada - https://phabricator.wikimedia.org/T295552 (RhinosF1) @Ottomata: does this need your approval?
[16:01:59] (PS1) Jbond: (WIP) Pathlib: switch to pathlib vs os.path everywhere [software/puppet-compiler] - https://gerrit.wikimedia.org/r/738261
[16:03:17] (CR) jerkins-bot: [V: -1] (WIP) Pathlib: switch to pathlib vs os.path everywhere [software/puppet-compiler] - https://gerrit.wikimedia.org/r/738261 (owner: Jbond)
[16:05:42] (PS1) Muehlenhoff: Add ownership annotations for additional Traffic services [puppet] - https://gerrit.wikimedia.org/r/738262 (https://phabricator.wikimedia.org/T216088)
[16:06:18] (CR) Giuseppe Lavagetto: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32377/console" [puppet] - https://gerrit.wikimedia.org/r/738193 (owner: Giuseppe Lavagetto)
[16:08:55] (PS2) Jbond: (WIP) Pathlib: switch to pathlib vs os.path everywhere [software/puppet-compiler] - https://gerrit.wikimedia.org/r/738261
[16:10:41] (CR) jerkins-bot: [V: -1] (WIP) Pathlib: switch to pathlib vs os.path everywhere [software/puppet-compiler] - https://gerrit.wikimedia.org/r/738261 (owner: Jbond)
[16:12:38] !log jynus@cumin1001 START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
[16:12:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:25] (PS1) Muehlenhoff: Add ownership annotations for additional Data Persistence services [puppet] - https://gerrit.wikimedia.org/r/738265 (https://phabricator.wikimedia.org/T216088)
[16:15:12] !log mmandere@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp6001.drmrs.wmnet with OS buster
[16:15:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:15:21] SRE, Traffic, Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp6001.drmrs.wmnet with OS buster executed with errors: - cp6001...
[16:16:33] (PS1) AOkoth: Merge branch 'production' of ssh://gerrit.wikimedia.org:29418/operations/puppet into production [puppet] - https://gerrit.wikimedia.org/r/738267
[16:17:12] (CR) jerkins-bot: [V: -1] Merge branch 'production' of ssh://gerrit.wikimedia.org:29418/operations/puppet into production [puppet] - https://gerrit.wikimedia.org/r/738267 (owner: AOkoth)
[16:17:30] (Abandoned) AOkoth: Merge branch 'production' of ssh://gerrit.wikimedia.org:29418/operations/puppet into production [puppet] - https://gerrit.wikimedia.org/r/738267 (owner: AOkoth)
[16:20:36] (PS1) Jcrespo: install_server: manually setup db1139 and db2100 [puppet] - https://gerrit.wikimedia.org/r/738269 (https://phabricator.wikimedia.org/T280979)
[16:22:51] (CR) Jcrespo: [C: +2] install_server: manually setup db1139 and db2100 [puppet] - https://gerrit.wikimedia.org/r/738269 (https://phabricator.wikimedia.org/T280979) (owner: Jcrespo)
[16:25:18] PROBLEM - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is CRITICAL: 102 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:26:16] !log jynus@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1139.eqiad.wmnet with OS buster
[16:26:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:26:34] !log jynus@cumin1001 START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
[16:26:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:26:40] !log jynus@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
[16:26:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:27:24] RECOVERY - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is OK: (C)100 gt (W)50 gt 50 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:28:02] !log jynus@cumin1001 START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
[16:28:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:08] !log jynus@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
[16:28:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:19] (CR) Ssingh: [C: +1] Add ownership annotations for additional Traffic services [puppet] - https://gerrit.wikimedia.org/r/738262 (https://phabricator.wikimedia.org/T216088) (owner: Muehlenhoff)
[16:30:00] !log jynus@cumin1001 START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
[16:30:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:29] (CR) Muehlenhoff: Add ownership annotations for additional Data Persistence services (1 comment) [puppet] - https://gerrit.wikimedia.org/r/738265 (https://phabricator.wikimedia.org/T216088) (owner: Muehlenhoff)
[16:31:38] (PS3) Arturo Borrero Gonzalez: ceph::control: enable auth deploy on eqiad and remove unused vars [puppet] - https://gerrit.wikimedia.org/r/737936 (https://phabricator.wikimedia.org/T293752) (owner: David Caro)
[16:36:44] (PS1) Hnowlan: cassandra: move cluster:user relation from 1:1 relation to a 1:many [puppet] - https://gerrit.wikimedia.org/r/738270 (https://phabricator.wikimedia.org/T235299)
[16:37:22] (CR) jerkins-bot: [V: -1] cassandra: move cluster:user relation from 1:1 relation to a 1:many [puppet] - https://gerrit.wikimedia.org/r/738270 (https://phabricator.wikimedia.org/T235299) (owner: Hnowlan)
[16:39:02] (PS2) Hnowlan: cassandra: move cluster:user relation from 1:1 relation to a 1:many [puppet] - https://gerrit.wikimedia.org/r/738270 (https://phabricator.wikimedia.org/T235299)
[16:46:39] (PS3) Hnowlan: cassandra: move cluster:user relation from 1:1 relation to a 1:many [puppet] - https://gerrit.wikimedia.org/r/738270 (https://phabricator.wikimedia.org/T235299)
[16:48:05] (CR) Hnowlan: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32381/console" [puppet] - https://gerrit.wikimedia.org/r/738270 (https://phabricator.wikimedia.org/T235299) (owner: Hnowlan)
[16:48:53] (CR) Arturo Borrero Gonzalez: "PCC in this patch version: https://puppet-compiler.wmflabs.org/compiler1001/32380/" [puppet] - https://gerrit.wikimedia.org/r/737936 (https://phabricator.wikimedia.org/T293752) (owner: David Caro)
[16:51:25] (CR) Jcrespo: "Not sure what that is, ok at first with the other stuff." [puppet] - https://gerrit.wikimedia.org/r/738265 (https://phabricator.wikimedia.org/T216088) (owner: Muehlenhoff)
[16:52:42] (CR) Mepps: [C: +1] "LGTM - ready for deploy" [mediawiki-config] - https://gerrit.wikimedia.org/r/737503 (https://phabricator.wikimedia.org/T293798) (owner: EllenR)
[16:53:19] (PS1) Arturo Borrero Gonzalez: hieradata: cloud: eqiad1: relocate hiera key from common/ to eqiad/ [puppet] - https://gerrit.wikimedia.org/r/738271
[16:55:00] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:56:17] !log jynus@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
[16:56:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:59:40] (PS2) Arturo Borrero Gonzalez: hieradata: cloud: eqiad1: relocate hiera key from common/ to eqiad/ [puppet] - https://gerrit.wikimedia.org/r/738271
[17:00:01] (PS1) Hnowlan: cassandra: add stub values for new credentials format [labs/private] - https://gerrit.wikimedia.org/r/738272 (https://phabricator.wikimedia.org/T235299)
[17:00:04] jbond and rzl: Dear deployers, time to do the Puppet request window deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211111T1700).
[17:00:04] No Gerrit patches in the queue for this window AFAICS.
[17:06:18] SRE-swift-storage, ops-codfw, DC-Ops: hw troubleshooting: disk failure (sdr) for ms-be2059.codfw.wmnet - https://phabricator.wikimedia.org/T295563 (MatthewVernon)
[17:06:52] (CR) Hnowlan: [V: +1] "Proper testing blocked on https://gerrit.wikimedia.org/r/c/labs/private/+/738272/" [puppet] - https://gerrit.wikimedia.org/r/738270 (https://phabricator.wikimedia.org/T235299) (owner: Hnowlan)
[17:08:18] (CR) Arturo Borrero Gonzalez: [C: +1] "PCC is a NOOP: https://puppet-compiler.wmflabs.org/compiler1002/32383/" [puppet] - https://gerrit.wikimedia.org/r/738271 (owner: Arturo Borrero Gonzalez)
[17:09:28] (CR) Arturo Borrero Gonzalez: [C: -1] "HEADS UP: I plan to merge this first https://gerrit.wikimedia.org/r/c/operations/puppet/+/738271" [puppet] - https://gerrit.wikimedia.org/r/737936 (https://phabricator.wikimedia.org/T293752) (owner: David Caro)
[17:14:24] SRE-swift-storage, ops-codfw, DC-Ops: hw troubleshooting: disk failure (sdr) for ms-be2059.codfw.wmnet - https://phabricator.wikimedia.org/T295563 (MatthewVernon) {F34742354} Here's the SupportAssistCollection output.
[17:32:02] (PS1) Volans: scripts: clean temporary code from PuppetDB import [software/netbox-extras] - https://gerrit.wikimedia.org/r/738274
[17:46:07] (CR) Volans: "Some context for the reviewers that might not have followed the IRC chat." [software/netbox-extras] - https://gerrit.wikimedia.org/r/738274 (owner: Volans)
[18:00:04] chrisalbon and accraze: #bothumor I � Unicode. All rise for Services – Graphoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211111T1800).
[18:14:12] (PS3) Jbond: (WIP) Pathlib: switch to pathlib vs os.path everywhere [software/puppet-compiler] - https://gerrit.wikimedia.org/r/738261
[18:14:50] (CR) jerkins-bot: [V: -1] (WIP) Pathlib: switch to pathlib vs os.path everywhere [software/puppet-compiler] - https://gerrit.wikimedia.org/r/738261 (owner: Jbond)
[18:28:54] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[18:41:22] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.76 ms
[18:48:06] (CR) Jbond: "hmm i think we should just do the following in every test???" [software/puppet-compiler] - https://gerrit.wikimedia.org/r/738261 (owner: Jbond)
[19:08:04] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[19:14:08] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.72 ms
[21:02:10] PROBLEM - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is CRITICAL: 104 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:04:18] RECOVERY - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is OK: (C)100 gt (W)50 gt 49 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:15:42] (PS4) Jbond: (WIP) Pathlib: switch to pathlib vs os.path everywhere [software/puppet-compiler] - https://gerrit.wikimedia.org/r/738261
[21:16:27] (CR) jerkins-bot: [V: -1] (WIP) Pathlib: switch to pathlib vs os.path everywhere [software/puppet-compiler] - https://gerrit.wikimedia.org/r/738261 (owner: Jbond)
[21:29:18] (PS5) Jbond: (WIP) Pathlib: switch to pathlib vs os.path everywhere [software/puppet-compiler] - 
10https://gerrit.wikimedia.org/r/738261 [21:29:48] (03PS6) 10Jbond: (WIP) Pathlib: switch to pathlib vs os.path everywhere [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/738261 [21:30:21] (03CR) 10jerkins-bot: [V: 04-1] (WIP) Pathlib: switch to pathlib vs os.path everywhere [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/738261 (owner: 10Jbond) [22:04:23] (03PS1) 10Gergő Tisza: Enable GrowthExperiments image recommendations on eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/738284 (https://phabricator.wikimedia.org/T294878)