[00:07:06] RECOVERY - BGP status on cr3-knams is OK: BGP OK - up: 13, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [00:35:01] Southparkfan: are you having issues connecting? [00:36:36] no issues at the moment (yes, my traffic flows through the NL DCs), although the alerts concerned me [00:36:37] (03PS1) 10Legoktm: Revert "Use eswiki 20th anniversary logos" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/697466 (https://phabricator.wikimedia.org/T280908) [00:36:46] (03PS2) 10Legoktm: Revert "Use eswiki 20th anniversary logos" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/697466 (https://phabricator.wikimedia.org/T280908) [00:39:45] the wiki page says "all links are redundant" [00:40:28] (03CR) 10Legoktm: [C: 03+2] Revert "Use eswiki 20th anniversary logos" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/697466 (https://phabricator.wikimedia.org/T280908) (owner: 10Legoktm) [00:41:11] (03Merged) 10jenkins-bot: Revert "Use eswiki 20th anniversary logos" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/697466 (https://phabricator.wikimedia.org/T280908) (owner: 10Legoktm) [00:43:17] !log legoktm@deploy1002 Synchronized wmf-config/logos.php: Revert "Use eswiki 20th anniversary logos" (T280908) (duration: 01m 00s) [00:43:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:43:23] T280908: Change Spanish Wikipedia logo due to its 20th anniversary as of May 1 for one month - https://phabricator.wikimedia.org/T280908 [00:44:22] hmm, the logo is still up [00:45:16] ok, just some caching [00:46:10] losing an uplink is still not ideal :) [00:46:46] !log legoktm@deploy1002 Synchronized logos/config.yaml: Revert "Use eswiki 20th anniversary logos" (T280908) (duration: 01m 07s) [00:46:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:48:50] PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The following units failed: 
netbox_report_accounting_run.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:15:38] RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:29:16] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_event.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:47:12] RECOVERY - SSH on contint2001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [02:07:51] (03PS1) 10TrainBranchBot: Branch commit for wmf/1.37.0-wmf.8 [core] (wmf/1.37.0-wmf.8) - 10https://gerrit.wikimedia.org/r/697453 [02:07:54] (03CR) 10TrainBranchBot: [C: 03+2] Branch commit for wmf/1.37.0-wmf.8 [core] (wmf/1.37.0-wmf.8) - 10https://gerrit.wikimedia.org/r/697453 (owner: 10TrainBranchBot) [02:25:45] (03Merged) 10jenkins-bot: Branch commit for wmf/1.37.0-wmf.8 [core] (wmf/1.37.0-wmf.8) - 10https://gerrit.wikimedia.org/r/697453 (owner: 10TrainBranchBot) [03:35:52] RECOVERY - dump of m5 in eqiad on alert1001 is OK: Last dump for m5 at eqiad (db1117.eqiad.wmnet:3325) taken on 2021-06-01 03:07:29 (36 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting [04:29:10] (03PS1) 10Marostegui: Revert "db1147: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/697467 [04:29:50] (03CR) 10Marostegui: [C: 03+2] admin: Remove previous SSH key for Andrew Kostka [puppet] - 10https://gerrit.wikimedia.org/r/697063 (https://phabricator.wikimedia.org/T283940) (owner: 10Andrew-WMDE) [04:30:00] (03CR) 10Marostegui: [C: 03+2] Revert "db1147: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/697467 (owner: 10Marostegui) [04:30:55] (03PS3) 10Marostegui: data.yaml: Add Bumeh-ctr to analytics groups [puppet] - 10https://gerrit.wikimedia.org/r/695872 
(https://phabricator.wikimedia.org/T283648) [04:31:22] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Remove previous SSH key for Andrew Kostka - https://phabricator.wikimedia.org/T283940 (10Marostegui) 05Open→03Resolved This has been merged [04:33:03] 10SRE, 10Continuous-Integration-Infrastructure, 10SRE-Access-Requests: Requesting access to contint-admins for Ladsgroup - https://phabricator.wikimedia.org/T283925 (10Marostegui) I think it needs approval from @greg or @thcipriani [04:33:15] 10SRE, 10Continuous-Integration-Infrastructure, 10SRE-Access-Requests: Requesting access to contint-admins for Ladsgroup - https://phabricator.wikimedia.org/T283925 (10Marostegui) [04:34:22] 10SRE, 10SRE-Access-Requests: Requesting access to contint-roots for Ahmon Dancy - https://phabricator.wikimedia.org/T283851 (10Marostegui) p:05Triage→03Medium @dancy can you coordinate the approval for this? [04:34:28] 10SRE, 10SRE-Access-Requests: Requesting access to gitlab1001 for Ahmon Dancy (@dancy) - https://phabricator.wikimedia.org/T283850 (10Marostegui) p:05Triage→03Medium @dancy can you coordinate the approval for this? 
[04:38:38] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16238 and previous config saved to /var/cache/conftool/dbconfig/20210601-043837-root.json [04:38:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:44:06] RECOVERY - dump of m5 in codfw on alert1001 is OK: Last dump for m5 at codfw (db2078.codfw.wmnet:3325) taken on 2021-06-01 04:04:10 (36 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting [04:53:42] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16239 and previous config saved to /var/cache/conftool/dbconfig/20210601-045341-root.json [04:53:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:59:32] 10SRE, 10Analytics, 10Analytics-Kanban, 10Product-Analytics, and 2 others: Requesting access to analytics-privatedata-users for schoenbaechler - https://phabricator.wikimedia.org/T283190 (10Marostegui) 05Open→03Stalled [05:04:19] PROBLEM - Backup freshness on backup1001 is CRITICAL: Stale: 1 (gerrit1001), Fresh: 101 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring [05:08:34] (03CR) 10Marostegui: [C: 03+2] "This was checked by lego (thanks!)" [puppet] - 10https://gerrit.wikimedia.org/r/695872 (https://phabricator.wikimedia.org/T283648) (owner: 10Marostegui) [05:08:46] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16240 and previous config saved to /var/cache/conftool/dbconfig/20210601-050845-root.json [05:08:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:12:45] 10SRE, 10Analytics-Radar, 10SRE-Access-Requests: Requesting access to production shell for BUmeh - https://phabricator.wikimedia.org/T283648 (10Marostegui) 05Open→03Resolved Patch merged. 
User added to nda ldap group. Kerberos principal created. Please test the access in about 1-2h to make sure puppet h... [05:17:44] (03PS1) 10Marostegui: data.yaml: Add Jeff Mixter to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/697457 (https://phabricator.wikimedia.org/T283632) [05:20:50] (03CR) 10Marostegui: "Reviewed by lego!" [puppet] - 10https://gerrit.wikimedia.org/r/697457 (https://phabricator.wikimedia.org/T283632) (owner: 10Marostegui) [05:21:59] (03CR) 10Marostegui: [C: 03+2] data.yaml: Add Jeff Mixter to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/697457 (https://phabricator.wikimedia.org/T283632) (owner: 10Marostegui) [05:22:02] 10SRE, 10SRE-Access-Requests: Requesting access to gitlab1001 for Ahmon Dancy (@dancy) - https://phabricator.wikimedia.org/T283850 (10thcipriani) Approved as manager/product owner. [05:23:50] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16241 and previous config saved to /var/cache/conftool/dbconfig/20210601-052349-root.json [05:23:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:24:50] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: access to analytics data for wdqs for jmixter - https://phabricator.wikimedia.org/T283632 (10Marostegui) 05Open→03Resolved Access done: - Patch merged - User added to wmf ldap group - Principal created: ` root@krb1001:~# sudo manage_principals.py create j... 
[05:25:28] (03PS1) 10Marostegui: data.yaml: Add dancy to gitlab-roots [puppet] - 10https://gerrit.wikimedia.org/r/697458 (https://phabricator.wikimedia.org/T283850) [05:26:30] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to gitlab1001 for Ahmon Dancy (@dancy) - https://phabricator.wikimedia.org/T283850 (10Marostegui) [05:31:38] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16242 and previous config saved to /var/cache/conftool/dbconfig/20210601-053137-marostegui.json [05:31:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:32:53] (03PS1) 10Marostegui: db1146: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/697459 [05:33:40] (03CR) 10Marostegui: [C: 03+2] db1146: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/697459 (owner: 10Marostegui) [05:37:20] !log uploaded django-allauth_0.44.0+ds-1~bpo10+1 mailman3_3.3.3-1~bpo10+4 to apt.wm.o [05:37:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:45:04] (03CR) 10Marostegui: [C: 04-2] "Waiting for Wolfgang Kandek approval" [puppet] - 10https://gerrit.wikimedia.org/r/697458 (https://phabricator.wikimedia.org/T283850) (owner: 10Marostegui) [05:46:40] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to gitlab1001 for Ahmon Dancy (@dancy) - https://phabricator.wikimedia.org/T283850 (10Marostegui) @wkandek can you approve/deny this? [05:47:50] (03CR) 10Giuseppe Lavagetto: [C: 03+1] data.yaml: Add dancy to gitlab-roots [puppet] - 10https://gerrit.wikimedia.org/r/697458 (https://phabricator.wikimedia.org/T283850) (owner: 10Marostegui) [05:54:28] (03CR) 10Giuseppe Lavagetto: [C: 03+1] "Minor nit and a warning - we need a companion patch probably. LGTM overall." 
(031 comment) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/685462 (owner: 10Volans) [05:55:14] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to gitlab1001 for Ahmon Dancy (@dancy) - https://phabricator.wikimedia.org/T283850 (10Marostegui) [05:56:05] !log restarting mailman3 on lists1001 [05:56:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:56:42] RECOVERY - mailman3_runners on lists1001 is OK: PROCS OK: 14 processes with UID = 38 (list), regex args /usr/lib/mailman3/bin/runner https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [05:57:13] 10SRE, 10Continuous-Integration-Infrastructure, 10SRE-Access-Requests: Requesting access to contint-admins for Ladsgroup - https://phabricator.wikimedia.org/T283925 (10Marostegui) a:03Marostegui [05:57:17] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to gitlab1001 for Ahmon Dancy (@dancy) - https://phabricator.wikimedia.org/T283850 (10Marostegui) a:03Marostegui [05:59:11] 10SRE, 10SRE-Access-Requests: Requesting access to contint-roots for Ahmon Dancy - https://phabricator.wikimedia.org/T283851 (10Marostegui) [05:59:13] 10SRE, 10SRE-Access-Requests: Requesting access to contint-roots for Ahmon Dancy - https://phabricator.wikimedia.org/T283851 (10Marostegui) a:03Marostegui [06:10:33] (03CR) 10Giuseppe Lavagetto: [C: 03+2] mediawiki: add egress policies to databases [deployment-charts] - 10https://gerrit.wikimedia.org/r/693871 (owner: 10Giuseppe Lavagetto) [06:11:11] (03PS1) 10Giuseppe Lavagetto: profile::docker::builder: add new labels for base images [puppet] - 10https://gerrit.wikimedia.org/r/697460 [06:13:04] (03Merged) 10jenkins-bot: mediawiki: add egress policies to databases [deployment-charts] - 10https://gerrit.wikimedia.org/r/693871 (owner: 10Giuseppe Lavagetto) [06:13:05] 10SRE, 10Dumps-Generation: snapshot101[45] have no role, break puppet run - https://phabricator.wikimedia.org/T283545 (10ArielGlenn) Is the spam 
still showing up? If not, perhaps we can close this ticket. [06:15:32] (03CR) 10Giuseppe Lavagetto: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29738/console" [puppet] - 10https://gerrit.wikimedia.org/r/697460 (owner: 10Giuseppe Lavagetto) [06:19:46] PROBLEM - mailman3_runners on lists1001 is CRITICAL: PROCS CRITICAL: 13 processes with UID = 38 (list), regex args /usr/lib/mailman3/bin/runner https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [06:21:12] (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29739/console" [puppet] - 10https://gerrit.wikimedia.org/r/696516 (https://phabricator.wikimedia.org/T274880) (owner: 10Mforns) [06:22:58] RECOVERY - mailman3_runners on lists1001 is OK: PROCS OK: 14 processes with UID = 38 (list), regex args /usr/lib/mailman3/bin/runner https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [06:24:53] (03CR) 10Elukey: [V: 03+1 C: 03+2] "pcc looks good, tested the command on an-launcher1002 and I see correct/expected permissions on HDFS. Merging!" 
[puppet] - 10https://gerrit.wikimedia.org/r/696516 (https://phabricator.wikimedia.org/T274880) (owner: 10Mforns) [06:30:07] ACKNOWLEDGEMENT - HP RAID on labstore1007 is CRITICAL: CRITICAL: Slot 1: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Failed: 1I:1:5 - Controller: OK - Battery/Capacitor: OK --- Slot 3: OK: 1E:1:1, 1E:1:10, 1E:1:11, 1E:1:12, 1E:1:2, 1E:1:3, 1E:1:4, 1E:1:5, 1E:1:6, 1E:1:7, 1E:1:8, 1E:1:9, 1E:2:1, 1E:2:10, 1E:2:11, 1E:2:12, 1E:2:2, 1E:2:3, 1E:2:4, 1E:2:5, 1E:2:6, 1E:2:7, 1E:2:8 [06:30:08] - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T284036 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [06:30:11] 10SRE, 10ops-eqiad: Degraded RAID on labstore1007 - https://phabricator.wikimedia.org/T284036 (10ops-monitoring-bot) [06:33:11] PROBLEM - mailman3_runners on lists1001 is CRITICAL: PROCS CRITICAL: 13 processes with UID = 38 (list), regex args /usr/lib/mailman3/bin/runner https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [06:38:35] RECOVERY - mailman3_runners on lists1001 is OK: PROCS OK: 14 processes with UID = 38 (list), regex args /usr/lib/mailman3/bin/runner https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [06:49:19] PROBLEM - mailman3_runners on lists1001 is CRITICAL: PROCS CRITICAL: 13 processes with UID = 38 (list), regex args /usr/lib/mailman3/bin/runner https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [06:52:51] RECOVERY - mailman3_runners on lists1001 is OK: PROCS OK: 14 processes with UID = 38 (list), regex args /usr/lib/mailman3/bin/runner https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [06:53:20] (03CR) 10Muehlenhoff: [C: 03+2] Add logout script for sretest [puppet] - 10https://gerrit.wikimedia.org/r/695203 (https://phabricator.wikimedia.org/T283242) (owner: 10Muehlenhoff) [07:01:51] PROBLEM - mailman3_runners on 
lists1001 is CRITICAL: PROCS CRITICAL: 13 processes with UID = 38 (list), regex args /usr/lib/mailman3/bin/runner https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [07:03:18] (03PS1) 10Giuseppe Lavagetto: Fix tests for mcrouter [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/697535 [07:03:20] (03PS1) 10Giuseppe Lavagetto: Use {{ registry }} instead than the public interface [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/697536 [07:03:42] (03CR) 10Giuseppe Lavagetto: [V: 03+2 C: 03+2] Fix tests for mcrouter [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/697535 (owner: 10Giuseppe Lavagetto) [07:08:05] (03CR) 10Elukey: [C: 03+1] Use {{ registry }} instead than the public interface [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/697536 (owner: 10Giuseppe Lavagetto) [07:09:03] RECOVERY - mailman3_runners on lists1001 is OK: PROCS OK: 14 processes with UID = 38 (list), regex args /usr/lib/mailman3/bin/runner https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [07:10:38] (03CR) 10Elukey: [C: 03+1] profile::docker::builder: add new labels for base images [puppet] - 10https://gerrit.wikimedia.org/r/697460 (owner: 10Giuseppe Lavagetto) [07:11:29] (03CR) 10Giuseppe Lavagetto: [V: 03+1 C: 03+2] profile::docker::builder: add new labels for base images [puppet] - 10https://gerrit.wikimedia.org/r/697460 (owner: 10Giuseppe Lavagetto) [07:14:17] !log installing nginx security updates [07:14:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:19:03] (03CR) 10Giuseppe Lavagetto: [V: 03+2 C: 03+2] Use {{ registry }} instead than the public interface [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/697536 (owner: 10Giuseppe Lavagetto) [07:19:10] (03PS2) 10Giuseppe Lavagetto: Use {{ registry }} instead than the public interface [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/697536 [07:19:21] (03CR) 10Giuseppe Lavagetto: 
[V: 03+2 C: 03+2] Use {{ registry }} instead than the public interface [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/697536 (owner: 10Giuseppe Lavagetto) [07:21:36] (03PS1) 10Legoktm: mailman3: Renable debug logs for bounce processing [puppet] - 10https://gerrit.wikimedia.org/r/697537 [07:22:18] (03CR) 10Legoktm: [C: 03+2] mailman3: Renable debug logs for bounce processing [puppet] - 10https://gerrit.wikimedia.org/r/697537 (owner: 10Legoktm) [07:26:40] (03CR) 10Muehlenhoff: [C: 03+2] gerrit: add Java 11 packages [puppet] - 10https://gerrit.wikimedia.org/r/694523 (https://phabricator.wikimedia.org/T268225) (owner: 10Hashar) [07:26:45] !log depooling wdsq1005 (lag) [07:26:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:27:35] Gerrit will be restarted shortly over the next half hour or so for a Java upgrade [07:28:35] easy! [07:34:11] PROBLEM - mailman3_runners on lists1001 is CRITICAL: PROCS CRITICAL: 13 processes with UID = 38 (list), regex args /usr/lib/mailman3/bin/runner https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [07:35:10] (03PS5) 10Muehlenhoff: gerrit: switch to Java 11 [puppet] - 10https://gerrit.wikimedia.org/r/694524 (https://phabricator.wikimedia.org/T268225) (owner: 10Hashar) [07:46:40] (03PS1) 10Ema: cache: use 'exp' admission policy on cache_upload [puppet] - 10https://gerrit.wikimedia.org/r/697538 (https://phabricator.wikimedia.org/T144187) [07:49:45] RECOVERY - WDQS high update lag on wdqs1005 is OK: (C)4.32e+04 ge (W)2.16e+04 ge 2.152e+04 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [07:51:29] (03PS6) 10Hashar: gerrit: switch to Java 11 [puppet] - 10https://gerrit.wikimedia.org/r/694524 (https://phabricator.wikimedia.org/T268225) [07:51:35] PROBLEM - gerrit process on gerrit2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args 
^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site https://wikitech.wikimedia.org/wiki/Gerrit [07:52:36] (03CR) 10Ema: [V: 03+1] "PCC SUCCESS (NOOP 1 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29740/console" [puppet] - 10https://gerrit.wikimedia.org/r/697538 (https://phabricator.wikimedia.org/T144187) (owner: 10Ema) [07:54:41] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [07:56:19] (03CR) 10Ema: [V: 03+1 C: 03+2] cache: use 'exp' admission policy on cache_upload [puppet] - 10https://gerrit.wikimedia.org/r/697538 (https://phabricator.wikimedia.org/T144187) (owner: 10Ema) [07:56:19] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [07:58:00] (03PS7) 10Hashar: gerrit: switch to Java 11 [puppet] - 10https://gerrit.wikimedia.org/r/694524 (https://phabricator.wikimedia.org/T268225) [07:58:03] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 86 probes of 619 (alerts on 65) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [07:58:56] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/694524 (https://phabricator.wikimedia.org/T268225) (owner: 10Hashar) [07:59:35] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=routinator site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [08:01:13] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [08:02:43] !log Restarted Gerrit on gerrit2001 for Java 11 upgrade # T268225 [08:02:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:02:47] T268225: Switch Gerrit from Java 8 to Java 11 - https://phabricator.wikimedia.org/T268225 [08:03:39] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 42 probes of 619 (alerts on 65) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [08:03:59] !log Restarted Gerrit on gerrit1001 for Java 11 upgrade # T268225 [08:04:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:04:07] 10Puppet, 10SRE: Allow the deployment of users to a host without their ssh key via the admin module - https://phabricator.wikimedia.org/T212429 (10elukey) 05Open→03Resolved This was done ages ago, probably a dupe of other tasks, closing it! [08:04:48] (03PS1) 10Jcrespo: Revert "mariadb: Disable notifications on db2100 to handle its crash" [puppet] - 10https://gerrit.wikimedia.org/r/697546 [08:06:20] 10SRE, 10ops-eqiad, 10Traffic: cp1087 powercycled - https://phabricator.wikimedia.org/T278729 (10ema) @Cmjohnson: this is still happening unfortunately. The host is currently down and depooled, please feel free to try anything else that comes to mind. No heads-up needed. 
[08:08:09] PROBLEM - gerrit process on gerrit1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site https://wikitech.wikimedia.org/wiki/Gerrit [08:11:21] 10SRE, 10Analytics, 10Research-Backlog, 10WMF-Legal, 10User-Elukey: Enable layered data-access and sharing for a new form of collaboration - https://phabricator.wikimedia.org/T245833 (10elukey) [08:18:42] 10SRE, 10Security-Team, 10Wikimedia-Mailing-lists, 10Patch-For-Review: Upgrade GNU Mailman from 2.1 to Mailman3 - https://phabricator.wikimedia.org/T52864 (10RhinosF1) [08:19:43] 10SRE, 10ops-eqiad, 10Analytics-Radar: Try to move some new analytics worker nodes to different racks - https://phabricator.wikimedia.org/T276239 (10elukey) Thanks! Remaining step is to move an-worker1139 to A7, pending https://phabricator.wikimedia.org/T280203 [08:20:01] (03CR) 10Jcrespo: [C: 03+2] Revert "mariadb: Disable notifications on db2100 to handle its crash" [puppet] - 10https://gerrit.wikimedia.org/r/697546 (owner: 10Jcrespo) [08:20:17] 10SRE, 10Security-Team, 10Wikimedia-Mailing-lists, 10Patch-For-Review: Upgrade GNU Mailman from 2.1 to Mailman3 - https://phabricator.wikimedia.org/T52864 (10Legoktm) [08:20:27] (03CR) 10Volans: "Few nits and I think a bug inline" (037 comments) [puppet] - 10https://gerrit.wikimedia.org/r/695203 (https://phabricator.wikimedia.org/T283242) (owner: 10Muehlenhoff) [08:21:15] (03CR) 10Ema: [C: 03+1] "LGTM, vgutierrez confirmed on irc" [puppet] - 10https://gerrit.wikimedia.org/r/685811 (owner: 10Muehlenhoff) [08:21:28] (03CR) 10Ema: [C: 03+1] "LGTM, vgutierrez confirmed on irc" [puppet] - 10https://gerrit.wikimedia.org/r/685810 (owner: 10Muehlenhoff) [08:38:17] RECOVERY - mailman3_runners on lists1001 is OK: PROCS OK: 14 processes with UID = 38 (list), regex args /usr/lib/mailman3/bin/runner https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [08:38:26] 
(03CR) 10Muehlenhoff: [C: 03+2] Remove obsolete tlsproxy::ocsp and related configs [puppet] - 10https://gerrit.wikimedia.org/r/685810 (owner: 10Muehlenhoff) [08:51:09] ACKNOWLEDGEMENT - Backup freshness on backup1001 is CRITICAL: Stale: 1 (gerrit1001), Fresh: 101 jobs Jcrespo normal overload at the start of the month - The acknowledgement expires at: 2021-06-02 08:50:47. https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring [08:54:22] 10SRE, 10Wikimedia-Mailing-lists: Mailman3 bounce runner is running very slowly - https://phabricator.wikimedia.org/T282348 (10Legoktm) >>! In T282348#7124014, @Legoktm wrote: > ` > May 31 06:26:05 lists1001 mailman3[31349]: mailman.interfaces.member.NotAMemberError: ahalfaker@wikimedia.org is not a member of... [09:08:22] (03PS1) 10Muehlenhoff: sretest/logout.d: Followup changes [puppet] - 10https://gerrit.wikimedia.org/r/697571 [09:11:23] (03CR) 10Muehlenhoff: Add logout script for sretest (037 comments) [puppet] - 10https://gerrit.wikimedia.org/r/695203 (https://phabricator.wikimedia.org/T283242) (owner: 10Muehlenhoff) [09:15:27] (03PS3) 10Volans: Add python-build-bullseye image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/685462 [09:15:39] (03CR) 10Volans: [V: 03+1] "thanks for the reviews, replies inline" (031 comment) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/685462 (owner: 10Volans) [09:16:37] (03PS3) 10Ema: Traffic team alerts [alerts] - 10https://gerrit.wikimedia.org/r/696468 (https://phabricator.wikimedia.org/T282806) [09:19:14] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/697571 (owner: 10Muehlenhoff) [09:20:33] PROBLEM - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is CRITICAL: 119 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [09:21:11] (03CR) 10Muehlenhoff: [C: 03+2] 
sretest/logout.d: Followup changes [puppet] - 10https://gerrit.wikimedia.org/r/697571 (owner: 10Muehlenhoff) [09:21:38] (03PS1) 10Filippo Giunchedi: Revert "swift: group-writable log directory" [puppet] - 10https://gerrit.wikimedia.org/r/697573 (https://phabricator.wikimedia.org/T283951) [09:22:32] seeking cronspam-haters folks for a review of ^ [09:23:56] RECOVERY - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is OK: (C)100 gt (W)50 gt 40 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [09:24:09] godog: looking [09:24:47] thank you moritzm! [09:25:48] 10Puppet, 10User-jbond: Ensure puppet sends the correct ircd signals to update config and motd - https://phabricator.wikimedia.org/T284052 (10jbond) [09:26:51] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/697573 (https://phabricator.wikimedia.org/T283951) (owner: 10Filippo Giunchedi) [09:27:00] PROBLEM - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is CRITICAL: 171 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [09:27:19] (03CR) 10Filippo Giunchedi: [C: 03+2] Revert "swift: group-writable log directory" [puppet] - 10https://gerrit.wikimedia.org/r/697573 (https://phabricator.wikimedia.org/T283951) (owner: 10Filippo Giunchedi) [09:27:40] cheers [09:28:18] (03PS2) 10Hnowlan: maps: make maps2009 a buster imposm-based master in codfw [puppet] - 10https://gerrit.wikimedia.org/r/696418 (https://phabricator.wikimedia.org/T269582) [09:28:37] <_joe_> looking at the exceptions [09:28:47] (03PS1) 10David Caro: wmcs.backups: skip tools etcd nodes [puppet] - 10https://gerrit.wikimedia.org/r/697574 (https://phabricator.wikimedia.org/T284050) [09:29:24] RECOVERY - Check systemd state 
on thanos-fe1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:30:31] <_joe_> looks like they're a bunch of errors coming from dumps [09:30:34] <_joe_> apergos: ^^ [09:30:46] eh? [09:31:34] RECOVERY - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is OK: (C)100 gt (W)50 gt 42 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [09:31:47] parsoid? we don't parse or expand anything [09:32:06] we just get wikitext directly from the external cluster [09:32:16] (03PS1) 10Muehlenhoff: Fix args passing [puppet] - 10https://gerrit.wikimedia.org/r/697575 [09:32:24] and we're not even doing that yet... it should be dumping tables or metadata only [09:32:26] <_joe_> not sure why it says for parsoid, I saw a bunch of errors coming from snapshot regarding bad data in gzdeflate [09:32:38] <_joe_> oh I see, the alert includes timeouts, the logstash dashboard does not [09:32:52] ah [09:33:31] <_joe_> yep, they're gone now [09:33:37] heh [09:33:45] (03CR) 10Hnowlan: [V: 03+1] "PCC SUCCESS (NOOP 1 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29741/console" [puppet] - 10https://gerrit.wikimedia.org/r/696418 (https://phabricator.wikimedia.org/T269582) (owner: 10Hnowlan) [09:34:55] RECOVERY - Check systemd state on thanos-be2003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:35:52] RECOVERY - Check systemd state on thanos-be2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:36:25] RECOVERY - Check systemd state on thanos-be1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:36:31] 10SRE, 
10Patch-For-Review, 10Tracking-Neverending: Tracking and Reducing cron-spam to root@ - https://phabricator.wikimedia.org/T132324 (10fgiunchedi) [09:36:40] RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:36:40] RECOVERY - Check systemd state on thanos-fe2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:36:48] RECOVERY - Check systemd state on thanos-fe2003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:37:00] RECOVERY - Check systemd state on thanos-be2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:37:33] !log Draining Telia CT IC-307235 to do some comparative bandwidth tests from eqiad to codfw (T274234) [09:37:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:37:37] (03PS3) 10Hnowlan: maps: make maps2009 a buster imposm-based master in codfw [puppet] - 10https://gerrit.wikimedia.org/r/696418 (https://phabricator.wikimedia.org/T269582) [09:37:38] T274234: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 [09:38:36] (03CR) 10Hnowlan: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29742/console" [puppet] - 10https://gerrit.wikimedia.org/r/696418 (https://phabricator.wikimedia.org/T269582) (owner: 10Hnowlan) [09:39:01] 10SRE, 10User-MoritzMuehlenhoff, 10User-jbond: Investigate GID allocation for system users - https://phabricator.wikimedia.org/T235163 (10fgiunchedi) For completeness, the other long-term transition in place is for swift (T123918) since we change uid/gid only on (de)com of hosts. 
Doing it "online" and speedi... [09:39:14] RECOVERY - Check systemd state on thanos-fe2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:39:39] topranks, currently no cross-dc backups are happening FYI [09:41:22] RECOVERY - Check systemd state on thanos-fe1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:42:17] (03CR) 10Filippo Giunchedi: [C: 03+1] grafana: fetch operations/grafana-grizzly as /srv/grafana-grizzly [puppet] - 10https://gerrit.wikimedia.org/r/696626 (owner: 10Herron) [09:43:12] (03CR) 10Filippo Giunchedi: "See inline, LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/696627 (owner: 10Herron) [09:43:38] RECOVERY - Check systemd state on thanos-be1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:45:50] RECOVERY - Check systemd state on thanos-be1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:46:25] (03CR) 10Giuseppe Lavagetto: [C: 04-1] (WIP) mwdebug: add helmfile configuration (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/693875 (owner: 10Effie Mouzeli) [09:47:00] (03CR) 10Hnowlan: [V: 03+1] "PCC SUCCESS (NOOP 2 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29744/console" [puppet] - 10https://gerrit.wikimedia.org/r/693907 (https://phabricator.wikimedia.org/T277064) (owner: 10Hnowlan) [09:47:16] (03CR) 10Jbond: [C: 03+2] O:puppet_compiler: update redirects [puppet] - 10https://gerrit.wikimedia.org/r/696582 (https://phabricator.wikimedia.org/T264184) (owner: 10Jbond) [09:48:02] RECOVERY - Check systemd state on thanos-be2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:48:49] (03CR) 
10Volans: [C: 04-1] "Couple of things to fix, the rest of the comments are mostly nit and suggestions." (0313 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/696377 (https://phabricator.wikimedia.org/T274527) (owner: 10Muehlenhoff) [09:48:54] (03CR) 10Effie Mouzeli: (WIP) mwdebug: add helmfile configuration (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/693875 (owner: 10Effie Mouzeli) [09:49:26] (03PS11) 10Effie Mouzeli: (WIP) mwdebug: add helmfile configuration [deployment-charts] - 10https://gerrit.wikimedia.org/r/693875 [09:50:16] RECOVERY - Check systemd state on thanos-be1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:54:04] (03CR) 10Jgiannelos: [C: 03+1] postgresql::postgis: use latest packages on buster [puppet] - 10https://gerrit.wikimedia.org/r/693907 (https://phabricator.wikimedia.org/T277064) (owner: 10Hnowlan) [09:54:49] (03CR) 10Hnowlan: [V: 03+1 C: 03+2] postgresql::postgis: use latest packages on buster [puppet] - 10https://gerrit.wikimedia.org/r/693907 (https://phabricator.wikimedia.org/T277064) (owner: 10Hnowlan) [09:57:26] jynus: Thanks for confirming yes the graphs show nothing going on. 
[10:01:52] 10SRE, 10Dumps-Generation: snapshot101[45] have no role, break puppet run - https://phabricator.wikimedia.org/T283545 (10jbond) 05Open→03Resolved a:05ArielGlenn→03jbond Yes this can be closed will reopen if still and issue [10:01:55] 10SRE, 10ops-eqiad, 10DC-Ops, 10Dumps-Generation: (Need By: 2021-03-31) rack/setup/install snapshot101[1-5] - https://phabricator.wikimedia.org/T272509 (10jbond) [10:10:00] (03PS12) 10Effie Mouzeli: mwdebug: add helmfile configuration [deployment-charts] - 10https://gerrit.wikimedia.org/r/693875 (https://phabricator.wikimedia.org/T283056) [10:14:17] (03CR) 10Giuseppe Lavagetto: [C: 03+1] mwdebug: add helmfile configuration [deployment-charts] - 10https://gerrit.wikimedia.org/r/693875 (https://phabricator.wikimedia.org/T283056) (owner: 10Effie Mouzeli) [10:16:53] (03PS4) 10Ema: Traffic team alerts [alerts] - 10https://gerrit.wikimedia.org/r/696468 (https://phabricator.wikimedia.org/T282806) [10:17:10] (03PS1) 10David Caro: cumin.wmcs: add/fix eqiad1 aliases [puppet] - 10https://gerrit.wikimedia.org/r/697582 (https://phabricator.wikimedia.org/T279438) [10:18:32] (03PS1) 10JMeybohm: chartmuseum: Set cache-interval to 60s [puppet] - 10https://gerrit.wikimedia.org/r/697584 (https://phabricator.wikimedia.org/T283147) [10:18:41] (03CR) 10jerkins-bot: [V: 04-1] cumin.wmcs: add/fix eqiad1 aliases [puppet] - 10https://gerrit.wikimedia.org/r/697582 (https://phabricator.wikimedia.org/T279438) (owner: 10David Caro) [10:18:52] (03CR) 10Effie Mouzeli: [C: 03+2] mwdebug: add helmfile configuration [deployment-charts] - 10https://gerrit.wikimedia.org/r/693875 (https://phabricator.wikimedia.org/T283056) (owner: 10Effie Mouzeli) [10:19:38] (03CR) 10JMeybohm: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29745/console" [puppet] - 10https://gerrit.wikimedia.org/r/697584 (https://phabricator.wikimedia.org/T283147) (owner: 10JMeybohm) [10:21:49] (03PS13) 10Giuseppe Lavagetto: mwdebug: 
add helmfile configuration [deployment-charts] - 10https://gerrit.wikimedia.org/r/693875 (https://phabricator.wikimedia.org/T283056) (owner: 10Effie Mouzeli) [10:22:35] (03CR) 10Volans: cumin.wmcs: add/fix eqiad1 aliases (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/697582 (https://phabricator.wikimedia.org/T279438) (owner: 10David Caro) [10:22:45] (03CR) 10MarcoAurelio: "Due to an unexpected and sudden event, I won't be able to be around for the EU Window. If this is a blocker for the deployment, I can try " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/696535 (https://phabricator.wikimedia.org/T283380) (owner: 10MarcoAurelio) [10:22:48] <_joe_> effie: it needed a rebase [10:23:02] ohes [10:23:37] (03PS2) 10JMeybohm: chartmuseum: Set cache-interval to 60s [puppet] - 10https://gerrit.wikimedia.org/r/697584 (https://phabricator.wikimedia.org/T283147) [10:28:06] (03CR) 10JMeybohm: [C: 03+2] chartmuseum: Set cache-interval to 60s [puppet] - 10https://gerrit.wikimedia.org/r/697584 (https://phabricator.wikimedia.org/T283147) (owner: 10JMeybohm) [10:34:01] (03CR) 10David Caro: cumin.wmcs: add/fix eqiad1 aliases (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/697582 (https://phabricator.wikimedia.org/T279438) (owner: 10David Caro) [10:34:02] (03CR) 10Volans: "Comments inline, mostly nits." 
(039 comments) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) (owner: 10Jbond) [10:34:37] (03PS1) 10JMeybohm: Revert "Allow absenting the helmchartctl systemd timer" [puppet] - 10https://gerrit.wikimedia.org/r/697547 [10:34:46] (03PS2) 10JMeybohm: Revert "Allow absenting the helmchartctl systemd timer" [puppet] - 10https://gerrit.wikimedia.org/r/697547 [10:35:42] 10SRE, 10Continuous-Integration-Infrastructure, 10Patch-For-Review: Have linters/tests results show up as comments in files on gerrit - https://phabricator.wikimedia.org/T209149 (10awight) Thoughts about this: * Why does phpcs have to be run twice? Can we run in reporting mode the first time? Is there some... [10:35:58] (03CR) 10Volans: "reply inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/697582 (https://phabricator.wikimedia.org/T279438) (owner: 10David Caro) [10:37:04] PROBLEM - Ensure traffic_exporter for the tls instance binds on port 9322 and responds to HTTP requests on cp5007 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [10:38:03] (03PS2) 10David Caro: cumin.wmcs: add/fix eqiad1 aliases [puppet] - 10https://gerrit.wikimedia.org/r/697582 (https://phabricator.wikimedia.org/T279438) [10:38:05] (03CR) 10David Caro: cumin.wmcs: add/fix eqiad1 aliases (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/697582 (https://phabricator.wikimedia.org/T279438) (owner: 10David Caro) [10:38:28] !log jiji@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [10:38:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:39:05] 10SRE, 10Continuous-Integration-Infrastructure, 10Patch-For-Review: Have linters/tests results show up as comments in files on gerrit - https://phabricator.wikimedia.org/T209149 (10kostajh) >>! 
In T209149#7125536, @awight wrote: > Thoughts about this: > * Why does phpcs have to be run twice? Can we run in r... [10:39:49] !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on maps1009.eqiad.wmnet with reason: Postgis version juggling [10:39:50] !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1009.eqiad.wmnet with reason: Postgis version juggling [10:39:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:39:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:47:50] RECOVERY - mailman3_queue_size on lists1001 is OK: OK: mailman3 queues are below the limits https://wikitech.wikimedia.org/wiki/Mailman/Monitoring https://grafana.wikimedia.org/d/GvuAmuuGk/mailman3 [10:49:22] RECOVERY - Ensure traffic_exporter for the tls instance binds on port 9322 and responds to HTTP requests on cp5007 is OK: HTTP OK: HTTP/1.0 200 OK - 23586 bytes in 0.728 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [10:55:06] (03CR) 10Volans: "One typo, looks good syntax wise otherwise. 
I didn't check the logic of the selection as I don't have context on that, I'll leave it to yo" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/697582 (https://phabricator.wikimedia.org/T279438) (owner: 10David Caro) [10:56:49] (03PS5) 10Jbond: IDM: create new idm library with logoutd base class [software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) [10:57:14] (03CR) 10Jbond: "updated" (0314 comments) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) (owner: 10Jbond) [10:59:32] (03CR) 10jerkins-bot: [V: 04-1] IDM: create new idm library with logoutd base class [software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) (owner: 10Jbond) [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: (Dis)respected human, time to deploy European mid-day backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210601T1100). Please do the needful. [11:00:04] ma: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[11:00:53] i'll babysit ma's patch per his request [11:00:55] I can deploy today [11:01:15] (03CR) 10Urbanecm: [C: 03+2] Enable "Diff" RSS feed on meta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/696535 (https://phabricator.wikimedia.org/T283380) (owner: 10MarcoAurelio) [11:02:15] (03CR) 10Majavah: IDM: create new idm library with logoutd base class (031 comment) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) (owner: 10Jbond) [11:02:21] (03CR) 10Muehlenhoff: [C: 03+2] Fix args passing [puppet] - 10https://gerrit.wikimedia.org/r/697575 (owner: 10Muehlenhoff) [11:02:24] (03Merged) 10jenkins-bot: Enable "Diff" RSS feed on meta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/696535 (https://phabricator.wikimedia.org/T283380) (owner: 10MarcoAurelio) [11:04:14] !log jiji@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [11:04:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:04:23] 10SRE, 10Maps, 10Packaging, 10Product-Infrastructure-Team-Backlog, 10serviceops: Packaging PostGIS 3.1 for the new Maps stack - https://phabricator.wikimedia.org/T277064 (10hnowlan) postgis 3.1.1 running on maps1009 successfully. Just to note for posterity and reference, we hit some issues upgrading, w... 
[11:04:58] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: e4989d2b19e07d2a816cd7f6afae077f86aca54e: Enable "Diff" RSS feed on meta (T283380) (duration: 00m 58s) [11:05:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:05:01] T283380: Enable "Diff" RSS feed on meta.wikimedia.org - https://phabricator.wikimedia.org/T283380 [11:05:08] 10SRE, 10Maps, 10Packaging, 10Product-Infrastructure-Team-Backlog, 10serviceops: Packaging PostGIS 3.1 for the new Maps stack - https://phabricator.wikimedia.org/T277064 (10hnowlan) 05Open→03Resolved [11:05:46] * urbanecm done [11:05:47] (03PS3) 10David Caro: cumin.wmcs: add/fix eqiad1 aliases [puppet] - 10https://gerrit.wikimedia.org/r/697582 (https://phabricator.wikimedia.org/T279438) [11:05:49] (03CR) 10David Caro: cumin.wmcs: add/fix eqiad1 aliases (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/697582 (https://phabricator.wikimedia.org/T279438) (owner: 10David Caro) [11:14:45] !log jbond@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE [11:14:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:16:30] (03CR) 10Jbond: "Thanks see inline" (031 comment) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) (owner: 10Jbond) [11:16:55] !log jbond@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE [11:16:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:17:54] (03CR) 10Muehlenhoff: IDM: create new idm library with logoutd base class (031 comment) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) (owner: 10Jbond) [11:19:29] (03PS1) 10Jcrespo: install_server: Set backup2* hosts are non-reimagable [puppet] - 10https://gerrit.wikimedia.org/r/697586 
(https://phabricator.wikimedia.org/T276442) [11:20:21] (03CR) 10Jcrespo: [C: 03+2] install_server: Set backup2* hosts are non-reimagable [puppet] - 10https://gerrit.wikimedia.org/r/697586 (https://phabricator.wikimedia.org/T276442) (owner: 10Jcrespo) [11:20:45] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me, two typos inline." (032 comments) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) (owner: 10Jbond) [11:27:59] (03PS1) 10Giuseppe Lavagetto: mwdebug: fix the mcrouter configuration [deployment-charts] - 10https://gerrit.wikimedia.org/r/697587 [11:28:27] (03PS6) 10Jbond: IDM: create new idm library with logoutd base class [software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) [11:28:57] (03PS1) 10Giuseppe Lavagetto: mediawiki: re-add newline [deployment-charts] - 10https://gerrit.wikimedia.org/r/697588 [11:29:56] (03CR) 10Jbond: "corrected thanks" (032 comments) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) (owner: 10Jbond) [11:31:19] 10SRE, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on labstore1007 - https://phabricator.wikimedia.org/T284036 (10Peachey88) [11:32:49] (03CR) 10JMeybohm: [C: 03+2] Revert "Allow absenting the helmchartctl systemd timer" [puppet] - 10https://gerrit.wikimedia.org/r/697547 (owner: 10JMeybohm) [11:33:04] 10Puppet, 10Wikimedia-IRC-RC-Server, 10User-jbond: Ensure puppet sends the correct ircd signals to update config and motd - https://phabricator.wikimedia.org/T284052 (10Peachey88) [12:06:57] !log installing djvulibre security updates [12:07:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:12:35] !log re-pooling wdsq1005 (caught-up lag) [12:12:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:15:48] 10SRE, 10DBA: wmf-auto-reinstall fails on hosts that run pt-heartbeat - 
https://phabricator.wikimedia.org/T252528 (10LSobanski) A stub document capturing this is at https://wikitech.wikimedia.org/wiki/MariaDB/Rebooting_a_host. [12:16:53] hashar: re: gerrit java 11 upgrade, there are two criticals for missing gerrit processes, looks like monitoring needs tuning [12:23:39] godog: thanks, that should fix itself when puppet has run after the next patch which also adapts $JAVA_HOME [12:24:22] right now the change to Java 11 is hacked in manually into the service units so that we can more easily rollback in case there are issues with Java 11 and Gerrit [12:24:36] we'll make that change soonish [12:24:48] ah! thanks moritzm, that makes sense [12:33:14] (03PS1) 10Muehlenhoff: Add library hint for djvulibre [puppet] - 10https://gerrit.wikimedia.org/r/697591 [12:39:56] 10SRE, 10ops-eqiad, 10Analytics-Radar: Try to move some new analytics worker nodes to different racks - https://phabricator.wikimedia.org/T276239 (10Ottomata) @elukey @Cmjohnson to plan our work for T275767, do we have an ETA for the move of an-worker1139 to A7? [12:40:43] (03PS4) 10Kormat: db-repliation-tree: Display circular replication reasonably. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/696454 (https://phabricator.wikimedia.org/T283239) [12:42:22] (03PS5) 10Kormat: db-repliation-tree: Display circular replication reasonably. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/696454 (https://phabricator.wikimedia.org/T283239) [12:47:20] (03PS6) 10Kormat: db-replication-tree: Display circular replication reasonably. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/696454 (https://phabricator.wikimedia.org/T283239) [12:52:34] (03CR) 10Kormat: [C: 03+2] db-replication-tree: Display circular replication reasonably. 
[software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/696454 (https://phabricator.wikimedia.org/T283239) (owner: 10Kormat) [12:53:56] (03CR) 10Filippo Giunchedi: "LGTM overall, see inline" (033 comments) [alerts] - 10https://gerrit.wikimedia.org/r/696468 (https://phabricator.wikimedia.org/T282806) (owner: 10Ema) [12:54:55] (03Merged) 10jenkins-bot: db-replication-tree: Display circular replication reasonably. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/696454 (https://phabricator.wikimedia.org/T283239) (owner: 10Kormat) [13:03:22] 10SRE, 10Traffic, 10netops, 10User-jbond: varnish filtering: should we automatically update public_cloud_nets - https://phabricator.wikimedia.org/T270391 (10jbond) Nice work :) >>! In T270391#7120197, @cmooney wrote: > There is this script for AWS that @ema pointed me towards: > > https://gerrit.wikimed... [13:06:08] !log jynus@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on db2098.codfw.wmnet with reason: REIMAGE [13:06:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:08:17] !log jynus@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2098.codfw.wmnet with reason: REIMAGE [13:08:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:08:49] (03CR) 10Muehlenhoff: [C: 03+2] gerrit: switch to Java 11 [puppet] - 10https://gerrit.wikimedia.org/r/694524 (https://phabricator.wikimedia.org/T268225) (owner: 10Hashar) [13:08:54] (03CR) 1020after4: [C: 03+1] "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/655743 (https://phabricator.wikimedia.org/T247364) (owner: 10CRusnov) [13:09:32] (03CR) 1020after4: [C: 03+1] "afaik outbound email goes through regular outbound smtp servers." 
[puppet] - 10https://gerrit.wikimedia.org/r/655743 (https://phabricator.wikimedia.org/T247364) (owner: 10CRusnov) [13:11:40] 10SRE, 10ops-eqiad, 10Analytics-Radar: Try to move some new analytics worker nodes to different racks - https://phabricator.wikimedia.org/T276239 (10elukey) @Ottomata it depends on T280203, but I think that we can move 5 out of 6 nodes right now and then wait the last one when A7 will be freed by old nodes :) [13:12:39] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, we can remove those in a few days (they'll need manual cleanup of the Java 8 packages)." [puppet] - 10https://gerrit.wikimedia.org/r/696591 (https://phabricator.wikimedia.org/T268225) (owner: 10Hashar) [13:12:46] 10SRE, 10ops-eqiad, 10Analytics-Radar: Try to move some new analytics worker nodes to different racks - https://phabricator.wikimedia.org/T276239 (10Ottomata) Right, but if that A7 move will happen in the next couple of weeks, we might just wait. If it will happen in many months from now, then I agree let's... 
[13:12:56] RECOVERY - gerrit process on gerrit2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-11-openjdk-amd64/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site https://wikitech.wikimedia.org/wiki/Gerrit [13:12:59] (03CR) 10Thcipriani: [C: 03+1] "<3" [puppet] - 10https://gerrit.wikimedia.org/r/697458 (https://phabricator.wikimedia.org/T283850) (owner: 10Marostegui) [13:15:36] (03CR) 10Andrew Bogott: [C: 03+1] wmcs.backups: skip tools etcd nodes [puppet] - 10https://gerrit.wikimedia.org/r/697574 (https://phabricator.wikimedia.org/T284050) (owner: 10David Caro) [13:20:08] RECOVERY - gerrit process on gerrit1001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-11-openjdk-amd64/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site https://wikitech.wikimedia.org/wiki/Gerrit [13:20:12] (03PS5) 10Ema: Traffic team alerts [alerts] - 10https://gerrit.wikimedia.org/r/696468 (https://phabricator.wikimedia.org/T282806) [13:20:58] (03CR) 10Ema: Traffic team alerts (033 comments) [alerts] - 10https://gerrit.wikimedia.org/r/696468 (https://phabricator.wikimedia.org/T282806) (owner: 10Ema) [13:22:11] (03PS1) 10Ema: Add __pycache__ directory to .gitignore [alerts] - 10https://gerrit.wikimedia.org/r/697598 [13:22:43] (03CR) 10Effie Mouzeli: [C: 03+1] mwdebug: fix the mcrouter configuration [deployment-charts] - 10https://gerrit.wikimedia.org/r/697587 (owner: 10Giuseppe Lavagetto) [13:22:54] (03CR) 10Effie Mouzeli: [C: 03+1] mediawiki: re-add newline [deployment-charts] - 10https://gerrit.wikimedia.org/r/697588 (owner: 10Giuseppe Lavagetto) [13:23:35] (03CR) 10Jgiannelos: Maps vector server PostGIS query improvements [deployment-charts] - 10https://gerrit.wikimedia.org/r/685799 (owner: 10Jgiannelos) [13:28:16] (03PS2) 10Jgiannelos: Maps vector server PostGIS query improvements [deployment-charts] - 10https://gerrit.wikimedia.org/r/685799 
(https://phabricator.wikimedia.org/T281976) [13:30:47] (03CR) 10Effie Mouzeli: [C: 03+2] mediawiki: re-add newline [deployment-charts] - 10https://gerrit.wikimedia.org/r/697588 (owner: 10Giuseppe Lavagetto) [13:31:03] (03CR) 10Effie Mouzeli: [C: 03+2] mwdebug: fix the mcrouter configuration [deployment-charts] - 10https://gerrit.wikimedia.org/r/697587 (owner: 10Giuseppe Lavagetto) [13:32:43] (03PS7) 10Muehlenhoff: Cookbook to add a new node to a Ganeti cluster [cookbooks] - 10https://gerrit.wikimedia.org/r/696377 (https://phabricator.wikimedia.org/T274527) [13:33:50] (03Merged) 10jenkins-bot: mwdebug: fix the mcrouter configuration [deployment-charts] - 10https://gerrit.wikimedia.org/r/697587 (owner: 10Giuseppe Lavagetto) [13:33:52] (03Merged) 10jenkins-bot: mediawiki: re-add newline [deployment-charts] - 10https://gerrit.wikimedia.org/r/697588 (owner: 10Giuseppe Lavagetto) [13:36:43] (03CR) 10jerkins-bot: [V: 04-1] Cookbook to add a new node to a Ganeti cluster [cookbooks] - 10https://gerrit.wikimedia.org/r/696377 (https://phabricator.wikimedia.org/T274527) (owner: 10Muehlenhoff) [13:38:59] (03PS8) 10Muehlenhoff: Cookbook to add a new node to a Ganeti cluster [cookbooks] - 10https://gerrit.wikimedia.org/r/696377 (https://phabricator.wikimedia.org/T274527) [13:40:45] (03PS1) 10Effie Mouzeli: mediawiki: Fix nutcracker port [deployment-charts] - 10https://gerrit.wikimedia.org/r/697599 [13:41:36] (03CR) 10jerkins-bot: [V: 04-1] mediawiki: Fix nutcracker port [deployment-charts] - 10https://gerrit.wikimedia.org/r/697599 (owner: 10Effie Mouzeli) [13:42:53] (03CR) 10Muehlenhoff: Cookbook to add a new node to a Ganeti cluster (0313 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/696377 (https://phabricator.wikimedia.org/T274527) (owner: 10Muehlenhoff) [13:43:03] (03PS9) 10Muehlenhoff: Cookbook to add a new node to a Ganeti cluster [cookbooks] - 10https://gerrit.wikimedia.org/r/696377 (https://phabricator.wikimedia.org/T274527) [13:43:17] !log Restoring Telia CT 
IC-307235 to normal metric / bring back into service (T274234) [13:43:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:43:22] T274234: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 [13:45:48] !log otto@deploy1002 Started deploy [analytics/refinery@c0a02e5] (hadoop-test): deploy to an-test-coord1001 to get airflow/dags/hello_world.py - T272973 [13:45:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:45:52] T272973: Generalize the current Airflow puppet/scap code to deploy a dedicated Analytics instance - https://phabricator.wikimedia.org/T272973 [13:48:46] !log otto@deploy1002 Finished deploy [analytics/refinery@c0a02e5] (hadoop-test): deploy to an-test-coord1001 to get airflow/dags/hello_world.py - T272973 (duration: 02m 58s) [13:48:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:48:55] (03PS12) 10Ottomata: Initial debianization and 2.1.0-py3.7-1 release [debs/airflow] (debian) - 10https://gerrit.wikimedia.org/r/693222 (https://phabricator.wikimedia.org/T277012) [13:50:20] (03PS1) 10Urbanecm: cawiki: Fix help panel links [mediawiki-config] - 10https://gerrit.wikimedia.org/r/697602 (https://phabricator.wikimedia.org/T280673) [13:50:31] jouncebot: now [13:50:31] No deployments scheduled for the next 2 hour(s) and 9 minute(s) [13:50:35] jouncebot: next [13:50:35] In 2 hour(s) and 9 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210601T1600) [13:50:45] (03CR) 10Urbanecm: [C: 03+2] cawiki: Fix help panel links [mediawiki-config] - 10https://gerrit.wikimedia.org/r/697602 (https://phabricator.wikimedia.org/T280673) (owner: 10Urbanecm) [13:51:32] (03Merged) 10jenkins-bot: cawiki: Fix help panel links [mediawiki-config] - 10https://gerrit.wikimedia.org/r/697602 (https://phabricator.wikimedia.org/T280673) (owner: 
10Urbanecm) [13:52:47] (03PS1) 10Ottomata: airflow test - ensure analytics instance is absent, add analytics-test [puppet] - 10https://gerrit.wikimedia.org/r/697603 (https://phabricator.wikimedia.org/T272973) [13:53:03] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 3f757748a14ac8c205f6a5fac0611216c01ceb1c: cawiki: Fix help panel links (T280673) (duration: 00m 58s) [13:53:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:53:07] T280673: Deploy Growth features on Catalan Wikipedia - https://phabricator.wikimedia.org/T280673 [13:53:18] (03CR) 10jerkins-bot: [V: 04-1] airflow test - ensure analytics instance is absent, add analytics-test [puppet] - 10https://gerrit.wikimedia.org/r/697603 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [13:53:25] !log Draining Lumen CCT 442550293 to do some comparative bandwidth tests from eqiad to codfw (T274234) [13:53:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:53:29] T274234: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 [13:54:45] (03CR) 10Muehlenhoff: [C: 03+2] Add library hint for djvulibre [puppet] - 10https://gerrit.wikimedia.org/r/697591 (owner: 10Muehlenhoff) [13:55:30] 10SRE, 10ops-codfw, 10DBA: codfw: db2079 memory issue on DIMM B8 - https://phabricator.wikimedia.org/T283743 (10Papaul) @Marostegui hello you can go ahead and depool the server i will be on site in about an hour. 
Thanks [13:55:50] (03CR) 10Muehlenhoff: "> Patch Set 3:" [puppet] - 10https://gerrit.wikimedia.org/r/525220 (https://phabricator.wikimedia.org/T46722) (owner: 10Muehlenhoff) [13:56:06] 10ops-codfw, 10DBA, 10Data-Persistence-Backup: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (10Papaul) @jcrespo hello you can go ahead and depool the server i will be on site in about an hour to swap the DIMM. Thanks [13:56:13] (03PS2) 10Ottomata: airflow test - ensure analytics instance is absent, add analytics-test [puppet] - 10https://gerrit.wikimedia.org/r/697603 (https://phabricator.wikimedia.org/T272973) [13:56:18] 10SRE, 10ops-codfw, 10DBA: codfw: db2079 memory issue on DIMM B8 - https://phabricator.wikimedia.org/T283743 (10Marostegui) Excellent, thanks @Papaul [13:56:32] !log Stop mysql on db2079 (codfw master) - T283743 [13:56:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:56:36] T283743: codfw: db2079 memory issue on DIMM B8 - https://phabricator.wikimedia.org/T283743 [13:57:02] (03PS3) 10Ottomata: airflow test - ensure analytics instance is absent, add analytics-test [puppet] - 10https://gerrit.wikimedia.org/r/697603 (https://phabricator.wikimedia.org/T272973) [13:57:24] 10SRE, 10ops-eqiad, 10cloud-services-team (Hardware): cloudvirt1040 primary NIC disconnected - https://phabricator.wikimedia.org/T281399 (10Andrew) It's still up! I will re-enable monitoring and if it doesn't flap then we can declare it to be cured. [13:58:10] (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29746/console" [puppet] - 10https://gerrit.wikimedia.org/r/697603 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [13:58:31] 10SRE, 10ops-eqiad, 10cloud-services-team (Hardware): cloudvirt1040 primary NIC disconnected - https://phabricator.wikimedia.org/T281399 (10Andrew) hm... 
mgmt shows 'DNS CRITICAL - expected '0.0.0.0' but got '10.65.0.227' -- I've never seen that before. [13:58:43] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to gitlab1001 for Ahmon Dancy (@dancy) - https://phabricator.wikimedia.org/T283850 (10wkandek) approved [13:59:01] (03CR) 10Ottomata: [V: 03+1 C: 03+2] airflow test - ensure analytics instance is absent, add analytics-test [puppet] - 10https://gerrit.wikimedia.org/r/697603 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [13:59:11] (03CR) 10Marostegui: [C: 03+2] data.yaml: Add dancy to gitlab-roots [puppet] - 10https://gerrit.wikimedia.org/r/697458 (https://phabricator.wikimedia.org/T283850) (owner: 10Marostegui) [14:00:20] 10SRE, 10ops-codfw, 10DBA: codfw: db2079 memory issue on DIMM B8 - https://phabricator.wikimedia.org/T283743 (10Marostegui) db2079 is off and ready for you @Papaul [14:00:49] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to gitlab1001 for Ahmon Dancy (@dancy) - https://phabricator.wikimedia.org/T283850 (10Marostegui) [14:02:06] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to gitlab1001 for Ahmon Dancy (@dancy) - https://phabricator.wikimedia.org/T283850 (10Marostegui) 05Open→03Resolved This is done. Please give it some time so puppet runs everywhere. [14:03:21] 10SRE, 10SRE-Access-Requests: Requesting access to contint-roots for Ahmon Dancy - https://phabricator.wikimedia.org/T283851 (10Marostegui) @thcipriani looks like you are the approval point for this one too! 
:) [14:04:16] (03PS1) 10Marostegui: data.yaml: Add dancy to contint-roots [puppet] - 10https://gerrit.wikimedia.org/r/697604 (https://phabricator.wikimedia.org/T283851) [14:04:46] (03CR) 10Marostegui: [C: 04-2] "Needs approval" [puppet] - 10https://gerrit.wikimedia.org/r/697604 (https://phabricator.wikimedia.org/T283851) (owner: 10Marostegui) [14:08:49] (03CR) 10Filippo Giunchedi: [C: 03+1] Add __pycache__ directory to .gitignore [alerts] - 10https://gerrit.wikimedia.org/r/697598 (owner: 10Ema) [14:10:24] (03CR) 10Filippo Giunchedi: [C: 03+1] Traffic team alerts [alerts] - 10https://gerrit.wikimedia.org/r/696468 (https://phabricator.wikimedia.org/T282806) (owner: 10Ema) [14:17:07] (03PS1) 10Ottomata: service_auto_restart - match full line when ensuring absent [puppet] - 10https://gerrit.wikimedia.org/r/697605 [14:17:43] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [14:19:27] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [14:19:27] (03PS1) 10Ottomata: airflow-analytics-test - set db_user [puppet] - 10https://gerrit.wikimedia.org/r/697607 (https://phabricator.wikimedia.org/T272973) [14:20:43] (03CR) 10Ottomata: "Alternatively, we could keep using match, but just change it to" [puppet] - 10https://gerrit.wikimedia.org/r/697605 (owner: 10Ottomata) [14:21:38] (03PS2) 10Ottomata: airflow-analytics-test - set db_user [puppet] - 10https://gerrit.wikimedia.org/r/697607 (https://phabricator.wikimedia.org/T272973) [14:24:13] (03PS3) 10Ottomata: airflow-analytics-test - set db_user [puppet] - 10https://gerrit.wikimedia.org/r/697607 (https://phabricator.wikimedia.org/T272973) [14:29:35] (03CR) 10Ema: [C: 03+2] Add __pycache__ directory to .gitignore [alerts] - 10https://gerrit.wikimedia.org/r/697598 (owner: 10Ema) [14:29:48] (03CR) 10Ema: [C: 03+2] Traffic team alerts [alerts] - 10https://gerrit.wikimedia.org/r/696468 (https://phabricator.wikimedia.org/T282806) (owner: 10Ema) [14:29:53] (03PS6) 10Ema: Traffic team alerts [alerts] - 10https://gerrit.wikimedia.org/r/696468 (https://phabricator.wikimedia.org/T282806) [14:32:06] (03PS4) 10Ottomata: airflow-analytics-test - set db_user [puppet] - 10https://gerrit.wikimedia.org/r/697607 (https://phabricator.wikimedia.org/T272973) [14:33:25] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 13 DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29749/console" [puppet] - 10https://gerrit.wikimedia.org/r/694465 (https://phabricator.wikimedia.org/T271967) (owner: 10Effie Mouzeli) [14:33:40] 10SRE, 10Wikimedia-Mailing-lists: Enourmous mailman3 outgoing queue - https://phabricator.wikimedia.org/T284003 (10Legoktm) 05Open→03Resolved a:03Legoktm So we were running into an exception T282348#7124014, and when the bounce runner crashed, it rolled back the transaction and un-unsubscribed all the 
us... [14:34:07] (03PS3) 10Herron: grafana: add wrapper to call grr with environment vars set [puppet] - 10https://gerrit.wikimedia.org/r/696627 [14:34:16] (03CR) 10David Caro: [C: 03+2] wmcs.backups: skip tools etcd nodes [puppet] - 10https://gerrit.wikimedia.org/r/697574 (https://phabricator.wikimedia.org/T284050) (owner: 10David Caro) [14:34:19] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/695377 (https://phabricator.wikimedia.org/T271967) (owner: 10Jbond) [14:34:38] (03CR) 10Jbond: [V: 03+1 C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/694465 (https://phabricator.wikimedia.org/T271967) (owner: 10Effie Mouzeli) [14:34:49] (03PS10) 10Jbond: (WIP) hieradata: enable tls on mc2019 (3) [puppet] - 10https://gerrit.wikimedia.org/r/694484 (https://phabricator.wikimedia.org/T271967) (owner: 10Effie Mouzeli) [14:35:12] (03PS11) 10Jbond: (WIP) hieradata: enable tls on mc2019 (3) [puppet] - 10https://gerrit.wikimedia.org/r/694484 (https://phabricator.wikimedia.org/T271967) (owner: 10Effie Mouzeli) [14:36:11] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29750/console" [puppet] - 10https://gerrit.wikimedia.org/r/694484 (https://phabricator.wikimedia.org/T271967) (owner: 10Effie Mouzeli) [14:38:38] (03CR) 10Ottomata: [C: 03+2] airflow-analytics-test - set db_user [puppet] - 10https://gerrit.wikimedia.org/r/697607 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [14:39:04] (03CR) 10Jbond: [V: 03+1 C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/694484 (https://phabricator.wikimedia.org/T271967) (owner: 10Effie Mouzeli) [14:39:57] (03PS3) 10Legoktm: lists: Drop enable_mm3 config option [puppet] - 10https://gerrit.wikimedia.org/r/693595 (owner: 10Ladsgroup) [14:39:59] (03PS2) 10Legoktm: lists: Stop routing mail to mailman2 [puppet] - 10https://gerrit.wikimedia.org/r/693599 (https://phabricator.wikimedia.org/T52864) (owner: 
10Ladsgroup) [14:40:01] (03PS4) 10Legoktm: lists: Stop mailman2 service [puppet] - 10https://gerrit.wikimedia.org/r/693600 (https://phabricator.wikimedia.org/T52864) (owner: 10Ladsgroup) [14:43:03] 10Puppet, 10SRE-OnFire, 10User-jbond: Create SRE checklist for puppet - https://phabricator.wikimedia.org/T284073 (10jbond) p:05Triage→03Medium [14:43:25] (03CR) 10Legoktm: [C: 03+2] "Oh look, it's tomorrow." [puppet] - 10https://gerrit.wikimedia.org/r/693595 (owner: 10Ladsgroup) [14:44:09] 10ops-codfw, 10DBA, 10Data-Persistence-Backup: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (10jcrespo) Shutting it down, will comment again when downtimed to prevent unwanted alerts and fully down. [14:44:44] 10SRE, 10CAS-SSO, 10Patch-For-Review, 10User-jbond: Cookbook for centralised logouts and session status queries - https://phabricator.wikimedia.org/T283242 (10jbond) [14:48:21] (03CR) 10Herron: [C: 03+2] grafana: add wrapper to call grr with environment vars set (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/696627 (owner: 10Herron) [14:49:38] (03CR) 10Herron: [C: 03+2] grafana: fetch operations/grafana-grizzly as /srv/grafana-grizzly [puppet] - 10https://gerrit.wikimedia.org/r/696626 (owner: 10Herron) [14:50:07] (03PS4) 10Herron: grafana: add wrapper to call grr with environment vars set [puppet] - 10https://gerrit.wikimedia.org/r/696627 [14:52:52] 10SRE: debmonitor.discovery.wmnet: Generate server cetificate via cfssl - https://phabricator.wikimedia.org/T281377 (10jbond) 05Open→03Resolved [14:52:57] 10SRE, 10CFSSL-PKI, 10Patch-For-Review: Additional CFSSL tasks - https://phabricator.wikimedia.org/T281369 (10jbond) [14:53:48] (03CR) 10Legoktm: [C: 03+2] "RIH." 
[puppet] - 10https://gerrit.wikimedia.org/r/693599 (https://phabricator.wikimedia.org/T52864) (owner: 10Ladsgroup) [14:55:06] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] cumin.wmcs: add/fix eqiad1 aliases [puppet] - 10https://gerrit.wikimedia.org/r/697582 (https://phabricator.wikimedia.org/T279438) (owner: 10David Caro) [14:55:11] 10SRE, 10Dumps-Generation, 10Wikidata, 10observability, and 2 others: various weekly and daily dumps run from systemd timers are broken - https://phabricator.wikimedia.org/T281267 (10jbond) a:05jbond→03None [14:55:16] 10SRE, 10serviceops: ifup@eno1.service failed on some buster hosts - https://phabricator.wikimedia.org/T270220 (10jbond) 05Open→03Resolved [14:55:58] 10Puppet, 10GitLab (Initialization), 10Patch-For-Review, 10Release-Engineering-Team (Radar), and 2 others: Puppetise gitlab-ansible playbook - https://phabricator.wikimedia.org/T283076 (10jbond) [14:56:00] 10SRE, 10CFSSL-PKI, 10Patch-For-Review, 10User-jbond: Additional CFSSL tasks - https://phabricator.wikimedia.org/T281369 (10jbond) [14:56:25] 10Puppet, 10SRE, 10Orchestrator, 10CAS-SSO, 10User-jbond: Puppet host certs do not contain Subject Alt Name entries - https://phabricator.wikimedia.org/T273637 (10jbond) [14:56:49] 10Puppet, 10SRE, 10Patch-For-Review, 10User-jbond: Review puppetmaster SSL configueration - https://phabricator.wikimedia.org/T268040 (10jbond) [14:56:50] 10SRE, 10SRE-Access-Requests: Requesting access to gitlab1001 for Ahmon Dancy (@dancy) - https://phabricator.wikimedia.org/T283850 (10dancy) @Marostegui Thanks! 
[14:57:07] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] wmcs.do_log_msg: Fixed to use the new correct port [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/696505 (owner: 10David Caro) [14:58:25] (03PS15) 10Effie Mouzeli: profile::memcached::instance: Add TLS support [puppet] - 10https://gerrit.wikimedia.org/r/694465 (https://phabricator.wikimedia.org/T271967) [14:58:38] 10SRE, 10netbox, 10Patch-For-Review, 10User-jbond: Add SSO support to netbox - https://phabricator.wikimedia.org/T244849 (10jbond) [14:59:04] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] cloudvirt.*: adding sal messages to all the cookbooks [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/696453 (owner: 10David Caro) [14:59:43] !log Restoring Lumen CCT 442550293 to normal metric / bring back into service (T274234) [14:59:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:59:48] T274234: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 [14:59:56] (03CR) 10Thcipriani: [C: 03+1] data.yaml: Add dancy to contint-roots [puppet] - 10https://gerrit.wikimedia.org/r/697604 (https://phabricator.wikimedia.org/T283851) (owner: 10Marostegui) [15:01:32] 10ops-codfw, 10DBA, 10Data-Persistence-Backup: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (10jcrespo) @Papaul The host should be down already and has been downtime'd for a day- it is all yours. Just reboot it after you are done and comment here... [15:02:57] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to contint-roots for Ahmon Dancy - https://phabricator.wikimedia.org/T283851 (10thcipriani) >>! In T283851#7126041, @Marostegui wrote: > @thcipriani looks like you are the approval point for this one too! :) Approved! Thank you! 
[15:03:38] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to contint-roots for Ahmon Dancy - https://phabricator.wikimedia.org/T283851 (10Marostegui) [15:03:45] (03CR) 10Marostegui: [C: 03+2] data.yaml: Add dancy to contint-roots [puppet] - 10https://gerrit.wikimedia.org/r/697604 (https://phabricator.wikimedia.org/T283851) (owner: 10Marostegui) [15:05:40] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to contint-roots for Ahmon Dancy - https://phabricator.wikimedia.org/T283851 (10Marostegui) 05Open→03Resolved Patch merged, please give it sometime so puppet runs everywhere. [15:07:26] (03PS1) 10Marostegui: data.yaml: Add Ladsgroup to contint-admins [puppet] - 10https://gerrit.wikimedia.org/r/697614 (https://phabricator.wikimedia.org/T283925) [15:07:57] (03CR) 10Marostegui: [C: 04-2] "Waiting for approval" [puppet] - 10https://gerrit.wikimedia.org/r/697614 (https://phabricator.wikimedia.org/T283925) (owner: 10Marostegui) [15:08:51] 10SRE, 10Data-Persistence-Backup, 10netops: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 (10jcrespo) FYI, cross-dc backups are now in a "normal state" meaning we should only have those a few hours... 
[15:09:09] (03PS1) 10Ottomata: airflow - webserver host default to localhost, Admin for public role [puppet] - 10https://gerrit.wikimedia.org/r/697615 (https://phabricator.wikimedia.org/T272973) [15:09:30] (03CR) 10Effie Mouzeli: "PCC https://puppet-compiler.wmflabs.org/compiler1002/29753/" [puppet] - 10https://gerrit.wikimedia.org/r/695377 (https://phabricator.wikimedia.org/T271967) (owner: 10Jbond) [15:09:49] (03CR) 10jerkins-bot: [V: 04-1] airflow - webserver host default to localhost, Admin for public role [puppet] - 10https://gerrit.wikimedia.org/r/697615 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [15:12:53] PROBLEM - Host db2079.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [15:14:09] ^ expected [15:14:56] (03PS2) 10Ottomata: airflow - webserver host default to localhost, Admin for public role [puppet] - 10https://gerrit.wikimedia.org/r/697615 (https://phabricator.wikimedia.org/T272973) [15:15:54] 10SRE, 10observability, 10CAS-SSO, 10User-jbond: thanos u/i gives errors if left idle for a few hours - https://phabricator.wikimedia.org/T268233 (10jbond) [15:15:56] (03CR) 10jerkins-bot: [V: 04-1] airflow - webserver host default to localhost, Admin for public role [puppet] - 10https://gerrit.wikimedia.org/r/697615 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [15:16:21] !log ryankemper@cumin1001 START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223 [15:16:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:16:25] T283223: Reboot cloudelastic* to apply security updates - https://phabricator.wikimedia.org/T283223 [15:16:30] !log T283223 `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic reboot" --reboot --nodes-per-run 1 --start-datetime 2021-05-20T05:16:40 --task-id T283223` on `ryankemper@cumin1001` tmux session 
`restart_cloudelastic` [15:16:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:17:56] (03PS1) 10Jbond: hiera: add grafana to cors list for IDP [puppet] - 10https://gerrit.wikimedia.org/r/697616 (https://phabricator.wikimedia.org/T268233) [15:18:02] 10SRE, 10Traffic, 10netops, 10User-jbond: varnish filtering: should we automatically update public_cloud_nets - https://phabricator.wikimedia.org/T270391 (10cmooney) Thanks jbond appreciate the feedback. Your improvements to the script look great. Nice work on the parsing, much cleaner than my shite, and... [15:18:43] RECOVERY - Host db2079.mgmt is UP: PING OK - Packet loss = 0%, RTA = 34.22 ms [15:19:10] 10SRE, 10Traffic, 10User-jbond: Implement machine-local forwarding DNS caches - https://phabricator.wikimedia.org/T171498 (10jbond) [15:19:16] (03PS3) 10Ottomata: airflow - webserver host default to localhost, Admin for public role [puppet] - 10https://gerrit.wikimedia.org/r/697615 (https://phabricator.wikimedia.org/T272973) [15:19:44] (03PS3) 10Jbond: O:base::resolving: drop the domain keyword and use the domain fact [puppet] - 10https://gerrit.wikimedia.org/r/690515 (https://phabricator.wikimedia.org/T171498) [15:19:59] (03CR) 10jerkins-bot: [V: 04-1] airflow - webserver host default to localhost, Admin for public role [puppet] - 10https://gerrit.wikimedia.org/r/697615 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [15:20:01] (03PS3) 10Jbond: O:base::resolving: make nameservers mandatory [puppet] - 10https://gerrit.wikimedia.org/r/690529 (https://phabricator.wikimedia.org/T171498) [15:20:11] (03PS4) 10Jbond: O:base::resolving: make nameservers mandatory [puppet] - 10https://gerrit.wikimedia.org/r/690529 (https://phabricator.wikimedia.org/T171498) [15:20:20] (03PS6) 10Jbond: O:base::resolver: unify resolv.con templates [puppet] - 10https://gerrit.wikimedia.org/r/690522 (https://phabricator.wikimedia.org/T171498) [15:20:31] (03PS7) 10Jbond: O:base::resolver: unify 
resolv.con templates [puppet] - 10https://gerrit.wikimedia.org/r/690522 (https://phabricator.wikimedia.org/T171498) [15:20:40] (03PS2) 10Jbond: resolvconf: create new class [puppet] - 10https://gerrit.wikimedia.org/r/691080 (https://phabricator.wikimedia.org/T171498) [15:20:52] (03PS3) 10Jbond: resolvconf: create new class [puppet] - 10https://gerrit.wikimedia.org/r/691080 (https://phabricator.wikimedia.org/T171498) [15:23:36] !log ryankemper@cumin1001 END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223 [15:23:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:23:40] T283223: Reboot cloudelastic* to apply security updates - https://phabricator.wikimedia.org/T283223 [15:24:15] (03PS4) 10Ottomata: airflow - webserver host default to localhost, Admin for public role [puppet] - 10https://gerrit.wikimedia.org/r/697615 (https://phabricator.wikimedia.org/T272973) [15:24:47] (03CR) 10jerkins-bot: [V: 04-1] O:base::resolving: make nameservers mandatory [puppet] - 10https://gerrit.wikimedia.org/r/690529 (https://phabricator.wikimedia.org/T171498) (owner: 10Jbond) [15:25:28] (03CR) 10jerkins-bot: [V: 04-1] O:base::resolver: unify resolv.con templates [puppet] - 10https://gerrit.wikimedia.org/r/690522 (https://phabricator.wikimedia.org/T171498) (owner: 10Jbond) [15:25:35] PROBLEM - Check systemd state on cloudelastic1002 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-wmf-elasticsearch-exporter-9200.service,prometheus-wmf-elasticsearch-exporter-9400.service,prometheus-wmf-elasticsearch-exporter-9600.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:26:06] (03CR) 10jerkins-bot: [V: 04-1] resolvconf: create new class [puppet] - 10https://gerrit.wikimedia.org/r/691080 (https://phabricator.wikimedia.org/T171498) (owner: 10Jbond) 
[15:26:27] 10SRE, 10ops-codfw, 10DBA: codfw: db2079 memory issue on DIMM B8 - https://phabricator.wikimedia.org/T283743 (10Papaul) 05Open→03Resolved Swapped DIMM B8 with DIMM A8 we will see if we do see the issue on DIMM A8 . If we do, I will use one of the DIMM from one if the Decom servers . Resolving this task... [15:27:24] (03PS25) 10Jbond: sre: convert the generic reboot functions to the cookbook class API [cookbooks] - 10https://gerrit.wikimedia.org/r/657102 (https://phabricator.wikimedia.org/T284079) [15:28:10] 10Puppet, 10SRE, 10SRE-tools, 10User-jbond: Private puppet commit hook checks current state of folder, not what is staged - https://phabricator.wikimedia.org/T278187 (10jbond) [15:28:58] (03CR) 10Ottomata: [C: 03+2] airflow - webserver host default to localhost, Admin for public role [puppet] - 10https://gerrit.wikimedia.org/r/697615 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [15:30:21] 10SRE, 10Security, 10User-jbond: Investigate potential issues with the sudoeres env_keep values - https://phabricator.wikimedia.org/T275852 (10jbond) [15:31:57] 10SRE, 10Security, 10User-jbond: Investigate potential issues with the sudoeres env_keep values - https://phabricator.wikimedia.org/T275852 (10jbond) @MoritzMuehlenhoff @faidon are you able to comment in relation to the history around why we have `HOME` in sudoes `env_keep` and more importunately can we remo... 
[15:33:47] 10SRE, 10Traffic, 10Patch-For-Review, 10User-jbond: interface-rps.py should have a flag to avoid CPU0 - https://phabricator.wikimedia.org/T236208 (10jbond) [15:36:08] (03CR) 10Legoktm: [C: 03+2] "Ding dong, the witch is dead" [puppet] - 10https://gerrit.wikimedia.org/r/693600 (https://phabricator.wikimedia.org/T52864) (owner: 10Ladsgroup) [15:36:39] (03CR) 10Ladsgroup: "<3" [puppet] - 10https://gerrit.wikimedia.org/r/697614 (https://phabricator.wikimedia.org/T283925) (owner: 10Marostegui) [15:37:50] ● mailman.service - Mailman Master Queue Runner [15:37:50] Loaded: loaded (/lib/systemd/system/mailman.service; enabled; vendor preset: enabled) [15:37:50] Active: inactive (dead) since Tue 2021-06-01 15:37:21 UTC; 17s ago [15:37:55] WOHOOOOOO [15:38:11] 10SRE, 10ops-codfw, 10DBA: codfw: db2079 memory issue on DIMM B8 - https://phabricator.wikimedia.org/T283743 (10Marostegui) 05Resolved→03Open On boot, we are hitting T216240, @Papaul let's get firmware and bios upgraded please [15:38:12] * Amir1 opens a box of beer [15:38:21] !log stopped mailman2 service on lists1001 (T52864) [15:38:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:38:26] T52864: Upgrade GNU Mailman from 2.1 to Mailman3 - https://phabricator.wikimedia.org/T52864 [15:39:49] 10Puppet, 10User-jbond: Puppet CI should use rspec-parallel - https://phabricator.wikimedia.org/T284080 (10jbond) p:05Triage→03Medium [15:40:21] 10SRE, 10Wikimedia-Mailing-lists, 10Mobile: List archives on lists.wikimedia.org is not mobile friendly - https://phabricator.wikimedia.org/T190054 (10Ladsgroup) [15:40:25] 10SRE, 10Wikimedia-Mailing-lists, 10Security, 10Upstream: Implement proper AAA for lists.wikimedia.org (mailman) - https://phabricator.wikimedia.org/T118641 (10Ladsgroup) [15:40:31] 10SRE, 10Wikimedia-Mailing-lists, 10Privacy, 10Security, 10User-Josve05a: Stop storing Mailman passwords in plain text - https://phabricator.wikimedia.org/T181803 (10Ladsgroup) [15:40:37] 10SRE, 
10Wikimedia-Mailing-lists, 10Upstream: "From" at start of line becomes ">From" in pipermail - https://phabricator.wikimedia.org/T115329 (10Ladsgroup) [15:40:43] 10SRE, 10Security-Team, 10Wikimedia-Mailing-lists, 10Patch-For-Review: Upgrade GNU Mailman from 2.1 to Mailman3 - https://phabricator.wikimedia.org/T52864 (10Ladsgroup) 05Open→03Resolved Finally calling this done. The clean up will be handled in {T282303} [15:41:06] Congrats Amir1 + legoktm [15:41:15] 10SRE, 10CFSSL-PKI, 10Patch-For-Review, 10User-jbond: Additional CFSSL tasks - https://phabricator.wikimedia.org/T281369 (10jbond) [15:41:35] Congrats on killing it [15:41:42] 10SRE, 10Wikimedia-Mailing-lists, 10Privacy, 10Security, 10User-Josve05a: Stop storing Mailman passwords in plain text - https://phabricator.wikimedia.org/T181803 (10Ladsgroup) 05Open→03Resolved Mailman2 is now officially dead. [15:42:16] 10SRE, 10Wikimedia-Mailing-lists: https://lists.wikimedia.org/pipermail/wikija-l/ has broken encoding - https://phabricator.wikimedia.org/T269301 (10Ladsgroup) 05Open→03Resolved Mailman2 is now dead. 
[15:42:18] (03PS1) 10Ottomata: Subscribe airflow-webserver to webserver_config.py [puppet] - 10https://gerrit.wikimedia.org/r/697617 (https://phabricator.wikimedia.org/T272973) [15:42:32] (03CR) 10jerkins-bot: [V: 04-1] Subscribe airflow-webserver to webserver_config.py [puppet] - 10https://gerrit.wikimedia.org/r/697617 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [15:43:20] (03PS2) 10Ottomata: Subscribe airflow-webserver to webserver_config.py [puppet] - 10https://gerrit.wikimedia.org/r/697617 (https://phabricator.wikimedia.org/T272973) [15:44:38] 10Puppet, 10User-jbond: Add type validation to puppetmaster::standalone - https://phabricator.wikimedia.org/T284082 (10jbond) p:05Triage→03Low [15:44:51] (03PS7) 10Jbond: role::puppetmaster::standalone: add type checking to autosign [puppet] - 10https://gerrit.wikimedia.org/r/566512 (https://phabricator.wikimedia.org/T284082) [15:45:18] no alarms so far [15:48:49] RECOVERY - Check systemd state on cloudelastic1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:50:38] 10SRE, 10Traffic, 10netops, 10User-jbond: varnish filtering: should we automatically update public_cloud_nets - https://phabricator.wikimedia.org/T270391 (10jbond) > One thing I do think we should include is some sort of IP aggregation completely agree, its an oversight that it missed > I'm not sure if Ne... 
[15:52:21] (03CR) 10Jbond: [C: 03+2] hiera: add grafana to cors list for IDP [puppet] - 10https://gerrit.wikimedia.org/r/697616 (https://phabricator.wikimedia.org/T268233) (owner: 10Jbond) [15:52:31] (03CR) 10Ottomata: [C: 03+2] Subscribe airflow-webserver to webserver_config.py [puppet] - 10https://gerrit.wikimedia.org/r/697617 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [15:53:00] (03PS1) 10Ottomata: mariadb::instance - allow passing extra configs from hiera using default $template [puppet] - 10https://gerrit.wikimedia.org/r/697618 (https://phabricator.wikimedia.org/T272973) [15:53:46] jbond: puppet-merginy your idp::cors change [15:54:09] ottomata: yes please [15:54:29] (03CR) 10jerkins-bot: [V: 04-1] mariadb::instance - allow passing extra configs from hiera using default $template [puppet] - 10https://gerrit.wikimedia.org/r/697618 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [15:58:03] (03PS2) 10Ottomata: mariadb::instance - allow passing extra configs from hiera [puppet] - 10https://gerrit.wikimedia.org/r/697618 (https://phabricator.wikimedia.org/T272973) [15:59:28] PROBLEM - Host db2100.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [15:59:32] (03CR) 10jerkins-bot: [V: 04-1] mariadb::instance - allow passing extra configs from hiera [puppet] - 10https://gerrit.wikimedia.org/r/697618 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [16:00:04] jbond42 and cdanis: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Puppet request window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210601T1600). 
[16:00:47] nothing to deploy [16:01:08] (03PS3) 10Ottomata: mariadb::instance - allow passing extra configs from hiera [puppet] - 10https://gerrit.wikimedia.org/r/697618 (https://phabricator.wikimedia.org/T272973) [16:02:13] Host db2100.mgmt is DOWN is expected, there is ongoing hw maintenance [16:04:48] RECOVERY - Host db2100.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.43 ms [16:05:47] (03CR) 10Ayounsi: [C: 03+2] Ignore 192.168.0.0/16 subnets when importing IPs [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/696563 (https://phabricator.wikimedia.org/T283813) (owner: 10Ayounsi) [16:06:03] (03CR) 10Ayounsi: [C: 03+2] "Thanks, tested and works as expected." [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/696563 (https://phabricator.wikimedia.org/T283813) (owner: 10Ayounsi) [16:06:43] (03Merged) 10jenkins-bot: Ignore 192.168.0.0/16 subnets when importing IPs [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/696563 (https://phabricator.wikimedia.org/T283813) (owner: 10Ayounsi) [16:07:15] (03PS4) 10Ottomata: mariadb::instance - allow passing extra configs from hiera [puppet] - 10https://gerrit.wikimedia.org/r/697618 (https://phabricator.wikimedia.org/T272973) [16:11:40] (03PS2) 10Ssingh: site: add doh3001 and doh3002 with role insetup [puppet] - 10https://gerrit.wikimedia.org/r/696605 (https://phabricator.wikimedia.org/T283852) [16:13:26] (03PS5) 10Ottomata: mariadb::instance - allow passing extra configs from hiera [puppet] - 10https://gerrit.wikimedia.org/r/697618 (https://phabricator.wikimedia.org/T272973) [16:16:15] 10SRE, 10ops-codfw, 10DBA: codfw: db2079 memory issue on DIMM B8 - https://phabricator.wikimedia.org/T283743 (10Papaul) 05Open→03Resolved Firmware upgrade complete [16:16:38] 10SRE, 10ops-codfw, 10DBA: codfw: db2079 memory issue on DIMM B8 - https://phabricator.wikimedia.org/T283743 (10Marostegui) MySQL started - thanks Papaul! 
[16:17:17] (03CR) 10David Caro: [C: 03+2] cumin.wmcs: add/fix eqiad1 aliases [puppet] - 10https://gerrit.wikimedia.org/r/697582 (https://phabricator.wikimedia.org/T279438) (owner: 10David Caro) [16:17:57] 10ops-codfw, 10DBA, 10Data-Persistence-Backup: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (10Papaul) 05Open→03Resolved swapped P1 DIMM5 with P2 DIMM5 . Server is back online. closing is issue is seen on P2 DIMM 5. I will request a DIMM repla... [16:20:42] (03PS1) 10Jbond: C:puppetmaster::puppetdb: drop filter_id hack [puppet] - 10https://gerrit.wikimedia.org/r/697623 (https://phabricator.wikimedia.org/T164456) [16:21:19] shdubsh: enjoy! [16:21:55] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29760/console" [puppet] - 10https://gerrit.wikimedia.org/r/697623 (https://phabricator.wikimedia.org/T164456) (owner: 10Jbond) [16:23:05] (03CR) 10Jbond: [V: 03+1 C: 03+2] C:puppetmaster::puppetdb: drop filter_id hack [puppet] - 10https://gerrit.wikimedia.org/r/697623 (https://phabricator.wikimedia.org/T164456) (owner: 10Jbond) [16:27:30] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good (sans approval by Tyler)." 
[puppet] - 10https://gerrit.wikimedia.org/r/697614 (https://phabricator.wikimedia.org/T283925) (owner: 10Marostegui) [16:29:02] (03PS6) 10Ottomata: mariadb::instance - allow passing extra configs from hiera [puppet] - 10https://gerrit.wikimedia.org/r/697618 (https://phabricator.wikimedia.org/T272973) [16:30:19] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me" [software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) (owner: 10Jbond) [16:31:19] !log updating debmonitor clients to 0.3.0 (along with cleanup of sysuser UID allocation) [16:31:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:36:56] (03PS1) 10Jcrespo: mariadb: Reenable notifications on db2098, db2097 after maintenance [puppet] - 10https://gerrit.wikimedia.org/r/697624 (https://phabricator.wikimedia.org/T283995) [16:37:43] (03PS1) 10Jbond: P:nginx: add an nginx profile [puppet] - 10https://gerrit.wikimedia.org/r/697625 (https://phabricator.wikimedia.org/T164456) [16:37:45] (03PS1) 10Jbond: O:puppetmaster::puppetdb: add nginx profile to puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/697626 (https://phabricator.wikimedia.org/T164456) [16:37:47] (03PS1) 10Jbond: O:puppetmatser::puppetdb: switch puppetdb to use nginx-light [puppet] - 10https://gerrit.wikimedia.org/r/697627 (https://phabricator.wikimedia.org/T164456) [16:38:01] (03PS7) 10Ottomata: mariadb::instance - allow passing extra configs from hiera [puppet] - 10https://gerrit.wikimedia.org/r/697618 (https://phabricator.wikimedia.org/T272973) [16:38:26] (03Abandoned) 10Jcrespo: jessie: Revert openssl conf on director/storage to package defaults [puppet] - 10https://gerrit.wikimedia.org/r/660856 (https://phabricator.wikimedia.org/T273182) (owner: 10Jcrespo) [16:39:27] (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29763/console" [puppet] - 10https://gerrit.wikimedia.org/r/697618 
(https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [16:39:31] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29764/console" [puppet] - 10https://gerrit.wikimedia.org/r/697626 (https://phabricator.wikimedia.org/T164456) (owner: 10Jbond) [16:39:45] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29765/console" [puppet] - 10https://gerrit.wikimedia.org/r/697627 (https://phabricator.wikimedia.org/T164456) (owner: 10Jbond) [16:42:06] (03PS8) 10Ottomata: mariadb::instance - allow passing extra configs from hiera [puppet] - 10https://gerrit.wikimedia.org/r/697618 (https://phabricator.wikimedia.org/T272973) [16:42:08] (03CR) 10Jcrespo: [C: 03+2] mariadb: Reenable notifications on db2098, db2097 after maintenance [puppet] - 10https://gerrit.wikimedia.org/r/697624 (https://phabricator.wikimedia.org/T283995) (owner: 10Jcrespo) [16:42:12] (03PS1) 10Muehlenhoff: Disable peek crons [puppet] - 10https://gerrit.wikimedia.org/r/697628 [16:43:04] (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29766/console" [puppet] - 10https://gerrit.wikimedia.org/r/697618 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [16:43:13] (03CR) 10jerkins-bot: [V: 04-1] Disable peek crons [puppet] - 10https://gerrit.wikimedia.org/r/697628 (owner: 10Muehlenhoff) [16:43:42] (03CR) 10Ottomata: [V: 03+1] "Ok, looks good: https://puppet-compiler.wmflabs.org/compiler1002/29766/db1108.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/697618 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [16:44:22] PROBLEM - DPKG on bast4003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [16:45:59] (03PS1) 10Jbond: (DO NOT merge) P:sretest: test change to check hiera dot 
notation [puppet] - 10https://gerrit.wikimedia.org/r/697629 (https://phabricator.wikimedia.org/T256221) [16:46:50] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29767/console" [puppet] - 10https://gerrit.wikimedia.org/r/697629 (https://phabricator.wikimedia.org/T256221) (owner: 10Jbond) [16:49:21] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=routinator site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [16:50:21] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [16:51:20] (03PS2) 10Jbond: (DO NOT merge) P:sretest: test change to check hiera dot notation [puppet] - 10https://gerrit.wikimedia.org/r/697629 (https://phabricator.wikimedia.org/T256221) [16:52:48] (03CR) 10jerkins-bot: [V: 04-1] (DO NOT merge) P:sretest: test change to check hiera dot notation [puppet] - 10https://gerrit.wikimedia.org/r/697629 (https://phabricator.wikimedia.org/T256221) (owner: 10Jbond) [16:53:24] (03PS3) 10Jbond: (DO NOT merge) P:sretest: test change to check hiera dot notation [puppet] - 10https://gerrit.wikimedia.org/r/697629 (https://phabricator.wikimedia.org/T256221) [16:54:24] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29769/console" [puppet] - 10https://gerrit.wikimedia.org/r/697629 (https://phabricator.wikimedia.org/T256221) (owner: 10Jbond) [16:54:51] (03CR) 10jerkins-bot: [V: 04-1] (DO NOT merge) P:sretest: test change to check hiera dot notation [puppet] - 10https://gerrit.wikimedia.org/r/697629 (https://phabricator.wikimedia.org/T256221) (owner: 10Jbond) [16:58:46] 10Puppet, 10Patch-For-Review, 10User-jbond: 
Investigate hiera lookup dot notation - https://phabricator.wikimedia.org/T256221 (10jbond) this works as expected see the [[ https://puppet-compiler.wmflabs.org/compiler1003/29769/ | CR linked ]] which produced the following PCC output ` Class[Profile::Sretest]... [16:59:32] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] cloudvirt.*maintenante: use a default cloudcontrol node [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/696447 (owner: 10David Caro) [17:00:04] chrisalbon and accraze: #bothumor I � Unicode. All rise for Services – Graphoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210601T1700). [17:00:05] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] unset_maintenance: don't set downtime on icinga [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/696446 (owner: 10David Caro) [17:00:20] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] cloudvirt.{drain|safe_reboot}: use default control node [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/696448 (owner: 10David Caro) [17:01:05] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] ceph: don't log to file as syslog works already [puppet] - 10https://gerrit.wikimedia.org/r/696330 (https://phabricator.wikimedia.org/T281247) (owner: 10David Caro) [17:01:42] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] ceph: add syslog logging [puppet] - 10https://gerrit.wikimedia.org/r/695299 (https://phabricator.wikimedia.org/T281247) (owner: 10David Caro) [17:02:08] (03PS2) 10Muehlenhoff: Disable peek crons [puppet] - 10https://gerrit.wikimedia.org/r/697628 [17:02:17] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] toolforge bastion: do not run wheelofmisfortune on buster yet [puppet] - 10https://gerrit.wikimedia.org/r/693485 (https://phabricator.wikimedia.org/T282949) (owner: 10Bstorm) [17:02:25] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/697628 (owner: 10Muehlenhoff) [17:03:00] (03PS3) 10Umherirrender: Add SpecialFewestrevisions to wgDisableQueryPageUpdate for 
wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/697151 (https://phabricator.wikimedia.org/T238199) [17:03:03] (03CR) 10Bstorm: [C: 03+2] toolforge bastion: do not run wheelofmisfortune on buster yet [puppet] - 10https://gerrit.wikimedia.org/r/693485 (https://phabricator.wikimedia.org/T282949) (owner: 10Bstorm) [17:03:06] (03CR) 10jerkins-bot: [V: 04-1] Disable peek crons [puppet] - 10https://gerrit.wikimedia.org/r/697628 (owner: 10Muehlenhoff) [17:04:03] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "Please collect +1 from Andrew as well. Even though the change seems to make sense, touching this makes me nervous, as hiera could fail in " [puppet] - 10https://gerrit.wikimedia.org/r/680266 (https://phabricator.wikimedia.org/T280324) (owner: 10David Caro) [17:04:07] (03CR) 10jerkins-bot: [V: 04-1] Add SpecialFewestrevisions to wgDisableQueryPageUpdate for wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/697151 (https://phabricator.wikimedia.org/T238199) (owner: 10Umherirrender) [17:05:39] 10Puppet, 10Patch-For-Review, 10User-jbond: Investigate hiera lookup dot notation - https://phabricator.wikimedia.org/T256221 (10jbond) 05Open→03Resolved a:03jbond [17:05:41] (03PS1) 10Ladsgroup: mailman: Absent mm2 script files [puppet] - 10https://gerrit.wikimedia.org/r/697631 (https://phabricator.wikimedia.org/T282303) [17:07:28] (03PS1) 10Ladsgroup: mailman: Drop mm2 scripts [puppet] - 10https://gerrit.wikimedia.org/r/697632 (https://phabricator.wikimedia.org/T282303) [17:08:34] 10SRE, 10ops-codfw, 10DC-Ops, 10Discovery-Search: hw troubleshooting: failure to power up for elastic2043.codfw.wmnet - https://phabricator.wikimedia.org/T281327 (10Papaul) I did perform the minimum to post as requested by Dell last week on the server, the server still wouldn't power on. I requested that a... 
[17:12:40] (03CR) 10David Caro: Route Grid engine web requests via Kubernetes (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/697096 (https://phabricator.wikimedia.org/T282975) (owner: 10Majavah) [17:13:03] (03PS2) 10Ladsgroup: mailman: Absent mm2 script files and their systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/697631 (https://phabricator.wikimedia.org/T282303) [17:13:05] (03PS2) 10Ladsgroup: mailman: Drop mm2 scripts [puppet] - 10https://gerrit.wikimedia.org/r/697632 (https://phabricator.wikimedia.org/T282303) [17:14:43] (03PS3) 10Ladsgroup: mailman: Drop mm2 scripts [puppet] - 10https://gerrit.wikimedia.org/r/697632 (https://phabricator.wikimedia.org/T282303) [17:14:54] RECOVERY - DPKG on bast4003 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [17:16:02] (03CR) 10Majavah: Route Grid engine web requests via Kubernetes (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/697096 (https://phabricator.wikimedia.org/T282975) (owner: 10Majavah) [17:23:42] !log starting deletion of mbox files on lists1001 for mailman2, first reading-web-team.mbox, then smallest lists (T282303) [17:23:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:23:48] T282303: The Great Clean Up of Mailman2 - https://phabricator.wikimedia.org/T282303 [17:34:24] (03PS3) 10Muehlenhoff: Disable peek crons [puppet] - 10https://gerrit.wikimedia.org/r/697628 (https://phabricator.wikimedia.org/T284090) [17:34:48] (03PS4) 10Ladsgroup: mailman: Drop mm2 scripts [puppet] - 10https://gerrit.wikimedia.org/r/697632 (https://phabricator.wikimedia.org/T282303) [17:34:50] (03PS1) 10Ladsgroup: mailman: Absent configuration files of mailman2 and make package absent [puppet] - 10https://gerrit.wikimedia.org/r/697634 (https://phabricator.wikimedia.org/T282303) [17:34:52] (03PS1) 10Ladsgroup: mailman: Drop absented files and packages [puppet] - 10https://gerrit.wikimedia.org/r/697635 
(https://phabricator.wikimedia.org/T282303) [17:37:09] (03CR) 10jerkins-bot: [V: 04-1] mailman: Drop absented files and packages [puppet] - 10https://gerrit.wikimedia.org/r/697635 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [17:38:53] (03CR) 10Legoktm: [C: 04-1] mailman: Absent mm2 script files and their systemd timers (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/697631 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [17:39:42] (03CR) 10Legoktm: [C: 04-1] mailman: Drop mm2 scripts (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/697632 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [17:40:28] 10SRE, 10ops-codfw, 10DC-Ops, 10fundraising-tech-ops: (Need By: TBD) rack/setup/install fran2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T282056 (10Jgreen) a:05Papaul→03Jgreen [17:41:19] (03CR) 10Legoktm: "I think we should disable it from apache before we can remove the package?" [puppet] - 10https://gerrit.wikimedia.org/r/697634 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [17:41:33] (03CR) 10Muehlenhoff: [C: 03+2] Disable peek crons [puppet] - 10https://gerrit.wikimedia.org/r/697628 (https://phabricator.wikimedia.org/T284090) (owner: 10Muehlenhoff) [17:43:11] 10SRE, 10ops-codfw, 10DC-Ops, 10fundraising-tech-ops: (Need By: TBD) rack/setup/install fran2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T282056 (10Jgreen) [17:43:56] 10SRE, 10ops-codfw, 10DC-Ops, 10fundraising-tech-ops: (Need By: TBD) rack/setup/install fran2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T282056 (10Jgreen) [17:44:03] (03PS3) 10Ladsgroup: mailman: Absent mm2 script files and their systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/697631 (https://phabricator.wikimedia.org/T282303) [17:44:05] (03PS5) 10Ladsgroup: mailman: Drop mm2 scripts [puppet] - 10https://gerrit.wikimedia.org/r/697632 (https://phabricator.wikimedia.org/T282303) [17:44:07] (03PS2) 
10Ladsgroup: mailman: Absent configuration files of mailman2 and make package absent [puppet] - 10https://gerrit.wikimedia.org/r/697634 (https://phabricator.wikimedia.org/T282303) [17:44:09] (03PS2) 10Ladsgroup: mailman: Drop absented files and packages [puppet] - 10https://gerrit.wikimedia.org/r/697635 (https://phabricator.wikimedia.org/T282303) [17:44:11] (03CR) 10Ladsgroup: mailman: Absent mm2 script files and their systemd timers (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/697631 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [17:44:18] 10SRE, 10ops-codfw, 10DC-Ops, 10fundraising-tech-ops: (Need By: TBD) rack/setup/install fran2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T282056 (10Jgreen) [17:45:51] (03PS1) 10Ladsgroup: backup: Drop mm2 exclude backups [puppet] - 10https://gerrit.wikimedia.org/r/697637 (https://phabricator.wikimedia.org/T282303) [17:46:34] (03CR) 10jerkins-bot: [V: 04-1] mailman: Drop absented files and packages [puppet] - 10https://gerrit.wikimedia.org/r/697635 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [17:49:00] (03CR) 10Ladsgroup: mailman: Drop mm2 scripts (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/697632 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [17:49:41] (03CR) 10Ladsgroup: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/697634 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [17:52:24] 10SRE, 10ops-codfw, 10DC-Ops, 10fundraising-tech-ops: (Need By: TBD) rack/setup/install fran2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T282056 (10Jgreen) [17:52:55] 10SRE, 10Security, 10User-jbond: Investigate potential issues with the sudoeres env_keep values - https://phabricator.wikimedia.org/T275852 (10faidon) [17:55:22] (03PS2) 10Ladsgroup: backup: Drop mm2 exclude backups [puppet] - 10https://gerrit.wikimedia.org/r/697637 (https://phabricator.wikimedia.org/T282303) [17:55:24] (03PS1) 10Ladsgroup: 
mailman: Drop cgi in apache and access to private/ [puppet] - 10https://gerrit.wikimedia.org/r/697638 (https://phabricator.wikimedia.org/T282303) [17:55:46] 10SRE, 10Security, 10User-jbond: Investigate potential issues with the sudoeres env_keep values - https://phabricator.wikimedia.org/T275852 (10faidon) If you're talking about my 2014 commit… if I recall correctly¹ this was in order to minimize changes between different distribution and enforce a unified poli... [17:55:47] loving seeing all these drom mm2 patches! [17:56:50] (03PS3) 10Ladsgroup: mailman: Drop absented files and packages [puppet] - 10https://gerrit.wikimedia.org/r/697635 (https://phabricator.wikimedia.org/T282303) [17:56:52] (03PS3) 10Ladsgroup: backup: Drop mm2 exclude backups [puppet] - 10https://gerrit.wikimedia.org/r/697637 (https://phabricator.wikimedia.org/T282303) [17:56:54] (03PS2) 10Ladsgroup: mailman: Drop cgi in apache and access to private/ [puppet] - 10https://gerrit.wikimedia.org/r/697638 (https://phabricator.wikimedia.org/T282303) [17:58:10] (03PS3) 10Ladsgroup: mailman: Drop cgi in apache and access to private/ [puppet] - 10https://gerrit.wikimedia.org/r/697638 (https://phabricator.wikimedia.org/T282303) [17:58:31] (03CR) 10Legoktm: [C: 03+2] mailman: Absent mm2 script files and their systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/697631 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [18:00:05] RoanKattouw, Niharika, and Urbanecm: (Dis)respected human, time to deploy Morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210601T1800). Please do the needful. [18:00:05] No GERRIT patches in the queue for this window AFAICS. 
[18:00:36] (03CR) 10Ladsgroup: "https://puppet-compiler.wmflabs.org/compiler1001/29770/lists1001.wikimedia.org/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/697638 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [18:03:21] (03PS4) 10Legoktm: mailman: Drop cgi in apache and access to private/ [puppet] - 10https://gerrit.wikimedia.org/r/697638 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [18:03:48] 10SRE, 10Wikimedia-Mailing-lists, 10Security, 10Upstream: Implement proper AAA for lists.wikimedia.org (mailman) - https://phabricator.wikimedia.org/T118641 (10Ladsgroup) 05Open→03Resolved Tentatively calling this done given that we now have mailman3, reopen if you think mailman3 doesn't satisfy AAA [18:03:52] (03PS6) 10Legoktm: mailman: Drop mm2 scripts [puppet] - 10https://gerrit.wikimedia.org/r/697632 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [18:05:33] (03CR) 10Legoktm: [C: 03+2] mailman: Drop mm2 scripts [puppet] - 10https://gerrit.wikimedia.org/r/697632 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [18:12:07] (03CR) 10Ladsgroup: [C: 03+1] "On polymorphic works just fine. 
public archives are accessible, privates give 404" [puppet] - 10https://gerrit.wikimedia.org/r/697638 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [18:13:12] (03CR) 10Legoktm: backup: Drop mm2 exclude backups (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/697637 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [18:13:37] (03CR) 10Legoktm: [C: 03+2] mailman: Drop cgi in apache and access to private/ [puppet] - 10https://gerrit.wikimedia.org/r/697638 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [18:17:41] 10SRE, 10ops-eqiad, 10DC-Ops, 10fundraising-tech-ops: rack/setup/install fran1001 - https://phabricator.wikimedia.org/T245554 (10Jgreen) [18:21:55] (03CR) 10Dzahn: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/682259 (https://phabricator.wikimedia.org/T127717) (owner: 10Southparkfan) [18:31:51] 10SRE, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install phab1004 (was: phab1002) - https://phabricator.wikimedia.org/T280540 (10Dzahn) thanks, Rob! the continuation of this will be T280597 [18:42:44] (03CR) 10Dzahn: "I don't see anything in this code I would complain about, so that's a ..eh . "soft +1" :). 
Did check what jenkins-bot / style check has to" [puppet] - 10https://gerrit.wikimedia.org/r/682259 (https://phabricator.wikimedia.org/T127717) (owner: 10Southparkfan) [18:45:23] 10SRE, 10ops-codfw, 10DC-Ops, 10fundraising-tech-ops: (Need By: TBD) rack/setup/install fran2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T282056 (10Jgreen) 05Resolved→03Open [18:47:36] 10SRE, 10ops-codfw, 10DC-Ops, 10fundraising-tech-ops: (Need By: TBD) rack/setup/install fran2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T282056 (10Jgreen) [18:47:55] (03PS1) 10Ottomata: airflow - add clean_logs wrapper script [puppet] - 10https://gerrit.wikimedia.org/r/697643 (https://phabricator.wikimedia.org/T272973) [18:48:15] (03PS2) 10Ottomata: airflow - add clean_logs wrapper script [puppet] - 10https://gerrit.wikimedia.org/r/697643 (https://phabricator.wikimedia.org/T272973) [18:48:46] (03CR) 10jerkins-bot: [V: 04-1] airflow - add clean_logs wrapper script [puppet] - 10https://gerrit.wikimedia.org/r/697643 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [18:51:40] 10SRE, 10vm-requests: Site: 1 VM request for an-airflow1002 - https://phabricator.wikimedia.org/T284104 (10razzi) [18:51:51] (03PS3) 10Ottomata: airflow - add clean_logs wrapper script [puppet] - 10https://gerrit.wikimedia.org/r/697643 (https://phabricator.wikimedia.org/T272973) [18:52:19] (03CR) 10jerkins-bot: [V: 04-1] airflow - add clean_logs wrapper script [puppet] - 10https://gerrit.wikimedia.org/r/697643 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [18:54:07] (03PS4) 10Ottomata: airflow - add clean_logs wrapper script [puppet] - 10https://gerrit.wikimedia.org/r/697643 (https://phabricator.wikimedia.org/T272973) [18:56:07] (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29771/console" [puppet] - 10https://gerrit.wikimedia.org/r/697643 (https://phabricator.wikimedia.org/T272973) (owner: 
10Ottomata) [18:56:56] (03CR) 10Ottomata: [V: 03+1 C: 03+2] airflow - add clean_logs wrapper script [puppet] - 10https://gerrit.wikimedia.org/r/697643 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [19:00:06] (03PS1) 10Ottomata: airflow - add clean_logs.sh script [puppet] - 10https://gerrit.wikimedia.org/r/697644 (https://phabricator.wikimedia.org/T272973) [19:00:43] (03CR) 10Ottomata: [V: 03+2 C: 03+2] airflow - add clean_logs.sh script [puppet] - 10https://gerrit.wikimedia.org/r/697644 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [19:05:58] (03PS1) 10Ottomata: airflow - clean_logs bug fix [puppet] - 10https://gerrit.wikimedia.org/r/697646 [19:07:47] 10SRE, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install new linecards into routers - https://phabricator.wikimedia.org/T277339 (10Cmjohnson) @ayounsi These are on-site, are you around Thursday 3 June? [19:09:50] (03CR) 10Ottomata: [C: 03+2] airflow - clean_logs bug fix [puppet] - 10https://gerrit.wikimedia.org/r/697646 (owner: 10Ottomata) [19:21:42] 10SRE, 10ops-eqiad, 10Analytics-Radar: Try to move some new analytics worker nodes to different racks - https://phabricator.wikimedia.org/T276239 (10Cmjohnson) @Ottomata I cannot say for sure, I am getting new MW servers online. That will allow the current MW servers in A7 to be decomm'd and removed. I thin... [19:22:07] 10SRE, 10Wikimedia-Mailing-lists, 10Accessibility, 10User-Ladsgroup: Pipermail uses background color without foreground colors - https://phabricator.wikimedia.org/T190061 (10Ladsgroup) 05Open→03Resolved a:03Ladsgroup Closing this as mm2 is now shut down, reopen if the issue persists in mm3 (I checked... 
[19:29:45] 10SRE, 10ops-eqiad, 10DC-Ops, 10decommission-hardware: decommission db1087.eqiad.wmnet - https://phabricator.wikimedia.org/T282093 (10Cmjohnson) [19:29:54] 10SRE, 10ops-eqiad, 10DC-Ops, 10decommission-hardware: decommission db1087.eqiad.wmnet - https://phabricator.wikimedia.org/T282093 (10Cmjohnson) 05Open→03Resolved [19:30:01] 10SRE, 10DBA, 10Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (10Cmjohnson) [19:31:17] 10SRE, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on labstore1007 - https://phabricator.wikimedia.org/T284036 (10Cmjohnson) @wiki_willy @Andrew @Bstorm This server is out of warranty. Do you want to purchase a new disk? [19:32:34] 10SRE, 10ops-codfw, 10ops-eqiad, 10DC-Ops: Dc-Ops Commands for Cumin - https://phabricator.wikimedia.org/T279721 (10Cmjohnson) @MoritzMuehlenhoff Do we need to keep this task open any longer? [19:32:39] 10SRE, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on labstore1007 - https://phabricator.wikimedia.org/T284036 (10Bstorm) Please see {T281045} that's the disk that I believe we ordered there. It must have finally failed out :) [19:33:28] 10SRE, 10ops-eqiad, 10DC-Ops: ps1-a7-eqiad power over threshold alerts - https://phabricator.wikimedia.org/T276743 (10Cmjohnson) @wiki_willy, I'd imagine that once we start decoming the mw servers in the rack that the issue will self resolve. I do not think there is any need to keep this task open. Do you? [19:39:04] 10SRE, 10vm-requests: Site: 1 VM request for an-airflow1002 - https://phabricator.wikimedia.org/T284104 (10Ottomata) 05Open→03Declined After discussing this more, we are going to use an-launcher1002 for airflow-analytics. We'll need VMs for airflow instances for other teams eventually. 
[19:40:52] 10SRE, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on labstore1007 - https://phabricator.wikimedia.org/T284036 (10Cmjohnson) 05Open→03Resolved a:03Cmjohnson okay, thanks. I close this ticket. I'll be at data center tomorrow and will look for the new disk. [19:41:35] (03PS1) 10QChris: Add .gitreview [software/bernard] - 10https://gerrit.wikimedia.org/r/697648 [19:41:37] (03CR) 10QChris: [V: 03+2 C: 03+2] Add .gitreview [software/bernard] - 10https://gerrit.wikimedia.org/r/697648 (owner: 10QChris) [19:43:18] 10SRE, 10Wikimedia-Hackathon-2021, 10Wikimedia-Mailing-lists, 10Upstream: Add OAuth login to mailman for accessing list memberships/archive viewing - https://phabricator.wikimedia.org/T249678 (10Legoktm) I uploaded python3-django-allauth 0.44.0+ds-1~bpo10+1 to apt.wm.o yesterday, it's a backport of the 0.4... [19:48:35] (03CR) 10Eevans: [C: 03+1] cassandra: drop support for 2.1 in metrics. Fix collector version [puppet] - 10https://gerrit.wikimedia.org/r/696399 (https://phabricator.wikimedia.org/T275353) (owner: 10Hnowlan) [19:49:21] 10SRE, 10ops-eqiad, 10DC-Ops: ps1-a7-eqiad power over threshold alerts - https://phabricator.wikimedia.org/T276743 (10wiki_willy) Hi @Cmjohnson - it's going to keep alerting, until the mw servers are decommissioned, so might as well leave it open until then. Thanks, Willy [19:53:59] 10SRE, 10ops-eqiad, 10Traffic: cp1087 powercycled - https://phabricator.wikimedia.org/T278729 (10Cmjohnson) @ema Looks to be a DIMM issue, submitted a ticket to Dell You have successfully submitted request SR1061284651. [20:03:46] (03PS1) 10Ottomata: Set up airflow-analytics on an-launcher1002 [puppet] - 10https://gerrit.wikimedia.org/r/697653 (https://phabricator.wikimedia.org/T272973) [20:04:43] (03CR) 10Ottomata: [C: 04-1] "Need to talk about how db replication works first." 
[puppet] - 10https://gerrit.wikimedia.org/r/697653 (https://phabricator.wikimedia.org/T272973) (owner: 10Ottomata) [20:05:24] (03PS2) 10Ottomata: Set up airflow-analytics on an-launcher1002 [puppet] - 10https://gerrit.wikimedia.org/r/697653 (https://phabricator.wikimedia.org/T272973) [20:07:23] (03CR) 10Umherirrender: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/697151 (https://phabricator.wikimedia.org/T238199) (owner: 10Umherirrender) [20:11:06] (03PS1) 10Dzahn: greatly simplify httpd.conf [container/miscweb] - 10https://gerrit.wikimedia.org/r/697654 (https://phabricator.wikimedia.org/T281538) [20:13:51] (03PS1) 10Dzahn: install vim in container for debugging [container/miscweb] - 10https://gerrit.wikimedia.org/r/697655 (https://phabricator.wikimedia.org/T281538) [20:18:15] (03PS2) 10Jforrester: Add wikifunctions.org [dns] - 10https://gerrit.wikimedia.org/r/677626 (https://phabricator.wikimedia.org/T275904) [20:19:06] shdubsh: Hey, as you're on clinic duty, I'm not sure what the next steps for https://gerrit.wikimedia.org/r/677626 are. Do you know? [20:24:53] (03CR) 10Dzahn: [C: 03+2] greatly simplify httpd.conf [container/miscweb] - 10https://gerrit.wikimedia.org/r/697654 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [20:25:20] (03CR) 10Dzahn: [C: 03+2] install vim in container for debugging [container/miscweb] - 10https://gerrit.wikimedia.org/r/697655 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [20:26:17] (03Merged) 10jenkins-bot: install vim in container for debugging [container/miscweb] - 10https://gerrit.wikimedia.org/r/697655 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [20:30:32] James_F: Not sure. Probably just needs a merge. 
[20:31:14] (03PS1) 10Dzahn: pipeline: drop test variant, prod and staging is enough [container/miscweb] - 10https://gerrit.wikimedia.org/r/697657 (https://phabricator.wikimedia.org/T281538) [20:31:30] (03CR) 10jerkins-bot: [V: 04-1] pipeline: drop test variant, prod and staging is enough [container/miscweb] - 10https://gerrit.wikimedia.org/r/697657 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [20:31:42] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops-radar: (Need By: TBD) rack/setup/install thumbor100[56] - https://phabricator.wikimedia.org/T273914 (10Jclark-ctr) [20:32:39] shdubsh: Right. Is there a proper process for this kind of thing? [20:33:04] (03CR) 10Legoktm: "We should also check that the docker pull/push protocol doesn't try to make a request with a : in it..." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/695598 (https://phabricator.wikimedia.org/T283764) (owner: 10Dzahn) [20:33:10] (03Abandoned) 10Dzahn: pipeline: drop test variant, prod and staging is enough [container/miscweb] - 10https://gerrit.wikimedia.org/r/697657 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [20:36:05] (03CR) 10Dzahn: "ACK, should not have /wikimedia/ in the URL then. We'll need to setup a container or VM with nginx to actually test with httpbb" [puppet] - 10https://gerrit.wikimedia.org/r/695598 (https://phabricator.wikimedia.org/T283764) (owner: 10Dzahn) [20:38:27] James_F: not much direction given in the wikitech:DNS page. Checked out everything I could around it. LGTM [20:38:57] Yeah. Thanks! [20:39:32] (03CR) 10Cwhite: [C: 03+2] Add wikifunctions.org [dns] - 10https://gerrit.wikimedia.org/r/677626 (https://phabricator.wikimedia.org/T275904) (owner: 10Jforrester) [20:39:54] Ooh. Gosh. Thanks. [20:44:37] James_F: change deployed. Nameservers are reporting www.wikifunctions.org -> dyna.wikimedia.org [20:45:20] shdubsh: Thanks! 
[20:45:49] (03CR) 10Dzahn: [C: 04-2] docker_registry_ha: add nginx rewrite for URLs with tags [puppet] - 10https://gerrit.wikimedia.org/r/695598 (https://phabricator.wikimedia.org/T283764) (owner: 10Dzahn) [20:48:41] (03CR) 10Andrew Bogott: "I'm in favor of this, but would also like to see some pcc tests beforehand (especially in, but not limited to, toolforge and deployment-pr" [puppet] - 10https://gerrit.wikimedia.org/r/680266 (https://phabricator.wikimedia.org/T280324) (owner: 10David Caro) [20:54:54] (03PS1) 10Dzahn: blubber: set runs: { insecurely: false } [container/miscweb] - 10https://gerrit.wikimedia.org/r/697662 (https://phabricator.wikimedia.org/T281538) [20:55:00] (03PS7) 10Cwhite: logstash: add openstack ECS transition config and tests [puppet] - 10https://gerrit.wikimedia.org/r/689262 (https://phabricator.wikimedia.org/T234565) [20:56:11] (03CR) 10Dzahn: [C: 03+2] blubber: set runs: { insecurely: false } [container/miscweb] - 10https://gerrit.wikimedia.org/r/697662 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [20:56:19] (03PS2) 10Dzahn: blubber: set runs: { insecurely: false } [container/miscweb] - 10https://gerrit.wikimedia.org/r/697662 (https://phabricator.wikimedia.org/T281538) [20:57:07] (03CR) 10Cwhite: [C: 03+2] logstash: add openstack ECS transition config and tests [puppet] - 10https://gerrit.wikimedia.org/r/689262 (https://phabricator.wikimedia.org/T234565) (owner: 10Cwhite) [20:57:44] (03CR) 10Bstorm: "One thing we do at the front proxy level is ip address filtering. 
If we move to k8s service objects, we will be losing that unless we impl" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/697096 (https://phabricator.wikimedia.org/T282975) (owner: 10Majavah) [20:59:14] (03PS1) 10Dzahn: blubber: simplify path to HTML doc roots [container/miscweb] - 10https://gerrit.wikimedia.org/r/697663 (https://phabricator.wikimedia.org/T281538) [21:02:47] (03CR) 10Bstorm: "Another thing that's in the way is https://gerrit.wikimedia.org/g/operations/software/tools-manifest/+/refs/heads/master" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/697096 (https://phabricator.wikimedia.org/T282975) (owner: 10Majavah) [21:03:24] 10SRE, 10Wikimedia-Mailing-lists: "The FOO list has N moderation requests waiting." notifications can't be turned off in Mailman 3 - https://phabricator.wikimedia.org/T284107 (10colewhite) p:05Triage→03Medium [21:04:31] 10SRE, 10SRE-Access-Requests: Superset Access for Cooltey Feng - https://phabricator.wikimedia.org/T283189 (10colewhite) p:05Triage→03Medium [21:04:34] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops-radar: (Need By: TBD) rack/setup/install thumbor100[56] - https://phabricator.wikimedia.org/T273914 (10Jclark-ctr) [21:04:51] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops-radar: (Need By: TBD) rack/setup/install thumbor100[56] - https://phabricator.wikimedia.org/T273914 (10Jclark-ctr) a:05Jclark-ctr→03Cmjohnson thumbor1005 A6. 
U41 port21 id#1837 thumbor1006 C6 U35 port33 id#3255 [21:05:10] 10SRE, 10LDAP-Access-Requests: Grant Access to Superset/Turnilo for Kgordon - https://phabricator.wikimedia.org/T283057 (10colewhite) p:05Triage→03Medium [21:05:58] 10SRE, 10SRE-Access-Requests: Allow JStephenson to access Superset - https://phabricator.wikimedia.org/T282515 (10colewhite) p:05Triage→03Medium [21:06:15] 10SRE, 10SRE-Access-Requests: Allow JStephenson to access Superset - https://phabricator.wikimedia.org/T282515 (10colewhite) 05Stalled→03Open [21:07:52] (03PS2) 10Dzahn: blubber: simplify path to HTML doc roots [container/miscweb] - 10https://gerrit.wikimedia.org/r/697663 (https://phabricator.wikimedia.org/T281538) [21:09:36] (03PS1) 10Dzahn: blubber: install curl in container for testing httpd [container/miscweb] - 10https://gerrit.wikimedia.org/r/697666 (https://phabricator.wikimedia.org/T281538) [21:09:49] !log dropping a bunch of tables from the labswiki db as per T284108 [21:09:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:09:54] T284108: Bring labswiki database tables up to date - https://phabricator.wikimedia.org/T284108 [21:10:25] (03CR) 10Bstorm: "If this was completed and the kinks sorted out with webservicemonitor, actually, we could merge before moving the functionality of ip bloc" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/697096 (https://phabricator.wikimedia.org/T282975) (owner: 10Majavah) [21:13:23] (03PS3) 10Dzahn: blubber: simplify path to HTML doc roots [container/miscweb] - 10https://gerrit.wikimedia.org/r/697663 (https://phabricator.wikimedia.org/T281538) [21:13:52] (03CR) 10Bstorm: Replace os.execv with subprocess.check_call (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/697102 (owner: 10Majavah) [21:14:59] (03PS4) 10Dzahn: blubber: simplify path to HTML doc roots [container/miscweb] - 10https://gerrit.wikimedia.org/r/697663 (https://phabricator.wikimedia.org/T281538) [21:16:16] 
(03CR) 10Dzahn: [C: 03+2] blubber: simplify path to HTML doc roots [container/miscweb] - 10https://gerrit.wikimedia.org/r/697663 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [21:16:34] (03PS2) 10Dzahn: blubber: install curl in container for testing httpd [container/miscweb] - 10https://gerrit.wikimedia.org/r/697666 (https://phabricator.wikimedia.org/T281538) [21:17:47] (03PS2) 10Ebernhardson: mjolnir bulk daemon: Add topic for hourly updates [puppet] - 10https://gerrit.wikimedia.org/r/693205 (https://phabricator.wikimedia.org/T261407) [21:17:59] (03Merged) 10jenkins-bot: blubber: simplify path to HTML doc roots [container/miscweb] - 10https://gerrit.wikimedia.org/r/697663 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [21:18:06] (03CR) 10Dzahn: [C: 03+2] blubber: install curl in container for testing httpd [container/miscweb] - 10https://gerrit.wikimedia.org/r/697666 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [21:18:32] 10SRE, 10SRE-Access-Requests: Requesting access to production analytics data and cluster for htriedman - https://phabricator.wikimedia.org/T283368 (10colewhite) [21:18:44] 10SRE, 10SRE-Access-Requests: Requesting access to production analytics data and cluster for htriedman - https://phabricator.wikimedia.org/T283368 (10colewhite) a:03colewhite [21:19:07] (03Merged) 10jenkins-bot: blubber: install curl in container for testing httpd [container/miscweb] - 10https://gerrit.wikimedia.org/r/697666 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [21:19:39] (03CR) 10Bstorm: Replace os.execv with subprocess.check_call (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/697102 (owner: 10Majavah) [21:27:15] (03PS1) 10Ladsgroup: Fix pageterms API call for Special:Nearby in Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/697671 (https://phabricator.wikimedia.org/T281639) [21:31:56] (03PS2) 10Ladsgroup: Fix pageterms API call for Special:Nearby in Wikidata 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/697671 (https://phabricator.wikimedia.org/T281639) [21:33:01] I quickly deploy this ^ [21:33:54] (03CR) 10Ladsgroup: [C: 03+2] Fix pageterms API call for Special:Nearby in Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/697671 (https://phabricator.wikimedia.org/T281639) (owner: 10Ladsgroup) [21:34:17] (03CR) 10Legoktm: [C: 04-1] backup: Drop mm2 exclude backups [puppet] - 10https://gerrit.wikimedia.org/r/697637 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [21:34:38] (03Merged) 10jenkins-bot: Fix pageterms API call for Special:Nearby in Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/697671 (https://phabricator.wikimedia.org/T281639) (owner: 10Ladsgroup) [21:36:38] (03PS3) 10Legoktm: mailman: Absent configuration files of mailman2 and make package absent [puppet] - 10https://gerrit.wikimedia.org/r/697634 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [21:37:37] 10SRE, 10Services, 10Patch-For-Review, 10Service-deployment-requests: New Service Request Shellbox - https://phabricator.wikimedia.org/T281423 (10Legoktm) ` legoktm@deploy1002:~$ curl https://staging.svc.eqiad.wmnet:4008/index.php File not found. legoktm@deploy1002:~$ curl https://staging.svc.eqiad.wmnet:4... [21:39:28] (03CR) 10Legoktm: [C: 03+2] mailman: Absent configuration files of mailman2 and make package absent [puppet] - 10https://gerrit.wikimedia.org/r/697634 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [21:39:42] (03PS1) 10Jforrester: Provide nodejs12-slim and -devel based on Bullseye [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/697672 [21:45:06] 10SRE, 10SRE-Access-Requests: Requesting access to production analytics data and cluster for htriedman - https://phabricator.wikimedia.org/T283368 (10colewhite) a:05colewhite→03Htriedman Hi @Htriedman! 
A few things for you: # Have a peek at https://wikitech.wikimedia.org/wiki/Analytics/Data_access#User_re... [21:45:53] SRE, SRE-Access-Requests: Access request for Hal Tredman - https://phabricator.wikimedia.org/T283351 (colewhite) [21:45:56] SRE, SRE-Access-Requests: Requesting access to production analytics data and cluster for htriedman - https://phabricator.wikimedia.org/T283368 (colewhite) [21:47:31] SRE, SRE-Access-Requests: Access request for Hal Tredman - https://phabricator.wikimedia.org/T283351 (colewhite) Open→Declined I think this is a duplicate of the parent task. Please reopen if this task is unrelated to T283368. [21:47:34] SRE, SRE-Access-Requests: Requesting access to production analytics data and cluster for htriedman - https://phabricator.wikimedia.org/T283368 (colewhite) [21:50:52] SRE, Traffic, netops, User-jbond: varnish filtering: should we automatically update public_cloud_nets - https://phabricator.wikimedia.org/T270391 (Volans) >>! In T270391#7126325, @cmooney wrote: > I'm not sure if Netbox is the right place to *store* this data, but happy to discuss. You folk know... [21:51:39] SRE, SRE-Access-Requests: Requesting access to production analytics data and cluster for htriedman - https://phabricator.wikimedia.org/T283368 (Ottomata) Approved! @colewhite, this is full ssh + kerberos access: https://wikitech.wikimedia.org/wiki/Analytics/Data_access#All_of_the_above Thank you! 
[21:54:00] (CR) Volans: IDM: create new idm library with logoutd base class (4 comments) [software/pywmflib] - https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) (owner: Jbond) [21:54:31] (Abandoned) Gergő Tisza: Fix Ie9a1018c198 for external cluster [extensions/GrowthExperiments] (wmf/1.37.0-wmf.7) - https://gerrit.wikimedia.org/r/695844 (https://phabricator.wikimedia.org/T283606) (owner: Gergő Tisza) [21:54:45] (Abandoned) Gergő Tisza: Fix Ie9a1018c198 for external cluster [extensions/GrowthExperiments] (wmf/1.37.0-wmf.6) - https://gerrit.wikimedia.org/r/695843 (https://phabricator.wikimedia.org/T283606) (owner: Gergő Tisza) [21:54:58] (Abandoned) Gergő Tisza: fixLinkRecommendationData.php: also fix search index for old DB entries [extensions/GrowthExperiments] (wmf/1.37.0-wmf.6) - https://gerrit.wikimedia.org/r/695838 (https://phabricator.wikimedia.org/T283606) (owner: Gergő Tisza) [21:55:01] (Abandoned) Gergő Tisza: fixLinkRecommendationData.php: also fix search index for old DB entries [extensions/GrowthExperiments] (wmf/1.37.0-wmf.7) - https://gerrit.wikimedia.org/r/695837 (https://phabricator.wikimedia.org/T283606) (owner: Gergő Tisza) [22:09:05] SRE, Analytics, Analytics-Kanban, Product-Analytics, and 2 others: Requesting access to analytics-privatedata-users for schoenbaechler - https://phabricator.wikimedia.org/T283190 (colewhite) a:Marostegui→schoenbaechler [22:09:33] SRE, Analytics, SRE-Access-Requests: Requesting access to production analytics data and cluster for htriedman - https://phabricator.wikimedia.org/T283368 (colewhite) [22:12:15] (PS1) Cwhite: admin: add htriedman to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/697677 (https://phabricator.wikimedia.org/T283368) [22:15:13] SRE, Wikimedia-Mailing-lists: Figure out future of arbcom-l archives - https://phabricator.wikimedia.org/T281328 (Legoktm) Stalled→Resolved 
The arbcom-l archives are no longer accessible via the internet (MM2 private archives CGI is gone and it is not imported into hyperkitty) but they'll stay on... [22:19:00] SRE, SRE-Access-Requests: Allow JStephenson to access Superset - https://phabricator.wikimedia.org/T282515 (colewhite) a:colewhite @JStephenson, checking in to see if this task is complete. Were you able to get access to SuperSet? [22:24:09] (PS1) Cwhite: admin: add cooltey to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/697679 (https://phabricator.wikimedia.org/T283189) [22:25:24] SRE, SRE-Access-Requests, Patch-For-Review: Superset Access for Cooltey Feng - https://phabricator.wikimedia.org/T283189 (colewhite) a:colewhite [22:28:20] RECOVERY - Backup freshness on backup1001 is OK: Fresh: 102 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring [22:29:22] SRE, Analytics, SRE-Access-Requests: Requesting access to SUPERSET for CMADEO - https://phabricator.wikimedia.org/T284109 (colewhite) [22:32:34] (PS1) Cwhite: admin: add cmadeo to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/697682 (https://phabricator.wikimedia.org/T284109) [22:33:12] (CR) jerkins-bot: [V: -1] admin: add cmadeo to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/697682 (https://phabricator.wikimedia.org/T284109) (owner: Cwhite) [22:34:17] SRE, Analytics, SRE-Access-Requests, Patch-For-Review: Requesting access to SUPERSET for CMADEO - https://phabricator.wikimedia.org/T284109 (colewhite) p:Triage→Medium a:colewhite Hi @cmadeo! We'll need a note of approval from your manager and @Ottomata for inclusion to analytics-pri... 
[22:37:20] (PS2) Cwhite: admin: add cmadeo to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/697682 (https://phabricator.wikimedia.org/T284109) [22:43:42] SRE, Gerrit, LDAP-Access-Requests: Add dancy to `archiva-deployers` LDAP group - https://phabricator.wikimedia.org/T283347 (colewhite) Open→Stalled a:colewhite @thcipriani, do you approve this request? [22:44:50] SRE, Gerrit, LDAP-Access-Requests: Add dancy to `archiva-deployers` LDAP group - https://phabricator.wikimedia.org/T283347 (thcipriani) >>! In T283347#7127854, @colewhite wrote: > @thcipriani, do you approve this request? Sorry I missed this one! Approved! [22:49:05] (PS1) Cwhite: admin: add kgordon to ldap_only_users [puppet] - https://gerrit.wikimedia.org/r/697684 (https://phabricator.wikimedia.org/T283057) [22:52:28] SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to Superset/Turnilo for Kgordon - https://phabricator.wikimedia.org/T283057 (colewhite) [22:55:48] SRE, Analytics, LDAP-Access-Requests, Patch-For-Review: Grant Access to Superset/Turnilo for Kgordon - https://phabricator.wikimedia.org/T283057 (colewhite) [22:56:54] SRE, Analytics, LDAP-Access-Requests, Patch-For-Review: Grant Access to Superset/Turnilo for Kgordon - https://phabricator.wikimedia.org/T283057 (colewhite) Hi @Kgordon! We just need approval from @Ottomata for Turnilo/Superset (https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Dashboar... [23:00:04] RoanKattouw, Niharika, and Urbanecm: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Evening backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210601T2300). [23:00:04] No GERRIT patches in the queue for this window AFAICS. 
[23:00:06] SRE, Gerrit, LDAP-Access-Requests: Add dancy to `archiva-deployers` LDAP group - https://phabricator.wikimedia.org/T283347 (colewhite) Stalled→Resolved `uid=dancy` added to `archiva-deployers` [23:11:35] SRE, serviceops, Parsoid (Tracking), Patch-For-Review: Upgrade Parsoid servers to buster - https://phabricator.wikimedia.org/T268524 (Izno) [23:11:40] SRE, serviceops, Patch-For-Review, Release-Engineering-Team (Radar): Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Izno) [23:12:23] SRE, serviceops, Patch-For-Review, Release-Engineering-Team (Radar): Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Izno) [23:13:23] SRE, serviceops, Patch-For-Review, Release-Engineering-Team (Radar): Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Izno) [23:15:34] SRE, serviceops, Patch-For-Review, Release-Engineering-Team (Radar): Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Izno) [23:16:19] SRE, serviceops, Patch-For-Review, Release-Engineering-Team (Radar): Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Izno) [23:17:43] SRE, serviceops, Patch-For-Review, Release-Engineering-Team (Radar): Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Izno) [23:35:34] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /_info (retrieve service info) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid [23:37:20] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid [23:44:15] (PS1) Dzahn: blubber: use a single html dir for both staging and prod variants 
[container/miscweb] - https://gerrit.wikimedia.org/r/697691 (https://phabricator.wikimedia.org/T281538) [23:47:08] (PS2) Dzahn: blubber: use a single html dir for both staging and prod variants [container/miscweb] - https://gerrit.wikimedia.org/r/697691 (https://phabricator.wikimedia.org/T281538) [23:49:13] (PS3) Dzahn: blubber: use a single html dir for both staging and prod variants [container/miscweb] - https://gerrit.wikimedia.org/r/697691 (https://phabricator.wikimedia.org/T281538) [23:51:06] (CR) Dzahn: [C: +2] blubber: use a single html dir for both staging and prod variants [container/miscweb] - https://gerrit.wikimedia.org/r/697691 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn) [23:52:40] (Merged) jenkins-bot: blubber: use a single html dir for both staging and prod variants [container/miscweb] - https://gerrit.wikimedia.org/r/697691 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn)