[00:11:09] RECOVERY - SSH on bast5001.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [00:11:45] (03PS1) 10Legoktm: Add framework for setting $wgLocalHTTPProxy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731861 [00:12:29] (03PS2) 10Legoktm: Add framework for setting $wgLocalHTTPProxy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731861 (https://phabricator.wikimedia.org/T288848) [00:12:31] (03PS1) 10Legoktm: Enable $wgLocalHTTPProxy on group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731862 (https://phabricator.wikimedia.org/T288848) [00:13:23] (03CR) 10jerkins-bot: [V: 04-1] Add framework for setting $wgLocalHTTPProxy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731861 (https://phabricator.wikimedia.org/T288848) (owner: 10Legoktm) [00:13:36] (03CR) 10jerkins-bot: [V: 04-1] Enable $wgLocalHTTPProxy on group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731862 (https://phabricator.wikimedia.org/T288848) (owner: 10Legoktm) [00:14:59] (03PS3) 10Legoktm: Add framework for setting $wgLocalHTTPProxy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731861 (https://phabricator.wikimedia.org/T288848) [00:15:01] (03PS2) 10Legoktm: Enable $wgLocalHTTPProxy on group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731862 (https://phabricator.wikimedia.org/T288848) [00:16:30] (03PS4) 10Legoktm: Add framework for setting $wgLocalHTTPProxy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731861 (https://phabricator.wikimedia.org/T288848) [00:16:32] (03PS3) 10Legoktm: Enable $wgLocalHTTPProxy on group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731862 (https://phabricator.wikimedia.org/T288848) [00:30:51] PROBLEM - Check systemd state on dns2001 is CRITICAL: CRITICAL - degraded: The following units failed: wmf_auto_restart_systemd-timesyncd.service 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:34:04] (03PS1) 10Krinkle: [Beta Cluster] mc-labs.php: Remove onHostRoutingPrefix for WAN cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731817 (https://phabricator.wikimedia.org/T264604) [00:34:14] (03CR) 10jerkins-bot: [V: 04-1] [Beta Cluster] mc-labs.php: Remove onHostRoutingPrefix for WAN cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731817 (https://phabricator.wikimedia.org/T264604) (owner: 10Krinkle) [00:37:14] (03CR) 10Ryan Kemper: [C: 03+2] wdqs: enable the streaming updater on wdqs1007 [puppet] - 10https://gerrit.wikimedia.org/r/730820 (https://phabricator.wikimedia.org/T288231) (owner: 10DCausse) [00:38:55] !log ryankemper@cumin1001 START - Cookbook sre.wdqs.data-transfer [00:38:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:48:09] @James_F This patch should hopefully solve the UBN - https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaSearch/+/730208 [01:32:36] Seddon: each bug needs to be on its own line in the commit message [01:33:16] That's a pretty big patch to fix a UBN though...what's the issue? [01:34:45] I found https://phabricator.wikimedia.org/T291392 [01:36:44] @legoktm[m]: Simone has been refactoring a bunch of stuff. Something broke back in September and was going to be fixed in the next phase. That was originally going to get deployed last week but got reverted as it resulted in a breakage for the user. [01:38:24] So I'm not exactly following, is something actively broken now? 
[01:40:56] @legoktm[m] its throwing a lot of production errors but nothing major for the user [01:41:21] But its been do that for three weeks [01:41:34] doing that for four weeks* [01:42:10] IMO if it's a UBN the patch to fix it should be relatively targeted and small enough to just fix the issue, not a big refactor [01:42:29] I guess it's not really a UBN then if it's been going on for a month [02:00:05] Deploy window Branching MediaWiki, extensions, skins, and vendor – See Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211019T0200) [02:06:49] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [02:06:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:06:56] (03PS1) 10TrainBranchBot: Branch commit for wmf/1.38.0-wmf.5 [core] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/731865 [02:06:58] (03CR) 10TrainBranchBot: [C: 03+2] Branch commit for wmf/1.38.0-wmf.5 [core] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/731865 (owner: 10TrainBranchBot) [02:08:14] Have any of the developers seen the following error happen? Make "Draft" an alias for "Nacrt" and the page in Draft won't be automatically moved to "Nacrt", but with alias working [02:09:43] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [02:09:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:11:24] Someone developer online? 
[02:16:24] PROBLEM - SSH on bast5002 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [02:17:56] @hashar I've updated the train blocked ticket [02:18:28] RECOVERY - SSH on bast5002 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [02:18:49] !log ryankemper@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [02:18:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:20:12] PROBLEM - WDQS high update lag on wdqs1003 is CRITICAL: 5.287e+06 ge 3.6e+06 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [02:20:40] (03CR) 10Ryan Kemper: [C: 03+2] wdqs: enable the streaming updater on wdqs1012 [puppet] - 10https://gerrit.wikimedia.org/r/730821 (https://phabricator.wikimedia.org/T288231) (owner: 10DCausse) [02:21:58] !log ryankemper@cumin1001 START - Cookbook sre.wdqs.data-transfer [02:22:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:26:03] (03Merged) 10jenkins-bot: Branch commit for wmf/1.38.0-wmf.5 [core] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/731865 (owner: 10TrainBranchBot) [02:27:13] legoktm[m]: based on hashar proceeding with the train and that it's been present for a month, I've bumped the task down from UBN [02:34:00] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [02:34:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:34:11] Have any of the developers seen the following error happen? Make "Draft" an alias for "Nacrt" and the page in Draft won't be automatically moved to "Nacrt", but with alias working [02:36:46] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[02:36:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:39:15] Juan_90264: This channel isn't the right channel for that question [02:39:23] #wikimedia-tech [02:43:00] Thanks Seddon [02:46:20] PROBLEM - Check systemd state on dns4002 is CRITICAL: CRITICAL - degraded: The following units failed: wmf_auto_restart_systemd-timesyncd.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [03:47:11] (03PS7) 10Juan90264: Repair the size of the logo of Kashmiri Wikipedia and Kashmiri Wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731231 (https://phabricator.wikimedia.org/T293373) [03:49:04] !log ryankemper@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [03:49:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:49:48] PROBLEM - WDQS high update lag on wdqs1003 is CRITICAL: 5.197e+06 ge 3.6e+06 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [03:55:39] RECOVERY - WDQS high update lag on wdqs1003 is OK: (C)3.6e+06 ge (W)1.2e+06 ge 6.849e+04 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [03:58:32] (03PS8) 10Juan90264: Repair the size of the logo of Kashmiri Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731231 (https://phabricator.wikimedia.org/T293342) [04:01:42] (03PS9) 10Juan90264: Repair the size of the logo of Kashmiri Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731231 (https://phabricator.wikimedia.org/T293342) [04:02:44] (03CR) 10Juan90264: [C: 03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731231 (https://phabricator.wikimedia.org/T293342) (owner: 10Juan90264) [04:24:02] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: 
CRITICAL - degraded: The following units failed: monitor_refine_event_sanitized_analytics_delayed.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:30:44] (03CR) 10Ryan Kemper: [C: 03+2] wdqs: enable the streaming updater on wdqs1013 [puppet] - 10https://gerrit.wikimedia.org/r/730822 (https://phabricator.wikimedia.org/T288231) (owner: 10DCausse) [04:36:06] !log ryankemper@cumin1001 START - Cookbook sre.wdqs.data-transfer [04:36:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:37:32] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=jmx_wdqs_updater site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [04:39:21] 10SRE, 10DBA, 10observability, 10Sustainability (Incident Followup): Monitor/dashboard number of queries killed by the automatic query killer - https://phabricator.wikimedia.org/T293531 (10Marostegui) >>! In T293531#7437575, @Legoktm wrote: > Maybe I misread events_coredb_slave.sql, but it looked to me lik... [04:55:11] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [04:56:33] RECOVERY - MD RAID on labweb1002 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [05:03:07] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=jmx_wdqs_updater site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [05:05:45] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [05:40:51] PROBLEM - Check systemd state on dns2002 is CRITICAL: CRITICAL - degraded: The following units failed: wmf_auto_restart_systemd-timesyncd.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:45:51] (03PS1) 10Marostegui: db2112: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/731872 (https://phabricator.wikimedia.org/T290865) [05:46:36] (03CR) 10Marostegui: [C: 03+2] db2112: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/731872 (https://phabricator.wikimedia.org/T290865) (owner: 10Marostegui) [05:46:56] !log Reimage db2112 (s1 codfw master) T290865 [05:47:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:47:04] T290865: Upgrade s1 to Buster + MariaDB 10.4 - https://phabricator.wikimedia.org/T290865 [05:51:36] !log marostegui@cumin1001 START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster [05:51:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:52:36] (03PS4) 10Marostegui: dbbackups: Switch s1 backup generation from db2097 to db2141 [puppet] - 10https://gerrit.wikimedia.org/r/721285 (https://phabricator.wikimedia.org/T290865) (owner: 10Jcrespo) [05:54:29] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1184 for upgrade', diff saved to https://phabricator.wikimedia.org/P17504 and previous config saved to /var/cache/conftool/dbconfig/20211019-055429-marostegui.json [05:54:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:00:02] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1184 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17505 and previous config saved to /var/cache/conftool/dbconfig/20211019-060001-root.json [06:00:05] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:01:23] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1178 for upgrade', diff saved to https://phabricator.wikimedia.org/P17506 and previous config saved to /var/cache/conftool/dbconfig/20211019-060123-marostegui.json [06:01:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:03:52] !log Upgrade db1184, db1178 [06:03:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:04:34] !log ryankemper@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [06:04:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:05:51] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17507 and previous config saved to /var/cache/conftool/dbconfig/20211019-060551-root.json [06:05:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:06:31] !log Upgrade dbstore1005 [06:06:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:11:34] (03PS1) 10Elukey: profile::systemd::timesyncd: extend ensure to to auto_restarts [puppet] - 10https://gerrit.wikimedia.org/r/731873 [06:13:06] (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31750/console" [puppet] - 10https://gerrit.wikimedia.org/r/731873 (owner: 10Elukey) [06:15:05] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17508 and previous config saved to /var/cache/conftool/dbconfig/20211019-061505-root.json [06:15:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:18:53] (03CR) 10Elukey: [C: 03+1] p::s::timesyncd: use ensure in auto_restarts [puppet] - 10https://gerrit.wikimedia.org/r/731843 (owner: 10BBlack) [06:19:24] (03Abandoned) 
10Elukey: profile::systemd::timesyncd: extend ensure to to auto_restarts [puppet] - 10https://gerrit.wikimedia.org/r/731873 (owner: 10Elukey) [06:20:56] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17509 and previous config saved to /var/cache/conftool/dbconfig/20211019-062054-root.json [06:21:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:26:04] 10SRE, 10Analytics-Radar, 10Data-Engineering, 10Event-Platform: Allow kafka clients to verify brokers hostnames when using SSL - https://phabricator.wikimedia.org/T291905 (10elukey) Thanks for all the work John! >>! In T291905#7435523, @jbond wrote: >>>! In T291905#7431136, @elukey wrote: >> Let me know y... [06:30:09] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17510 and previous config saved to /var/cache/conftool/dbconfig/20211019-063008-root.json [06:30:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:31:33] (03CR) 10Elukey: api-gateway: allow HTTP host header rewrite for discovery endpoints (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/730966 (https://phabricator.wikimedia.org/T288789) (owner: 10Elukey) [06:36:01] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17511 and previous config saved to /var/cache/conftool/dbconfig/20211019-063559-root.json [06:36:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:38:18] 10SRE, 10ops-eqiad, 10Infrastructure-Foundations, 10netops: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (10ayounsi) [06:38:21] !log marostegui@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster 
[06:38:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:45:13] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17512 and previous config saved to /var/cache/conftool/dbconfig/20211019-064512-root.json [06:45:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:45:51] (03CR) 10Ayounsi: [C: 03+1] add centrallog2002 to codfw anycast_neighbors and syslog fw allows [homer/public] - 10https://gerrit.wikimedia.org/r/731828 (https://phabricator.wikimedia.org/T292196) (owner: 10Herron) [06:51:04] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17513 and previous config saved to /var/cache/conftool/dbconfig/20211019-065104-root.json [06:51:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:55:33] 10SRE, 10LDAP-Access-Requests, 10WMF-NDA-Requests: When WMF staff requests to be added to ldap/wmf, also add their Phabricator account to #WMF-NDA - https://phabricator.wikimedia.org/T290605 (10Aklapper) Thanks a lot. [07:00:16] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17514 and previous config saved to /var/cache/conftool/dbconfig/20211019-070016-root.json [07:00:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:02:06] (03PS1) 10Elukey: knative-serving: add prometheus annotations [deployment-charts] - 10https://gerrit.wikimedia.org/r/731875 (https://phabricator.wikimedia.org/T289841) [07:04:42] (03CR) 10AOkoth: [C: 03+1] "Looks good to me!" 
[puppet] - 10https://gerrit.wikimedia.org/r/731849 (https://phabricator.wikimedia.org/T293696) (owner: 10Brennen Bearnes) [07:06:08] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17515 and previous config saved to /var/cache/conftool/dbconfig/20211019-070607-root.json [07:06:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:15:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17516 and previous config saved to /var/cache/conftool/dbconfig/20211019-071519-root.json [07:15:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:16:35] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/731843 (owner: 10BBlack) [07:18:09] 10SRE, 10Traffic, 10Performance-Team (Radar), 10User-ema: Package and deploy Varnish 6.0.8 - https://phabricator.wikimedia.org/T292290 (10ema) Caches have now filled up. Response start looks good on cp3060 [[https://grafana.wikimedia.org/d/M7xQ_BeWk/response-time-by-host?viewPanel=5&orgId=1&var-host=cp3060... 
[07:21:12] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17517 and previous config saved to /var/cache/conftool/dbconfig/20211019-072111-root.json [07:21:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:21:41] (03CR) 10Jbond: [C: 03+2] p::s::timesyncd: use ensure in auto_restarts [puppet] - 10https://gerrit.wikimedia.org/r/731843 (owner: 10BBlack) [07:24:27] !log A:cp start rolling varnish upgrades to 6.0.8-1wm1 T292290 [07:24:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:24:33] T292290: Package and deploy Varnish 6.0.8 - https://phabricator.wikimedia.org/T292290 [07:27:39] 10ops-eqiad: eqiad: patch 2nd Equinix IXP - https://phabricator.wikimedia.org/T293726 (10ayounsi) p:05Triage→03Medium [07:27:53] 10ops-eqiad: eqiad: patch 2nd Equinix IXP - https://phabricator.wikimedia.org/T293726 (10ayounsi) [07:30:13] (03PS1) 10Marostegui: Revert "db2079: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/731823 [07:30:46] (03CR) 10Marostegui: [C: 03+2] Revert "db2079: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/731823 (owner: 10Marostegui) [07:32:19] !log oblivian@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[07:32:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:33:48] RECOVERY - Check systemd state on dns4002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [07:34:00] RECOVERY - Check systemd state on dns2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [07:34:00] RECOVERY - Check systemd state on dns1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [07:34:40] RECOVERY - Check systemd state on dns3001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [07:34:52] RECOVERY - Check systemd state on dns5002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [07:35:30] RECOVERY - Check systemd state on dns2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [07:35:30] RECOVERY - Check systemd state on dns3002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [07:35:30] RECOVERY - Check systemd state on dns5001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [07:35:34] RECOVERY - Check systemd state on dns4001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [07:35:40] RECOVERY - Check systemd state on dns1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [07:36:30] (03CR) 10Ayounsi: [C: 03+1] anycast_monitoring: add check for durum [puppet] - 10https://gerrit.wikimedia.org/r/731399 (owner: 10Ssingh) [07:50:04] (03PS2) 10Hashar: 
contint: regularly prune docker material [puppet] - 10https://gerrit.wikimedia.org/r/731840 (https://phabricator.wikimedia.org/T292729) [07:50:24] (03CR) 10Hashar: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/731840 (https://phabricator.wikimedia.org/T292729) (owner: 10Hashar) [07:56:27] PROBLEM - Disk space on aqs1012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra-a 132961 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=aqs1012&var-datasource=eqiad+prometheus/ops [08:00:32] (03PS5) 10Ayounsi: Add transit BGP communities for anycast traffic engineering [homer/public] - 10https://gerrit.wikimedia.org/r/728255 (https://phabricator.wikimedia.org/T288843) [08:01:17] !log oblivian@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [08:01:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:01:28] (03CR) 10Ayounsi: Add transit BGP communities for anycast traffic engineering (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/728255 (https://phabricator.wikimedia.org/T288843) (owner: 10Ayounsi) [08:03:03] !log push prep-work for anycast tuning in ulsfo (try 2) - T288843 [08:03:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:03:09] T288843: Traffic Engineering for Anycast Ranges - https://phabricator.wikimedia.org/T288843 [08:06:39] Going to depool codfw swift and swift-ro so I can add load to some new hosts (T288458) [08:06:40] T288458: Put ms-be20[62-65] in service - https://phabricator.wikimedia.org/T288458 [08:06:42] cc godog [08:07:04] (03CR) 10Hashar: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/731840 (https://phabricator.wikimedia.org/T292729) (owner: 10Hashar) [08:07:17] !log mvernon@cumin2002 conftool action : set/pooled=false; selector: name=codfw,dnsdisc=swift-ro [08:07:21] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:07:32] (03CR) 10Volans: "replies inline" [cookbooks] - 10https://gerrit.wikimedia.org/r/730513 (owner: 10Jbond) [08:07:32] !log mvernon@cumin2002 conftool action : set/pooled=false; selector: name=codfw,dnsdisc=swift [08:07:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:08:55] (03CR) 10Volans: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/730506 (owner: 10Jbond) [08:09:10] (03PS1) 10David Caro: ntp::daemon and systemd::timesyncd: use ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/731876 (https://phabricator.wikimedia.org/T293691) [08:10:23] (03PS3) 10Hashar: contint: regularly prune docker material [puppet] - 10https://gerrit.wikimedia.org/r/731840 (https://phabricator.wikimedia.org/T292729) [08:10:32] (03CR) 10Hashar: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/731840 (https://phabricator.wikimedia.org/T292729) (owner: 10Hashar) [08:13:48] (03CR) 10David Caro: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31754/console" [puppet] - 10https://gerrit.wikimedia.org/r/731876 (https://phabricator.wikimedia.org/T293691) (owner: 10David Caro) [08:14:42] (03PS4) 10Hashar: contint: regularly prune docker material [puppet] - 10https://gerrit.wikimedia.org/r/731840 (https://phabricator.wikimedia.org/T292729) [08:17:28] !log oblivian@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [08:17:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:18:04] (03CR) 10David Caro: "Is this needed anymore?" [puppet] - 10https://gerrit.wikimedia.org/r/723489 (owner: 10Jbond) [08:18:20] Emperor: ack, cheers! 
[08:19:29] (03CR) 10Hashar: [C: 03+1] "I have cherry picked it on the integration puppet master and I do get a systemd timer /lib/systemd/system/docker-system-prune-dangling.tim" [puppet] - 10https://gerrit.wikimedia.org/r/731840 (https://phabricator.wikimedia.org/T292729) (owner: 10Hashar) [08:19:53] (03PS3) 10Phuedx: Clean-up decommisioned Print schema configs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570625 (https://phabricator.wikimedia.org/T196159) (owner: 10Polishdeveloper) [08:20:13] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/731876 (https://phabricator.wikimedia.org/T293691) (owner: 10David Caro) [08:21:59] (03PS1) 10MMandere: rsyslog: Add drmrs DC site [puppet] - 10https://gerrit.wikimedia.org/r/731877 (https://phabricator.wikimedia.org/T282787) [08:22:09] (03PS1) 10MVernon: codfw-prod: more weight to ms-be20[62-65] [software/swift-ring] - 10https://gerrit.wikimedia.org/r/731878 (https://phabricator.wikimedia.org/T288458) [08:22:47] (03CR) 10Hashar: "Looks like John and Jelto were the last to touch systemd::timer. The "splay" parameter got introduced with the first commit but lacked a" [puppet] - 10https://gerrit.wikimedia.org/r/731838 (owner: 10Hashar) [08:24:10] (03PS6) 10Ayounsi: Add transit BGP communities for anycast traffic engineering [homer/public] - 10https://gerrit.wikimedia.org/r/728255 (https://phabricator.wikimedia.org/T288843) [08:24:23] (03CR) 10Hashar: "Looks like John and Jelto were the last to touch systemd::timer. 
Without support for "splay", one has to create a unit AND a timer and att" [puppet] - 10https://gerrit.wikimedia.org/r/731839 (owner: 10Hashar) [08:24:30] (03PS2) 10MVernon: codfw-prod: more weight to ms-be20[62-65] [software/swift-ring] - 10https://gerrit.wikimedia.org/r/731878 (https://phabricator.wikimedia.org/T288458) [08:25:23] (03PS3) 10MVernon: codfw-prod: more weight to ms-be20[62-65] [software/swift-ring] - 10https://gerrit.wikimedia.org/r/731878 (https://phabricator.wikimedia.org/T288458) [08:25:38] (03CR) 10Ayounsi: [C: 03+2] "Previous "then next policy" placement wasn't allowing any prefixes to be exported. Tested on cr3-ulsfo and works as expected." [homer/public] - 10https://gerrit.wikimedia.org/r/728255 (https://phabricator.wikimedia.org/T288843) (owner: 10Ayounsi) [08:25:42] (03CR) 10Hashar: [C: 03+1] "This is for the WMCS instances for CI, that would prune every images on Sunday and prune the dangling one every day. Should help keep the" [puppet] - 10https://gerrit.wikimedia.org/r/731840 (https://phabricator.wikimedia.org/T292729) (owner: 10Hashar) [08:26:15] (03Merged) 10jenkins-bot: Add transit BGP communities for anycast traffic engineering [homer/public] - 10https://gerrit.wikimedia.org/r/728255 (https://phabricator.wikimedia.org/T288843) (owner: 10Ayounsi) [08:27:16] (03CR) 10Volans: "Some comments inline" [cookbooks] - 10https://gerrit.wikimedia.org/r/731153 (owner: 10Jbond) [08:28:15] (03CR) 10Filippo Giunchedi: "nonobject_weight should be 92 afaics, LGTM overall" [software/swift-ring] - 10https://gerrit.wikimedia.org/r/731878 (https://phabricator.wikimedia.org/T288458) (owner: 10MVernon) [08:30:08] (03CR) 10Volans: sre: add contool aware SREBatchRunnerBase (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/731153 (owner: 10Jbond) [08:32:00] !log [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php hrwiki --fix [08:32:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:33:43] !log kormat@cumin1001 
START - Cookbook sre.hosts.downtime for 2:00:00 on 13 hosts with reason: Schema change s8 T281058 [08:33:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:33:49] T281058: Rename AbuseFilter indexes for consistency - https://phabricator.wikimedia.org/T281058 [08:33:53] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 13 hosts with reason: Schema change s8 T281058 [08:33:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:34:35] (03CR) 10David Caro: [V: 03+1 C: 03+2] ntp::daemon and systemd::timesyncd: use ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/731876 (https://phabricator.wikimedia.org/T293691) (owner: 10David Caro) [08:35:32] (03CR) 10MVernon: codfw-prod: more weight to ms-be20[62-65] (031 comment) [software/swift-ring] - 10https://gerrit.wikimedia.org/r/731878 (https://phabricator.wikimedia.org/T288458) (owner: 10MVernon) [08:36:23] PROBLEM - Check systemd state on cp2033 is CRITICAL: CRITICAL - degraded: The following units failed: wmf_auto_restart_varnish-frontend-hospital.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [08:37:39] looking ^ [08:38:19] RECOVERY - Disk space on aqs1012 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=aqs1012&var-datasource=eqiad+prometheus/ops [08:38:44] (03PS1) 10Urbanecm: DPL: Explicitly note it is not possible to enable DPL on any more wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731879 [08:40:12] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] "🙏" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731879 (owner: 10Urbanecm) [08:40:49] (03CR) 10MVernon: codfw-prod: more weight to ms-be20[62-65] (031 comment) [software/swift-ring] - 10https://gerrit.wikimedia.org/r/731878 (https://phabricator.wikimedia.org/T288458) (owner: 10MVernon) [08:40:57] !log push prep-work for 
anycast tuning to all sites - T288843 [08:41:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:41:03] T288843: Traffic Engineering for Anycast Ranges - https://phabricator.wikimedia.org/T288843 [08:41:25] moritzm: it seems that wmf_auto_restart for varnish-frontend-hospital.service failed on cp2033 while upgrading varnish [08:41:40] (03PS1) 10Gergő Tisza: [WIP] Enable GrowthExperiments Add Image feature on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731880 (https://phabricator.wikimedia.org/T290949) [08:41:57] probably some sort of race between the restart of varnish-frontend-hospital.service itself and wmf_auto_restart checking if a restart is needed? [08:41:59] (03CR) 10JMeybohm: [C: 03+1] "I think it's worth noticing that we can (an should) remove the code specific to helmBinary switching again after we've migrated to helm3 c" [deployment-charts] - 10https://gerrit.wikimedia.org/r/721301 (https://phabricator.wikimedia.org/T251305) (owner: 10Jelto) [08:42:28] (03PS1) 10Giuseppe Lavagetto: mediawiki: nest kubernetes labels in rsyslog [deployment-charts] - 10https://gerrit.wikimedia.org/r/731881 [08:42:31] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Enable GrowthExperiments Add Image feature on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731880 (https://phabricator.wikimedia.org/T290949) (owner: 10Gergő Tisza) [08:43:50] (03CR) 10Muehlenhoff: sre: add contool aware SREBatchRunnerBase (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/731153 (owner: 10Jbond) [08:44:37] RECOVERY - Check systemd state on cp2033 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [08:44:54] (03CR) 10Filippo Giunchedi: codfw-prod: more weight to ms-be20[62-65] (031 comment) [software/swift-ring] - 10https://gerrit.wikimedia.org/r/731878 (https://phabricator.wikimedia.org/T288458) (owner: 10MVernon) [08:45:47] ema: did the automated service 
restart happen while the update was in flight? that wouldn't be surprising, there's no logic in wmf-auto-restart to check for ongoing installs/updates/removals [08:46:17] I'll be moving read traffic to graphite2003 first and then write traffic, FYI (I'll be !log ggin) [08:48:20] moritzm: ah yes! basically varnish-frontend-hospital.service was autorestarted by the varnish-frontend.service restart (because of Requires=) [08:48:31] and at the same time the auto-restart timer kicked in [08:48:40] (03CR) 10Filippo Giunchedi: [C: 03+2] discovery: move read traffic to graphite2003 [dns] - 10https://gerrit.wikimedia.org/r/731435 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [08:49:29] (03PS1) 10Marostegui: mariadb: Promote db2103 to s1 codfw master [puppet] - 10https://gerrit.wikimedia.org/r/731882 (https://phabricator.wikimedia.org/T290865) [08:49:58] moritzm: perhaps we can somehow specify the relationship between services and their auto-restart timers? [08:50:22] !log point graphite.discovery.wmnet to graphite2003 - T247963 [08:50:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:50:28] T247963: Migrate role::graphite::production to Bullseye - https://phabricator.wikimedia.org/T247963 [08:50:52] (03PS5) 10Ayounsi: Configure transit specific outbound BGP communities [homer/public] - 10https://gerrit.wikimedia.org/r/728256 (https://phabricator.wikimedia.org/T288843) [08:53:20] (03PS4) 10MVernon: codfw-prod: more weight to ms-be20[62-65] [software/swift-ring] - 10https://gerrit.wikimedia.org/r/731878 (https://phabricator.wikimedia.org/T288458) [08:54:01] (03CR) 10Kormat: [C: 03+1] mariadb: Promote db2103 to s1 codfw master [puppet] - 10https://gerrit.wikimedia.org/r/731882 (https://phabricator.wikimedia.org/T290865) (owner: 10Marostegui) [08:54:09] (03CR) 10MVernon: codfw-prod: more weight to ms-be20[62-65] (031 comment) [software/swift-ring] - 10https://gerrit.wikimedia.org/r/731878 
(https://phabricator.wikimedia.org/T288458) (owner: 10MVernon) [08:58:56] (03CR) 10Filippo Giunchedi: [C: 03+1] rsyslog: Add drmrs DC site [puppet] - 10https://gerrit.wikimedia.org/r/731877 (https://phabricator.wikimedia.org/T282787) (owner: 10MMandere) [08:59:28] (03CR) 10Volans: sre: add contool aware SREBatchRunnerBase (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/731153 (owner: 10Jbond) [08:59:30] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM!" [software/swift-ring] - 10https://gerrit.wikimedia.org/r/731878 (https://phabricator.wikimedia.org/T288458) (owner: 10MVernon) [08:59:57] I don't see an obvious way to bake it into unit/timer relations, but we could add a check to the wmf-auto-restart script instead [09:00:36] (03CR) 10MVernon: [V: 03+2 C: 03+2] codfw-prod: more weight to ms-be20[62-65] [software/swift-ring] - 10https://gerrit.wikimedia.org/r/731878 (https://phabricator.wikimedia.org/T288458) (owner: 10MVernon) [09:01:46] 10ops-codfw, 10Continuous-Integration-Infrastructure, 10DC-Ops, 10netops: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ) - https://phabricator.wikimedia.org/T283582 (10cmooney) a:05cmooney→03None Sorry @Dzahn I should have updated it before now. Makes sense to re-assign to DC-Ops... [09:03:04] !log push anycast tuning to all Telia transit links - T288843 [09:03:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:03:10] T288843: Traffic Engineering for Anycast Ranges - https://phabricator.wikimedia.org/T288843 [09:10:13] 10SRE, 10Data-Persistence (Consultation), 10Performance-Team, 10Wikimedia-Rdbms, 10Sustainability (Incident Followup): Reimplement HHVM-like slow query log - https://phabricator.wikimedia.org/T293534 (10Kormat) >>! In T293534#7437661, @Legoktm wrote: > Right, searching for `channel:DBPerformance AND mess... 
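A rough sketch of the kind of check floated above for the cp2033 race, where varnish-frontend-hospital.service was already being restarted via Requires= when the auto-restart timer fired. It skips the automated restart whenever systemd reports the unit mid-transition. The helper names and the choice to key on `ActiveState` are assumptions made for illustration; this is not the actual wmf-auto-restart code, and a check like this only narrows the race window rather than closing it.

```python
import subprocess

# ActiveState values systemd reports while a unit is mid-transition.
TRANSITIONAL_STATES = {"activating", "deactivating", "reloading"}


def parse_active_state(show_output: str) -> str:
    """Extract ActiveState from `systemctl show --property=ActiveState` output."""
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ActiveState":
            return value.strip()
    return "unknown"


def should_skip_restart(show_output: str) -> bool:
    """Return True when the unit is mid-transition or its state is unknown."""
    state = parse_active_state(show_output)
    return state in TRANSITIONAL_STATES or state == "unknown"


def query_unit(unit: str) -> str:
    """Ask systemd for the unit's ActiveState (needs a host running systemd)."""
    result = subprocess.run(
        ["systemctl", "show", "--property=ActiveState", unit],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

A hypothetical caller would run `should_skip_restart(query_unit("varnish-frontend-hospital.service"))` just before restarting and exit cleanly when it returns True, leaving the next timer run to retry.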
[09:10:30] 10SRE, 10Infrastructure-Foundations, 10netops, 10Patch-For-Review: Traffic Engineering for Anycast Ranges - https://phabricator.wikimedia.org/T288843 (10ayounsi) Example before/after for Telia in eqiad: `lines=20 ayounsi@re0.cr2-eqiad> show route advertising-protocol bgp 80.239.132.225 inet.0: 852341 des... [09:13:42] (03CR) 10Marostegui: [C: 03+2] mariadb: Promote db2103 to s1 codfw master [puppet] - 10https://gerrit.wikimedia.org/r/731882 (https://phabricator.wikimedia.org/T290865) (owner: 10Marostegui) [09:15:07] 10SRE, 10Infrastructure-Foundations, 10netops, 10Patch-For-Review: Traffic Engineering for Anycast Ranges - https://phabricator.wikimedia.org/T288843 (10ayounsi) Confirmed with Telia's looking glass: https://lg.twelve99.net/?type=bgp&router=prs-b6&address=185.71.138.0/24 [09:17:02] (03CR) 10Marostegui: [C: 03+2] dbbackups: Switch s1 backup generation from db2097 to db2141 [puppet] - 10https://gerrit.wikimedia.org/r/721285 (https://phabricator.wikimedia.org/T290865) (owner: 10Jcrespo) [09:17:46] (03CR) 10Filippo Giunchedi: [C: 03+2] monitoring: check graphite2003 metrics [puppet] - 10https://gerrit.wikimedia.org/r/731434 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [09:17:51] (03PS2) 10Filippo Giunchedi: monitoring: check graphite2003 metrics [puppet] - 10https://gerrit.wikimedia.org/r/731434 (https://phabricator.wikimedia.org/T247963) [09:18:21] (03CR) 10Filippo Giunchedi: [V: 03+2 C: 03+2] monitoring: check graphite2003 metrics [puppet] - 10https://gerrit.wikimedia.org/r/731434 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [09:18:31] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on 14 hosts with reason: Schema change s1 T281058 [09:18:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:18:36] T281058: Rename AbuseFilter indexes for consistency - https://phabricator.wikimedia.org/T281058 [09:18:41] !log kormat@cumin1001 END (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 14 hosts with reason: Schema change s1 T281058 [09:18:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:19:58] !log Stop slave on db2112 T290865 [09:20:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:20:03] T290865: Upgrade s1 to Buster + MariaDB 10.4 - https://phabricator.wikimedia.org/T290865 [09:21:59] 10ops-codfw, 10DBA: Upgrade db2112 firmware/BIOS - https://phabricator.wikimedia.org/T293740 (10Marostegui) [09:22:21] 10ops-codfw, 10DBA: Upgrade db2112 firmware/BIOS - https://phabricator.wikimedia.org/T293740 (10Marostegui) p:05Triage→03Medium [09:26:59] (03PS1) 10Btullis: Add the data-engineering team to Alertmanager [puppet] - 10https://gerrit.wikimedia.org/r/731884 (https://phabricator.wikimedia.org/T293399) [09:27:10] !log Cloned and applied security patches for 1.38.0-wmf.5 # T281169 [09:27:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:27:16] T281169: 1.38.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T281169 [09:27:46] !log scap clean --delete 1.38.0-wmf.2 && scap clean --delete 1.38.0-wmf.3 # T281169 [09:27:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:28:25] (03PS3) 10David Caro: toolforge: new add_grid_webgrid_generic_node recipe [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/726894 (https://phabricator.wikimedia.org/T292465) [09:28:27] (03PS1) 10David Caro: start_instance_with_prefix: fix issue when no previous instance exist [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731885 [09:28:29] (03PS1) 10David Caro: start_instance_prefix: add reusable params helpers [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731906 [09:28:31] (03PS1) 10David Caro: wmcs: create composite type OpenstackIdentifier [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731907 [09:28:33] (03PS1) 10David Caro: start_instance_with_prefix: Group 
options in a dataclass [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731908 [09:28:35] (03PS1) 10David Caro: InstanceCreationOpts: Add a way to genera cli args [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731909 [09:28:37] (03PS1) 10David Caro: start_instance_with_prefix: allow integer-suffixed prefixes [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731910 [09:28:39] (03PS1) 10David Caro: start_instance_with_prefix: fix next instance counter [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731911 (https://phabricator.wikimedia.org/T292465) [09:28:41] (03PS1) 10David Caro: start_instance_with_prefix: add tries parameter [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731912 (https://phabricator.wikimedia.org/T292465) [09:28:43] (03PS1) 10David Caro: start_instance_with_prefix: work around extra stderr message [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731913 [09:28:45] (03PS1) 10David Caro: grid: Added a couple cookbooks to add a new webgrid generic node [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731914 [09:32:34] (03CR) 10jerkins-bot: [V: 04-1] start_instance_with_prefix: Group options in a dataclass [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731908 (owner: 10David Caro) [09:33:03] 10ops-codfw, 10DBA: Upgrade db2112 firmware/BIOS - https://phabricator.wikimedia.org/T293740 (10Marostegui) [09:33:18] (03CR) 10jerkins-bot: [V: 04-1] grid: Added a couple cookbooks to add a new webgrid generic node [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731914 (owner: 10David Caro) [09:34:59] !log marostegui@cumin1001 START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster [09:35:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:37:11] !log move graphite/statsd writes to graphite2003 - T247963 [09:37:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:37:17] T247963: Migrate role::graphite::production to Bullseye - 
https://phabricator.wikimedia.org/T247963 [09:37:26] (03CR) 10Filippo Giunchedi: [C: 03+2] statsd: failover writes to graphite2003 [puppet] - 10https://gerrit.wikimedia.org/r/731433 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [09:37:43] (03CR) 10Filippo Giunchedi: [C: 03+2] wmnet: move writes to graphite2003 [dns] - 10https://gerrit.wikimedia.org/r/731436 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [09:41:48] RECOVERY - Check systemd state on search-loader2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:42:38] !log hashar@deploy1002 Pruned MediaWiki: 1.38.0-wmf.2 (duration: 16m 06s) [09:42:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:44:22] !log hashar@deploy1002 Pruned MediaWiki: 1.38.0-wmf.3 (duration: 01m 39s) [09:44:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:47:48] ^ I am done ;) [09:47:55] will do the rest this afternoon [09:47:56] (03CR) 10Jbond: [C: 03+2] mediawiki: add get_primary_dc function [software/spicerack] - 10https://gerrit.wikimedia.org/r/730440 (owner: 10Jbond) [09:50:22] !log marostegui@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster [09:50:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:52:18] !log marostegui@cumin1001 START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster [09:52:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:53:37] (03CR) 10MMandere: [C: 03+2] rsyslog: Add drmrs DC site [puppet] - 10https://gerrit.wikimedia.org/r/731877 (https://phabricator.wikimedia.org/T282787) (owner: 10MMandere) [09:53:48] (03Merged) 10jenkins-bot: mediawiki: add get_primary_dc function [software/spicerack] - 10https://gerrit.wikimedia.org/r/730440 (owner: 10Jbond) [09:56:00] (03CR) 10Jbond: 
builder/systemtap: merge role::systemtap::devserver into builder (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/730863 (owner: 10Dzahn) [09:58:19] (03CR) 10Jbond: standard::ntp: move standard ntp to its own profile (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/730852 (owner: 10Jbond) [10:01:07] 10SRE, 10Analytics-Radar, 10Data-Engineering, 10Event-Platform: Allow kafka clients to verify brokers hostnames when using SSL - https://phabricator.wikimedia.org/T291905 (10jbond) > One question - when I use profile::pki::get_cert do I need to do any extra work before using a label like kafka (that I assu... [10:01:39] (03PS1) 10Filippo Giunchedi: mwdebug: add graphite2003 to network policies [deployment-charts] - 10https://gerrit.wikimedia.org/r/731917 (https://phabricator.wikimedia.org/T247963) [10:01:46] (03CR) 10Jbond: [C: 03+1] gitlab: remove cas3 from external providers [puppet] - 10https://gerrit.wikimedia.org/r/731849 (https://phabricator.wikimedia.org/T293696) (owner: 10Brennen Bearnes) [10:02:04] 10SRE, 10Analytics-Radar, 10Data-Engineering, 10Event-Platform: Allow kafka clients to verify brokers hostnames when using SSL - https://phabricator.wikimedia.org/T291905 (10elukey) >>! In T291905#7439824, @jbond wrote: >> One question - when I use profile::pki::get_cert do I need to do any extra work befo... 
[10:03:45] (03PS1) 10Filippo Giunchedi: ProductionServices: use graphite2003 for statsd [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731918 (https://phabricator.wikimedia.org/T247963) [10:04:32] (03CR) 10jerkins-bot: [V: 04-1] ProductionServices: use graphite2003 for statsd [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731918 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [10:05:58] (03PS2) 10Filippo Giunchedi: ProductionServices: use graphite2003 for statsd [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731918 (https://phabricator.wikimedia.org/T247963) [10:10:00] anyone available for a mw config change/deployment ? ^ [10:10:25] I apologise about the timing, I mistakenly thought changing dns/puppet would be enough to flip statsd, doesn't seem like it though [10:14:06] <_joe_> jouncebot: next [10:14:07] In 0 hour(s) and 45 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211019T1100) [10:14:17] <_joe_> godog: we can anticipate my patch as well [10:14:27] <_joe_> if you need it now [10:14:34] <_joe_> else we can wait for the normal deployment window [10:15:10] _joe_: yeah if it isn't too much of a problem I'd like to push it now [10:15:32] again apologies for the timing, that's on me [10:15:34] <_joe_> ok [10:15:53] <_joe_> it's ok, I mean, we're allowed to merge changes outside of the window for operational purposes [10:16:04] <_joe_> and I'm sure the deployers won't mind either [10:16:11] !log marostegui@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster [10:16:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:16:15] (03CR) 10Giuseppe Lavagetto: [C: 03+2] ProductionServices: use graphite2003 for statsd [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731918 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [10:16:44] volans: ^ so I guess it is 
time to kick a manual restart from the idrac [10:16:58] (03Merged) 10jenkins-bot: ProductionServices: use graphite2003 for statsd [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731918 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [10:17:00] ack, thanks [10:17:46] marostegui: as you want depends what you want to do next [10:17:52] is not showing anything here [10:18:01] volans: I can kick the stretch install [10:18:05] let's try that [10:18:08] ack [10:18:13] <_joe_> godog: I'll do the deployment [10:18:15] I'll keep the console if you don't mind [10:18:19] sure [10:18:20] _joe_: thanks, appreciate it [10:18:24] !log marostegui@cumin1001 START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS stretch [10:18:27] volans: ^ [10:18:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:18:36] ack [10:18:42] <_joe_> godog: see any reason to "test" it on mwdebug, btw? [10:18:54] _joe_: bah not really, pretty straightforward [10:20:13] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[10:20:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:20:22] (03PS1) 10Btullis: Add the first data-engineering team alert to Alertmanager [alerts] - 10https://gerrit.wikimedia.org/r/731919 (https://phabricator.wikimedia.org/T293399) [10:21:26] !log oblivian@deploy1002 Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:731918|ProductionServices: use graphite2003 for statsd (T247963)]] (duration: 00m 54s) [10:21:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:21:32] T247963: Migrate role::graphite::production to Bullseye - https://phabricator.wikimedia.org/T247963 [10:22:04] <_joe_> godog: in ~ 10 seconds you should see all the mediawiki traffic migrated [10:22:05] !log flip mw statsd traffic with https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/731918 - T247963 [10:22:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:22:17] _joe_: yeah I can see the UDP firehose already :( [10:22:38] !log oblivian@deploy1002 Synchronized tests/WmfConfigServicesTest.php: Config: [[gerrit:731918|ProductionServices: use graphite2003 for statsd (T247963)]] (duration: 00m 54s) [10:22:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:22:56] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[10:23:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:23:03] to the tune of ~500 bytes 100k times/s [10:23:26] <_joe_> ok, given we're here, I'll also deploy my patch and declare the deploy window done before it should happen :D [10:23:41] \o/ [10:23:55] (03PS1) 10MMandere: resolving: Add drmrs DC site [puppet] - 10https://gerrit.wikimedia.org/r/731920 (https://phabricator.wikimedia.org/T282787) [10:24:11] the udp firehose -> https://grafana-rw.wikimedia.org/d/000000337/graphite-codfw?orgId=1&refresh=1m&forceLogin=true&from=1634628240544&to=1634639040544&viewPanel=16 [10:24:43] (03CR) 10Giuseppe Lavagetto: [C: 03+2] "Deploying" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730182 (https://phabricator.wikimedia.org/T285232) (owner: 10Giuseppe Lavagetto) [10:24:52] (03PS2) 10ArielGlenn: index page and directory for Wikimedia Enterprise HTML dumps [puppet] - 10https://gerrit.wikimedia.org/r/731768 (https://phabricator.wikimedia.org/T273585) [10:25:33] (03Merged) 10jenkins-bot: static.php: Add support for /static/current rewrites (take 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730182 (https://phabricator.wikimedia.org/T285232) (owner: 10Giuseppe Lavagetto) [10:25:52] (03CR) 10Elukey: [C: 03+2] knative-serving: add prometheus annotations [deployment-charts] - 10https://gerrit.wikimedia.org/r/731875 (https://phabricator.wikimedia.org/T289841) (owner: 10Elukey) [10:26:10] (03CR) 10Jbond: "thanks updated" [cookbooks] - 10https://gerrit.wikimedia.org/r/730513 (owner: 10Jbond) [10:26:21] (03PS6) 10Jbond: cookbooks sre: update run_scripts to accept a list of scripts not functions [cookbooks] - 10https://gerrit.wikimedia.org/r/730513 [10:26:46] (03PS4) 10Jbond: cookbook sre: update SREBatchBase/SREBatchRunnerBase with minor fixes [cookbooks] - 10https://gerrit.wikimedia.org/r/730506 [10:26:51] (03CR) 10Jbond: [C: 03+2] cookbook sre: update SREBatchBase/SREBatchRunnerBase with minor fixes [cookbooks] - 
10https://gerrit.wikimedia.org/r/730506 (owner: 10Jbond) [10:26:57] (03PS7) 10Jbond: cookbooks sre: update run_scripts to accept a list of scripts not functions [cookbooks] - 10https://gerrit.wikimedia.org/r/730513 [10:27:10] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet [10:27:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:27:30] (03PS1) 10Btullis: Remove HDFS Capacity Remaining check from Icinga [puppet] - 10https://gerrit.wikimedia.org/r/731921 (https://phabricator.wikimedia.org/T293399) [10:28:26] (03CR) 10Jbond: debdeploy: set autostarts to include debdeploy client (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/723489 (owner: 10Jbond) [10:28:34] (03Abandoned) 10Jbond: debdeploy: set autostarts to include debdeploy client [puppet] - 10https://gerrit.wikimedia.org/r/723489 (owner: 10Jbond) [10:28:56] !log elukey@deploy1002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [10:29:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:29:16] ok mw metrics are back to where they should be [10:29:19] !log elukey@deploy1002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [10:29:20] (03Merged) 10jenkins-bot: cookbook sre: update SREBatchBase/SREBatchRunnerBase with minor fixes [cookbooks] - 10https://gerrit.wikimedia.org/r/730506 (owner: 10Jbond) [10:29:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:29:26] e.g. 
https://grafana.wikimedia.org/d/2Zx07tGZz/wanobjectcache?orgId=1&from=now-1h&to=now [10:29:29] (03CR) 10Jbond: [C: 03+2] "LGTM will merge thanks" [puppet] - 10https://gerrit.wikimedia.org/r/731838 (owner: 10Hashar) [10:29:37] (03CR) 10jerkins-bot: [V: 04-1] cookbooks sre: update run_scripts to accept a list of scripts not functions [cookbooks] - 10https://gerrit.wikimedia.org/r/730513 (owner: 10Jbond) [10:30:36] (03CR) 10Elukey: [C: 03+1] Remove HDFS Capacity Remaining check from Icinga [puppet] - 10https://gerrit.wikimedia.org/r/731921 (https://phabricator.wikimedia.org/T293399) (owner: 10Btullis) [10:31:56] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [10:32:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:33:35] (03CR) 10Elukey: Add the first data-engineering team alert to Alertmanager (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/731919 (https://phabricator.wikimedia.org/T293399) (owner: 10Btullis) [10:34:50] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[10:34:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:36:35] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1101 (s7,s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17519 and previous config saved to /var/cache/conftool/dbconfig/20211019-103634-marostegui.json [10:36:36] (03PS8) 10Jbond: cookbooks sre: update run_scripts to accept a list of scripts not functions [cookbooks] - 10https://gerrit.wikimedia.org/r/730513 [10:36:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:36:57] (03PS2) 10Jbond: sre: add contool aware SREBatchRunnerBase [cookbooks] - 10https://gerrit.wikimedia.org/r/731153 [10:37:31] !log Upgrade db1101 (s7,s8) [10:37:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:38:03] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet [10:38:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:38:55] !log oblivian@deploy1002 Synchronized w/static.php: Config: [[gerrit:730182|static.php: Add support for /static/current rewrites (take 2) (T285232)]] (duration: 00m 55s) [10:39:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:39:01] T285232: The restricted/mediawiki-webserver image should include skins and resources - https://phabricator.wikimedia.org/T285232 [10:43:03] elukey btullis I need to restart superset on an-tool1010 to pick up graphite/statsd dns change, ok to do so? [10:44:58] godog: yes, fine by me. I'll watch `journalctl -f -u superset` for good measure while you do it, to check for any surprises. 
[10:45:06] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17520 and previous config saved to /var/cache/conftool/dbconfig/20211019-104506-root.json [10:45:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:45:26] !log bounce superset on an-tool1010 to pick up statsd changes - T247963 [10:45:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:45:31] T247963: Migrate role::graphite::production to Bullseye - https://phabricator.wikimedia.org/T247963 [10:45:37] !log bounce navtiming on webperf1001 to pick up statsd changes - T247963 [10:45:41] btullis: thank you! appreciate it [10:45:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:45:49] Pleasure. [10:48:30] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17521 and previous config saved to /var/cache/conftool/dbconfig/20211019-104829-root.json [10:48:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:48:50] ok overall I had to restart only two python apps that won't honor /etc/hosts or dns for statsd, not bad [10:49:12] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2112.codfw.wmnet with OS stretch [10:49:12] and there are some maint scripts on mwmaint still running that don't have the config change applied yet, but that's fine I think [10:49:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:49:53] <_joe_> godog: yeah not much to do about those short of killing thems [10:49:54] err, three [10:49:57] <_joe_> *them [10:50:05] !log bounce superset on an-tool1005 to pick up statsd changes - T247963 [10:50:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:50:36] ok things seem to be ok, going to 
lunch [10:51:05] !log failover master in ganeti-test to ganeti2026 [10:51:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:53:17] PROBLEM - ganeti-wconfd running on ganeti2025 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (gnt-masterd), command name ganeti-wconfd https://wikitech.wikimedia.org/wiki/Ganeti [10:53:23] 10SRE, 10ops-codfw, 10DBA: Upgrade db2112 firmware/BIOS - https://phabricator.wikimedia.org/T293740 (10Marostegui) @Papaul db2112 has mysql stopped, so you could reboot it (or power it off) whenever you feel like [10:53:47] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet [10:53:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:56:12] !log Upgrade clouddb1021 [10:56:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:56:35] (03CR) 10JMeybohm: [C: 03+1] "LGTM" [deployment-charts] - 10https://gerrit.wikimedia.org/r/731372 (owner: 10Giuseppe Lavagetto) [10:57:23] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Rakefile: handle yaml errors where no fixtures are present. [deployment-charts] - 10https://gerrit.wikimedia.org/r/731372 (owner: 10Giuseppe Lavagetto) [10:57:31] (03PS2) 10Btullis: Add the first data-engineering team alert to Alertmanager [alerts] - 10https://gerrit.wikimedia.org/r/731919 (https://phabricator.wikimedia.org/T293399) [10:59:00] 10SRE, 10SRE-Access-Requests, 10Gerrit-Privilege-Requests, 10LDAP-Access-Requests: Offboard Tonina Zhelyazkova from WMF systems - https://phabricator.wikimedia.org/T293621 (10WMDE-leszek) >>! In T293621#7437664, @Urbanecm wrote: > @WMDE-leszek Hey, I see https://meta.wikimedia.org/wiki/Special:CentralAuth?... [11:00:05] Amir1, Lucas_WMDE, awight, and Urbanecm: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211019T1100). 
[11:00:05] _joe_ and godog: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:10] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17522 and previous config saved to /var/cache/conftool/dbconfig/20211019-110009-root.json [11:00:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:18] o/ [11:00:32] <_joe_> Lucas_WMDE: my change and filippo's are already deployed [11:00:33] looks like both changes are done already? [11:00:35] ok [11:00:38] <_joe_> yep [11:00:46] I don’t have anything to add myself [11:00:53] <_joe_> his change was time-sensitive, so I decided to piggyback onto it [11:01:29] (03Merged) 10jenkins-bot: Rakefile: handle yaml errors where no fixtures are present. [deployment-charts] - 10https://gerrit.wikimedia.org/r/731372 (owner: 10Giuseppe Lavagetto) [11:02:41] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet [11:02:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:03:34] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17523 and previous config saved to /var/cache/conftool/dbconfig/20211019-110333-root.json [11:03:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:05:26] so, looks like no scheduled patches are here... 
[11:05:36] ..in that case, i'll get a small patch out now [11:05:40] (03PS2) 10Urbanecm: DPL: Explicitly note it is not possible to enable DPL on any more wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731879 [11:05:48] (03CR) 10Urbanecm: [C: 03+2] DPL: Explicitly note it is not possible to enable DPL on any more wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731879 (owner: 10Urbanecm) [11:06:08] \o/ [11:06:35] (03Merged) 10jenkins-bot: DPL: Explicitly note it is not possible to enable DPL on any more wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731879 (owner: 10Urbanecm) [11:06:51] 🔥 DPL 🚮 [11:07:26] syncing, even though it's a no-op [11:07:47] <_joe_> Lucas_WMDE: :D [11:08:11] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 7c31b04e50101a60db7ae8acae64bc031f5e1007: DPL: Explicitly note it is not possible to enable DPL on any more wikis (duration: 00m 55s) [11:08:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:08:31] anyone want to add that to the bash tool? ;) [11:08:45] (03CR) 10Urbanecm: "removed the CR+1, please do not review your own patches" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731231 (https://phabricator.wikimedia.org/T293342) (owner: 10Juan90264) [11:09:03] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [11:09:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:09:18] Lucas_WMDE: https://bash.toolforge.org/quip/lm09mHwB8Fs0LHO5rLha [11:10:09] \o/ [11:10:21] our previous DPL quote was too tame https://bash.toolforge.org/quip/AU7VU8Yx6snAnmqnK_tn [11:11:38] :D [11:11:49] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[11:11:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:12:44] <_joe_> Lucas_WMDE: the notable thing is that that quote is probably from 2011 [11:12:49] ouch [11:14:34] (03PS1) 10Cparle: Escape captions when writing stored data into js state [extensions/WikibaseMediaInfo] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/731898 (https://phabricator.wikimedia.org/T293556) [11:14:40] * Lucas_WMDE tries not to think about how many more ticking bombs we have in extensions/ [11:15:09] (03PS1) 10Cparle: Escape captions when writing stored data into js state [extensions/WikibaseMediaInfo] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/731899 (https://phabricator.wikimedia.org/T293556) [11:15:13] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17524 and previous config saved to /var/cache/conftool/dbconfig/20211019-111513-root.json [11:15:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:15:33] looks like cparle is preparing a backport [11:16:09] (03PS1) 10Majavah: wheel of misfortune: check if pid exists first [puppet] - 10https://gerrit.wikimedia.org/r/731924 [11:16:43] 10SRE, 10SRE-Access-Requests, 10Gerrit-Privilege-Requests, 10LDAP-Access-Requests: Offboard Tonina Zhelyazkova from WMF systems - https://phabricator.wikimedia.org/T293621 (10Urbanecm) >>! In T293621#7440085, @WMDE-leszek wrote: >>>! In T293621#7437664, @Urbanecm wrote: >> @WMDE-leszek Hey, I see https://m... 
[11:18:12] (03PS3) 10Jbond: sre: add contool aware SREBatchRunnerBase [cookbooks] - 10https://gerrit.wikimedia.org/r/731153 [11:18:14] (03PS1) 10Jbond: cookbooks.sre: update to use correct icinga_hosts instance [cookbooks] - 10https://gerrit.wikimedia.org/r/731925 [11:18:37] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17525 and previous config saved to /var/cache/conftool/dbconfig/20211019-111837-root.json [11:18:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:22:41] (03PS4) 10Jbond: sre: add contool aware SREBatchRunnerBase [cookbooks] - 10https://gerrit.wikimedia.org/r/731153 [11:22:43] (03CR) 10Jbond: "updated see responses inline, thanks" [cookbooks] - 10https://gerrit.wikimedia.org/r/731153 (owner: 10Jbond) [11:24:29] (03PS2) 10Jgiannelos: Configure event stream for map tiles state change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730848 (https://phabricator.wikimedia.org/T289771) [11:26:43] (03CR) 10Urbanecm: [C: 03+2] Escape captions when writing stored data into js state [extensions/WikibaseMediaInfo] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/731898 (https://phabricator.wikimedia.org/T293556) (owner: 10Cparle) [11:26:52] ^^per discussion with cparle^^ [11:27:09] 👍 [11:27:22] (03CR) 10Urbanecm: [C: 03+2] Escape captions when writing stored data into js state [extensions/WikibaseMediaInfo] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/731899 (https://phabricator.wikimedia.org/T293556) (owner: 10Cparle) [11:28:13] (03PS2) 10Giuseppe Lavagetto: mediawiki: nest kubernetes labels in rsyslog [deployment-charts] - 10https://gerrit.wikimedia.org/r/731881 [11:28:15] (03PS1) 10Giuseppe Lavagetto: mediawiki: rewrite static assets by default [deployment-charts] - 10https://gerrit.wikimedia.org/r/731926 (https://phabricator.wikimedia.org/T285232) [11:29:24] (03CR) 10Btullis: Add the first data-engineering 
team alert to Alertmanager (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/731919 (https://phabricator.wikimedia.org/T293399) (owner: 10Btullis) [11:30:17] (03PS1) 10Kosta Harlan: GrowthExperiments: Add campaign pattern for enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731928 (https://phabricator.wikimedia.org/T293699) [11:30:17] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17526 and previous config saved to /var/cache/conftool/dbconfig/20211019-113017-root.json [11:30:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:30:38] * cormacparle waves [11:32:04] hi cormacparle! I'll ping you when it's ready for testing. [11:32:22] great, thanks urbanecm [11:33:41] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17527 and previous config saved to /var/cache/conftool/dbconfig/20211019-113340-root.json [11:33:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:45:21] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17529 and previous config saved to /var/cache/conftool/dbconfig/20211019-114520-root.json [11:45:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:46:49] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1105 (s1,s2) for upgrade', diff saved to https://phabricator.wikimedia.org/P17530 and previous config saved to /var/cache/conftool/dbconfig/20211019-114649-marostegui.json [11:46:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:46:57] !log Upgrade db1105 (s1,s2) [11:47:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:48:44] !log marostegui@cumin1001 dbctl commit (dc=all): 
'db1101:3318 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17531 and previous config saved to /var/cache/conftool/dbconfig/20211019-114844-root.json [11:48:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:50:07] (03Merged) 10jenkins-bot: Escape captions when writing stored data into js state [extensions/WikibaseMediaInfo] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/731898 (https://phabricator.wikimedia.org/T293556) (owner: 10Cparle) [11:53:35] (03Merged) 10jenkins-bot: Escape captions when writing stored data into js state [extensions/WikibaseMediaInfo] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/731899 (https://phabricator.wikimedia.org/T293556) (owner: 10Cparle) [11:54:36] finally [11:54:47] ready to test? [11:54:53] pulling [11:55:01] gimme a few seconds [11:55:58] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [11:56:00] cormacparle: pulled onto mwdebug1001, please test. [11:56:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:56:14] !log push anycast tuning to Tele2, Init7, DT transit links - T288843 [11:56:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:56:19] T288843: Traffic Engineering for Anycast Ranges - https://phabricator.wikimedia.org/T288843 [11:58:03] (03CR) 10Filippo Giunchedi: "I've added the record for completeness, but I don't think this egress policy is correct since there's nothing listening on 9125/udp on gra" [deployment-charts] - 10https://gerrit.wikimedia.org/r/731917 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [11:58:17] urbanecm: looks good to me [11:58:27] thanks, syncing [11:58:47] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[11:58:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:00:05] Deploy window Pre MediaWiki train break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211019T1200) [12:00:24] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17532 and previous config saved to /var/cache/conftool/dbconfig/20211019-120024-root.json [12:00:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:00:55] !log urbanecm@deploy1002 Synchronized php-1.38.0-wmf.4/extensions/WikibaseMediaInfo/: 79808a90a95dd5dac2b532b87fb7ec1a490ea0f0: Escape captions when writing stored data into js state (T293556) (duration: 00m 56s) [12:00:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:01:33] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17533 and previous config saved to /var/cache/conftool/dbconfig/20211019-120132-root.json [12:01:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:01:51] !log urbanecm@deploy1002 Synchronized php-1.38.0-wmf.5/extensions/WikibaseMediaInfo/: ec0125770775c1a1a54c3b592d86d287fd9e3ad6: Escape captions when writing stored data into js state (T293556) (duration: 00m 55s) [12:01:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:01:59] (03CR) 10Jbond: [C: 03+2] systemd::timer::job: add support for splay [puppet] - 10https://gerrit.wikimedia.org/r/731839 (owner: 10Hashar) [12:02:01] cormacparle: should be live [12:02:07] checking ... 
[12:03:02] hmmm not seeing it [12:03:08] (03CR) 10Jbond: [C: 03+1] "LGTM, ping me on irc when its ready to merge" [puppet] - 10https://gerrit.wikimedia.org/r/731840 (https://phabricator.wikimedia.org/T292729) (owner: 10Hashar) [12:03:14] cormacparle: resource loader modules are cached for a while [12:03:28] (?debug=1 _should_ bypass that, at cost of long loading times) [12:03:48] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17534 and previous config saved to /var/cache/conftool/dbconfig/20211019-120348-root.json [12:03:49] PROBLEM - k8s API server requests latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [12:03:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:03:56] (03PS1) 10Arturo Borrero Gonzalez: cloud: ceph: refactor rbd client profile for cloudcontrol [puppet] - 10https://gerrit.wikimedia.org/r/731933 (https://phabricator.wikimedia.org/T293752) [12:04:09] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17535 and previous config saved to /var/cache/conftool/dbconfig/20211019-120409-root.json [12:04:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:04:31] (03CR) 10Arturo Borrero Gonzalez: "this definitely needs another pair of eyes" [puppet] - 10https://gerrit.wikimedia.org/r/731933 (https://phabricator.wikimedia.org/T293752) (owner: 10Arturo Borrero Gonzalez) [12:04:33] (03CR) 10jerkins-bot: [V: 04-1] cloud: ceph: refactor rbd client profile for cloudcontrol [puppet] - 10https://gerrit.wikimedia.org/r/731933 (https://phabricator.wikimedia.org/T293752) (owner: 10Arturo Borrero Gonzalez) [12:04:59] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1167 
(s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17536 and previous config saved to /var/cache/conftool/dbconfig/20211019-120458-marostegui.json [12:05:01] urbanecm: do you need to sync 1.38.0-wmf.4 as well as .5? [12:05:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:05:16] i...did sync both? [12:05:39] https://sal.toolforge.org/log/axBtmHwB1jz_IcWu5DiO (wmf.5) and https://sal.toolforge.org/log/YxBtmHwB1jz_IcWuCjee (wmf.4) [12:05:39] ah, so you did, sorry [12:05:58] and yes it works with `debug=1` [12:06:02] \o/ [12:06:05] thanks very much! [12:06:07] so then it's done :) [12:06:07] np [12:06:23] (the cache is a short lived one, it should work w/o debug=1 in five minutes or so) [12:06:30] 👍 [12:07:51] RECOVERY - k8s API server requests latencies on kubemaster2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [12:09:18] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1167 (s8) after upgrade', diff saved to https://phabricator.wikimedia.org/P17537 and previous config saved to /var/cache/conftool/dbconfig/20211019-120918-marostegui.json [12:09:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:12:08] (03CR) 10Jbond: [C: 03+2] P:puppetboard:ng: add new profile for puppetboard [puppet] - 10https://gerrit.wikimedia.org/r/731783 (owner: 10Jbond) [12:12:31] (03PS1) 10Muehlenhoff: Add ownership annotations for IF services [puppet] - 10https://gerrit.wikimedia.org/r/731934 [12:12:49] !log push anycast tuning to all Lumen and NTT transit links - T288843 [12:12:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:12:54] T288843: Traffic Engineering for Anycast Ranges - https://phabricator.wikimedia.org/T288843 [12:13:45] (03PS1) 10Ema: upgrade-varnish: test frontend only [cookbooks] - 10https://gerrit.wikimedia.org/r/731935 [12:15:00] (03PS2) 
10Ema: upgrade-varnish: test frontend only [cookbooks] - 10https://gerrit.wikimedia.org/r/731935 [12:16:37] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17538 and previous config saved to /var/cache/conftool/dbconfig/20211019-121636-root.json [12:16:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:16:48] 10SRE, 10Goal, 10MW-1.38-notes (1.38.0-wmf.4; 2021-10-12), 10Patch-For-Review, and 2 others: Fully migrate producers off statsd - https://phabricator.wikimedia.org/T205870 (10fgiunchedi) [12:17:57] PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS2914/IPv4: Active - NTT, AS2914/IPv6: Active - NTT https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [12:17:59] (03CR) 10jerkins-bot: [V: 04-1] upgrade-varnish: test frontend only [cookbooks] - 10https://gerrit.wikimedia.org/r/731935 (owner: 10Ema) [12:18:02] (03PS3) 10Ema: upgrade-varnish: support frontend instance only [cookbooks] - 10https://gerrit.wikimedia.org/r/731935 [12:18:53] (03PS4) 10Ema: upgrade-varnish: support frontend instance only [cookbooks] - 10https://gerrit.wikimedia.org/r/731935 [12:19:13] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17539 and previous config saved to /var/cache/conftool/dbconfig/20211019-121913-root.json [12:19:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:19:51] 10SRE, 10Analytics, 10SRE Observability (FY2021/2022-Q2): statsd and gunicorn metrics for superset - https://phabricator.wikimedia.org/T293761 (10fgiunchedi) [12:19:53] RECOVERY - BGP status on cr4-ulsfo is OK: BGP OK - up: 98, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [12:19:53] (03CR) 10David Caro: "rbd is already a specific ceph service, this will not include radosgw. 
I was thinking on something a level up here, more like:" [puppet] - 10https://gerrit.wikimedia.org/r/731933 (https://phabricator.wikimedia.org/T293752) (owner: 10Arturo Borrero Gonzalez) [12:21:01] (03PS1) 10Jbond: P:puppetboard:ng: fix hiera keys [puppet] - 10https://gerrit.wikimedia.org/r/731937 [12:21:32] (03CR) 10Jbond: [C: 03+2] P:puppetboard:ng: fix hiera keys [puppet] - 10https://gerrit.wikimedia.org/r/731937 (owner: 10Jbond) [12:21:37] (03CR) 10Ayounsi: [C: 03+1] Add ownership annotations for IF services [puppet] - 10https://gerrit.wikimedia.org/r/731934 (owner: 10Muehlenhoff) [12:22:06] 10SRE, 10Infrastructure-Foundations: Integrate Buster 10.11 point update - https://phabricator.wikimedia.org/T292838 (10MoritzMuehlenhoff) [12:24:50] (03PS1) 10Marostegui: dbproxy1018: Depool clouddb1017 [puppet] - 10https://gerrit.wikimedia.org/r/731938 (https://phabricator.wikimedia.org/T290865) [12:24:52] (03CR) 10David Caro: cloud: ceph: refactor rbd client profile for cloudcontrol (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/731933 (https://phabricator.wikimedia.org/T293752) (owner: 10Arturo Borrero Gonzalez) [12:24:58] (03CR) 10Filippo Giunchedi: "LGTM! 
Nice job, see inline for minor comment" [puppet] - 10https://gerrit.wikimedia.org/r/731884 (https://phabricator.wikimedia.org/T293399) (owner: 10Btullis) [12:26:00] (03CR) 10Volans: [C: 03+1] "LGTM, couple of optional improvements inline" [cookbooks] - 10https://gerrit.wikimedia.org/r/731935 (owner: 10Ema) [12:26:14] (03CR) 10Marostegui: [C: 03+2] dbproxy1018: Depool clouddb1017 [puppet] - 10https://gerrit.wikimedia.org/r/731938 (https://phabricator.wikimedia.org/T290865) (owner: 10Marostegui) [12:26:17] (03PS1) 10Muehlenhoff: Remove obsolete pki role [puppet] - 10https://gerrit.wikimedia.org/r/731939 [12:26:46] (03PS6) 10Ayounsi: Configure transit specific outbound BGP communities [homer/public] - 10https://gerrit.wikimedia.org/r/728256 (https://phabricator.wikimedia.org/T288843) [12:27:37] (03CR) 10Ayounsi: [C: 03+2] Configure transit specific outbound BGP communities [homer/public] - 10https://gerrit.wikimedia.org/r/728256 (https://phabricator.wikimedia.org/T288843) (owner: 10Ayounsi) [12:28:11] (03Merged) 10jenkins-bot: Configure transit specific outbound BGP communities [homer/public] - 10https://gerrit.wikimedia.org/r/728256 (https://phabricator.wikimedia.org/T288843) (owner: 10Ayounsi) [12:30:15] 10SRE, 10Infrastructure-Foundations, 10netops, 10Patch-For-Review: Traffic Engineering for Anycast Ranges - https://phabricator.wikimedia.org/T288843 (10ayounsi) 05Open→03Resolved a:03ayounsi A good baseline has now been applied across most of our transits. Further tuning will happen when sub-optimal... 
[12:31:16] (03CR) 10Elukey: [C: 03+1] Add ownership annotations for IF services [puppet] - 10https://gerrit.wikimedia.org/r/731934 (owner: 10Muehlenhoff) [12:31:34] (03PS1) 10Marostegui: Revert "dbproxy1018: Depool clouddb1017" [puppet] - 10https://gerrit.wikimedia.org/r/731947 [12:31:41] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17540 and previous config saved to /var/cache/conftool/dbconfig/20211019-123140-root.json [12:31:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:32:19] (03CR) 10Marostegui: [C: 03+2] Revert "dbproxy1018: Depool clouddb1017" [puppet] - 10https://gerrit.wikimedia.org/r/731947 (owner: 10Marostegui) [12:33:40] (03CR) 10Filippo Giunchedi: "See inline, also please consider adding unit tests" [alerts] - 10https://gerrit.wikimedia.org/r/731919 (https://phabricator.wikimedia.org/T293399) (owner: 10Btullis) [12:34:16] !log Upgrade dbstore1003 [12:34:17] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17541 and previous config saved to /var/cache/conftool/dbconfig/20211019-123416-root.json [12:34:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:34:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:40:11] !log installing aftpd security updates [12:40:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:41:28] (03PS3) 10ArielGlenn: index page and directory for Wikimedia Enterprise HTML dumps [puppet] - 10https://gerrit.wikimedia.org/r/731768 (https://phabricator.wikimedia.org/T273585) [12:43:04] (03CR) 10ArielGlenn: [C: 03+2] index page and directory for Wikimedia Enterprise HTML dumps [puppet] - 10https://gerrit.wikimedia.org/r/731768 (https://phabricator.wikimedia.org/T273585) (owner: 10ArielGlenn) [12:46:44] !log 
marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17542 and previous config saved to /var/cache/conftool/dbconfig/20211019-124644-root.json [12:46:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:48:34] (03PS1) 10Filippo Giunchedi: icinga: read monitoring groups from wikimedia_clusters [puppet] - 10https://gerrit.wikimedia.org/r/731943 (https://phabricator.wikimedia.org/T286467) [12:49:09] apergos: looking forward to enterprise dumps being available from dumps.wm.o. Thanks for helping make that happen. [12:49:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17543 and previous config saved to /var/cache/conftool/dbconfig/20211019-124920-root.json [12:49:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:49:31] pssst you can download the first set right now [12:49:48] but we won't announce it yet, the enterprise folks are going to do a thing. [12:52:02] (03CR) 10Filippo Giunchedi: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31756/console" [puppet] - 10https://gerrit.wikimedia.org/r/731943 (https://phabricator.wikimedia.org/T286467) (owner: 10Filippo Giunchedi) [12:53:23] took me a while to figure out the URL, but yay :) [12:53:32] this is a very nice and useful dataset [12:54:46] (03PS1) 10Kevin Bazira: add enwiki-damaging inference service to LiftWing [deployment-charts] - 10https://gerrit.wikimedia.org/r/731944 (https://phabricator.wikimedia.org/T293762) [12:55:49] (03PS2) 10Filippo Giunchedi: icinga: read monitoring groups from wikimedia_clusters [puppet] - 10https://gerrit.wikimedia.org/r/731943 (https://phabricator.wikimedia.org/T286467) [12:55:56] let the enterprise folks know, they'll be glad to hear it! 
[12:56:40] the idea is to provide these around the 1st/2nd of the month and the 20th/21st of the month near the start of the regular xml/sql dump runs [12:56:44] step by step [12:57:01] should be more an enough. After all, if you ever need the most recent data, you can download it as a one-off yourself [12:57:16] (WMCS has credential-less access to the files, too) [12:57:29] *more than enough [12:57:42] (03PS1) 10Hashar: Unbreak Parsoid: add missing files from f8485f48 [vendor] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/731948 (https://phabricator.wikimedia.org/T293735) [12:57:49] (03CR) 10Hashar: [C: 03+2] Unbreak Parsoid: add missing files from f8485f48 [vendor] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/731948 (https://phabricator.wikimedia.org/T293735) (owner: 10Hashar) [12:58:48] yes they do; these are for folks who don't have an instance and wouldn't set one up, or want to be able to provide mirrors or whatever [12:58:55] PROBLEM - SSH on bast3005 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [12:59:23] yeah, dumps is surely a good thing to store this :) [12:59:34] (you don't need an instance though, PAWS is in WMCS, so it has access, too :)) [12:59:38] well, they are dumps after all! [12:59:41] exactly! [12:59:47] oh via paws, that's interesting, didn't think of that [12:59:56] yeah, paws or toolforge [13:00:04] hashar and dancy: That opportune time is upon us again. Time for a MediaWiki train - Utc-0+Utc-7 Version deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211019T1300). 
[13:00:07] both should be low-barrier projects [13:00:55] RECOVERY - SSH on bast3005 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [13:01:18] train has some delay pending CI to merge a patch to mediawiki/vendor [13:01:48] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17544 and previous config saved to /var/cache/conftool/dbconfig/20211019-130147-root.json [13:01:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:04:24] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17545 and previous config saved to /var/cache/conftool/dbconfig/20211019-130424-root.json [13:04:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:08:11] well downloading from us is even more low barrier. anyways the more the merrier, the point is not just to collect all the knowledge but for it to get used :-) [13:10:55] yup [13:11:05] after all, we want people to download knowledge from us apergos :) [13:13:47] from us or anyone else with it! [13:15:58] (03CR) 10Btullis: [C: 03+1] "LGTM." 
[puppet] - 10https://gerrit.wikimedia.org/r/731934 (owner: 10Muehlenhoff) [13:16:44] (03CR) 10Hashar: [C: 03+1] "I have cherry picked it on the integration puppet master so I guess that can be merged at any time ;)" [puppet] - 10https://gerrit.wikimedia.org/r/731840 (https://phabricator.wikimedia.org/T292729) (owner: 10Hashar) [13:16:51] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17546 and previous config saved to /var/cache/conftool/dbconfig/20211019-131651-root.json [13:16:53] (03CR) 10Giuseppe Lavagetto: [C: 03+2] mediawiki: rewrite static assets by default [deployment-charts] - 10https://gerrit.wikimedia.org/r/731926 (https://phabricator.wikimedia.org/T285232) (owner: 10Giuseppe Lavagetto) [13:16:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:11] (03PS2) 10Btullis: Add the data-engineering team to Alertmanager [puppet] - 10https://gerrit.wikimedia.org/r/731884 (https://phabricator.wikimedia.org/T293399) [13:18:45] (03CR) 10Btullis: Add the data-engineering team to Alertmanager (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/731884 (https://phabricator.wikimedia.org/T293399) (owner: 10Btullis) [13:19:21] 10SRE, 10SRE-Access-Requests, 10Analytics: Kerberos identity for kcv-wikimf - https://phabricator.wikimedia.org/T293189 (10Ottomata) @KCVelaga_WMF do you have ssh access yet? You'll need that to if not. In {T291475} I don't see that being requested. See https://wikitech.wikimedia.org/wiki/Analytics/Data_a... 
[13:19:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17547 and previous config saved to /var/cache/conftool/dbconfig/20211019-131927-root.json [13:19:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:20:43] (03CR) 10Ottomata: "Hiya, yesterday as part of https://phabricator.wikimedia.org/T277193 we converted wgEventStreams to be keyed by stream name. I'm in the p" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730848 (https://phabricator.wikimedia.org/T289771) (owner: 10Jgiannelos) [13:21:52] (03Merged) 10jenkins-bot: mediawiki: rewrite static assets by default [deployment-charts] - 10https://gerrit.wikimedia.org/r/731926 (https://phabricator.wikimedia.org/T285232) (owner: 10Giuseppe Lavagetto) [13:22:02] (03Merged) 10jenkins-bot: Unbreak Parsoid: add missing files from f8485f48 [vendor] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/731948 (https://phabricator.wikimedia.org/T293735) (owner: 10Hashar) [13:22:53] (03CR) 10Giuseppe Lavagetto: [C: 03+2] mediawiki: nest kubernetes labels in rsyslog [deployment-charts] - 10https://gerrit.wikimedia.org/r/731881 (owner: 10Giuseppe Lavagetto) [13:25:16] (03CR) 10Muehlenhoff: Add ownership annotations for IF services (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/731934 (owner: 10Muehlenhoff) [13:25:24] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM! 
Feel free to merge at your convenience" [puppet] - 10https://gerrit.wikimedia.org/r/731884 (https://phabricator.wikimedia.org/T293399) (owner: 10Btullis) [13:25:29] (03PS1) 10Hashar: testwikis wikis to 1.38.0-wmf.5 refs T281169 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731972 [13:25:31] (03CR) 10Hashar: [C: 03+2] testwikis wikis to 1.38.0-wmf.5 refs T281169 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731972 (owner: 10Hashar) [13:25:35] (03PS2) 10Muehlenhoff: Add ownership annotations for IF services [puppet] - 10https://gerrit.wikimedia.org/r/731934 (https://phabricator.wikimedia.org/T216088) [13:26:21] (03Merged) 10jenkins-bot: testwikis wikis to 1.38.0-wmf.5 refs T281169 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731972 (owner: 10Hashar) [13:26:23] !log hashar@deploy1002 Started scap: testwikis wikis to 1.38.0-wmf.5 refs T281169 [13:26:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:26:29] T281169: 1.38.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T281169 [13:27:11] (03PS2) 10Ottomata: wgEventStreams - remove redundant stream setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731833 (https://phabricator.wikimedia.org/T277193) [13:27:13] (03Merged) 10jenkins-bot: mediawiki: nest kubernetes labels in rsyslog [deployment-charts] - 10https://gerrit.wikimedia.org/r/731881 (owner: 10Giuseppe Lavagetto) [13:27:30] (03CR) 10Elukey: [C: 03+1] add enwiki-damaging inference service to LiftWing [deployment-charts] - 10https://gerrit.wikimedia.org/r/731944 (https://phabricator.wikimedia.org/T293762) (owner: 10Kevin Bazira) [13:28:36] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[13:28:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:29:12] (03CR) 10Btullis: [C: 03+2] Add the data-engineering team to Alertmanager [puppet] - 10https://gerrit.wikimedia.org/r/731884 (https://phabricator.wikimedia.org/T293399) (owner: 10Btullis) [13:30:08] (03CR) 10Elukey: [C: 03+2] add enwiki-damaging inference service to LiftWing [deployment-charts] - 10https://gerrit.wikimedia.org/r/731944 (https://phabricator.wikimedia.org/T293762) (owner: 10Kevin Bazira) [13:30:28] (03PS2) 10Marostegui: dbbackups: Switch s1 backup generation from db1139 to db1140 [puppet] - 10https://gerrit.wikimedia.org/r/721286 (https://phabricator.wikimedia.org/T290865) (owner: 10Jcrespo) [13:31:21] !log oblivian@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [13:31:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:31:37] (03CR) 10Elukey: [V: 03+2 C: 03+2] add enwiki-damaging inference service to LiftWing [deployment-charts] - 10https://gerrit.wikimedia.org/r/731944 (https://phabricator.wikimedia.org/T293762) (owner: 10Kevin Bazira) [13:32:12] hashar: dancy train slot is now, but looks paused atm? ok if I deploy a config change? [13:33:51] (03PS3) 10Jgiannelos: Configure event stream for map tiles state change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730848 (https://phabricator.wikimedia.org/T289771) [13:37:31] PROBLEM - Check systemd state on deploy1002 is CRITICAL: CRITICAL - degraded: The following units failed: deploy_to_mwdebug.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:37:35] ottomata: doesn't look at all paused to me? 
https://sal.toolforge.org/log/ahC7mHwB1jz_IcWuSeUO [13:38:45] PROBLEM - k8s API server requests latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [13:39:05] majavah: ah ok, just saw hashar's comment: [13:39:05] > train has some delay pending CI to merge a patch to mediawiki/vendor [13:39:58] 10SRE, 10SRE-Access-Requests, 10Analytics: Kerberos identity for kcv-wikimf - https://phabricator.wikimedia.org/T293189 (10KCVelaga_WMF) Hi @Ottomata: If I am not wrong, I think that had been done with {T292992}. I just tried, and I am able to run Hive queries through JupyterHub. Please let me know if I am m... [13:40:41] RECOVERY - k8s API server requests latencies on kubemaster2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [13:44:31] (03PS1) 10Herron: wip [puppet] - 10https://gerrit.wikimedia.org/r/731976 [13:45:51] !log oblivian@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [13:45:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:45:56] 10SRE, 10SRE-Access-Requests, 10Analytics: Kerberos identity for kcv-wikimf - https://phabricator.wikimedia.org/T293189 (10Ottomata) You are right! You have it! :) [13:49:27] (03PS1) 10Kormat: mariadb: Use scripts instead of aliases for mysql.
[puppet] - 10https://gerrit.wikimedia.org/r/731977 (https://phabricator.wikimedia.org/T291352) [13:50:09] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Use scripts instead of aliases for mysql.
[puppet] - 10https://gerrit.wikimedia.org/r/731977 (https://phabricator.wikimedia.org/T291352) (owner: 10Kormat) [13:51:20] (03PS2) 10Kormat: mariadb: Use scripts instead of aliases for mysql.
[puppet] - 10https://gerrit.wikimedia.org/r/731977 (https://phabricator.wikimedia.org/T291352) [13:52:50] (03PS3) 10Btullis: Add the first data-engineering team alert to Alertmanager [alerts] - 10https://gerrit.wikimedia.org/r/731919 (https://phabricator.wikimedia.org/T293399) [13:52:54] !log kevinbazira@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' . [13:52:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:52:58] (03CR) 10Btullis: Add the first data-engineering team alert to Alertmanager (032 comments) [alerts] - 10https://gerrit.wikimedia.org/r/731919 (https://phabricator.wikimedia.org/T293399) (owner: 10Btullis) [13:53:43] woooowww [13:54:10] (03PS3) 10Kormat: mariadb: Use scripts instead of aliases for mysql.
[puppet] - 10https://gerrit.wikimedia.org/r/731977 (https://phabricator.wikimedia.org/T291352) [13:55:05] 10SRE, 10ops-codfw, 10DBA: Upgrade db2112 firmware/BIOS - https://phabricator.wikimedia.org/T293740 (10Papaul) @Marostegui welcome back from your vacation. I will work on it today and let you know when it is ready. [13:55:32] 10SRE, 10ops-codfw, 10DBA: Upgrade db2112 firmware/BIOS - https://phabricator.wikimedia.org/T293740 (10Marostegui) Thank you Papaul!! <3 [14:01:24] (03PS4) 10Kormat: mariadb: Use scripts instead of aliases for mysql.
[puppet] - 10https://gerrit.wikimedia.org/r/731977 (https://phabricator.wikimedia.org/T291352) [14:02:02] (03CR) 10Kormat: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31762/console" [puppet] - 10https://gerrit.wikimedia.org/r/731977 (https://phabricator.wikimedia.org/T291352) (owner: 10Kormat) [14:04:16] (03PS5) 10Kormat: mariadb: Use scripts instead of aliases for mysql.
[puppet] - 10https://gerrit.wikimedia.org/r/731977 (https://phabricator.wikimedia.org/T291352) [14:04:34] those rsync takes so long [14:05:05] (03CR) 10Jbond: [C: 03+2] contint: regularly prune docker material [puppet] - 10https://gerrit.wikimedia.org/r/731840 (https://phabricator.wikimedia.org/T292729) (owner: 10Hashar) [14:05:48] I will do group0 later tonight [14:06:36] (03CR) 10Filippo Giunchedi: "LGTM (see inline)" [puppet] - 10https://gerrit.wikimedia.org/r/731774 (https://phabricator.wikimedia.org/T293439) (owner: 10Herron) [14:08:14] (03CR) 10Filippo Giunchedi: [C: 04-1] kafka_shipper: point codfw hosts to kafka-logging-codfw (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/731125 (https://phabricator.wikimedia.org/T293439) (owner: 10Herron) [14:09:41] (03CR) 10Herron: "https://puppet-compiler.wmflabs.org/compiler1003/31763/" [puppet] - 10https://gerrit.wikimedia.org/r/731976 (https://phabricator.wikimedia.org/T293439) (owner: 10Herron) [14:11:36] !log hashar@deploy1002 Finished scap: testwikis wikis to 1.38.0-wmf.5 refs T281169 (duration: 45m 13s) [14:11:37] (03CR) 10Herron: kafka_shipper: map site -> brokers centrally & point codfw to site local brokers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/731774 (https://phabricator.wikimedia.org/T293439) (owner: 10Herron) [14:11:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:43] T281169: 1.38.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T281169 [14:13:08] (03CR) 10Filippo Giunchedi: "LGTM, see inline" [alerts] - 10https://gerrit.wikimedia.org/r/731919 (https://phabricator.wikimedia.org/T293399) (owner: 10Btullis) [14:15:36] !log oblivian@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[14:15:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:17:54] 10SRE, 10ops-codfw, 10DC-Ops, 10observability, 10User-fgiunchedi: codfw: Testing Out Sample PDUs - https://phabricator.wikimedia.org/T265435 (10fgiunchedi) Thanks @papaul, that's unfortunate re: v2 support but thanks for investigating further. We could try with v3 and see what happens though [14:20:23] (03PS4) 10Herron: kafka_shipper: map site->kafka cluster name & point codfw to codfw brokers [puppet] - 10https://gerrit.wikimedia.org/r/731976 (https://phabricator.wikimedia.org/T293439) [14:21:35] (03PS4) 10Jgiannelos: Configure event stream for map tiles state change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730848 (https://phabricator.wikimedia.org/T289771) [14:22:16] (03PS1) 10Giuseppe Lavagetto: rsyslog: fix typo [deployment-charts] - 10https://gerrit.wikimedia.org/r/731980 [14:24:12] (03CR) 10Giuseppe Lavagetto: [C: 03+2] rsyslog: fix typo [deployment-charts] - 10https://gerrit.wikimedia.org/r/731980 (owner: 10Giuseppe Lavagetto) [14:24:28] (03PS5) 10Ema: upgrade-varnish: support frontend instance only [cookbooks] - 10https://gerrit.wikimedia.org/r/731935 [14:24:37] (03PS1) 10ArielGlenn: add Enterprise HTML dups to the other dumps listing [puppet] - 10https://gerrit.wikimedia.org/r/731981 [14:24:59] (03CR) 10Marostegui: [C: 03+1] mariadb: Use scripts instead of aliases for mysql.
[puppet] - 10https://gerrit.wikimedia.org/r/731977 (https://phabricator.wikimedia.org/T291352) (owner: 10Kormat) [14:25:12] (03CR) 10jerkins-bot: [V: 04-1] add Enterprise HTML dups to the other dumps listing [puppet] - 10https://gerrit.wikimedia.org/r/731981 (owner: 10ArielGlenn) [14:25:14] (03Abandoned) 10Herron: kafka_shipper: point codfw hosts to kafka-logging-codfw [puppet] - 10https://gerrit.wikimedia.org/r/731125 (https://phabricator.wikimedia.org/T293439) (owner: 10Herron) [14:26:43] (03PS2) 10ArielGlenn: add Enterprise HTML dups to the other dumps listing [puppet] - 10https://gerrit.wikimedia.org/r/731981 (https://phabricator.wikimedia.org/T273585) [14:26:58] (03CR) 10jerkins-bot: [V: 04-1] upgrade-varnish: support frontend instance only [cookbooks] - 10https://gerrit.wikimedia.org/r/731935 (owner: 10Ema) [14:27:15] (03CR) 10Kormat: [C: 03+2] mariadb: Use scripts instead of aliases for mysql.
[puppet] - 10https://gerrit.wikimedia.org/r/731977 (https://phabricator.wikimedia.org/T291352) (owner: 10Kormat) [14:27:53] (03CR) 10Ema: upgrade-varnish: support frontend instance only (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/731935 (owner: 10Ema) [14:28:32] (03CR) 10ArielGlenn: [C: 03+2] add Enterprise HTML dups to the other dumps listing [puppet] - 10https://gerrit.wikimedia.org/r/731981 (https://phabricator.wikimedia.org/T273585) (owner: 10ArielGlenn) [14:28:59] (03Merged) 10jenkins-bot: rsyslog: fix typo [deployment-charts] - 10https://gerrit.wikimedia.org/r/731980 (owner: 10Giuseppe Lavagetto) [14:29:16] !log disable puppet on lvs, cp, authdns, mc, mw-be and wcqs to while i merge G:662699 [14:29:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:29:43] (03PS6) 10Ema: upgrade-varnish: support frontend instance only [cookbooks] - 10https://gerrit.wikimedia.org/r/731935 [14:30:14] (03CR) 10Jbond: [C: 03+2] interface-rps.py: no-op format/comment fixups [puppet] - 10https://gerrit.wikimedia.org/r/730210 (https://phabricator.wikimedia.org/T236208) (owner: 10BBlack) [14:30:18] (03PS1) 10Kormat: mariadb: Fix quoting in prompt for mysql.
[puppet] - 10https://gerrit.wikimedia.org/r/731984 (https://phabricator.wikimedia.org/T291352) [14:30:26] (03CR) 10Jbond: [C: 03+2] interface: update rps script to also set the number of queues via ethtool [puppet] - 10https://gerrit.wikimedia.org/r/662688 (https://phabricator.wikimedia.org/T236208) (owner: 10Jbond) [14:30:36] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Fix quoting in prompt for mysql.
[puppet] - 10https://gerrit.wikimedia.org/r/731984 (https://phabricator.wikimedia.org/T291352) (owner: 10Kormat) [14:31:17] (03PS2) 10Kormat: mariadb: Fix quoting in prompt for mysql.
[puppet] - 10https://gerrit.wikimedia.org/r/731984 (https://phabricator.wikimedia.org/T291352) [14:32:50] (03CR) 10Kormat: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31764/console" [puppet] - 10https://gerrit.wikimedia.org/r/731984 (https://phabricator.wikimedia.org/T291352) (owner: 10Kormat) [14:34:10] (03CR) 10Kormat: [V: 03+1 C: 03+2] mariadb: Fix quoting in prompt for mysql.
[puppet] - 10https://gerrit.wikimedia.org/r/731984 (https://phabricator.wikimedia.org/T291352) (owner: 10Kormat) [14:34:32] !log oblivian@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [14:34:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:35:27] (03CR) 10Volans: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/730513 (owner: 10Jbond) [14:36:40] (03CR) 10Volans: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/731925 (owner: 10Jbond) [14:39:23] 10SRE, 10Analytics, 10SRE Observability (FY2021/2022-Q2): statsd and gunicorn metrics for superset - https://phabricator.wikimedia.org/T293761 (10BTullis) Thanks for the heads-up @fgiunchedi I've just had a quick look and, as far as I can see, the stats from gunicorn //haven't// been used to date. The only... [14:40:31] 10SRE, 10MW-on-K8s, 10serviceops: Evaluate istio as an ingress for production usage - https://phabricator.wikimedia.org/T287007 (10JMeybohm) [14:41:59] (03CR) 10Volans: sre: add contool aware SREBatchRunnerBase (035 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/731153 (owner: 10Jbond) [14:44:03] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/731934 (https://phabricator.wikimedia.org/T216088) (owner: 10Muehlenhoff) [14:45:47] (03PS1) 10Jbond: interface-rps: pass device and driver to ethtool functions [puppet] - 10https://gerrit.wikimedia.org/r/731986 [14:46:06] (03PS4) 10Juan90264: Create Portal and Portal talk namespace for shiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731953 (https://phabricator.wikimedia.org/T288909) [14:46:40] (03CR) 10Jbond: [C: 03+2] interface-rps: pass device and driver to ethtool functions [puppet] - 10https://gerrit.wikimedia.org/r/731986 (owner: 10Jbond) [14:47:48] (03CR) 10Volans: [C: 03+1] "LGTM, one nit inline, I'll leave the varnish specific bits to the traffic team" [cookbooks] - 
10https://gerrit.wikimedia.org/r/731935 (owner: 10Ema) [14:47:58] (03PS1) 10Giuseppe Lavagetto: mediawiki: fix rsyslog template [deployment-charts] - 10https://gerrit.wikimedia.org/r/731988 [14:48:02] 10SRE, 10ops-codfw, 10DBA: Upgrade db2112 firmware/BIOS - https://phabricator.wikimedia.org/T293740 (10Papaul) 05Open→03Resolved @Marostegui firmware upgrade complete [14:48:08] (03CR) 10jerkins-bot: [V: 04-1] mediawiki: fix rsyslog template [deployment-charts] - 10https://gerrit.wikimedia.org/r/731988 (owner: 10Giuseppe Lavagetto) [14:48:52] (03PS1) 10Jbond: interface-rps.py: fix pydocs [puppet] - 10https://gerrit.wikimedia.org/r/731989 [14:49:00] (03PS2) 10Giuseppe Lavagetto: mediawiki: fix rsyslog template [deployment-charts] - 10https://gerrit.wikimedia.org/r/731988 [14:49:25] (03CR) 10Jbond: [V: 03+2 C: 03+2] interface-rps.py: fix pydocs [puppet] - 10https://gerrit.wikimedia.org/r/731989 (owner: 10Jbond) [14:49:50] 10SRE, 10ops-codfw, 10DBA: Upgrade db2112 firmware/BIOS - https://phabricator.wikimedia.org/T293740 (10Marostegui) Thanks! 
I will try the reimage tomorrow again [14:54:32] (03PS7) 10Ema: upgrade-varnish: support frontend instance only [cookbooks] - 10https://gerrit.wikimedia.org/r/731935 [14:55:22] (03CR) 10Volans: [C: 03+1] upgrade-varnish: support frontend instance only [cookbooks] - 10https://gerrit.wikimedia.org/r/731935 (owner: 10Ema) [15:00:59] (03CR) 10Filippo Giunchedi: "I think this works too, see my comment re: global variables on https://gerrit.wikimedia.org/r/c/operations/puppet/+/731774" [puppet] - 10https://gerrit.wikimedia.org/r/731976 (https://phabricator.wikimedia.org/T293439) (owner: 10Herron) [15:04:14] PROBLEM - Check systemd state on ms-be2036 is CRITICAL: CRITICAL - degraded: The following units failed: swift-drive-audit.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:05:34] (03PS1) 10Jbond: interface-rps.py: stdout=subprocess.PIPE instead of capture_ouput=True [puppet] - 10https://gerrit.wikimedia.org/r/731992 [15:05:42] (03PS4) 10Btullis: Add the first data-engineering team alert to Alertmanager [alerts] - 10https://gerrit.wikimedia.org/r/731919 (https://phabricator.wikimedia.org/T293399) [15:08:11] (03CR) 10Btullis: "I've added a unit test in the latest patchset, but I'm fumbling around here a little." 
[alerts] - 10https://gerrit.wikimedia.org/r/731919 (https://phabricator.wikimedia.org/T293399) (owner: 10Btullis) [15:08:22] (03CR) 10Volans: [C: 03+1] "LGMT, nit inline" [puppet] - 10https://gerrit.wikimedia.org/r/731992 (owner: 10Jbond) [15:12:54] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet [15:12:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:13:13] (03PS2) 10Jbond: interface-rps.py: stdout=subprocess.PIPE instead of capture_ouput=True [puppet] - 10https://gerrit.wikimedia.org/r/731992 [15:13:21] (03CR) 10Jbond: interface-rps.py: stdout=subprocess.PIPE instead of capture_ouput=True (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/731992 (owner: 10Jbond) [15:13:26] (03CR) 10Jbond: [C: 03+2] interface-rps.py: stdout=subprocess.PIPE instead of capture_ouput=True [puppet] - 10https://gerrit.wikimedia.org/r/731992 (owner: 10Jbond) [15:17:26] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet [15:17:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:20:01] (03CR) 10BryanDavis: [C: 03+2] toolhub: Bump container version & set URLLIB3_DISABLE_WARNINGS=True [deployment-charts] - 10https://gerrit.wikimedia.org/r/731196 (https://phabricator.wikimedia.org/T292025) (owner: 10BryanDavis) [15:20:37] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/731939 (owner: 10Muehlenhoff) [15:23:15] (03CR) 10Jbond: [C: 03+1] Add ownership annotations for IF services [puppet] - 10https://gerrit.wikimedia.org/r/731934 (https://phabricator.wikimedia.org/T216088) (owner: 10Muehlenhoff) [15:24:35] (03Merged) 10jenkins-bot: toolhub: Bump container version & set URLLIB3_DISABLE_WARNINGS=True [deployment-charts] - 10https://gerrit.wikimedia.org/r/731196 (https://phabricator.wikimedia.org/T292025) (owner: 10BryanDavis) [15:26:10] !log bd808@deploy1002 helmfile [staging] Ran 'sync' command 
on namespace 'toolhub' for release 'main' . [15:26:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:28:23] 10SRE, 10MW-on-K8s, 10Traffic, 10serviceops, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (10jijiki) {F34697825} [15:28:25] !log bd808@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' . [15:28:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:29:57] (03PS3) 10Ottomata: thorium decom - Remove absented rsync module [puppet] - 10https://gerrit.wikimedia.org/r/725059 (https://phabricator.wikimedia.org/T292075) [15:30:16] !log bd808@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' . [15:30:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:32:00] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 T277118 [15:32:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:32:05] T277118: iw_url in interwiki is varbinary(127) in production but blob in code - https://phabricator.wikimedia.org/T277118 [15:32:07] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 T277118 [15:32:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:32:33] (03CR) 10Ottomata: [C: 03+2] wgEventStreams - remove redundant stream setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731833 (https://phabricator.wikimedia.org/T277193) (owner: 10Ottomata) [15:34:19] (03PS1) 10Ottomata: Re-add stream setting in wgEventRelayerConfig [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731999 (https://phabricator.wikimedia.org/T277193) [15:34:44] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on 9 hosts with reason: Schema change s6 T277118 [15:34:49] 
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:34:51] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 9 hosts with reason: Schema change s6 T277118 [15:34:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:35:00] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 T277118 [15:35:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:35:07] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 T277118 [15:35:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:35:16] (03CR) 10Ottomata: [C: 03+2] thorium decom - Remove absented rsync module [puppet] - 10https://gerrit.wikimedia.org/r/725059 (https://phabricator.wikimedia.org/T292075) (owner: 10Ottomata) [15:35:59] (03CR) 10Ottomata: [C: 03+2] Re-add stream setting in wgEventRelayerConfig [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731999 (https://phabricator.wikimedia.org/T277193) (owner: 10Ottomata) [15:36:12] (03PS1) 10Jbond: check_user: remove hack for bad json [puppet] - 10https://gerrit.wikimedia.org/r/732001 [15:37:05] (03CR) 10Jbond: [C: 03+2] cookbooks sre: update run_scripts to accept a list of scripts not functions [cookbooks] - 10https://gerrit.wikimedia.org/r/730513 (owner: 10Jbond) [15:37:14] (03CR) 10Jbond: [C: 03+2] cookbooks.sre: update to use correct icinga_hosts instance [cookbooks] - 10https://gerrit.wikimedia.org/r/731925 (owner: 10Jbond) [15:38:36] (03CR) 10Jbond: [C: 03+2] check_user: remove hack for bad json [puppet] - 10https://gerrit.wikimedia.org/r/732001 (owner: 10Jbond) [15:39:43] (03Merged) 10jenkins-bot: cookbooks sre: update run_scripts to accept a list of scripts not functions [cookbooks] - 10https://gerrit.wikimedia.org/r/730513 (owner: 10Jbond) [15:39:47] (03Merged) 
10jenkins-bot: cookbooks.sre: update to use correct icinga_hosts instance [cookbooks] - 10https://gerrit.wikimedia.org/r/731925 (owner: 10Jbond) [15:40:04] !log otto@deploy1002 Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - remove now redundant stream setting - T277193 (duration: 01m 04s) [15:40:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:40:10] T277193: wgEventStreams (EventStreamConfig) should support per wiki overrides - https://phabricator.wikimedia.org/T277193 [15:41:15] (03PS5) 10Bernard Wang: Add WMEDesktopWebUIActionsTrackingOversampleLoggedInUsers config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731827 (https://phabricator.wikimedia.org/T292588) [15:46:04] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 T277118 [15:46:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:46:10] T277118: iw_url in interwiki is varbinary(127) in production but blob in code - https://phabricator.wikimedia.org/T277118 [15:46:12] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 T277118 [15:46:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:49:57] 10SRE, 10Analytics, 10SRE Observability (FY2021/2022-Q2): statsd and gunicorn metrics for superset - https://phabricator.wikimedia.org/T293761 (10BTullis) Ah, it looks like there **is** another way of instrumenting superset, but it's only with statsd: https://superset.apache.org/docs/installation/event-loggi... [15:50:00] (03CR) 10Jbond: "@bblack i think this should be safe to merge now can you confirm?" 
[puppet] - 10https://gerrit.wikimedia.org/r/662699 (https://phabricator.wikimedia.org/T236208) (owner: 10Jbond) [15:51:37] (03PS1) 10BryanDavis: toolhub: Bump container version to 2021-10-19-154611-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/732003 (https://phabricator.wikimedia.org/T293541) [15:52:21] 10SRE, 10Analytics, 10SRE Observability (FY2021/2022-Q2): statsd and gunicorn metrics for superset - https://phabricator.wikimedia.org/T293761 (10Ottomata) https://github.com/prometheus/statsd_exporter ? Not the best, but it would work. [15:53:32] 10SRE, 10Analytics, 10SRE Observability (FY2021/2022-Q2): statsd and gunicorn metrics for superset - https://phabricator.wikimedia.org/T293761 (10Ottomata) > Note that it’s also possible to implement you own logger by deriving superset.stats_logger.BaseStatsLogger. Probably better to implement our own prome... [15:57:32] (03CR) 10BryanDavis: [C: 03+2] toolhub: Bump container version to 2021-10-19-154611-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/732003 (https://phabricator.wikimedia.org/T293541) (owner: 10BryanDavis) [15:57:54] jouncebot: now [15:57:54] No deployments scheduled for the next 0 hour(s) and 2 minute(s) [15:58:00] jouncebot: next [15:58:00] In 0 hour(s) and 1 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211019T1600) [15:58:17] (03PS10) 10Cwhite: opensearch: fork elasticsearch module into opensearch module [puppet] - 10https://gerrit.wikimedia.org/r/721359 (https://phabricator.wikimedia.org/T288618) [15:58:42] (03PS11) 10Cwhite: opensearch: fork elasticsearch module into opensearch module [puppet] - 10https://gerrit.wikimedia.org/r/721359 (https://phabricator.wikimedia.org/T288618) [16:00:04] jbond and rzl: I, the Bot under the Fountain, call upon thee, The Deployer, to do Puppet request window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211019T1600). 
[16:00:04] Lucas_WMDE and MichaelG_WMDE: A patch you scheduled for Puppet request window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [16:00:09] o/ [16:00:20] 👋 [16:00:45] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 T277118 [16:00:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:00:52] T277118: iw_url in interwiki is varbinary(127) in production but blob in code - https://phabricator.wikimedia.org/T277118 [16:00:54] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 T277118 [16:00:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:01:36] * MichaelG_WMDE is a bit impressed that jouncebot can handle multiple devs and nicely connect them by "and", all I did was to copy a template in the table [16:01:59] (03Merged) 10jenkins-bot: toolhub: Bump container version to 2021-10-19-154611-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/732003 (https://phabricator.wikimedia.org/T293541) (owner: 10BryanDavis) [16:02:03] RECOVERY - Check systemd state on ms-be2036 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [16:04:40] (03CR) 10Brennen Bearnes: "Thanks for review! A deploy of this today would probably be ideal." [puppet] - 10https://gerrit.wikimedia.org/r/731849 (https://phabricator.wikimedia.org/T293696) (owner: 10Brennen Bearnes) [16:05:50] Lucas_WMDE, MichaelG_WMDE: sorry I'm late! taking a look [16:05:55] thanks! 
[16:06:11] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Schema change s4 T277118 [16:06:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:06:17] T277118: iw_url in interwiki is varbinary(127) in production but blob in code - https://phabricator.wikimedia.org/T277118 [16:06:22] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Schema change s4 T277118 [16:06:25] I don’t know how much time you usually leave between absenting and dropping something, maybe the second patch should wait longer idk [16:06:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:06] in principle it only needs one puppet run in between, so we could do it after 30 min [16:07:12] sounds good [16:07:16] col [16:07:19] cool [16:07:20] but if you're not in a hurry I'll wait a bit longer, just in case puppet fails on one host or whatever [16:07:35] sure [16:07:48] no need to fit it into the deploy window, I'll merge the first patch now and then come back and do the second one later today if that sounds good to you [16:07:55] keep in mind that if there is any host down for hw maintenance it will not get the resources removed [16:07:57] no need for you to hang around for it :) [16:08:05] sounds good to me! :) [16:08:27] volans: yeah, in practice it's just the maintenance hosts so it's easy to keep track of [16:08:32] or puppet disabled ofc [16:08:37] ah ok :D [16:08:38] for the first patch, I tried to check that the service currently doesn’t do anything, but it looks like I don’t have journal access on mwmaint1002 [16:08:40] if it were more nodes I'd worry more about it [16:08:42] thanks! [16:08:52] so I could check that it doesn’t run for long, but not if it logged anything [16:09:06] maybe you can take a quick look there to be sure? 
[16:09:21] sure [16:09:22] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Schema change s1 T277118 [16:09:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:09:28] looks like a pretty straightforward diff but I'm happy to double-check [16:09:32] (03CR) 10RLazarus: [C: 03+2] mediawiki: Absent wikibase_repo_prune2 systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/726746 (https://phabricator.wikimedia.org/T292604) (owner: 10Ladsgroup) [16:09:33] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Schema change s1 T277118 [16:09:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:10:37] looking at the code, it should log something like “16:00:00 0 rows pruned” [16:11:53] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 3:00:00 on 7 hosts with reason: Schema change s3 T277118 [16:11:59] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 7 hosts with reason: Schema change s3 T277118 [16:12:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:12:01] T277118: iw_url in interwiki is varbinary(127) in production but blob in code - https://phabricator.wikimedia.org/T277118 [16:12:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:14:13] (03PS6) 10Bernard Wang: Add WMEDesktopWebUIActionsTrackingOversampleLoggedInUsers config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731827 (https://phabricator.wikimedia.org/T292588) [16:16:43] MichaelG_WMDE, Lucas_WMDE: merged! I do indeed see "16:15:00 16:15:00 0 rows pruned." in the log, because puppet didn't quite finish running in time to stop that last invocation [16:16:58] but I'll recheck after 16:30 and observe it didn't run again [16:17:02] ok! 
[16:18:02] in the meantime though, `systemctl list-timers mediawiki_job_wikibase_repo_prune2.timer` returns `0 timers listed.` [16:18:11] as expected :) [16:18:44] and status says “Unit mediawiki_job_wikibase_repo_prune2.timer could not be found.” :) [16:19:31] looks like the only remaining wikidata/wikibase-related one is mediawiki_job_wikidata-updateQueryServiceLag [16:20:34] (03CR) 10David Caro: [C: 03+1] "Small nit, ok for me though." [puppet] - 10https://gerrit.wikimedia.org/r/731924 (owner: 10Majavah) [16:20:48] 👍 [16:20:49] 10SRE, 10SRE-Access-Requests, 10Gerrit-Privilege-Requests, 10LDAP-Access-Requests: Offboard Tonina Zhelyazkova from WMF systems - https://phabricator.wikimedia.org/T293621 (10Dzahn) Since anyone can create an account on Wikitech and there is nothing private in it, I am not sure there is much value in wipi... [16:25:59] (03CR) 10Dzahn: [C: 03+1] "I'll merge today in 2 hours or so." [puppet] - 10https://gerrit.wikimedia.org/r/731849 (https://phabricator.wikimedia.org/T293696) (owner: 10Brennen Bearnes) [16:26:44] (03CR) 10Brennen Bearnes: [C: 03+1] gitlab: remove cas3 from external providers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/731849 (https://phabricator.wikimedia.org/T293696) (owner: 10Brennen Bearnes) [16:29:21] 10ops-codfw, 10Continuous-Integration-Infrastructure, 10DC-Ops, 10netops: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ) - https://phabricator.wikimedia.org/T283582 (10Dzahn) [16:29:50] 10ops-codfw, 10Continuous-Integration-Infrastructure, 10DC-Ops, 10netops: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ) - https://phabricator.wikimedia.org/T283582 (10Dzahn) [16:30:24] 10ops-codfw, 10Continuous-Integration-Infrastructure, 10DC-Ops, 10netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (10Dzahn) [16:32:30] 10SRE-Access-Requests: Requesting access to Analytics 
Private Data Users for Naray-ctr - https://phabricator.wikimedia.org/T293810 (10NaRay) [16:35:28] rzl: did you check that it didn’t run again? :) [16:37:26] rzl@mwmaint1002:~$ journalctl -u mediawiki_job_wikibase_repo_prune2 | tail -n1 [16:37:26] Oct 19 16:15:00 mwmaint1002 systemd[1]: mediawiki_job_wikibase_repo_prune2.service: Succeeded. [16:37:28] lgtm! [16:37:32] \o/ [16:37:38] then I’ll head out, thank you for deploying :) [16:37:58] oh jeez I'm sorry I didn't realize you were still waiting! yes indeed, thanks for sticking around [16:38:04] have a good evening [16:39:27] it’s fine :) thanks! [16:40:21] thank you :) [16:41:52] !log bd808@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' . [16:41:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:42:51] (03CR) 10Hnowlan: [C: 03+2] Update api-gateway chart's comment about service routes [deployment-charts] - 10https://gerrit.wikimedia.org/r/730963 (owner: 10Elukey) [16:45:22] (03CR) 10Hnowlan: maps: Add script to send tile invalidation events (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/722825 (https://phabricator.wikimedia.org/T270175) (owner: 10Jgiannelos) [16:46:31] !log bd808@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' . [16:46:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:47:42] (03Merged) 10jenkins-bot: Update api-gateway chart's comment about service routes [deployment-charts] - 10https://gerrit.wikimedia.org/r/730963 (owner: 10Elukey) [16:48:30] !log bd808@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' . 
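(Annotation on the 16:37 exchange above.) The check rzl ran, reading the last journal entry for mediawiki_job_wikibase_repo_prune2 and confirming it predates the merge, can be sketched roughly as below. This is only an illustration, not what was run on mwmaint1002: the function names are hypothetical, the input is assumed to be journalctl's default short format (which carries no year, so 2021 is supplied from the deploy calendar link), the cutoff is taken from the 16:16:43 "merged!" message, and `%b` parsing assumes an English/C locale.

```python
from datetime import datetime


def last_run_time(journal_lines, year):
    """Return the timestamp of the last non-empty journal line, or None.

    Assumes lines begin with "Mon DD HH:MM:SS" as in journalctl's default
    short output; the year is not part of that format and must be supplied.
    """
    lines = [ln for ln in journal_lines if ln.strip()]
    if not lines:
        return None
    stamp = " ".join(lines[-1].split()[:3])  # e.g. "Oct 19 16:15:00"
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def ran_after(journal_lines, cutoff):
    """True if the most recent journal entry is later than the cutoff."""
    last = last_run_time(journal_lines, cutoff.year)
    return last is not None and last > cutoff


# The single line quoted in the channel at 16:37:26.
sample = [
    "Oct 19 16:15:00 mwmaint1002 systemd[1]: "
    "mediawiki_job_wikibase_repo_prune2.service: Succeeded.",
]
# Hypothetical cutoff: the time the merge was announced in the channel.
cutoff = datetime(2021, 10, 19, 16, 16, 43)

print(ran_after(sample, cutoff))
```

Comparing against a cutoff rather than eyeballing the timestamp makes the "it didn't run again" conclusion explicit; an empty journal is treated as "did not run".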
[16:48:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:50:10] (03PS1) 10BBlack: Add digicert-2021-unified certs and intermediates [puppet] - 10https://gerrit.wikimedia.org/r/732009 (https://phabricator.wikimedia.org/T289507) [16:56:37] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "@Majavah let me know if you need me to merge this. I don't remember if you have +2 on this repo." [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/731220 (owner: 10Majavah) [17:00:05] chrisalbon and accraze: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Services – Graphoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211019T1700). [17:09:54] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bullseye [17:09:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:10:02] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q1:(Need By: TBD) rack/setup/install cloudmetrics100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T289888 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudmetrics1003.eqiad.wmn... [17:11:03] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host cloudmetrics1004.eqiad.wmnet with OS bullseye [17:11:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:11:10] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q1:(Need By: TBD) rack/setup/install cloudmetrics100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T289888 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudmetrics1004.eqiad.wmn... 
[17:14:23] (03PS1) 10Majavah: openstack:haproxy: port-based tls termination support [puppet] - 10https://gerrit.wikimedia.org/r/732012 (https://phabricator.wikimedia.org/T267194) [17:14:49] (03CR) 10Majavah: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/732012 (https://phabricator.wikimedia.org/T267194) (owner: 10Majavah) [17:22:38] (03PS2) 10Muehlenhoff: Remove obsolete pki role [puppet] - 10https://gerrit.wikimedia.org/r/731939 [17:22:53] (03CR) 10CDanis: [C: 03+2] Add rate of high-signal NELs as a status page metric [puppet] - 10https://gerrit.wikimedia.org/r/731171 (https://phabricator.wikimedia.org/T285569) (owner: 10CDanis) [17:23:52] (03CR) 10Muehlenhoff: [C: 03+2] Remove obsolete pki role [puppet] - 10https://gerrit.wikimedia.org/r/731939 (owner: 10Muehlenhoff) [17:25:05] (03CR) 10Andrew Bogott: [C: 03+2] openstack:haproxy: port-based tls termination support [puppet] - 10https://gerrit.wikimedia.org/r/732012 (https://phabricator.wikimedia.org/T267194) (owner: 10Majavah) [17:25:44] !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudmetrics1003.eqiad.wmnet with OS bullseye [17:25:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:25:51] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q1:(Need By: TBD) rack/setup/install cloudmetrics100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T289888 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudmetrics1003.eqiad.wmnet w... 
[17:30:23] (03PS1) 10Arlolra: Fix capitalization on enterprise dumps page [puppet] - 10https://gerrit.wikimedia.org/r/732016 [17:32:29] (03PS1) 10Majavah: openstack:haproxy: set x-forwarded-proto [puppet] - 10https://gerrit.wikimedia.org/r/732017 (https://phabricator.wikimedia.org/T267194) [17:35:24] (03PS2) 10Legoktm: dumps: Fix capitalization on enterprise dumps page [puppet] - 10https://gerrit.wikimedia.org/r/732016 (owner: 10Arlolra) [17:36:11] (03CR) 10Legoktm: [C: 03+2] dumps: Fix capitalization on enterprise dumps page [puppet] - 10https://gerrit.wikimedia.org/r/732016 (owner: 10Arlolra) [17:37:35] (03CR) 10Andrew Bogott: [C: 03+2] openstack:haproxy: set x-forwarded-proto [puppet] - 10https://gerrit.wikimedia.org/r/732017 (https://phabricator.wikimedia.org/T267194) (owner: 10Majavah) [17:37:39] 10SRE, 10ops-eqiad: eqiad: patch 2nd Equinix IXP - https://phabricator.wikimedia.org/T293726 (10wiki_willy) a:03Cmjohnson [17:37:59] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1004.eqiad.wmnet with OS bullseye [17:38:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:38:06] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q1:(Need By: TBD) rack/setup/install cloudmetrics100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T289888 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudmetrics1004.eqiad.wmnet w... 
[17:41:01] (03PS1) 10Legoktm: dumps: Improve enterprise index a bit more [puppet] - 10https://gerrit.wikimedia.org/r/732019 [17:41:59] (03PS2) 10Legoktm: dumps: Improve enterprise index a bit more [puppet] - 10https://gerrit.wikimedia.org/r/732019 [17:42:42] (03PS1) 10Reedy: Fix comment about file path being maintained by puppet [puppet] - 10https://gerrit.wikimedia.org/r/732020 [17:43:13] heh, snap [17:44:39] RECOVERY - Check systemd state on deploy1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [17:44:46] (03PS1) 10Majavah: openstack::haproxy: tls-ify nova and glance on codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732021 (https://phabricator.wikimedia.org/T267194) [17:47:07] (03CR) 10Andrew Bogott: [C: 03+2] openstack::haproxy: tls-ify nova and glance on codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732021 (https://phabricator.wikimedia.org/T267194) (owner: 10Majavah) [17:49:25] PROBLEM - Check systemd state on deploy1002 is CRITICAL: CRITICAL - degraded: The following units failed: deploy_to_mwdebug.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [17:53:49] (03PS5) 10Legoktm: Add framework for setting $wgLocalHTTPProxy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731861 (https://phabricator.wikimedia.org/T288848) [17:53:51] (03PS4) 10Legoktm: Enable $wgLocalHTTPProxy on group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731862 (https://phabricator.wikimedia.org/T288848) [17:56:26] (03CR) 10Legoktm: [C: 03+2] Add framework for setting $wgLocalHTTPProxy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731861 (https://phabricator.wikimedia.org/T288848) (owner: 10Legoktm) [17:57:43] !log removing six email addresses on request (with deleteUserEmail.php) [17:57:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:57:50] (03Merged) 10jenkins-bot: Add framework for setting $wgLocalHTTPProxy 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/731861 (https://phabricator.wikimedia.org/T288848) (owner: 10Legoktm) [17:59:19] (03CR) 10Dzahn: [C: 03+2] gitlab: remove cas3 from external providers [puppet] - 10https://gerrit.wikimedia.org/r/731849 (https://phabricator.wikimedia.org/T293696) (owner: 10Brennen Bearnes) [17:59:26] !log legoktm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Add framework for setting $wgLocalHTTPProxy (T288848) (1/2) (duration: 01m 05s) [17:59:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:59:32] T288848: Make HTTP calls work within mediawiki on kubernetes - https://phabricator.wikimedia.org/T288848 [18:00:05] RoanKattouw and Urbanecm: May I have your attention please! UTC evening backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211019T1800) [18:00:05] MatmaRex and Seddon: A patch you scheduled for UTC evening backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:11] (03PS5) 10Herron: kafka_shipper: map site->kafka cluster name & point codfw to codfw brokers [puppet] - 10https://gerrit.wikimedia.org/r/731976 (https://phabricator.wikimedia.org/T293439) [18:00:12] i can deploy today [18:00:18] MatmaRex__: Seddon: around? [18:00:21] * legoktm will be done in a few seconds [18:00:24] hi. my irc client has wedged itself but i hope this works [18:00:29] ack, I'll wait for legoktm to finish [18:00:37] MatmaRex__: i can see your message, so I guess so! 
[18:00:48] !log legoktm@deploy1002 Synchronized wmf-config/: Add framework for setting $wgLocalHTTPProxy (T288848) (2/2) (duration: 01m 06s) [18:00:52] {{done}} [18:00:53] (03CR) 10jerkins-bot: [V: 04-1] kafka_shipper: map site->kafka cluster name & point codfw to codfw brokers [puppet] - 10https://gerrit.wikimedia.org/r/731976 (https://phabricator.wikimedia.org/T293439) (owner: 10Herron) [18:00:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:57] that was fast :) [18:01:57] Seddon: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaSearch/+/730208 is quite large. May I ask why do we need to backport it? [18:02:15] (03PS2) 10Urbanecm: Enable topic subscriptions as a beta feature on all remaining projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731805 (https://phabricator.wikimedia.org/T287802) (owner: 10Bartosz Dziewoński) [18:02:18] (03CR) 10Urbanecm: [C: 03+2] Enable topic subscriptions as a beta feature on all remaining projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731805 (https://phabricator.wikimedia.org/T287802) (owner: 10Bartosz Dziewoński) [18:02:55] Present! [18:03:11] hi Seddon! Can you please look at my q above? 
:- [18:03:24] (03CR) 10Dzahn: "Notice: /Stage[main]/Gitlab/Exec[Reconfigure GitLab]/returns: Chef Infra Client finished, 18/649 resources updated in 45 seconds" [puppet] - 10https://gerrit.wikimedia.org/r/731849 (https://phabricator.wikimedia.org/T293696) (owner: 10Brennen Bearnes) [18:03:27] and I'm actually also curious about https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaSearch/+/731940, which is smaller (thus, safer to backport), but from the commit message, i also don't understand why refactoring should be backported :) [18:03:32] (03Merged) 10jenkins-bot: Enable topic subscriptions as a beta feature on all remaining projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731805 (https://phabricator.wikimedia.org/T287802) (owner: 10Bartosz Dziewoński) [18:03:50] (03PS6) 10Herron: kafka_shipper: map site->kafka cluster name & point codfw to codfw brokers [puppet] - 10https://gerrit.wikimedia.org/r/731976 (https://phabricator.wikimedia.org/T293439) [18:04:10] MatmaRex__: your patch is available at mwdebug1001, please test. [18:05:38] looking [18:05:49] urbanecm: so they come as a pair. Basically there is a large number of client side production errors for the past month, and this was the patch that was going to fix it. I'm happy to wait for next weeks train to resolve. [18:06:25] looks good at enwiki urbanecm [18:06:46] MatmaRex__: thanks, are you testing at other wikis, too? [18:07:09] Seddon: i see. Do they depend one on the other, please? [18:07:20] (also, I'd appreciate if in the future that could be clarified in the commit messages) [18:07:34] splitting refactoring from fixes of errors [18:07:37] @urbanecm they do, and I will pass that on! 
[18:07:39] Well [18:07:53] Both the fix is for the refactor, which is also fixing things [18:07:56] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=sidekiq site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [18:08:06] It's a mess, that I acknowledge :p [18:08:13] heh, okay [18:08:22] well, it's frontend, which is generally less risky to backport than PHP [18:08:24] so let's do it [18:08:36] (03PS1) 10Urbanecm: Refactor the URI.query [extensions/MediaSearch] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/731961 [18:08:39] everything looks good [18:08:54] (03PS1) 10Urbanecm: Refactor the URI.query [extensions/MediaSearch] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/731962 [18:08:57] (03CR) 10Urbanecm: [C: 03+2] Refactor the URI.query [extensions/MediaSearch] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/731961 (owner: 10Urbanecm) [18:09:03] (03CR) 10Urbanecm: [C: 03+2] Refactor the URI.query [extensions/MediaSearch] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/731962 (owner: 10Urbanecm) [18:09:38] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [18:09:56] MatmaRex__: thanks, syncing [18:10:08] (03CR) 10Dzahn: builder/systemtap: merge role::systemtap::devserver into builder (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/730863 (owner: 10Dzahn) [18:10:33] (03PS1) 10Urbanecm: Refactor the URI.query [extensions/MediaSearch] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/732026 (https://phabricator.wikimedia.org/T291392) [18:11:29] (03PS1) 10Urbanecm: Refactor the URI.query [extensions/MediaSearch] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/732027 (https://phabricator.wikimedia.org/T291392) [18:11:39] (03CR) 10Urbanecm: [C: 03+2] Refactor the URI.query [extensions/MediaSearch] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/732026 (https://phabricator.wikimedia.org/T291392) (owner: 10Urbanecm) [18:11:42] (03CR) 10Urbanecm: [C: 03+2] Refactor the URI.query [extensions/MediaSearch] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/732027 (https://phabricator.wikimedia.org/T291392) (owner: 10Urbanecm) [18:11:59] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 9a2893c7190e615a247674dbf7f87348bf43b91c: Enable topic subscriptions as a beta feature on all remaining projects (T287802) (duration: 01m 04s) [18:12:03] MatmaRex__: should be live [18:12:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:12:05] T287802: Deploy config to introduce manual topic subscriptions as Beta Feature at Phase 3 projects - https://phabricator.wikimedia.org/T287802 [18:12:16] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bullseye [18:12:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:12:23] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q1:(Need By: TBD) rack/setup/install 
cloudmetrics100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T289888 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudmetrics1003.eqiad.wmn... [18:12:25] (03PS3) 10Urbanecm: foundationwiki: Restrict editing of sensitive namespaces to `editor` group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/717437 (https://phabricator.wikimedia.org/T205350) [18:12:34] (03CR) 10Urbanecm: "Comms gave the go ahead" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/717437 (https://phabricator.wikimedia.org/T205350) (owner: 10Urbanecm) [18:12:42] so, while waiting on CI... [18:12:51] ...let's move foundationwiki's transition a bit forward [18:12:53] (thanks) [18:12:54] (03CR) 10Urbanecm: [C: 03+2] foundationwiki: Restrict editing of sensitive namespaces to `editor` group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/717437 (https://phabricator.wikimedia.org/T205350) (owner: 10Urbanecm) [18:13:07] (03CR) 10Dzahn: builder/systemtap: merge role::systemtap::devserver into builder (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/730863 (owner: 10Dzahn) [18:13:12] (03PS2) 10Urbanecm: foundationwiki: Restrict uploading to editor group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/717497 (https://phabricator.wikimedia.org/T205350) [18:13:16] (03CR) 10Urbanecm: [C: 03+2] foundationwiki: Restrict uploading to editor group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/717497 (https://phabricator.wikimedia.org/T205350) (owner: 10Urbanecm) [18:13:23] np MatmaRex__ [18:13:25] (03CR) 10Herron: "https://puppet-compiler.wmflabs.org/compiler1002/31766/" [puppet] - 10https://gerrit.wikimedia.org/r/731976 (https://phabricator.wikimedia.org/T293439) (owner: 10Herron) [18:13:58] jouncebot: next [18:13:58] In 0 hour(s) and 46 minute(s): MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211019T1900) [18:14:06] 🥳 [18:14:09] 
(03Merged) 10jenkins-bot: foundationwiki: Restrict editing of sensitive namespaces to `editor` group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/717437 (https://phabricator.wikimedia.org/T205350) (owner: 10Urbanecm) [18:14:12] (03Merged) 10jenkins-bot: foundationwiki: Restrict uploading to editor group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/717497 (https://phabricator.wikimedia.org/T205350) (owner: 10Urbanecm) [18:15:45] 10SRE, 10SRE-Access-Requests: Requesting access to Analytics Private Data Users for Naray-ctr - https://phabricator.wikimedia.org/T293810 (10TAndic) Access for Namrata is approved by her manager; can also be confirmed with @janstee if needed! [18:16:13] okay, patches work, syncing [18:16:43] (03PS3) 10Dzahn: builder/systemtap: convert role::systemtap::devserver to profile [puppet] - 10https://gerrit.wikimedia.org/r/730863 [18:17:56] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 1476a2d93: dd8393c1a0: foundationwiki: Restrict sensitive namespaces to editor group (T205350) (duration: 01m 03s) [18:18:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:18:03] T205350: Update edit permissions for Governance wiki - https://phabricator.wikimedia.org/T205350 [18:18:09] and, works even after syncing [18:18:41] 10SRE, 10SRE-Access-Requests: Requesting access to Analytics Private Data Users for Naray-ctr - https://phabricator.wikimedia.org/T293810 (10Dzahn) 05Open→03In progress p:05Triage→03High a:03Dzahn [18:18:54] 10SRE, 10SRE-Access-Requests: Requesting access to Analytics Private Data Users for Naray-ctr - https://phabricator.wikimedia.org/T293810 (10Dzahn) confirmed L3 signature [18:20:51] so, now will come the more difficult part (actually connect the wiki to SUL :)). [18:21:04] but not now now, now later. 
[18:22:39] 10SRE, 10SRE-Access-Requests: Requesting access to Analytics Private Data Users for Naray-ctr - https://phabricator.wikimedia.org/T293810 (10Dzahn) @TAndic Thanks, ticket looks good and is in progress. Just one more thing: for contractors we require 2 extra lines of information. An expiry date and an expiry co... [18:27:09] 10SRE, 10SRE-Access-Requests: Requesting access to Analytics Private Data Users for Naray-ctr - https://phabricator.wikimedia.org/T293810 (10NaRay) @Dzahn My contract end date is May 9, 2022, and the person of contact will be @TAndic [18:28:29] 10SRE, 10SRE-Access-Requests: Requesting access to Analytics Private Data Users for Naray-ctr - https://phabricator.wikimedia.org/T293810 (10Dzahn) @NaRay Hello and welcome! Do you already have a user on the Wikitech wiki (https://wikitech.wikimedia.org/wiki/Main_Page) ? Could you let me know the user name or... [18:28:54] 10SRE, 10SRE-Access-Requests: Requesting access to Analytics Private Data Users for Naray-ctr - https://phabricator.wikimedia.org/T293810 (10Dzahn) >>! In T293810#7441954, @NaRay wrote: > @Dzahn My contract end date is May 9, 2022, and the person of contact will be @TAndic Perfect, thank you. This will let m... [18:30:49] !log deleting 1 more email with deleteUserEmail.php [18:30:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:31:32] mutante: you might be looking for https://wikitech.wikimedia.org/wiki/Special:Contributions/Naray-ctr (uid=statwithlatte) [18:33:33] 10SRE, 10SRE-Access-Requests: Requesting access to Analytics Private Data Users for Naray-ctr - https://phabricator.wikimedia.org/T293810 (10NaRay) > @NaRay Hello and welcome! Do you already have a user on the Wikitech wiki (https://wikitech.wikimedia.org/wiki/Main_Page) ? Could you let me know the user name o... [18:33:58] urbanecm: oh, thank you. uhm.. 
weird [18:34:01] (03Merged) 10jenkins-bot: Refactor the URI.query [extensions/MediaSearch] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/731961 (owner: 10Urbanecm) [18:34:07] (03Merged) 10jenkins-bot: Refactor the URI.query [extensions/MediaSearch] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/731962 (owner: 10Urbanecm) [18:34:08] mutante: np. What's weird though? [18:34:12] (03Merged) 10jenkins-bot: Refactor the URI.query [extensions/MediaSearch] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/732026 (https://phabricator.wikimedia.org/T291392) (owner: 10Urbanecm) [18:35:05] urbanecm: well.. just that the uid for Naray-crt is "statwithlatte" and not ... Naray* :p [18:35:24] users come up with weird UIDs all the time :ú) [18:35:31] and that I had searched for email where the field is called mail [18:35:37] heh [18:36:22] yea, not like "ldapsearch -x email=" says something like "dont know that" [18:36:29] Seddon: your patch is at mwdebug1001, can you test? [18:36:34] looks the same as when it's not there [18:36:34] (actually, both of them, for both branches) [18:36:35] testing now [18:37:15] @urbanecm its just commons so just wmf4 for today [18:37:29] i need to do both branches [18:37:32] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1003.eqiad.wmnet with OS bullseye [18:37:34] otherwise it'd be undeployed with wmf.5 [18:37:36] (it's cut already) [18:37:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:37:40] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q1:(Need By: TBD) rack/setup/install cloudmetrics100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T289888 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudmetrics1003.eqiad.wmnet w... [18:37:47] Ah yes! 
[18:38:47] (03Merged) 10jenkins-bot: Refactor the URI.query [extensions/MediaSearch] (wmf/1.38.0-wmf.4) - 10https://gerrit.wikimedia.org/r/732027 (https://phabricator.wikimedia.org/T291392) (owner: 10Urbanecm) [18:39:00] okay, not both branches... [18:39:13] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q1:(Need By: TBD) rack/setup/install cloudmetrics100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T289888 (10Cmjohnson) [18:39:25] urbanecm: are both patches deployed to commons? [18:39:36] Seddon: i missed one when i first messaged you [18:39:37] fixed now [18:39:56] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q1:(Need By: TBD) rack/setup/install cloudmetrics100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T289888 (10Cmjohnson) 05Open→03Resolved These are finished with on-site work and ready to be turned over [18:40:10] 10SRE, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on labweb1002 - https://phabricator.wikimedia.org/T293428 (10Cmjohnson) 05Open→03Resolved a:03Cmjohnson This has been completed. [18:42:21] urbanecm: as far as I can tell it's all working! [18:42:27] great, so let's push to prod! [18:42:35] Now to see if we reduce or increase production errors :P [18:42:51] i hope the former [18:42:58] unless that means your patch breaks logging :D [18:43:50] Out of the box thinking [18:44:38] 10SRE, 10SRE-Access-Requests: Requesting access to Analytics Private Data Users for Naray-ctr - https://phabricator.wikimedia.org/T293810 (10Dzahn) >>! In T293810#7441968, @NaRay wrote: > Apologies, my Wikitech username is Naray-ctr No worries, I did not search for your email the right way. @Urbanecm pointed... 
[18:45:27] !log urbanecm@deploy1002 Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/: 694580a: c02e301: MediaSearch backports(T291392, T293335, T291392, T291622, T293554) (duration: 01m 03s) [18:45:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:45:39] T293335: Cannot convert undefined or null to object when trying to restorePageState - https://phabricator.wikimedia.org/T293335 [18:45:39] T291622: [M] CheckForMore always return true - https://phabricator.wikimedia.org/T291622 [18:45:39] T293554: Switching tabs in MediaSearch does not re-query the search - https://phabricator.wikimedia.org/T293554 [18:45:39] T291392: Refactor the URI.query - https://phabricator.wikimedia.org/T291392 [18:46:09] wmf.4 live Seddon [18:46:19] wooop wooop [18:46:30] !log urbanecm@deploy1002 Synchronized php-1.38.0-wmf.5/extensions/MediaSearch/: a84a675: 3231578: MediaSearch backports (T291392, T293335, T291392, T291622, T293554) (duration: 01m 03s) [18:46:34] and wmf.5 too [18:46:36] anything else? [18:46:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:46:51] Nope! A wonderful backport as always [18:47:02] great :) [18:50:11] 10SRE, 10DBA, 10observability, 10Sustainability (Incident Followup): Monitor/dashboard number of queries killed by the automatic query killer - https://phabricator.wikimedia.org/T293531 (10herron) Does/could the query killer itself write an additional log to syslog or a file each time a kill action is take... [18:51:07] 10SRE, 10SRE-Access-Requests: Requesting access to Analytics Private Data Users for Naray-ctr - https://phabricator.wikimedia.org/T293810 (10Dzahn) [18:59:02] urbanecm: Seddon: congratulations ;) [18:59:11] :)) [18:59:15] to a healthy backport? [19:00:05] hashar and dancy: #bothumor I � Unicode. All rise for MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211019T1900). 
[19:00:20] I am just happy to see the blocker patch got reviewed, backported AND deployed ;) [19:00:34] 10SRE, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on labweb1002 - https://phabricator.wikimedia.org/T293428 (10Dzahn) Thanks! confirmed. OK: Active: 2, Working: 2, Failed: 0, Spare: 0 OK [19:00:36] (03PS1) 10Hashar: group0 wikis to 1.38.0-wmf.5 refs T281169 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/732035 [19:00:38] (03CR) 10Hashar: [C: 03+2] group0 wikis to 1.38.0-wmf.5 refs T281169 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/732035 (owner: 10Hashar) [19:00:45] :)) [19:01:45] (03Merged) 10jenkins-bot: group0 wikis to 1.38.0-wmf.5 refs T281169 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/732035 (owner: 10Hashar) [19:01:57] I'm just glad I got through it without breaking stuff :P [19:02:38] (03PS3) 10RLazarus: mediawiki: Drop absented wikibase_repo_prune2 systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/731028 (https://phabricator.wikimedia.org/T292604) (owner: 10Lucas Werkmeister (WMDE)) [19:02:59] !log hashar@deploy1002 rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.5 refs T281169 [19:03:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:05] T281169: 1.38.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T281169 [19:03:52] live! [19:04:41] (03CR) 10RLazarus: [C: 03+2] mediawiki: Drop absented wikibase_repo_prune2 systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/731028 (https://phabricator.wikimedia.org/T292604) (owner: 10Lucas Werkmeister (WMDE)) [19:05:24] [{reqId}] {exception_url} PHP Deprecated: Use of ParserOutput::setProperty was deprecated in MediaWiki 1.38. [Called from Wikibase\Client\Hooks\ShortDescHandler::doHandle] [19:05:25] [{reqId}] {exception_url} PHP Deprecated: Use of ParserOutput::getProperty was deprecated in MediaWiki 1.38. 
[Called from Wikibase\Client\Hooks\ShortDescHandler::doHandle] [19:05:34] "just" deprecation, will fill them later [19:14:21] (03PS1) 10Dzahn: admin: create user for Namrata Ray and add to analytics-privatedata [puppet] - 10https://gerrit.wikimedia.org/r/732038 (https://phabricator.wikimedia.org/T293810) [19:17:37] nothing happening beside those [19:19:33] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to Analytics Private Data Users for Naray-ctr - https://phabricator.wikimedia.org/T293810 (10Dzahn) @Ottomata or @odimitrijevic Hi, do you approve this access request? It looks to me just like the one you recently approved in T292992 which... [19:24:37] (03PS1) 10Andrew Bogott: openstack::haproxy: Add tls frontend for designate api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732039 (https://phabricator.wikimedia.org/T267194) [19:24:39] (03PS1) 10Andrew Bogott: openstack::haproxy: add tls for the keystone-admin api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732040 (https://phabricator.wikimedia.org/T267194) [19:24:41] (03PS1) 10Andrew Bogott: openstack::haproxy: add tls for the placement api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732041 (https://phabricator.wikimedia.org/T267194) [19:24:43] (03PS1) 10Andrew Bogott: openstack::haproxy: add tls for the trove api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732042 (https://phabricator.wikimedia.org/T267194) [19:24:45] (03PS1) 10Andrew Bogott: openstack::haproxy: add tls for the cinder api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732043 (https://phabricator.wikimedia.org/T267194) [19:24:47] (03PS1) 10Andrew Bogott: openstack::haproxy: add tls for the neutron api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732044 (https://phabricator.wikimedia.org/T267194) [19:24:49] (03PS1) 10Andrew Bogott: openstack::haproxy: add tls for the swift api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732045 
(https://phabricator.wikimedia.org/T267194) [19:24:51] (03PS1) 10Andrew Bogott: openstack::haproxy: add tls for the barbican api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732046 (https://phabricator.wikimedia.org/T267194) [19:29:09] (03CR) 10CDanis: [C: 03+1] admin: create user for Namrata Ray and add to analytics-privatedata [puppet] - 10https://gerrit.wikimedia.org/r/732038 (https://phabricator.wikimedia.org/T293810) (owner: 10Dzahn) [19:36:31] hey folks! I'm trying to do git review -R and I'm facing the following error that I have no idea how to fix as it's not my script. any ideas on what I can do? I've tried un and reinstalling git, git-review and python (was previously 3.8, now 3.10) https://irc.873gear.com/uploads/09d4930f24b1465b/image.png [19:39:45] dontpanic: is the code you are trying to upload this? https://phabricator.wikimedia.org/T293545#7440574 [19:40:10] yes, just changed the T number [19:40:45] it seems like there is some special character in there that it fails to recognize as Unicode? [19:41:19] I wonder if this is just specific to this very change and the content of it.. 
or happens for you with any test edit [19:41:32] alternative is to use the plain git command: `git push origin HEAD:refs/for/master` [19:41:46] or whatever command `git-review -n` lists (`-n` is dry mode so it might work) [19:42:00] or you can try the form uploader [19:42:24] and one feel free to fill a task against #gerrit at https://phabricator.wikimedia.org/ , I might give it a try [19:42:24] dontpanic: https://www.mediawiki.org/wiki/Gerrit_patch_uploader [19:42:47] this might be helpful as an alternative for now [19:43:12] so you can just upload until we know more [19:43:28] different topic: train went all fine on group0 will do group1 tomorrow apparently :] [19:43:36] congrats [19:44:13] hashar: I tried that, errored out and now I see that apparently my ssh keys aren't working for some reason, I'll regenerate them and try again [19:44:15] dontpanic: and would be interesting if it's gone when you just make some test edit like adding a new line [19:45:16] oh, I have a good one actually mutante, let me file the task first :P [19:46:02] PROBLEM - WMF Cloud -Omega Cluster- - Prod MW AppServer Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Certificate cloudelastic.wikimedia.org expires in 7 day(s) (Wed 27 Oct 2021 07:00:23 PM GMT +0000). https://wikitech.wikimedia.org/wiki/Search%23Administration [19:46:18] Cloud Omega.. wtf :) [19:46:51] pretty sure that is just an LE renewal [19:49:35] yea, confirmed it's LE. [puppetmaster1001:~] $ curl -6 -S -vvv https://cloudelastic.wikimedia.org:9243 [19:51:14] shouldn't it still auto renew in something like 30d before expiry [19:51:17] we shouldn't have multiple CRITs in Icinga though for this if it's normal [19:51:35] the number is lower for LE certs [19:51:50] RECOVERY - WMF Cloud -Omega Cluster- - Prod MW AppServer Port - HTTPS on cloudelastic.wikimedia.org is OK: OK - Certificate cloudelastic.wikimedia.org will expire on Sun 26 Dec 2021 07:00:29 PM GMT +0000. 
https://wikitech.wikimedia.org/wiki/Search%23Administration [19:51:55] well, making a ticket .. or lol [19:52:15] I bet the renewal period is 7 days and the monitoring threshold is 7 days as well [19:52:35] 10SRE, 10Advanced Mobile Contributions, 10Traffic-Icebox, 10Readers-Web-Backlog (Tracking), 10User-Joe: AMC – Opt-in for logged out users - https://phabricator.wikimedia.org/T215624 (10ovasileva) 05Open→03Declined Declining. This is not currently a part of our roadmap [19:52:46] off by minutes [19:53:58] new one valid until Christmas, convenient day for the alert [19:55:24] (03CR) 10Majavah: [C: 04-1] "That isn't a valid port, max port number in TCP is somewhere in the 65k range. Additionally we shouldn't really use anything above 32k due" [puppet] - 10https://gerrit.wikimedia.org/r/732040 (https://phabricator.wikimedia.org/T267194) (owner: 10Andrew Bogott) [19:55:39] majavah: 30d for commercial certs, 7d for LE certs I think and it leads to monitoring confusion because you can have different certs based on geo location so it can be that Icinga sees something different from you [19:56:19] (03PS1) 10Tks4Fish: ruwikiversity: Add 'portal' and 'faculty' namespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/732053 (https://phabricator.wikimedia.org/T293545) [19:56:31] *sigh* just kill me now [19:56:40] all that work simply for skipping steps [19:57:28] dontpanic: want us to upload a change or something? or use the form? [19:57:44] (03CR) 10jerkins-bot: [V: 04-1] ruwikiversity: Add 'portal' and 'faculty' namespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/732053 (https://phabricator.wikimedia.org/T293545) (owner: 10Tks4Fish) [19:57:45] nope, I did it [19:57:51] cool [19:57:55] just me being dumb and forgetting git commit -a [19:57:57] ... [19:58:09] glad it works now:) [19:58:17] yeah, thanks for the help :D [19:58:44] you're welcome. 
[19:59:04] let's see what jerkins has to say there :) [19:59:34] 12:57:04 Script phpunit handling the test event returned with error code 1 [20:00:03] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to Analytics Private Data Users for Naray-ctr - https://phabricator.wikimedia.org/T293810 (10Ottomata) Approved. @Naray, do you need shell access, or will you just be accessing dashboards in superset? Please see: https://wikitech.wikimed... [20:01:58] (03PS2) 10Tks4Fish: ruwikiversity: Add 'portal' and 'faculty' namespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/732053 (https://phabricator.wikimedia.org/T293545) [20:03:09] ah, underscores, I see [20:03:29] (03CR) 10Tks4Fish: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/732053 (https://phabricator.wikimedia.org/T293545) (owner: 10Tks4Fish) [20:05:11] (03CR) 10Dzahn: [C: 03+1] "looks good to me, Jenkins likes it too now. Google Translate confirmed" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/732053 (https://phabricator.wikimedia.org/T293545) (owner: 10Tks4Fish) [20:05:41] dontpanic: there you go, the bot likes it too now. 
now you just need to get it into a deployment window [20:05:54] just did that :P [20:05:57] thanks a ton :D [20:06:00] great [20:27:50] (03CR) 10Arlolra: [C: 03+1] dumps: Improve enterprise index a bit more [puppet] - 10https://gerrit.wikimedia.org/r/732019 (owner: 10Legoktm) [20:31:53] 10SRE, 10observability, 10cloud-services-team (Kanban): cloudelastic icinga TLS cert alerts - https://phabricator.wikimedia.org/T293826 (10Dzahn) [20:34:09] 10SRE, 10observability, 10cloud-services-team (Kanban): cloudelastic icinga TLS cert alerts - https://phabricator.wikimedia.org/T293826 (10Dzahn) [20:34:39] 10SRE, 10Traffic, 10observability, 10cloud-services-team (Kanban): cloudelastic icinga TLS cert alerts - https://phabricator.wikimedia.org/T293826 (10Dzahn) [20:36:44] 10SRE, 10Discovery-Search, 10Traffic, 10observability: cloudelastic icinga TLS cert alerts - https://phabricator.wikimedia.org/T293826 (10Dzahn) [20:36:59] ACKNOWLEDGEMENT - WMF Cloud -Psi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Certificate cloudelastic.wikimedia.org expires in 7 day(s) (Wed 27 Oct 2021 07:00:23 PM GMT +0000). daniel_zahn https://phabricator.wikimedia.org/T293826 https://wikitech.wikimedia.org/wiki/Search%23Administration [20:38:12] 10SRE, 10Discovery-Search, 10Traffic, 10observability: cloudelastic icinga TLS cert alerts - https://phabricator.wikimedia.org/T293826 (10Dzahn) p:05Triage→03Low ` 20:36 <+icinga-wm> ACKNOWLEDGEMENT - WMF Cloud -Psi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRIT... [20:41:09] ACKNOWLEDGEMENT - WMF Cloud -Psi Cluster- - Prod MW AppServer Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Certificate cloudelastic.wikimedia.org expires in 7 day(s) (Wed 27 Oct 2021 07:00:23 PM GMT +0000). 
daniel_zahn https://phabricator.wikimedia.org/T293826 https://wikitech.wikimedia.org/wiki/Search%23Administration [20:41:09] ACKNOWLEDGEMENT - WMF Cloud -Psi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Certificate cloudelastic.wikimedia.org expires in 7 day(s) (Wed 27 Oct 2021 07:00:23 PM GMT +0000). daniel_zahn https://phabricator.wikimedia.org/T293826 https://wikitech.wikimedia.org/wiki/Search%23Administration [20:47:38] PROBLEM - WMF Cloud -Psi Cluster- - Prod MW AppServer Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Certificate cloudelastic.wikimedia.org expires in 7 day(s) (Wed 27 Oct 2021 07:00:23 PM GMT +0000). https://wikitech.wikimedia.org/wiki/Search%23Administration [20:48:19] "PROBLEM", that's not a normal Icinga state, heh [20:48:46] 10SRE, 10Anti-Harassment, 10IP Info, 10serviceops, 10Patch-For-Review: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (10Dzahn) @Joe So for the current/pre-k8s setup this is resolved, minus one Hiera flip to enable on all appservers w... [20:49:42] RECOVERY - WMF Cloud -Psi Cluster- - Prod MW AppServer Port - HTTPS on cloudelastic.wikimedia.org is OK: OK - Certificate cloudelastic.wikimedia.org will expire on Sun 26 Dec 2021 07:00:29 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Search%23Administration [20:51:10] 10SRE, 10Discovery-Search, 10Traffic, 10observability: cloudelastic icinga TLS cert alerts - https://phabricator.wikimedia.org/T293826 (10Dzahn) additional issue is these alerts are flapping and keep coming back so a simple ACK wasn't enough. 
downtiming for a day for now [20:54:59] 10ops-codfw, 10Continuous-Integration-Infrastructure, 10DC-Ops, 10netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (10Dzahn) affected hosts I am ACKing right now in Icinga: contint2001.mgmt ms-fe200... [20:56:56] 10ops-codfw, 10Continuous-Integration-Infrastructure, 10DC-Ops, 10netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (10Dzahn) [20:58:28] ACKNOWLEDGEMENT - SSH on contint2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds daniel_zahn https://phabricator.wikimedia.org/T283582 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [20:58:28] ACKNOWLEDGEMENT - SSH on ms-fe2006.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds daniel_zahn https://phabricator.wikimedia.org/T283582 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [20:58:28] ACKNOWLEDGEMENT - SSH on mw2253.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds daniel_zahn https://phabricator.wikimedia.org/T283582 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [21:00:11] (03PS7) 10Bernard Wang: Add WMEDesktopWebUIActionsTrackingOversampleLoggedInUsers config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731827 (https://phabricator.wikimedia.org/T292588) [21:01:30] (03CR) 10Bernard Wang: Add WMEDesktopWebUIActionsTrackingOversampleLoggedInUsers config (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731827 (https://phabricator.wikimedia.org/T292588) (owner: 10Bernard Wang) [21:03:08] jouncebot: last [21:05:03] urbanecm: do you know the "deploy_to_mwdebug".service on deploy1002? 
[21:05:44] mutante: i'm not very familiar with it -- AFAIK it does k8s deployments to mirror the actual ones (more precisely, merges in deployment branches) [21:06:15] *nod*, ty [21:06:55] lego.ktm might be able to provide more details and help with debugging, etc. [21:06:57] why do you ask mutante ? :) [21:07:11] I do, but in a meeting rn [21:07:18] was seeing ERROR:root:A previous deployment failed. [21:07:42] and it says to look at an error file, which I did, but I see only a timestamp [21:07:53] ACK, dont think it's urgent, thanks [21:08:05] yeah, mw on k8s is experimental anyway :)) [21:08:11] yea [21:08:50] I left comments in serviceops for whenever [21:10:50] Sounds good :) [21:12:54] ACKNOWLEDGEMENT - Check systemd state on deploy1002 is CRITICAL: CRITICAL - degraded: The following units failed: deploy_to_mwdebug.service daniel_zahn experimental https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [21:13:12] I'll reply in -serviceops [21:14:13] (03CR) 10Cwhite: [C: 03+2] opensearch: fork elasticsearch module into opensearch module [puppet] - 10https://gerrit.wikimedia.org/r/721359 (https://phabricator.wikimedia.org/T288618) (owner: 10Cwhite) [21:15:59] (03PS9) 10Cwhite: opensearch_dashboards: fork kibana module into opensearch_dashboards module [puppet] - 10https://gerrit.wikimedia.org/r/721385 (https://phabricator.wikimedia.org/T288618) [21:19:56] RECOVERY - Check systemd state on deploy1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [21:20:13] :) alright, ty [21:20:42] (03CR) 10Cwhite: [C: 03+2] opensearch_dashboards: fork kibana module into opensearch_dashboards module [puppet] - 10https://gerrit.wikimedia.org/r/721385 (https://phabricator.wikimedia.org/T288618) (owner: 10Cwhite) [21:26:07] (03PS2) 10Andrew Bogott: openstack::haproxy: add tls for the keystone-admin api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732040 
(https://phabricator.wikimedia.org/T267194) [21:29:04] (03PS1) 10Ahmon Dancy: Add more stubs in DevServices.php [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/732065 [21:30:23] (03CR) 10Ahmon Dancy: [V: 03+2 C: 03+2] Add more stubs in DevServices.php [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/732065 (owner: 10Ahmon Dancy) [21:30:48] (03PS1) 10Andrew Bogott: Openstack haproxy: Fix keystone internal port [puppet] - 10https://gerrit.wikimedia.org/r/732087 [21:31:20] (03CR) 10Andrew Bogott: "Note that when this is merged we'll also need to change entries in the keystone catalog by hand at the same time." [puppet] - 10https://gerrit.wikimedia.org/r/732087 (owner: 10Andrew Bogott) [21:32:00] (03CR) 10Andrew Bogott: [C: 03+2] openstack::haproxy: Add tls frontend for designate api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732039 (https://phabricator.wikimedia.org/T267194) (owner: 10Andrew Bogott) [21:32:12] (03CR) 10Andrew Bogott: [C: 03+2] openstack::haproxy: add tls for the neutron api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732044 (https://phabricator.wikimedia.org/T267194) (owner: 10Andrew Bogott) [21:32:34] (03CR) 10Andrew Bogott: [C: 03+2] openstack::haproxy: add tls for the barbican api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732046 (https://phabricator.wikimedia.org/T267194) (owner: 10Andrew Bogott) [21:32:42] RECOVERY - Check for large files in client bucket on mwmaint1002 is OK: OK: client bucket file ok https://wikitech.wikimedia.org/wiki/Puppet%23check_client_bucket_large_file [21:32:45] !log mwmaint1002 - delete large files over 100MB from puppet clientbucket. 
sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size +100M -delete [21:32:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:32:58] 10SRE, 10DynamicPageList (Wikimedia), 10MW-1.37-notes (1.37.0-wmf.16; 2021-07-26), 10Patch-For-Review, 10Sustainability (Incident Followup): Decide on the future of DPL - https://phabricator.wikimedia.org/T287380 (10Tgr) >>! In T287380#7437747, @Urbanecm wrote: > As you can see at https://www.mediawiki.o... [21:33:00] (03CR) 10Andrew Bogott: [C: 03+2] openstack::haproxy: add tls for the keystone-admin api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732040 (https://phabricator.wikimedia.org/T267194) (owner: 10Andrew Bogott) [21:33:10] (03PS3) 10Andrew Bogott: openstack::haproxy: add tls for the keystone-admin api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732040 (https://phabricator.wikimedia.org/T267194) [21:34:02] (03CR) 10Andrew Bogott: [C: 03+2] openstack::haproxy: add tls for the placement api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732041 (https://phabricator.wikimedia.org/T267194) (owner: 10Andrew Bogott) [21:34:10] !log mwmaint1002 - delete large files over 100MB from puppet clientbucket. 
sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size +100M -delete | fixed Icinga alert: RECOVERY - Check for large files in client bucket on mwmaint1002 is OK: OK: T165885 [21:34:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:34:16] T165885: Create a cron to clean clientbucket every day or hour - https://phabricator.wikimedia.org/T165885 [21:34:16] (03CR) 10Andrew Bogott: [C: 03+2] openstack::haproxy: add tls for the trove api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732042 (https://phabricator.wikimedia.org/T267194) (owner: 10Andrew Bogott) [21:34:25] (03CR) 10Andrew Bogott: [C: 03+2] openstack::haproxy: add tls for the cinder api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732043 (https://phabricator.wikimedia.org/T267194) (owner: 10Andrew Bogott) [21:34:36] (03CR) 10Andrew Bogott: [C: 03+2] openstack::haproxy: add tls for the swift api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732045 (https://phabricator.wikimedia.org/T267194) (owner: 10Andrew Bogott) [21:35:27] (03PS2) 10Andrew Bogott: openstack::haproxy: add tls for the neutron api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732044 (https://phabricator.wikimedia.org/T267194) [21:35:54] (03PS2) 10Andrew Bogott: openstack::haproxy: add tls for the cinder api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732043 (https://phabricator.wikimedia.org/T267194) [21:36:35] 10ops-codfw, 10Continuous-Integration-Infrastructure, 10DC-Ops, 10netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (10Papaul) |ores2005.mgmt|PER430| |gerrit2001.mgmt|PER430| |ms-fe2006.mgmt|PER430| |... 
[21:39:39] 10ops-codfw, 10Continuous-Integration-Infrastructure, 10DC-Ops, 10netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (10Papaul) [21:40:32] (03PS5) 10Juan90264: Create Portal and Portal talk namespace for shiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731953 (https://phabricator.wikimedia.org/T288909) [21:40:55] (03PS3) 10Andrew Bogott: openstack::haproxy: add tls for the neutron api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732044 (https://phabricator.wikimedia.org/T267194) [21:41:46] 10Puppet, 10SRE, 10Cloud-Services, 10Infrastructure-Foundations, and 2 others: Create a cron to clean clientbucket every day or hour - https://phabricator.wikimedia.org/T165885 (10Dzahn) @jbond I had an actual alert for this on mwmaint1002, looked up whether we apply this in prod or only cloud so far or if... [21:45:02] (03PS2) 10Andrew Bogott: openstack::haproxy: add tls for the trove api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732042 (https://phabricator.wikimedia.org/T267194) [21:46:34] (03CR) 10Andrew Bogott: [V: 03+2 C: 03+2] openstack::haproxy: add tls for the trove api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732042 (https://phabricator.wikimedia.org/T267194) (owner: 10Andrew Bogott) [21:46:50] (03CR) 10Andrew Bogott: openstack::haproxy: add tls for the trove api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732042 (https://phabricator.wikimedia.org/T267194) (owner: 10Andrew Bogott) [21:47:01] (03CR) 10Andrew Bogott: [C: 03+2] openstack::haproxy: add tls for the trove api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732042 (https://phabricator.wikimedia.org/T267194) (owner: 10Andrew Bogott) [21:47:14] (03PS2) 10Andrew Bogott: openstack::haproxy: add tls for the placement api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732041 
(https://phabricator.wikimedia.org/T267194) [21:50:04] (03PS2) 10Andrew Bogott: openstack::haproxy: add tls for the swift api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732045 (https://phabricator.wikimedia.org/T267194) [21:52:09] (03PS2) 10Andrew Bogott: openstack::haproxy: add tls for the barbican api in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/732046 (https://phabricator.wikimedia.org/T267194) [22:06:23] (03PS1) 10Dduvall: gitlab: Allow configuration of gitlab-runner concurrency [puppet] - 10https://gerrit.wikimedia.org/r/732093 (https://phabricator.wikimedia.org/T293833) [22:08:14] @seen ballons [22:09:35] (03CR) 10Juan90264: [C: 03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/710565 (https://phabricator.wikimedia.org/T287437) (owner: 10Labdajiwa) [22:12:09] (03PS1) 10Andrew Bogott: Openstack keystone: fix (unused) profile::openstack::base::keystone::public_bind_port [puppet] - 10https://gerrit.wikimedia.org/r/732095 [22:12:40] (03CR) 10jerkins-bot: [V: 04-1] Openstack keystone: fix (unused) profile::openstack::base::keystone::public_bind_port [puppet] - 10https://gerrit.wikimedia.org/r/732095 (owner: 10Andrew Bogott) [22:12:49] (03PS9) 10Cwhite: icinga: fork icinga::monitor::elasticsearch::base_checks [puppet] - 10https://gerrit.wikimedia.org/r/721386 (https://phabricator.wikimedia.org/T288618) [22:13:56] (03PS2) 10Andrew Bogott: Openstack keystone: fix (unused) openstack::base::keystone::public_bind_port [puppet] - 10https://gerrit.wikimedia.org/r/732095 [22:16:19] (03CR) 10Andrew Bogott: [C: 03+2] "Confirmed no-op in https://puppet-compiler.wmflabs.org/compiler1003/31768/" [puppet] - 10https://gerrit.wikimedia.org/r/732095 (owner: 10Andrew Bogott) [22:18:27] (03PS1) 10Legoktm: aptrepo: Add component/php74 [puppet] - 10https://gerrit.wikimedia.org/r/732096 (https://phabricator.wikimedia.org/T293449) [22:18:29] (03PS1) 10Legoktm: package_builder: Add hook for building PHP 7.4 packages [puppet] - 
10https://gerrit.wikimedia.org/r/732097 (https://phabricator.wikimedia.org/T293449) [22:19:14] (03CR) 10jerkins-bot: [V: 04-1] package_builder: Add hook for building PHP 7.4 packages [puppet] - 10https://gerrit.wikimedia.org/r/732097 (https://phabricator.wikimedia.org/T293449) (owner: 10Legoktm) [22:20:45] (03PS2) 10Legoktm: package_builder: Add hook for building PHP 7.4 packages [puppet] - 10https://gerrit.wikimedia.org/r/732097 (https://phabricator.wikimedia.org/T293449) [22:21:22] (03CR) 10jerkins-bot: [V: 04-1] package_builder: Add hook for building PHP 7.4 packages [puppet] - 10https://gerrit.wikimedia.org/r/732097 (https://phabricator.wikimedia.org/T293449) (owner: 10Legoktm) [22:21:50] (03CR) 10Cwhite: [C: 03+2] icinga: fork icinga::monitor::elasticsearch::base_checks [puppet] - 10https://gerrit.wikimedia.org/r/721386 (https://phabricator.wikimedia.org/T288618) (owner: 10Cwhite) [22:22:27] (03PS3) 10Legoktm: package_builder: Add hook for building PHP 7.4 packages [puppet] - 10https://gerrit.wikimedia.org/r/732097 (https://phabricator.wikimedia.org/T293449) [22:23:14] (03CR) 10jerkins-bot: [V: 04-1] package_builder: Add hook for building PHP 7.4 packages [puppet] - 10https://gerrit.wikimedia.org/r/732097 (https://phabricator.wikimedia.org/T293449) (owner: 10Legoktm) [22:23:17] (03PS5) 10Cwhite: logstash: kafka input: add manage_truststore parameter [puppet] - 10https://gerrit.wikimedia.org/r/727625 (https://phabricator.wikimedia.org/T288618) [22:25:23] (03PS4) 10Legoktm: package_builder: Add hook for building PHP 7.4 packages [puppet] - 10https://gerrit.wikimedia.org/r/732097 (https://phabricator.wikimedia.org/T293449) [22:26:10] (03CR) 10jerkins-bot: [V: 04-1] package_builder: Add hook for building PHP 7.4 packages [puppet] - 10https://gerrit.wikimedia.org/r/732097 (https://phabricator.wikimedia.org/T293449) (owner: 10Legoktm) [22:33:01] (03CR) 10Legoktm: "I'm clearly doing something wrong...just going to copy and paste for now and leave this as a follow 
up." [puppet] - 10https://gerrit.wikimedia.org/r/732097 (https://phabricator.wikimedia.org/T293449) (owner: 10Legoktm) [22:37:19] (03PS5) 10Legoktm: package_builder: Add hook for building PHP 7.4 packages [puppet] - 10https://gerrit.wikimedia.org/r/732097 (https://phabricator.wikimedia.org/T293449) [22:37:21] (03PS1) 10Legoktm: [WIP] package_builder: Refactor PHP hook into a template [puppet] - 10https://gerrit.wikimedia.org/r/732098 [22:38:10] (03CR) 10jerkins-bot: [V: 04-1] package_builder: Add hook for building PHP 7.4 packages [puppet] - 10https://gerrit.wikimedia.org/r/732097 (https://phabricator.wikimedia.org/T293449) (owner: 10Legoktm) [22:38:32] (03CR) 10jerkins-bot: [V: 04-1] [WIP] package_builder: Refactor PHP hook into a template [puppet] - 10https://gerrit.wikimedia.org/r/732098 (owner: 10Legoktm) [22:39:00] ...wut [22:41:29] (03CR) 10Legoktm: [C: 03+2] aptrepo: Add component/php74 [puppet] - 10https://gerrit.wikimedia.org/r/732096 (https://phabricator.wikimedia.org/T293449) (owner: 10Legoktm) [22:43:28] (03CR) 10Legoktm: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31769/console" [puppet] - 10https://gerrit.wikimedia.org/r/732097 (https://phabricator.wikimedia.org/T293449) (owner: 10Legoktm) [22:47:32] (03CR) 10Cwhite: [C: 03+2] logstash: kafka input: add manage_truststore parameter [puppet] - 10https://gerrit.wikimedia.org/r/727625 (https://phabricator.wikimedia.org/T288618) (owner: 10Cwhite) [22:54:05] (03PS9) 10Cwhite: profile: fork elasticsearch::logstash into opensearch::logstash [puppet] - 10https://gerrit.wikimedia.org/r/721395 (https://phabricator.wikimedia.org/T288618) [22:56:33] Waiting... [23:00:04] RoanKattouw and Urbanecm: #bothumor I � Unicode. All rise for UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211019T2300). 
[23:00:04] Juan_90264 and dontpanic: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:27] I still want to break and fix a wiki, solely for the sticker... [23:00:51] Okay [23:00:58] Urbanecm: Hello? [23:03:00] I'm quite tired now -- I'd prefer not to deploy today. [23:03:09] Anyone else to lead the window? [23:03:20] (03PS1) 10Dzahn: mediawiki::appserver: fetch additional MaxMind databases on all appservers [puppet] - 10https://gerrit.wikimedia.org/r/732099 (https://phabricator.wikimedia.org/T288844) [23:03:34] dontpanic: I still wonder, where do I get this sticker? [23:03:55] (03CR) 10jerkins-bot: [V: 04-1] mediawiki::appserver: fetch additional MaxMind databases on all appservers [puppet] - 10https://gerrit.wikimedia.org/r/732099 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [23:04:00] Juan_90264: see jouncebot's message [23:04:04] Urbanecm: No problem [23:04:24] (03PS2) 10Dzahn: mediawiki::appserver: fetch additional MaxMind databases on all appservers [puppet] - 10https://gerrit.wikimedia.org/r/732099 (https://phabricator.wikimedia.org/T288844) [23:04:54] (03CR) 10jerkins-bot: [V: 04-1] mediawiki::appserver: fetch additional MaxMind databases on all appservers [puppet] - 10https://gerrit.wikimedia.org/r/732099 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [23:04:58] Any deployers available? [23:05:08] (03PS3) 10Dzahn: mediawiki::appserver: fetch extra MaxMind databases on all appservers [puppet] - 10https://gerrit.wikimedia.org/r/732099 (https://phabricator.wikimedia.org/T288844) [23:05:15] tgr: ? [23:05:40] (03CR) 10jerkins-bot: [V: 04-1] mediawiki::appserver: fetch extra MaxMind databases on all appservers [puppet] - 10https://gerrit.wikimedia.org/r/732099 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [23:06:00] ack [23:06:18] Thanks tgr :). Leaving it in your hands now. 
[23:06:50] (03PS10) 10Gergő Tisza: Repair the size of the logo of Kashmiri Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731231 (https://phabricator.wikimedia.org/T293342) (owner: 10Juan90264) [23:06:54] (03PS4) 10Dzahn: mediawiki: fetch extra MaxMind databases on all appservers [puppet] - 10https://gerrit.wikimedia.org/r/732099 (https://phabricator.wikimedia.org/T288844) [23:07:25] (03CR) 10Jdlrobson: "See comment on other ticket. I think we can avoid the need for the config change by defaulting to true. We'll only use a config change if " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731827 (https://phabricator.wikimedia.org/T292588) (owner: 10Bernard Wang) [23:07:27] (03CR) 10jerkins-bot: [V: 04-1] mediawiki: fetch extra MaxMind databases on all appservers [puppet] - 10https://gerrit.wikimedia.org/r/732099 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [23:07:39] (03CR) 10Dzahn: "Pretty confident with this because as you can see it's already on canaries for a while, but before I deploy to all, maybe just a quick san" [puppet] - 10https://gerrit.wikimedia.org/r/732099 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [23:09:03] (03CR) 10Gergő Tisza: [C: 03+2] Repair the size of the logo of Kashmiri Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731231 (https://phabricator.wikimedia.org/T293342) (owner: 10Juan90264) [23:09:24] (03PS5) 10Dzahn: mediawiki: fetch extra MaxMind databases on all appservers [puppet] - 10https://gerrit.wikimedia.org/r/732099 (https://phabricator.wikimedia.org/T288844) [23:10:02] (03Merged) 10jenkins-bot: Repair the size of the logo of Kashmiri Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731231 (https://phabricator.wikimedia.org/T293342) (owner: 10Juan90264) [23:10:57] Great merged [23:13:06] Can we start the test? 
[23:13:47] !log tgr@deploy1002 Synchronized static: Config: [[gerrit:731231|Repair the size of the logo of Kashmiri Wikipedia (T293342)]] (duration: 02m 14s) [23:13:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:13:53] T293342: Requesting permanent logo change for ks.wikipedia.org - https://phabricator.wikimedia.org/T293342 [23:14:38] Juan_90264: yeah, please test [23:15:07] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [23:15:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:15:17] (03CR) 10Dzahn: "wait, I should not remove it from the canary role, it's not like the common one overrides it here" [puppet] - 10https://gerrit.wikimedia.org/r/732099 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [23:15:37] (03CR) 10Gergő Tisza: [C: 03+2] Create Portal and Portal talk namespace for shiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731953 (https://phabricator.wikimedia.org/T288909) (owner: 10Juan90264) [23:15:59] Okay [23:16:42] (03PS6) 10Dzahn: mediawiki::appserver: fetch additional MaxMind databases on all appservers [puppet] - 10https://gerrit.wikimedia.org/r/732099 (https://phabricator.wikimedia.org/T288844) [23:17:08] (03Merged) 10jenkins-bot: Create Portal and Portal talk namespace for shiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731953 (https://phabricator.wikimedia.org/T288909) (owner: 10Juan90264) [23:17:54] (03CR) 10jerkins-bot: [V: 04-1] mediawiki::appserver: fetch additional MaxMind databases on all appservers [puppet] - 10https://gerrit.wikimedia.org/r/732099 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [23:18:58] (03CR) 10Dzahn: [V: 03+1] "https://puppet-compiler.wmflabs.org/compiler1002/31772/" [puppet] - 10https://gerrit.wikimedia.org/r/732099 (https://phabricator.wikimedia.org/T288844) (owner: 10Dzahn) [23:19:23] Perfect merged [23:20:57] (03PS7) 10Dzahn: 
mediawiki::appserver: fetch additional MaxMind databases on all appservers [puppet] - 10https://gerrit.wikimedia.org/r/732099 (https://phabricator.wikimedia.org/T288844) [23:22:08] tgr: I tested and approved [731231] [23:23:06] (03PS5) 10Gergő Tisza: Set the project namespace and sitename for Javanese Wikipedia and Wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/710565 (https://phabricator.wikimedia.org/T287437) (owner: 10Labdajiwa) [23:23:19] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [23:23:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:23:30] !log tgr@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:731953|Create Portal and Portal talk namespace for shiwiki (T288909)]] (duration: 01m 03s) [23:23:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:23:36] T288909: Portal namespace on Wikipedia Tashlhite - https://phabricator.wikimedia.org/T288909 [23:25:53] Juan_90264: you can test the shiwiki namespaces [23:26:23] Yes, I can [23:26:43] (03CR) 10Gergő Tisza: [C: 03+2] Set the project namespace and sitename for Javanese Wikipedia and Wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/710565 (https://phabricator.wikimedia.org/T287437) (owner: 10Labdajiwa) [23:27:47] (03Merged) 10jenkins-bot: Set the project namespace and sitename for Javanese Wikipedia and Wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/710565 (https://phabricator.wikimedia.org/T287437) (owner: 10Labdajiwa) [23:27:59] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for DAbad - https://phabricator.wikimedia.org/T293253 (10Dzahn) p:05Triage→03Medium [23:28:09] 10SRE, 10Acme-chief, 10Patch-For-Review: acme-chief is down: ValueError: OCSP response status is not successful so the property has no value - https://phabricator.wikimedia.org/T282490 (10Dzahn) I was 
wondering the same but assume this should stay open until acme_chief 3.0 has been deployed? [23:29:26] tgr: I tested and approved [731953] [23:29:40] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to Analytics Private Data Users for Naray-ctr - https://phabricator.wikimedia.org/T293810 (10Dzahn) a:05Dzahn→03NaRay [23:29:50] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to Analytics Private Data Users for Naray-ctr - https://phabricator.wikimedia.org/T293810 (10Dzahn) p:05High→03Medium [23:36:49] (03PS3) 10Gergő Tisza: ruwikiversity: Add 'portal' and 'faculty' namespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/732053 (https://phabricator.wikimedia.org/T293545) (owner: 10Tks4Fish) [23:36:52] !log tgr@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710565|Set the project namespace and sitename for Javanese Wikipedia and Wiktionary (T287437)]] (duration: 01m 02s) [23:36:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:59] T287437: Set the project namespace and sitename for Javanese Wikipedia and Wiktionary - https://phabricator.wikimedia.org/T287437 [23:37:10] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[23:37:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:38:15] Juan_90264: you can check the last patch [23:38:37] Yes, I can [23:38:39] (03CR) 10Gergő Tisza: [C: 03+2] ruwikiversity: Add 'portal' and 'faculty' namespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/732053 (https://phabricator.wikimedia.org/T293545) (owner: 10Tks4Fish) [23:39:08] (03CR) 10Cwhite: [C: 03+2] profile: fork elasticsearch::logstash into opensearch::logstash [puppet] - 10https://gerrit.wikimedia.org/r/721395 (https://phabricator.wikimedia.org/T288618) (owner: 10Cwhite) [23:39:25] (03Merged) 10jenkins-bot: ruwikiversity: Add 'portal' and 'faculty' namespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/732053 (https://phabricator.wikimedia.org/T293545) (owner: 10Tks4Fish) [23:40:01] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [23:40:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:40:15] dontpanic: Be aware that it's your turn [23:40:22] I know. [23:40:25] thanks [23:42:10] (03PS9) 10Cwhite: profile: fork elasticsearch base_checks for opensearch [puppet] - 10https://gerrit.wikimedia.org/r/721389 (https://phabricator.wikimedia.org/T288618) [23:42:21] tgr: I tested and approved [710565] [23:42:34] thanks for checking! [23:42:55] tgr: at 1002? [23:43:17] dontpanic: 1001 [23:44:03] looks good :) [23:44:43] after you sync, you have a moment for a late patch? 
:P [23:45:56] sure [23:46:03] please add it to the calendar [23:47:02] yep yep, just going to commit it after all syncs are done to prevent messes heh [23:47:27] !log tgr@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732053|ruwikiversity: Add 'portal' and 'faculty' namespaces (T293545)]] (duration: 01m 03s) [23:47:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:47:33] T293545: Add RuWikiversity namespaces - https://phabricator.wikimedia.org/T293545 [23:48:54] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [23:48:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:50:16] (03CR) 10Cwhite: [C: 03+1] "\o/ LGTM! (Provided there are no edge cases, of course. There are a lot of changes in that PCC run.)" [puppet] - 10https://gerrit.wikimedia.org/r/731943 (https://phabricator.wikimedia.org/T286467) (owner: 10Filippo Giunchedi) [23:51:45] (03CR) 10Cwhite: [C: 03+2] profile: fork elasticsearch base_checks for opensearch [puppet] - 10https://gerrit.wikimedia.org/r/721389 (https://phabricator.wikimedia.org/T288618) (owner: 10Cwhite) [23:51:55] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[23:51:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:52:38] 10SRE, 10Wikimedia-Mailing-lists: Disable "Unblock-pt-l" - https://phabricator.wikimedia.org/T293591 (10Dzahn) 05Open→03In progress a:03Dzahn [23:52:53] dontpanic: good to go if you are ready [23:52:56] (03PS1) 10Tks4Fish: Reorder some wikis at wgExtraNamespaces and wmgVisualEditorAvailableNamespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/732103 (https://phabricator.wikimedia.org/T293846) [23:53:00] committing now [23:54:35] done tgr [23:54:47] 732103 [23:56:03] 10SRE, 10Wikimedia-Mailing-lists: Disable "Unblock-pt-l" - https://phabricator.wikimedia.org/T293591 (10Dzahn) I logged in as Administrator and followed the instructions from https://wikitech.wikimedia.org/wiki/Mailman#Disable_or_re-enable_a_mailing_list for this list. So: - removed existing owner and replac... [23:56:07] (03CR) 10Gergő Tisza: [C: 03+2] Reorder some wikis at wgExtraNamespaces and wmgVisualEditorAvailableNamespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/732103 (https://phabricator.wikimedia.org/T293846) (owner: 10Tks4Fish) [23:56:50] (03Merged) 10jenkins-bot: Reorder some wikis at wgExtraNamespaces and wmgVisualEditorAvailableNamespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/732103 (https://phabricator.wikimedia.org/T293846) (owner: 10Tks4Fish) [23:58:02] thanks a ton! :) [23:59:11] (03CR) 10Cwhite: [C: 03+1] "Looks great! Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/731976 (https://phabricator.wikimedia.org/T293439) (owner: 10Herron) [23:59:34] 10SRE, 10Wikimedia-Mailing-lists: Disable "Unblock-pt-l" - https://phabricator.wikimedia.org/T293591 (10Dzahn) After this I went to the "Automatic Responses" section in the list settings and tried to copy/paste the suggested AutoResponder message and activate it. The result when trying to save the settings wa... 
[23:59:41] !log tgr@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732103|Reorder some wikis at wgExtraNamespaces and wmgVisualEditorAvailableNamespaces (T293846)]] (duration: 01m 02s) [23:59:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:59:47] T293846: Reorder some wikis on wgExtraNamespaces and wmgVisualEditorAvailableNamespaces at InitialiseSettings - https://phabricator.wikimedia.org/T293846