[00:03:11] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host conf1008.eqiad.wmnet with OS bullseye [00:03:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:03:19] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops, 10Patch-For-Review: Q3:(Need By: TBD) rack/setup/install conf100[789] - https://phabricator.wikimedia.org/T301272 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host conf1008.eqiad.wmnet with OS bullseye [00:03:29] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host conf1009.eqiad.wmnet with OS bullseye [00:03:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:03:37] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops, 10Patch-For-Review: Q3:(Need By: TBD) rack/setup/install conf100[789] - https://phabricator.wikimedia.org/T301272 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host conf1009.eqiad.wmnet with OS bullseye [00:09:02] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on conf1007.eqiad.wmnet with reason: host reimage [00:09:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:12:11] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf1007.eqiad.wmnet with reason: host reimage [00:12:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:14:26] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on conf1008.eqiad.wmnet with reason: host reimage [00:14:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:14:43] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on conf1009.eqiad.wmnet with reason: host reimage [00:14:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:17:37] !log cmjohnson@cumin1001 END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on conf1008.eqiad.wmnet with reason: host reimage [00:17:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:19:20] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf1009.eqiad.wmnet with reason: host reimage [00:19:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:24:55] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host conf1007.eqiad.wmnet with OS bullseye [00:24:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:25:03] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops, 10Patch-For-Review: Q3:(Need By: TBD) rack/setup/install conf100[789] - https://phabricator.wikimedia.org/T301272 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host conf1007.eqiad.wmnet with OS bullseye completed... [00:31:28] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host conf1009.eqiad.wmnet with OS bullseye [00:31:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:31:35] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops: Q3:(Need By: TBD) rack/setup/install conf100[789] - https://phabricator.wikimedia.org/T301272 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host conf1009.eqiad.wmnet with OS bullseye completed: - conf1009 (**PASS**... 
[00:31:37] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:34:16] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host conf1008.eqiad.wmnet with OS bullseye [00:34:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:34:22] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops: Q3:(Need By: TBD) rack/setup/install conf100[789] - https://phabricator.wikimedia.org/T301272 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host conf1008.eqiad.wmnet with OS bullseye completed: - conf1008 (**PASS**... [00:35:11] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops: Q3:(Need By: TBD) rack/setup/install conf100[789] - https://phabricator.wikimedia.org/T301272 (10Cmjohnson) [00:35:18] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops: Q3:(Need By: TBD) rack/setup/install conf100[789] - https://phabricator.wikimedia.org/T301272 (10Cmjohnson) 05Open→03Resolved [01:01:01] !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox [01:01:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:04:53] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [01:04:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:06:14] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host dse-k8s-worker1005.mgmt.eqiad.wmnet with reboot policy FORCED [01:06:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:07:08] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host dse-k8s-worker1006.mgmt.eqiad.wmnet with reboot policy FORCED [01:07:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:08:38] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host dse-k8s-worker1007.mgmt.eqiad.wmnet with reboot policy 
FORCED [01:08:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:09:17] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host dse-k8s-worker1008.mgmt.eqiad.wmnet with reboot policy FORCED [01:09:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:23:08] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1006.mgmt.eqiad.wmnet with reboot policy FORCED [01:23:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:23:38] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1007.mgmt.eqiad.wmnet with reboot policy FORCED [01:23:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:25:01] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1008.mgmt.eqiad.wmnet with reboot policy FORCED [01:25:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:40:05] 10ops-eqiad: Port with no description on access switch - https://phabricator.wikimedia.org/T309741 (10phaultfinder) [01:42:14] RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:02:38] PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift_dispersion_stats_lowlatency.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:08:32] PROBLEM - Check systemd state on mirror1001 is CRITICAL: CRITICAL - degraded: The following units failed: update-ubuntu-mirror.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:10:02] (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1004-cloudelastic-chi-eqiad is running the old gc excessively - 
https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency [02:35:12] PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [02:43:31] (03PS3) 10Tim Starling: mcrouter mw-stats: make other write commands also async [puppet] - 10https://gerrit.wikimedia.org/r/807665 (https://phabricator.wikimedia.org/T310662) [02:46:50] PROBLEM - Persistent high iowait on labstore1006 is CRITICAL: 66.68 ge 10 https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Labstore https://grafana.wikimedia.org/d/000000568/labstore1004-1005-1006-1007 [02:56:17] RECOVERY - Persistent high iowait on labstore1006 is OK: (C)10 ge (W)5 ge 3.273 https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Labstore https://grafana.wikimedia.org/d/000000568/labstore1004-1005-1006-1007 [04:01:26] PROBLEM - Persistent high iowait on labstore1006 is CRITICAL: 61.5 ge 10 https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Labstore https://grafana.wikimedia.org/d/000000568/labstore1004-1005-1006-1007 [04:06:24] PROBLEM - k8s API server requests latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 verb=UPDATE https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27 [04:10:54] RECOVERY - Persistent high iowait on labstore1006 is OK: (C)10 ge (W)5 ge 3.835 https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Labstore https://grafana.wikimedia.org/d/000000568/labstore1004-1005-1006-1007 [05:33:10] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1099 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30248 and previous config saved to /var/cache/conftool/dbconfig/20220627-053310-root.json [05:33:15] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:33:47] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1110 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30249 and previous config saved to /var/cache/conftool/dbconfig/20220627-053346-root.json [05:33:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:34:09] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1112 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30250 and previous config saved to /var/cache/conftool/dbconfig/20220627-053408-root.json [05:34:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:34:36] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1121 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30251 and previous config saved to /var/cache/conftool/dbconfig/20220627-053436-root.json [05:34:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:41:57] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1110 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30252 and previous config saved to /var/cache/conftool/dbconfig/20220627-054156-root.json [05:42:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:42:03] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30253 and previous config saved to /var/cache/conftool/dbconfig/20220627-054203-root.json [05:42:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:42:10] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30254 and previous config saved to /var/cache/conftool/dbconfig/20220627-054210-root.json [05:42:14] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:42:32] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1112 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30255 and previous config saved to /var/cache/conftool/dbconfig/20220627-054231-root.json [05:42:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:42:41] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1121 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30256 and previous config saved to /var/cache/conftool/dbconfig/20220627-054241-root.json [05:42:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:43:26] (03PS1) 10Marostegui: Revert "ProductionServices.php: Promote pc1014 to pc1 master" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808435 [05:43:31] (03PS2) 10Marostegui: Revert "ProductionServices.php: Promote pc1014 to pc1 master" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808435 [05:45:26] (03PS1) 10Marostegui: pc1011: Promote it to master [puppet] - 10https://gerrit.wikimedia.org/r/808710 [05:45:54] (03CR) 10Marostegui: [C: 03+2] Revert "ProductionServices.php: Promote pc1014 to pc1 master" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808435 (owner: 10Marostegui) [05:46:36] (03Merged) 10jenkins-bot: Revert "ProductionServices.php: Promote pc1014 to pc1 master" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808435 (owner: 10Marostegui) [05:46:50] (03CR) 10Marostegui: [C: 03+2] pc1011: Promote it to master [puppet] - 10https://gerrit.wikimedia.org/r/808710 (owner: 10Marostegui) [05:51:05] !log marostegui@deploy1002 Synchronized wmf-config/ProductionServices.php: Promote pc1011 to pc1 master (duration: 03m 46s) [05:51:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:51:37] (03PS1) 10Marostegui: pc1014: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/808711 
[05:52:16] (03CR) 10Marostegui: [C: 03+2] pc1014: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/808711 (owner: 10Marostegui) [05:52:49] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply [05:52:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:53:44] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [05:53:45] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply [05:53:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:53:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:54:35] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [05:54:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:57:01] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1110 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30257 and previous config saved to /var/cache/conftool/dbconfig/20220627-055700-root.json [05:57:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:57:07] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30258 and previous config saved to /var/cache/conftool/dbconfig/20220627-055707-root.json [05:57:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:57:14] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30259 and previous config saved to /var/cache/conftool/dbconfig/20220627-055714-root.json [05:57:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:57:27] PROBLEM - Check systemd state on mwmaint1002 is CRITICAL: CRITICAL - degraded: The 
following units failed: mediawiki_job_purge_parsercache_pc1.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:57:36] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1112 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30260 and previous config saved to /var/cache/conftool/dbconfig/20220627-055735-root.json [05:57:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:57:45] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1121 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30261 and previous config saved to /var/cache/conftool/dbconfig/20220627-055745-root.json [05:57:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:58:39] RECOVERY - Check systemd state on mwmaint1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [06:04:44] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply [06:04:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:08:56] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [06:08:58] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply [06:09:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:09:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:10:02] (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1004-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency [06:12:04] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: After kernel 
upgrade', diff saved to https://phabricator.wikimedia.org/P30262 and previous config saved to /var/cache/conftool/dbconfig/20220627-061204-root.json [06:12:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:12:11] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30263 and previous config saved to /var/cache/conftool/dbconfig/20220627-061211-root.json [06:12:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:12:18] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30264 and previous config saved to /var/cache/conftool/dbconfig/20220627-061218-root.json [06:12:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:12:40] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30265 and previous config saved to /var/cache/conftool/dbconfig/20220627-061239-root.json [06:12:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:12:49] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30266 and previous config saved to /var/cache/conftool/dbconfig/20220627-061249-root.json [06:12:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:13:01] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [06:13:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:27:08] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30267 and previous config saved to 
/var/cache/conftool/dbconfig/20220627-062708-root.json [06:27:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:27:15] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30268 and previous config saved to /var/cache/conftool/dbconfig/20220627-062715-root.json [06:27:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:27:22] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30269 and previous config saved to /var/cache/conftool/dbconfig/20220627-062721-root.json [06:27:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:27:43] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30270 and previous config saved to /var/cache/conftool/dbconfig/20220627-062743-root.json [06:27:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:27:53] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30271 and previous config saved to /var/cache/conftool/dbconfig/20220627-062752-root.json [06:27:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:40:27] (03CR) 10Matthias Mullie: [C: 03+2] Echo tables can live in a different db [extensions/ImageSuggestions] (wmf/1.39.0-wmf.17) - 10https://gerrit.wikimedia.org/r/808120 (owner: 10Matthias Mullie) [06:42:12] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30272 and previous config saved to /var/cache/conftool/dbconfig/20220627-064212-root.json [06:42:16] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:42:19] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30273 and previous config saved to /var/cache/conftool/dbconfig/20220627-064219-root.json [06:42:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:42:26] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30274 and previous config saved to /var/cache/conftool/dbconfig/20220627-064225-root.json [06:42:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:42:47] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30275 and previous config saved to /var/cache/conftool/dbconfig/20220627-064247-root.json [06:42:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:42:57] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30276 and previous config saved to /var/cache/conftool/dbconfig/20220627-064256-root.json [06:43:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:44:45] (03CR) 10Ladsgroup: [C: 03+1] Remove references to $wgEnableLocalTimedText [mediawiki-config] - 10https://gerrit.wikimedia.org/r/802894 (owner: 10Daimona Eaytoy) [06:46:09] (03Merged) 10jenkins-bot: Echo tables can live in a different db [extensions/ImageSuggestions] (wmf/1.39.0-wmf.17) - 10https://gerrit.wikimedia.org/r/808120 (owner: 10Matthias Mullie) [06:50:03] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [06:50:05] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 
6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [06:50:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:50:10] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1173 (T307525)', diff saved to https://phabricator.wikimedia.org/P30277 and previous config saved to /var/cache/conftool/dbconfig/20220627-065009-ladsgroup.json [06:50:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:50:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:50:17] T307525: Fix mismatching field type of user table for columns user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T307525 [06:50:39] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance [06:50:41] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance [06:50:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:50:42] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 14 hosts with reason: Maintenance [06:50:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:50:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:50:52] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 14 hosts with reason: Maintenance [06:50:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:51:08] (03CR) 10Krinkle: [C: 03+1] "Beware of the scap trap (IS before CS). 
If done via backport window, should be split up as per https://wikitech.wikimedia.org/wiki/Backpor" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/802894 (owner: 10Daimona Eaytoy) [06:52:25] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2103.codfw.wmnet with reason: Maintenance [06:52:27] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2103.codfw.wmnet with reason: Maintenance [06:52:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:52:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:52:37] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 3 days, 8:00:00 on 14 hosts with reason: Maintenance [06:52:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:52:47] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 8:00:00 on 14 hosts with reason: Maintenance [06:52:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:53:35] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply [06:53:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:54:14] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T307525)', diff saved to https://phabricator.wikimedia.org/P30278 and previous config saved to /var/cache/conftool/dbconfig/20220627-065414-ladsgroup.json [06:54:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:56:19] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [06:56:21] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply [06:56:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:56:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log 
[06:57:16] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30279 and previous config saved to /var/cache/conftool/dbconfig/20220627-065716-root.json [06:57:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:57:23] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30280 and previous config saved to /var/cache/conftool/dbconfig/20220627-065722-root.json [06:57:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:57:30] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30281 and previous config saved to /var/cache/conftool/dbconfig/20220627-065729-root.json [06:57:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:57:51] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30282 and previous config saved to /var/cache/conftool/dbconfig/20220627-065751-root.json [06:57:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:58:01] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30283 and previous config saved to /var/cache/conftool/dbconfig/20220627-065800-root.json [06:58:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:00:05] Amir1 and Urbanecm: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for UTC morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220627T0700). 
[07:00:05] matthiasmullie, kuncung, and koi: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [07:00:07] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [07:00:08] o/ [07:00:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:01:08] Hi everyone [07:06:40] I can get started on backporting my patch [07:09:19] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30284 and previous config saved to /var/cache/conftool/dbconfig/20220627-070919-ladsgroup.json [07:09:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:11:23] (03PS3) 10Ladsgroup: maintain-views.yaml: Allow selecting lu_attachment_method [puppet] - 10https://gerrit.wikimedia.org/r/804694 (https://phabricator.wikimedia.org/T304015) (owner: 10Zabe) [07:11:31] (03CR) 10Ladsgroup: [V: 03+2 C: 03+2] maintain-views.yaml: Allow selecting lu_attachment_method [puppet] - 10https://gerrit.wikimedia.org/r/804694 (https://phabricator.wikimedia.org/T304015) (owner: 10Zabe) [07:12:27] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30285 and previous config saved to /var/cache/conftool/dbconfig/20220627-071226-root.json [07:12:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:12:55] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30286 and previous config saved to /var/cache/conftool/dbconfig/20220627-071255-root.json [07:12:58] Hi, matthiasmullie, if there are no deployers around, can you please do mine, too? 
:) [07:12:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:13:05] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30287 and previous config saved to /var/cache/conftool/dbconfig/20220627-071304-root.json [07:13:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:14:14] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1105 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30288 and previous config saved to /var/cache/conftool/dbconfig/20220627-071414-root.json [07:14:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:14:34] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1109 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30289 and previous config saved to /var/cache/conftool/dbconfig/20220627-071434-root.json [07:14:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:15:06] PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [07:15:06] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1144 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30290 and previous config saved to /var/cache/conftool/dbconfig/20220627-071506-root.json [07:15:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:15:11] !log mlitn@deploy1002 Synchronized php-1.39.0-wmf.17/extensions/ImageSuggestions/maintenance/SendNotificationsForUnillustratedWatchedTitles.php: Backport: [[gerrit:808120|Echo tables can live in a different db]] (duration: 03m 45s) [07:15:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:15:40] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1157 for kernel reboots', diff 
saved to https://phabricator.wikimedia.org/P30291 and previous config saved to /var/cache/conftool/dbconfig/20220627-071539-root.json [07:15:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:15:50] (03CR) 10Ladsgroup: mediawiki: Split updateSpecialPages.php job to be per-shard (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/804788 (https://phabricator.wikimedia.org/T307314) (owner: 10Legoktm) [07:17:01] (03CR) 10Filippo Giunchedi: [V: 03+1] "Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/806207 (https://phabricator.wikimedia.org/T305847) (owner: 10Filippo Giunchedi) [07:17:32] (03CR) 10Giuseppe Lavagetto: "I would think we can do this in a less convoluted way by adding a new command to conftool to generate a prometheus exporter file, so that " [puppet] - 10https://gerrit.wikimedia.org/r/776225 (https://phabricator.wikimedia.org/T300246) (owner: 10Btullis) [07:18:11] kuncung: I'm afraid I don't know the first thing about the change you're making :p [07:18:27] kuncung: I'll gladly deploy it if you could find someone else to +1 the patch? [07:19:50] matthiasmullie: I make the tagline in header slightly large in Vector 2022 [07:19:51] (03CR) 10Slyngshede: [C: 03+2] Replace crontab with systemd timer for WikiTech dumps. 
[puppet] - 10https://gerrit.wikimedia.org/r/790670 (https://phabricator.wikimedia.org/T273673) (owner: 10Slyngshede) [07:20:42] (03CR) 10Urbanecm: [C: 03+1] "lgtm" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808124 (https://phabricator.wikimedia.org/T311104) (owner: 10Labdajiwa) [07:20:44] (03CR) 10Muehlenhoff: [C: 03+2] Add thirdparty/hwraid component for wikimedia-private repo [puppet] - 10https://gerrit.wikimedia.org/r/808253 (https://phabricator.wikimedia.org/T308027) (owner: 10Muehlenhoff) [07:20:55] matthiasmullie: in case my +1 makes you more comfortable :) [07:21:34] kuncung: urbanecm: thanks; will deploy it now [07:21:47] (03PS3) 10Matthias Mullie: Update tagline for jvwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808124 (https://phabricator.wikimedia.org/T311104) (owner: 10Labdajiwa) [07:21:58] (03CR) 10Urbanecm: [C: 03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808370 (https://phabricator.wikimedia.org/T310950) (owner: 10Stang) [07:22:12] Great! [07:22:23] matthiasmullie: no problem. i can also deploy if needed. [07:22:58] that's ok; I'll just yell if things appear to go wrong ^^ [07:23:06] okay, fine with me! [07:23:22] note the patch will need `echo https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-tagline-jv.svg | mwscript purgeList.php` once you sync it, as it changes static files [07:23:48] RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [07:24:06] urbanecm: thanks for the pointer, didn't know that! [07:24:24] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30292 and previous config saved to /var/cache/conftool/dbconfig/20220627-072424-ladsgroup.json [07:24:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:24:26] btw, anyone around for that 3rd patch? 
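Note on the purge step mentioned at 07:23:22: the command pipes the URL of the changed static file into `mwscript purgeList.php`. A minimal Python sketch of building that input might look like the following. The helper name `purge_input` and the default base URL are illustrative assumptions, as is the one-URL-per-line layout for more than one file; the sketch only produces the text to pipe and does not run any purge itself.

```python
from urllib.parse import urljoin


def purge_input(static_paths, base="https://en.wikipedia.org/"):
    """Build newline-separated URLs for changed static files.

    Hypothetical helper: the base URL default and the one-URL-per-line
    layout are assumptions, not something stated in the channel.
    """
    urls = [urljoin(base, path.lstrip("/")) for path in static_paths]
    return "\n".join(urls) + "\n"


if __name__ == "__main__":
    # Path taken from the static file synced in this log.
    changed = ["static/images/mobile/copyright/wikipedia-tagline-jv.svg"]
    print(purge_input(changed), end="")
```

The printed text would then take the place of the `echo` half of the command quoted in the channel.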
[07:25:08] koi: are you around? :) [07:25:17] (03CR) 10Matthias Mullie: [C: 03+2] Update tagline for jvwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808124 (https://phabricator.wikimedia.org/T311104) (owner: 10Labdajiwa) [07:26:09] (03Merged) 10jenkins-bot: Update tagline for jvwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808124 (https://phabricator.wikimedia.org/T311104) (owner: 10Labdajiwa) [07:27:18] kuncung: patch is on mwdebug1001, please confirm! [07:27:47] Okay. A moment please [07:30:24] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply [07:30:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:33:10] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [07:33:11] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply [07:33:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:33:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:34:53] (03PS1) 10Muehlenhoff: Extend access for bmansurov [puppet] - 10https://gerrit.wikimedia.org/r/808797 [07:35:30] matthiasmullie: Checked. LGTM [07:35:43] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [07:35:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:36:24] kuncung: syncing [07:37:53] matthiasmullie: Nice 👍. Thank you! 
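Note on the `dbctl commit` entries in this log: they follow a regular shape, with a quoted commit message that sometimes names an instance and a pooling percentage (for example `'db1099:3318 (re)pooling @ 100%: After kernel upgrade'`). A rough Python sketch for pulling those fields out of a log line is shown here. The function name `parse_dbctl` and both regular expressions are assumptions derived only from the entries visible in this channel, not from any dbctl specification.

```python
import re

# Quoted commit message of a "dbctl commit" !log entry (pattern inferred from this log).
COMMIT_RE = re.compile(r"dbctl commit \(dc=(?P<dc>[^)]+)\): '(?P<msg>[^']*)'")
# Staged repool messages such as "db1099:3318 (re)pooling @ 100%: ...".
REPOOL_RE = re.compile(r"^(?P<instance>db\d+(?::\d+)?) \(re\)pooling @ (?P<pct>\d+)%")


def parse_dbctl(entry):
    """Return dc, message, instance and percent for a dbctl entry, or None."""
    commit = COMMIT_RE.search(entry)
    if commit is None:
        return None  # not a dbctl commit line
    result = {
        "dc": commit.group("dc"),
        "message": commit.group("msg"),
        "instance": None,
        "percent": None,
    }
    repool = REPOOL_RE.match(commit.group("msg"))
    if repool:
        result["instance"] = repool.group("instance")
        result["percent"] = int(repool.group("pct"))
    return result


if __name__ == "__main__":
    line = ("[07:12:27] !log marostegui@cumin1001 dbctl commit (dc=all): "
            "'db1099:3318 (re)pooling @ 100%: After kernel upgrade', diff saved to "
            "https://phabricator.wikimedia.org/P30285")
    print(parse_dbctl(line))
```

Messages without a percentage, such as `'Depool db1105 for kernel reboots'`, come back with `instance` and `percent` left as `None`, so depools and free-form commits are kept rather than dropped.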
[07:39:29] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T307525)', diff saved to https://phabricator.wikimedia.org/P30293 and previous config saved to /var/cache/conftool/dbconfig/20220627-073929-ladsgroup.json [07:39:32] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [07:39:33] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [07:39:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:39:35] T307525: Fix mismatching field type of user table for columns user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T307525 [07:39:38] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1173 (T307525)', diff saved to https://phabricator.wikimedia.org/P30294 and previous config saved to /var/cache/conftool/dbconfig/20220627-073938-ladsgroup.json [07:39:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:39:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:39:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:39:48] (03CR) 10Awight: Drop deprecated feature flags (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/804609 (https://phabricator.wikimedia.org/T310684) (owner: 10Awight) [07:39:48] !log mlitn@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:808124|Update tagline for jvwiki (T311104)]] (duration: 03m 42s) [07:39:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:39:53] T311104: Add wordmark and tagline for Javanese Wikipedia, Wiktionary, and Wikisource - https://phabricator.wikimedia.org/T311104 [07:40:00] (03CR) 10Muehlenhoff: [C: 03+2] Extend access for bmansurov [puppet] - 10https://gerrit.wikimedia.org/r/808797 (owner: 
10Muehlenhoff) [07:40:33] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1109 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30295 and previous config saved to /var/cache/conftool/dbconfig/20220627-074032-root.json [07:40:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:40:45] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30296 and previous config saved to /var/cache/conftool/dbconfig/20220627-074044-root.json [07:40:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:40:54] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30297 and previous config saved to /var/cache/conftool/dbconfig/20220627-074053-root.json [07:40:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:41:00] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30298 and previous config saved to /var/cache/conftool/dbconfig/20220627-074100-root.json [07:41:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:41:16] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30299 and previous config saved to /var/cache/conftool/dbconfig/20220627-074116-root.json [07:41:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:41:23] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30300 and previous config saved to /var/cache/conftool/dbconfig/20220627-074122-root.json [07:41:26] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:42:42] (03PS5) 10Awight: Drop deprecated feature flags [mediawiki-config] - 10https://gerrit.wikimedia.org/r/804609 (https://phabricator.wikimedia.org/T310684) [07:43:38] !log mlitn@deploy1002 Synchronized static/images/mobile/copyright/wikipedia-tagline-jv.svg: Config: [[gerrit:808124|Update tagline for jvwiki (T311104)]] (duration: 03m 33s) [07:43:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:43:44] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T307525)', diff saved to https://phabricator.wikimedia.org/P30301 and previous config saved to /var/cache/conftool/dbconfig/20220627-074343-ladsgroup.json [07:43:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:44:08] urbanecm: looks like koi is not around for patch #3; we have another 15 minutes, but if no-one show up, should we move it to next window or just remove from current one? [07:44:12] PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift_dispersion_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [07:44:14] kuncung: sync done [07:44:52] (03PS6) 10Awight: Drop deprecated feature flags [mediawiki-config] - 10https://gerrit.wikimedia.org/r/804609 (https://phabricator.wikimedia.org/T310684) [07:45:34] (03CR) 10Awight: "This change is ready for review." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/808799 (https://phabricator.wikimedia.org/T310684) (owner: 10Awight) [07:47:10] RECOVERY - puppet last run on labweb1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [07:54:04] matthiasmullie: just remove it :) [07:54:10] (or mark as not done) [07:55:17] Alright thanks [07:55:37] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1109 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30302 and previous config saved to /var/cache/conftool/dbconfig/20220627-075536-root.json [07:55:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:55:48] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30303 and previous config saved to /var/cache/conftool/dbconfig/20220627-075548-root.json [07:55:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:55:58] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30304 and previous config saved to /var/cache/conftool/dbconfig/20220627-075557-root.json [07:56:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:56:04] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30305 and previous config saved to /var/cache/conftool/dbconfig/20220627-075604-root.json [07:56:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:56:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30306 and previous config saved to 
/var/cache/conftool/dbconfig/20220627-075619-root.json [07:56:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:56:27] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30307 and previous config saved to /var/cache/conftool/dbconfig/20220627-075626-root.json [07:56:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:58:35] (03CR) 10WMDE-Fisch: [C: 03+1] Drop deprecated feature flags (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/804609 (https://phabricator.wikimedia.org/T310684) (owner: 10Awight) [07:58:49] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30308 and previous config saved to /var/cache/conftool/dbconfig/20220627-075848-ladsgroup.json [07:58:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:59:21] !log UTC morning backport done [07:59:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:59:39] !log installing openssl security updates [07:59:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:00:02] (03CR) 10Jcrespo: [C: 04-1] "No worries with the change itself, but I don't see anyone bringing up the systemctl alert spam- causing a regression with the current syst" [puppet] - 10https://gerrit.wikimedia.org/r/807118 (owner: 10Slyngshede) [08:00:26] RECOVERY - SSH on restbase1018.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [08:04:52] (03PS1) 10Marostegui: mariadb: Promote db1181 to s7 master [puppet] - 10https://gerrit.wikimedia.org/r/808801 (https://phabricator.wikimedia.org/T311033) [08:05:44] RECOVERY - Check systemd state on mirror1001 is OK: OK - running: The system is fully operational 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [08:06:04] (03CR) 10Marostegui: [C: 04-2] "Wait for the failover day" [puppet] - 10https://gerrit.wikimedia.org/r/808801 (https://phabricator.wikimedia.org/T311033) (owner: 10Marostegui) [08:06:36] (03Abandoned) 10Awight: Finish removing deprecated config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808799 (https://phabricator.wikimedia.org/T310684) (owner: 10Awight) [08:06:38] (03CR) 10WMDE-Fisch: [C: 04-1] "Still there's the issue that one could land before the other and something breaks." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/804609 (https://phabricator.wikimedia.org/T310684) (owner: 10Awight) [08:08:09] (03PS1) 10Marostegui: wmnet: Update s7-master CNAME [dns] - 10https://gerrit.wikimedia.org/r/808802 (https://phabricator.wikimedia.org/T311033) [08:08:48] (03CR) 10Marostegui: [C: 04-2] "Wait for the failover date" [dns] - 10https://gerrit.wikimedia.org/r/808802 (https://phabricator.wikimedia.org/T311033) (owner: 10Marostegui) [08:09:55] (03PS7) 10Awight: Drop deprecated feature flags [mediawiki-config] - 10https://gerrit.wikimedia.org/r/804609 (https://phabricator.wikimedia.org/T310684) [08:10:08] (03CR) 10Awight: "This change is ready for review." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/808803 (https://phabricator.wikimedia.org/T310684) (owner: 10Awight) [08:10:40] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30309 and previous config saved to /var/cache/conftool/dbconfig/20220627-081040-root.json [08:10:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:10:52] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30310 and previous config saved to /var/cache/conftool/dbconfig/20220627-081052-root.json [08:10:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:11:02] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30311 and previous config saved to /var/cache/conftool/dbconfig/20220627-081101-root.json [08:11:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:11:08] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30312 and previous config saved to /var/cache/conftool/dbconfig/20220627-081107-root.json [08:11:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:11:24] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30313 and previous config saved to /var/cache/conftool/dbconfig/20220627-081123-root.json [08:11:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:11:31] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30314 and previous 
config saved to /var/cache/conftool/dbconfig/20220627-081130-root.json [08:11:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:13:54] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30315 and previous config saved to /var/cache/conftool/dbconfig/20220627-081353-ladsgroup.json [08:13:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:16:18] RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [08:17:20] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] Drop dependent feature flags [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808803 (https://phabricator.wikimedia.org/T310684) (owner: 10Awight) [08:18:14] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] Drop deprecated feature flags [mediawiki-config] - 10https://gerrit.wikimedia.org/r/804609 (https://phabricator.wikimedia.org/T310684) (owner: 10Awight) [08:21:57] (03CR) 10WMDE-Fisch: [C: 03+1] Drop dependent feature flags [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808803 (https://phabricator.wikimedia.org/T310684) (owner: 10Awight) [08:22:19] (03CR) 10WMDE-Fisch: [C: 03+1] Drop deprecated feature flags [mediawiki-config] - 10https://gerrit.wikimedia.org/r/804609 (https://phabricator.wikimedia.org/T310684) (owner: 10Awight) [08:22:38] PROBLEM - SSH on wtp1036.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [08:23:48] (03PS1) 10Marostegui: pc1014: Move it to pc2 [puppet] - 10https://gerrit.wikimedia.org/r/808804 [08:24:44] (03CR) 10Marostegui: [C: 03+2] pc1014: Move it to pc2 [puppet] - 10https://gerrit.wikimedia.org/r/808804 (owner: 10Marostegui) [08:25:44] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: After kernel upgrade', diff saved to 
https://phabricator.wikimedia.org/P30316 and previous config saved to /var/cache/conftool/dbconfig/20220627-082544-root.json [08:25:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:25:56] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30317 and previous config saved to /var/cache/conftool/dbconfig/20220627-082556-root.json [08:25:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:26:05] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30318 and previous config saved to /var/cache/conftool/dbconfig/20220627-082605-root.json [08:26:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:26:12] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30319 and previous config saved to /var/cache/conftool/dbconfig/20220627-082611-root.json [08:26:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:26:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30320 and previous config saved to /var/cache/conftool/dbconfig/20220627-082627-root.json [08:26:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:26:34] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30321 and previous config saved to /var/cache/conftool/dbconfig/20220627-082634-root.json [08:26:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:28:59] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance 
db1173 (T307525)', diff saved to https://phabricator.wikimedia.org/P30322 and previous config saved to /var/cache/conftool/dbconfig/20220627-082859-ladsgroup.json [08:29:01] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [08:29:03] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [08:29:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:29:05] T307525: Fix mismatching field type of user table for columns user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T307525 [08:29:08] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1173 (T307525)', diff saved to https://phabricator.wikimedia.org/P30323 and previous config saved to /var/cache/conftool/dbconfig/20220627-082907-ladsgroup.json [08:29:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:29:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:29:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:33:15] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T307525)', diff saved to https://phabricator.wikimedia.org/P30324 and previous config saved to /var/cache/conftool/dbconfig/20220627-083314-ladsgroup.json [08:33:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:34:39] (03CR) 10Ayounsi: "Opened https://phabricator.wikimedia.org/T311385 for discussions" [dns] - 10https://gerrit.wikimedia.org/r/808198 (https://phabricator.wikimedia.org/T296452) (owner: 10Jbond) [08:39:08] (03PS10) 10Slyngshede: C:base::puppet move Puppet to Systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/807118 [08:40:48] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: After kernel upgrade', diff saved to 
https://phabricator.wikimedia.org/P30325 and previous config saved to /var/cache/conftool/dbconfig/20220627-084048-root.json [08:40:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:41:00] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30326 and previous config saved to /var/cache/conftool/dbconfig/20220627-084100-root.json [08:41:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:41:09] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30327 and previous config saved to /var/cache/conftool/dbconfig/20220627-084109-root.json [08:41:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:41:16] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30328 and previous config saved to /var/cache/conftool/dbconfig/20220627-084115-root.json [08:41:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:41:32] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30329 and previous config saved to /var/cache/conftool/dbconfig/20220627-084131-root.json [08:41:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:41:38] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30330 and previous config saved to /var/cache/conftool/dbconfig/20220627-084138-root.json [08:41:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:48:20] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance 
db1173', diff saved to https://phabricator.wikimedia.org/P30331 and previous config saved to /var/cache/conftool/dbconfig/20220627-084819-ladsgroup.json [08:48:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:48:50] (03CR) 10Slyngshede: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36031/console" [puppet] - 10https://gerrit.wikimedia.org/r/807118 (owner: 10Slyngshede) [08:55:04] (03CR) 10Jcrespo: C:base::puppet move Puppet to Systemd timer (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/807118 (owner: 10Slyngshede) [08:55:52] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30332 and previous config saved to /var/cache/conftool/dbconfig/20220627-085552-root.json [08:55:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:56:04] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30333 and previous config saved to /var/cache/conftool/dbconfig/20220627-085604-root.json [08:56:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:56:13] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30334 and previous config saved to /var/cache/conftool/dbconfig/20220627-085613-root.json [08:56:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:56:19] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30335 and previous config saved to /var/cache/conftool/dbconfig/20220627-085619-root.json [08:56:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:56:35] !log 
marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30336 and previous config saved to /var/cache/conftool/dbconfig/20220627-085635-root.json [08:56:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:56:42] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30337 and previous config saved to /var/cache/conftool/dbconfig/20220627-085642-root.json [08:56:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:57:08] (03PS11) 10Slyngshede: C:base::puppet move Puppet to Systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/807118 (https://phabricator.wikimedia.org/T273673) [08:59:33] (03CR) 10Jcrespo: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/807118 (https://phabricator.wikimedia.org/T273673) (owner: 10Slyngshede) [08:59:44] (03CR) 10Slyngshede: C:base::puppet move Puppet to Systemd timer (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/807118 (https://phabricator.wikimedia.org/T273673) (owner: 10Slyngshede) [09:00:28] (03CR) 10Jcrespo: [C: 03+1] C:base::puppet move Puppet to Systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/807118 (https://phabricator.wikimedia.org/T273673) (owner: 10Slyngshede) [09:01:06] (03CR) 10Slyngshede: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36032/console" [puppet] - 10https://gerrit.wikimedia.org/r/807118 (https://phabricator.wikimedia.org/T273673) (owner: 10Slyngshede) [09:01:11] !log mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'Dzoo' 'DZoo' # fixing stuck rename on T219279 [09:01:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:01:17] T219279: Some pages will become completely unreachable after 
PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 [09:03:25] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30338 and previous config saved to /var/cache/conftool/dbconfig/20220627-090324-ladsgroup.json [09:03:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:11:08] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30339 and previous config saved to /var/cache/conftool/dbconfig/20220627-091107-root.json [09:11:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:11:23] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30340 and previous config saved to /var/cache/conftool/dbconfig/20220627-091123-root.json [09:11:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:11:39] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30341 and previous config saved to /var/cache/conftool/dbconfig/20220627-091139-root.json [09:11:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:11:46] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30342 and previous config saved to /var/cache/conftool/dbconfig/20220627-091146-root.json [09:11:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:12:53] (03PS1) 10PipelineBot: shellbox-constraints: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808812 [09:16:50] (03CR) 10Ladsgroup: [C: 03+1] mariadb: Promote db1181 to s7 master [puppet] - 
10https://gerrit.wikimedia.org/r/808801 (https://phabricator.wikimedia.org/T311033) (owner: 10Marostegui) [09:17:07] (03CR) 10Ladsgroup: [C: 03+1] wmnet: Update s7-master CNAME [dns] - 10https://gerrit.wikimedia.org/r/808802 (https://phabricator.wikimedia.org/T311033) (owner: 10Marostegui) [09:18:05] (03PS1) 10PipelineBot: shellbox: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808814 [09:18:30] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T307525)', diff saved to https://phabricator.wikimedia.org/P30343 and previous config saved to /var/cache/conftool/dbconfig/20220627-091829-ladsgroup.json [09:18:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:18:36] T307525: Fix mismatching field type of user table for columns user_newpassword, user_password, user_email on wmf wikis - https://phabricator.wikimedia.org/T307525 [09:19:48] (03PS1) 10PipelineBot: shellbox-constraints: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808815 [09:20:03] (03PS1) 10PipelineBot: shellbox-constraints: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808816 [09:21:19] (03PS1) 10PipelineBot: shellbox: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808818 [09:21:32] (03PS1) 10PipelineBot: shellbox: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808819 [09:21:54] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1106 db1146 db1156 db1157 db1161 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30344 and previous config saved to /var/cache/conftool/dbconfig/20220627-092154-root.json [09:21:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:22:50] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [09:22:51] !log ladsgroup@cumin1001 END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [09:22:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:22:56] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30345 and previous config saved to /var/cache/conftool/dbconfig/20220627-092256-ladsgroup.json [09:22:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:23:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:23:04] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched on wmf wikis - https://phabricator.wikimedia.org/T298565 [09:23:54] RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [09:24:01] (PS1) PipelineBot: shellbox-timeline: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/808821 [09:25:37] (PS1) PipelineBot: shellbox-timeline: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/808822 [09:25:43] (PS1) PipelineBot: shellbox-timeline: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/808823 [09:26:52] (CR) Jbond: [C: +2] C:postgresql: grab the data directory from postgresql [puppet] - https://gerrit.wikimedia.org/r/807553 (https://phabricator.wikimedia.org/T311156) (owner: Jbond) [09:27:11] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30346 and previous config saved to /var/cache/conftool/dbconfig/20220627-092710-ladsgroup.json [09:27:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:28:30] (PS1) PipelineBot: 
shellbox-constraints: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/808824 [09:30:24] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30347 and previous config saved to /var/cache/conftool/dbconfig/20220627-093023-root.json [09:30:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:30:35] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30348 and previous config saved to /var/cache/conftool/dbconfig/20220627-093035-root.json [09:30:37] (PS1) PipelineBot: shellbox-constraints: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/808827 [09:30:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:30:53] (PS1) Elukey: Merge branch 'master' into debian [debs/cassandra-tools-wmf] (debian) - https://gerrit.wikimedia.org/r/808828 [09:30:55] (PS1) Elukey: Release version 1.1.0 [debs/cassandra-tools-wmf] (debian) - https://gerrit.wikimedia.org/r/808829 (https://phabricator.wikimedia.org/T310980) [09:32:03] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30349 and previous config saved to /var/cache/conftool/dbconfig/20220627-093203-root.json [09:32:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:32:09] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1156 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30350 and previous config saved to /var/cache/conftool/dbconfig/20220627-093208-root.json [09:32:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:32:14] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 2%: After 
kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30351 and previous config saved to /var/cache/conftool/dbconfig/20220627-093214-root.json [09:32:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:33:08] (PS1) PipelineBot: shellbox: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/808830 [09:36:34] (PS1) PipelineBot: shellbox: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/808832 [09:37:50] (PS1) PipelineBot: shellbox-timeline: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/808833 [09:42:16] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30352 and previous config saved to /var/cache/conftool/dbconfig/20220627-094216-ladsgroup.json [09:42:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:43:05] !log jgiannelos@deploy1002 helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply [09:43:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:45:13] (PS1) Lucas Werkmeister (WMDE): Add WikibaseTerms temporary debug log channel [mediawiki-config] - https://gerrit.wikimedia.org/r/808836 (https://phabricator.wikimedia.org/T311307) [09:45:27] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30353 and previous config saved to /var/cache/conftool/dbconfig/20220627-094527-root.json [09:45:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:45:30] (CR) Muehlenhoff: [C: +1] "LGTM" [debs/cassandra-tools-wmf] (debian) - https://gerrit.wikimedia.org/r/808828 (owner: Elukey) [09:45:39] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 5%: After kernel upgrade', diff saved to 
https://phabricator.wikimedia.org/P30354 and previous config saved to /var/cache/conftool/dbconfig/20220627-094539-root.json [09:45:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:47:07] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30355 and previous config saved to /var/cache/conftool/dbconfig/20220627-094707-root.json [09:47:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:47:11] (CR) Muehlenhoff: Release version 1.1.0 (1 comment) [debs/cassandra-tools-wmf] (debian) - https://gerrit.wikimedia.org/r/808829 (https://phabricator.wikimedia.org/T310980) (owner: Elukey) [09:47:13] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1156 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30356 and previous config saved to /var/cache/conftool/dbconfig/20220627-094712-root.json [09:47:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:47:18] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30357 and previous config saved to /var/cache/conftool/dbconfig/20220627-094718-root.json [09:47:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:48:14] RECOVERY - k8s API server requests latencies on kubestagemaster1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27 [09:49:12] (PS1) PipelineBot: shellbox-timeline: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/808838 [09:49:35] (PS2) Elukey: Release version 1.1.0 [debs/cassandra-tools-wmf] (debian) - https://gerrit.wikimedia.org/r/808829 (https://phabricator.wikimedia.org/T310980) [09:49:45] (CR) Elukey: "Thanks!" [debs/cassandra-tools-wmf] (debian) - https://gerrit.wikimedia.org/r/808829 (https://phabricator.wikimedia.org/T310980) (owner: Elukey) [09:50:39] (CR) Lucas Werkmeister (WMDE): "(reviewing for backport+config window later)" [mediawiki-config] - https://gerrit.wikimedia.org/r/808803 (https://phabricator.wikimedia.org/T310684) (owner: Awight) [09:53:17] (CR) Lucas Werkmeister (WMDE): [C: +1] Drop deprecated feature flags [mediawiki-config] - https://gerrit.wikimedia.org/r/804609 (https://phabricator.wikimedia.org/T310684) (owner: Awight) [09:53:21] !log jgiannelos@deploy1002 helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply [09:53:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:53:30] !log jgiannelos@deploy1002 helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply [09:53:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:54:06] !log jgiannelos@deploy1002 helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply [09:54:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:54:09] !log jgiannelos@deploy1002 helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply [09:54:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:54:32] !log jgiannelos@deploy1002 helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync [09:54:35] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:55:12] RECOVERY - k8s API server requests latencies on kubemaster2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27 [09:56:37] (CR) Michael Große: [C: +1] Add WikibaseTerms temporary debug log channel [mediawiki-config] - https://gerrit.wikimedia.org/r/808836 (https://phabricator.wikimedia.org/T311307) (owner: Lucas Werkmeister (WMDE)) [09:57:21] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30358 and previous config saved to /var/cache/conftool/dbconfig/20220627-095721-ladsgroup.json [09:57:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:57:36] !log copy cassandra and cassandra-tools packages in component/cassandra{311,dev} from wikimedia buster to bullseye - T310980 [09:57:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:57:41] T310980: Allow Cassandra to be deployed on Bullseye nodes - https://phabricator.wikimedia.org/T310980 [09:59:37] (PS3) Elukey: Release version 1.1.0 [debs/cassandra-tools-wmf] (debian) - https://gerrit.wikimedia.org/r/808829 (https://phabricator.wikimedia.org/T310980) [10:00:31] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30359 and previous config saved to /var/cache/conftool/dbconfig/20220627-100031-root.json [10:00:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:00:36] !log jgiannelos@deploy1002 helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync [10:00:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:00:43] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 10%: After kernel 
upgrade', diff saved to https://phabricator.wikimedia.org/P30360 and previous config saved to /var/cache/conftool/dbconfig/20220627-100043-root.json [10:00:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:00:59] (CR) Tim Starling: "I should benchmark this to confirm it is actually necessary, now that I know about noreply." [puppet] - https://gerrit.wikimedia.org/r/807665 (https://phabricator.wikimedia.org/T310662) (owner: Tim Starling) [10:02:11] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30361 and previous config saved to /var/cache/conftool/dbconfig/20220627-100211-root.json [10:02:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:02:17] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30362 and previous config saved to /var/cache/conftool/dbconfig/20220627-100216-root.json [10:02:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:02:22] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30363 and previous config saved to /var/cache/conftool/dbconfig/20220627-100221-root.json [10:02:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:03:41] (CR) WMDE-Fisch: [C: -2] "Postponing to cleanup the last bit that's still used. Thanks @Lukas." [mediawiki-config] - https://gerrit.wikimedia.org/r/808803 (https://phabricator.wikimedia.org/T310684) (owner: Awight) [10:04:01] (CR) WMDE-Fisch: [C: -2] "Postponing to cleanup the last bit that's still used. Thanks @Lukas." 
[mediawiki-config] - https://gerrit.wikimedia.org/r/804609 (https://phabricator.wikimedia.org/T310684) (owner: Awight) [10:06:24] (CR) Volans: "replies inline" [software/netbox-extras] - https://gerrit.wikimedia.org/r/807986 (owner: Volans) [10:06:36] (CR) Muehlenhoff: [C: +1] "Ship it :)" [debs/cassandra-tools-wmf] (debian) - https://gerrit.wikimedia.org/r/808829 (https://phabricator.wikimedia.org/T310980) (owner: Elukey) [10:09:56] SRE, Patch-For-Review, SRE Observability (FY2021/2022-Q4), Sustainability (Incident Followup): Most Icinga http checks ignore the URL parameter - https://phabricator.wikimedia.org/T304321 (jbond) Open→Resolved a:jbond This can now be closed down. We ended up reverting back to the old... [10:09:58] (PS3) Jbond: P:mediawiki::scap_client: add parameter to indicate scap master [puppet] - https://gerrit.wikimedia.org/r/807510 (https://phabricator.wikimedia.org/T310740) [10:10:02] (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1004-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency [10:10:20] (CR) CI reject: [V: -1] P:mediawiki::scap_client: add parameter to indicate scap master [puppet] - https://gerrit.wikimedia.org/r/807510 (https://phabricator.wikimedia.org/T310740) (owner: Jbond) [10:10:30] (CR) Jbond: "updated, however from the original task im not sure if this CR is still useful?" 
[puppet] - https://gerrit.wikimedia.org/r/807510 (https://phabricator.wikimedia.org/T310740) (owner: Jbond) [10:11:25] SRE-swift-storage, SRE Observability, Patch-For-Review: swift hosts (thanos-fe1001, ms-be2012) with failed prometheus-ipmi-exporter services - https://phabricator.wikimedia.org/T311262 (fgiunchedi) This issue seems to be related to deploying ipmi-exporter fleetwide for the first while the host/swift... [10:12:27] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30364 and previous config saved to /var/cache/conftool/dbconfig/20220627-101226-ladsgroup.json [10:12:29] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [10:12:30] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [10:12:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:12:32] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched on wmf wikis - https://phabricator.wikimedia.org/T298565 [10:12:35] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30365 and previous config saved to /var/cache/conftool/dbconfig/20220627-101235-ladsgroup.json [10:12:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:12:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:12:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:15:35] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30366 and previous config 
saved to /var/cache/conftool/dbconfig/20220627-101535-root.json [10:15:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:15:47] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30367 and previous config saved to /var/cache/conftool/dbconfig/20220627-101547-root.json [10:15:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:16:42] SRE, SRE-swift-storage, ops-codfw, DC-Ops: Install NVMe SSDs into moss-be200[1|2] & thanos-be200? - https://phabricator.wikimedia.org/T310923 (LSobanski) Correction, Matthew will be back next Monday, I reached out to Filippo to see if I can provide any details this week. [10:16:43] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30368 and previous config saved to /var/cache/conftool/dbconfig/20220627-101643-ladsgroup.json [10:16:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:16:51] (CR) Jgiannelos: [C: +1] "I re-enabled the tile pregeneration cronjobs on k8s and sent the events for tile pregeneration. This patch is now unblocked." [puppet] - https://gerrit.wikimedia.org/r/807108 (https://phabricator.wikimedia.org/T305845) (owner: MSantos) [10:17:14] SRE, SRE-swift-storage, ops-eqiad, DC-Ops: Install NVMe SSDs into moss-be100[1|2] & thanos-be100? - https://phabricator.wikimedia.org/T310922 (LSobanski) Matthew is out until next Monday, I reached out to Filippo to see if I can provide any details this week on which Thanos host we'll be using. 
[10:17:15] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30369 and previous config saved to /var/cache/conftool/dbconfig/20220627-101714-root.json [10:17:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:17:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30370 and previous config saved to /var/cache/conftool/dbconfig/20220627-101720-root.json [10:17:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:17:26] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30371 and previous config saved to /var/cache/conftool/dbconfig/20220627-101725-root.json [10:17:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:19:14] (PS1) Vgutierrez: limit jinja2 version [software/acme-chief] - https://gerrit.wikimedia.org/r/808853 [10:22:12] (CR) Jbond: [C: +1] C:base::puppet move Puppet to Systemd timer [puppet] - https://gerrit.wikimedia.org/r/807118 (https://phabricator.wikimedia.org/T273673) (owner: Slyngshede) [10:22:47] (PS3) MSantos: maps: re-enable tile generation cron in codfw [puppet] - https://gerrit.wikimedia.org/r/807108 (https://phabricator.wikimedia.org/T305845) [10:24:09] (CR) CI reject: [V: -1] maps: re-enable tile generation cron in codfw [puppet] - https://gerrit.wikimedia.org/r/807108 (https://phabricator.wikimedia.org/T305845) (owner: MSantos) [10:24:42] RECOVERY - BGP status on cr3-eqsin is OK: BGP OK - up: 346, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [10:25:18] (PS4) Jbond: P:mediawiki::scap_client: add parameter to indicate scap master [puppet] - https://gerrit.wikimedia.org/r/807510 
(https://phabricator.wikimedia.org/T310740) [10:27:20] (CR) CI reject: [V: -1] limit jinja2 version [software/acme-chief] - https://gerrit.wikimedia.org/r/808853 (owner: Vgutierrez) [10:29:12] (PS4) MSantos: maps: re-enable tile generation in codfw [puppet] - https://gerrit.wikimedia.org/r/807108 (https://phabricator.wikimedia.org/T305845) [10:29:43] (PS5) MSantos: maps: re-enable tile generation in codfw Bug: T305845 Depends-On: I662cd25aec05ae3b62ed738f5e30c96d9b963a65 Change-Id: Ief448ed35ed79a47fc53031e82630779727235e7 [puppet] - https://gerrit.wikimedia.org/r/807108 (https://phabricator.wikimedia.org/T305845) [10:30:33] (CR) CI reject: [V: -1] maps: re-enable tile generation in codfw Bug: T305845 Depends-On: I662cd25aec05ae3b62ed738f5e30c96d9b963a65 Change-Id: Ief448ed35ed79a47fc53031e82630779727235e7 [puppet] - https://gerrit.wikimedia.org/r/807108 (https://phabricator.wikimedia.org/T305845) (owner: MSantos) [10:30:39] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30372 and previous config saved to /var/cache/conftool/dbconfig/20220627-103039-root.json [10:30:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:30:51] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30373 and previous config saved to /var/cache/conftool/dbconfig/20220627-103051-root.json [10:30:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:31:44] I am going to restart the CI Jenkins [10:31:48] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30374 and previous config saved to /var/cache/conftool/dbconfig/20220627-103148-ladsgroup.json [10:31:52] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:32:19] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30375 and previous config saved to /var/cache/conftool/dbconfig/20220627-103218-root.json [10:32:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:32:24] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30376 and previous config saved to /var/cache/conftool/dbconfig/20220627-103224-root.json [10:32:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:32:30] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30377 and previous config saved to /var/cache/conftool/dbconfig/20220627-103229-root.json [10:32:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:32:43] (CR) Muehlenhoff: [C: +1] "Looks good, one nit inline (but feel free to ignore)" [puppet] - https://gerrit.wikimedia.org/r/807118 (https://phabricator.wikimedia.org/T273673) (owner: Slyngshede) [10:34:37] (PS2) Vgutierrez: limit jinja2 version [software/acme-chief] - https://gerrit.wikimedia.org/r/808853 [10:36:19] (CR) Filippo Giunchedi: Add a host's conftool pooled status and weight per service to prometheus (1 comment) [puppet] - https://gerrit.wikimedia.org/r/776225 (https://phabricator.wikimedia.org/T300246) (owner: Btullis) [10:36:43] (PS6) MSantos: maps: re-enable tile generation in codfw [puppet] - https://gerrit.wikimedia.org/r/807108 (https://phabricator.wikimedia.org/T305845) [10:37:07] (CR) MSantos: [C: +1] Improve performance of Tegola tile pregeneration [puppet] - https://gerrit.wikimedia.org/r/790679 (https://phabricator.wikimedia.org/T307182) (owner: Isabelle 
Hurbain-Palatin) [10:38:29] (CR) CI reject: [V: -1] limit jinja2 version [software/acme-chief] - https://gerrit.wikimedia.org/r/808853 (owner: Vgutierrez) [10:39:24] (PS12) Slyngshede: C:base::puppet move Puppet to Systemd timer [puppet] - https://gerrit.wikimedia.org/r/807118 (https://phabricator.wikimedia.org/T273673) [10:39:37] !log Restarting CI Jenkins [10:39:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:41:33] (CR) Slyngshede: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36034/console" [puppet] - https://gerrit.wikimedia.org/r/807118 (https://phabricator.wikimedia.org/T273673) (owner: Slyngshede) [10:43:37] (PS13) Slyngshede: C:base::puppet move Puppet to Systemd timer [puppet] - https://gerrit.wikimedia.org/r/807118 (https://phabricator.wikimedia.org/T273673) [10:45:43] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30378 and previous config saved to /var/cache/conftool/dbconfig/20220627-104543-root.json [10:45:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:45:55] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30379 and previous config saved to /var/cache/conftool/dbconfig/20220627-104555-root.json [10:45:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:46:53] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30380 and previous config saved to /var/cache/conftool/dbconfig/20220627-104653-ladsgroup.json [10:46:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:47:11] (CR) Slyngshede: C:base::puppet move Puppet to Systemd timer (1 
comment) [puppet] - https://gerrit.wikimedia.org/r/807118 (https://phabricator.wikimedia.org/T273673) (owner: Slyngshede) [10:47:23] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30381 and previous config saved to /var/cache/conftool/dbconfig/20220627-104722-root.json [10:47:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:47:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30382 and previous config saved to /var/cache/conftool/dbconfig/20220627-104728-root.json [10:47:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:47:33] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30383 and previous config saved to /var/cache/conftool/dbconfig/20220627-104733-root.json [10:47:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:53:34] (PS3) Vgutierrez: limit jinja2 version [software/acme-chief] - https://gerrit.wikimedia.org/r/808853 [10:53:40] (PS1) Jbond: C:base::puppet: add documentation and fix mionor lint issues [puppet] - https://gerrit.wikimedia.org/r/808857 [10:54:15] (CR) CI reject: [V: -1] C:base::puppet: add documentation and fix mionor lint issues [puppet] - https://gerrit.wikimedia.org/r/808857 (owner: Jbond) [10:56:06] (CR) Hnowlan: [C: +2] Improve performance of Tegola tile pregeneration [puppet] - https://gerrit.wikimedia.org/r/790679 (https://phabricator.wikimedia.org/T307182) (owner: Isabelle Hurbain-Palatin) [10:56:35] (PS2) Jbond: C:base::puppet: add documentation and fix mionor lint issues [puppet] - https://gerrit.wikimedia.org/r/808857 [10:58:04] (CR) Hnowlan: [C: +2] maps: re-enable tile generation in codfw 
[puppet] - https://gerrit.wikimedia.org/r/807108 (https://phabricator.wikimedia.org/T305845) (owner: MSantos) [10:58:26] (PS7) Hnowlan: maps: re-enable tile generation in codfw [puppet] - https://gerrit.wikimedia.org/r/807108 (https://phabricator.wikimedia.org/T305845) (owner: MSantos) [10:58:57] (CR) CI reject: [V: -1] limit jinja2 version [software/acme-chief] - https://gerrit.wikimedia.org/r/808853 (owner: Vgutierrez) [10:59:03] (CR) Muehlenhoff: "Two final bits, otherwise LGTM" [debs/prometheus-ganeti-exporter] - https://gerrit.wikimedia.org/r/804276 (https://phabricator.wikimedia.org/T311288) (owner: Slyngshede) [11:00:59] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30384 and previous config saved to /var/cache/conftool/dbconfig/20220627-110058-root.json [11:01:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:01:58] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30385 and previous config saved to /var/cache/conftool/dbconfig/20220627-110158-ladsgroup.json [11:02:01] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [11:02:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:02] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched on wmf wikis - https://phabricator.wikimedia.org/T298565 [11:02:02] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [11:02:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:07] !log ladsgroup@cumin1001 
dbctl commit (dc=all): 'Depooling db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30386 and previous config saved to /var/cache/conftool/dbconfig/20220627-110207-ladsgroup.json [11:02:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:27] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30387 and previous config saved to /var/cache/conftool/dbconfig/20220627-110226-root.json [11:02:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:32] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30388 and previous config saved to /var/cache/conftool/dbconfig/20220627-110232-root.json [11:02:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:37] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30389 and previous config saved to /var/cache/conftool/dbconfig/20220627-110237-root.json [11:02:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:59] (CR) Hnowlan: [C: +1] "lgtm" [debs/cassandra-tools-wmf] (debian) - https://gerrit.wikimedia.org/r/808829 (https://phabricator.wikimedia.org/T310980) (owner: Elukey) [11:03:23] (CR) Hnowlan: [C: +2] image-suggestion: new container version [deployment-charts] - https://gerrit.wikimedia.org/r/808228 (https://phabricator.wikimedia.org/T311220) (owner: Hnowlan) [11:06:25] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30390 and previous config saved to 
/var/cache/conftool/dbconfig/20220627-110624-ladsgroup.json [11:06:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:07:51] (03Merged) 10jenkins-bot: image-suggestion: new container version [deployment-charts] - 10https://gerrit.wikimedia.org/r/808228 (https://phabricator.wikimedia.org/T311220) (owner: 10Hnowlan) [11:13:37] !log hnowlan@deploy1002 helmfile [staging] START helmfile.d/services/image-suggestion: sync [11:13:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:14:02] !log hnowlan@deploy1002 helmfile [staging] DONE helmfile.d/services/image-suggestion: sync [11:14:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:21:30] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30391 and previous config saved to /var/cache/conftool/dbconfig/20220627-112129-ladsgroup.json [11:21:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:23:47] (03PS4) 10Vgutierrez: limit jinja2 version [software/acme-chief] - 10https://gerrit.wikimedia.org/r/808853 [11:26:59] (03PS9) 10Slyngshede: Ganeti Prometheus exporter, initial checkin [debs/prometheus-ganeti-exporter] - 10https://gerrit.wikimedia.org/r/804276 (https://phabricator.wikimedia.org/T311288) [11:27:12] (03CR) 10Slyngshede: Ganeti Prometheus exporter, initial checkin (032 comments) [debs/prometheus-ganeti-exporter] - 10https://gerrit.wikimedia.org/r/804276 (https://phabricator.wikimedia.org/T311288) (owner: 10Slyngshede) [11:29:20] !log hnowlan@deploy1002 helmfile [codfw] START helmfile.d/services/image-suggestion: sync [11:29:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:29:56] !log hnowlan@deploy1002 helmfile [codfw] DONE helmfile.d/services/image-suggestion: sync [11:29:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:30:28] !log 
hnowlan@deploy1002 helmfile [eqiad] START helmfile.d/services/image-suggestion: sync [11:30:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:31:03] !log hnowlan@deploy1002 helmfile [eqiad] DONE helmfile.d/services/image-suggestion: sync [11:31:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:32:02] (03PS3) 10Jbond: C:base::puppet: add documentation and fix mionor lint issues [puppet] - 10https://gerrit.wikimedia.org/r/808857 [11:32:04] (03PS5) 10Vgutierrez: limit jinja2, itsdangerous and werkzeug version [software/acme-chief] - 10https://gerrit.wikimedia.org/r/808853 [11:32:50] (03PS1) 10Btullis: Add a new partman recipe for the new H750 based stat servers [puppet] - 10https://gerrit.wikimedia.org/r/808870 (https://phabricator.wikimedia.org/T299466) [11:33:54] (03CR) 10CI reject: [V: 04-1] Add a new partman recipe for the new H750 based stat servers [puppet] - 10https://gerrit.wikimedia.org/r/808870 (https://phabricator.wikimedia.org/T299466) (owner: 10Btullis) [11:34:46] (03PS4) 10Jbond: C:base::puppet: add documentation and fix mionor lint issues [puppet] - 10https://gerrit.wikimedia.org/r/808857 [11:35:44] (03PS2) 10Btullis: Add a new partman recipe for the new H750 based stat servers [puppet] - 10https://gerrit.wikimedia.org/r/808870 (https://phabricator.wikimedia.org/T299466) [11:36:35] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30392 and previous config saved to /var/cache/conftool/dbconfig/20220627-113634-ladsgroup.json [11:36:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:37:53] (03Abandoned) 10Reedy: shellbox-constraints: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808812 (owner: 10PipelineBot) [11:37:55] (03Abandoned) 10Reedy: shellbox: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808814 (owner: 
10PipelineBot) [11:37:58] (03Abandoned) 10Reedy: shellbox-constraints: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808815 (owner: 10PipelineBot) [11:38:00] (03Abandoned) 10Reedy: shellbox-constraints: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808816 (owner: 10PipelineBot) [11:38:08] (03Abandoned) 10Reedy: shellbox: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808818 (owner: 10PipelineBot) [11:38:10] (03Abandoned) 10Reedy: shellbox: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808819 (owner: 10PipelineBot) [11:38:32] (03Abandoned) 10Reedy: shellbox-timeline: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808821 (owner: 10PipelineBot) [11:38:34] (03Abandoned) 10Reedy: shellbox-timeline: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808822 (owner: 10PipelineBot) [11:38:34] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1134 db1147 db1158 d1165 db1167 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30393 and previous config saved to /var/cache/conftool/dbconfig/20220627-113834-root.json [11:38:36] (03Abandoned) 10Reedy: shellbox-timeline: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808823 (owner: 10PipelineBot) [11:38:38] (03Abandoned) 10Reedy: shellbox-constraints: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808824 (owner: 10PipelineBot) [11:38:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:38:40] (03Abandoned) 10Reedy: shellbox: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808830 (owner: 10PipelineBot) [11:38:42] (03Abandoned) 10Reedy: shellbox-constraints: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808827 (owner: 10PipelineBot) [11:38:48] (03Abandoned) 10Reedy: shellbox: pipeline bot 
promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808832 (owner: 10PipelineBot) [11:38:50] (03Abandoned) 10Reedy: shellbox-timeline: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808833 (owner: 10PipelineBot) [11:38:52] (03Abandoned) 10Reedy: shellbox-timeline: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808838 (owner: 10PipelineBot) [11:40:05] I see why legoktm disabled that on master ;D [11:40:24] (03PS3) 10Btullis: Add a new partman recipe for the new H750 based stat servers [puppet] - 10https://gerrit.wikimedia.org/r/808870 (https://phabricator.wikimedia.org/T299466) [11:42:13] (03PS5) 10Jbond: C:base::puppet: add documentation and fix mionor lint issues [puppet] - 10https://gerrit.wikimedia.org/r/808857 [11:44:44] (03PS6) 10Jbond: C:base::puppet: add documentation and fix mionor lint issues [puppet] - 10https://gerrit.wikimedia.org/r/808857 [11:44:49] (03CR) 10CI reject: [V: 04-1] C:base::puppet: add documentation and fix mionor lint issues [puppet] - 10https://gerrit.wikimedia.org/r/808857 (owner: 10Jbond) [11:46:22] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36040/console" [puppet] - 10https://gerrit.wikimedia.org/r/808857 (owner: 10Jbond) [11:46:35] RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [11:46:58] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1134 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30394 and previous config saved to /var/cache/conftool/dbconfig/20220627-114658-root.json [11:47:03] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1147 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30395 and previous config saved to /var/cache/conftool/dbconfig/20220627-114703-root.json 
[11:47:08] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1158 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30396 and previous config saved to /var/cache/conftool/dbconfig/20220627-114707-root.json [11:47:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:47:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:47:12] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1165 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30397 and previous config saved to /var/cache/conftool/dbconfig/20220627-114712-root.json [11:47:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:47:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:47:19] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1167 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30398 and previous config saved to /var/cache/conftool/dbconfig/20220627-114718-root.json [11:47:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:51:40] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30399 and previous config saved to /var/cache/conftool/dbconfig/20220627-115140-ladsgroup.json [11:51:42] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [11:51:44] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [11:51:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:51:46] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched 
on wmf wikis - https://phabricator.wikimedia.org/T298565 [11:51:49] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30400 and previous config saved to /var/cache/conftool/dbconfig/20220627-115148-ladsgroup.json [11:51:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:51:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:51:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:52:49] (03CR) 10Btullis: "I'm adding this new partmen recipe, but I haven't tried it out yet." [puppet] - 10https://gerrit.wikimedia.org/r/808870 (https://phabricator.wikimedia.org/T299466) (owner: 10Btullis) [11:56:05] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30401 and previous config saved to /var/cache/conftool/dbconfig/20220627-115604-ladsgroup.json [11:56:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:59:37] (03CR) 10Jbond: [V: 03+1 C: 03+2] C:base::puppet: add documentation and fix mionor lint issues [puppet] - 10https://gerrit.wikimedia.org/r/808857 (owner: 10Jbond) [12:02:04] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30402 and previous config saved to /var/cache/conftool/dbconfig/20220627-120201-root.json [12:02:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:02:07] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1147 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30403 and previous config saved to /var/cache/conftool/dbconfig/20220627-120207-root.json [12:02:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:02:12] !log marostegui@cumin1001 dbctl commit (dc=all): 
'db1158 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30404 and previous config saved to /var/cache/conftool/dbconfig/20220627-120211-root.json [12:02:16] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30405 and previous config saved to /var/cache/conftool/dbconfig/20220627-120216-root.json [12:02:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:02:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:02:23] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1167 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30406 and previous config saved to /var/cache/conftool/dbconfig/20220627-120222-root.json [12:02:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:06:32] (03PS1) 10Jbond: puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 [12:07:23] PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift-account-stats_mlserve:prod.service,swift_dispersion_stats_lowlatency.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:11:11] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30407 and previous config saved to /var/cache/conftool/dbconfig/20220627-121109-ladsgroup.json [12:11:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:13:10] 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering: Q4:(Need By: TBD) rack/setup/install an-presto10[06-15].eqiad.wmnet - https://phabricator.wikimedia.org/T306835 (10BTullis) @Cmjohnson - I think we should just use bullseye for these hosts, if there's a controller issue with buster. We should be able to get... 
[12:17:08] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30408 and previous config saved to /var/cache/conftool/dbconfig/20220627-121708-root.json [12:17:12] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1147 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30409 and previous config saved to /var/cache/conftool/dbconfig/20220627-121711-root.json [12:17:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:16] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30410 and previous config saved to /var/cache/conftool/dbconfig/20220627-121715-root.json [12:17:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30411 and previous config saved to /var/cache/conftool/dbconfig/20220627-121720-root.json [12:17:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:27] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1167 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30412 and previous config saved to /var/cache/conftool/dbconfig/20220627-121726-root.json [12:17:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:40] 10SRE, 10DNS, 10Traffic-Icebox, 10Wikimedia-Language-setup, 10Patch-For-Review: nan and minnan subdomain redirects are a mess - https://phabricator.wikimedia.org/T86915 (10JArguello-WMF) [12:18:30] 10SRE, 10ops-codfw, 10DBA, 10DC-Ops: Q3:(Need By: TBD) rack/setup/install db2153.codfw.wmnet - db2174.codfw.wmnet - 
https://phabricator.wikimedia.org/T306927 (10Marostegui) @Papaul from which racks do you prefer me to try to decommission some of the hosts before you rack all these 22 servers? These hosts w... [12:18:57] 10SRE, 10ops-codfw, 10DBA, 10DC-Ops: Q3:(Need By: TBD) rack/setup/install db2153.codfw.wmnet - db2174.codfw.wmnet - https://phabricator.wikimedia.org/T306927 (10Marostegui) I can decommission some of those so it can be just as swap. How many beforehand decommissions would help you? [12:26:19] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30413 and previous config saved to /var/cache/conftool/dbconfig/20220627-122618-ladsgroup.json [12:26:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:32:12] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30414 and previous config saved to /var/cache/conftool/dbconfig/20220627-123211-root.json [12:32:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:32:16] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30415 and previous config saved to /var/cache/conftool/dbconfig/20220627-123215-root.json [12:32:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30416 and previous config saved to /var/cache/conftool/dbconfig/20220627-123219-root.json [12:32:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:32:24] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30417 and previous config saved to /var/cache/conftool/dbconfig/20220627-123224-root.json [12:32:24] 
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:32:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:32:31] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30418 and previous config saved to /var/cache/conftool/dbconfig/20220627-123230-root.json [12:32:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:32:58] 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering, 10Patch-For-Review: Q4: rack/setup/install stat1010 - https://phabricator.wikimedia.org/T307399 (10BTullis) Thanks @Cmjohnson for your replies. I've created a new partman recipe in https://gerrit.wikimedia.org/r/808870 although I haven't yet tested it. If... [12:40:24] (03PS1) 10Alexandros Kosiaris: ipmi_exporter: Have wrapper depend on sudo::user [puppet] - 10https://gerrit.wikimedia.org/r/808883 [12:41:24] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30419 and previous config saved to /var/cache/conftool/dbconfig/20220627-124124-ladsgroup.json [12:41:26] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [12:41:28] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [12:41:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:41:30] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched on wmf wikis - https://phabricator.wikimedia.org/T298565 [12:41:33] (03CR) 10Alexandros Kosiaris: [V: 03+1] "PCC SUCCESS (DIFF 1): 
https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36041/console" [puppet] - 10https://gerrit.wikimedia.org/r/808883 (owner: 10Alexandros Kosiaris) [12:41:33] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30420 and previous config saved to /var/cache/conftool/dbconfig/20220627-124132-ladsgroup.json [12:41:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:41:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:41:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:13] (03CR) 10Hokwelum: [C: 03+1] "Thank you for the fix! Ariel and I used pcc to validate if emails would be sent when there are no errors of any kind and it looks good." [puppet] - 10https://gerrit.wikimedia.org/r/806366 (https://phabricator.wikimedia.org/T273673) (owner: 10Slyngshede) [12:46:49] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30421 and previous config saved to /var/cache/conftool/dbconfig/20220627-124648-ladsgroup.json [12:46:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:55] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched on wmf wikis - https://phabricator.wikimedia.org/T298565 [12:47:16] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30422 and previous config saved to /var/cache/conftool/dbconfig/20220627-124715-root.json [12:47:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:47:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: After kernel 
upgrade', diff saved to https://phabricator.wikimedia.org/P30423 and previous config saved to /var/cache/conftool/dbconfig/20220627-124719-root.json [12:47:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:47:24] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30424 and previous config saved to /var/cache/conftool/dbconfig/20220627-124723-root.json [12:47:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:47:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30425 and previous config saved to /var/cache/conftool/dbconfig/20220627-124728-root.json [12:47:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:47:34] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30426 and previous config saved to /var/cache/conftool/dbconfig/20220627-124734-root.json [12:47:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:48:01] 10SRE, 10serviceops, 10Patch-For-Review: Update conf1* servers - https://phabricator.wikimedia.org/T310062 (10akosiaris) [12:48:03] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops: Q3:(Need By: TBD) rack/setup/install conf100[789] - https://phabricator.wikimedia.org/T301272 (10akosiaris) [12:51:25] (03PS1) 10Jbond: P:base::puppet: rename profile [puppet] - 10https://gerrit.wikimedia.org/r/808884 [12:51:27] (03PS1) 10Jbond: puppet: rename base::puppet class to puppet::agent [puppet] - 10https://gerrit.wikimedia.org/r/808885 [12:52:14] (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/807983 (owner: 10Slyngshede) [12:52:39] !log Switch Puppet from cron to systemd timers, 
https://gerrit.wikimedia.org/r/c/operations/puppet/+/807118/ [12:52:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:30] (03CR) 10Btullis: [C: 03+2] Add a new partman recipe for the new H750 based stat servers [puppet] - 10https://gerrit.wikimedia.org/r/808870 (https://phabricator.wikimedia.org/T299466) (owner: 10Btullis) [12:54:16] (03CR) 10CI reject: [V: 04-1] puppet: rename base::puppet class to puppet::agent [puppet] - 10https://gerrit.wikimedia.org/r/808885 (owner: 10Jbond) [12:54:28] (03CR) 10Slyngshede: [C: 03+2] C:base::puppet move Puppet to Systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/807118 (https://phabricator.wikimedia.org/T273673) (owner: 10Slyngshede) [12:54:36] (03CR) 10Vgutierrez: [C: 03+2] limit jinja2, itsdangerous and werkzeug version [software/acme-chief] - 10https://gerrit.wikimedia.org/r/808853 (owner: 10Vgutierrez) [12:54:45] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36042/console" [puppet] - 10https://gerrit.wikimedia.org/r/808884 (owner: 10Jbond) [12:54:49] (03PS14) 10Slyngshede: C:base::puppet move Puppet to Systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/807118 (https://phabricator.wikimedia.org/T273673) [12:55:06] (03PS2) 10Vgutierrez: api: support sha256 checksums [software/acme-chief] - 10https://gerrit.wikimedia.org/r/806939 (owner: 10Majavah) [12:55:19] (03PS2) 10Vgutierrez: api: Offer JSON for metadata if requested [software/acme-chief] - 10https://gerrit.wikimedia.org/r/806940 (owner: 10Majavah) [12:55:45] (03CR) 10Jbond: [V: 03+1 C: 03+2] P:base::puppet: rename profile [puppet] - 10https://gerrit.wikimedia.org/r/808884 (owner: 10Jbond) [12:56:10] (03PS2) 10Jbond: puppet: rename base::puppet class to puppet::agent [puppet] - 10https://gerrit.wikimedia.org/r/808885 [12:57:09] (03PS3) 10Jbond: puppet: rename base::puppet class to puppet::agent [puppet] - 10https://gerrit.wikimedia.org/r/808885 
[12:58:47] PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v2/suggest/sections/{title}/{from}/{to} (Suggest source sections to translate) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX [12:59:24] (03CR) 10Elukey: [V: 03+2 C: 03+2] Merge branch 'master' into debian [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808828 (owner: 10Elukey) [12:59:28] (03PS1) 10ArielGlenn: make sure various dump related scripts send email when they run [puppet] - 10https://gerrit.wikimedia.org/r/808890 (https://phabricator.wikimedia.org/T273673) [12:59:33] (03CR) 10Elukey: [C: 03+2] Release version 1.1.0 [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808829 (https://phabricator.wikimedia.org/T310980) (owner: 10Elukey) [12:59:38] (03CR) 10CI reject: [V: 04-1] puppet: rename base::puppet class to puppet::agent [puppet] - 10https://gerrit.wikimedia.org/r/808885 (owner: 10Jbond) [13:00:04] RoanKattouw, Lucas_WMDE, Urbanecm, and awight: How many deployers does it take to do UTC afternoon backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220627T1300). [13:00:04] Lucas_WMDE, itamarWMDE, and koi: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:09] o/ [13:00:13] I can deploy! [13:00:21] go ahead Lucas_WMDE ! 
[13:00:55] RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX [13:01:21] (03PS2) 10Lucas Werkmeister (WMDE): [cirrus] Add a custom profile for the wikibase language selector [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808011 (https://phabricator.wikimedia.org/T307869) [13:01:54] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30428 and previous config saved to /var/cache/conftool/dbconfig/20220627-130153-ladsgroup.json [13:01:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:02:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30429 and previous config saved to /var/cache/conftool/dbconfig/20220627-130219-root.json [13:02:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:02:24] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30430 and previous config saved to /var/cache/conftool/dbconfig/20220627-130223-root.json [13:02:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:02:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30431 and previous config saved to /var/cache/conftool/dbconfig/20220627-130227-root.json [13:02:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:02:32] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30432 and previous config saved to /var/cache/conftool/dbconfig/20220627-130232-root.json [13:02:35] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:02:36] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] [cirrus] Add a custom profile for the wikibase language selector [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808011 (https://phabricator.wikimedia.org/T307869) (owner: 10Lucas Werkmeister (WMDE)) [13:02:38] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30433 and previous config saved to /var/cache/conftool/dbconfig/20220627-130238-root.json [13:02:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:03:29] (03Merged) 10jenkins-bot: [cirrus] Add a custom profile for the wikibase language selector [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808011 (https://phabricator.wikimedia.org/T307869) (owner: 10Lucas Werkmeister (WMDE)) [13:04:04] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34] - https://phabricator.wikimedia.org/T294972 (10Andrew) >>! In T294972#8027892, @Cmjohnson wrote: > @Andrew I am not sure which raid configuration you need. I don't know what cloudcepho... [13:04:40] hmm, searchEntities.php doesn’t crash anymore like the last time we tried this config change, but it also doesn’t show different results… [13:05:11] dcausse: if you’re online – do you know if the “language” search profile needs further changes before it can be used by repo/maintenance/searchEntities.php? 
[13:06:02] (03CR) 10Alexandros Kosiaris: [V: 03+1 C: 03+2] ipmi_exporter: Have wrapper depend on sudo::user [puppet] - 10https://gerrit.wikimedia.org/r/808883 (owner: 10Alexandros Kosiaris) [13:06:44] otherwise, I guess we can still sync the config change, but then need to figure out further steps before we add the API parameter [13:07:34] (03PS4) 10Jbond: puppet: rename base::puppet class to puppet::agent [puppet] - 10https://gerrit.wikimedia.org/r/808885 [13:07:58] (03CR) 10CI reject: [V: 04-1] puppet: rename base::puppet class to puppet::agent [puppet] - 10https://gerrit.wikimedia.org/r/808885 (owner: 10Jbond) [13:08:10] alright, I’ll sync the change and comment on the task then [13:08:35] ty Lucas! [13:09:14] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply [13:09:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:09:25] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36044/console" [puppet] - 10https://gerrit.wikimedia.org/r/808885 (owner: 10Jbond) [13:09:35] PROBLEM - k8s API server requests latencies on kubestagemaster1001 is CRITICAL: instance=10.64.16.203 verb={CREATE,UPDATE} https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27 [13:11:41] RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:11:51] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [13:11:52] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply [13:11:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:11:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:12:17] !log lucaswerkmeister-wmde@deploy1002 Synchronized 
wmf-config/InitialiseSettings.php: Config: [[gerrit:808011|[cirrus] Add a custom profile for the wikibase language selector (T307869)]] (1/4) (duration: 03m 35s) [13:12:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:12:22] T307869: Request for new search profile for Wikidata that boosts Items for languages - https://phabricator.wikimedia.org/T307869 [13:12:39] (03CR) 10Elukey: [C: 03+2] Release version 1.1.0 (031 comment) [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808829 (https://phabricator.wikimedia.org/T310980) (owner: 10Elukey) [13:14:05] (03PS5) 10Jbond: puppet: rename base::puppet class to puppet::agent [puppet] - 10https://gerrit.wikimedia.org/r/808885 [13:14:22] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [13:14:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:16:08] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:808011|[cirrus] Add a custom profile for the wikibase language selector (T307869)]] (2/4) (duration: 03m 33s) [13:16:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:16:56] Lucas_WMDE: it might need some tuning perhaps? 
[13:16:59] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30434 and previous config saved to /var/cache/conftool/dbconfig/20220627-131658-ladsgroup.json [13:17:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:17:24] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30435 and previous config saved to /var/cache/conftool/dbconfig/20220627-131723-root.json [13:17:24] so far the maintenance script returns the same five results in the same order with or without the profile [13:17:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:17:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30436 and previous config saved to /var/cache/conftool/dbconfig/20220627-131727-root.json [13:17:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:17:33] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30437 and previous config saved to /var/cache/conftool/dbconfig/20220627-131733-root.json [13:17:37] though I didn’t try that many search terms yet [13:17:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:17:37] oh from the maint script [13:17:42] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30438 and previous config saved to /var/cache/conftool/dbconfig/20220627-131742-root.json [13:17:45] lemme see [13:17:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:17:54] I’ll post results on the task when I’m done, right now they’re not convenient to 
copy+paste because of tmux [13:18:03] sure [13:18:15] but the profile context string definitely makes it to CirrusSearch, if I put in an invalid value then I get a SearchProfileException [13:18:36] just to rule that out ^^ [13:18:57] ok [13:20:16] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/SearchSettingsForWikibase.php: Config: [[gerrit:808011|[cirrus] Add a custom profile for the wikibase language selector (T307869)]] (3/4) (duration: 03m 32s) [13:20:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:20:22] T307869: Request for new search profile for Wikidata that boosts Items for languages - https://phabricator.wikimedia.org/T307869 [13:21:33] Lucas_WMDE: I suppose you hacked searchEntities.php to pass the profile context to getRankedSearchResults ? [13:22:17] I wouldn’t call it “hacked” but yes ^^ [13:22:21] ok :) [13:22:32] (added a --profile-context option – in Wikibase.git, not just manually on mwdebug ^^) [13:22:45] oh cool [13:23:09] (03PS2) 10Lucas Werkmeister (WMDE): Add WikibaseTerms temporary debug log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808836 (https://phabricator.wikimedia.org/T311307) [13:23:39] (03PS1) 10Elukey: Fix and updates for release 1.1.0 [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808892 (https://phabricator.wikimedia.org/T310980) [13:24:04] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/SearchSettingsForWikidata.php: Config: [[gerrit:808011|[cirrus] Add a custom profile for the wikibase language selector (T307869)]] (4/4) (duration: 03m 29s) [13:24:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:24:21] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Add WikibaseTerms temporary debug log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808836 (https://phabricator.wikimedia.org/T311307) (owner: 10Lucas Werkmeister (WMDE)) [13:25:08] !log uploaded perccli 007.1910.0000.0000 to 
bullseye-wikimedia-private T308027 [13:25:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:25:13] T308027: private repo deployment - perccli implementation - https://phabricator.wikimedia.org/T308027 [13:25:31] (03Merged) 10jenkins-bot: Add WikibaseTerms temporary debug log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808836 (https://phabricator.wikimedia.org/T311307) (owner: 10Lucas Werkmeister (WMDE)) [13:25:59] this one isn’t testable, I’ll sync it directly [13:26:11] 10SRE, 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Radar): deployment-deploy03 ran out of memory twice while trying to perform a WikiLambda db migration - https://phabricator.wikimedia.org/T309413 (10TheresNoTime) 05Open→03Resolved Going to mark this resolved (//as it kinda/mostly/sorta is//) [13:26:40] (03PS2) 10Elukey: Fix and updates for release 1.1.0 [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808892 (https://phabricator.wikimedia.org/T310980) [13:29:49] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:808836|Add WikibaseTerms temporary debug log channel (T311307)]] (duration: 03m 30s) [13:29:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:29:56] T311307: Add debug logging for item terms storage after merging - https://phabricator.wikimedia.org/T311307 [13:30:47] (03PS3) 10Lucas Werkmeister (WMDE): Separate wmgWikibaseTermboxEnabled and wmgWikibaseSSRTermboxServerUrl [mediawiki-config] - 10https://gerrit.wikimedia.org/r/803497 (https://phabricator.wikimedia.org/T304328) [13:30:55] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply [13:30:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:32:04] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30439 and 
previous config saved to /var/cache/conftool/dbconfig/20220627-133204-ladsgroup.json [13:32:06] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [13:32:08] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [13:32:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:32:08] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched on wmf wikis - https://phabricator.wikimedia.org/T298565 [13:32:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:32:13] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30440 and previous config saved to /var/cache/conftool/dbconfig/20220627-133212-ladsgroup.json [13:32:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:32:18] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Separate wmgWikibaseTermboxEnabled and wmgWikibaseSSRTermboxServerUrl [mediawiki-config] - 10https://gerrit.wikimedia.org/r/803497 (https://phabricator.wikimedia.org/T304328) (owner: 10Lucas Werkmeister (WMDE)) [13:32:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:33:38] (03Merged) 10jenkins-bot: Separate wmgWikibaseTermboxEnabled and wmgWikibaseSSRTermboxServerUrl [mediawiki-config] - 10https://gerrit.wikimedia.org/r/803497 (https://phabricator.wikimedia.org/T304328) (owner: 10Lucas Werkmeister (WMDE)) [13:33:42] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [13:33:43] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply [13:33:45] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:33:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:34:06] itamarWMDE: the first termbox change is on mwdebug1001, let’s test it [13:34:20] sure [13:34:30] (03PS3) 10Elukey: Fix and updates for release 1.1.0 [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808892 (https://phabricator.wikimedia.org/T310980) [13:34:35] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [13:34:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:34:46] !log btullis@cumin1001 START - Cookbook sre.hosts.reimage for host stat1010.eqiad.wmnet with OS bullseye [13:34:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:34:50] (03CR) 10Elukey: [C: 03+2] Release version 1.1.0 (031 comment) [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808829 (https://phabricator.wikimedia.org/T310980) (owner: 10Elukey) [13:34:53] 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering: Q4: rack/setup/install stat1010 - https://phabricator.wikimedia.org/T307399 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host stat1010.eqiad.wmnet with OS bullseye [13:35:07] (03PS1) 10Jbond: base::expose_puppet_certs: rename expose_puppet_certs define [puppet] - 10https://gerrit.wikimedia.org/r/808893 [13:35:14] (03CR) 10Elukey: "This solves a lot of lintian issues, plus it should be more correct oveall :)" [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808892 (https://phabricator.wikimedia.org/T310980) (owner: 10Elukey) [13:35:43] (03PS4) 10Elukey: Fix and updates for release 1.1.0 [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808892 (https://phabricator.wikimedia.org/T310980) [13:36:09] mobile termbox still seems to be working on real and test wikidata afaict [13:36:19] PROBLEM - Check systemd state 
on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift_dispersion_stats_lowlatency.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:36:27] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30441 and previous config saved to /var/cache/conftool/dbconfig/20220627-133627-ladsgroup.json [13:36:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:36:40] Lucas_WMDE seems to work on mobile and test [13:36:43] \o/ [13:36:47] okay, syncing [13:38:21] (03CR) 10CI reject: [V: 04-1] base::expose_puppet_certs: rename expose_puppet_certs define [puppet] - 10https://gerrit.wikimedia.org/r/808893 (owner: 10Jbond) [13:39:21] (03PS4) 10Lucas Werkmeister (WMDE): Unconfigure wmgWikibaseSSRTermboxServerUrl on Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/803498 (https://phabricator.wikimedia.org/T304328) [13:39:40] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply [13:39:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:08] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/Wikibase.php: Config: [[gerrit:803497|Separate wmgWikibaseTermboxEnabled and wmgWikibaseSSRTermboxServerUrl (T304328)]] (duration: 03m 27s) [13:40:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:13] T304328: Move Termbox SSR for Beta Wikidata into deployment-prep project - https://phabricator.wikimedia.org/T304328 [13:40:40] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [13:40:41] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply [13:40:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:50] (03PS2) 
10Stang: enwikiquote: Create rollbacker user group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808370 (https://phabricator.wikimedia.org/T310950) [13:41:30] hm, that diff doesn’t look right [13:41:33] (https://integration.wikimedia.org/ci/job/operations-mw-config-php72-composer-diffConfig-docker/12107/console) [13:41:40] wikidatawiki still has a wmgWikibaseSSRTermboxServerUrl [13:41:56] do individual wiki values in IS.php override the custom default in IS-labs.php? [13:43:00] ah, I guess we want to set -wmgWikibaseSSRTermboxServerUrl with the hyphen in front [13:43:08] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [13:43:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:43:14] (03PS5) 10Lucas Werkmeister (WMDE): Unconfigure wmgWikibaseSSRTermboxServerUrl on Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/803498 (https://phabricator.wikimedia.org/T304328) [13:43:24] let’s see if that works better [13:43:45] (03PS2) 10Jbond: base::expose_puppet_certs: rename expose_puppet_certs define [puppet] - 10https://gerrit.wikimedia.org/r/808893 [13:44:33] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36047/console" [puppet] - 10https://gerrit.wikimedia.org/r/808893 (owner: 10Jbond) [13:45:45] much better https://integration.wikimedia.org/ci/job/operations-mw-config-php72-composer-diffConfig-docker/12109/console [13:45:50] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Unconfigure wmgWikibaseSSRTermboxServerUrl on Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/803498 (https://phabricator.wikimedia.org/T304328) (owner: 10Lucas Werkmeister (WMDE)) [13:46:40] (03Merged) 10jenkins-bot: Unconfigure wmgWikibaseSSRTermboxServerUrl on Beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/803498 (https://phabricator.wikimedia.org/T304328) (owner: 10Lucas Werkmeister (WMDE)) [13:47:44] nothing to test 
here, syncing directly [13:50:45] (03PS1) 10Volans: icinga: add test to improve test coverage [software/spicerack] - 10https://gerrit.wikimedia.org/r/808896 [13:50:47] (03PS1) 10Volans: ganeti: refactor Ganeti to support the new model [software/spicerack] - 10https://gerrit.wikimedia.org/r/808897 [13:50:50] koi: are you around? [13:51:18] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:803498|Unconfigure wmgWikibaseSSRTermboxServerUrl on Beta (T304328)]] (duration: 03m 20s) [13:51:19] yep [13:51:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:51:24] T304328: Move Termbox SSR for Beta Wikidata into deployment-prep project - https://phabricator.wikimedia.org/T304328 [13:51:32] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30442 and previous config saved to /var/cache/conftool/dbconfig/20220627-135132-ladsgroup.json [13:51:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:52:05] alright [13:52:09] 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering: Q4: rack/setup/install stat1010 - https://phabricator.wikimedia.org/T307399 (10BTullis) [13:52:15] (03PS3) 10Lucas Werkmeister (WMDE): enwikiquote: Create rollbacker user group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808370 (https://phabricator.wikimedia.org/T310950) (owner: 10Stang) [13:52:28] 10SRE, 10ops-eqsin: cr3-eqsin:xe-0/1/1 interface errors - https://phabricator.wikimedia.org/T300485 (10RobH) > Hello Team, > > > > Kindly be informed that the Tech has been ordered for tomorrow at 0800 GMT. > > Looking forward to keeping you updated. > > > > Thank you > > > > > > Best Regards,... 
[13:53:12] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply [13:53:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:53:42] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] enwikiquote: Create rollbacker user group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808370 (https://phabricator.wikimedia.org/T310950) (owner: 10Stang) [13:54:12] (03PS1) 10Hashar: jenkins: use upstream systemd definition [puppet] - 10https://gerrit.wikimedia.org/r/808900 (https://phabricator.wikimedia.org/T308637) [13:54:34] (03Merged) 10jenkins-bot: enwikiquote: Create rollbacker user group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808370 (https://phabricator.wikimedia.org/T310950) (owner: 10Stang) [13:54:56] (03CR) 10Hashar: "systemd support is https://github.com/jenkinsci/packaging/pull/266" [puppet] - 10https://gerrit.wikimedia.org/r/808900 (https://phabricator.wikimedia.org/T308637) (owner: 10Hashar) [13:55:03] koi: the change is on mwdebug1001, please test [13:55:06] !log jayme@cumin1001 START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw [13:55:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:55:17] !log jayme@cumin1001 END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw [13:55:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:55:26] (03CR) 10Hashar: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/808900 (https://phabricator.wikimedia.org/T308637) (owner: 10Hashar) [13:55:54] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [13:55:56] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply [13:55:59] Lucas_WMDE: LGTM [13:56:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:56:03] ok [13:56:05] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:56:31] syncing [13:56:52] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [13:56:53] 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering: Q4: rack/setup/install stat1010 - https://phabricator.wikimedia.org/T307399 (10BTullis) I've attempted to run the cookbook to install this server, but it's failing at the TFTP step, I believe. {F35280906} The preceding parts of the cookbook appeared to wor... [13:56:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:59:05] (03PS1) 10Ayounsi: Systematically set the MTU on server facing switch interfaces [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/808901 [13:59:41] 10SRE, 10ops-codfw, 10DBA, 10DC-Ops: Q3:(Need By: TBD) rack/setup/install db2153.codfw.wmnet - db2174.codfw.wmnet - https://phabricator.wikimedia.org/T306927 (10Papaul) @Marostegui it doesn't matter to me what works best for you will work for me. Thanks [14:00:05] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:808370|enwikiquote: Create rollbacker user group (T310950)]] (duration: 03m 40s) [14:00:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:12] T310950: Add Rollback right to (English) Wikiquote - https://phabricator.wikimedia.org/T310950 [14:00:45] !log UTC afternoon backport+config window done [14:00:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:48] right on time :) [14:01:55] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply [14:01:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:02:52] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [14:02:53] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply [14:02:56] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:02:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:03:34] (03CR) 10Hashar: "Compiler https://puppet-compiler.wmflabs.org/pcc-worker1003/1356/ . That doesn't tell much about the state of the service once this is dep" [puppet] - 10https://gerrit.wikimedia.org/r/808900 (https://phabricator.wikimedia.org/T308637) (owner: 10Hashar) [14:04:35] (03CR) 10Jbond: "PCC al good" [puppet] - 10https://gerrit.wikimedia.org/r/808885 (owner: 10Jbond) [14:05:23] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [14:05:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:05:54] (03CR) 10Vgutierrez: "it looks good, could you add a test as well?" [software/acme-chief] - 10https://gerrit.wikimedia.org/r/806939 (owner: 10Majavah) [14:06:37] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30443 and previous config saved to /var/cache/conftool/dbconfig/20220627-140637-ladsgroup.json [14:06:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:10:02] (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1004-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency [14:10:21] (03PS1) 10DCausse: Do not set wgWBCSLanguageSelectorRescoreProfile twice [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808903 (https://phabricator.wikimedia.org/T307869) [14:17:01] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1135 db1148 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30444 and previous config saved to /var/cache/conftool/dbconfig/20220627-141701-root.json 
[14:17:03] (03PS2) 10DCausse: Do not set wgWBCSLanguageSelectorRescoreProfile twice [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808903 (https://phabricator.wikimedia.org/T307869) [14:17:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:18:05] RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:21:43] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30445 and previous config saved to /var/cache/conftool/dbconfig/20220627-142142-ladsgroup.json [14:21:45] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [14:21:46] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance [14:21:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:48] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched on wmf wikis - https://phabricator.wikimedia.org/T298565 [14:21:51] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30446 and previous config saved to /var/cache/conftool/dbconfig/20220627-142151-ladsgroup.json [14:21:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:22:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:22:18] 10SRE, 10DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (10jbond) @jbond note to self, look at extending raid 
fact to support new controller [14:24:21] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1135 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30447 and previous config saved to /var/cache/conftool/dbconfig/20220627-142421-root.json [14:24:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:24:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1148 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30448 and previous config saved to /var/cache/conftool/dbconfig/20220627-142428-root.json [14:24:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:25:43] jouncebot: now [14:25:43] No deployments scheduled for the next 1 hour(s) and 4 minute(s) [14:26:08] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30449 and previous config saved to /var/cache/conftool/dbconfig/20220627-142607-ladsgroup.json [14:26:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:26:44] dcausse: if you want, we could probably backport+deploy those WikibaseCirrusSearch fixes now? [14:26:55] or just let the extension fix roll out with the train, I guess [14:27:04] and then do the config change later this week [14:27:16] (03PS1) 10Elukey: Add configuration for the ml-cache codfw Cassandra cluster [puppet] - 10https://gerrit.wikimedia.org/r/808907 (https://phabricator.wikimedia.org/T302232) [14:27:19] Lucas_WMDE: as you want, I'm around for the next couple of hours [14:27:49] looks like both changes are simple enough that I could just hand-edit them on mwdebug and try them that way, actually [14:27:52] I’ll try that now [14:28:03] oh good idea, thanks! 
[14:28:41] * Lucas_WMDE hacking around on mwdebug1001 [14:29:51] (03PS1) 10Giuseppe Lavagetto: mediawiki: install php7.4 on the mwdebug servers [puppet] - 10https://gerrit.wikimedia.org/r/808908 (https://phabricator.wikimedia.org/T311386) [14:29:53] (03PS1) 10Giuseppe Lavagetto: mediawiki: install php7.4 on the canaries [puppet] - 10https://gerrit.wikimedia.org/r/808909 (https://phabricator.wikimedia.org/T311386) [14:29:55] (03PS1) 10Giuseppe Lavagetto: mediawiki: install php7.4 on jobrunners [puppet] - 10https://gerrit.wikimedia.org/r/808910 (https://phabricator.wikimedia.org/T311386) [14:29:57] (03PS1) 10Giuseppe Lavagetto: mediawiki: install php7.4 on the maintenance server [puppet] - 10https://gerrit.wikimedia.org/r/808911 (https://phabricator.wikimedia.org/T311386) [14:29:59] (03PS1) 10Giuseppe Lavagetto: mediawiki: install php7.4 on all appservers [puppet] - 10https://gerrit.wikimedia.org/r/808912 (https://phabricator.wikimedia.org/T311386) [14:30:58] hmmm, to me it still looks like the same search results… [14:31:02] !log btullis@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host stat1010.eqiad.wmnet with OS bullseye [14:31:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:31:08] 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering: Q4: rack/setup/install stat1010 - https://phabricator.wikimedia.org/T307399 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host stat1010.eqiad.wmnet with OS bullseye executed with errors: - stat1010 (**FAIL**)... 
[14:31:43] (03PS2) 10Elukey: Add configuration for the ml-cache codfw Cassandra cluster [puppet] - 10https://gerrit.wikimedia.org/r/808907 (https://phabricator.wikimedia.org/T302232) [14:32:53] Lucas_WMDE: looking [14:33:08] I looked at both files again and I *think* I edited them equivalently [14:33:35] (but without changing the line wrapping in Hooks.php, and I didn’t remove the $config assignment either) [14:35:52] hang on [14:35:52] hm... yes the change you made should fix the bug I thought I fixed [14:35:55] (03PS3) 10Elukey: Add configuration for the ml-cache codfw Cassandra cluster [puppet] - 10https://gerrit.wikimedia.org/r/808907 (https://phabricator.wikimedia.org/T302232) [14:36:11] I just realized there’s still an open Gerrit change https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/806931 [14:36:17] do we need that first? [14:36:28] no, wait, that’s for the API [14:36:30] nevermind [14:36:59] (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36052/console" [puppet] - 10https://gerrit.wikimedia.org/r/808907 (https://phabricator.wikimedia.org/T302232) (owner: 10Elukey) [14:37:04] trying to dump the profiles from the API and see [14:37:07] (locally I have that in a review/dcausse/801793 branch but this is actually my change, related to the Wikibase API and not CirrusSearch itself ^^) [14:37:09] ok [14:37:15] (03CR) 10Elukey: "Adding Eric to get a confirmation about rack values for codfw (I used the same as eqiad, lemme know if it is ok :)" [puppet] - 10https://gerrit.wikimedia.org/r/808907 (https://phabricator.wikimedia.org/T302232) (owner: 10Elukey) [14:37:52] akosiaris: thank you for fixing the ipmi-exporter/sudo race <3 [14:38:28] (03CR) 10Volans: [C: 03+1] "LGTM" [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/808901 (owner: 10Ayounsi) [14:38:34] (03CR) 10Jbond: [C: 03+2] puppet: rename base::puppet class to puppet::agent [puppet] - 
10https://gerrit.wikimedia.org/r/808885 (owner: 10Jbond) [14:38:37] (03CR) 10Jbond: [V: 03+1 C: 03+2] base::expose_puppet_certs: rename expose_puppet_certs define [puppet] - 10https://gerrit.wikimedia.org/r/808893 (owner: 10Jbond) [14:39:25] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1135 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30450 and previous config saved to /var/cache/conftool/dbconfig/20220627-143925-root.json [14:39:26] the fix is working but I see "statement_keywords": null, so something else is not correct :/ [14:39:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:32] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1148 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30451 and previous config saved to /var/cache/conftool/dbconfig/20220627-143932-root.json [14:39:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:44] hm :/ [14:39:55] (03CR) 10Muehlenhoff: Fix and updates for release 1.1.0 (031 comment) [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808892 (https://phabricator.wikimedia.org/T310980) (owner: 10Elukey) [14:39:56] but then I’ll +2 the fix at least [14:40:01] it can at least go into the train [14:40:28] godog: yw :-) [14:40:52] ah the code expects LanguageSelectorStatementBoosts but we configure LanguageSelectorStatementBoost (missing "s") [14:41:04] * Lucas_WMDE also thinks the config_prefix in extension.json seems to be a great way for extensions to shoot themselves in the foot, and to make code less readable :| [14:41:11] aaaah, classic :) [14:41:13] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30452 and previous config saved to /var/cache/conftool/dbconfig/20220627-144113-ladsgroup.json [14:41:17] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:41:39] PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift_dispersion_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:41:41] this cirrus config is a bit of a mess, hardly testable :( [14:41:52] (03PS5) 10Elukey: Fix and updates for release 1.1.0 [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808892 (https://phabricator.wikimedia.org/T310980) [14:42:33] :/ [14:42:36] !log jayme@cumin1001 START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw [14:42:39] (03CR) 10Elukey: Fix and updates for release 1.1.0 (031 comment) [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808892 (https://phabricator.wikimedia.org/T310980) (owner: 10Elukey) [14:42:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:42:49] are you uploading a fix? [14:43:07] !log jayme@cumin1001 END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw [14:43:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:44:44] (03PS1) 10Filippo Giunchedi: smokeping: remove asw/pfw, moved to Prometheus [puppet] - 10https://gerrit.wikimedia.org/r/808914 (https://phabricator.wikimedia.org/T169860) [14:46:28] Lucas_WMDE: still testing something on mwdebug1001 [14:46:34] ok [14:46:54] the profile seems correct but results are still identical :/ [14:49:14] :/ [14:50:49] (03PS6) 10Elukey: Fix and updates for release 1.1.0 [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808892 (https://phabricator.wikimedia.org/T310980) [14:51:09] RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:52:48] (03CR) 10Ayounsi: [C: 03+2] Systematically set the MTU on 
server facing switch interfaces [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/808901 (owner: 10Ayounsi) [14:54:25] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808892 (https://phabricator.wikimedia.org/T310980) (owner: 10Elukey) [14:54:29] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1135 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30453 and previous config saved to /var/cache/conftool/dbconfig/20220627-145429-root.json [14:54:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:36] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1148 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30454 and previous config saved to /var/cache/conftool/dbconfig/20220627-145436-root.json [14:54:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:56:15] (03CR) 10Elukey: [C: 03+2] Fix and updates for release 1.1.0 [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808892 (https://phabricator.wikimedia.org/T310980) (owner: 10Elukey) [14:56:18] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30455 and previous config saved to /var/cache/conftool/dbconfig/20220627-145618-ladsgroup.json [14:56:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:56:45] (03CR) 10Elukey: Fix and updates for release 1.1.0 [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808892 (https://phabricator.wikimedia.org/T310980) (owner: 10Elukey) [14:56:59] (03PS7) 10Elukey: Fix and updates for release 1.1.0 [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808892 (https://phabricator.wikimedia.org/T310980) [14:57:30] (03CR) 10Elukey: Fix and updates for release 1.1.0 (031 comment) 
[debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808892 (https://phabricator.wikimedia.org/T310980) (owner: 10Elukey) [14:57:36] (03CR) 10Elukey: [C: 03+2] Fix and updates for release 1.1.0 [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808892 (https://phabricator.wikimedia.org/T310980) (owner: 10Elukey) [15:02:56] 10SRE, 10SRE Observability: systemd state on thanos-fe1001 is flapping - https://phabricator.wikimedia.org/T311322 (10fgiunchedi) Thank you @ssingh ! Yes definitely something we need to look into (i.e. the temporary failures). For now I've silenced the alert on thanos-fe1001 for this week, so at least there's... [15:04:31] 10SRE, 10SRE Observability, 10User-fgiunchedi: systemd state on thanos-fe1001 is flapping - https://phabricator.wikimedia.org/T311322 (10fgiunchedi) [15:07:00] (03PS1) 10Jbond: P:puppet::agent: move WMF customisations to profile [puppet] - 10https://gerrit.wikimedia.org/r/808919 [15:07:11] (03PS1) 10Lucas Werkmeister (WMDE): Use WBCS config when registering language selector profile [extensions/WikibaseCirrusSearch] (wmf/1.39.0-wmf.17) - 10https://gerrit.wikimedia.org/r/808445 (https://phabricator.wikimedia.org/T307869) [15:07:51] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36053/console" [puppet] - 10https://gerrit.wikimedia.org/r/808919 (owner: 10Jbond) [15:08:04] dcausse: if you’re okay with mwdebug1001 being reset, we could deploy that backport now [15:08:07] or wait until later [15:08:18] if you’re still looking into it [15:08:26] Lucas_WMDE: I think I found something [15:08:30] I'm fine resetting it [15:08:43] ok [15:09:01] then hopefully this + the config change will go through before the portals window or whatever was next on the calendar [15:09:07] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Use WBCS config when registering language selector profile [extensions/WikibaseCirrusSearch] 
(wmf/1.39.0-wmf.17) - 10https://gerrit.wikimedia.org/r/808445 (https://phabricator.wikimedia.org/T307869) (owner: 10Lucas Werkmeister (WMDE)) [15:09:28] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36054/console" [puppet] - 10https://gerrit.wikimedia.org/r/808919 (owner: 10Jbond) [15:09:30] (03PS3) 10Lucas Werkmeister (WMDE): Do not set wgWBCSLanguageSelectorRescoreProfile twice [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808903 (https://phabricator.wikimedia.org/T307869) (owner: 10DCausse) [15:09:33] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30456 and previous config saved to /var/cache/conftool/dbconfig/20220627-150933-root.json [15:09:34] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Do not set wgWBCSLanguageSelectorRescoreProfile twice [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808903 (https://phabricator.wikimedia.org/T307869) (owner: 10DCausse) [15:09:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:40] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1148 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30457 and previous config saved to /var/cache/conftool/dbconfig/20220627-150940-root.json [15:09:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:47] (I assume the config change doesn’t need to wait for the WikibaseCirrusSearch backport) [15:09:49] (03CR) 10Jbond: [V: 03+1 C: 03+2] P:puppet::agent: move WMF customisations to profile [puppet] - 10https://gerrit.wikimedia.org/r/808919 (owner: 10Jbond) [15:09:56] Lucas_WMDE: no it can go now [15:10:09] alright thanks [15:10:17] (Gerrit also reminds me that https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/801792 is still open) [15:10:21] (03Merged) 10jenkins-bot: Do not set 
wgWBCSLanguageSelectorRescoreProfile twice [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808903 (https://phabricator.wikimedia.org/T307869) (owner: 10DCausse) [15:10:42] Lucas_WMDE: thanks for the help, and sorry for the mess! [15:10:52] no problem, thanks for being so quick to look into it! [15:11:05] RECOVERY - Check systemd state on ms-fe2012 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:11:23] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T298565)', diff saved to https://phabricator.wikimedia.org/P30458 and previous config saved to /var/cache/conftool/dbconfig/20220627-151123-ladsgroup.json [15:11:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:11:28] T298565: Fix mismatching field type of user table for columns user_email_authenticated, user_email_token, user_email_token_expires, user_newpass_time, user_registration, user_token, user_touched on wmf wikis - https://phabricator.wikimedia.org/T298565 [15:12:47] (03PS1) 10Elukey: Release 1.1.0-2 [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808920 (https://phabricator.wikimedia.org/T310980) [15:13:47] (03PS2) 10Elukey: Release 1.1.0-2 [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808920 (https://phabricator.wikimedia.org/T310980) [15:15:13] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/SearchSettingsForWikidata.php: Config: [[gerrit:808903|Do not set wgWBCSLanguageSelectorRescoreProfile twice (T307869)]] (duration: 03m 41s) [15:15:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:15:19] T307869: Request for new search profile for Wikidata that boosts Items for languages - https://phabricator.wikimedia.org/T307869 [15:16:06] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply [15:16:09] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:17:02] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [15:17:03] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply [15:17:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:17:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:18:37] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1173.eqiad.wmnet with reason: Maintenance [15:18:39] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1173.eqiad.wmnet with reason: Maintenance [15:18:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:18:44] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1173 (T298555)', diff saved to https://phabricator.wikimedia.org/P30460 and previous config saved to /var/cache/conftool/dbconfig/20220627-151843-ladsgroup.json [15:18:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:18:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:18:50] T298555: Fix mismatching field type of logging.log_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298555 [15:19:03] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [15:19:30] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [15:19:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:19:43] [7f2d658c-f87d-4b57-bca1-ad7dcc1d996e] 2022-06-27 15:19:03: Fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError" [15:19:49] (03CR) 10Ahmon Dancy: P:mediawiki::scap_client: add parameter to indicate scap master (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/807510 
(https://phabricator.wikimedia.org/T310740) (owner: 10Jbond) [15:20:09] [bbfae9e1-b141-46f0-82e7-3b30f3eae28a] 2022-06-27 15:19:59: Fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError" [15:21:15] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 48248 bytes in 0.062 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [15:22:29] checking, is it ongoing, Cyberpower678? [15:22:37] [ec063c27-4653-4205-8c8c-3b4cf6e7d37f] 2022-06-27 15:22:26: Fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError" [15:23:35] [8484df27-0ffd-4926-a325-ee9e561984f0] 2022-06-27 15:23:28: Fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError" [15:24:25] which dc are you connecting to, Cyberpower678? I cannot replicate [15:24:37] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30461 and previous config saved to /var/cache/conftool/dbconfig/20220627-152436-root.json [15:24:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:24:42] What's a DC? [15:24:44] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1148 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30462 and previous config saved to /var/cache/conftool/dbconfig/20220627-152443-root.json [15:24:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:24:54] Cyberpower678: which continent are you connecting from? 
[15:24:55] [2d470c60-e26d-4d9f-bc69-49016e8c89a1] 2022-06-27 15:24:14: Fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError" [15:25:05] USA [15:25:06] (03CR) 10Andrew Bogott: [C: 03+2] "For my own reference: horizon-based hiera suggests that the active host is currently defined as tools-sgecron-01.tools.eqiad.wmflabs" [puppet] - 10https://gerrit.wikimedia.org/r/807194 (https://phabricator.wikimedia.org/T284767) (owner: 10Majavah) [15:25:15] 10SRE, 10ops-eqiad, 10DC-Ops: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10Cmjohnson) [15:25:54] and which wiki/url? [15:25:58] (03CR) 10Ayounsi: smokeping: remove asw/pfw, moved to Prometheus (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/808914 (https://phabricator.wikimedia.org/T169860) (owner: 10Filippo Giunchedi) [15:26:04] https://en.wikipedia.org/w/index.php?title=Special:Contributions&offset=20191119170134&target=InternetArchiveBot&namespace=2&tagfilter=&start=&end= [15:26:15] thanks [15:26:16] [15f06c65-c222-4266-a065-819748a484a6] 2022-06-27 15:25:53: Fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError" [15:26:16] that helps [15:26:23] I don't see generalized errors [15:26:35] but could be affecting you specifically for some reason [15:26:42] (03Merged) 10jenkins-bot: Use WBCS config when registering language selector profile [extensions/WikibaseCirrusSearch] (wmf/1.39.0-wmf.17) - 10https://gerrit.wikimedia.org/r/808445 (https://phabricator.wikimedia.org/T307869) (owner: 10Lucas Werkmeister (WMDE)) [15:26:50] (03PS1) 10Hnowlan: similar-users: make max queries per account configurable [deployment-charts] - 10https://gerrit.wikimedia.org/r/808923 (https://phabricator.wikimedia.org/T310646) [15:27:14] well, I imagine InternetArchiveBot has tons of contributions, most of them not in ns2, so that’s a tough recentchanges query [15:27:21] yeah, it may be affecting rc only [15:27:26] [12bca1e7-6b10-444f-81ac-4f1c1c5328b1] 2022-06-27 15:27:05: Fatal 
exception of type "Wikimedia\Rdbms\DBQueryTimeoutError" [15:28:39] So how should I get this data? [15:29:05] dcausse: I’m syncing the Hooks.php backport now [15:29:17] (03PS1) 10Cmjohnson: Adding dse-k8-100[7-8] to netboot and site.pp [puppet] - 10https://gerrit.wikimedia.org/r/808924 (https://phabricator.wikimedia.org/T307400) [15:29:29] and I think afterwards it would be a good idea to scap pull on mwdebug1001 to make sure it’s in a known state [15:29:32] if that’s okay with you [15:29:35] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply [15:29:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:29:37] Lucas_WMDE: ok thanks! I'll send another fix soon [15:29:43] jan_drewniak: my scap will run slightly into your window, sorry about that [15:29:54] it’s restarting php-fpm now [15:30:04] jan_drewniak: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220627T1530). [15:30:19] dcausse: sounds good 👍 [15:31:05] Is my issue being looked at, or are things currently being dealt with? [15:31:08] (03CR) 10Hashar: [C: 04-1] "I gave it a try locally under Bullseye and systemd 247.3-7 with the production current file and the Debian package + the override:" [puppet] - 10https://gerrit.wikimedia.org/r/808900 (https://phabricator.wikimedia.org/T308637) (owner: 10Hashar) [15:31:28] Cyberpower678: I am trying to help you [15:31:28] If the latter, any other way to get to the bot's contribs? [15:31:34] jynus: :-) [15:31:43] but first I must understand better why it is happening [15:32:00] Indeed. 
[15:32:15] (03PS4) 10Hnowlan: restbase-dev: change role of new hosts [puppet] - 10https://gerrit.wikimedia.org/r/766082 (https://phabricator.wikimedia.org/T295375) [15:32:17] !log lucaswerkmeister-wmde@deploy1002 Synchronized php-1.39.0-wmf.17/extensions/WikibaseCirrusSearch/src/Hooks.php: Backport: [[gerrit:808445|Use WBCS config when registering language selector profile (T307869)]] (duration: 03m 38s) [15:32:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:32:23] T307869: Request for new search profile for Wikidata that boosts Items for languages - https://phabricator.wikimedia.org/T307869 [15:32:26] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [15:32:27] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply [15:32:27] jan_drewniak: my scap finished, sorry about that [15:32:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:32:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:32:34] go ahead with the portals update :) [15:33:20] jynus: let me know if you need me to do something [15:33:21] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [15:33:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:33:41] I think I may find it [15:33:47] *have found [15:34:10] (03CR) 10Cmjohnson: [C: 03+2] Adding dse-k8-100[7-8] to netboot and site.pp [puppet] - 10https://gerrit.wikimedia.org/r/808924 (https://phabricator.wikimedia.org/T307400) (owner: 10Cmjohnson) [15:35:07] PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes [15:35:38] so some rcs queries are failing due to being too slow [15:35:52] sometimes that is due to changes on the query [15:36:06] or a new feature that makes it slow [15:37:23] 
Cyberpower678: Would you mind opening a ticket with the query, so I can share it with the mw developers better than on IRC? [15:37:31] sorry, with the url querying [15:37:40] I will add the query your request does [15:37:46] and we can have a look [15:38:46] jynus: https://phabricator.wikimedia.org/T311425 [15:38:59] thanks, the ids help us locate the right query [15:39:29] if it is the bot, because it reads a much larger number of rows, you may be the first to suffer from it [15:39:41] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30463 and previous config saved to /var/cache/conftool/dbconfig/20220627-153940-root.json [15:39:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:39:48] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1148 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30464 and previous config saved to /var/cache/conftool/dbconfig/20220627-153947-root.json [15:39:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:40:06] I may be able to give you a more efficient query that could be equivalent also [15:40:06] jynus: that would be surprising since there are bots with FAR more edits than IABot [15:40:12] ClueBot NG comes to mind [15:40:26] yeah, but probably nobody queries their contributions :-D [15:40:40] lol [15:40:45] also recentchanges is always changing, adding more filtering options [15:40:55] and sometimes some of those make querying slower [15:41:02] You mean you don't go perusing ClueBot's contributions for giggles? [15:41:06] let me see if I can give you an alternative [15:41:11] (I must be weird lol) [15:42:04] namespace 2 is user pages, right? 
[15:42:11] Yes [15:42:15] (03PS2) 10Jbond: puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 [15:43:37] (03CR) 10Andrew Bogott: [C: 03+2] P:toolforge::grid::cronrunner: sync crontabs between hosts [puppet] - 10https://gerrit.wikimedia.org/r/805848 (https://phabricator.wikimedia.org/T284767) (owner: 10Majavah) [15:43:49] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1005.mgmt.eqiad.wmnet with reboot policy FORCED [15:43:49] (03PS9) 10Andrew Bogott: P:toolforge::grid::cronrunner: sync crontabs between hosts [puppet] - 10https://gerrit.wikimedia.org/r/805848 (https://phabricator.wikimedia.org/T284767) (owner: 10Majavah) [15:43:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:44:03] so not a fix, but a temporary workaround- without the namespace filter, the query seems to be much faster [15:44:14] could that be done and then filter by namespace on client [15:44:24] OR perform a custom query from quarry? 
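Editorial sketch of the first workaround mentioned above (fetch the contributions without the namespace filter, then filter by namespace on the client). It is only an illustration, not something anyone in the channel ran. It assumes each contribution is a dict with an integer "ns" field, as in MediaWiki Action API `list=usercontribs` results; the chat itself was about the Special:Contributions page, and the function name and sample rows here are hypothetical.

```python
from typing import Iterable, Iterator

USER_NAMESPACE = 2  # namespace 2 is user pages, as confirmed in the chat


def filter_by_namespace(contribs: Iterable[dict],
                        namespace: int = USER_NAMESPACE) -> Iterator[dict]:
    """Yield only the contributions whose "ns" field equals `namespace`.

    `contribs` is assumed to have been fetched WITHOUT a server-side
    namespace filter, so the slow part of the query is avoided and the
    filtering happens here instead.
    """
    for contrib in contribs:
        if contrib.get("ns") == namespace:
            yield contrib


# Made-up sample rows for illustration only, not real edits.
sample = [
    {"ns": 0, "title": "Example article", "timestamp": "2019-11-19T17:01:34Z"},
    {"ns": 2, "title": "User:Example/sandbox", "timestamp": "2019-11-19T16:58:02Z"},
    {"ns": 3, "title": "User talk:Example", "timestamp": "2019-11-19T16:40:11Z"},
]

user_page_edits = list(filter_by_namespace(sample))
```

The trade-off is that the client has to page through every contribution, including the large majority outside namespace 2, so this only helps while the unfiltered query stays fast.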
[15:44:39] (where query limits are much more generous) [15:45:07] that would be a fast suggestion before looking more in depth into T311425 [15:45:07] T311425: DB Timeout doing a simple contributions search - https://phabricator.wikimedia.org/T311425 [15:45:40] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye [15:45:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:45:48] 10SRE, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye [15:47:25] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T298555)', diff saved to https://phabricator.wikimedia.org/P30467 and previous config saved to /var/cache/conftool/dbconfig/20220627-154724-ladsgroup.json [15:47:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:47:29] T298555: Fix mismatching field type of logging.log_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298555 [15:49:56] Cyberpower678: I will ping a few people on the ticket, but please be patient- normally those fixes are complex, because they fix your query and break 10 others [15:50:09] see if the other workarounds work for you temporarily [15:51:46] as you can see the generated query is not that simple :-( [15:51:49] jynus: I'm actually looking for an old testcase I ran the bot on in the userspace. I was manually looking at each edit before I hit that issue. [15:52:12] it is possible that user pages are less cached so only happens there [15:52:21] needs more research [15:52:22] (03CR) 10Eevans: "What does the changeset message mean where it says "Don't add them to any hiera configurations...", isn't this doing that?" 
[puppet] - 10https://gerrit.wikimedia.org/r/766082 (https://phabricator.wikimedia.org/T295375) (owner: 10Hnowlan) [15:52:34] sorry I need to run into a meeting, will update with what I find later [15:52:35] (03PS2) 10Hashar: jenkins: use upstream systemd definition [puppet] - 10https://gerrit.wikimedia.org/r/808900 (https://phabricator.wikimedia.org/T308637) [15:53:05] Holy crap [15:53:09] hm is logstash having a *moment* or does it just not like chromebooks 🤔 [15:53:16] Why is it that complex? :O [15:53:31] ores and other stuff that comes with it :-D [15:53:39] (03CR) 10CI reject: [V: 04-1] jenkins: use upstream systemd definition [puppet] - 10https://gerrit.wikimedia.org/r/808900 (https://phabricator.wikimedia.org/T308637) (owner: 10Hashar) [15:53:44] Remind me what ORES is? [15:54:04] it is the many things that can be used for filtering contributions/rc queries [15:54:04] Cyberpower678: https://www.mediawiki.org/wiki/ORES [15:54:14] but it is not the fault here [15:54:28] sorry, need to really run, will talk to you later on ticket [15:54:33] check quarry, that could help [15:54:43] (03PS3) 10Jbond: puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 [15:54:45] (03PS1) 10Jbond: C:puppet: move files and templates to correct module [puppet] - 10https://gerrit.wikimedia.org/r/808932 [15:54:45] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30468 and previous config saved to /var/cache/conftool/dbconfig/20220627-155444-root.json [15:54:49] !log cmjohnson@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye [15:54:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:54:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:54:55] 10SRE, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: Q4: rack/setup/install 
dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye executed with error... [15:54:58] ORES might actually be useful for my future plans with IABot 3 [15:55:54] Cyberpower678: I managed to load https://en.wikipedia.org/w/index.php?title=Special:Contributions&offset=20191119170134&target=InternetArchiveBot&namespace=2&tagfilter=&start=&end= , I guess the caches warmed up [15:56:28] Cyberpower678: hi, if you have more questions feel free to join #wikimedia-ml. ORES is on the path of deprecation in favor of Lift Wing, another solution that is k8s based [15:56:44] elukey cool [15:57:19] and good to know. I'm experimenting with the idea of using ML to accurately convert freeform references into machine readable citation templates. [15:57:27] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36056/console" [puppet] - 10https://gerrit.wikimedia.org/r/808877 (owner: 10Jbond) [15:57:53] Nemo_bis: [6f149ba9-6aa4-49a6-af35-c59a7c1cac6a] 2022-06-27 15:56:34: Fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError" :-( [15:58:06] My cache is obviously out of coffee or out to lunch. [15:59:12] I do wonder what happens if the statement time were to be increased by 5 seconds though. [16:00:20] 10SRE, 10Traffic, 10Patch-For-Review, 10SRE Observability (FY2021/2022-Q4), 10User-fgiunchedi: Migrate Traffic Prometheus alerts from Icinga to Alertmanager - https://phabricator.wikimedia.org/T300723 (10BCornwall) Spoke with @Vgutierrez on IRC and they confirmed that the mmap maximum is worth monitoring... 
[16:00:31] (03PS4) 10Jbond: puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 [16:00:58] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye [16:01:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:01:05] 10SRE, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye [16:02:02] (03PS2) 10Jbond: C:puppet: move files and templates to correct module [puppet] - 10https://gerrit.wikimedia.org/r/808932 [16:02:30] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30469 and previous config saved to /var/cache/conftool/dbconfig/20220627-160229-ladsgroup.json [16:02:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:03:22] (03CR) 10Eevans: restbase-dev: change role of new hosts (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/766082 (https://phabricator.wikimedia.org/T295375) (owner: 10Hnowlan) [16:03:24] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36057/console" [puppet] - 10https://gerrit.wikimedia.org/r/808932 (owner: 10Jbond) [16:03:33] (03PS5) 10Jbond: puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 [16:04:45] TheresNoTime: I’m also seeing a few timeouts from logstash [16:04:50] (03CR) 10Jbond: [V: 03+1 C: 03+2] C:puppet: move files and templates to correct module [puppet] - 10https://gerrit.wikimedia.org/r/808932 (owner: 10Jbond) [16:05:36] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye [16:05:40] Logged the message 
at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:05:43] 10SRE, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye [16:05:50] Lucas_WMDE: yeah, fairly intermittent? [16:06:10] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye [16:06:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:06:17] 10SRE, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye [16:06:58] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye [16:07:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:05] 10SRE, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye [16:07:06] yeah, and if it responds it doesn’t feel slower than normal, I think… [16:07:21] !log cmjohnson@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye [16:07:24] !log cmjohnson@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye [16:07:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:28] 10SRE, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: Q4: rack/setup/install 
dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye executed with error... [16:07:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:28] !log cmjohnson@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye [16:07:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:33] 10SRE, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye executed with error... [16:07:34] !log cmjohnson@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye [16:07:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:46] 10SRE, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye executed with error... [16:07:57] 10SRE, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye executed with error... 
[16:09:05] (03PS1) 10Jbond: Revert "C:puppet: move files and templates to correct module" [puppet] - 10https://gerrit.wikimedia.org/r/808946 [16:11:01] PROBLEM - Widespread puppet agent failures on alert1001 is CRITICAL: 0.1202 ge 0.01 https://puppetboard.wikimedia.org/nodes?status=failed https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [16:11:06] (03PS1) 10Jbond: P:puppet::agent: correct source paths [puppet] - 10https://gerrit.wikimedia.org/r/808935 [16:11:17] I'm fixing the widespread issue now [16:11:46] tx [16:11:47] thx [16:11:51] (03CR) 10CI reject: [V: 04-1] Revert "C:puppet: move files and templates to correct module" [puppet] - 10https://gerrit.wikimedia.org/r/808946 (owner: 10Jbond) [16:12:07] (03CR) 10Jbond: [V: 03+2 C: 03+2] P:puppet::agent: correct source paths [puppet] - 10https://gerrit.wikimedia.org/r/808935 (owner: 10Jbond) [16:12:16] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye [16:12:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:12:22] 10SRE, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye [16:12:38] (03PS3) 10Hashar: jenkins: use upstream systemd definition [puppet] - 10https://gerrit.wikimedia.org/r/808900 (https://phabricator.wikimedia.org/T308637) [16:12:51] (03CR) 10Hashar: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/808900 (https://phabricator.wikimedia.org/T308637) (owner: 10Hashar) [16:13:59] (03Abandoned) 10Jbond: Revert "C:puppet: move files and templates to correct module" [puppet] - 10https://gerrit.wikimedia.org/r/808946 (owner: 10Jbond) [16:14:18] (03PS5) 10Hnowlan: restbase-dev: create new codfw cluster, replace old eqiad cluster [puppet] - 
10https://gerrit.wikimedia.org/r/766082 (https://phabricator.wikimedia.org/T295375) [16:14:22] (03CR) 10Eevans: [C: 03+1] Release 1.1.0-2 [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808920 (https://phabricator.wikimedia.org/T310980) (owner: 10Elukey) [16:14:39] (03CR) 10Hnowlan: restbase-dev: create new codfw cluster, replace old eqiad cluster (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/766082 (https://phabricator.wikimedia.org/T295375) (owner: 10Hnowlan) [16:14:51] (03CR) 10Elukey: [C: 03+2] Release 1.1.0-2 [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808920 (https://phabricator.wikimedia.org/T310980) (owner: 10Elukey) [16:16:49] (03CR) 10Eevans: [C: 03+1] restbase-dev: create new codfw cluster, replace old eqiad cluster [puppet] - 10https://gerrit.wikimedia.org/r/766082 (https://phabricator.wikimedia.org/T295375) (owner: 10Hnowlan) [16:16:55] (03PS1) 10Eigyan: [beta]: Remove GDI quick survey from EN,ES wikis - BETA [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808936 (https://phabricator.wikimedia.org/T311429) [16:17:35] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P30470 and previous config saved to /var/cache/conftool/dbconfig/20220627-161734-ladsgroup.json [16:17:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:19:13] (03PS1) 10Elukey: Revert README changes [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808937 [16:19:59] (03CR) 10Hashar: jenkins: use upstream systemd definition (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/808900 (https://phabricator.wikimedia.org/T308637) (owner: 10Hashar) [16:22:23] (03CR) 10Elukey: [V: 03+2 C: 03+2] Revert README changes [debs/cassandra-tools-wmf] (debian) - 10https://gerrit.wikimedia.org/r/808937 (owner: 10Elukey) [16:22:46] dcausse: looks like extension.json also declares 
LanguageSelectorStatementBoost, not -Boosts [16:23:07] so I would lean towards keeping it as Boost, and only changing Hooks.php to use that instead of -Boosts [16:23:16] the non-language version also seems to be called Boost singular [16:23:25] Lucas_WMDE: ok makes sense I'll do that [16:23:36] alright [16:25:17] RECOVERY - Widespread puppet agent failures on alert1001 is OK: (C)0.01 ge (W)0.006 ge 0.00303 https://puppetboard.wikimedia.org/nodes?status=failed https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [16:25:29] !log upload cassandra-tools-wmf 1.1.0-2 (py3 version) to bullseye-wikimedia - T310980 [16:25:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:25:36] T310980: Allow Cassandra to be deployed on Bullseye nodes - https://phabricator.wikimedia.org/T310980 [16:26:11] 10SRE, 10Cassandra, 10Patch-For-Review: Allow Cassandra to be deployed on Bullseye nodes - https://phabricator.wikimedia.org/T310980 (10elukey) All packages should now be present on bullseye-wikimedia, I'll test on ml-cache that everything works and report back. [16:28:02] !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye [16:28:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:28:10] 10SRE, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye executed with error... 
[16:30:03] (03PS6) 10Jbond: puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 [16:30:21] (03PS7) 10Jbond: puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 [16:30:58] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye [16:31:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:31:05] 10SRE, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye [16:31:17] (03PS8) 10Jbond: puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 [16:31:27] (03PS9) 10Jbond: puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 [16:32:30] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye [16:32:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:36] 10SRE, 10ops-eqiad, 10DC-Ops: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye [16:32:40] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T298555)', diff saved to https://phabricator.wikimedia.org/P30471 and previous config saved to /var/cache/conftool/dbconfig/20220627-163239-ladsgroup.json [16:32:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:45] T298555: Fix mismatching field type of logging.log_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298555 [16:32:53] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host 
dse-k8s-worker1006.eqiad.wmnet with OS bullseye [16:32:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:59] 10SRE, 10ops-eqiad, 10DC-Ops: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye [16:33:48] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye [16:33:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:33:52] 10SRE, 10ops-eqiad, 10DC-Ops: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye [16:33:58] (03CR) 10Jbond: [C: 03+1] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/808896 (owner: 10Volans) [16:34:36] (03CR) 10Slyngshede: [C: 03+1] "Looks good." 
[puppet] - 10https://gerrit.wikimedia.org/r/808890 (https://phabricator.wikimedia.org/T273673) (owner: 10ArielGlenn) [16:36:35] (03CR) 10Jbond: [C: 03+1] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/808897 (owner: 10Volans) [16:37:35] (03CR) 10Jbond: [C: 04-1] "-1 until jamie returns" [puppet] - 10https://gerrit.wikimedia.org/r/807510 (https://phabricator.wikimedia.org/T310740) (owner: 10Jbond) [16:38:39] (03CR) 10Jbond: puppet: add wrapper command (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/808877 (owner: 10Jbond) [16:42:31] (03PS1) 10DCausse: Increase weights on the language selector statement boosts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808941 (https://phabricator.wikimedia.org/T307869) [16:43:52] (03PS4) 10Elukey: Add configuration for the ml-cache codfw Cassandra cluster [puppet] - 10https://gerrit.wikimedia.org/r/808907 (https://phabricator.wikimedia.org/T302232) [16:44:20] (03CR) 10Elukey: Add configuration for the ml-cache codfw Cassandra cluster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/808907 (https://phabricator.wikimedia.org/T302232) (owner: 10Elukey) [16:44:24] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1008.eqiad.wmnet with reason: host reimage [16:44:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:45:57] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage [16:46:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:46:13] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] "sounds good (I guess the 0.1 factor is the one in ElasticSearchRescoreFunctions.php)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808941 (https://phabricator.wikimedia.org/T307869) (owner: 10DCausse) [16:46:22] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1006.eqiad.wmnet with 
reason: host reimage [16:46:25] (03CR) 10Volans: "[note] This would require also changes in spicerack and any other script that might interact with Puppet." [puppet] - 10https://gerrit.wikimedia.org/r/808877 (owner: 10Jbond) [16:46:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:47:17] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1005.eqiad.wmnet with reason: host reimage [16:47:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:47:50] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1008.eqiad.wmnet with reason: host reimage [16:47:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:50:24] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage [16:50:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:51:03] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [16:51:55] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1006.eqiad.wmnet with reason: host reimage [16:51:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:52:01] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.289 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [16:52:24] (03CR) 10Jbond: puppet: add wrapper command (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/808877 (owner: 10Jbond) [16:54:16] (03CR) 10Volans: puppet: add wrapper command (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/808877 (owner: 10Jbond) [16:54:30] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on dse-k8s-worker1005.eqiad.wmnet with reason: host reimage [16:54:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:57:08] (03CR) 10Jbond: puppet: add wrapper command (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/808877 (owner: 10Jbond) [16:57:47] (03CR) 10Volans: [C: 03+2] icinga: add test to improve test coverage [software/spicerack] - 10https://gerrit.wikimedia.org/r/808896 (owner: 10Volans) [17:00:04] ryankemper: May I have your attention please! Wikidata Query Service weekly deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220627T1700) [17:07:16] (03Merged) 10jenkins-bot: icinga: add test to improve test coverage [software/spicerack] - 10https://gerrit.wikimedia.org/r/808896 (owner: 10Volans) [17:10:03] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye [17:10:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:10:08] 10SRE, 10ops-eqiad, 10DC-Ops: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye completed: - dse-k8s-worker1007 (**PASS**... 
[17:12:18] (03PS1) 10JMeybohm: Alert on helm releases in bad state [alerts] - 10https://gerrit.wikimedia.org/r/808968 (https://phabricator.wikimedia.org/T310714) [17:13:23] !log dancy@deploy1002 backport aborted: (duration: 00m 02s) [17:13:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:14:58] !log dancy@deploy1002 backport aborted: (duration: 00m 02s) [17:15:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:17:49] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye [17:17:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:17:53] 10SRE, 10ops-eqiad, 10DC-Ops: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye completed: - dse-k8s-worker1005 (**PASS**... [17:18:40] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye [17:18:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:18:48] 10SRE, 10ops-eqiad, 10DC-Ops: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye completed: - dse-k8s-worker1008 (**PASS**... 
[17:19:08] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye [17:19:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:19:13] 10SRE, 10ops-eqiad, 10DC-Ops: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye completed: - dse-k8s-worker1006 (**PASS**... [17:24:52] PROBLEM - k8s API server requests latencies on kubestagemaster1001 is CRITICAL: instance=10.64.16.203 verb={CREATE,UPDATE} https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27 [17:35:55] 10SRE, 10serviceops: Provide node14 and node16 images for running production node-based services - https://phabricator.wikimedia.org/T306996 (10bd808) >>! In T306996#7914578, @bd808 wrote: >>>! In T306996#7912881, @Joe wrote: >> I've build and published the `nodejs14-slim` and the `nodejs16-slim` images, using... 
[17:39:29] (03CR) 10BryanDavis: [C: 03+1] Provide a nodejs16 image based on Bullseye and Nodesource [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/806266 (https://phabricator.wikimedia.org/T310821) (owner: 10Majavah) [17:43:54] (03PS2) 10DLynch: Sync sampling rates at 9 wikis DiscussionTools is testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/804022 (https://phabricator.wikimedia.org/T309260) [17:53:14] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance [17:53:15] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance [17:53:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:53:20] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1148 (T309311)', diff saved to https://phabricator.wikimedia.org/P30472 and previous config saved to /var/cache/conftool/dbconfig/20220627-175320-ladsgroup.json [17:53:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:53:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:53:26] T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311 [17:57:36] !log aokoth@cumin1001 START - Cookbook sre.hosts.decommission for hosts gitlab2001.wikimedia.org [17:57:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:59:20] 10SRE, 10DNS, 10Traffic, 10WMF-Legal, and 3 others: Setup redirect of policy.wikimedia.org to Advocacy portal on Foundation website - https://phabricator.wikimedia.org/T310738 (10LSobanski) @Varnent could we get a clarification of the timeline for this request? The description says end of this month and yo... 
[18:01:16] !log aokoth@cumin1001 START - Cookbook sre.dns.netbox [18:01:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:02:04] PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes [18:06:10] (03PS1) 10Eigyan: [wmf-config]: Deploy GDI Safety Survey Wave 2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808975 (https://phabricator.wikimedia.org/T311434) [18:10:02] (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1004-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency [18:12:04] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148 (T309311)', diff saved to https://phabricator.wikimedia.org/P30473 and previous config saved to /var/cache/conftool/dbconfig/20220627-181204-ladsgroup.json [18:14:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:14:02] T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311 [18:15:12] RECOVERY - k8s API server requests latencies on kubestagemaster1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27 [18:15:43] (03PS1) 10PipelineBot: mathoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/808979 [18:20:04] !log aokoth@cumin1001 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) [18:20:04] !log aokoth@cumin1001 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts gitlab2001.wikimedia.org [18:20:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:20:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:24:03] (03PS10) 10Jbond: puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 [18:25:15] (03CR) 10CI reject: [V: 04-1] puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 (owner: 10Jbond) [18:27:09] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P30474 and previous config saved to /var/cache/conftool/dbconfig/20220627-182709-ladsgroup.json [18:27:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:28:09] 10SRE, 10Traffic, 10SRE Observability (FY2021/2022-Q4): Create vm.max_map_count metrics for Prometheus - https://phabricator.wikimedia.org/T311445 (10BCornwall) [18:31:48] (03PS11) 10Jbond: puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 [18:31:51] (03PS1) 10Jbond: P:puppet::agent: add logging of puppet calls [puppet] - 10https://gerrit.wikimedia.org/r/808984 [18:33:07] (03CR) 10CI reject: [V: 04-1] puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 (owner: 10Jbond) [18:33:17] (03CR) 10CI reject: [V: 04-1] P:puppet::agent: add logging of puppet calls [puppet] - 10https://gerrit.wikimedia.org/r/808984 (owner: 10Jbond) [18:36:08] 10SRE, 10Cassandra, 10Platform Team Workboards (Clinic Duty Team): Revisit default 
settings for c-foreach-restart - https://phabricator.wikimedia.org/T198787 (10Eevans) 05Open→03Declined a:05hnowlan→03Eevans This no longer seems to be a problem, the packaging prepared has succumbed to bit-rot and has... [18:36:40] (03PS2) 10Jbond: P:puppet::agent: add logging of puppet calls [puppet] - 10https://gerrit.wikimedia.org/r/808984 [18:37:02] (03PS12) 10Jbond: puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 [18:37:35] (03CR) 10CI reject: [V: 04-1] P:puppet::agent: add logging of puppet calls [puppet] - 10https://gerrit.wikimedia.org/r/808984 (owner: 10Jbond) [18:38:03] (03CR) 10CI reject: [V: 04-1] puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 (owner: 10Jbond) [18:38:11] (03PS3) 10Jbond: P:puppet::agent: add logging of puppet calls [puppet] - 10https://gerrit.wikimedia.org/r/808984 [18:39:08] (03CR) 10CI reject: [V: 04-1] P:puppet::agent: add logging of puppet calls [puppet] - 10https://gerrit.wikimedia.org/r/808984 (owner: 10Jbond) [18:39:19] (03PS13) 10Jbond: puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 [18:40:18] (03CR) 10CI reject: [V: 04-1] puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 (owner: 10Jbond) [18:42:14] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P30475 and previous config saved to /var/cache/conftool/dbconfig/20220627-184214-ladsgroup.json [18:42:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:43:08] (03PS4) 10Jbond: P:puppet::agent: add logging of puppet calls [puppet] - 10https://gerrit.wikimedia.org/r/808984 [18:44:23] (03CR) 10Volans: P:puppet::agent: add logging of puppet calls (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/808984 (owner: 10Jbond) [18:48:14] (03CR) 10Jbond: P:puppet::agent: add logging of puppet calls (031 comment) [puppet] - 
10https://gerrit.wikimedia.org/r/808984 (owner: 10Jbond) [18:48:24] (03PS14) 10Jbond: puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 [18:50:44] (03CR) 10Eevans: [C: 03+1] Add configuration for the ml-cache codfw Cassandra cluster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/808907 (https://phabricator.wikimedia.org/T302232) (owner: 10Elukey) [18:51:30] (03PS15) 10Jbond: puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 [18:53:07] (03PS16) 10Jbond: puppet: add wrapper command [puppet] - 10https://gerrit.wikimedia.org/r/808877 [18:53:58] (03CR) 10Jbond: puppet: add wrapper command (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/808877 (owner: 10Jbond) [18:57:19] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148 (T309311)', diff saved to https://phabricator.wikimedia.org/P30476 and previous config saved to /var/cache/conftool/dbconfig/20220627-185719-ladsgroup.json [18:57:20] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance [18:57:22] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance [18:57:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:57:25] T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311 [18:57:27] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1147 (T309311)', diff saved to https://phabricator.wikimedia.org/P30477 and previous config saved to /var/cache/conftool/dbconfig/20220627-185727-ladsgroup.json [18:57:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:57:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:57:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:59:50] !log volans@cumin2002 START - 
Cookbook sre.dns.netbox [18:59:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:51] !log volans@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [19:03:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:07:17] (03PS1) 10Clare Ming: Enable sticky header edit test on beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808995 (https://phabricator.wikimedia.org/T310750) [19:10:20] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T309311)', diff saved to https://phabricator.wikimedia.org/P30478 and previous config saved to /var/cache/conftool/dbconfig/20220627-191020-ladsgroup.json [19:10:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:10:26] T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311 [19:11:04] RECOVERY - Uncommitted DNS changes in Netbox on netbox1002 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes [19:25:25] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P30479 and previous config saved to /var/cache/conftool/dbconfig/20220627-192525-ladsgroup.json [19:25:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:29:55] (03CR) 10Andrew Bogott: [C: 03+2] P:openstack::designate: set base_url to use the https port [puppet] - 10https://gerrit.wikimedia.org/r/800948 (https://phabricator.wikimedia.org/T267194) (owner: 10Majavah) [19:30:55] (03CR) 10Jdlrobson: [C: 04-1] Enable sticky header edit test on beta cluster (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808995 (https://phabricator.wikimedia.org/T310750) (owner: 10Clare Ming) [19:34:48] PROBLEM - SSH on wtp1044.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [19:35:54] 10SRE, 10ops-eqsin, 10Traffic: SSH on cp5012.mgmt is flapping (CRITICAL) - https://phabricator.wikimedia.org/T311264 (10RobH) If it is just 'depool' from command line and stop puppet, I can handle so you don't need to take it down in advance of the work, just lemme know! [19:36:06] (03CR) 10Clare Ming: Enable sticky header edit test on beta cluster (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808995 (https://phabricator.wikimedia.org/T310750) (owner: 10Clare Ming) [19:39:35] (03CR) 10Jdlrobson: [C: 03+1] ":)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808995 (https://phabricator.wikimedia.org/T310750) (owner: 10Clare Ming) [19:39:44] (03CR) 10Jdlrobson: [C: 03+1] Enable sticky header edit test on beta cluster (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808995 (https://phabricator.wikimedia.org/T310750) (owner: 10Clare Ming) [19:40:30] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P30480 and previous config saved to /var/cache/conftool/dbconfig/20220627-194030-ladsgroup.json [19:40:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:42:57] 10SRE, 10LDAP-Access-Requests: Grant Access to wmf for Triciaburmeister - https://phabricator.wikimedia.org/T311453 (10TBurmeister) [19:48:56] !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5012.eqsin.wmnet,service=ats-be [19:48:57] !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5012.eqsin.wmnet,service=varnish-fe [19:48:57] !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5012.eqsin.wmnet,service=ats-tls [19:49:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:49:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log 
[19:49:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:50:34] !log sukhe@cumin2002 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp5012.eqsin.wmnet with reason: depooled: flapping mgmt interface: T311264 [19:50:37] !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp5012.eqsin.wmnet with reason: depooled: flapping mgmt interface: T311264 [19:50:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:50:40] T311264: SSH on cp5012.mgmt is flapping (CRITICAL) - https://phabricator.wikimedia.org/T311264 [19:50:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:51:03] 10SRE, 10ops-eqsin, 10Traffic: SSH on cp5012.mgmt is flapping (CRITICAL) - https://phabricator.wikimedia.org/T311264 (10ssingh) >>! In T311264#8030866, @RobH wrote: > If it is just 'depool' from command line+stop puppet+icinga maint mode, I can handle so you don't need to take it down in advance of the work,... 
[19:55:35] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T309311)', diff saved to https://phabricator.wikimedia.org/P30481 and previous config saved to /var/cache/conftool/dbconfig/20220627-195535-ladsgroup.json [19:55:37] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance [19:55:38] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance [19:55:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:55:42] T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311 [19:55:43] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1146:3314 (T309311)', diff saved to https://phabricator.wikimedia.org/P30482 and previous config saved to /var/cache/conftool/dbconfig/20220627-195543-ladsgroup.json [19:55:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:55:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:55:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:05] RoanKattouw, Urbanecm, and cjming: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220627T2000). [20:00:05] kemayo and cjming: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [20:00:13] i can deploy [20:00:38] I am present. 
[20:00:51] kemayo: i'll do yours first [20:01:13] (03CR) 10Clare Ming: [C: 03+2] Sync sampling rates at 9 wikis DiscussionTools is testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/804022 (https://phabricator.wikimedia.org/T309260) (owner: 10DLynch) [20:01:14] 10SRE, 10ops-eqsin, 10Traffic: SSH on cp5012.mgmt is flapping (CRITICAL) - https://phabricator.wikimedia.org/T311264 (10BTullis) Hello, in case it's helpful, I fixed one of these the other day in eqiad by using `ipmitool` to do a cold reset of the BMC. {T311042} [20:03:58] (03Merged) 10jenkins-bot: Sync sampling rates at 9 wikis DiscussionTools is testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/804022 (https://phabricator.wikimedia.org/T309260) (owner: 10DLynch) [20:04:51] kemayo: is yours testable? on mwdebug1002 [20:05:10] One second [20:05:10] otherwise I can go ahead and sync [20:05:40] cjming: Okay, looks good. [20:05:46] great - syncing [20:06:19] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply [20:06:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:06:34] PROBLEM - Check systemd state on stat1005 is CRITICAL: CRITICAL - degraded: The following units failed: jupyter-aarora-singleuser.service,jupyter-seddon-singleuser.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:07:14] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [20:07:15] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply [20:07:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:07:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:07:39] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T309311)', diff saved to https://phabricator.wikimedia.org/P30483 and previous config saved to 
/var/cache/conftool/dbconfig/20220627-200738-ladsgroup.json [20:07:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:07:44] T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311 [20:08:11] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [20:08:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:33] !log cjming@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:804022|Sync sampling rates at 9 wikis DiscussionTools is testing (T309260)]] (duration: 03m 36s) [20:09:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:39] T309260: Make EditAttemptStep sampling rate consistent with MobileWebUIActions and DesktopWebUI - https://phabricator.wikimedia.org/T309260 [20:09:46] kemayo: your change should be live [20:09:52] (03CR) 10Clare Ming: [C: 03+2] Enable sticky header edit test on beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808995 (https://phabricator.wikimedia.org/T310750) (owner: 10Clare Ming) [20:10:03] 10SRE, 10ops-eqsin, 10Traffic: SSH on cp5012.mgmt is flapping (CRITICAL) - https://phabricator.wikimedia.org/T311264 (10RobH) >>! In T311264#8030961, @BTullis wrote: > Hello, in case it's helpful, I fixed one of these the other day in eqiad by using `ipmitool mc reset cold` to do a cold reset of the BMC. > {... [20:10:07] cjming: Thanks! [20:10:14] np! 
[20:11:14] (03Merged) 10jenkins-bot: Enable sticky header edit test on beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808995 (https://phabricator.wikimedia.org/T310750) (owner: 10Clare Ming) [20:13:28] PROBLEM - Check systemd state on stat1005 is CRITICAL: CRITICAL - degraded: The following units failed: jupyter-aarora-singleuser.service,jupyter-seddon-singleuser.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:15:36] !log cjming@deploy1002 Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:808995|Enable sticky header edit test on beta cluster (T310750)]] (duration: 03m 33s) [20:15:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:15:44] T310750: Sticky header edit icons do not show if an A/B test other than a sticky header A/B test is defined - https://phabricator.wikimedia.org/T310750 [20:17:09] closing backport window early - if anyone needs something, feel free to ping me (here or slack) and I can jump back on [20:17:33] !log end of UTC late backport window [20:17:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:20] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply [20:18:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:19:23] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [20:19:24] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply [20:19:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:19:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:20:14] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [20:20:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:20:20] PROBLEM - Check systemd state on stat1005 is CRITICAL: CRITICAL - degraded: The 
following units failed: jupyter-aarora-singleuser.service,jupyter-seddon-singleuser.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:20:26] (03CR) 10Nray: [C: 03+1] Enable sticky header edit test on beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808995 (https://phabricator.wikimedia.org/T310750) (owner: 10Clare Ming) [20:22:44] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P30484 and previous config saved to /var/cache/conftool/dbconfig/20220627-202243-ladsgroup.json [20:22:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:25:52] (03PS1) 10Bartosz Dziewoński: Enable DiscussionTools newtopictool at enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/809010 (https://phabricator.wikimedia.org/T311023) [20:25:54] (03PS1) 10Bartosz Dziewoński: Enable DiscussionTools visualenhancements at mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/809011 (https://phabricator.wikimedia.org/T311456) [20:25:58] (03PS1) 10Bartosz Dziewoński: Enable DiscussionTools on mobile at partner wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/809012 (https://phabricator.wikimedia.org/T298221) [20:28:12] 10SRE, 10ops-eqsin, 10Traffic: SSH on cp5012.mgmt is flapping (CRITICAL) - https://phabricator.wikimedia.org/T311264 (10RobH) a:03ssingh updated idrac from 2.50.x to 2.81.81.81, A00 @ssingh, this should clear up our errors, i can login to the idrac and system is powered back up. once this is back onlin... 
[20:31:51] (03PS3) 10Jdlrobson: Rename `data-ve-target-container` attribute to `data-mw-ve-target-container` [skins/Vector] (wmf/1.39.0-wmf.17) - 10https://gerrit.wikimedia.org/r/808068 (https://phabricator.wikimedia.org/T310197) [20:37:49] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P30485 and previous config saved to /var/cache/conftool/dbconfig/20220627-203748-ladsgroup.json [20:37:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:43:46] 10SRE, 10SRE-Access-Requests, 10Infrastructure-Foundations, 10serviceops, and 3 others: Access to trusted gitlab runners for gitlab-roots (or appropriate similar group) - https://phabricator.wikimedia.org/T308350 (10brennen) 05Resolved→03Open While `contint-roots` members do now have access to `gitlab1... [20:45:12] (03PS3) 10Jdlrobson: Enable title above tabs on group 1 and group 0 wikis (1/2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808056 (https://phabricator.wikimedia.org/T310054) [20:46:02] (03CR) 10CI reject: [V: 04-1] Enable title above tabs on group 1 and group 0 wikis (1/2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/808056 (https://phabricator.wikimedia.org/T310054) (owner: 10Jdlrobson) [20:52:54] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T309311)', diff saved to https://phabricator.wikimedia.org/P30486 and previous config saved to /var/cache/conftool/dbconfig/20220627-205254-ladsgroup.json [20:52:56] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance [20:52:57] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance [20:52:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:53:00] T309311: Make user_editcount unsigned in production - 
https://phabricator.wikimedia.org/T309311 [20:53:02] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1144:3314 (T309311)', diff saved to https://phabricator.wikimedia.org/P30487 and previous config saved to /var/cache/conftool/dbconfig/20220627-205302-ladsgroup.json [20:53:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:53:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:53:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:56:44] RECOVERY - Check systemd state on stat1005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:59:11] Hey all - I'd like to get a security patch out to wmf.17 for T308861, unless there are any objections. [21:00:04] Reedy, sbassett, Maryum, and manfredi: May I have your attention please! Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220627T2100) [21:00:15] (03PS1) 10Dzahn: admin: add gitlab-roots group to gitlab_runner role [puppet] - 10https://gerrit.wikimedia.org/r/809018 (https://phabricator.wikimedia.org/T308350) [21:04:37] 10SRE: pending diff in sre.dns.netbox cookbook - https://phabricator.wikimedia.org/T311446 (10Dzahn) [21:05:50] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T309311)', diff saved to https://phabricator.wikimedia.org/P30488 and previous config saved to /var/cache/conftool/dbconfig/20220627-210550-ladsgroup.json [21:05:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:05:56] T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311 [21:07:25] !log Deployed security patch for T308861 [21:07:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:10:16] (03CR) 10Brennen Bearnes: [C: 03+1] admin: add gitlab-roots group to gitlab_runner role 
[puppet] - 10https://gerrit.wikimedia.org/r/809018 (https://phabricator.wikimedia.org/T308350) (owner: 10Dzahn) [21:10:35] 10SRE: pending diff in sre.dns.netbox cookbook - https://phabricator.wikimedia.org/T311446 (10Volans) 05Open→03Resolved p:05Triage→03Medium a:03Volans This has been committed as `392e48a`. [21:11:10] 10SRE: pending diff in sre.dns.netbox cookbook - https://phabricator.wikimedia.org/T311446 (10Dzahn) 18:40 < volans> the decom cookbook is currently broken for VMs as I mentioned it in the SRE meeting earlier 18:41 < volans> it will be fixed tomorrow at this point, sorry for the trouble 18:59 < elukey> I indeed... [21:20:55] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P30489 and previous config saved to /var/cache/conftool/dbconfig/20220627-212055-ladsgroup.json [21:20:57] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply [21:20:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:21:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:21:49] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [21:21:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:22:10] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply [21:22:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:23:06] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [21:23:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:34:56] !log wikiadmin@10.64.48.109(kowiki)> delete from user_properties where up_property='growthexperiments-homepage-suggestededits-topics-enabled'; # T308309 [21:35:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:35:01] T308309: Delete rows for 
growthexperiments-homepage-suggestededits-topics-enabled - https://phabricator.wikimedia.org/T308309 [21:35:01] !log wikiadmin@10.64.48.109(arwiki)> delete from user_properties where up_property='growthexperiments-homepage-suggestededits-topics-enabled'; # T308309 [21:35:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:35:55] !log wikiadmin@10.64.48.109(viwiki)> delete from user_properties where up_property='growthexperiments-homepage-suggestededits-topics-enabled'; # T308309 [21:35:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:36:00] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P30490 and previous config saved to /var/cache/conftool/dbconfig/20220627-213600-ladsgroup.json [21:36:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:36:06] !log wikiadmin@10.64.16.184(cswiki)> delete from user_properties where up_property='growthexperiments-homepage-suggestededits-topics-enabled'; # T308309 [21:36:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:43:32] 10SRE, 10ops-codfw, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations: Management interface SSH icinga alerts - https://phabricator.wikimedia.org/T304289 (10BTullis) I don't know whether this approach has already been tried, but in case it helps I can share that I've had some success resolving this ale... [21:45:28] 10SRE, 10ops-eqsin, 10Traffic: SSH on cp5012.mgmt is flapping (CRITICAL) - https://phabricator.wikimedia.org/T311264 (10BTullis) I've happened upon this tracking ticket where many similar SSH related mgmt checks are mentioned: {T304289} Mentioning it here for cross-referencing purposes. 
[21:47:25] 10SRE, 10Traffic-Icebox: Puppetize LVS interface IP sets per-DC for easy use in ferm rules - https://phabricator.wikimedia.org/T179027 (10Aklapper) [21:47:33] 10SRE, 10Traffic-Icebox: Puppetize LVS interface IP sets per-DC for easy use in ferm rules - https://phabricator.wikimedia.org/T179027 (10Aklapper) [21:51:05] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T309311)', diff saved to https://phabricator.wikimedia.org/P30491 and previous config saved to /var/cache/conftool/dbconfig/20220627-215105-ladsgroup.json [21:51:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:51:12] T309311: Make user_editcount unsigned in production - https://phabricator.wikimedia.org/T309311 [21:51:21] 10SRE, 10SRE-Access-Requests, 10Infrastructure-Foundations, 10serviceops, and 5 others: Access to trusted gitlab runners for gitlab-roots (or appropriate similar group) - https://phabricator.wikimedia.org/T308350 (10Dzahn) [21:57:08] jouncebot: nowandnext [21:57:08] For the next 1 hour(s) and 2 minute(s): Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220627T2100) [21:57:08] In 3 hour(s) and 2 minute(s): Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220628T0100) [21:57:24] 10SRE, 10Gerrit, 10serviceops, 10serviceops-collab: replacement for gerrit2001 - https://phabricator.wikimedia.org/T243027 (10Dzahn) [22:02:53] (03CR) 10Urbanecm: [C: 03+1] GrowthExperiments: Remove unused GEHomepageSuggestedEditsRequiresOptIn [mediawiki-config] - 10https://gerrit.wikimedia.org/r/791302 (https://phabricator.wikimedia.org/T308208) (owner: 10Kosta Harlan) [22:02:57] (03CR) 10Urbanecm: [C: 03+1] GrowthExperiments: Remove GEHomepageSuggestedEditsTopicsRequiresOptIn [mediawiki-config] - 10https://gerrit.wikimedia.org/r/791303 
(https://phabricator.wikimedia.org/T308209) (owner: 10Kosta Harlan) [22:03:49] 10SRE, 10ops-codfw, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations: Management interface SSH icinga alerts - https://phabricator.wikimedia.org/T304289 (10Dzahn) >>! In T304289#8031394, @BTullis wrote: > I installed the `ipmitool` package on the host and executed `sudo ipmitool mc reset cold` locally... [22:03:58] 10SRE, 10ops-eqiad, 10DC-Ops: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10Cmjohnson) [22:06:02] 10SRE, 10ops-eqiad, 10DC-Ops: Q4: rack/setup/install dse-k8s-worker100[5-8] - https://phabricator.wikimedia.org/T307400 (10Cmjohnson) 05Open→03Resolved These are installed, took a few extra steps, there is a raid card, and the disk has to be changed to no-raid, which triggers the bios to set network boot... [22:09:15] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q4: (Need By: TBD) rack/setup/install 6 wmcs hosts - https://phabricator.wikimedia.org/T304888 (10Cmjohnson) @dcaro Do you know which netboot configuration you need? The setup instructions above just say software raid 10. 
[22:10:02] (CirrusSearchHighOldGCFrequency) firing: (2) Elasticsearch instance cloudelastic1004-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency [22:27:17] PROBLEM - Check systemd state on netbox1002 is CRITICAL: CRITICAL - degraded: The following units failed: check_netbox_uncommitted_dns_changes.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [22:29:37] RECOVERY - Check systemd state on netbox1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [22:30:07] !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox [22:30:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:35:05] PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: An error occurred checking if Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes [22:36:11] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [22:36:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:38:16] 10SRE, 10ops-codfw, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations: Management interface SSH icinga alerts - https://phabricator.wikimedia.org/T304289 (10Volans) Also freeipmi is installed fleetwide [22:41:25] RECOVERY - Uncommitted DNS changes in Netbox on netbox1002 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes [23:01:52] (03PS1) 10Brennen Bearnes: Add dotfiles for brennen [puppet] - 10https://gerrit.wikimedia.org/r/809037 [23:02:57] (03CR) 10CI reject: [V: 04-1] Add dotfiles for brennen [puppet] - 10https://gerrit.wikimedia.org/r/809037 (owner: 10Brennen Bearnes) [23:12:51] 
(03PS1) 10BCornwall: prometheus: Add custom vm.max_map_count metric [puppet] - 10https://gerrit.wikimedia.org/r/809038 (https://phabricator.wikimedia.org/T311445) [23:14:39] PROBLEM - SSH on wtp1044.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [23:18:31] !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox [23:18:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:22:25] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [23:22:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:19] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1458.mgmt.eqiad.wmnet with reboot policy FORCED [23:36:19] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1466.mgmt.eqiad.wmnet with reboot policy FORCED [23:36:19] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1460.mgmt.eqiad.wmnet with reboot policy FORCED [23:36:19] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1459.mgmt.eqiad.wmnet with reboot policy FORCED [23:36:19] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1463.mgmt.eqiad.wmnet with reboot policy FORCED [23:36:20] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1462.mgmt.eqiad.wmnet with reboot policy FORCED [23:36:20] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1457.mgmt.eqiad.wmnet with reboot policy FORCED [23:36:21] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1464.mgmt.eqiad.wmnet with reboot policy FORCED [23:36:21] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1465.mgmt.eqiad.wmnet with reboot policy FORCED [23:36:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:28] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:49:10] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw1463.mgmt.eqiad.wmnet with reboot policy FORCED [23:49:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:49:13] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw1466.mgmt.eqiad.wmnet with reboot policy FORCED [23:49:16] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw1462.mgmt.eqiad.wmnet with reboot policy FORCED [23:49:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:49:17] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw1459.mgmt.eqiad.wmnet with reboot policy FORCED [23:49:17] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw1458.mgmt.eqiad.wmnet with reboot policy FORCED [23:49:17] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw1465.mgmt.eqiad.wmnet with reboot policy FORCED [23:49:17] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw1460.mgmt.eqiad.wmnet with reboot policy FORCED [23:49:17] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host 
mw1464.mgmt.eqiad.wmnet with reboot policy FORCED [23:49:18] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw1461.mgmt.eqiad.wmnet with reboot policy FORCED [23:49:19] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw1457.mgmt.eqiad.wmnet with reboot policy FORCED [23:49:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:49:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:49:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:49:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:49:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:49:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:49:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:49:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:51:08] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1473.mgmt.eqiad.wmnet with reboot policy FORCED [23:51:08] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1467.mgmt.eqiad.wmnet with reboot policy FORCED [23:51:08] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1475.mgmt.eqiad.wmnet with reboot policy FORCED [23:51:08] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1469.mgmt.eqiad.wmnet with reboot policy FORCED [23:51:08] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1471.mgmt.eqiad.wmnet with reboot policy FORCED [23:51:09] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1472.mgmt.eqiad.wmnet with reboot policy FORCED [23:51:09] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1476.mgmt.eqiad.wmnet with reboot policy FORCED [23:51:10] !log cmjohnson@cumin1001 
START - Cookbook sre.hosts.provision for host mw1468.mgmt.eqiad.wmnet with reboot policy FORCED [23:51:10] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1470.mgmt.eqiad.wmnet with reboot policy FORCED [23:51:11] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.provision for host mw1474.mgmt.eqiad.wmnet with reboot policy FORCED [23:51:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:51:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:51:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:51:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:51:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:51:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:51:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:51:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:51:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:51:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log