[00:00:24] (CR) Papaul: [C: +2] Rename ceph to cephosd [puppet] - https://gerrit.wikimedia.org/r/981413 (https://phabricator.wikimedia.org/T349934) (owner: Papaul)
[00:00:30] !log jclark@cumin1001 START - Cookbook sre.hosts.reimage for host ganeti1035.eqiad.wmnet with OS bullseye
[00:00:31] !log jclark@cumin1001 START - Cookbook sre.hosts.reimage for host ganeti1036.eqiad.wmnet with OS bullseye
[00:00:33] !log jclark@cumin1001 START - Cookbook sre.hosts.reimage for host ganeti1037.eqiad.wmnet with OS bullseye
[00:00:35] !log jclark@cumin1001 START - Cookbook sre.hosts.reimage for host ganeti1038.eqiad.wmnet with OS bullseye
[00:00:35] SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations: Q2:rack/setup/install ganeti103[5-8] - https://phabricator.wikimedia.org/T349925 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host ganeti1035.eqiad.wmnet with OS bullseye
[00:00:37] SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations: Q2:rack/setup/install ganeti103[5-8] - https://phabricator.wikimedia.org/T349925 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host ganeti1036.eqiad.wmnet with OS bullseye
[00:00:40] SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations: Q2:rack/setup/install ganeti103[5-8] - https://phabricator.wikimedia.org/T349925 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host ganeti1037.eqiad.wmnet with OS bullseye
[00:00:42] SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations: Q2:rack/setup/install ganeti103[5-8] - https://phabricator.wikimedia.org/T349925 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host ganeti1038.eqiad.wmnet with OS bullseye
[00:02:45] SRE, ops-codfw, DC-Ops, Data-Engineering, Patch-For-Review: Q2:rack/setup/install ceph200[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T349934 (Papaul) @Jhancock.wm i send a patch to fix it. you can resume the install https://gerrit.wikimedia.org/r/c/operations/puppet/+/981413
[00:06:05] RECOVERY - cassandra-b SSL 10.192.16.244:7000 on restbase2030 is OK: SSL OK - Certificate restbase2030-b valid until 2025-12-06 17:50:15 +0000 (expires in 729 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[00:06:17] RECOVERY - cassandra-b service on restbase2030 is OK: OK - cassandra-b is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[00:15:41] !log jclark@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1038.eqiad.wmnet with reason: host reimage
[00:15:51] !log jclark@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1035.eqiad.wmnet with reason: host reimage
[00:16:13] !log jclark@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1036.eqiad.wmnet with reason: host reimage
[00:16:18] !log jclark@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1037.eqiad.wmnet with reason: host reimage
[00:18:17] PROBLEM - Check systemd state on an-web1001 is CRITICAL: CRITICAL - degraded: The following units failed: hardsync-published.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:19:05] !log jclark@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1038.eqiad.wmnet with reason: host reimage
[00:21:34] !log jclark@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1035.eqiad.wmnet with reason: host reimage
[00:24:30] !log jclark@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1037.eqiad.wmnet with reason: host reimage
[00:26:13] !log jclark@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1036.eqiad.wmnet with reason: host reimage
[00:31:08] RECOVERY - Check systemd state on an-web1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:35:31] !log jclark@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
[00:35:32] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P54299 and previous config saved to /var/cache/conftool/dbconfig/20231208-003532-ladsgroup.json
[00:35:38] T343198: Add pl_target_id column to pagelinks in production - https://phabricator.wikimedia.org/T343198
[00:36:38] !log jclark@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
[00:36:39] !log jclark@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1038.eqiad.wmnet with OS bullseye
[00:36:45] SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations: Q2:rack/setup/install ganeti103[5-8] - https://phabricator.wikimedia.org/T349925 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host ganeti1038.eqiad.wmnet with OS bullseye completed: - ganeti1038 (**PA...
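The two restbase2030 certificate RECOVERY lines (cassandra-b at 00:06:05, cassandra-c at 04:10:54) both report "expires in 729 days". A minimal Python sketch that reproduces that figure, assuming the transcript's [HH:MM:SS] stamps are UTC on 2023-12-08 (the date comes from the dbconfig file names) and that the check floors the remaining time to whole days; neither assumption is stated in the log itself, and `days_until_expiry` is a hypothetical helper, not part of the monitoring check.

```python
from datetime import datetime, timezone


def days_until_expiry(checked_at: datetime, not_after: datetime) -> int:
    """Whole days left before a certificate's notAfter time (floored)."""
    # timedelta.days keeps only the whole-day part of the difference.
    return (not_after - checked_at).days


# Assumption: [HH:MM:SS] stamps are UTC on 2023-12-08, the date taken from
# dbconfig file names such as 20231208-003532-ladsgroup.json.
CHECKS = {
    # cert name: (time the RECOVERY line was logged, reported expiry)
    "restbase2030-b": (datetime(2023, 12, 8, 0, 6, 5, tzinfo=timezone.utc),
                       datetime(2025, 12, 6, 17, 50, 15, tzinfo=timezone.utc)),
    "restbase2030-c": (datetime(2023, 12, 8, 4, 10, 54, tzinfo=timezone.utc),
                       datetime(2025, 12, 6, 17, 50, 18, tzinfo=timezone.utc)),
}

if __name__ == "__main__":
    for name, (checked_at, not_after) in CHECKS.items():
        print(name, days_until_expiry(checked_at, not_after))
```

Both differences come out at 729 days plus some hours, so flooring gives the 729 shown in the alerts; a check that rounded to the nearest day would report 730 instead.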
[00:37:03] !log jclark@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
[00:38:09] !log jclark@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
[00:38:10] !log jclark@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1035.eqiad.wmnet with OS bullseye
[00:38:15] SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations: Q2:rack/setup/install ganeti103[5-8] - https://phabricator.wikimedia.org/T349925 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host ganeti1035.eqiad.wmnet with OS bullseye completed: - ganeti1035 (**PA...
[00:38:37] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/980845
[00:38:43] (CR) TrainBranchBot: [C: +2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/980845 (owner: TrainBranchBot)
[00:41:02] SRE, ops-codfw, DC-Ops, Data-Engineering: Q2:rack/setup/install ceph200[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T349934 (Papaul)
[00:42:15] !log jclark@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
[00:42:52] PROBLEM - Check systemd state on centrallog2002 is CRITICAL: CRITICAL - degraded: The following units failed: logrotate.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:43:42] !log jclark@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
[00:43:44] !log jclark@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1037.eqiad.wmnet with OS bullseye
[00:43:48] SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations: Q2:rack/setup/install ganeti103[5-8] - https://phabricator.wikimedia.org/T349925 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host ganeti1037.eqiad.wmnet with OS bullseye completed: - ganeti1037 (**PA...
[00:43:52] !log jclark@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
[00:44:56] !log jclark@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
[00:44:58] !log jclark@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1036.eqiad.wmnet with OS bullseye
[00:45:03] SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations: Q2:rack/setup/install ganeti103[5-8] - https://phabricator.wikimedia.org/T349925 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host ganeti1036.eqiad.wmnet with OS bullseye completed: - ganeti1036 (**WA...
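Each cookbook run above is logged as a START line and, later, an END (PASS) or END (FAIL) line carrying an exit_code, with several runs interleaved (the ganeti1036 downtime run, for instance, ended with exit_code=99 while its siblings passed). A minimal Python sketch that pairs those entries and picks out the failures. The regular expression is inferred from the lines in this transcript only, not from any published log format, and `pair_cookbook_runs` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Pattern for the cookbook !log lines as they appear in this transcript
# (inferred from the lines above, not from a published format spec).
COOKBOOK_RE = re.compile(
    r"!log (?P<who>\S+) (?P<phase>START|END)"
    r"(?: \((?P<result>PASS|FAIL)\))? - Cookbook (?P<cookbook>\S+)"
    r"(?: \(exit_code=(?P<exit>\d+)\))? (?P<target>.*)$"
)


def pair_cookbook_runs(lines):
    """Match each END entry to an open START with the same user, cookbook and target.

    Returns (finished, still_open): finished is a list of
    (cookbook, target, result, exit_code) tuples in END order, and
    still_open lists (user, cookbook, target) keys with no END yet.
    """
    open_runs = defaultdict(int)  # (user, cookbook, target) -> open START count
    finished = []
    for line in lines:
        m = COOKBOOK_RE.search(line)
        if m is None:
            continue  # not a cookbook START/END line
        key = (m["who"], m["cookbook"], m["target"].strip())
        if m["phase"] == "START":
            open_runs[key] += 1
        elif open_runs[key] > 0:
            open_runs[key] -= 1
            exit_code = int(m["exit"]) if m["exit"] is not None else None
            finished.append((m["cookbook"], key[2], m["result"], exit_code))
    still_open = [key for key, count in open_runs.items() for _ in range(count)]
    return finished, still_open


# Copies of four downtime lines from the transcript.
SAMPLE = [
    "[00:16:13] !log jclark@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1036.eqiad.wmnet with reason: host reimage",
    "[00:16:18] !log jclark@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1037.eqiad.wmnet with reason: host reimage",
    "[00:24:30] !log jclark@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1037.eqiad.wmnet with reason: host reimage",
    "[00:26:13] !log jclark@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1036.eqiad.wmnet with reason: host reimage",
]

if __name__ == "__main__":
    finished, still_open = pair_cookbook_runs(SAMPLE)
    for cookbook, target, result, exit_code in finished:
        if result == "FAIL":
            print(f"{cookbook} {target}: exit_code={exit_code}")
```

Keying on user, cookbook and target text (rather than on order alone) keeps the four concurrent ganeti runs apart; a counter per key is enough because identical concurrent runs are indistinguishable in the log anyway.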
[00:45:22] PROBLEM - Check systemd state on centrallog1002 is CRITICAL: CRITICAL - degraded: The following units failed: logrotate.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:45:58] SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations: Q2:rack/setup/install ganeti103[5-8] - https://phabricator.wikimedia.org/T349925 (Jclark-ctr)
[00:46:41] SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations: Q2:rack/setup/install ganeti103[5-8] - https://phabricator.wikimedia.org/T349925 (Jclark-ctr) Open→Resolved a:VRiley-WMF→Jclark-ctr
[00:50:39] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P54300 and previous config saved to /var/cache/conftool/dbconfig/20231208-005038-ladsgroup.json
[00:51:05] (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[00:56:21] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/980845 (owner: TrainBranchBot)
[00:59:52] (PS1) Majavah: admin: POC: allow using security key backed SSH keys [puppet] - https://gerrit.wikimedia.org/r/981418
[01:00:46] (CR) CI reject: [V: -1] admin: POC: allow using security key backed SSH keys [puppet] - https://gerrit.wikimedia.org/r/981418 (owner: Majavah)
[01:02:39] (CR) Majavah: [V: +1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/851/con" [puppet] - https://gerrit.wikimedia.org/r/981418 (owner: Majavah)
[01:03:26] (PS2) Majavah: admin: POC: allow using security key backed SSH keys [puppet] - https://gerrit.wikimedia.org/r/981418
[01:05:26] (CR) Majavah: [V: +1] "PCC SUCCESS (CORE_DIFF 1 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/981418 (owner: Majavah)
[01:05:46] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P54301 and previous config saved to /var/cache/conftool/dbconfig/20231208-010545-ladsgroup.json
[01:20:52] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P54302 and previous config saved to /var/cache/conftool/dbconfig/20231208-012051-ladsgroup.json
[01:20:54] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
[01:20:56] T343198: Add pl_target_id column to pagelinks in production - https://phabricator.wikimedia.org/T343198
[01:21:07] (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[01:21:09] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
[01:21:15] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P54303 and previous config saved to /var/cache/conftool/dbconfig/20231208-012115-ladsgroup.json
[01:54:02] RECOVERY - MariaDB Replica Lag: s6 on dbstore1005 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[02:12:29] !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
[02:12:34] SRE, ops-codfw, DC-Ops, serviceops: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host sessionstore2004.codfw.wmnet with OS bullseye executed with errors: -...
[02:14:23] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P54304 and previous config saved to /var/cache/conftool/dbconfig/20231208-021422-ladsgroup.json
[02:14:27] T343198: Add pl_target_id column to pagelinks in production - https://phabricator.wikimedia.org/T343198
[02:14:51] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
[02:15:59] !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
[02:16:30] !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2004']
[02:16:51] !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sessionstore2004']
[02:17:52] !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
[02:17:58] SRE, ops-codfw, DC-Ops, serviceops: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host sessionstore2004.codfw.wmnet with OS bullseye
[02:18:03] !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
[02:18:09] SRE, ops-codfw, DC-Ops, serviceops: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host sessionstore2004.codfw.wmnet with OS bullseye executed with errors: -...
[02:19:20] !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
[02:19:27] SRE, ops-codfw, DC-Ops, serviceops: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host sessionstore2004.codfw.wmnet with OS bullseye
[02:19:32] !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
[02:19:37] SRE, ops-codfw, DC-Ops, serviceops: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host sessionstore2004.codfw.wmnet with OS bullseye executed with errors: -...
[02:29:29] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P54305 and previous config saved to /var/cache/conftool/dbconfig/20231208-022929-ladsgroup.json
[02:33:51] SRE, ops-eqiad: Inbound interface errors - https://phabricator.wikimedia.org/T342502 (phaultfinder)
[02:39:06] (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:44:36] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P54306 and previous config saved to /var/cache/conftool/dbconfig/20231208-024435-ladsgroup.json
[02:59:42] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P54307 and previous config saved to /var/cache/conftool/dbconfig/20231208-025942-ladsgroup.json
[02:59:44] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
[02:59:46] T343198: Add pl_target_id column to pagelinks in production - https://phabricator.wikimedia.org/T343198
[02:59:59] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
[03:00:06] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P54308 and previous config saved to /var/cache/conftool/dbconfig/20231208-030005-ladsgroup.json
[03:00:25] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[03:09:06] (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:16:42] RECOVERY - cassandra-b CQL 10.192.16.244:9042 on restbase2030 is OK: TCP OK - 0.037 second response time on 10.192.16.244 port 9042 https://phabricator.wikimedia.org/T93886
[03:27:49] (CR) Abijeet Patro: [V: +2] Localisation updates from https://translatewiki.net. [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/981312 (owner: L10n-bot)
[03:33:07] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P54309 and previous config saved to /var/cache/conftool/dbconfig/20231208-033306-ladsgroup.json
[03:33:13] T343198: Add pl_target_id column to pagelinks in production - https://phabricator.wikimedia.org/T343198
[03:48:14] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P54310 and previous config saved to /var/cache/conftool/dbconfig/20231208-034813-ladsgroup.json
[04:03:20] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P54311 and previous config saved to /var/cache/conftool/dbconfig/20231208-040319-ladsgroup.json
[04:10:54] RECOVERY - cassandra-c SSL 10.192.16.245:7000 on restbase2030 is OK: SSL OK - Certificate restbase2030-c valid until 2025-12-06 17:50:18 +0000 (expires in 729 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[04:10:58] RECOVERY - cassandra-c service on restbase2030 is OK: OK - cassandra-c is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[04:18:27] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P54312 and previous config saved to /var/cache/conftool/dbconfig/20231208-041826-ladsgroup.json
[04:18:29] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
[04:18:31] T343198: Add pl_target_id column to pagelinks in production - https://phabricator.wikimedia.org/T343198
[04:18:44] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
[04:45:50] (PS1) Kevin Bazira: ml-services: update article-descriptions isvc image in the experimental namespace [deployment-charts] - https://gerrit.wikimedia.org/r/981426 (https://phabricator.wikimedia.org/T352959)
[05:06:03] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
[05:06:18] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
[05:06:24] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1231 (T343198)', diff saved to https://phabricator.wikimedia.org/P54313 and previous config saved to /var/cache/conftool/dbconfig/20231208-050624-ladsgroup.json
[05:06:42] T343198: Add pl_target_id column to pagelinks in production - https://phabricator.wikimedia.org/T343198
[05:41:17] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1231 (T343198)', diff saved to https://phabricator.wikimedia.org/P54314 and previous config saved to /var/cache/conftool/dbconfig/20231208-054116-ladsgroup.json
[05:41:21] T343198: Add pl_target_id column to pagelinks in production - https://phabricator.wikimedia.org/T343198
[05:49:07] (PS1) Samwilson: testwiki: Enable the Edit Recovery feature [mediawiki-config] - https://gerrit.wikimedia.org/r/981423 (https://phabricator.wikimedia.org/T353041)
[05:56:24] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P54315 and previous config saved to /var/cache/conftool/dbconfig/20231208-055623-ladsgroup.json
[06:11:30] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P54316 and previous config saved to /var/cache/conftool/dbconfig/20231208-061130-ladsgroup.json
[06:26:37] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1231 (T343198)', diff saved to https://phabricator.wikimedia.org/P54317 and previous config saved to /var/cache/conftool/dbconfig/20231208-062636-ladsgroup.json
[06:26:39] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[06:26:41] T343198: Add pl_target_id column to pagelinks in production - https://phabricator.wikimedia.org/T343198
[06:26:54] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[07:00:05] Deploy window MediaWiki infrastucture (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231208T0700)
[07:00:25] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[07:26:36] (CR) Ayounsi: Move git search related classes to __init__ (4 comments) [cookbooks] - https://gerrit.wikimedia.org/r/981349 (https://phabricator.wikimedia.org/T350152) (owner: Ayounsi)
[07:26:56] (PS4) Ayounsi: Move git search related classes to __init__ [cookbooks] - https://gerrit.wikimedia.org/r/981349 (https://phabricator.wikimedia.org/T350152)
[07:28:15] !log ayounsi@cumin1001 START - Cookbook sre.network.peering with action 'configure' for AS: 237
[07:28:36] !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 237
[07:36:43] (CR) Ayounsi: "PCC output and overall logic lgtm but I'll leave it to someone else to fully review the implementation (puppet/prometheus)." [puppet] - https://gerrit.wikimedia.org/r/981358 (https://phabricator.wikimedia.org/T163996) (owner: Majavah)
[08:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231208T0800)
[08:07:58] (Abandoned) Muehlenhoff: Enable requestctl-driven network blocks for sretest [puppet] - https://gerrit.wikimedia.org/r/977166 (https://phabricator.wikimedia.org/T348734) (owner: Muehlenhoff)
[08:39:59] (PS2) Jelto: research-landing-page: bump version [deployment-charts] - https://gerrit.wikimedia.org/r/981386 (https://phabricator.wikimedia.org/T219903) (owner: DDesouza)
[08:43:15] (CR) Jelto: [C: +1] "I fixed a typo in the version tag. Should be 2023-12-07-192736 not 2023-12-07-19273." [deployment-charts] - https://gerrit.wikimedia.org/r/981386 (https://phabricator.wikimedia.org/T219903) (owner: DDesouza)
[08:47:04] (CR) Jelto: [C: +2] research-landing-page: bump version [deployment-charts] - https://gerrit.wikimedia.org/r/981386 (https://phabricator.wikimedia.org/T219903) (owner: DDesouza)
[08:47:58] (Merged) jenkins-bot: research-landing-page: bump version [deployment-charts] - https://gerrit.wikimedia.org/r/981386 (https://phabricator.wikimedia.org/T219903) (owner: DDesouza)
[08:48:21] SRE, SRE-tools, DBA, Infrastructure-Foundations, and 3 others: puppet7 on cumin breaks database connections - https://phabricator.wikimedia.org/T352974 (ABran-WMF) {F41573747} testing `db-mysql` commands directly in context with the 2 CA reproduces this issue, it is possible that there is an issu...
[08:54:06] (CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance cloudelastic1006-cloudelastic-psi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[09:00:19] SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to wmf and analytics-privatedata-users for EHughes (superset access with no server access) - https://phabricator.wikimedia.org/T351387 (ehughes) Done, thank you!
[09:02:29] (PS1) Kosta Harlan: IPInfo: Add comment clarifying $wgIPInfoGeoIP2EnterprisePath [mediawiki-config] - https://gerrit.wikimedia.org/r/981424 (https://phabricator.wikimedia.org/T304604)
[09:16:09] !log arnaudb@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
[09:16:23] !log arnaudb@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
[09:16:29] !log arnaudb@cumin1001 dbctl commit (dc=all): 'Depooling db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54318 and previous config saved to /var/cache/conftool/dbconfig/20231208-091628-arnaudb.json
[09:16:35] T348183: Apply schema change for changing img_size, oi_size, us_size, and fa_size to BIGINT - https://phabricator.wikimedia.org/T348183
[09:18:25] (CR) Ilias Sarantopoulos: [C: +1] ml-services: update article-descriptions isvc image in the experimental namespace [deployment-charts] - https://gerrit.wikimedia.org/r/981426 (https://phabricator.wikimedia.org/T352959) (owner: Kevin Bazira)
[09:19:03] (PS4) Brouberol: Define an echoserver namespace for the dse-k8s-eqiad cluster [deployment-charts] - https://gerrit.wikimedia.org/r/981363 (https://phabricator.wikimedia.org/T353004)
[09:19:05] (PS3) Brouberol: Define a simple echoserver chart [deployment-charts] - https://gerrit.wikimedia.org/r/981367 (https://phabricator.wikimedia.org/T353004)
[09:19:07] (PS3) Brouberol: Define deployment helmfiles for echoserver in dse-k8s-eqiad [deployment-charts] - https://gerrit.wikimedia.org/r/981368 (https://phabricator.wikimedia.org/T353004)
[09:26:33] (CR) Btullis: [C: +1] "Looks good, thanks." [deployment-charts] - https://gerrit.wikimedia.org/r/981363 (https://phabricator.wikimedia.org/T353004) (owner: Brouberol)
[09:27:14] (CR) Brouberol: [C: +2] Define an echoserver namespace for the dse-k8s-eqiad cluster [deployment-charts] - https://gerrit.wikimedia.org/r/981363 (https://phabricator.wikimedia.org/T353004) (owner: Brouberol)
[09:28:09] (CR) Jelto: "looks mostly good, one nit in-line about unused args" [puppet] - https://gerrit.wikimedia.org/r/979912 (owner: EoghanGaffney)
[09:28:18] !log arnaudb@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54319 and previous config saved to /var/cache/conftool/dbconfig/20231208-092817-arnaudb.json
[09:28:22] T348183: Apply schema change for changing img_size, oi_size, us_size, and fa_size to BIGINT - https://phabricator.wikimedia.org/T348183
[09:31:03] (CR) Kevin Bazira: [C: +2] "Thanks for the review :)" [deployment-charts] - https://gerrit.wikimedia.org/r/981426 (https://phabricator.wikimedia.org/T352959) (owner: Kevin Bazira)
[09:31:13] (PS1) Brouberol: Provision credentials to deploy the echoserver service on dse-k8s [puppet] - https://gerrit.wikimedia.org/r/981425 (https://phabricator.wikimedia.org/T353004)
[09:31:52] (CR) Btullis: [C: +1] Provision credentials to deploy the echoserver service on dse-k8s [puppet] - https://gerrit.wikimedia.org/r/981425 (https://phabricator.wikimedia.org/T353004) (owner: Brouberol)
[09:32:10] (Merged) jenkins-bot: ml-services: update article-descriptions isvc image in the experimental namespace [deployment-charts] - https://gerrit.wikimedia.org/r/981426 (https://phabricator.wikimedia.org/T352959) (owner: Kevin Bazira)
[09:32:12] (CR) Brouberol: [C: +2] Provision credentials to deploy the echoserver service on dse-k8s [puppet] - https://gerrit.wikimedia.org/r/981425 (https://phabricator.wikimedia.org/T353004) (owner: Brouberol)
[09:40:35] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[09:40:44] SRE, SRE-tools, DBA, Infrastructure-Foundations, and 3 others: puppet7 on cumin breaks database connections - https://phabricator.wikimedia.org/T352974 (ABran-WMF) one other interesting fact: a puppet 7 host >>! In T352974#9389926, @Marostegui wrote: > db1124 can be used for testing. It is a te...
[09:41:08] !log Creating the echoserver namespace in dse-k8s-eqiad - T353004
[09:41:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:41:11] T353004: Deploy an echoserver service on dse-k8s-eqiad behind ingress - https://phabricator.wikimedia.org/T353004
[09:41:30] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[09:41:57] !log kevinbazira@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
[09:43:24] !log arnaudb@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54320 and previous config saved to /var/cache/conftool/dbconfig/20231208-094324-arnaudb.json
[09:53:02] (PS4) Brouberol: Define a simple echoserver chart [deployment-charts] - https://gerrit.wikimedia.org/r/981367 (https://phabricator.wikimedia.org/T353004)
[09:53:04] (PS4) Brouberol: Define deployment helmfiles for echoserver in dse-k8s-eqiad [deployment-charts] - https://gerrit.wikimedia.org/r/981368 (https://phabricator.wikimedia.org/T353004)
[09:53:06] (PS1) Phuedx: ext-EventStreamConfig: Add eventlogging_MediaWikiPingback stream config [mediawiki-config] - https://gerrit.wikimedia.org/r/981446 (https://phabricator.wikimedia.org/T323828)
[09:58:31] !log arnaudb@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54321 and previous config saved to /var/cache/conftool/dbconfig/20231208-095830-arnaudb.json
[09:59:44] (PS5) Brouberol: Define deployment helmfiles for echoserver in dse-k8s-eqiad [deployment-charts] - https://gerrit.wikimedia.org/r/981368 (https://phabricator.wikimedia.org/T353004)
[10:01:43] (CR) Btullis: [C: +1] Define deployment helmfiles for echoserver in dse-k8s-eqiad [deployment-charts] - https://gerrit.wikimedia.org/r/981368 (https://phabricator.wikimedia.org/T353004) (owner: Brouberol)
[10:07:47] (PS1) David Caro: cloud: add missing codfw1dev:openstack_control_nodes [puppet] - https://gerrit.wikimedia.org/r/981448
[10:08:01] (ProbeDown) firing: (2) Service phab1004:443 has failed probes (http_phabricator_wikimedia_org_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#phab1004:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:08:43] hmmm, let me take a look
[10:08:53] It took a moment but Phab loaded for me
[10:09:00] (CR) David Caro: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/853/console" [puppet] - https://gerrit.wikimedia.org/r/981448 (owner: David Caro)
[10:09:00] high loda, lots of FPM processes
[10:09:52] (CR) David Caro: [V: +1] "This is making puppet fail on the puppetmaster puppetmaster-02.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud" [puppet] - https://gerrit.wikimedia.org/r/981448 (owner: David Caro)
[10:10:46] (PS2) David Caro: cloud: add missing codfw1dev:openstack_control_nodes [puppet] - https://gerrit.wikimedia.org/r/981448 (https://phabricator.wikimedia.org/T353048)
[10:10:53] moritzm, sobanski: maybe let's move to _security?
[10:10:59] +1
[10:13:37] !log arnaudb@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54322 and previous config saved to /var/cache/conftool/dbconfig/20231208-101337-arnaudb.json
[10:13:41] T348183: Apply schema change for changing img_size, oi_size, us_size, and fa_size to BIGINT - https://phabricator.wikimedia.org/T348183
[10:18:01] (ProbeDown) resolved: (2) Service phab1004:443 has failed probes (http_phabricator_wikimedia_org_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#phab1004:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:37:10] SRE, SRE-tools, DBA, Infrastructure-Foundations, and 3 others: puppet7 on cumin breaks database connections - https://phabricator.wikimedia.org/T352974 (ABran-WMF) it appears that most of our hosts are still using `/etc/ssl/certs/Puppet_Internal_CA.pem` and should be migrated to use `/etc/ssl/cer...
[10:40:12] (03PS1) 10David Caro: openstack_apis_response: add value to the description [alerts] - 10https://gerrit.wikimedia.org/r/981450 [10:44:15] (03CR) 10Hnowlan: [C: 03+2] rest-gateway: fix device analytics routing [deployment-charts] - 10https://gerrit.wikimedia.org/r/980471 (https://phabricator.wikimedia.org/T343268) (owner: 10Alex Paskulin) [10:45:13] (03Merged) 10jenkins-bot: rest-gateway: fix device analytics routing [deployment-charts] - 10https://gerrit.wikimedia.org/r/980471 (https://phabricator.wikimedia.org/T343268) (owner: 10Alex Paskulin) [10:58:22] RECOVERY - Check systemd state on puppetmaster1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [11:00:26] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [11:15:40] (03PS1) 10Effie Mouzeli: (WIP) mcrouter vanilla chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/981461 [11:16:24] (03CR) 10CI reject: [V: 04-1] (WIP) mcrouter vanilla chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/981461 (owner: 10Effie Mouzeli) [11:17:27] (03PS2) 10Effie Mouzeli: (WIP) mcrouter vanilla chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/981461 [11:18:09] (03CR) 10CI reject: [V: 04-1] (WIP) mcrouter vanilla chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/981461 (owner: 10Effie Mouzeli) [11:20:13] 10SRE, 10Infrastructure-Foundations: Integrate Bookworm 12.2 point update - https://phabricator.wikimedia.org/T348326 (10MoritzMuehlenhoff) [11:25:40] (03CR) 10Samtar: [C: 03+1] testwiki: Enable the Edit Recovery feature [mediawiki-config] - 10https://gerrit.wikimedia.org/r/981423 (https://phabricator.wikimedia.org/T353041) (owner: 10Samwilson) [11:39:09] 10SRE, 10Infrastructure-Foundations: Integrate Bookworm 12.2 point update - 
https://phabricator.wikimedia.org/T348326 (10MoritzMuehlenhoff) [11:40:36] !log hnowlan@deploy2002 helmfile [staging] START helmfile.d/services/rest-gateway: apply [11:40:48] !log hnowlan@deploy2002 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply [11:46:01] (03PS1) 10Slyngshede: Move Debmonitor client code to separate repository. [software/debmonitor-client] - 10https://gerrit.wikimedia.org/r/981463 [11:46:24] (03CR) 10Tchanders: [C: 03+1] IPInfo: Add comment clarifying $wgIPInfoGeoIP2EnterprisePath [mediawiki-config] - 10https://gerrit.wikimedia.org/r/981424 (https://phabricator.wikimedia.org/T304604) (owner: 10Kosta Harlan) [11:50:18] (03PS3) 10Effie Mouzeli: (WIP) mcrouter vanilla chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/981461 [11:59:19] (03CR) 10Muehlenhoff: "The debian/control file is missing." [software/debmonitor-client] - 10https://gerrit.wikimedia.org/r/981463 (owner: 10Slyngshede) [12:04:15] (03PS1) 10Muehlenhoff: defs_requestctl_nftables.tpl: Fix range selection [puppet] - 10https://gerrit.wikimedia.org/r/981465 (https://phabricator.wikimedia.org/T348734) [12:10:22] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/981465 (https://phabricator.wikimedia.org/T348734) (owner: 10Muehlenhoff) [12:10:28] (03PS5) 10Brouberol: Define a simple echoserver chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/981367 (https://phabricator.wikimedia.org/T353004) [12:10:30] (03PS6) 10Brouberol: Define deployment helmfiles for echoserver in dse-k8s-eqiad [deployment-charts] - 10https://gerrit.wikimedia.org/r/981368 (https://phabricator.wikimedia.org/T353004) [12:12:08] (03CR) 10Slyngshede: "We still need CI to be hooked up: https://gerrit.wikimedia.org/r/c/integration/config/+/981464 and setup-scm to be configured, but feedbac" [software/debmonitor-client] - 10https://gerrit.wikimedia.org/r/981463 (owner: 10Slyngshede) [12:16:33] (03CR) 10Slyngshede: Move Debmonitor client code to separate 
repository. (032 comments) [software/debmonitor-client] - 10https://gerrit.wikimedia.org/r/981463 (owner: 10Slyngshede) [12:17:03] (03PS2) 10Slyngshede: Move Debmonitor client code to separate repository. [software/debmonitor-client] - 10https://gerrit.wikimedia.org/r/981463 [12:17:26] (03CR) 10Muehlenhoff: Move Debmonitor client code to separate repository. (031 comment) [software/debmonitor-client] - 10https://gerrit.wikimedia.org/r/981463 (owner: 10Slyngshede) [12:21:01] (03PS3) 10Slyngshede: Move Debmonitor client code to separate repository. [software/debmonitor-client] - 10https://gerrit.wikimedia.org/r/981463 [12:22:21] (03CR) 10Slyngshede: Move Debmonitor client code to separate repository. (031 comment) [software/debmonitor-client] - 10https://gerrit.wikimedia.org/r/981463 (owner: 10Slyngshede) [12:40:16] (03PS1) 10Muehlenhoff: Remove entry for decommed host [puppet] - 10https://gerrit.wikimedia.org/r/981466 [12:42:21] !log hnowlan@deploy2002 helmfile [eqiad] START helmfile.d/services/rest-gateway: apply [12:42:35] !log hnowlan@deploy2002 helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply [12:42:58] 10SRE, 10SRE-swift-storage: Memory exhaustion when uploading large TIFF files by URL - https://phabricator.wikimedia.org/T334814 (10Don-vip) I still face the issue for large TIFF files, is there a way to workaround this problem, or is it possible to increase the memory limit? ` 2023-12-08T12:11:35.429Z Downlo... [12:43:01] (03PS4) 10Slyngshede: Move Debmonitor client code to separate repository. 
[software/debmonitor-client] - 10https://gerrit.wikimedia.org/r/981463 [12:51:20] 10SRE, 10Infrastructure-Foundations: Integrate Bookworm 12.2 point update - https://phabricator.wikimedia.org/T348326 (10MoritzMuehlenhoff) [12:51:35] (03CR) 10Muehlenhoff: [C: 03+2] Remove entry for decommed host [puppet] - 10https://gerrit.wikimedia.org/r/981466 (owner: 10Muehlenhoff) [12:54:21] (CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance cloudelastic1006-cloudelastic-psi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency [12:55:01] !log hnowlan@deploy2002 helmfile [codfw] START helmfile.d/services/rest-gateway: apply [12:55:11] !log hnowlan@deploy2002 helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply [13:13:35] (03PS1) 10Muehlenhoff: doc: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/981469 [13:14:06] (CirrusSearchHighOldGCFrequency) resolved: Elasticsearch instance cloudelastic1006-cloudelastic-psi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency [13:19:52] (03CR) 10Muehlenhoff: "Looks good, a few nits inline" [software/debmonitor-client] - 10https://gerrit.wikimedia.org/r/981463 (owner: 10Slyngshede) [13:22:04] 10SRE, 10Infrastructure-Foundations: Integrate Bookworm 12.3 point update - https://phabricator.wikimedia.org/T353057 (10MoritzMuehlenhoff) [13:22:37] 10SRE, 10Infrastructure-Foundations: Integrate Bookworm 12.3 point update - https://phabricator.wikimedia.org/T353057 (10MoritzMuehlenhoff) p:05Triage→03Medium a:03MoritzMuehlenhoff [13:28:54] (03PS1) 10Ayounsi: [WIP] Cookbook to renumber a host while moving its vlan [cookbooks] - 
10https://gerrit.wikimedia.org/r/981472 (https://phabricator.wikimedia.org/T350152) [13:31:46] (03PS2) 10Ayounsi: [WIP] Cookbook to renumber a host while changing its vlan [cookbooks] - 10https://gerrit.wikimedia.org/r/981472 (https://phabricator.wikimedia.org/T350152) [13:42:35] 10SRE-tools, 10Infrastructure-Foundations, 10Patch-For-Review: Automation to change a server's vlan - https://phabricator.wikimedia.org/T350152 (10ayounsi) >>! In T350152#9355720, @Volans wrote: > * I would probably add a grep for the IP on at least `/etc` on the host too to check if it's hardcoded somewhere... [13:52:59] (PuppetZeroResources) firing: Puppet has failed generate resources on elastic1107:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [14:02:21] 10SRE, 10SRE-swift-storage, 10MediaWiki-extensions-PagedTiffHandler: Memory exhaustion when uploading large TIFF files by URL - https://phabricator.wikimedia.org/T334814 (10Don-vip) [14:02:59] (PuppetZeroResources) resolved: Puppet has failed generate resources on elastic1107:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [14:08:17] (03CR) 10Btullis: [C: 03+1] "Looks good." 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/981367 (https://phabricator.wikimedia.org/T353004) (owner: 10Brouberol) [14:09:41] (03CR) 10Brouberol: [C: 03+2] Define a simple echoserver chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/981367 (https://phabricator.wikimedia.org/T353004) (owner: 10Brouberol) [14:11:18] (03PS1) 10Muehlenhoff: miscweb: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/981539 [14:12:30] (03CR) 10Brouberol: [C: 03+2] Define deployment helmfiles for echoserver in dse-k8s-eqiad [deployment-charts] - 10https://gerrit.wikimedia.org/r/981368 (https://phabricator.wikimedia.org/T353004) (owner: 10Brouberol) [14:14:43] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/981539 (owner: 10Muehlenhoff) [14:15:19] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply [14:15:21] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply [14:18:09] (03PS1) 10Brouberol: Fix remaining references to rdf-streaming-updater [deployment-charts] - 10https://gerrit.wikimedia.org/r/981540 (https://phabricator.wikimedia.org/T353004) [14:18:38] (03PS2) 10Muehlenhoff: miscweb: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/981539 [14:20:58] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/981539 (owner: 10Muehlenhoff) [14:39:07] (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:40:32] (03PS1) 10Papaul: Add new sessionstore node to site.pp and preseed.yaml [puppet] - 10https://gerrit.wikimedia.org/r/981544 (https://phabricator.wikimedia.org/T349876) [14:42:01] (03CR) 10Btullis: [C: 03+1] Fix 
remaining references to rdf-streaming-updater [deployment-charts] - 10https://gerrit.wikimedia.org/r/981540 (https://phabricator.wikimedia.org/T353004) (owner: 10Brouberol) [14:42:50] (03CR) 10Brouberol: [C: 03+2] Fix remaining references to rdf-streaming-updater [deployment-charts] - 10https://gerrit.wikimedia.org/r/981540 (https://phabricator.wikimedia.org/T353004) (owner: 10Brouberol) [14:42:54] (03PS1) 10Muehlenhoff: gerrit: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/981545 [14:43:24] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/echoserver: apply [14:43:58] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/echoserver: apply [14:44:08] (03CR) 10Papaul: [C: 03+2] Add new sessionstore node to site.pp and preseed.yaml [puppet] - 10https://gerrit.wikimedia.org/r/981544 (https://phabricator.wikimedia.org/T349876) (owner: 10Papaul) [14:44:48] !log drain eqiad-codfw lumen transport for maintenance - T342502 [14:44:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:44:52] T342502: Inbound interface errors - https://phabricator.wikimedia.org/T342502 [14:45:22] (03CR) 10Dzahn: [C: 03+2] doc: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/981469 (owner: 10Muehlenhoff) [14:46:58] 10SRE, 10ops-eqiad, 10DBA: Degraded RAID on db1168 - https://phabricator.wikimedia.org/T353020 (10Jclark-ctr) 05Open→03Resolved Replaced Failed drive with disk from decommissioned server [14:47:17] 10SRE, 10ops-codfw, 10DC-Ops, 10serviceops, 10Patch-For-Review: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (10Papaul) [14:48:02] 10SRE, 10ops-codfw, 10DC-Ops, 10serviceops, 10Patch-For-Review: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (10Papaul) Servers were missing in site.pp and 2006 was missing in preseed.yaml file I send a patch to fix this . 
You an try again the re-image ht... [14:48:55] (03CR) 10Dzahn: [C: 03+2] "/etc/ferm/conf.d/10_doc-http gets removed and /etc/ferm/conf.d/10_doc_http with an underscore gets created. but file names shouldn't matte" [puppet] - 10https://gerrit.wikimedia.org/r/981469 (owner: 10Muehlenhoff) [14:51:50] (03CR) 10Dzahn: [C: 03+1] "yea, let's try without the DNS lookup for v6" [puppet] - 10https://gerrit.wikimedia.org/r/981387 (https://phabricator.wikimedia.org/T347355) (owner: 10Bking) [14:54:07] (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:55:40] (03CR) 10Dzahn: miscweb: Avoid Ferm-specific syntax (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/981539 (owner: 10Muehlenhoff) [14:56:14] 10SRE, 10ops-eqiad: Inbound interface errors - https://phabricator.wikimedia.org/T342502 (10Jclark-ctr) cleaned both sides of cable and replaced optic if errors continue we will replace cable [14:56:42] (03CR) 10Dzahn: [C: 03+2] miscweb: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/981539 (owner: 10Muehlenhoff) [14:57:52] (03CR) 10Bking: [C: 03+2] wdqs: monitor ldf endpoint [puppet] - 10https://gerrit.wikimedia.org/r/981387 (https://phabricator.wikimedia.org/T347355) (owner: 10Bking) [14:59:32] 10SRE, 10ops-eqiad: Inbound interface errors - https://phabricator.wikimedia.org/T342502 (10ayounsi) 05Open→03Resolved Thx, closing the task, automation will re-open it if needed. 
[15:00:26] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [15:00:52] (03CR) 10Dzahn: [C: 03+2] "only affects miscweb2003, and there the diff is:" [puppet] - 10https://gerrit.wikimedia.org/r/981539 (owner: 10Muehlenhoff) [15:01:07] 10SRE, 10ops-eqiad, 10DBA, 10DC-Ops, 10decommission-hardware: decommission db1126.eqiad.wmnet - https://phabricator.wikimedia.org/T352362 (10Jclark-ctr) [15:01:51] 10SRE, 10ops-eqiad, 10DBA, 10DC-Ops, 10decommission-hardware: decommission db1126.eqiad.wmnet - https://phabricator.wikimedia.org/T352362 (10Jclark-ctr) 05Open→03Resolved [15:02:27] (03CR) 10Dzahn: [C: 03+1] "+1 from volans = you should merge :)" [puppet] - 10https://gerrit.wikimedia.org/r/972929 (https://phabricator.wikimedia.org/T333615) (owner: 10Andrea Denisse) [15:03:11] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/981545 (owner: 10Muehlenhoff) [15:05:24] 10SRE, 10ops-eqiad: decommission flerovium - https://phabricator.wikimedia.org/T352193 (10Jclark-ctr) [15:05:32] 10SRE, 10ops-eqiad: decommission flerovium - https://phabricator.wikimedia.org/T352193 (10Jclark-ctr) 05Open→03Resolved [15:07:31] (03PS1) 10Muehlenhoff: parsoid::testing: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/981546 [15:09:00] !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye [15:09:10] 10SRE, 10ops-codfw, 10DC-Ops, 10serviceops, 10Patch-For-Review: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host sessionstore2004.codfw.wmnet with OS bullseye [15:09:21] !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host 
sessionstore2004.codfw.wmnet with OS bullseye [15:09:30] 10SRE, 10ops-codfw, 10DC-Ops, 10serviceops, 10Patch-For-Review: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host sessionstore2004.codfw.wmnet with OS bullseye e... [15:10:31] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/981546 (owner: 10Muehlenhoff) [15:13:33] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED [15:15:46] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED [15:17:05] !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye [15:17:12] 10SRE, 10ops-codfw, 10DC-Ops, 10serviceops: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host sessionstore2004.codfw.wmnet with OS bullseye [15:25:58] 10SRE, 10ops-codfw, 10DC-Ops, 10serviceops: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (10Jhancock.wm) @papaul thank you! @Eevans that did it. thank you too! I thought it was something I messed up so no big deal. I should finish this up this morning. 
[15:28:18] !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye [15:28:19] !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye [15:28:24] 10SRE, 10ops-codfw, 10DC-Ops, 10serviceops: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host sessionstore2005.codfw.wmnet with OS bullseye [15:28:27] 10SRE, 10ops-codfw, 10DC-Ops, 10serviceops: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host sessionstore2006.codfw.wmnet with OS bullseye [15:33:17] !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage [15:35:19] (03CR) 10FNegri: [C: 03+1] "Nice!" [alerts] - 10https://gerrit.wikimedia.org/r/981450 (owner: 10David Caro) [15:35:53] (03PS1) 10Kevin Bazira: ml-services: update article-descriptions isvc image in the experimental namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/981429 (https://phabricator.wikimedia.org/T352750) [15:36:44] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage [15:39:32] (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/980079 (https://phabricator.wikimedia.org/T326818) (owner: 10Andrew Bogott) [15:43:14] 10SRE, 10SRE-tools, 10DBA, 10Infrastructure-Foundations, and 2 others: puppet7 on cumin breaks database connections - https://phabricator.wikimedia.org/T352974 (10LSobanski) I believe the collab tag was added automatically from the parent task so removing it. 
[15:44:17] !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage [15:44:47] !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage [15:45:40] (03CR) 10Ilias Sarantopoulos: [C: 03+2] ml-services: update article-descriptions isvc image in the experimental namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/981429 (https://phabricator.wikimedia.org/T352750) (owner: 10Kevin Bazira) [15:47:13] (03Merged) 10jenkins-bot: ml-services: update article-descriptions isvc image in the experimental namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/981429 (https://phabricator.wikimedia.org/T352750) (owner: 10Kevin Bazira) [15:47:45] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage [15:49:41] !log milimetric@deploy2002 Started deploy [airflow-dags/platform_eng@049cf03]: (no justification provided) [15:50:27] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage [15:50:33] (03PS1) 10Bking: wdqs: add ldf endpoint logic [puppet] - 10https://gerrit.wikimedia.org/r/981551 (https://phabricator.wikimedia.org/T347355) [15:50:33] !log milimetric@deploy2002 Finished deploy [airflow-dags/platform_eng@049cf03]: (no justification provided) (duration: 00m 52s) [15:52:40] (03PS2) 10Bking: wdqs: add ldf endpoint logic [puppet] - 10https://gerrit.wikimedia.org/r/981551 (https://phabricator.wikimedia.org/T347355) [15:53:33] (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/981551 (https://phabricator.wikimedia.org/T347355) (owner: 10Bking) [15:58:19] (03CR) 10Dzahn: [C: 03+1] wdqs: add ldf endpoint logic [puppet] - 10https://gerrit.wikimedia.org/r/981551 (https://phabricator.wikimedia.org/T347355) 
(owner: 10Bking) [15:58:59] (03CR) 10Bking: [C: 03+2] wdqs: add ldf endpoint logic [puppet] - 10https://gerrit.wikimedia.org/r/981551 (https://phabricator.wikimedia.org/T347355) (owner: 10Bking) [16:08:27] !log kevinbazira@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . [16:14:07] (ProbeDown) firing: (2) Service wdqs1015:80 has failed probes (http_query_wikidata_org_ldf_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1015:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [16:14:18] 10SRE, 10LDAP-Access-Requests: Grant Access to archiva-deployers for pfischer - https://phabricator.wikimedia.org/T352475 (10jijiki) 05Open→03Resolved User is present in archiva-deployers LDAP group, please reopen if there is anything else [16:14:39] (03PS1) 10Samtar: beta.labs: logo and favicon change for enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/981553 (https://phabricator.wikimedia.org/T352951) [16:19:28] !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1015.eqiad.wmnet with reason: T347355 [16:19:32] T347355: Create alerts for https://query.wikidata.org/bigdata/ldf - https://phabricator.wikimedia.org/T347355 [16:19:44] !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1015.eqiad.wmnet with reason: T347355 [16:19:59] (03CR) 10Sohom Datta: [C: 03+1] beta.labs: logo and favicon change for enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/981553 (https://phabricator.wikimedia.org/T352951) (owner: 10Samtar) [16:21:23] (03CR) 10Samtar: [C: 03+2] "self+2 on a Friday, but it's a simple beta-only change, was checked" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/981553 (https://phabricator.wikimedia.org/T352951) (owner: 10Samtar) [16:22:09] (03Merged) 10jenkins-bot: beta.labs: logo and favicon change for 
enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/981553 (https://phabricator.wikimedia.org/T352951) (owner: 10Samtar) [16:28:06] (03PS5) 10Andrew Bogott: cloud-init: make puppet optional [puppet] - 10https://gerrit.wikimedia.org/r/980079 (https://phabricator.wikimedia.org/T326818) [16:28:20] (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/980079 (https://phabricator.wikimedia.org/T326818) (owner: 10Andrew Bogott) [16:30:11] (03PS6) 10Dreamy Jazz: MediaModeration: Set MediaModerationDeveloperMode to false [mediawiki-config] - 10https://gerrit.wikimedia.org/r/979969 (owner: 10Kosta Harlan) [16:31:44] RECOVERY - cassandra-c CQL 10.192.16.245:9042 on restbase2030 is OK: TCP OK - 0.032 second response time on 10.192.16.245 port 9042 https://phabricator.wikimedia.org/T93886 [16:39:15] (03PS1) 10Ebernhardson: cirrus updater: Update container image [deployment-charts] - 10https://gerrit.wikimedia.org/r/981557 [16:45:48] (03CR) 10Ebernhardson: [C: 03+2] cirrus updater: Update container image [deployment-charts] - 10https://gerrit.wikimedia.org/r/981557 (owner: 10Ebernhardson) [16:46:42] (03Merged) 10jenkins-bot: cirrus updater: Update container image [deployment-charts] - 10https://gerrit.wikimedia.org/r/981557 (owner: 10Ebernhardson) [16:49:35] !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [16:49:51] !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:04:13] (03PS1) 10Ebernhardson: cirrus updater: Update async fetch queue size [deployment-charts] - 10https://gerrit.wikimedia.org/r/981562 [17:06:19] (03CR) 10Ebernhardson: [C: 03+2] cirrus updater: Update async fetch queue size [deployment-charts] - 10https://gerrit.wikimedia.org/r/981562 (owner: 10Ebernhardson) [17:07:07] (03Merged) 10jenkins-bot: cirrus updater: Update async fetch queue size [deployment-charts] - 10https://gerrit.wikimedia.org/r/981562 
(owner: 10Ebernhardson) [17:08:38] (03PS1) 10Bking: miscweb: Notify search platform for sites they own [puppet] - 10https://gerrit.wikimedia.org/r/981563 (https://phabricator.wikimedia.org/T347355) [17:08:48] !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [17:09:05] !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:13:08] (03CR) 10Dzahn: [C: 03+1] "+1 (we can still discuss if we should have a second one with our team in some form or not)" [puppet] - 10https://gerrit.wikimedia.org/r/981563 (https://phabricator.wikimedia.org/T347355) (owner: 10Bking) [17:21:36] (03CR) 10Bking: [C: 03+2] miscweb: Notify search platform for sites they own [puppet] - 10https://gerrit.wikimedia.org/r/981563 (https://phabricator.wikimedia.org/T347355) (owner: 10Bking) [17:53:20] (03PS1) 10Bking: wdqs: move params from body to path [puppet] - 10https://gerrit.wikimedia.org/r/981578 (https://phabricator.wikimedia.org/T347355) [18:11:50] (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/980079 (https://phabricator.wikimedia.org/T326818) (owner: 10Andrew Bogott) [18:13:28] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [18:14:06] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [18:14:56] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [18:17:28] (03PS1) 10Ebernhardson: cirrus updater: Update container image [deployment-charts] - 10https://gerrit.wikimedia.org/r/981581 [18:21:11] (03CR) 10Ebernhardson: [C: 03+2] cirrus updater: Update container image [deployment-charts] - 10https://gerrit.wikimedia.org/r/981581 (owner: 
Ebernhardson)
[18:22:02] (Merged) jenkins-bot: cirrus updater: Update container image [deployment-charts] - https://gerrit.wikimedia.org/r/981581 (owner: Ebernhardson)
[18:23:12] (CR) Dzahn: [C: +1] wdqs: move params from body to path [puppet] - https://gerrit.wikimedia.org/r/981578 (https://phabricator.wikimedia.org/T347355) (owner: Bking)
[18:24:41] (CR) Bking: [C: +2] wdqs: move params from body to path [puppet] - https://gerrit.wikimedia.org/r/981578 (https://phabricator.wikimedia.org/T347355) (owner: Bking)
[18:24:59] (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/981578 (https://phabricator.wikimedia.org/T347355) (owner: Bking)
[18:26:13] !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[18:26:27] !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[18:27:04] !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[18:27:15] !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[18:28:10] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:28:10] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.262 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:28:50] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 51008 bytes in 0.132 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:29:25] (CR) Bking: [C: +2] wdqs: move params from body to path [puppet] - https://gerrit.wikimedia.org/r/981578 (https://phabricator.wikimedia.org/T347355) (owner: Bking)
[18:46:45] (CR) LSobanski: "I am of the opinion that we should have an alert for this, let's discuss on Monday." [puppet] - https://gerrit.wikimedia.org/r/981563 (https://phabricator.wikimedia.org/T347355) (owner: Bking)
[18:55:31] (PS1) Brion VIBBER: Remove obsolete lost GPG key for Brion [mediawiki-config] - https://gerrit.wikimedia.org/r/981583
[19:00:26] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[19:11:56] (CR) Herron: [C: +1] "LGTM but not tested. if its not too much work maybe we could bind on ipv6 as well?" [puppet] - https://gerrit.wikimedia.org/r/981407 (https://phabricator.wikimedia.org/T333615) (owner: Andrea Denisse)
[19:38:14] (PS1) Dzahn: query_service: duplicate monitoring checks for sre-collab team [puppet] - https://gerrit.wikimedia.org/r/981591 (https://phabricator.wikimedia.org/T347355)
[19:38:43] (CR) CI reject: [V: -1] query_service: duplicate monitoring checks for sre-collab team [puppet] - https://gerrit.wikimedia.org/r/981591 (https://phabricator.wikimedia.org/T347355) (owner: Dzahn)
[19:40:42] (PS2) Dzahn: query_service: duplicate monitoring checks for sre-collab team [puppet] - https://gerrit.wikimedia.org/r/981591 (https://phabricator.wikimedia.org/T347355)
[19:41:22] SRE, ops-eqiad, DC-Ops, Data-Platform-SRE: Q1:rack/setup/install wdqs102[0-4] - https://phabricator.wikimedia.org/T342749 (RKemper)
[19:43:24] (CR) Dzahn: [C: +1] "re: lsobanski, that would be ~ https://gerrit.wikimedia.org/r/c/operations/puppet/+/981591/" [puppet] - https://gerrit.wikimedia.org/r/981563 (https://phabricator.wikimedia.org/T347355) (owner: Bking)
[19:48:53] (PS1) Ebernhardson: cirrus updater: Update container image [deployment-charts] - https://gerrit.wikimedia.org/r/981596
[19:55:51] (PS1) Ryan Kemper: wdqs: remove extraneous insetup role [puppet] - https://gerrit.wikimedia.org/r/981599 (https://phabricator.wikimedia.org/T345475)
[19:56:44] (CR) Ryan Kemper: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/981599 (https://phabricator.wikimedia.org/T345475) (owner: Ryan Kemper)
[19:57:56] (PS1) Eevans: keys & certs for missing restbase nodes [labs/private] - https://gerrit.wikimedia.org/r/981601 (https://phabricator.wikimedia.org/T352468)
[19:59:11] (CR) Ebernhardson: [C: +2] cirrus updater: Update container image [deployment-charts] - https://gerrit.wikimedia.org/r/981596 (owner: Ebernhardson)
[19:59:57] (Merged) jenkins-bot: cirrus updater: Update container image [deployment-charts] - https://gerrit.wikimedia.org/r/981596 (owner: Ebernhardson)
[20:02:16] !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[20:02:28] !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[20:03:19] (PS6) Andrew Bogott: cloud-init: make puppet optional [puppet] - https://gerrit.wikimedia.org/r/980079 (https://phabricator.wikimedia.org/T326818)
[20:03:52] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/980079 (https://phabricator.wikimedia.org/T326818) (owner: Andrew Bogott)
[20:07:19] (PS2) Eevans: add keys & certs for missing restbase nodes [labs/private] - https://gerrit.wikimedia.org/r/981601 (https://phabricator.wikimedia.org/T352468)
[20:09:58] (PS3) Eevans: restbase: add missing keys & certs, remove obsolete [labs/private] - https://gerrit.wikimedia.org/r/981601 (https://phabricator.wikimedia.org/T352468)
[20:19:30] (CR) Ryan Kemper: [C: +2] wdqs: remove extraneous insetup role [puppet] - https://gerrit.wikimedia.org/r/981599 (https://phabricator.wikimedia.org/T345475) (owner: Ryan Kemper)
[20:23:07] (CR) Dzahn: [C: +1] parsoid::testing: Avoid Ferm-specific syntax [puppet] - https://gerrit.wikimedia.org/r/981546 (owner: Muehlenhoff)
[20:24:04] (CR) Dzahn: [C: +2] gerrit: Avoid Ferm-specific syntax [puppet] - https://gerrit.wikimedia.org/r/981545 (owner: Muehlenhoff)
[20:26:13] (CR) Andrew Bogott: [C: +2] Horizon: allow image uploading via horizon for users with glance admin [puppet] - https://gerrit.wikimedia.org/r/980021 (https://phabricator.wikimedia.org/T326818) (owner: Andrew Bogott)
[20:26:23] (CR) Andrew Bogott: [C: +2] cloud-init: make puppet optional [puppet] - https://gerrit.wikimedia.org/r/980079 (https://phabricator.wikimedia.org/T326818) (owner: Andrew Bogott)
[20:28:18] (CR) Dzahn: [C: +1] "lgtm, but unlike the other ones i'd rather not merge myself" [puppet] - https://gerrit.wikimedia.org/r/981546 (owner: Muehlenhoff)
[20:29:07] (CR) Dzahn: [C: +2] "only affected gerrit2002, looks good, only used when migrating to new servers" [puppet] - https://gerrit.wikimedia.org/r/981545 (owner: Muehlenhoff)
[20:53:24] (PS1) Eevans: restbase: set production role and add config for restbase2031 [puppet] - https://gerrit.wikimedia.org/r/981605 (https://phabricator.wikimedia.org/T352468)
[20:53:26] (PS1) Eevans: restbase: set production role and add config for restbase2032 [puppet] - https://gerrit.wikimedia.org/r/981606 (https://phabricator.wikimedia.org/T352468)
[20:53:28] (PS1) Eevans: restbase: set production role and add config for restbase2033 [puppet] - https://gerrit.wikimedia.org/r/981607 (https://phabricator.wikimedia.org/T352468)
[20:53:30] (PS1) Eevans: restbase: set production role and add config for restbase2034 [puppet] - https://gerrit.wikimedia.org/r/981608 (https://phabricator.wikimedia.org/T352468)
[20:53:32] (PS1) Eevans: restbase: set production role and add config for restbase2035 [puppet] - https://gerrit.wikimedia.org/r/981609 (https://phabricator.wikimedia.org/T352468)
[20:55:08] (PS1) Andrew Bogott: wmcs-image-create: quote 'install_puppet' value [puppet] - https://gerrit.wikimedia.org/r/981610
[20:57:49] (CR) Andrew Bogott: [C: +2] wmcs-image-create: quote 'install_puppet' value [puppet] - https://gerrit.wikimedia.org/r/981610 (owner: Andrew Bogott)
[21:05:32] !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bullseye
[21:05:38] SRE, ops-codfw, DC-Ops, Data-Engineering: Q2:rack/setup/install ceph200[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T349934 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cephosd2001.codfw.wmnet with OS bullseye
[21:08:17] !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2001.codfw.wmnet with reason: host reimage
[21:11:38] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2001.codfw.wmnet with reason: host reimage
[21:17:16] (PS1) Andrew Bogott: nova vendor-data: 2nd attempt to read 'install_puppet' metadata [puppet] - https://gerrit.wikimedia.org/r/981620 (https://phabricator.wikimedia.org/T326818)
[21:20:02] (CR) Andrew Bogott: [C: +2] nova vendor-data: 2nd attempt to read 'install_puppet' metadata [puppet] - https://gerrit.wikimedia.org/r/981620 (https://phabricator.wikimedia.org/T326818) (owner: Andrew Bogott)
[21:30:12] !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[21:31:33] !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[21:31:35] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2001.codfw.wmnet with OS bullseye
[21:31:40] SRE, ops-codfw, DC-Ops, Data-Engineering: Q2:rack/setup/install ceph200[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T349934 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cephosd2001.codfw.wmnet with OS bullseye completed: - cephosd2001 (...
[21:34:54] !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
[21:35:00] SRE, ops-codfw, DC-Ops, Data-Engineering: Q2:rack/setup/install ceph200[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T349934 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cephosd2002.codfw.wmnet with OS bullseye
[21:35:26] SRE, ops-codfw, DC-Ops, Data-Engineering: Q2:rack/setup/install ceph200[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T349934 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cephosd2002.codfw.wmnet with OS bullseye executed with errors: - ce...
[21:37:16] (CR) Andrea Denisse: [C: +2] "Thanks everyone for your ideas and suggestions. 😊" [puppet] - https://gerrit.wikimedia.org/r/972929 (https://phabricator.wikimedia.org/T333615) (owner: Andrea Denisse)
[21:52:44] (CR) Bking: [C: +1] wdqs: remove extraneous insetup role [puppet] - https://gerrit.wikimedia.org/r/981599 (https://phabricator.wikimedia.org/T345475) (owner: Ryan Kemper)
[21:54:33] (PS1) Dzahn: planet: enable update timers in planet1003 [puppet] - https://gerrit.wikimedia.org/r/981623 (https://phabricator.wikimedia.org/T348392)
[21:56:43] (CR) Dzahn: [C: +2] planet: enable update timers in planet1003 [puppet] - https://gerrit.wikimedia.org/r/981623 (https://phabricator.wikimedia.org/T348392) (owner: Dzahn)
[22:06:45] (PS1) Bking: wdqs: simplify ldf endpoint check [puppet] - https://gerrit.wikimedia.org/r/981624 (https://phabricator.wikimedia.org/T347355)
[22:07:16] (CR) CI reject: [V: -1] wdqs: simplify ldf endpoint check [puppet] - https://gerrit.wikimedia.org/r/981624 (https://phabricator.wikimedia.org/T347355) (owner: Bking)
[22:11:08] (PS2) Bking: wdqs: simplify ldf endpoint check [puppet] - https://gerrit.wikimedia.org/r/981624 (https://phabricator.wikimedia.org/T347355)
[22:13:41] (CR) Gehel: wdqs: simplify ldf endpoint check (1 comment) [puppet] - https://gerrit.wikimedia.org/r/981624 (https://phabricator.wikimedia.org/T347355) (owner: Bking)
[22:18:22] SRE, ops-codfw, DC-Ops, Data-Engineering: Q2:rack/setup/install ceph200[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T349934 (Jhancock.wm)
[22:22:25] (PS1) Andrew Bogott: nova vendor-data: 3rd attempt to read 'install_puppet' metadata [puppet] - https://gerrit.wikimedia.org/r/981625 (https://phabricator.wikimedia.org/T326818)
[22:23:11] (CR) Bking: wdqs: simplify ldf endpoint check (1 comment) [puppet] - https://gerrit.wikimedia.org/r/981624 (https://phabricator.wikimedia.org/T347355) (owner: Bking)
[22:24:17] (CR) Andrew Bogott: [C: +2] nova vendor-data: 3rd attempt to read 'install_puppet' metadata [puppet] - https://gerrit.wikimedia.org/r/981625 (https://phabricator.wikimedia.org/T326818) (owner: Andrew Bogott)
[22:26:07] (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/981624 (https://phabricator.wikimedia.org/T347355) (owner: Bking)
[22:26:46] !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[22:26:58] !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:30:08] (CR) Ryan Kemper: [C: +1] "Looks reasonable" [puppet] - https://gerrit.wikimedia.org/r/981624 (https://phabricator.wikimedia.org/T347355) (owner: Bking)
[22:32:10] (CR) Bking: [C: +2] wdqs: simplify ldf endpoint check [puppet] - https://gerrit.wikimedia.org/r/981624 (https://phabricator.wikimedia.org/T347355) (owner: Bking)
[22:35:26] !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
[22:35:33] SRE, ops-codfw, DC-Ops, Data-Engineering: Q2:rack/setup/install ceph200[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T349934 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cephosd2002.codfw.wmnet with OS bullseye
[22:40:45] !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
[22:40:52] SRE, ops-codfw, DC-Ops, serviceops: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host sessionstore2006.codfw.wmnet with OS bullseye
[22:41:44] !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2006.codfw.wmnet with OS bullseye
[22:42:14] SRE, ops-codfw, DC-Ops, serviceops: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host sessionstore2006.codfw.wmnet with OS bullseye executed with errors: -...
[22:42:52] !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host cephosd2003.codfw.wmnet with OS bullseye
[22:42:58] SRE, ops-codfw, DC-Ops, Data-Engineering: Q2:rack/setup/install ceph200[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T349934 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cephosd2003.codfw.wmnet with OS bullseye
[23:00:26] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[23:02:07] !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2003.codfw.wmnet with reason: host reimage
[23:03:57] !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[23:04:07] !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[23:05:31] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2003.codfw.wmnet with reason: host reimage
[23:18:47] !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
[23:18:53] SRE, ops-codfw, DC-Ops, Data-Engineering: Q2:rack/setup/install ceph200[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T349934 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cephosd2002.codfw.wmnet with OS bullseye executed with errors: - ce...
[23:24:59] !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:27:32] !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:27:34] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2003.codfw.wmnet with OS bullseye
[23:27:41] SRE, ops-codfw, DC-Ops, Data-Engineering: Q2:rack/setup/install ceph200[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T349934 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cephosd2003.codfw.wmnet with OS bullseye completed: - cephosd2003 (...
[23:28:21] (PS1) Andrew Bogott: vendordata: only wipe out puppet certs if we aren't building a base image [puppet] - https://gerrit.wikimedia.org/r/981628 (https://phabricator.wikimedia.org/T326818)
[23:29:35] (CR) Andrew Bogott: [C: +2] vendordata: only wipe out puppet certs if we aren't building a base image [puppet] - https://gerrit.wikimedia.org/r/981628 (https://phabricator.wikimedia.org/T326818) (owner: Andrew Bogott)
[23:40:01] SRE, ops-codfw, DC-Ops, Data-Engineering: Q2:rack/setup/install ceph200[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T349934 (Jhancock.wm)
[23:47:46] !log dani@deploy2002 helmfile [staging] START helmfile.d/services/miscweb: apply
[23:48:08] !log dani@deploy2002 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[23:48:15] !log dani@deploy2002 helmfile [codfw] START helmfile.d/services/miscweb: apply
[23:48:35] !log dani@deploy2002 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[23:48:43] !log dani@deploy2002 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[23:49:10] !log dani@deploy2002 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[23:55:22] (PS1) Ebernhardson: cirrus updater: Update container image [deployment-charts] - https://gerrit.wikimedia.org/r/981632
[23:59:36] (CR) Ebernhardson: [C: +2] cirrus updater: Update container image [deployment-charts] - https://gerrit.wikimedia.org/r/981632 (owner: Ebernhardson)