[06:13:00] netops, Infrastructure-Foundations, SRE, ops-codfw: Connect two hosts in codfw row A/B for switch migration testing - https://phabricator.wikimedia.org/T345803 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host sretest2003.codfw.wmnet with OS bullseye...
[08:20:53] Traffic, SRE: Cannot upload on Commons or even here - https://phabricator.wikimedia.org/T349671 (LSobanski)
[09:33:04] Traffic, SRE: Q1:Install cp11[00-15] and rotate into production - https://phabricator.wikimedia.org/T349244 (Fabfur)
[09:34:33] Traffic, SRE: Q1:Install cp11[00-15] and rotate into production - https://phabricator.wikimedia.org/T349244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1001 for host cp1101.eqiad.wmnet with OS bullseye
[09:59:08] Traffic, SRE: Q1:Install cp11[00-15] and rotate into production - https://phabricator.wikimedia.org/T349244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1001 for host cp1101.eqiad.wmnet with OS bullseye executed with errors: - cp1101 (**FAIL**) - Downtimed on Icinga/...
[09:59:44] Traffic, SRE: Q1:Install cp11[00-15] and rotate into production - https://phabricator.wikimedia.org/T349244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1001 for host cp1101.eqiad.wmnet with OS bullseye
[10:36:43] Traffic, SRE: Q1:Install cp11[00-15] and rotate into production - https://phabricator.wikimedia.org/T349244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1001 for host cp1101.eqiad.wmnet with OS bullseye completed: - cp1101 (**PASS**) - Removed from Puppet and PuppetD...
[10:39:02] Traffic, SRE: Q1:Install cp11[00-15] and rotate into production - https://phabricator.wikimedia.org/T349244 (Fabfur)
[10:40:22] Traffic, SRE: Q1:Install cp11[00-15] and rotate into production - https://phabricator.wikimedia.org/T349244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1001 for host cp1102.eqiad.wmnet with OS bullseye
[11:17:30] Traffic, SRE: Q1:Install cp11[00-15] and rotate into production - https://phabricator.wikimedia.org/T349244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1001 for host cp1102.eqiad.wmnet with OS bullseye executed with errors: - cp1102 (**FAIL**) - Downtimed on Icinga/...
[11:18:35] Traffic, SRE: Q1:Install cp11[00-15] and rotate into production - https://phabricator.wikimedia.org/T349244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1001 for host cp1102.eqiad.wmnet with OS bullseye
[11:19:02] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate mr1-codfw from asw-a1-codfw to lsw1-a1-codfw - https://phabricator.wikimedia.org/T348164 (cmooney) As discussed with @papaul we may try to connect this to lsw1-a2-codfw instead, so that we can remove the requirement for a leaf switch in...
[11:52:07] Traffic, SRE: Q1:Install cp11[00-15] and rotate into production - https://phabricator.wikimedia.org/T349244 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1001 for host cp1102.eqiad.wmnet with OS bullseye completed: - cp1102 (**PASS**) - Removed from Puppet and PuppetD...
[11:56:02] Traffic, SRE: Q1:Install cp11[00-15] and rotate into production - https://phabricator.wikimedia.org/T349244 (Fabfur)
[12:02:00] netops, Infrastructure-Foundations, SRE, ops-codfw: Bring codfw row A-B EVPN switches live and make them gateway for existing Vlans - https://phabricator.wikimedia.org/T347191 (cmooney) Discussed with @papaul and we will do this work on Thursday at 11.30am CDT / 16:30 UTC. Shouldn't be any inter...
[13:00:54] netops, Infrastructure-Foundations, SRE, ops-codfw: Connect two hosts in codfw row A/B for switch migration testing - https://phabricator.wikimedia.org/T345803 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host sretest2004.codfw.wmnet with OS bullseye
[13:37:14] netops, Infrastructure-Foundations, SRE, ops-codfw: Connect two hosts in codfw row A/B for switch migration testing - https://phabricator.wikimedia.org/T345803 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host sretest2004.codfw.wmnet with OS bullseye...
[13:38:22] netops, Infrastructure-Foundations, SRE, ops-codfw: Connect two hosts in codfw row A/B for switch migration testing - https://phabricator.wikimedia.org/T345803 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host sretest2004.codfw.wmnet with OS bullseye
[14:20:44] (SystemdUnitFailed) firing: acme-chief.service Failed on acmechief2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:24:30] jbond: please do not deploy a second acmechief server without setting it to passive
[14:25:09] vgutierrez: i was just coming to ask you.
[14:25:21] i have built the server and done a basic puppet run
[14:25:47] jbond: role/common/acme_chief.yaml:profile::acme_chief::passive
[14:25:53] jbond: add 2002 there
[14:26:17] i have installed the acme-chief software based on https://gitlab.wikimedia.org/repos/sre/acme-chief/-/merge_requests/2
[14:26:25] https://gerrit.wikimedia.org/r/c/operations/puppet/+/969336
[14:26:39] vgutierrez: this is to place it to passive i think, is there anything else?
[14:26:48] jbond: errr make sure that acme-chief doesn't start there please
[14:26:57] flagging it as passive should be enough of course
[14:28:02] oh ok.. our puppetization is safe enough
[14:28:05] vgutierrez: i have stopped it now but it was started
[14:28:09] $is_active = $::fqdn == $active_host
[14:28:17] jbond: it shouldn't?
[14:29:04] vgutierrez: i think it tried to start when it got installed but failed
[14:29:32] vgutierrez: can i get a +1 on https://gerrit.wikimedia.org/r/c/operations/puppet/+/969336
[14:31:30] vgutierrez: white space fixed :)
[14:32:42] jbond: pcc please
[14:33:00] ack one sec
[14:38:47] vgutierrez: https://puppet-compiler.wmflabs.org/output/969336/222
[14:39:00] please hit the active server
[14:39:05] acmechief2001
[14:39:15] ack
[14:41:45] vgutierrez: https://puppet-compiler.wmflabs.org/output/969336/223/
[14:42:46] once merged what should i do to perform an initial sync? or how long should i wait?
[14:45:47] jbond: there is a timer.. give me one sec
[14:47:19] jbond: acme-chief-certs-sync.timer on acmechief2001
[14:48:04] cheers
[14:50:39] jbond: checked acmechief2002... acme-chief didn't do anything there cause there wasn't a config file
[14:50:45] jbond: so no impact :)
[14:51:21] great thanks
[15:00:44] (SystemdUnitFailed) resolved: acme-chief.service Failed on acmechief2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:10:12] FYI i did a systemd reset-failed for the above
[15:10:53] yeah, that's expected
[15:11:03] puppet flags the service as inactive on passive nodes
[15:11:09] so it won't get started at all
[15:11:26] and by default a server is considered as inactive, so no big deal
[15:11:32] s/inactive/passive/
[15:11:56] it also doesn't ship a default configuration file on the .deb package so it's unable to start if puppet doesn't configure it first
[15:12:53] ack thanks
[15:13:48] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Announce internal/core routes from CRs to L3 switches - https://phabricator.wikimedia.org/T344547 (cmooney) Above patch reflects my thinking on the best approach for this. I've taken the approach that we should announce all our internal...
[15:38:46] netops, Infrastructure-Foundations, SRE, ops-codfw: Connect two hosts in codfw row A/B for switch migration testing - https://phabricator.wikimedia.org/T345803 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host sretest2004.codfw.wmnet with OS bullseye...
[15:41:53] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Announce internal/core routes from CRs to L3 switches - https://phabricator.wikimedia.org/T344547 (cmooney) FWIW in my original config for this I had terms to match routes redistributed into BGP locally and announced in IBGP, or between c...
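(Editor's note on the acme-chief exchange between 14:28 and 15:12: the safety argument rests on two things said in the channel, the Puppet comparison `$is_active = $::fqdn == $active_host` and the fact that the .deb ships no default config file. Below is a rough Python sketch of that reasoning, not the actual Puppet code. The `Node` class, the `has_config` flag, the function names and the `.codfw.wmnet` FQDNs for the acmechief hosts are assumptions made for illustration; only the hostnames acmechief2001 (active) and acmechief2002 come from the log.)

```python
from dataclasses import dataclass

# Assumed FQDN: the log only names the active host as "acmechief2001".
ACTIVE_HOST = "acmechief2001.codfw.wmnet"


@dataclass(frozen=True)
class Node:
    """Hypothetical model of an acme-chief server, for illustration only."""
    fqdn: str
    has_config: bool  # True once Puppet has written a config file


def is_active(fqdn: str, active_host: str = ACTIVE_HOST) -> bool:
    # Mirrors the comparison quoted in the log:
    #   $is_active = $::fqdn == $active_host
    # Any host that is not explicitly the active one counts as passive.
    return fqdn == active_host


def service_should_run(node: Node, active_host: str = ACTIVE_HOST) -> bool:
    # Two independent guards described in the channel:
    # 1. passive nodes never have the service started;
    # 2. without a Puppet-provided config file the service cannot start.
    return is_active(node.fqdn, active_host) and node.has_config


if __name__ == "__main__":
    fresh_install = Node("acmechief2002.codfw.wmnet", has_config=False)
    print(service_should_run(fresh_install))
```

(Under these assumptions a freshly installed second server fails both guards, which matches the "so no impact" conclusion at 14:50:45.)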
[16:19:23] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate mr1-codfw from asw-a1-codfw to lsw1-a1-codfw - https://phabricator.wikimedia.org/T348164 (Papaul) @cmooney for the cross rack link it does make sense to use copper with 1000BaseT since we have those already on site. On the other hand sin...
[19:07:37] Traffic, Patch-For-Review: Add custom HAProxy backend only for healthchecks - https://phabricator.wikimedia.org/T348851 (KOfori)