[06:45:57] <_joe_> can I get a seal of approval on https://gerrit.wikimedia.org/r/c/operations/puppet/+/917840 and followup?
[08:14:53] Traffic, netops, DBA, Data-Platform-SRE, and 10 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (MoritzMuehlenhoff)
[08:16:47] Traffic, netops, DBA, Data-Platform-SRE, and 10 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (MoritzMuehlenhoff)
[08:19:23] Traffic, netops, DBA, Data-Platform-SRE, and 10 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (MoritzMuehlenhoff)
[08:24:19] Traffic, netops, DBA, Data-Platform-SRE, and 10 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (MoritzMuehlenhoff)
[08:33:45] (HAProxyRestarted) firing: HAProxy server restarted on cp4052:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=ulsfo%20prometheus/ops&var-instance=cp4052&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[08:34:42] (SystemdUnitFailed) firing: haproxy.service Failed on cp4052:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:38:45] (HAProxyRestarted) resolved: HAProxy server restarted on cp4052:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=ulsfo%20prometheus/ops&var-instance=cp4052&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[08:39:42] (SystemdUnitFailed) resolved: haproxy.service Failed on cp4052:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:51:01] Traffic, SRE, Wikidata, wdwb-tech, wmde-wikidata-tech: Wikidata seems to still be utilizing insecure HTTP URIs - https://phabricator.wikimedia.org/T331356 (OlafJanssen) >>! In T331356#8849873, @Ladsgroup wrote: > Until it gets changed to HTTPS, basically we have two options: > - Remove the l...
[09:26:22] Traffic, netops, DBA, Data-Platform-SRE, and 10 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (hnowlan)
[09:56:31] Traffic, Observability-Metrics, SRE, User-fgiunchedi: Upgrade cadvisor to 0.44 fleetwide - https://phabricator.wikimedia.org/T336740 (fgiunchedi)
[10:26:37] Domains, SRE: Mark Monitor administration panel - https://phabricator.wikimedia.org/T333827 (Jacek_Broda_WMPL) a:Jacek_Broda_WMPL→None
[10:30:11] Traffic, netops, DBA, Data-Platform-SRE, and 10 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (ops-monitoring-bot) akosiaris@cumin1001 - Cookbook cookbooks.sre.discovery.datacenter depool all active/active services in codfw: codfw row D switches upgrade - T33504...
[10:46:49] Traffic, netops, DBA, Data-Platform-SRE, and 10 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (ops-monitoring-bot) akosiaris@cumin1001 - Cookbook cookbooks.sre.discovery.datacenter depool all active/active services in codfw: codfw row D switches upgrade - T33504...
[11:04:51] Traffic, netops, DBA, Data-Platform-SRE, and 10 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (BTullis)
[11:38:29] Traffic: Document how to deploy changes to DNS repo without Gerrit working - https://phabricator.wikimedia.org/T336754 (LSobanski)
[12:17:49] Traffic, netops, DBA, Data-Platform-SRE, and 10 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (ssingh)
[12:19:43] netops, Infrastructure-Foundations, SRE: TLS certificates for network devices - https://phabricator.wikimedia.org/T334594 (ayounsi) > +1 extending the lifetime is just delaying the issue and increasing the possibility its forgotten or missed Yes and no. It depends on how much we can automate it with...
[12:49:17] Traffic, netops, DBA, Data-Platform-SRE, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=3a841f97-aecd-4c7a-8eb4-8acd1caa15b3) set by ayounsi@cumin1001 for 2:00:00 on 189 host(s) and their ser...
[12:53:23] Traffic, netops, DBA, Data-Platform-SRE, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (MatthewVernon)
[12:54:15] Traffic, netops, DBA, Data-Platform-SRE, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (MoritzMuehlenhoff)
[13:24:53] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[13:25:51] netops, Infrastructure-Foundations, SRE: Upgrade network devices to Junos 20+ - https://phabricator.wikimedia.org/T316539 (ayounsi)
[13:25:59] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi) Open→Resolved a:ayounsi All stacks have been upgraded. Hopefully for the last time!
[13:28:09] Traffic, netops, DBA, Data-Platform-SRE, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (MoritzMuehlenhoff)
[13:28:52] netops, Infrastructure-Foundations, SRE: all network devices must run OpenSSH >= 7.2p1 but != 7.4p1 - https://phabricator.wikimedia.org/T254013 (ayounsi) Stalled→Resolved a:ayounsi Done with all the sub-tasks upgrades.
[13:46:34] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudswift1001.eqiad.wmnet with OS...
[13:53:59] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install dns200[456] - https://phabricator.wikimedia.org/T326688 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host dns2005.wikimedia.org with OS bullseye
[13:54:42] Traffic, netops, DBA, Data-Platform-SRE, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (ops-monitoring-bot) akosiaris@cumin1001 - Cookbook cookbooks.sre.discovery.datacenter pool all active/active services in codfw: codfw row D switches upgrade done - T335...
[14:10:31] Traffic, netops, DBA, Data-Platform-SRE, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (ops-monitoring-bot) akosiaris@cumin1001 - Cookbook cookbooks.sre.discovery.datacenter pool all active/active services in codfw: codfw row D switches upgrade done - T335...
[14:12:01] netops, Infrastructure-Foundations: Update network SSH keys to ssh-ed25519 - https://phabricator.wikimedia.org/T336769 (ayounsi) p:Triage→Medium
[14:15:00] netops, Infrastructure-Foundations: Update network SSH keys to ssh-ed25519 - https://phabricator.wikimedia.org/T336769 (Volans)
[14:19:01] netops, Infrastructure-Foundations, Patch-For-Review: Update network SSH keys to ssh-ed25519 - https://phabricator.wikimedia.org/T336769 (ayounsi)
[14:24:55] Traffic, netops, DBA, Data-Platform-SRE, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (herron)
[14:26:44] Traffic, netops, DBA, Data-Platform-SRE, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (ayounsi) Open→Resolved a:ayounsi Upgrade went very well. Thanks everybody! That was the last one!
[14:26:54] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[14:27:04] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install dns200[456] - https://phabricator.wikimedia.org/T326688 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host dns2005.wikimedia.org with OS bullseye completed: - dns2005 (**PASS**)...
[14:32:54] <_joe_> vgutierrez: how do we merge my changes? disable puppet on all cache hosts, I apply the change to one ulsfo host, check if anything explodes?
[14:33:13] that sounds right
[14:33:20] <_joe_> or do I also need to restart trafficserver?
[14:33:46] hmm it shouldn't be necessary
[14:34:08] <_joe_> ack
[14:34:25] <_joe_> so I'll merge both changes at once
[14:44:12] <_joe_> vgutierrez: sigh I forgot to add the conf file to the files on the server, d'oh
[14:46:46] :)
[14:50:06] <_joe_> but I added the file by hand and it works
[14:55:25] hello folks, I have another new VIP request for you - https://gerrit.wikimedia.org/r/c/operations/puppet/+/920218 - I know I hope it will be the last one for a few, sorry :(
[14:55:55] I am wondering if it is ok to do a couple of pybal restarts (maybe after Giuseppe breaks traffic server)
[14:59:51] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudswift1001.eqiad.wmnet with OS bul...
[15:01:35] elukey: I would wait for confirmation, not sure about any lvs outages for codfw today
[15:01:51] no lvs work in codfw today
[15:02:18] elukey: feel free to do the restart. for the review, all of us are in the Traffic meeting so will do later
[15:02:50] sukhe: something seems off - the ticket for codfw row D has lvs2010 on it, but n/a because "depooled"?
[15:02:56] ditto for the row D cps
[15:03:46] bblack: yep, n/a because codfw was depooled
[15:03:51] I will update
[15:03:53] sukhe: thanks!
[15:04:08] I'll wait for the traffic team's review
[15:06:07] sukhe: oh the window is already done?
[15:06:20] bblack: yeah
[15:06:24] it was 13:00 UTC
[15:06:31] codfw repooled again
[15:06:58] ok
[15:10:30] <_joe_> reenabling puppet on the cache-text hosts
[17:17:17] Traffic, Content-Transform-Team-WIP, RESTBase, SRE, and 5 others: PCS caching and pregeneration when restbase is decommissioned - https://phabricator.wikimedia.org/T319365 (FJoseph-WMF) I've scheduled a meeting this week for followup
[17:18:15] Traffic, SRE, ops-codfw, Patch-For-Review: Q4:rack/decom codfw unified decommission task - https://phabricator.wikimedia.org/T335777 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by sukhe@cumin2002 for hosts: `dns2002.wikimedia.org` - dns2002.wikimedia.org (**WARN**) - Downtime...
[17:19:10] Traffic, SRE, ops-codfw, Patch-For-Review: Q4:rack/decom codfw unified decommission task - https://phabricator.wikimedia.org/T335777 (ssingh)
[17:26:44] Traffic: Add systemd-level service bindings for Wikimedia DNS - https://phabricator.wikimedia.org/T336792 (ssingh)
[17:41:42] Traffic, SRE: Add systemd-level service bindings for Wikimedia DNS - https://phabricator.wikimedia.org/T336792 (ssingh) a:ssingh→None
[17:46:02] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install dns200[456] - https://phabricator.wikimedia.org/T326688 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host dns2006.wikimedia.org with OS bullseye
[18:00:34] _joe_: did you re-enable puppet cp text?
[18:00:46] <_joe_> sukhe: sigh, yes
[18:00:46] it says it's still disabled but your earlier message says you enabled it
[18:00:48] <_joe_> sorry :/
[18:00:52] np! can I enable it?
[18:00:56] <_joe_> yes please
[18:00:57] ok thanks
[18:22:54] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install dns200[456] - https://phabricator.wikimedia.org/T326688 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host dns2006.wikimedia.org with OS bullseye completed: - dns2006 (**PASS**)...
[19:00:57] Traffic, SRE, ops-codfw, Patch-For-Review: Q4:rack/decom codfw unified decommission task - https://phabricator.wikimedia.org/T335777 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by sukhe@cumin2002 for hosts: `dns2003.wikimedia.org` - dns2003.wikimedia.org (**WARN**) - Downtime...
[19:01:40] Traffic, SRE, ops-codfw, Patch-For-Review: Q4:rack/decom codfw unified decommission task - https://phabricator.wikimedia.org/T335777 (ssingh)
[19:03:41] we completed some DNS work in codfw today, dns200[4-6] being the new DNS and ns1 hosts
[19:03:44] if you see some issues, please ping me
[19:03:44] we should be fine but just in case
[19:03:46] thanks
[20:16:45] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: cloudservices[2004/2005]-dev & cloudweb2002-dev: connect them to cloudsw so they can have cloud-private vlan - https://phabricator.wikimedia.org/T336587 (taavi)
[21:07:43] netops, Infrastructure-Foundations, SRE, SRE-tools, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (Volans) Result of the testing with Cathal. I first want to thank @cmooney for all the help with JunOS-magics, that was pre...
[22:01:24] netops, Infrastructure-Foundations, SRE, SRE-tools, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (cmooney) Awesome work getting it working @volans big thanks to you too :) >>! In T336485#8857232, @Volans wrote: > HTTP i...
[22:08:37] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Update network SSH keys to ssh-ed25519 - https://phabricator.wikimedia.org/T336769 (cmooney)
[23:57:55] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Update network SSH keys to ssh-ed25519 - https://phabricator.wikimedia.org/T336769 (ssingh)