[01:20:24] FYI, eqiad has been repooled for edge traffic - summary in -sre. flagging here in case there's any disruptive maintenance planned there under the cover of the switchover.
[07:43:17] moritzm: o/
[07:43:19] Class[Ferm]: has no parameter named 'ferm_status_restart' (file: /srv/puppet_code/environments/production/modules/firewall/manifests/init.pp,
[07:44:16] got this while running puppet on registry1004
[07:44:18] we just merged a fix
[07:44:32] ah I got caught in the middle, retrying
[07:44:44] fallout of the patch which enabled the next gen ferm status monitoring
[10:27:14] SRE-tools, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 4 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#10178834 (MoritzMuehlenhoff)
[12:11:07] netops, Infrastructure-Foundations, cloud-services-team (FY2024/2025-Q1-Q2): cloud: edge network suffers downtime if one cloudsw is down - https://phabricator.wikimedia.org/T375259#10179065 (ayounsi) It would be useful to capture more data (eg. packet capture) next time this happens. The ICMP no rout...
[12:28:10] netops, Infrastructure-Foundations, cloud-services-team (FY2024/2025-Q1-Q2): cloud: edge network suffers downtime if one cloudsw is down - https://phabricator.wikimedia.org/T375259#10179194 (ayounsi) A few more info thanks to @aborrero on IRC. After 185.15.56.244, the packets towards 185.15.56.57 ar...
[12:47:06] netops, Infrastructure-Foundations, cloud-services-team (FY2024/2025-Q1-Q2): cloud: edge network suffers downtime if one cloudsw is down - https://phabricator.wikimedia.org/T375259#10179269 (ayounsi) Actually... `ssh: connect to host login.toolforge.org port 22: No route to host` is a red herring, S...
[13:03:55] netops, Infrastructure-Foundations, cloud-services-team (FY2024/2025-Q1-Q2): cloud: edge network suffers downtime if one cloudsw is down - https://phabricator.wikimedia.org/T375259#10179344 (aborrero) In case they are useful, keepalived VRRP logs can be seen here: {P69421}
[13:29:35] SRE-tools, Infrastructure-Foundations, Spicerack: Spicerack: allow cookbooks to abort execution from __init__ - https://phabricator.wikimedia.org/T365454#10179477 (ssingh) Another use case for this is the `sre.dns.admin` cookbook. ` sukhe@cumin1002:~$ sudo cookbook sre.dns.admin show => CURRENT STA...
[15:49:29] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Decom asw-c-codfw switch stack - https://phabricator.wikimedia.org/T375418#10180297 (Papaul)
[15:51:04] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Decom asw-d-codfw switch stack - https://phabricator.wikimedia.org/T375419#10180308 (Papaul)
[15:51:05] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Decom asw-c-codfw switch stack - https://phabricator.wikimedia.org/T375418#10180305 (Papaul) Open→Resolved This is complete
[15:52:24] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Decom asw-d-codfw switch stack - https://phabricator.wikimedia.org/T375419#10180309 (Papaul) Open→Resolved This is complete
[16:13:03] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 3 others: codfw:frack:rack/install/configuration new switches - https://phabricator.wikimedia.org/T374587#10180446 (Papaul) setup/configuration of both switches done. Just need to add the switches to monitoring was we have pfw1...
[17:00:37] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: cr3-ulsfo incident 22 Sep 2024 - https://phabricator.wikimedia.org/T375345#10180628 (RobH) Inbound shipment ticket 00980858 for UPS 1Z20506Y0100053206 (already delivered today and got the shipment notice last night). Next step is sc...
[20:16:11] hello I/F friends, at least two of us have seen some odd puppet CI failures today, example: https://integration.wikimedia.org/ci/job/operations-puppet-tests-bullseye/369/console
[20:16:11] does this look familiar to anyone?
[20:54:25] * jhathaway hides in the bushes
[20:55:44] swfrench-wmf: I suspect a consequence of upgrading the CI container to puppet7, that I failed to catch, but not for sure
[21:02:18] are these hosts on puppet 7?
[21:09:17] jhathaway: apologies, missed this before, and thank you very much for taking a look. I *think* it has been? i.e., I don't see a notice in the login banner that it's still on 5.
[21:09:50] more specifically, this is on a change that affects hosts with role::deployment_server::kubernetes
[21:10:12] deployment servers are on P7, yes
[21:10:49] hmm, not sure why then we wouldn't get an error in an apply on those servers, do they use that function?
[21:11:20] what is the gerrit link?
[21:11:38] here's one: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1076019
[21:11:50] here's another: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1075981
[21:16:24] perhaps modules/service/manifests/node/config.pp:43 is never used in practice?
[21:16:36] i.e. $full_config is always true?
[21:20:39] ah, that's a good point
[21:21:39] so, all instantiations of service::node I can find seem to have that true or deployment => 'scap3'
[21:22:24] that would render the else branch at modules/service/manifests/node.pp:296 dead?
[21:25:53] or, rather, dead with the exception of the one case where $full_config is true, I mean :)
[21:30:17] if that is true, it would explain why there have not been failures in production
[21:42:56] really simple question I do not know the answer to: how is it determined which tests are run for a given patch?
[21:42:56] as in, does it somehow figure out the transitive set of all classes, functions, etc. used by resources affected by the patch diffs, and then run all covering tests?
[21:53:10] swfrench-wmf: if I recall correctly it runs all tests for any modules whose files are altered by the patch set
[21:53:38] it may do something more sophisticated than that, but I would need to dig through the code a bit
[21:54:20] # * spec - run the spec tests on the modules where files are changed, or whose
[21:54:22] # tests depend on modules that have been modified.
[21:54:31] ^ is at least what the docs say
[21:54:57] got it, so it's a fairly coarse "what's affected by this" definition
[21:56:12] indeed
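Note: the selection rule quoted from the docs at 21:54 can be illustrated with a rough Python sketch. This is not the actual CI code, which nobody in the channel inspected; the function names, the `test_deps` mapping, and the `profile` module in the usage lines are hypothetical, and only direct test dependencies are considered, on the assumption that module files live under `modules/<name>/`.

```python
from pathlib import PurePosixPath


def module_of(path):
    """Return the module name for a path like modules/<name>/..., else None."""
    parts = PurePosixPath(path).parts
    if len(parts) >= 2 and parts[0] == "modules":
        return parts[1]
    return None


def modules_to_test(changed_files, test_deps):
    """Pick the modules whose spec tests should run for a patch.

    changed_files: repo-relative paths touched by the patch.
    test_deps: hypothetical mapping of module name -> set of modules
               that its tests depend on (direct dependencies only).
    """
    # Modules with at least one changed file.
    changed = {m for m in map(module_of, changed_files) if m is not None}
    # Modules whose tests depend on any changed module.
    dependents = {mod for mod, deps in test_deps.items() if deps & changed}
    return sorted(changed | dependents)


# Usage with made-up dependency data:
changed = ["modules/service/manifests/node/config.pp"]
deps = {"profile": {"service", "firewall"}, "firewall": {"ferm"}}
print(modules_to_test(changed, deps))  # ['profile', 'service']
```

Under this rule a patch selects whole modules rather than individual classes or functions, which matches the "fairly coarse" reading at 21:54:57.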