[00:04:15] RESOLVED: SystemdUnitFailed: dump_ip_reputation.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:01:24] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#9938126 (Marostegui)
[07:28:34] FIRING: DiskSpace: Disk space build2001:9100:/ 5.17% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=build2001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[07:48:34] RESOLVED: DiskSpace: Disk space build2001:9100:/ 3.872% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=build2001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[08:21:59] * volans is back, but over to Data Persistence
[08:22:26] let me know if anything from the backlog of the last week still needs my attention
[08:36:17] SRE-tools, conftool, Infrastructure-Foundations, Spicerack: Spicerack support for dbctl - https://phabricator.wikimedia.org/T362893#9938525 (ABran-WMF)
[08:36:32] SRE-tools, conftool, DBA, Infrastructure-Foundations, Spicerack: Spicerack support for dbctl - https://phabricator.wikimedia.org/T362893#9938526 (ABran-WMF)
[09:53:01] netops, Infrastructure-Foundations, SRE: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts - https://phabricator.wikimedia.org/T368513#9938867 (fgiunchedi) Those are SSH probes from local prometheus hosts indeed, in this case the probe consists of a TCP conne...
[09:55:04] netops, Infrastructure-Foundations, serviceops, Traffic: weighted maglev viability for low-traffic services - https://phabricator.wikimedia.org/T368545#9938878 (fgiunchedi) >>! In T368545#9929623, @Vgutierrez wrote: >>>! In T368545#9929335, @ayounsi wrote: >> I think I miss some context, what's t...
[10:02:58] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Add per-output queue monitoring for Juniper network devices - https://phabricator.wikimedia.org/T326322#9938917 (fgiunchedi) >>! In T326322#9934200, @cmooney wrote: > @fgiunchedi I was perhaps a little cheeky and merged this, but it was c...
[12:24:45] I'm hitting a weird bug where the kernel a host boots into for d-i is 5.10.0-28-amd64 but the modules on disk are for 5.10.0-30-amd64
[12:24:53] it makes late_command.sh fail
[12:25:16] + cp /target/lib/modules/5.10.0-28-amd64/kernel/drivers/firmware/qemu_fw_cfg.ko /lib/modules/5.10.0-28-amd64/kernel/drivers/firmware/
[12:25:18] cp: can't stat '/target/lib/modules/5.10.0-28-amd64/kernel/drivers/firmware/qemu_fw_cfg.ko': No such file or directory
[12:27:21] It's the last step of the command for this host so I guess I can just go ahead and continue d-i
[12:27:27] s/command/script/
[12:30:28] claime: without looking that's probably related to the latest debian point release that might need an update of our d-i image https://www.debian.org/News/2024/20240629
[12:31:32] yeah makes sense
[12:31:42] wait, which distro was this? 5.10 kernel seems old
[12:32:09] bullseye got a point release too https://www.debian.org/News/2024/2024062902
[12:32:13] so yeah
[12:32:38] we need to update the images
[12:32:38] https://wikitech.wikimedia.org/wiki/Updating_netboot_image_with_newer_kernel#Updating_production_point_release
[12:32:48] volans: yeah, bullseye
[12:32:52] I guess someone in I/F could do that (I'm not here)
[12:33:30] x) I can run it if necessary, I have a bunch of servers to reimage
[12:33:47] unless it's risky, but it looks like y'all have scripted it
[12:42:42] volans: does it need to be synced to the other puppetservers after being updated, or do I need to run it on both puppet5 and puppet7 master?
[12:43:36] I don't recall the details, that rebuilds on volatile and then puppet runs should do their job, as long as you did it on the current volatile "master" :D
[12:44:07] which is puppetmaster1001 iirc, so should be ok
[12:44:23] I'll do a puppet run on puppetserver1001 to check
[12:46:37] eeeeh apparently not
[12:54:49] you need a run on the install server
[12:55:06] as for the current volatile master I'm not sure as I was out last week
[12:55:10] and things are moving
[12:55:24] SRE-tools, conftool, DBA, Infrastructure-Foundations, Spicerack: Spicerack support for dbctl - https://phabricator.wikimedia.org/T362893#9939575 (ABran-WMF) p:Low→Medium
[13:30:24] volans: fyi, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071574#32 :)
[13:33:30] \o/
[13:52:03] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e1-eqiad - https://phabricator.wikimedia.org/T365993#9939770 (jcrespo) No action will be needed for backup1010 in the end.
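The late_command.sh failure reported at 12:24-12:25 comes down to the installer's running kernel release (5.10.0-28-amd64) having no matching directory under /target/lib/modules, where only 5.10.0-30-amd64 was installed. A minimal sketch of that check follows, assuming Python is available. The function name `check_kernel_modules` is hypothetical and is not part of late_command.sh or any Wikimedia tooling; only the path /target/lib/modules and the two version strings come from the log.

```python
import os


def check_kernel_modules(running_release, modules_root):
    """Report whether modules for the running kernel exist under modules_root.

    Returns (found, available), where `available` is the sorted list of
    kernel releases that do have a module directory. Hypothetical helper,
    written only to illustrate the mismatch discussed in the log.
    """
    try:
        available = sorted(
            entry
            for entry in os.listdir(modules_root)
            if os.path.isdir(os.path.join(modules_root, entry))
        )
    except FileNotFoundError:
        # No module tree at all: nothing can match.
        available = []
    return running_release in available, available


if __name__ == "__main__":
    # In the installer environment the target system is mounted at /target.
    release = os.uname().release
    found, available = check_kernel_modules(release, "/target/lib/modules")
    if not found:
        print(f"running kernel {release} has no modules on disk; found: {available}")
```

A check like this would flag the mismatch before the `cp` step runs. The fix discussed in the log is still to rebuild the netboot image so that the installer kernel matches the point release.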
[14:28:52] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Allow debmonitor to store the Debian version-id in the OS field - https://phabricator.wikimedia.org/T368744#9939925 (elukey) p:Triage→Medium
[14:37:11] netops, Infrastructure-Foundations: magru ipv6 issues - https://phabricator.wikimedia.org/T368499#9940003 (ayounsi) Open→Resolved a:ayounsi
[16:27:21] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Allow debmonitor to store the Debian version-id in the OS field - https://phabricator.wikimedia.org/T368744#9940710 (elukey) After a chat with Riccardo some things came up: * It seems that the issue comes up when debmonitor-client is upgraded...
[16:35:28] https://docs.pyinfra.com/en/3.x/getting-started.html :P
[16:36:14] let's replace cumin... :D
[17:03:35] Nice! I was gonna mention jetporch as well, but that seems to have disappeared https://github.com/jetporch/jetporch_docs
[17:24:34] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9941093 (Papaul)
[17:26:54] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9941103 (Papaul) All the cabling is done. I am leaving this task open so when we move the console cables from asw-c*/d*-codfw to ssw1-* and lsw1-* I can u...