[03:03:55] we lost connectivity to puppetserver1001, 1002 and 2001 at midnight UTC, 1002 and 2001 are back but 1001 isn't
[03:35:47] created T375839
[03:35:48] T375839: puppetserver[1001-1002,2001] crashed on 2024-09-27 00:00 - https://phabricator.wikimedia.org/T375839
[07:38:06] <_joe_> vgutierrez: so wait, we're without puppet in eqiad more or less?
[07:38:21] _joe_: not anymore
[07:38:32] _joe_: I "fixed" it by powercycling puppetserver1001
[07:38:40] <_joe_> oh ok
[07:38:50] <_joe_> but it seems worth investigating for I/F indeed
[07:38:55] <_joe_> also, why didn't it page?
[07:39:15] <_joe_> seems like the puppetmaster where we merge the private repo crashing should be a paging alert
[07:40:04] !incidents
[07:40:05] 5284 (RESOLVED) Primary inbound port utilisation over 80% (paged) global noc (cloudsw1-e4-eqiad.mgmt.eqiad.wmnet)
[07:40:05] 5283 (RESOLVED) Primary outbound port utilisation over 80% (paged) global noc (cloudsw1-d5-eqiad.mgmt.eqiad.wmnet)
[07:40:05] 5282 (RESOLVED) Primary inbound port utilisation over 80% (paged) global noc (cloudsw1-e4-eqiad.mgmt.eqiad.wmnet)
[07:40:05] 5281 (RESOLVED) Primary outbound port utilisation over 80% (paged) global noc (cloudsw1-d5-eqiad.mgmt.eqiad.wmnet)
[07:40:20] it didn't page
[07:40:34] so we might need to revisit that
[07:40:45] and of course checking why we lost 3 puppetservers at the same time
[07:47:56] <_joe_> yes
[07:56:19] akosiaris@parsoidtest1001:~$ sudo systemd-sysusers
[07:56:19] Creating group mcrouter with gid 495.
[07:56:20] Creating user mcrouter (Mcrouter user) with uid 495 and gid 495.
[07:56:20] Segmentation fault
[07:56:37] It appears I've managed to segfault systemd-sysusers
[07:56:45] well, our puppet setup did
[07:57:15] there's a glibc issue which we hit on some roles with a lot of local users
[07:57:37] https://phabricator.wikimedia.org/T256098
[07:57:57] oh task exists already, awesome
[07:57:59] I tried to get it backported to the glibc stable series which is in Bullseye, but that didn't succeed
[07:58:19] let me check parsoidtest1001
[07:59:31] it's from the mcrouter postinst
[08:00:10] yup
[08:00:17] very first line in the script
[08:00:26] if [ "$1" = configure ]; then
[08:00:26] systemd-sysusers
[08:04:11] https://gerrit.wikimedia.org/r/1076144
[08:07:57] moritzm: tabs vs spaces aside, LGTM
[10:15:31] GitLab needs a short maintenance at 11:00 UTC (in 45 minutes)
[11:08:24] GitLab maintenance finished
[14:36:49] mw-web looking kinda hot
[14:37:31] yeah. will see if it's like the 10:00 UTC spike or it becomes worse
[15:06:18] <_joe_> NA oncallers: all quiet on the eastern front today
[15:06:29] thanks!