[09:04:34] FIRING: DiskSpace: Disk space seaborgium:9100:/ 5.195% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=seaborgium - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[09:14:34] RESOLVED: DiskSpace: Disk space seaborgium:9100:/ 4.895% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=seaborgium - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[09:26:15] CAS-SSO, Bitu, Infrastructure-Foundations, Security-Team, SRE: SSO kill switch for crucial services - https://phabricator.wikimedia.org/T233938#10645573 (Arendpieter) Open→Resolved
[09:45:15] CAS-SSO, Bitu, Infrastructure-Foundations, Security-Team, SRE: SSO kill switch for crucial services - https://phabricator.wikimedia.org/T233938#10645637 (MoritzMuehlenhoff) Resolved→Open This isn't resolved?
[11:05:29] Puppet, SRE: puppet error at the end of the run on prometheus2008: Could not autoload puppet/reports/logstash: Cannot invoke "jnr.netdb.Service.getName()" because "service" is null - https://phabricator.wikimedia.org/T388629#10645982 (MoritzMuehlenhoff) I think I have a trail: I noticed that this occurs...
[11:36:45] Puppet, SRE: puppet error at the end of the run on prometheus2008: Could not autoload puppet/reports/logstash: Cannot invoke "jnr.netdb.Service.getName()" because "service" is null - https://phabricator.wikimedia.org/T388629#10646127 (MoritzMuehlenhoff) Changelog for 17.0.14: https://mail.openjdk.org/pip...
[12:05:35] netops, Infrastructure-Foundations, ops-magru, Patch-For-Review: Jan 2025 - Magru core router connectivity blips - https://phabricator.wikimedia.org/T384774#10646249 (cmooney) Open→Resolved Router stable and config added to automation templates, closing task.
[15:22:37] CAS-SSO, Infrastructure-Foundations, SRE: Registry of multiple webauthn devices - https://phabricator.wikimedia.org/T380180#10647144 (TheDJ)
[15:22:56] CAS-SSO, Infrastructure-Foundations, SRE: Registry of multiple webauthn devices - https://phabricator.wikimedia.org/T380180#10647147 (TheDJ)
[16:15:26] netops, Infrastructure-Foundations, ops-drmrs: cr1-drmrs to asw1-b12-drmrs link down - https://phabricator.wikimedia.org/T389071#10647485 (RobH) > After changing the router side of the qsfp and fiber port back to solid green. > > Can you test it on your side? > > For information, you no longer ha...
[16:20:55] FIRING: MaxConntrack: Max conntrack at 84.54% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[16:25:55] RESOLVED: MaxConntrack: Max conntrack at 84.54% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[16:32:55] FIRING: MaxConntrack: Max conntrack at 83.9% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[16:37:55] netops, Infrastructure-Foundations, ops-drmrs: cr1-drmrs to asw1-b12-drmrs link down - https://phabricator.wikimedia.org/T389071#10647639 (cmooney) As discussed - somewhat clutching at straws at this point - we're gonna try moving the link/optic from port 48 to port 49 on the switch side. I've recon...
[16:37:55] RESOLVED: MaxConntrack: Max conntrack at 82.48% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[16:50:58] netops, Infrastructure-Foundations, ops-drmrs: cr1-drmrs to asw1-b12-drmrs link down - https://phabricator.wikimedia.org/T389071#10647692 (cmooney) It's been moved to port 49 now, but switch is still reporting no TX light on the second lane: ` Mar 18 16:42:37 asw1-b12-drmrs fpc0 qsfp-0/0/49 plugged...
[17:55:22] Puppet, SRE: puppet error at the end of the run on prometheus2008: Could not autoload puppet/reports/logstash: Cannot invoke "jnr.netdb.Service.getName()" because "service" is null - https://phabricator.wikimedia.org/T388629#10648092 (andrea.denisse) Hi, while running Puppet on the alert hosts I noticed...
[18:09:14] netops, Infrastructure-Foundations, ops-drmrs: cr1-drmrs to asw1-b12-drmrs link down - https://phabricator.wikimedia.org/T389071#10648248 (RobH) Ticket updated to move the link to router port 3.
[18:37:04] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10648385 (cmooney)
[19:55:55] FIRING: MaxConntrack: Max conntrack at 84.91% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[20:05:55] RESOLVED: MaxConntrack: Max conntrack at 80.06% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[20:44:31] Puppet, SRE: puppet error at the end of the run on prometheus2008: Could not autoload puppet/reports/logstash: Cannot invoke "jnr.netdb.Service.getName()" because "service" is null - https://phabricator.wikimedia.org/T388629#10648788 (jhathaway) Open→Resolved a:jhathaway Unfortunately thi...
[21:01:27] Puppet, Data-Persistence, database-backups: Possible weird interaction between es backups and puppet runs leading to failures - https://phabricator.wikimedia.org/T367882#10648852 (jhathaway) @jcrespo is it possible this is correlated with a ferm refresh from a puppet run? In your last example the fer...
[21:27:57] Puppet, Infrastructure-Foundations, Keyholder, SRE: keyholder-proxy doesn't restart on config change - https://phabricator.wikimedia.org/T374711#10649030 (jhathaway) @fgiunchedi should we consider this issue resolved, since the arming step for keyholder is manual, if I understand correctly?