[03:11:05] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[03:34:25] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[04:11:05] RESOLVED: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[05:29:25] RESOLVED: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[07:21:41] netops, Infrastructure-Foundations: lsw1-d2-codfw is unreachable through gNMI - https://phabricator.wikimedia.org/T401881 (ayounsi) NEW p:Triage→High
[07:46:39] netops, Infrastructure-Foundations: lsw1-d2-codfw is unreachable through gNMI - https://phabricator.wikimedia.org/T401881#11084791 (cmooney) I was able to break it more!! I toggled the port number in the config, commited, then changed it back. Hoping perhaps this would force it to restart. Now: ` cmoo...
[07:48:31] netops, Infrastructure-Foundations: lsw1-d2-codfw is unreachable through gNMI - https://phabricator.wikimedia.org/T401881#11084794 (cmooney) Perhaps we could try one of these? ` cmooney@lsw1-d2-codfw> restart jsd ? Possible completions: <[Enter]> Execute this command all-members R...
[08:28:17] I'll merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/1177956 which will change things about throttling in nftables, this might impact Gerrit (on the https port), I'll be monitoring the situation but please let me know if you see something!
[08:33:18] oops wrong chan sorry
[08:37:17] netops, Infrastructure-Foundations: lsw1-d2-codfw is unreachable through gNMI - https://phabricator.wikimedia.org/T401881#11084923 (ayounsi) Open→Resolved a:ayounsi Nice, it worked! ` lsw1-d2-codfw> restart jsd gracefully JET Services Daemon signalled but still running, waiting 28 second...
[10:31:55] FIRING: MaxConntrack: Max conntrack at 81.27% on krb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[10:36:55] RESOLVED: MaxConntrack: Max conntrack at 83.45% on krb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[10:37:54] netops, fundraising-tech-ops, Infrastructure-Foundations: Move pfw1b-codfw to rack F5 - https://phabricator.wikimedia.org/T401297#11085585 (ayounsi) From a quick look it does seem best to have two control links for proper redundancy. But I suggest that we do a test. In a maintenance window, unplug t...
[12:56:25] FIRING: SystemdUnitFailed: wmf_auto_restart_lldpd.service on install1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:34:12] netops, fundraising-tech-ops, Infrastructure-Foundations: Move pfw1b-codfw to rack F5 - https://phabricator.wikimedia.org/T401297#11086240 (Papaul) @ayounsi I will have to work with fundraising to see when it will be best for us to do so. Thanks
[13:51:25] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_lldpd.service on install1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:25:25] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[16:32:18] netbox, DC-Ops, Infrastructure-Foundations, ops-codfw: codfw:cr* router power not balance on all 4 PEM's - https://phabricator.wikimedia.org/T401937 (Papaul) NEW
[16:32:41] netbox, DC-Ops, Infrastructure-Foundations, ops-codfw: codfw:cr* router power not balance on all 4 PEM's - https://phabricator.wikimedia.org/T401937#11086974 (Papaul) p:Triage→Medium
[17:09:05] Mail, Infrastructure-Foundations, MediaWiki-Email, MediaWiki-extensions-EmailAuth, and 4 others: Could not send confirmation email: Unknown error in PHP's mail() function. - https://phabricator.wikimedia.org/T383047#11087096 (Tgr) I don't know enough about mailservers to have an opinion on the op...
[17:45:25] RESOLVED: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[17:51:40] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_lldpd.service on install1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:51:40] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_lldpd.service on install1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:41:25] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[23:56:25] RESOLVED: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag