[10:21:55] FIRING: MaxConntrack: Max conntrack at 82.3% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[10:26:55] RESOLVED: MaxConntrack: Max conntrack at 82.3% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[10:43:22] SRE-tools, Infrastructure-Foundations: Outdated cookbooks cleanup - https://phabricator.wikimedia.org/T379259#10452589 (Volans) @BTullis following up from our chat on [[ https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1104950/2/cookbooks/sre/aqs/__init__.py | this CR ]], when you have a chance le...
[13:38:24] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:43:24] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:45:42] netbox, Infrastructure-Foundations: taavi's netbox-next account is stuck - https://phabricator.wikimedia.org/T351950#10453222 (Aklapper) @taavi: Could you please answer the last comment? Thanks in advance!
[13:52:07] netbox, Infrastructure-Foundations: taavi's netbox-next account is stuck - https://phabricator.wikimedia.org/T351950#10453240 (SLyngshede-WMF) We have a follow-up / related bug: https://phabricator.wikimedia.org/T373702 I'll take a look at both of them. I suspect the issue is how the default account ass...
[14:50:23] netops, Ceph, Infrastructure-Foundations, SRE, Patch-For-Review: Configure DSCP marking for cloudceph* hosts - https://phabricator.wikimedia.org/T371501#10453489 (cmooney) @dcaro is there anything left to be done here? I see traffic profiled in the low and high classes across the cloud switc...
[15:39:50] SRE-tools, Data-Persistence-Automations, DBA, Infrastructure-Foundations, and 2 others: spicerack mysql_legacy: support fetch metrics for instance - https://phabricator.wikimedia.org/T376596#10453800 (ABran-WMF) a:ABran-WMF→None
[15:48:29] netops, Infrastructure-Foundations: Multiple unreachable hosts in eqiad - https://phabricator.wikimedia.org/T382772#10453873 (cmooney) p:Triage→Low a:cmooney
[15:57:21] netops, Ceph, Infrastructure-Foundations, SRE, Patch-For-Review: Configure DSCP marking for cloudceph* hosts - https://phabricator.wikimedia.org/T371501#10453986 (dcaro) >>! In T371501#10453489, @cmooney wrote: > @dcaro is there anything left to be done here? I see traffic profiled in the lo...
[16:02:37] netops, Infrastructure-Foundations: peering issues with Meta? - https://phabricator.wikimedia.org/T383442#10454029 (cmooney) Open→Resolved a:cmooney Thanks for the task Daniel. I actually picked up on that email last week and re-enabled the sessions. So we should be ok here. The backgr...
[16:08:59] netops, fundraising-tech-ops, Infrastructure-Foundations, SRE: Manage frack switches with Netbox - https://phabricator.wikimedia.org/T268802#10454053 (cmooney)
[16:30:29] netbox, Infrastructure-Foundations: taavi's netbox-next account is stuck - https://phabricator.wikimedia.org/T351950#10454199 (taavi) I think this one can be closed in favour of {T373702}. Both of the accounts are currently in a similar state (can log in, but it uses a random-looking username).
[16:32:55] FIRING: MaxConntrack: Max conntrack at 81.1% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[16:37:55] RESOLVED: MaxConntrack: Max conntrack at 81.1% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[16:45:44] Puppet, SRE-tools, DC-Ops, Infrastructure-Foundations, and 2 others: RAID monitoring on new hardware spec requires new or updated user space cli tool - https://phabricator.wikimedia.org/T377853#10454289 (elukey) Tried to copy the storcli64 binary to ms and presto nodes, these are the results: `...
[16:48:11] Puppet, SRE-tools, DC-Ops, Infrastructure-Foundations, and 2 others: RAID monitoring on new hardware spec requires new or updated user space cli tool - https://phabricator.wikimedia.org/T377853#10454316 (elukey)
[18:16:44] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[18:21:44] FIRING: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[18:23:55] FIRING: MaxConntrack: Max conntrack at 83.61% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[18:26:44] RESOLVED: [2x] NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/12/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[18:28:55] RESOLVED: MaxConntrack: Max conntrack at 83.05% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[19:13:25] FIRING: SystemdUnitFailed: netbox_ganeti_ulsfo_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:28:24] RESOLVED: SystemdUnitFailed: netbox_ganeti_ulsfo_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:52:01] FIRING: NTPNoSynced: NTP not synced on es1043:9100 - https://wikitech.wikimedia.org/wiki/NTP - TODO - https://alerts.wikimedia.org/?q=alertname%3DNTPNoSynced
[19:57:01] RESOLVED: NTPNoSynced: NTP not synced on es1043:9100 - https://wikitech.wikimedia.org/wiki/NTP - TODO - https://alerts.wikimedia.org/?q=alertname%3DNTPNoSynced
[20:36:12] Mail, Infrastructure-Foundations, MediaWiki-Platform-Team, MediaWiki-User-login-and-signup, Wikimedia-production-error: Could not send confirmation email: Unknown error in PHP's mail() function. - https://phabricator.wikimedia.org/T383047#10455692 (jhathaway) If possible it would be helpful t...
[21:17:40] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: WMF RIPE Atlas probe in Eqiad offline - https://phabricator.wikimedia.org/T382518#10455925 (VRiley-WMF) Open→In progress
[21:17:53] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: WMF RIPE Atlas probe in Eqiad offline - https://phabricator.wikimedia.org/T382518#10455927 (VRiley-WMF) Rebooting Now
[21:23:04] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: WMF RIPE Atlas probe in Eqiad offline - https://phabricator.wikimedia.org/T382518#10455949 (VRiley-WMF) This has been rebooted @cmooney would you be able to check this when you have a chance?
[22:26:42] netops, Infrastructure-Foundations, observability, Observability-Alerting, SRE: Alertmanager rule for network interface errors? - https://phabricator.wikimedia.org/T335350#10456238 (andrea.denisse) Hi @cmooney, I noticed that patch 915489 has been merged. Do you know if there’s any remaining...
[23:53:44] FIRING: [2x] NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/extras/scripts/18/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts