[08:22:50] netops, Infrastructure-Foundations, SRE: GRE Interfaces statistics not being returned by Juniper MX via gnmi - https://phabricator.wikimedia.org/T403936 (cmooney) NEW p:Triage→Low
[08:22:59] netops, Infrastructure-Foundations, SRE: GRE Interfaces statistics not being returned by Juniper MX via gnmi - https://phabricator.wikimedia.org/T403936#11156395 (cmooney)
[08:23:01] netops, Infrastructure-Foundations, SRE: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#11156396 (cmooney)
[08:42:28] moritzm: I prepped https://gerrit.wikimedia.org/r/c/operations/homer/public/+/1185859 for when netflow3004 is ready in its final role, ping me and I can roll it
[08:52:06] ack, I'll ping you!
[09:00:55] XioNoX: I've applied the role now, but you need to wait 30 mins until Puppet has run on all Kafka nodes
[09:01:29] sounds good!
[09:01:55] can you ping me when the Capirca/homer changes are deployed?
[09:02:19] sure
[09:02:27] excellent
[09:43:35] moritzm: I deployed the change on all the mgmt routers and esams devices, will do the rest of the infra progressively
[09:43:52] moritzm: netflow3003 is ready for decom, do you want to take care of it or should I?
[09:52:58] I'm fine either way, I'm currently looking into some other tasks, if you have a moment go ahead, otherwise I'll take care of it in the afternoon
[09:53:06] sure, on it
[09:53:16] ack!
[09:56:44] moritzm: easy one for you https://gerrit.wikimedia.org/r/c/operations/puppet/+/1185873
[09:57:41] thx
[09:57:41] already +1d :-)
[14:14:05] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[14:30:39] netops, Infrastructure-Foundations, SRE: Create alerting for saturation on sub-rated interfaces - https://phabricator.wikimedia.org/T374614#11157827 (cmooney) I have set the bandwidth to '6000000000' either side manually in the UI so let's see how it goes.
[14:40:17] Mail, Infrastructure-Foundations: Investigate options for outbound email redundancy for mediawiki on kubernetes - https://phabricator.wikimedia.org/T370006#11157889 (CDanis) Looks like we might not actually need this? https://phabricator.wikimedia.org/T325131#8480824
[14:42:28] netops, Infrastructure-Foundations, Prod-Kubernetes, serviceops: WikiKube clusters close to exhausting Calico IPPool allocations - https://phabricator.wikimedia.org/T375845#11157897 (cmooney) Is there anything remaining to do on this task? Looks like we have enough space now after the change in...
[14:43:16] SRE-tools, Infrastructure-Foundations, Spicerack: Puppet module hiera_lookup not working - https://phabricator.wikimedia.org/T378331#11157913 (jhathaway) a:jhathaway
[14:47:46] CAS-SSO, Infrastructure-Foundations: Error authenticating with services on CAS 7.1 - https://phabricator.wikimedia.org/T394759#11157950 (SLyngshede-WMF) Open→Resolved
[14:57:19] SRE-tools, Infrastructure-Foundations, Spicerack: Puppet module hiera_lookup not working - https://phabricator.wikimedia.org/T378331#11158026 (Volans) It works fine for me: ` >>> p.hiera_lookup('cumin1003.eqiad.wmnet', 'profile::puppet::agent::force_puppet7') DRY-RUN: Executing commands ['puppet look...
[14:58:56] SRE-tools, Infrastructure-Foundations, Spicerack: Puppet module hiera_lookup not working - https://phabricator.wikimedia.org/T378331#11158044 (jhathaway) >>! In T378331#11158026, @Volans wrote: > It works fine for me: ok to close then?
[15:04:25] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:05:24] SRE-tools, Infrastructure-Foundations, Spicerack: Puppet module hiera_lookup not working - https://phabricator.wikimedia.org/T378331#11158104 (Volans) Open→Resolved This might have been related to the migration to puppet7 and the new puppetdb hosts probably. I can't recall. Resolving as i...
[15:09:25] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:14:04] RESOLVED: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[16:11:05] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[16:23:05] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[16:48:25] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[17:11:05] RESOLVED: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[17:18:56] FIRING: MaxConntrack: Max conntrack at 82.41% on krb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[17:23:05] RESOLVED: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[17:23:25] RESOLVED: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[17:28:56] RESOLVED: MaxConntrack: Max conntrack at 84.27% on krb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[17:30:55] FIRING: MaxConntrack: Max conntrack at 84.54% on krb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[17:45:56] RESOLVED: MaxConntrack: Max conntrack at 83.59% on krb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[17:49:54] Mail, FR-donorrelations, Infrastructure-Foundations, SRE: Donations@ doesn't forward to donate@ - https://phabricator.wikimedia.org/T403986#11159151 (Aklapper)
[18:08:46] netops, Infrastructure-Foundations, SRE: Management routers: use BGP instead of OSPF - https://phabricator.wikimedia.org/T294845#11159215 (Papaul) BGP is up on mr1-eqsin cr2/3-eqsin ` mr1-eqsin# run show route protocol ospf inet.0: 198 destinations, 200 routes (198 active, 0 holddown, 0 hidden) Res...
[18:28:18] Mail, Infrastructure-Foundations: Investigate options for outbound email redundancy for mediawiki on kubernetes - https://phabricator.wikimedia.org/T370006#11159251 (jhathaway) Thanks @CDanis that is also worth looking into as a redundancy option.
[19:02:05] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[19:02:13] Mail, FR-donorrelations, Infrastructure-Foundations, SRE: Donations@ doesn't forward to donate@ - https://phabricator.wikimedia.org/T403986#11159409 (Reedy) Isn't this handled by ITS these days?
[19:16:45] Mail, FR-donorrelations, Infrastructure-Foundations, SRE: Donations@ doesn't forward to donate@ - https://phabricator.wikimedia.org/T403986#11159496 (Aklapper) Likely... Are there discoverable docs which describe how anyone could find out somehow?
[19:16:50] Mail, FR-donorrelations, Infrastructure-Foundations, SRE: Donations@ doesn't forward to donate@ - https://phabricator.wikimedia.org/T403986#11159499 (Dzahn) While there are still some special cases that forward mail to donate@ in files controlled by SRE these should all be about wikipedia.org (fo...
[19:17:20] Mail, FR-donorrelations, Infrastructure-Foundations, SRE: Donations@ doesn't forward to donate@ - https://phabricator.wikimedia.org/T403986#11159501 (Dzahn) >>! In T403986#11159496, @Aklapper wrote: > Likely... Are there discoverable docs which describe how anyone could find out somehow? Not sin...
[19:20:12] Mail, FR-donorrelations, Infrastructure-Foundations, SRE: Donations@ doesn't forward to donate@ - https://phabricator.wikimedia.org/T403986#11159506 (ssingh) >>! In T403986#11159409, @Reedy wrote: > Isn't this handled by ITS these days? [Adding @jhathaway] We do handle some aliases (mostly the...
[20:02:29] RESOLVED: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[20:43:28] Mail, FR-donorrelations, Infrastructure-Foundations, SRE: Donations@ doesn't forward to donate@ - https://phabricator.wikimedia.org/T403986#11160029 (jhathaway) On the mx-in servers you can obtain routing information via `sendmail -bv`, however it is a bit more annoying to work with compared to `...
[20:51:15] Mail, FR-donorrelations, Infrastructure-Foundations, SRE: Donations@ doesn't forward to donate@ - https://phabricator.wikimedia.org/T403986#11160064 (DSeyfert_WMF) Thank you everyone - the history of this address is why I wanted to confirm, thank you for your help! We'd greatly appreciate if ther...
[21:16:49] Puppet, Release-Engineering-Team, SRE: docker-registry "Last updated at" text hiding under scrollbar - https://phabricator.wikimedia.org/T404008#11160187 (Reedy)
[21:28:51] Mail, FR-donorrelations, Infrastructure-Foundations, SRE: Donations@ doesn't forward to donate@ - https://phabricator.wikimedia.org/T403986#11160276 (Dzahn) > sendmail -bv aha, thanks for adding that, @jhathaway > discoverable way we can check this Not really, because that is in a non-public...
[21:31:32] Mail, FR-donorrelations, Infrastructure-Foundations, SRE: Donations@ doesn't forward to donate@ - https://phabricator.wikimedia.org/T403986#11160295 (Dzahn) Another way to look for history is to browse the title of the subtasks of T122144.
[23:06:25] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/debian synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag