[00:02:25] RESOLVED: SystemdUnitFailed: dump_proxy_ranges.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:51:07] SRE-tools, Infrastructure-Foundations: offboard-user: Migrate Phabricator API access from user.query() to user.search() - https://phabricator.wikimedia.org/T420324 (MoritzMuehlenhoff) NEW
[10:06:39] SRE-tools, Infrastructure-Foundations, Phabricator: offboard-user: Migrate Phabricator API access from user.query() to user.search() - https://phabricator.wikimedia.org/T420324#11717595 (Aklapper) FYI pretty similar tasks: https://phabricator.wikimedia.org/maniphest/query/lV7c54v0tL3z/#R
[12:24:06] netops, Infrastructure-Foundations: esams/magru: 185.71.138.0/24 (wikidough) prefix not advertized - https://phabricator.wikimedia.org/T420342 (ayounsi) NEW p:Triage→High
[12:24:18] netops, Infrastructure-Foundations: esams/magru: 185.71.138.0/24 (wikidough) prefix not advertized - https://phabricator.wikimedia.org/T420342#11718192 (ayounsi)
[12:38:02] netops, Infrastructure-Foundations, Traffic, Patch-For-Review: esams/magru: 185.71.138.0/24 (wikidough) prefix not advertized - https://phabricator.wikimedia.org/T420342#11718237 (ayounsi)
[12:43:20] netops, Infrastructure-Foundations: Drain ssw1-d1-eqiad and reset BGP EVPN sessions to force new vxlan tunnel establishment - https://phabricator.wikimedia.org/T420180#11718283 (taavi)
[12:47:41] netops, Infrastructure-Foundations, ServiceOps new, SRE, Data-Platform-SRE (2026-03-06 - 2026-03-27): Eqiad: lsw1-c2-eqiad BGP maintenance/ Tuesday 17th at 9:30 CDT - https://phabricator.wikimedia.org/T420158#11718297 (cmooney) Open→Declined This won't be required now, we have res...
[12:48:25] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:49:03] netops, Infrastructure-Foundations, ServiceOps new, SRE, Data-Platform-SRE (2026-03-06 - 2026-03-27): Eqiad: lsw1-c7-eqiad BGP maintenance/ Thursday 19th at 10:00 am CDT - https://phabricator.wikimedia.org/T420159#11718300 (cmooney) Open→Declined This won't be needed now, we were...
[12:53:25] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:13:17] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: PeeringBGPDown (instance cr3-eqsin:9804) - https://phabricator.wikimedia.org/T419859#11718437 (ayounsi) Open→Invalid I go through the karma dashboard from time to time. I prefer to have the peering sessions on...
[13:13:45] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: PeeringBGPDown (instance cr3-eqsin:9804) - https://phabricator.wikimedia.org/T419858#11718441 (ayounsi) Open→Invalid I go through the karma dashboard from time to time. I prefer to have the peering sessions on...
[13:13:58] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: PeeringBGPDown (instance cr1-esams:9804) - https://phabricator.wikimedia.org/T419857#11718445 (ayounsi) Open→Invalid I go through the karma dashboard from time to time. I prefer to have the peering sessions on...
[13:14:04] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: PeeringBGPDown (instance cr1-esams:9804) - https://phabricator.wikimedia.org/T419856#11718448 (ayounsi) Open→Invalid I go through the karma dashboard from time to time. I prefer to have the peering sessions on...
[13:14:10] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: PeeringBGPDown (instance cr3-eqsin:9804) - https://phabricator.wikimedia.org/T419855#11718451 (ayounsi) Open→Invalid I go through the karma dashboard from time to time. I prefer to have the peering sessions on...
[13:14:18] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: PeeringBGPDown (instance cr3-eqsin:9804) - https://phabricator.wikimedia.org/T419854#11718454 (ayounsi) Open→Invalid I go through the karma dashboard from time to time. I prefer to have the peering sessions on...
[13:43:19] netops, Infrastructure-Foundations, SRE: Drain ssw1-d8-eqiad and reset BGP EVPN sessions to force new vxlan tunnel establishment - https://phabricator.wikimedia.org/T420351 (cmooney) NEW p:Triage→Medium
[13:43:25] netops, Infrastructure-Foundations, SRE: Drain ssw1-d8-eqiad and reset BGP EVPN sessions to force new vxlan tunnel establishment - https://phabricator.wikimedia.org/T420351#11718588 (cmooney)
[13:43:30] netops, Infrastructure-Foundations, ServiceOps new, SRE: Nokia SR-Linux DHCP Relay Bug - https://phabricator.wikimedia.org/T411054#11718589 (cmooney)
[13:49:33] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:frack:rack/install/configuration new switches in rack F5 - https://phabricator.wikimedia.org/T405618#11718638 (Papaul)
[14:08:33] SRE-tools, Cumin, Infrastructure-Foundations: Add proxy support to cumin openstack backend - https://phabricator.wikimedia.org/T420360 (fgiunchedi) NEW
[14:19:54] SRE-tools, Infrastructure-Foundations, serviceops-radar: Add --min-uptime to cookbooks - https://phabricator.wikimedia.org/T419967#11718805 (fgiunchedi) I also was wondering about a resumable rolling reboot feature for cookbooks and found this task, and of course I'm +1! The way I understand the feat...
[14:22:59] Puppet, collaboration-services, Gerrit, Infrastructure-Foundations: Edit puppet-merge to use gerrit.discovery.wmnet instead of gerrit.wikimedia.org? - https://phabricator.wikimedia.org/T420184#11718810 (ABran-WMF)
[14:23:29] Puppet, collaboration-services, Gerrit, Infrastructure-Foundations: Edit puppet-merge to use gerrit.discovery.wmnet instead of gerrit.wikimedia.org? - https://phabricator.wikimedia.org/T420184#11718811 (ABran-WMF) p:Triage→Low
[14:24:52] Puppet, collaboration-services, Gerrit, Infrastructure-Foundations: Change puppet-merge git origin to use gerrit.discovery.wmnet instead of gerrit.wikimedia.org - https://phabricator.wikimedia.org/T420184#11718825 (ABran-WMF)
[14:45:08] SRE-tools, Infrastructure-Foundations, serviceops-radar: Add --min-uptime to cookbooks - https://phabricator.wikimedia.org/T419967#11718942 (JMeybohm) I think usability wise it might be more helpful to have an argument which takes the date and time after which a reboot is expected. So something like...
[15:14:09] netops, Infrastructure-Foundations, SRE: Drain ssw1-d8-eqiad and reset BGP EVPN sessions to force new vxlan tunnel establishment - https://phabricator.wikimedia.org/T420351#11719150 (cmooney) Open→Resolved Ok this work is now complete. Only had to reset the tunnel on `lsw1-d4-eqiad` it w...
[15:15:57] netops, Infrastructure-Foundations, ServiceOps new, SRE: Nokia SR-Linux DHCP Relay Bug - https://phabricator.wikimedia.org/T411054#11719160 (cmooney) p:Medium→Low Ok all vxlan tunnels right now on row c/d leaf switches to ssw1-d1-eqiad and ssw1-d8-eqiad have a valid vxlan tunnel id. So u...
[16:24:45] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: ULSFO: Update ULSFO LVS service IP's - https://phabricator.wikimedia.org/T418971#11719697 (Fabfur) Procedure from the traffic perspective should be roughly - Depool ulsfo (around 0900UTC) and wait about 30' for all connections...
[17:13:56] FIRING: ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip6) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:18:56] FIRING: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:53:56] RESOLVED: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[18:30:39] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: ULSFO: Update ULSFO LVS service IP's - https://phabricator.wikimedia.org/T418971#11720532 (ssingh) >>! In T418971#11719697, @Fabfur wrote: > Procedure from the traffic perspective should be roughly > > - Depool ulsfo (around 0...
[21:06:08] Mail, Infrastructure-Foundations, Observability-Logging: Allow IT Services to view inbound email logs - https://phabricator.wikimedia.org/T419906#11720988 (taavi) >>! In T419906#11713781, @jhathaway wrote: > good catch, should be that a developer account is required, amending This is still not corre...
[21:09:03] SRE-tools, Infrastructure-Foundations, serviceops-radar: Add --min-uptime to cookbooks - https://phabricator.wikimedia.org/T419967#11720994 (Ajuanca) What's task `T419960` about? I don't enough privilegies to access it. Yes, I think a parameter with expressive reboot time is more robust than a relati...
[21:09:07] Mail, Infrastructure-Foundations, Observability-Logging: Allow IT Services to view inbound email logs - https://phabricator.wikimedia.org/T419906#11720997 (jhathaway) >>! In T419906#11720988, @taavi wrote: >>>! In T419906#11713781, @jhathaway wrote: >> good catch, should be that a developer account i...
[21:11:32] Mail, Infrastructure-Foundations, Observability-Logging: Allow IT Services to view inbound email logs - https://phabricator.wikimedia.org/T419906#11721001 (jhathaway)
[22:49:56] FIRING: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[22:54:56] RESOLVED: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown