[00:22:37] netops, Infrastructure-Foundations, SRE, SRE-OnFire, and 2 others: asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (RLazarus)
[00:23:08] netops, Infrastructure-Foundations, SRE, ops-eqiad, Sustainability (Incident Followup): eqiad row C switch fabric recabling - https://phabricator.wikimedia.org/T313384 (RLazarus)
[04:51:37] netops, Infrastructure-Foundations, SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (Papaul)
[12:25:56] (HAProxyEdgeTrafficDrop) firing: 66% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[12:30:56] (HAProxyEdgeTrafficDrop) resolved: 65% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[15:59:04] vgutierrez: thanks for the +1 -- are you comfortable +2ing it? I have +2, but I haven't shepherded VCL changes and don't know the failure modes.
[16:05:00] I'll do that tomorrow morning if that's OK with you
[16:05:13] Traffic, Observability-Alerting, SRE, Patch-For-Review, User-fgiunchedi: Migrate Traffic Prometheus alerts from Icinga to Alertmanager - https://phabricator.wikimedia.org/T300723 (BCornwall) In progress→Resolved
[16:08:53] vgutierrez: yes absolutely :) thank you again for the review
[17:59:52] netops, Infrastructure-Foundations, SRE, ops-eqiad: cr2-eqiad:FPC3 partial failure (PIC2/3) - https://phabricator.wikimedia.org/T312745 (wiki_willy) RMA shipped out by Chris on Tuesday, July 26 >>! In T312745#8088364, @Cmjohnson wrote: > Replaced the line card, and placed the old one in the same...
[18:53:45] netops, Infrastructure-Foundations, SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (Papaul)
[18:56:46] netops, Infrastructure-Foundations, SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (Papaul) @ayounsi I moved asw2-c and asw2-d uplink from 0/0 and 0/1 to 1/0 and 1/3 on both router to match codfw. In the future if we have to change row A and ro...
[19:29:26] netops, Infrastructure-Foundations, cloud-services-team (Kanban): More public IPs for codfw1dev - https://phabricator.wikimedia.org/T313977 (Andrew)
[19:31:03] netops, Infrastructure-Foundations, cloud-services-team (Kanban): More public IPs for codfw1dev - https://phabricator.wikimedia.org/T313977 (Andrew) for quick reference: https://netbox.wikimedia.org/ipam/prefixes/5/prefixes/
[19:44:36] netops, Infrastructure-Foundations, SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (Papaul) @Jclark-ctr when you have time, can you plug on : ### CR1-eqiad to front of the patch panel port 0/1: 1 breakout cable and the first break out cable go...
[19:47:12] netops, Infrastructure-Foundations, cloud-services-team (Kanban): More public IPs for codfw1dev - https://phabricator.wikimedia.org/T313977 (cmooney) I’ve no objection in principal. 185.15.57.12/30 is currently unallocated in Netbox. 185.15.57.16/29 is reserved there with description *“Temporary an...
[20:48:50] Traffic, MediaWiki-Action-API, Wikimedia-production-error: API not responding (overflow) - https://phabricator.wikimedia.org/T313986 (AlexisJazz)
[20:49:32] Traffic, MediaWiki-Action-API, Wikimedia-production-error: API not responding (overflow) - https://phabricator.wikimedia.org/T313986 (AlexisJazz)
[20:50:24] Traffic, MediaWiki-Action-API, Wikimedia-production-error: API not responding (overflow) - https://phabricator.wikimedia.org/T313986 (RhinosF1) We are aware of ongoing issues
[20:51:26] Traffic, DBA, MediaWiki-Action-API, SRE, and 2 others: API not responding (overflow) - https://phabricator.wikimedia.org/T313986 (RhinosF1)
[20:52:21] Traffic, DBA, MediaWiki-Action-API, SRE, and 2 others: API not responding (overflow) - https://phabricator.wikimedia.org/T313986 (RhinosF1) {T311106}
[20:58:56] Traffic, Data-Persistence (Consultation), MediaWiki-Action-API, SRE, and 2 others: API not responding (overflow) - https://phabricator.wikimedia.org/T313986 (Marostegui) Not sure what's expected from the DBAs here. There's a chain on things that got db1132 overloaded.
[21:14:44] netops, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): More public IPs for codfw1dev - https://phabricator.wikimedia.org/T313977 (Andrew) 185.15.57.12/30 should be enough, so let's start with that. With luck that'll be all we need, and we can leave it as a permanent change. In t...
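For readers following the codfw1dev prefix discussion in T313977, here is a minimal sketch showing the size of the two prefixes mentioned and confirming that they do not overlap. It uses Python's standard `ipaddress` module. That choice of tool is ours, not the participants'. The sketch only does CIDR arithmetic and says nothing about how Netbox or the cloud team actually allocate these addresses.

```python
import ipaddress

# Prefixes quoted in the T313977 discussion.
requested = ipaddress.ip_network("185.15.57.12/30")  # unallocated per the log
reserved = ipaddress.ip_network("185.15.57.16/29")   # reserved per the log

# A /30 spans 2**(32 - 30) = 4 addresses: .12 through .15.
print(requested.num_addresses)        # 4
print(requested[0], requested[-1])    # 185.15.57.12 185.15.57.15

# A /29 spans 2**(32 - 29) = 8 addresses: .16 through .23.
print(reserved.num_addresses)         # 8

# The /30 ends immediately before the /29 begins, so the two do not overlap.
print(requested.overlaps(reserved))                               # False
print(requested.broadcast_address + 1 == reserved.network_address)  # True
```

Because the /30 sits directly below the reserved /29, taking it does not disturb the reservation, which is consistent with the "let's start with that" plan above.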