[08:34:52] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ayounsi) Staging the new version on the switches: `asw-a-codfw> request system software add force-host set [ /var/tmp/jinstall-ex-4300-21.4R3-S1.5-signed.tgz /...
[09:27:36] Traffic, Cloud-VPS, DNS, SRE: PDNS in cloud can return inconsistent answers - https://phabricator.wikimedia.org/T281700 (taavi)
[11:31:47] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (jbond)
[11:58:33] netops, Infrastructure-Foundations, SRE: Allow managing drmrs DHCP settings with Homer - https://phabricator.wikimedia.org/T328737 (cmooney)
[11:58:39] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Consolidate Automation Templates for DC Switches - https://phabricator.wikimedia.org/T312635 (cmooney)
[11:59:20] netops, Infrastructure-Foundations, SRE: Allow managing drmrs DHCP settings with Homer - https://phabricator.wikimedia.org/T328737 (cmooney) Thanks @MoritzMuehlenhoff, I can roll this into the work to unify the asw configs across the board. We have it automated for similar switches (lsw, cloudsw) el...
[12:18:02] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (herron)
[12:26:40] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ssingh)
[12:39:24] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Vgutierrez)
[12:41:10] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Joe) To depool all services in codfw we will just need to run: ` sudo cookbook sre.discovery.datacenter-route --reason 'T327925' depool codfw ` from one of t...
[12:41:49] Traffic, netops, DBA, Data-Persistence, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Marostegui) @Joe @akosiaris I assume we'll depool codfw for this one too?
[12:46:17] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Joe) Please note: this won't depool `docker-registry`, which will still be active in codfw for the duration of the maintenance.
[13:26:56] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (jcrespo)
[13:33:59] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ayounsi) For the record, full row hosts downtime done with: `sudo cookbook sre.hosts.downtime --hours 2 -r "codfw row A upgrade" -t T327925 'P{P:netbox::host%l...
[13:34:26] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=295bf4d5-8856-488b-9ca9-06a0ff06db18) set by ayounsi@cumin1001 for 2:00:00 on 199 host(s) and...
[14:15:22] Traffic, netops, DBA, Data-Persistence, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (akosiaris) >>! In T327991#8593396, @Marostegui wrote: > @Joe @akosiaris I assume we'll depool codfw for this one too? Yeah, as a team we are similarly affected to row A...
[14:24:42] netops, Infrastructure-Foundations, SRE, ops-codfw: Check console cable for asw-a2-codfw - https://phabricator.wikimedia.org/T329055 (cmooney) p:Triage→Low
[15:29:00] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Clement_Goubert)
[15:29:57] Traffic, netops, DBA, Data-Persistence, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Clement_Goubert)
[15:30:22] Traffic, netops, DBA, Data-Persistence, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Clement_Goubert)
[15:37:57] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[15:39:35] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ayounsi) Open→Resolved a:ayounsi The upgrade was smooth, ~15min hard downtime. No user impact, all the depools did their job. There was some paging...
[15:50:07] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (colewhite)
[16:03:34] Traffic, DNS, Infrastructure-Foundations, Mail, and 4 others: Add SPF records for gitlab.wikimedia.org - https://phabricator.wikimedia.org/T328642 (eoghan) Open→Resolved I've deployed the softfail records and checked that they're in place: ` ❯ for i in 0 1 2; do ns=ns${i}.wikimedia.or...
[16:19:56] Traffic, Data-Engineering, Data-Persistence, Discovery-Search, and 7 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (ayounsi) p:Triage→Medium
[16:21:29] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[16:22:00] Traffic, Data-Engineering, Data-Persistence, Discovery-Search, and 7 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (ayounsi)
[16:22:10] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[16:25:27] Traffic, Data-Engineering, Data-Persistence, Discovery-Search, and 7 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (ayounsi)
[16:27:30] netops, Infrastructure-Foundations, SRE, ops-codfw: Check console cable for asw-a2-codfw - https://phabricator.wikimedia.org/T329055 (Papaul) Open→Resolved a:Papaul The port was moved on the console server from port 18 to port 41 some days back when we did have some issues but I never...
[16:30:09] Traffic, Data-Engineering, Data-Persistence, Discovery-Search, and 8 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (colewhite)
[17:00:45] netops, Infrastructure-Foundations, SRE, ops-drmrs: cr2-drmrs:xe-0/1/1 stuck optic - https://phabricator.wikimedia.org/T324555 (RobH) In progress→Resolved remote hands successfully removed the optic this AM and placed it in our racks, we'll just have it thrown away next remote hands work...
[18:40:12] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (Papaul) fyi i tested connecting temporary the xe-0/0/47 to cr2 xe-5/0/0 link was okay ` papaul@re0.cr2-...
[21:08:00] Traffic, SRE, Data Pipelines (Sprint 08): Document Impact of Jan 8&9 Traffic Data Loss - https://phabricator.wikimedia.org/T326658 (EChetty)
[22:20:14] netops, Infrastructure-Foundations, SRE, Patch-For-Review, and 2 others: Review default ferm INPUT policy - https://phabricator.wikimedia.org/T264888 (taavi)