[08:34:50] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ayounsi) Staging the new version on the switches: `asw-a-codfw> request system software add force-host set [ /var/tmp/jinstall-ex-4300-21.4R3-S1.5-sig...
[09:23:20] Puppet, Cloud-VPS, Data-Persistence, Infrastructure-Foundations, and 2 others: haproxy::site doesn't work as expected on the first puppet run - https://phabricator.wikimedia.org/T321684 (taavi)
[09:25:46] SRE-tools, Cloud-VPS, Infrastructure-Foundations, SRE, cloud-services-team: WMCS VIPs: Netbox netmask inconsistencies - https://phabricator.wikimedia.org/T295774 (taavi)
[09:31:41] Puppet, Cloud-VPS, Infrastructure-Foundations, cloud-services-team: Consider ways to make puppetmaster CA changes smoother on the puppet client end - https://phabricator.wikimedia.org/T220268 (taavi)
[09:39:09] Puppet, Cloud-Services, Infrastructure-Foundations, SRE: The Rack Puppet master server is deprecated and will be removed in a future release. Please use Puppet Server instead. - https://phabricator.wikimedia.org/T185815 (taavi) Open→Invalid I suspect this will be fixed by the Puppet 7 upg...
[10:03:05] Puppet, Cloud-VPS, Infrastructure-Foundations: Expose public hostname as Fact in puppet - https://phabricator.wikimedia.org/T101903 (taavi)
[10:03:33] Puppet, Cloud-VPS, Infrastructure-Foundations: Expose public hostname as Fact in puppet - https://phabricator.wikimedia.org/T101903 (taavi) Open→Declined Per above.
[10:16:34] Puppet, Beta-Cluster-Infrastructure, Infrastructure-Foundations, Release-Engineering-Team (Seen), User-Joe: Re-think puppet management for deployment-prep - https://phabricator.wikimedia.org/T161675 (taavi)
[10:17:04] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review, Technical-Debt: Convert all of our site.pp/roles to the role/profile paradigm - https://phabricator.wikimedia.org/T159412 (taavi)
[10:17:23] Puppet, Infrastructure-Foundations, SRE, Technical-Debt: Uniform cluster nomenclature across puppet - https://phabricator.wikimedia.org/T159411 (taavi)
[10:35:57] Puppet, Cloud-VPS, Data-Persistence, Infrastructure-Foundations, and 3 others: haproxy::site doesn't work as expected on the first puppet run - https://phabricator.wikimedia.org/T321684 (jbond) I have created a [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/887292 | CR ]] to force full...
[11:31:45] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (jbond)
[11:58:31] netops, Infrastructure-Foundations, SRE: Allow managing drmrs DHCP settings with Homer - https://phabricator.wikimedia.org/T328737 (cmooney)
[11:58:37] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Consolidate Automation Templates for DC Switches - https://phabricator.wikimedia.org/T312635 (cmooney)
[11:59:18] netops, Infrastructure-Foundations, SRE: Allow managing drmrs DHCP settings with Homer - https://phabricator.wikimedia.org/T328737 (cmooney) Thanks @MoritzMuehlenhoff, I can roll this into the work to unify the asw configs across the board. We have it automated for similar switches (lsw, cloudsw) el...
[12:18:00] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (herron)
[12:26:38] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ssingh)
[12:39:22] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Vgutierrez)
[12:41:08] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Joe) To depool all services in codfw we will just need to run: ` sudo cookbook sre.discovery.datacenter-route --reason 'T327925' depool codfw ` from...
[12:41:47] netops, DBA, Data-Persistence, Infrastructure-Foundations, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Marostegui) @Joe @akosiaris I assume we'll depool codfw for this one too?
[12:46:15] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Joe) Please note: this won't depool `docker-registry`, which will still be active in codfw for the duration of the maintenance.
[13:26:54] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (jcrespo)
[13:33:57] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ayounsi) For the record, full row hosts downtime done with: `sudo cookbook sre.hosts.downtime --hours 2 -r "codfw row A upgrade" -t T327925 'P{P:netbo...
[13:34:24] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=295bf4d5-8856-488b-9ca9-06a0ff06db18) set by ayounsi@cumin1001 for 2:00:00 on 199 hos...
[14:15:20] netops, DBA, Data-Persistence, Infrastructure-Foundations, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (akosiaris) >>! In T327991#8593396, @Marostegui wrote: > @Joe @akosiaris I assume we'll depool codfw for this one too? Yeah, as a team we are similarl...
[14:24:40] netops, Infrastructure-Foundations, SRE, ops-codfw: Check console cable for asw-a2-codfw - https://phabricator.wikimedia.org/T329055 (cmooney) p:Triage→Low
[15:28:58] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Clement_Goubert)
[15:29:55] netops, DBA, Data-Persistence, Infrastructure-Foundations, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Clement_Goubert)
[15:30:20] netops, DBA, Data-Persistence, Infrastructure-Foundations, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Clement_Goubert)
[15:37:55] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[15:39:33] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ayounsi) Open→Resolved a:ayounsi The upgrade was smooth, ~15min hard downtime. No user impact, all the depools did their job. There was so...
[15:50:05] netops, DBA, Data-Engineering-Planning, Data-Persistence, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (colewhite)
[16:03:32] Mail, DNS, Infrastructure-Foundations, SRE, and 4 others: Add SPF records for gitlab.wikimedia.org - https://phabricator.wikimedia.org/T328642 (eoghan) Open→Resolved I've deployed the softfail records and checked that they're in place: ` ❯ for i in 0 1 2; do ns=ns${i}.wikimedia.org...
[16:21:27] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[16:22:08] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[16:27:29] netops, Infrastructure-Foundations, SRE, ops-codfw: Check console cable for asw-a2-codfw - https://phabricator.wikimedia.org/T329055 (Papaul) Open→Resolved a:Papaul The port was moved on the console server from port 18 to port 41 some days back when we did have some issues but I never...
[17:00:43] netops, Infrastructure-Foundations, SRE, ops-drmrs: cr2-drmrs:xe-0/1/1 stuck optic - https://phabricator.wikimedia.org/T324555 (RobH) In progress→Resolved remote hands successfully removed the optic this AM and placed it in our racks, we'll just have it thrown away next remote hands work...
[17:06:13] CAS-SSO, Infrastructure-Foundations, SRE, Patch-For-Review: Enable OIDC in CAS - https://phabricator.wikimedia.org/T311999 (BTullis) Hi @jbond and @MoritzMuehlenhoff - how are things looking with regard to this OIDC support? We would still like to be able to {T305874} using idp because the LDAP...
[17:06:29] CAS-SSO, Data-Catalog, Data-Engineering, Infrastructure-Foundations: Switch DataHub authentication to OIDC - https://phabricator.wikimedia.org/T305874 (BTullis)
[17:06:35] CAS-SSO, Infrastructure-Foundations, SRE, Patch-For-Review: Enable OIDC in CAS - https://phabricator.wikimedia.org/T311999 (BTullis)
[17:06:47] CAS-SSO, Data-Catalog, Data-Engineering, Infrastructure-Foundations: Switch DataHub authentication to OIDC - https://phabricator.wikimedia.org/T305874 (BTullis)
[17:06:50] CAS-SSO, Infrastructure-Foundations, SRE: Upgrade IDPs to CAS 6.6/Bullseye and enable webauthn - https://phabricator.wikimedia.org/T305518 (BTullis)
[17:13:58] CAS-SSO, Infrastructure-Foundations, SRE, Patch-For-Review: Enable OIDC in CAS - https://phabricator.wikimedia.org/T311999 (jbond) @BTullis OIDC support is now possible and is being tried out by the new IDM. It should be to a state where you can start using it and happy to help out/provide more...
[18:40:10] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (Papaul) fyi i tested connecting temporary the xe-0/0/47 to cr2 xe-5/0/0 link was okay ` papaul@re0.cr2-...
[19:04:32] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review, Technical-Debt: Convert all of our site.pp/roles to the role/profile paradigm - https://phabricator.wikimedia.org/T159412 (Dzahn) As far as I can tell nowadays there is no more node that uses multiple roles. Only one role at a time, s...
[19:05:29] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review, Technical-Debt: Convert all of our site.pp/roles to the role/profile paradigm - https://phabricator.wikimedia.org/T159412 (Dzahn) https://gerrit.wikimedia.org/r/q/topic:%22role-profile%22+(status:open%20OR%20status:merged)
[20:37:47] CAS-SSO, Infrastructure-Foundations, SRE: Enable OIDC in CAS - https://phabricator.wikimedia.org/T311999 (BTullis) @jbond - Many thanks. That's excellent. I think I'd be keen to look at doing that and helping find out the issues. I've asked the #data-engineering team so I'll get back to you in a coup...
[22:20:12] netops, Infrastructure-Foundations, SRE, Patch-For-Review, and 2 others: Review default ferm INPUT policy - https://phabricator.wikimedia.org/T264888 (taavi)