[05:18:24] (SystemdUnitFailed) firing: netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:48:24] (SystemdUnitFailed) resolved: netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:34:26] netops, Infrastructure-Foundations, ops-codfw: Decommission asw-b1-codfw - https://phabricator.wikimedia.org/T342076 (ayounsi) p:Triage→Low
[06:35:01] netops, Infrastructure-Foundations, SRE, cloud-services-team: Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (ayounsi)
[06:35:09] netops, Infrastructure-Foundations, ops-codfw: Decommission asw-b1-codfw - https://phabricator.wikimedia.org/T342076 (ayounsi)
[07:08:21] netops, Infrastructure-Foundations, SRE, ops-codfw: Decommission asw-b1-codfw - https://phabricator.wikimedia.org/T342076 (ayounsi)
[07:33:54] netops, Infrastructure-Foundations, SRE, ops-codfw: Decommission asw-b1-codfw - https://phabricator.wikimedia.org/T342076 (ayounsi) a:ayounsi
[07:42:57] netops, Infrastructure-Foundations, SRE: Packet Drops on Eqiad ASW -> CR uplinks - https://phabricator.wikimedia.org/T291627 (cmooney) Open→Resolved I’m going to close this task for now. The problem has been mitigated as best as possible with the current equipment we have. In time replacing...
[08:08:31] SRE-tools, Spicerack: Spicerack: don't write logs to disk - https://phabricator.wikimedia.org/T342079 (ayounsi)
[08:36:21] SRE-tools, Infrastructure-Foundations: Add GraphQL support to wmflib - https://phabricator.wikimedia.org/T341968 (ayounsi)
[09:02:33] CAS-SSO, Infrastructure-Foundations, SRE, collaboration-services, and 4 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (Jelto) >>! In T320390#9018611, @Jelto wrote: > ... > There are two settings which we may test, one is `send_scope_to_token_endpoin...
[09:35:29] CAS-SSO, Infrastructure-Foundations, SRE, collaboration-services, and 4 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (Jelto) I looked at the GitLab `gitlabhq_production` database and `identities` table. I connected to the psql database using: `sud...
[09:57:29] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (fgiunchedi)
[10:20:29] CAS-SSO, Infrastructure-Foundations, SRE, collaboration-services, and 4 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (jbond) >>! In T320390#9022500, @Jelto wrote: >>>! In T320390#9018611, @Jelto wrote: >> ... >> There are two settings which we may...
[11:26:34] netops, Infrastructure-Foundations, SRE: TLS certificates for network devices - https://phabricator.wikimedia.org/T334594 (ayounsi) `name=SONiC refresh needed verbose ayounsi@cumin1001:~$ sudo cookbook -v sre.network.tls lsw1-e8-eqiad START - Cookbook sre.network.tls for network device lsw1-e8-eqiad...
[11:31:31] Puppet, Infrastructure-Foundations, Puppet-Infrastructure, SRE, User-Joe: puppetmaster hostcert and hostprivkey point to nonexistent files - https://phabricator.wikimedia.org/T179099 (jbond)
[12:53:08] CAS-SSO, Infrastructure-Foundations, SRE, collaboration-services, and 4 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (Jelto) >>! In T320390#9022808, @jbond wrote: > > > The [[ https://docs.gitlab.com/ee/integration/omniauth.html#link-existing-use...
[13:03:12] SRE-tools, Infrastructure-Foundations, Spicerack, cloud-services-team (FY2022/2023-Q4): [spicerack] support including {project} in SAL messages - https://phabricator.wikimedia.org/T341793 (fnegri) Open→In progress p:Triage→High
[13:03:18] SRE-tools, Infrastructure-Foundations, Patch-For-Review, cloud-services-team (FY2022/2023-Q4): Allow wmcs cookbooks running on cloudcuminXXXX to write to the SAL - https://phabricator.wikimedia.org/T325756 (fnegri)
[13:09:37] CFSSL-PKI, Infrastructure-Foundations, Puppet-Infrastructure, SRE, Puppet (Puppet 7.0): Create dynamic CRL - https://phabricator.wikimedia.org/T340543 (jbond)
[14:58:41] SRE-tools, Infrastructure-Foundations, Spicerack: Unrelated DNS diffs shown if decommission and makevm cookbooks run at the same time - https://phabricator.wikimedia.org/T342130 (ayounsi)
[15:58:24] (SystemdUnitFailed) firing: puppetserver.service Failed on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:05:14] SRE-tools, Infrastructure-Foundations, Spicerack: Unrelated DNS diffs shown if decommission and makevm cookbooks run at the same time - https://phabricator.wikimedia.org/T342130 (bking) Was thinking a bit more about this...would it work to do some minimal sanity-checking on the DNS changes (such as t...
[16:07:00] SRE-tools, Infrastructure-Foundations, Spicerack: Unrelated DNS diffs shown if decommission and makevm cookbooks run at the same time - https://phabricator.wikimedia.org/T342130 (jbond) i think this will ultimately be solved by adding locking support to cookbooks, see T341973
[16:08:07] SRE-tools, Infrastructure-Foundations, Spicerack: Unrelated DNS diffs shown if decommission and makevm cookbooks run at the same time - https://phabricator.wikimedia.org/T342130 (jbond) > It looks like there is work in progress to add locking to cookbooks, which would be an acceptable workaround. ind...
[16:08:26] SRE-tools, Infrastructure-Foundations, Spicerack: Unrelated DNS diffs shown if decommission and makevm cookbooks run at the same time - https://phabricator.wikimedia.org/T342130 (jbond)
[16:08:30] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: Spicerack: add distributed locking support - https://phabricator.wikimedia.org/T341973 (jbond)
[16:18:24] (SystemdUnitFailed) resolved: puppetserver.service Failed on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:50:35] netops, Infrastructure-Foundations, Observability-Alerting, SRE, and 2 others: Alertmanager rule for network interface errors? - https://phabricator.wikimedia.org/T335350 (lmata)
[22:50:51] netops, Infrastructure-Foundations, Observability-Metrics, SRE, observability: Investigate Junos Prometheus exporter - https://phabricator.wikimedia.org/T333210 (lmata)
[23:16:12] SRE-tools, Infrastructure-Foundations, Observability-Alerting, observability: RAID check opened a ticket for kubernetes2012 while it was being reimaged - https://phabricator.wikimedia.org/T330150 (lmata)
[23:19:57] netops, Infrastructure-Foundations, Observability-Metrics, SRE, observability: Prometheus: ingest SONiC metrics - https://phabricator.wikimedia.org/T335027 (lmata)