[02:47:13] (SystemdUnitFailed) firing: (2) generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:47:13] (SystemdUnitFailed) firing: (2) generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:02:39] volans|off: for when you come back, my patch for netbox-deploy repo might be ready to go, I replied on your latest comment (before the summit) https://gerrit.wikimedia.org/r/c/operations/software/netbox-deploy/+/1004192
[08:46:11] moritzm: Just did another reimaging of idp-test1003 with the new package. We're now up and running. I'm just doing a bit of testing and log reading
[08:48:16] \o/
[08:49:25] It complains a bit about a ticket granting ticket, not sure if it's important
[08:50:02] which log file? having a look
[08:50:25] they are a bit noisy, it's not unlikely the same applies to the current installs
[08:50:36] In /var/log/cas/cas.log
[08:51:52] there's also some connection errors to ldap-ro.eqiad.wikimedia.org
[08:52:05] Yeah, just popped up
[08:53:34] The TGT for Slyngshede seems correctly created, though
[08:54:29] although, the line after that does seem to indicate some error indeed
[08:55:10] so possibly it's created, but it failed to store it correctly in session storage/memcached or so
[08:58:08] Seems like a good guess, signing out also yields: Ticket-granting ticket [TGT-5-********Qd4-jB8-idp-test1003] cannot be found in the ticket registry.
[09:05:31] ack, I'll have a closer look later
[09:16:20] CAS-SSO, Infrastructure-Foundations: Migrate CAS to Bookworm - https://phabricator.wikimedia.org/T357748#9640963 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by slyngshede@cumin1002 for host idp-test1003.wikimedia.org with OS bookworm
[09:21:22] CAS-SSO, Infrastructure-Foundations: Migrate CAS to Bookworm - https://phabricator.wikimedia.org/T357748#9640992 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by slyngshede@cumin1002 for host idp-test1003.wikimedia.org with OS bookworm completed: - idp-test1003 (**PASS**) - Downtime...
[09:59:31] CAS-SSO, Gerrit, Infrastructure-Foundations, SRE, and 2 others: Add logout.d script for Gerrit - https://phabricator.wikimedia.org/T286905#9641230 (hashar) Open→Declined Users are blocked in Gerrit via wikitech Special:Block which had some recent fixes as part of T307558.
[09:59:36] CAS-SSO, Infrastructure-Foundations, SRE, Patch-For-Review: Cookbook for centralised logouts and session status queries - https://phabricator.wikimedia.org/T283242#9641233 (hashar)
[10:21:59] (SystemdUnitFailed) firing: (3) netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:46:59] (SystemdUnitFailed) firing: (3) netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:57:02] netops, DC-Ops, Infrastructure-Foundations: Take advantage of 10Gb NICs in the new network stack - https://phabricator.wikimedia.org/T360297#9642004 (elukey) For the ML hosts - our K8s clusters don't currently require 10G bandwidth, and at the time we didn't want to "waste" 10G ports if not really ne...
[13:59:33] netops, DC-Ops, Infrastructure-Foundations: Take advantage of 10Gb NICs in the new network stack - https://phabricator.wikimedia.org/T360297#9642027 (ayounsi) {F42751975} {F42751976} Feel free to test it on Netbox next The steps to follow once this script is deployed : # (Optional) Upgrade idrac...
[14:04:30] netops, DC-Ops, Infrastructure-Foundations: Take advantage of 10Gb NICs in the new network stack - https://phabricator.wikimedia.org/T360297#9642060 (ayounsi) > For the ML hosts - our K8s clusters don't currently require 10G bandwidth, and at the time we didn't want to "waste" 10G ports if not really...
[14:29:49] netbox, Infrastructure-Foundations: Netbox: use Journaling feature - https://phabricator.wikimedia.org/T310583#9642268 (ayounsi) First use of the journaling feature in https://gerrit.wikimedia.org/r/c/operations/software/netbox-extras/+/1012680/
[14:38:08] netops, Infrastructure-Foundations, ops-codfw, SRE: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9642302 (Papaul) Zeroize done on asw-a3 and asw-a4
[14:47:13] (SystemdUnitFailed) firing: (2) generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:04:13] SRE-tools, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9642547 (MoritzMuehlenhoff)
[15:40:12] SRE-tools, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9642787 (MoritzMuehlenhoff)
[15:47:01] netops, Infrastructure-Foundations, ops-codfw, SRE: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9642812 (Papaul) Zeroize done on all the old switches in role a
[15:55:21] SRE-tools, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9642859 (MoritzMuehlenhoff)
[16:06:59] (SystemdUnitFailed) firing: (2) generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:08:31] netops, Infrastructure-Foundations, ops-codfw, SRE, Patch-For-Review: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9642913 (Papaul)
[16:25:31] netops, Infrastructure-Foundations, ops-codfw, SRE, Patch-For-Review: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9642974 (Papaul)
[16:31:09] netops, Infrastructure-Foundations, ops-codfw, SRE: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9642996 (Papaul)
[16:41:14] netops, DC-Ops, Infrastructure-Foundations: Take advantage of 10Gb NICs in the new network stack - https://phabricator.wikimedia.org/T360297#9643042 (wiki_willy) Hi @elukey - do you want me to change the Lift Wing expansion requests for 16x servers in FY24-25 to 10g? Thanks, Willy
[20:07:13] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:47:19] netops, Infrastructure-Foundations, ops-codfw, SRE: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9644060 (Papaul)