[01:40:55] serviceops, Cassandra, Data Products, SRE, and 2 others: Commons Impact Metrics: Data Gateway endpoints - https://phabricator.wikimedia.org/T364921#9905781 (Scott_French)
[07:44:32] could I get a sanity check for https://gerrit.wikimedia.org/r/1047430 ?
[07:56:31] <_joe_> moritzm: uh so there is a new irc server, I thought it was passive
[07:59:37] the IRC servers are passive, they just act on the incoming UDP packets which get sent as notifications, but only the former baremetal servers were configured to send to the bullseye hosts, we missed adding them for wikikube back in April 2023
[08:00:06] and it was bad timing, I was blaming the OS changes for the WIP bullseye setup until I eventually made the connection :-)
[08:00:30] <_joe_> moritzm: so this was a year long incident?
[08:00:44] please use https://wikitech.wikimedia.org/wiki/MediaWiki_On_Kubernetes#The_scap_way to deploy
[08:01:33] no, no incident: irc.wikimedia.org only pointed to irc1001 (the old buster host), which got all traffic, this only affected me when connecting to irc1002 manually to test the new setup
[08:02:09] claime: yeah, I'll do that during the mediawiki infrastructure window later
[08:02:13] ack
[08:04:09] _if_ this would have been an incident and no one would have otherwise noticed for a year, I would have happily taken that as proof that the whole byzantine IRC setup has no remaining actual value and would have shut it down entirely
[08:22:58] <_joe_> moritzm: exactly why I was asking :D
[08:42:40] serviceops, MW-on-K8s, SRE, Traffic, Release-Engineering-Team (Seen): Turn down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949 (Clement_Goubert) NEW
[09:24:47] serviceops, DC-Ops, ops-codfw, Prod-Kubernetes, and 2 others: Relabel codfw wikikube worker nodes - https://phabricator.wikimedia.org/T367286#9906315 (Clement_Goubert) Open→Declined
[09:46:59] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906401 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by cgoubert@cumin1002 from mw2400 to wikikube-worker2011 completed: - mw2400 (**PASS*...
[09:59:07] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906455 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by cgoubert@cumin1002 from mw2403 to wikikube-worker2012 completed: - mw2403 (**PASS*...
[10:06:19] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906481 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by cgoubert@cumin1002 from mw2404 to wikikube-worker2013 completed: - mw2404 (**PASS*...
[10:13:08] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906484 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by cgoubert@cumin1002 from mw2405 to wikikube-worker2014 completed: - mw2405 (**PASS*...
[10:18:34] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906495 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by cgoubert@cumin1002 from mw2408 to wikikube-worker2017 completed: - mw2408 (**PASS*...
[10:23:26] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906500 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by cgoubert@cumin1002 from mw2409 to wikikube-worker2018 completed: - mw2409 (**PASS*...
[10:23:55] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906502 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2011.codfw.wmnet with OS bullseye
[10:24:28] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906503 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2012.codfw.wmnet with OS bullseye
[10:24:42] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906504 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2013.codfw.wmnet with OS bullseye
[10:25:03] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906505 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2014.codfw.wmnet with OS bullseye
[10:25:35] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906509 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2017.codfw.wmnet with OS bullseye
[10:25:50] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906510 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2018.codfw.wmnet with OS bullseye
[10:32:19] serviceops, DC-Ops, ops-codfw, Prod-Kubernetes, and 2 others: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T367736#9906530 (Clement_Goubert)
[11:01:51] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906615 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2011.codfw.wmnet with OS bullseye completed: - wikikube-work...
[11:03:29] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906619 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2018.codfw.wmnet with OS bullseye completed: - wikikube-work...
[11:08:05] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906624 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2017.codfw.wmnet with OS bullseye completed: - wikikube-work...
[11:11:30] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906625 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2014.codfw.wmnet with OS bullseye completed: - wikikube-work...
[11:15:14] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906630 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2013.codfw.wmnet with OS bullseye completed: - wikikube-work...
[11:17:58] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9906631 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2012.codfw.wmnet with OS bullseye completed: - wikikube-work...
[13:37:01] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate wikikube control planes to hardware nodes - https://phabricator.wikimedia.org/T353464#9906991 (jijiki) Stalled→In progress
[13:48:18] serviceops, DC-Ops, ops-codfw, SRE: Moving 1G servers out of rack D4 in prep of switch migration - https://phabricator.wikimedia.org/T361856#9907018 (Papaul) @Clement_Goubert it is a U.S. holiday today, can we please reschedule this for tomorrow? Thank you. Sorry about that
[13:57:57] serviceops: Update app.job module in deployment-charts - https://phabricator.wikimedia.org/T356885#9907044 (jijiki)
[13:58:37] serviceops: Update app.job module in deployment-charts - https://phabricator.wikimedia.org/T356885#9907048 (jijiki)
[13:58:38] serviceops: ipoid charts app.job module has out of band changes - https://phabricator.wikimedia.org/T365224#9907046 (jijiki)
[13:58:40] serviceops, Kubernetes, Patch-For-Review: Fix rendering issue in modules.app.job when cronjobs are enabled and private values are defined - https://phabricator.wikimedia.org/T362954#9907047 (jijiki)
[14:29:05] serviceops, Patch-For-Review: Cleanup old Docker images running Debian Stretch - https://phabricator.wikimedia.org/T367427#9907196 (elukey) Next steps: * package a new version of debmonitor-client with https://gerrit.wikimedia.org/r/1043780 * install the package on build2001, so that the Docker images re...
[14:31:03] serviceops, Infrastructure-Foundations, Patch-For-Review: Cleanup old Docker images running Debian Stretch - https://phabricator.wikimedia.org/T367427#9907197 (elukey) p:Triage→Medium a:elukey
[14:59:38] serviceops, DC-Ops, ops-codfw, SRE: Moving 1G servers out of rack D4 in prep of switch migration - https://phabricator.wikimedia.org/T361856#9907287 (Clement_Goubert) No worries, I'll extend the downtime, and we'll leave it like that for you to move.
[15:01:37] serviceops, DC-Ops, ops-codfw, SRE: Moving 1G servers out of rack D4 in prep of switch migration - https://phabricator.wikimedia.org/T361856#9907289 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=111d8ee1-db67-4ba6-a57a-50da8c8dc4ff) set by cgoubert@cumin1002 for 2 days, 0:00:0...
[15:58:19] serviceops, DC-Ops, ops-codfw, SRE: hw troubleshooting: management and main interface down for mw2321.codfw.wmnet - https://phabricator.wikimedia.org/T367702#9907461 (Jhancock.wm) tried a few things but ultimately had to power cycle the server to get it back up. Lemme know if it looks good.
[16:17:48] serviceops, DC-Ops, decommission-hardware, ops-codfw, SRE: decommission mw2281.codfw.wmnet mw22[83-90].codfw.wmnet - https://phabricator.wikimedia.org/T367275#9907489 (Jhancock.wm) Open→Resolved a:Jhancock.wm
[16:29:19] serviceops, DC-Ops, ops-codfw, SRE: hw troubleshooting: management and main interface down for mw2321.codfw.wmnet - https://phabricator.wikimedia.org/T367702#9907521 (Clement_Goubert) Open→Resolved Looks back up, putting it back to Active and running homer brought back BGP connectivit...
[16:40:32] serviceops, DC-Ops, ops-codfw, SRE-OnFire, Sustainability (Incident Followup): codfw:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366205#9907547 (Papaul) All the netbox part is done waiting.
[18:22:01] serviceops, DC-Ops, ops-codfw, SRE-OnFire, Sustainability (Incident Followup): codfw:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366205#9907715 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host wikikube-ctrl2001.co...
[18:35:33] serviceops, DC-Ops, ops-codfw, SRE-OnFire, Sustainability (Incident Followup): codfw:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366205#9907723 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host wikikube-ctrl2001.co...
[18:40:28] serviceops, DC-Ops, ops-codfw, SRE-OnFire, Sustainability (Incident Followup): codfw:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366205#9907725 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host wikikube-ctrl2002.co...
[18:48:32] serviceops, DC-Ops, ops-codfw, SRE-OnFire, Sustainability (Incident Followup): codfw:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366205#9907733 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host wikikube-ctrl2002.codfw....
[18:49:18] serviceops, DC-Ops, ops-codfw, SRE-OnFire, Sustainability (Incident Followup): codfw:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366205#9907734 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host wikikube-ctrl2002.co...
[20:31:31] serviceops, DC-Ops, ops-codfw, SRE-OnFire, Sustainability (Incident Followup): codfw:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366205#9907853 (Papaul) @kamila 2001 and 2002 are ready ` papaul@lsw1-b7-codfw> show interfaces descriptions | match wiki* xe-0/0/42...