[01:15:55] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_exim4.service on mx-out1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:15:55] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_exim4.service on mx-out1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:30:32] 10netops, 06DBA, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: Migrate servers in codfw rack C1 from asw-c1-codfw to lsw1-c1-codfw - https://phabricator.wikimedia.org/T373095#10116085 (10akosiaris) All wikikube hosts have been depooled. RESTBase and mc-wf should be good to do at anytime per comment... [08:25:54] 10Mail, 06collaboration-services, 06Infrastructure-Foundations, 10vrts: generate_vrts_aliases failing on mx-in1001 - https://phabricator.wikimedia.org/T368257#10116220 (10Volans) The patch was needed, the last error was at Sep 03 17:08:32. After that it run smoothly except for one run at Sep 03 20:10:57 th... [08:28:53] 10Mail, 06collaboration-services, 06Infrastructure-Foundations, 10vrts: generate_vrts_aliases failing on mx-in1001 - https://phabricator.wikimedia.org/T368257#10116221 (10LSobanski) So looks like we're back to the original problem only. 
[08:36:03] moritzm: ack thank you, yeah definitely alert2002 <-> lists1004 is switching back to iptables [08:37:02] moritzm: so far I can't explain the alert2002 -> durum1001:22 issue though, there's a repro a few lines down from the lists1004 message [08:37:20] I'll file a task [08:43:15] FYI m.oritz is out today ;0 [08:43:17] ;) [08:56:15] cheers, didn't notice [09:02:25] T373980 there we go [09:02:26] T373980: Hosts using nft are not reachable via ssh from alert[12]002 - https://phabricator.wikimedia.org/T373980 [09:15:55] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_exim4.service on mx-out1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:18:15] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 07Wikimedia-production-error: wikikube-worker2080.codfw.wmnet can't auth to registry - https://phabricator.wikimedia.org/T373982 (10Clement_Goubert) 03NEW p:05Triage→03High [09:48:26] 10SRE-tools, 06Infrastructure-Foundations, 13Patch-For-Review: Allow debmonitor to store the Debian version-id in the OS field - https://phabricator.wikimedia.org/T368744#10116523 (10elukey) 05Open→03Resolved @Volans checked in the debmonitor's DB and it seemed that only lvs3009 and cumin2002 were ho... [09:52:35] qq - I'd need to restart puppetdb (jvm) to pick up the new jvm version, is there a procedure aside from a simple restart? [09:53:49] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, and 2 others: wikikube-worker2080.codfw.wmnet can't auth to registry - https://phabricator.wikimedia.org/T373982#10116533 (10Clement_Goubert) 05In progress→03Resolved Pulling restricted images now works from `wikikube-worker2080`, resolving. 
[09:55:02] elukey: it might be outdated but there is https://wikitech.wikimedia.org/wiki/Service_restarts#puppetdb_2 [09:55:22] usually to avoid noisy alerts puppet gets disabled globally [09:56:16] or a good idea for a cookbook for the hackathon :-P [09:57:54] volans: ack thanks, I'll do it tomorrow morning :) [10:09:14] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10116620 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by cgoubert@cumin1002 from mw2325 to... [10:14:18] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10116658 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by cgoubert@cumin1002 Renumb... [10:14:26] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10116660 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host w... [10:54:26] 10CAS-SSO, 10netbox, 06Infrastructure-Foundations: Unable to log in to Netbox - https://phabricator.wikimedia.org/T373702#10116763 (10cmooney) >>! In T373702#10113687, @Southparkfan wrote: > Taavi re-added the group in bb989a1c77e3cd34b844dd19b5f352efd043716a. I'm not sure what's wrong either. Actually you'... [10:56:31] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10116778 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw2316 to w... 
[10:57:49] 10CAS-SSO, 10netbox, 06Infrastructure-Foundations: Unable to log in to Netbox - https://phabricator.wikimedia.org/T373702#10116795 (10SLyngshede-WMF) Netbox is configured to ` SOCIAL_AUTH_ALLOW_GROUPS = ['ops', 'wmf'] ` so we might want to add nda there. I just think it's a little strange that it would tr... [11:01:43] 10netops, 06DBA, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: Migrate servers in codfw rack C1 from asw-c1-codfw to lsw1-c1-codfw - https://phabricator.wikimedia.org/T373095#10116815 (10cmooney) >>! In T373095#10116085, @akosiaris wrote: > All wikikube hosts have been depooled. RESTBase and mc-wf... [11:04:48] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10116839 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikik... [11:09:27] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10116845 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw2318 to w... [11:09:29] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10116846 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw2319 to w... [11:10:02] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10116850 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw2317 to w... 
[11:14:57] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10116856 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by cgoubert@cumin1002 Renumberin... [11:19:50] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10116870 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by hnowlan@cumin1002 Renumbe... [11:20:04] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10116871 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wi... [12:08:38] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117057 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikiku... [12:08:42] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117058 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by hnowlan@cumin1002 Renumbering... [12:28:47] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117134 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by jayme@cumin1002 from kubernetes200... 
[12:32:36] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117157 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by jayme@cumin1002 from kubernetes201... [12:36:35] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117183 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by jayme@cumin1002 from kubernetes203... [12:37:58] XioNoX: do you happen to know if the hosts.rename cookbook is supposed to work on supermicro machines? Currently it does abort because of the iDRAC version check [12:40:53] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117195 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by jayme@cumin1002 Renumberi... [12:41:16] jayme: he's on holidays, I can check if you want [12:41:23] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117196 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host wiki... [12:41:29] we are adding the support for supermicro in these weeks, maybe this is why it doesn't work [12:42:05] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117204 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by jayme@cumin1002 Renumberi... 
[12:42:44] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117206 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host wiki... [12:44:57] jayme: yeah I think it was never tested [12:45:00] elukey: ah, okay [12:45:31] yeah..the redfish calls seem pretty generic, but the version check definitely not [12:45:50] I need to check if the call is supported by supermicro [12:46:25] do you have a hostname that I can use? [12:46:43] elukey: well, a renamed host [12:46:55] was kubernetes2054 is now wikikube-worker2088 [12:47:01] jayme: are all of them supermicros? [12:47:17] idk [12:47:33] you mean all of the ones we're reimaging? definitely not [12:47:59] okok maybe I have the wrong assumption - when you asked Arzhel if rename was supposed to work on supermicro I thought you had a use case [12:48:10] yes, that one use case [12:48:17] ah okok perfect, got it [12:48:19] lemme check [12:48:20] sudo cookbook sre.hosts.rename -t T372878 kubernetes2054 wikikube-worker2088 [12:48:23] T372878: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878 [12:48:46] feel free to hit it. it's depooled etc. [12:49:27] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117288 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by jayme@cumin1002 Renumberi... [12:50:35] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117292 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by jayme@cumin1002 Renumberi... 
[12:50:54] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117293 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host wiki... [12:54:08] jayme: the code as is is not going to work, lemme send a patch [12:54:22] ❤️ [13:05:40] jayme: basically the idea is https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1070600 [13:05:48] going to dry run it with test-cookbook [13:06:05] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117355 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by hnowlan@cumin1002 Renumbe... [13:06:36] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117356 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wi... [13:09:04] mmm not sure if dry-run works in this case [13:10:09] when it does DRY-RUN: Would have called patch on https://10.193.2.82/redfish/v1/Managers/1/EthernetInterfaces/1 it then raises an exception since the JSON returned is nothing (rightfully) [13:15:55] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_exim4.service on mx-out1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:19:37] elukey: should we just try it? [13:22:00] jayme: it should work, but if you are not in a rush I'd wait for Riccardo's review [13:26:38] elukey: ack. can def. 
wait until tomorrow [13:29:06] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117373 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host wikikube... [13:29:11] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE, 13Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117374 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by jayme@cumin1002 Renumbering f... [13:30:07] 10netops, 06collaboration-services, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10117370 (10ABran-WMF) d/p hosts are listed below: |Rack_C2| backup2009 | n/a |Rack_C2| backup2006 | n/a |Rack_C2| db... [13:35:35] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117401 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host wikikube-worker2086.codfw.wmne... [13:35:41] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117402 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by jayme@cumin1002 Renumbering for host wikikube-worke... 
[13:52:43] jayme: we are good to test-cookbook it :) [13:53:12] if you want to do it go ahead, I'll be available if anything breaks [13:56:29] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117457 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikikube-worker2082.codfw.wm... [14:16:49] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117589 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host wikikube-worker2087.codfw.wmne... [14:16:52] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117590 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by jayme@cumin1002 Renumbering for host wikikube-worke... [14:17:34] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117592 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by hnowlan@cumin1002 Renumbering for host wikikube-wor... 
[14:18:20] jayme: elukey: I'm removing the host from hieradata/common/kubernetes.yaml for now, it's breaking the registry because the rename hasn't gone through [14:19:32] 07Puppet, 06cloud-services-team, 10Cloud-VPS: Remove prod-specific bits from cloud puppetmasters - https://phabricator.wikimedia.org/T309281#10117597 (10joanna_borun) p:05Triage→03Low [14:22:15] claime: we can try to do it now if you want [14:22:44] sure [14:23:11] I hate the stupid way docker does auth [14:23:32] should be `test-cookbook -c 1070600 sre.hosts.rename kubernetes2054 wikikube-worker2088` on cumin [14:23:45] Awesome thanks, testing now [14:24:00] check all the names etc.. not sure if they are correct [14:24:36] names are good, proceeding [14:30:14] worked? [14:32:08] 'HostName': 'wikikube-worker2088', [14:32:10] seems so yes ç= [14:32:13] :) [14:32:25] (checked the BMC's hostname for 2088 via spicerack) [14:32:39] nice [14:32:39] yeah it worked [14:32:42] thanks <3 [14:32:56] merged the change as well! [14:34:03] 10SRE-tools, 06cloud-services-team, 10Cloud-VPS, 10Spicerack, and 2 others: cookbooks: for --interactive flags, add an option to skip the rest - https://phabricator.wikimedia.org/T315341#10117647 (10fnegri) [14:36:47] 10netops, 06DBA, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: Migrate servers in codfw rack C1 from asw-c1-codfw to lsw1-c1-codfw - https://phabricator.wikimedia.org/T373095#10117662 (10ABran-WMF) >>! In T373095#10110436, @ABran-WMF wrote: > [...] I'll double check the DNS indeed great catch! no D... [14:38:21] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117668 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by cgoubert@cumin1002 Renumbering for host wikikub... 
[14:41:55] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10117677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2036.cod... [14:55:55] elukey, claime: cool, thanks! (sorry, was at a viewing ) [14:56:17] no problem [14:56:28] claime: so I can kick off the reimage now, right? [14:56:36] yep [14:56:43] <3 [14:56:43] well the renumber [14:56:49] that'll kick off the reimage [14:56:55] yeah, sorry :) [14:57:01] use https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1070585 [14:57:12] or review and merge it :p [14:58:00] uh...I'd at least complain about the commit message :) [14:58:24] but I'll review and test run for sure [15:21:40] 10netops, 10fundraising-tech-ops, 06Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10117963 (10Jgreen) The haproxy configuration part seems to work, I'm able to port forward localhost tcp/443 and I... [15:40:43] 10netops, 06Infrastructure-Foundations, 06SRE: ToR server-move Netbox script adding ".0" to end of interface names - https://phabricator.wikimedia.org/T374024 (10cmooney) 03NEW p:05Triage→03Medium [15:47:04] 10SRE-tools, 10Cumin, 06Infrastructure-Foundations, 10Spicerack: Formalize and share the spicerack/cumin release process - https://phabricator.wikimedia.org/T276443#10118074 (10elukey) 05Open→03Resolved a:03elukey We have now https://gitlab.wikimedia.org/repos/sre/python-release that basically do... [15:56:22] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10118127 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by hnowlan@cumin1002 Renumbering for host wikikube... 
[15:56:36] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10118128 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wikikube-worker2083.codf... [16:00:48] 10netops, 06DBA, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: Migrate servers in codfw rack C1 from asw-c1-codfw to lsw1-c1-codfw - https://phabricator.wikimedia.org/T373095#10118144 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=6ba6c00e-f364-45da-8be3-ee80785b36c0) set by cm... [16:00:55] FIRING: [3x] SystemdUnitFailed: wmf_auto_restart_exim4.service on mx-out1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [16:08:58] 10SRE-tools, 10conftool, 06DBA, 06Infrastructure-Foundations, and 2 others: Spicerack support for dbctl - https://phabricator.wikimedia.org/T362893#10118192 (10ABran-WMF) [16:16:23] 10netops, 06DBA, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: Migrate servers in codfw rack C1 from asw-c1-codfw to lsw1-c1-codfw - https://phabricator.wikimedia.org/T373095#10118229 (10cmooney) Link moves completed, all servers now responding to ping again so looks ok. Unsure of exact times for... [16:16:26] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10118230 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2036.codfw.w... 
[16:20:48] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10118258 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by cgoubert@cumin1002 Renumbering for host wikikube-wo... [16:22:26] 10netops, 06DBA, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: Migrate servers in codfw rack C1 from asw-c1-codfw to lsw1-c1-codfw - https://phabricator.wikimedia.org/T373095#10118280 (10ABran-WMF) d/p hosts are repooling [16:25:50] 10netbox, 06Infrastructure-Foundations: Upgrade Netbox to 4.1 - https://phabricator.wikimedia.org/T371889#10118295 (10Volans) Netbox 4.1 is out, published yesterday. [16:42:26] 10netops, 06collaboration-services, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10118457 (10cmooney) >>! In T373096#10106969, @Dzahn wrote: > The server `phab2002` mentioned here for Collaboration S... [16:49:22] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10118561 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikikube-worker2083.codfw.wm... 
[16:54:04] 10netops, 06Infrastructure-Foundations, 06SRE: ToR server-move Netbox script adding ".0" to end of interface names - https://phabricator.wikimedia.org/T374024#10118603 (10cmooney) [16:54:41] 10netops, 06Infrastructure-Foundations, 06SRE: ToR server-move Netbox script adding ".0" to end of interface names - https://phabricator.wikimedia.org/T374024#10118608 (10cmooney) [17:00:25] FIRING: [3x] SystemdUnitFailed: wmf_auto_restart_exim4.service on mx-out1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [17:10:26] 10netops, 06Infrastructure-Foundations, 06serviceops, 06SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10118739 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by hnowlan@cumin1002 Renumbering for host wikikube-wor... [21:00:55] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_exim4.service on mx-out1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [21:50:25] RESOLVED: [2x] SystemdUnitFailed: wmf_auto_restart_exim4.service on mx-out1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [22:36:08] 10netops, 06DC-Ops, 10fundraising-tech-ops, 06Infrastructure-Foundations, and 2 others: Q1:codfw:frack network upgrade tracking task - https://phabricator.wikimedia.org/T371434#10119784 (10Papaul) The diagram below will outline the cabling of the new Fundraising network devices {F57461650}