[01:53:04] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:48:30] (SystemdUnitFailed) firing: (2) netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:18:04] (SystemdUnitFailed) firing: (2) netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:18:30] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:53:39] Mail, Infrastructure-Foundations, User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984#9600958 (Tacsipacsi) Thanks for the explanation! So the concern is the amount of outgoing mail, specifically the amount of mail going out from the...
[14:07:20] topranks if you're around, I was wondering how to bring a server back into prod with a different name? It was decommed without '--keep-mgmt-dns' . More details here: https://phabricator.wikimedia.org/T358727
[14:08:38] inflatador: sure let me have a quick look
[14:09:05] we're actually working on some automation around this at the moment, a cookbook to reimage and rename in the process
[14:09:55] given this is decom'ed I think here we will jump to the start of the provision workflow but let me check
[14:10:03] nice! it does seem like there's been a lot of server repurposing these days
[14:10:50] https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Decommissioned_-%3E_Spare
[14:10:59] or
[14:10:59] https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Decommissioned_-%3E_Active
[14:11:03] depending on the use case
[14:11:29] volans: thanks yep
[14:11:49] sounds like the latter
[14:18:30] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:22:34] volans https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Rename_while_reimaging assumes "--keep-mgmt-dns" ...does that matter? Do I need to ask DC Ops for the mgmt interface info?
[14:27:30] inflatador: what's the hostname in netbox?
[14:27:58] I can't find wdqs1025 nor cp1086
[14:31:40] volans cp1086 was the old name AFAIK
[14:33:45] it was renamed to wqds1025
[14:33:49] volans, inflatador: leave it with me
[14:33:52] that's wrong
[14:33:54] yeah I renamed it a moment ago
[14:33:57] oh?
[14:34:04] should be wdqs , not wqds ...common mistake ;)
[14:34:28] yep. I do it nearly every time sry. I have to say in my head "wiki data query service" real slow or it happens :P
[14:34:45] renamed again
[14:41:32] volans: good spot on the name that could have got very confusing :)
[14:41:55] inflatador: I updated the task there, I think you should be able to reimage with the new name when you're ready
[14:44:46] just reimage or follow the whole rename while reimaging?
[14:44:58] just to be clear and make sure we don't forget about steps
[14:45:04] *make clear ownership
[14:59:41] I gave more detail on the task, need to follow the 'rename while reimaging' process from the "patch puppet" step onward
[15:00:21] thanks :)
[15:10:31] thanks again fellas! dr0ptp4kt ^^
[15:38:57] netops, DBA, Infrastructure-Foundations, SRE, ops-codfw: Migrate servers in codfw rack B8 from asw-b8-codfw to lsw1-b8-codfw - https://phabricator.wikimedia.org/T355873#9601519 (Joe)
[15:55:10] netops, DBA, Infrastructure-Foundations, SRE, ops-codfw: Migrate servers in codfw rack B8 from asw-b8-codfw to lsw1-b8-codfw - https://phabricator.wikimedia.org/T355873#9601663 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=19e5ce18-f2ba-4d9e-a80a-2c957c2eecad) set by cmoon...
[15:58:20] netops, DBA, Infrastructure-Foundations, SRE, ops-codfw: Migrate servers in codfw rack B8 from asw-b8-codfw to lsw1-b8-codfw - https://phabricator.wikimedia.org/T355873#9601682 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=f241631d-4830-4ac7-b5c1-29790ccbb916) set by cmoon...
[16:15:25] netops, DBA, Infrastructure-Foundations, SRE, ops-codfw: Migrate servers in codfw rack B8 from asw-b8-codfw to lsw1-b8-codfw - https://phabricator.wikimedia.org/T355873#9601818 (cmooney) All links moved without problem, servers back online and responding to ping now.
[17:06:13] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: Move lvs2011 from private1-a-codfw (row) to private1-a2-codfw (rack) vlan - https://phabricator.wikimedia.org/T352920#9602285 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=6010131f-b756-49c6-8082-62badba41...
[17:08:17] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: Move lvs2011 from private1-a-codfw (row) to private1-a2-codfw (rack) vlan - https://phabricator.wikimedia.org/T352920#9602297 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=c0fe6035-a553-49f8-8b94-3d7840e51...
[17:09:18] netops, Infrastructure-Foundations, SRE: Icinga BFD check failing - https://phabricator.wikimedia.org/T359198 (cmooney) p:Triage→Medium
[17:19:58] netops, Infrastructure-Foundations, SRE: Icinga BFD check failing - https://phabricator.wikimedia.org/T359198#9602418 (andrea.denisse) a:andrea.denisse
[17:26:24] netops, Infrastructure-Foundations, SRE: Icinga BFD check failing - https://phabricator.wikimedia.org/T359198#9602476 (dcaro) Affecting also the cloudswitches {F42399814}
[17:27:36] netops, Infrastructure-Foundations, SRE: Icinga BFD check failing - https://phabricator.wikimedia.org/T359198#9602493 (dcaro) It's gone now :)
[17:29:26] netops, Infrastructure-Foundations, SRE: Icinga BFD check failing - https://phabricator.wikimedia.org/T359198#9602506 (andrea.denisse)
[17:29:59] netops, Infrastructure-Foundations, SRE: Icinga BFD check failing - https://phabricator.wikimedia.org/T359198#9602503 (fgiunchedi) I've bandaided the issue on alert2001, we'll need a more proper fix: ` # download-mibs # cd /var/lib/snmp && ln -s ../mibs `
[17:30:47] netops, Infrastructure-Foundations, SRE: Icinga BFD check failing - https://phabricator.wikimedia.org/T359198#9602522 (Dzahn) There is this package on the alert hosts: ` ii snmp-mibs-downloader 1.2 all install and manage Management Information B...
[17:32:42] netops, Infrastructure-Foundations, SRE: Icinga BFD check failing - https://phabricator.wikimedia.org/T359198#9602540 (cmooney) >>! In T359198#9602522, @Dzahn wrote: > I guess the snmp-mibs-downloader just has to be automated to download stuff? Yeah on it's own that package installs but doesn't do a...
[17:48:03] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: Move lvs2011 from private1-a-codfw (row) to private1-a2-codfw (rack) vlan - https://phabricator.wikimedia.org/T352920#9602589 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1002 for host...
[17:53:36] netops, Infrastructure-Foundations, SRE: Icinga BFD check failing - https://phabricator.wikimedia.org/T359198#9602613 (Dzahn) Looks like it's: `man 1 download-mibs` `download-mibs --help` and the config is at `/etc/snmp-mibs-downloader/snmp-mibs-downloader.conf` which has some kind of "AUTOLOAD" c...
[17:55:55] netops, Infrastructure-Foundations, SRE: Icinga BFD check failing - https://phabricator.wikimedia.org/T359198#9602620 (cmooney) >>! In T359198#9602613, @Dzahn wrote: > Looks like it's: > > `man 1 download-mibs` > `download-mibs --help` > > and the config is at `/etc/snmp-mibs-downloader/snmp-mibs-...
[18:19:55] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:30:57] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: Move lvs2011 from private1-a-codfw (row) to private1-a2-codfw (rack) vlan - https://phabricator.wikimedia.org/T352920#9602746 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1002 for host lvs...
[18:31:38] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: Move lvs2011 from private1-a-codfw (row) to private1-a2-codfw (rack) vlan - https://phabricator.wikimedia.org/T352920#9602756 (cmooney) Reimage looks good, BGP up and lvs2011 handling traffic again: ` cmooney@cumin1002:~$ sud...
[19:06:28] netops, Infrastructure-Foundations, SRE: Re-IP hosts on codfw row A and B to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T354869#9602910 (cmooney)
[19:06:33] netops, Infrastructure-Foundations, SRE, Traffic: Move lvs2011 from private1-a-codfw (row) to private1-a2-codfw (rack) vlan - https://phabricator.wikimedia.org/T352920#9602909 (cmooney) Open→Resolved
[19:16:27] netops, SRE, Patch-For-Review, SRE Observability (FY2023/2024-Q3): Icinga BFD check failing - https://phabricator.wikimedia.org/T359198#9602958 (andrea.denisse)
[22:19:55] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:34:36] netops, DBA, Infrastructure-Foundations, SRE, ops-codfw: Migrate servers in codfw rack B8 from asw-b8-codfw to lsw1-b8-codfw - https://phabricator.wikimedia.org/T355873#9603722 (cmooney) Open→Resolved a:cmooney
[23:34:45] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544#9603724 (cmooney)
[23:35:25] netops, Infrastructure-Foundations, SRE, ops-codfw: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9603728 (cmooney)
[23:35:31] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544#9603729 (cmooney)
[23:35:49] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544#9603725 (cmooney) Open→Resolved a:cmooney Closing task. Big thanks to all the SRE teams for the help and co-operation getting this o...