[01:48:25] (SystemdUnitFailed) firing: wmf_auto_restart_prometheus-redis-exporter@6380.service on netbox2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:18:25] (SystemdUnitFailed) firing: (2) wmf_auto_restart_prometheus-redis-exporter@6380.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:31:32] ^ fixed, I dropped the prometheus-redis-exporter timers which were still present (and now failing after I removed Redis yesterday)
[05:38:25] (SystemdUnitFailed) resolved: (2) wmf_auto_restart_prometheus-redis-exporter@6380.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:41:18] Mail, Infrastructure-Foundations: Consolidation and tracking of automated email alerts improvements across services - https://phabricator.wikimedia.org/T360902#9747632 (Aklapper)
[13:10:00] FYI I'm following the instructions in https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Rename_while_reimaging for switching lists2001.codfw.wmnet to have a public IP address, starting with the decommission job now.
[13:10:40] It will keep the lists2001 name (but obviously switch from .codfw.wmnet to .wikimedia.org)
[13:30:03] eoghan: let me know if you hit any issues
[13:36:50] Will do, thanks!
[13:50:45] I need to step out for 30 minutes. It's decommed and I'll go through the reprovisioning steps in 30 when I'm back
[14:19:25] (SystemdUnitFailed) firing: netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:38:44] netops, Infrastructure-Foundations, SRE: Cloud IPv6 subnets - https://phabricator.wikimedia.org/T187929#9748100 (cmooney) There are a few elements here to consider: ######Existing cloud-hosts private IPv6 ranges The existing cloud-hosts vlans, in the WMF production realm, have IPs from the wider WM...
[14:49:25] (SystemdUnitFailed) resolved: netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:49:44] topranks, still around? I know it's really late in your day but was wondering if you could help troubleshoot some weirdness w/elastic1105
[15:50:19] context is at https://phabricator.wikimedia.org/T363516 , TL;DR is that it was causing missing search results in eqiad, so we had to fail over to codfw
[15:50:58] inflatador: sure
[15:51:08] I saw some of the talk in one of the other channels
[15:51:14] hmm ok
[15:51:37] topranks: cool, I will get an etherpad started and grab a Google Meet if that works
[15:52:24] sure yeah give me 5 mins
[15:52:34] sure, no rush. will continue convo in #search if that is OK
[15:55:47] cool
[15:59:12] Mail, Infrastructure-Foundations, SRE: Provision mx-out - https://phabricator.wikimedia.org/T325407#9748391 (jhathaway)
[16:00:08] Mail, Infrastructure-Foundations, SRE: Provision mx-out - https://phabricator.wikimedia.org/T325407#9748389 (jhathaway)