[03:48:48] (SystemdUnitFailed) firing: wmf_auto_restart_uwsgi-puppetdb-microservice.service Failed on puppetdb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:46:07] netops, Infrastructure-Foundations, SRE, Traffic, ops-codfw: Migrate lvs2011 and lvs2012 to new top-of-rack switches - https://phabricator.wikimedia.org/T348178 (ayounsi) > Secondary Link Migration Looking at link usage, it's fine to drop the secondary link and keep it at 10G. https://librenm...
[07:27:16] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate mr1-codfw from asw-a1-codfw to lsw1-a1-codfw - https://phabricator.wikimedia.org/T348164 (ayounsi) > I would propose using free port ge-0/0/3 to add the new routed link, and bringing up the BGP peering before we touch the existing link....
[07:48:48] (SystemdUnitFailed) firing: wmf_auto_restart_uwsgi-puppetdb-microservice.service Failed on puppetdb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:36:20] netops, Infrastructure-Foundations, SRE, Traffic, ops-codfw: Migrate lvs2013 and lvs2014 codfw row A-B connections to new switches - https://phabricator.wikimedia.org/T348218 (cmooney) p:Triage→Medium
[08:37:11] netops, Infrastructure-Foundations, SRE, Traffic, ops-codfw: Migrate lvs2013 and lvs2014 codfw row A-B connections to new switches - https://phabricator.wikimedia.org/T348218 (cmooney)
[08:37:27] netops, Infrastructure-Foundations, SRE, Traffic, ops-codfw: Migrate lvs2013 and lvs2014 codfw row A-B connections to new switches - https://phabricator.wikimedia.org/T348218 (cmooney)
[08:37:35] netops, Infrastructure-Foundations, SRE, ops-codfw: Codfw row A-B migration - non-standard device moves - https://phabricator.wikimedia.org/T348128 (cmooney)
[08:40:36] netops, Infrastructure-Foundations, SRE, ops-codfw: Codfw row A-B migration - non-standard device moves - https://phabricator.wikimedia.org/T348128 (cmooney)
[08:47:10] netops, Infrastructure-Foundations, SRE, Traffic, ops-codfw: Migrate lvs2013 and lvs2014 codfw row A-B connections to new switches - https://phabricator.wikimedia.org/T348218 (cmooney)
[09:06:30] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (dcaro)
[09:29:59] netops, Infrastructure-Foundations, SRE, Traffic: Add new codfw private vlan sub-interfaces to lvs2013 and lvs2014 - https://phabricator.wikimedia.org/T348225 (cmooney) p:Triage→Medium
[09:30:17] netops, Infrastructure-Foundations, SRE, Traffic: Add new codfw private vlan sub-interfaces to lvs2013 and lvs2014 - https://phabricator.wikimedia.org/T348225 (cmooney)
[09:30:25] netops, Infrastructure-Foundations, SRE: Plan codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (cmooney)
[11:48:48] (SystemdUnitFailed) firing: wmf_auto_restart_uwsgi-puppetdb-microservice.service Failed on puppetdb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:57:02] ^ removed the auto restart timer on puppetdb
[11:57:47] cheers
[11:58:48] (SystemdUnitFailed) resolved: wmf_auto_restart_uwsgi-puppetdb-microservice.service Failed on puppetdb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:41:21] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (ssingh) @cmooney's comment above on the default routing policy and priority of routes got me thinking: if...
[13:57:38] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (ssingh) >>! In T348041#9228328, @ayounsi wrote: > That's only for equal prefix length. For example a stati...
[14:33:53] netops, Infrastructure-Foundations, SRE, ops-codfw: Bring codfw row A-B EVPN switches live and make them gateway for existing Vlans - https://phabricator.wikimedia.org/T347191 (cmooney) @papaul I've moved the google meet for this to the week after - Oct 17th. There are few other moving parts in...
[15:06:40] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (dcaro)
[15:38:48] (SystemdUnitFailed) firing: httpbb_kubernetes_mw-api-ext_hourly.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:43:48] (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-api-ext_hourly.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:57:11] netops, Cloud-VPS, Infrastructure-Foundations, SRE: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (aborrero) The cloudgw side is now completed. We may want to refresh the neutron side as well: `lang=shell-session aborrero@cloudcontrol100...
[16:06:03] netops, Infrastructure-Foundations, SRE, ops-codfw: Bring codfw row A-B EVPN switches live and make them gateway for existing Vlans - https://phabricator.wikimedia.org/T347191 (Papaul) @cmooney no problem
[16:54:08] fyi all I'm going to test the puppet migration cookbook on sretest1003
[16:54:53] \o/
[18:18:40] (SystemdUnitFailed) firing: httpbb_hourly_appserver.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:22:07] had to fix a minor bug https://gerrit.wikimedia.org/r/c/operations/puppet/+/963788 but sretest1003 is not on puppet7
[18:22:15] s/not/now/
[18:28:55] ~.
[19:18:32] (SystemdUnitFailed) resolved: httpbb_hourly_appserver.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:06:25] netops, Infrastructure-Foundations, SRE, ops-eqiad: Cabling for Eqiad racke E5-7 and F5-7 - https://phabricator.wikimedia.org/T334231 (Jclark-ctr)