[04:17:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:17:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:26:03] netops, DBA, Infrastructure-Foundations: x2 codfw master (db2144) TCP errors - https://phabricator.wikimedia.org/T362482 (Marostegui) NEW
[11:29:30] netops, DBA, Infrastructure-Foundations: x2 codfw master (db2144) TCP errors - https://phabricator.wikimedia.org/T362482#9711790 (Marostegui) For now I have rebooted x2 master to see if it comes back in a better state (as doing a switch is really painful on x2)
[11:33:20] netops, DBA, Infrastructure-Foundations, Patch-For-Review: x2 codfw master (db2144) TCP errors - https://phabricator.wikimedia.org/T362482#9711793 (Marostegui) The reboot didn't work, going to start to switch the master
[11:42:14] netops, DBA, Infrastructure-Foundations, Patch-For-Review: x2 codfw master (db2144) TCP errors - https://phabricator.wikimedia.org/T362482#9711820 (Marostegui) The new master is showing the same issue - from IRC: ` 13:38:30 I think it could be on a network link in eqiad 13:38:39
netops, DBA, Infrastructure-Foundations, Patch-For-Review: x2 codfw master (db2144) TCP errors - https://phabricator.wikimedia.org/T362482#9711824 (cmooney) Apologies I'd missed it at first - but there are errors ingress on our circuit from codfw to eqiad on the eqiad side: {F46535882} This is...
[11:46:27] netops, DBA, Infrastructure-Foundations, Patch-For-Review: x2 codfw master (db2144) TCP errors - https://phabricator.wikimedia.org/T362482#9711825 (Marostegui) Thank you @cmooney! Up to you if you want to keep this task open for that or you want to open a specific one for that operation
[11:48:59] netops, DBA, Infrastructure-Foundations, Patch-For-Review: x2 codfw master (db2144) TCP errors - https://phabricator.wikimedia.org/T362482#9711826 (Marostegui) Open→Resolved a:cmooney Per the irc chat, closing this as fixed
[11:54:46] netops, DBA, Infrastructure-Foundations, Patch-For-Review: x2 codfw master (db2144) TCP errors - https://phabricator.wikimedia.org/T362482#9711850 (cmooney)
[12:17:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:22:25] (SystemdUnitFailed) firing: (2) debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:23:36] netops, DBA, Infrastructure-Foundations, Patch-For-Review: x2 codfw master (db2144) TCP errors - https://phabricator.wikimedia.org/T362482#9711858 (Marostegui)
[12:47:25] (SystemdUnitFailed) firing: (2) debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:47:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:47:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed