[00:47:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:47:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:47:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:33:29] Mail, Infrastructure-Foundations, Phabricator: Interacting with Phabricator via mail is broken - https://phabricator.wikimedia.org/T92724#9712606 (Mvolz) I'm having this issue again. Looks like a different error though: ` A message that you sent could not be delivered to one or more of...
[10:04:54] Mail, Infrastructure-Foundations, Phabricator: Interacting with Phabricator via mail is broken - https://phabricator.wikimedia.org/T92724#9712695 (Aklapper) @Mvolz: This ticket got closed nine years ago. You are probably looking for T356077 instead.
[11:53:10] netops, Infrastructure-Foundations: mr1-eqsin performance issue - https://phabricator.wikimedia.org/T362522 (ayounsi) NEW p:Triage→High
[11:56:08] netops, Infrastructure-Foundations: mr1-eqsin performance issue - https://phabricator.wikimedia.org/T362522#9712932 (ayounsi)
[12:11:29] netops, Infrastructure-Foundations: Juniper: use export-format state-data json compact - https://phabricator.wikimedia.org/T362523 (ayounsi) NEW
[12:31:15] netops, Infrastructure-Foundations: mr1-eqsin performance issue - https://phabricator.wikimedia.org/T362522#9713006 (ayounsi) Opened JTAC 2024-0415-128563 and attached logs/RSI/coredump.
[12:39:23] SRE-tools, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9713041 (Gehel)
[12:46:59] SRE-tools, Infrastructure-Foundations, Spicerack, cloud-services-team (FY2023/2024-Q3-Q4), and 2 others: Remove elasticsearch-curator dependency from Spicerack/Elastic cookbooks - https://phabricator.wikimedia.org/T361647#9713106 (Gehel)
[12:47:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:29:40] SRE-tools, Infrastructure-Foundations, Spicerack, cloud-services-team (FY2023/2024-Q3-Q4), Patch-For-Review: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337#9713348 (Volans) With the above patch I think the issue should be solve...
[13:38:57] SRE-tools, Infrastructure-Foundations, Spicerack, cloud-services-team (FY2023/2024-Q3-Q4), Patch-For-Review: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337#9713370 (fnegri) Works for me! 🎉 ` spicerack (master) $ docker run --r...
[13:40:07] SRE-tools, Infrastructure-Foundations, Spicerack, cloud-services-team (FY2023/2024-Q3-Q4), Patch-For-Review: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337#9713373 (fnegri) `pip install wikimedia-spicerack` is also working fine...
[13:55:34] (DiskSpace) firing: Disk space build2001:9100:/ 5.694% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=build2001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[13:58:00] claime: related to the image script? ^^^
[14:01:02] folks some issue with my network here seems to have just gone down (writing this on phone)
[14:01:18] will join meeting shortly please proceed
[14:01:26] cc jobo
[14:09:19] SRE-tools, Infrastructure-Foundations, Spicerack, cloud-services-team (FY2023/2024-Q3-Q4), Patch-For-Review: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337#9713473 (Volans) Stalled→Resolved Resolving then, th...
[14:21:23] volans: possible, I was out, jayme ^
[14:21:46] volans: definitely :)
[14:22:24] I'll keep an eye and clean up when everything is done
[14:22:27] * volans in a meeting, let us know if we need to cleanup manually, IIRC there is a systemd timer that does a periodic cleanup
[14:25:59] netops, Infrastructure-Foundations: Juniper: use export-format state-data json compact - https://phabricator.wikimedia.org/T362523#9713593 (ayounsi) p:Triage→Low a:ayounsi
[14:27:09] we'll take care of it. Don't worry
[14:27:21] netops, Infrastructure-Foundations, SRE, Patch-For-Review: magru network setup - https://phabricator.wikimedia.org/T362421#9713590 (ayounsi) p:Triage→High a:ayounsi
[14:28:28] thanks!
[14:40:28] netops, Infrastructure-Foundations, ops-codfw, SRE: codfw: use old asw switches from row A and B as msw switches in row C and D - https://phabricator.wikimedia.org/T361871#9713684 (cmooney) p:Triage→Low @papaul yeah I think if we want to go this route we can just set them up the same as w...
[15:35:34] (DiskSpace) resolved: Disk space build2001:9100:/ 3.299% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=build2001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[15:40:23] SRE-tools, SRE: Create a spicerack cookbook for restoring an etcd cluster from backups - https://phabricator.wikimedia.org/T203944#9714078 (Volans)
[15:40:51] SRE-tools, SRE: Covert deploy_apache_change.sh to a spicerack cookbook - https://phabricator.wikimedia.org/T203948#9714079 (Volans)
[15:42:38] SRE-tools, Infrastructure-Foundations, Spicerack: spicerack.dnsdisc.Discovery should not allow pooling active/passive services in both datacenters - https://phabricator.wikimedia.org/T315560#9714081 (Volans) @JMeybohm Is this something still needed?
[15:43:07] SRE-tools, SRE: Spicerack cookbooks TODO list - https://phabricator.wikimedia.org/T203943#9714082 (Volans)
[15:43:23] SRE-tools, SRE: Create cookbook to do `nodetool repair` across cassandra cluster - https://phabricator.wikimedia.org/T225694#9714083 (Volans)
[15:47:26] SRE-tools, Infrastructure-Foundations, Spicerack: spicerack.dnsdisc.Discovery should not allow pooling active/passive services in both datacenters - https://phabricator.wikimedia.org/T315560#9714127 (JMeybohm) >>! In T315560#9714081, @Volans wrote: > @JMeybohm Is this something still needed? Not ult...
[16:08:33] netops, DC-Ops, Infrastructure-Foundations: Take advantage of 10Gb NICs in the new network stack - https://phabricator.wikimedia.org/T360297#9714232 (ayounsi) I started implementing a fix for that but it quickly gets complex as it means shutting down a port, and fully setting up another one. Before g...
[16:47:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:18:15] netops, Infrastructure-Foundations, SRE: Move public-vlan host BGP peerings from CRs to top-of-rack switches in codfw - https://phabricator.wikimedia.org/T360772#9714985 (cmooney) >>! In T360772#9657554, @ayounsi wrote: > We can define per host hiera keys, and empty lists as well, so to be tested but...
[20:47:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:22:31] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Move management routers ssh port - https://phabricator.wikimedia.org/T277438#9715614 (ayounsi) We might have to re-prioritize this task because of {T362522}
[21:23:00] netops, Infrastructure-Foundations: mr1-eqsin performance issue - https://phabricator.wikimedia.org/T362522#9715618 (ayounsi) > I have checked the logs and it looks like the issue we are facing with the slowness on the device and the reboots is product of a brute force SSH attack on the SRX. > The login...
[22:20:59] netops, Infrastructure-Foundations: mr1-eqsin performance issue - https://phabricator.wikimedia.org/T362522#9715694 (cmooney) >>! In T362522#9715615, @ayounsi wrote: > it looks like the issue we are facing with the slowness on the device and the reboots is product of a brute force SSH attack on the SRX...