[02:13:50] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:13:50] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:08:43] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: BGP status (instance cr1-drmrs) - https://phabricator.wikimedia.org/T357389#9725186 (LSobanski) Additional BGP WARNING alert that showed up today: ` AS38082/IPv6: Active (for 65d14h), AS5398/IPv6: Active (for 118d16h), A...
[07:51:58] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: BGP status (instance cr1-drmrs) - https://phabricator.wikimedia.org/T357389#9725253 (cmooney) I'll take a look and clear up what I can.
[08:00:10] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: BGP status (instance cr1-drmrs) - https://phabricator.wikimedia.org/T357389#9725264 (cmooney) Open→Resolved This one in particular down for almost a year and IPs are not responding to ARP/ND on the LAN. Peerin...
[09:07:27] netops, Infrastructure-Foundations, SRE, Patch-For-Review: magru network setup - https://phabricator.wikimedia.org/T362421#9725493 (jcrespo) Hi, after 73470d0dca68abee0 ntp no longer auto-restarts, but after one of the latest changes (I believe b48874a81565b7051be39659c056), it is [[ https://aler...
[09:14:14] netops, Infrastructure-Foundations, SRE, Patch-For-Review: magru network setup - https://phabricator.wikimedia.org/T362421#9725545 (cmooney) >>! In T362421#9725493, @jcrespo wrote: > Hi, after 73470d0dca68abee0 ntp no longer auto-restarts, but after one of the latest changes (I believe b48874a815...
[09:40:33] SRE-tools, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9725610 (MoritzMuehlenhoff)
[10:08:28] Almost created a fun day for myself :D
[10:08:35] https://www.irccloud.com/pastebin/NT9McWi2/
[10:09:42] *phew* :-)
[10:10:42] gunning for a t-shirt? :)
[10:10:54] haha yeah
[10:10:55] SRE-tools, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9725707 (MoritzMuehlenhoff)
[10:13:50] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:20:00] topranks: lol, nice you, I would have answered: NO! NO! NO! Definitely not!
[10:20:10] and I'm sure Junos would have interpreted that as a yes
[10:20:12] :-p
[10:20:35] haha yeah it's programmed to know what you _really_ mean deep down :P
[12:31:36] SRE-tools, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9726125 (MoritzMuehlenhoff)
[13:06:53] netops, Infrastructure-Foundations, SRE, Patch-For-Review: magru network setup - https://phabricator.wikimedia.org/T362421#9726265 (ssingh) Thanks @jcrespo! I should have silenced the alert or restarted the service; both of those are in progress now so we should see this resolve soon. @cmooney:...
[13:50:20] SRE-tools, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9726525 (MoritzMuehlenhoff)
[14:13:50] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:17:02] SRE-tools, conftool, Spicerack: Spicerack support for dbctl - https://phabricator.wikimedia.org/T362893 (CDanis) NEW
[14:17:09] SRE-tools, conftool, Infrastructure-Foundations, Spicerack: Spicerack support for dbctl - https://phabricator.wikimedia.org/T362893#9726654 (CDanis) p:Triage→Low
[14:21:17] cdanis: nice one on the dbctl stuff!
[14:21:57] I'd spent some time trying to get to grips with the code and was planning to reach out to you, but got dragged in other direction this week
[14:22:10] topranks: yeah please just reach out to any one of myself or joe or volans
[14:22:32] part of the issue with conftool is that there's effectively three different ways it implements data models for the entities it manages
[14:22:34] yep will do if it comes up again... the main bit I was worrying about is how to test any changes I might make
[14:22:44] ok
[14:22:50] heh yeah the best way is the integration test but even that is a bit non-trivial to get running locally
[14:23:58] i actually got ratholed for an hour this week trying to be able to run the docker image used by jenkins locally, and gave up
[14:24:07] (i'm probably holding it wrong)
[14:26:08] but the tldr on running the integration test locally is: install apt install etcd-server, systemctl stop etcd, and then run tox
[14:26:10] ha ok.
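The three steps in the 14:26:08 message can be captured in a small helper script. This is a minimal sketch, not a tool the conftool maintainers provide: it assumes a Debian-like workstation with sudo, that it is run from a conftool checkout, and that the packaged etcd service is stopped because the integration test starts its own etcd (an inference; the chat only gives the steps). The `run_steps` helper and the `--run` flag are hypothetical; by default the script only prints the commands.

```python
import shlex
import subprocess
import sys

# Steps from the chat: install etcd-server, stop the etcd service the package
# starts, then run tox. Using sudo and "-y" are assumptions about the local setup.
STEPS = [
    ["sudo", "apt", "install", "-y", "etcd-server"],
    ["sudo", "systemctl", "stop", "etcd"],
    ["tox"],  # run from the root of a conftool checkout (assumption)
]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False.

    Returns the list of commands that were printed (and, if not a dry run,
    executed), stopping at the first command that exits non-zero.
    """
    done = []
    for cmd in steps:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        done.append(cmd)
    return done


if __name__ == "__main__":
    # Hypothetical flag: pass --run to actually execute the steps.
    run_steps(STEPS, dry_run="--run" not in sys.argv)
```

Defaulting to a dry run keeps the script safe to read through first, since two of the three steps need root and one changes the state of a system service.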
[14:26:32] ok yeah, that *sounds* easy enough
[14:31:07] once you've done that it's just a simple matter of understanding the code ;)
[14:53:30] moritzm: is this a good time for a debdeploy of python3-wmflib?
[14:59:05] I'm currently rolling out apache sec updates, so there would be some dpkg locks, but I can surely do something else for a while
[14:59:06] go ahead
[14:59:48] nah go ahead
[14:59:51] no hurry on my side
[14:59:56] lmk when you're done
[15:00:08] ok
[15:21:41] volans: all done, you can go ahead
[15:21:49] perfect, thanks
[15:29:30] netops, Infrastructure-Foundations, SRE: Add probenet configuration for magru - https://phabricator.wikimedia.org/T362902 (Fabfur) NEW
[15:31:52] moritzm: all done, in case you need to do anything else relate
[15:31:55] d
[15:32:29] ack, I'll resume with util-linux sec updates, then
[15:40:34] topranks: I think my patches for magru are all in, wmflib is upgraded across the fleet, the cookbook patch is merged
[15:40:49] anything else missing or for which you need a hand?
[15:40:54] volans: ok nice!
[15:41:14] no I don't think so, I'm in the process of doing the netbox additions for the network kit
[15:41:31] after that I'll merge Arzhel's patch to the public repo and hopefully be able to generate all the configs
[15:41:55] if there are any snags I'll let you know but I think not
[15:42:05] volans: you are off tomorrow I assume?
[15:42:17] yes
[15:42:39] no probs, I can't think of anything that is missing
[15:42:55] did the DNS stuff yesterday - we really need to make that easier :P
[15:43:16] :P
[15:43:34] we talked before on how to approach it, will see next Q if we can get time allocated for it
[15:43:36] another big FIXME is unifying all these spreads of the site data
[15:43:44] we have so much duplication
[15:45:49] 100%
[16:06:20] I'll file a task to eventually test https://people.skolelinux.org/pere/blog/RAID_status_from_LSI_Megaraid_controllers_in_Debian.html
[16:06:37] would be nice to drop the clumsy, proprietary original tool if that fully replaces it...
[16:26:40] moritzm: nice!
[17:26:03] Mail: Create user preference to receive change notification emails for bot edits - https://phabricator.wikimedia.org/T358087#9727479 (Qwerfjkl) For what it's worth I've made https://watchlistemail.toolforge.org/ to send emails for bot edits. (It will only work on enwiki, ping me if anyone wants it working on...
[18:13:51] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:27:48] netbox, ChangeProp, collaboration-services, Infrastructure-Foundations, and 10 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9727771 (brennen)
[21:11:30] Mail, Fundraising-Backlog, Infrastructure-Foundations, SRE: Access to DMARCIAN - https://phabricator.wikimedia.org/T356920#9728207 (AKanji-WMF) Declined→Open p:High→Medium a:AKanji-WMF
[22:13:51] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed