[08:59:10] <wikibugs>	 puppet-compiler, Infrastructure-Foundations, User-dcaro: PCC Remove .configs file support under worker.py - https://phabricator.wikimedia.org/T294541 (dcaro) In progress→Open
[08:59:24] <wikibugs>	 puppet-compiler, Infrastructure-Foundations, Patch-For-Review, User-dcaro: PCC: add a less arbitrary success condition - https://phabricator.wikimedia.org/T295030 (dcaro) In progress→Open
[08:59:39] <wikibugs>	 puppet-compiler, Infrastructure-Foundations, Patch-For-Review, User-dcaro: PCC: add automatic style checker (balck + isort) - https://phabricator.wikimedia.org/T295063 (dcaro) In progress→Open
[09:53:38] <wikibugs>	 SRE-tools, Data-Persistence-Backup, Infrastructure-Foundations, media-backups, and 2 others: minio monitoring broken due to TLS certificate marked as insecure - https://phabricator.wikimedia.org/T295594 (jcrespo)
[09:54:02] <wikibugs>	 SRE-tools, Data-Persistence-Backup, Infrastructure-Foundations, media-backups, and 2 others: minio monitoring broken due to TLS certificate marked as insecure - https://phabricator.wikimedia.org/T295594 (jcrespo)
[10:08:18] <wikibugs>	 netops, Infrastructure-Foundations: Use next-hop-self for iBGP sessions - https://phabricator.wikimedia.org/T295672 (ayounsi) p:Triage→High
[10:09:19] <wikibugs>	 netops, Infrastructure-Foundations, SRE: cr1-eqiad -> Charter/AS7843 connectivity is broken - https://phabricator.wikimedia.org/T295650 (ayounsi) Thanks for taking care of it. Proper fix is most likely T295672.
[10:15:51] <wikibugs>	 SRE-tools, Data-Persistence-Backup, Infrastructure-Foundations, media-backups, observability: minio monitoring broken due to TLS certificate marked as insecure - https://phabricator.wikimedia.org/T295594 (jcrespo) Open→Resolved a:jbond The patch + running puppet fixed the issue. I...
[10:43:46] <topranks>	 volans:  Are the instructions for creating a VM here still the best approach:
[10:43:48] <topranks>	 https://wikitech.wikimedia.org/wiki/Ganeti#Create_a_VM
[10:43:55] <wikibugs>	 netops, Infrastructure-Foundations, SRE: Rebuild Routinator (rpki) VMs with larger disk - https://phabricator.wikimedia.org/T292503 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by cmooney@cumin1001 for hosts: `rpki1001.eqiad.wmnet` - rpki1001.eqiad.wmnet (**PASS**)   - Downtimed hos...
[10:44:21] <topranks>	 i.e. running the sre.ganeti.makevm cookbook?
[10:44:25] <moritzm>	 topranks: yeah, those are still accurate
[10:44:56] <volans>	 yep ^^^
[10:44:59] <topranks>	 moritzm: cool, seems easy enough; some other comments I'd seen confused me.
[10:45:03] <topranks>	 thanks both!
[10:45:44] <moritzm>	 which ones? if there are still references to the old script (which predated the cookbook) we can remove them
[10:49:12] <topranks>	 none on the wiki, it was just some comments on IRC that I think I misinterpreted
[10:51:57] <volans>	 moritzm: would now be a good time to run a fleet-wide (buster+) debdeploy to upgrade python3-wmflib?
[10:52:37] <moritzm>	 volans: sure, go ahead :-)
[10:52:42] <volans>	 thx
[11:11:13] <volans>	 XioNoX, topranks: I know you have played on netbox-next with cable representation options; would you mind if I import a fresh, clean DB backup from netbox prod? It would nuke any local data modifications. If you need those I can find another way to test the import script patch
[11:11:26] <XioNoX>	 +1 for me
[11:11:48] <topranks>	 Fire ahead volans, I think everyone got to see the example so it's fine to trash it
[11:12:06] <volans>	 ack, wanna take a screenshot just in case before I nuke it, topranks?
[11:12:59] <topranks>	 yeah actually not a bad idea.  
[11:13:01] <topranks>	 one sec
[11:13:20] <volans>	 take your time
[11:13:24] <volans>	 no hurry here
[11:13:29] <topranks>	 ok fire away thanks :)
[11:16:20] <volans>	 thank you!
[11:48:18] <topranks>	 moritzm: Not sure if something has gone wrong creating rpki1001 VM.
[11:48:33] <topranks>	 The instance exists but when I connect to the console there is no output, have rebooted it but same thing.
[11:49:18] <topranks>	 Ganeti says it's running and from what I can tell status looks ok
[11:49:46] <topranks>	 perhaps it's just a matter of waiting and I'm being impatient.
[11:51:30] <wikibugs>	 netops, Infrastructure-Foundations, SRE, Patch-For-Review: cr1-eqiad -> Charter/AS7843 connectivity is broken - https://phabricator.wikimedia.org/T295650 (cmooney) Please ignore the above, unrelated CRs.  I pasted the wrong task ID when doing the commit.
[11:54:55] <moritzm>	 you mean when you connect via "sudo gnt-instance console rpki1001.eqiad.wmnet" there's no output?
[11:55:03] <topranks>	 yes
[11:58:16] <moritzm>	 interesting! that's the very same error I'm currently running into in the new ganeti-test* cluster
[11:59:01] <moritzm>	 if you have a look at the processes on ganeti1009 (where rpki1001 was created) you'll see a kvm-console-wrapper zombie process
[11:59:14] <topranks>	 ok.  The virtual console did "detach" when I rebooted the instance, so it kind of looks like the command is attaching to _something_
[11:59:29] <moritzm>	 and the socat command which would have connected to the instance froze
[12:00:06] <topranks>	 ah ok yeah there are two of them there alright
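A hypothetical sketch, not something run in this conversation: one way to spot zombie processes like the `kvm-console-wrapper` one moritzm mentions is to scan `/proc/<pid>/stat` on the Ganeti node for processes in state `Z`. The helper names `parse_stat` and `find_zombies` are illustrative only. It assumes a Linux host, where the kernel truncates the command name to 15 characters, so the wrapper would appear as `kvm-console-wra`.

```python
import glob
from typing import List, Tuple


def parse_stat(line: str) -> Tuple[int, str, str]:
    """Parse pid, command name and state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so we split on the *last* closing parenthesis.
    """
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx].strip())
    comm = line[open_idx + 1 : close_idx]
    state = line[close_idx + 2]  # single state character follows ") "
    return pid, comm, state


def find_zombies(name_prefix: str = "") -> List[Tuple[int, str]]:
    """Return (pid, comm) for every zombie ("Z") process whose command name
    starts with name_prefix (comm is truncated to 15 chars by the kernel)."""
    zombies = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited between glob and open, or odd content
        if state == "Z" and comm.startswith(name_prefix):
            zombies.append((pid, comm))
    return zombies


if __name__ == "__main__":
    # Hypothetical usage on a Ganeti node such as the one hosting rpki1001.
    for pid, comm in find_zombies("kvm-console"):
        print(f"zombie: pid={pid} comm={comm}")
```

In practice `ps` gives the same information; parsing `/proc` directly just avoids depending on its output format in a script.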
[12:01:11] <moritzm>	 in a meeting now
[12:01:15] <topranks>	 no rush on this
[12:24:16] <wikibugs>	 netops, Infrastructure-Foundations, SRE: cr1-eqiad -> Charter/AS7843 connectivity is broken - https://phabricator.wikimedia.org/T295650 (cmooney) a:cmooney
[12:27:55] <wikibugs>	 SRE-tools, Data-Persistence-Backup, Infrastructure-Foundations, media-backups, observability: minio monitoring broken due to TLS certificate marked as insecure - https://phabricator.wikimedia.org/T295594 (jbond) @jcrespo To be clear there was nothing wrong with your config, this is something...
[12:38:43] <wikibugs>	 netops, Infrastructure-Foundations, SRE: cr1-eqiad -> Charter/AS7843 connectivity is broken - https://phabricator.wikimedia.org/T295650 (cmooney) > My guess would be that this is Charter filtering traffic on their IXP port to only routers they have peerings with, for security/anti-DDoS reasons. >  >...
[12:47:34] <volans>	 FYI I've re-imported the latest backup into netbox-next, feel free to edit things as needed like before :) I'll be testing a mass run of the puppetdb import script for all devices in the afternoon
[14:50:21] <wikibugs>	 netops, Infrastructure-Foundations: Upgrade core routers to Junos 20+ - https://phabricator.wikimedia.org/T295690 (ayounsi) p:Triage→Low
[14:54:37] <wikibugs>	 netops, Infrastructure-Foundations, fundraising-tech-ops: Upgrade pfw to Junos 20+ - https://phabricator.wikimedia.org/T295691 (ayounsi) p:Triage→Low
[17:31:27] <volans>	 FYI (cc XioNoX, topranks): https://vincent.bernat.ch/en/blog/2021-source-of-truth-network
[17:31:50] <XioNoX>	 thx
[17:31:55] <volans>	 that points to https://blog.networktocode.com/post/nautobots-rollback/ TIL fo rme
[17:31:58] <volans>	 *for me
[17:32:04] <volans>	 although it's from a few months ago
[17:33:38] <XioNoX>	 yeah, it's one of the cool features of Nautobot
[17:33:51] <topranks>	 yeah it definitely looks good, I saw it for the first time last week.
[17:34:09] <volans>	 btw on footnote 1 we're quoted :D
[17:34:18] <volans>	 s/quoted/mentioned/
[17:35:32] <jbond>	  traceroute www.wikimedia.org
[17:36:34] <volans>	 jbond: ?
[17:36:42] <topranks>	 that a bad paste or you suspect we have issues John?
[17:37:01] <jbond>	 oh sorry bad paste :)
[17:37:15] <XioNoX>	 jbond: Password: 
[17:37:18] <XioNoX>	 :)
[17:37:23] <volans>	 lol
[17:37:30] <jbond>	 lol
[17:47:03] <majavah>	 hunter2
[20:55:30] <paravoid>	 what's with the "BGP peer above prefix limit global" that's been alerting since Thursday?
[21:06:02] <XioNoX>	 paravoid: IPv6 IX sessions have a cut-off of 4000 prefixes, with an alerting threshold of 80%; looks like AS4230 is hitting that
[21:06:31] <XioNoX>	 paravoid: it only emails peering@ though
[21:12:08] <XioNoX>	 interesting, looks like it keeps sending us too many prefixes then gets kicked then goes back to normal
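As a worked example of the numbers XioNoX quotes: with a 4000-prefix limit and an 80% alerting threshold, the alert fires at 4000 × 0.8 = 3200 prefixes. The hypothetical helper `prefix_limit_status` below classifies a received-prefix count against those values. It assumes the router tears the session down once the count exceeds the limit and treats the threshold as inclusive; the exact comparisons Junos and LibreNMS use may differ.

```python
def prefix_limit_status(received: int, limit: int = 4000, alert_pct: int = 80) -> str:
    """Classify a BGP session's received-prefix count against a maximum-prefix
    limit and an alert threshold given as a percentage of that limit.

    Assumptions (hypothetical, not taken from router config): the session is
    torn down when received > limit; the alert fires when received >= threshold.
    """
    threshold = limit * alert_pct // 100  # 4000 * 80 // 100 = 3200
    if received > limit:
        return "teardown"  # over the hard limit: the peer gets kicked
    if received >= threshold:
        return "alert"  # above 80% of the limit: alerting fires
    return "ok"


if __name__ == "__main__":
    # A peer oscillating around the limit, as described above, would go
    # ok -> alert -> teardown and back once the session re-establishes.
    for count in (3000, 3500, 4100):
        print(count, prefix_limit_status(count))
```

Raising the limit for that one peer, as topranks did, moves both the teardown point and the 80% alert point up with it.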
[21:13:57] <XioNoX>	 oh, I see that topranks set a higher threshold not long ago
[21:14:01] <XioNoX>	 all good then
[21:14:07] <topranks>	 sry yeah was just coming here to update
[21:14:10] <topranks>	 alert has not cleared.
[21:14:38] <XioNoX>	 not?
[21:14:58] <topranks>	 no, but maybe librenms just hasn't polled.. I've not timed exactly but it's been more than 5 mins I think
[21:14:58] <XioNoX>	 the device logs are clean since
[21:15:33] <topranks>	 I mean in LibreNMS / alertmanager it's still showing.  Router looks good.
[21:16:38] <XioNoX>	 yeah it will recover within 5 or 10min I guess
[21:17:07] <topranks>	 yeah assume so
[21:18:19] <topranks>	 it's gone now :)