[06:13:29] <wikibugs>	 netops, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (Marostegui)
[06:13:39] <wikibugs>	 netops, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (Marostegui)
[06:15:05] <wikibugs>	 netops, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (Marostegui) Both masters, s7 and x1 have been switched over and no longer live in this rack.
[06:15:11] <wikibugs>	 netbox, Infrastructure-Foundations: Netbox: use Custom Model Validation - https://phabricator.wikimedia.org/T310590 (ayounsi) Agreed, we need to keep a close look at any risk of performance hit (eg. the ones that iterate over all objects), but a lot of reports/tests could be replaced by those validators....
[06:39:30] <wikibugs>	 netops, Infrastructure-Foundations, ops-eqiad: eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (ayounsi) p:Triage→Medium
[06:41:16] <wikibugs>	 netops, Infrastructure-Foundations, ops-eqiad: eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (ayounsi)
[06:41:24] <wikibugs>	 netops, Infrastructure-Foundations, SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (ayounsi)
[06:42:22] <wikibugs>	 netops, Infrastructure-Foundations, ops-eqiad: eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (ayounsi)
[06:42:30] <wikibugs>	 netops, Infrastructure-Foundations, SRE: Packet Drops on Eqiad ASW -> CR uplinks - https://phabricator.wikimedia.org/T291627 (ayounsi)
[07:08:54] <wikibugs>	 netops, Infrastructure-Foundations, ops-eqiad: eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (ayounsi)
[08:10:27] <wikibugs>	 SRE-tools, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 4 others: Create a cookbook to perform a rolling reboot of a kubernetes cluster - https://phabricator.wikimedia.org/T260661 (JMeybohm)
[08:50:54] <wikibugs>	 netbox, Infrastructure-Foundations, SRE, Patch-For-Review: Grant cn=nda some sort of read only access to Netbox - https://phabricator.wikimedia.org/T302870 (ayounsi) Send the above patch to grant access (`is_active`).  The permissions page though seems to involve quite a lot of manual work see fo...
[09:18:28] <wikibugs>	 netbox, Infrastructure-Foundations, SRE, Patch-For-Review: Grant cn=nda some sort of read only access to Netbox - https://phabricator.wikimedia.org/T302870 (ayounsi) I had a quick look at demo.netbox.dev and created a test user there (you can try with user foobar/foobar, the DB is reset every day...
[13:14:43] <wikibugs>	 netops, Infrastructure-Foundations, SRE, ops-eqiad: eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (Cmjohnson)
[13:14:51] <wikibugs>	 netops, Infrastructure-Foundations, SRE: Move asw2-d5-eqiad to spares - https://phabricator.wikimedia.org/T313115 (Cmjohnson)
[13:53:10] <volans>	 moritzm: is there any WIP on ganeti01.svc.eqiad.wmnet:5080 APIs?
[13:53:36] <volans>	 it timed out for the netbox_ganeti_eqiad_sync.service from netbox1002
[13:58:14] <moritzm>	 first time I hear of this, having a look now
[13:59:35] <volans>	 first failure today at 13:35
[13:59:43] <volans>	 and second one at 13:50
[13:59:47] <volans>	 both UTC
[14:22:13] <moritzm>	 the certs are all valid until 2027 and the RAPI port is also running fine, but there's a traceback in rapi-daemon.log starting at 13:35.
[14:22:32] <moritzm>	 there are no changes on the server itself, did anything change in the queries made by the netbox report?
[14:23:22] <volans>	 not that I know of
[14:23:56] <moritzm>	 mysterious, I'm going to open a task
[14:24:17] <volans>	 could be that the move to drbd of some VMs somehow affected the API response time? maybe it's just a timeout too short
[14:24:22] <volans>	 I'll test the call manually
[14:25:49] <moritzm>	 all the etcds which temporarily switched to DRBD are rolled back by now
[14:27:44] <volans>	 ok
[14:29:18] <volans>	 it seems the call that times out is /2/instances?bulk=1, testing with a longer timeout
[14:30:31] <volans>	 moritzm: confirmed
[14:30:34] <volans>	 just a timeout issue
[14:30:46] <volans>	 the default 5s we're currently using is not enough
[14:31:08] <volans>	 I'll send a patch to increase it, took short enough that I don't care if it's 5 or 10s for the API call
[14:31:19] <volans>	 doesn't seem worrying for now at least to me
[14:34:16] <moritzm>	 sounds good!
[14:36:53] <volans>	 moritzm: https://gerrit.wikimedia.org/r/c/operations/software/netbox-extras/+/815997
[14:53:56] <volans>	 all fixed
[14:59:23] <moritzm>	 nice!
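The timeout fix discussed above can be sketched roughly as follows. This is a minimal illustration, not the actual netbox-extras code: the function names, the `opener` parameter, and the 10 s value are assumptions (the log only says the 5 s default was too short), and any authentication or TLS setup the real service needs is left out.

```python
import json
import urllib.request

# Assumption: 10 s is illustrative; the log only states that 5 s was not enough.
DEFAULT_TIMEOUT = 10.0


def instances_url(host: str, port: int = 5080, bulk: bool = True) -> str:
    """Build the RAPI instances URL; bulk=1 is the call that timed out in the log."""
    query = "?bulk=1" if bulk else ""
    return f"https://{host}:{port}/2/instances{query}"


def fetch_instances(url: str, timeout: float = DEFAULT_TIMEOUT,
                    opener=urllib.request.urlopen):
    """Fetch and decode the JSON response, giving up after `timeout` seconds.

    `opener` is a hypothetical injection point so the call can be exercised
    without network access.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with opener(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_instances(instances_url("ganeti01.svc.eqiad.wmnet"))` would issue the bulk query with the longer timeout; passing `timeout=5.0` reproduces the old behaviour for comparison.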
[17:44:57] <wikibugs>	 netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: drmrs: initial geodns configuration - https://phabricator.wikimedia.org/T304089 (BCornwall)