[07:29:31] SRE-tools, DBA, Infrastructure-Foundations, Recommendation-API, and 3 others: Switchover m2 master (db1183 -> db1159) - https://phabricator.wikimedia.org/T300329 (Marostegui)
[08:13:03] SRE-tools, DBA, Infrastructure-Foundations, Recommendation-API, and 3 others: Switchover m2 master (db1183 -> db1159) - https://phabricator.wikimedia.org/T300329 (Marostegui)
[08:27:29] SRE-tools, DBA, Infrastructure-Foundations, Recommendation-API, and 3 others: Switchover m2 master (db1183 -> db1159) - https://phabricator.wikimedia.org/T300329 (Marostegui) All pre-failover steps are done
[08:27:45] SRE-tools, DBA, Infrastructure-Foundations, Recommendation-API, and 3 others: Switchover m2 master (db1183 -> db1159) - https://phabricator.wikimedia.org/T300329 (Marostegui)
[08:29:22] SRE-tools, DBA, Infrastructure-Foundations, Recommendation-API, and 3 others: Switchover m2 master (db1183 -> db1159) - https://phabricator.wikimedia.org/T300329 (Marostegui)
[09:03:07] SRE-tools, DBA, Infrastructure-Foundations, Recommendation-API, and 3 others: Switchover m2 master (db1183 -> db1159) - https://phabricator.wikimedia.org/T300329 (Marostegui)
[09:03:44] SRE-tools, DBA, Infrastructure-Foundations, Recommendation-API, and 3 others: Switchover m2 master (db1183 -> db1159) - https://phabricator.wikimedia.org/T300329 (Marostegui)
[09:05:00] SRE-tools, DBA, Infrastructure-Foundations, Recommendation-API, and 3 others: Switchover m2 master (db1183 -> db1159) - https://phabricator.wikimedia.org/T300329 (Marostegui)
[09:17:59] SRE-tools, DBA, Infrastructure-Foundations, Recommendation-API, and 2 others: Switchover m2 master (db1183 -> db1159) - https://phabricator.wikimedia.org/T300329 (Marostegui) Open→Resolved
[09:18:18] SRE-tools, DBA, Infrastructure-Foundations, Recommendation-API, and 2 others: Switchover m2 master (db1183 -> db1159) - https://phabricator.wikimedia.org/T300329 (Marostegui) All done, thanks a lot @akosiaris @jcrespo for the support!
[09:59:47] XioNoX, volans: I've stumbled on a bit of an odd one with the DHCP relay on the new switches.
[10:00:01] I don't think it's an issue, but interested to get your opinion.
[10:00:19] Basically when a host sends a DHCP request it's being relayed correctly. Option 82 like this:
[10:00:25] lsw1-f1-eqiad:xe-0/0/6.0:private1-f1-eqiad
[10:00:40] The problem is the switches are also treating the DHCP DISCOVER as a regular L2 broadcast.
[10:00:52] So if the Vlan is configured across multiple switches they will all see it.
[10:01:05] And they all end up relaying it to the install server as a unicast packet.
[10:01:13] These DHCP packets from the "wrong" switches have their own hostname/vxlan interface in the option 82 info:
[10:01:17] lsw1-e1-eqiad:vtep.32770:private1-f1-eqiad
[10:01:46] ^^ this won't ever match a config on the install server, so I think this won't cause problems in production
[10:02:04] even if it would be better if these other packets weren't being generated.
[10:02:13] yeah I agree
[10:02:40] it's a bit like right now, where both CRs relay DHCP requests
[10:02:52] (except right now they're both valid)
[10:02:55] yeah it's not dissimilar
[10:03:04] doh, that's suboptimal for sure
[10:03:05] exactly yeah in this case the "duplicates" aren't valid.
[10:03:19] but yes the temporary config on the dhcp is generated with data from netbox
[10:03:26] volans: one thing to consider is we will have a limited number of vlans that span multiple switches
[10:03:27] so only the correct one will match
[10:03:40] the new thing will be a tiny bit of noise, I don't think we should go to great lengths to prevent it
[10:03:45] right now I've got none planned (that may change), but I was testing the config for it.
[10:03:46] ack
[10:05:02] topranks: no blocker in any way, I agree
[10:05:28] thanks. I did a brief google and couldn't find a way to prevent it, if I get time I'll double check again.
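[editor's note] The matching behaviour described above (only the relay whose Option 82 info corresponds to the host's real switch port is honoured) can be sketched roughly as follows. This is a minimal illustration, not the install server's actual logic: the `switch:interface:vlan` layout is inferred from the two example strings in the log, and the names `CircuitId`, `parse_circuit_id` and `matches_expected` are hypothetical.

```python
from typing import NamedTuple


class CircuitId(NamedTuple):
    """Option 82 info as seen in the log, assumed to be switch:interface:vlan."""
    switch: str
    interface: str
    vlan: str


def parse_circuit_id(raw: str) -> CircuitId:
    # Assumption: exactly three colon-separated fields, as in the log examples.
    parts = raw.split(":")
    if len(parts) != 3:
        raise ValueError(f"unexpected Option 82 format: {raw!r}")
    return CircuitId(*parts)


def matches_expected(raw: str, expected: CircuitId) -> bool:
    # A relayed request is accepted only if every field matches the port
    # the host is expected to be connected to.
    return parse_circuit_id(raw) == expected


# Expected port for the host being installed (value taken from the log).
expected = CircuitId("lsw1-f1-eqiad", "xe-0/0/6.0", "private1-f1-eqiad")

# The same DISCOVER relayed by the correct switch and by a "wrong" one.
relayed = [
    "lsw1-f1-eqiad:xe-0/0/6.0:private1-f1-eqiad",
    "lsw1-e1-eqiad:vtep.32770:private1-f1-eqiad",
]

accepted = [raw for raw in relayed if matches_expected(raw, expected)]
print(accepted)
```

Under these assumptions the copy relayed via the vtep interface is simply ignored, which is why the duplicates amount to noise rather than a functional problem.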
[10:05:40] But agreed I think otherwise it hopefully won't cause a problem.
[10:06:08] When the routing is in place and I've done more checks I'll try to run the reinstall cookbook for ms-fe1012
[10:06:23] Add "power down the other switches" to the re-image cookbook
[10:06:34] lol problem solved :D
[10:06:37] :)
[10:06:39] :D
[10:06:42] chaos monkey at its best
[10:07:01] volans: btw I think the reimage is probably gonna fail.
[10:07:09] so optimistic
[10:07:19] lol I can only ever be pleasantly surprised in life :)
[10:07:28] routing *back* from install server to the new racks is not in place to test fully.
[10:07:44] But I don't think they'll get a proper DHCP reply, the routing from the new racks is in place...
[10:07:48] and I see this on install1003
[10:08:01] https://www.irccloud.com/pastebin/h3S9QWeN/
[10:08:32] ah
[10:09:05] you probably need to add the new networks to modules/install_server/files/dhcpd/dhcpd.conf
[10:10:28] +1
[10:11:10] and more broadly to https://github.com/wikimedia/puppet/blob/production/modules/network/data/data.yaml#L88
[10:14:18] I wonder if we could generate dhcpd.conf from network/data.yaml
[10:21:14] taavi: I had a quick look, I don't think it could be used "as is" especially the "next-server" field would need to be documented somewhere else
[10:21:26] but I'm sure someone could be nerd-sniped into it :)
[10:25:46] and some of that data might be coming directly from netbox at some point, so not even sure it's worth spending time on that right now
[10:34:50] taavi, XioNoX: thanks for the info yeah that's what I had expected.
[10:35:01] I'll submit a patch to add the new assignments there :)
[14:07:21] SRE-tools, Infrastructure-Foundations, Observability-Alerting: Spicerack: add support for Alertmanager - https://phabricator.wikimedia.org/T293209 (fgiunchedi) >>! In T293209#7670485, @Volans wrote: > I had a chat with @jbond about this yesterday, putting the summary here for future reference for tho...
[14:08:59] volans jbond I think we should be discussing ^ in a hangout too, if you have time on wed at 15 UTC there's the o11y meeting/office hours or an ad-hoc meeting works too
[14:10:28] either works for me godog
[14:11:48] +1
[14:13:20] ack, thanks! I've added you to the invite for next wed
[14:32:46] SRE-tools, Infrastructure-Foundations, Observability-Alerting: Spicerack: add support for Alertmanager - https://phabricator.wikimedia.org/T293209 (Volans) >>! In T293209#7675048, @fgiunchedi wrote: >>>! In T293209#7670485, @Volans wrote: >> - To support also the downtime of specific services (in I...
[15:23:25] netops, Infrastructure-Foundations: Remove static routes for LVS VIPs from core routers - https://phabricator.wikimedia.org/T300877 (cmooney) p:Triage→Low
[15:48:08] netops, Infrastructure-Foundations, SRE, Traffic: Remove static routes for LVS VIPs from core routers - https://phabricator.wikimedia.org/T300877 (ayounsi)
[15:51:24] Puppet, Infrastructure-Foundations, Release-Engineering-Team, User-brennen: logspam-watch: sorting by message (column 6) appears broken - https://phabricator.wikimedia.org/T300298 (dancy) Open→Resolved a:dancy
[16:04:26] SRE-tools, Infrastructure-Foundations, serviceops: Add a kubernetes module to spicerack - https://phabricator.wikimedia.org/T300879 (Joe)
[16:05:29] SRE-tools, Infrastructure-Foundations, serviceops: Add a kubernetes module to spicerack - https://phabricator.wikimedia.org/T300879 (Joe)
[17:13:09] SRE-tools, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 2 others: Write a cookbook to set a k8s cluster in maintenance mode - https://phabricator.wikimedia.org/T277677 (JMeybohm) a:JMeybohm→None
[17:13:43] SRE-tools, Infrastructure-Foundations, serviceops: Add a kubernetes module to spicerack - https://phabricator.wikimedia.org/T300879 (JMeybohm)
[17:37:15] SRE-tools, Infrastructure-Foundations, serviceops: Add a kubernetes module to spicerack - https://phabricator.wikimedia.org/T300879 (JMeybohm) > I'm unsure if we should use one of the many kubernetes python libraries available: > > * python-kubernetes, the official library, is quite hard to package...