[07:58:14] hnowlan: smelling fresh coffee here.. ready whenever you are
[09:07:17] vgutierrez: great - I'll be ready in like 10 minutes
[09:07:22] nice
[09:27:24] vgutierrez: think I'm good to go if you are
[09:29:09] hnowlan: sure
[09:29:23] curl -v -H 'Host: en.wikipedia.org' -H 'X-Forwarding-Proto: https' http://127.0.0.1:3128/api/rest_v1/feed/onthisday/all/02/06 -o /dev/null
[09:29:23] looks like a nice testcase, right?
[09:30:04] yep, looks good!
[09:30:21] cool, the usual cp2037 dance?
[09:30:33] yeah cool :)
[09:30:40] CR +1ed
[09:31:07] thanks! disabling puppet
[09:31:41] depooled cp2037
[09:35:13] hnowlan: hmm new endpoint doesn't set cache-control at all?
[09:35:58] old one sends: < cache-control: s-maxage=300, max-age=60
[09:37:39] hmph, that was missed.
[09:40:40] we'll need to roll back until that's in place I guess
[09:44:27] ack
[09:45:12] revert here https://gerrit.wikimedia.org/r/c/operations/puppet/+/946656
[09:45:35] but while we're in this state - if it's not too cheeky, once we have the revert in place I have another route I was going to ask about :)
[09:46:12] https://gerrit.wikimedia.org/r/c/operations/puppet/+/946928 this is a new service endpoint so it's not connected to anything, but we'd like to expose it via the gateway
[09:46:22] if you'd rather not, then no problem
[09:49:37] "not connected to anything"?
[09:49:47] meaning it isn't replacing anything?
[09:50:07] yeah, sorry - not consumed by anything external
[09:51:17] hnowlan: do you have a sample request handy?
[09:55:45] vgutierrez: curl -v -H 'Host: wikimedia.org' -H 'X-Forwarding-Proto: https' http://127.0.0.1:3128/api/rest_v1/metrics/knowledge-gap/per-category/en.wikipedia/gender/female/20210101/20231201
[09:56:13] so /en.wikipedia.org/v1/metrics/knowledge-gap/per-category/en.wikipedia/gender/female/20210101/20231201 on the origin?
[09:56:29] hmm wikimedia.org sorry
[09:58:59] curl -H 'Host: wikimedia.org' -H 'X-Forwarded-Proto: https' https://rest-gateway.discovery.wmnet:4113/wikimedia.org/v1/metrics/knowledge-gap/per-category/en.wikipedia/gender/female/20210101/20231201 -v
[09:59:06] that's currently triggering a 404
[10:02:44] sigh...
[10:03:09] incredibly poor timing for my bouncer to drop :\
[10:03:12] apologies vgutierrez
[10:03:21] no problem
[10:04:24] dunno where I dropped off - the service can be queried on the gateway via `curl https://rest-gateway.discovery.wmnet:4113/analytics.wikimedia.org/v1/metrics/knowledge-gap/per-category/en.wikipedia/gender/female/20210101/20231201`
[10:04:51] so host is analytics.wikimedia.org rather than wikimedia.org
[10:05:50] the gateway will rewrite the hostname for all domains when the request arrives at the service so it doesn't really matter
[10:06:22] curl https://rest-gateway.discovery.wmnet:4113/analytics.wikimedia.org/v1/metrics/knowledge-gap/per-category/en.wikipedia/gender/female/20210101/20231201 currently triggers a 404 here
[10:06:36] here == cp2037
[10:08:15] aha. it appears they didn't deploy the service to codfw? heh
[10:08:30] okay, let's revert the initial and just come back to this another time
[10:08:42] 👍
[10:10:24] hnowlan_: eqiad doesn't look great either
[10:11:06] https://www.irccloud.com/pastebin/n1I987Oh/
[10:13:15] hnowlan_: quick check on a 200 response shows cache-control but no etag or last-modified header, could we get any of those?
[10:13:50] yep, I'll try to get those added
[10:14:09] it looks like the endpoint works sporadically, some kind of deployment issue maybe. I'll follow up with the devs
[10:14:12] cheers
[10:14:39] revert merged, puppet re-enabled
[10:14:48] repooling cp2037
[10:14:55] thx
[10:16:00] thanks for the help!
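The exchange between 09:55 and 10:06 implies how paths line up between the edge and the rest-gateway: a public request for `/api/rest_v1/<rest>` with a given `Host` header is reached on the gateway at `/<host>/v1/<rest>` (and, per 10:05:50, the gateway rewrites the hostname before the request reaches the service anyway). The function below is a minimal sketch of that mapping, handy for turning an edge test case into a direct gateway request. The layout is an assumption inferred from this log, not read from the gateway configuration, and `gateway_path` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a public REST API path plus Host header to the
# path form used on rest-gateway. ASSUMED layout (inferred from the log):
#   /api/rest_v1/<rest>  with Host <host>  ->  /<host>/v1/<rest>
# Prints the gateway path; returns 1 for paths outside /api/rest_v1/.
gateway_path() {
  local host=$1 path=$2
  case $path in
    /api/rest_v1/*)
      # Strip the public prefix and re-root the remainder under /<host>/v1/.
      printf '/%s/v1/%s\n' "$host" "${path#/api/rest_v1/}"
      ;;
    *)
      return 1
      ;;
  esac
}
```

For example, `curl -v "https://rest-gateway.discovery.wmnet:4113$(gateway_path analytics.wikimedia.org /api/rest_v1/metrics/knowledge-gap/per-category/en.wikipedia/gender/female/20210101/20231201)"` reproduces the gateway query from 10:04:24.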
[13:34:49] netops, Infrastructure-Foundations, SRE, SRE-tools, Patch-For-Review: Improve Homer output when Juniper device rejects config - https://phabricator.wikimedia.org/T328747 (ayounsi) a:ayounsi
[13:42:21] netops, Infrastructure-Foundations, SRE, netbox: Netbox Juniper report - https://phabricator.wikimedia.org/T306238 (ayounsi) I sent a new email to Juniper yesterday to ask again about the best next steps here.
[13:44:45] netops, Infrastructure-Foundations, SRE, Traffic-Icebox: Anycast: consistent routers->servers routing - https://phabricator.wikimedia.org/T253666 (ayounsi) Resolved→Declined
[13:44:53] netops, Infrastructure-Foundations, SRE, Traffic-Icebox, Patch-For-Review: Anycast AuthDNS - https://phabricator.wikimedia.org/T98006 (ayounsi)
[13:45:01] netops, Infrastructure-Foundations, SRE, Traffic-Icebox, Patch-For-Review: Anycast AuthDNS - https://phabricator.wikimedia.org/T98006 (ayounsi)
[13:45:15] netops, Infrastructure-Foundations, SRE, Traffic-Icebox: Anycast: consistent routers->servers routing - https://phabricator.wikimedia.org/T253666 (ayounsi) Stalled→Resolved a:ayounsi Boldly closing this as Katran will solve some if not all those limitations.
[13:48:53] netops, Infrastructure-Foundations, SRE, Traffic-Icebox, User-jbond: Anycast: consistent ICMP packet too big routing - https://phabricator.wikimedia.org/T253732 (ayounsi) @Vgutierrez do you know how the future L4LB will handle ICMP PTB packets? Can it route it to the proper source host?
[14:17:20] netops, Infrastructure-Foundations, SRE, Traffic-Icebox, User-jbond: Anycast: consistent ICMP packet too big routing - https://phabricator.wikimedia.org/T253732 (Vgutierrez) >>! In T253732#9080504, @ayounsi wrote: > @Vgutierrez do you know how the future L4LB will handle ICMP PTB packets? Can...
[14:17:59] XioNoX: ^^ let me know if you need more details or that's enough
[14:35:50] vgutierrez: noted thanks! I guess we don't know what their timeline is?
[14:36:02] timeline for...?
[14:36:38] code is there and ICMP PTB packets will reach the proper real server
[14:37:15] what seems to be a WIP is the lower limit in MTU
[14:37:24] ah right
[14:37:36] I thought they were not going to the proper one yet
[14:37:39] cool
[14:37:54] as long as the real server is still there :)
[14:39:41] of course
[14:42:34] netops, Infrastructure-Foundations, SRE, Traffic-Icebox, User-jbond: Anycast: consistent ICMP packet too big routing - https://phabricator.wikimedia.org/T253732 (ayounsi) Open→Declined Thanks, then like {T253666} I'm going to boldly close this task.
[14:42:50] netops, Infrastructure-Foundations, SRE, Traffic-Icebox, Patch-For-Review: Anycast AuthDNS - https://phabricator.wikimedia.org/T98006 (ayounsi)
[14:44:04] netops, Infrastructure-Foundations, SRE: Detect IP address collisions - https://phabricator.wikimedia.org/T189522 (ayounsi) Open→Resolved a:ayounsi We have a working solution for the mgmt network (until it's time to split mgmt into smaller subnets). And for production, automation and per...
[15:03:40] vgutierrez: would you have time for another attempt at wikifeeds? cache-control headers have been added everywhere they were missing
[15:04:40] hnowlan: vg is out for the weekend now
[15:05:43] ah, okay
[15:07:51] would anyone else like to help with an ATS config change? We'd like to route traffic to wikifeeds via the rest-gateway (already tried via https://gerrit.wikimedia.org/r/945558 - the routing worked but the service was missing cache-control for some endpoints, which has since been fixed)
[15:08:14] hnowlan: since it has his +1, I am happy
[15:08:15] to
[15:09:25] what kind of help are you looking for here? simply merging this and being around or something else?
[15:10:49] sukhe: yep, pretty much! Just a +1 on the re-review and being around in case anything goes wrong :)
[15:10:57] (it won't, I swear!)
[15:11:10] hnowlan: you have his +1, merge away :)
[15:11:16] and happy to help if things go south
[15:21:55] sukhe: I actually had to revert earlier so could you review this please? <3 https://gerrit.wikimedia.org/r/c/operations/puppet/+/947372
[15:22:06] looking
[15:22:27] cool, matches the revert exactly
[15:22:34] as in the previously reviewed change
[15:26:48] thanks!
[15:26:59] hnowlan: merge away!
[15:27:39] gonna disable puppet on A:cp beforehand just to be careful, will depool cp2037 and try it out there
[15:27:55] thanks
[15:35:16] alright, looks okay to me. See the cache-control headers as expected. I think I'm comfortable reenabling puppet on the rest of the hosts
[15:37:44] hnowlan: looks ok!
[15:39:32] sweet, let's go
[15:40:13] puppet enabled, cp2037 repooled
[15:41:15] thanks!
[15:42:12] thank you!
[16:11:04] netops, Infrastructure-Foundations, SRE: cr2-esams:FPC0 Parity error - https://phabricator.wikimedia.org/T318783 (ayounsi) Opened high priority case 2023-0809-747283 asking for a RMA.
[17:14:52] okay, sadly we'll need to roll back again - it seems restbase was shielding us from some misconfigured clients that are making wikifeeds 5xx https://gerrit.wikimedia.org/r/c/operations/puppet/+/946665
[17:29:25] ah ok, thanks hnowlan!
[19:33:29] netops, Infrastructure-Foundations, SRE: cr2-esams:FPC0 Parity error - https://phabricator.wikimedia.org/T318783 (cmooney) RMA in progress, Juniper happy with address for replacement and staff at destination are aware of delivery. I will decom the existing faulty card on Sunday when on site and prep...
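The header checks done by hand in this log (no `cache-control` on the new endpoint at 09:35:58, no `etag` or `last-modified` at 10:13:15, and the re-check at 15:35:16) can be scripted. The function below is a minimal sketch, not an existing tool: `check_cache_headers` is a hypothetical name, and it only tests whether each header is present, not whether its value matches what the old service sent (e.g. `s-maxage=300, max-age=60`). It reads a response header block on stdin and returns non-zero when `Cache-Control` is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which caching-related headers appear in an HTTP
# response header block read on stdin, e.g. from `curl -s -D - -o /dev/null URL`.
# Only presence is checked, not header values.
# Returns 0 if Cache-Control is present, 1 otherwise.
check_cache_headers() {
  local headers name status=0
  # Drop the CR of CRLF line endings and lowercase everything, since HTTP
  # header names are case-insensitive.
  headers=$(tr -d '\r' | tr '[:upper:]' '[:lower:]')
  for name in cache-control etag last-modified; do
    if printf '%s\n' "$headers" | grep -q "^${name}:"; then
      printf '%s: present\n' "$name"
    else
      printf '%s: MISSING\n' "$name"
      # A missing Cache-Control is what blocked the rollout, so fail on it.
      if [ "$name" = cache-control ]; then
        status=1
      fi
    fi
  done
  return "$status"
}
```

For example, `curl -s -D - -o /dev/null https://rest-gateway.discovery.wmnet:4113/analytics.wikimedia.org/v1/metrics/knowledge-gap/per-category/en.wikipedia/gender/female/20210101/20231201 | check_cache_headers` dumps only the response headers (`-D -` writes them to stdout, `-o /dev/null` discards the body) and reports on them. Lowercasing before matching covers both HTTP/2, where header names are always lowercase, and HTTP/1.1 responses that send `Cache-Control` or `ETag`.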
[20:13:55] Traffic, SRE: Offer AuthDNS service over IPv6 - https://phabricator.wikimedia.org/T81605 (ssingh)
[20:15:54] Traffic, SRE: Offer AuthDNS service over IPv6 - https://phabricator.wikimedia.org/T81605 (ssingh) In discussion with @cmooney, we will be revisiting this task again when Traffic does some other authdns-related work, so removing it from the Traffic-Icebox.