[08:39:44] Traffic, Data-Engineering, Data-Platform-SRE, SRE: Move varnishkafka to PKI - https://phabricator.wikimedia.org/T337825 (elukey) Open→Resolved a:elukey
[08:59:57] hey vgutierrez - would you be available to give me a hand with rolling https://gerrit.wikimedia.org/r/c/operations/puppet/+/929674 out this morning?
[09:01:24] hnowlan: morning, I think so
[09:01:31] just wondering as regards the depooling process if there's any specific hosts that would be better to pick, and also which service they're using - I tried `confctl select service=text-https get` and I got no results
[09:01:56] service=ats-be :)
[09:02:16] ahh heh
[09:03:21] any of 'service=ats-be,cluster=cache_text,dc=codfw' should do the trick
[09:04:51] looks like cp1082.eqiad.wmnet isn't pooled atm, could I just use that or is it special?
[09:05:28] ah that's cache_upload, nm
[09:09:39] randomly picked cp2037.codfw.wmnet - just waiting for one of the dev team to be on hand and I'll let you know
[09:10:12] hnowlan: ack
[09:10:35] hnowlan: BTW, do you have a sample request that's valid on the new endpoint?
[09:16:37] vgutierrez: yep: https://rest-gateway.discovery.wmnet:4113/en.wikipedia.org/v1/pdf/Tornado/a4/desktop
[09:17:31] at the edge that would be https://en.wikipedia.org/api/rest_v1/page/pdf/Tornado
[09:17:34] I think we should be good to go whenever suits
[09:18:04] fabfur: how's your cookbook going?
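The `confctl select` calls above filter conftool objects by comma-separated `key=value` tags, so `service=text-https` returned nothing because no object carries that service tag, while `service=ats-be,cluster=cache_text,dc=codfw` did match. As a rough mental model only, here is a Python sketch that treats a selector as an AND of exact tag matches over a hypothetical inventory. The inventory entries and the helper names `parse_selector` and `select` are illustrative assumptions; real confctl matching may differ (for example, it may interpret values as regular expressions).

```python
from typing import Dict, List

# Hypothetical inventory: real conftool objects live in etcd and carry more fields.
INVENTORY: List[Dict[str, str]] = [
    {"name": "cp2037.codfw.wmnet", "service": "ats-be", "cluster": "cache_text", "dc": "codfw"},
    {"name": "cp1082.eqiad.wmnet", "service": "ats-be", "cluster": "cache_upload", "dc": "eqiad"},
]


def parse_selector(selector: str) -> Dict[str, str]:
    """Split "k1=v1,k2=v2" into {"k1": "v1", "k2": "v2"} (only the first "=" splits)."""
    pairs = (part.split("=", 1) for part in selector.split(","))
    return {key.strip(): value.strip() for key, value in pairs}


def select(inventory: List[Dict[str, str]], selector: str) -> List[str]:
    """Return the names of objects whose tags match every key=value in the selector."""
    wanted = parse_selector(selector)
    return [
        obj["name"]
        for obj in inventory
        if all(obj.get(key) == value for key, value in wanted.items())
    ]
```

In this model `select(INVENTORY, "service=ats-be,cluster=cache_text,dc=codfw")` yields only cp2037, and `service=text-https` yields an empty list. Adding `name=` narrows a selector to a single host, which is why the cluster and dc keys turn out to be redundant later in the conversation.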
[09:18:36] hnowlan: give us a few minutes while we finish an upgrade in eqsin
[09:18:40] no errors at the moment
[09:19:39] about ~3 hosts done for text and 3 for upload
[09:19:40] vgutierrez: ack, no worries
[09:19:46] hnowlan: hmm that triggers a 404 here
[09:19:55] vgutierrez@cp3060:~$ curl -H 'Host: en.wikipedia.org' https://rest-gateway.discovery.wmnet:4113/en.wikipedia.org/v1/pdf/Tornado/a4/desktop
[09:19:55] {"httpCode":404,"httpReason":"Not Found"}
[09:23:13] ahh ofc the host header will still be set when the request gets rewritten 🤦
[09:23:24] envoy is very strict about that, easy to fix though
[09:23:32] hnowlan: you need to reissue the TLS certs as well
[09:24:17] hnowlan: the TLS cert needs to include the public hostnames as well
[09:24:26] ahhh
[09:25:17] https://www.irccloud.com/pastebin/mtRo6DWo/
[09:25:27] you can see there restbase cert VS rest-gateway one
[09:25:38] good point, thanks for that
[09:26:00] hnowlan: basically any SAN listed in our unified cert should be there as well
[09:26:38] no problem and sorry I missed this while checking your CR
[09:32:27] vgutierrez: think this should cover it https://phabricator.wikimedia.org/P49482 I took the alt_names from the restbase.discovery.wmnet config
[09:33:00] nice :)
[09:48:31] we already finished with the eqsin upgrade, let me know when you're ready :)
[09:54:29] cool, just rolling out the certs now
[10:09:51] vgutierrez: okay, rolled out - the request you were making earlier looks good
[10:13:14] yep
[10:13:23] but https://rest-gateway.discovery.wmnet:4113/en.wikipedia.org/v1/pdf/Tornado/a4/desktop doesn't match what you got in the CR
[10:13:49] https://rest-gateway.discovery.wmnet:4113/api/rest_v1/$2/pdf/$3
[10:14:14] will the glob in the CR not catch everything after /pdf/?
[10:14:56] hnowlan: what about /v1/ in your sample request VS /api/rest_v1/ :?
[10:18:07] vgutierrez: that's matching the pattern of the restbase URLs - the rb-mw-mangling.lua rewrites the path right?
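The certificate fix boils down to making sure every SAN on the reference cert (the unified cert, or the restbase cert whose alt_names were copied) also appears on the rest-gateway cert. A minimal Python sketch of that comparison, assuming the SAN lists are copied from the `X509v3 Subject Alternative Name` section of `openssl x509 -noout -text` output (comma-separated `DNS:` entries). The hostnames in the test data and the `ignore` parameter for service-specific names are illustrative assumptions, not the contents of the actual certs.

```python
from typing import Iterable, Set


def parse_san_line(line: str) -> Set[str]:
    """Collect DNS names from a SAN line such as "DNS:a.example, DNS:b.example".

    Non-DNS entries (for example "IP Address:...") are skipped.
    """
    names = set()
    for entry in line.split(","):
        entry = entry.strip()
        if entry.startswith("DNS:"):
            names.add(entry[len("DNS:"):])
    return names


def missing_sans(reference_line: str, candidate_line: str, ignore: Iterable[str] = ()) -> Set[str]:
    """Names present on the reference cert but absent from the candidate cert.

    `ignore` drops names that are expected to differ, such as the old
    service's own discovery hostname.
    """
    return parse_san_line(reference_line) - parse_san_line(candidate_line) - set(ignore)
```

This is a plain string comparison: it does not reason about wildcard coverage, so `*.wikipedia.org` on the candidate does not excuse a missing apex `wikipedia.org`, which matches how TLS clients treat wildcards.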
[10:20:20] hnowlan: hmm
[10:20:51] yep, you're right
[10:21:17] ideally in future we'll support that natively in the rest-gateway and remove our reliance upon that
[10:33:47] hnowlan: so.. let's disable puppet on A:cp-text and depool cp2037?
[10:34:16] sounds good!
[10:34:43] `sudo confctl select service=ats-be,cluster=cache_text,dc=codfw,name=cp2037.codfw.wmnet set/pooled=no` looks ok?
[10:35:28] you don't need cluster or dc keys
[10:35:35] name= is already more specific than that
[10:35:46] ack
[10:36:55] hmmm
[10:37:00] should I just use name and not service? I see it's also pooled for cdn
[10:37:18] hnowlan: I've mentioned cluster and dc, not service :)
[10:37:26] service+name makes sense
[10:37:37] BTW... I see that rest-gateway doesn't send an ETag header
[10:37:45] is that a feature or a bug? :)
[10:38:21] even though it sends access-control-expose-headers: etag
[10:40:10] vgutierrez: it's both, heh. restbase sends that header and it was requested that we match restbase behaviour as closely as is possible
[10:40:22] but restbase also doesn't set ETags
[10:40:49] ook
[10:41:11] seems like a good candidate to remove in future though
[10:42:05] I've depooled cp2037, going to disable puppet
[10:46:13] looking good :
[10:46:16] :)
[10:46:43] gonna merge and enable puppet on cp2037 - do I need to do a reload or will puppet handle that?
[10:47:07] puppet handles everything
[10:49:03] hnowlan: weird.. I saw the puppet run being triggered before the +2 message on -operations
[10:49:19] and indeed.. it isn't applying your commit :)
[10:49:28] Jun 27 10:49:00 cp2037 puppet-agent[2209342]: Applying configuration version '(ae599e626a) Alexandros Kosiaris - url_downloader: Remove the esams entries marked TODO'
[10:49:42] vgutierrez: yep, that was me, just a habit.
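The premature puppet run was spotted from the `Applying configuration version` line in the agent log, which showed the previous head rather than the newly merged change. A small Python sketch of that check, assuming the version string always starts with a parenthesised abbreviated commit hash, as in the quoted line. `applied_commit` and `is_expected_commit` are hypothetical helpers, and the expected hash would have to come from the merged Gerrit change.

```python
import re
from typing import Optional

# Matches e.g. "Applying configuration version '(ae599e626a) Alexandros Kosiaris - ...'"
VERSION_RE = re.compile(r"Applying configuration version '\(([0-9a-f]+)\)")


def applied_commit(log_line: str) -> Optional[str]:
    """Return the abbreviated commit hash from a puppet-agent log line, if present."""
    match = VERSION_RE.search(log_line)
    return match.group(1) if match else None


def is_expected_commit(log_line: str, expected_sha: str) -> bool:
    """True if the applied commit agrees with expected_sha on their common prefix."""
    commit = applied_commit(log_line)
    if commit is None or not expected_sha:
        return False
    # Either hash may be abbreviated, so compare only the overlapping prefix.
    length = min(len(commit), len(expected_sha))
    return commit[:length] == expected_sha[:length].lower()
```

Comparing on the common prefix lets a full 40-character SHA from Gerrit match the abbreviated hash puppet logs.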
[10:50:12] going for real now
[10:51:16] nice
[10:51:20] ATS reloaded as expected
[10:51:44] you can target ATS on 127.0.0.1:3128
[10:52:02] just set Host and X-Forwarded-Proto: https headers
[10:53:35] looks okay.. I think! testing a bit more
[10:54:19] hmmm
[10:54:22] CacheResultCode:ERR_CLIENT_READ_ERROR CacheWriteResult:- ReqMethod:GET RespStatus:200 OriginStatus:000
[10:54:33] can you share the curl output?
[10:54:51] on cp2037: curl -H "Host: en.wikipedia.org" -H "X-Forwarded-Proto: https" 127.0.0.1:3128/api/rest_v1/page/pdf/Tornado
[10:55:00] just returns binary output as expected
[10:56:00] hnowlan: server: restbase2016 :)
[10:56:19] you're getting a cache hit on ATS for that URL
[10:56:26] ahh.
[10:57:18] if you append something like ?hnowlan=1 to the URL you'll see that it's still hitting the old service
[10:58:22] ahh, sigh
[11:03:30] so the match isn't working at all I guess
[11:03:55] * vgutierrez double checking some stuff
[11:07:42] hnowlan: yep, that seems to be the case
[11:08:45] I'll revert for now
[11:09:31] https://gerrit.wikimedia.org/r/c/operations/puppet/+/933408
[11:12:58] hnowlan: hmmm yeah, it looks like regex_map only supports regex on the Host header
[11:13:11] per https://docs.trafficserver.apache.org/en/9.1.x/admin-guide/files/remap.config.en.html?highlight=regex_map#regular-expression-regex-remap-support
[11:13:27] Only the host field can contain a regex; the scheme, port, and other fields cannot. For path manipulation via regexes, use the Regex Remap Plugin.
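Because `regex_map` only regex-matches the host, the `/api/rest_v1/page/pdf/...` path has to be matched somewhere else, and the matching logic itself is small enough to unit-test in isolation. A minimal Python sketch of it, assuming the edge-to-gateway mapping from the sample requests earlier (`/api/rest_v1/page/pdf/Tornado` on `en.wikipedia.org` becomes `/en.wikipedia.org/v1/pdf/Tornado/a4/desktop`), including the `a4/desktop` defaults. It collapses the remap and the RESTBase-style path mangling into one step; `rewrite_pdf_request` is a hypothetical name, and this is not the Lua code that would actually run inside ATS.

```python
import re
from typing import Optional

GATEWAY = "https://rest-gateway.discovery.wmnet:4113"

# Only the bare "/page/pdf/<title>" form is handled in this sketch; titles
# containing a literal "/" or explicit format/type segments are not matched.
PDF_PATH = re.compile(r"^/api/rest_v1/page/pdf/([^/]+)$")


def rewrite_pdf_request(host: str, path: str) -> Optional[str]:
    """Map an edge PDF request to its rest-gateway URL, or return None to leave it alone.

    `host` is the client's Host header; `path` is expected without the query string.
    """
    match = PDF_PATH.match(path)
    if match is None:
        return None  # not a PDF request: fall through to the existing remap rules
    title = match.group(1)
    # The public hostname becomes the first path segment on the gateway side.
    return f"{GATEWAY}/{host}/v1/pdf/{title}/a4/desktop"
```

Returning `None` for anything that does not match keeps every other `/api/rest_v1/` route on its existing backend, which suits migrating endpoints one at a time.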
[11:14:11] and that's https://docs.trafficserver.apache.org/en/9.1.x/admin-guide/plugins/regex_remap.en.html#regex-remap-plugin
[11:14:43] hnowlan: you could also perform the redirection in lua
[11:15:14] probably safer cause you can provide a unit test for that use case
[11:16:02] ahhhh, cool
[11:16:08] yeah that'd be ideal
[11:18:30] I'll give that a go so
[11:18:35] reverted on cp2037
[11:18:42] hnowlan: you could use https://gerrit.wikimedia.org/r/c/operations/puppet/+/900704 as an example
[11:19:02] ping me if you need help to come up with a PoC
[11:19:36] vgutierrez: oooh yeah, that would make it easier to migrate other services gradually too once there's a working version in place
[11:20:21] I've repooled cp2037 and I'll reenable puppet now. Thanks for all the help!
[11:21:31] they do the matching based on the Host header
[11:21:34] you need to use https://docs.trafficserver.apache.org/en/9.1.x/admin-guide/plugins/lua.en.html#ts-client-request-get-uri
[14:11:48] Traffic, MW-on-K8s, SRE, serviceops, and 3 others: Migrate group0 to Kubernetes - https://phabricator.wikimedia.org/T337490 (Clement_Goubert)
[14:14:24] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Migrate group1 to Kubernetes - https://phabricator.wikimedia.org/T340549 (Clement_Goubert)
[14:14:50] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Migrate group1 to Kubernetes - https://phabricator.wikimedia.org/T340549 (Clement_Goubert) p:Triage→Medium
[15:08:44] Traffic, Patch-For-Review: Write a cookbook to handle upgrades of ATS - https://phabricator.wikimedia.org/T335531 (BCornwall) Open→Resolved
[19:13:20] netops, Data-Engineering, Infrastructure-Foundations, Product-Analytics, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (leila)
[19:13:29] netops, Data-Engineering, Infrastructure-Foundations, Product-Analytics, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (leila) I'm going to remove this task from the Backlog lane of the #Research board given that there is no task for Resea...