[05:45:45] netops, Infrastructure-Foundations, SRE, SRE-tools: DHCP traffic to install server is missing - https://phabricator.wikimedia.org/T337345 (ayounsi) I worked around the issue by disabling "dhcp-relay" on cr2-eqiad `install1004:~$ sudo tcpdump -i ens13 "host 10.65.0.1"` is the easiest way to dete...
[07:45:37] Traffic, envoy, serviceops, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (JMeybohm) I've added a v1.26 branch to the envoyproxy repo with the upstream code removed and packaging the upstream binary instead: https://gerrit.wikimedia.org/r/plugins...
[08:54:20] netops, Infrastructure-Foundations, SRE, SRE-tools: DHCP traffic to install server is missing - https://phabricator.wikimedia.org/T337345 (cmooney) >>! In T337345#8875421, @ayounsi wrote: > So it's either a Junos bug or the need for another nerd knob. > Edit: [[ https://www.juniper.net/documenta...
[09:23:15] Traffic, envoy, serviceops, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (akosiaris) > As with kubernetes and istio I choose a branch per minor version instead of "envoy-future" to make it more clear and to allow for easier upgrades of older vers...
[09:35:22] thought this might be interesting to you folks https://dropbox.tech/frontend/investigating-the-impact-of-http3-on-network-latency-for-search
[09:47:18] netops, Infrastructure-Foundations, SRE, SRE-tools: DHCP traffic to install server is missing - https://phabricator.wikimedia.org/T337345 (cmooney) The Juniper [[ https://www.juniper.net/documentation/us/en/software/junos/dhcp/topics/topic-map/dhcp-relay-agent-security-devices.html | docs ]] do s...
[10:14:34] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Create Quality of Service design for WMF internal networks - https://phabricator.wikimedia.org/T316358 (jbond) >>! In T316358#8469318, @cmooney wrote: > @jbond I've uploaded a separate patch (above) that makes a stab at working this clos...
[10:22:41] netops, Infrastructure-Foundations, SRE, SRE-tools: DHCP traffic to install server is missing - https://phabricator.wikimedia.org/T337345 (cmooney) >>! In T337345#8874938, @Volans wrote: > I wonder if this has something to do with https://gerrit.wikimedia.org/r/c/operations/homer/public/+/908346...
[11:46:35] Hi all!
[11:46:37] Can anyone tell me if Varnish handles If-None-Match headers? And if it does, will it still loop that header through to the backend in case of a cache miss?
[12:47:15] Traffic, envoy, serviceops, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (JMeybohm) >>! In T300324#8875975, @akosiaris wrote: > * Is there a specific reason that debian/source/format says 1.0 instead of 3.0 (quilt) ? > * debian/changelog should...
[13:37:04] Traffic, DC-Ops, SRE, ops-eqiad: Q4:rack/setup/install dns100[345] - https://phabricator.wikimedia.org/T326685 (ssingh) @Jclark-ctr: Hi John, Traffic has completed its work on the dns hosts in codfw, so whenever you are ready to work on this, please go ahead. All we need from you is to finish the...
[13:51:06] duesen: so... INM support on varnish doesn't seem to match https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-None-Match
[13:52:13] vgutierrez@carrot:~/wikimedia.org/operations/puppet/modules/varnish$ curl -H 'If-None-Match: "90902610f3ab1a71752935cf3986abe4"' "https://upload.wikimedia.org/wikipedia/commons/thumb/7/73/Blue_pencil.svg/15px-Blue_pencil.svg.png" -v -o /dev/null 2>&1 |egrep "200|304"
[13:52:13] < HTTP/2 200
[13:52:13] vgutierrez@carrot:~/wikimedia.org/operations/puppet/modules/varnish$ curl -H 'If-None-Match: 90902610f3ab1a71752935cf3986abe4' "https://upload.wikimedia.org/wikipedia/commons/thumb/7/73/Blue_pencil.svg/15px-Blue_pencil.svg.png" -v -o /dev/null 2>&1 |egrep "200|304"
[13:52:13] < HTTP/2 304
[13:52:36] Varnish expects the Etag not to be enclosed by double quotes
[13:53:06] and it doesn't loop through values
[13:54:04] we are not currently performing any custom transformation of If-None-Match so this seems to be varnish 6.0.x stock behaviour
[14:02:46] Wait... "Varnish expects the Etag not to be enclosed by double quotes"?
[14:02:57] But the http spec *requires* it to be in double quotes. these are not optional.
[14:03:20] that's what it seems from a simple check
[14:03:25] we're emitting etags that aren't enclosed in double quotes
[14:03:37] < HTTP/2 200
[14:03:39] < date: Tue, 23 May 2023 21:38:39 GMT
[14:03:41] < etag: 90902610f3ab1a71752935cf3986abe4
[14:03:43] < server: ATS/9.1.4
[14:03:57] so it might be that's just what Varnish does when the client behind it is already breaking the spec
[14:04:29] I'm guessing ATS got that etag from swift
[14:04:36] haven't dug further into the stack
[14:04:42] cdanis: I wonder if the U-A is going to enclose the Etag with double quotes if those are missing
[14:04:53] vgutierrez: really good question :)
[14:06:12] * duesen diesthe U-A should not touch the ETag. It's opaque. And naively enclosing it would mess with any W/ prefix marker
[14:06:21] oops, sorry.
[14:06:24] https://www.rfc-editor.org/rfc/rfc9110.html#name-etag
[14:08:07] opaque-tag = DQUOTE *etagc DQUOTE
[14:08:22] it looks like something is breaking the RFC :)
[14:08:24] ah
[14:08:26] https://phabricator.wikimedia.org/T256217
[14:08:30] T256217
[14:08:34] T256217: Swift sends ETAG without double-quotes - https://phabricator.wikimedia.org/T256217
[14:09:20] https://phabricator.wikimedia.org/T256217#8069422
[14:09:24] so.. config issue on swift side
[14:11:48] heh and there's also T295556
[14:11:49] T295556: Image requests sending neither "Last-Modified" nor "ETag" HTTP headers. - https://phabricator.wikimedia.org/T295556
[14:12:05] > What appears to happen is that when a thumbnail is first generated on a Varnish/ATS/Swift miss, e.g. after upload, re-upload, or purge, the request goes to Thumbor which responds without an E-Tag header. However, this is not self-correcting on the next request, because the Thumbor response is CDN-cacheable. So even on repeats, it remains without E-Tag
[14:13:59] *sigh*
[14:14:34] Regular page content doesn't come with an ETag either
[14:16:42] And varnish has the habit of "weakening" ETags, which isn't standard compliant either.
[14:18:02] As far as I can tell, it seems to work with resource loader, e.g. https://en.wikipedia.org/w/load.php?lang=en&modules=site.styles&only=styles&skin=vector-2022
[14:18:12] The response contains etag: W/"9p1df"
[14:18:37] And the browser correctly sends If-None-Match: W/"9p1df" on the next request.
[14:20:45] And the response comes back with status 304 and x-cache-status: hit-front
[14:21:11] So I guess Varnish does handle If-None-Match, if a valid ETag is given.
[14:21:21] Seems like it would be nice to get that working for thumbnails..
[14:21:43] Anyway. I was asking for use with REST APIs. And now I know that it should work. Thanks :)
[14:28:04] duesen: yes, that's a response generated by varnish
[14:28:32] so varnish doesn't enforce the DQUOTE part of the opaque-tag but that should be irrelevant IMHO
[14:29:36] It doesn't? My test was using an ETag that has correct quotes. It's generated by resourceloader. I don't know if it also works with an unquoted etag.
[14:30:51] duesen: see the curl output I've pasted before
[14:31:05] against upload.wm.o
[14:31:27] first one with DQUOTEs returns a 200
[14:31:32] second one without DQUOTEs returns a 304
[14:31:41] ah right.
[14:31:46] cause swift doesn't send the Etag enclosed with DQUOTEs
[14:31:48] so yea, it's treated as opaque.
[15:00:33] sukhe: im just going to have a quick play with your anycast_hc cr to see if i can get pcc to work
[15:00:57] jbond: thanks, all yours :P
[15:01:42] we can also just try this: disable Puppet on P:bird::anycast, merge on one of the doh hosts, and then also quickly reimage to see if the initial puppetization goes well
[15:01:57] (which is where we care about the puppet-level dependency, and not the systemd one)
[15:02:22] happy to do that if you think that's just better
[15:02:39] no i think there is still a bug so no point merging yet
[15:02:47] ok thanks
[15:03:12] fwiw as I noted, I did try with the .service bit removed. but feel free to try/confirm
[15:03:43] ack
[15:04:57] Traffic, SRE, serviceops, Platform Team Initiatives (API Gateway): Handle edge cache invalidation for the api gateway - https://phabricator.wikimedia.org/T324200 (kamila)
[15:05:33] Traffic, SRE, serviceops, Patch-For-Review, Platform Team Initiatives (API Gateway): Create Benthos docker image - https://phabricator.wikimedia.org/T336658 (kamila) In progress→Resolved Image built and published.
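Editor's note on the If-None-Match exchange above (13:51 to 14:31): the two curl results are what you would expect if the cache compares the client's validator against the stored ETag as an opaque string. The sketch below models the weak comparison described in RFC 9110; it is not Varnish's code, and the function names `strip_weak` and `if_none_match_hits` are made up for illustration. Unlike the Varnish 6.0.x behaviour reported in the log, it does walk a comma-separated list of values.

```python
def strip_weak(tag: str) -> str:
    """Drop a leading W/ weakness marker; the opaque-tag itself is left untouched."""
    return tag[2:] if tag.startswith("W/") else tag


def if_none_match_hits(header: str, stored_etag: str) -> bool:
    """Return True when a 304 would be appropriate under weak comparison.

    Hypothetical helper: quotes are part of the opaque-tag, so they are
    compared byte for byte and never added or removed.
    """
    if header.strip() == "*":
        return True
    stored = strip_weak(stored_etag.strip())
    # Simplification: a naive comma split, which would mis-handle an
    # ETag that contains a comma inside its quotes.
    candidates = [c.strip() for c in header.split(",")]
    return any(strip_weak(c) == stored for c in candidates if c)


# Stored ETag as emitted without quotes in the log's response headers.
unquoted = "90902610f3ab1a71752935cf3986abe4"
print(if_none_match_hits(unquoted, unquoted))          # unquoted validator
print(if_none_match_hits(f'"{unquoted}"', unquoted))   # quoted validator
print(if_none_match_hits('W/"9p1df"', 'W/"9p1df"'))    # load.php case
```

Under these assumptions the unquoted validator matches the unquoted stored ETag and the quoted one does not, which lines up with the 304 and 200 seen against upload.wikimedia.org.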
[15:17:00] Traffic, envoy, serviceops: Refactor envoy.filters.http.router and envoy.filters.listener.tls_inspector - https://phabricator.wikimedia.org/T337405 (JMeybohm)
[15:18:51] sukhe: https://gerrit.wikimedia.org/r/c/operations/puppet/+/922514/8..12
[15:18:55] https://puppet-compiler.wmflabs.org/output/922514/41321/
[15:20:30] jbond: interesting!
[15:20:42] Systemd::Service::Name
[15:20:45] would have never guessed :>
[15:20:56] :)
[15:21:20] do we need to require on service_params?
[15:21:26] or the service itself?
[15:21:51] service_params will mean the require ends up going on to the Service[] define
[15:21:54] in theory we are only concerned with the first puppetization right? because systemd will take care of the other deps once the service is actually running
[15:22:27] yes
[15:23:05] and puppet should be clever enough to make sure the daemon-reload is only done once so the ordering there wont matter too much
[15:23:43] tbh we could probably drop the require altogether but im not 100% confident on that
[15:23:56] Traffic, envoy, serviceops, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (JMeybohm)
[15:24:38] jbond: yeah :)
[15:24:47] I think probably best that it's there vs it's not there, that's how I view it
[15:25:47] yes in this instance i think its useful
[15:25:54] if only to document expectations
[15:26:42] yeah!
[15:27:01] thanks for fixing the above. I was quite close to just merging and trying it out but that wouldn't have been so nice :)
[15:28:52] np
[15:29:12] going to roll it out slowly, just in case
[15:29:43] ack sgtm
[16:12:38] netops, Infrastructure-Foundations, SRE, SRE-tools: DHCP traffic to install server is missing - https://phabricator.wikimedia.org/T337345 (ayounsi) Copying the commit message as it has the RFO and fix details: The modern DHCP implementation on Juniper devices forwards ALL DHCP packets to the co...
[18:22:15] netops, Infrastructure-Foundations, SRE, SRE-tools: Add network devices fingerprints to known_hosts - https://phabricator.wikimedia.org/T327643 (Volans) With T336485 almost completed, we could consider integrating the two things, getting this one off exported in some place and then have the `sre....
[19:38:24] netops, Infrastructure-Foundations, SRE, SRE-tools: Add network devices fingerprints to known_hosts - https://phabricator.wikimedia.org/T327643 (ayounsi) My initial guess was to add them to https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/hieradata/common.y...
[19:50:20] netops, Infrastructure-Foundations, SRE, SRE-tools: DHCP traffic to install server is missing - https://phabricator.wikimedia.org/T337345 (Jclark-ctr) @ayounsi the provisioning script is still failing in row e/f. dbproxy1026 dbproxy1027