[07:24:42] (SystemdUnitFailed) firing: nginx.service Failed on ncredir2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:29:42] (SystemdUnitFailed) resolved: nginx.service Failed on ncredir2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:11:21] ^^ related to ganeti ongoing work in codfw apparently
[08:14:42] yeah, that should have been a one time thing, I'm using a new cookbook, it ran into a corner case which has been fixed now
[08:22:11] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (aborrero)
[08:30:30] Traffic, SRE, envoy, serviceops, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (JMeybohm)
[10:13:55] Traffic, MW-on-K8s, SRE, serviceops, and 3 others: Migrate group0 to Kubernetes - https://phabricator.wikimedia.org/T337490 (Clement_Goubert)
[10:20:40] vgutierrez: would it be alright if I gave the pdf routing another try? rest-gateway is behaving properly now (fingers crossed)
[10:20:55] hnowlan: fabfur has some update ongoing
[10:21:01] dunno if he finished already
[10:21:57] ack
[10:22:20] nope, but working only in text@eqiad
[10:22:45] yeah we need to disable puppet on the whole text cluster to proceed
[10:22:51] so that's gonna impact you
[10:23:31] seems that cp1079 didn't run fine, investigating why
[10:23:53] puppet is enabled because I can see from the logs that the catalog has been applied 2m ago
[10:24:31] fabfur: so what's the cookbook saying?
[10:24:57] https://www.irccloud.com/pastebin/95fzKghn/
[10:25:22] retrying
[10:25:45] now has been successful but I want to check that all is ok on this host
[10:26:16] err
[10:38:27] fabfur: cookbook finished, right?
[10:42:52] :?
[10:43:57] si
[10:44:00] (yes)
[10:44:03] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Clement_Goubert)
[10:44:26] hnowlan: feel free to proceed
[10:44:40] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Clement_Goubert)
[10:44:43] ok for me
[10:44:52] Traffic, MW-on-K8s, SRE, serviceops, and 3 others: Migrate group0 to Kubernetes - https://phabricator.wikimedia.org/T337490 (Clement_Goubert) In progress→Resolved
[10:45:31] I'll wait for drmrs
[10:45:48] vgutierrez: cool, thanks!
[10:46:35] Traffic, MW-on-K8s, SRE, serviceops, and 3 others: Migrate group1 to Kubernetes - https://phabricator.wikimedia.org/T340549 (Clement_Goubert)
[10:46:47] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Clement_Goubert)
[10:46:59] Traffic, MW-on-K8s, SRE, serviceops, and 3 others: Migrate group1 to Kubernetes - https://phabricator.wikimedia.org/T340549 (Clement_Goubert) Open→In progress
[11:00:24] hnowlan: starting any time soon? [need to plan my lunch break]
[11:00:35] vgutierrez: yep! Sorry, underway atm.
[11:00:44] yeah.. I saw your +2
[11:02:01] still getting 404s? :(
[11:05:07] yep, yet more restbase-isms. Just found out that Restbase is silently adding a page format to requests that don't have one - the gateway should be able to handle it so I'm going to try something quick now, but if it doesn't work I'll revert again.
[11:09:55] alright, I think that works
[11:10:05] verifying the files locally
[11:12:02] files look good
[11:14:41] hnowlan: headers don't match
[11:15:57] comparing curl -H "Host: en.wikipedia.org" -H "X-Forwarded-Proto: https" 127.0.0.1:3128/api/rest_v1/page/pdf/Tornado/a4/desktop?vgutierrez=$RANDOM -v -o /dev/null output
[11:16:00] on cp2039 and cp2037
[11:16:05] cp2039 includes an etag
[11:16:09] cp2037 doesn't
[11:17:15] https://www.irccloud.com/pastebin/3JwsJK0n/cp2039
[11:17:16] Ah, drat. That's not something I'll fix quickly here
[11:17:31] https://www.irccloud.com/pastebin/7QeXcwXb/cp2037
[11:17:47] sorted headers for easier comp
[11:19:32] ah well. I'll revert again and see with the team about getting that added to the services
[11:19:46] ook
[11:19:51] see you next week then ;P
[11:20:34] heh ;_;
[11:21:13] https://gerrit.wikimedia.org/r/934020
[11:21:45] I mentioned this a few days ago.. sadly I didn't double check current restbase output
[11:24:52] Yeah, I didn't see the actual etag at the time 🤦
[11:30:38] hnowlan: maybe I missed some !log line but did you disabled puppet?
[11:30:45] *disable
[11:31:07] vgutierrez: I did yes - I'm about to reenable if that's okay?
[11:31:14] yep
[11:31:20] just double checking :)
[11:31:39] cp2037 is repooled, everything should be back to normal. Thanks again!
[11:31:47] no problem
[11:35:30] vgutierrez: is there a specific format we should follow for the etag? I assume it doesn't really matter if we generate it in the same manner as restbase as long as we're consistent
[11:36:25] etags are opaque
[11:36:44] so as long as it comes wrapped by double quotes it should be fine
[11:37:40] cool, thanks
[11:38:01] opaque-tag = DQUOTE *etagc DQUOTE
[11:38:06] from https://www.rfc-editor.org/rfc/rfc9110#field.etag
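Note on the etag exchange above: a minimal sketch, not anything run in the channel, of the two checks discussed. It assumes header values are available as Python strings, and the helper names `is_valid_etag` and `missing_headers` are hypothetical. The pattern follows the RFC 9110 grammar quoted above (`entity-tag = [ weak ] opaque-tag`, with `etagc = %x21 / %x23-7E / obs-text`).

```python
import re

# RFC 9110 section 8.8.3:
#   entity-tag = [ weak ] opaque-tag
#   weak       = %s"W/"
#   opaque-tag = DQUOTE *etagc DQUOTE
#   etagc      = %x21 / %x23-7E / obs-text   (obs-text = %x80-FF)
_ENTITY_TAG = re.compile(r'(?:W/)?"[\x21\x23-\x7e\x80-\xff]*"')


def is_valid_etag(value: str) -> bool:
    """Return True if value is syntactically a valid ETag field value."""
    return _ENTITY_TAG.fullmatch(value) is not None


def missing_headers(reference: dict, candidate: dict) -> list:
    """Header names (lowercased, sorted) present in reference but not in candidate."""
    ref = {name.lower() for name in reference}
    cand = {name.lower() for name in candidate}
    return sorted(ref - cand)


if __name__ == "__main__":
    # Hypothetical header sets standing in for the two hosts' responses.
    with_etag = {"Content-Type": "application/pdf", "ETag": '"abc/123"'}
    without_etag = {"content-type": "application/pdf"}
    print(missing_headers(with_etag, without_etag))
    print(is_valid_etag(with_etag["ETag"]))
```

The syntax check only confirms the quoting and allowed characters; since etags are opaque, it says nothing about how the value should be generated.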
[14:10:43] hi folks, I have a quick change for ores-legacy (if you have time) - https://gerrit.wikimedia.org/r/c/operations/puppet/+/934336
[14:11:15] 1 beer == 1 CR
[14:11:35] elukey: uh? no caching for ores-legacy?
[14:12:53] vgutierrez: not sure for the long term, but at the moment we haven't set any cache headers and while testing we see different things depending on where we hit the endpoint from
[14:13:12] for example, Ilias hits esams and sees a cached content, I see another one via Marseille
[14:13:39] by default Varnish adds some caching right? If no cache header is returned from upstream
[14:21:22] elukey: yep
[14:23:30] ok thanks! We sort out our caching mess then we'll re-enable :)
[14:35:57] netops, Analytics-Radar, Data-Engineering, Infrastructure-Foundations: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (Aklapper) (Adding #Data-Engineering project tag for re-triage, as #Analytics-Radar is an inactive project tag after #Analytic...
[15:45:40] hnowlan: can I start a cookbook against drmrs or is there some activity pending/going?
[15:49:20] fabfur: oh, I've been finished for a long time :)
[15:49:32] 👍 thanks!
[22:00:02] Traffic, Data Pipelines, SRE: Add a rolled-up cache_status field to druid webrequest_sampled_128 - https://phabricator.wikimedia.org/T319344 (JArguello-WMF)
[22:07:01] Traffic, Data-Engineering: varnishkafka / ATSkafka should support setting the kafka message timestamp - https://phabricator.wikimedia.org/T277553 (JArguello-WMF)