[02:31:22] sukhe: (low prio) - how crazy would it be to inject "Vary: User-Agent" in our edge responses to external clients? We already emit Cache-Control: private, max-age=0, no-transform. So while this is at first glance a big no-no, would it be fine external-facing?
[02:31:24] Context: https://phabricator.wikimedia.org/T403866#11170665
[02:31:58] TLDR: Chrome Mobile tries to re-use the browser cache between your normal pageview and the reload after "Request desktop version", which makes it switch the User-Agent string it advertises.
[02:32:34] I don't know why Google thought it was a good idea to send If-Modified-Since for that navigation (Firefox/Safari don't), but alas. It's arguably correct HTTP semantics.
[02:34:38] The end-result is that on pilot wikis, mobile clients receive a mobile page with Last-Modified, then if you switch to a desktop UA string, it reloads the page with LM in IMS and our edge then bravely responds with HTTP 304, despite this client/UA not having loaded this page before, but because the Last-Modified timestamps tend to be the same for both cache objects, it "works" in a way that is unhelpful.
[10:51:29] Please check https://www.wikimediastatus.net/ for soon-to-be-started maintenance, affecting mainly English Wikipedia
[11:06:10] I can't get sretest2010 to install now, I think because the installer is not updating one of the disks correctly, so it keeps rebooting off the other disk (in the mdraid array) which still has the old mdraid UUID set in it.
[11:06:25] so it just drops into the grub rescue prompt which then is unusable
[11:07:27] (and just says "error: unknown filesystem" if you try and load other modules)
[11:08:16] (I had similar with another SM system ms-be1083 booting via EFI the other day, but was able to set prefix / root and then bring it up, but that's not working with sretest2010)
[11:33:06] I remember someone mentioning running into installer issues, and getting a solution
[11:33:21] but I have no idea if related
[11:40:48] Emperor: if you can access the host (with either ssh, d-i console, etc..) you could wipe the partition table like we do with the decom cookbook:
[11:40:52] see https://gerrit.wikimedia.org/r/plugins/gitiles/operations/cookbooks/+/refs/heads/master/cookbooks/sre/hosts/decommission.py#310
[11:47:10] volans: I think the installer doesn't have wipefs in it.
[12:42:50] Krinkle: noted, will respond after going through it
[12:44:37] it really doesn't help that this node can't PXE unless set via the HTML5 console, and the HTML5 console then can't talk to the installer
[13:50:01] effie: what is the primary task for hcaptcha rollout?
[13:50:15] ah never mind I see it in your commit T403416
[14:24:18] effie: vgutierrez: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1187459/1/hieradata/common/service.yaml please check everything.
[14:24:46] effie just updated and is rolling out the ports
[14:24:51] (ports change)
[14:25:21] alright, if puppet is running on the cp hosts, I can enable puppet on urldownloader1004 too
[14:56:56] sukhe: ports change is rolling out now. it was blocked on at least some hosts by two successive puppet disables for (separate) work v.gutierrez and I were doing
[14:57:09] swfrench-wmf: thanks, vg shared.
[14:57:31] ah, good good. just wanted to connect the dots in case it was non-obvious :)
[14:57:35] yep
[15:29:46] https://gerrit.wikimedia.org/r/1187472 looking for a quick review on this please while my usual kind reviewers are busy :>
[15:30:29] cdanis: <3
[15:47:47] mutante: deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1187020
[15:59:01] moritzm: the change applied ok, but I am getting a repeating connection error: Could not connect to webproxy.eqiad.wmnet:8080 (2620:0:861:3:208:80:154:74), connection timed out Could not connect to webproxy.eqiad.wmnet:8080 (208.80.154.74), connection timed out
[15:59:03] hi folks, just a reminder that we'll be starting our dc switchover live test soon
[15:59:41] moritzm: do we need to open some port or something?
[16:00:26] or running puppet on another host?
[16:02:06] jynus: cool, thanks
[16:02:31] oh, I think there is a mistake
[16:02:55] packages are on component/bacula9
[16:03:23] but I think thirdparty is defined
[16:03:24] let me check
[16:07:22] mutante: a quick review to unbreak people* hosts ? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1187490
[16:08:40] ship it
[16:08:56] jynus: done
[16:09:04] thanks
[16:12:59] I think it worked
[16:13:06] double checking there was no leftovers
[16:14:12] I see some notices, but it doesn't seem related to the patches
[16:15:18] backup ran correctly, so I am going to resolve the ticket
[16:15:25] and update on email
[16:15:29] :)
[16:15:51] no leftovers from the previous repo that I could see
[16:25:12] mutante: are you deploying the bacula monitoring revert?
[16:25:43] jynus: yes
[16:26:02] thanks, just making sure
[16:27:27] deployed on backup1014
[16:27:41] nice, thank you
[16:27:49] will resolve the ticket now
[16:27:58] cool!
[16:28:41] I certainly expected this whole thing to take longer than it did. this is great
[16:28:58] unstalls other ticket
[16:29:24] I will mention this when I give you credit on the email
[16:29:43] aww, ty