[09:24:40] hi folks, I'll be turning up a new lvs service here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/889066
[09:26:40] vgutierrez: ^
[09:26:49] * vgutierrez looking
[09:29:50] cheers
[09:36:44] hmmm godog I'm not sold on having a ProxyFetch allowing a 401 as a valid status code TBH
[09:37:36] vgutierrez: oh ok, sorry I merged -.-
[09:37:48] what's your concern ?
[09:38:01] (holding off for now on restarting pybal)
[09:38:39] that a 401 doesn't validate that the service is ready to handle proper requests
[09:39:47] yeah that's fair, arguably that's probing apache health rather than opensearch / the reverse proxied service
[09:40:04] as the service owner I'm okay with that, at least for now that is
[09:41:40] I'll see if I can exclude the status endpoint though
[09:46:37] actually nevermind, I did research this before and upstream doesn't have a proper "ready" endpoint
[09:46:41] https://github.com/elastic/elasticsearch/issues/81168
[09:47:08] the tl;dr there is that e.g. in opensearch's helm chart they do a tcp probe instead
[09:48:31] vgutierrez: ^ going ahead with the pybal restarts if that works ?
[09:48:47] sure
[09:49:09] ok!
[09:49:18] endpoint returns a 401 as expected, used SAN is in the cert used..
[09:49:22] shouldn't be a problem
[09:51:01] as usual, lvs2010, lvs1020, lvs2009 and lvs1019
[09:51:23] *nod* thank you
[10:05:26] hah ok so proxyfetch's http client considers the 401 as a failure it looks like, even though I'd be okay with that
[10:08:14] and it looks like the expected status is checked only for the purposes of handling redirects or not, not the actual status back
[10:08:54] ok different strategy! no proxyfetch
[10:12:21] (maybe)
[10:28:01] godog: yup.. tcp check seems the way to go
[10:28:57] vgutierrez: yeah, I'm looking into a pseudo-ok ready endpoint, namely GET /
[10:29:19] still not great but at least we're testing the happy http train of envoy -> apache -> opensearch
[10:51:06] in other words this https://gerrit.wikimedia.org/r/c/operations/puppet/+/889083
[10:51:28] * vgutierrez looking
[10:52:13] godog: / can't get you into issues?
[10:52:24] compared to something like /_healthz?
[10:54:08] vgutierrez: not sure what kind of issues, but yeah I'm with you that sth like /_healthz would be optimal here
[10:58:19] ok gotta go run an errand, bbiab
[14:54:18] FYI heads up I'm going with the last roll-restart of pybal to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/889083
[14:54:31] in this order lvs2010, lvs1020, lvs2009 and lvs1019
[15:50:27] godog: ack
[15:53:18] hey, could I ask for a review on https://gerrit.wikimedia.org/r/c/operations/puppet/+/863380 ? I think it should be mostly straightforward since it reuses the majority of the non https balancer, but that's the type of thinking that usually gets me into trouble
[15:57:33] damn I've missed that one sorry
[15:57:41] herron: please could you re run PCC?
[16:02:27] Wikimedia-Apache-configuration, DNS, SRE, Traffic-Icebox, Patch-For-Review: Remove aliases `minnan` and `zh-cfr` for the Min Nan Wikipedia - https://phabricator.wikimedia.org/T230382 (Ladsgroup) hmm, it's not too complicated, my only concern is the order they should go in, I don't think that...
[16:13:32] vgutierrez: sure
[17:08:16] really interesting: https://blog.apnic.net/2023/02/08/the-root-of-the-dns-revisited/
[17:28:18] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host dns4004.wikimedia.org with OS bullseye
[17:47:29] hi folks, I see several instances in the 'traffic' cloud vps project failing puppet, could you fix those please? https://prometheus-alerts.wmcloud.org/?q=%40state%3Dactive&q=project%3Dtraffic
[17:48:20] taavi: thanks, will take a look
[18:34:19] Traffic, netops, DBA, Data-Persistence, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (JMeybohm)
[18:37:03] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host dns4004.wikimedia.org with OS bullseye executed with errors: - dns4004 (**FAIL**) - Downtimed o...
[18:38:24] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host dns4004.wikimedia.org with OS buster
[19:31:11] HTTPS, Diff-blog, SRE, Technical Blog: Send HSTS header on all Wordpress VIP-hosted domains - https://phabricator.wikimedia.org/T270034 (Varnent) a:Varnent→None
[19:34:38] HTTPS, Diff-blog, SRE, Technical Blog: Send HSTS header on all Wordpress VIP-hosted domains - https://phabricator.wikimedia.org/T270034 (Varnent) I believe it is in pipeline for any requests lingering - but probably best to check with @CKoerner_WMF for diff. For the other two - while not her dire...
[19:51:45] HTTPS, Diff-blog, SRE, Technical Blog: Send HSTS header on all Wordpress VIP-hosted domains - https://phabricator.wikimedia.org/T270034 (Sbenchagra) @Dzahn Thanks for following up. I am just seeing this ticket. I will follow up and get back to you.
[20:20:32] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host dns4004.wikimedia.org with OS buster executed with errors: - dns4004 (**FAIL**) - Downtimed on...
[20:20:38] Wikimedia-Apache-configuration, SRE, Wikimedia-Site-requests, Patch-For-Review: Temporarily redirect sgs.wikipedia.org to bat-smg.wikipedia.org until bat-smg->sgs move can be done - https://phabricator.wikimedia.org/T204830 (Dzahn) sgs.wikpedia.org has been added to DNS now sgs.wikipedia.org is...
[20:43:15] Wikimedia-Apache-configuration, DNS, SRE, Traffic-Icebox, and 2 others: Remove aliases `minnan` and `zh-cfr` for the Min Nan Wikipedia - https://phabricator.wikimedia.org/T230382 (BCornwall) DNS merged and deployed! Now just waiting for deployment of the appserver stuff.
[23:19:36] HTTPS, SRE, Traffic-Icebox: Enable HSTS on store.wikimedia.org for HTTPS - https://phabricator.wikimedia.org/T128559 (SHust) @Dzahn, After I shared the issue I'm having with Brendan Campbell (who added himself as a subscriber) + a few other colleagues, he suggested that I ask you for help! For Shopi...
[23:29:26] HTTPS, SRE, Traffic-Icebox: Enable HSTS on store.wikimedia.org for HTTPS - https://phabricator.wikimedia.org/T128559 (Dzahn) Hi @SHust let me clarify, so shopify is saying that store.wikimedia.org must have a AAAA record? The current status is that store.wikimedia.org is an alias for c.ssl.shopify.c...
[23:37:24] HTTPS, SRE, Traffic-Icebox: Enable HSTS on store.wikimedia.org for HTTPS - https://phabricator.wikimedia.org/T128559 (SHust) @Dzahn, Here's a screenshot from my Shopify chat ( I hope I didn’t share anything I shouldn’t have): {F36845020} {F36845023} {F36845022}