[08:33:25] I'll resolve the stray incident open from yesterday's incidents
[08:34:34] <_joe_> uhm I think it might fire again soon :/
[08:36:24] hello folks
[08:36:35] if nobody opposes I'd merge/build/rollout wmf-certificates for https://gerrit.wikimedia.org/r/c/operations/debs/wmf-certificates/+/742485
[08:36:51] (rollout via debdeploy)
[08:45:50] <_joe_> elukey: lemme take a quick look :)
[08:47:48] <_joe_> lgtm
[08:57:23] thanks :)
[13:24:25] elukey: you were about to merge sslcert::trusted_ca: ensure cert bundle readability for group/others (08a534c57d) ?
[13:24:37] arturo: yes I was about to ask, thanks :)
[13:24:45] cool
[14:52:11] godog: any idea why we get stray lingering VO incidents so often?
[14:55:30] cdanis: yes and no, happens only for host alerts so it has sth to do with the formatting/parsing I think, T264016 is what I'm using
[14:55:31] T264016: Host page did not auto-resolve in VO - https://phabricator.wikimedia.org/T264016
[15:05:53] could someone have a look at <+icinga-wm> PROBLEM - Keyholder SSH agent on deploy1002 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. https://wikitech.wikimedia.org/wiki/Keyholder please?
[15:06:27] majavah: hmm that's probably related to me removing deployment-puppetboard, will take a look
[15:43:23] jbond, herron: shall we take another stab at the mx2001/LDAP change?
[15:43:46] moritzm: good with me
[15:43:47] moritzm: sure
[15:44:28] ack let me get the cr up
[15:44:37] ack
[15:45:30] ack merging now
[15:46:08] * jbond applying
[15:47:18] applied and `sudo exim -bt otrs-test@wikimedia.org` looks good to me
[15:48:27] test mail via mx2001 also arrived fine
[15:49:03] just sent a test mail to otrs-test@w.o from external
[15:49:24] ^ akosiaris: can you please check if you can see a new ticket in the test queue?
[15:49:45] * akosiaris looking
[15:49:55] * jbond also received mail to jbond@w.o
[15:50:45] I see 2
[15:51:18] i sent one from the cli with subject: test-mx2001
[15:51:42] one from jbond@mx2001, 3m ago and one from jmm@viruvalge 2m ago
[15:51:50] excellent, thanks
[15:51:52] ahh cool thanks
[15:52:12] looking good, paniclog empty as well
[15:52:27] yes and i don't see anything bad in the exim log
[15:52:35] should we leave it 24 hours then do mx1001?
[15:52:52] jbond, herron: I'd say we keep this in the current config until tomorrow, then I'll revert the patches that made routing prefer mx1001 for a day
[15:52:55] and then mx1001?
[15:53:20] but can also do it quicker, fine either way
[15:53:37] sgtm, +1 for the cautious approach
[15:53:44] moritzm: i think your proposal sgtm
[15:53:51] ok
[16:19:36] "In this way our code should run and be testable on Pontoon and Deployment-prep without too many swearing or moments of internal sadness."
[16:19:43] elukey: 💜 💜 💜
[16:22:53] elukey: can I please stick that on bash?
[16:23:26] :)
[16:23:51] probably too much swearing, just realized it :D
[16:24:39] !bash In this way our code should run and be testable on Pontoon and Deployment-prep without too many swearing or moments of internal sadness.
[16:24:39] RhinosF1: Stored quip at https://bash.toolforge.org/quip/hWGpcX0B8Fs0LHO5l3ng
[16:24:48] elukey: it still sounds fun
[17:23:50] majavah: were there any issues from switching deployment-deploy03?
[17:24:56] legoktm: I don't think there were any major issues, it went surprisingly well
[17:25:25] when did that happen btw?
[17:25:36] yesterday evening my time
[17:26:02] uh
[17:26:06] which time is your time? :-)
[17:26:27] EET = UTC+2
[17:26:42] ah, literally my time!
[17:27:03] that's what I said
[17:27:07] so probably after I was on it then. so I can't verify that it's all good. maybe next time
[17:27:16] "next" time, see what I did there <--
[17:27:41] awesome :D
[17:27:43] uh what I mean is that your time is my time :-P
[17:27:55] maybe I need a time out...
[21:09:10] hi folks. Comms is asking for attention on T296570 in support of a microsite launch they intend to start tomorrow
[21:09:10] T296570: Setup subdomain for Foundation messaging site - https://phabricator.wikimedia.org/T296570
[21:13:26] <_joe_> herron: ^^
[21:15:21] I am submitting a patch for this
[21:15:46] thanks sukhe I was just looking into the same
[21:16:03] can't believe it's "launch tomorrow" once again
[21:16:30] <_joe_> mutante: it's not, the task is open since a few days.
[21:17:01] <_joe_> and given it's a one-line patch, the somewhat short timing isn't really an issue.
[21:22:09] Are we going to find privacy, security, performance, design, a11y, and closed-source issues post-launch of the same nature we found the last five times we launched a WP-VIP site?
[21:22:15] TLS does not work for this site when testing via /etc/hosts
[21:24:28] who is responsible for managing the VIP hosted sites?
[21:30:08] <_joe_> Krinkle: at the very least, this is a private site, meaning it's impacting employees only and so it has a limited scope and I think we can do that yes :)
[21:30:25] I believe in recent years, since the redesign of the foundation website, most of them are managed by Comms and outsourced, no longer involving the engineering departments (not in a planned way at least, we do generally end up helping post-launch ad-hoc).
[21:30:28] <_joe_> tltaylor: what do you mean with "responsible for managing"? I guess it's Automattic
[21:31:05] I would assume having a proper TLS cert was blocked on the DNS being enabled
[21:31:16] I'm trying to understand who would have the ability to address the issues Timo raised
[21:31:18] <_joe_> tltaylor: we can only manage the DNS record, but as herron pointed out, the certificates will need to be installed by who manages the infrastructure serving the sites, which is not us
[21:31:45] <_joe_> tltaylor: oh ok then I don't have a good answer sorry :)
[21:32:03] <_joe_> legoktm: uh possible yes
[21:32:22] <_joe_> they still need to serve the temporary domain cert
[21:32:37] I assumed that that was the case
[21:32:44] so I am not sure if I missed something but I have merged the DNS patch
[21:32:51] if there is more, happy to do that
[21:33:14] tltaylor: normally the respective teams (design, sec, perf) would briefly consult in those aspects during planning and then confirm prelaunch review, as for anything else technical we do as an organisation.
[21:35:22] it seems to have a valid cert now
[21:35:40] yep
[21:35:51] tltaylor: I didn't realise this was a private site. I don't know exactly what that means, but that probably obviates most vertical concerns. In the past it appeared as if planning processes intentionally avoiding involving Tech inc to eg no review of technical vendor or guiding planned work, and no preview stage shared, which inevitably then led to stress and post-hiv embarrassment, conflict and various teams helping out to fix up the work.
[21:36:44] post-hoc*
[21:39:47] <_joe_> tltaylor: the site is now reachable :)
[21:40:13] Comms will be appropriately thankful
[21:41:43] I'll follow up on the other issues separately