[10:03:32] I'm seeing the following puppet error message at the end of the puppet run when running the agent on the deployment server
[10:03:32] Error: Could not send report: Error 500 on SERVER: Server Error: Could not autoload puppet/reports/logstash: Cannot invoke "jnr.netdb.Service.getName()" because "service" is null
[10:03:40] I've seen it on some idp servers as well
[10:06:34] <_joe_> brouberol: I think the bug has already been reported by godog, I'd check phabricator
[10:06:37] there's https://phabricator.wikimedia.org/T388629 for it, no fix known yet
[10:06:59] thanks, noted
[11:16:35] fyi inflatador - in https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1128533 CI was bypassed and the build is now broken for others
[11:21:54] sigh
[11:24:22] fix here https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1128836
[11:24:29] thx hnowlan, already +1ed
[11:24:38] thanks!
[11:33:32] <_joe_> bypassing CI is not cool.
[11:34:15] on cookbooks it won't get merged till jenkins provides the V+2
[11:34:33] <_joe_> vgutierrez: well you can bypass it
[11:34:50] it got C+2, V+2 and was manually merged
[11:34:58] before CI could run after the C+2
[11:35:04] oh.. that was in the previous CR
[11:35:07] ignore me :)
[11:35:26] definitely not cool :(
[11:50:18] new SREs always get confused about which repo they should CR+2, V+2 and submit on, and on which they should just hit CR+2. I wish puppet didn't require that so it'd be the same everywhere and less confusing
[11:50:44] <_joe_> puppet doesn't require you to V+2
[11:50:52] <_joe_> it requires you to manually submit your change
[11:51:18] <_joe_> which is to avoid the useless gate-and-submit step when you're in a hurry to get puppet code out
[11:52:23] if I'm not in a hurry, can I just let jenkins merge it? I wasn't aware that'd be possible
[11:57:23] <_joe_> no
[11:57:36] <_joe_> as I said, it's manually submit to avoid gate-and-submit
[11:58:06] claime: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1128440
[11:58:32] effie: yeah that's not deployed
[11:58:39] so wf is on 7
[11:58:59] yes just checked releases
[11:59:09] then we may have the opposite effect
[11:59:15] and since it's the only thing on 7
[11:59:20] then it needs to pull the image
[11:59:21] we do not have 7.4 images cached
[11:59:40] and with the strategy it's got, it's pulling the image one replica at a time, 2min * 6 repl
[11:59:42] 12 minutes
[12:00:00] So 1.5 replicas at a time actually, 8 minutes
[12:00:32] but basically if we merge that patch it should be faster, but we should also override the strategy for mw-wf
[12:02:31] should we finalize the migration for wikifunctions, effie?
[12:02:58] I'm going to bump thumbor given that the deployment itself is finished, unless there are objections
[12:04:20] claime: I think we should not proceed, we are right before the first part of the switchover, I would like to allow scott to do so as already planned
[12:04:56] ok sure
[12:05:05] however, we can relax values like maxUnavailable
[12:05:22] it helped *a wee bit*
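For the image-pull arithmetic above, a minimal back-of-the-envelope sketch in Python. The inputs (6 replicas, roughly 2 minutes to pull the uncached 7.4 image, 1 vs ~1.5 replicas updated at a time) come from the conversation; the function and parameter names are invented for illustration and are not part of the deployment-charts or helmfile configuration.

    import math

    def rollout_minutes(replicas: int, batch_size: float, pull_minutes: float) -> float:
        """Rough rollout duration when every new pod has to pull the image first."""
        batches = math.ceil(replicas / batch_size)
        return batches * pull_minutes

    # One replica at a time: 6 batches * 2 min = 12 minutes.
    print(rollout_minutes(6, 1, 2))
    # ~1.5 replicas at a time: 4 batches * 2 min = 8 minutes.
    print(rollout_minutes(6, 1.5, 2))

Fewer, larger batches shorten the rollout, which is why relaxing values like maxUnavailable for mw-wf is being suggested here.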
[12:10:43] I have this deploy for when things are stable. No rush at all: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1128482 please ping me when it's clear
[12:23:44] Amir1: Things are stable, it's just wf being difficult wrt helmfile deployments, you should be ok to merge and deploy that change
[12:25:45] ah okay then
[12:25:46] thanks
[12:31:34] spot the idiot:
[12:31:37] https://www.irccloud.com/pastebin/P28zwb1C/
[12:32:08] x)
[12:45:02] hmm, big elasticsearch-related diffs in admin_ng
[12:46:19] external-services right?
[12:46:32] btullis / brouberol ^
[12:47:06] that is expected, and according to inflatador, it is safe to apply
[12:48:41] thanks!
[13:03:48] reminder: Traffic/services switchover will be happening at 1400
[14:21:13] hnowlan: are you leading the switchover? I have a question (not specifically for you, but to coordinate with you first)
[14:21:26] jynus: yep
[14:22:18] I was wondering if it would be ok to put non-critical hosts from backup into maintenance (I will keep the main bacula hosts ready for a recovery) while it happens?
[14:22:33] I'm having problems running cumin from cumin2002 (https://phabricator.wikimedia.org/P74243), is this affecting anyone else?
[14:22:35] or if I should leave puppet free of unrelated patches
[14:23:09] jynus: I think that *should* be fine
[14:23:23] NM, I think I know what it is
[14:23:23] inflatador: do you have any file in your home matching cloudelastic*? if so, add quotes
[14:23:24] inflatador: the second one is happening because the * glob is catching files in the cwd
[14:23:30] inflatador: my guess is the last one lacks the quotes
[14:23:43] he he, 3 people saying basically the same :-D
[14:23:54] I think it's because we've migrated cloudelastic to opensearch and the alias is pointing to the wrong role now ;(
[14:23:54] inflatador: for the first one https://gerrit.wikimedia.org/r/c/operations/puppet/+/1128837
[14:24:33] hnowlan: ok, will probably keep it relatively quiet but shout if needed to stop all maintenance or something
[14:24:58] damn, y'all must be psychic! Will merge that shortly
[14:25:06] jynus: ack, thanks
[14:25:15] and thanks for catching that BTW ;)
[14:25:16] inflatador: nope, it just happened to us at the same time :-D
[14:25:24] * at some point
[14:37:25] after some hiccups, we will be proceeding with the traffic/services switch in the next few minutes
[14:37:46] good luck team!
[14:41:14] 🍿
[14:49:21] <_joe_> Emperor: swift/thumbor might indeed cause popcorn to be consumed :D
[14:49:41] <_joe_> but you can blame Amir1 if it happens
[14:50:46] _joe_: all the popcorn round here is salted with the tears of the swift admin
[14:50:48] It is necessary :D
[14:51:01] I think if that happens to me the headline will be Man Killed In Explosion In Own Popcorn Factory
[14:51:08] <_joe_> lol
[14:51:14] XD
[14:59:47] :D
[15:00:33] things have levelled out on the CDN (the docs were very accurate in estimating 20 minutes), moving ahead with services
[15:00:44] super
[15:18:58] for those who are most familiar with exim4, does this look sane: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1128888 ?
[15:19:41] jhathaway: ^
[15:19:57] arnaudb: looking
[15:20:04] thanks jhathaway
[15:28:49] arnaudb: looks good
[15:29:15] thanks!
[15:38:37] services switchover complete
[15:39:58] \o/
[15:39:59] \o/
[15:43:04] \o/
[15:43:19] {◕ ◡ ◕}
[15:43:51] nice
[15:43:54] nice!!
[15:50:59] congrats!
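A minimal sketch of the glob problem discussed around 14:23, using Python's glob module to stand in for the shell. The directory and file names here are hypothetical; the actual fix on cumin2002 was simply quoting the cloudelastic* argument so the shell passes the literal pattern through to cumin.

    import glob
    import tempfile
    from pathlib import Path

    # Hypothetical working directory containing a stray file that matches the pattern,
    # e.g. a leftover notes file in the user's home directory.
    workdir = Path(tempfile.mkdtemp())
    (workdir / "cloudelastic-notes.txt").touch()

    pattern = "cloudelastic*"

    # Unquoted on the command line, the shell expands the pattern against the cwd,
    # so the command receives the matching filename instead of the host selector.
    print(glob.glob(pattern, root_dir=workdir))   # ['cloudelastic-notes.txt']

    # Quoted ('cloudelastic*'), the literal pattern reaches the command unchanged.
    print(pattern)                                # cloudelastic*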
[15:51:43] 👏
[18:37:57] one note for the on-callers - kartotherian.discovery.wmnet (backend for maps.wikimedia.org) has recently been migrated to k8s and IIRC it was listed among those backends that should remain active/active
[18:38:11] because historically it wasn't able to run on one DC only
[18:38:27] I noticed that it now runs only from eqiad, with codfw depooled
[18:39:05] and it is doing fine, so good news :) We have a slow memleak that causes pods to eventually be OOM-killed after 3/4 days, impact to users is minimal
[18:39:25] now with more traffic the window may shrink, so in case you notice issues with kartotherian/maps please pool codfw in
[18:39:33] cc: hnowlan --^
[18:40:44] also cc: nemo-yiannis
[18:42:47] elukey: ah, `kartotherian-ssl` is excluded from the services switchover, but `kartotherian` isn't
[18:42:49] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/cookbooks/+/refs/heads/master/cookbooks/sre/discovery/datacenter.py#42
[18:43:16] rzl: o/
[18:43:37] if those are entries in service.yaml, the active LVS config is kartotherian-k8s-ssl
[18:43:45] okok didn't know that, it makes sense yes
[18:43:56] that one either then :) yeah
[18:44:36] I was planning to test a dc depool with more capacity, but it sounds like the calculations made so far are good :D
[18:44:43] very happy about it
[18:44:55] offhand I don't know if that was deliberate or an oversight, but either way I'm glad it worked out :D
[18:45:13] and easy enough to repool codfw if we find problems, yeah -- that's the whole point of this test anyway
[18:50:19] super :)
[18:50:26] (logging off, have a good day!)
[18:50:32] have a good night! thanks for calling that out
[21:23:19] elukey: ack, thanks for the heads-up. good news, even if it's a surprise