[07:26:35] Hi! I pushed a change set for the cirrus update last night, but CI builds fail. This is due to my integration tests expecting a working docker environment that is not provided during the CI build. Could we enable docker-in-docker for that?
[08:03:18] pfischer: do you have a link to the CR?
[08:06:11] probably https://gerrit.wikimedia.org/r/c/search/cirrus-streaming-updater/+/832379
[08:08:57] We might be able to enable docker-in-docker, but that would need discussion with release engineering. I'm wondering if we could remove the dependency on docker with something like kafka-junit
[08:15:58] we discussed it with Antoine and that might not be an option with jenkins, we'd have to ask if this is going to be possible with gitlab CI
[08:17:18] in the meantime we can have a look at kafka-junit perhaps
[08:17:47] docker-in-docker is not supported by our CI AFAIK, not sure about gitlab. What you can do, though, I think, is to have a custom docker image for running CI on that specific repo. And that could be customized in ways that should allow it to work. But you'll have to check with RelEng
[08:20:15] tracking task is T283724
[08:20:16] T283724: Tracking task to support running arbitrary Docker images - https://phabricator.wikimedia.org/T283724
[08:21:07] volans: yes, I think Antoine mentioned this, but that is not something he'd like to replicate with jenkins at least; no clue where we are with gitlab on this subject tho
[08:29:14] oh, you meant building a custom docker image like what we do for MW (have all the services running in a single image managed by supervisord), that might be an option I suppose
[08:30:59] yes, but you have to ask Antoine though ;)
[08:31:13] he's in this chan :-P
[08:31:33] looks like elastic2043 is down. Nothing urgent, but could someone have a look? (cc: ryankemper, inflatador)
[09:03:28] Antoine already knows our problem :P
[09:03:30] pfischer: I've added a few comments on your CR (https://gerrit.wikimedia.org/r/c/search/cirrus-streaming-updater/+/832379). I've only had time to look at the Maven side so far.
[09:03:53] Feel free to ping me if you need more detail / context on any of those comments (mostly minor things).
[12:38:40] Bit of a last-minute notice, but I'm going to skip the retrospective today. Long week for me already with that offsite. Does anyone want to take over for this week?
[12:52:05] Greetings
[12:52:16] gehel ACK, will check 2043
[13:27:02] gehel how'd you know elastic2043 was having problems? I don't see any alerts in my email or in operations IRC
[13:30:04] inflatador: I noticed it on -operations: Thu 08:28:15 icinga-wm| PROBLEM - Host elastic2043 is DOWN: PING CRITICAL - Packet loss = 100%
[13:30:23] that's UTC time
[13:31:04] volans thanks. I have IRC highlights on 'elastic', I guess I somehow scrolled past it
[13:57:50] Best guess with 2043 is an OOM issue, Grafana has some goofy RAM stats up until the reboot
[14:10:25] dcausse looks like the enwiki_titlesuggest index has 9599068 docs in both eqiad and codfw, is that enough sanity-checking to re-enable eqiad? re: https://phabricator.wikimedia.org/T308676#8237328
[14:10:43] inflatador: looking
[14:16:47] hm.. eqiad@9200 has 227 titlesuggest shards, 228 for codfw, looking
[14:17:07] 9400 & 9600 are fine
[14:19:48] eqiad is missing eswiktionary_titlesuggest
[14:21:29] ah no, it's codfw that has eswiktionary_titlesuggest twice... fixing
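A minimal sketch of the kind of cross-datacenter sanity check being discussed here, using the standard Elasticsearch _count and _cat/indices APIs. The search.svc.* hostnames are placeholders, not confirmed endpoints; 9200/9400/9600 are the per-cluster ports mentioned above.

    for dc in eqiad codfw; do
      for port in 9200 9400 9600; do
        es="http://search.svc.${dc}.wmnet:${port}"   # hypothetical endpoint
        echo "== ${dc}:${port} =="
        # Total docs in enwiki_titlesuggest on this cluster.
        curl -s "${es}/enwiki_titlesuggest/_count?pretty"
        # One line per *_titlesuggest index, to spot missing or duplicated
        # indices (e.g. the eswiktionary_titlesuggest mismatch found above).
        curl -s "${es}/_cat/indices/*_titlesuggest?h=index,pri,docs.count" | sort
      done
    done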
[14:24:01] inflatador: we should be good to revert the mw config patch
[14:26:31] dcausse this patch? https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/824787
[14:27:35] inflatador: yes, I made a revert already: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/832323
[14:28:26] dcausse does that have to go out with the train, or how does that work?
[14:28:55] this is the mediawiki config, it has special deploy windows 3 times a day
[14:30:28] these are the "backport windows" at https://wikitech.wikimedia.org/wiki/Deployments
[14:31:04] inflatador: I guess this one could be merged: https://gerrit.wikimedia.org/r/c/operations/puppet/+/815784
[14:32:17] inflatador: would you be around at 20:00–21:00 UTC?
[14:32:35] I can schedule the mw-config patch for this window
[14:33:07] I can't be around at that time
[14:34:34] dcausse Y, I'll be around
[14:34:44] ok, adding the patch there
[14:42:57] cool, I know you aren't going to be there but I CC'd you on the invite
[16:04:29] workout, back in ~30-40
[16:47:38] back
[17:31:11] lunch, back in ~45
[18:18:39] back
[18:19:57] hmm, paths seem to have changed on dumps.wikimedia.your.org
[18:20:40] latest-mediainfo.ttl.gz is now found at https://dumps.wikimedia.your.org/pub/wikimedia/dumps/other/wikibase/commonswiki/latest-mediainfo.ttl.gz
[18:20:59] unclear if that's an intentional change, or a temporary issue
[18:22:29] additionally, the latest dump it has is 20220809, vs 20220912 on our own servers
[18:23:10] 20220829, I mean
[18:27:07] That's probably not good, does data-persistence own that or...?
[18:27:50] well, annoying history here :P That is an external mirror of our own dumps. It takes 3 hours to download a dump from the in-network mirror due to rate limiting, but 15 minutes from an external mirror: https://phabricator.wikimedia.org/T222349
[18:28:41] I just did a quick test from stat1007 though, which has the internal dumps mounted over nfs. I pulled 1GB at 112MB/s, which should be more than plenty. Perhaps we could get the nfs mount on specific blazegraph hosts?
[18:29:25] (although to be fair, if I start the download now from the internal mirror and it takes 3 hours, it will be done before we get the nfs mounted, I imagine :P)
[18:32:33] a quick test with curl gives 4.5MB/s from the internal http mirror though
[18:33:59] poking puppet, but not sure yet how the nfs mount on stat1007 gets there
[18:41:32] hmm, it comes from statistics::dataset_mount; seems a bit too specific to reuse directly, but we could perhaps crib the implementation
[18:52:12] seems plausible, but guessing firewalls on labstore would have to be opened to those specific hosts. well, maybe `nc -zuv labstore1007.wikimedia.org 111` (udp) connects, but tcp doesn't, and the nfs options used on stat1007 at least are specifically tcp
[19:12:38] meh, responded to the wrong phab ticket, and now phab won't accept my TOTP code so I can't remove the comment :P
[19:28:48] ebernhardson Ah, I see your PR, will take a look after the mw-config deploy if that's cool
[19:29:36] sure
[21:07:25] I suppose I might as well: I'm going to hack the wcqs reload script to pull from the new paths on dumps.wikimedia.your.org. The NFS stuff can still move forward and be ready for next time around
[21:08:20] (not really a hack, it's simply calling it with the appropriate env vars)
[21:27:41] err, I already forgot it has old data :P Starting again over http from dumps.wikimedia.org and letting it be slow
[21:31:02] * ebernhardson wonders if the updater is supposed to be stopped... seems like yes. It's depooled anyways. Disabling puppet and stopping the streaming updater.
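For reference, a rough version of the throughput comparison described above (NFS mount vs. internal HTTP mirror). Both the NFS mount point and the URL path are assumptions based on the conversation, not confirmed locations.

    # Read 1GB from the NFS-mounted dumps (as on stat1007); dd reports the rate it achieved.
    dd if=/mnt/data/xmldatadumps/public/other/wikibase/commonswiki/latest-mediainfo.ttl.gz \
       of=/dev/null bs=1M count=1024
    # Pull the same 1GB range over the internal HTTP mirror and print the average speed.
    curl -s --range 0-1073741823 -o /dev/null -w 'http mirror: %{speed_download} bytes/s\n' \
       https://dumps.wikimedia.org/other/wikibase/commonswiki/latest-mediainfo.ttl.gz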
[21:39:20] ebernhardson I can suppress those alerts, do you think a day is reasonable?
[21:59:38] inflatador: yea, probably
[22:01:30] ebernhardson ACK, done
[23:18:15] meh, the transfer from our local dumps failed ... curl: (18) transfer closed with 9889353350 bytes remaining to read ... trying again
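One way to make that retry pick up where it left off rather than starting over is to resume with `curl -C -`, which continues from the bytes already on disk. A minimal sketch, with the URL assumed from the paths discussed earlier:

    url=https://dumps.wikimedia.org/other/wikibase/commonswiki/latest-mediainfo.ttl.gz
    out=latest-mediainfo.ttl.gz
    # -f: fail on HTTP errors, -L: follow redirects, -C -: resume from the partial file.
    # Keeps retrying until curl completes the transfer.
    until curl -fL -C - -o "$out" "$url"; do
        echo "transfer interrupted, resuming in 30s..." >&2
        sleep 30
    done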