[07:49:41] ok, so S3 didn't work right off the bat
[07:50:02] worst thing is it just froze while trying to initialize and I know nothing about what failed
[07:58:52] godog: how does a swift container map to s3?
[07:59:11] it's just a bucket?
[08:02:55] zpapierski: that's correct, containers will show up as buckets
[08:03:23] does access with e.g. an s3 client cli work?
[08:15:28] not yet, it concatenated the bucket name with the address when trying to reach it
[08:15:39] I'm reading up on what I misconfigured
[08:17:06] ah, I think it's virtual host addressing
[08:20:15] ok, so far so good - it told me (correctly) that my bucket doesn't exist
[08:25:33] huh, well that's interesting
[08:25:34] it worked
[08:26:04] at least s3 does; events don't seem to flow correctly, but maybe I misconfigured something there
[08:26:41] godog: thanks, it works as it should
[08:26:55] dcausse: I managed to configure s3, it was no problem at all
[08:27:13] zpapierski: sure np! glad it is working as expected
[08:27:23] it didn't, I expected it to fail :)
[08:27:38] (I expect everything to fail, at least for a few days)
[08:27:44] lolz, fair enough
[09:39:04] o/
[09:39:07] zpapierski: nice!
[09:41:31] now the question is, do we try to deploy to k8s using S3?
[09:42:26] I only placed an s3-hadoop jar in plugins and modified the config, should be easy
[09:43:49] yes, we should prep a new image at least
[10:09:04] ejoseph: would you have a moment today for a quick chat?
[10:16:15] dcausse: do we have a ticket for that streaming updater issue?
[10:17:03] zpapierski: for s3 I created T302494
[10:17:04] T302494: The WDQS Streaming Updater should use S3 to access thanos-swift instead of the native swift protocol - https://phabricator.wikimedia.org/T302494
[10:29:54] ok, thx
[10:56:26] break/lunch
[11:07:48] lunch
[13:36:32] France is going out tonight with colleagues for her birthday. I'll be watching the kids and ending my day earlier than usual.
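The addressing mix-up above (the client "concatenated the bucket name with the address") is what virtual-host-style S3 addressing looks like from the outside. A minimal sketch of the two URL styles an S3 client can build; the endpoint and bucket names are invented for illustration, not the real thanos-swift values:

```python
# Sketch of how S3 clients build request URLs in the two addressing modes.
# Endpoint and bucket names below are illustrative placeholders.

def s3_url(endpoint: str, bucket: str, key: str, virtual_host: bool) -> str:
    """Build the request URL an S3 client would use for a given object."""
    scheme, host = endpoint.split("://", 1)
    if virtual_host:
        # Virtual-host style: the bucket becomes a subdomain of the endpoint.
        # Against a Swift S3 gateway this typically fails unless wildcard
        # DNS (and a matching TLS cert) exist for *.<endpoint>.
        return f"{scheme}://{bucket}.{host}/{key}"
    # Path-style: the bucket is the first path segment, so a single fixed
    # endpoint works; this is usually what you want for Swift's S3 layer.
    return f"{scheme}://{host}/{bucket}/{key}"

print(s3_url("https://thanos.example.org", "wdqs-flink", "chk-1", True))
print(s3_url("https://thanos.example.org", "wdqs-flink", "chk-1", False))
```

Most S3 clients expose a "path-style access" toggle that selects the second form, which matches the "so far so good" result after the addressing change.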
[13:37:12] zpapierski: I'll skip the unmeeting and your goodbye party. I'll see you on Monday, but in any case, it's been a huge pleasure working with you so far!
[13:39:05] gehel: feeling's mutual :)
[14:04:31] Happy Friday
[14:07:03] o/
[14:33:58] o/
[14:36:45] gehel sent you a last-minute pairing invite, sorry I forgot yesterday
[14:36:55] inflatador: I'll be there!
[14:38:19] ( ^_^)o自自o(^_^ ) CHEERS!
[14:44:06] godog: on what ports does the thanos cluster accept connections for S3? is it still https?
[14:48:40] zpapierski: yes that's correct, https
[14:48:46] thx
[14:48:51] sure np!
[16:01:51] \o
[16:02:58] o/
[16:09:12] * gehel is out for the weekend! Have fun!
[16:19:46] I'm in a workshop so won't be able to attend zpapierski 's last gaming mtg. But if you ever want to play some games let me know!
[16:20:08] o/
[17:00:26] week-end time
[17:17:26] Hello search team o/ I just want to follow up again on https://gerrit.wikimedia.org/r/c/operations/puppet/+/764830 related to an incident with codfw this week. Will it be kosher for us to merge this revert or shall we wait for next week?
[17:25:05] itamarWMDE: going to need to wait for next week, the resolution we are working on involves replacing one of the deployed libraries and isn't ready yet
[17:33:55] no worries, thank you for the quick response
[17:36:55] quick workout, back in ~30
[18:05:32] started up saneitizer for one round, it might be working, but we aren't too far from the limits.
backlog climbs, but it looks like it will probably still clear within the 2hr time window
[18:15:43] nice
[18:15:46] and also, back
[18:17:48] hi, sorry to ping but we have a UBN caused by cirrus search jobs now: https://phabricator.wikimedia.org/T302620
[18:19:32] Amir1: the jobs running right now will take about an hour to clear, they should have started at :20 after
[18:20:07] Amir1: oh this has been going on longer than the jobs i just started up, hmm. in terms of the redundant parse i suppose that would be another ticket, sec
[18:20:32] ebernhardson: the thing is that these jobs are parsing the same page with the same parser options twice
[18:20:38] Amir1: i expect it would be https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/765577
[18:20:57] Amir1: in that we took work that was executing in 3 separate jobs and made it run in one job because the job queue wasn't keeping up
[18:21:31] it does the same work for two clusters, wouldn't be surprised if that triggered the parse. The question i have though is, cirrus isn't supposed to parse anything. It's supposed to be getting the generic anonymous parser cache version
[18:21:58] can it reuse the parser output? that would help to reduce this flood
[18:22:18] Amir1: the code is wholly separate, we literally $job->execute() instead of pushing it into the queue
[18:22:37] ebernhardson: for wikitext, it has to parse all the templates etc. for wikibase, we removed that
[18:22:37] it could be fixed, but in days not hours
[18:23:24] Amir1: right, but my understanding is mediawiki has a cache of all these, and for most live revisions there should be no need to parse because it's already cached
[18:23:30] has that changed in the 6+ years since we designed cirrus?
[18:23:48] you mean parsercache?
[18:23:50] yes
[18:23:55] I don't know why it's not using that
[18:24:07] it can be because during one job, it's not saved to PC?
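The duplicate-parse problem being discussed ("during one job, it's not saved to PC") comes down to nothing remembering the parser output between the two passes a single job now makes for the two clusters. A minimal Python sketch of the in-process reuse that removes the second parse; all class and function names here are invented for illustration, the real code is PHP in MediaWiki/CirrusSearch:

```python
# Sketch: memoize parser output per (page, options) for the life of one
# job run, so doing the same work for two clusters costs only one parse.
# All names are illustrative, not the real MediaWiki API.

class InProcessParseCache:
    def __init__(self, parse_fn):
        self.parse_fn = parse_fn   # the expensive parse
        self.cache = {}            # lives only for this job run
        self.parses = 0            # count real parses, for verification

    def get_parser_output(self, page_id, options):
        key = (page_id, options)
        if key not in self.cache:
            self.parses += 1
            self.cache[key] = self.parse_fn(page_id, options)
        return self.cache[key]

def expensive_parse(page_id, options):
    # Stand-in for rendering wikitext (templates and all) to HTML.
    return f"html-for-{page_id}-{options}"

access = InProcessParseCache(expensive_parse)
# One job now builds the document for both clusters:
for cluster in ("eqiad", "codfw"):
    doc = access.get_parser_output(42, "anon-default")
assert access.parses == 1  # the second cluster reuses the first parse
```

Without the memo (the situation before the fix), the loop would parse twice, which is exactly the doubled parse showing up in the logs.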
[18:24:38] like since it's not there yet, it reparses, making this job much less performant
[18:25:01] i'm not entirely sure, currently looking. All this is in CirrusSearch\BuildDocument\BuildDocument
[18:25:20] aah, and we recently removed PC save for cirrus search jobs as it basically overflowed the whole PC
[18:25:22] it has a parser cache, but maybe it's supposed to be used differently or some such now? Perhaps this also needs to take some metrics
[18:25:59] ahh, that would do it
[18:26:35] you probably can use the ParserOutputAccess service
[18:26:54] it keeps it in cache during the run
[18:27:12] hmm, ok that shouldn't be too hard to switch to
[18:27:28] but also saves it in ParserCache I think (we probably can avoid that, even if it means patching that class)
[18:31:47] looking at the code, it doesn't store them in PC, which makes creating a patch to fix this much easier
[18:34:16] yea looking through this, it seems all the work ends up in ContentHandler::getParserOutputForIndexing.
[18:36:44] It's duplicating the logic a bit, we probably can get rid of it there
[18:40:08] it sounds like underlying this, in the past the parsercache was big enough to hold all the live revisions, but over time it no longer fits?
[18:41:25] ebernhardson: tldr yes but there are a lot of complexities with it, e.g. fragmentation has increased
[18:41:45] this is particularly a problem on commons
[18:42:15] where there are 100M files, each getting a PC entry, plus one per language anyone visits them in (basically doubling the number of entries)
[18:43:07] ouch, yea fragmentation will make things much bigger.
I suppose i'm just wondering since we are also thinking about redesigning the way cirrus writes data; seems an important limitation to keep in mind :)
[18:43:09] and to me, storing the html of every page in the wikis seems wasteful
[18:43:25] because lots of them won't ever be visited
[18:43:38] We visit every page every 8 weeks for a rendering pass
[18:43:50] have for years, but it's been turned off recently because the infrastructure stopped keeping up
[18:44:21] by visit I mean humans
[18:44:48] and for most pages it's fast enough to reparse
[18:45:00] (TTL of PC is a month anyway)
[18:45:05] ahh, yes there are a lot of ancient pages we hold and index that are never visited, particularly outside the main namespaces
[18:45:35] in botpedias for example
[18:45:49] 4 million insects in cebwiki, I doubt they'll ever get a human view
[18:45:56] lol :)
[18:47:37] in terms of actual code here, I can only think of hacky ways since the ContentHandler interface changes are complex
[18:47:55] like still taking in the parser cache, but treating it as a boolean
[18:48:00] the biggest reason is commons tbh, it has 100M files of which maybe 1M get a human visit, the rest are from cirrus search jobs
[18:48:30] see T285993
[18:48:30] T285993: [SPIKE] Estimate growth in demand for Parser Cache storage - https://phabricator.wikimedia.org/T285993
[18:49:04] back to the issue at hand, let me see if I can get it done in content handler
[18:54:18] ebernhardson: what do you think of https://gerrit.wikimedia.org/r/c/mediawiki/core/+/766183 ?
[18:55:22] Amir1: should it also avoid the first $cache->get, delegating everything to ParserOutputAccess?
[18:55:55] hmm, yeah
[18:56:17] that would make the $cache argument fully useless
[18:56:25] i'm wondering when we actually set the cache differently, checking
[18:56:48] only in one test as far as I can see
[18:57:10] yea, i suspect the only purpose was to simplify testing. I think it would be safe to document the argument as deprecated?
[18:57:31] i dunno... seems awkward
[18:58:52] yeah, let's do that
[18:59:09] yea
[19:01:08] stupid question, how do you deprecate an argument?
[19:01:41] Amir1: i'm not sure either :) looking
[19:06:57] Amir1: yea, not really any specific guidance. I'd guess since this is already nullable, soft deprecation is documentation; i'm not really sure of a clean way to implement hard deprecation. We can test for non-null but that doesn't guarantee we were invoked with 1 arg instead of 2
[19:07:33] yeah
[19:08:25] let's go with soft for now
[19:08:28] sure
[19:11:10] Amir1: huh, i guess it's easy: https://3v4l.org/8JlZO
[19:12:15] but can still wait
[19:13:08] yeah, we need to make patches against Wikibase and other places
[19:13:14] patch lgtm, will see what ci thinks
[19:13:21] thanks
[19:13:38] fingers crossed that will solve the issue
[19:14:25] I used that class before to reduce duplicate parses and I can confirm it works like this, but mw is full of surprises
[19:14:31] it seems plausible. If it's any consolation, all these parses were already happening, just in different processes. It's inefficient for the current deployment, but overall work done should be ~same as before those logs spiked
[19:15:15] yeah, my biggest concern is that with these logs I can't really see the actual duplicate parses that I'm trying to reduce
[19:15:23] makes sense
[19:15:28] it's a good opportunity to optimize it as well :D
[19:16:05] indeed :)
[19:24:11] ebernhardson: I'm thinking of backporting it on Monday, does that make sense to you?
[19:24:31] and reducing it to high
[19:24:42] Amir1: yes that seems reasonable to me
[19:25:03] Thanks!
[19:25:50] lunch, back in ~1 hr
[20:27:35] back
[21:03:42] Going to a teacher conference, back in ~1 hr
[22:10:58] back
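On "how do you deprecate an argument": the patch under discussion is PHP (the 3v4l link shows a PHP trick), but the soft-deprecation idea, keep accepting the argument while warning any caller who still passes it, can be sketched language-agnostically. A Python analogue, with invented names; the sentinel default mirrors the chat's point that a plain nullable default can't distinguish "called with 1 arg" from "called with an explicit null":

```python
import warnings

# Sketch of soft-deprecating an optional argument: the function keeps
# working, but passing the old argument emits a DeprecationWarning.
# Function and argument names are illustrative, not from the actual patch.

_UNSET = object()  # sentinel: distinguishes "not passed" from "passed None"

def get_parser_output_for_indexing(page, cache=_UNSET):
    if cache is not _UNSET:
        warnings.warn(
            "the 'cache' argument is deprecated and ignored",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not here
        )
    # Real work would delegate to the shared parse-access service here.
    return f"output-for-{page}"
```

Callers that already dropped the argument see no change; stragglers get a warning they can grep for, which is what makes follow-up patches against Wikibase and other callers findable before any hard removal.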