[09:14:08] dcausse: when you have time, a few pom upgrades still pending: https://gerrit.wikimedia.org/r/q/upgrading+to+latest+parent+pom+status:open
[10:17:35] gehel: do you have a link to a bug for https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/853412/4/pom.xml#253 ?
[10:18:18] no, I did some googling, but could not find anything
[10:18:40] might make sense to open a bug to the resource plugin?
[10:19:30] I vaguely remember Andrew having that same problem before on that same project
[10:19:55] there was an issue with symlinks IIRC?
[10:28:21] hm I think there's some confusion
[10:28:43] parent pom says stick to 3.1.0, 3.2.0 does not seem to be able to handle symlinks
[10:29:26] and you bumped to 3.2.0 in https://gerrit.wikimedia.org/r/c/wikimedia/discovery/discovery-parent-pom/+/853253
[10:29:33] (I missed that)
[10:29:51] but 3.3.0 is perhaps available
[10:30:25] https://issues.apache.org/jira/browse/MRESOURCES-269 is the bug we were having and perhaps the one you encountered when testing?
[10:52:15] lunch
[11:23:26] dcausse: Oh, that might have been my issue! I'll check again.
[11:56:58] gehel: I encountered SLF4J warnings since bumping the parent to 1.65. Apparently flink is compiled against version 1.x of their API but we now enforce 2.x. As long as I do not provide an implementation everything works fine. However, if I declare log4j-slf4j2-impl as a test dependency, I encounter runtime exceptions.
[12:23:47] gehel: never mind, I fixed it by excluding SLF4J from flink
[12:57:48] pfischer_: or we might want to keep it back to 1.x for projects that have other dependencies.
[12:58:08] Too bad that Flink does not provide a BOM
[13:49:09] o/
[13:57:39] translatewiki.net is running 7.10.2 https://translatewiki.net/wiki/Special:Version ! :)
[14:01:47] inflatador: do you have any news on the flink-k8s operator?
[14:05:02] dcausse no, looks like gabriele pinged again a few hrs ago in https://phabricator.wikimedia.org/T320812 .
I'm still working thru https://phabricator.wikimedia.org/T321587 , made a lot of progress this wk
[14:06:43] inflatador: thanks, no objections that I share this terraform repo with them? have a meeting with them now
[14:07:11] dcausse np although it has no flink-specific stuff yet. I can join mtg too if you like
[14:09:18] inflatador: sure, sending link in pm
[16:56:52] ebernhardson: do you have objections shipping 1M entries (image suggestion) from transfer_to_es?
[16:57:35] context is T320656
[16:57:36] T320656: [L] List articles appearing in articles with image suggestions - https://phabricator.wikimedia.org/T320656
[17:15:56] dcausse: should be fine i suppose
[17:16:05] thanks!
[17:16:35] uploaded https://gerrit.wikimedia.org/r/c/wikimedia/discovery/analytics/+/855569 it's a lot of boilerplate but I wanted to keep track that we've run this somehow
[17:17:38] sure that seems sane
[17:18:01] patch looks straightforward, can merge
[17:18:10] ok shipping
[17:19:16] i've also just put up the incoming links job to gerrit, set the start date for next week to be the first run. It probably shouldn't be merged until the weekly transfer for next week has run
[17:19:26] nice!
[17:19:28] err, it will run at the end of next week
[17:19:48] next i have to decide how to modify cirrus, i suppose we should keep incoming links but disable it via config?
[17:20:04] yes I think that's a good approach
[17:20:51] surprisingly, only ~6M pages changed incoming link counts over >1wk
[17:25:18] yes... but not sure what I would have expected before you computed this number
[17:25:41] we get 10M to 12M edits/week
[17:26:16] a third of them are for wikidata tho
[17:26:41] hmm, i suppose i hadn't considered that.
I also had no clue what to expect :)
[17:40:06] Trey314159: possibly of some interest to you: https://www.wikidata.org/wiki/Wikidata:Report_a_technical_problem/WDQS_and_Search#Korean_search_method_is_inconvenient,_it's_problem_to_search
[17:41:27] MjolnirUpdateFailureRateExceedesThreshold is not me, haven't started the import yet
[17:41:33] Ooo. Interesting looking. I'll check it out
[17:41:37] :S
[17:41:50] not seeing anything weird in grafana tho
[17:42:21] oh actually there's a small bump in failures
[17:43:12] yea, looks like 119 things failed. Not sure if we have logging that says what exactly, checking
[17:47:58] Failed bulk update request: {'_index': 'gagwiki_content_1663873284', '_type': '_doc', '_id': '6503', 'status': 429, 'error': {'type': 'circuit_breaking_exception', 'reason': '[parent] Data too large, data for [indices:data/write/bulk[s]] would be [8335501470/7.7gb], which is larger than the limit of [8160437862/7.5gb], real usage: [8335437472/7.7gb], new bytes reserved: [63998/62.4kb],
[17:48:01] usages [request=0/0b, fielddata=4743104/4.5mb, in_flight_requests=63998/62.4kb, accounting=12882702/12.2mb]', 'bytes_wanted': 8335501470, 'bytes_limit': 8160437862, 'durability': 'PERMANENT'}}
[17:48:17] memory usage still too tight in some of the small clusters
[17:49:18] maybe the daemon should backoff and retry? or we figure out appropriate memory limits
[17:50:03] :/
[17:51:49] not sure I understand how to parse this error message
[17:51:51] hmm, we still use elasticsearch client 5.5.3 there, maybe an update would help. We lean into the client for actually doing these updates, can check if later client versions better handle these
[17:52:02] sure
[17:52:09] it's saying the memory usage is currently 7.7gb, and it can't allocate 64kb because that's more than the 7.5gb limit
[17:52:39] it's already over limit :|
[17:52:44] yea
[17:53:02] i suppose "can't allocate" is wrong, it's jvm and the jvm would try.
but elastic is refusing to try the request
[17:53:03] and 64kb...
[17:55:14] at the end of the day...i expect we have to just keep giving these instances more memory until they don't circuit break. Not sure what else can be done. A newer client can maybe backoff and retry, getting the updates in a little later, but the root issue remains
[17:55:53] checking if we increased these recently...i think we already tried that once
[17:58:03] hmm, no they've been at 8G heap since created in 2018
[17:58:28] maybe add 25%?
[18:01:56] +1
[18:47:44] dinner
[20:15:45] ebernhardson: you around? Any chance you could jump in https://meet.google.com/eki-rafx-cxi for a few minutes?
[20:15:52] gehel: sure, sec
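[Editor's note] The SLF4J fix mentioned at 12:23 ("excluding SLF4J from flink") would look roughly like this Maven dependency exclusion, so that the 1.x `slf4j-api` Flink pulls in transitively no longer clashes with the enforced 2.x API. The Flink artifactId and the `flink.version` property are assumptions for illustration; adjust them to the module the project actually depends on.

```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <!-- artifactId is an assumption; use the Flink module actually depended on -->
  <artifactId>flink-streaming-java</artifactId>
  <version>${flink.version}</version>
  <exclusions>
    <exclusion>
      <!-- drop Flink's transitive SLF4J 1.x API so the enforced 2.x wins -->
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```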
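[Editor's note] The backoff-and-retry idea floated at 17:49 and 17:55 could look something like the sketch below. This is not the daemon's actual code; `send_request` is a hypothetical callable standing in for one bulk update, returning a response dict with a `status` field like the bulk-item responses quoted above. It retries only on HTTP 429 (the circuit breaker's rejection status), with exponential backoff to give heap usage time to fall back under the limit.

```python
import time


def send_with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Retry a bulk update when the cluster sheds load with HTTP 429.

    send_request: hypothetical zero-arg callable that performs one bulk
    request and returns a response dict containing a 'status' field.
    """
    for attempt in range(max_retries + 1):
        response = send_request()
        if response.get('status') != 429:
            # Anything other than a circuit-breaker rejection is returned
            # to the caller as-is (including other errors).
            return response
        if attempt == max_retries:
            break
        # Exponential backoff: base_delay, 2x, 4x, ... so the parent
        # circuit breaker can observe heap usage dropping below the limit.
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError('bulk update still rejected after %d retries' % max_retries)
```

Whether this helps depends on the root cause: as noted at 17:55, if the heap is permanently too tight, retries only delay the failure, and raising the 8G heap (e.g. by the proposed 25%, to 10G) is the real fix.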