[08:57:15] SUP producer@codfw is broken since 1am apparently :/
[09:09:06] OOM when exporting prometheus metrics...
[09:20:51] it's using 2000m, going to bump to 3g to unblock
[09:21:03] but we might take a look at the number of metrics we push
[09:35:50] seems to have unblocked it
[09:36:53] checking weighted_metrics... might be that one which is not properly limiting itself in case a broken producer is sending "random" tag prefixes
[09:37:55] no... still the same set of prefixes...
[10:05:40] we seem to export 3.4MB of metrics
[10:06:15] task names can be particularly long... and it's repeated over all metrics
[10:06:44] top counts are:
[10:07:19] 60 flink_taskmanager_job_task_mailboxLatencyMs
[10:07:21] 325 flink_taskmanager_job_task_operator_http_authority_path_method_request_duration_count
[10:07:23] 1950 flink_taskmanager_job_task_operator_http_authority_path_method_request_duration
[10:08:57] out of a total of 4326 for one task manager producer
[10:10:37] seems like there's nothing completely broken, unsure if we need all that granularity in the http metrics tho
[10:34:12] errand+lunch
[12:55:01] ebernhardson: both projects should now be renamed
[14:01:25] o/
[14:03:23] inflatador: when you have a moment could you deploy https://gerrit.wikimedia.org/r/c/wikidata/query/deploy/+/1091290 to wcqs nodes?
[14:04:03] I checked a few of them and they appear to be running old artifacts for both the main blazegraph service and the updater service
[14:04:20] dcausse oops, sorry we missed that yesterday. Will roll out shortly
[14:04:38] inflatador: no worries, thanks for the deploy!
[14:47:20] \o
[14:50:22] .o/
[14:53:39] o/
[14:55:27] wondering what is more pythonic between: list(map(func, list_of_stuff)) vs [func(e) for e in list_of_stuff]
[14:55:52] dcausse: i would go with the second, python has map but it's rarely used
[14:56:06] ebernhardson: sounds good, thanks!
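The map-vs-comprehension question can be illustrated with a short self-checking sketch; `shout` and `list_of_stuff` stand in for the unspecified `func` and list from the chat:

```python
def shout(s: str) -> str:
    # Stand-in for the `func` discussed in the chat.
    return s.upper()

list_of_stuff = ["a", "b", "c"]

# Both forms build the same list; the comprehension is usually
# considered the more pythonic of the two in modern code.
via_map = list(map(shout, list_of_stuff))
via_comprehension = [shout(e) for e in list_of_stuff]

assert via_map == via_comprehension == ["A", "B", "C"]
```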
[15:45:30] til ("1") is a str, ("1",) is a tuple
[15:46:27] dcausse: yeah, ( ) works that way in python; part of it is so you can use parens just to have expressions span across linebreaks
[15:47:55] ah, hadn't thought about that use-case, I use it a lot in pyspark to break long call chains indeed
[15:50:18] my construct is basically appending an optional string to a tuple: ("1", "2") + ((optional_str,) if optional_str else ())
[16:16:37] not that anyone should ever do it, but fun fact: (1,)*True == (1,), (1,)*False == (). Meaning you could do (optional_str,)*bool(optional_str)
[16:18:41] nice! yes, seems a bit too obscure for me :) and not sure that would work for me, forgot to mention that I have to transform the optional_str with a function before
[16:20:33] in unrelated fun problems... search-highlighter doesn't want to release because nexus is complaining that we didn't upload javadoc. Although i can clearly see attach-javadocs run earlier in the build.
[16:22:48] :/
[16:23:09] ebernhardson: I vaguely remember issues with delombok and javadoc, where generated sources were not picked up by javadoc.
[16:23:31] I think that was fixed in recent versions of the parent pom
[16:23:50] gehel: hmm, i suppose i could try updating. This is using 1.75
[16:42:25] looks to have done the trick
[17:00:40] CR for updating the Search Platform DAGs to use miniforge if anyone has a chance to approve/merge https://gitlab.wikimedia.org/repos/search-platform/mjolnir/-/merge_requests/10
[17:01:11] that one's just for mjolnir, looks like we'll need discolytics as well
[17:15:39] hopefully it's a non-event. merged
[17:16:44] plugins package for opensearch should now be ready: https://gerrit.wikimedia.org/r/c/operations/software/opensearch/plugins/+/1080749
[17:17:24] {◕ ◡ ◕}
[17:18:00] oh, i should fix the changelog.
[17:24:24] (fixed)
[18:26:41] hm... can't figure out how to make mypy happy with a staticmethod that returns an instance of this same class...
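The tuple tricks discussed above can be written down as a small self-checking sketch. Note the extra parentheses around the conditional expression in `append_optional`: without them, `base + (x,) if cond else ()` parses as `(base + (x,)) if cond else ()` and would drop `base` entirely in the else branch.

```python
from typing import Optional, Tuple

# A trailing comma, not the parentheses, is what makes a tuple.
assert ("1") == "1"            # just a parenthesized string
assert ("1",) == tuple(["1"])  # a one-element tuple

def append_optional(base: Tuple[str, ...], optional_str: Optional[str]) -> Tuple[str, ...]:
    # Parenthesize the conditional so precedence doesn't discard `base`.
    return base + ((optional_str,) if optional_str else ())

assert append_optional(("1", "2"), "3") == ("1", "2", "3")
assert append_optional(("1", "2"), None) == ("1", "2")

# The "too obscure" variant: multiplying a sequence by a bool.
assert (1,) * True == (1,)
assert (1,) * False == ()
assert ("x",) * bool("x") == ("x",)
```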
typing.Self is too recent apparently
[18:31:53] dcausse: will mypy accept the class name as a string?
[18:32:09] ah, haven't tried that
[18:32:10] dcausse: or maybe `from future import annotations` would do the trick
[18:33:16] err, from __future__ import annotations
[18:36:34] poking at the stuck mjolnir job... no clue what it's doing :( It's in scala code, but the stack traces for executors have nothing interesting
[18:37:37] :/
[18:38:11] static_func(): 'type' did the trick, thanks!
[18:45:17] oh, actually that's curious. The executor thinks it finished a task, the ui shows it as status killed, and nothing in the ui is running. So something deep in spark is confused :(
[18:45:50] somehow the failed tasks didn't get rescheduled, and somehow they failed when the worker stdout claims success
[18:48:42] oh man, if only spark would fail in nice ways with error messages in obvious places: Lost task 13.0 in stage 980.0 (TID 12906) (an-worker1170.eqiad.wmnet executor 7): TaskKilled (Tasks result size has exceeded maxResultSize)
[18:50:59] ?? the task is returning results to the driver? something like df.collect()?
[18:51:54] well, it's inside a stats function (approxQuantile) and those generally bring data back to the driver. But we have spark.driver.maxResultSize=3g here already. i guess just give it more :P
[18:53:09] i suppose it must be an aggregate limit, the one task that failed was only ~250MB: Executor: Finished task 13.0 in stage 980.0 (TID 12906). 262903987 bytes result sent via BlockManager)
[18:53:57] sure
[18:55:19] i am a little surprised approxQuantile brings back so much data... somewhat curious what it looks like
[19:10:11] dinner
[19:23:07] haven't forgotten about the wcqs deploy, we'll do it by EoD
[19:29:27] lunch/dr appointment, back in ~2h
[21:28:22] back
[22:17:48] Hey! Quick question... is it possible to view _history_ of a document for a particular page?
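The string-annotation fix that worked above looks roughly like this (`Widget` is a made-up class for illustration; on Python 3.11+, `typing.Self` would also work):

```python
from __future__ import annotations  # makes the quotes below optional, too


class Widget:
    def __init__(self, name: str) -> None:
        self.name = name

    @staticmethod
    def default() -> "Widget":
        # Forward reference as a string: the class name isn't bound yet
        # while the class body is being evaluated, but mypy accepts it.
        return Widget("default")


w = Widget.default()
assert isinstance(w, Widget)
assert w.name == "default"
```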
Specifically, I'd like to know when weighted tag updates happened for it
[22:17:54] (and even more specifically, i'm curious about updates for https://az.wikipedia.org/wiki/HP-GL)
[22:23:33] ryankemper: https://gerrit.wikimedia.org/r/c/operations/software/elasticsearch/plugins/+/1092935
[22:23:55] urbanecm: hmm, we don't really do history of anything. There might be 90 days of history in the data lake of the event streams
[22:24:19] it would be a little tedious, but can probably reconstruct it from there
[22:26:06] ack, good to know. thanks ebernhardson
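The "reconstruct it from the event streams" idea above could look something like the sketch below. The event shape, field names, and values here are entirely hypothetical; the real streams in the data lake have their own schemas, and this only illustrates filtering retained events down to one page's weighted-tag updates.

```python
from datetime import datetime

# Hypothetical retained events; real schemas will differ.
events = [
    {"wiki": "azwiki", "page_title": "HP-GL", "kind": "weighted_tag",
     "dt": "2024-11-01T12:00:00+00:00"},
    {"wiki": "azwiki", "page_title": "Other", "kind": "weighted_tag",
     "dt": "2024-11-02T09:30:00+00:00"},
    {"wiki": "azwiki", "page_title": "HP-GL", "kind": "revision",
     "dt": "2024-11-03T08:15:00+00:00"},
]


def weighted_tag_history(events, wiki, title):
    """Timestamps of weighted-tag updates for one page, oldest first."""
    return sorted(
        datetime.fromisoformat(e["dt"])
        for e in events
        if e["wiki"] == wiki
        and e["page_title"] == title
        and e["kind"] == "weighted_tag"
    )


history = weighted_tag_history(events, "azwiki", "HP-GL")
assert len(history) == 1
```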