[06:22:34] ended up being busy for the last several hours doing house hunting stuff (my current lease is up june 30) but I just kicked off the next data-reload on `wdqs2023` with the 10x'd buffer size (`com.bigdata.rdf.sail.bufferCapacity=1000000`). Likely won't be quite done by wednesday's meeting but we should have a good idea of how much time will be remaining then
[07:20:36] dcausse: I tried to wrap my head around the flink windowing but I don't understand why it should be more efficient processing a backfill than processing realtime events. Would we expect page_rerenders to be late that often? That's the only scenario I can think of that would improve deduplication during backfill: many page_rerenders with an event time < watermark arrive late (after the watermark) in the window
[07:20:37] operator because their processing (not emission) is delayed. Only then would a backfill deduplicate those formerly late page_rerender events.
[07:23:48] pfischer: true, backfilling in pure event-time should yield similar behavior in the number of overall deduplicated events, but the rate of deduplication seen in prometheus should be higher since it's processing more events
[07:24:10] and that is not what we saw during the backfill, the rate of dedup suddenly dropped
[07:24:57] and then increased again once the backfill was done
[07:25:17] I'm not clear on why they were considered late...
[07:28:01] perhaps one approach would be to split rev-based and non-rev-based events, merge and dedup based on event-time for rev-based events, and separately dedup based on processing time for rerenders
[07:35:18] I suspect that the watermark got increased for some reason... if a single event in the cirrus-rerender input stream is ahead of time this could explain it
[07:35:21] Hm, I checked the WatermarkStrategy we apply to our sources. We do not define a timestamp assigner.
[07:35:36] So how can flink know the event time?
[07:35:39] yes, saw that and was working on a patch
[07:35:46] it's using the kafka timestamp
[07:36:22] it's not exactly the event-time per the event platform spec but I hope it's close enough
[07:36:32] Okay, I was expecting either stream utilities or the kafka source to handle it. Good to know.
[07:37:10] but definitely something we should better control I think
[07:37:53] sadly not all streams comply with the spec, the top-level dt field should be the event-time
[07:38:50] there's also watermark alignment that seems interesting, to avoid buffering too many events when backfilling
[10:12:00] lunch
[13:17:20] o/
[14:59:52] \o
[15:00:54] was thinking, need to add new tables to hive, and part of the migration to shared airflow showed everyone else maintaining schemas from the application side, instead of having init-dags. Could probably add some minor abstraction in discolytics to define a table and columns (yaml?), and have those tables created
[15:01:57] although the init dags work fine... maybe not a worthwhile endeavor
[15:02:07] poked briefly for tools, but nothing obvious exists
[15:28:24] * ebernhardson mutters at okta ... i authed on the website directly yesterday but apparently that didn't reset the lifetime on my login session
[15:58:27] workout, back in ~40
[16:26:39] creating hive tables is not something we do that often... DP is working on a "config" service which should define some schemas, perhaps that's where it should go once ready?
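(editor's note) The "define a table and columns (yaml?)" idea above could be sketched as a small renderer from a declarative spec to Hive DDL. Everything here is hypothetical: the spec shape, the table name, and the partition columns are made-up examples, not an existing discolytics API.

```python
# Hedged sketch: render a Hive CREATE TABLE statement from a declarative
# column spec (the kind of thing a yaml file would deserialize into).
TABLE_SPEC = {
    "name": "discovery.example_table",  # hypothetical table name
    "columns": {"page_id": "bigint", "wiki": "string", "score": "double"},
    "partitioned_by": {"year": "int", "month": "int", "day": "int"},
}


def to_create_hql(spec: dict) -> str:
    """Build an idempotent CREATE TABLE statement from the spec."""
    cols = ",\n  ".join(f"`{c}` {t}" for c, t in spec["columns"].items())
    parts = ", ".join(f"`{c}` {t}" for c, t in spec.get("partitioned_by", {}).items())
    hql = f"CREATE TABLE IF NOT EXISTS {spec['name']} (\n  {cols}\n)"
    if parts:
        hql += f"\nPARTITIONED BY ({parts})"
    return hql + "\nSTORED AS PARQUET"
```

With `IF NOT EXISTS` the generated statement can run from an ordinary dag task on every schedule, which is roughly how schema-from-the-application-side avoids dedicated init-dags.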
[16:29:21] it would make sense for storing the config there, but it would still have to be provided to something that does the actual work
[16:29:55] for now probably not worthwhile, perhaps fun but i already have hql create tables, can just put them in place :)
[16:37:09] back
[17:08:08] * ebernhardson separately wonders if we should try harder to index unrenderable pages...we could probably return a doc built without the parser output props
[17:08:32] it's always hard to decide how wrong things should be represented though :P
[17:13:52] we do index lua errors already, an empty text might be ok perhaps? as long as we tag such articles properly
[17:14:02] Hi, I have 2 questions on wikimedia: 1. How do I know elasticsearch is not running? There are no modules named in LocalSettings.php
[17:14:03] 2. I cannot seem to get blazegraph to update. I keep running into 'HTTP request failed. java.util.concurrent.ExecutionException..., Connection refused' errors.
[17:14:03] Do you have any suggestions? Thanks!
[17:15:49] Guest71: howdy
[17:16:45] Guest71: I will say, getting blazegraph and elasticsearch to run is a significant endeavor. blazegraph more so than elasticsearch, but both will require significant effort on your part
[17:18:26] Guest71: as for whether elasticsearch is running, it really depends on how you installed it. If you are running a docker container, check `docker ps`. If you installed from the system package manager it might be found in `systemctl | grep elastic`
[17:19:12] Guest71: for the ExecutionException, that is curious and i'm not sure i can help much. The error is clearly from the java side of things, so it seems like blazegraph tried to make an http request to something and failed. I'm not sure what that might be though
[17:25:53] Guest71: for 2. I suspect that's when running the updater, it's either failing to contact blazegraph or failing to contact mediawiki
[17:25:54] Yes, I installed the docker.
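(editor's note) Independent of how elasticsearch was installed, the question "is anything actually listening?" can be answered with a plain TCP connect. A minimal Python sketch (host and ports are the elasticsearch defaults; adjust to your setup):

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, unresolvable host, ...
        return False


# e.g. port_open("localhost", 9200)  # elasticsearch HTTP port
#      port_open("localhost", 9300)  # elasticsearch transport port
```

A container showing up in `docker ps` only proves the process started; this check proves the port is reachable from wherever mediawiki will connect from, which is the part that usually breaks with docker networking.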
The container "wbdocker-elasticsearch-1" is running but nothing is listening on TCP ports 9200/9300
[17:28:48] I see these statements in the manual install documents
[17:28:49] wfLoadExtension( 'Elastica' );
[17:28:49] wfLoadExtension( 'CirrusSearch' );
[17:28:50] wfLoadExtension( 'WikibaseCirrusSearch' );
[17:28:50] But they are not in the LocalSettings.php, are we supposed to add them?
[17:31:06] Guest71: for LocalSettings, yes, to add those extensions you will have to edit LocalSettings.php
[17:32:25] Guest71: for the elasticsearch ports, it really depends on how the infrastructure is configured. Perhaps it registers a dns name and your mediawiki container is supposed to point at the named elasticsearch container. Or maybe you are running elastic in a container but mediawiki un-containerized, and docker needs to be informed to forward the port when starting up the elasticsearch container
[17:35:29] The docker suite contains these applications, but it doesn't include a configuration for them? So we need to load these by editing LocalSettings.php?
[17:35:45] which docker suite?
[17:35:56] do you mean https://www.mediawiki.org/wiki/Cli?
[17:35:59] Guest71: are you using https://github.com/wmde/wikibase-release-pipeline?
[17:36:14] yes
[17:37:01] we are using the release pipeline version 20, but this has been a problem since we started trying to get this to work at wmde 14
[17:37:32] not familiar with https://github.com/wmde/wikibase-release-pipeline but it has examples in https://github.com/wmde/wikibase-release-pipeline/tree/main/build/WikibaseBundle/LocalSettings.d.template - not sure how to leverage those, perhaps ask for help on their talk page at https://www.mediawiki.org/wiki/Wikibase/Docker ?
[17:39:19] going offline, have a nice rest of the week
[17:39:30] dcausse: enjoy the big break!
[17:45:43] lunch, back in time for pairing
[17:45:46] thx dcausse for the meet recording with you and peter. have a good break!
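(editor's note) Pulling the quoted statements together, a LocalSettings.php fragment for wiring up search would look roughly like this. The three wfLoadExtension lines come from the manual install docs quoted above; the `$wgCirrusSearchServers` value is an assumption for a docker network where the elasticsearch container is reachable by name, and must match your actual setup:

```php
// Load the search extensions (not enabled by default in this bundle).
wfLoadExtension( 'Elastica' );
wfLoadExtension( 'CirrusSearch' );
wfLoadExtension( 'WikibaseCirrusSearch' );

// Point CirrusSearch at elasticsearch. 'elasticsearch' here is an assumed
// docker-network hostname; use whatever resolves from the mediawiki side.
$wgCirrusSearchServers = [ 'elasticsearch' ];
```

This is a sketch of the shape of the config, not the release pipeline's own template; the LocalSettings.d.template examples linked above are the authoritative starting point.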
[18:14:00] back
[19:58:52] * ebernhardson sighs and realizes i either need to rebuild docker images or learn how to move them all from my old laptop
[20:00:24] * ebernhardson pulls a build command out of history and hopes for the best
[21:13:51] good luck!
[21:37:40] wow. i just realized the reason my bash gets slow after many hours is because `history -r` loads the full history file and appends it to the in-memory structure
[21:37:55] so after a few hundred (thousand?) invocations there are tens of millions of history items
[21:38:07] it should be `history -c; history -r`
[21:38:54] * ebernhardson feels like it couldn't have worked that way when i first set this up...
[21:48:25] hmm, looks like we have quite a few WDQS hosts not responding to prometheus pollers https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes%2Fcustom&var-module=All&orgId=1
[21:49:35] only one alert so far though, and it's cleared
[21:51:20] Will keep an eye out, but I'm guessing things are OK
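(editor's note) The fix above is the standard .bashrc pattern for sharing history across shells. A sketch of the full fragment (the exact setup in the log isn't shown, so this is an assumed-typical configuration):

```shell
# .bashrc fragment: share history across concurrent shells without the
# slowdown described above. `history -a` appends new commands to the file,
# `history -c` clears the in-memory list, and `history -r` re-reads the
# file. Without the -c, each -r appends the entire file again, so the
# in-memory list grows without bound.
shopt -s histappend
PROMPT_COMMAND='history -a; history -c; history -r'
```

Running only `history -r` from PROMPT_COMMAND is what produces the tens-of-millions-of-entries behavior: every prompt re-appends the whole file.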