[12:52:00] dcausse: is there anything more to do on T326914?
[12:52:01] T326914: Migrate the WDQS streaming updater from FlinkKafkaConsumer/Producer to KafkaSource/Sink - https://phabricator.wikimedia.org/T326914
[12:56:49] dcausse: about the Swift cleanup, do we have a follow-up task? It seems that there are some open questions on how to keep this from growing out of control again.
[12:58:33] dcausse, dr0ptp4kt: do we have anything to report on the Graph Split for this week? I don't see any activity on T347989 yet...
[12:58:33] T347989: Adapt rdf-spark-tools to split the wikidata graph based on a set of rules - https://phabricator.wikimedia.org/T347989
[13:13:45] gehel: for a status update it would be something like 'Standing up Airflow local development environment on analytics cluster and reproducing graph split queries. Next up: porting graph query approach to parallel Airflow job with splits.' Basically, Monday through Wednesday was reproducing the essential parts of the Airflow environment to populate my own dr0ptp4kt.wikibase_rdf_with_split in HDFS and getting David/Andy's query
[13:13:45] approach working in my own notebook to explore the data (no surprise, the query execution is prone to breakage, but uncapping limits makes it work). (Yesterday was meetings and prep and execution of backport/config deploys to get myself back in the game, but now back to graph split.) My target is something that's spitting out data into partitions by the end of next week, if I had to hazard a guess. As for productionizing with a new
[13:13:45] JAR and an Airflow prod deploy, I would imagine that could begin somewhat separately, starting from later in the upcoming week to earlier in the week following that. I'll get to documenting material on the task... technicality: should it be moved to the doing column? Getting the kids to the bus soon; just happened to catch the notification here and had a minute to reply.
[13:15:22] dr0ptp4kt: thanks! I'm moving the task to "in progress". In terms of update, I'll say something about getting the dev environment in place.
[13:16:28] dr0ptp4kt: good luck with the kids!
[13:17:18] inflatador: I'll be 2' late
[13:17:46] gehel ACK
[13:20:14] actually, I'm there!
[13:25:18] search-loader2002.codfw.wmnet has been failing puppet for a while and now it's sending emails to root@ about an expired certificate; is that on your radar?
[13:32:17] gehel I can see you just fine
[13:32:24] I dunno what's wrong... let me join and leave
[13:34:51] oh good, more network problems
[13:51:53] taavi that's a non-production host, will delete shortly
[13:54:31] weekly update published: https://wikitech.wikimedia.org/wiki/Search_Platform/Weekly_Updates/2023-10-13
[14:07:46] gehel patch for moving the failing search-loader hosts back to insetup, should staunch the alerts: https://gerrit.wikimedia.org/r/c/operations/puppet/+/965748
[14:08:06] I think I want to keep the hosts around a bit more, just in case they are useful for testing
[14:14:34] inflatador: lgtm
[14:27:56] OK, puppet is happy
[15:03:07] \o
[15:18:48] workout, back in ~40
[16:38:59] back
[17:33:18] lunch, back in ~30
[17:57:53] back
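To make the 13:13:45 graph-split status above a bit more concrete, here is a rough Scala/Spark sketch of a rule-based split in the spirit of rdf-spark-tools (T347989). The specifics are assumptions for illustration only: the input table name, the one-row-per-triple layout, and the single "scholarly article" rule; the actual rule set is exactly what the task is still defining.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

object GraphSplitSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("wikibase-rdf-graph-split").getOrCreate()

    // Assumed input layout: one row per triple (subject, predicate, object).
    // The source table name here is illustrative.
    val triples = spark.read.table("discovery.wikibase_rdf")

    // Toy rule: subjects typed as scholarly articles (P31 = Q13442814) go to a
    // "scholarly" split, everything else stays in "main". The real split is
    // defined by a set of rules being worked out on T347989.
    val scholarlySubjects = triples
      .filter(col("predicate") === "<http://www.wikidata.org/prop/direct/P31>" &&
        col("object") === "<http://www.wikidata.org/entity/Q13442814>")
      .select("subject")
      .distinct()
      .withColumn("split_group", lit("scholarly"))

    // Tag every triple with its split; subjects not matched by any rule
    // default to "main".
    val tagged = triples
      .join(scholarlySubjects, Seq("subject"), "left")
      .na.fill("main", Seq("split_group"))

    // Write partitioned by split group, along the lines of the
    // dr0ptp4kt.wikibase_rdf_with_split table mentioned in the update above.
    tagged.write
      .mode("overwrite")
      .partitionBy("split_group")
      .saveAsTable("dr0ptp4kt.wikibase_rdf_with_split")

    spark.stop()
  }
}
```

Writing with partitionBy("split_group") is one way to get the "spitting out data into partitions" shape the update mentions, with each split independently queryable.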
[18:06:35] took a variety of mucking about, but convinced the CI pipelines to use the new version of conda. Slightly worried, though, that this new one is going to generate larger artifacts... I guess we'll find out soon enough :)
[18:07:24] basically this ended up switching it from miniconda to the full-blown analytics environment
[18:07:31] The flink site has a few 404s ;(
[18:07:36] fun :)
[18:08:42] yeah, looks like I was reading an older blog post and they must've changed URIs around
[18:19:07] meh, the job downloaded dependencies for 19m 30s, just started to run the test, and then timed out at 20m :P
[18:26:51] doesn't look like I can change the timeout; it comes from the runner :S we have a default timeout of 1h, but the runner clamps it down
[18:45:40] switching to wmcs runners gives higher timeouts, at least something :)
[19:05:58] updating Maven to 3.9 and enabling parallel POM downloading made a huge difference, finished in <13m
[20:27:06] back
[20:56:16] hmm, updated package is 480M vs 468M before. worked ok
[21:02:14] I wonder if we could just make runners on the ganeti infra? There's a lot of compute sitting out there
[21:27:21] using wmcs seems to have worked fine, I think it's the same idea but it runs in the cloud infra
[21:27:47] although now I wonder if we're supposed to build production images in cloud... I guess I'll test it with the memopt runner now that we don't need the longer timeout afforded by wmcs
[21:28:24] because this is an artifact that gets downloaded, it doesn't really require special permissions like the trusted runners do... I dunno :)
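On T326914 from the top of the log: a minimal sketch of the FlinkKafkaConsumer/FlinkKafkaProducer to KafkaSource/KafkaSink move using Flink's FLIP-27 connector API (Scala here; the broker and topic names are placeholders, not the real WDQS updater configuration).

```scala
import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.connector.base.DeliveryGuarantee
import org.apache.flink.connector.kafka.sink.{KafkaRecordSerializationSchema, KafkaSink}
import org.apache.flink.connector.kafka.source.KafkaSource
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer
import org.apache.flink.streaming.api.scala._

object KafkaSourceSinkSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // KafkaSource replaces the deprecated FlinkKafkaConsumer.
    // Broker and topic names below are placeholders.
    val source = KafkaSource.builder[String]()
      .setBootstrapServers("kafka.example.org:9092")
      .setTopics("updater-input")
      .setGroupId("wdqs_streaming_updater")
      .setStartingOffsets(OffsetsInitializer.committedOffsets())
      .setValueOnlyDeserializer(new SimpleStringSchema())
      .build()

    val events: DataStream[String] =
      env.fromSource(source, WatermarkStrategy.noWatermarks[String](), "kafka-source")

    // KafkaSink replaces FlinkKafkaProducer; exactly-once delivery is driven
    // by Flink checkpoints plus Kafka transactions.
    val sink = KafkaSink.builder[String]()
      .setBootstrapServers("kafka.example.org:9092")
      .setRecordSerializer(
        KafkaRecordSerializationSchema.builder()
          .setTopic("updater-output")
          .setValueSerializationSchema(new SimpleStringSchema())
          .build())
      .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
      .setTransactionalIdPrefix("wdqs_streaming_updater")
      .build()

    events.sinkTo(sink)
    env.execute("kafka-source-sink-sketch")
  }
}
```

One migration gotcha worth flagging: with DeliveryGuarantee.EXACTLY_ONCE the sink requires a transactional id prefix, and the Kafka transaction.timeout.ms has to comfortably exceed the checkpoint interval (while staying within the broker's transaction.max.timeout.ms).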