[07:19:07] errand
[10:11:22] Lunch
[10:32:00] lunch
[10:37:58] pfischer, dcausse: I've forwarded you the next WDQS workshop for this afternoon. Not many people confirmed, we might cancel it. Only join if you have time and interest.
[11:50:43] o/ Don't suppose any of you kind people would have some capacity to do some pairing with WMDE on wikibase.cloud in the near future? We've been going through an ES outage since Monday night but making very little headway in recovering it.
[12:39:29] tarrow: I think some of us can find some time, I forwarded you an invite to a meeting this afternoon, if you're available it might be a good time to explain the current status
[12:40:12] dcausse: sounds great! I'll see you there :)
[13:03:54] o/
[13:24:31] import_cirrus_indexes_weekly is considered "running" from airflow but last log I can see indicates that it failed
[13:30:13] ah indeed it's running under application_1678266962370_104209 it's me not being used to the new UI, it's visible in another screen
[13:30:37] it's a bit of a maze to find this log tbh :(
[14:26:34] \o
[14:27:01] dcausse: yes, the skein indirection makes it a bit of a maze....there is a skein launcher of the spark app, you generally have to follow two levels of app id's for python now
[14:27:22] oh
[14:28:28] i think it was done because the conda .zip has to be unpacked for spark-submit, even though it's using the cluster deploy mode. a bit of a half-feature of spark i guess
[14:28:52] plausibly the zip's could have been pulled down to the airflow instance and unzip'd, but this works too
[14:35:33] hmm, the output suggests my change of the hive conf didn't take...maybe that can't be set from the spark conf command line?
[14:38:23] yea, stackoverflow says hive properties have to be prefixed `spark.hadoop.`
[14:39:01] that was 2.4, but plausibly same here. annoyingly not seeing that in the spark docs (yet, could be there somewhere)
[14:39:17] w
[14:42:38] If I said elastic search envoy proxy issues, would this spark any memories of pain for anyone?
[14:43:09] addshore: hmm, not really. envoy has been pretty painless for us and solved issues with cross-dc connection setup latency
[14:45:36] dcausse: i think https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/333 is what we will need to get the cirrus index import working
[14:46:01] doh, almost. should have been spark.hive not spark.hadoop.hive
[14:49:24] ebernhardson: thanks :)
[15:00:41] ebernhardson: thanks!
[15:02:16] are we having wednesday mtg or design readout? Sorry I'm confused ;(
[15:53:25] hi folks!
[15:54:09] my team is working on deprecating the mediawiki.revision-score stream, since it is entangled with how ORES works and we'd like to deprecate it in favor of Lift Wing (during the next months)
[15:54:26] we are aware that you use it, so our "migration plan" is to do the following:
[15:54:38] 1) create mediawiki.revision_score_drafttopic
[15:55:10] 2) create another stream to collect articletopic scores (outlink model) - the name etc.. is still to be decided
[15:55:40] 3) Help you to migrate your codebase to the new streams (without relying anymore on the revision-score one)
[15:55:56] 1) and 2) include importing the streams in hdfs/hive of course
[15:56:14] is it something that we can work together during the next quarter?
[15:56:32] I can open a task with some details so we can sync
[15:56:36] lemme know :)
[15:56:51] cc: dcausse, ebernhardson
[15:59:30] elukey: ok, we have T328276 for the outlink migration, we need one for drafttopic I guess
[15:59:30] T328276: Add outlink topic model predictions to CirrusSearch indices - https://phabricator.wikimedia.org/T328276
[16:01:11] dcausse: thanks! I have T328576, mediawiki.revision_score_drafttopic is already working
[16:01:11] T328576: Implement new mediawiki.revision-score streams with Lift Wing - https://phabricator.wikimedia.org/T328576
[16:01:35] it is based on the new page_change stream, not on revision-create (Andrew asked us to think about switching to the new stream etc..)
[16:01:49] and it produces revision-score events
[16:01:52] elukey: do you have the new streams ready or not yet?
[16:02:06] dcausse: only the drafttopic one, outlink is still in the making
[16:02:15] ok thanks!
[16:02:28] for the moment we are not producing all the scores that revision-score contained
[16:02:31] I'll file a task for drafttopic then and we might start with this
[16:02:58] IIUC you don't need other streams beside outlink and drafttopic right?
[16:03:19] I mean it takes no time to add more, but so far we don't see the point of replicating the whole revision-score
[16:03:25] (so adding revision_score_goodfaith, etc..)
[16:03:36] we'll add streams as people need them basically
[16:03:39] more manageable
[16:04:14] elukey: correct only outlink and drafttopic for now
[16:04:26] super <3, thanks for the help
[16:06:41] workout, back in ~40
[16:39:51] dcausse: if I DM'd you a cluster state file could you take a look at it and see if "has anything suspicious" to you?
[16:44:45] tarrow: sure I can take a quick look
[17:49:28] lunch, back in ~40
[18:18:08] back
[18:46:28] mutante asked about the difference between https://query-preview.wikidata.org/ and regular https://query.wikidata.org/ , does anyone know the use cases?
[18:48:18] hmm,
[18:49:50] inflatador: seems to be https://phabricator.wikimedia.org/T266470 . The short is that it runs against the 'test' instance of wdqs
[18:50:11] " we want to expose wdqs1009 as a test server so that our users can make sure we're not breaking anything with the new WDQS updater"
[18:50:47] yes, IIRC query-preview was put in place for the original streaming updater rollout
[18:51:01] so the ui should be the same as the regular, but the query endpoints should point to the test server
[18:54:03] looks like the latest glent deploy is now working appropriately, it's catching up to current now. I kinda wish airflow had some visual way to show that a dag is behind on its schedule, currently it only shows the 'last run' date, but no hint other than manually applying the schedule in your head if that's up to date
[18:54:21] thanks ebernhardson and ryankemper , will report back
[21:21:54] Hey all, mjolnir-kafka-bulk-daemon is logging heavily about trying to write to an invalid index name: `glent_2023-03-29T20:32:21.006804Z/20230325`
[21:22:31] Rate is about 10k msgs/sec
[21:23:05] doh. That just migrated to the new airflow2 / spark3, seems something is up with it. I'll look into it
[21:23:07] cwhite: thanks
[21:29:35] seems auto versioning was mistakenly turned on when migrating, without auto versioning it wouldn't have that timestamp in it
[21:35:10] pondering how i could make it skip the rest of that import instead of logging a few million failures...
[21:38:45] These are messages in kafka? Could reset the offset: https://wikitech.wikimedia.org/wiki/Kafka/Administration#CLI
[21:43:41] cwhite: i think it should have stopped now, i suppose i did something similar but slightly different (use kafkacat to join the consumer group for that topic, then stop the daemon so kafkacat can consume the message)
[21:44:31] yep, that did the trick. massive rate drop. thanks!
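
Note on the invalid index name reported at 21:21:54: the log does not say which rule the name broke. A minimal sketch, assuming the standard Elasticsearch index-name restrictions are what rejected it (lowercase only, no `\ / * ? " < > | , #` or space or `:`, no leading `-`, `_` or `+`, not `.` or `..`, at most 255 bytes). The function name `index_name_problems` and the name `glent_20230325` are hypothetical and used only for illustration; they are not taken from mjolnir or glent.

```python
# Sketch only: checks a candidate name against Elasticsearch's documented
# index-name restrictions. Not taken from the mjolnir or glent code.

# Characters Elasticsearch refuses in index names (includes space and ':').
FORBIDDEN_CHARS = set('\\/*?"<>|, #:')


def index_name_problems(name: str) -> list[str]:
    """Return a list of reasons the name would be rejected (empty if none)."""
    problems = []
    if name != name.lower():
        problems.append("contains uppercase characters")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append(
            "contains forbidden characters: " + " ".join(repr(c) for c in bad)
        )
    if name.startswith(("-", "_", "+")):
        problems.append("starts with '-', '_' or '+'")
    if name in (".", ".."):
        problems.append("is '.' or '..'")
    if len(name.encode("utf-8")) > 255:
        problems.append("is longer than 255 bytes")
    return problems


if __name__ == "__main__":
    # The name from the log: the ISO timestamp brings in 'T', 'Z' and ':',
    # and the name also contains a '/'.
    for reason in index_name_problems("glent_2023-03-29T20:32:21.006804Z/20230325"):
        print(reason)
    # A hypothetical name without the timestamp passes these checks.
    print(index_name_problems("glent_20230325"))
```

Under those assumptions the timestamp added by auto versioning is enough on its own to make the name invalid (uppercase letters and colons), which is consistent with the 21:29:35 diagnosis.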