[07:36:53] o/
[08:32:37] o/
[09:35:22] dcausse no rush: do you maybe have some swift magic (and credentials :D) to fetch the xgboost models uploaded by mjolnir?
[09:48:37] gmodena: looking
[10:44:16] gmodena: I believe it's in 'sudo -u analytics-search kerberos-run-command analytics-search hdfs dfs -text /user/analytics/swift_auth_analytics_admin.env'
[10:46:54] errand+lunch
[10:48:40] ignore the kerberos-run-command, I think you can read this file with your own kerberos session
[10:49:44] dcausse ack. checking
[10:51:43] dcausse yep, works without sudo. fwiw I don't think I'm part of `search-analytics` anyway (modulo inherited membership)
[11:04:21] Ah! we also store the models in HDFS
[14:06:53] yes I think the swift part is only helping when publishing to elastic
[14:10:14] the swift copy is for clients that can't access HDFS directly yeah
[14:15:47] o/
[14:22:39] dcausse cdanis ack
[14:23:04] o/
[14:30:10] dcausse we are looking at migrating the Airflow-search instance to Kubernetes. Is now a good time, or do we need to schedule a maintenance window? Estimated downtime is 1-2 hours
[14:30:31] inflatador: looking
[14:30:38] https://airflow-search.wikimedia.org/dagrun/list/+
[14:30:39] https://airflow-search.wikimedia.org/dagrun/list/
[14:31:49] only one task is currently running https://airflow-search.wikimedia.org/taskinstance/list/?_flt_0_state=running#
[14:32:05] inflatador: I'd say yes
[14:33:14] dcausse yes, OK to go ahead or yes we should schedule a window?
[14:33:27] inflatador: ok to go ahead :)
[14:33:36] dcausse ACK, will give you a heads-up when we're starting
[14:33:42] ack
[14:58:58] dcausse thanks for T383333!
[14:58:59] T383333: Add gmodena to analytics-search-users - https://phabricator.wikimedia.org/T383333
[15:07:23] yw!
[16:48:57] For those not in the Slack thread, the airflow-search instance is now fully migrated to k8s
[16:51:31] dcausse for the categories stuff, do we need to apply all daily dumps when reloading, or just the weekly dump + one daily dump? Based on https://gerrit.wikimedia.org/r/plugins/gitiles/wikidata/query/rdf/+/refs/heads/master/dist/src/script/loadCategoryDaily.sh#20 I'm assuming only one daily dump?
[16:56:56] inflatador: I think loadCategoryDaily.sh just loads one day, so say the last weekly dump available is from monday and we run on friday, it has to load the weekly from monday + the dailies for tue, wed, thur and fry
[16:57:34] s/fry/fri
[17:02:42] dcausse ACK. But based on my reading of the current script, it only seems to load the latest daily? Or am I missing something
[17:16:04] lunch, back in ~1h
[17:30:16] inflatador: yes, loadCategoryDaily.sh loads only one day, but since it's supposed to run every day it's ok
[17:30:57] I have no clue if there are mechanisms to catch cases where a host is down for several days
[17:32:05] or even if the categories reload cookbook is taking care of populating the days after importing the weekly dump
[17:33:09] but tbh I haven't looked at those scripts in a while and from my memory they're pretty much a big mess
[17:41:11] sigh... with superset you not only need perms on the dashboard to edit it but on every single chart used by it...
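As an aside on the loadCategoryDaily.sh question above: the script itself only handles a single day, so the catch-up behaviour dcausse describes (last weekly dump plus every daily dump since it) has to come from whatever drives the reload. A rough sketch of that reasoning only, with made-up function and dump names, not the actual cookbook or script logic:

```python
from datetime import date, timedelta

def dumps_to_load(last_weekly: date, today: date) -> list[str]:
    """Hypothetical helper: list the dumps a full reload would need, i.e. the
    last weekly dump plus one daily dump per day since it, in order."""
    dailies = [
        f"daily {last_weekly + timedelta(days=n):%Y-%m-%d}"
        for n in range(1, (today - last_weekly).days + 1)
    ]
    return [f"weekly {last_weekly:%Y-%m-%d}"] + dailies

# e.g. last weekly from Monday, reload run on Friday:
# -> weekly (mon) + dailies for tue, wed, thu, fri
print(dumps_to_load(date(2025, 1, 6), date(2025, 1, 10)))
```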
[18:00:04] There is a failing DAG on the airflow-search instance. Have you spotted it? https://airflow-search.wikimedia.org/dags/query_clicks_hourly/grid?dag_run_id=scheduled__2025-01-09T12%3A00%3A00%2B00%3A00&task_id=transform_hourly&base_date=2025-01-09T14%3A00%3A00%2B0000&tab=logs
[18:00:22] btullis: looking
[18:00:50] It looks like it's trying to call the `hive` CLI directly, which we don't currently have in our Airflow docker image.
[18:01:15] yes
[18:01:24] hm...
[18:01:58] looking at how others are running hql
[18:02:31] Bother. Maybe we will have to add it here? https://gitlab.wikimedia.org/repos/data-engineering/airflow/-/blob/main/blubber.yaml?ref_type=heads#L100
[18:04:43] sigh... only search seems to be using the HiveOperator
[18:05:05] used in many places for us and might take time to migrate to something else...
[18:06:08] OK. I'm in a meeting right now. Maybe we could roll back, or add the hive CLI quickly tomorrow.
[18:06:47] I see SparkSqlOperator being used
[18:21:28] but we rely on airflow templating and apparently that's not supported with this operator... :(
[18:22:48] ah no, it might work if the sql is passed directly
[18:25:41] back
[18:30:14] dcausse btullis I can get started on a CR to add the hive package(s) if y'all like. Guessing it would be these packages: https://phabricator.wikimedia.org/P71951 ?
[18:30:59] inflatador: no clue what's required to just have the hive CLI
[18:31:15] it's part of the `hive` package
[18:31:20] sending a quick patch to use something else, if that works maybe we can move away from hive
[18:32:06] OK, let me know... happy to cut a new image with a few more pkgs if that's easier
[18:32:53] inflatador: Let's see if the approach by dcausse works first, if we have time. I know that we prefer not to use the hive CLI, if it's possible.
[18:33:10] I think we missed that SparkSqlOperator recommendation, but now I remember Joseph telling me something about this several months ago :/
[18:35:15] btullis ACK, will wait for y'all's feedback
[18:57:17] dcausse: I have to step away in about 15 minutes. How's it looking from your end?
[18:57:53] btullis: well... still getting CI errors in airflow-dag :(
[18:59:11] btullis: if I don't get anywhere this evening I'll stop the dag and ponder tomorrow
[18:59:51] OK. Is it just the one DAG, then? When you said 'used in many places' I thought that it might be several that were affected.
[19:01:39] it's actually 2 dags, other uses are for CREATE/ALTER statements that are not run automatically
[19:02:14] so might be doable to migrate the actual uses and do the CREATE/ALTER ones later
[19:06:32] dcausse: i think our sparksqloperator supports templating?
[19:07:01] ottomata: it does with its own param yes, but I wanted to avoid rewriting much of the SQL
[19:07:02] Oh yes, these are your `_init` DAGs that are paused and only get run manually, right?
[19:07:09] btullis: yes exactly
[19:07:38] but if the sql is provided directly and not from a file, airflow might apply templating
[19:07:47] so I think it might just work for me
[19:08:03] what's not clear is all the hints we passed to hive
[19:09:45] removed some that I believe are not required, but for things like "mapred.reduce.tasks: 6" to reduce the number of partitions, I'll have to understand how to do the same with sparksql
[19:28:42] ci green, will deploy after dinner
[19:30:27] Great!
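For reference on the migration being attempted above, a minimal sketch of what swapping HiveOperator for SparkSqlOperator could look like, using the upstream Airflow providers rather than WMF's own airflow-dags wrappers. The DAG, table names and the spark.sql.shuffle.partitions value standing in for the old mapred.reduce.tasks hint are illustrative assumptions, not the real search DAG code:

```python
# Sketch only: upstream Airflow providers, hypothetical DAG/table names.
import pendulum
from airflow import DAG
from airflow.providers.apache.hive.operators.hive import HiveOperator
from airflow.providers.apache.spark.operators.spark_sql import SparkSqlOperator

# Inline, templated SQL: Airflow renders {{ ds }} because the statement is
# passed directly instead of being read from a file by the operator itself.
QUERY = """
INSERT OVERWRITE TABLE discovery.query_clicks_hourly  -- hypothetical table
SELECT * FROM discovery.query_clicks_raw WHERE dt = '{{ ds }}'
"""

with DAG(
    dag_id="query_clicks_hourly_sketch",
    start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),
    schedule="@hourly",
    catchup=False,
):
    # Old style: shells out to the hive CLI, which the k8s image doesn't ship.
    transform_hourly_hive = HiveOperator(
        task_id="transform_hourly_hive",
        hql=QUERY,
        hiveconfs={"mapred.reduce.tasks": "6"},  # hint that capped reducer count
    )

    # Candidate replacement: same SQL submitted through spark-sql instead.
    transform_hourly_spark = SparkSqlOperator(
        task_id="transform_hourly_spark",
        sql=QUERY,
        # assumed rough analogue of the old mapred.reduce.tasks hint
        conf="spark.sql.shuffle.partitions=6",
    )
```

Since the SQL is passed inline rather than from a file, Airflow should render the {{ ds }} macro for both operators, which matches the "it might work if the sql is passed directly" observation above.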
[20:11:06] bah, back ticks are removed from the query.... "csrs.`database`," -> "csrs.,"
[20:19:34] (╯°□°)╯︵ ┻━┻
[20:39:15] quick break, back in ~10
[20:39:18] err....20
[20:42:03] seems to be "spark-submit" that's substituting the back ticks...
[20:49:15] other projects have similar issues it seems: https://github.com/apache/incubator-livy/issues/415
[21:16:33] Fail to parse '2024-10-12 0:00:00' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.
[21:16:35] sweet
[21:16:47] ok giving up for today
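The last error is Spark 3's stricter datetime parser rejecting the single-digit hour in '2024-10-12 0:00:00'. A minimal sketch of the LEGACY workaround the error message itself suggests, assuming the job builds its own SparkSession (whether legacy parsing is the right fix for this data is a separate question):

```python
from pyspark.sql import SparkSession

# Sketch: opt back into pre-Spark-3 datetime parsing so strings like
# '2024-10-12 0:00:00' (single-digit hour) parse instead of raising.
spark = (
    SparkSession.builder
    .appName("query_clicks_hourly")  # hypothetical job name
    .config("spark.sql.legacy.timeParserPolicy", "LEGACY")
    .getOrCreate()
)
```

With the SparkSqlOperator sketch earlier, the same setting could presumably be passed via conf="spark.sql.legacy.timeParserPolicy=LEGACY" instead.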