[07:34:43] o/
[08:35:39] o/
[08:42:05] cirrus_import_indexes succeeded and unblocked downstream dags, which all succeeded
[08:46:51] dcausse: so that means we addressed all the known issues with the airflow-search migration to k8s?
[08:48:34] inflatador: we expect most of the traffic to go to the main graph. Not sure where the latest analysis is, but the queries that need access to scholarly articles (either directly or through federation) are minimal, a few % of all queries. dcausse might have a more precise number.
[08:49:24] gehel: no, we're still blocked on T383430 to re-enable the data cleanups, and we have some monthly dags that have not run yet, so not 100%, but pretty confident that the main issue is fixed
[08:49:24] T383430: Use the KubernetesPodOperator for tasks that require access to refine python scripts - https://phabricator.wikimedia.org/T383430
[08:49:54] that's still a good step forward!
[08:50:14] indeed!
[09:11:10] gehel dcausse I can help with T383430 when we are ready to use the k8s operator. I worked on the airflow-dags data retention implementation last December.
[09:16:22] gmodena: thanks!
[09:17:15] it will help with migrating the analytics instance too; the retention code is shared.
[10:45:13] dcausse shall we restart mjolnir_weekly?
[10:47:15] right now I'm working on exposing an easy_query label to xgboost, and it's tricky. I don't think I'll get to do large-scale experiments today. Might as well burn some CPU cycles and catch up with the MLR training schedule.
[10:50:03] gmodena: ack, let's see how it behaves
[10:50:54] dcausse ok. I merged the skein logging MR
[10:51:07] let me triple-check it has been deployed before re-running
[10:51:16] oops
[10:54:27] eh. We still have to manually run scap on airflow-search, right?
[10:54:58] gmodena: I was expecting a merge to do the deployment for us?
[10:55:08] no new artifacts were added
[10:55:27] but the code tab does not show the libs, so it's hard to tell if it's deployed...
[10:55:32] I thought so too
[10:55:35] looking at the pods
[10:56:01] should have been restarted after you merged
[10:56:33] hm, perhaps not
[10:56:38] I see a git-sync pod
[10:57:03] perhaps the code is mounted somewhere
[10:57:38] I see {"logger":"","ts":"2025-01-15 09:08:50.719454","caller":{"file":"main.go","line":1728},"level":0,"msg":"updated successfully","ref":"main","remote":"6601d1a61d234c5fe173fde2e823b101a78f2a59","syncCount":22}
[10:58:24] ack
[10:58:30] corresponds to https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/commit/6601d1a61d234c5fe173fde2e823b101a78f2a59
[10:58:32] looks like the right ref
[10:58:36] so hopefully ok
[10:59:47] we'll find out if the erroring code path is triggered :)
[11:00:12] sure
[11:00:23] feature collection started, visible at https://grafana-rw.wikimedia.org/d/000000616/elasticsearch-mjolnir-msearch?orgId=1&refresh=5m
[11:01:03] lunch
[11:01:45] and we have only one instance of mjolnir_weekly__feature_vectors-norm_query-20180215-query_explorer__20250103 running on YARN.
[11:01:57] lunch++
[13:16:38] dcausse looks like https://phabricator.wikimedia.org/T279621 is complete. Maybe time to plan moving swift -> ceph for the flink checkpoint store?
[13:20:46] gmodena: I've lost track of this, but was it originally called misc object storage?
[13:21:40] but generally +1 to move away from thanos and use a more "appropriate" solution
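A minimal sketch of what the T383430 change discussed above could look like, assuming Airflow 2.x with the cncf.kubernetes provider installed; the DAG id, namespace, image, and module path below are hypothetical illustrations, not the actual airflow-dags code:

    from datetime import datetime

    from airflow import DAG
    # On older provider releases the import path is
    # airflow.providers.cncf.kubernetes.operators.kubernetes_pod
    from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

    with DAG(
        dag_id="refine_cleanup_example",  # hypothetical DAG id
        start_date=datetime(2025, 1, 1),
        schedule="@weekly",
        catchup=False,
    ):
        # Run the refine python scripts inside a container image that ships
        # them, instead of relying on them being present on the workers.
        KubernetesPodOperator(
            task_id="refine_cleanup",
            name="refine-cleanup",
            namespace="airflow-search",  # hypothetical namespace
            image="docker-registry.example.org/refine:latest",  # hypothetical image
            cmds=["python3"],
            arguments=["-m", "refine.cleanup"],  # hypothetical entry point
            get_logs=True,
        )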
[13:23:34] That is correct. The MOSS (misc object storage service) project was renamed 'apus' (https://en.wikipedia.org/w/index.php?title=Apus_apus&redirect=no) - I think it's a sensible place for flink checkpoints, if you need multi-dc support.
[13:23:35] yes that's it: renamed "Set up Misc Object Storage Service (moss)" to "Set up new S3-level replicated storage cluster "apus"".
[13:24:02] yep
[13:24:08] Speak to Em.peror if you would like a user and a bucket, I believe.
[13:24:36] Otherwise known as Matthew Vernon on the Data Persistence team.
[13:25:12] sure, curious to know if flink checkpointing is in scope for this though
[13:30:09] Yeah. You might also check what the default bucket replication policy is like. You might find that eqiad checkpoints replicate to codfw and vice versa, which might not be what you want.
[13:35:50] sure, we need to put some thought into it. IIRC thanos is active/active with cross-replication, but flink@eqiad writes to a different bucket than flink@codfw so that they don't pollute each other
[14:11:57] dcausse was wondering which graph type is expected to have more usage? Guessing main over scholarly? If there are any docs on that, LMK
[14:26:44] o/
[14:37:42] inflatador: usage as in "query rate"? If yes, then yes: main is going to support most of the wdqs traffic
[14:46:51] in terms of writes and space usage this might vary a lot in the future, but I think they're pretty equivalent as of today
[14:46:54] https://grafana-rw.wikimedia.org/d/000000234/kafka-by-topic?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-kafka_cluster=main-eqiad&var-kafka_broker=All&var-topic=eqiad.rdf-streaming-updater.mutation-main&var-topic=eqiad.rdf-streaming-updater.mutation-scholarly&viewPanel=35
[14:51:49] Thanks! I just wondered which one would use more resources after the split. Mainly thinking about the shape of our hardware, since we will be overprovisioned in terms of compute power but possibly underprovisioned in number of servers
[15:35:16] dcausse looks like there is a chart after all: https://github.com/wbstack/charts/tree/main/charts/queryservice
[15:35:42] t-arrow was indeed the correct person to ask :)
[15:59:23] nice!
[16:07:06] Meet is not cooperating, sorry I'm late
[17:04:23] workout, back in ~40
[18:00:18] sorry, been back a while
[18:32:44] Trey314159 dcausse Sorry if I left a bit abruptly - I lost track of time and had to catch a ferry home =). Thanks for the brainstorming and suggestions today!
[19:00:27] I did get wdqs up and running in helm... next step is to see if I can get it to reload categories
[19:33:40] gmodena: It's all very interesting stuff to think and talk about!
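Coming back to the per-DC checkpoint buckets discussed above, a minimal PyFlink sketch, assuming a recent PyFlink (1.15+); the bucket name is hypothetical, and the S3 endpoint and credentials would normally come from flink-conf.yaml (s3.endpoint, s3.access-key, s3.secret-key) rather than application code:

    from pyflink.datastream import StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()

    # Checkpoint every 30 seconds (the interval is in milliseconds).
    env.enable_checkpointing(30_000)

    # Give each datacenter's job its own bucket so flink@eqiad and
    # flink@codfw don't pollute each other, even if the store itself
    # replicates cross-DC.
    datacenter = "eqiad"  # or "codfw"
    env.get_checkpoint_config().set_checkpoint_storage_dir(
        f"s3://rdf-streaming-updater-{datacenter}/checkpoints"  # hypothetical bucket
    )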