[07:41:40] o/
[07:41:45] flagging this cr from tchin https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1190
[07:44:13] o/
[07:57:48] o/
[08:25:34] pfischer, dcausse, Trey314159, ebernhardson: I'm looking for someone to present the Search Platform updates at the next DPE Staff Meeting (next Monday). Interested? Slides: https://docs.google.com/presentation/d/1oT-Ft6Dkb4wDGDMfrkZ-5npVhmEv27KM5M1TFxRFp7A/edit#slide=id.g33b637f6e95_0_30
[08:59:38] gehel: I can do that
[09:03:47] gehel: It's about that empty slide 42?
[09:40:08] pfischer: 42 and 43. I copied a few things on 43 already, but could you take it over?
[10:40:00] errand+lunch
[10:49:07] lunch !
[13:13:32] o/
[13:36:34] o/
[13:45:51] I'm super confused, I was almost sure that execution_date was equal to data_interval_start, but doing some testing with fixtures it seems to be equal to data_interval_end
[13:46:09] from https://airflow.apache.org/docs/apache-airflow/stable/faq.html#faq-what-does-execution-date-mean
[13:46:11] "Execution date or execution_date is a historical name for what is called a logical date, and also usually the start of the data interval represented by a DAG run."
[14:01:17] \o
[14:02:32] dates are still all confusing in airflow :P
[14:02:41] o/
[14:02:53] i like data_interval_start/end
[14:03:05] ebernhardson: if you have a moment, could you look at the thread in https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1186
[14:03:15] something I don't get...
[14:03:20] sure
[14:03:30] could perhaps be how fixtures are generated...
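A quick stdlib sketch of the relationship the quoted Airflow FAQ describes, for a daily schedule (the dates here are made up for illustration, not taken from the DAGs being discussed):

```python
from datetime import datetime, timedelta

# Per the FAQ: for a scheduled run, the logical date (historically
# execution_date) is the *start* of the data interval, not the end.
data_interval_start = datetime(2025, 3, 27)
data_interval_end = data_interval_start + timedelta(days=1)
logical_date = data_interval_start

print(logical_date == data_interval_start)  # True
print(logical_date == data_interval_end)    # False
```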
[14:04:37] .o/
[14:04:51] for test generation, we actually don't deal with start/end directly, airflow has a DataInterval class that represents start/end, and the Dag.iter_dagrun_infos_between returns a sequence of DataInterval's
[14:05:41] well, it returns DagRunInfo's and we extract the data_interval from them
[14:06:19] i suspect we are still using the execution_date mostly from not converting when switching from airflow 1, but not 100% sure
[14:06:20] but in tests execution_date equals data_interval_end
[14:06:42] but should be data_interval_start IIUC
[14:06:53] hmm
[14:07:34] we probably don't care since we're not supposed to use execution_date for anything, but still very confusing
[14:13:09] testing locally, if i replace `execution_date` with `data_interval_start`, indeed the date changes :S
[14:13:47] I see mocker.patch.object(TaskInstance, "execution_date", mock.PropertyMock(return_value=data_interval.end))
[14:14:07] conftest.py:302
[14:14:30] ahh, that would do it
[14:14:50] About T388213: I think we discussed adding a clear signal in the payload itself. Should we still do that? Or is `isInternal()` sufficient?
[14:14:51] T388213: Make it clear that CirrusDoc is not a stable format. - https://phabricator.wikimedia.org/T388213
[14:14:52] airflow docs are fun, execution_date refers to logical_date, logical_date says "This value does not contain any semantics, but is simply a value for identification."
[14:15:11] fixing that we could see all dags still relying on this
[14:15:11] gehel: i wasn't sure, isInternal effectively only changes the output on Special:ApiSandbox
[14:15:26] trying
[14:15:37] dcausse: yea does seem potentially worthwhile, everyone should move away from that if they are using it
[14:15:40] I've re-opened for now. No emergency here, we can discuss on Monday.
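A minimal, self-contained sketch of the conftest.py fix being discussed: pointing the mocked `execution_date` property at `data_interval.start` instead of `data_interval.end`. The `TaskInstance` and `DataInterval` classes below are stdlib stand-ins for the real Airflow ones, and the dates are illustrative:

```python
from datetime import datetime
from unittest import mock


class TaskInstance:
    """Stand-in for airflow.models.TaskInstance."""
    @property
    def execution_date(self):
        return None  # the real property derives this from the dag run


class DataInterval:
    """Stand-in for Airflow's DataInterval (has .start and .end)."""
    def __init__(self, start, end):
        self.start, self.end = start, end


data_interval = DataInterval(datetime(2025, 3, 27), datetime(2025, 3, 28))

# conftest.py currently patches the property with data_interval.end;
# aligning it with the FAQ means returning data_interval.start instead.
with mock.patch.object(
    TaskInstance,
    "execution_date",
    mock.PropertyMock(return_value=data_interval.start),
):
    patched = TaskInstance().execution_date

print(patched)  # 2025-03-27 00:00:00
```

Passing a `PropertyMock` as the replacement works because it implements the descriptor protocol, so attribute access on instances goes through it just like a real property.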
[14:20:08] all red :P
[14:20:12] lol
[14:20:39] but I'm sure we use it for something else (probably the dag run id or something)
[14:20:47] ya a grep shows it all over our search dags :(
[14:21:36] based on the docs about how it has "no semantics" i suppose we should all migrate away
[14:22:06] clearly it does have semantics, but they don't want to put any guarantees there to get people to move to data_interval_{start,end}
[14:22:24] yes
[14:22:33] {{ ds }} has the same problem
[14:22:44] yes it's deprecated as well
[14:23:04] til the { date | ds } formatting
[14:23:10] which is nice
[14:24:16] hm... not sure what order would be best, remove the use of execution_date first or fix the test to align the fixtures with reality
[14:25:03] hmm, i kinda like the ability to change the test to check where things are wrong. But i guess you could just as well change the test to an arbitrary value and still find all the places it's used
[14:25:28] i suppose make it correct, and ping data-engineering-collab?
[14:26:21] by "make it correct" you mean fixing conftest.py to set execution_date to data_interval.start?
[14:26:26] yea
[14:26:29] ok
[14:29:47] heading to cowork space, back in ~30
[14:47:57] Weekly status: https://wikitech.wikimedia.org/wiki/Search_Platform/Weekly_Updates/2025-03-28
[15:12:15] The Samsung TV at this coworking space has convinced me never to buy a Samsung
[15:19:11] lol
[15:19:36] i certainly have a love/hate relationship with modern electronics
[15:26:34] Took me 10 minutes to get it to detect a connection on its HDMI port... it hides the ports if it doesn't see them... you get about 3 seconds until it flips back to its built-in streaming crap... (old man grumbling continues)
[15:30:08] OK, the new plugins pkg is published. I finally wrote a playbook to automate it, so it should be faster next time
[15:32:17] cool!
[15:34:25] thanks!
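For reference on the `{{ ds }}` deprecation above: `ds` is just the logical date formatted as YYYY-MM-DD, which is why `{{ data_interval_start | ds }}` is the usual drop-in replacement in templated fields. A stdlib sketch of what the `ds` Jinja filter produces (the date is illustrative):

```python
from datetime import datetime

data_interval_start = datetime(2025, 3, 28)

# Equivalent of Airflow's `ds` Jinja filter: format as YYYY-MM-DD.
ds = data_interval_start.strftime("%Y-%m-%d")
print(ds)  # 2025-03-28
```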
[15:35:27] will roll a restart shortly
[15:36:53] i've wondered a bit if we could somehow move the repo to gitlab and let gitlab build and upload via CI, but i'm not really sure what all is involved on the upload side
[15:37:46] having the playbook automation is probably a good middle ground
[15:38:47] inflatador: that reminds me, on the opensearch instances we have the opensearch-madvise package, but in the git repo i still see elasticsearch-madvise, is there a new repo?
[15:42:32] yeah, I was hoping to get it into CI eventually... last time I checked it didn't really fit our workflow
[15:42:48] the opensearch-madvise is in https://apt-browser.toolforge.org/bullseye-wikimedia/component/opensearch13/
[15:44:10] inflatador: where is the source package that was built from though? I made a patch yesterday for operations/software/elasticsearch/madvise, but i'm guessing now that's the wrong place
[15:44:19] oh i should have looked closer, i see https://gitlab.wikimedia.org/repos/search-platform/opensearch-madvise
[15:44:25] will abandon the gerrit patch and put it over here
[15:46:51] i suppose in theory the patch isn't 100% needed, looks like this has the new path hardcoded, but seems better to make it an argument
[15:47:43] Happy to roll a new package if I need to
[16:22:43] this might finally become unnecessary, someday soon(ish): https://github.com/apache/lucene/pull/13196
[16:23:04] listed as lucene 9.11.0
[16:23:21] opensearch 1.3 is still 8.10.1 though
[16:47:10] Looks like I need to add the new relforge hosts to Puppet before I can roll-restart...
[16:48:40] * ebernhardson doesn't understand why the cirrussearch-opensearch-image build is failing :S https://gitlab.wikimedia.org/repos/search-platform/cirrussearch-opensearch-image/-/jobs/473567
[16:49:08] unless my reading comprehension is failing (not unheard of), it only really says: error: failed to solve: exit code: 2
[16:49:57] :/
[16:50:27] seems to be very early?
[16:51:05] I haven't tried installing the new package yet, let me try and rule that out
[16:51:23] yea, very early. In a normal build the next two log lines would be `[internal] load .dockerignore` and `[internal] load build context`
[16:51:54] inflatador: this is failing long before that, also the patch to update the reference to the package built and ran as expected
[16:51:55] oh it's publishing the image
[16:52:13] yes this is releasing the new tag after merging
[16:52:41] the merge commit passed as expected: https://gitlab.wikimedia.org/repos/search-platform/cirrussearch-opensearch-image/-/jobs/473537
[16:53:15] i suppose they are running on different runners, this runs on *.eqiad.wmnet instances, where the regular build ran on gitlab-runner-memopt-*
[16:54:17] i guess the others ran on rented cloud instances, and the tag build runs in prod
[16:59:36] not seeing many others using blubber 1.1.0
[16:59:55] seeing mostly 1.0.1 and 1.0.0
[17:00:01] no clue if related
[17:00:13] The new package installs OK in relforge, FWIW
[17:00:43] yea this looks more related to the build env, it doesn't make it to the point where it even loads our dockerfile
[17:01:13] we could try downgrading and see
[17:01:47] yea can't hurt, checking if i can make it run on the prod runners without making a new tag, although making a new tag is probably fine
[17:29:43] heading out
[18:54:23] Hiking, back in 2 hours
[20:06:13] rolling operation is failing on relforge, I think it wants row/rack attributes. I don't think we've ever set them for relforge, but I must be wrong about that since we've def used the cookbook before