[08:24:30] o/ dcausse: I am about to create this week’s report. Would have anything you want to add to https://etherpad.wikimedia.org/p/search-standup ?
[08:25:14] pfischer: o/, I'll add a couple lines, sorry for that
[08:30:54] dcausse: Thanks, no worries, I hope the work board fills in the gaps. ;-)
[09:58:36] heading out
[13:17:31] \o
[14:07:15] * cormacparle waves
[14:07:48] afaik there is no way to search for articles by `page_prop` - is that right?
[14:08:24] like - give me all articles in `NS_FILE` with `page_prop:x=y`?
[14:12:54] cormacparle: hmm, i don't think we index those but we probably could
[14:13:42] generally you can attach action=cirrusdump to any page, like https://en.wikipedia.org/wiki/Main_Page?action=cirrusdump, and see what is indexed, which suggests what might be possible to find
[14:23:26] yeah this one https://en.wikipedia.org/wiki/Wexford has the page_prop `page_image_free = Wexford_-_aerial_-_2024-08-31_04.jpg` but that's not in the CirrusDump
[14:25:35] if you make a ticket it shouldn't be too hard, but it does take ~16 weeks iirc to get the initial data load in. Then we also have to answer questions like how the data should be analyzed to be searchable, plain exact match vs tokens or other options
[14:26:01] ok cool, just exploring options really atm but good to know it's possible
[14:39:10] heading in to my office, back in 30
[15:35:16] love these errors from gitlab ci: fatal: unable to access 'https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags.git/': Could not resolve host: gitlab.wikimedia.org
[15:35:21] i think it's saying, try again in 5 minutes :P
[15:44:51] Or, "it's Friday, what do you want from me?" ;P
[15:52:11] :)
[16:45:30] 7 more servers to go in CODFW...
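
Note on the `action=cirrusdump` tip from 14:13:42: a rough Python sketch of that check, for anyone who wants to script it. The helper names (`cirrusdump_url`, `indexed_fields`, `fetch_dump`) and the User-Agent string are made up for illustration. The sketch also assumes the dump is a JSON list of documents that each carry a `_source` object, which is how the output appears to be shaped but is not confirmed in the log.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse
from urllib.request import Request, urlopen


def cirrusdump_url(page_url: str) -> str:
    """Return page_url with action=cirrusdump set in its query string."""
    parts = urlparse(page_url)
    # Drop any existing action parameter, keep everything else.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "action"]
    query.append(("action", "cirrusdump"))
    return urlunparse(parts._replace(query=urlencode(query)))


def indexed_fields(dump) -> set:
    """Collect top-level field names across all documents in a dump.

    Assumption: `dump` is a list of objects, each with a `_source` dict.
    """
    fields = set()
    for doc in dump:
        fields.update(doc.get("_source", {}).keys())
    return fields


def fetch_dump(page_url: str):
    """Fetch and decode the dump for a page (hypothetical helper, does network I/O)."""
    req = Request(
        cirrusdump_url(page_url),
        headers={"User-Agent": "cirrusdump-sketch/0.1 (example)"},  # placeholder UA
    )
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Usage would be something like `"page_image_free" in indexed_fields(fetch_dump("https://en.wikipedia.org/wiki/Wexford"))`, which per the 14:23:26 message should currently come back false since that page_prop is not in the dump.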