[08:16:58] I've created T404822 after a semantic search meeting yesterday.
[08:16:59] T404822: Analysis: how many search queries are using natural language vs keywords - https://phabricator.wikimedia.org/T404822
[08:17:47] ebernhardson: Is the description reasonable?
[08:18:43] My expectation is that this should be relatively easy to implement. And probably easier to do ourselves than to delegate. If this is more complex than I expect, let me know and I'll see if we can find support.
[08:24:07] If we work on relaxing AND in queries, it would make sense to create a hypothesis and attach it to WE3.1.
[08:24:54] Astuthi can help us navigate the administrative complexity, but just a hypothesis is relatively lightweight.
[08:25:08] Any volunteer to own this hypothesis?
[10:19:19] lunch
[13:19:50] o/
[13:39:17] \o
[14:13:05] ebernhardson: Debra is joining our Wednesday meeting. If you can be there that's great! Otherwise, we'll keep you updated async
[14:13:33] ebernhardson: Peter told me you might have a notebook with some data related to T404822 already...
[14:13:34] T404822: Analysis: how many search queries are using natural language vs keywords - https://phabricator.wikimedia.org/T404822
[14:14:21] gehel: first hour i'll be around, it's early so workers won't even be here yet :) But not sure on second hour
[14:14:33] good enough!
[14:15:12] gehel: for natural language queries, i don't think i have any particularly relevant notebooks. I've pondered basic things looking for the who/what/why/where/when words, but never got around to it.
[14:15:50] reading martin's summary...i would have to spend some time dissecting and understanding the definition :P
[14:16:54] for things like "contain a categorical noun phrase immediately preceded by a preposition or relative clause;" i just dunno, i guess some sort of POS tagger? It's something i would have to spend time with
[14:48:01] * pfischer can't make it to the Wednesday meeting tonight
[14:56:01] ebernhardson: my understanding of the discussion the other day is that we should find a simple heuristic, and we don't care too much about having a super precise categorization
[14:56:02] ebernhardson: I don't think we need/should use that exact definition I shared in T404822 since some of the aspects might not be straightforward to implement. probably something much simpler derived from that description will do the job for our purpose. I haven't thought deeply about this yet but would be happy to connect and brainstorm more, if needed.
[14:56:03] T404822: Analysis: how many search queries are using natural language vs keywords - https://phabricator.wikimedia.org/T404822
[15:01:48] mgerlach: that makes sense, from my side something like tokenizing / stemming (normalizing) and looking for particular words in historical queries is reasonably easy, but we haven't done anything more complex when it comes to actual language recognition
[15:54:04] workout, back in ~40
[16:32:02] ebernhardson: yea, I think tokenizing/normalizing might be enough. from my perspective, the goal should be to come up with some heuristic that's relatively straightforward to implement and that we still believe is meaningful
[16:47:55] dinner
[16:54:47] back
[16:55:00] ryankemper we have a new Envoy update ticket if you wanna have a look when you get in T404867
[16:55:01] T404867: Upgrade Envoy to v1.29.12 on wcqs and wdqs hosts - https://phabricator.wikimedia.org/T404867
[17:02:22] * ebernhardson ponders tokenizing queries in spark via http, vs tokenizing queries in relforge by indexing them
[18:32:42] Reminder: I'm out Friday and Monday (Oscar's birthday + public holiday)
[18:33:14] pfischer: can you prepare the triage meeting without me and facilitate (as you've been doing anyway lately)?
[18:39:02] back
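
Note on the T404822 heuristic discussed above: the "tokenize/normalize and look for particular words" idea could be sketched roughly as below. This is only an illustration under assumptions, not the definition from the ticket and not anything the team has implemented. The marker list is just the who/what/why/where/when words mentioned in the log, the regex tokenizer stands in for real analysis (no stemming), and all function names are hypothetical.

```python
import re

# Assumption: the only markers are the five words named in the discussion.
# A real word list would need review per language.
QUESTION_WORDS = frozenset({"who", "what", "why", "where", "when"})

# Crude stand-in for a real tokenizer: runs of word characters, lowercased.
TOKEN_RE = re.compile(r"\w+")


def tokenize(query):
    """Lowercase the query and split it into word tokens."""
    return TOKEN_RE.findall(query.lower())


def looks_like_natural_language(query):
    """Return True if any token of the query is one of the marker words."""
    return any(token in QUESTION_WORDS for token in tokenize(query))


def natural_language_share(queries):
    """Fraction of queries flagged by the heuristic (0.0 for empty input)."""
    queries = list(queries)
    if not queries:
        return 0.0
    flagged = sum(looks_like_natural_language(q) for q in queries)
    return flagged / len(queries)


if __name__ == "__main__":
    sample = ["who invented the telephone", "telephone inventor"]
    print(natural_language_share(sample))
```

A heuristic this simple has obvious false positives: a keyword query such as "the who discography" is flagged because it contains "who". Whether that level of noise is acceptable is the kind of trade-off the discussion above leaves open.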