[08:26:23] dr0ptp4kt: good to know, I've been testing wikibase_rdf_scholarly_split_refactor_w_cache_n_distinct and it looks good so far, still have to check the refs of wikidata_main
[08:27:33] dr0ptp4kt: I have a parent/teacher meeting at 17:30 my time and can't make the meeting you scheduled sadly
[09:03:16] dcausse: gehel: good morning, by any chance do you know anything about the Gerrit repo `analytics/wmde/toolkit-analyzer`? Or do you have any point of contact for it? ;)
[09:03:57] hashar: never heard of it, lemme see if I recognize something
[09:04:09] I don't know much about it. I would contact Lydia (WMDE) or maybe Andrew McAllistair
[09:04:11] that got written by addshore
[09:05:56] I will ask Lydia :)
[09:07:37] seems like a standalone tool to analyze wikidata dumps, so most likely rarely used?
[09:08:00] I found some doc on https://wikitech.wikimedia.org/wiki/WMDE/Analytics
[09:08:08] but maybe none of that still applies
[09:08:20] I guess Lydia will know ;)
[09:09:22] it's broken apparently: https://phabricator.wikimedia.org/T278665
[09:09:54] seems like nobody really cares?
[09:10:29] fun
[09:11:05] yeah I am going to be bold and archive it
[09:22:15] We are currently digging into this as part of analytics investigations. I don't think we should archive it yet
[09:25:26] Lydia_WMDE: thanks!
[09:25:28] Lydia_WMDE: Guten Tag! Good to know you are aware of it :)
[09:25:54] I have filed https://phabricator.wikimedia.org/T351291 about it, looks like the cron job was disabled back in April 2021 and the code is potentially unused now
[09:26:10] hashar: so if it's running on stat I guess it has the same issues as other spark jobs and might have to stay on java8 while the analytics cluster runs jdk8
[09:26:49] possibly yes, which is fine with me :)
[09:31:08] dcausse: gehel: and on another topic, `wikimedia/discovery/discovery-maven-tool-configs` and `wikimedia/discovery/discovery-parent-pom` are only tested with Java 8, should I add Java 11 as well? ;)
[09:32:21] we might just build it with java11? these do not build any jars iirc
[09:32:29] cc pfischer
[09:33:10] they all pass locally with Java 11
[09:33:26] Thanks for the context. Can you let Andrew know about this? I'm on my phone unfortunately
[09:33:40] Yes please, building on java 11 would be highly welcome.
[09:33:47] tbh I think we could drop the java8 build there unless gehel or peter has objections
[09:34:18] Lydia_WMDE: I am adding Andrew to the task I have filed :)
[09:34:57] Should be fine as long as they do not come with Java classes that could then no longer be used in java 8 based builds. So the parent-pom should be fine. The config too, but let me check.
[09:35:35] Thanks @hashar
[09:37:04] hashar: both `wikimedia/discovery/discovery-maven-tool-configs` and `wikimedia/discovery/discovery-parent-pom` can be built on java 11, java 8 can be dropped.
[09:40:10] pfischer: excellent! I am going to send a change for CI to do the java8 > java11 switch and add you as a reviewer ;)
[09:44:05] pfischer: https://gerrit.wikimedia.org/r/c/integration/config/+/974488 Move maven-tool-config and parent-pom to Java 11 :)
[09:58:38] hashar: +1
[09:58:49] pfischer: thanks!! :)
[10:55:21] lunch
[14:19:35] o/
[14:21:02] Good doc about k8s limits/requests by e-lukey https://wikitech.wikimedia.org/wiki/Kubernetes/Resource_requests_and_limits
[14:22:54] nice!
[14:26:26] dcausse are you comfortable rolling out the flink-operator to prod today, or should we wait?
[14:27:03] inflatador: if we fixed all the issues we've seen in staging sure
[14:27:28] IIRC there was this kube master problem and the fact that we had to bump quota
[14:28:02] dcausse still don't understand the kubemaster egress problem, let me double check quota but I don't think it will be a problem in prod
[14:28:32] for the quota we might just fine-tune staging
[14:28:48] I don't consider the kubemaster egress thing a blocker, but will defer to you
[14:30:33] dcausse actually there are changes to admin_ng that need to happen first, let's work on that at pairing and move fwd if we both feel OK about it
[14:30:52] if you prefer to move to prod and then investigate this egress problem I'm fine
[14:31:45] but we don't want to be the sole flink app having to list the kubemasters explicitly
[14:32:30] agreed...I think the problem is the custom role we use for the session cluster
[14:33:30] how are we going to solve this issue after? can we switch role easily?
[14:44:50] dcausse good question...it seems like a better idea to change the role in staging first if possible...not sure if it is without carving out another namespace
[14:45:15] or leaving dirty changes in admin-ng...I can ask jayme if that is acceptable
[14:45:48] anyway, not to context-switch too quickly, but I think it might be better to talk about the backfill test with you and pfischer in our pairing if that is OK
[14:48:21] dr0ptp4kt: just finished testing wikibase_rdf_scholarly_split_refactor_w_cache_n_distinct and could not find any issues! could retrieve all entities from both splits, references & values are sane in both graphs (no broken links, no superfluous refs nor values either)
[14:48:55] inflatador: sure whatever works best for you
[14:51:33] dcausse ACK, gehel suggested it and I think it's a good use of time ;)
[15:06:02] pfischer: would you be available to discuss SUP and backfill test?
[16:04:24] Meeting with enterprise team sw dev to prep for an interview loop I'm filling in for on thurs, will join weds meeting after
[16:09:56] Wednesday Meeting (tm) is in https://meet.google.com/eki-rafx-cxi
[16:11:13] @pfischer this is the ticket ~= creating ES cluster on demand https://phabricator.wikimedia.org/T312198
[17:07:58] workout, back in ~40
[17:45:54] back
[17:53:17] wdqs1024 failed its reload for the 3rd time...good news is the other 2 haven't failed yet
[18:06:40] can we return that machine and get a different one? :P
[18:06:47] :)
[18:06:51] dinner
[18:33:12] ;P
[18:33:20] lunch, back in ~45
[19:14:37] afaict all the airflow SLA failures this morning were missing canaries on low-volume event streams. Manually marked them all as passed
[19:15:41] Trey314159: lemma vs form. Is a lemma more important? If the lemma is also a form is that more important? My linguistics is too poor to answer these questions :P
[19:20:55] * ebernhardson expects the answer to be that it depends on the use case
[19:39:47] back
[21:04:53] ebernhardson: sorry for the delay.. a lemma is the dictionary form, or the canonical form, of a word. It is a form, and if you had to pick a form to be more important, the lemma would arguably be it. (I have now typed, read, and said "form" so many times that it has lost all meaning.) "hope" is the lemma. "hope, hopes, hoped, hoping" are all forms.
[21:30:14] Trey314159: thanks! I'm still unsure though :( I suppose the question is in the context of wikidata.org lexeme search. We currently score lemmas and forms as being the same, i'm adding a bit that scores things higher if they match both the form and the lemma, vs only matching one of them, but unsure if that's correct
[21:30:23] an alternate solution would be to weight lemmas higher than forms
[21:30:47] also notable these are all with constant score queries
[21:32:30] In the context of (abstract) ranking, I'd give lemmas a little boost, but manually tweaking scoring is always tough and of course there will always be edge cases where it goes astray (and then people complain).
[21:34:06] indeed, ideally would spend a few days with data and query analysis to understand how often they are similar, and relforge bits on how much changes, but that's a bit larger scope than we have for this one
[21:34:45] Are there examples that match the lemma but not a form? That would be uncommon for most languages. Mechanically, it might be okay, but I would guess that is the same as just boosting the lemma, because it is usually also a form (super highly inflected verbs in some languages possibly being a counterexample).
[21:34:49] I understand about not having time to dwell on it, though!
[21:35:05] Trey314159: in my attempts to understand these, i found "-ize" as the lemma and "-ise" as the form
[21:35:08] as a random example
[21:35:22] but i don't know how common that is
[21:36:01] Ahh, good example!
[21:36:38] it also confirmed i don't know what a lemma is, because i wouldn't have considered a suffix a lemma :P
[21:38:39] Yeah, it's a bit weird, but if you want them to be variants, you have to pick one as the canonical form, I guess, at least in the wikidata context.
[21:38:42] will bump the lemma a little then too, can't hurt
[21:38:47] Cool
[21:39:18] also, apparently "go through hoops and loops" is also a lemma...it seems like you can justify anything as a lemma
[21:39:32] https://www.wikidata.org/wiki/Lexeme:L345514
[21:46:15] That's the lemma for an idiom, because the meaning of the phrase is not the literal meaning of the sum of the parts, so that seems reasonable to me.
[22:22:02] i suppose that's reasonable
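
Note on the lemma vs form discussion above: the three scoring options mentioned (lemma and form scored the same, a bonus when a query matches both, and a small lemma boost) can be compared with a toy model. This is only a sketch under stated assumptions. The `Lexeme` class, the `constant_score` function, the weights, and the rule "take the highest matching weight, then add the bonus" are all hypothetical and are not the actual wikidata.org lexeme search query. The "-ize" entry is simplified to the single form mentioned in the chat.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    """Hypothetical, simplified lexeme: one lemma plus its forms."""
    lemma: str
    forms: list[str] = field(default_factory=list)


def constant_score(query: str, lexeme: Lexeme,
                   lemma_weight: float = 1.0,
                   form_weight: float = 1.0,
                   both_bonus: float = 0.0) -> float:
    """Toy constant-score ranking: fixed weights per exact match, no TF/IDF.

    Assumption: the score is the highest weight among the matching fields,
    plus both_bonus when the query matches the lemma and a form.
    """
    q = query.casefold()
    lemma_hit = q == lexeme.lemma.casefold()
    form_hit = any(q == f.casefold() for f in lexeme.forms)

    weights = []
    if lemma_hit:
        weights.append(lemma_weight)
    if form_hit:
        weights.append(form_weight)
    if not weights:
        return 0.0  # no match at all

    score = max(weights)
    if lemma_hit and form_hit:
        score += both_bonus
    return score


# Examples taken from the chat; the data is illustrative only.
hope = Lexeme("hope", ["hope", "hopes", "hoped", "hoping"])
ize = Lexeme("-ize", ["-ise"])

# Current behaviour as described: lemma and form matches score the same.
baseline = [constant_score(q, hope) for q in ("hope", "hopes")]

# Option A: bonus when the query matches both the lemma and a form.
with_bonus = [constant_score(q, hope, both_bonus=0.5) for q in ("hope", "hopes")]

# Option B: weight the lemma slightly higher than the forms.
lemma_boost = [constant_score(q, ize, lemma_weight=1.2) for q in ("-ize", "-ise")]
```

In this toy model the two options only differ for lexemes whose lemma is not also listed as a form, such as the "-ize"/"-ise" case: there the bonus never fires, while the lemma boost still ranks the lemma match higher.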