[06:20:03] I see I made an off-by-one error with the elasticsearch version :(
[06:21:28] gehel: Were you the one to set up our shared drive? If so, do you remember whether members automatically get an email when you add them?
[06:21:54] I need to create one with members from within tech-all@, but I'd rather not mail every single soul in engineering
[07:13:53] ah, you can mark a point in time in grafana, cool
[07:19:10] dcausse: about that https://phabricator.wikimedia.org/T293027#7462156 comment from mpham: what is required to have WCQS (blazegraph) metrics reported under the "wcqs" cluster name?
[07:20:23] I assumed until now that if we report all the metrics under that cluster name, we'll basically get all those dashboards almost for free (not counting any we might have for the old updater, of course)
[07:26:47] tltaylor: note that comparing scores across multiple queries isn't meaningful (if that was your intention). I'm sure Erik can add a lot more details if needed.
[07:27:31] zpapierski: our shared drive was probably created by Erika a long time ago
[07:28:28] I don't know about shared drives, but when you add people to a directory, you can choose whether to notify them.
[07:33:53] A new contributor just pushed a patch to mw-core, but it's about search. If someone feels like reviewing it, that would make for a good first contributor experience!
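The note above that comparing scores across queries isn't meaningful can be illustrated with a small sketch. It assumes a textbook Okapi BM25 scorer (the `(k1 + 1)` form with the usual `k1=1.2`, `b=0.75`) and a made-up four-document corpus; it is not CirrusSearch's or Elasticsearch's exact scoring, and `bm25_scores` and `DOCS` are hypothetical names. A query on a rare term produces scores several times larger than a query on a term that appears everywhere, purely because of IDF.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
DOCS = [
    "douglas adams wrote the hitchhiker guide",
    "the guide to the galaxy",
    "the cat sat on the mat",
    "the dog sat on the log",
]


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every doc against a single query with textbook Okapi BM25.

    Lowercasing and whitespace tokenization only; no stemming or stopwords.
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    # Document frequency: how many docs contain each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Rare terms get a large IDF, terms present in every doc a tiny one.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization relative to the average document length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    rare = bm25_scores("hitchhiker", DOCS)  # term occurs in 1 of 4 docs
    common = bm25_scores("the", DOCS)       # term occurs in all 4 docs
    print("rare:  ", [round(s, 3) for s in rare])
    print("common:", [round(s, 3) for s in common])
```

Merging these two lists by raw score would put the lone "hitchhiker" hit far above every "the" hit, even though the second document is a perfectly reasonable top result for its own query. Only the ordering within each list carries meaning.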
[07:33:57] https://gerrit.wikimedia.org/r/c/mediawiki/core/+/734801/
[07:41:46] zpapierski: I suppose wcqs will report itself as the "wcqs" cluster, so most dashboards will be reusable; in the SLO dashboard we might need to add a new cluster variable to switch between the two
[07:42:26] there are probably some dashboards matching on the regex wdqs.*; those will have to be adapted
[07:42:52] if blazegraph is running on wcqs you could already check the shape
[07:51:24] not seeing any reported, unfortunately :(
[07:51:34] I'll check it out later
[07:51:37] errand
[10:11:20] zpapierski: are you available?
[10:14:36] ejoseph: I'm relocating, will be back in about 20min
[10:24:36] lunch
[10:34:07] ejoseph: I'm here, want to continue yesterday's session?
[10:37:21] Yes
[10:37:48] cool, hit me with a Code With Me link
[11:40:28] lunch break
[12:40:58] lunch break
[13:33:05] of course gehel is talking me out of misuse of relevance scores.
[13:33:26] perhaps a multi-panel result set view would remove the need to interleave results
[13:42:49] tltaylor: it's like the no. 1 subject for relevance engineers! I'm still waiting for my wife to propose that she could use relevance scoring for anything outside of the context of the original query, but alas, it hasn't happened yet.
[13:47:36] she's not an engineer, so I'll probably wait a while longer
[14:12:43] tanny411: related to duplicated label predicates: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#WDQS_data_differences, i.e. you should not see schema:name or skos:prefLabel predicates in the dataset you work with
[14:13:10] if you do, then we've made a mistake in the import process
[14:17:41] dcausse: looking at T231845, didn't we add authentication to the updater?
[14:17:41] T231845: wdqs update with restricted view access on anonymous users - https://phabricator.wikimedia.org/T231845
[14:19:11] gehel: no?
the updater has always used public, unauthenticated endpoints
[14:24:23] tltaylor: when we had this problem we always preferred multiple tabs; fusing everything into the same result set is very difficult and often confusing
[14:50:57] heh, apparently they sent me cupcakes from the same place as trey, crossing the country is taking a bit though :)
[14:51:48] also, \o
[14:54:01] wrt merging queries, i suppose i see a form of result interleaving more and more from big properties these days: i get a few results for the query, then a box with a few results from somewhere else, then a few more results for the initial query, etc.
[14:54:40] but those are displayed as interleaved groupings, rather than individual interleaved results
[14:56:44] o/
[14:57:32] yes, they're generally clearly identified/separated in the UI
[14:59:42] i wonder how the stats work out. We know people tend to click the top 3, presumably mostly due to position bias, but the top 3 are also usually pretty good. Maybe you can do better with three top-3 result lists instead of a top 10
[15:00:00] or maybe it's 100% position bias and those extra results just become the new 4-9 :)
[15:01:33] ejoseph: please see Phab from Chris
[15:02:13] Ok
[15:17:06] WDQS Streaming Updater celebration is starting: https://meet.google.com/xgq-wvik-dkp
[15:17:18] ryankemper, maryum, tltaylor ^
[15:20:11] here
[16:06:44] ejoseph: you need to share the new production public key on phab, or update the puppet repo yourself
[16:07:29] Ok
[16:07:33] On it
[16:09:27] dcausse: sorry for the super delayed response. I did indeed find schema:name in my analysis, and also mentioned it here: https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Vertical_Analysis#Other_predicates_like_Label
[16:09:27] If this is something recent, I can recheck with newer data. But I can confirm there is no prefLabel.
[16:09:27] Both of these do occur in https://www.wikidata.org/wiki/Special:EntityData/Q42.ttl?flavor=dump, though. Is this expected?
[16:16:36] tanny411: np!
them appearing in Special:EntityData is normal, but having both schema:name & rdfs:label seems weird... I need to check what's going on, thanks!
[16:18:06] sure!
[16:28:37] ah, schema:name is used to label sitelinks, i.e. triples: schema:name "Douglas Adams"
[16:29:00] not to label entities; entity labels are only expressed via rdfs:label
[16:29:48] so no duplication here
[16:41:22] interesting! that's great then
[16:53:15] hmm, not entirely sure what the deprecation process would look like to move a function from "@stable to override" to final
[16:55:55] about https://gerrit.wikimedia.org/r/c/mediawiki/core/+/734801 ?
[16:56:21] dcausse: yea, they don't mark it final yet, but that would be appropriate given the use case
[16:56:57] I left some questions on the ticket to make sure we don't miss anything important
[16:57:24] I'm never super comfortable with these hooks being passed a mutable array
[16:57:56] but it's there so I suppose it's tempting to mutate whatever you can :)
[16:58:32] dcausse: lol, indeed :) I suppose in some places i try and pass an empty array into the hook and then add whatever it provides. It makes sense that there might be some higher-level place where the user should be injecting their content. Things get messy when data can come from anywhere
[16:59:21] for instance, here I wonder if extending the PdfHandler wouldn't be a better approach
[17:02:45] dcausse: unrelatedly, where do i need to put the wcqs usernames so we can find them when trying to figure out which queries are crushing the servers?
[17:03:07] dcausse: right now they exist and are recorded inside a signed token, but we never actually write the username to anything server-side
[17:03:51] ebernhardson: I think it would go into https://schema.wikimedia.org/repositories//secondary/jsonschema/sparql/query/current.yaml ?
[17:04:54] dcausse: ok, i can figure out how to get it there
[17:05:32] it's a separate webapp tho?
[17:06:03] dcausse: yes, wholly separate war install.
Might use a shared cookie name or something, not sure yet
[17:06:47] yes, I'm not sure how to share data between two webapps :/
[17:07:19] blazegraph is probably already being provided the wcqsSession token we invent; i guess i can read that token on the blazegraph side if we pass the secret in
[17:07:20] if you manage to get it from the client http request it'll be very easy to forward to the query log
[17:08:06] the class assembling the event is org.wikidata.query.rdf.blazegraph.filters.QueryEventSenderFilter
[17:08:54] * ebernhardson constantly wishes closing tabs in IDEA worked like in Chrome... the tab widths are all over the place :P
[17:09:38] true, I'm quickly lost with tabs in IDEA
[17:10:02] so I use Ctrl-E
[17:13:00] going offline
[17:13:47] g'night
[17:28:53] ebernhardson: so we've got a bunch of new eqiad elastic* hosts that I need to bring into service: `elastic10[68-83]`
[17:29:45] I know the general process, but are there any concerns about me adding such a large number of hosts to the elasticsearch clusters (before decom'ing the old ones)? I'd think the cluster should handle it fine, i.e. that our masters aren't particularly highly taxed or anything, but figured I'd sanity-check with you
[17:31:26] ryankemper: hmm, i can't imagine it causing any problems
[17:31:46] cool, I don't think so either but just wanted to run it by you
[17:31:47] ryankemper: the recovery process will run a bunch to rebalance the cluster, but it should be fine
[17:32:07] I imagine the shard reshuffling will put some extra load on the cluster, but not enough to impact anything in a serious way
[17:32:12] yeah
[21:34:08] I'm going to guess probably not, but randomly wondering if there are good reasons we should prefer the wcqs banlist to be in puppet private instead of some public value. Is there any responsibility to not name/shame?
[21:35:09] i'm guessing no since we put ip and user agent ban lists in public, but something feels slightly different about usernames
[22:50:25] ebernhardson: there are both private and public IP/UA ban lists
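The public/private ban-list split discussed at the end of this exchange could be modelled roughly as below. This is a hypothetical sketch, not the real puppet data or the WCQS ban-list format: `BanList`, `merged`, `is_banned`, the example entries, and the matching rules (CIDR containment for IPs, substring match for user agents, exact match for usernames) are all assumptions for illustration.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class BanList:
    """Hypothetical ban list; field names and matching rules are illustrative."""

    networks: list = field(default_factory=list)     # ipaddress network objects
    user_agents: list = field(default_factory=list)  # substrings matched in the UA
    usernames: set = field(default_factory=set)      # exact account names

    @classmethod
    def merged(cls, *lists):
        """Combine several lists (e.g. public + private) into a new one."""
        out = cls()
        for bl in lists:
            out.networks.extend(bl.networks)
            out.user_agents.extend(bl.user_agents)
            out.usernames |= bl.usernames
        return out

    def is_banned(self, ip, user_agent="", username=None):
        addr = ipaddress.ip_address(ip)
        # An IPv6 address is never "in" an IPv4 network, so mixed lists are safe.
        if any(addr in net for net in self.networks):
            return True
        if any(pattern in user_agent for pattern in self.user_agents):
            return True
        return username is not None and username in self.usernames


# Hypothetical entries: IP/UA rules could be public, usernames kept private.
PUBLIC = BanList(
    networks=[ipaddress.ip_network("192.0.2.0/24")],  # documentation range
    user_agents=["ExampleBadBot/1.0"],
)
PRIVATE = BanList(usernames={"ExampleUser"})
COMBINED = BanList.merged(PUBLIC, PRIVATE)
```

The design choice it illustrates: usernames live only in the private list and are merged in at load time, so the public repository never names individual accounts, while IP and user-agent entries can still be split between the public and private lists.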