[11:18:22] lunch
[13:12:34] o/
[13:36:11] dcausse are we OK to close T374009 ? just wondering if you were thinking of any more immediate changes for categories
[13:36:11] T374009: Investigate EQIAD WDQS graph split host alerts - https://phabricator.wikimedia.org/T374009
[13:38:11] inflatador: I think so?
[13:38:37] inflatador: only thing is that I think I've seen a few systemd timers complaining after we dropped the categories graph there, I wonder if there are just dangling systemd unit files that I should have removed with an explicit "ensure" -> absent
[13:39:40] dcausse agreed, I don't think we used 'ensure -> absent' anywhere so I've been manually fixing these as they come up. There's always one more though ;P
[13:40:34] inflatador: ah thanks for doing this!
[13:40:48] I might reimage one of the graph split hosts just to make sure it doesn't pull in categories... pretty sure your changes already fixed it for new builds though
[13:42:08] ack
[13:43:22] yeah, no big deal either way. I've created T374967 for the migration discussion, will fill in details shortly
[13:43:22] T374967: wdqs-categories migration: decide between Kubernetes and VMs - https://phabricator.wikimedia.org/T374967
[14:04:25] \o
[14:09:45] o/
[14:31:20] i was filing phabricator tasks, and then after changing locations this happened. the display is really not *that* smudged... a pink line down the middle and nothing else :( think this is the ultimate way to enforce a wip limit when creating tasks. long presses on the power and no cables versus cables, no luck. so maybe i'll be working the computer store circuit today. i'll try a few more keyboard sequences but I think the display or
[14:31:20] its cabling is done for. https://usercontent.irccloud-cdn.com/file/H7hRR6xg/IMG_6327.JPG
[14:31:56] :S
[14:32:30] it's got a nice keyboard imprint at least :)
[14:33:36] i had one of my laptops break that way a while ago, thankfully OIT was able to swap it out pretty easily.
I dunno the process for mac though, they might send you to an apple store? for lenovos i went to the office and they traded me machines
[14:34:46] ouch :/
[14:35:43] for lenovo when I had my own laptop they sent someone "on-site" (at my house), they replaced the motherboard on my kitchen table :)
[14:38:31] flink 1.17 runtimes have moved to archive.apache.org which is extremely slow, latest flink is 1.20, should we start upgrading before we get too far behind?
[14:38:57] I used to take care of all the A/V equipment for a college. We bought dozens of Kodak projectors with a display flaw that looked exactly like dr0ptp4kt 's display ;P
[14:39:05] dcausse: hmm, yea probably i suppose. Does the operator version usually have to stay synced or are they reasonably independent?
[14:39:47] the operator is supposed to support multiple flink versions, looking
[14:40:47] Re: flink migration LMK if I can help
[14:41:40] now that we have ceph, we could look into mirroring the flink binaries... not sure if it's worth the effort though
[14:44:49] the master branch says it's compatible with flink "v1.16, v1.17, v1.18, v1.19", the latest stable says "v1.16, v1.17, v1.18"
[14:45:06] :S
[14:45:35] wacky idea here... would we ever consider running categories on something other than blazegraph? Like qlever or one of the other potential replacements?
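Going back to the systemd timer cleanup mentioned earlier in the log: the explicit "ensure -> absent" fix could look roughly like this in Puppet. This is a hedged sketch only — the resource title, parameters, and the use of the `systemd::timer::job` defined type are assumptions for illustration, not taken from the actual puppet repo:

```puppet
# Hypothetical sketch: explicitly mark the timer absent so Puppet cleans up
# the unit files instead of leaving them dangling after the categories graph
# is dropped. Resource title and parameters are illustrative assumptions.
systemd::timer::job { 'wdqs-categories-reload':
    ensure      => absent,
    description => 'Reload the wdqs categories namespace',
    user        => 'blazegraph',
    command     => '/usr/local/bin/reloadCategories.sh',
    interval    => {
        'start'    => 'OnCalendar',
        'interval' => 'weekly',
    },
}
```

The point is just that flipping `ensure` to `absent` (rather than deleting the resource from the manifest) lets Puppet remove the unit on the next agent run, avoiding the manual cleanup described above.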
[14:45:45] problem with these dependency matrices is that it's perhaps compatible but the doc is not updated
[14:46:07] inflatador: shrug, we probably could if we wanted to build operational experience before doing the big switch
[14:46:43] ebernhardson ACK, I guess we can table that until if/when we decide to ditch BG
[14:46:49] inflatador: being that it's mostly running one query, it doesn't have the limitations of a public service that needs to support wide swaths of SPARQL
[14:47:55] sadly it is exposed publicly, not well advertised, only documented at: https://www.mediawiki.org/wiki/Wikidata_Query_Service/Categories
[14:48:13] oh interesting, i guess it would be worth checking before any switch what kind of public traffic it sees
[14:49:10] dcausse: i suppose we could file a task about 1.20 support?
[14:49:27] yes... and here blazegraph custom features like bfs are explicitly mentioned in the doc, so clients will likely have to deal with breaking changes
[14:49:44] ebernhardson: you mean for us or the operator?
[14:50:33] dcausse: for the operator, i guess it would be some apache bug tracker
[14:50:49] ebernhardson: sure
[14:51:57] could we run categories in WMCS? Are there any public services that do that?
[14:52:04] i guess they use jira?
[14:52:13] inflatador: general rule is supposed to be prod doesn't use WMCS services
[14:52:22] if prod needs it, the service needs to be prod
[14:53:02] ACK, just going thru potential options
[14:53:02] ebernhardson: yes, checked github and it's already there: https://github.com/apache/flink-kubernetes-operator/commit/024b70bbc752fef7e5a3646d948087b4d4ba9f16
[14:53:37] dcausse: interesting, indeed it looks to only be flagging it as working without any real changes
[14:54:27] Won't be at triage (pairing with dpe sre)
[14:54:52] yes...
only the crd adds a specific entry for V1_20, but IIRC I've seen jobs misconfigured with V1_16 while running 1_17 and it worked just fine
[14:55:53] will file a few tasks to upgrade the jobs and hopefully the new operator might get released "soon"
[14:55:56] the linked issue does suggest there was some issue though with an unrecognized field in a metric
[14:57:18] dcausse: i guess they just missed some docs then? https://github.com/apache/flink-kubernetes-operator/pull/869 is merged which suggests master should support 1.20
[14:58:32] ebernhardson: thanks, yes master should be ready for 1.20, not sure where I saw that it was blocked on 1.19
[14:59:14] ah, it's https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/
[15:00:11] perhaps these nightlies are not updated that frequently?
[15:00:17] perhaps
[15:01:43] dcausse: triage meeting https://meet.google.com/eki-rafx-cxif
[16:48:52] hmm, realizing i don't know how to select the staging release in any of our dashboards (flink-app and cirrus-streaming-updater)
[16:49:49] ebernhardson: it's on a different prometheus cluster
[16:50:15] dcausse: oh duh, yea it's listed there. Thanks!
[16:58:59] oh, sigh... i forgot to deploy the schema changes to mediawiki-config first so staging doesn't come up. Will revert the helmfile changes and will have to do it on thursday
[17:01:27] :/
[17:16:27] means we can't deploy the streaming updater until then, since the patch is merged there.
I don't think there is anything pending though
[17:17:35] well, we can deploy the updater but can't update the container version
[17:33:06] * ebernhardson shouldn't be surprised that instead of saying a paper was written by "carlo rovelli", they say it was written by Q4205426 Q21450812
[17:57:06] should be pretty obvious :)
[17:57:28] lunch, back in ~40
[18:04:52] randomly curious, the example queries we have at https://www.mediawiki.org/wiki/Wikidata_Query_Service/Categories work in the `run it manually` link, but not in the `Try it!` link that goes to the UI
[18:05:33] presumably because the 'run it manually' link directly calls out the endpoint
[18:26:29] yes, it probably re-uses the same {{Sparql}} template that just defaults to the wdqs ui
[18:36:44] dinner
[18:58:59] back
[19:05:35] ebernhardson: 1:1? Or do we skip for today?
[19:05:58] gehel: doh, 1 sec
[19:53:22] * ebernhardson finds `interface ProperPageIdentity extends PageIdentity` mildly amusing. I guess we realized it was incomplete
[20:11:03] soon to be followed by `ProperlyProperPageIdentity` ? :P
[21:20:21] ryankemper just a heads-up, running a categories data reload from my user on cumin2002 , ref T375443
[21:20:22] T375443: categories migration: determine size of new WDQS categories dump - https://phabricator.wikimedia.org/T375443
[21:23:19] seeing a lot of stacktraces from the categories reload. Guessing this is not going to work
[21:23:33] `Caused by: org.openrdf.sail.SailException: com.bigdata.rdf.sail.webapp.DatasetNotFoundException: namesp
[21:23:33] ace=categories20240923`
[21:27:06] hmm, trying to remember what namesp is
[21:27:10] i'm sure it's namespace, but still
[21:27:40] oh, duh, it was on the next line :P
[21:28:45] I'm guessing the problem's related to recent changes to separate out categories?
[21:29:12] getting same errors when I run `reloadCategories.sh` locally from wdqs2020
[21:29:13] dataset not found suggests to me it needs the equivalent of a 'create table' statement?
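On the earlier flink upgrade thread: the CRD entry for V1_20 mentioned in the log would be consumed from a `FlinkDeployment` spec roughly like the fragment below. This is a hedged sketch — the deployment name, image, and resource values are assumptions for illustration; only the `apiVersion`/`kind` and the `flinkVersion` enum style follow the upstream flink-kubernetes-operator CRD:

```yaml
# Hypothetical FlinkDeployment pinned to the 1.20 runtime; the metadata name,
# image, and resource sizes here are illustrative, not from the actual jobs.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: cirrus-streaming-updater-sketch
spec:
  image: docker-registry.example.org/flink:1.20   # assumed image location
  flinkVersion: v1_20   # the enum value the CRD change adds
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
```

As noted in the log, the operator has been forgiving about `flinkVersion` mismatches (a V1_16 job running 1.17 worked), but keeping the field in sync with the actual runtime is what the upgrade tasks would do.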
[21:33:57] ebernhardson seems reasonable. Checking thru https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/query_service/files/cron/reloadCategories.sh , this is the script called by the cook-book
[21:35:40] inflatador: did createNamespace.sh report any problems? I would expect that to have resulted in the new dataset
[21:36:13] i guess it is supposed to || exit 1 on error, but maybe it failed without giving a bad return code
[21:36:22] ebernhardson I'm still looking for a `createNamespace.sh` . That could well be the problem ;)
[21:36:32] can't find it in /usr/local/bin on wdqs2020
[21:36:42] inflatador: it's in the rdf deploy repo
[21:38:00] it looks like the script is supposed to read in default.properties from the same repo, update it with the new name, and then load it into blazegraph via curl
[21:38:48] ACK, I wonder why createNamespace.sh is not on the host? Do we have to run a categories deploy to get it there? Or maybe it's there and I'm just looking in the wrong place?
[21:39:29] inflatador: which host? it should be there
[21:39:48] anything that has blazegraph should have it
[21:39:52] it comes via the scap deploy
[21:39:56] looking on `wdqs2020`
[21:40:25] inflatador: /srv/deployment/wdqs/wdqs
[21:44:46] ACK... still getting the same errors when I run it. This patch was merged today ( https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/1070942 ) ... maybe I need to do a scap deploy to pull in these updated scripts
[21:46:24] inflatador: hmm, probably wouldn't hurt
[21:50:14] ebernhardson yup, that fixed it... or at least it's getting further
[22:06:08] ryankemper assuming the cook-book finishes during your shift, if you wanna update T375443 with the new journal size feel free. I've got tmux up on wdqs2020 and it looks like we're at about 4 GB
[22:06:09] T375443: categories migration: determine size of new WDQS categories dump - https://phabricator.wikimedia.org/T375443
[22:07:01] inflatador: ack!
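For reference, the `createNamespace.sh` flow described above (read `default.properties` from the deploy repo, substitute the new namespace name, load it into Blazegraph via curl) can be sketched roughly as below. This is not the real script: the property key and endpoint come from Blazegraph's REST API conventions, and the payload is reduced to a single line to show the shape:

```shell
# Hypothetical sketch of the createNamespace.sh flow -- not the actual script.
NAMESPACE="categories20240923"   # dataset name from the DatasetNotFoundException above

# The real script rewrites default.properties from the deploy repo; here we
# just emit a minimal properties payload naming the new namespace.
printf 'com.bigdata.rdf.sail.namespace=%s\n' "${NAMESPACE}" > "/tmp/${NAMESPACE}.properties"
cat "/tmp/${NAMESPACE}.properties"

# Then POST it to the local Blazegraph instance (commented out in this sketch):
# curl -s -X POST -H 'Content-Type: text/plain' \
#   --data-binary @"/tmp/${NAMESPACE}.properties" \
#   http://localhost:9999/bigdata/namespace
```

This is the "create table" equivalent speculated about above: until a namespace exists, loads against it fail with `DatasetNotFoundException`.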
[22:12:05] ryankemper excellent, thanks! /me is getting a little OCD about async/shift work ;P
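As a footnote to the earlier point that the categories service is publicly exposed (documented only at https://www.mediawiki.org/wiki/Wikidata_Query_Service/Categories ): a client can hit the endpoint directly with plain SPARQL. In the sketch below the endpoint URL is an assumption based on the usual WDQS URL layout, not something confirmed in the log, and the query is a generic placeholder:

```shell
# Hypothetical sketch; endpoint URL is an assumption, query is a placeholder.
QUERY='SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }'
echo "${QUERY}"

# Not executed here: query the publicly exposed categories namespace directly.
# curl -s -G 'https://query.wikidata.org/bigdata/namespace/categories/sparql' \
#   --data-urlencode "query=${QUERY}" \
#   -H 'Accept: application/sparql-results+json'
```

Traffic like this is what would need auditing before any switch away from blazegraph, since the docs advertise blazegraph-specific features (e.g. the bfs gas service) that replacements would break.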