[09:52:28] Weekly update: https://wikitech.wikimedia.org/wiki/Search_Platform/Weekly_Updates/2023-12-01
[10:12:43] gehel: why closing T336443?
[10:12:43] T336443: Investigate performance differences between wdqs2022 and older hosts - https://phabricator.wikimedia.org/T336443
[10:13:26] the investigation seems good enough, the difference in CPU frequency seems to explain the throughput difference.
[10:13:42] Unless you have a different opinion?
[10:14:08] I'll open another task to see if we want to do something about it (tuning the governor, or buying faster CPUs with fewer cores)
[10:14:12] yes wdqs1022 and wdqs1023 have CPUs running at 2.40GHz
[10:14:32] which is faster than the old ones; wdqs1009 runs at 2.1GHz
[10:15:53] And we still run the import faster on 1009 compared to 1023?
[10:15:57] Ok, I'll re-open
[10:19:15] thanks!
[10:33:04] dcausse: I just forwarded you a meeting for this afternoon (sorry for the Friday meeting, feel free to skip)
[10:33:31] np!
[10:54:43] lunch
[11:23:21] SpotBugs claims so many errors that I think the project should be entirely erased :D
[11:36:37] dcausse: are you able to publish to archiva (with your LDAP credentials)? I tried last night but ended up with a 401 response.
[12:20:04] pfischer: I used to be able to push stuff there but haven't done that in a while, trying
[12:31:38] pfischer: seems like I can't, tried to upload a snapshot to https://archiva.wikimedia.org/repository/snapshots/ but failed with 401 as well
[12:32:28] I don't get any buttons in the UI to upload manually, which I remember existed a long time ago
[12:33:14] if we're going to fork the elasticsink perhaps it might make sense to have a repo somewhere and push from them with our CI?
[12:33:35] s/them/there
[12:34:07] or mimic a maven repo structure on people.wikimedia.org
[12:34:53] or ask to change archiva to allow uploads from humans
[12:44:29] dcausse: are you logged in to archiva
[12:44:37] gehel: yes
[12:45:49] as dcausse?
[12:45:55] yes
[12:46:12] I can't seem to find you (but the UI is terrible)
[12:46:45] you have the upload buttons?
[12:47:05] I do
[12:47:28] reminder: wikidata modeling days in 10'
[12:47:56] I'm going to review the presentation one last time, I can have a look into archiva afterward
[12:48:03] so it's still possible then, something to adjust
[12:48:14] joining jitsi, Luca is in there
[12:48:47] I'm in jitsi as well
[12:48:56] if anyone else wants to join: https://meet.jit.si/WikidataDataModellingDays2023
[13:12:05] * hashar see a links, click it
[13:12:11] sees
[13:49:03] pfischer: and Archiva slowness is https://phabricator.wikimedia.org/T273086 :)
[13:51:41] dcausse: do you have 5' for a debrief?
[13:51:45] meet.google.com/vgp-rkwn-oar
[13:52:31] Thanks for coming and telling us about the work; really appreciate you taking the time :)
[14:13:58] tarrow: thanks! always a pleasure
[14:16:52] o/
[14:18:28] o/
[14:29:11] pfischer were you able to get into archiva?
[14:30:51] dcausse: looks like you need to be in the "archiva-deployers" LDAP group. Ideally, we'd like CI to do the release + upload.
[14:33:48] gehel: thanks
[14:34:07] inflatador: do you know where to run ldapsearch?
[14:36:05] looking at backscroll I see that peter is in archiva-deployers so that might not be enough?
[14:41:13] dcausse I used mwmaint2002.codfw.wmnet
[14:43:20] dcausse: do you want to do additional validation on T347504? Or should I close?
[14:43:20] T347504: WDQS graph split: load data from dumps into new hosts - https://phabricator.wikimedia.org/T347504
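
(A minimal sketch of the ldapsearch invocation mentioned at [14:34:07], run from mwmaint2002 or similar; the ldap-ro hostname, the base DN, and anonymous read access are assumptions, not taken from the log:)

    # Sketch: list members of the archiva-deployers LDAP group.
    # ldap-ro.eqiad.wikimedia.org and the base DN are assumptions here.
    ldapsearch -x -H ldap://ldap-ro.eqiad.wikimedia.org \
        -b 'ou=groups,dc=wikimedia,dc=org' \
        '(cn=archiva-deployers)' member
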
[15:44:37] gehel: does the archiva UI allow you to grant permissions?
[15:45:17] pfischer: I don't see a way to change permissions in the UI, but I don't know much about Archiva
[15:54:54] Trey314159 this reminded me of you, for multiple reasons! https://about.usps.com/newsroom/national-releases/2023/1130-usps-reveals-additional-stamps-for-2024.htm?ICID=ref_fark
[15:57:57] inflatador: that's cool! Thanks!
[16:08:31] \o
[16:16:35] 'nother CR for the ldf endpoint stuff. I got some help from observability, so hopefully this will work https://gerrit.wikimedia.org/r/c/operations/puppet/+/979388
[16:17:19] also, workout, back in ~40
[16:25:49] took a quick glance over guillaume vs david groups in ldap, to try and explain the varied archiva access, but not seeing anything obvious. The extra groups are: ops, project-tools, project-shiny-r, project-automation-framework, and tools.sonarqubebot. I'd suppose `ops` might give extra access in archiva ui, maybe another sre could check
[16:27:11] Saw a joke about Germans using Ü instead of :) for smileys and it suddenly hit me that bottom parens and combining diaereses exist: ⏝̈ — why is no one using this?
[16:27:46] i think project-* and tools.* are all related to cloud, shouldn't have any effect here
[16:29:36] dcausse: I've sent you an invite for Monday morning to get started on that technical plan
[16:30:32] dr0ptp4kt: I'm trying to find a spot to share that with you, but it's going to be too late for David. Let's do some async as well.
[16:58:15] inflatador: if you have a chance, could you copy /var/lib/archiva/conf/archiva.xml from archiva1002.wikimedia.org to another server where i can see it? Not certain, but suspect it would explain how rbac is configured there
[17:00:23] Trey314159: it probably requires finger-acrobatics, but good to know nonetheless
[17:01:29] ebernhardson: BTW: I scp-ed the maven repo to my people public_html in the meantime
[17:02:38] pfischer: excellent, thanks
[17:02:59] pfischer: it's not easily typable, true, but neither are some of the more epic emoticons inflatador uses
[17:04:29] back
[17:04:46] ebernhardson :eyes
[17:05:23] inflatador: in theory that should be where archiva stores all the configuration that is done from the admin UI, and since i'm not seeing anything relevant in puppet i'm guessing the rbac is there
[17:06:12] * ebernhardson also expects it will be a terribly hard to read xml, that was never intended for humans :P
[17:10:02] LOL. I used to like it better than json when json first got introduced
[17:11:53] ebernhardson: I'm curious how those new bulk metrics exposed by flink end up in prometheus. I tried to group them with key-value pairs for change_type and [ES bulk item] result but in the flink web UI they only show up as awfully long metric names. As far as I remember, aggregating metrics across their names is harder than across tags in prometheus.
[17:12:49] dinner
[17:13:48] ebernhardson I copied archiva.xml to your homedir on mwmaint2002
[17:14:32] inflatador: thanks! looking
[17:15:49] So this says archiva-deployers are `Global Repository Manager` and `Global Repository Observer`, and ops gets `System Administrator`. That could explain how ge.hel has extra buttons. Will have to review some archiva docs to see if this needs to change
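
(A sketch of the people.wikimedia.org fallback floated at [12:34:07] and [17:01:29]: deploy to a local file-based repo, then sync it into public_html. Repo path and directory names are illustrative, not what pfischer actually used:)

    # Sketch: publish to a local maven repo layout, then expose it via
    # people.wikimedia.org. Paths are illustrative assumptions.
    # id::layout::url is the maven-deploy-plugin <3.0 syntax; 3.0+ uses id::url.
    mvn deploy -DaltDeploymentRepository=people::default::file://$HOME/maven-repo
    rsync -av ~/maven-repo/ people.wikimedia.org:public_html/maven/
    # consumers would then add https://people.wikimedia.org/~<user>/maven/
    # as a <repository> in their pom
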
[17:16:00] global repository manager feels like something that should be able to upload though
[17:17:04] docs claim that should be everything we need: Global Repository Manager: users with this role can write to and administer any repository in the instance
[17:17:53] does that map to an LDAP group?
[17:18:31] it's a mapping from archiva-deployers to those two roles
[17:19:30] well...damn
[17:20:00] maybe there are per-repo settings too? not sure
[17:41:54] read through the rest, nothing interesting. Curiously i don't see any alternate access that would let the CI upload from jenkins work, but not for individual users :S
[17:48:02] actually, it does have a second user manager impl. It is configured for both ldap and jdo (java data objects?), which iiuc is database backed
[17:54:18] i dunno, not sure i can see much else from here. Maybe archiva logs have something, but they don't seem to make it to logstash
[17:58:25] OK, one more blackbox CR. Maybe it will be the last! https://gerrit.wikimedia.org/r/c/operations/puppet/+/979401
[17:58:36] probably not though ;)
[17:59:03] For https://phabricator.wikimedia.org/T351662 (related to https://phabricator.wikimedia.org/T336443 ) - do we know if it would be possible to do a head-to-head test of a singular munged file (e.g., *000001.gz) on each of wdqs1023 and wdqs1010? it seems like wdqs1010 would need to be depooled and its disk journal file dropped.
[17:59:05] fwiw, i tried the power savings versus performance profiles on my gaming machine (ubuntu 22.04), and the import time of one chunk was almost identical. so, while not conclusive because of different architectures, it's something of a clue.
[17:59:21] inflatador: ^^ sorry, meant that for you
[18:01:18] on my machine the cpu utilization was visibly different between these profiles, but no obvious difference in running time here. i'm going to try pinning to a single processor with _taskset_ as a cheap test (that seems to really rev up the cpu used by the parent java process...but need to run all the way)
[18:01:25] dr0ptp4kt: we should be able to, loadData.sh amounts to looping over a list of files and pushing them through http. we could probably construct a single-line curl statement that uploads to an alternate namespace to not muck things up (n.b. i know little about blazegraph :P)
[18:02:11] oh interesting, we don't actually even upload the file over curl. The curl statement basically says `LOAD `
[18:07:35] dr0ptp4kt ACK...I think that would be a good test. I can probably start working on it this afternoon CST, using ebernhardson's suggestion
[18:11:27] oh, cool - would it be okay to .bak (then re-plant that afterward) or destroy the journal file (and cookbook data transfer a known good one, I know that has been flaky sometimes) and use a fresh one on top of the curl command here? it seems like the more utilized the journal file is, the slower the loads become...which, you know, a graph split could somewhat sidestep - but the question is whether the perf issue is seen from the get-go
[18:11:51] interested to see, either way!
[18:12:11] dr0ptp4kt: seems reasonable to test, i suppose i'm suspecting that the journal slowness is per-namespace, but i have no actual reason to believe that :)
[18:12:31] dinner
[18:12:48] Yeah, we can remove the journal files. Data transfer cookbook is reliable enough not to worry
[18:14:12] ryankemper any chance you could review https://gerrit.wikimedia.org/r/c/operations/puppet/+/979401 ? The typo is causing alerts. I've suppressed them for now
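
(A sketch of the single-file LOAD test discussed around [18:01:25]-[18:02:11]: Blazegraph doesn't receive the file over HTTP, it gets a SPARQL UPDATE telling it to load a local file. The port, scratch namespace, and file path below are illustrative assumptions, not copied from loadData.sh:)

    # Sketch: load one munged chunk into an alternate namespace so the
    # main graph isn't touched. Port, namespace, and path are assumptions.
    time curl -X POST http://localhost:9999/bigdata/namespace/scratch/sparql \
        --data-urlencode 'update=LOAD <file:///srv/wdqs/munged/wikidump-000000001.ttl.gz>'
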
[18:16:41] lunch, back in ~1h
[18:17:47] * ebernhardson is amazed that archiva can document their REST api without once mentioning a url or a path, just their java client library :P
[18:20:53] thx inflatador. okay, i'm kicking off my attempt with taskset of 1 on desktop to let it complete and see if that does anything.
[18:22:18] ebernhardson: ha, yes - that could certainly be!...gotta go blazegraph spelunking in the not too distant future for the db internals
[18:25:11] hmm, perhaps relevant, the archiva-users mailing list has been archived since there is "little (almost none) activity with the project now"
[18:25:23] probably time to at least put replacing archiva on the roadmap (can gitlab replace it?)
[18:27:15] hmm, that's part of the "ee" gitlab
[18:36:31] * ebernhardson sees that somehow OSS is a bit dead in the maven repository manager space, the non-dead options all look to be designed to upsell you into their paid solution
[18:37:17] * ebernhardson could be missing a few options :P
[18:58:58] inflatador: will be back in 15’ to review
[19:10:31] back
[19:11:39] ryankemper we're good, mutante +1'd
[19:22:19] now I have to figure out how to stop the poller from DOSing the endpoint...looks like we've got multiple requests per second per poller
[19:43:26] ah, it's DOSing because it made a target for each and every WDQS host. OK, back to the old style check for now
[20:04:31] Giving up on the blackbox check and using the old style for now https://gerrit.wikimedia.org/r/c/operations/puppet/+/979408/2..3
[20:35:22] * ebernhardson is mildly annoyed debian started merging /{bin,sbin} into /usr/{bin,sbin}, resulting in hardlinks from files in /sbin to /usr/sbin, but `dpkg -S $(which netstat)` can't find things because which finds the "wrong" copy
[20:52:30] oh yeah, that is obnoxious
[21:02:47] taskset didn't change real time - 65 minutes just like the others
[21:18:36] ryankemper ebernhardson mind taking a look at https://gerrit.wikimedia.org/r/c/operations/puppet/+/979408 ? Sorry to bug again, but there's a target for each WDQS host hitting wdqs1015 and I'm sure it's creating logspam
[21:19:35] sure, looking
[21:20:45] Mutante was helping but he vanished somewhere ;)
[21:23:39] inflatador: sure that url is correct? Trying to query that from here fails, and having "object=!wd:42 wdt:P31 wd:Q5" is a bit suspicious, because that object looks like a triple
[21:24:23] maybe it does work that way, i'm not familiar with ldf :) but it seems to error from here
[21:24:49] ebernhardson no ... I'm looking at the check above it and guessing that the part after the last "!" is the string it's supposed to match against
[21:25:24] could be totally wrong about that though
[21:25:44] I'm also worried that this one will have the same problem...it'll make a check for every single WDQS host instead of just one
[21:26:40] inflatador: ahh, ok yea we need a different query. i think you can simplify the final bit to `object=!wd:Q42`, we don't really care what it responds with, just that it does respond
[21:26:44] for the every host thing...mhm
[21:28:43] ACK...the query was d-causse's suggestion. But I'm starting to think it'd be more prudent to just wipe out the check and worry about it Monday
[21:29:48] looking at icinga, it's almost certainly going to make a check for every host...so forget it for now, I'll just take out the check entirely. Sorry to bug
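
(For the LDF discussion at [21:23:39]-[21:26:40], a hedged example of hitting the triple pattern fragments endpoint directly; the endpoint URL is the public one, but whether it accepts prefixed names like wd:Q42 is an assumption, so full IRIs are used here:)

    # Sketch: query the LDF endpoint for one triple pattern.
    # -G turns the --data-urlencode params into a GET query string.
    curl -sG 'https://query.wikidata.org/bigdata/ldf' \
        --data-urlencode 'subject=' \
        --data-urlencode 'predicate=http://www.wikidata.org/prop/direct/P31' \
        --data-urlencode 'object=http://www.wikidata.org/entity/Q5'
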
[21:30:16] no worries
[21:30:45] Yes agreed on tearing it out for now
[21:35:10] FWIW, observability has plans to make this type of check a little easier https://phabricator.wikimedia.org/T312840
[21:36:00] OK, updated the CR so it's just wiping out the check entirely
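
(For posterity, the old-style check in question boils down to a check_http run like the one below. The URL path, query, and expected string are illustrative assumptions; the real puppet wiring differs. Note that in icinga command definitions "!" separates arguments, which is likely why it shows up inside the check strings discussed above:)

    # Sketch of what the old-style icinga check roughly executes;
    # URI and expected string ('Q5') are assumptions, not from the CR.
    /usr/lib/nagios/plugins/check_http -H query.wikidata.org -S \
        -u '/bigdata/ldf?subject=&predicate=wdt:P31&object=wd:Q5' -s 'Q5'
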