[02:59:03] I'm looking at https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=8&orgId=1&refresh=1m&from=now-30d&to=now and noticing that prior to the Streaming Updater release, most servers were at <1min lag when not spiking, but in the past couple of days, they're all uniformly just over 1m lag. Is that expected?
[06:39:30] mpham: it's expected - while the current pipeline has much higher throughput than the old one, which means far fewer spikes, it is also a longer one, somewhat by design. To better reconcile the differences between entities and sync a few different change topics (revision-create, delete-page, etc.), a partial ordering is performed, which requires a time window (currently set to 1m).
[07:04:56] Also the lag is sampled every minute, so the best precision of that metric is 1 minute. A degradation of 1 minute is within the error margin of the metric.
[07:05:11] Yes, I know I should be on vacation
[07:08:55] really? I thought it doesn't really matter what the sampling rate here is, time() will return the timestamp at the time of the sampling?
[07:09:02] (calculation is time() - blazegraph_lastupdated)
[08:05:46] ok, now I have a percentage share in traffic for each instance
[08:06:05] I don't think we're going to get anything better from preexisting values
[10:06:23] dcausse: mind if I create a short meet after the Wikidata sync to discuss which metrics we're interested in for the SLO? I'm still verifying my math on the dashboards (I think I'm missing something in my traffic percentage calculation), but I think we know enough to make an informed decision on what we want to commit to
[10:07:11] zpapierski: please do
[10:12:36] hmm, actually he's around before retro, even better
[10:16:33] perfect!
[10:16:39] lunch
[11:01:00] break
[14:30:52] mpham: around to meet?
[14:31:01] yeah, one sec
[14:49:01] anyone else see this?
https://phabricator.wikimedia.org/T294025
[14:56:09] zpapierski: I think this is the same issue I brought up earlier in the chat. Sounds like maybe we ought to explain what's happening more generally so people understand
[14:56:16] yep, will do
[14:57:56] \o
[15:00:24] sorry, should have scrolled up. Yes, I think some communication around this to the community is a good idea - if something was degraded in favor of the improvements, we should communicate that choice
[15:05:58] ebernhardson: retro?
[15:36:23] zpapierski: what do i need to do to deploy mw-oauth-proxy to wcqs? trigger a build in jenkins, but then what?
[15:36:54] * ebernhardson could probably find it, but asking too :P
[15:36:55] standard blazegraph deployment is when it happens
[15:37:18] zpapierski: hmm, is there any danger in doing a full blazegraph deployment then, will that force wdqs too?
[15:38:06] not sure, I never used scap for wcqs-beta
[15:38:58] heh, actually i never updated scap to sync to wcqs :P
[15:39:09] I guess you can do a group deployment? as in create a wcqs group, alongside wdqs, wdqs-internal, etc?
[15:39:13] ok, i'll do that. It can be a separate scap group
[16:10:37] cbogen_: I answered in the ticket - https://phabricator.wikimedia.org/T294025#7448648
[16:10:56] I'm tempted to close it now as weel, since we know this isn't a bug
[16:11:43] s/weel/well
[17:20:22] aww, i don't have permission to submit in wikidata/query/deploy :) Anyone want to send these two: https://gerrit.wikimedia.org/r/c/wikidata/query/deploy/+/732742/1
[17:38:08] ebernhardson: done
[17:38:14] thx
[17:38:45] we should get you merge rights as well
[17:38:51] i assume that's somewhere in the gerrit repo settings
[17:39:13] yea, looks like guillaume or david are in wikidata-query-admins and would need to add me
[17:40:26] * ebernhardson somehow expects this to fail anyways, i suspect this scap deploy is still too tied to wdqs.
Will find out :)
[17:40:55] heh yeah I had the same "concern" (re deploy possibly not working)
[17:41:14] I'll do a deploy today so we can see (/ we need to anyway, haven't deployed since completing the streaming updater cutover)
[17:41:33] yea, i'm expecting this to fail today but hopefully can figure it out for next week :)
[17:42:54] for wcqs i wasn't going to do the full deploy procedure, although i guess we could. I was going to scap sync only wcqs
[17:43:11] i suppose best to do a proper deploy, will get awkward if we have different versions running in different places
[17:43:58] zpapierski: thanks for the response, it's good! i say we wait a day or two and then close it
[17:44:10] ebernhardson: do you remember how we add to `wikidata-query-admins`? I'm blanking
[17:46:32] ryankemper: i mean david and guillaume are in wikidata-query-owners (s/admins/owners/, oops). https://gerrit.wikimedia.org/r/admin/repos/wikidata/query/deploy,access says i need to be in wikidata-query-deploy which is owned by wikidata-query-owners
[17:46:39] so i think either of them can add me in the gerrit ui somewhere
[17:48:20] I'm in `wikidata-query-deploy` (along with david/guillaume) but I don't see how to see who's in `wikidata-query-owners` specifically
[17:48:38] https://usercontent.irccloud-cdn.com/file/anNwuDM6/Screen%20Shot%202021-10-21%20at%2010.48.30%20AM.png
[17:49:16] https://gerrit.wikimedia.org/r/admin/groups/fa916121a583488e2a983d14a3ede1c455782dbe,members
[17:50:28] yup, mostly re-browsed to group search, click through, and then there is a `members` link in the sidebar
[17:51:33] ah gotcha (& thanks majavah), well hopefully dcausse can add you (and me) to https://gerrit.wikimedia.org/r/admin/groups/fa916121a583488e2a983d14a3ede1c455782dbe,members thru the UI tomorrow
[17:54:32] ebernhardson: what was the scap command you were envisioning btw? would it be a `scap sync-world`?
[17:54:57] the only command i'm aware of are `scap sync-file` and `scap sync-world` (i'd bet there's more tho)
[17:55:00] commands*
[17:56:28] it's scap deploy with cli options i believe, lemme check
[17:58:44] i guess `scap deploy -l 'wcqs*'` should work, but i'm sure i've seen a single-group deploy option before. Still looking :)
[18:02:42] hmm, interestingly reading through scap code there doesn't seem to be a way to choose the server groups, those are strictly from config. We would indeed need to use the limit-hosts option
[18:13:23] Okay I like the idea of rolling the deploy for just wcqs first before doing a full one
[18:13:50] And also if wcqs deploy is broken in a way that isn't trivially fixable today I can still deploy wdqs w/ `-l wdqs*`
[18:18:45] yea sounds reasonable
[18:30:25] Trey314159: do you want to be invited to the ElasticSearch planning/kickoff? I couldn't remember how involved you are in this process but didn't want to exclude you either (I think everyone else should probably be there)
[18:31:45] mpham, yeah, please include me. I usually do a quick round of testing of analyzers to see if there are any unexpected changes. We've caught a few minor things that way, and discovered they'd added Ukrainian one year!
[18:32:12] ok will do!
[19:10:20] yea scap isn't happy, the problem is it's trying to restart the service but the service name varies
[20:03:07] ryankemper: if you can merge this one it might work this time: https://gerrit.wikimedia.org/r/c/wikidata/query/deploy/+/732772
[20:47:01] ebernhardson: merged
[20:48:00] ryankemper: thanks!
[20:55:31] hmm, nope.
dry-run deploying the wcqs environment tried to deploy to canary...hmm
[21:01:44] huh, and then ignoring that and specifying the config file directly it fails to connect to wcqs2003: Load key "/etc/keyholder.d/deploy_service.pub": invalid format
[21:02:00] so, more to figure out :)
[21:18:01] Weird that it'd be invalid format
[21:19:06] The contents of the pubkey start with `ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCvbV8H7Vzy...`, i.e. it looks pretty standard
[21:20:08] It's possible that format was invalidated and there's a new format now, but wouldn't someone else have gotten bit by that already if the `deploy_service` key is shared across services? hmm
[21:26:57] Testing a bit, i think it might be something happening on the deploy host side, perhaps intentional
[21:27:55] doing a normal `scap deploy --dry-run -l 'wcqs*'` looks to talk to wcqs ok, but adding a `-c scap/environments/wcqs/scap.cfg` causes the keyholder error
[21:32:39] ebernhardson: looking back at your earlier message...what was the command that led to it trying to deploy to the canary?
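The `-l 'wcqs*'` limit-hosts approach mentioned above restricts a deploy to targets whose names match a pattern. As a rough illustration only, here is a minimal sketch of shell-style glob filtering, assuming the pattern behaves like a plain glob (scap's own matcher may accept additional syntax). The host list is hypothetical apart from `wcqs2003`, which appears in the log.

```python
from fnmatch import fnmatchcase

# Hypothetical target inventory; real targets come from scap's configuration.
HOSTS = ["wdqs1003", "wdqs2001", "wcqs2001", "wcqs2002", "wcqs2003"]


def limit_hosts(hosts, pattern):
    """Return the hosts matching a shell-style glob, keeping their original order."""
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.fnmatch.
    return [h for h in hosts if fnmatchcase(h, pattern)]


print(limit_hosts(HOSTS, "wcqs*"))  # ['wcqs2001', 'wcqs2002', 'wcqs2003']
print(limit_hosts(HOSTS, "wdqs*"))  # ['wdqs1003', 'wdqs2001']
```

Quoting the pattern on the command line, as in `-l 'wcqs*'`, keeps the shell from expanding `*` against local filenames before scap ever sees it.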
[21:33:47] `scap/scap.cfg` has `server_groups: canary,default`, which is absent in `scap/environments/wcqs/scap.cfg`, so i'm wondering if the environment scap.cfg functions more as an override meaning that it's still getting `server_groups` set by `scap/scap.cfg`
[21:33:59] Or if the command you ran just doesn't involve `scap/environments/wcqs/scap.cfg` and that's why
[21:33:59] ryankemper: hmm, sec lemme reconnect
[21:34:29] ryankemper: `scap deploy --dry-run --no-log-message -v --environment wcqs`
[21:34:54] Okay so I imagine that it always reads `scap/scap.cfg` and just uses the environment as an override
[21:34:57] ryankemper: from the docs i had the impression only vars.yaml is merged between environments, otherwise files in the environment override the root
[21:36:00] i think now on closer review though, it's also doing merging inside scap.cfg, so i can probably drop most of what's in the wcqs config and only list the differences
[21:36:35] (the merging would also be where canary comes from, server_groups is canary,default in top level scap.cfg, wcqs i left it blank to take the system default of `default`)
[21:36:58] Sounds reasonable...so we'll want to make sure we override `server_groups` in the wcqs env so that we omit the canary
[21:39:55] ryankemper: so i think https://gerrit.wikimedia.org/r/c/wikidata/query/deploy/+/732797 will do it, hopefully :)
[21:40:52] ebernhardson: LGTM, I'll let you do the honors
[21:42:06] looks happy in dry run, letting it go for real
[21:44:01] well, kinda sorta. It's complaining that i forgot to set the new required config values
[21:46:11] new required config values?
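The canary mystery above came down to layering: the environment's scap.cfg is merged over the top-level one, so a key the environment omits (`server_groups`) is inherited. The sketch below mimics that inferred behaviour with Python's `configparser`, where each later `read_string` call overrides keys from earlier ones. It is an illustration under that assumption, not scap's actual merge code; the `[global]` section and the file contents are stand-ins reduced to the one key discussed.

```python
import configparser

# Stand-in for scap/scap.cfg (only the key discussed in the chat).
BASE_CFG = """
[global]
server_groups: canary,default
"""

# Stand-ins for scap/environments/wcqs/scap.cfg before and after the fix.
ENV_CFG_BEFORE = """
[global]
"""
ENV_CFG_AFTER = """
[global]
server_groups: default
"""


def effective_config(*sources):
    """Layer config sources in order; keys in later sources override earlier ones."""
    parser = configparser.ConfigParser()
    for text in sources:
        parser.read_string(text)
    return parser


before = effective_config(BASE_CFG, ENV_CFG_BEFORE)
after = effective_config(BASE_CFG, ENV_CFG_AFTER)
print(before["global"]["server_groups"])  # canary,default (inherited, so canary is targeted)
print(after["global"]["server_groups"])   # default
```

Under this model, leaving a key blank in the environment file does not fall back to scap's built-in default; the top-level value wins, which is why the fix was to set `server_groups` explicitly.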
[21:47:02] it needs a random secret value to use as the JWT secret key, and it needs a url for kask
[21:47:40] i suppose the secret needs to be deployed from puppet, has to be the same on each host and private
[21:54:54] okay, if you go ahead and throw a secret into a private phab I can get started on getting it into /srv/private as well as the `operations/puppet` logic to stick the secret where it needs to go
[21:55:22] does it just need to be written to a file on disk somewhere?
[21:55:49] ryankemper: it doesn't have to be anything in particular, probably some reasonable length and representable in mostly ascii since it goes into a defaults file that is sourced by a shell script
[21:56:14] okay sounds simple enough
[21:56:18] yeah i'll keep it ascii armorable :P
[21:56:43] or whatever the appropriate technical term is for "not a bunch of binary gobbledygook"
[21:57:14] i mean, /dev/urandom isn't a horrible place to get a secret. But it might need to be encoded first :)
[21:57:45] i'm working out the little bits for exact property names and where they go, shouldn't take too long
[22:00:46] they'll never figure out that the password is `{WӭwvwS?dBXֈ9jҔyz/m1u"]XKzdx;wk|`
[22:22:27] so i think https://gerrit.wikimedia.org/r/c/wikidata/query/deploy/+/732800 updates the startup script as necessary, and then https://gerrit.wikimedia.org/r/c/operations/puppet/+/732801 has puppet provide them, needs a secret added
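The requirements above (random bytes from the OS, encoded so they survive a shell-sourced defaults file) can be met with Python's `secrets` module. This is a minimal sketch: `token_urlsafe` draws from the OS CSPRNG (on Linux, the same kernel source behind /dev/urandom) and base64url-encodes the result without padding. The variable name `EXAMPLE_JWT_SECRET` is hypothetical; the real property names were still being worked out in the patches linked above.

```python
import re
import secrets

# Base64url alphabet: safe to appear unquoted in a shell variable assignment.
SHELL_SAFE = re.compile(r"[A-Za-z0-9_-]+")


def make_secret(nbytes: int = 32) -> str:
    """Return nbytes of OS randomness, base64url-encoded without '=' padding."""
    return secrets.token_urlsafe(nbytes)


def defaults_line(var_name: str, value: str) -> str:
    """Render a KEY=value line for a defaults file that a shell script sources."""
    if not SHELL_SAFE.fullmatch(value):
        raise ValueError("secret contains characters that would need shell quoting")
    return f"{var_name}={value}"


secret = make_secret()
print(len(secret))  # 43: 32 bytes encode to 44 base64 chars, minus one '=' pad
print(defaults_line("EXAMPLE_JWT_SECRET", secret))  # hypothetical variable name
```

Because every host must share the same value, the secret would be generated once and distributed (for example via the private puppet repo, as discussed), not generated per host.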