[08:09:09] mjolnir had a successful run tonight
[10:16:20] dcausse: thank you for taking care of the SUP producer outage earlier this week. What leaves me puzzled: The producer hardly uses HTTP metrics since the number of requests should be minimal compared to the consumer.
[10:19:04] pfischer: indeed but I think it's the cardinality that was causing an issue here not really the rate, I think we were already running close to the jvm limits because I did not see anything "huge" in there
[10:19:44] if you're curious a snapshot of the metrics is in deployment.eqiad.wmnet:/home/dcausse/producer_prom_metrics.lst
[10:20:25] Thanks, I’ll have a look. I’ll create a ticket to boil down the metrics.
[10:20:44] we could possibly reduce the cardinality if we wanted to I guess (i.e. remove some percentiles we don't use)
[10:20:47] thanks!
[10:21:13] or drop the per-host, dunno...
[10:33:10] errand+lunch
[14:15:33] check check
[14:15:37] dcausse https://gerrit.wikimedia.org/r/c/operations/puppet/+/1088210/19/hieradata/role/common/wdqs/internal_main.yaml#11 doesn't need to change for internal graph split endpoints, right? `sparql_query_stream:wdqs-internal.sparql-query'` is OK?
[14:20:42] o/
[14:20:44] inflatador: looking
[14:22:20] inflatador: correst the sparql_query_stream should remain the same for all internal endpoints
[14:22:25] s/s/c
[14:23:22] For those internal endpoints, is it only CirrusSearch that consumes them?
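The cardinality point raised around 10:19 can be checked against a metrics snapshot with a short script. The following is a minimal sketch, assuming the snapshot is in the Prometheus text exposition format (one sample per line, `# HELP` / `# TYPE` comment lines); the format of `producer_prom_metrics.lst` is not stated in the log. The function names and the sample metric names are hypothetical.

```python
import re
from collections import Counter

# Metric name at the start of a sample line, e.g. `http_latency{...} 1.0`.
_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def series_per_metric(lines):
    """Count how many time series (sample lines) each metric name has."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blanks and `# HELP` / `# TYPE` comment lines.
        if not line or line.startswith("#"):
            continue
        match = _NAME.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def label_values(lines, label):
    """Collect the distinct values seen for one label, e.g. `quantile` or `host`."""
    pattern = re.compile(r"[{,]\s*" + re.escape(label) + r'="([^"]*)"')
    values = set()
    for line in lines:
        if line.lstrip().startswith("#"):
            continue
        values.update(pattern.findall(line))
    return values


# Hypothetical sample data, not taken from the real snapshot.
SAMPLE = """\
# TYPE http_latency summary
http_latency{host="a",quantile="0.5"} 1
http_latency{host="a",quantile="0.99"} 2
http_latency{host="b",quantile="0.5"} 3
requests_total 7
"""

if __name__ == "__main__":
    rows = SAMPLE.splitlines()
    for name, count in series_per_metric(rows).most_common():
        print(name, count)
    print(sorted(label_values(rows, "quantile")))
```

Sorting by series count shows which metric names dominate, and the distinct values of labels such as `quantile` or `host` show how much would be saved by dropping unused percentiles or the per-host dimension.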
[14:26:57] inflatador: no, it's WikibaseConstraintChecks and another extension, lemme find the right config
[14:27:57] inflatador: https://codesearch.wmcloud.org/search/?q=localhost%3A6009&files=&excludeFiles=&repos=
[14:28:29] these are the ones I know about but there might be more out there :/
[14:29:13] Thanks, starting to think about what's next after we deploy
[14:35:29] OK, created T380594 to discuss that further...low priority
[14:35:30] T380594: Inform current wdqs-internal consumers about new internal graph split endpoints - https://phabricator.wikimedia.org/T380594
[15:14:49] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1094468 puppet change for the new internal endpoints if anyone has a chance to review
[15:25:04] inflatador: you'll have to re-enable the categories on either internal-main or internal-full them when we drop "internal-full" for cirrus' deepcat to work properly
[15:25:53] dcausse good catch, I will make a note of that in some of the related Phab tasks
[15:27:46] maybe a good excuse to create a categories-internal VM or two
[15:32:23] :)
[15:32:51] created T380608 for further discussion on this
[15:32:52] T380608: Address categories migration for internal graph split endpoints - https://phabricator.wikimedia.org/T380608
[15:36:07] thanks!
[15:45:19] The split graphs don't need anything special as far as scap config, do they?
[15:52:21] I'm not sure... if dsh_target is the same I guess no, quickly looking only wcqs appears to have something different
[15:57:37] \o
[15:59:32] random newness, php 8.4 released with get/set hooks on properties: https://www.php.net/releases/8.4/en.php
[15:59:39] will ofc be a few years till it makes mediawiki
[16:05:57] nice!
[16:09:09] Looks like scap deploys OK to the public graph split hosts, but doesn't recognize the internal graph splits. Will work on that
[16:13:08] inflatador: the rdf deploy repo? That's configured by scap.cfg in the repo, says to deploy to `dsh_targets: wdqs`. That's referring to /etc/dsh/groups/wdqs
[16:13:46] s/groups/group/
[16:15:57] ebernhardson ACK, thanks. I think the DSH config comes from https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/hieradata/common/scap/dsh.yaml , will look when I get back
[17:01:09] * ebernhardson realizes while looking at a graph that didn't seem to add up...clickthroughs is number of pageviews, impressions is number of pageview actors...
[17:05:12] thankfully i know who to blame for the wrong graph :P
[17:07:31] :)
[17:30:43] ryankemper not feeling well, I'm going to rest up a bit. Should be in by our normal pairing time, if not I will keep you posted
[17:33:24] heading out, have a nice week-end
[18:24:26] ryankemper: an easy puppet patch if you could review/merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/1094484
[18:29:15] hmm, pcc fails prod and change, guess its broken :( maybe needs to wait for the facts to update
[20:22:08] .o/
[20:22:24] ryankemper I'm back
[20:23:00] ebernhardson That looks like a problem with the puppet patch we pushed earlier
[20:23:39] `Error: Evaluation Error: Error while evaluating a Function Call, Could not find service wdqs-internal-main in service::catalog (file: /srv/jenkins/puppet-compiler/2491/change/src/modules/profile/manifests/services_proxy/envoy.pp, line: 46, column: 13) on node snapshot1016.eqiad.wmnet`
[20:23:48] Let me fix that real quick
[20:54:36] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1094530 this is up if anyone wants to take a look
[20:55:28] seems plausible, and afaik service_setup means noop
[20:56:06] Yeah, that's what the docs say. Let me fix the comment tho, it's in the wrong place
[20:59:04] ebernhardson thanks, will take a look at your patch now
[21:01:48] OK, yours is merged now
[21:14:37] thanks!
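The scap.cfg and dsh relationship described at 16:13 (a `dsh_targets` value naming a file under /etc/dsh/group/) can be sketched as a small lookup. This is a minimal sketch under stated assumptions: scap.cfg is treated as an INI file with a `[global]` section, and the dsh group file is treated as one hostname per line with `#` comments. Neither detail is confirmed in the log, and the function names and hostnames are hypothetical.

```python
import configparser
from pathlib import PurePosixPath


def dsh_group_file(scap_cfg_text, section="global", group_dir="/etc/dsh/group"):
    """Return the dsh group file that `dsh_targets` in scap.cfg points at."""
    parser = configparser.ConfigParser()
    # configparser accepts both `key: value` and `key = value`.
    parser.read_string(scap_cfg_text)
    target = parser.get(section, "dsh_targets")
    return PurePosixPath(group_dir) / target


def read_hosts(group_file_text):
    """Parse a dsh group file: one host per line, `#` starts a comment."""
    hosts = []
    for line in group_file_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


if __name__ == "__main__":
    # Assumed scap.cfg content; only `dsh_targets: wdqs` comes from the log.
    cfg = "[global]\ndsh_targets: wdqs\n"
    print(dsh_group_file(cfg))
    # Hypothetical hostnames for illustration only.
    print(read_hosts("# public hosts\nhost-a.example\nhost-b.example\n"))
```

If the new internal graph split hosts are missing from the resolved group file, scap has no way to see them, which matches the symptom reported at 16:09.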
[21:31:40] sadly, we're getting conftool errors now
[21:33:47] :S
[21:37:35] adding a load balancer VIP by clicking a few buttons or calling a network engineer seems like a decent memory ;P
[21:40:44] err....distant memory, that is
[22:04:10] Well I let the pup sleep in my bed last night so he wouldn’t wake me up at 8am and ended up sleeping till 1pm
[22:04:14] Oops
[22:04:41] inflatador: taking dog out now but are you guys backing out the changes?
[22:05:15] probably, I'm trying to put together a timeline at https://etherpad.wikimedia.org/p/conftool-T379329 to help Sukhbir and Scott understand what happened
[22:05:15] T379329: Create puppet config for wdqs-internal-main and wdqs-internal-scholarly roles - https://phabricator.wikimedia.org/T379329
[22:09:00] Most likely merging a service yaml entry w/o both the etcd
[22:09:31] ^ oops hit enter instead of delete
[22:10:09] I would revert all 3 patches we don’t want to make too much noise
[22:10:49] Must be a file in the original roles patch that shouldn’t have been there
[22:11:11] Since it shouldn’t depend on service catalog being there