[07:01:59] Krinkle: we get some lag in x2 since the enablement: https://grafana.wikimedia.org/d/000000278/mysql-aggregated?orgId=1&var-site=codfw&var-group=core&var-shard=x2&var-role=All&viewPanel=6&from=now-24h&to=now
[07:01:59] It doesn't look very big right now (it did yesterday), but it seems to be under control for now
[08:20:29] Okay, marostegui, let me know if that changes. I saw it recover but didn't see the write rate decrease much. In fact, it was rising again as part of what might become the daily or weekly seasonality. So I'm not sure what specifically induced the lag.
[08:20:45] We have a few knobs we can turn
[09:57:27] Krinkle: https://grafana.wikimedia.org/d/000000278/mysql-aggregated?orgId=1&var-site=codfw&var-group=core&var-shard=x2&var-role=All&viewPanel=6&from=now-24h&to=now we might need to use one of those knobs :)
[16:49:06] Is there something known going on in codfw? We're showing no traffic through varnish text. Has it been depooled?
[16:49:40] https://usercontent.irccloud-cdn.com/file/3R5u9hZo/image.png
[16:49:55] btullis: yes, it's depooled, there was a brief power incident
[16:50:17] bblack: Thanks.
[16:50:19] it will be back online for traffic shortly
[18:06:57] hi, hm, I'm getting "modules/profile/manifests/hadoop/spark3.pp:91 wmf-style: profile 'profile::hadoop::spark3' includes non-profile class conda_analytics"
[18:07:04] from CI in https://gerrit.wikimedia.org/r/c/operations/puppet/+/813278
[18:07:21] which... is confusing. My understanding of Puppet profiles is that they are supposed to include non-profile classes?
[18:07:40] I'm looking for the code that enforces this but I can't find it
[18:08:31] profiles can use the `class { 'some_class': }` syntax
[18:08:31] _joe_: ^ ?
[18:09:03] ah hm. Okay, so if it depends on another profile, `require ::...` is okay
[18:09:05] <_joe_> declare, not include
[18:09:10] but if it's a non-profile class, declare it?
[18:11:36] depends on what exactly you're doing
[18:13:05] I see why that would work; all profile params are handled by Hiera
[18:52:19] I'm coming up short as to why mainstash is causing lag. There's a lot of churn, which isn't good, and I'll look into that. I have no defence for what it's doing. But at the same time, it appears not to affect the eqiad primary and local replicas. I don't know if this is cause or side effect, but I think maybe it's not lagging due to processing the writes logically (e.g. CPU-heavy queries, multi-row locks, or selects; we have none of those) but
[18:52:19] due to the hardware of that particular server not keeping up with disk writes?
[18:52:20] https://grafana-rw.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=db2142&var-datasource=thanos&var-cluster=mysql&from=now-2d&to=now&viewPanel=6
[18:52:25] disk is fully saturated.
[18:52:34] https://grafana-rw.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2142&var-port=9104&from=now-2d&to=now&forceLogin
[18:55:07] OK, so that is actually the case on the eqiad one as well. Alright, that's no good.
[21:27:57] hey mutante, quick question if you're around -- what's the sync story for people1003 vs people2002? Do we expect to rsync by hand before any failover?
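
For reference, below is a minimal Puppet sketch of the profile pattern discussed in the 18:07-18:13 exchange, assuming the usual role/profile convention: a profile may include or require other profiles, but a non-profile class should be declared explicitly with resource-like `class { }` syntax, with its parameters fed from the profile's Hiera-backed parameters. The parameter names (`conda_env_name`, `env_name`) and the `profile::hadoop::common` dependency are illustrative assumptions, not the actual contents of spark3.pp.

```puppet
# Hypothetical sketch only; class parameters and the depended-on profile
# are illustrative, not the real profile::hadoop::spark3 code.
class profile::hadoop::spark3 (
    # Profile parameters get their values from Hiera.
    String $conda_env_name = lookup('profile::hadoop::spark3::conda_env_name'),
) {
    # Depending on another profile: a plain require/include is fine.
    require ::profile::hadoop::common

    # Non-profile class: declare it explicitly instead of including it,
    # passing its parameters from the profile's Hiera-backed parameters.
    class { 'conda_analytics':
        env_name => $conda_env_name,
    }
}
```

The explicit declaration keeps all of the class's configuration visible in the profile rather than buried behind an include, which is presumably what the wmf-style check quoted at 18:06:57 is after.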