[09:20:42] wdqs1013 went down again during the weekend
[09:21:15] it's always (or super often) that host that does it
[09:21:22] yes, happens quite regularly sadly
[09:21:49] should we take it out of circulation and investigate?
[09:22:11] 1013 & 1012 yes, I bet it's because, being powerful, they run more queries, increasing the chance of being hit by a deadly one
[09:22:29] ah, I see
[09:22:31] makes sense
[09:22:47] would be nice to confirm it, though
[09:22:58] maybe we should have a different memory configuration for them
[09:23:12] wdqs1012 doesn't die nearly as often
[09:24:09] hmm, actually that might be an interesting dig - which hosts contribute most often to the general instability
[09:25:06] * zpapierski is off to play around with PromQL
[09:25:43] indeed, thanks for looking!
[10:23:50] and the winner is indeed wdqs1013
[10:28:17] https://grafana-rw.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&refresh=1m&forceLogin=true&from=1644748087771&to=1644834487771&viewPanel=36
[10:29:35] it's not exactly what I wanted to do (I want more specific per-host availability, instead of the SLO here), but I'm assuming it follows the same trend
[10:30:18] funnily enough, wdqs1012 is actually the best-performing host (SLO-wise) among the eqiad ones
[10:30:24] wdqs1013 is anything but
[10:30:46] which means the theory of those being the most exposed might not hold
[10:37:42] how do you sort this thing...
[11:10:12] lunch
[11:10:19] lunch
[12:16:02] lunch
[14:14:34] greetings
[14:16:59] o/
[14:57:28] o/
[16:02:10] ejoseph: triage meeting: https://meet.google.com/qho-jyqp-qos
[16:34:19] zpapierski: on thinking about it again, I wonder: did we actually need session return-to? Is w[cd]qs an SPA with only a single URL?
[16:42:15] I think it is, but I might be wrong
[17:11:29] dinner
[18:13:19] o/ dpm
[18:13:50] I don't see anything about boosters in https://ec.europa.eu/info/live-work-travel-eu/coronavirus-response/safe-covid-19-vaccines-europeans/eu-digital-covid-certificate_en - is it saying the EU requires a 2-dose course every 270 days? Currently no one is allowed a fourth shot here afaik
[18:56:54] The quantity of "200kg of sugar" (440 lbs) came up in the Ask a Language Nerd meeting today and someone (from Poland) said they didn't think they'd eaten that much sugar in their life. mpham helpfully suggested I might have (can't argue with that). I looked it up, though, and the average American eats 150 lbs of sugar per year (up from 125 lbs in the 70s and just 2 lbs two hundred years ago). So 200kg is ~3 years for an average American.
[18:56:54] In Poland, it's 40kg/88lbs per year - so 5 years' worth. (Since it's Valentine's Day, I'm working on getting my numbers up!)
[19:04:07] hmm, how do I keep forgetting to turn puppet back on on the wcqs hosts... I always wonder if it does anything, because `sudo enable-puppet foobar` emits nothing
[19:21:02] ahh, the problem is I never realized I'm supposed to provide the same message when disabling and enabling. No wonder I've left it disabled a few times by accident...
[19:21:30] (and it doesn't tell you it refused to turn back on; I'll put up a puppet patch that adds an echo)
[19:25:16] ryankemper: something I just noticed, in eqiad :9243 we have `persistent.cluster.remote.omega.seeds` and `persistent.search.remote.omega.seeds` in the cluster settings, one refers to new hosts and one refers to old.
[19:25:19] I suspect it's still using the old one, since `curl https://search.svc.eqiad.wmnet:9243/omega:sowiktionary_content` fails to find an index, suggesting it's still looking at the old masters
[19:25:51] not entirely sure, but one way or the other only one set of config should exist in the cluster settings :)
[19:27:30] ebernhardson: ah, that might explain some of the issues with the alert firing https://phabricator.wikimedia.org/T301511#7708316
[19:28:06] the check script looks at `cluster|search`, but I didn't think about the other side (the actual setting being wrong)
[19:31:27] ebernhardson: https://github.com/wikimedia/puppet/blob/58392896c66ea8669c3e52d27845be716c2c6c6a/modules/icinga/manifests/monitor/elasticsearch/cirrus_settings_check.pp#L13 here's what I meant about the check looking at either `cluster|search`
[19:35:52] ryankemper: I think the name changed depending on the Elasticsearch version, double checking
[19:36:24] the commit message says something to that effect, https://github.com/wikimedia/puppet/commit/d6bfc99435ef0b024599438565c09c903c69cbca
[19:40:22] ryankemper: I think once that's fixed these should decline; these should all be cross-cluster searches that are failing (fairly invisible to users, they just don't get a sidebar): https://logstash.wikimedia.org/goto/4743bc53e694043b26ff941b70d78c8a
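A minimal sketch (not from the log) of how the duplicated omega seeds could be checked and the stale legacy key cleared, assuming the standard Elasticsearch cluster-settings API and the host/port quoted above:

```bash
# Show persistent settings as flat keys; both cluster.remote.omega.seeds
# (the current name) and search.remote.omega.seeds (the pre-6.5 name)
# should show up if the old config is still hanging around.
curl -s 'https://search.svc.eqiad.wmnet:9243/_cluster/settings?flat_settings=true&pretty'

# Clear the legacy key by setting it to null, leaving only cluster.remote.*.
curl -s -XPUT 'https://search.svc.eqiad.wmnet:9243/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"search.remote.omega.seeds": null}}'

# Cross-cluster lookups through the omega alias should then resolve again.
curl -s 'https://search.svc.eqiad.wmnet:9243/omega:sowiktionary_content'
```

Elasticsearch also accepts wildcard resets (`"search.remote.omega.*": null`) if the single key doesn't clear cleanly.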
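Going back to the enable-puppet gotcha above, a minimal sketch of the disable/enable pair with a matching message, plus a stock-Puppet way to confirm the agent actually came back; the message string is made up, and the disable-puppet wrapper is assumed to take the same argument as enable-puppet:

```bash
# Disable with a reason; the exact same string has to be passed back to
# enable-puppet, otherwise the agent silently stays disabled.
MSG="wcqs investigation - ebernhardson"   # hypothetical message
sudo disable-puppet "$MSG"
# ... manual work ...
sudo enable-puppet "$MSG"

# Neither command prints anything, so check the agent lockfile directly:
# if it still exists, puppet is still disabled.
LOCK=$(sudo puppet agent --configprint agent_disabled_lockfile)
sudo test -e "$LOCK" && echo "puppet still disabled" || echo "puppet enabled"
```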
[20:01:29] ebernhardson: https://meet.google.com/stp-swkd-iho
[20:02:08] omw
[23:26:51] Trey314159: that is a horrifying amount of sugar! Do you think it includes corn syrup?
[23:29:56] I created this (placeholder) ticket to figure out what we need to do to kill ApiFeatureUsage: https://phabricator.wikimedia.org/T301724. There's a question about whether we want to undeploy it or sunset it, and I'm not sure what the difference is or which we want.