[00:40:50] ^ I was wrong on the above. Hosts: cumin:A:lvs does not work
[00:40:55] what works is cumin:O: or cumin:P:
[00:40:58] actual run for double confirmation https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8600/console
[00:41:01] just as an FYI, in case someone wants to do this as well
[05:42:22] We've been receiving this since yesterday: [07:41:42] RESOLVED: AlertLintProblem: Linting problems found for MySQLReplicaNotUsingGTID - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem which suggests that some alerting is broken?
[05:43:02] https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[06:43:54] I've created: https://phabricator.wikimedia.org/T427469
[06:43:58] for the alerts thing
[09:06:03] jynus: o/ I used https://wikitech.wikimedia.org/wiki/Bacula#Restore_(aka_Panic_mode) to confirm/verify pki-root1001's backup before decomming it, all good and smooth. Nice work :)
[09:07:36] elukey: great <3
[09:07:53] so you don't need to attend my refresher
[09:08:06] still open for other people with root next wednesday
[09:09:09] it is important to do it regularly, I do test that the system works generally, but I cannot catch logical errors like a miss on certain directories or things only service owners may know
[09:15:56] totally agree yes, we should do some drills every now and then
[14:21:39] I've had a report from one of our users that they are getting the following when browsing repositories with gitiles
[14:21:42] Error: 429, Please respect our robot policy https://w.wiki/4wJS when crawling us, or contact noc@wikimedia.org (b95fadf)
[14:23:10] RhinosF1: they should only be hitting that if they are from an old browser
[14:23:51] I am not saying that that should be acceptable but yeah, the problem is there -- we are rate-limiting old browsers there for that
[14:24:31] sukhe: have asked them what browser they are on
[14:24:55] RhinosF1: let me check if we can relax that at least somewhat
[14:25:46] sukhe: apparently they were just moaning about it happening before, no need to worry
[14:26:49] RhinosF1: the current rule indicates heavy throttling for old browsers, so I doubt it will be fixed, unless they upgrade their browser
[14:26:54] which well does not work in every case
[14:27:08] so let us know, or ask them to reach out to us at noc@ and we can help them there
[14:27:26] I think we can relax the limits somewhat and I am happy to do that but I will check with others for a +1
[14:29:00] it was old and hasn't happened for a while
[14:29:05] ok
[23:27:22] anybody see etherpad wonkiness? I am getting "upstream connect error or disconnect/reset before headers. retried and the latest reset reason: remote connection failure, transport failure reason: delayed connect error: Connection refused"
[23:28:08] a retry worked; but also noticed that this morning, i was seeing constant disconnect/reconnect
[23:33:09] bliviero: hm, that's the error you get when Envoy terminates TLS but the backend underneath (etherpad, in this case) isn't home
[23:33:32] and, it looks like etherpad has died with "JavaScript heap out of memory" 521 times today
[23:33:40] so, not just you :)
[23:36:03] bliviero: I don't see a task already open -- probably you should open one under #Wikimedia-etherpad and #collaboration-services, unless maybe mutante happens to be around and has a moment to take a look
[23:44:31] rzl: thank you for confirming, ticket opened, https://phabricator.wikimedia.org/T427588