[02:42:32] bd808: I would like to know about the task if you remember https://phabricator.wikimedia.org/T330341
[02:45:33] ZI_Jony: I tried the config change 3 times and it didn’t work. I’m not sure what else to try honestly.
[02:45:57] I was going to setup a test install so I could try again without disrupting all other bridged channels, but I have not made the time to do that yet.
[02:52:14] bd808: thanks for your reply and time
[02:53:38] bd808 could you please try this IgnoreNicks="wm-bot wm-bot2 wm-bot3 wikibugs"
[02:55:51] That’s what I tried in https://phabricator.wikimedia.org/T330341#8639404 and for unknown reasons it did not work.
[02:56:41] And the third try was another variation where wikibugs was not the last item in the set.
[12:46:01] !log admin stopping nutcracker on labweb100[34] to ensure that it's not actually doing anything
[12:47:10] hmmm did stashbot go on vacation?
[12:59:16] looks like it, ping timeout D:
[13:27:04] !log tools.stashbot restarted after ping timeout
[13:27:51] no ack message, but it shows up in https://sal.toolforge.org/tools.stashbot
[13:28:10] andrewbogott: want to try re-logging your message?
[13:28:49] hm, the message still hasn’t ended up on https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stashbot/SAL though
[13:29:55] there’s some “retrying to due to nonce error”s in the log, with reference to T106066
[13:31:29] :(
[13:33:17] guesswork hypothesis: maybe stashbot gets oauth errors (the nonce stuff) from wikitech, sleeps before retrying, and doesn’t respond to IRC pings during that sleep and thus times out?
[13:39:47] I don’t know enough to look further into this, I’m afraid :/
[13:52:55] !log admin starting nutcracker on cloudweb1003/4, see if that's what makes horizon login timeout
[13:56:51] dcaro: that was quick. Did starting it help?
[13:58:06] andrewbogott: yep :)
[14:44:27] looks like stashbot managed to recover, yay
[14:49:47] hmm, would that be related to the memcache/nutcracker issue?
[14:50:12] wikitech runs on cloudwebs no?
[14:51:18] not sure… but those “nonce error” messages are apparently related to memcached, at least
[15:00:32] dcaro: yes, wikitech runs on the cloudweb100{3,4} hosts today
[15:10:28] godog: I think we're still seeing too many phab tickets related to T333315 I just closed another bunch of them
[15:10:32] T333315: WMCS: hundred of phabricator tickets were created for some alerts - https://phabricator.wikimedia.org/T333315
[15:11:59] I would be interested in changing that to don't create any ticket at all, but don't really know where/how to change that safely without affecting other pieces of the infra
[15:15:06] I would prefer changing it in a way that it does not create a new ticket
[15:15:21] but still opens one if there's no open one already
[15:15:26] would that be possible?
[15:16:05] this is the software right godog? https://github.com/knyar/phalerts
[15:16:24] "if there is an existing open task with a given title, updates its description if necessary;"
[15:16:54] it should be doing that already it seems, maybe the labels are not what it expects?
[15:18:52] that sounds reasonable. But meanwhile we implement that, I think we could just disable them entirely
[15:18:54] https://gerrit.wikimedia.org/r/904559
[15:28:19] I hope it's more a bug/misconfiguration than a missing feature
[15:28:35] (in meetings for another ~30min, I'll reply later)
[15:28:54] 👍
[15:32:43] I think this might be the key: "Herald edited projects, added cloud-services-team; removed cloud-services-team (Kanban). · View Herald Transcript · Feb 23 2023, 01:07"
[15:32:46] it filters by the project
[15:32:58] and the project has changed on the fly, so it does not match the new project
[15:33:39] the new phid should be
[15:33:41] https://www.irccloud.com/pastebin/i0E6BC3u/
[15:35:51] dcaro: that's very nice!
[16:05:44] ok so yeah phalerts by default creates one task per alert group, so depending on grouping that affects how many tasks are opened
[16:06:41] also indeed patch LGTM
[16:08:11] arturo: did I get it right that https://gerrit.wikimedia.org/r/c/operations/puppet/+/904559 should fix the issue ?
[16:10:32] godog: oh, interesting to see you say that because literally a minute ago I commented in a suggested code change about this question. so if I simply list 2 lines of "'http://localhost:8292/alerts?..." with 2 different phab tags, that would result in 2 tickets or 1 ticket with 2 tags?
[16:10:49] as in https://gerrit.wikimedia.org/r/c/operations/puppet/+/903796/4/modules/alertmanager/templates/alertmanager.yml.erb#391
[16:11:07] reads more backlog
[16:12:28] mutante: so multiple phid= in the same url: means one task with multiple tags
[16:13:10] multiple url: with one phid= I think will result in multiple tasks each with its own tags
[16:16:15] godog: ooh! glad I asked. I want one task with multiple tags, I will try ?phid=....&phid=....
[16:16:27] thanks
[16:18:36] sure np
[16:25:36] ok I'm logging off for the day, happy to resume tomorrow
[21:52:30] !log gitlab-runners root@runner-1030:/var/lib/docker/volumes# rm -rf runner-m4mqfjvt-project-1177-concurrent-2-cache-c33bcaa1fd2c77edfc3893b41966cea8 (T333586)
[21:52:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Gitlab-runners/SAL
[21:52:35] T333586: runner-1030.gitlab-runners.eqiad1.wikimedia.cloud out of space - https://phabricator.wikimedia.org/T333586
[21:55:23] !log gitlab-runners root@runner-1030:/var/lib/docker/volumes# rm -rf runner-m4mqfjvt-project-860-concurrent-2-cache-c33bcaa1fd2c77edfc3893b41966cea8 ; rm -rf runner-m4mqfjvt-project-860-concurrent-3-cache-c33bcaa1fd2c77edfc3893b41966cea8 (T333586)
[21:55:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Gitlab-runners/SAL
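
Note on the 16:12 to 16:16 exchange about phid=: the rule described there (several phid= parameters in one url: give one task with several tags, while several url: entries give several tasks) could be sketched roughly as below. This is only an illustration under stated assumptions: the base address is the one quoted in the chat, the rest of the query string was elided there and is left out, the helper name webhook_url is made up for this note, and the PHID values are placeholders rather than real project identifiers.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base endpoint as quoted in the chat; the original query string was elided
# ("?...") and is deliberately not reconstructed here.
PHALERTS_ALERTS = "http://localhost:8292/alerts"


def webhook_url(phids):
    """Build a single URL that repeats phid= once per project tag.

    Hypothetical helper for illustration; it is not part of phalerts.
    """
    # A list of (key, value) pairs keeps the key repeated and the order stable.
    return PHALERTS_ALERTS + "?" + urlencode([("phid", p) for p in phids])


# Placeholder PHIDs, assumed for the example only.
url = webhook_url(["PHID-PROJ-aaaa", "PHID-PROJ-bbbb"])
print(url)
# http://localhost:8292/alerts?phid=PHID-PROJ-aaaa&phid=PHID-PROJ-bbbb
```

Going by what was said in the chat, a single url: entry built like this should produce one task carrying both tags, whereas two separate url: entries with one phid= each would probably produce two tasks.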