[09:57:37] !log paws autoscaling for renderer deployment T320776 910e3ca0dbf2a2b9019ff3d86520565e4548d778
[09:57:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[09:57:39] T320776: autoscaling for some pods - https://phabricator.wikimedia.org/T320776
[10:26:59] stw: TIL about lifecycle/ignore_changes, thanks! that seems like a much better solution than hardcoding the latest version when the image was created
[10:36:41] You're welcome :) It's mostly used for things like tags/metadata, such as management tooling which applies its own tags to resources outside of Terraform, that Terraform would then automatically remove on the next apply
[10:37:21] It feels a little weird to use it for image_id, but it seems to work fairly well
[10:40:47] I think part of the problem you were probably seeing is that declaring an image by name could change the underlying image it was referring to as WMCS renames deprecated images, hence why you were pinning to an image ID. Other platforms like AWS tend to never change the name of an image after it's been created, making that less of a problem
[10:54:34] !log tools rebuild mono68-sssd image with the expired DST Root CA X3 removed T311466
[10:54:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:54:37] T311466: Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466
[14:44:52] Hey everyone! in about 10 minutes I'm going to reboot both cloud-vps bastions as part of some unavoidably urgent maintenance. This will kick folks off of any non-toolforge VMs.
[14:45:22] Very sorry in advance for the rudeness of this event :( You should be able to reconnect a couple of minutes after the disconnect, and toolforge users will be unaffected.
[14:48:49] rook, dcaro, dhinus, taavi, arturo, please note ^^ and save your work
[14:49:24] no probs for me, thanks for the heads up!
[14:49:53] 👍
[14:50:11] 👍
[14:55:49] reboot done. sorry again for the swift kick.
[14:58:20] thanks!
[15:07:51] !log tools rebooting tools-prometheus-7 and tools-prometheus-8 (but not at the same time)
[16:06:03] !log tools.stewardbots ./stewardbots/StewardBot/manage.sh restart # reboot of services
[16:06:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[16:49:38] !log tools rebooting redis nodes (one at a time)
[16:49:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:53:17] can I safely reboot tools-package-builder-04.tools.eqiad1.wikimedia.cloud?
[16:53:21] dcaro: ^
[16:53:34] I think so yes, is anyone logged in?
[16:54:12] nope, green light :)
[16:54:22] thx
[16:54:49] !log tools rebooting tools-package-builder-04
[16:54:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:55:56] Another zero day needing urgent patching?
[16:59:12] * andrewbogott remains cryptically silent about security questions
[17:02:45] I probably got it in my work email, but I'm off and refuse to login 😊
[17:02:45] Just reply when the hole is all patched up again. (re @wmtelegram_bot: remains cryptically silent about security questions)
[18:24:58] https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage does this page need to be updated? I believe labstore100[6-7] have been decommed? If it's just a straight find and replace for clouddumps, I can do ti
[18:25:00] or it
[18:31:14] inflatador: yeah, I think that's all that needs changing, done
[18:31:41] andrewbogott excellent, thanks!
[19:12:27] !log tools.wikibugs restart wikibugs
[19:12:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[19:25:01] spi-tools, which runs as a web service on toolforge, every once in a while will just stop responding to requests. Nothing logged in my django log, nothing logged in uwsgi.log either.
[19:25:22] Doing a restart gets it going again, but it's frustrating not having a clue what's going on.
[19:25:40] Any ideas on how to debug this?
[19:26:11] Since there's nothing in uwsgi.log, I don't even know if the request is getting as far as my code or not.
[19:41:39] It looks like uwsgi only logs something when the request completes. Is there some way to make it also log something when it issues the request to your service?
[19:45:40] Does stashbot auto restart? If not someone should give it a poke
[19:49:26] !log tools.stashbot Restarted to rejoin channels
[19:49:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stashbot/SAL
[19:49:42] Sariboo: done ^^
[19:49:57] TheresNoTime: \o/
[19:50:10] (I worry about the bots)
[19:51:04] wikibugs on the other hand seems just generally unhappy about life atm.. T321342
[19:51:05] T321342: wikibugs losing connection to IRC - https://phabricator.wikimedia.org/T321342
[19:52:01] wmopbot is also doing a lot of in and out, but at least that one is self restarting
[19:59:22] !log tools.wikibugs restart wikibugs (again)
[19:59:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[20:13:21] I am trying to add some info to wmopbot logs to discover something about the bots disconnecting. It seems the connection stays alive but doesn't receive data and the sent data doesn't arrive at the IRC server, or maybe the connection broke but the process doesn't notice that
[20:16:38] I guess I can try taking a packet capture on the k8s worker VM directly to see if that contains anything weird
[20:29:14] y'all sure it's not Libera having issues o.O
[20:30:59] last time I had a libera staffer watch me repeatedly try to get wikibugs to fail to connect it worked perfectly every time
[20:31:18] !log tools.bridgebot Double IRC messages to other bridges
[20:31:58] !log tools.stashbot Restarted to rejoin channels, again :D
[20:31:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stashbot/SAL
[20:33:01] TheresNoTime: we have several toolforge k8s bots with issues, and several bots hosted in prod or in separate cloud vps projects that work fine.. so that makes me think that it's very unlikely to be a libera.chat issue
[20:33:53] !log tools.bridgebot Double IRC messages to other bridges
[20:33:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[20:34:16] (I didn’t restart a second time, just re-logged the message since the initial one was exactly when stashbot restarted ^^)
[20:36:15] taavi: could it be something to do with the range they connect from?
[20:37:54] maybe. as I said I'm going to do a network capture when I have a bit of time to see what's actually happening instead of relying on guesswork
[20:42:51] There goes stashbot again
[22:09:50] !log tools.stewardbots ./SULWatcher/manage.sh restart # all bots down
[22:09:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[22:10:36] !log tools.stewardbots ./stewardbots/StewardBot/manage.sh restart # Ping timeout not noticed by bot
[22:10:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[22:14:30] !log tools.stewardbots ./SULWatcher/manage.sh restart # Bots joined but unresponsive
[22:14:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[22:17:26] in service.log i get "Throttled for 3 restarts in last 3600 seconds". does that mean that my webservice was not started?
[22:18:12] !log tools.stewardbots ./stewardbots/StewardBot/manage.sh restart # Ping timeout not noticed by bot
[22:18:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[22:24:23] would be nice if it would not count manual starts towards that number or there was a way to unstick it
[23:05:29] it's very unpleasant to debug a webservice/get it running this way
[23:06:16] i think i got it running a little bit but i'm none the wiser now
[23:21:43] hm, webservice says my webservice is running, nginx says 503 Service Temporarily Unavailable