[08:33:27] <wm-bot>	 !log tools.growth-community-configuration <urbanecm> Updated Growth community config repos  (main=466d05b, example=82c99a9)
[08:33:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.growth-community-configuration/SAL
[08:36:44] <wm-bot>	 !log tools.growth-community-configuration <urbanecm> tools.growth-community-configuration@tools-sgebastion-10 ~/public_html/mw $ git checkout sandbox/urbanecm/community-configuration
[08:36:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.growth-community-configuration/SAL
[08:57:21] <taavi>	 jbond: btullis: clouddb1015 and 1019 are alerting because they've had puppet disabled since the middle of last week. the reason links to https://gerrit.wikimedia.org/r/c/operations/puppet/+/961829 but not a person, so whoever was working on deploying it please fix the remaining hosts
[09:11:36] <taavi>	 correction, not limited to 1015 and 1019, someone has just acked the alerts for other servers
[09:21:20] <btullis>	 taavi: thanks. They're on my radar for today. I need to depool them to reboot and I wasn't sure of the best method. I should have set downtime for the puppet alert, so apologies for that.
[09:27:31] <taavi>	 I think Puppet being disabled for that long is a problem in the first place, is there a way we could have avoided that somehow?
[09:29:08] <taavi>	 also, are there tasks about fixing the workflow for pooling/depooling wiki replica hosts? that seems to be causing quite a bit of work that should really be much easier
[09:37:21] <btullis>	 Yes it wan't intentional. I had intended for it to happen on Thursday at the latest, but I simply ran out of time. 
[09:38:42] <btullis>	 There are a couple of tickets about improving the pooling/depooling workflow, but I think we could do with reviewing and drawing them together into a bit more of a proper project. There is this one, for example: https://phabricator.wikimedia.org/T322658
[09:40:08] <btullis>	 I did some work a little while ago that was intended to allow us to use `confctl` to depool a whole wikireplica cluster (web/analytics) by swapping out which dbproxy server was in use.
[09:40:53] <btullis>	 However, I've only used it twice and both times pybal choked on it, depooled both clusters and needed restarting.
[09:43:34] <btullis>	 There is also a cookbook here: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/cookbooks/+/refs/heads/master/cookbooks/sre/wikireplicas/update-views.py which tries to automate the haproxy based draining and depooling and running maintain-vews.. As far as I know it has never actually been run successfully.
[09:54:56] <taavi>	 https://phabricator.wikimedia.org/T322658 would be helpful too, but I'm not sure how that would help with depooling individual hosts
[10:03:12] <btullis>	 973728: Depool the cloudb10[13-16] hosts for maintenance | https://gerrit.wikimedia.org/r/c/operations/puppet/+/973728 (If anyone would be so kind please.)
[10:04:38] <taavi>	 btullis: +1
[10:04:47] <taavi>	 let me know how it goes
[10:13:16] <taavi>	 btullis: https://phabricator.wikimedia.org/T300427#9325905
[10:14:10] <btullis>	 Thanks. We have a meeting this Wednesday - Shall we use some of that meeting to talk about improvements we can make in this area?
[10:16:01] <taavi>	 would be nice, but I'm not sure if the current timeslot we have for Wed has enought time for both deploying that change and talking about process improvements
[12:40:45] <wm-bot>	 !log tools.growth-community-configuration <urbanecm> Use redis for caching, without it, the testing instance was unreasonably slow
[12:40:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.growth-community-configuration/SAL
[13:21:30] <Rook>	 !log paws pwb version bump T351015
[13:21:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[13:21:34] <stashbot>	 T351015: New upstream release 8.5.1 for Pywikibot - https://phabricator.wikimedia.org/T351015
[14:28:16] <tchin>	 Hey all, I'm trying out Toolforge's build service and was wondering what the file limit is for the built image? It fails at the export step with 413 Request Entity Too Large
[14:29:37] <tchin>	 Built it locally and the image came out to 1.17 GB so... yeah that's probably it but wondering what the limit is
[14:35:23] <taavi>	 very good question. blancadesal ^ do you happen to know that off-hand?
[14:37:30] <blancadesal>	 tchin: as far as I know, there is no per-image quota, but there is a limit to the total size your tool's image registry will hold. From memory it's 500 GB, let me check
[14:38:49] <blancadesal>	 tchin: it's 1GB
[14:39:17] <blancadesal>	 what's the name of your tool? 
[14:40:44] <tchin>	 dpe-alerts-dashboard
[14:44:05] <tchin>	 basically I'm trying to see if I could get puppeteer running in a toolforge job but didn't realize how storage it takes up 😅
[14:44:11] <tchin>	 *much
[14:44:16] <blancadesal>	 so yes, the image is too big. :))
[14:45:51] <tchin>	 I guess I should ask first before continuing down a dead end, can toolforge jobs access the internet?
[14:46:27] <taavi>	 yes they can
[14:46:33] <tchin>	 ok nice
[14:46:45] <taavi>	 blancadesal: that seems a bit low.. and at least should be documented somewhere
[14:47:02] <taavi>	 a raw 413 error doesn't sound particularly helpfu;
[14:47:27] <tchin>	 for context I need to get info from lists.wikimedia.org but there's no api for it so web scraping is the only option
[14:47:28] <blancadesal>	 taavi: agree. need to go afk for a while soon, will open a ticket 
[14:51:54] <blancadesal>	 T351092
[14:51:54] <stashbot>	 T351092: [tbs] Improve Harbor quota handling and docs - https://phabricator.wikimedia.org/T351092
[14:52:37] <taavi>	 tchin: ^ would you mind pasting the error you got in there?
[14:56:17] <tchin>	 got it
[15:00:31] <blancadesal>	 thanks!
[16:37:37] <taavi>	 !log tools drain tools-k8s-worker-84 tools-k8s-worker-85
[16:37:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:17:42] <Rook>	 !log paws remove old cluster T350875
[18:17:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[18:17:47] <stashbot>	 T350875: Remove 123_8 cluster - https://phabricator.wikimedia.org/T350875
[19:31:41] <andrewbogott>	 !log admin rebooting cloudcontrol2005-dev, trying to fix general misbehavior
[19:31:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[21:39:46] <wm-bot>	 !log tools.stewardbots <anticomposite> SULWatcher/manage.sh restart # SULWatchers disconnected
[21:39:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[22:21:22] <taavi>	 !log tools reboot! tools-sgewebgen-10-3, tools-sgeweblight-10-21, tools-sgeweblight-10-32, tools-sgeexec-10-16 due to high load average and/or stuck jobs
[22:21:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL