[11:12:16] !log paws deploying paws realtime collaboration 246e2afa97060dbd010b5a36ac88589f972d741f
[11:12:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[12:22:39] 000016
[14:12:59] hi
[14:14:05] !log bastion rebooting bastion-eqiad1-03,04 (not yet in active use)
[14:14:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Bastion/SAL
[14:14:23] r
[14:14:32] ?
[15:44:43] !log toolsbeta Deployed buildpack-admission-controller with the latest code (T297090)
[15:44:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[15:44:46] T297090: [tbs][poc]Recreate toolsbeta deployment with the new code - https://phabricator.wikimedia.org/T297090
[15:47:43] Guest51, can we help you?
[15:47:55] h
[17:03:39] If i were to decom an instance from horizon, can i setup a 301 moved permanently to send to the new permanent location of the tool? I poked around in the horizon interface but didn't see it. The goal is wcqs-beta.wmflabs.org -> commons-query.wikimedia.org
[17:08:19] perhaps https://wikitech.wikimedia.org/wiki/Nova_Resource:Redirects/Documentation ?
[17:10:23] ebernhardson: yeah, redirects sounds fine for that use case
[17:12:17] taavi: excellent, thans
[17:12:24] add a k where appropriate :)
[17:13:59] FYI, Pywikibot 7.0.0 moved some files around in the repo. I just updated some docs on wikitech, but if y'all find more things to update please do! https://wikitech.wikimedia.org/w/index.php?title=Help:Toolforge/Pywikibot&diff=1954677&oldid=1952389
[17:18:35] ebernhardson: I see you also have a wcqs-beta.wmcloud.org (wmcloud not wmflabs) configured, do you want the same treatment for that too?
[17:18:57] taavi: yes, please
[17:19:39] taavi: any wmcloud also works as a wmflabs name (99% sure)
[17:20:31] wmflabs gets redirected to wmcloud, but not the other way around
[17:21:09] bd808: while you are with pywikibot, thoughts on T302988?
[17:21:10] T302988: Toolforge kubernetes and pywikibot - https://phabricator.wikimedia.org/T302988
[17:22:57] arturo: I think there was some other task where that got discussed in the past. The think I remeber is mostly that we would need (or at least should have) a process to update the container when pywikibot makes new stable releases.
[17:23:15] *The thing I remember
[17:23:56] T249787 is the place I'm remembering
[17:23:56] T249787: Create Docker image for Toolforge that is purpose built to run pywikibot scripts - https://phabricator.wikimedia.org/T249787
[17:24:40] bd808: with "update the container" you mean inside the registry or running pods?
[17:24:56] in the registry
[17:25:06] oh right, me messing with the wcqs-beta redirects wasn't exactly a best idea when andrewbogott is upgrading the dns nodes :/
[17:26:13] bd808: do you think that whatever workflow (or schedule) we have for maintaining the other docker images is not enough?
[17:27:21] arturo: well... the workflow/schedule today is mostly "when a new webservice build needs to go out", so no probably not frequent enough.
[17:27:58] xd, sounds like another great fit for the toolforge build service pipeline
[17:28:52] dcaro: yeah, adding real CI/CD for the shared containers has been a thing we have wanted for many years
[17:29:33] Hugh was going to work on it and then ... life happened.
[17:30:14] I'm wondering if something like https://gerrit.wikimedia.org/r/764300 could relax the tension while the toolforge build service is ready to go
[17:30:46] !log redirects set up redirects for wcqs-beta T303202
[17:30:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Redirects/SAL
[17:30:49] T303202: Redirect wcqs-beta.wmflabs.org to commons-query.wikimedia.org - https://phabricator.wikimedia.org/T303202
[17:30:56] (that string could be properly versioned, of course)
[17:31:11] does that build it automatically?
[17:31:31] (/me is not familiar with the jobs framework)
[17:32:16] no building there, but it using the :latest tag means that new containers would be found as they were added to the registry
[17:32:34] no :-( all the docker images we have for toolforge today (jobs or webservices) are manually built, see https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Docker_Images
[17:35:08] Our whole setup for building containers could use help. I think Brooke had hopes of just replacing most of it with CI/CD and biuldpacks
[17:35:23] *buildpacks
[17:35:36] sounds like a nice thing to hope for
[17:35:50] (and imo something reasonable too)
[17:37:46] arturo: I certainly don't think we need perfect things. I think your jobs framework would be a good trial of a pywikibot container.
[17:38:32] ok, I'll see if I can find some interested pywikibot beta testers. Anyway taavi suspects the current container image doesn't even work
[17:38:34] taavi: cloudservices1004 is rebuilt and I'm going to let it sit for a couple hours before rebuilding 1003 so if you want a window to do dns things this is it :)
[17:39:33] andrewbogott: managed to work around the breakage already :-P
[17:39:35] arturo: yeah, it may not. I think I got distracted from that project before really finishing a working POC
[17:39:45] ok!
[17:41:32] Most of my grid engine jobs are moved to k8s.
[17:41:35] https://usercontent.irccloud-cdn.com/file/l07SxzOJ/image.png
[17:42:09] andrewbogott: did you ever fix the prometheus exporter for pdns-recursor on codfw1dev?
[17:42:21] Amir1: awesome!!!
[17:43:18] Amir1: I will send you flowers if you send some lines to cloud@
[17:43:56] 🎉
[17:44:08] They are still showing up because I migrated them this weekend and last but they will drop off ^^
[17:44:56] looks like you combined the many cronjobs into one? the deprecation tool would hide it if you had kubernetes cronjobs with the exact same name
[17:45:45] taavi: I think that was https://gerrit.wikimedia.org/r/c/operations/puppet/+/763612 but maybe I'm thinking of something else?
[17:46:32] so I still see 'Reduced availability for job cloud_dev_pdns in codfw' when looking at silenced alerts on alerts.wikimedia.org
[17:46:37] taavi: yeah, hourly and daily
[17:48:01] andrewbogott: oh I think it's a firewall rules issue now
[17:48:07] lemme make a patch real quick
[17:48:10] ok!
[17:48:16] or hmm
[17:48:54] I think prometheus hosts already have firewall rules for all ports
[17:54:15] oh it'd probably help if I used the correct port number in the prometheus config file
[17:56:22] andrewbogott: https://gerrit.wikimedia.org/r/c/operations/puppet/+/768770
[20:54:22] Hmm qstat on sgebastion-08 seems to have a timeout?
[20:57:20] No! Not a timeout, it takes 46 seconds to respond
[20:57:55] Wurgl: `qstat` seems hung somehow even on the grid master node. I'm not sure yet what is causing problems.
[20:59:49] okay.
[21:01:27] `time qstat` as the root user on the master node took 67 seconds to return... still trying to guess why
[21:04:37] arturo: I can help with pywikibot testing. I already have some k8s jobs that use pywikibot on py39.
[21:07:31] Wurgl: I think whatever was wrong is magically fixed now. qstat is at least fast for me again
[21:08:12] I did not actually find a cause :/
[21:13:37] It's magic ;^)
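
Editor's note: the `time qstat` check from the end of the log could be scripted roughly as follows. This is only a minimal sketch, not anything the channel participants ran. The helper names `time_command` and `is_slow`, the 10-second slowness threshold, and the 120-second timeout are all assumptions made for illustration. The only facts taken from the log are the command name (`qstat`) and the observed response times of 46 and 67 seconds.

```python
import subprocess
import sys
import time


def time_command(argv, timeout=120.0):
    """Run argv and return (elapsed_seconds, return_code).

    Hypothetical helper: roughly what `time <command>` reports as wall-clock
    time. Output is discarded because only the duration matters here.
    Raises subprocess.TimeoutExpired if the command exceeds `timeout`.
    """
    start = time.perf_counter()
    completed = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
        check=False,
    )
    return time.perf_counter() - start, completed.returncode


def is_slow(elapsed_seconds, threshold=10.0):
    """Return True if the run took longer than the (assumed) threshold."""
    return elapsed_seconds > threshold


if __name__ == "__main__":
    # Defaults to `qstat`, the command discussed in the log; any other
    # command can be passed on the command line instead.
    argv = sys.argv[1:] or ["qstat"]
    elapsed, code = time_command(argv)
    status = "SLOW" if is_slow(elapsed) else "ok"
    print(f"{' '.join(argv)}: {elapsed:.1f}s (exit {code}) [{status}]")
```

Run repeatedly (for example from a bastion host), a check like this would show whether the slowness is intermittent, which matches the way the problem in the log cleared up without an identified cause.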