[14:02:17] cloud homies I've a question if you know the answer to
[14:02:32] we have certain hosts on the network running things that don't support IPv6
[14:02:51] the typical way we deal with this is not to add dns for these hosts, so any connection to the hostname uses IPv4
[14:03:16] we also have a report in netbox set up to alert us if any hosts that shouldn't have dns for v6 end up getting it set up:
[14:03:17] https://netbox.wikimedia.org/extras/reports/results/5214939/#test_primary_ipv6
[14:03:30] the alerts are somewhat neglected :P
[14:03:44] but there are many cloud hosts on the list, and have been for a while
[14:04:10] so it seems clear to me dns entries for cloud hosts IPv6 isn't an issue?
[14:04:20] if you agree I'll remove those alerts for the hosts
[14:04:38] topranks: I'm pretty sure everything except the clouddb hosts can have IPv6 records
[14:05:07] Ok. I notice clouddb2002-dev is on the list but none of the rest
[14:05:18] I'll leave the warning in place for clouddb, and remove it for the rest
[14:05:27] I'll also remove the dns entry for clouddb2002-dev to remove that one
[14:05:28] thanks!
[14:31:18] taavi: hi, may you review/deploy an ircservserv config change I have sent? https://gerrit.wikimedia.org/r/c/wikimedia/irc/ircservserv-config/+/967131 :)
[14:31:24] else I will poke Kunal about it
[15:51:13] Anyone know why things flapped just now? cloudcephos1025, cloudcephmon1002, clouddb1020?
[15:51:23] (I'm off today, but concerned!)
[15:53:53] I guess it was prometheus systemd, less concerning
[15:59:11] flapped how? I haven't seen anything
[16:00:44] Just "PROBLEM alert - cloudcephmon1002/Check systemd state is CRITICAL " with a quick recovery
[16:21:59] bd808: hm, do we not install Extension:OAuth to private wikis?
/me is a bit surprised T319934 can't use the new image as is
[16:22:00] T319934: Migrate officewikibot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319934
[16:22:49] taavi: apparently nobody has ever added it to officewiki. It's on wikitech because I put it there at some point.
[16:23:22] I have hacks similar to your hacks that make botpasswords work inside a buildpack image now.
[16:24:10] It might be worth trying to combine the hacks and upstream an "insecure" mode in pywikibot's code
[16:26:22] yep. are you still seeing any other issues than the git ownership warning?
[16:27:28] it's a hard failure rather than a warning, but I have a workaround for that now too at https://gitlab.wikimedia.org/toolforge-repos/officewikibot-pywikibot/-/blob/9246c7618f84b3e1d63303f871d9d53d32f8e048/entrypoint.sh
[16:28:48] I got `webservice buildservice shell -- pwb -family:officewiki -lang:en redirect both -moves -always` to run over the weekend. Debugging turned out to be easier for me with `webservice` managing the pod.
[16:35:32] I had trouble getting `toolforge jobs` to capture log output when the pod was basically failing on startup. It would email me the exit code, but not any other output.
[16:36:28] it wasn't logging anything at all to disk either, even when I explicitly configured stderr and stdout files.
[16:37:28] yeah, it's not great. `toolforge jobs logs` sort of works, except that it only works until the pod is purged a minute after it completes
[16:37:54] i'm working around the git ownership issue with https://gitlab.wikimedia.org/toolforge-repos/pywikibot-buildservice/-/commit/10406db40f5548c21986c53f0c9a419f3a4b9bd2, too
[16:40:11] great minds ;) I didn't want to hack pwb.sh directly myself, but if it's in your upstream then I can get it for free once I rebase on your changes.
[18:55:01] It seems that tf-jobs run now defaults to creating jobs with no disk logging and no emails on failure. Is that combination intentional?
I honestly expected at least email on failure as a default.
[19:00:58] * bd808 lunch
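
Note on the 14:03 discussion: the check described there (flag hosts that are meant to stay IPv4-only but have picked up an IPv6 record) can be sketched roughly as below. This is not the netbox report itself, only a minimal illustration. The function names, the prefix-based "v4-only" rule, and the `.example` hostnames are all assumptions made for the sketch, and `socket.getaddrinfo` goes through the system resolver rather than querying AAAA records directly.

```python
import socket
from typing import Callable, Iterable, List


def has_ipv6_address(hostname: str) -> bool:
    """Return True if the system resolver yields any IPv6 address.

    Assumption: this stands in for "has an AAAA record". getaddrinfo also
    consults /etc/hosts and similar sources, so it is only an approximation.
    """
    try:
        results = socket.getaddrinfo(hostname, None, family=socket.AF_INET6)
    except socket.gaierror:
        return False
    return len(results) > 0


def unexpected_v6_hosts(
    hostnames: Iterable[str],
    v4_only_prefixes: Iterable[str],
    resolver: Callable[[str], bool] = has_ipv6_address,
) -> List[str]:
    """List hosts that match a v4-only prefix yet resolve over IPv6.

    The prefix rule is hypothetical; the chat only says that clouddb hosts
    should keep the warning while other cloud hosts may have IPv6 records.
    """
    prefixes = tuple(v4_only_prefixes)
    return [h for h in hostnames if h.startswith(prefixes) and resolver(h)]


if __name__ == "__main__":
    # Hypothetical hostnames; swap in real FQDNs to try it against live DNS.
    hosts = ["clouddb2002-dev.example", "cloudcephmon1002.example"]
    print(unexpected_v6_hosts(hosts, ["clouddb"]))
```

Passing the resolver in as a parameter keeps the filtering logic testable without touching the network.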