[01:29:37] * bd808 off
[08:15:08] Morning
[08:39:30] bd808: oops. deleted mine, I'll do some updates on that instead
[08:58:27] bd808: feel free to reword it to something that you feel less threatening, but it is the preferred yes **if the current features work for you** (and if you can, not mounting NFS), and if not, then shared images. I can prepare an 'out of beta' email if you want, and remove the 'beta' tag from it if that would help, the current features are not going to change much (so the current interface is stable). We can wait until we have the
[08:58:27] push-to-deploy parts of it, but I think that would be wider than build service, and probably might take some time. Note that the proposed out-of-beta date for the build service was december last year, but well, life happened.
[10:38:59] I'm running a silly script to try to capture connection issues towards wikipedia from the tools-k8s-worker-nfs-6 in a pod named 'bash' on the default namespace (fyi)
[10:39:35] are there connection issues?
[10:39:49] https://phabricator.wikimedia.org/T356164
[10:40:06] kinda, some tools have seen some connection issues
[10:45:10] dcaro: maybe try the wiki API endpoint, it may have a different behavior
[10:45:15] (in your script)
[10:45:19] yep, changed it already :)
[10:45:32] using the same query now that one of the tools (commons API)
[10:47:40] arturo: any ideas why we would be missing ~1day of logs from the controller manager pods?
[10:47:47] (yesterday seems to be mostly missing)
[10:48:23] as in, --previous shows before, and the current pod shows after, but there's a chunk missing
[10:51:55] specifically I'm looking for the trial to trigger a job tool-chie-bot/job-archive that should have happened at 10:00 UTC (I can find the day before)
[10:52:51] it triggered also ok today
[10:57:22] dcaro: logrotate ?
[10:59:56] does that affect the pod logs themselves?
[11:01:49] yes, docker stores logs in the filesystem, and as far as I understand, they have a logrotate procedure
[11:02:43] hmm, I don't see any rotated logs
[11:02:45] looking
[11:03:08] https://www.irccloud.com/pastebin/sqGmyvsR/
[11:04:01] those are the pods logs (current is 11, previous is 10), but even under the `/var/lib/docker/containers/.../` there's only one log for each, so no archived rotation, maybe they get discarded?
[11:04:24] yeah, looks like it
[11:04:44] I bet that's something we could configure
[11:05:52] hmm, I don't find any docker specific logrotate config
[11:09:06] I think it's docker directly that trims the logs
[11:09:23] https://www.irccloud.com/pastebin/pEIifj3N/
[11:09:56] hmm, they are way smaller though
[11:10:57] root@tools-k8s-control-5:~# ls -la /var/log/pods/kube-system_kube-controller-manager-tools-k8s-control-5_3f76f3c5513ca93fee400f42c484d3b9/kube-controller-manager/
[11:10:57] total 16
[11:10:57] drwxr-xr-x 2 root root 4096 Jan 31 03:50 .
[11:10:57] drwxr-xr-x 3 root root 4096 Oct 18 11:19 ..
[11:10:57] lrwxrwxrwx 1 root root 165 Jan 20 22:50 10.log -> /var/lib/docker/containers/efb2621cee3910c02590c56a15df4847a6c3521e9dab474de99fe93b95023282/efb2621cee3910c02590c56a15df4847a6c3521e9dab474de99fe93b95023282-json.log
[11:10:57] lrwxrwxrwx 1 root root 165 Jan 31 03:50 11.log -> /var/lib/docker/containers/7e8e7aa7d2358a5c4488fb676841474f4c529b1822c19201bf30fbe134e93551/7e8e7aa7d2358a5c4488fb676841474f4c529b1822c19201bf30fbe134e93551-json.log
[11:10:57] root@tools-k8s-control-5:~# ls -lah /var/lib/docker/containers/efb2621cee3910c02590c56a15df4847a6c3521e9dab474de99fe93b95023282/efb2621cee3910c02590c56a15df4847a6c3521e9dab474de99fe93b95023282-json.log
[11:10:58] -rw-r----- 1 root root 44M Jan 31 03:50 /var/lib/docker/containers/efb2621cee3910c02590c56a15df4847a6c3521e9dab474de99fe93b95023282/efb2621cee3910c02590c56a15df4847a6c3521e9dab474de99fe93b95023282-json.log
[11:10:58] root@tools-k8s-control-5:~# ls -lah /var/lib/docker/containers/7e8e7aa7d2358a5c4488fb676841474f4c529b1822c19201bf30fbe134e93551/7e8e7aa7d2358a5c4488fb676841474f4c529b1822c19201bf30fbe134e93551-json.log
[11:10:59] -rw-r----- 1 root root 439K Feb 2 11:10 /var/lib/docker/containers/7e8e7aa7d2358a5c4488fb676841474f4c529b1822c19201bf30fbe134e93551/7e8e7aa7d2358a5c4488fb676841474f4c529b1822c19201bf30fbe134e93551-json.log
[11:11:28] the biggest one is 44M (for the previous container), and only ~450K for the current one
[11:25:54] btw. noticed today that you can get your external ip by curling wikipedia xd
[11:54:18] sorry I'm distracted with many things and couldn't follow up with you
[11:57:49] np, I'm not very focused myself
[12:45:27] I just gained access to alertmanager, debmonitor, icinga, turnilo, netbox, etc
[12:46:36] \o/
[12:46:39] * dcaro lunch
[13:55:27] I'm suddenly missing atuin in our servers
[14:33:18] TIL: atuin, looks nice!
though I'm afraid if I started using I would lose the incentive of writing down useful commands to a public wiki :D
[14:33:31] s/using/using it/
[16:36:43] cya on monday
[16:37:07] o/
[16:40:34] have a good weekend!
[16:40:46] taavi: what was the trove instance you were trying to back up last week?
[16:41:07] andrewbogott: the one in the metricsinfra project
[16:41:18] thx
[16:46:28] dcaro: your explanation makes sense. Thanks for elaborating.
[17:10:04] * arturo offline
[18:41:10] * bd808 lunch
[19:33:50] Rook: I'm searching for oldish trove instances that I can try to upgrade (and maybe break in the process). Would quarry-dev-kube fit into that category? Is it OK if it winds up ruined?
[19:34:24] Seems like it should be fine
[19:34:49] great. We will see how this goes
[19:35:00] 👍
[19:45:13] looks like it survived!
[19:54:14] Rook: flush with success, I have a followup question -- what about quarry-db-02? Since -dev-kube survived the upgrade db-02 probably will too, but it'll be down for a few minutes while it rebuilds.
[19:54:30] If that's something that needs pre-announcement that's fine, I'll just look elsewhere for victims.
[19:56:22] I'd mostly go for it. Though I am stepping out. So if it does implode I won't be able to respond for a few hours
[19:56:46] ok! I'm feeling lucky
[20:34:17] mixed results -- the upgrade didn't work but the instance is still running. Going to stop messing with it for now.
[21:33:51] I stumbled over the https://wikitech.wikimedia.org/wiki/Wikitech:History page while wikignoming today. It has a short timeline with citations for various changes in wikitech's content & hosting. I knew these things had happened, but not when so that was a cool find.
[21:42:49] more ancient history: Coren's first day as the SRE for Tool Labs (2013-02-25) -- https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/7WXOZJ2QPXWBCFZD5MAD6IHZ5YOP5Y3Y/
[21:49:19] cool!
[21:49:45] We are way behind on updating that page
[22:22:16] andrewbogott: I keep day dreaming about starting a written history of WMCS stuff while we can still track down many of the folks who were part of various interesting eras. It never seems to go beyond day dreaming though.
[22:23:08] You can start by adding one additional line to that History page and see how it feels :)
[22:36:14] {{Done}}. I added the date that I made the main page point to the Portals:* pages
[22:36:45] It turns out that was 8 years and a couple of days ago