[02:24:33] Hello everyone, it looks like some PCC instances' disks are full. Do you know what can be done to solve this? https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/40760/console
[02:24:49] I see the issue was reported before.
[07:38:32] looks like the root partition is around 80% on the pcc worker 1003, probably the delete-old-output-* timers cleaned up a bit.. maybe we could increase their frequency (atm once a day afaics)
[07:41:49] created https://gerrit.wikimedia.org/r/c/operations/puppet/+/910415
[07:43:46] elukey: note that the 2nd one only deletes reports older than 7 days
[07:45:03] and the 1st one is "31 days"
[07:46:00] maybe we can change it to 15 days and 5 days?
[07:52:53] XioNoX: I wanted to try something less invasive as a first step, I know that the extra second run wouldn't remove much, it was more to get the clean up happening twice a day so people will likely not get stuck during their workday
[10:43:51] jbond: thanks for the feedback! Didn't realize that it was probably jenkins, morning pebcak :)
[10:45:29] elukey: was just reading the scrollback here. the message is coming from jenkins but it could be coming from the agent.
[10:45:59] however either way the reports (should) be stored in /mnt/nfs/labstore-secondary-project which has plenty of space and shouldn't be affected by the regular cron job
[10:46:25] TIL thanks
[10:46:43] elukey: I'm curious though as you said that root was at about 80% yesterday and it's at 75% today
[10:47:02] did you do something to clean up the other GB?
[10:47:26] either way I would have expected jenkins to be able to do its thing with 4G
[10:49:10] jbond: nono didn't do anything, it auto-resolved by itself
[10:50:05] strange, I don't see anything that would explain it, please ping directly if you see it again
[11:52:26] who could I pester with questions re: docker-pkg. I am getting a bizarre error about an image needed for a build (docker-registry.wikimedia.org/python3-bullseye) not found while I can see it just fine with `docker images`
[11:54:32] To wit, https://phabricator.wikimedia.org/P47265
[12:02:11] Ok, maybe I have been holding docker-pkg wrong :-/
[12:04:36] klausman: did you find your solution?
[12:04:46] yeah, I was using the wrong command line args
[12:04:54] correct:
[12:04:56] docker-pkg --info -c config.yaml build images --select '*gpu-tester*'
[12:04:58] wrong:
[12:05:09] docker-pkg -c config.yaml build images images/amd/gpu-tester/
[12:06:03] (the build still fails, but now it's a different problem: connections to mirror1001.wikimedia.org fail)
[12:06:35] And now they work. Ok.
[12:06:49] I call gremlins
[12:07:33] Unrelated nit: timestamps of the format 14:07:06,453 are... questionable
[12:07:41] (no, I have no funky locale set)
[12:10:00] a-ha. Docker does something stupid with my network. While the build is running, connections to the outside world are somewhat broken. Fantastic.
[12:12:20] $ ip ro get 208.80.153.42
[12:12:21] 208.80.153.42 dev veth2dfcba1 src 169.254.137.245 uid 1000
[12:12:23] cache
[12:12:25] whyyyyy
[12:15:52] klausman: the timestamp is the default python asctime format
[12:16:01] and it's bad and wrong.
[12:16:02] Although I don't think we need to log ms
[12:16:16] a comma in that spot is just argh
[12:17:55] It's one of the two options given by ISO8601, the other being . (which I agree should be the default)
[12:18:08] especially with an enUS locale
[12:21:38] Wanna know something fun? If you want to change the format, you don't get an option to have the ms.
[12:24:03] Every day we stray further from sanity
[12:24:29] I _think_ my network problem is NetworkManager "helping". Now I just need to figure out how to make it not do that
[12:26:57] nah, that wasn't it
[12:35:56] Root of the problem: (*&^% connman. Why did I even have that installed.
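For reference, the timestamp behaviour discussed above can be reproduced with the standard library alone. This is a minimal sketch, not docker-pkg's actual logging setup: the helper `make_logger` and the logger names are hypothetical, chosen only for illustration. It shows the default `asctime` with its comma-separated milliseconds, and one common workaround for the "no ms once you set `datefmt`" limitation, which is to append the `msecs` record attribute in the format string.

```python
import io
import logging


def make_logger(name, fmt, datefmt=None):
    """Hypothetical helper: build a logger that writes to an in-memory buffer."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(fmt, datefmt=datefmt))
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo output out of the root logger
    return logger, stream


# Default asctime: no datefmt given, so the milliseconds are appended
# after a comma, as in the timestamps complained about in the log.
default_logger, default_out = make_logger(
    "demo.default", "%(asctime)s %(message)s"
)
default_logger.info("hello")

# With a custom datefmt, asctime is produced by time.strftime, which has
# no milliseconds directive. Appending %(msecs)03d in the format string
# restores them, here with a dot as the separator.
dotted_logger, dotted_out = make_logger(
    "demo.dotted",
    "%(asctime)s.%(msecs)03d %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
dotted_logger.info("hello")

default_line = default_out.getvalue().strip()
dotted_line = dotted_out.getvalue().strip()
print(default_line)
print(dotted_line)
```

If milliseconds are not wanted at all, passing only `datefmt` (without `%(msecs)03d`) drops them, which matches the "we don't need to log ms" position in the discussion.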
[12:50:02] klausman: let's also give credit for the tool to the authors, since it works really nicely :)
[12:50:10] (docker-pkg)
[12:57:01] klausman: You got conned
[12:57:51] Oh, when it works, it works nicely, but I would have hoped for a more obvious error when I was misusing it :)
[13:00:59] klausman: there is always the possibility for improvements, it is an internal tool :)
[13:03:27] This is true
[13:17:30] klausman: patches welcome
[13:20:28] I actually was making one, because I was wondering where that weird datetime format came from :')
[13:22:05] either a python default or a volans default
[13:22:13] joe: python default
[13:22:33] what did I miss? :)
[13:22:40] but as I was saying, the datefmt option doesn't let you get the ms :')
[13:22:47] I tend not to care much, I grew up reading output from programs in FORTRAN77 written by astrophysicists, I have high tolerance to visual clutter
[13:22:51] (which tbh we don't care about)
[13:23:23] so I typically either copy riccardo's logfmts or wait for him to make a patch to make it not-default
[13:23:33] or someone with similarly low tolerance
[13:25:02] I called it a nit for a reason :) I don't care all that much in this instance, since I won't be parsing those logs. But a comma in that spot struck me as bizarre. As claime established, it's a Python thing.
[13:29:14] win/ 14
[13:29:25] fail
[15:39:57] marostegui: hello, do you have a minute to power down pc2011 for me so I can work on the idrac issue https://phabricator.wikimedia.org/T334722
[15:47:53] papaul: not sure if he is still around, but that may be a complex operation, unless it is an emergency maybe better to wait until after the switchover? (we are in a db maintenance freeze)
[15:48:20] jynus: ok
[15:48:25] thanks
[15:48:38] I don't speak for him, but that would be my best guess of what to do
[15:50:02] both a db maintenance freeze and the day before a holiday
[15:54:22] papaul: I will add manuel to the ticket, on a normal week it should be an easy maintenance
[15:54:46] ah, you just did it
[15:55:11] thank you
[15:57:09] in the event of a hw issue, we should have a software way to mitigate it, so it shouldn't be a blocker
[15:59:39] Are any other teams in need of an openjdk8 deb for Bullseye and beyond? Or is it just us? ;)
[16:00:14] inflatador: o/
[16:01:21] btullis ACK. Looks like there is a component out there: https://apt-browser.toolforge.org/bullseye-wikimedia/component/jdk8/ Have you used it yet?
[16:01:22] inflatador: openjdk-11 is not an option?
[16:01:35] we might, for gerrit on bullseye, not sure yet
[16:02:08] mutante sadly not for blazegraph (wdqs)
[16:02:09] We need it for Hadoop at the moment, plus a few other things. Yes, we have installed some test bullseye Hadoop things using it, but not migrated the prod cluster.
[16:07:39] sudo cumin 'R:Package = openjdk-8-jdk' 'lsb_release -c'
[16:07:49] ^ 46 bullseye hosts use it
[16:07:58] (46)
[16:07:58] an-presto[1001-1015].eqiad.wmnet,an-test-client1002.eqiad.wmnet,an-test-druid1001.eqiad.wmnet,an-test-presto1001.eqiad.wmnet,an-test-worker1001.eqiad.wmnet,build2001.codfw.wmnet,clouddumps[1001-1002].wikimedia.org,kafka-logging[2001-2005].codfw.wmnet,kafka-logging[1003-1005].eqiad.wmnet,kafka-main[2001-2005].codfw.wmnet,kafka-main[1001-1005].eqiad.wmnet,kafka-test[1006-1010].eqiad.wmnet,wdqs2022.cod
[16:08:04] fw.wmnet
[16:08:07] mutante ah nice!
[16:10:01] https://debmonitor.wikimedia.org/packages/openjdk-8-jdk
[16:16:52] now, to crib someone's puppet code ;P
[16:21:04] inflatador: grep ./hieradata/ for "java_packages" and people both include or class { 'profile::java': }
[16:22:53] mutante Guessing it will work even though java 8 on bullseye requires a separate repo? btullis do you happen to remember? Looking at https://github.com/wikimedia/operations-puppet/blob/production/modules/profile/manifests/java.pp#L18
[16:24:20] nope....looks like that might not be enough for Bullseye
[16:24:32] inflatador: ideally an apt::repository would be in that if you use the right parameter
[16:30:29] no need, if you use profile::java it automatically adds the necessary repos
[16:30:51] based on what you have configured as your Java versions in Hiera
[16:31:59] and feel free to use Java 8 for Blazegraph as well, I'm keeping it updated based on the quarterly Java security releases
[16:32:52] moritzm we have `require ::profile::java::java_8` in our query_service common.pp, but it doesn't seem to work on my new bullseye host
[16:33:05] `E: Unable to locate package openjdk-8-jdk`
[16:34:20] based on https://phabricator.wikimedia.org/T264181 it's not yet using profile::java in general
[16:34:38] best to have a look at existing uses, e.g. in the IDPs or Hadoop
[16:34:54] moritzm thanks for the tip! Will check it out once I get back from lunch
[16:35:12] if you include profile::java you only need to set a few Hiera variables with your preferences about JRE or JDK and which version you want 8/11/17
[16:35:18] and it will do the rest automatically
[16:35:49] if you have any questions let me know and feel free to add me to reviewers for a patch, afk for a while now