[08:28:48] vgutierrez@cp4044:/var/log$ ls -alh puppet.log*
[08:28:48] -rw-r----- 1 root adm 0 Mar 1 00:00 puppet.log
[08:28:48] -rw-r----- 1 root adm 315K Mar 3 08:28 puppet.log.1
[08:29:10] jbond: did we change log rotation for puppet recently?
[08:44:15] vgutierrez: i don't think so
[08:51:50] jbond: weird, that's the state for all cp servers right now
[08:53:07] and some other hosts as well
[08:58:03] right... all impacted hosts seem to be running bullseye (hence rsyslog 8.2102 instead of 8.1901)
[09:00:49] hmm i can see some systems (sretest1001) on bullseye without that issue but i'll did deeper
[09:01:06] *dig
[09:01:12] jbond: logrotate config for rsyslog in buster includes a daily rotated file (/var/log/syslog) that triggers /usr/lib/rsyslog/rsyslog-rotate, but that one is being rotated weekly on bullseye
[09:01:34] jbond: I guess that's gonna depend on what "weekly" means for each bullseye host
[09:02:48] that's /etc/logrotate.d/rsyslog BTW
[09:05:18] vgutierrez: i have not had coffee yet so probably missing something, but /etc/logrotate.d/rsyslog is not responsible for puppet.log
[09:06:04] jbond: it isn't
[09:06:22] but triggering /usr/lib/rsyslog/rsyslog-rotate daily does the trick
[09:07:09] ack i see thanks
[09:07:09] vgutierrez: rsyslog-rotate tells rsyslog to reopen its logfiles (it sends it a HUP)
[09:07:25] Emperor: I know.. and that's logfile agnostic
[09:07:36] as long as it happens daily /var/log/puppet.log will look sane
[09:07:52] if puppet.log needs to HUP rsyslog after its rotation, then the logrotate config for puppet.log should also call rsyslog-rotate
[09:18:35] yeah... I was just pointing out why it's working in buster and not in bullseye
[09:19:33] changing default rotation from daily to weekly feels like the sort of thing that should have made the release notes, oh well
[09:20:27] jbond: are you working on a CR?
I can do it if not
[09:20:49] vgutierrez: i'm on it thanks
[09:20:56] thx <3
[09:23:16] np, fyi https://gerrit.wikimedia.org/r/c/operations/puppet/+/893997
[09:53:00] jbond: I sent https://gerrit.wikimedia.org/r/c/operations/puppet/+/894009 your way for review; it's perhaps overly-conservative (but I didn't want reimages of older backends to accidentally change their disk layout...)
[09:53:29] (DC-Ops are grumpy I've not sorted out partitioning for the new swift backends more rapidly)
[09:56:44] Emperor: ack looking
[09:56:51] <3
[10:01:12] Emperor: i left a comment but tl;dr you should look at https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/859470/5/cookbooks/sre/swift/convert-disks.py
[10:01:45] jbond: thanks; these are new servers, so _should_ start out all-JBOD :)
[10:02:35] famous last words ;)
[10:03:11] heh
[10:04:48] what are your plans re that cookbook, OOI? I see it's under review still
[10:05:29] Emperor: you are the target audience of that cookbook so up to you :)
[10:06:11] ah, OK. I'll see if I end up needing to use it on these new servers
[10:06:54] the only thing i would need to do is implement volans' comment to ensure it can only work on swift hosts but otherwise we can merge it
[10:07:03] * jbond will do that now
[10:07:41] probably allow ms-be* and thanos-be* (as the latter are swift nodes too under the hood)
[10:08:17] Emperor: is there a cumin alias that catches them all
[10:09:23] nevermind i can use 'A:swift or A:thanos'
[13:06:03] XioNoX: is there a task for the 'bgp status' warning in icinga?
[13:06:20] looking
[13:06:25] thx
[13:06:30] jbond: the yellow ones?
[13:06:51] yes
[13:07:03] there is no great process, it's usually "when we get some time"
[13:07:26] it usually means emailing our peers, etc and it can change too
[13:07:43] I'll have a look at those
[13:10:50] I acked the juniper alarms warning with the relevant tasks
[13:17:02] jbond: all clear :)
[13:22:24] XioNoX: ack thanks
[13:26:14] almost all the remaining critical alerts in icinga are "Check systemd state"
[13:28:58] Emperor: not sure if known https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=ms-fe2014&service=Swift+https+backend
[13:44:37] XioNoX: that's a not-yet-in-service host cf T326848
[13:44:38] T326848: Q3:rack/setup/install ms-fe2013 - ms-fe2014, thanos-fe2004 - https://phabricator.wikimedia.org/T326848
[13:45:20] It'll be in service probably early next week
[13:45:46] it should probably not be on that icinga page :)
[13:53:30] Mmm, there is probably some process improvement in how/when nodes get added to icinga during the commissioning process
[13:55:11] yeah for sure, it's possible to temporarily disable alerting in hiera too
[13:55:58] some prefer it, some prefer to downtime the alert, it depends on if it comes up healthy, how long it will be "staged", the impact, etc
[14:09:44] who is the best person to speak with about maps these days?
[14:12:43] jbond: I was looking at the "mapsXXX" in https://phabricator.wikimedia.org/T329073 but it's lacking feedback from many teams
[14:13:23] XioNoX: ack thanks :(
[14:15:30] going to use this opportunity to ping some of the teams about https://phabricator.wikimedia.org/T329073 (cc gehel, ryankemper, balloons, elukey, lmata, vgutierrez)
[14:15:44] XioNoX: ack
[14:16:13] hnowlan, urandom ^ too
[14:22:28] XioNoX: ack, thanks!
[14:28:05] hnowlan: fyi planet_sync_tile_generation-gis.service is failing on maps2009
[14:48:07] XioNoX: ack thanks!
[14:55:02] XioNoX: done thanks!
[14:55:38] btullis: o/ took the liberty to add "no action" to all the dse nodes, I think we can simply wait for recovery when it happens (so we observe how it fails etc..)
[15:01:38] elukey: Yep, I agree. Many thanks.
[15:19:46] jbond: ty, acked. will remove that check next week
[15:22:50] hnowlan: ack thanks
[15:37:22] does anyone with +2 on integration/config have a chance to look at this patch, https://gerrit.wikimedia.org/r/c/integration/config/+/893074?
[15:44:01] jhathaway: for that repo I would recommend trying in -releng
[15:44:14] mutante: thanks will do
[15:44:29] I would ask hashar
[15:45:04] yea, that's true but I know he is off now
[15:45:26] I'll start with -releng, thanks
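
Note on the 09:07 exchange about rsyslog needing a HUP after rotation: the 0-byte puppet.log next to a still-growing puppet.log.1 is what you would expect when a writer keeps its old file handle after the file is renamed. Below is a minimal sketch of that behaviour, assuming POSIX rename semantics. It is not rsyslog or logrotate code; the function name and the log messages are made up for illustration, and the close-and-reopen step only stands in for what the daemon does when rsyslog-rotate sends it a HUP.

```python
import os
import tempfile


def demo_rotation_without_reopen(directory):
    """Rename a log under an open writer, then reopen it (hypothetical demo)."""
    log_path = os.path.join(directory, "puppet.log")
    rotated_path = log_path + ".1"

    # Stand-in for the daemon's long-lived file handle.
    writer = open(log_path, "a")
    writer.write("before rotation\n")
    writer.flush()

    # Stand-in for logrotate: move the file aside and create a fresh one.
    os.rename(log_path, rotated_path)
    open(log_path, "a").close()

    # The writer still points at the renamed file, so the new file stays empty.
    writer.write("after rotation, no reopen\n")
    writer.flush()
    size_before_reopen = os.path.getsize(log_path)

    # Stand-in for the reopen that a HUP triggers: close, then open by path.
    writer.close()
    writer = open(log_path, "a")
    writer.write("after reopen\n")
    writer.close()

    return (
        size_before_reopen,
        os.path.getsize(log_path),
        os.path.getsize(rotated_path),
    )


with tempfile.TemporaryDirectory() as tmp:
    before, after, rotated = demo_rotation_without_reopen(tmp)
    print("puppet.log before reopen:", before, "bytes")
    print("puppet.log after reopen:", after, "bytes")
    print("puppet.log.1:", rotated, "bytes")
```

Until the reopen happens, everything lands in the rotated file, which matches the ls output at 08:28 and is why a daily rsyslog-rotate run on buster was hiding the problem.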