[08:31:21] my laptop just arrived \o/
[08:31:31] new toys :-D
[08:32:44] the funny thing is that my wmde laptop was really bad and I had been requesting a new one for around a year, and basically when they said "here is your new laptop", I responded with "well, I gave my notice two days ago"
[08:33:42] one of the usb ports didn't work but every time you took it out, the laptop would just shut down :D
[08:34:40] haha, oh dear
[09:24:58] Amir1: which one did you get?
[09:25:46] marostegui: Thinkpad P1 G3
[09:26:02] it's beefy
[09:35:08] :-(
[09:45:34] Emperor: hello, yesterday go.dog found an error when running puppet on a swift host. After some troubleshooting, topranks and I got to the bottom of it and it turned out to be a two-sided issue: the puppet side of it has a patch that will fix it. The host side of it requires an echo to /sys/kernel/debug/i40e/... to fix it. It affects 15 swift hosts.
[09:46:38] We've tested the fix on relforge1004 and all was good; we're now testing it on relforge1003 with additional monitoring to ensure there isn't any network blip
[09:46:48] full context is T290984
[09:46:48] T290984: error while resolving custom fact "lldp_neighbors" on ms-be105[1-9], ms-be205[1-6] and relforge100[3-4] - https://phabricator.wikimedia.org/T290984
[09:49:02] We need to decide how we want to proceed with the swift hosts. It's not urgent, AFAIK the only thing that uses lldp_parent in puppet is setting the hostgroups in Icinga
[09:49:34] Confirmed as hitless on relforge1003 also.
[09:58:53] volans, my only suggestion would be to add the #SRE-swift-storage tag to the ticket and CC Filippo and Matthew
[09:59:58] filippo also mentioned someone else as a backup, but I cannot remember who
[10:00:31] as I just realized he is on vacation
[10:00:55] * Emperor thinks he can find it in scroll
[10:01:28] cdanis
[10:01:41] yeah, I remember him saying it, but not sure which channel
[10:03:18] ^volans
[10:04:09] jynus: filippo is the one that opened the task, I know he's out
[10:04:45] (I didn't) 0:-)
[10:06:27] applying the fix has no visible effect at all, we just would like a "go ahead" agreement :)
[10:06:30] https://phabricator.wikimedia.org/T290984#7354885
[10:06:36] (for the details of no impact ^^^ )
[10:06:43] it's not quite clear from the ticket, but did "ethtool --set-priv-flags eno5 disable-fw-lldp on" not work?
[10:07:12] the --show-priv-flags did not show the flag as it should have
[10:07:18] neither before nor after the echo fix
[10:07:24] hence we used the echo approach
[10:07:41] see https://phabricator.wikimedia.org/T290984#7352746
[10:07:53] OK. I'm content with you applying that fix, but you might like to ask cdanis as they are the swift-backup person, IYSWIM
[10:09:23] sure, we can wait for chris's input later today
[10:21:35] does doing this change the LLDP response from the host?
[10:22:26] yes, right now it is broken
[10:22:37] or rather, there isn't any
[10:22:49] so facter -p lldp is empty for example
[10:41:40] OIC, sorry, I thought that was running on the host
[12:01:28] marostegui: am I OK to reboot db1125? [I assume so 'cos it's mariadb::core_test, but...]
[12:01:38] (after lunch)
[12:03:37] yep, you can, I downtimed it for 4h
[12:16:56] ta
[12:26:55] OK, I'm happy that's doing the right thing - PME doesn't start on boot, but when you systemctl start mariadb, it gets started automatically by systemd.
[12:27:27] I did "start slave;" at mysql once I was done mucking around
[12:27:27] Emperor: 🎉
[12:27:36] cool :-)
[12:28:26] and now that the new package is available, whenever a restart happens, it is very likely mariadb will be upgraded at the same time
[12:30:06] I dunno if we want to keep T289488 open until the packages are deployed everywhere?
[12:30:06] T289488: Systemd enhancements for mariadb and prometheus-mysql-exporter - https://phabricator.wikimedia.org/T289488
[12:31:25] Emperor: given it'll take months for the change to propagate if we're depending on new _mariadb_ versions to be installed, i would keep it open
[12:31:40] (as opposed to releasing a new package for an existing mariadb version with your change applied)
[12:32:27] months == 6-8 months, i'd estimate
[12:33:17] 'k
[12:33:35] to put it another way, i'd prefer if we weren't in limbo for that long
[12:33:41] (we can presumably do the latter if it gets annoying?)
[12:38:00] Emperor: that would be my preference, yeah
[12:41:04] I have no problem with that plan (but don't think I'm the one best placed to make it happen :) )
[13:01:24] so the plan is to leave that task open till 10.4.21-2 (or higher) is installed everywhere?
[13:02:32] marostegui: what i'm suggesting is that we release a new package for the _current_ version of mariadb 10.4 that we have deployed, that only contains the systemd change from matthew
[13:02:42] that we can install everywhere without needing any reboots
[13:02:57] aaaaaah right
[13:03:19] but that means quite a bunch of 10.4
[13:03:41] as we have .15, .18 and .19 I think
[13:03:48] installed at the moment
[13:05:28] I can compile all those no problem, but not sure if it is worth all the work
[13:05:28] marostegui: with a nicely documented build process for packages, i'm sure any random new person on the team could easily produce those updates! ;)
[13:05:40] yeah there's that
[13:05:42] yeah, that's no problem at all
[13:06:03] we also have the additional issue that we only keep one version on the repo
[13:06:29] AFAIK there can only be one version for any given package in reprepro for a given OS
[13:06:36] * kormat winces
[13:06:46] i retract everything. new plan: NEVERMIND
[13:06:57] sorry marostegui, hit enter before reading your last :)
[13:07:00] xddddd
[13:07:47] unless you want to do the overhead of adding components
[13:07:59] and then assign them to the various hosts
[13:08:05] I will get s1 and s8 on the new package though, so all those will be done for the switchovers
[13:08:10] volans: i absolutely do not
[13:08:15] haha
[13:08:21] :)
[13:08:53] then your last chance is to find the right amount of money you need to give mor.itz to close both your eyes while you dpkg -i them :-P
[13:08:59] *his
[13:09:01] i get why the debian packaging ecosystem is the way it is, but ye $deity does it hurt at times
[13:09:27] my suggestion would be to close as fixed and state in the task that 10.4.21-2 and higher get installed as we normally and that will keep fixing the issue
[13:09:45] jesus my English...
[13:09:52] Emperor: ^. a good example of the general rule: ignore Stevie Beth, listen to Manuel
[13:11:22] :)
[13:44:38] volans: Emperor: lgtm
[13:45:51] topranks: ^^^ thanks cdanis, we were discussing in i/f too if at this point it makes sense to fix this directly in puppet
[13:45:57] without the need of a cumin run to apply this
[13:46:13] that also sounds fine to me
[13:46:25] an on-NIC lldp agent is pretty silly imo :)
[13:46:32] yeah
[13:47:54] I think we need a general mechanism to make sure this is there - for any new servers or for when one reboots.
[13:48:16] The question remains whether it's urgent enough to warrant a "quick fix" command line via cumin until we have that in place.
[13:56:39] depends when we'll get the fix, if it's a couple of days it doesn't matter I'd say, if more let's fix it
[14:09:34] ok, I hope to have some time this afternoon to look at it, when we've a clearer picture of how much effort is required we can make the call.
[14:13:13] SGTM, thanks for looking into it topranks!
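
Note on the disable-fw-lldp check discussed above: the log mentions that --show-priv-flags did not list the flag on the affected hosts. A minimal sketch of how such a check could be scripted follows, assuming the usual "name : on/off" layout of ethtool's private-flag listing. The function names and the sample text are hypothetical and were not captured from any of the hosts in this log.

```python
def parse_priv_flags(output: str) -> dict[str, bool]:
    """Parse text shaped like `ethtool --show-priv-flags <iface>` output.

    Returns a mapping of flag name to True (on) or False (off).
    """
    flags = {}
    for line in output.splitlines():
        # Skip blank lines and the "Private flags for <iface>:" header.
        if not line.strip() or line.lower().startswith("private flags for"):
            continue
        name, sep, value = line.rpartition(":")
        if not sep:
            continue  # not a "name : value" line
        flags[name.strip()] = value.strip() == "on"
    return flags


def fw_lldp_state(output: str) -> str:
    """Summarise the state of the disable-fw-lldp private flag."""
    flags = parse_priv_flags(output)
    if "disable-fw-lldp" not in flags:
        # The situation described in the log: the flag is not listed at all.
        return "flag not exposed"
    if flags["disable-fw-lldp"]:
        return "firmware LLDP disabled"
    return "firmware LLDP enabled"


# Illustrative sample only (assumed layout, invented flag values).
SAMPLE_WITHOUT_FLAG = """Private flags for eno5:
MFP              : off
LinkPolling      : off
flow-director-atr: on
"""
SAMPLE_WITH_FLAG = SAMPLE_WITHOUT_FLAG + "disable-fw-lldp  : on\n"

if __name__ == "__main__":
    print(fw_lldp_state(SAMPLE_WITHOUT_FLAG))
    print(fw_lldp_state(SAMPLE_WITH_FLAG))
```

In practice the text would come from running ethtool on the host; the sketch takes a string so the parsing can be tried without touching a NIC.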