[00:11:28] hrmm... the fix forward process doesn't work for me, I think I might be in a state other than the one this was meant for [00:12:17] running the agent (`puppet agent -twl`) fails with: Error: The certificate for 'CN=restbase2028.codfw.wmnet' does not match its private key [00:54:48] urandom: I fixed it [00:55:02] puppet on restbase2028 should work now [00:56:28] yea, it just ran. steps done: revoked cert on puppetserver1001, deleted key material on client, ran puppet to create new certificate signing request, signed on puppetserver [00:56:51] steps are in "fix forward" but not in that exact order [00:58:23] it says at the end to clean the cert on old puppetmaster. I did the same thing on the new puppetserver and then signed a new cert, that made things match again [01:01:18] final step: cleaned cert on old puppetmaster. puppet still working on client. [01:04:15] cassandra servers are started. lgtm. just not pooled and leaving that to you [01:04:15] I... think I have that. I have more of these today, and I'm wondering how to apply this to them. Is it enough to have the hiera entries in place when changing the role? [01:04:29] Either way though, thanks for fixing that mutante ! [01:04:54] I am not entirely sure about that last question [01:05:26] but if you run into "cert does not match key" again, go to puppetserver1001 (the new puppet 7 server), "cert clean" there.. wait until it's done [01:06:17] back on the _client_ do the rm -rf /var/lib/puppet/ssl [01:06:25] run puppet again, should create new CSR [01:06:34] then "cert sign" on puppetserver1001 [01:06:41] what I mean is, I have machines that are `insetup::serviceops` and running puppet 7, and I need to change them to `restbase::production` which is (aside from a canary, and this one), puppet 5 [01:07:30] is this breakage (and the subsequent fix) necessary, or is there a graceful way?
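The recovery steps described above amount to roughly the following sequence. This is a sketch: the "cert clean"/"cert sign" commands mentioned in the channel may be site-specific wrappers; stock Puppet 7 exposes them as `puppetserver ca` subcommands, which is what is shown here with the hostname from the error message.

```shell
# On puppetserver1001 (the Puppet 7 CA): revoke and clean the stale cert,
# then wait for the command to finish
sudo puppetserver ca clean --certname restbase2028.codfw.wmnet

# On the client: remove the old key material so a fresh keypair is generated
sudo rm -rf /var/lib/puppet/ssl

# On the client: run the agent; this should create a new CSR
sudo puppet agent -twl

# Back on puppetserver1001: sign the new request
sudo puppetserver ca sign --certname restbase2028.codfw.wmnet
```

After signing, a final agent run on the client should complete cleanly, confirming that the new certificate and private key match.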
[01:09:32] I think what you want is after one host is successfully converted, use "sudo cookbook sre.puppet.migrate-role restbase::production" [01:10:11] yeah, that makes sense [01:10:20] then it will tell you to put the Hiera keys at the role level [01:10:39] but I am not sure enough, I ran into a similar issue to the one you had, with like one random host [01:10:43] and then also others fixed it [01:11:09] ideally if you can confirm with moritz.m tomorrow what the best approach is [01:11:14] I should probably run it by moritz.m ... right [01:11:18] yea, this [01:11:26] he migrated one to 7 as a canary already [01:11:41] I assume it's ready for the whole role, and he just hasn't circled back yet [01:11:48] yea, maybe do one more new host and see what happens [01:11:53] then ask him before doing the role [01:12:03] makes sense [01:12:14] thanks for your help! [01:12:18] you're welcome [01:16:50] laters! also: <+icinga-wm> RECOVERY - cassandra-a SSL 10.192.16.237:7000 on restbase2028 is OK: SSL OK .. [08:45:27] urandom: the migration puppet7 -> puppet5 is not supported. If the hosts were supposed to be set up with puppet 5 they should not have been set up with an insetup role that has already been migrated to puppet7 [08:49:17] (officially supported). That said the reimage cookbook does delete the old cert from both puppet5 and puppet7, and then the puppet version to use is gathered from hiera data, asking puppetserver for the value of "profile::puppet::agent::force_puppet7" for the given host; that's the puppet version used during the reimage [08:50:56] so if you reimage a host it is possible to migrate backward as well, I think (but that was not tested), while the migrate-* cookbooks are meant to only do the forward migration [09:33:21] I'm tracking down some issues with GeoIP2. I am not sure if I am reading T288375 correctly; should I be able to see `/usr/share/GeoIPInfo` on deployment or mwmaint hosts?
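The forward-migration path discussed above can be sketched as follows. The `puppet lookup` invocation is illustrative only: it assumes you run it somewhere with access to the Hiera data (e.g. on the puppetserver), and the cookbook is run from the usual cumin/cookbook host.

```shell
# Migrate all remaining hosts of the role to Puppet 7 (after the canary
# has been verified); the cookbook will prompt for role-level Hiera keys
sudo cookbook sre.puppet.migrate-role restbase::production

# The reimage cookbook decides which Puppet version to use from this Hiera
# key; checking it per host before/after migration (illustrative invocation):
sudo puppet lookup profile::puppet::agent::force_puppet7 --node restbase2028.codfw.wmnet
```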
[09:33:22] T288375: IPInfo MediaWiki extension depends on presence of maxmind db in the container/host - https://phabricator.wikimedia.org/T288375 [10:37:17] Do you know which team is handling the Developer account workflow? Is it Releng? Collab? Us? [10:46:12] kostajh: checking [10:47:10] claime: ty [10:55:43] as far as I can tell, the GeoIP2 files are in /usr/share/GeoIP, not in /usr/share/GeoIPInfo [10:56:55] kostajh: They are, through the Geoip::Data::Puppet class [10:57:40] claime: the files are in `/usr/share/GeoIPInfo`? [10:57:52] kostajh: in GeoIP, not GeoIPInfo [10:58:07] I see [10:58:19] so this config https://gerrit.wikimedia.org/g/operations/mediawiki-config/+/f34c974194c52e7ff588c439a82ed1b025268b46/wmf-config/CommonSettings.php#3882 is likely invalid? [10:59:04] Well it's also in that path on appservers and k8s runners [10:59:39] If it needs to be in that path on mwmaint/deployment, it can be done [11:00:39] I am not sure why we have two separate paths for what are the same (?) files [11:01:18] for context: in this patch https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/978034/6/extension.json#315 I want to provide the path to the GeoIP2 country file, and am not sure which directory to use [11:03:59] kostajh: They're not the same files afaict [11:04:31] Country file seems to be in /usr/share/GeoIP but not /usr/share/GeoIPInfo [11:05:40] claime: can you share with me the file listing of /usr/share/GeoIPInfo? [11:06:00] sure [11:07:12] kostajh: two files, GeoIP2-Anonymous-IP.mmdb GeoIP2-Enterprise.mmdb [11:20:07] thanks! [14:51:18] jynus: today developer account tooling is managed by i/f, previously it was WMCS [14:51:41] thanks, will send them a suggestion [15:57:36] does anyone know if there's something I need to do to make a new IRC room "wikimedia official"? 
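For reference, the two directories compared above can be checked like this. Only the two `GeoIPInfo` files are confirmed in the discussion; the country file name under `/usr/share/GeoIP` is an assumption, and `mmdblookup` comes from the libmaxminddb tools (Debian package `mmdb-bin`).

```shell
# The two databases confirmed to exist in /usr/share/GeoIPInfo
ls -l /usr/share/GeoIPInfo/GeoIP2-Anonymous-IP.mmdb /usr/share/GeoIPInfo/GeoIP2-Enterprise.mmdb

# The country database lives under /usr/share/GeoIP instead
# (file name is an assumption; check with: ls /usr/share/GeoIP)
ls -l /usr/share/GeoIP/GeoIP2-Country.mmdb

# Sanity-check a database and a sample lookup with mmdblookup
mmdblookup --file /usr/share/GeoIP/GeoIP2-Country.mmdb --ip 208.80.154.224 country iso_code
```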
I'm trying to clean up the Data Platform channels [15:57:45] Re: https://phabricator.wikimedia.org/T352783 [16:05:01] <_joe_> inflatador: there are rules on meta, see https://meta.wikimedia.org/wiki/IRC [16:05:06] <_joe_> and the links therein [16:05:24] <_joe_> inflatador: if you want a better answer, #wikimedia-admin is the place to go to [16:05:44] _joe_ ACK, thanks for the info [16:05:51] <_joe_> (please not everyone who administers IRC does so in their volunteer capacity) [16:05:54] <_joe_> *note [16:06:11] <_joe_> sigh when a typo changes the meaning of what you're saying [16:06:58] <_joe_> inflatador: but the TLDR would be: create the channel as #wikimedia-$name, then add your channel configuration, if it's public, to https://meta.wikimedia.org/wiki/IRC/Bots/ircservserv [16:07:28] #wikimedia-ops, not #wikimedia-admin [16:07:33] <_joe_> sigh [16:07:35] <_joe_> yes [16:07:40] <_joe_> thanks taavi [16:07:53] <_joe_> and you're probably as sleep deprived as I am rn :) [16:08:07] ^^ +1 [16:09:51] somehow I'm not even that jet lagged [16:11:02] y'all at an offsite? :) [16:15:31] <_joe_> TheresNoTime: I am just naturally jetlagged [16:15:34] I am, I don't think _joe_ is here [16:15:44] <_joe_> I fell asleep at 3 am, woke up at 5:30 [16:16:00] ah :P [16:29:10] that is definitely not enough sleep :( [16:34:00] I feel dead at 4 hours sleep [16:34:19] Also thanks _joe_ for dealing with the event task the other day [20:18:38] mutante sorry for the spam, looks like the new alert for wdqs LDF is still creating a serviceops ticket...trying to figure out why ATM [20:19:27] closing T352810 shortly and will get a silence up [20:19:27] T352810: ProbeDown - https://phabricator.wikimedia.org/T352810 [21:02:18] inflatador: I noticed that because it reported in our team channel. I think it must be the "team" parameter. 
no worries [21:02:42] inflatador: team or "receiver" where it defines what actions to take on alerting [21:05:56] inflatador: team maps to one of the receivers in modules/alertmanager/templates/alertmanager.yml.erb [21:10:34] mutante I'll have a patch up to remove your team by EoD...still working on the patch. [21:12:10] thank you, no rush
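As a rough illustration of the mapping described above, an Alertmanager configuration of the following shape routes alerts by their `team` label to a named receiver; every name here is hypothetical, and the real definitions live in modules/alertmanager/templates/alertmanager.yml.erb.

```yaml
# Hypothetical fragment: an alert carrying the label team=serviceops is
# matched by the route below and handed to the "serviceops-task" receiver,
# which is what ends up opening the Phabricator task.
route:
  receiver: default
  routes:
    - match:
        team: serviceops
      receiver: serviceops-task

receivers:
  - name: default
  - name: serviceops-task
    webhook_configs:
      # illustrative endpoint for a task-creating webhook bridge
      - url: http://localhost:8292/alerts
```

Changing the alert's `team` label (or the route that matches it) is therefore what redirects the resulting task away from serviceops.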