[09:26:03] <_joe_> one of the puppet compilers has a full disk
[09:28:39] didn't we have a GC script that was removing old runs?
[09:31:37] <_joe_> yes, but I probably overwhelmed them with my compilations :P
[09:33:07] <_joe_> actually I think this run might be the culprit https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31329/
[09:39:37] _joe_: I think you have a pending patch to merge on labs/private.git
[09:39:57] <_joe_> arturo: go on, it's backwards compatible
[09:40:05] <_joe_> if you have a patch of yours to merge
[09:40:33] ok, I just merged `Giuseppe Lavagetto: remove tokens for production services from CI. (c94f32d)`
[09:41:09] <_joe_> yup, thanks a lot!
[09:41:41] np!
[09:48:12] <_joe_> so it wasn't that build, it looks like GC is not working on compiler1002
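The GC script mentioned above is not shown in the log. As a purely illustrative sketch of how such a cleanup job could be wired up as a systemd timer in Puppet: the job name, output path, retention window, user, and schedule below are all invented for the example, not the real configuration.

```puppet
# Hypothetical sketch only: prune old puppet-catalog-compiler output so the
# compiler hosts' disks do not fill up. Every concrete value here (paths,
# retention window, user, schedule) is an assumption for illustration.
systemd::timer::job { 'puppet-compiler-gc':
    ensure      => present,
    description => 'Garbage-collect old puppet compiler runs',
    # Remove run directories older than 30 days (assumed retention window).
    command     => '/usr/bin/find /srv/output -mindepth 1 -maxdepth 1 -mtime +30 -exec rm -rf {} +',
    user        => 'root',
    interval    => {
        'start'    => 'OnCalendar',
        'interval' => '*-*-* 03:00:00',  # daily at 03:00 (assumed)
    },
}
```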
[09:55:02] hello, we have a couple of puppet patches we have cherry-picked and would like to get merged :) Tested, and they pass fine. https://gerrit.wikimedia.org/r/c/operations/puppet/+/717687 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/722476/
[09:55:21] that is to let us provision Bullseye-based instances to act as CI Jenkins agents
[09:55:39] (the fleet is still on Stretch and has to be migrated)
[09:55:58] <_joe_> hashar: I'll take a look in a few
[09:56:09] :unicorn:
[10:00:07] and unrelated, there is a docker-pkg patch to add the user PATH to the environment when running test.sh commands ( https://gerrit.wikimedia.org/r/c/operations/docker-images/docker-pkg/+/692995 ).
[10:19:48] <_joe_> hashar: sorry, I'm running a bit late on your patches; it might happen in the afternoon
[10:20:00] <_joe_> something quite unexpected happened when merging a patch of mine
[10:20:03] _joe_: no worries, there is no rush ;)
[10:20:10] it is merely to clear the Gerrit review queue
[10:20:55] good luck fixing your patch!
[10:26:31] <_joe_> hashar: I am a bit confused, it actually seems my patch fixed permissions on deploy2002, while they were ok on deploy1002
[10:28:21] maybe puppet ran behind your back on deploy1002
[10:30:05] <_joe_> no, I did see the changes I expected there, while on 2002 there were some weird things
[10:30:07] <_joe_> anyways
[10:47:00] Amir1: what did you run exactly?
[10:47:21] scap sync-file
[10:47:35] scap sync-file wmf-config/InitialiseSettings.php 'Config: [[gerrit:723211|Enable new dispatch via job approach on testwikidata and testwiki (T291610)]]'
[10:47:36] T291610: Enable new Dispatching on test wikidata - https://phabricator.wikimedia.org/T291610
[10:48:24] if I had to guess, that's a bug introduced when converting scap from py2 to py3; the stack trace suggests it's in the IRC logging code, which is not enabled/tested on deployment-prep
[10:49:05] ok, we need to create a task for releng to fix it, and I need to roll back
[10:49:28] <_joe_> yeah majavah, I think the fix is a one-liner
[10:51:33] one option is to only roll back on deploy*
[10:51:52] since scap on the canaries has not complained
[10:53:19] _joe_: should I just downgrade on deploy1002?
[10:54:36] <_joe_> effie: yes, we can re-upgrade with a new version later
[10:54:51] do you have a task id already? I think I have a fix
[10:55:00] I like this solution better than rolling back cluster-wide
[10:55:11] <_joe_> majavah: I guess the deploy 4.0 task is gtg
[10:59:21] https://gerrit.wikimedia.org/r/c/mediawiki/tools/scap/+/724396/
[11:07:58] Amir1: you can run it again
[11:08:26] okay
[11:11:12] <_joe_> majavah: merged; to build the new package we'll need to do some more stuff, but we can take care of it
[11:11:19] <_joe_> as expected, it's a one-liner
[11:11:34] <_joe_> *cough cough unit tests cough*
[11:16:03] <_joe_> majavah: I created https://gerrit.wikimedia.org/r/c/mediawiki/tools/scap/+/724400 and https://gerrit.wikimedia.org/r/c/mediawiki/tools/scap/+/724401 if you want to give a +1
[11:19:05] _joe_: done
[11:27:25] <_joe_> thanks
[11:28:26] jbond: there's a puppet patch from you waiting to be merged. `Jbond: apt::package_from_component: update spec tests (93ea94eba0)`
[11:28:37] Would you like me to merge it?
[11:28:43] btullis: sorry, yes please go ahead
[11:29:02] np, thanks.
[11:29:09] thanks
[11:37:01] <_joe_> effie: you can rebuild scap 4.0.1 whenever you have time
[11:37:09] <_joe_> and roll it out to the deployment servers too
[11:37:21] after lunch, yes
[11:37:26] <_joe_> (that's the only place where that code path will be touched)
[12:41:57] moritzm: ping, there's a pending puppet change to merge, "Update DHCP address for testvm2001"; is it ok for me to merge?
[12:50:09] dcaro: sorry, got distracted. please do!
[12:51:17] ack
[12:51:18] thanks!
[14:22:17] I am going to depool maps (specifically the kartotherian service) in codfw for some testing of the new tile generation service
[15:38:57] \o/. /me fingers crossed
[16:52:38] Can someone confirm that we do indeed back up the /srv/mediawiki-staging/private data on the deployment host somewhere (besides on other mw-related servers)? Ref https://phabricator.wikimedia.org/T69818
[16:52:47] I think this is in bacula now, but I'm not sure where to check
[16:54:25] (context: this is the oldest open incident follow-up, from 2014)
[16:56:29] Krinkle: profile::mediawiki::deployment::server seems to back up all of /srv
[16:56:38] Krinkle: AFAICT yes, but j.aime is authoritative on this. in modules/profile/manifests/mediawiki/deployment/server.pp
[16:56:41] we have backup::set { 'srv-deployment': }
[16:56:47] and in modules/profile/manifests/backup/filesets.pp
[16:56:59] that set has includes => [ '/srv' ]
[16:57:34] We'd need more than just /srv/deployment, since that's all the non-mediawiki deployments (afaik all public Git repos, but I guess it doesn't hurt to back it up just in case)
[16:57:35] and the former profile is applied to deploy2002.codfw.wmnet and deploy1002.eqiad.wmnet
[16:57:48] ===> that set has includes => [ '/srv' ]
[16:57:59] that should be the backed-up path
[16:58:07] right, ok.
[16:58:19] but again, if you need an authoritative answer, ask Ja.ime
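Pieced together, the snippets quoted above amount to roughly the following Puppet wiring. A minimal sketch: only the backup::set name and the includes value are taken from the discussion; the bacula::director::fileset resource type and the surrounding structure are assumptions for illustration.

```puppet
# In modules/profile/manifests/mediawiki/deployment/server.pp (as quoted
# above): the deployment servers declare the 'srv-deployment' backup set.
class profile::mediawiki::deployment::server {
    # ... rest of the profile elided ...
    backup::set { 'srv-deployment': }
}

# In modules/profile/manifests/backup/filesets.pp: the fileset behind that
# set includes all of /srv. The resource type and any parameters other than
# 'includes' are assumptions, not the real manifest.
bacula::director::fileset { 'srv-deployment':
    includes => [ '/srv' ],
}
```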
[17:08:20] <_joe_> yes, we've always backed up all of /srv on the deployment hosts
[17:08:25] <_joe_> sorry, I was not reading
[17:08:41] <_joe_> I've recovered the private repo from there years ago, btw
[17:09:02] <_joe_> so unless something has changed, we've been backing it up for a long time
[17:50:40] I will answer tomorrow, but the truly authoritative answer is: "try a restore and see that it works" :-) (which I highly encourage doing from time to time, even if only partially)
[18:26:07] Krinkle: we are rsyncing /srv/deployment and /srv/patches over to the inactive deployment server. in addition, as others have said, there is Bacula, and the fileset contains the entire /srv. I have confirmed this in bconsole. the way to do that is: ssh backup1001; sudo bconsole; restore; 75; 2, and then it builds a virtual directory tree you can navigate. so, short of doing the actual restore..
[18:26:13] it is there; it includes mediawiki/, /patches/, deployment/, etc.
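For the rsync half of that answer, a minimal sketch of how such a mirror could be declared in Puppet, assuming an rsync::quickdatacopy-style defined type: the resource type and all parameter values are illustrative; only the two paths and the deploy1002/deploy2002 host names come from the log.

```puppet
# Hypothetical sketch: keep /srv/deployment and /srv/patches synced from the
# active deployment server to the inactive one. The resource type, direction,
# and options are assumptions, not the real manifest.
rsync::quickdatacopy { 'srv-deployment':
    ensure      => present,
    auto_sync   => true,
    source_host => 'deploy1002.eqiad.wmnet',
    dest_host   => 'deploy2002.codfw.wmnet',
    module_path => '/srv/deployment',
}

rsync::quickdatacopy { 'srv-patches':
    ensure      => present,
    auto_sync   => true,
    source_host => 'deploy1002.eqiad.wmnet',
    dest_host   => 'deploy2002.codfw.wmnet',
    module_path => '/srv/patches',
}
```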