[10:36:54] hey, any friendly SRE who could merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/728656 for me please? 🙂
[10:39:37] I can, this will affect mwmaint hosts, right?
[10:39:48] jynus: yes
[10:41:54] urbanecm: optionally, you can add a line such as `Hosts: mwmain1002.eqiad.wmnet` to the commit log after the Bug entry, then add the gerrit comment "check experimental" to get a Puppet Catalog Compiler run against that host
[10:42:16] mwmaint1002 even :)
[10:42:41] in this case the change is trivial (to a string literal), so it's not of huge importance here
[10:42:54] ema: thanks. I'm actually aware; I felt this is a very simple patch (it merely updates already-existing jobs), so I didn't do it. Do SREs prefer to have output for all patches, even trivial ones?
[10:43:36] personally, for trivial things I don't need them, as it will only surface issues with catalogue compilation
[10:44:23] urbanecm: I don't think we have any set standard, but I personally find it very useful. In this case, for instance, doing so captures which hosts are affected by the change (which is the first question jynus asked) :)
[10:45:45] hmm, good point. Noted :))
[10:46:10] urbanecm, one diff per section, like this: +"--dbshard s8"
[10:46:25] jynus: yup, SGTM. Thanks!
[10:47:00] 17h left
[10:47:09] so please check logs tomorrow?
[10:47:42] yup, will do.
[10:48:02] btw, this is more of a nitpick for the future
[10:48:18] we are trying to deprecate dbshard everywhere
[10:48:57] mw and dba stuff calls it "section", as some sections are shards, but others are just replica sets with no sharding
[10:49:15] (deprecate the terminology, I mean)
[10:49:32] i see. Do you want me to rename the parameter, jynus? :))
[10:49:44] not now, more like, for the future :-)
[10:49:59] okay :)).
[10:50:48] prefer section or dbsection or dblist for new usages
[10:51:05] 0:-)
[10:52:37] will do :)
[12:29:56] elukey: around ?
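The `Hosts:` footer ema describes above could look like this in a commit message (the summary line and bug number here are illustrative, not the ones from the actual patch):

```
maintenance: update description of dump maintenance jobs

Bug: T123456
Hosts: mwmaint1002.eqiad.wmnet
```

With that footer in place, commenting `check experimental` on the Gerrit change triggers a Puppet Catalog Compiler run against the listed host, so reviewers can see the compiled catalog diff for exactly the machines the change touches.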
[12:39:01] effie: yep
[13:34:04] elukey: I was in a meeting, I think we found the answer, thank you !
[13:34:53] anytime i need an answer, not asking elukey is a key part of getting the right solution
[13:35:53] kormat: do you have a set of prepared statements about me to use when needed? You typed that sentence too fast :D
[13:36:09] elukey: i type extra quickly when powered by Truth™
[13:36:15] ahhhhh
[14:15:51] do we have any sort of standard procedure for rolling _back_ an updated apt package? say i upload a new package for X, deploy it to a server, and find it doesn't work in prod. is there a known-good way to get back to $PREVVER?
[14:16:52] (reprepro's policy of Only One Version is substantially unhelpful here)
[14:17:27] kormat: I would just rebuild with a higher version, if I'm controlling the package
[14:18:02] if not, it's possible to force a downgrade; not sure about debdeploy though
[14:18:08] joe: so go through the whole increment-the-epoch dance? :/
[14:18:14] *not* the epoch
[14:18:19] never epoch
[14:18:44] but like 1.2.3+2oopsactually1.2.2
[14:18:48] if you just upgraded one server you can surely put the old one in reprepro and just downgrade that single server
[14:19:07] the question is... do you have the old one
[14:19:07] ?
[14:19:23] cdanis: without specifics, I wouldn't have suggestions about version numbers
[14:19:51] cdanis: eee, really?
[14:20:07] kormat: if you increment the epoch you can never use an upstream package ever again (without manual intervention)
[14:20:07] kormat: what package version is the old one, and which is the new?
[14:20:17] cdanis: there's no upstream in this case
[14:20:28] joe: the example is orchestrator; version 3.2.3-3 is the currently deployed version
[14:20:36] i've uploaded 3.2.6-1 to apt.wm.o
[14:20:55] but not to any server?
[14:21:10] to one server
[14:21:21] not yet, correct.
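The "higher version that is really the old one" trick joe jokes about above is conventionally written with a `+really` suffix in Debian packaging. A quick sketch of why it works, using the orchestrator versions from this conversation (the rollback version string itself is a hypothetical, and GNU `sort -V` is used here as a stand-in that matches dpkg's ordering for a simple case like this):

```shell
# The bad upload, and a hypothetical rollback version that would repackage
# the old 3.2.3 content under a string sorting *above* 3.2.6-1, so no
# epoch bump is ever needed.
bad="3.2.6-1"
rollback="3.2.6-1+really3.2.3"

# Version sort puts the higher version last: apt would treat the rollback
# package as an upgrade over the broken one.
printf '%s\n%s\n' "$bad" "$rollback" | sort -V | tail -n1
# → 3.2.6-1+really3.2.3
```

This is why joe says "never epoch": once the epoch is incremented (`1:3.2.6-1`), every future version must carry it forever, whereas a `+really` suffix washes out at the next normal upload.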
i'm currently documenting the process for upgrading a server, and i'd like to have a reasonable set of roll-back steps
[14:21:38] * volans feels ignored... you can put the old one back in reprepro
[14:21:44] as long as you have it
[14:21:51] ok then as volans suggested... for a single server, you can just rollback in reprepro and force the downgrade
[14:22:02] volans: ah, yes. so, that would be the process i'd have followed on my own. it's just fugly.
[14:22:07] I always save all uploaded packages in my home before adding them to reprepro
[14:22:12] in this case i can rebuild 3.2.3-3
[14:22:14] and it should also probably be in /var/cache/apt/archives
[14:27:52] (rebuilding it in a few months' time, however, would be a royal pita)
[14:28:49] kormat: don't you have the old build somewhere from when you rsync it from deneb to apt1001 to push it to reprepro?
[14:28:59] volans: can't build it on deneb
[14:29:16] need go 1.14, which isn't easily(?) available
[14:29:18] scp then... but I guess you copied it there somewhere
[14:29:25] on apt1001
[14:29:30] before running reprepro
[14:30:06] volans: yeah. to ~/orchestrator. i then nuked that when i was copying over the new release, so there wouldn't be old things in the way. ;)
[14:31:53] look at /home/volans/spicerack/ for example :-P
[14:32:04] I do have a deploy script that saves them in the proper structure
[14:32:24] so I have a reasonable probability of having them if needed
[14:32:38] is deneb backed up?
[14:32:59] that's on apt
[14:33:12] and my deploy script saves them locally to my laptop too fwiw :D
[14:33:24] ok, same question, but there i'm pretty sure the answer is 'no' for /home
[14:33:38] jynus: any idea? ^^
[14:34:11] home on deneb? I can look
[14:34:21] jynus: same for /home on apt1001
[14:35:12] is it a package?
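The single-server rollback volans and joe sketch above could look roughly like the following. This is a hedged command sketch, not a tested runbook: the distribution codename, base directory, and the `~/orchestrator/` path holding the saved old .deb are assumptions.

```
# On apt1001: reprepro keeps only one version per distribution, so the bad
# version has to be removed before the saved old .deb is re-included.
reprepro -b /srv/wikimedia remove buster-wikimedia orchestrator
reprepro -b /srv/wikimedia includedeb buster-wikimedia ~/orchestrator/orchestrator_3.2.3-3_amd64.deb

# On the one affected server: apt now sees the older version in the repo,
# and the downgrade has to be forced explicitly.
apt-get update
apt-get install --allow-downgrades orchestrator=3.2.3-3
```

The "as long as you have it" caveat is the whole point of the surrounding conversation: this only works if a copy of the old .deb survived somewhere (a home directory, `/var/cache/apt/archives`, or a backup).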
I sometimes get those from cache on other hosts (while I am checking bacula)
[14:35:34] * kormat adds apt1001 to the list of hosts to backup ~kormat from to her laptop
[14:36:15] jynus: this isn't about specific data-loss that has happened. just trying to figure out how painful it would be in the future if said data-loss _did_ happen
[14:36:24] ah, ok
[14:36:41] no backups are set up on deneb
[14:37:13] apt1001 has roothome, srv-autoinstall and srv-wikimedia jobs
[14:37:20] let me tell you what those are
[14:38:47] `/home` on apt1001 is currently 43G, on deneb it's 29G, so neither is huge
[14:39:40] /srv/wikimedia is where you should maybe find those debs
[14:39:47] /srv/autoinstall and /srv/wikimedia
[14:49:35] jynus: how frequently do those backups run?
[14:49:48] kormat, "daily"
[14:50:02] let me know if there is anything actionable; regular filesystem backups are going to be expanded this/next quarter, and a few GBs wouldn't take much space
[14:50:04] ok, cool. that's probably more than sufficient.
[14:50:33] the quotes are because it is a full monthly, differential every fortnight, incremental daily
[14:51:16] jynus: 👍
[14:51:46] I am always happy to expand and get new requirements :-)
[14:55:34] oh, that's a new one. uploaded packages to apt.wm.o. ran 'apt update' on a host, can see the packages, but if i try to install them, i get 404's
[14:55:54] I can see /srv/wikimedia/pool/main/o/orchestrator/ on both backups for apt1001 and apt2001, I double-checked
[14:56:28] that last part, I don't know; something related to cache?
[14:56:31] https://phabricator.wikimedia.org/P17482
[14:58:26] reprepro swears it has the package
[15:00:59] anyone got any ideas? moritzm maybe?
[15:01:41] looking
[15:03:31] is that reproducible? when did you run this? jbond just switched the apt.w.o record to a discovery record, this might have overlapped?
[15:04:11] moritzm: i can try again. about 10 mins ago.
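The quoted "daily" schedule jynus describes (full monthly, differential every fortnight, incremental daily) maps onto a Bacula `Schedule` resource along these lines. This is an illustration of the cadence only, not the actual production configuration; the resource name and times are made up.

```
Schedule {
  Name = "MonthlyFullFortnightlyDiff"
  Run = Full 1st sun at 02:05            # full backup: once a month
  Run = Differential 3rd sun at 02:05    # differential: mid-cycle, roughly fortnightly
  Run = Incremental mon-sat at 02:05     # incremental: every remaining day
}
```

With this shape, an incremental only captures changes since the last full/differential/incremental, so restoring a file means replaying full + latest differential + subsequent incrementals, which is why "daily" deserved the scare quotes.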
[15:04:17] moritzm: looks like the package is not there https://apt.wikimedia.org/wikimedia/pool/main/o/orchestrator/
[15:04:25] it only has 3.2.3-3
[15:04:46] it's on disk on apt1001 at /srv/wikimedia/pool/main/o/orchestrator/
[15:05:39] not 3.2.6-1
i see the issue i think, i got codfw/eqiad IPs mixed up when i added the discovery record, one second (well 5mins+, cc moritzm )
[15:05:51] apt currently going to apt2001
[15:08:15] ah, yes. that explains it
[15:09:25] do.. uploads to apt1001 not get synced to codfw?
[15:15:20] kormat: yes they do, but it's not instant; apt is active/passive
[15:19:09] kormat: should be working again now
[15:19:17] jbond: cheers
[17:42:13] hello, could someone merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/730752 for me please? :))
[17:42:21] (last puppet patch from me for today, I promise :))
[18:00:17] * legoktm looks
[18:13:49] thanks legoktm
[20:23:51] cwhite: https://phabricator.wikimedia.org/T247675
[20:24:42] TLDR: can we afford to double the size of mediawiki messages within the same document/index?
[20:25:43] or alternatively, would you be open to trying to do this and ECR at the same time? E.g. we could perhaps filter it such that MW emits the context.* fields and translate it back to the current flat form for type:mediawiki and do the ECR transform + keep context.* in the new one.
[20:25:51] then let users migrate after 90 days.
[20:49:37] Krinkle: off the cuff, I'm not sure. I'll run some numbers though.
[22:16:10] Krinkle: I have some data for you. Do you mind if I just add it to the task?
[22:18:37] sounds good
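The migration idea Krinkle floats above — MediaWiki emits structured `context.*` fields, and the logging pipeline copies them back to the current flat names for `type:mediawiki` during the transition window — could be sketched as a Logstash filter. This is purely hypothetical, not the production pipeline config; it assumes `context` arrives as a nested object:

```
filter {
  if [type] == "mediawiki" {
    # Copy each context.* field back to its old flat top-level name, so
    # existing dashboards and saved searches keep working while users
    # migrate to the new fields over the 90-day window.
    ruby {
      code => '
        ctx = event.get("context")
        ctx.each { |k, v| event.set(k, v) } if ctx.is_a?(Hash)
      '
    }
  }
}
```

The cost cwhite is asked to estimate is that, during the window, each message carries both the nested and the flat copies of these fields, roughly doubling per-message size in the index.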