[08:04:56] https://github.com/memcached/memcached/pull/716 is very interesting
[08:05:39] effie, _joe_ --^
[08:07:09] interesting indeed
[09:22:56] hello! Is there a way to trigger the puppet compiler against WMCS instances? I have a puppet patch for deployment-prep and am not sure how to puppet-compile it ;)
[09:23:05] maybe jbond knows ?
[09:25:14] hashar: yes, you can just put in a WMCS host name or regex. However, keep in mind it will fail if the regex matches both production and cloud hosts
[09:26:06] e.g. `Hosts: sso-debmon.sso.eqiad1.wikimedia.cloud` or `Hosts: re:.*\.cloud` (don't do this last one, it's a lot of hosts)
[09:26:35] jbond: so tentatively: Hosts: .*\.deployment-prep\.eqiad1\.wikimedia\.cloud
[09:26:45] would compile against the few dozen instances we have there?
[09:26:53] yes
[09:27:10] amazing :]
[09:27:14] assuming all the other caveats around making sure the facts are synced etc
[09:27:32] and that they actually compile without my change, bah
[09:28:31] anyway, great to know WMCS is supported \o/
[09:28:34] this is the other bit you may need https://wikitech.wikimedia.org/wiki/Help:Puppet-compiler#Updating_nodes
[09:32:35] jbond: thx!
[09:33:16] np
[10:03:01] ssh: Could not resolve hostname compiler1001.puppet-diffs.eqiad.wmflabs: Name or service not known
[10:03:04] hehe such a rabbit hole
[10:03:53] hashar: that works for me
[10:05:26] looks like an ssh_config issue?
[10:06:51] try ssh -vvv, it will tell you whether it at least gets to the bastion; you can then try to see if the name resolves from there
[10:07:17] also not sure if it applies to that VM, but you should start using the puppet-diffs.eqiad1.wikimedia.cloud domain
[10:07:35] that's the new domain (new instances will not have a wmflabs entry)
[10:07:39] https://gerrit.wikimedia.org/r/c/operations/puppet/+/699393
[10:07:39] iirc
[10:07:50] * vgutierrez updating his ~/.ssh/config
[10:07:52] thanks dcaro
[10:07:53] yeah I might have removed the old .wmflabs in my config
[10:08:06] so the patch above adjusts the script to use the "new" domain name
[10:08:18] which would then probably break for anyone still using .wmflabs in their ssh config
[10:08:34] magic!
[10:08:51] dcaro: but you're still using bastion.wmflabs.org and not bastion.wikimedia.cloud, right?
[10:09:26] bastion-restricted.wmflabs.org for eqiad1.wikimedia.cloud
[10:09:47] fun thing, .cloud. is a valid top-level domain, so maybe one day we will have to switch again to a domain reserved for private use ( .local. )
[10:10:04] afaik wikimedia.cloud is a valid domain out there :)
[10:10:07] it's a different domain though, wmflabs.org vs eqiad.wmflabs xd
[10:10:15] oh have we registered it ?
[10:10:24] yup
[10:10:29] yep :)
[10:10:32] ...
[10:10:54] why does this organization have so many great people who think about all the little details, such as using a proper domain name
[10:10:59] wikimedia.cloud has SOA record ns0.wikimedia.org. hostmaster.wikimedia.org. 2020061109 43200 7200 1209600 3600
[10:11:07] yeah perfect
[10:11:14] you all are doing great :-]
[10:11:18] there's some info on cloud domains (it's quite tangled) here https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/DNS
[10:11:23] I still get confused xd
[10:11:33] so I can get a hashar.wikimedia.cloud ? :D
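(Editor's note: for readers following along with the ssh_config discussion above, here is a minimal ~/.ssh/config sketch for the new .wikimedia.cloud domain. The bastion hostname is the one mentioned in the chat; the stanza layout and the username are assumptions to adapt to your own setup.)

```
# Hedged sketch: route Cloud VPS hosts on the new eqiad1.wikimedia.cloud
# domain through the restricted bastion mentioned above.
# "your-shell-user" is a placeholder for your Wikimedia developer shell name.
Host *.eqiad1.wikimedia.cloud
    User your-shell-user
    ProxyJump your-shell-user@bastion-restricted.wmflabs.org
```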
[10:12:07] send a patch to operations/dns and it's yours :}
[10:18:24] that Hosts: re:.*\.deployment-prep\.eqiad1\.wikimedia\.cloud is really magic
[10:34:04] hashar: if you are fixing the current puppet run errors on deployment-prep and need help, I'm willing to lend a hand; eager to stop getting the puppet-failed emails :)
[10:37:17] dcaro: oh well, not really :@
[10:37:26] xd
[10:37:30] np
[10:37:31] I have been on a pet-project experiment to reduce our backlog / tech debt
[10:37:50] which I supposedly should blog about soon :]
[10:38:04] the idea is to find super old tasks that got forgotten
[10:38:18] either decline them on the spot (because they no longer apply, were wild ideas, or got superseded by a more recent task)
[10:38:24] or just fix them
[10:38:39] which has led me to try to add a motd on all beta cluster instances https://gerrit.wikimedia.org/r/c/operations/puppet/+/699207 :]
[10:39:17] for deployment-prep puppet failure emails, I guess all project admins receive them but most would ignore them, unfortunately
[10:39:31] we would probably need a slightly better system
[10:40:23] I get a bunch for things I have nothing to do with, so for me they are not useful
[10:40:26] anyway, one task done and I have learned something new about ppc !
[10:40:35] it would be nice to have them more targeted (but how?)
[10:40:57] every old dead task that is closed is a win though
[10:41:10] yeah
[10:41:15] been doing that for two weeks
[10:41:25] 👍
[10:41:30] I've got to find out whether the actions I have done can be looked up in Phabricator somehow
[10:41:46] with the hope we can forge a weekly or so report that shows very old tasks that got addressed
[10:41:46] then
[10:42:04] I want to host a contest :-]
[10:42:09] oh ho
[10:42:32] free beer / drink of choice to the person who closes the most tasks?
[10:42:36] the most *old* tasks
[10:42:47] and the best backlog cleaner is .... *roll the drums*
[10:42:55] hehehehe, I won a fixathon in my previous company xd, got me new headphones
[10:43:03] see
[10:43:05] incentive!!
[10:43:06] wow nice
[10:44:00] so eventually my idea is to list out some hints as to how to find old tasks one can fix/decline
[10:44:04] I'd be happy with some celebratory stickers ;)
[10:44:12] write out a few examples of actions I have done
[10:44:32] highlight some metrics report (to be done)
[10:44:40] then trick everyone into doing the same thing to climb the ladder and reach #1
[10:45:16] I just have to rig the metrics dashboard so that the current viewer is always #2 and just a few tasks away from reaching pole position
[10:45:30] so everyone will kind of be forced to fix just one more task. Success!
[10:45:32] yep, that usually works quite well to get people engaged; having a live graph also helps
[10:45:51] hahahahhaha, creative engagement solutions xd
[10:45:58] :D
[10:46:59] I will just start with the basics anyway: how to find candidate tasks that are easy to close
[10:48:54] 👍
[10:59:43] and our wikis are too smart nowadays https://commons.wikimedia.org/wiki/Special:MediaSearch?type=image&search=Antoine+Musso
[10:59:45] image search!
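(Editor's note: one hedged way to hunt for the forgotten old tasks discussed above is Phabricator's Conduit API. `maniphest.search` is a real endpoint, but the token, the epoch cutoff, and the exact constraint choices below are placeholders for illustration.)

```
# List up to 25 open tasks not modified since 2015-01-01 (epoch 1420070400).
# api-PLACEHOLDER must be replaced with a real Conduit API token.
curl -s https://phabricator.wikimedia.org/api/maniphest.search \
  -d "api.token=api-PLACEHOLDER" \
  -d "constraints[statuses][0]=open" \
  -d "constraints[modifiedEnd]=1420070400" \
  -d "limit=25"
```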
[12:26:53] dcaro: I finally reached my email inbox and now understand your reference about puppet failures ;D
[12:27:10] looks like they're all brand new indeed
[12:31:15] 🐐
[12:31:39] one had some puppet cert failure ( deployment-shellbox )
[12:33:20] rejected: parameter 'enumerable' expects an Iterable value, got Undef (file: /etc/puppet/modules/mediawiki/manifests/mcrouter/yaml_defs.pp, line: 23, column: 47) on node deployment-deploy01.deployment-prep.eqiad.wmflabs
[12:33:29] that is for the deployment-deployXX instances :D
[12:39:19] I had a look some time ago, and all the errors seemed to come from missing yaml properties (that are not available in cloud), mostly because in cloud the $::_role is not supported, so it does not look for the related role-based hiera yamls
[12:42:00] yeah
[12:42:39] the deployment-deploy instances have puppet failures because of some mcrouter-related hiera values that are missing
[12:42:57] I have poked effie about it with some details https://phabricator.wikimedia.org/T284420#7151296
[12:43:17] the last instance failing is deployment-deleteme , which I guess will be deleted eventually 8)
[12:43:43] I had a quick look and I saw the puppet error, I am not sure I can get to it today
[12:44:04] it was not supposed to fail there (famous last words)
[12:51:46] effie: yeah, might not be as trivial as adding some new hiera variable :D
[13:35:08] Krinkle: x2 isn't in use yet, right?
[13:36:05] yeah, it doesn't have any db
[13:36:13] I am going to upgrade those hosts
[13:37:32] marostegui: ok :)
[13:50:39] Krinkle: all done, all hosts upgraded
[14:13:17] is this channel still in use since we stopped using freenode?
[14:14:55] yeah
[14:14:55] nono. we have 97 people in here for shared silence
[14:15:00] * Reedy grins
[14:15:10] which you've just ruined
[14:15:15] cormacparle: 👋
[14:15:35] :p
[14:16:52] if we have broken something fairly important on commons, and want to do an emergency deploy, what do we need to do?
[14:17:28] Ask if SREs are around for you to deploy it
[14:17:35] Other people are too... What's the patch? :)
[14:18:25] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaSearch/+/699411
[14:20:05] Looks fairly harmless in terms of deployment
[14:23:17] <_joe_> cormacparle: looks harmless to me too, do you have a task number?
[14:23:45] * cormacparle quickly creates a ticket
[14:23:49] (we only just spotted it)
[14:24:16] <_joe_> ahah ok
[14:24:42] <_joe_> cormacparle: go on anyway, it's a simple configuration revert
[14:25:27] ok cool - should I check with releng too, just in case?
[14:25:39] <_joe_> no, I think it's ok
[14:25:44] great
[14:25:52] <_joe_> you just need to ensure there is someone around to help if things catch fire
[14:27:55] cool ... anyone here want to volunteer? or should I ask over in releng too?
[14:28:16] You want someone to deploy it?
[14:28:28] (I can)
[14:29:09] <_joe_> oh heh yeah Reedy go on, else I can do it; now with the thing on toolforge I feel safer deploying any change tbh :)
[14:29:19] that'd be great Reedy, thank you!
[14:31:21] <_joe_> cormacparle: so we need to backport the patch to the current branches
[14:31:37] <_joe_> oh, Reedy's doing it
[14:55:56] I think Jenkins has finished for the weekend already
[14:56:44] he woke up
[14:57:06] back to butling
[14:57:43] some days jenkins, and some days jenkouts
[16:06:12] kormat: thx, forgot about repooling it. I assumed that happened unlogged after the optimize was done, good to know!
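(Editor's note: a minimal Puppet sketch of the Iterable-vs-Undef failure mode in the mcrouter error quoted earlier, plus one possible guard. The lookup key and variable names are made up for illustration; this is not the actual yaml_defs.pp code.)

```puppet
# On cloud hosts the role-based hiera layer is missing, so a lookup
# can return undef, and iterating over undef fails with
# "expects an Iterable value, got Undef".
$pools = lookup('some::mcrouter::pools', { 'default_value' => undef })

# pick() (puppetlabs-stdlib) falls back to the empty array, letting the
# catalog still compile where the hiera data is absent:
pick($pools, []).each |$pool| {
  notice("would render mcrouter pool ${pool}")
}
```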
[16:06:41] kormat: marostegui: btw, a small amount of good news: I see in the mwmaint1002 logs that the (regular) purge cron job finished in 14 hours today, from 00:00 to 14:30 UTC.
[16:06:52] oh nice
[16:06:56] no changes were made to its sleep size or chunk size
[16:07:08] maybe it is because of all the defragmentation
[16:07:11] I wouldn't be surprised
[16:07:24] now, we are of course going to accumulate some more data, so right now the WHERE clause is matching relatively few rows
[16:07:37] ah, yeah, that too.
[16:07:43] so that's two reasons it will get slower again
[16:08:03] but it's something we can monitor as another metric to see whether our current retention remains stable.
[16:08:26] I do expect it to go over 24h again though, since just purging one host, unthrottled and depooled, takes 12 hours
[16:08:37] and this is doing multiple shards, and with huge sleeps in between
[16:08:59] it has got to take longer than that, I guess.
[16:09:20] right now all shards are at their lowest size with lots of buffer space, possibly evicting almost nothing
[16:09:32] s/buffer/empty space/
[16:09:48] so we still also need to figure out what to do there
[16:10:08] because as soon as we go over 24h again, the growth rate is also going to start increasing again
[16:10:20] Krinkle: I've been keeping an eye on it recently: https://phabricator.wikimedia.org/P16418
[16:12:33] awesome, added to the post-deploy task.
[16:13:38] kormat: any chance we can repool today, so as to start seeing over the weekend what the population rate is without further drops in the graphs?
[16:15:25] not on a Friday evening
[16:15:59] but pc3 is replicating from pc1010, so it will at least get writes
[16:16:24] https://orchestrator.wikimedia.org/web/cluster/alias/pc3
[16:16:32] ah, interesting.
[16:16:32] https://grafana.wikimedia.org/d/000000106/parser-cache?viewPanel=9&orgId=1&from=now-90d&to=now
[16:16:40] so there will not be a drop or change in rate here after Monday?
[16:16:55] is the space always included in this total?
[16:18:08] I'm guessing yes, so it is double-counting one shard more or less; which one changes from time to time.
[16:18:16] I'm on my phone, I can't really tell
[16:18:44] with the eviction Manuel mentioned, I guess it will shrink a bit to shrug off the stuff it got from other shards over the past few weeks.
[16:20:27] Krinkle: Yeah, I am taking a look now at the warmness of the InnoDB buffer pool
[17:29:24] just received a new Thinkpad X1 replacement after 4 years, now got the brand-new "9th gen", ordered by ITS to be sent directly to me. nice! but then.. started the Debian installer and no NIC is detected at all (again), which is frustrating because I already used the unofficial .iso that includes non-free firmware, which was usually the obvious fix, but not now :(
[17:29:51] aren't there a number of folks around with it
[17:30:08] someone has to have the easy fix
[17:30:43] and 9th gen is soo new it's not even in https://www.thinkwiki.org/wiki/Category:X_Series
[17:30:53] wow
[17:30:59] congrats on having new shiny in any case!
[17:31:21] what did you install? try the latest bullseye installer, it's very close to final anyway
[17:31:43] the best I found yet, and it's pretty limited: https://wiki.archlinux.org/title/Lenovo_ThinkPad_X1_Carbon_(Gen_9)
[17:32:24] moritzm: so far I had tried firmware-10.9.0-amd64-netinst.iso. Will do the latest bullseye today
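(Editor's note: a hedged fallback for the missing-NIC situation above, assuming the base install succeeded and you can reach a network some other way, e.g. USB tethering or that USB-to-Ethernet adapter. firmware-iwlwifi is the standard Debian non-free package for Intel wifi; whether it is new enough for this particular chipset is exactly the open question in the chat.)

```
# Enable the non-free component, pull in the Intel wifi firmware,
# then reload the driver so it picks the firmware up:
echo 'deb http://deb.debian.org/debian bullseye main contrib non-free' |
  sudo tee /etc/apt/sources.list.d/non-free.list
sudo apt update
sudo apt install firmware-iwlwifi
sudo modprobe -r iwlwifi && sudo modprobe iwlwifi
```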
[17:33:13] I did go through the install without networking and got the base install, and it does load iwlwifi, but the installer finds no NIC at all whatsoever, it says
[17:33:40] also maybe I can still find the old USB-to-Ethernet adapter somewhere, since this one did not come with one :P
[17:36:05] but then.. I don't actually have something to plug the Ethernet into right now, was just living off hotspots and mobile :P
[17:36:55] hrm
[17:37:04] you don't have any other ISOs around, I guess
[17:37:10] maybe I should go back to the old days and use the public library to download an ISO and burn it to a CD-ROM, and also get the drive from them for a day .. they actually DO have tech items at the library now, heh
[17:37:14] and you have no network to get something else
[17:37:19] yeah, something like that
[17:37:42] apparently they ship it with Ubuntu and Fedora now, so there ought to be one of those ISOs that works to at least get you online
[17:38:45] yea, trying Ubuntu just to see what it does was another idea.. and then "debtakeover" to convert it.. jk on the second part
[17:39:41] Ubuntu 20.04 LTS supposedly
[17:39:51] but I dunno if they have added extra packages
[17:39:52] mutante: did you use one of the 'nonfree' images?
[17:40:05] mutante: the year of the Linux desktop has arrived, at least somewhat, on Ubuntu :)
[17:40:10] oh sorry, you mentioned it in your first post
[17:40:10] cdanis: yes, the unofficial "firmware" .iso
[17:40:16] unfortunate
[17:40:42] I'll let you know how it goes with bullseye though
[17:41:08] hey, I got this nice HP Omen to work out of the box with Fedora 33, NVIDIA GPU used for rendering, and I didn't have to do anything fancy, it was kinda awesome. so kinda the year of the laptop
[17:41:10] I don't even remember what I did for setting up my current Debian system to get everything working, but I remember it not being smooth
[17:41:11] dunno why I took 10.9
[17:41:13] this is on my T480s
[17:41:46] there are the non-free repos for Fedora though, so
[17:50:41] sukhe: the last couple of years it used to be "almost everything works out of the box, except wifi", and then _if_ you got the non-free wifi firmware on there (which was annoying until it detects removable media etc.).. but once you had it.. it was fixed
[17:51:11] (for X Thinkpads)
[18:47:52] moritzm: what does very close mean?
[18:50:42] they have a release date of July 31 I think, but June 26 was proposed; that's only a couple of weeks away. Just all the right people were not available. https://lists.debian.org/debian-release/2021/06/msg00200.html
[18:51:17] hard freeze 17 July, still can't imagine there would be big changes before then
[18:52:17] apergos: nice
[19:00:23] herron: https://phabricator.wikimedia.org/T243057#7152299
[19:02:54] mutante: was just looking at that, I guess it makes some sense that by changing the device config on the VM the network interface would be renumbered as a side effect. I was worried it might happen on every gnt-instance reboot, but only after the disk remove makes more sense
[19:05:57] yea, only when you add (and, now we learned, also remove) drives on existing VMs
[19:06:38] sukhe: what was the reason again we have bast5001 and bast5002 using an IP each currently? I forgot again already, heh
[19:07:33] there are also 2 IPs "reserved for infra", but I don't know more ...how/if you could get one of them
[19:08:50] herron: thanks for removing the prometheus disks, that did give us resources again. now it's just down to another resource: IP addresses. So the comments are semi-related :)
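(Editor's note: a hedged sketch of the Ganeti operation being discussed above. `gnt-instance modify --disk N:remove` is standard Ganeti syntax, but the instance name, disk index, and the exact NIC names are placeholders.)

```
# Remove an extra disk from a VM on the ganeti master; the change is
# typically only visible inside the guest after the next reboot:
sudo gnt-instance modify --disk 1:remove example1001.eqiad.wmnet
sudo gnt-instance reboot example1001.eqiad.wmnet
# Caveat seen above: dropping a virtio disk can shift PCI slot ordering,
# so the guest NIC may come back under a new predictable name
# (e.g. ens5 -> ens6) and need its network config adjusted.
```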
[19:09:41] yeah for sure, didn't realize space was tight there already. haha, naturally
[19:10:43] sukhe: same in ulsfo, 2 bastions and 2 IPs "reserved". so yea.. gotta refer to infra/netops/traffic
[19:40:25] mutante: I am not sure about the reason :) I think there is a broader discussion to be had on this topic; perhaps for the week after All Hands!
[19:50:27] mutante: I remember coming across a ticket that talked about the eqsin bastion hosts and now I can't find it :(
[21:52:15] sukhe: ACK, well, let's talk about it after allhands then, agreed :)
[21:52:19] topranks: https://gerrit.wikimedia.org/r/c/operations/puppet/+/699323