[09:33:30] Is https://wikitech.wikimedia.org/wiki/Bastion still current? attempting to ssh to bast2002.wikimedia.org isn't working for me, and I see bast2003 is available...
[09:34:28] Emperor: bast2003 is now current - there was an email to ops on May 5th. I got caught out too :-)
[09:34:57] should wmf-update-ssh-config know about that change?
[09:35:07] * Emperor edits their ssh config by hand (and the Bastion page)
[09:36:14] I think the latest version of wmf-update-ssh-config does know about it, but I haven't checked personally.
[09:36:28] wmf-sre-laptop does and it's the authoritative one
[09:36:53] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/debs/wmf-sre-laptop
[09:37:47] volans: wait, I have wmf-laptop-sre installed /o\
[10:43:44] dumb question time! how should I be installing docker-pkg: clone the repo and setup.py install, or something fancier?
[10:44:45] <_joe_> kamila_: I think it's on pypi
[10:45:42] _joe_: not under any obvious name AFAICT
[10:46:21] unless we have our own pypi that I'm not aware of
[10:50:37] <_joe_> uh no my bad
[10:50:42] <_joe_> we never uploaded it
[10:50:47] <_joe_> well, maybe we should :P
[11:13:49] ...who exactly is "we"? XD
[11:14:15] <_joe_> I know I have one
[11:14:25] <_joe_> bd808 surely has at least one
[11:14:32] <_joe_> I don't remember who else grabbed one
[11:15:03] <_joe_> but on a more serious note, I wanted to find one of those online services that send a single printed t-shirt to people
[11:15:17] <_joe_> and gift them every quarter :)
[11:16:08] _joe_: wrong channel?
[11:16:25] <_joe_> kamila_: sigh yes
[11:16:33] <_joe_> I mixed the two conversations
[11:16:37] <_joe_> sorry bd808 :)
[11:16:58] <_joe_> kamila_: "we" here was really "anyone who has access to pypi and works here"
[11:17:04] <_joe_> but I guess it should be me
[11:17:13] _joe_: maybe go rest (and stay away from any root prompts for today) :P
[11:17:27] <_joe_> kamila_: are you kidding? I want another t-shirt
[11:17:31] :D
[11:17:33] <_joe_> but yes, I'll rest soon-ish
[11:17:45] <_joe_> I claim this is the effect of looking at prometheus configs
[11:18:00] a solid excuse
[11:24:34] _joe_: do you happen to feel like uploading docker-pkg in the near future? asking because I prefer pypi if given the choice, so wondering if I should go do something else for a bit or just install it as is
[11:28:53] <_joe_> kamila_: I'll do it today, promised
[11:29:27] thank you <3
[12:21:51] If I want to backport two stacked commits, should I squash them first?
[12:22:18] Not usually...
[12:49:28] duesen: as long as they're stacked in gerrit, you should be fine
[13:50:33] Reedy: and if I know that the first patch is broken without the second? The first patch is a re-apply of a reverted patch, and the second patch is the fix for the re-apply...
[14:04:16] Reedy: i need to test them together on a debug host, before the first one can be pushed to all hosts. Can I just give scap both IDs, or do I need to squash first to make this work?
[14:05:09] subbu: this question is why i don't like the "revert the revert" approach...
[14:05:24] oh, subbu isn't here :P
[15:55:36] Certainly in the old scap methods, you'd just merge both.. I think the newer scap deployment commands work fine with a stack too
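A minimal sketch of the stacked-backport workflow discussed above, assuming a recent scap with the `backport` subcommand; the change numbers are hypothetical, and `scap backport --help` on the deployment host is authoritative:

```
# Pass both Gerrit change numbers so scap merges and syncs them together;
# it pauses after syncing to the testservers, so the stack can be verified
# on a debug host before continuing to all hosts.
# 12345 = the re-applied patch, 12346 = the follow-up fix (hypothetical numbers)
scap backport 12345 12346
```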
[15:57:28] * bd808 is not sure if _joe_ is sorry for pypi existing or for not having tshirts to give out or ...
[15:57:38] <_joe_> tshirts :D
[15:59:09] I don't think we have had any for a number of years now. I think Greg/Tyler ordered the most recent batch, but I think they were all handed out before we even moved to the new office in SF.
[15:59:21] I have stickers which are much easier to carry around :)
[18:13:54] heads-up: reimaging cookbook is not working because of some DHCP errors. I am filing a task, but just in case someone was planning to use it
[18:16:09] https://phabricator.wikimedia.org/T336696
[18:20:03] thanks!
[18:20:53] ssw1-a1 is also what showed up in -operations
[18:20:56] /last ssw1-a1-
[18:21:03] yeah
[18:21:05] range 10.193.2.1/16 10.193.2.1/16;
[18:21:16] I think it's being duplicated here, hence the error (it can only accept one range)
[18:21:24] checking where it was introduced
[18:22:42] hmm not sure
[18:22:52] effie, _joe_, Amir1: so how can we move forward on https://phabricator.wikimedia.org/T329366? What's the next step?
[18:22:56] https://phabricator.wikimedia.org/T332180
[18:24:02] So we need to look at the problem of CPU usage in jobrunners
[18:25:00] duesen: I'm making some progress on that. We (Search platform and I) might be able to shave off 10-15% of the load but it might take a bit of time
[18:25:52] Amir1: would it be possible to route the cirrussearch jobs to the parsoid cluster? Parsoid should no longer need it soon. What's the load there?
[18:26:25] Also, would fixing https://phabricator.wikimedia.org/T329842 help?
[18:26:35] More than 50% of the jobrunner load is the cirrussearch jobs https://performance.wikimedia.org/arclamp/svgs/daily/2023-05-11.excimer.RunSingleJob.svgz?x=126.2&y=901
[18:26:58] duesen: yes, that would help but we need to see how big of an impact that's going to make
[18:27:46] duesen: T329842 might help a bit but maybe not a lot, I feel that it's affecting very small pages that are easy to parse
[18:27:47] T329842: Some jobs in refreshLinksPrioritized seems to repeat themselves for ever - https://phabricator.wikimedia.org/T329842
[18:27:49] cool
[18:28:15] Krinkle: could you have a look at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/919289/? I'd like to merge it soon.
[18:32:43] sukhe: how about this.. we just "live hack" it on the install server.. you can go ahead.. then we re-enable puppet and let the ticket play out
[18:33:31] has anyone used transfer.py as a library? Just curious what 'remote_execution' means in this context: https://doc.wikimedia.org/transferpy/master/transferpy/transferpy.html#module-transferpy.Firewall . I'm just trying to open FW ports on the target
[18:37:23] mutante: I thought about it but I'm kinda hesitant to do that. also this reimage is not important or urgent so I will wait :)
[18:39:24] sukhe: fair enough! ack
[18:39:46] mutante: I had your idea too but I think the fix is what I mention in the above comment
[18:39:54] but yeah I guess we can wait
[18:40:31] I agree now
[18:40:36] since not urgent
[18:40:41] not touching things :)
[18:40:52] could still mess with automation
[18:41:24] yep
[18:51:03] <_joe_> duesen: the right thing to do is move a couple servers over tbh
[19:08:37] _joe_, Amir1: when would be a reasonable time for me to poke you about this again? Next week? Next month?...
[19:16:37] Moving hosts from parsoid/etc to jobrunners is out of my area to be able to help; you should ask joe
[19:41:16] <_joe_> it's a simple reimage, we can work on it tomorrow
[20:04:06] Does our puppet config actively remove packages from a host?
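A quick way to check from the host side whether Puppet removed a package (the discussion below pins down the module responsible) — a sketch using standard Debian tooling; `run-puppet-agent` is assumed to be the Wikimedia wrapper, and plain `puppet agent -tv` is the stock equivalent:

```
# Query dpkg state; a removed package shows up as 'rc' or 'un' (or not at all):
dpkg -l python2.7-minimal
# Re-run the agent and watch for Package[...] resources being corrected:
sudo run-puppet-agent
```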
[20:05:26] inflatador: I imagine it depends on the module
[20:05:29] It can
[20:08:40] RhinosF1: yeah, looks like it's removing python2.7 on wdqs hosts, which is messing up our deploys...hmm
[20:09:14] inflatador: does the puppet output show it doing so?
[20:10:17] When I've used puppet, it used to give an indication of what module the removal comes from
[20:10:49] Y, it's logged
[20:10:59] `(/Stage[main]/Base::Standard_packages/Package[python2.7-minimal]/ensure) removed (corrective)`
[20:12:09] inflatador: is this a bullseye host?
[20:12:21] Y. Looks like modules/base/manifests/standard_packages.pp has an explanation
[20:12:30] from the puppet repo that is
[20:13:15] https://github.com/wikimedia/operations-puppet/blob/production/modules/base/manifests/standard_packages.pp#L78
[20:13:17] Yep
[20:13:36] inflatador: the python2 in bullseye shouldn't be removed
[20:13:51] Iirc it wouldn't work from the 'python' command
[20:14:09] You'll need buster for python2 or to migrate to python3
[20:15:09] We just need it for `git-fat`, which is a python2-only package
[20:15:15] Ah!
[20:15:41] We **should** be using git-lfs instead, but I don't know when that's happening...need to check with my SWEs
[20:15:47] inflatador: read https://phabricator.wikimedia.org/T279509
[20:15:54] From moritzm
[20:17:03] inflatador: and https://phabricator.wikimedia.org/T316876
[20:22:05] RhinosF1: THX, will add it to the current ticket I'm working on
[20:26:11] inflatador: link?
[20:26:53] https://phabricator.wikimedia.org/T331300
[23:10:43] nothing happened during the on-call shift.
[23:11:06] but there is the issue that the bot did not update the topic.
[23:11:22] so it looks like I am on call, but per VO I am not and need to go now
[23:11:34] I suspect it is related to earlier issues in cloud
[23:11:43] and want to shill for a prod bot box again. cya
[23:12:03] will now be off as in "in a pool"
[23:52:43] duesen: dcausse: ok, I'll look tomorrow morning. At a glance, I think we may want to invert this so that things work correctly by default instead of leaving it as a landmine for every other call. This is a feature specific to ArticleViewPoolCounter and further specific to page view cache misses. It's now been buried by 5 layers and reuse of code that feels accidental. Some of that is long-term to think about; here we could start by making this opt-in. I'd then likely want to deprecate or re-examine some of the refactoring in this area to see whether anything else got lost along the way. Maybe ArticleViewPC is fine to leave as-is but it still feels weird to have all PO access go through this now.
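On the git-fat to git-lfs migration mentioned earlier (T279509 / T316876): a minimal sketch of the stock git-lfs commands such a migration typically involves; the tracked pattern is hypothetical, and repository-specific deploy tooling (scap etc.) is out of scope here:

```
# One-time setup for the user/host:
git lfs install
# Track the binary patterns that git-fat previously managed
# ("*.jar" is a hypothetical example):
git lfs track "*.jar"
git add .gitattributes
# Rewrite existing history so old blobs move into LFS storage
# (destructive; coordinate with other clones before pushing):
git lfs migrate import --include="*.jar"
```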