[10:55:19] moritzm: when you have a minute, I'd appreciate a +1 on https://gerrit.wikimedia.org/r/c/operations/puppet/+/737856
[11:11:51] Would anyone be able to cast an eye over this temporary firewall policy change please? https://gerrit.wikimedia.org/r/c/operations/homer/public/+/737889
[11:39:38] btullis: intent is clear, but I'm not sure if we can define the port directly in the cr-analytics.inc file like that.
[11:39:54] We may need to define a name for it in SERVICES.svc and refer to that name instead.
[11:40:18] XioNoX can you confirm that?
[11:40:23] indeed, I was about to reply the same
[11:41:16] or like https://github.com/wikimedia/operations-homer-public/commit/50350ecdcd650bab4fce9be34b39869dc4ef1024
[11:41:58] but a different definition is better if it's a different service
[11:42:24] OK, thanks both. Are you happy for me to take this approach with this temporary single port? I wondered about defining the whole default port range for transfer.py (4400-4500) but then I didn't know whether it was worth it.
[11:43:18] No point having it more open than it needs to be, but if it's gonna use ports other than 4400 then it probably will need the range yeah.
[11:45:01] It only uses a single port (it calls netcat on both ends) but its default choice is any unused port between 4400 and 4500. https://doc.wikimedia.org/transferpy/master/usage.html#Named%20Arguments
[11:45:41] I'll stick with a single port for now and create a second CR to revert this temporary change later. 👍
[11:46:10] yeah that makes sense thanks.
[11:46:14] cool!
[11:48:29] Cheers. Appreciate the quick turnaround.
[12:12:50] btullis: lgtm! let me know if you need help to push the change
[12:15:12] Thank you. I think it's puppet-merge, run-puppet-agent on cumin server, then homer "cr*eqiad*" diff, homer "cr*eqiad*" commit 'My lovely message'
[12:15:16] Is that about right?
[12:16:38] btullis: no need for puppet merge
[12:17:17] XioNoX: Oh yes, of course. Thanks.
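The single-port vs. port-range question above comes down to which port transfer.py ends up using. Below is a minimal sketch that assumes "any unused port between 4400 and 4500" means "the first port in that range that can currently be bound". `find_free_port` and `firewall_allows` are hypothetical helpers written for illustration. They are not transferpy code and not homer/Capirca syntax.

```python
import socket


def find_free_port(start=4400, end=4500, host="0.0.0.0"):
    """Return the first TCP port in [start, end] that can currently be bound.

    Hypothetical helper approximating transfer.py's default of "any unused
    port between 4400 and 4500"; it is not transferpy's actual code.
    """
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # already in use (or not permitted): try the next port
            return port  # the probe socket is closed on leaving the with-block
    raise RuntimeError(f"no free TCP port in {start}-{end}")


def firewall_allows(port, allowed):
    """True if port equals an allowed single port or lies in an allowed (lo, hi) range.

    Hypothetical model of a firewall term, not homer/Capirca syntax.
    """
    for entry in allowed:
        if isinstance(entry, tuple):
            lo, hi = entry
            if lo <= port <= hi:
                return True
        elif port == entry:
            return True
    return False
```

A temporary term that only opens 4400 behaves like `firewall_allows(p, [4400])`. It passes only when the chosen port happens to be 4400. That is why the transfer needs to be pinned to that port, or else the whole 4400-4500 range has to be opened. A port that was free when probed can also be taken before netcat binds it, so pinning is the more predictable option.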
[12:17:29] btullis: you will see https://gerrit.wikimedia.org/r/c/operations/homer/public/+/737765 in your diff, it's fine to commit it as well
[12:17:52] ping me if there is anything else ;)
[12:18:11] XioNoX: Will do. :-)
[14:08:24] <_joe_> kormat: sooo, this is for setting the *host* to critical, so it can happen post-read-only indeed.
[14:08:48] <_joe_> and it doesn't need to be part of the procedure
[14:09:01] <_joe_> so it's not indeed forbidden
[14:09:16] <_joe_> once we move to alertmanager(TM) we can have more refined logic
[14:10:18] <_joe_> but we could find solutions, but all would require us to change how profile::monitoring
[14:15:25] <_joe_> *works
[14:17:28] doing this purely via am is probably not workable, but a combination of scrape config generation and some alerting rules should be doable, if complex
[14:21:26] <_joe_> yeah my point is that it's pointless to do deep surgeries about something that can be changed two days after we've switched over, and not cause any troubles in the process
[14:21:40] <_joe_> especially if we're switching to a new system soon(TM)
[14:21:49] 👍
[14:42:19] Hey! I'm in a meeting with new hires, and there is a request to have a session that presents an overview of the various components that we have. The "Request Flow" session from volans might be a good starting point.
[14:42:37] Does volans or someone else want to do another live version of that talk?
[14:44:10] gehel: ?sure? I think the request flow was joe's session
[14:44:26] <_joe_> yep
[14:44:30] <_joe_> it was mine
[14:44:35] <_joe_> and sure, I'm happy to
[14:44:42] Oh, might be. I was confused by volans' picture at the end.
[14:44:47] <_joe_> ahahahahah
[14:44:47] eheheh
[14:44:55] * gehel should have thought that volans wouldn't push his own picture :)
[14:45:19] _joe_: do you want to set it up? Or want me to find a time for you?
[14:45:21] <_joe_> gehel: who would like to attend?
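For reference, here is the homer deployment sequence discussed earlier in the log, written as a small dry-run helper. The steps follow btullis' list, with puppet-merge dropped per XioNoX. This is an illustrative sketch and not an official runbook. The helper names `homer_deploy_steps` and `run_steps` are hypothetical. The commands are printed by default and are executed only when `dry_run=False`.

```python
import shlex
import subprocess


def homer_deploy_steps(device_query, message):
    """Return the commands for pushing a homer/public change, in order.

    Hypothetical helper based on the steps in this log; puppet-merge is
    omitted because XioNoX said it is not needed for this repo.
    """
    return [
        ["run-puppet-agent"],                        # on the cumin server
        ["homer", device_query, "diff"],             # review pending changes
        ["homer", device_query, "commit", message],  # apply them
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False.

    In practice the diff output should be reviewed by a human before the
    commit step runs, so running this end to end is only a sketch.
    """
    for cmd in steps:
        print("$", shlex.join(cmd))  # shell-quoted, e.g. 'cr*eqiad*'
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


run_steps(homer_deploy_steps("cr*eqiad*", "My lovely message"))
```

Passing each command as an argument list (rather than a shell string) keeps the `cr*eqiad*` device query from being expanded by a local shell glob.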
[14:45:32] <_joe_> I need to give a new hand of paint to the slides
[14:45:51] Specifically Matthew Vernon, but I suspect other new hires might be interested. SRE and others.
[14:46:05] I can find a list of potential attendees
[14:46:36] <_joe_> well for SREs, they should all have been sent through all of those presentations :)
[14:47:11] through the recordings? Or a live session?
[14:47:23] <_joe_> gehel: for SREs, usually it's going through the recordings
[14:47:37] <_joe_> but I think that that presentation specifically can use some renewal
[14:47:45] <_joe_> so I will re-record it for the next times
[14:47:50] I think there was an ask for a more interactive version.
[14:48:31] Tbh, the ask was for an overview of all the components. And I'm not entirely sure that we have such a session.
[14:48:41] Also not sure we could create it if we wanted :)
[14:49:33] <_joe_> all the components?
[14:49:39] gehel: with some of the recent sre hires we had them watch all the videos then had one session with all the presenters for a Q and A session on all the parts
[14:49:41] gehel: the usual onboarding asks the new hire to go through the recordings and then usually sets up 2 Q&A sessions, one about halfway through the videos and one at the end
[14:49:43] <_joe_> how many weeks is that presentation?
[14:50:02] in those Q&A sessions we usually answer any question that came up with the recorded presentations
[14:50:17] I'm not sure if those were not set up (yet) in some cases
[14:50:22] <_joe_> jokes aside, I think a single presentation is not super great for that, but we can do a 100k ft view introduction
[14:51:42] "all the components" of MediaWiki or honestly everything in prod or ?
[14:53:54] The ask is everything in prod. Not necessarily possible.
[14:54:40] the "service catalog" idea from product is probably eventually relevant for trying that
[14:55:08] my onboarding checklist says "These are not meant for "self study"," apropos videos about our stack
[14:56:03] kormat: fwiw, we have one manual puppet patch right now, which is switching httpbb monitoring to run from cumin1001<-->cumin2001. (we just dropped the one for WDQS lag checking!) I think it's fine to add another one if the feature is worth it, just file a task in #DC-switchover as a future TODO that it should be automated somehow
[14:56:24] (last item at the bottom of https://office.wikimedia.org/wiki/Technology/Onboarding/Checklists/MVernon#Getting_Started_in_the_Technology_Department )
[14:56:35] <_joe_> Emperor: yeah the idea is you watch a few, guided by your onboarding buddy, then we have a Q&A session
[14:57:11] <_joe_> Emperor: who's your onboarding partner?
[14:57:32] _joe_: kormat
[14:57:38] <_joe_> oh, I'm sorry
[14:57:46] that explains everything :-P
[14:57:48] 🤣
[14:57:52] <_joe_> pretty much yes
[14:58:24] <_joe_> Emperor: so yeah let's take the chance to re-do the request flow and the appservers deep dive ones
[14:58:41] <_joe_> bd808: I think people are more interested in getting an idea of how our production systems work together
[14:58:44] UPDATE onboarding SET buddy = '_joe_' WHERE newhire = 'Emperor';
[14:59:34] I appreciate being nice on IRC is a bit infra dig, but kormat has been great as an onboarding buddy (puns notwithstanding) :)
[14:59:35] _joe_: *nod* I can see that being a thing that an SRE would want a grasp on as early as possible.
[15:00:10] I'm still waiting for the deep overview of MediaWiki that I was promised in 2013 :)
[15:00:20] <_joe_> Emperor: yeah it's just banter ofc
[15:00:33] bd808: if you get it, ping me and I'll join ;)
[15:01:29] _joe_: sounds good
[15:01:43] <_joe_> Emperor: I'll try to schedule them next week
[15:01:52] 🙇
[15:01:53] <_joe_> we can extend the invite to anyone who wants to know more
[15:01:55] <_joe_> and btw
[15:02:08] <_joe_> I just realized I should do a part about mediawiki running in kubernetes
[15:02:55] *twitch*
[16:11:11] _joe_: I'd sign up for that (curious how different our introductions are, as I often do them for non-SRE about our infra).
[16:11:23] Also, welcome Emperor :)
[16:12:12] <_joe_> Krinkle: I avoided giving a grand tour at first, I describe the flow of some specific requests, thus introducing pieces of the infrastructure in that context
[16:12:47] <_joe_> for example, a very simplified vision of how a VE editing session happens
[16:13:19] Ack. Per my diagrams, I tend to go for a Wikipedia page view or API request.