[07:19:00] XioNoX, topranks o/ not sure if it was already discussed (in case sorry) but I see in icinga a lot of nodes on new E/F racks showing connectivity issues (seems to be the broken ARP reported in https://phabricator.wikimedia.org/T294137#7763600)
[07:33:52] mmm there are some alerts for kartotherian-eqiad
[07:34:55] elukey: em no, thanks for that, I've cleared the mac-ip cache on all of those switches now to resolve.
[07:35:15] I'll need to look and see if there is any pattern here; previously this has only happened following an initial reimage.
[07:35:40] If that has changed it may require us to review the status of those racks
[07:35:50] topranks: <3
[07:36:34] I did manage on Friday to capture a trace of it "as it happened", so hopefully we can make some progress with Juniper.
[07:54:51] I have restarted all eqiad tilerators, they were down due to some osm/loading issue IIUC
[08:07:28] Looking a little deeper at this, the affected hosts do all seem to have had the issue since shortly after their reimage on April 7th.
[08:07:40] So it doesn't seem to be outside the pattern we've seen so far. https://phabricator.wikimedia.org/P25274
[08:15:11] hnowlan: o/ around?
[08:24:37] hnowlan: there seems to be an issue with kartotherian/tilerator on the maps nodes, I tried to restart some daemons/nodes but didn't get that far
[08:40:25] is there some 'feature' on cumin hosts that kills long-running idle shell sessions?
[08:41:29] aaargh.
[08:41:31] modules/base/files/environment/bash_autologout.sh:TMOUT=432000
[08:41:36] so that's how i keep losing state :/
[09:02:37] akosiaris: congratulations! i have randomly selected you for a one-line CR: https://gerrit.wikimedia.org/r/c/operations/puppet/+/784224
[09:05:06] "randomly"
[09:05:51] (oblig: https://xkcd.com/221/)
[09:07:28] lol.
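(Editor's aside: the `TMOUT=432000` line quoted above is bash's built-in idle timeout — an interactive shell left at the prompt longer than `$TMOUT` seconds logs itself out, which is why the long-idle cumin sessions kept losing state. A minimal sketch; the value comes from the puppet file quoted in the log, the arithmetic is just illustrative:)

```shell
# TMOUT is a bash variable: an interactive shell that stays idle at the
# prompt for more than $TMOUT seconds exits automatically.
TMOUT=432000  # value set by modules/base/files/environment/bash_autologout.sh

# 432000 seconds works out to 5 days of idle time before auto-logout:
echo "$(( TMOUT / 86400 )) days"   # prints "5 days"
```

So any session left untouched across a long weekend is gone; a multiplexer like tmux keeps the outer session alive but does not stop the inner shell's own TMOUT from firing.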
I also have a place in my heart for https://www.americanscientist.org/sites/americanscientist.org/files/20144141249210337-2014-05TechnologueFp170.jpg
[09:07:45] I remember the Debian openssh debacle back in the early 00s
[09:08:16] sorry, late 00s
[09:08:33] akosiaris: that dilbert comic lives in my head
[09:08:48] and ofc related: https://xkcd.com/424/
[09:09:06] MD_update(&m, buf, j);
[09:09:15] what could possibly go wrong
[09:09:38] * kormat giggles
[09:12:03] elukey: thanks for the heads-up, that service isn't in the critical path any more. Not sure why the alerts have revived though, will fix
[09:12:15] akosiaris: thank you for your service
[09:12:52] hnowlan: ack thanks! I also saw a tegola-related alert, but I didn't recall whether it is now independent from the maps nodes, so I thought there was some dependency
[09:13:02] (the tegola pybal alert, I meant)
[09:13:04] kormat: I've left a comment in https://phabricator.wikimedia.org/T122922#7863525 as well
[09:14:08] oh great, thanks <3
[09:18:35] elukey: ah, yeah, they are indicative of something bigger :/
[09:28:17] lemme know if you need help!
[12:21:20] volans: I got a "Failed to get Netbox script results, try manually: https://netbox.wikimedia.org/api/extras/job-results/2896452/" during a reimage, but I haven't been able to find how to proceed after that on wikitech; is there a doc page I can read?
[12:23:21] I guess it is related to https://netbox.wikimedia.org/extras/scripts/interface_automation.ImportPuppetDB/ ?
[12:37:45] marostegui: checking, sorry was at lunch
[12:37:52] no problem volans!
[12:38:12] so, the current Netbox version that we have keeps only the last run of a script
[12:38:47] just the other day the latest Netbox release changelog reported that they added support for keeping more runs
[12:39:14] so my bet is that while the cookbook ran the script and was polling the results, another run of the same script (most likely for another host) was started
[12:40:37] and yes, you're right, that's the Netbox script in question (interface_automation.ImportPuppetDB)
[12:40:52] Do I need to run it manually as the output says?
[12:41:18] marostegui: which host was this? you can safely re-run the netbox script, it doesn't hurt and at most updates netbox according to the real data in puppetdb
[12:41:25] volans: it was db1136
[12:41:57] but most likely it was already synced, yeah the dry-run shows a noop
[12:42:19] but in any case, you can always run that script
[12:43:07] volans: Cool, thank you :)
[12:44:42] np
[12:59:19] folks I am going to re-initialize the ml-serve-codfw cluster, I'll try to downtime everything that I can but some alarms (like pybal etc..) may fire, in case please ignore them :)
[15:22:08] I sometimes feel that there needs to be a #no-stupid-sre-questions channel for asking stupid questions like this...
[15:22:46] btullis: this one is totally fine! there are no stupid questions, really
[15:22:49] what's yours? :)
[15:23:06] Thanks. Is there a reason why we don't have `givenName` or users' full names in our LDAP directory?
[15:24:21] the actual WMF directory is the google one, the LDAP we have is wikitech, where anyone can create an account
[15:25:03] then there is an ldap-corp that is basically a RO replica of the google directory, but that one will most likely be replaced soon-ish™
[15:25:10] and is managed by ITS
[15:26:17] I'm talking wikitech I think. ldap-ro.eqiad.wikimedia.org - so can I just add a `givenName` field from the Wikitech preferences UI?
[15:27:48] I'm not sure I'm following, but moritzm is surely better suited to answer this one ;)
[15:27:51] what's your use case?
[15:28:11] /srv/labsdb/binlogs?
[15:28:21] btullis: I don't think the wikitech extension currently supports that, but we'll soon have someone starting to build a proper IDM, and then we'll also revamp the attributes in use and have support for this
[15:28:40] but yeah it depends on your use case and what you need it for, there might be other options
[15:29:41] Cool, thanks. I'm using the wikitech LDAP directory as an authentication source for datahub.wikimedia.org - I'm currently trying to pre-create the user database by doing an LDAP import:
[15:30:31] It's baulking a little because we don't currently have first and last names, or givenNames, for people.
[15:32:33] It would also be nice for other things that create accounts on successful LDAP authentication, like Superset.
[15:33:31] Anyway, I think I can probably work around it for now; the issue might well go away when we switch to OIDC authentication: https://phabricator.wikimedia.org/T305874
[16:27:01] for now using the CN as the primary identifier seems like the best option, the current wikitech admin mostly follows the on-wiki approach of focusing on the user's pseudonym instead of a real/given name
[16:27:31] moritzm: Thanks 👍
[16:28:00] all the object classes on the LDAP level support it though, so when we extend this towards a full IDM we'll have the full names also stored/managed in the IDM for staff and users with elevated access (researchers/community NDA)
[16:30:12] Great. That works for me too.
[19:59:05] Krinkle: I thought Redirect_fixer also takes care of Special:DoubleRedirects except if marked with __STATICREDIRECT__ ?
[20:04:34] hauskatze: I'm not aware of logic in core that would spontaneously discover and rectify pre-existing double redirects.
[20:05:01] Indeed, pages marked with the __STATICREDIRECT__ magic word will not be updated.
[20:06:26] Krinkle: I never used the feature in prod (it was briefly enabled for a while before I was here, iirc) :)
[20:06:38] so it needs a checkbox too?
[20:06:39] hm
[20:07:18] looks like you need someone running redirect.py (for double and broken redirects) on officewiki and then enabling the feature
[20:07:33] hauskatze: When the feature is enabled, and you use Special:MovePage, there will be an option that queues a job to update any incoming redirects. The option is enabled by default, much like e.g. move talk page and move subpages.
[20:08:09] One can run maintenance/fixDoubleRedirects.php to fix older ones.
[20:08:46] Although yeah, there's also a pywikibot script one can use. bd808 did that already meanwhile, I believe.
[20:09:23] I have MABot running double/broken redirects on several wikis; wikitech is one of them
[20:09:33] not officewiki though heh
[20:11:09] I'll wait for one or two more +1s before scheduling the patch for swat, if that's okay
[20:33:02] sure, thanks hauskatze !
[20:34:17] you're welcome :)
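(Editor's aside: the two cleanup routes discussed above can be summarized as commands. These are illustrative sketches, not commands from the log — `fixDoubleRedirects.php` is a real MediaWiki core maintenance script and `redirect` is a real pywikibot script, but the `mwscript` wrapper is WMF-specific and the wiki/flags here are assumed; check the live docs before running anything:)

```shell
# MediaWiki core maintenance script, fixing existing double redirects
# on one wiki (mwscript is the WMF wrapper for maintenance scripts):
mwscript maintenance/fixDoubleRedirects.php --wiki=officewiki

# Pywikibot equivalent: fix double redirects, then report/handle broken ones
python pwb.py redirect double
python pwb.py redirect broken
```

Either route only cleans up pre-existing redirects; the Special:MovePage checkbox described above is what keeps new double redirects from accumulating going forward.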