[07:46:46] netops, Infrastructure-Foundations, SRE: Upgrade Fastnetmon to 1.2.0 - https://phabricator.wikimedia.org/T271228 (ayounsi) It's back! https://github.com/pavel-odintsov/fastnetmon/releases/tag/v1.2.0 :)
[08:31:44] netops, Infrastructure-Foundations: Finalise design extentison of WMCS networks to new cloudsw in Eqiad rows E/F - https://phabricator.wikimedia.org/T304989 (Aklapper)
[08:40:52] netops, Infrastructure-Foundations: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F - https://phabricator.wikimedia.org/T304989 (Aklapper)
[09:02:36] netops, Infrastructure-Foundations: Ganeti hosts use analytics vlan as v6 getaway - https://phabricator.wikimedia.org/T305034 (ayounsi) p:Triage→Medium
[09:16:07] netops, Infrastructure-Foundations: Ganeti hosts use analytics vlan as v6 getaway - https://phabricator.wikimedia.org/T305034 (ayounsi) I couldn't find any mention of `accept_ra` in Puppet or cookbooks. Some more digging shows that it might have been added manually in T265607#6547365, but maybe the scri...
[10:11:38] netops, Infrastructure-Foundations, SRE: Ganeti hosts use analytics vlan as v6 getaway - https://phabricator.wikimedia.org/T305034 (ayounsi) p:Medium→Low a:MoritzMuehlenhoff After chatting with Moritz I pushed a manual fix and confirmed that the route was gone after the expiring timer. T...
[11:30:42] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Review filtering for cloud-hosts on CR routers eqiad - https://phabricator.wikimedia.org/T285461 (ayounsi) Open→Resolved a:ayounsi All done here!
[11:39:42] netops, Infrastructure-Foundations, SRE: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F - https://phabricator.wikimedia.org/T304989 (cmooney)
[11:46:23] SRE-tools, DBA, Infrastructure-Foundations, Patch-For-Review, and 2 others: Create or modify an existing tool that quickly shows the db replication status in case of master failure - https://phabricator.wikimedia.org/T281249 (Ladsgroup) I'm not following what you mean by USAGE 😅 Can you elaborate?
[11:48:40] SRE-tools, DBA, Infrastructure-Foundations, Patch-For-Review, and 2 others: Create or modify an existing tool that quickly shows the db replication status in case of master failure - https://phabricator.wikimedia.org/T281249 (Marostegui) A script usage to show when the script is executed without...
[11:59:24] hi, I'm attempting to select all hosts that have o11y as a contact via cumin. My understanding is that I should be doing sth like P:contacts%role_contacts = "Observability SREs" which doesn't work at least because role_contacts is a list I believe? is what I'm trying to do achievable atm?
[11:59:53] or perhaps query for the variable itself ? i.e. profile::contacts::role_contacts
[12:00:29] SRE-tools, DBA, Infrastructure-Foundations, Patch-For-Review, and 2 others: Create or modify an existing tool that quickly shows the db replication status in case of master failure - https://phabricator.wikimedia.org/T281249 (Ladsgroup) I see. Done now. Can you take a look?
[12:03:44] SRE-tools, DBA, Infrastructure-Foundations, Patch-For-Review, and 2 others: Create or modify an existing tool that quickly shows the db replication status in case of master failure - https://phabricator.wikimedia.org/T281249 (Marostegui) Btw confirmed it works fine when the master is dead: ` Orde...
[12:07:00] SRE-tools, DBA, Infrastructure-Foundations, Patch-For-Review, and 2 others: Create or modify an existing tool that quickly shows the db replication status in case of master failure - https://phabricator.wikimedia.org/T281249 (Ladsgroup) Open→Resolved Let's call this done. I'll pick up {T1...
[12:07:39] SRE-tools, DBA, Infrastructure-Foundations, Patch-For-Review, and 2 others: Create or modify an existing tool that quickly shows the db replication status in case of master failure - https://phabricator.wikimedia.org/T281249 (Marostegui) Many thanks for working on this! <3
[12:19:45] godog: eh, with variables that are arrays it's a bit of a mess, I did run a workaround, do you have a task where I should paste the result?
[12:20:16] I did run cumin -x '*' 'grep "Observability SREs" /etc/wikimedia/contacts.yaml' instead, I'll check if there is a way to get it purely by query too
[12:20:54] volans: thank you that's good enough, I don't have a task but a phaste will work fine
[12:22:22] godog: https://phabricator.wikimedia.org/P23803
[12:23:30] volans: thanks! super useful
[12:23:42] anytime
[12:28:26] netbox, Infrastructure-Foundations: Agree how to document intra-DC patch panels in Netbox - https://phabricator.wikimedia.org/T293221 (ayounsi) Maybe a nitpick, and let me know if I'm mistaken, but what we need to document are the circuits/links/x-connects usage between the two cages (as we have a limited...
[12:38:33] and btw, although puppetdb docs say "Arrays match if any one of their elements matches.", it doesn't seem to be true IRL
[12:38:39] or I'm misreading the docs
[12:40:00] I didn't read the docs but was kinda hoping what I tried worked out of the box / as expected
[12:40:05] narrator: it did not
[12:40:18] on the puppetdb side that is
[12:40:22] with a puppetdb query of ["=", ["parameter", "role_contacts"], ["Observability SREs"]] (note the square brackets around "Observability SREs") it does match 69 hosts, those that have only o11y as contact
[12:40:46] /o
[12:40:52] /o\ so subtly wrong
[12:42:15] the 2 netmon hosts have also IF SRE
[12:42:17] and those are not matched
[13:21:10] godog: fwiw I've updated the paste with a way to get the same data via puppetdb api + jq only
[13:21:22] not ideal, but better than ssh-ing everywhere :)
[13:22:50] volans: thank you, yeah that's far better for sure, I'll bookmark the phaste/commandline
[13:23:12] I'm always a little scared of cumin '*'
[13:23:20] for good reasons :)
[13:23:42] it's sad that the '=' doesn't work for arrays like the docs seem to say it should
[13:25:44] volans: I think there is an "in" operator in the mini-language
[13:26:38] https://puppet.com/docs/puppetdb/5.2/api/query/tutorial.html the bottom section
[13:27:08] cdanis: yes, I've tried
[13:27:10] and failed
[13:27:12] ahah
[13:27:16] fair enough
[13:27:19] it can totally be me
[13:27:35] there is also the ~> operator but only for paths AFAIUI
[13:29:33] consistent (hah!) with "there's more than one way to do it"
[13:33:58] there's always more than one wrong way to do it... the problem is to find the right one usually :D
[16:07:30] SRE-tools, Infrastructure-Foundations, Pybal, SRE, and 2 others: Applications and scripts need to be able to understand the pooled status of servers in our load balancers. - https://phabricator.wikimedia.org/T239392 (Lydia_Pintscher)
[16:51:39] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: drmrs: initial geodns configuration - https://phabricator.wikimedia.org/T304089 (BBlack) For esams failover testing: we're planning to attempt this on Thursday. The idea is to merge the outstanding patches and then depool esa...
[17:02:25] Having trouble connecting to the DRAC SSH interface on the majority of wqds hosts in CODFW. Mostly getting permission denied, even when I disable key-based auth...any suggestions?
[17:04:32] One of them has the kex error documented at https://wikitech.wikimedia.org/wiki/Management_Interfaces but most don't
[17:04:37] wfm, I took wdqs2007 as a test
[17:05:04] above you said 'wqds', is that a typo only on IRC?
[17:05:21] sorry, it's wdqs...and wdqs2007 is the only one that seems to work ;P
[17:05:31] give me one that doesn't
[17:05:37] wdqs2001
[17:05:53] I'm in
[17:05:59] /admin1->
[17:06:15] did you login as root?
[17:06:28] yes, ofc, root@wdqs2001.mgmt.codfw.wmnet
[17:06:33] I'd say check your ssh config
[17:07:02] will do, which bastion are you using?
[17:07:16] the local one, so for codfw the 2* one
[17:07:23] 2002
[17:08:28] interesting, that's what I'm using too. ryankemper and I both ran into this one yesterday
[17:09:27] have you seen if ssh -vvv gives you some hint?
[17:09:33] ah, I can get a pw prompt from the bastion itself
[17:09:49] yeah, -vvv wasn't too helpful
[17:10:11] https://www.irccloud.com/pastebin/UwMvpJPW/
[17:10:40] Yeah I get permission denied even though I see it selecting my prod key in the `-vvv`
[17:10:45] sorry, I have to step away abruptly... will try to be back later
[17:10:53] no worries
[17:11:00] #wikimedia-dcops might be a place to ask too if no one else is around here
[17:11:13] can you ssh into the bastion themselves?
[17:11:34] yeah, I can get into the bastion, and from there I do get a password prompt from the mgmt interface
[17:13:15] Oh interesting if I manually ssh into the bastion and hop over to the mgmt host from there then it works
[17:15:06] yeah, I think there are implicit options on the bastion host that override the SSH flags I'm setting on my laptop
[17:15:27] (just a theory)
[17:17:48] full output of my ssh -vvv at https://phabricator.wikimedia.org/P23863
[17:25:57] maybe because wdqs2007 was reimaged more recently? https://phabricator.wikimedia.org/T281437 . Anyway, not a huge deal if we can get to it via bastion
[17:58:05] that's weird, hard to say without looking at the ssh config too