[08:34:59] pfischer: do you have any updates to add to https://etherpad.wikimedia.org/p/search-standup ?
[09:27:04] inflatador, ryankemper: we have a ticket for the W[CD]QS migration to bullseye (T343124). Could you use it to track that work instead of the epic (T323921) ?
[09:27:05] T323921: [Epic] Migrate all Search Platform servers to Debian Bullseye - https://phabricator.wikimedia.org/T323921
[09:27:05] T343124: Migrate WDQS and WCQS servers to Debian Bullseye - https://phabricator.wikimedia.org/T343124
[09:33:46] ebernhardson: not clear from the ticket, do we now use the latest release from mw-cli - T333183
[09:33:47] T333183: Migrate cindy-the-browser-test-bot to a docker based runner - https://phabricator.wikimedia.org/T333183
[09:36:44] weekly news published to https://wikitech.wikimedia.org/wiki/Search_Platform/Weekly_Updates/2023-08-11
[10:20:18] lunch
[12:25:39] dcausse: I implemented the duplicates metering but I’m not sure about the structure of the metrics array. Would you have a minute before you leave?
[12:27:23] pfischer: sure, would 4pm work for you?
[12:58:30] I've created a page for the search update pipeline, with a section for decision records. We might want to move to a sub page at some point, but at least we now have a place to record decisions!
[13:06:08] dcausse: yes, thanks!
[13:10:48] gehel Y will do
[13:14:04] re: using the correct phab ticket, that is
[13:14:33] inflatador: thanks!
[13:21:37] o/
[13:56:32] I’m trying to access our cloud VPS instance that hosts Cindy but I’m not able to SSH into it. I followed https://wikitech.wikimedia.org/wiki/Help:Accessing_Cloud_VPS_instances but I might not be “a member of a Cloud VPS project”. How can I become one?
[13:58:32] pfischer: which project?
[14:01:53] I’m trying to access cirrus-integ03.search.eqiad1.wikimedia.cloud, I don’t know the name of the project.
[14:02:32] it's "search", looking
[14:03:54] pfischer: dcausse can add you. Member means you can manage via horizon. Viewer/Reader is ssh only.
[14:04:19] You can see that on https://openstack-browser.toolforge.org/project/search
[14:04:34] RhinosF1: thanks. 👀
[14:04:45] and wikimedia.cloud vm domains are always vm-name.project.region.wikimedia.cloud
[14:05:13] the OpenStack region always being eqiad1 as the codfw one is only for WMCS testing
[14:06:53] pfischer: could you try again?
[14:08:17] I still get an error: pfischer@bastion.wmcloud.org: Permission denied (publickey).
[14:08:17] kex_exchange_identification: Connection closed by remote host
[14:08:17] Connection closed by UNKNOWN port 65535
[14:09:11] pfischer: are you using your cloud key (should be the same that you use to access gerrit)?
[14:09:27] pfischer: what is your ssh config?
[14:09:42] pfischer I'm available if you need further assistance
[14:09:52] You should have the key in your .ssh/config file, not using -i
[14:09:56] As it won't work
[14:10:08] (I found that out after banging my head on my sofa for a bit)
[14:10:40] you should have an entry with smth like "Host *.wmflabs.org gitlab.wikimedia.org gerrit.wikimedia.org *.wmflabs *.wikimedia.cloud *.wmcloud.org" in your .ssh/config that points to your gerrit ssh key
[14:11:11] You will need a bastion configured for wikimedia.cloud too
[14:11:12] you can remove wmflabs* hosts suffixes, they should no longer be used
[14:11:14] I uploaded my public key in my Wikitech preferences (https://wikitech.wikimedia.org/wiki/Help:Accessing_Cloud_VPS_instances#Permission_denied_(publickey)) and I configured ~/.ssh/config according to https://wikitech.wikimedia.org/wiki/Help:Accessing_Cloud_VPS_instances#Setup
[14:11:48] pfischer: can you paste your config in a phab paste?
[14:15:01] RhinosF1: https://phabricator.wikimedia.org/P50497
[14:15:33] pfischer: I can't see that
[14:16:07] Removed NDA tag, now?
[14:16:41] pfischer: nope, try editing the view policy
[14:17:32] pfischer: is id_wmf_sre_pro the same key you use for gerrit?
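Note on the naming convention mentioned at 14:04:45: since Cloud VPS hostnames are described as vm-name.project.region.wikimedia.cloud, the project name can be read straight off the hostname. The following is only a minimal sketch of that rule; the function and class names are hypothetical and not part of any Wikimedia tooling.

```python
from typing import NamedTuple


class CloudVpsHost(NamedTuple):
    """Parts of a Cloud VPS hostname (hypothetical helper type)."""
    vm_name: str
    project: str
    region: str


def parse_cloud_vps_fqdn(fqdn: str) -> CloudVpsHost:
    """Split a hostname of the form vm-name.project.region.wikimedia.cloud.

    Assumes exactly three labels precede the wikimedia.cloud suffix,
    as stated in the chat; anything else is rejected.
    """
    suffix = ".wikimedia.cloud"
    if not fqdn.endswith(suffix):
        raise ValueError(f"not a wikimedia.cloud hostname: {fqdn!r}")
    labels = fqdn[: -len(suffix)].split(".")
    if len(labels) != 3 or not all(labels):
        raise ValueError(f"expected vm-name.project.region before the suffix: {fqdn!r}")
    return CloudVpsHost(*labels)


host = parse_cloud_vps_fqdn("cirrus-integ03.search.eqiad1.wikimedia.cloud")
print(host.project)  # the project the instance belongs to
```

For the instance discussed above this yields the project "search" and the region "eqiad1", matching what was said in the channel.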
[14:18:27] https://phabricator.wikimedia.org/P50498
[14:19:10] dcausse: no
[14:19:38] pfischer: this should be the same, lemme paste my config
[14:20:14] pfischer: your key for WMCS must be different to prod
[14:20:41] Heads-up: I'm reimaging wdqs2008 and 2009
[14:21:40] RhinosF1: I uploaded my dev key (same as for gitlab/gerrit) to Special:Preferences and referenced it as IdentityFile. Still same error.
[14:22:19] pfischer: P50499 (note the two keys ~/.ssh/id_rsa_prod vs ~/.ssh/id_lab)
[14:24:11] dcausse: Thanks! It worked now. Maybe some kind of delay after uploading my dev key. Now I’m able to log in. RhinosF1: Thank you, too!
[14:24:38] it should be instant
[14:24:40] but maybe
[14:26:11] Hm, I had both keys uploaded and it only worked after I deleted the prod one.
[14:26:56] pfischer: did you have your prod key uploaded to wikitech?
[14:27:32] Yeah, I overlooked the note saying not to do that, sorry for that.
[14:27:50] pfischer: please change it
[14:28:44] Alright, I’ll create a request.
[14:28:50] pfischer: filing task now
[14:29:25] https://phabricator.wikimedia.org/T344059
[14:30:15] Just so I know better: Why is the public key problematic?
[14:32:13] pfischer: i assume security reasons as WMCS is not a trusted system. It is something you agreed to when signing L3
[14:33:11] according to policy it should be treated as compromised if it's used anywhere outside of WMF Prod and replaced or removed ASAP
[14:33:43] Alright, thank you. Sorry for the inconvenience.
[14:34:13] pfischer: please can you paste a new key on task asap and we can get someone from ops to change it
[14:34:26] or say so we can have your access suspended until you can
[14:34:32] RhinosF1 I can do it if you walk me thru it
[14:34:38] Sure, one sec.
[14:34:46] I assume it's just a puppet change
[14:35:30] inflatador: yes, update on data.yaml
[14:35:57] if the key is on task, i can do the patch to merge
[14:37:43] and you just +2
[14:39:00] It’s on the task
[14:39:35] pfischer: do you still want to meet regarding commons dups metrics?
[14:39:47] inflatador: please +2 https://gerrit.wikimedia.org/r/c/operations/puppet/+/948111
[14:40:12] and obviously merge / puppet-merge
[14:41:09] RhinosF1 on it. As far as the WMCS key, do I need to help with that?
[14:41:31] inflatador: nope, only prod needs replacing
[14:41:38] and wmcs is self service anyway
[14:41:59] gehel: yes adam released a new version of mwcli, and our patch is included in the release notes. So the latest version should work
[14:42:39] RhinosF1 OK, change is puppet-merged. pfischer if you're about to login to any hosts let me know. Puppet runs every 15m but I can run it manually if need be
[14:43:21] pfischer: please test access too to make sure it's still working
[14:49:15] Rhinos: pfischer@bast3006.wikimedia.org: Permission denied (publickey,keyboard-interactive).
[14:53:04] pfischer: can you try again in like 20 minutes (or make dcausse run puppet on bast3006 and a server you will be sshing into)
[14:53:24] pfischer: actually, try another bast
[14:53:32] I don't have super powers :P
[14:53:36] not in esams
[14:53:45] i meant inflatador
[14:54:34] bast3006/7 are broken for the esams -> knams move per -sre
[14:54:53] 08:49:36 fyi, we're going to depool esams in 10-ish minutes, that will be the final depool before the migration
[14:54:53] 08:51:23 please don't forget to not use bast3006/3007 anymore, next working one will be 3001 or 3008 I guess
[14:56:39] thanks
[14:58:24] RhinosF1: hostnames bast3001… and bast3008… cannot be resolved
[14:58:43] pfischer: next working one, please use a different DC for the next DC
[14:58:51] drmrs might be your closest
[14:59:02] Esams is gone for the next week
[14:59:37] 6002 might work
[15:00:10] yes, 6002 for europe
[15:06:01] RhinosF1: Still no luck with bast6002, error remains the same: pfischer@bast6002.wikimedia.org: Permission denied (publickey,keyboard-interactive).
[15:06:40] pfischer: should be fully deployed within the next 5 minutes
[15:06:50] so wait another 5
[15:06:56] Okay, I’ll wait.
[15:07:11] if not then i will find an sre to help
[15:07:44] RhinosF1: Thank you for dealing with that so quickly!
[15:07:58] ebernhardson: I was expecting that we use mwcli to run Cindy, and that we were running a locally patched version waiting for the change to be merged in mwcli. And so I was expecting a CR to change that now that there is a new mwcli version.
[15:11:28] pfischer: try now?
[15:15:46] RhinosF1: Done, commented on the task.
[15:16:12] pfischer: \o/
[15:16:29] thank you for co-operating
[15:16:36] and inflatador thank you for pushing buttons
[15:16:53] 'Tis my greatest skill ;P
[15:20:18] got to go
[15:20:50] pfischer: I commented as much as I can on your patch regarding metrics but please check with Erik if all this makes sense
[15:21:28] have a nice weekend
[15:27:27] Thank you dcausse: You too!
[15:40:37] time to start the weekend!
[15:48:02] lucky you ;)
[15:48:08] workout, back in ~40
[16:32:07] back
[16:41:40] pfischer no hurry, but https://phabricator.wikimedia.org/T341625 has a few more questions, LMK if I can help answer
[17:27:07] curious, the wikidata ttl imports were world readable prior to the july 23 import
[17:27:21] but the last few have all been only user and group readable
[17:56:14] something's still stopping the wdqs updater from coming up automatically after data xfer...hmm
[18:12:55] :S
[18:13:47] i can poke it a bit if you need
[18:34:33] I got it, everything looks good except wdqs2008 isn't showing up on the lag dashboard
[18:34:46] heading to lunch, but will take a look when I get back
[19:11:30] I feel like there is something I'm not understanding about hdfs permissions. core-site.xml sets fs.permissions.umask-mode to 027. That comes from profile::hadoop::common in puppet and hasn't changed in years
[19:11:50] but many of the datasets we create in hadoop clearly have world-readable permissions set. But for some reason the ttl imports do not
[19:12:15] i tested and i can set an explicit umask in spark, so we can get what we need. But it leaves the question, why aren't our permissions consistent?
[19:26:45] back
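Note on the umask question raised at 19:11:30: a umask clears the permission bits it names from whatever mode the writer requests. The sketch below only illustrates that arithmetic. It assumes the usual request defaults of 666 for files and 777 for directories, and the helper names are hypothetical. It does not explain why some datasets end up world-readable, which the chat leaves open.

```python
def apply_umask(requested_mode: int, umask: int) -> int:
    """Clear every permission bit that is set in the umask."""
    return requested_mode & ~umask & 0o777


def is_world_readable(mode: int) -> bool:
    """True if the 'other' read bit is set."""
    return bool(mode & 0o004)


FILE_DEFAULT = 0o666  # assumed default request for new files
DIR_DEFAULT = 0o777   # assumed default request for new directories

for umask in (0o027, 0o022):
    file_mode = apply_umask(FILE_DEFAULT, umask)
    dir_mode = apply_umask(DIR_DEFAULT, umask)
    print(
        f"umask {umask:03o}: files {file_mode:03o}, dirs {dir_mode:03o}, "
        f"world-readable files: {is_world_readable(file_mode)}"
    )
```

Under these assumptions a umask of 027 gives files mode 640 (user and group readable only), which matches the recent ttl imports, while a umask of 022 would give 644 (world-readable). That suggests the world-readable datasets are written with a different effective umask or have their permissions set explicitly, though the log does not confirm either.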