[08:00:26] o/ going to deploy article-country
[08:01:20] staging looks good ...
[08:07:13] article-country is up and running in prod: https://phabricator.wikimedia.org/P72170
[08:28:56] good morning o/
[08:40:47] kevinbazira: so the above deployment has the code but not the config required to produce the stream, right?
[08:41:21] just asking to make sure I understand the change, I know we have the patches for changeprop pending
[08:42:58] isaranto: o/ the deployed model-server can process inputs from the source event stream and send prediction results to the output event stream
[08:43:56] ok, thanks!
[08:44:47] np! :)
[09:24:07] Lift-Wing, Machine-Learning-Team: Add locust load tests for articletopic-outlink - https://phabricator.wikimedia.org/T384276 (gkyziridis) NEW
[10:11:23] Machine-Learning-Team, LDAP-Access-Requests, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Georgios Kyziridis - https://phabricator.wikimedia.org/T384239#10478744 (isarantopoulos)
[10:11:53] Lift-Wing, Machine-Learning-Team: Add locust load tests for articletopic-outlink - https://phabricator.wikimedia.org/T384276#10478745 (isarantopoulos)
[10:12:28] (PS1) Gkyziridis: locust; add articletopic_outlink tests. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1113093 (https://phabricator.wikimedia.org/T384276)
[10:15:31] (CR) Gkyziridis: "I tried to test it locally but I couldn't since my machine cannot send API calls to liftwing staging." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1113093 (https://phabricator.wikimedia.org/T384276) (owner: Gkyziridis)
[10:27:35] (CR) Kevin Bazira: [C:+1] "I have tested this patch on `stat1008.eqiad.wmnet` and here are the results: https://phabricator.wikimedia.org/P72178" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1113093 (https://phabricator.wikimedia.org/T384276) (owner: Gkyziridis)
[10:51:54] (PS1) Ilias Sarantopoulos: logo-detection: update kserve to 0.14.1 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1113099 (https://phabricator.wikimedia.org/T367048)
[11:16:39] regarding --^ we were just testing sth with George on gerrit and decided to do a kserve upgrade instead of a dummy change
[11:17:31] ack!
[11:18:05] klausman: o/ whenever you get a minute please help deploy: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1112126
[11:18:05] not sure I have the rights to deploy `changeprop`. thanks!
[11:20:25] * isaranto lunch!
[11:37:02] Machine-Learning-Team, LDAP-Access-Requests, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Georgios Kyziridis - https://phabricator.wikimedia.org/T384239#10479063 (jcrespo) Indeed, that's documented at...
[11:45:08] Machine-Learning-Team, LDAP-Access-Requests, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Georgios Kyziridis - https://phabricator.wikimedia.org/T384239#10479077 (jcrespo)
[11:47:57] kevinbazira: I'll see what I can do. currently running at about 25% brainpower (see Slack)
[11:48:56] actually, let me ask Hugh to do it. I better not mess with production in my current brainstate
[11:49:55] klausman: hope you get well soon. the deployment can wait. np!
[11:50:42] I poked Hugh about it. Seeing as he has already +1'd it, he might be able to help with deployment
[11:53:26] great. thanks! 🙏
[11:56:06] Machine-Learning-Team, LDAP-Access-Requests, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Georgios Kyziridis - https://phabricator.wikimedia.org/T384239#10479118 (jcrespo)
[11:56:52] Machine-Learning-Team, LDAP-Access-Requests, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Georgios Kyziridis - https://phabricator.wikimedia.org/T384239#10479119 (jcrespo) I will be adding now the LDAP...
[12:00:05] Machine-Learning-Team, LDAP-Access-Requests, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Georgios Kyziridis - https://phabricator.wikimedia.org/T384239#10479138 (jcrespo) >>! In T384239#10478122, @thc...
[12:08:58] Machine-Learning-Team, LDAP-Access-Requests, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Georgios Kyziridis - https://phabricator.wikimedia.org/T384239#10479189 (jcrespo) WMF LDA group added: https://...
[12:28:36] kevinbazira: the push has been done
[12:29:18] kevinbazira: thanks!
[12:31:31] Lift-Wing, Machine-Learning-Team: [onboarding] Update revertrisk to kserve 0.14.1 - https://phabricator.wikimedia.org/T383119#10479232 (gkyziridis) I reached out to MunizaA via Slack; she did not have time to work on this issue. She will get back to me whenever she has time.
[12:33:09] Lift-Wing, Machine-Learning-Team: [onboarding] Update articletopic outlink to kserve 0.14.1 - https://phabricator.wikimedia.org/T383312#10479236 (gkyziridis) These are the `wrk` load results: ` $ wrk -c 1 -t 1 --timeout 3s -s outlink.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/outlink-...
[12:58:32] Machine-Learning-Team, LDAP-Access-Requests, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Georgios Kyziridis - https://phabricator.wikimedia.org/T384239#10479280 (jcrespo)
[13:15:38] Machine-Learning-Team: Expose reference quality isvc on API gateway - https://phabricator.wikimedia.org/T378495#10479404 (achou) API docs are updated: https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_reference_risk_prediction https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Reference_risk_sc...
[13:44:24] (PS2) Gkyziridis: locust: add articletopic_outlink tests. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1113093 (https://phabricator.wikimedia.org/T384276)
[13:48:07] (PS1) Gkyziridis: testing git issue [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1113140
[13:49:50] (PS1) Gkyziridis: testing second time git issue [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1113142
[13:50:17] (Abandoned) Gkyziridis: testing second time git issue [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1113142 (owner: Gkyziridis)
[13:50:29] (Abandoned) Gkyziridis: testing git issue [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1113140 (owner: Gkyziridis)
[13:51:53] Machine-Learning-Team, LDAP-Access-Requests, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Georgios Kyziridis - https://phabricator.wikimedia.org/T384239#10479562 (jcrespo) a:jc...
[14:00:59] Lift-Wing, Machine-Learning-Team: Create SLO dashboards for reference quality models - https://phabricator.wikimedia.org/T384316 (isarantopoulos) NEW
[14:02:52] (PS3) Gkyziridis: locust: add articletopic_outlink tests. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1113093 (https://phabricator.wikimedia.org/T384276)
[14:21:47] good morning all
[14:42:36] hi Chris o/
[14:43:31] Machine-Learning-Team, LDAP-Access-Requests, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Georgios Kyziridis - https://phabricator.wikimedia.org/T384239#10479886 (jcrespo) Open→Resolved Acc...
[14:45:55] (CR) Ilias Sarantopoulos: "Before we merge we should also have the stats.csv first from running the load test on statbox" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1113093 (https://phabricator.wikimedia.org/T384276) (owner: Gkyziridis)
[14:46:16] georgekyz: can you test ssh access now?
[14:50:41] isaranto: still not working
[14:50:41] ```
[14:50:41] ssh deployment.eqiad.wmnet
[14:50:41] gkyziridis@bast3007.wikimedia.org: Permission denied (publickey).
[14:50:41] Connection closed by UNKNOWN port 65535
[14:50:41] ```
[14:51:17] ok let's wait as it needs some time
[14:51:29] o/ forcing puppet on the bastion3007
[14:51:58] georgekyz: can you confirm with ssh -vvv etc. that it is trying to use the right key?
[14:52:20] just to remove any misconfig concerns
[14:52:29] https://www.irccloud.com/pastebin/oeAPcNt7/
[14:54:03] Notice: /Stage[main]/Admin/Admin::Hashuser[gkyziridis]/Admin::User[gkyziridis]/User[gkyziridis]/ensure: created
[14:54:19] okok so puppet ran on bast3007
[14:54:38] this is my config file:
[14:54:38] ```
[14:54:38] # Production access
[14:54:38] # https://wikitech.wikimedia.org/wiki/SRE/Production_access#Setting_up_your_access
[14:54:52] also running it on deploy2002 so we verify that it works
[14:56:59] (it is taking a bit)
[14:57:40] we will be in a meeting for the next hour so no rush
[14:58:06] can you retry now before the meeting?
[14:58:22] if you join late you can blame me
[14:58:32] (blame Luca, they'll know)
[14:59:02] (PS4) Gkyziridis: locust: add articletopic_outlink tests. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1113093 (https://phabricator.wikimedia.org/T384276)
[15:01:34] still not working:
[15:01:34] ```
[15:01:34] ssh -vvv deployment.eqiad.wmnet
[15:01:34] OpenSSH_9.8p1, LibreSSL 3.3.6
[15:01:34] debug1: Reading configuration data /Users/george/.ssh/config
[15:06:05] mmm weird, I'll need the full output
[15:13:21] Lift-Wing, Machine-Learning-Team: Create SLO dashboards for reference quality models - https://phabricator.wikimedia.org/T384316#10480061 (isarantopoulos) p:Triage→Medium
[15:20:35] Machine-Learning-Team: Issues with Reference Need and Reference Risk models - https://phabricator.wikimedia.org/T384172#10480112 (isarantopoulos) a:achou
[15:32:17] Lift-Wing, Machine-Learning-Team: Requesting write access to ml-serve-{eqiad,codfq} for ML team - https://phabricator.wikimedia.org/T381883#10480179 (isarantopoulos) p:Triage→Medium
[15:32:56] Machine-Learning-Team: Issues with Reference Need and Reference Risk models - https://phabricator.wikimedia.org/T384172#10480191 (isarantopoulos) p:Triage→High
[15:34:05] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Build and Publish ROCm-Compatible Python Packages - https://phabricator.wikimedia.org/T381859#10480198 (isarantopoulos) a:isarantopoulos
[15:35:29] Lift-Wing, Machine-Learning-Team, OKR-Work, Patch-For-Review: Create event stream for article-country model-server hosted on LiftWing - https://phabricator.wikimedia.org/T382295#10480209 (isarantopoulos)
[15:36:01] Lift-Wing, Machine-Learning-Team, OKR-Work, Patch-For-Review: Create event stream for article-country model-server hosted on LiftWing - https://phabricator.wikimedia.org/T382295#10480212 (isarantopoulos) p:Triage→Medium
[15:38:38] Lift-Wing, Machine-Learning-Team: [onboarding] Update articletopic outlink to kserve 0.14.1 - https://phabricator.wikimedia.org/T383312#10480229 (isarantopoulos) Open→Resolved
[15:39:13] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Add locust load tests for articletopic-outlink - https://phabricator.wikimedia.org/T384276#10480234 (isarantopoulos) p:Triage→Medium
[15:46:54] Machine-Learning-Team: [LLM] ML-lab benchmarking - https://phabricator.wikimedia.org/T382343#10480264 (isarantopoulos) p:Triage→Medium
[15:49:42] Machine-Learning-Team: Expose reference quality isvc on API gateway - https://phabricator.wikimedia.org/T378495#10480299 (isarantopoulos) Open→Resolved
[15:51:18] Lift-Wing, Machine-Learning-Team, OKR-Work: Request to host article-country model on Lift Wing - https://phabricator.wikimedia.org/T371897#10480312 (isarantopoulos) Open→Resolved
[17:14:55] elukey: thank you for the help <3
[17:15:33] I think George has access now
[17:15:52] super :)
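The key check suggested during the SSH troubleshooting (running `ssh -vvv` to confirm that the right key is being offered) can be sketched as a small Python helper. This is a hypothetical illustration, not tooling used in the channel: `summarize_ssh_debug` and `KeyReport` are made-up names, and the patterns assume OpenSSH's usual `debug1: Offering public key:` and `debug1: Server accepts key:` lines, whose exact wording may vary between OpenSSH versions.

```python
import re
from typing import List, NamedTuple

# Assumed OpenSSH debug wording; adjust if your client version differs.
OFFER_RE = re.compile(r"^debug1: Offering public key: (\S+)", re.MULTILINE)
ACCEPT_RE = re.compile(r"^debug1: Server accepts key: (\S+)", re.MULTILINE)


class KeyReport(NamedTuple):
    # First token after each prefix, usually the key file path
    # (for agent-only keys OpenSSH may print the key comment instead).
    offered: List[str]   # keys the client tried, in order
    accepted: List[str]  # keys a server accepted


def summarize_ssh_debug(output: str) -> KeyReport:
    """Extract offered and accepted keys from `ssh -vvv` stderr text."""
    return KeyReport(
        offered=OFFER_RE.findall(output),
        accepted=ACCEPT_RE.findall(output),
    )
```

Feed it the stderr of `ssh -vvv deployment.eqiad.wmnet`. If `offered` is empty or lists a key other than the one registered for production access, the client config is the likely culprit; if the expected key is offered but never accepted, the problem is more likely on the server side (for example, the account not yet being provisioned on the bastion).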