[08:48:23] o/
[08:51:19] the patch for json support in httpbb got merged. I'll try to deploy it if I have permissions according to https://wikitech.wikimedia.org/wiki/Httpbb
[09:08:48] woowwww
[09:08:52] nice job isaranto <3
[09:09:33] isaranto: mmm I see the old version on deploy1002 though
[09:09:42] not sure if Reuven created the new deb
[09:10:02] yeah still not on the apt repo
[09:10:21] I think that we need to wait for the new version to be cut
[09:16:19] yes! I'll try to deploy the new version
[09:17:17] w8... the instructions on wikitech don't mention anything about a new repo
[09:17:30] isaranto: I think that you can't, an SRE needs to build and upload the new .deb to apt.wikimedia.org
[09:17:38] and then install it on deploy1002
[09:18:05] ok then! I'll wait to ask Lazarus how to deploy it then. In the meantime I'm opening the patch with the tests, but I'll wait so that we can test them before it is merged
[09:18:10] see https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/httpbb/+/refs/heads/master/debian/changelog
[09:20:21] isaranto: maybe to test it you can checkout the repo on stat100x and test it from there with a venv?
[09:20:25] to unblock you
[09:21:41] sure!
[09:21:44] thanks
[09:24:16] here is the patch https://gerrit.wikimedia.org/r/c/operations/puppet/+/885990
[09:28:56] (PS13) Ilias Sarantopoulos: test: liftwing manual testing on deployment server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787)
[09:30:19] I also added the script that creates these yaml files for httpbb in the above patch --^ (the production one is above 600 lines)
[09:57:19] Machine-Learning-Team: Add basic explainability to WikiGPT - https://phabricator.wikimedia.org/T328638 (kevinbazira)
[09:57:25] super, I'll review everything in a bit
[10:17:13] seems like I can't run pip install as the connection is blocked from the stat box (even if I disable ssl)..
[10:17:31] starting the revscoring deployment with articlequality on eqiad 🤞
[10:23:06] isaranto: you need https://wikitech.wikimedia.org/wiki/HTTP_proxy
[10:24:03] good to know! thanks for providing all the answers Luca!
[10:24:04] <3
[10:24:49] <3
[10:25:05] isaranto: I'd need some brain bounce with you about how yaml is nice
[10:25:07] do you have a min?
[10:25:13] (fine even for this afternoon)
[10:25:27] worked like a charm
[10:26:06] I am available. give me 3-4'
[10:27:38] on IRC is fine I am in a coworking :)
[10:27:52] I am discussing this with serviceops as well, basically https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/885991
[10:28:05] I realized that we'd need more flexibility for the changeprop's config
[10:28:19] in the mediawiki.page.change schema, the wiki name is under meta.domain
[10:28:32] meanwhile it is "database" in mediawiki.revision-create
[10:28:45] so to keep everything flexible, I used toYaml
[10:28:51] but it doesn't preserve quoting :(
[10:29:04] https://github.com/helm/helm/issues/4262
[10:29:42] lemme check to understand
[10:43:15] if you check the helm diff, it is a no-op (mostly) except for
[10:43:16] 10:52:02 - database: '/.*/'
[10:43:16] 10:52:02 + database: /.*/
[10:43:27] that in theory should be ok for our use case
[10:43:28] \o
[10:43:31] o/
[10:43:37] but I'd have preferred to have quotes
[10:43:38] Ah, YAML and quoting is always.... "nice"
[10:45:08] elukey: also, your comments re: shoulders were spot on. I've been doing very mild rotation exercises and it helps a lot.
[10:45:38] klausman: nice! try also to stretch veeery gently your forearms if it feels ok
[10:45:43] it helps a ton to me
[10:45:48] aye, will do
[11:10:24] I have updated https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/885991 with specific ranges isaranto
[11:10:33] I think it is safer for regexes
[11:10:42] not super nice to see but safe
[11:10:45] *safer
[11:13:37] Machine-Learning-Team, Patch-For-Review: Investigate if the mediawiki.revision-score stream can be broken down into multiple ones with ChangeProp - https://phabricator.wikimedia.org/T327302 (elukey) While thinking about the documentation I realized that restricting the regex to `database` is not good for...
[11:16:31] ok. sry didn't respond, was trying to render the template with helmfile but couldn't (cause of this error `panic: unexpected error: exec: "helmfile_log_sal": executable file not found in $PATH`)
[11:16:40] anyway if this works then it is ok with me!
[11:17:53] thanks a lot! Sorry for the extra hassle of yaml :(
[11:18:07] elukey: perhaps we could use squote (signle quote) function instead of quote for consistency
[11:18:16] *single
[11:18:36] isaranto: ah yes yes, should be fine with double though!
[11:19:06] def it will - just me nitpicking
[11:19:11] <3
[11:19:16] ok testing in staging
[11:26:40] all works :)
[11:30:11] articlequality played great!!!
[11:30:37] got back prediction results for all servers 😄
[11:32:16] \o/
[11:34:48] it is soooo tempting to deploy them ALL at once....
[11:34:58] but I will hold myself
[11:41:48] yeah please :D
[11:42:00] I started https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing#Streams, lemme know folks if it is a good place
[11:42:19] I'll improve it and ask the Research team to proof-read it
[11:42:46] Diego suggested a couple of weeks ago to have a task with a template, I'll try to add one as well.
[11:43:22] going afk for lunch, bbl!
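Note on the changeprop matching discussed above: the wiki identifier sits under meta.domain in mediawiki.page.change and under database in mediawiki.revision-create, so a rule has to be able to point at either field. The following is only a minimal Python sketch of that idea, not changeprop's actual matcher. The names WIKI_FIELD, get_field and matches, the sample events, and the choice of a full regex match are all assumptions made for illustration; only the two field names come from the chat.

```python
import re

# Hypothetical mapping from schema name to the dotted path holding the wiki.
# Only the two field names are taken from the discussion above.
WIKI_FIELD = {
    "mediawiki.page.change": "meta.domain",
    "mediawiki.revision-create": "database",
}


def get_field(event, dotted_path):
    """Walk a nested dict along a dotted path; return None if a key is missing."""
    current = event
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(event, schema, pattern):
    """Return True when the wiki field for this schema fully matches the regex."""
    value = get_field(event, WIKI_FIELD[schema])
    return isinstance(value, str) and re.fullmatch(pattern, value) is not None


# Made-up sample events, not real stream payloads.
page_change = {"meta": {"domain": "en.wikipedia.org"}}
revision_create = {"database": "enwiki"}

print(matches(page_change, "mediawiki.page.change", r".*"))
print(matches(revision_create, "mediawiki.revision-create", r"[a-z]+wiki"))
```

Keeping the field path in configuration rather than in code is what makes the same rule shape reusable across both schemas, which is the flexibility the toYaml change was after.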
[12:26:16] * klausman lunch, too
[13:19:40] (PS14) Ilias Sarantopoulos: test: liftwing manual testing on deployment server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787)
[13:29:50] (PS15) Ilias Sarantopoulos: test: liftwing manual testing on deployment server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787)
[13:30:51] all revscoring model servers have been successfully deployed to python 3.9 + debian bullseye + kserve 0.9 🎉 🎉 🎉
[13:31:35] I tested them with the script in the above patch
[13:32:27] where I tweaked it a bit - now it works for all wikiz (including wikiquotes, wikibooks, wikidata) + I added some more logging
[13:36:47] (PS16) Ilias Sarantopoulos: test: liftwing manual testing on deployment server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787)
[13:53:53] woohoo \o/
[13:57:56] isaranto: nice job!!
[13:59:53] teamwork 🎼
[14:00:00] 💪
[14:04:57] isaranto: one nit for the test liftwing code review - do you mind adding a README or similar explaining briefly what the files are for, how to use it, etc..?
[14:05:04] (just merged the puppet change)
[14:05:18] nooo
[14:05:25] I mean for the merge
[14:05:31] I wanted to test them first
[14:06:01] all cool, we'll see
[14:06:09] I'll add the readme u mentioned
[14:06:13] ah also snap I see an error that I didn't notice in puppet
[14:08:32] https://gerrit.wikimedia.org/r/c/operations/puppet/+/886050/
[14:08:41] even pcc didn't complain, weird
[14:11:39] my bad for the ores thingy
[14:12:32] the tests are wrong though. the hosts need to start with https, which is enforced by the httpbb pattern, otherwise it fails to parse
[14:13:54] right right, my bad, I haven't coordinated with you
[14:14:00] we can fix it later
[14:14:07] when the new httpbb is deployed
[14:20:12] at the moment I cannot test httpbb from stat4, I get a proxy error. trying to find what the proxy for https is
[14:20:20] is it the same?
[14:21:52] oh, if we prepend the host with https:// the server doesn't respond
[14:23:49] isaranto: try "unset https_proxy"
[14:24:11] you are trying to contact an internal endpoint with the proxy that should reach outside
[14:28:37] you're right I tried it just in case...without the proxy I get `SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate`
[14:29:39] that's definitely weird
[14:29:51] ah wait, and if you try with
[14:30:18] export REQUESTS_CA_BUNDLE=/etc/ssl/certs/wmf-ca-certificates.crt
[14:30:19] ?
[14:32:05] 🎉
[14:32:06] httpbb uses requests IIRC, and it may be using the default CA cert that doesn't recognize the inference one
[14:32:14] woorked
[14:56:30] I fixed the tests for httpbb - https://gerrit.wikimedia.org/r/c/operations/puppet/+/886063
[14:56:52] I ran them all for staging and prod and they worked fine - all assertions passed
[14:57:31] I had to tweak the script that creates them a bit - some rev_ids wouldn't return anything from mwapi. I'll write it in the readme.md I'll create
[15:00:40] nice!
[15:02:27] merged :)
[15:43:25] Machine-Learning-Team: Add basic explainability to WikiGPT - https://phabricator.wikimedia.org/T328638 (kevinbazira) An explainability section has been added to WikiGPT. When a search response is returned there is a section below it that says "How WikiGPT got this answer:" Demo: https://drive.google.com/uc...
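Note on the proxy and certificate fix from the 14:20 to 14:32 exchange above: the two shell steps (unset https_proxy, then export REQUESTS_CA_BUNDLE) can also be applied per process instead of to the whole shell session. This is a minimal Python sketch under that assumption; internal_env and PROXY_VARS are hypothetical names, and the sample environment is made up. The CA bundle path is the one quoted in the chat, and REQUESTS_CA_BUNDLE is the environment variable the requests library reads for its CA bundle.

```python
# CA bundle path quoted in the chat above.
WMF_CA_BUNDLE = "/etc/ssl/certs/wmf-ca-certificates.crt"

# Proxy variables to drop; the chat only mentions https_proxy, the other
# spellings are included here as an assumption to be thorough.
PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def internal_env(base_env, ca_bundle=WMF_CA_BUNDLE):
    """Return a copy of base_env without proxy settings and with the CA bundle set.

    The input mapping is left untouched, so the caller's own environment
    (for example one that needs the proxy for pip) is not affected.
    """
    env = {key: value for key, value in base_env.items() if key not in PROXY_VARS}
    env["REQUESTS_CA_BUNDLE"] = ca_bundle
    return env


# Made-up environment for illustration only.
sample = {"https_proxy": "http://proxy.example:8080", "PATH": "/usr/bin"}
print(internal_env(sample))
```

The returned dict can be passed as the env argument of subprocess.run when launching a tool that relies on requests, while os.environ itself keeps the proxy for outbound traffic.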
[15:48:20] * elukey taking a break
[15:49:50] (PS17) Ilias Sarantopoulos: test: liftwing manual testing on deployment server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787)
[15:51:39] (CR) Ilias Sarantopoulos: test: liftwing manual testing on deployment server (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787) (owner: Ilias Sarantopoulos)
[15:54:54] (PS18) Ilias Sarantopoulos: test: liftwing manual testing on deployment server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787)
[16:04:35] (PS19) Ilias Sarantopoulos: test: liftwing manual testing on deployment server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787)
[16:04:58] Done with the API testing (I think I am at least 😛). I added a Readme with some basic information (what they contain, how to run etc)
[16:06:04] I think the httpbb ones can be added to Jenkins at some point. let's monitor it and see how it goes
[16:09:08] Machine-Learning-Team, Patch-For-Review: [revscoring] Upgrade python from 3.7 to 3.9 in docker images - https://phabricator.wikimedia.org/T325657 (isarantopoulos) All revscoring model servers have been successfully upgraded to Python 3.9.2 and Debian Bullseye. 🎉 As part of this ticket we also solved the...
[17:13:55] changeprop deployment done, we are ready to put streams in production as well (at least on paper)
[17:14:37] Machine-Learning-Team: Investigate if the mediawiki.revision-score stream can be broken down into multiple ones with ChangeProp - https://phabricator.wikimedia.org/T327302 (elukey) Change-prop updated in production, we are now ready to have streams!
[17:30:56] all right logging off, tomorrow if everything goes as planned me and Tobias are going to upgrade the staging cluster to k8s 1.23
[17:31:16] it may not work for some days, hopefully as few as possible, but please be patient folks
[17:31:37] if you have any important test/task to do on staging please speak up and we'll reschedule
[17:33:23] (cross-posted on slack too
[17:33:27] * elukey afk!
[20:33:49] Machine-Learning-Team, Infrastructure-Foundations, SRE-tools: httpbb with HTTP POSTs and json payload - https://phabricator.wikimedia.org/T328280 (RLazarus) Open→Resolved This is deployed! Thanks again for the patch, let me know if you need anything else.