[07:38:52] Morning!
[07:39:29] Machine-Learning-Team, Data-Engineering-Planning, Research, Patch-For-Review: Proposal: deprecate the mediawiki.revision-score stream in favour of more streams like mediawiki-revision-score- - https://phabricator.wikimedia.org/T317768 (elukey) Hi folks! Getting back to this task so everybo...
[07:39:57] o/
[07:54:44] so I checked httpbb and it works in staging with the new TLS cert config (it was showing up the same issue as nodejs..)
[07:55:05] but when dealing with POST data, it seems that it accepts only string: string combinations
[07:55:08] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/httpbb/+/refs/heads/master/httpbb/parse.py#58
[07:55:18] so if we put rev_id: 12356 it doesn't work :(
[07:55:29] (and we return 400 if rev-id is a string)
[08:14:52] (will try to see if I can add the use case to it and send a patch to SREs)
[08:18:09] nice! could u send me the httpb config so I can check?
[08:18:36] for now I think the python script plays nice cause we can also add additional custom cases
[08:18:55] I'll upload a draft patch/PR a bit later
[08:19:06] yeah but I'd prefer us to use standard SRE tools (deployed everywhere, supported, etc..)
[08:19:16] I would prefer not to maintain another tool
[08:20:18] isaranto: https://phabricator.wikimedia.org/P43430 this is the config
[08:20:55] agree
[08:21:08] on the other hand it is not a tool 😛
[08:21:29] just kidding lets go with httpb
[08:21:37] shall I work on submitting a patch ?
[08:23:44] well it always start as a script and then it becomes a big thing :)
[08:23:50] I've seen it happening multiple times
[08:23:59] ofc
[08:24:24] "this will be just a small thingy"= famous last words
[08:24:24] I am already working on it, I should be able to send a patch later on (need to run errand now for an hour, but I hope to have it ready today)
[08:24:36] for this round of tests if you have something ready to go let's use it
[08:24:45] I was just saying long term
[08:24:54] if you have code already working we can definitely use it
[08:25:10] in any case I will upload it in the repo to keep it as a reference
[08:27:40] * elukey back in an hour
[08:51:52] Machine-Learning-Team, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Marostegui)
[08:52:24] Machine-Learning-Team, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[09:15:38] elukey: o/ re T317768 did you create new schemas for the new topics?
[09:16:31] dcausse: they use the same schema as defined in this patch https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/884155
[09:16:51] since it a separation of the stream, the schema is exactly the same
[09:17:01] isaranto: thanks!
[09:17:39] Machine-Learning-Team, Data-Engineering-Planning, Research, Patch-For-Review: Proposal: deprecate the mediawiki.revision-score stream in favour of more streams like mediawiki-revision-score- - https://phabricator.wikimedia.org/T317768 (dcausse) @elukey thanks for the ping! If the plan to s...
[09:48:50] Machine-Learning-Team, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Jelto)
[09:54:36] Machine-Learning-Team, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Jelto)
[09:56:08] dcausse: exactly yes, as Ilias mentioned!
[09:56:21] thanks! :)
[09:56:38] ahhh so you use the hive tables!
[09:56:56] I think it should be fine, we'll have more specialized tables as well
[09:57:15] dcausse: but you don't need multiple model scores for the same rev-id right?
[09:58:21] elukey: not yet, would this be an issue?
[09:58:52] isaranto: about https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/884155 - EventGate's streams are configured in mediawiki's config, I think it was a convenient central point when DE decided where to put it
[09:58:57] (I was puzzled at first)
[09:59:40] got it!
[09:59:41] this becomes available from https://meta.wikimedia.org/w/api.php?action=streamconfigs&all_settings in the end
[10:00:02] dcausse: nono not at all, the only thing that will differ in the future for you is that if you want a score from multiple models for the same rev-id you'll need to query more than one tables. At the moment in revision score ORES returns all the scores for a single rev-id
[10:00:59] elukey: oh I see, no for now it's only articletopics and drafttopics but we do a separate pass on those
[10:01:58] Machine-Learning-Team, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (fgiunchedi)
[10:04:50] Machine-Learning-Team, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (MatthewVernon)
[10:07:22] dcausse: perfect!
[10:13:09] Machine-Learning-Team, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (elukey)
[10:16:19] Machine-Learning-Team, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (elukey)
[10:18:47] Machine-Learning-Team, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (fgiunchedi)
[10:30:00] Machine-Learning-Team, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (fgiunchedi)
[10:30:44] Machine-Learning-Team, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (fgiunchedi)
[10:33:08] Machine-Learning-Team, DBA, Data-Persistence, Discovery-Search, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (MatthewVernon)
[11:13:31] Machine-Learning-Team, Infrastructure-Foundations, SRE-tools: httpbb doesn't support integers in the POST's body - https://phabricator.wikimedia.org/T328120 (elukey)
[11:16:32] ok created https://gerrit.wikimedia.org/r/c/operations/software/httpbb/+/884285 for httpbb
[11:16:37] in theory it should work
[11:17:53] looks nice!
[11:18:27] httpbb looks nice as well!
[11:32:16] it seems I don't offer much of a review other than LGTM but your patches seem solid!
[11:34:10] thanks for the review! We'll see what Reuven says, hopefully a new version will be cut soon
[11:39:43] * elukey lunch!
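(Editor's note on T328120.) The problem raised earlier in the log is that a test definition limited to string: string pairs sends rev_id as a JSON string, while the model servers answer 400 unless rev_id is a number. The sketch below only illustrates that type mismatch. The function `validate_body` is a hypothetical stand-in for the server-side check; it is not httpbb code and not the actual Lift Wing validation.

```python
import json


def validate_body(raw: str) -> int:
    """Hypothetical input check: return the HTTP status a server might send."""
    payload = json.loads(raw)
    rev_id = payload.get("rev_id")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(rev_id, int) and not isinstance(rev_id, bool):
        return 200
    return 400


# A string: string mapping serialises rev_id as a JSON string.
string_only_body = json.dumps({"rev_id": "12356"})
# A typed mapping keeps rev_id as a JSON number.
typed_body = json.dumps({"rev_id": 12356})

print(string_only_body, validate_body(string_only_body))
print(typed_body, validate_body(typed_body))
```

Under these assumptions the first body is rejected and the second accepted, which is why the test tool has to be able to carry non-string values in the POST body.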
[12:05:22] (PS1) Ilias Sarantopoulos: test: liftwing manual testing on deployment server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787)
[12:50:48] (PS2) Ilias Sarantopoulos: test: liftwing manual testing on deployment server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787)
[13:13:14] (PS3) Ilias Sarantopoulos: test: liftwing manual testing on deployment server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787)
[13:17:16] (PS4) Ilias Sarantopoulos: test: liftwing manual testing on deployment server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787)
[13:20:45] * isaranto afk lunch
[14:36:02] (PS1) Elukey: blubber: install wmf-certificates where missing [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884316
[14:38:18] (CR) Ilias Sarantopoulos: [C: +1] "✔" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884316 (owner: Elukey)
[14:46:48] (CR) Elukey: [C: +2] blubber: install wmf-certificates where missing [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884316 (owner: Elukey)
[14:51:33] (Merged) jenkins-bot: blubber: install wmf-certificates where missing [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884316 (owner: Elukey)
[15:21:38] ah!
I am testing change prop again, I don't see the final event in kafka (generated by liftwing)
[15:22:34] Machine-Learning-Team, ORES, MediaWiki-Page-history, MediaWiki-Special-pages: Highlight edits' contribution quality predictions on Revision history and User contribution pages - https://phabricator.wikimedia.org/T318372 (Lectrician1)
[15:36:39] mmm I turned on debugging and I see the msg that an event was successfully posted to eventgate
[15:36:42] now I am puzzled
[15:39:40] Machine-Learning-Team, Patch-For-Review: Upgrade the ml-staging-codfw cluster to k8s 1.23 - https://phabricator.wikimedia.org/T327767 (elukey)
[15:39:57] Machine-Learning-Team, Patch-For-Review: Upgrade the ml-staging-codfw cluster to k8s 1.23 - https://phabricator.wikimedia.org/T327767 (elukey) All prep work done, we should be ready to go!
[15:44:18] ok so the events module seems to not work in staging
[15:45:24] now I am puzzled as well 😄so it works, but not in staging or it doesnt work?
[15:46:05] yeah sorry, I tried manually to trigger an event generation, not via change prop
[15:46:19] the code doesn't complain, and log debug says "event sent correctly"
[15:46:25] but I don't see the event in the kafka topic
[15:46:43] and it worked when I tested locally
[16:01:41] mmmmmm
[16:02:15] so I don't see any event registered in eventgate-main.error.validation, that is the topic used by eventgate to enqueue events that fail validation
[16:05:39] in https://grafana.wikimedia.org/d/zsdYRV7Vk/istio-sidecar?from=now-6h&orgId=1&to=now&var-backend=All&var-cluster=codfw%20prometheus%2Fk8s-mlstaging&var-namespace=revscoring-editquality-goodfaith&var-quantile=0.5&var-quantile=0.95&var-quantile=0.99&var-response_code=All I don't see eventgate-main.discovery.wmnet so something is off
[16:07:58] never a joy in this work
[16:08:46] may I help in any way?
[16:09:55] if you see anything off in the send event code lemme know, I am going to test the docker image from our docker registry and re-test locally
[16:10:03] and check with tcpdump on the pod
[16:18:44] yes it must be the new code, I don't see the call to eventgate in tcpdump
[16:23:18] the code works locally, I re-tested it
[16:26:18] so it must be an interaction between the aiohttp client and istio-proxy
[16:27:20] all right I think that I am declaring defeat for today
[16:27:30] and I'll restart on monday with clear head/mind :D
[16:27:40] have a nice weekend folks!
[16:27:56] cu monday!
[16:38:18] Machine-Learning-Team, Patch-For-Review: [Liftwing testing] - Post deployment testing - https://phabricator.wikimedia.org/T327787 (isarantopoulos) In the attached patch I adde a python script that hits all the deployed models in production and staging and verifies that a proper response is returned (200...
[18:29:36] Machine-Learning-Team, Data-Engineering-Planning, Research, Patch-For-Review: Proposal: deprecate the mediawiki.revision-score stream in favour of more streams like mediawiki-revision-score- - https://phabricator.wikimedia.org/T317768 (Isaac) @dcausse could you loop me in when you start on...
[19:34:56] Machine-Learning-Team, Infrastructure-Foundations, SRE-tools: httpbb doesn't support integers in the POST's body - https://phabricator.wikimedia.org/T328120 (RLazarus)
[20:38:52] elukey: is this a new stream?
[20:39:01] does it have a stream config entry?
[20:39:30] https://wikitech.wikimedia.org/wiki/Event_Platform/Instrumentation_How_To#Deployment (ignore all the eventlogging specific bits)
[20:39:52] https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration
[21:46:55] oh, you got it.
https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/884155
[23:11:19] Machine-Learning-Team, Infrastructure-Foundations, SRE-tools: httpbb doesn't support integers in the POST's body - https://phabricator.wikimedia.org/T328120 (RLazarus)