[06:27:31] hello folks! [06:27:38] so https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/923376 should do the trick with changeprop [07:04:31] (03CR) 10Kevin Bazira: [C: 03+1] test_liftwing.py: simplify decorator test [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923339 (owner: 10Elukey) [07:21:39] Good morning! [07:28:51] kalimera :) [07:38:34] (03CR) 10Ilias Sarantopoulos: [C: 03+1] "Thanks for fixing this!" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/922809 (https://phabricator.wikimedia.org/T337287) (owner: 10Elukey) [07:39:16] elukey: sorry about that test , it probably got messed up between some changes. thanks for fixing it! [07:43:06] isaranto: no problem! It helped me understanding the code more :) [07:43:24] I can do that more often then :P [07:43:25] (03CR) 10Elukey: [C: 03+2] ores-legacy: simplify test in test_liftwing.py [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/922809 (https://phabricator.wikimedia.org/T337287) (owner: 10Elukey) [07:44:55] (03CR) 10Ilias Sarantopoulos: [C: 03+1] "Same here, thanks for spotting this!" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923339 (owner: 10Elukey) [07:45:32] isaranto: I think that it is only a matter of having CI running the tests, if it becomes too cumbersome to add them to the current settings we can also think about moving ores-legacy to separate repo [07:46:16] (03Merged) 10jenkins-bot: ores-legacy: simplify test in test_liftwing.py [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/922809 (https://phabricator.wikimedia.org/T337287) (owner: 10Elukey) [07:49:20] (03CR) 10Elukey: [C: 03+2] test_liftwing.py: simplify decorator test [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923339 (owner: 10Elukey) [07:50:08] def agree, I'll work on that prob next week [07:50:30] (03Merged) 10jenkins-bot: test_liftwing.py: simplify decorator test [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923339 (owner: 10Elukey) [07:50:47] I should have some branch when I worked on it somewhere... [08:16:19] isaranto: I updated https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/922583 with a test that I have been thinking recently, namely try to have different limit ranges for the various namespaces [08:16:29] the idea is to increase only where needed, so we'll have more control [08:16:36] it should work, let's see if CI agrees [08:17:08] I have to do to a doc appt in a bit, I'll merge it later (or ping Tobias when he'll be online for merge + admin_ng deployment :) [08:17:41] ah nice CI failure [08:17:42] sigh [08:17:45] Ack [08:18:22] Don't work about it Luca. I'll take a look [08:18:40] Talk2u later [08:34:32] Mornin' [08:35:08] I mentioned it before on an SRE channel, but it might be useful for non-SREs as well (maybe even more so): https://github.com/tummychow/git-absorb [08:58:50] elukey: o/ a naive q - I don't understand {{ $v | quote }}. Is it to add quotes to the value? why do we need quotes? [09:02:24] 10Machine-Learning-Team, 10MinT: Shut down and deconfigure NLLB setup on AWS - https://phabricator.wikimedia.org/T337369 (10klausman) After examining the setup some more, I figure I can delete the images on S3 as well, they are easy enough to reproduce with the docs I have. So this has been completed. [09:06:36] aiko: o/ yeah I know it is weird, but changeprop gets really upset if the config values are not quoted, and the helm function `toYaml` (that simply renders a block of yaml, would be perfect for us) doesn't add quotes [09:07:00] so we need to render values in that way [09:08:33] It's a consequence of YAML being permissive with quotes, but not all YAML consumers (changeprop in this case) being tolerant of it. in JSON, it would be unambiguous. [09:09:00] Such is life with data formats :) [09:10:35] I was puzzled that the toYaml function didn't include the possibility to add quotes [09:10:45] It's a bit odd, yes [09:14:31] aiko: no more errors in changeprop staging! [09:14:43] let's see if it works now [09:17:34] elukey: I found a bug on the Helm GH about the quoting problem. It's a messy matter [09:18:25] https://github.com/helm/helm/issues/4262 [09:24:06] yep I recall that I stumbled upon the same issue when we had to add the "meta" keyword :( [10:09:04] elukey: no, still not getting request :( [10:10:09] aiko: ack, it seems that the msg that you are sending is dropped [10:10:29] {"name":"changeprop-staging","hostname":"changeprop-staging-6bb85fd499-bt282","pid":141,"level":"DEBUG","message":"Dropping event message","event_str":"{\"changelog_kind\":\"update\",\"page_change_kind\":\"edit\",\"dt\":\"2023-05-17T00:55:12Z\",\"wiki_id\":\"enwiki\",\"page\":{\"page_id\":936177 [10:10:35] this is the event right? [10:11:09] yep [10:11:48] why it is dropped? [10:11:56] aiko: can I see the event? Where do you have it? [10:12:29] it doesn't say but when it drops usually it is due to the match rule [10:12:42] elukey: ok wait a sec [10:13:28] https://phabricator.wikimedia.org/P48594 [10:16:40] aiko: just to discard weird corner cases, could you try with "namespace_id":"0","is_redirect":"false" in the event? [10:16:49] just to see if we have a matching problem with quotes [10:18:16] okok let me try [10:20:32] yes now outlink got the event [10:20:52] ouch [10:20:53] but EventGate rejected the generated event [10:21:15] ok so this complicates things in changeprop [10:21:31] since we shouldn't quote for bools and integers [10:22:05] 🫠 [10:22:15] aiko: any log that indicates why it was rejected? [10:23:59] not sure.. I saw RuntimeError: The event posted to EventGate has been rejected, please contact the ML team if the issue persists. [10:24:04] and aiohttp.client_exceptions.ClientResponseError: 400, message='Bad Request', url=URL('http://eventgate-main.discovery.wmnet:4480/v1/events') [10:24:34] aiko: ack, the error should be in the response's body, I guess that we don't log it right? [10:26:17] I'm thinking maybe it is because we used "0" and "false"? the generated event also carried them, so they are wrong type [10:27:18] probably, but we should have some extra logging in our code [10:27:55] agree [10:29:24] I'll work on that [10:29:27] <3 [10:29:45] isaranto: I haven't forget about your patch, still trying to make the limitranges to work :D [10:36:24] all good! I'm working on other stuff [10:36:44] I'm checking how I can run unit tests but only for the images that do have unit tests [10:58:07] ok the changes have been filed, waiting for serviceops' review since I changed some ruby code in CI (I hate ruby) [10:58:12] going afk for lunch, ttl! [11:18:17] * isaranto lunch! [12:51:11] (03PS1) 10Ilias Sarantopoulos: ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923599 [12:52:23] (03CR) 10Ilias Sarantopoulos: ores-legacy: run tests in ci (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923599 (owner: 10Ilias Sarantopoulos) [12:52:40] (03CR) 10CI reject: [V: 04-1] ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923599 (owner: 10Ilias Sarantopoulos) [12:56:23] 10Machine-Learning-Team, 10Data-Engineering, 10Event-Platform Value Stream: Create new mediawiki.page_links_change stream based on fragment/mediawiki/state/change/page - https://phabricator.wikimedia.org/T331399 (10daniel) Quick recap on what LinkTarget does... MediaWiki distinguishes between several differ... [13:05:03] (03PS2) 10Ilias Sarantopoulos: ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923599 [13:05:38] 10Machine-Learning-Team, 10Data-Engineering, 10Event-Platform Value Stream: Create new mediawiki.page_links_change stream based on fragment/mediawiki/state/change/page - https://phabricator.wikimedia.org/T331399 (10Ottomata) Aye, thanks Daniel! This is great. Similar to but with more detail to [[ https://pha... [13:12:57] 10Machine-Learning-Team, 10API Platform, 10Anti-Harassment, 10Cloud-Services, and 19 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (10hashar) [13:21:12] aiko elukey ...https vs http? [13:25:23] (03PS3) 10Ilias Sarantopoulos: ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923599 [13:26:34] ottomata: nono we need http, the istio proxy (localhost mesh) catches it and makes a http call [13:27:03] the istio proxy mesh works differently from the service ops one, it is transparent [13:27:23] we have special rules to have http in localhost and then https to the real endpoint [13:27:39] so that the istio proxy can publish http metrics (otherwise it would only publish tls ones) [13:27:46] it is a little bit weird I know [13:27:53] but kserve works with knative and istio mesh sigh [13:35:28] oh right [13:35:34] oh wow [13:35:41] cool, so the app doesn't have to know about the localhost proxy? [13:35:43] that's cool [13:36:08] i guess it has to know not to use https though [13:36:10] but still cool [13:36:44] elukey: aiko, fwiw, you can get find validation errors in https://logstash.wikimedia.org/app/discover#/view/AXMlVWkuMQ_08tQas2Xi?_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:now-1h,mode:quick,to:now))&_a=h@c2dd4ef [13:37:33] and also in the (eqiad|codfw).eventgate-main.error.validation topic [13:37:36] (and hive table) [13:39:09] like this one! [13:39:09] https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-default-1-7.0.0-1-2023.05.26?id=IzaRV4gB_1-qnHEY85oZ [13:39:42] also, just checking that you saw https://phabricator.wikimedia.org/T328899#8881023 [13:39:58] we should proably do taht asap if you are going to start producing to this stream? fine to wait a bit if you want to experiment first? [13:43:27] I'll let aiko to comment :) [13:43:46] thanks for the links! adding them to the docs [13:45:30] {{done}} [13:56:25] aiko: so https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/923611 should fix the changeprop issue but the fix is not great [13:56:34] not really proud, but cannot think about other solutions [13:56:40] from the diff it works fine [14:05:41] (03CR) 10Elukey: ores-legacy: run tests in ci (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923599 (owner: 10Ilias Sarantopoulos) [14:12:57] (03CR) 10Elukey: ores-legacy: run tests in ci (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923599 (owner: 10Ilias Sarantopoulos) [14:30:02] (03PS4) 10Ilias Sarantopoulos: ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923599 [14:31:40] (03CR) 10Ilias Sarantopoulos: ores-legacy: run tests in ci (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923599 (owner: 10Ilias Sarantopoulos) [14:44:57] ottomata: thanks for the links for logstash, very helpful! and I'll send a patch to version the stream [14:48:23] * isaranto argh [14:48:49] * isaranto ranting about directories [14:49:57] (03PS5) 10Ilias Sarantopoulos: ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923599 [14:53:29] (03PS1) 10Ilias Sarantopoulos: ores-legacy: rename project to match the agreed name [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923621 [14:55:13] I'm renaming everything named as ores-migration to ores-legacy to avoid any confusion [15:01:41] (03CR) 10CI reject: [V: 04-1] ores-legacy: rename project to match the agreed name [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923621 (owner: 10Ilias Sarantopoulos) [15:04:35] (03CR) 10Ilias Sarantopoulos: "We need to w8 for the patch in integration config to be deployed in order for CI to pass" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923621 (owner: 10Ilias Sarantopoulos) [15:13:36] (03CR) 10Ilias Sarantopoulos: "recheck" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923621 (owner: 10Ilias Sarantopoulos) [15:27:53] isaranto: you can deploy 3b :) [15:28:02] thanks! [15:28:48] if it works with the first go I'm coming over to buy u a beer! [15:29:24] chances are it won't so I'm on the safe side :) [15:30:27] :D [15:34:43] (03CR) 10Ilias Sarantopoulos: "recheck" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923621 (owner: 10Ilias Sarantopoulos) [15:35:32] aiko: new changeprop deployed, let's see if it works now! [15:36:18] going afk for today folks, have a good one! [15:37:34] \o Also: Monday is a holdiay here (Pentecost), so see yall on Tuesday [15:37:55] isaranto: ah snap I forgot to update the namespace config [15:38:00] and for some reason, no diff is shown [15:38:09] aa ok [15:38:30] cause I saw the diff only on the infservices [15:38:36] but when I synced nothing happened [15:42:05] since I don't know exactly what's missing we can continue on monday [15:47:05] tried to raise it manually [15:47:16] there is an issue with our config and the admin_ng templates [15:49:13] let's restart on monday :) [15:52:32] cool! have a great weekend! [15:53:46] as I said since I haven't actually tested it I dont think it would work anywayz [15:54:08] but ok I still owe the beer though :) [15:54:21] logging off as well folks, cu! [16:06:45] (03PS2) 10Ilias Sarantopoulos: ores-legacy: rename project to match the agreed name [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923621 [16:08:27] (03PS3) 10Ilias Sarantopoulos: ores-legacy: rename project to match the agreed name [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923621 [16:14:37] (03CR) 10CI reject: [V: 04-1] ores-legacy: rename project to match the agreed name [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/923621 (owner: 10Ilias Sarantopoulos) [16:16:40] elukey: the workaround works! no more .page.is_redirect' and '.page.namespace_id' validation errors [16:17:17] but still have validation errors for '.prior_state.revision' and '.revision' which I don't know why it happen.. I need to figure it out [16:18:44] https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-default-1-7.0.0-1-2023.05.26?id=YczRWIgB_1-qnHEYRsWd