[07:24:52] good morning folks!
[07:25:04] just realized (with fresh eyes) that my pull request for mwapi was garbage :D
[07:25:07] closed it
[08:02:08] 10Lift-Wing, 10Machine-Learning-Team: No healthy upstream and upstream connect error in Lift Wing - https://phabricator.wikimedia.org/T322196 (10elukey) Nice write-up about a similar problem: https://karlstoney.com/2019/05/31/istio-503s-ucs-and-tcp-fun-times/
[08:09:31] * elukey commutes to the co-working (a local bar, nothing fancy :D)
[08:36:51] going to roll restart ores for security upgrades
[09:06:25] eqiad done, proceeding with codfw
[09:19:06] morning! :)
[09:23:30] morning!
[09:26:43] elukey: I was looking at goodfaith p99 latency https://grafana.wikimedia.org/d/zsdYRV7Vk/istio-sidecar?from=now-1h&orgId=1&to=now&var-backend=All&var-cluster=codfw%20prometheus%2Fk8s-mlserve&var-namespace=revscoring-editquality-goodfaith&var-quantile=0.5&var-quantile=0.95&var-quantile=0.99&var-response_code=200&viewPanel=18 it looks like enwiki is way worse than other wikis
[09:31:11] aiko: yeah I think it is due to the bigger chunk of traffic that it handles (most of the events in revision-create are enwiki)
[09:31:40] I'll try to run other tests with MP and see if it improves
[09:31:47] what do you think?
[09:32:10] elukey: yeah makes sense. also saw dewiki, wikidata, frwiki, eswiki,.. these are all large and very active wikis
[09:32:42] yeah let's try it
[09:57:10] 10Machine-Learning-Team: Retrain fawiki articlequality model - https://phabricator.wikimedia.org/T317531 (10achou) @kevinbazira thanks for working on this. I can see that the quality predictions of the new model remain at the C level for both revisions, indicating that the model takes into account both ref tags...
[10:15:39] aiko: restarted the ml-serve-codfw pod with MP enabled (also increased its CPU limits), and restarted Benthos
[10:15:48] *enwiki-goodfaith pod
[10:33:28] elukey: ack!
[10:41:22] Morning! (sorta :))
[11:16:24] (commuting again, ttyl in a bit)
[11:51:14] 10Machine-Learning-Team: Move Wikilabels Postgres Instances to VMs - https://phabricator.wikimedia.org/T312564 (10klausman) p:05Triage→03Medium
[12:08:32] <- Lunch
[12:08:47] 10Machine-Learning-Team, 10Add-Link, 10Growth-Team, 10User-notice: Deploy "add a link" to 6th round of wikis - https://phabricator.wikimedia.org/T304550 (10kevinbazira)
[12:11:50] 10Machine-Learning-Team, 10Add-Link, 10Growth-Team, 10User-notice: Deploy "add a link" to 6th round of wikis - https://phabricator.wikimedia.org/T304550 (10kevinbazira) @kostajh, thank you for the confirmation. We have published the datasets for all 16 wikis.
[13:02:38] Morning all!
[13:24:08] o/
[13:28:00] 10Machine-Learning-Team, 10ContentTranslation, 10Wikimedia Enterprise: Run NLLB-200 model in a new instance - https://phabricator.wikimedia.org/T321781 (10elukey) @LSobanski this is the first example of an AWS microservice built outside our production realm, I asked to open a task to SRE to discuss how it is be...
[13:39:38] 10Machine-Learning-Team, 10ContentTranslation, 10Wikimedia Enterprise, 10serviceops: Run NLLB-200 model in a new instance - https://phabricator.wikimedia.org/T321781 (10LSobanski) Thanks for the clarification. Let's start with #serviceops then and see who else we need afterwards.
[16:34:46] 10Machine-Learning-Team, 10Data-Engineering-Planning, 10Observability-Logging, 10observability, 10Event-Platform Value Stream (Sprint 04): Evaluate Benthos as stream processor - https://phabricator.wikimedia.org/T319214 (10Ottomata) FYI I am working on making a more specific list of requirements spec for...
[18:55:54] 10Machine-Learning-Team, 10ContentTranslation, 10Wikimedia Enterprise, 10serviceops: Run NLLB-200 model in a new instance - https://phabricator.wikimedia.org/T321781 (10calbon) I am moving this ticket to ML's in progress column. @klausman I spoke to Deb. It sounds like the plan currently is for you to do t...