[02:48:52] Lift-Wing, Machine-Learning-Team, SRE, SRE-Access-Requests, Patch-For-Review: Machine Learning team - k8s resources access - https://phabricator.wikimedia.org/T333174 (Ladsgroup)
[02:49:11] Lift-Wing, Machine-Learning-Team, SRE, SRE-Access-Requests, Patch-For-Review: Machine Learning team - k8s resources access - https://phabricator.wikimedia.org/T333174 (Ladsgroup)
[07:47:52] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Marostegui)
[08:52:10] Machine-Learning-Team, Infrastructure-Foundations, Prod-Kubernetes, Kubernetes: Issues deploying calico to ml-staging-codfw and aux-k8s-eqiad - https://phabricator.wikimedia.org/T333302 (JMeybohm)
[09:03:59] TIL -> https://regex.ai/
[09:04:12] hello folks :)
[09:10:49] o/
[09:14:31] morning! :)
[09:28:29] hello :)
[09:51:50] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (BTullis)
[10:26:26] Machine-Learning-Team, Infrastructure-Foundations, Prod-Kubernetes, Kubernetes, Patch-For-Review: Issues deploying calico to ml-staging-codfw and aux-k8s-eqiad - https://phabricator.wikimedia.org/T333302 (elukey) just deployed calico on ml-staging-codfw with typha replica count 1 and it worke...
[10:47:33] Machine-Learning-Team, API Platform, API-Portal, Platform Team Initiatives (API Gateway Roadmap): Add documentation about LiftWing to the API Portal - https://phabricator.wikimedia.org/T325759 (elukey) >>! In T325759#8725364, @apaskulin wrote: >> For example, revscoring could point to https://git...
[10:47:42] * elukey lunch
[11:28:06] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (BTullis)
[11:30:28] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (fnegri) I "depooled" dbproxy1019 by following the procedure at https://wikitech.wikimedia.org/w/index.php?title=Portal:Data_Services/Admi...
[11:49:56] * isaranto lunch
[12:21:12] isaranto: o/ you should now be able to check isvcs in ml-staging-codfw, could you please check when you have a moment?
[12:29:09] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ayounsi)
[12:36:34] I have depooled ores100[34] as prep step for https://phabricator.wikimedia.org/T330165
[12:41:07] Machine-Learning-Team, Add-Link, Growth-Team: Define SLIs/SLOs for link recommendation service - https://phabricator.wikimedia.org/T278083 (elukey) I think that it should be the Growth team's responsibility to set some target availability, to then use it as starting point for a conversation with SRE...
[12:42:43] Machine-Learning-Team, ORES, Scap: Scap deploy for ORES reports success even when uwsgi fails to start up - https://phabricator.wikimedia.org/T280998 (elukey) Open→Declined Ores is going to be replaced soon by Lift Wing.
[12:44:54] Machine-Learning-Team, ContentTranslation, Wikimedia Enterprise: Run NLLB-200 model in a new instance - https://phabricator.wikimedia.org/T321781 (elukey)
[12:45:25] Machine-Learning-Team, ContentTranslation, Wikimedia Enterprise: Cleanup NLLB200 docker image - https://phabricator.wikimedia.org/T324464 (elukey) Open→Resolved a:elukey We are hopefully moving away from NLLB, and we'll use a different Docker image.
Let's close for the moment, we can re-o...
[12:46:35] Machine-Learning-Team, Data-Engineering-Planning, Event-Platform Value Stream, Observability-Logging, and 2 others: Evaluate Benthos as stream processor - https://phabricator.wikimedia.org/T319214 (elukey) Open→Resolved a:elukey Closing since we have been using benthos for a while :)
[12:47:58] elukey: yep all good I can view isvc(s)
[12:48:31] Machine-Learning-Team, ORES: Allow browser caching of ORES responses - https://phabricator.wikimedia.org/T251004 (elukey) Open→Declined Ores is going to be replaced soon by Lift Wing :)
[12:48:41] isaranto: ack I'll deploy to prod and close the task :)
[12:49:41] Lift-Wing, Machine-Learning-Team, Documentation: Create technical documentation for Lift Wing Infrastructure - https://phabricator.wikimedia.org/T276601 (elukey) a:Chtnnh→None
[12:50:39] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ssingh)
[12:50:57] Lift-Wing, Machine-Learning-Team, Documentation: Improve Lift Wing documentation - https://phabricator.wikimedia.org/T316098 (elukey) a:achou→None
[12:52:08] Machine-Learning-Team, Add-Link, Growth-Team (Current Sprint): Define SLIs/SLOs for link recommendation service - https://phabricator.wikimedia.org/T278083 (kostajh) a:kostajh >>! In T278083#8734421, @elukey wrote: > I think that it should be the Growth team's responsibility to set some target av...
[12:52:17] Machine-Learning-Team, Add-Link, Growth-Team (Current Sprint): Define SLIs/SLOs for link recommendation service - https://phabricator.wikimedia.org/T278083 (kostajh)
[12:52:48] Machine-Learning-Team, Add-Link, Growth-Team (Current Sprint): Define SLIs/SLOs for link recommendation service - https://phabricator.wikimedia.org/T278083 (akosiaris) >>! In T278083#8718263, @kostajh wrote: >>>!
In T278083#8717993, @elukey wrote: >> @kostajh hi! We are helping in the training part o...
[12:53:13] Lift-Wing, Machine-Learning-Team, Documentation: Create technical documentation for Lift Wing Infrastructure - https://phabricator.wikimedia.org/T276601 (elukey)
[12:53:33] Lift-Wing, Machine-Learning-Team: Create Draft Documentation For Moving Models Into Lift Wing - https://phabricator.wikimedia.org/T269169 (elukey)
[12:53:49] Lift-Wing, Machine-Learning-Team, Documentation: Create technical documentation for Lift Wing Infrastructure - https://phabricator.wikimedia.org/T276601 (elukey)
[12:55:54] Machine-Learning-Team: Write Lift Wing Wiki Page - https://phabricator.wikimedia.org/T302897 (elukey)
[12:56:08] Lift-Wing, Machine-Learning-Team, Documentation: Create technical documentation for Lift Wing Infrastructure - https://phabricator.wikimedia.org/T276601 (elukey)
[12:57:49] Lift-Wing, Machine-Learning-Team, SRE, SRE-Access-Requests: Machine Learning team - k8s resources access - https://phabricator.wikimedia.org/T333174 (elukey) Open→Resolved a:elukey Took the liberty to merge Alexandro's proposal, since the isvc resources don't really contain anything...
[12:58:23] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ops-monitoring-bot) akosiaris@cumin1001 - Cookbook cookbooks.sre.discovery.datacenter depool all active/active services in eqiad: eqiad r...
[13:02:32] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (BTullis)
[13:03:16] Machine-Learning-Team, Add-Link, Growth-Team (Current Sprint): Define SLIs/SLOs for link recommendation service - https://phabricator.wikimedia.org/T278083 (kostajh) >>! In T278083#8734493, @akosiaris wrote: >>>!
In T278083#8718263, @kostajh wrote: >>>>! In T278083#8717993, @elukey wrote: >>> @kostaj...
[13:17:54] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ops-monitoring-bot) akosiaris@cumin1001 - Cookbook cookbooks.sre.discovery.datacenter depool all active/active services in eqiad: eqiad r...
[13:22:48] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ssingh)
[13:35:49] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (BTullis)
[13:40:47] Machine-Learning-Team, Add-Link, Growth-Team (Current Sprint): Define SLIs/SLOs for link recommendation service - https://phabricator.wikimedia.org/T278083 (KStoller-WMF) 👍 I'm happy to review once we have an initial draft. Or, @kostajh, please just let me know if you want me to take the lead on this.
[13:41:03] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Jelto)
[13:44:17] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (herron)
[13:46:05] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Jelto)
[13:49:39] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=4c1e12e1-9d5e-4447-880a-f0ec09133a64) set by ayounsi@cumin1001 for 2:00:...
[13:51:24] Machine-Learning-Team, Observability-Logging, Patch-For-Review: Logging spam from revscoring deploys - https://phabricator.wikimedia.org/T320468 (elukey) @colewhite I don't find the graph that you showed me some time ago about the drop/filter action taken by https://gerrit.wikimedia.org/r/c/operation...
[13:54:01] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (jbond)
[13:55:41] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (MatthewVernon)
[13:56:29] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (BTullis)
[13:59:29] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (fgiunchedi)
[14:01:56] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (jbond)
[14:23:02] Machine-Learning-Team, Add-Link, Growth-Team (Current Sprint), User-notice: Deploy "add a link" to 7th round of wikis - https://phabricator.wikimedia.org/T304551 (Trizek-WMF) Can we schedule a release date for these wikis? Can it be next week (Wed April 5)?
[14:26:51] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (fgiunchedi)
[14:32:46] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ops-monitoring-bot) akosiaris@cumin1001 - Cookbook cookbooks.sre.discovery.datacenter pool all active/active services in eqiad: eqiad row...
[14:35:25] Machine-Learning-Team, ORES, Advanced-Search, All-and-every-Wikisource, and 69 others: Remove unnecessary targets definitions - https://phabricator.wikimedia.org/T328497 (kostajh)
[14:47:57] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ops-monitoring-bot) akosiaris@cumin1001 - Cookbook cookbooks.sre.discovery.datacenter pool all active/active services in eqiad: eqiad row...
[14:48:25] Machine-Learning-Team, Observability-Logging, Patch-For-Review: Logging spam from revscoring deploys - https://phabricator.wikimedia.org/T320468 (colewhite) Direct link [[ https://grafana-rw.wikimedia.org/explore?orgId=1&left=%7B%22datasource%22:%22000000026%22,%22queries%22:%5B%7B%22refId%22:%22A%22...
[14:50:10] Machine-Learning-Team, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (ayounsi)
[14:50:22] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ayounsi)
[14:59:53] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ayounsi) The switch upgrade itself went smoothly as well, like the other rows. One issue was that gerrit1001 was missing from the list....
[15:03:21] elukey: I was thinking to start the work on deploying the ores legacy endpoint
[15:03:43] if u have some time we could have a chat about how to proceed
[15:03:57] isaranto: definitely yes, we can chat tomorrow about it, what do you think?
[15:04:17] if possible I think that we should try https://gitlab.wikimedia.org/repos/sre/sextant/-/blob/scaffold/README.md#create-a-new-chart-from-scaffolding-models
[15:04:30] sure, let's talk tomorrow
[15:04:43] I'll start from there
[15:05:36] basically the idea is to create a helm chart in deployment-charts, once we have the first ores legacy docker images to use
[15:05:58] we can deploy it on lift wing using the same procedures used for wikikube (the serviceops cluster)
[15:06:10] so all the scaffolding will be done with some config files
[15:06:23] then we add the helmfile config as well (super easy) and we try to deploy
[15:07:48] ack. we already have the docker images https://docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-ores-migration/tags/
[15:12:54] Machine-Learning-Team, Observability-Logging: Logging spam from revscoring deploys - https://phabricator.wikimedia.org/T320468 (elukey) @colewhite thanks! Checking [[ https://grafana-rw.wikimedia.org/explore?left=%7B%22datasource%22:%22000000026%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22...
[15:15:45] repooled ores 1003/4 after network maintenance
[15:51:04] Machine-Learning-Team, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (ayounsi)
[15:54:53] Machine-Learning-Team, Data-Engineering, Data-Persistence, Discovery-Search, and 6 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (ayounsi)
[15:55:27] Machine-Learning-Team, Data-Engineering, Data-Persistence, Discovery-Search, and 6 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (ayounsi)
[16:07:45] Machine-Learning-Team, Add-Link, Growth-Team (Current Sprint): Define SLIs/SLOs for link recommendation service - https://phabricator.wikimedia.org/T278083 (RLazarus) >>!
In T278083#8734485, @kostajh wrote: > I'll start a draft document, and @DMburugu and I will circulate it when it's ready for revi...
[16:19:37] Machine-Learning-Team, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (ayounsi)
[16:36:33] Machine-Learning-Team, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (MatthewVernon)
[16:36:55] Machine-Learning-Team, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (MatthewVernon)
[16:47:01] logging off folks! o/
[17:13:37] Machine-Learning-Team, ORES, Advanced-Search, All-and-every-Wikisource, and 68 others: Remove unnecessary targets definitions - https://phabricator.wikimedia.org/T328497 (Jdlrobson)
[19:49:45] Machine-Learning-Team, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (Eevans)
[19:51:35] Machine-Learning-Team, Data-Engineering, Data-Persistence, Discovery-Search, and 7 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (Eevans)
[20:53:31] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (colewhite)
[20:56:14] Machine-Learning-Team, Data-Engineering, Data-Persistence, Discovery-Search, and 7 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (colewhite)
[22:35:26] Machine-Learning-Team, Observability-Logging: Logging spam from revscoring deploys - https://phabricator.wikimedia.org/T320468 (colewhite) Spam filter removed.