[07:36:36] serviceops, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9703049 (MoritzMuehlenhoff)
[08:36:55] serviceops, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9703148 (BTullis)
[10:18:46] serviceops, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9703294 (MoritzMuehlenhoff)
[10:26:19] serviceops, SRE, Epic, Patch-For-Review: Phase out cergen for ServiceOps services - https://phabricator.wikimedia.org/T360636#9703307 (Clement_Goubert)
[10:27:31] serviceops, SRE, Epic, Patch-For-Review: Phase out cergen for ServiceOps services - https://phabricator.wikimedia.org/T360636#9703310 (Clement_Goubert) chartmuseum and docker-registry done
[10:57:38] serviceops, Citoid, SRE, Patch-For-Review: Create a readiness probe for zotero - https://phabricator.wikimedia.org/T213689#9703351 (Mvolz) I notice that Zotero is not part of this dashboard: https://grafana.wikimedia.org/d/_77ik484k/openapi-swagger-endpoint-state?orgId=1 Is there a re...
[11:01:53] serviceops, SRE, Epic, Patch-For-Review: Phase out cergen for ServiceOps services - https://phabricator.wikimedia.org/T360636#9703359 (MoritzMuehlenhoff)
[11:05:09] serviceops, Citoid, SRE, Patch-For-Review: Create a readiness probe for zotero - https://phabricator.wikimedia.org/T213689#9703363 (Clement_Goubert) I think it's because monitoring is disabled in the service's `values.yaml`
[11:08:24] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9703370 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host mw1421.eqiad.wmnet with OS bullseye
[11:08:53] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9703372 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host mw1422.eqiad.wmnet with OS bullseye
[11:09:26] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9703374 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host mw1491.eqiad.wmnet with OS bullseye
[11:09:48] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9703375 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host mw1492.eqiad.wmnet with OS bullseye
[11:10:11] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9703376 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host mw1493.eqiad.wmnet with OS bullseye
[11:37:47] serviceops, Citoid, SRE, Patch-For-Review: Create a readiness probe for zotero - https://phabricator.wikimedia.org/T213689#9703471 (Clement_Goubert) Summing up the discussion on the patch set, this is not what is wanted, turning monitoring on in the service would turn on prometheus met...
[11:42:17] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9703501 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host mw1421.eqiad.wmnet with OS bullseye completed: - mw14...
[11:46:00] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9703502 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host mw1493.eqiad.wmnet with OS bullseye completed: - mw14...
[11:49:28] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9703514 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host mw1422.eqiad.wmnet with OS bullseye completed: - mw14...
[11:53:15] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9703516 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host mw1491.eqiad.wmnet with OS bullseye completed: - mw14...
[11:54:51] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9703520 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host mw1492.eqiad.wmnet with OS bullseye completed: - mw14...
[12:09:15] serviceops, Citoid, SRE, Patch-For-Review: Create a readiness probe for zotero - https://phabricator.wikimedia.org/T213689#9703551 (Mvolz) >>! In T213689#9703471, @Clement_Goubert wrote: > Summing up the discussion on the patch set, this is not what is wanted, turning monitoring on...
[12:18:52] serviceops, Citoid, SRE, Patch-For-Review: Create a readiness probe for zotero - https://phabricator.wikimedia.org/T213689#9703583 (Clement_Goubert) >>! In T213689#9703551, @Mvolz wrote: > Thanks for linking the actual current Zotero probe - I see it checks the export endpoint? Where c...
[12:34:19] serviceops, Machine-Learning-Team, Patch-For-Review: Rename the envoy's uses_ingress option to sets_sni - https://phabricator.wikimedia.org/T346638#9703617 (JMeybohm)
[12:37:37] serviceops, Machine-Learning-Team, Patch-For-Review: Rename the envoy's uses_ingress option to sets_sni - https://phabricator.wikimedia.org/T346638#9703621 (JMeybohm) Unfortunately version 1.4.3 of mesh.configuration still uses `uses_ingress` in one if-block. So the initially assumed version requirem...
[12:45:28] serviceops, Data-Platform-SRE, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate charts to Calico Network Policies - https://phabricator.wikimedia.org/T359423#9703644 (JMeybohm)
[12:46:12] serviceops, Data-Platform-SRE, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate charts to Calico Network Policies - https://phabricator.wikimedia.org/T359423#9703648 (JMeybohm) Grouped the todo list by chart, some of those also need mesh.configuration updates due to {T346638} which coul...
[15:04:18] serviceops, Prod-Kubernetes, Patch-For-Review: PodSecurityPolicies will be deprecated with Kubernetes 1.21 - https://phabricator.wikimedia.org/T273507#9704123 (JMeybohm)
[18:25:49] lazy copypasta from way back:
[18:25:50] 21:29 < bblack> non-urgent, but when people are back Tuesday or whatever: what's a reasonable way to estimate our total concurrency capacity for MediaWiki these days? (like, given the number of MW processes/threads/CPUs/whatever we have available, how many reqs could be in flight before we're just queueing and bogging down)
[18:25:56] 21:30 < bblack> I'm trying to use this to inform some (much smaller) reasonable concurrency limits at the edge for heavy users, where I know that limit, even if all of it's non-cache-hit, won't be a significant fraction of our total capacity on the inside.
[18:26:35] still looking for a serviceops-informed answer, please, on this topic. Basically what's our MediaWiki-level concurrency capacity, roughly, for live requests in flight through MW.
[18:27:02] naively in the old world I'd maybe count CPUs or configured apache->php threads on mw* hosts or something like that
[18:28:01] I just figure the answer is perhaps more subtle now, given k8s
[18:28:13] (or maybe more straightforward but different, either way)
[19:34:52] bblack: k8s is still running php-fpm threads
[19:35:01] https://grafana.wikimedia.org/d/35WSHOjVk/application-servers-red-k8s?orgId=1&viewPanel=64&from=now-7d&to=now
[19:35:05] hover for absolute numbers
[19:35:15] https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red?orgId=1&refresh=1m
[19:36:03] the bare metal number looks like 6.5k and the k8s number looks like 3.5k
[19:36:14] for total, theoretically-possible concurrency
[19:39:44] cdanis: thanks so much!
[19:40:14] np :) i'd be very surprised if we could reach even 75% utilization without taking a latency hit
[19:41:24] I just wanted to be in the right ballpark. I'm gonna cut down to <10% of the above as step 1 and then go from there.
[19:42:56] (the goal is to arrive at some "if you don't use concurrency > X, it won't be a problem, even if they're all uncached" and give a lowball number that we know we can handle, but try not to be farcical and just say "2" :P)
[19:43:11] perfect
[19:43:51] somewhere there's an official "so you run a bot against the mediawiki api" that suggests a max concurrency of 1
[19:44:19] ah here https://www.mediawiki.org/wiki/API:Etiquette
[19:45:08] yeah this is basically for major partners or whatever, e.g. a handful of cases like Google.
[19:45:28] yeah I figured, and I agree both numbers are farcical for that, it's just funny
[19:45:31] the folks who in theory we want to pay to play...
[19:45:45] yes, these numbers will be part of making them pay to play :)
[19:46:12] godspeed bblack :)
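
For reference, a minimal back-of-the-envelope sketch (Python, purely illustrative) of the capacity math in the conversation above. The 6.5k bare-metal and 3.5k k8s php-fpm worker counts are the approximate figures cdanis read off Grafana; the 75% "comfortable utilization" ceiling and the <10% step-1 edge budget are the hedges mentioned in the chat, not measured limits.

```python
# Rough headroom arithmetic based on the numbers quoted in the thread above.
# All factors here are assumptions taken from the conversation, not policy.

BARE_METAL_WORKERS = 6_500   # approx. php-fpm workers on the bare-metal appserver/api fleet
K8S_WORKERS = 3_500          # approx. php-fpm workers on mw-on-k8s

# Total theoretically possible in-flight MediaWiki requests.
total_theoretical = BARE_METAL_WORKERS + K8S_WORKERS

# cdanis' caution: latency likely degrades well before full utilization (~75% guess).
comfort_ceiling = int(total_theoretical * 0.75)

# bblack's step 1: cap any single heavy user's edge concurrency at <10% of total.
step1_edge_budget = int(total_theoretical * 0.10)

print(f"theoretical concurrency:        ~{total_theoretical}")
print(f"rough comfort ceiling (75%):    ~{comfort_ceiling}")
print(f"step-1 per-user edge budget:    <{step1_edge_budget}")
```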