[08:30:28] serviceops, SRE, Traffic, envoy, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (JMeybohm)
[09:19:04] FYI, kubetcd2004 will briefly go down for a reboot of a ganeti node
[09:51:12] likewise 2006
[10:00:41] 👍
[10:01:10] hmm, it just dawned on me. moritzm: do you see emojis in your term?
[10:06:22] I do! at least standard emojis
[10:06:39] Chris still manages to outsmart my terminal with advanced emojis, though
[10:06:47] I was about to ask
[10:07:00] he even outsmarts my webapp sometimes
[10:13:53] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Migrate group0 to Kubernetes - https://phabricator.wikimedia.org/T337490 (Clement_Goubert)
[10:44:01] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Clement_Goubert)
[10:44:38] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Clement_Goubert)
[10:44:50] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Migrate group0 to Kubernetes - https://phabricator.wikimedia.org/T337490 (Clement_Goubert) In progress→Resolved
[10:46:33] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Migrate group1 to Kubernetes - https://phabricator.wikimedia.org/T340549 (Clement_Goubert)
[10:46:45] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Clement_Goubert)
[10:46:57] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Migrate group1 to Kubernetes - https://phabricator.wikimedia.org/T340549 (Clement_Goubert) Open→In progress
[12:32:31] and kubetcd2005
[13:14:21] serviceops, Data-Engineering, Event-Platform Value Stream (Sprint 14 B): Flink k8s operator in staging sometimes will not sync changes to FlinkDeployments -
https://phabricator.wikimedia.org/T340059 (JArguello-WMF)
[13:31:26] my ears are burning
[13:41:45] you should see a doctor
[13:41:46] :p
[13:42:13] serviceops, Foundational Technology Requests, Prod-Kubernetes, Kubernetes, Patch-For-Review: Post Kubernetes v1.23 cleanup - https://phabricator.wikimedia.org/T328291 (JArguello-WMF)
[13:42:32] o/
[13:43:12] Does this mean that once you've turned off pregen, the parsoid cluster will not do anything at all?
[13:43:23] Or will it still have a function for other parsing needs?
[13:43:55] duesen: we are asking because we see it being kind of busy
[13:44:02] straight VE edits?
[13:44:49] when we turn off pregen, the parsoid cluster will indeed have nothing to do.
[13:44:55] s/straight/direct/
[13:45:27] We could/should re-purpose it to serve the core html endpoints, in addition to the parsoid html endpoints
[13:45:55] We can use the computing power... when we remove storage from restbase, a lot of traffic will hit the mw rest endpoints
[13:46:12] akosiaris: VE is not using any REST endpoints anymore.
[13:46:27] serviceops, Foundational Technology Requests, Prod-Kubernetes, Kubernetes: etcd cluster reimage strategies to use with the K8s upgrade cookbook - https://phabricator.wikimedia.org/T330060 (JArguello-WMF)
[13:46:31] serviceops, Foundational Technology Requests, Prod-Kubernetes, Kubernetes: Kubernetes v1.23 use PKI for service-account signing (instead of cergen) - https://phabricator.wikimedia.org/T329826 (JArguello-WMF)
[13:46:33] The VE action API module talks to parsoid directly inside php. No external calls.
[13:46:36] duesen: it is not a matter of if we will move the parsoid servers to the jobrunner cluster, that is a no brainer
[13:46:54] duesen: ah, cool. thanks, I had forgotten that the migration was done
[13:47:02] \o/
[13:47:18] effie: I really expect that we'll need the boxes for handling the html endpoints in the future as well.
[13:47:19] I am just slightly puzzled with the current status, if we remove 4 more servers, we may overload the rest of them way too much
[13:47:23] till we finish the migration
[13:48:03] It's just that we have a second set of html endpoints (/v1/page/{title}/html and /v1/revision/{id}/html)
[13:48:49] if i understood akosiaris correctly, the idea was to move servers from the API cluster to Jobrunners. Not from the parsoid cluster.
[13:49:28] that's a correct understanding.
[13:49:29] The parsoid cluster is not going to see less traffic soon. When we turn off storage in restbase, html endpoints in MW will be hit more, not less.
[13:50:12] There is a meeting coming up to discuss this. It's in 40 minutes.
[13:50:18] Shall I invite any of you?
[13:50:53] yea, let me just do that
[13:51:08] consider yourself optional :)
[13:51:32] I missed the convo for the API cluster apparently
[13:52:15] lol ok
[13:54:21] duesen: that is in 2 hrs right?
[13:54:30] ah sorry
[13:54:51] I will try to join you yes
[13:55:44] We should figure out who will be joining the meeting in the future as well
[13:55:48] I think there's something I'm still not understanding correctly re: parsoid cluster usage.
[13:56:14] And by that I mean the parseXXXX servers
[13:56:51] "when we turn off pregen, the parsoid cluster will indeed have nothing to do" and "The parsoid cluster is not going to see less traffic soon. When we turn off storage in restbase, html endpoints in MW will be hit more" seem contradictory
[13:57:04] (fully aware this may be a lack of understanding of workflow there)
[14:02:08] claime: you are right, the way I said it is a contradiction. eventually, the parsoid endpoints will be unused, but we will be hitting similar html endpoints. and we will be hitting them more, since the storage layer in restbase goes away. so the parsoid cluster should be repurposed to serve them.
[14:02:43] duesen: ok that makes sense
[14:02:45] thanks
[14:03:03] me saying that "when we turn off pregen, the parsoid cluster will indeed have nothing to do" was misleading. forget about it :)
[14:03:26] I shared a lucidchart with you all.
[14:03:31] we can talk about it in the meeting
[14:07:09] "User is not assigned to this application." :(
[14:08:07] I sent a request through okta
[14:19:22] Hm, I assumed you would still be able to view, at least...
[14:31:25] serviceops, RESTbase Sunsetting, Parsoid (Tracking), Patch-For-Review: Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1001 for host mw1482.eqiad.wmnet with OS buster
[14:31:30] serviceops, RESTbase Sunsetting, Parsoid (Tracking), Patch-For-Review: Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1001 for host mw1483.eqiad.wmnet with OS buster
[14:31:36] serviceops, RESTbase Sunsetting, Parsoid (Tracking), Patch-For-Review: Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1001 for host mw1484.eqiad.wmnet with OS buster
[14:31:41] serviceops, RESTbase Sunsetting, Parsoid (Tracking), Patch-For-Review: Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1001 for host mw1484.eqiad.wmnet with OS buster
[14:31:45] serviceops, RESTbase Sunsetting, Parsoid (Tracking), Patch-For-Review: Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (ops-monitoring-bot)
Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1001 for host mw1485.eqiad.wmnet with OS buster
[14:31:55] serviceops, RESTbase Sunsetting, Parsoid (Tracking), Patch-For-Review: Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1001 for host mw1486.eqiad.wmnet with OS buster
[14:32:01] serviceops, RESTbase Sunsetting, Parsoid (Tracking), Patch-For-Review: Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1001 for host mw1484.eqiad.wmnet with OS buster executed...
[15:16:34] serviceops, RESTbase Sunsetting, Parsoid (Tracking): Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1001 for host mw1482.eqiad.wmnet with OS buster completed: - mw1482 (**WARN**)...
[15:19:26] serviceops, RESTbase Sunsetting, Parsoid (Tracking): Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1001 for host mw1483.eqiad.wmnet with OS buster completed: - mw1483 (**WARN**)...
[15:21:27] serviceops, RESTbase Sunsetting, Parsoid (Tracking): Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1001 for host mw1485.eqiad.wmnet with OS buster completed: - mw1485 (**WARN**)...
[15:24:01] serviceops, RESTbase Sunsetting, Parsoid (Tracking): Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1001 for host mw1484.eqiad.wmnet with OS buster completed: - mw1484 (**WARN**)...
[15:36:33] duesen: FYI, done adding 5 new hosts to the jobrunner cluster https://grafana.wikimedia.org/goto/8HM6B494k?orgId=1
[15:36:48] We'll see how it impacts backlog on template edit
[15:39:35] serviceops, RESTbase Sunsetting, Parsoid (Tracking): Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (Clement_Goubert) 5 servers moved from api_appserver to jobrunners: {F37123224}
[15:49:32] claime: there's an alert "Host parse1012 is not in mediawiki-installation dsh group", is that related to your machine shuffling?
[15:49:58] moritzm: Nope, it's related to it being pooled=invalid because it keeps flapping
[15:50:09] ok
[15:59:27] duesen: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/933060 was merged, any objections to me deploying changeprop with it?
[16:02:38] klausman: i have absolutely no clue...
[16:03:25] Well, that means I'm not messing up anything you're doing, which is all I need :)
[16:05:09] klausman: I am blissfully ignorant of changeprop :)
[16:05:35] claime, effie, vgutierrez: https://drive.google.com/file/d/1eSr_QSPnO09MslOI2BJR2BExX6Nc3x7Y/view?usp=drive_link
[16:07:34] cheers
[16:07:50] duesen: you got the cache hitrate already in the spreadsheet, let me know if I missed some interesting endpoint
[16:26:24] pushed changeprop to all three places, looking good (modulo the usual backlog bump)
[20:23:44] serviceops, Add-Link, Growth-Team, GrowthExperiments-NewcomerTasks, SRE: linkrecommendation kubernetes service is down with HTTP 504: "upstream request timeout" - https://phabricator.wikimedia.org/T340780 (Marostegui) Thank you!!
[20:28:58] serviceops, SRE, SRE-Access-Requests: Drop the `deploy-service` right, move three included users to `deployment` (or drop access)? - https://phabricator.wikimedia.org/T340165 (Dzahn)
[20:37:42] serviceops, SRE, SRE-Access-Requests: Drop the `deploy-service` right, move three included users to `deployment` (or drop access)? - https://phabricator.wikimedia.org/T340165 (Dzahn) Yes, the overlap in people is small, and at first it does seem to make sense to merge them. But the groups have prett...
[20:45:46] serviceops, SRE, SRE-Access-Requests: Drop the `deploy-service` right, move three included users to `deployment` (or drop access)? - https://phabricator.wikimedia.org/T340165 (RhinosF1) If deploy-service is used for k8s deploys, surely everyone in 'deployment' needs it with MediaWiki moving to k8s....
[20:50:02] serviceops, SRE, SRE-Access-Requests: Drop the `deploy-service` right, move three included users to `deployment` (or drop access)? - https://phabricator.wikimedia.org/T340165 (Dzahn) basically "deployment" is "mediawiki scap deployers" and deploy-service is "any service k8 deployers" and started as "...
[21:00:24] serviceops, SRE, SRE-Access-Requests: Drop the `deploy-service` right, move three included users to `deployment` (or drop access)? - https://phabricator.wikimedia.org/T340165 (taavi) >>! In T340165#8978137, @Dzahn wrote: > basically "deployment" is "mediawiki scap deployers" and deploy-service is "an...
[21:09:26] serviceops, SRE, SRE-Access-Requests: Drop the `deploy-service` right, move three included users to `deployment` (or drop access)? - https://phabricator.wikimedia.org/T340165 (taavi) > Should we drop this group? From my count[0] there are only three users in that group and not deployment, so it's jus...
[21:10:22] serviceops, SRE, SRE-Access-Requests, Patch-For-Review: Drop the `deploy-service` right, move three included users to `deployment` (or drop access)?
- https://phabricator.wikimedia.org/T340165 (Dzahn) per ` role/common/deployment_server/kubernetes.yaml` ` profile::admin::groups: - deployment...
[21:46:00] serviceops, SRE, SRE-Access-Requests, Patch-For-Review: Drop the `deploy-service` right, move three included users to `deployment` (or drop access)? - https://phabricator.wikimedia.org/T340165 (Jdlrobson) Fine with me to be removed from that group.
[21:48:21] serviceops, Data-Engineering, Event-Platform Value Stream, SRE-OnFire: Incident: 2022-12-09 api appserver worker starvation - https://phabricator.wikimedia.org/T324994 (JArguello-WMF)
[21:50:39] serviceops, Data-Engineering, Event-Platform Value Stream, SRE-OnFire, and 2 others: Uneven CPU throttling of eventgate-analytics under load - https://phabricator.wikimedia.org/T325068 (JArguello-WMF)