[00:22:20] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2423.codfw.wmnet with OS buster completed: - mw2423 (**PASS**) - Removed from Pupp... [00:24:57] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mw2424.codfw.wmnet with OS buster [00:27:25] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10Papaul) [00:41:12] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mw2425.codfw.wmnet with OS buster [01:22:43] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2425.codfw.wmnet with OS buster completed: - mw2425 (**PASS**) - Removed from Pupp... [01:22:45] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2424.codfw.wmnet with OS buster completed: - mw2424 (**PASS**) - Removed from Pupp... 
[01:23:19] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mw2426.codfw.wmnet with OS buster [01:28:01] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mw2427.codfw.wmnet with OS buster [01:29:25] 10serviceops, 10ChangeProp, 10Content-Transform-Team-WIP, 10Page Content Service, and 3 others: Parsoid cache invalidation for mobile-sections seems not reliable - https://phabricator.wikimedia.org/T226931 (10Brycehughes) @akosiaris We'll see how it goes over the the next few weeks. Really appreciate the o... [01:36:36] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10Papaul) [02:10:13] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2426.codfw.wmnet with OS buster completed: - mw2426 (**PASS**) - Removed from Pupp... [02:11:13] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2427.codfw.wmnet with OS buster completed: - mw2427 (**PASS**) - Removed from Pupp... 
[02:11:38] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mw2428.codfw.wmnet with OS buster [02:11:58] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10Papaul) [02:56:55] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2428.codfw.wmnet with OS buster completed: - mw2428 (**PASS**) - Removed from Pupp... [03:12:49] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10Papaul) [03:47:21] 10serviceops, 10Commons, 10MediaWiki-File-management, 10SRE, and 3 others: Frequent "Error: 429, Too Many Requests" errors on pages with many (>50) thumbnails - https://phabricator.wikimedia.org/T266155 (10Samwilson) Related Community Wishlist Survey proposal: [[https://meta.wikimedia.org/wiki/Community_Wi... [07:02:49] 10serviceops, 10DBA, 10Data-Engineering, 10Data-Persistence, and 9 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (10Marostegui) [09:30:10] 10serviceops, 10Data-Persistence, 10Discovery-Search, 10SRE, and 2 others: March 2023 Datacenter Switchover Excluded services - https://phabricator.wikimedia.org/T329193 (10Clement_Goubert) [09:41:19] 10serviceops, 10SRE, 10CommRel-Specialists-Support (Jan-Mar-2023), 10Datacenter-Switchover: CommRel support for March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T328287 (10Clement_Goubert) I'll let @LSobanski answer authoritatively for Phabricator and Etherpad. We are not switching over... 
[10:34:08] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host mc2052.codfw.wmnet with OS bullseye [10:34:34] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host mc-gp1001.eqiad.wmnet with OS bullseye [11:08:39] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host mc2052.codfw.wmnet with OS bullseye completed: - mc2052 (**PASS**) - Downtimed on Icinga/Alertmanager - Disa... [11:20:12] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host mc-gp1001.eqiad.wmnet with OS bullseye executed with errors: - mc-gp1001 (**FAIL**) - Downtimed on Icinga/Aler... [11:21:03] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host mc-gp1001.eqiad.wmnet with OS bullseye [11:50:54] Sorry to trouble you, but I think I'm going to need some help understanding the way we link k8s infrastructure_users to ServiceAccounts and RoleBindings. It's about trying to allow shell users (on stat boxes) to create SparkApplication objects in the 'spark' namespace on the dse-k8s cluster. 
[11:52:09] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host mc-gp1001.eqiad.wmnet with OS bullseye executed with errors: - mc-gp1001 (**FAIL**) - Removed from Puppet and... [11:52:34] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host mc-gp1001.eqiad.wmnet with OS bullseye [11:56:26] I've created infrastructure users called `spark` and `spark-deploy` in `hieradata/common/profile/kubernetes.yaml` in the private repo, but I haven't yet created a kubectl config for these on the deployment server. https://github.com/wikimedia/operations-puppet/blob/production/hieradata/common/profile/kubernetes/deployment_server.yaml#L272-L290 [11:57:27] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host mc-gp1001.eqiad.wmnet with OS bullseye executed with errors: - mc-gp1001 (**FAIL**) - Removed from Puppet and... [11:57:45] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host mc-gp1001.eqiad.wmnet with OS bullseye [11:58:38] The reason for that is that I think users (members of the `analytics-privatedata-users` POSIX group) will be using these accounts from the stat boxes instead of the deployment servers. [12:04:56] btullis: first of all: You only need to add infrastructure_users entries for service accounts you need *outside* of the cluster. 
Therefore I think the "spark-operator" entry is not required (and I also don't know what cfss-issuer is for - assuming it's not needed as well) [12:05:53] the ServiceAccount objects that charts create are usually bound to pods of that chart, meaning they can access the k8s api with credentials bound to that service account [12:08:52] think of infrastructure_users as a static list of tokens (subject to change in the future) that humans and tools can use to identify themselves to the k8s api from outside the cluster [12:09:43] if you want to stick to how we do it with wikikube, you probably just need a "spark" (and its corresponding "spark-deployment") user [12:11:06] helmfile.d/admin_ng/helmfile_namespaces.yaml will take care of rolebindings etc. for that user(s) in the "spark" namespace [12:11:54] jayme: OK, thanks. So you're saying that in this case we won't need the spark-operator or spark-operator-deploy users because we will be using the admin tokens to deploy the spark-operator service. Right? [12:12:42] So I can remove both https://github.com/wikimedia/operations-puppet/blob/production/hieradata/common/profile/kubernetes/deployment_server.yaml#L283-L286 and the entry with the two tokens in the private repo. [12:16:13] Right. helmfile.d/admin_ng/helmfile_namespaces.yaml is starting to make more sense now. [12:17:23] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host mc-gp1001.eqiad.wmnet with OS bullseye executed with errors: - mc-gp1001 (**FAIL**) - Removed from Puppet and... [12:18:52] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host mc-gp1001.eqiad.wmnet with OS buster [12:25:38] btullis: yes, exactly. 
The operator you will deploy with admin credentials and it then uses a service account managed inside of the cluster to do its thing [12:28:28] there is some documentation around that at https://wikitech.wikimedia.org/wiki/Kubernetes/Add_a_new_service - feel free to amend/extend :) [12:33:01] Great, thanks. I'll deploy this https://gerrit.wikimedia.org/r/c/operations/puppet/+/887994 and then delete them from the private repo. [12:34:38] Now I'll have a crack at granting the rights to create SparkApplication objects to the spark service account. I'll worry about how to get the kubectl config onto the stat boxes (and any necessary ferm stuff) another time. [12:49:06] btullis: I still wonder: What service account will the "spark-driver" use? [12:50:10] and will there be one spark-driver and one spark-executor per spark job? Or are they shared? [12:52:51] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host mc-gp1001.eqiad.wmnet with OS buster completed: - mc-gp1001 (**PASS**) - Removed from Puppet and PuppetDB if p... [12:53:12] The spark-driver is intended to be run in the same security context as the executors, so I would expect that it uses `spark-serviceaccount` - equivalent to a posix user running a process under their own id. [12:55:25] There will be one spark-driver per job, but the spark-driver spawns many executor pods. [12:57:27] This is equivalent to the way that spark runs on hadoop/YARN, when a user launches a job in 'cluster mode'. This creates a spark-driver process on one yarn worker node, which then creates a bunch of executor processes across the rest of the cluster, until the job is finished. [12:57:38] btullis: but how would the spark-driver know which service account to use? 
[12:58:18] also, there is no connection between a service-account and the user that a process in a container runs as [12:58:47] the service-account is only used for reference in authentication/authorization to the k8s api [13:00:28] The service-account for the driver to use is passed as an attribute in the spec of the SparkApplication resource. Like this: https://phabricator.wikimedia.org/T318926#8389971 [13:02:40] I think buried deep in the CRD here? https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/855674/30/charts/spark-operator/templates/crds/sparkoperator.k8s.io_sparkapplications.yaml#1309 [13:02:54] btullis: ah, I see. Tbh I think I would use a separate service-account for that and not re-use "spark-deploy" user for that [13:04:06] the service accounts "spark" and "spark-deploy" are managed by infrastructure_users and helmfile.d/admin_ng/helmfile_namespaces.yaml and are really intended for "outside of cluster" use [13:04:57] for the spark-driver you could easily create a dedicated service-account (inside the helm chart) which you can manage yourself [13:05:50] that way the credentials of the software are uncoupled from the credentials of the humans [13:05:56] ...if that makes sense [13:06:40] (ofc. it is absolutely possible that I'm missing something here) [13:08:40] Gotcha. In fact that phab link I sent isn't the latest. The comment on the spark-chart CR is the latest. I am creating a serviceaccount within the chart called 'spark-serviceaccount' and that's the one that I've been using in the SparkApplication kubectl file. https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/855674/30/charts/spark-operator/templates/serviceaccount-spark.yaml [13:09:55] ah, I see [13:10:01] Sorry, I made things less clear by linking to an old definition. 
Here's the latest test job I've been sending: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/855674/comments/a6b93275_de54a412 [13:10:54] ...but you were totally right earlier, I don't think that this 'spark-serviceaccount' has the rights to create SparkApplication objects in the 'spark' namespace. [13:12:01] and that's probably fine [13:12:15] as the SparkApplication objects are created by humans, AIUI [13:12:37] thus the "spark-deploy" service-account (via infrastructure_users) [13:22:47] Sorry, do you mean that I should add another rolebinding for the spark-serviceaccount or that you don't think it's necessary? My head is starting to spin a bit with it. [13:23:51] 10serviceops, 10SRE, 10CommRel-Specialists-Support (Jan-Mar-2023), 10Datacenter-Switchover: CommRel support for March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T328287 (10LSobanski) - GitLab failover requires a ~1.5h maintenance window during which GitLab will be unavailable. - We won'... [13:26:51] neither of that :) - sorry for being so confusing. I did not really get all levels of operator, driver and executor in the first place [13:28:10] I think your chart should create 2 service accounts: spark-operator and spark-driver which clearly indicate which component uses them [13:29:28] then there will be the infrastructure_users "spark" and "spark-deploy" which will be managed by helmfile.d/admin_ng [13:30:39] OK, cool. Won't the executors also end up using the spark-driver serviceaccount, given that the driver spawns those pods? That would be confusing? Should I call the serviceaccount spark-app or something? [13:30:40] spark-deploy you will need to grant the permission to create SparkApplication objects (in addition to the default permissions) via helmfile_namespaces.yaml [13:31:26] that depends on how the driver spawns the pod really... 
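For readers following the RBAC part of this exchange: a minimal sketch of the extra permission discussed above (letting "spark-deploy" create SparkApplication objects in the "spark" namespace). This is not how helmfile.d/admin_ng actually renders it. The API group `sparkoperator.k8s.io` and resource `sparkapplications` are inferred from the CRD filename linked above; the role name `sparkapplication-creator`, the verb list, and the assumption that the infrastructure user appears to RBAC as a `User` subject are all hypothetical and would need checking against the real admin_ng templates.

```python
import json


def role_and_binding(namespace, role_name, subject_name, subject_kind="User"):
    """Build a namespaced Role and RoleBinding for SparkApplication objects.

    All names passed in are illustrative; nothing here is taken from the
    real deployment-charts repository.
    """
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [
            {
                # Group/resource inferred from the CRD file name in the log.
                "apiGroups": ["sparkoperator.k8s.io"],
                "resources": ["sparkapplications"],
                # Verb list is an assumption; trim to what is really needed.
                "verbs": ["create", "get", "list", "watch", "delete"],
            }
        ],
    }

    if subject_kind == "ServiceAccount":
        # ServiceAccount subjects are namespaced and carry no apiGroup.
        subject = {"kind": "ServiceAccount", "name": subject_name,
                   "namespace": namespace}
    else:
        # User and Group subjects live in the RBAC API group.
        subject = {"kind": subject_kind, "name": subject_name,
                   "apiGroup": "rbac.authorization.k8s.io"}

    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-{subject_name}",
                     "namespace": namespace},
        "subjects": [subject],
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                    "kind": "Role", "name": role_name},
    }
    return role, binding


if __name__ == "__main__":
    for manifest in role_and_binding("spark", "sparkapplication-creator",
                                     "spark-deploy"):
        print(json.dumps(manifest, indent=2))
```

The printed JSON documents are valid Kubernetes manifests in principle. The in-chart "spark-driver" service account discussed above would get its own, separate binding with `subject_kind="ServiceAccount"`, which keeps the software's credentials apart from the human-facing ones.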
[13:32:00] ideally it does not specify a serviceaccount for the executor pods as those don't need any special permissions (AIUI) [13:32:57] OK, will try using the spark-driver serviceaccount. [13:33:04] Thanks so much. [13:36:21] Sure. I left a naming comment on the CR. Maybe that makes it a bit less confusing for our future selves to argue about :) [13:54:44] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mw2429.codfw.wmnet with OS buster [13:58:13] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mw2430.codfw.wmnet with OS buster [14:14:48] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mw2431.codfw.wmnet with OS buster [14:21:45] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mw2432.codfw.wmnet with OS buster [14:27:36] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host mc2053.codfw.wmnet with OS bullseye [14:40:16] 10serviceops, 10Data-Persistence, 10Discovery-Search, 10SRE, and 2 others: March 2023 Datacenter Switchover Excluded services - https://phabricator.wikimedia.org/T329193 (10Clement_Goubert) >>! 
In T327920#8570661, @bd808 wrote: > #Toolhub does not have a working Kubernetes deployment outside of eqiad ({T28... [14:44:56] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2429.codfw.wmnet with OS buster completed: - mw2429 (**PASS**) - Removed from Pupp... [14:45:04] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2430.codfw.wmnet with OS buster completed: - mw2430 (**PASS**) - Removed from Pupp... [14:46:48] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mw2433.codfw.wmnet with OS buster [14:47:04] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mw2434.codfw.wmnet with OS buster [15:02:53] 10serviceops, 10SRE, 10CommRel-Specialists-Support (Jan-Mar-2023), 10Datacenter-Switchover: CommRel support for March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T328287 (10Clement_Goubert) While not directly linked to the switchover as it does not have a codfw deployment, Toolhub will p... [15:03:20] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host mc2053.codfw.wmnet with OS bullseye completed: - mc2053 (**PASS**) - Downtimed on Icinga/Alertmanager - Disa... 
[15:07:23] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host mc2054.codfw.wmnet with OS bullseye [15:08:36] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2432.codfw.wmnet with OS buster completed: - mw2432 (**PASS**) - Removed from Pupp... [15:09:10] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mw2435.codfw.wmnet with OS buster [15:11:13] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2431.codfw.wmnet with OS buster executed with errors: - mw2431 (**FAIL**) - Remove... [15:31:49] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2434.codfw.wmnet with OS buster completed: - mw2434 (**PASS**) - Removed from Pupp... 
[15:39:38] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mw2431.codfw.wmnet with OS buster [15:39:40] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host mc-gp1001.eqiad.wmnet with OS bullseye [15:39:49] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2431.codfw.wmnet with OS buster executed with errors: - mw2431 (**FAIL**) - Remove... [15:41:11] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mw2431.codfw.wmnet with OS buster [15:42:13] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host mc2054.codfw.wmnet with OS bullseye completed: - mc2054 (**PASS**) - Downtimed on Icinga/Alertmanager - Disa... [15:43:06] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2433.codfw.wmnet with OS buster executed with errors: - mw2433 (**FAIL**) - Remove... 
[15:51:57] 10serviceops, 10MW-on-K8s, 10SRE, 10SRE Observability: Ingest php-slowlog in logstash - https://phabricator.wikimedia.org/T326794 (10Clement_Goubert) Dashboard available: https://logstash.wikimedia.org/app/dashboards#/view/74557260-a88f-11ed-96bb-4b4732aa077a [15:55:23] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host mc2055.codfw.wmnet with OS bullseye [16:05:28] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2435.codfw.wmnet with OS buster executed with errors: - mw2435 (**FAIL**) - Remove... [16:09:25] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host mc2055.codfw.wmnet with OS bullseye executed with errors: - mc2055 (**FAIL**) - Downtimed on Icinga/Alertmanag... [16:09:46] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host mc2055.codfw.wmnet with OS bullseye [16:10:34] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host mc-gp1001.eqiad.wmnet with OS bullseye completed: - mc-gp1001 (**PASS**) - Downtimed on Icinga/Alertmanager... 
[16:37:27] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2431.codfw.wmnet with OS buster executed with errors: - mw2431 (**FAIL**) - Remove... [16:44:31] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host mc2055.codfw.wmnet with OS bullseye completed: - mc2055 (**PASS**) - Removed from Puppet and PuppetDB if prese... [17:36:38] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10jijiki) [17:44:06] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mw2431.codfw.wmnet with OS buster [17:45:36] 10serviceops, 10Data-Persistence, 10Toolhub, 10Datacenter-Switchover: Toolhub does not have a working Kubernetes deployment outside of eqiad - https://phabricator.wikimedia.org/T329319 (10bd808) [17:51:31] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mw2433.codfw.wmnet with OS buster [17:54:40] 10serviceops, 10Data-Persistence, 10Toolhub, 10Datacenter-Switchover: Toolhub does not have a working Kubernetes deployment outside of eqiad - https://phabricator.wikimedia.org/T329319 (10bd808) Some IRC discussion from 2023-01-30 in the #wikimedia-serviceops channel: `lang=irc [16:36:45] I think I... 
[17:54:44] 10serviceops, 10Data-Persistence, 10Toolhub, 10Datacenter-Switchover: Toolhub does not have a working Kubernetes deployment outside of eqiad - https://phabricator.wikimedia.org/T329319 (10bd808) @JMeybohm and @Clement_Goubert Do you expect the aux cluster to be ready for the Toolhub workload in time to mak... [17:58:31] 10serviceops, 10Data-Persistence, 10Toolhub, 10Datacenter-Switchover: Toolhub does not have a working Kubernetes deployment outside of eqiad - https://phabricator.wikimedia.org/T329319 (10bd808) @Gehel is there planned maintenance for the search-chi-eqiad cluster during the time that codfw is the active DC... [17:59:37] 10serviceops, 10Data-Persistence, 10Toolhub, 10Datacenter-Switchover: Toolhub does not have a working Kubernetes deployment outside of eqiad - https://phabricator.wikimedia.org/T329319 (10bd808) @Marostegui is there planned maintenance for the m5 cluster during the time that codfw is the active DC that act... [18:01:08] 10serviceops, 10Data-Persistence, 10Toolhub, 10Datacenter-Switchover: Toolhub does not have a working Kubernetes deployment outside of eqiad - https://phabricator.wikimedia.org/T329319 (10bd808) [18:08:59] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mw2435.codfw.wmnet with OS buster [18:22:40] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host mc-gp2001.codfw.wmnet with OS bullseye [18:32:55] 10serviceops, 10DC-Ops, 10ops-codfw, 10ops-eqiad: Update iDRAC and NIC firmware on mc-gp* hosts - https://phabricator.wikimedia.org/T329323 (10jijiki) [18:33:05] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 
(10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2433.codfw.wmnet with OS buster completed: - mw2433 (**PASS**) - Removed from Pupp... [18:33:09] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2431.codfw.wmnet with OS buster completed: - mw2431 (**PASS**) - Removed from Pupp... [18:34:46] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10Papaul) [18:49:33] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mw2435.codfw.wmnet with OS buster completed: - mw2435 (**PASS**) - Removed from Pupp... [18:56:03] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host mc-gp2001.codfw.wmnet with OS bullseye completed: - mc-gp2001 (**PASS**) - Downtimed on Icinga/Alertmanager... [18:56:05] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10Papaul) [18:56:55] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw, 10ops-eqiad: Update iDRAC and NIC firmware on mc-gp* hosts - https://phabricator.wikimedia.org/T329323 (10Reedy) [19:20:50] 10serviceops, 10Performance-Team: Rewrite mw-warmup.js in Python - https://phabricator.wikimedia.org/T288867 (10RLazarus) [19:29:24] 10serviceops, 10Performance-Team: Rewrite mw-warmup.js in Python - https://phabricator.wikimedia.org/T288867 (10RLazarus) @Krinkle Are you aware of any current uses of `warmup.js` //besides// the DC switchover automation? 
Anywhere else I need to maintain compatibility, or adapt either humans or software to cal... [19:36:16] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw: Q3:rack/setup/install mw2420-mw2451 - https://phabricator.wikimedia.org/T326362 (10Jhancock.wm) [19:36:27] 10serviceops, 10Performance-Team: Rewrite mw-warmup.js in Python - https://phabricator.wikimedia.org/T288867 (10Krinkle) >>! In T288867#8602756, @RLazarus wrote: > @Krinkle Are you aware of any current uses of `warmup.js` //besides// the DC switchover automation? Anywhere else I need to maintain compatibility,... [19:46:44] 10serviceops, 10Performance-Team: Rewrite mw-warmup.js in Python - https://phabricator.wikimedia.org/T288867 (10RLazarus) Perfect, thanks! [20:18:05] 10serviceops, 10Data-Persistence, 10Toolhub, 10Datacenter-Switchover: Toolhub does not have a working Kubernetes deployment outside of eqiad - https://phabricator.wikimedia.org/T329319 (10Marostegui) >>! In T329319#8602423, @bd808 wrote: > @Marostegui is there planned maintenance for the m5 cluster during... 
[20:31:53] 10serviceops, 10Data-Persistence, 10Toolhub, 10Datacenter-Switchover: Toolhub does not have a working Kubernetes deployment outside of eqiad - https://phabricator.wikimedia.org/T329319 (10bd808) p:05Triage→03High [20:32:05] 10serviceops, 10Data-Persistence, 10Toolhub, 10Datacenter-Switchover: Toolhub does not have a working Kubernetes deployment outside of eqiad - https://phabricator.wikimedia.org/T329319 (10bd808) [21:17:02] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw, 10ops-eqiad: Update iDRAC and NIC firmware on mc-gp* hosts - https://phabricator.wikimedia.org/T329323 (10jijiki) [21:17:05] 10serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (10jijiki) [22:19:53] 10serviceops, 10ChangeProp, 10Content-Transform-Team-WIP, 10Page Content Service, and 3 others: Parsoid cache invalidation for mobile-sections seems not reliable - https://phabricator.wikimedia.org/T226931 (10Brycehughes) @akosiaris what if we just dd a dummy-edit bot for every page on Wikivoyage? It's Wik... [22:36:15] 10serviceops, 10Data-Persistence, 10Discovery-Search, 10SRE, and 2 others: March 2023 Datacenter Switchover Excluded services - https://phabricator.wikimedia.org/T329193 (10bd808) >>! In T329193#8601521, @Clement_Goubert wrote: >>>! In T327920#8570661, @bd808 wrote: >> #Toolhub does not have a working Kube...