[00:14:02] serviceops, Data-Persistence, SRE, Datacenter-Switchover, and 2 others: March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T327920 (Dzahn)
[04:01:06] serviceops, ChangeProp, Content-Transform-Team-WIP, Page Content Service, and 3 others: Parsoid cache invalidation for mobile-sections seems not reliable - https://phabricator.wikimedia.org/T226931 (Brycehughes) @akosiaris thanks a ton for setting that up and running it. So, that solves my proble...
[07:38:18] serviceops, ChangeProp, Content-Transform-Team-WIP, Page Content Service, and 3 others: Parsoid cache invalidation for mobile-sections seems not reliable - https://phabricator.wikimedia.org/T226931 (akosiaris) >>! In T226931#8670016, @Jaifroid wrote: > Just to report, as promised, that the Kiwix...
[07:45:05] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Marostegui)
[07:50:21] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Marostegui)
[08:04:26] serviceops, SRE, Patch-For-Review: kubernetes102[34] implemetation tracking - https://phabricator.wikimedia.org/T313874 (akosiaris) @jijiki, I +1ed the above, but we also lack a homer patch to instruct the routers to peer with the nodes. See https://gerrit.wikimedia.org/r/c/operations/homer/public/+/...
[08:21:42] serviceops, ChangeProp, Content-Transform-Team-WIP, Page Content Service, and 3 others: Parsoid cache invalidation for mobile-sections seems not reliable - https://phabricator.wikimedia.org/T226931 (Jaifroid) @akosiaris Thank you - that sounds very positive. Possibly a common template, as you say...
[08:23:30] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Marostegui)
[08:26:52] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Marostegui)
[08:27:25] serviceops, ChangeProp, Content-Transform-Team-WIP, Page Content Service, and 3 others: Parsoid cache invalidation for mobile-sections seems not reliable - https://phabricator.wikimedia.org/T226931 (akosiaris) >>! In T226931#8671506, @Jaifroid wrote: > @akosiaris Thank you - that sounds very posi...
[08:29:00] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Marostegui)
[08:31:31] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Marostegui)
[08:34:22] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update wikikube eqiad to k8s 1.23 - https://phabricator.wikimedia.org/T331126 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=6354fd03-fd3c-49db-ac21-75e88be10633) set by akosiaris@cumin1001 for 1 day, 0:00:00 on 23 host...
[08:35:50] serviceops, ChangeProp, Content-Transform-Team-WIP, Page Content Service, and 3 others: Parsoid cache invalidation for mobile-sections seems not reliable - https://phabricator.wikimedia.org/T226931 (Jaifroid) OK, we can live with that, given that over time the pages will get edited, hopefully, an...
[08:39:22] serviceops, ChangeProp, Content-Transform-Team-WIP, Page Content Service, and 3 others: Parsoid cache invalidation for mobile-sections seems not reliable - https://phabricator.wikimedia.org/T226931 (akosiaris) >>! In T226931#8671559, @Jaifroid wrote: > If you had to target only one other Wikivoy...
[08:40:23] serviceops, ChangeProp, Content-Transform-Team-WIP, Page Content Service, and 3 others: Parsoid cache invalidation for mobile-sections seems not reliable - https://phabricator.wikimedia.org/T226931 (Jaifroid) Agreed! Many thanks!
[08:52:39] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (MoritzMuehlenhoff)
[08:53:59] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (elukey)
[08:59:50] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Marostegui)
[09:20:09] <_joe_> akosiaris: are we changing the IP ranges for the k8s pods?
[09:20:27] yes
[09:20:29] <_joe_> if so, we need to check DB grants/firewall rules
[09:20:39] Amir fixed all of these already
[09:20:42] <_joe_> was that already done?
[09:20:44] <_joe_> ok perfect
[09:21:06] yeah, we triggered it on the codfw upgrade and he was kind enough to fix it for eqiad too
[09:29:24] any clue on why changeprop@eqiad could have started to process events again? https://grafana.wikimedia.org/d/CbmStnlGk/jobqueue-job?orgId=1&var-dc=eqiad%20prometheus%2Fk8s&var-job=cirrusSearchElasticaWrite&from=now-3h&to=now
[09:29:42] for a short period apparently
[09:31:44] <_joe_> I frankly doubt it was actually processing jobs
[09:32:01] <_joe_> but if it was, I would assume it was some form of delayed execution?
[09:32:12] <_joe_> anyways, the jobs will be routed to the correct datacenter
[10:21:09] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Update Kubernetes clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[10:23:06] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Update Kubernetes clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[10:23:32] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: New Kubernetes nodes may end up with no Pod IPv4 block assigned - https://phabricator.wikimedia.org/T296303 (JMeybohm) Open→Resolved a:JMeybohm I'm going to close this as resolved now since we're using calico v3.23 everywhere...
[10:24:58] serviceops, Prod-Kubernetes, Kubernetes: Update staging-codfw to k8s 1.23 - https://phabricator.wikimedia.org/T326340 (JMeybohm)
[10:25:03] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Update Kubernetes clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[10:50:32] hi folks! I'd like to add a new port to the tls-proxy for inference.d.w via https://gerrit.wikimedia.org/r/c/operations/puppet/+/894014. Is there anything specific that I need to do besides merging?
[11:03:16] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Update Kubernetes clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[11:05:45] elukey: no, you are good to go
[11:07:02] <3
[11:19:00] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Define priorityClassName for istio and cert-manager deployments - https://phabricator.wikimedia.org/T310618 (JMeybohm) a:JMeybohm
[11:21:32] serviceops, Prod-Kubernetes, Kubernetes: Update wikikube eqiad to k8s 1.23 - https://phabricator.wikimedia.org/T331126 (JMeybohm)
[11:21:34] serviceops, SRE, Patch-For-Review: kubernetes102[34] implemetation tracking - https://phabricator.wikimedia.org/T313874 (JMeybohm)
[11:24:13] o/ who should we consult with about a new usage of kafka main https://phabricator.wikimedia.org/T325303#8672061 ?
[11:25:00] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (aborrero)
[11:25:40] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (aborrero)
[11:26:05] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (aborrero) Sent a ping to @Marostegui regarding clouddb[1013-1014,1021] Also @Andrew regarding cloudservices host, but I think the host can be tak...
[11:32:47] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Marostegui) @aborrero regarding clouddb* hosts, it is up to your team but I think it would be nice if you could depool them. Better user experienc...
[12:25:05] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Define priorityClassName for istio and cert-manager deployments - https://phabricator.wikimedia.org/T310618 (JMeybohm) Open→Resolved
[12:25:13] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Update Kubernetes clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[12:28:47] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Update Kubernetes clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[12:33:20] ottomata: o/ I think that is a matter of traffic volume, how many partitions etc.. I'd say that if you could come up with some numbers and final specs (how many partitions, consumers, etc..) it should be fine
[12:34:34] Jumbo is a bigger cluster but it also runs 10g nics, so any huge increment in bw returned wouldn't cause any issues.. Not that the new page change stream should cause it, but let's double check
[12:34:51] (maybe page_content_change could? Not sure)
[12:34:52] i think i have numbers from research on kafka stretch, will paste them into the ticket
[12:35:14] super, after that if everything looks reasonable there shouldn't be blockers
[12:47:35] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (BTullis)
[12:55:15] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (fnegri) @Marostegui @aborrero the patch above should depool clouddb1013 and clouddb1014. I don't think clouddb1021 can be depooled easily as it l...
[12:57:41] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (BTullis)
[13:17:14] serviceops, SRE, Thumbor, Thumbor Migration, User-jijiki: Upgrade Thumbor to Buster - https://phabricator.wikimedia.org/T216815 (jnuche) @akosiaris thanks for the feedback. Just to clarify, we can work around the issue currently, but it makes the frequent Scap self-update process more error-...
[13:28:21] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 09), Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (Ottomata)
[13:29:47] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 09), Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (Ottomata)
[13:46:31] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (cmooney)
[13:56:52] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (BTullis)
[13:58:36] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (jbond)
[13:59:11] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (jbond)
[13:59:47] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (MatthewVernon)
[14:10:00] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=f4ffc353-a529-4620-994f-ae7b737f3c7a) set by cmooney@cumin1001 for 2:00:00 on 238...
[14:15:11] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 09), Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (JMeybohm) https://www.mediawiki.org/wiki/Platform_Engineering_Team/Event_Platform_...
[14:17:11] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 10 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=0a07bba2-0f50-4eec-9718-0c768add34f3) set by cmooney@cumin1001 for 2:00:00 on 1 h...
[14:42:04] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 10 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (jbond)
[14:50:20] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 10 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (cmooney) Happy to say the upgrade went as expected, no issues encountered. All devices now back online running 21.4R3-S1.5.
[14:52:33] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 10 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Andrew) the following hosts paged during this maintenance: ` NodeDown wmcs cloudvirt1023:9100 (node eqiad) NodeDown wmcs cloudvirt1024:9100 (nod...
[14:54:39] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 10 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (MoritzMuehlenhoff)
[14:58:58] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 10 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (BTullis)
[15:05:40] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 09), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (Ottomata) > Is that equivalent in k8s (having a CPU request...
[15:06:27] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 09), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (Ottomata) https://nightlies.apache.org/flink/flink-kubernete...
[15:10:46] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (cmooney) >>! In T329073#8672931, @Andrew wrote: > the following hosts paged during this maintenance: > > > ` > NodeDown wmcs cloudvirt1023:9100...
[15:13:24] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (MoritzMuehlenhoff)
[15:26:46] serviceops, Prod-Kubernetes, Kubernetes: Update wikikube eqiad to k8s 1.23 - https://phabricator.wikimedia.org/T331126 (akosiaris)
[15:31:17] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (BTullis)
[16:06:38] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (colewhite)
[16:36:46] serviceops, Prod-Kubernetes, Kubernetes: Update wikikube eqiad to k8s 1.23 - https://phabricator.wikimedia.org/T331126 (akosiaris)
[17:06:57] serviceops, MW-on-K8s, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): Build MediaWiki images for kubernetes on the deployment servers - https://phabricator.wikimedia.org/T297673 (thcipriani) In progress→Resolved I believe this is happening now. Just noticed that this tas...
[18:35:31] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Remove the .Values.kubernetesApi hack - https://phabricator.wikimedia.org/T326729 (JMeybohm) a:JMeybohm