[07:12:39] serviceops, SRE: mw2420-mw2451 service implementation tracking - https://phabricator.wikimedia.org/T326363 (akosiaris)
[07:44:31] serviceops, SRE: mw2420-mw2451 service implementation tracking - https://phabricator.wikimedia.org/T326363 (akosiaris)
[07:51:29] serviceops, SRE: mw2420-mw2451 service implementation tracking - https://phabricator.wikimedia.org/T326363 (akosiaris)
[07:57:26] serviceops, SRE: mw2420-mw2451 service implementation tracking - https://phabricator.wikimedia.org/T326363 (akosiaris) p:Triage→Medium While I did provide data on specific racks, given our availability zones are centered around rows right now, I am gonna focus on rows. Looking at the data I note...
[08:52:53] serviceops, SRE: mw2420-mw2451 service implementation tracking - https://phabricator.wikimedia.org/T326363 (akosiaris) Playing around with data using the following constraints: * We are 40%+ skewed towards using row A across all mw2* hosts (this isn't easily fixable right now) * I can only easily mess a...
[09:14:09] serviceops, SRE: mw2420-mw2451 service implementation tracking - https://phabricator.wikimedia.org/T326363 (akosiaris)
[09:36:55] serviceops, Keyholder, Scap, serviceops-collab, Datacenter-Switchover: scap can not ssh with keyholder on deploy2002 - https://phabricator.wikimedia.org/T331568 (Clement_Goubert) [[ https://wikitech.wikimedia.org/wiki/Switch_Datacenter/DeploymentServer | Documentation ]] updated.
[09:39:54] find is another loop though, unless you use {} + to pass all args at once :P
[09:43:57] serviceops, Keyholder, Scap, serviceops-collab, Datacenter-Switchover: scap can not ssh with keyholder on deploy2002 - https://phabricator.wikimedia.org/T331568 (hashar) Open→Resolved Awesome thank you!
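The 09:39:54 remark about `find ... -exec cmd {} +` (and the xargs preference that follows) is about batching: instead of one command invocation per matched file, arguments are packed into as few invocations as the system's argument-length limit allows. The Python below is a simplified model of that behaviour, not how findutils or xargs are implemented; `batch_args`, `per_file` and the `max_bytes` budget are hypothetical stand-ins for the real, platform-specific limit.

```python
from typing import List


def batch_args(paths: List[str], max_bytes: int) -> List[List[str]]:
    """Model `find -exec cmd {} +` / xargs: pack paths into as few batches
    (command invocations) as possible, keeping each batch's argument bytes,
    counting one NUL terminator per argument, within max_bytes.
    A single path larger than max_bytes still gets its own batch here,
    whereas real tools would report an error."""
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for path in paths:
        cost = len(path.encode()) + 1  # argument bytes plus its NUL terminator
        if current and used + cost > max_bytes:
            batches.append(current)  # flush: this batch becomes one invocation
            current, used = [], 0
        current.append(path)
        used += cost
    if current:
        batches.append(current)
    return batches


def per_file(paths: List[str]) -> List[List[str]]:
    """Model `find -exec cmd {} \\;`: one invocation per matched path."""
    return [[p] for p in paths]


if __name__ == "__main__":
    files = [f"/tmp/log{i}.txt" for i in range(10)]
    print(len(per_file(files)), "invocations one-per-file")
    print(len(batch_args(files, 64)), "invocations when batched")
```

With ten 13-character paths and a 64-byte budget, the batched model needs 3 invocations instead of 10.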
[09:44:39] <_joe_> I prefer using xargs for those cases
[09:47:13] fair
[09:47:46] (so do I btw, I just wanted to throw some obscure find shit)
[10:12:46] serviceops, Performance-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432 (kostajh)
[10:37:39] <_joe_> claime: note for us - it seems that httpbb failed on mw on k8s when running during a deployment
[10:38:12] <_joe_> we should probably make some tests - it's possible we're being too aggressive in deployments
[10:38:25] That's possible
[10:40:27] Huh, upstream connect error
[10:40:31] Well wait a second
[10:40:43] because we reduced replicas in codfw by quite a bit
[10:40:49] And I don't think we rolled that back
[10:43:03] Hmm we did
[10:44:48] Adding a task to check it out
[10:48:20] <_joe_> <3
[10:48:43] I wonder what's the default deployment strat
[10:48:53] Because roll-restart is an env var
[10:51:13] jayme: you seem to have a vim open on /srv/deployment-charts/helmfile.d/services/rdf-streaming-updater/helmfile.yaml with a dirty change (removing atomic: true from the chart)
[10:51:36] dcausse: oof...sorry
[10:51:42] np! :)
[10:51:48] _joe_: Seeing the error I'd say we're not allowing for enough time to finish in-flight requests
[10:51:59] dcausse: fixed
[10:52:08] thanks :)
[10:52:17] dcausse: expect a version bump on deploy
[10:52:29] ok
[10:52:46] dcausse: reasoning can be found in https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/895765
[10:53:10] should be a noop apart from the version number as I've reverted my change
[10:54:39] thanks for taking care of this!
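On the 10:48:43 question about the default deployment strategy: a plain Kubernetes Deployment defaults to `RollingUpdate` with `maxUnavailable` and `maxSurge` both at 25%, where a percentage `maxUnavailable` is rounded down and a percentage `maxSurge` is rounded up. Whether the mw-on-k8s charts keep those defaults is not stated here, so treat the numbers below as an assumption. The sketch (hypothetical `rolling_update_bounds` helper) shows how many pods a rollout is allowed to take out of service for a given replica count, which matters more when replicas have been reduced.

```python
from dataclasses import dataclass


@dataclass
class RolloutBounds:
    max_unavailable: int  # pods that may be not-Ready during the rollout
    max_surge: int        # extra pods that may exist above the desired count
    min_available: int    # pods that must stay available at all times
    max_total: int        # upper bound on old + new pods at any moment


def rolling_update_bounds(replicas: int,
                          max_unavailable_pct: int = 25,
                          max_surge_pct: int = 25) -> RolloutBounds:
    """Absolute RollingUpdate bounds from percentage settings.

    Assumes the Kubernetes Deployment defaults (25% / 25%) unless overridden:
    maxUnavailable rounds down and maxSurge rounds up. Integer arithmetic
    avoids floating-point rounding surprises.
    """
    unavailable = (replicas * max_unavailable_pct) // 100   # floor
    surge = -((-replicas * max_surge_pct) // 100)           # ceiling
    return RolloutBounds(unavailable, surge,
                         replicas - unavailable, replicas + surge)


if __name__ == "__main__":
    for n in (1, 4, 10):
        print(n, rolling_update_bounds(n))
```

For example, `rolling_update_bounds(10)` gives maxUnavailable 2 and maxSurge 3, so at least 8 pods stay available while up to 13 exist. These bounds only limit pod counts; whether terminating pods finish their in-flight requests (the 10:51:48 suspicion) depends on their shutdown handling and grace period.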
[10:55:10] well...actually I didn't - but you're welcome :-D
[10:55:18] :)
[11:00:07] serviceops, MW-on-K8s: httpbb fails requesting mw-web during deployments - https://phabricator.wikimedia.org/T331609 (Clement_Goubert) p:Triage→Medium
[11:00:28] _joe_: ^
[11:14:36] serviceops, SRE: mw2420-mw2451 service implementation tracking - https://phabricator.wikimedia.org/T326363 (Clement_Goubert) That looks a lot better balanced even without touching row A skew, we wouldn't dip below 50% capacity in any cluster if we lose row A (which was the concern for jobrunners). We're...
[11:16:33] serviceops, MediaWiki-extensions-PropertySuggester, Wikidata, wdwb-tech, and 2 others: New Service Request SchemaTree - https://phabricator.wikimedia.org/T301471 (akosiaris) Any updates on this one? Per previous comment we were waiting on a merge, has this been done?
[11:17:07] serviceops, Wikidata, Wikidata-Query-Service, Wikidata.org, wdwb-tech: Query service maxlag calculation should exclude datacenters that don't receive traffic and where the updater is turned off - https://phabricator.wikimedia.org/T331405 (Lucas_Werkmeister_WMDE) > Can I take one step back and...
[11:27:41] serviceops, Machine-Learning-Team, SRE, Language-Team (Language-2023-January-March), Service-deployment-requests: New Service Deployment Request: NNLB-200 for machine translation - https://phabricator.wikimedia.org/T329971 (akosiaris) I 've transformed (roughly) this to a #service-deployment-...
[12:04:43] serviceops, ChangeProp, Content-Transform-Team-WIP, Page Content Service, and 3 others: Parsoid cache invalidation for mobile-sections seems not reliable - https://phabricator.wikimedia.org/T226931 (akosiaris) @jafroid enwiktionary done. Regarding de.wikivoyage.org, I see barely [1364](https://qu...
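The row-balancing discussion on T326363 (row A skew of 40%+, and the 11:14:36 check that no cluster drops below 50% capacity if row A is lost) reduces to a simple per-cluster calculation. The sketch below treats every host in a cluster as equal capacity, which is an assumption; the cluster names and host counts in `example_layout` are hypothetical and are not the real mw2* allocation.

```python
from typing import Dict, List

# cluster -> row -> host count
Layout = Dict[str, Dict[str, int]]


def capacity_after_row_loss(layout: Layout, lost_row: str) -> Dict[str, float]:
    """Fraction of each cluster's hosts that survive if `lost_row` goes down,
    assuming all hosts in a cluster contribute equal capacity."""
    result: Dict[str, float] = {}
    for cluster, rows in layout.items():
        total = sum(rows.values())
        surviving = total - rows.get(lost_row, 0)
        result[cluster] = surviving / total if total else 0.0
    return result


def clusters_below(layout: Layout, lost_row: str,
                   threshold: float = 0.5) -> List[str]:
    """Clusters whose surviving capacity would fall under `threshold`."""
    fractions = capacity_after_row_loss(layout, lost_row)
    return sorted(c for c, frac in fractions.items() if frac < threshold)


# Hypothetical numbers, for illustration only.
example_layout: Layout = {
    "appserver": {"A": 10, "B": 6, "C": 6, "D": 8},
    "jobrunner": {"A": 7, "B": 2, "C": 2, "D": 2},
}

if __name__ == "__main__":
    print(capacity_after_row_loss(example_layout, "A"))
    print(clusters_below(example_layout, "A"))
```

In this made-up layout, losing row A leaves the appserver cluster at two thirds of its hosts, but the jobrunner cluster keeps only 6 of 13 and is flagged, which is the kind of imbalance the rebalancing aims to avoid.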
[12:07:13] serviceops, ChangeProp, Content-Transform-Team-WIP, Page Content Service, and 3 others: Parsoid cache invalidation for mobile-sections seems not reliable - https://phabricator.wikimedia.org/T226931 (Jaifroid) Ah, that's very interesting! Must be heavier use of images, then! Good to know that it's...
[12:08:21] serviceops, SRE, Abstract Wikipedia team (Phase λ – Launch), Service-deployment-requests: New Service Request: function-orchestrator and function-evaluator (for Wikifunctions launch) - https://phabricator.wikimedia.org/T297314 (MatthewVernon)
[13:22:34] serviceops, Wikidata, Wikidata-Query-Service, Wikidata.org, wdwb-tech: Query service maxlag calculation should exclude datacenters that don't receive traffic and where the updater is turned off - https://phabricator.wikimedia.org/T331405 (dcausse) WDQS lag issues should be rare now, a node no...
[14:26:49] check out -sre chat. There's a complaint of 412 responses to VE edits on officewiki. I'm starting to suspect, with my limited knowledge of all related things, that this might be dc-switching related.
[14:27:00] no firm evidence though
[14:46:39] serviceops, Parsoid, RESTBase: HTTP 412 Errors when editing Officewiki - https://phabricator.wikimedia.org/T331629 (akosiaris)
[14:46:48] serviceops, Parsoid, RESTBase: HTTP 412 Errors when editing Officewiki - https://phabricator.wikimedia.org/T331629 (akosiaris) p:Triage→High
[14:55:58] serviceops, MediaWiki-REST-API, Parsoid: HTTP 412 Errors when editing Officewiki - https://phabricator.wikimedia.org/T331629 (ssastry) Officewiki does not use RESTBase.
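For context on the 412 reports: HTTP 412 Precondition Failed is what a server returns when a conditional request's precondition, such as an `If-Match` ETag, no longer holds. The sketch below is a generic illustration of those semantics, not a model of MediaWiki, Parsoid or VisualEditor, and the log does not establish which precondition fails on officewiki. `Store`, `Resource` and the ETag scheme are hypothetical. It shows one pattern that yields "first save works, second save fails": a client reusing an ETag obtained before its first write.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Resource:
    body: str

    def etag(self) -> str:
        # Hypothetical content-derived strong ETag; real servers pick their own scheme.
        return '"' + hashlib.sha1(self.body.encode()).hexdigest()[:12] + '"'


class Store:
    def __init__(self) -> None:
        self.pages: Dict[str, Resource] = {}

    def get(self, title: str) -> Tuple[int, str, str]:
        page = self.pages[title]
        return 200, page.body, page.etag()

    def put(self, title: str, body: str,
            if_match: Optional[str]) -> Tuple[int, Optional[str]]:
        """Conditional write following If-Match semantics:
        '*' matches any existing representation, any other value must equal
        the current ETag, otherwise the server answers 412."""
        page = self.pages.get(title)
        if if_match is not None:
            if page is None:
                return 412, None  # nothing current to match against
            if if_match != "*" and if_match != page.etag():
                return 412, None  # stale ETag: precondition failed
        self.pages[title] = Resource(body)
        return 200, self.pages[title].etag()


if __name__ == "__main__":
    store = Store()
    store.pages["Foo"] = Resource("v1")
    _, _, tag = store.get("Foo")
    print(store.put("Foo", "v2", if_match=tag)[0])  # first edit succeeds
    print(store.put("Foo", "v3", if_match=tag)[0])  # same old tag is now stale
```

Re-fetching the resource, and therefore its current ETag, before the second write makes it succeed again.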
[14:56:06] serviceops, MediaWiki-REST-API, Parsoid: HTTP 412 Errors when editing Officewiki - https://phabricator.wikimedia.org/T331629 (akosiaris)
[15:50:13] serviceops, MediaWiki-REST-API, Parsoid, Patch-For-Review: HTTP 412 Errors when editing Officewiki - https://phabricator.wikimedia.org/T331629 (daniel) FWIW, when I tried to edit an existing page on officewiki, it worked fine. When I tried to edit the same page again, it failed with a 412.
[15:51:04] serviceops, MediaWiki-REST-API, Parsoid, Patch-For-Review: HTTP 412 Errors when editing Officewiki - https://phabricator.wikimedia.org/T331629 (daniel) a:daniel
[16:09:49] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Marostegui)
[16:19:07] serviceops, Prod-Kubernetes, Kubernetes: Re-enable seccomProfile in cert-manager chart after k8s 1.23 migration completed - https://phabricator.wikimedia.org/T325620 (JMeybohm) a:JMeybohm
[16:24:11] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Marostegui)
[16:27:33] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate charts away from deprecated typology annotations - https://phabricator.wikimedia.org/T325066 (JMeybohm) a:JMeybohm
[16:31:43] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Post Kubernetes v1.23 cleanup - https://phabricator.wikimedia.org/T328291 (JMeybohm)
[16:33:53] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Post Kubernetes v1.23 cleanup - https://phabricator.wikimedia.org/T328291 (JMeybohm)
[16:53:05] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Post Kubernetes v1.23 cleanup - https://phabricator.wikimedia.org/T328291 (JMeybohm)
[17:55:19] serviceops, Performance-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432 (Jdforrester-WMF)