[10:08:53] serviceops, Data-Catalog, Data-Engineering, SRE, and 2 others: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (BTullis) OK @JMeybohm I've created three CRs that I think should do what we need to finish this. * Adding CNAME records to DNS * Adding service catalog entries...
[11:17:38] serviceops, Data-Catalog, Data-Engineering, SRE, and 2 others: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (JMeybohm) Cool, thanks! +1ed the first two. The service::catalog entries should be in stage production before switching trafficserver to the discovery record ju...
[14:01:58] serviceops, SRE, Traffic: fawiki user reports getting 503 errors with message "upstream connect error or disconnect before headers" - https://phabricator.wikimedia.org/T310450 (CDanis) This error message comes from [[ https://www.envoyproxy.io/ | Envoy ]], which we use for internal cross-service TLS...
[14:06:02] serviceops, Prod-Kubernetes, Kubernetes: Define priorityClassName for istio and cert-manager deployments - https://phabricator.wikimedia.org/T310618 (JMeybohm)
[14:18:31] serviceops, Prod-Kubernetes, Kubernetes: Update Kubernetes clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[15:52:10] yo, wondering if anyone has any history on the thumbor machines that are currently pooled (most likely mutante?). I see in https://phabricator.wikimedia.org/T285477 that we built and pooled thumbor200[56] and thumbor100[56] last year
[15:52:24] but we also have thumbor100[12] and thumbor200[34]
[15:53:07] all 4 of those rather old machines are pooled, and we saw that the service was dealing with serious load and some failing probes when thumbor2004 was down yesterday
[15:53:26] thumbor100[34] were decommissioned when the new hosts were added
[15:53:46] was it the plan to only have 2 thumbor nodes running with new hardware in each DC or something else?
[15:58:20] serviceops, Infrastructure-Foundations, Scap, Patch-For-Review, Release-Engineering-Team (Deployment Autopilot 🛩️): Use scap to deploy itself to scap targets - https://phabricator.wikimedia.org/T303559 (TheresNoTime) Ref the [[ https://gerrit.wikimedia.org/r/c/mediawiki/tools/scap/+/805406 |...
[16:05:13] serviceops, SRE, Thumbor, Thumbor Migration: Migrate thumbor to Kubernetes - https://phabricator.wikimedia.org/T233196 (hnowlan)
[16:15:01] serviceops, Infrastructure-Foundations, Scap, Patch-For-Review, Release-Engineering-Team (Deployment Autopilot 🛩️): Use scap to deploy itself to scap targets - https://phabricator.wikimedia.org/T303559 (dancy) @TheresNoTime Thanks for the report!
[16:20:15] serviceops, Infrastructure-Foundations, Scap, Patch-For-Review, Release-Engineering-Team (Deployment Autopilot 🛩️): Use scap to deploy itself to scap targets - https://phabricator.wikimedia.org/T303559 (jnuche) hi @TheresNoTime, thank you for your comment! The scap operation running in beta...
[17:40:12] hnowlan: there was at least https://phabricator.wikimedia.org/T273137#7146136, but it seems the decom went ahead half a year later
[17:41:21] serviceops, SRE, Shellbox: Shellbox resource management - https://phabricator.wikimedia.org/T310557 (CDanis) My 2 cents: * Allowing Shellbox to burst beyond its cpu limit seems like the right first, easy thing to try. There's little risk to enabling this for a few services, and (AFAIK?) depending o...
[21:21:15] serviceops, VisualEditor, Parsoid (Tracking), Patch-For-Review, and 2 others: Preemptively warm caches for Parsoid output - https://phabricator.wikimedia.org/T301371 (ssastry)