[01:00:02] serviceops, MediaWiki-REST-API, Parsoid, MW-1.40-notes (1.40.0-wmf.27; 2023-03-13), and 2 others: HTTP 412 Errors when editing Officewiki - https://phabricator.wikimedia.org/T331629 (ssastry) Open→Resolved
[09:43:32] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (fgiunchedi)
[09:55:11] serviceops, MW-on-K8s, SRE, Traffic: Find a sensible way to redirect traffic to mw-on-k8s - https://phabricator.wikimedia.org/T331318 (Joe) After some thought, I think the most maintainable way to do this is to add an additional lua module to the maps for api/appservers. Specifically, this would...
[10:03:59] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (fgiunchedi)
[10:09:50] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Migrate testwikidata to Kubernetes - https://phabricator.wikimedia.org/T331268 (Clement_Goubert)
[10:10:02] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Clement_Goubert)
[10:24:46] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 10), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (Joe) Hi, I have a few questions, if the plan is we move this...
[10:33:12] serviceops, SRE, Patch-For-Review: mw2420-mw2451 service implementation tracking - https://phabricator.wikimedia.org/T326363 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=c5ba1cf2-f027-43f9-8672-b4eb30f98ddc) set by cgoubert@cumin1001 for 1:00:00 on 32 host(s) and their services w...
[11:17:02] serviceops, SRE: mw2420-mw2451 service implementation tracking - https://phabricator.wikimedia.org/T326363 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=17f33514-0b87-4f50-abfa-6cd2e1548410) set by cgoubert@cumin1001 for 5:00:00 on 32 host(s) and their services with reason: new_instal...
[11:41:38] I'd like to increase the namespace limits for thumbor on CPU - generally the findings from recent experiments are that we just need more replicas https://gerrit.wikimedia.org/r/899654
[11:41:52] however, since increasing the queue length in k8s I am tempted to also push that further
[11:42:13] Making thumbor *slower* isn't something I'd love to do per se, but if the alternative is not serving then it's obviously a lot better
[11:42:17] and it might be a compromise
[11:44:40] if you look at the "HAProxy eqiad k8s" section you can see that our 503s spike when our queues fill up (naturally)
[11:44:43] https://grafana-rw.wikimedia.org/d/Pukjw6cWk/thumbor?forceLogin&from=1678964674900&orgId=1&to=1678967035770
[12:03:48] also I have a strong suspicion that slower/heavier tasks are a big part of it, given that they will have a disproportionate impact on k8s workers when there are fewer of them in general
[12:08:50] serviceops, SRE: mw2420-mw2451 service implementation tracking - https://phabricator.wikimedia.org/T326363 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=f7f64d19-c64a-4fb5-a8ab-f3218dfd9862) set by cgoubert@cumin1001 for 1:00:00 on 32 host(s) and their services with reason: new_instal...
[12:12:01] hnowlan: did you do a quick calculation of whether we even have an additional 35 CPUs available?
[13:23:32] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (fgiunchedi)
[14:09:44] serviceops, Prod-Kubernetes, Wikidata, Wikidata-Query-Service, wdwb-tech: Write and adapt Runbooks and cookbooks related to the WDQS Streaming Updater and kubernetes - https://phabricator.wikimedia.org/T293063 (Gehel) p:Triage→High
[14:35:46] jayme: I did a bit, I suspect we don't
[14:50:06] serviceops, SRE: mw2420-mw2451 service implementation tracking - https://phabricator.wikimedia.org/T326363 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=33992616-b446-4bc5-bf17-27cb8c47e8d7) set by cgoubert@cumin1001 for 1:00:00 on 32 host(s) and their services with reason: new_instal...
[15:27:19] hnowlan: yeah, that was my assumption as well :)
[15:46:59] serviceops, SRE: mw2420-mw2451 service implementation tracking - https://phabricator.wikimedia.org/T326363 (Clement_Goubert) All done ` {"mw2422.codfw.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=codfw,cluster=api_appserver,service=nginx"} {"mw2423.codfw.wmnet": {"weight": 30, "pooled": "yes"}, "...
[18:41:30] serviceops, SRE, Traffic, VPS-project-Codesearch, Patch-For-Review: Consider using BindsTo instead of Requires to declare dependencies between systemd unit - https://phabricator.wikimedia.org/T284555 (BCornwall)
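The thumbor thread above reasons that 503s spike once the queues fill up, and that slow or heavy requests hurt more when there are fewer workers. A toy model makes that relationship concrete. This is a minimal, hypothetical sketch only: it is not how thumbor, HAProxy or Kubernetes actually schedule requests. The one-arrival-per-tick traffic pattern, the `simulate`/`Result` names and the treatment of every rejected request as a 503 are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Result:
    served: int    # requests that started processing on a worker
    rejected: int  # requests turned away because the queue was full (a "503" in this toy model)


def simulate(service_times: list[int], workers: int, queue_limit: int) -> Result:
    """Hypothetical discrete-time model, not thumbor's or HAProxy's real scheduling.

    One request arrives per tick; service_times[i] is how many ticks request i
    occupies a worker. Each worker handles one request at a time. Requests that
    find every worker busy wait in a FIFO queue of at most `queue_limit` entries;
    arrivals beyond that are rejected. Requests still queued at the end are
    counted as neither served nor rejected.
    """
    busy: list[int] = []         # remaining ticks for each in-flight request
    queue: deque[int] = deque()  # service times of waiting requests
    served = rejected = 0

    for cost in service_times:
        # Advance in-flight work by one tick; requests with 1 tick left finish now.
        busy = [t - 1 for t in busy if t > 1]
        # Hand waiting requests to any workers that just became free.
        while queue and len(busy) < workers:
            busy.append(queue.popleft())
            served += 1
        # Admit the new arrival: straight to a worker, into the queue, or reject it.
        if len(busy) < workers:
            busy.append(cost)
            served += 1
        elif len(queue) < queue_limit:
            queue.append(cost)
        else:
            rejected += 1

    return Result(served, rejected)


if __name__ == "__main__":
    print(simulate([1] * 10, workers=2, queue_limit=2))  # light requests
    print(simulate([4] * 10, workers=2, queue_limit=2))  # heavy requests, few workers
    print(simulate([4] * 10, workers=4, queue_limit=2))  # heavy requests, more workers
    print(simulate([4] * 10, workers=2, queue_limit=4))  # heavy requests, longer queue
```

In this model, two workers keep up with cheap one-tick requests but start rejecting once every request costs four ticks; doubling the workers removes the rejections, while lengthening the queue instead turns them into queued (that is, slower) requests over this short window. That mirrors the "more replicas" versus "longer queue" trade-off discussed in the log, under the toy assumptions above.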