[08:25:44] serviceops, MW-on-K8s: Running maintenance scripts in screen on `deploy1003` appears to fail, but is still running - https://phabricator.wikimedia.org/T400962#11056688 (Clement_Goubert) >>! In T400962#11054567, @jrbs wrote: >>>! In T400962#11053266, @Clement_Goubert wrote: >> `mwscript-k8s` does not need...
[09:32:27] serviceops, MW-Interfaces-Team, OKR-Work, Patch-For-Review: Identify and configure rest.php routes in REST gateway - https://phabricator.wikimedia.org/T400132#11056855 (Clement_Goubert) #mw-interfaces-team This is the list of routes we've found in the spec files, would it be possible to get eyes...
[10:39:20] serviceops, Prod-Kubernetes, Wikimedia-production-error: etcdserver: mvcc: database space exceeded - https://phabricator.wikimedia.org/T401107 (jijiki) NEW p:Triage→Unbreak!
[10:49:21] serviceops, Prod-Kubernetes, Patch-For-Review, Wikimedia-production-error: etcdserver: mvcc: database space exceeded - https://phabricator.wikimedia.org/T401107#11057113 (Clement_Goubert) Quick dump of my investigation so far: - There's a [[ https://grafana.wikimedia.org/goto/EOBnZCwHR?orgId=1 |...
[11:22:38] serviceops, Prod-Kubernetes, Patch-For-Review, Wikimedia-production-error: etcdserver: mvcc: database space exceeded - https://phabricator.wikimedia.org/T401107#11057167 (jijiki)
[14:08:51] serviceops: wikikube-ctrl200[4-5] implementation tracking - https://phabricator.wikimedia.org/T390861#11057635 (Clement_Goubert) p:Triage→Medium a:jasmine_
[14:16:35] serviceops, Prod-Kubernetes, Patch-For-Review, Wikimedia-production-error: etcdserver: mvcc: database space exceeded - https://phabricator.wikimedia.org/T401107#11057685 (Clement_Goubert) Following a repeat of the incident triggered by the next `scap backport`, we've increased `etcd`'s `quota-bac...
[14:20:13] serviceops, MediaWiki-extensions-OAuth, MediaWiki-Platform-Team, Technical-Debt, Upstream: Migrate OAuth extension back from wikimedia/oauth2-server fork to upstream - https://phabricator.wikimedia.org/T261462#11057742 (Tgr) I think we should fix this by the time we switch to PHP 8.3. It woul...
[14:56:07] serviceops, Sustainability (Incident Followup): Page only if videoscalers are unavailable for longer than the default time - https://phabricator.wikimedia.org/T338220#11057894 (jijiki) Open→Invalid moved to k8s
[15:49:33] serviceops, decommission-hardware: decommission mwmaint1002.eqiad.wmnet and mwmaint2002.codfw.wmnet - https://phabricator.wikimedia.org/T400442#11058066 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jasmine@cumin1003 for hosts: `mwmaint1002.eqiad.wmnet` - mwmaint1002.eqiad.wmnet (**PA...
[16:04:37] serviceops, decommission-hardware: decommission mwmaint1002.eqiad.wmnet and mwmaint2002.codfw.wmnet - https://phabricator.wikimedia.org/T400442#11058124 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jasmine@cumin1003 for hosts: `mwmaint2002.codfw.wmnet` - mwmaint2002.codfw.wmnet (**PA...
[16:07:04] serviceops, decommission-hardware: decommission mwmaint1002.eqiad.wmnet and mwmaint2002.codfw.wmnet - https://phabricator.wikimedia.org/T400442#11058136 (jasmine_) a:jasmine_→None
[16:17:17] serviceops, MoveComms-Support, Datacenter-Switchover: MoveComms support for Southward DC Switchover (September 2025) - https://phabricator.wikimedia.org/T399894#11058178 (jasmine_) a:jasmine_→None
[17:14:10] serviceops, Prod-Kubernetes, Patch-For-Review, Wikimedia-production-error: etcdserver: mvcc: database space exceeded - https://phabricator.wikimedia.org/T401107#11058388 (jijiki) Digging deeper with @Clement_Goubert and @Scott_French, we found that a vast number of mw-script jobs was created (not...
[17:23:56] serviceops, Prod-Kubernetes, Patch-For-Review, Wikimedia-production-error: etcdserver: mvcc: database space exceeded - https://phabricator.wikimedia.org/T401107#11058439 (jijiki) p:Unbreak!→High
[19:13:27] serviceops, MediaWiki-Engineering: Prepare WMF PHP 8.3 packages for bullseye - https://phabricator.wikimedia.org/T398245#11058822 (Scott_French) Many thanks to @Krinkle for validating xhprof as a suitable alternative for tideways in T400109. My understanding is that, once profiling-related code in media...
[19:50:37] serviceops, MW-on-K8s, Sustainability (Incident Followup): mw-scripts SAL integration - https://phabricator.wikimedia.org/T376776#11058927 (RLazarus) Open→Resolved Implemented and [[ https://wikitech.wikimedia.org/w/index.php?title=Maintenance_scripts&diff=prev&oldid=2329864 | documented...
[19:52:11] Hi, I just applied a helmfile change to eventgate-analytics. deployed fine in staging and codfw. In eqiad it has been hanging for 13 mins.
[19:53:26] hm, I wonder if that's further fallout from T401107 -- taking a look
[19:53:38] 4 of the pods are still 40d old, most have been restarted
[19:54:10] I also had a mwscript-k8s invocation take an unusually long time just now, so it's not just you
[19:54:32] aye k...and in the meantime my session disconnected (and I did not use a screen or tmux :o oops)
[19:55:58] something is unsticking!
[19:56:52] looks like it finished.
[19:57:01] rzl: can I do eventgate-main? or should I hold?
[19:57:25] feel free, I'm poking around but nothing that precludes you from trying
[19:59:47] okay
[20:01:30] noting for anyone following along, mwscript-cleanup was started at about 17:00, and is still going -- that's why the LIST percentile latency gets crappy right then and stays that way
[20:02:17] that's either causing the whole cluster to slow down (unlikely IMO, but not ruling it out) or it's affected by it and showing up disproportionately in the stats
[20:02:23] eventgate-main deployed as normal.
[20:02:32] thank you rzl
[20:02:44] good! I didn't do anything, but happy to claim the credit
[20:02:57] thanks for flying serviceops
[20:07:24] :)
[20:37:36] rzl: i need to rollback eventgate-analytics. eqiad is now failing but not hanging.
[20:37:43] Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress
[20:39:02] STATUS: pending-upgrade
[20:39:08] maybe because my session died before?
[20:45:25] yeah, that's likely
[20:45:41] safest bet is to `helm rollback` and then roll forward -- do you know how to dot hat?
[20:45:45] *to do that
[20:46:39] i haven't done it in a long time
[20:48:06] no problem
[20:48:56] ottomata: here's what I see for the release history https://www.irccloud.com/pastebin/hTedLJWH/
[20:49:20] double-check me on the cluster, service, and release name please :) then, if it sounds good to you, I'll roll back to revision 34
[20:49:39] eventgate-analytics, eqiad, production
[20:49:42] revision 35 is the one that didn't go out completely about an hour ago
[20:49:50] okay, that should be good.
[20:50:24] done
[20:50:49] thank you so much rzl . I'm sorry i have to run. I did not expect this deploy to go on this long.
[20:50:53] cwhite: ^ should be good.
[20:51:12] ottomata: no problem! note your upgrade earlier today still did not happen, you'll need to roll forward if you want to
[20:51:37] that's fine, i merged a change to rollback to the same image version that i think r35 should be
[20:51:45] i need a mw core deployment before i go forward again.
[20:51:45] thank you!
[21:09:27] serviceops, MediaWiki-Core-Profiler, WikimediaDebug: Switch wmf-config/Profiler from Tideways to XHProf - https://phabricator.wikimedia.org/T401152 (Krinkle) NEW
[21:09:42] serviceops, MediaWiki-Core-Profiler, WikimediaDebug: Switch wmf-config/Profiler from Tideways to XHProf - https://phabricator.wikimedia.org/T401152#11059226 (Krinkle)
[21:09:47] serviceops, MediaWiki-Core-Profiler, WikimediaDebug: Switch wmf-config/Profiler from Tideways to XHProf - https://phabricator.wikimedia.org/T401152#11059229 (Krinkle)
[21:10:22] serviceops, MediaWiki-Core-Profiler, Documentation, Patch-For-Review: Migrate use of php-tideways_xhprof to php-xhprof - https://phabricator.wikimedia.org/T348379#11059232 (Krinkle)
[21:18:48] serviceops, MediaWiki-Core-Profiler, WikimediaDebug: Switch wmf-config/Profiler from Tideways to XHProf - https://phabricator.wikimedia.org/T401152#11059277 (Krinkle)
[21:19:01] serviceops, MediaWiki-Engineering: Prepare WMF PHP 8.3 packages for bullseye - https://phabricator.wikimedia.org/T398245#11059278 (Scott_French) Open→Resolved Alright, that should be everything tracked here. Separately, I'll rebuild the (as-yet unused) php8.3 production images to pick up 8.3....
[21:19:06] serviceops, MediaWiki-Core-Profiler, MediaWiki-Platform-Team, WikimediaDebug: Switch wmf-config/Profiler from Tideways to XHProf - https://phabricator.wikimedia.org/T401152#11059283 (Krinkle)
[21:52:56] serviceops, MediaWiki-Core-Profiler, MediaWiki-Platform-Team, WikimediaDebug: Switch wmf-config/Profiler from Tideways to XHProf - https://phabricator.wikimedia.org/T401152#11059363 (Scott_French) Thanks for filing this @Krinkle. So, for the production use case, the switch from installing php-ti...
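
Note on the 20:37 to 20:51 exchange: the recovery discussed there (a release left in `pending-upgrade` after an interrupted session, fixed by rolling back to the last good revision) can be sketched with plain Helm commands. This is a minimal sketch under stated assumptions, not the exact procedure used in the channel. The log names only the service (`eventgate-analytics`), cluster (`eqiad`), release (`production`), and target revision (34); the namespace below is an assumption, and `rollback_plan` is a hypothetical helper that only prints the commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch: print the Helm commands for recovering a release that is
# stuck in "pending-upgrade". Nothing is executed against a cluster.

RELEASE="production"              # release name, as stated in the log
NAMESPACE="eventgate-analytics"   # ASSUMPTION: namespace named after the service
GOOD_REVISION=34                  # last fully deployed revision, per the log

# rollback_plan RELEASE NAMESPACE REVISION  (hypothetical helper)
rollback_plan() {
    # 1. Inspect the revision history and confirm which revision is stuck.
    echo "helm history $1 --namespace $2"
    # 2. Roll back to the last good revision; this is the usual way to
    #    clear an "another operation is in progress" state.
    echo "helm rollback $1 $3 --namespace $2"
    # 3. Check the status the release reports after the rollback.
    echo "helm status $1 --namespace $2"
}

rollback_plan "$RELEASE" "$NAMESPACE" "$GOOD_REVISION"
```

The printed commands are meant to be reviewed and run by hand against the intended cluster (eqiad in this case). Rolling forward afterwards is an ordinary deploy of the desired change.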