[00:01:21] Phabricator, LDAP-Access-Requests, SRE, SRE-Access-Requests, Patch-For-Review: Migrate dev user accounts for bvibber - https://phabricator.wikimedia.org/T358044#9570656 (Bugreporter) >>! In T358044#9569964, @Peachey88 wrote: >>>! In T358044#9569601, @bvibber wrote: >> That's probably the way...
[00:05:00] Project beta-update-databases-eqiad build #74109: ABORTED in 45 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/74109/
[00:17:49] Phabricator-Bot-Requests, Release-Engineering-Team (Now this 🫠), Incident Tooling, User-brennen: Create "corto" Phabricator bot account for Corto - https://phabricator.wikimedia.org/T355758#9570702 (brennen) For my own future reference: Followed instructions at https://www.mediawiki.org/wiki/Phab...
[00:37:28] Phabricator-Bot-Requests, Release-Engineering-Team (Now this 🫠), Incident Tooling, User-brennen: Create "corto" Phabricator bot account for Corto - https://phabricator.wikimedia.org/T355758#9570740 (brennen) I've added the bot to trusted-contributors and WMF-NDA, although on the latter I'm a litt...
[01:05:00] Project beta-update-databases-eqiad build #74110: ABORTED in 45 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/74110/
[02:05:00] Project beta-update-databases-eqiad build #74111: ABORTED in 45 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/74111/
[03:05:00] Project beta-update-databases-eqiad build #74112: ABORTED in 45 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/74112/
[03:42:23] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.41.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T330214#9570969 (Jdlrobson)
[04:05:00] Project beta-update-databases-eqiad build #74113: ABORTED in 45 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/74113/
[05:05:00] Project beta-update-databases-eqiad build #74114: ABORTED in 45 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/74114/
[05:53:51] Release-Engineering-Team (Seen), scap2: Eliminate symlinks in mediawiki-config (as much as possible) - https://phabricator.wikimedia.org/T126306#9571027 (Joe) Open→Invalid This task was about HHVM-specific issues. Feel free to reopen if you think it's still valid.
[06:05:00] Project beta-update-databases-eqiad build #74115: ABORTED in 45 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/74115/
[07:05:00] Project beta-update-databases-eqiad build #74116: ABORTED in 45 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/74116/
[07:51:07] Phabricator, Release-Engineering-Team (Kanban): task-series broken because of error: invalid type "custom.release.date" - https://phabricator.wikimedia.org/T219192#9571088 (Littleggghost)
[07:53:33] Phabricator, Release-Engineering-Team (Kanban): task-series broken because of error: invalid type "custom.release.date" - https://phabricator.wikimedia.org/T219192#9571091 (Littleggghost)
[07:56:11] Continuous-Integration-Infrastructure, SRE, collaboration-services, vm-requests: Ganeti VM for contint migration - https://phabricator.wikimedia.org/T358237#9571093 (LSobanski)
[08:05:00] Project beta-update-databases-eqiad build #74117: ABORTED in 45 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/74117/
[08:41:18] good morning hashar! wikibase CI is blocked by the cache corruption issue again on mwgate-node18-docker. could you please nuke the cache again? :)
[08:42:01] e.g.
https://integration.wikimedia.org/ci/job/mwgate-node18-docker/20558/console
[09:00:21] Release-Engineering-Team, castor, ci-test-error: Wikibase CI blocked by castor cache corruption issue - https://phabricator.wikimedia.org/T358312#9571232 (Jakob_WMDE)
[09:03:05] Phabricator, Release-Engineering-Team (Kanban): task-series broken because of error: invalid type "custom.release.date" - https://phabricator.wikimedia.org/T219192#9571249 (Peachey88)
[09:05:00] Project beta-update-databases-eqiad build #74118: ABORTED in 45 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/74118/
[10:05:00] Project beta-update-databases-eqiad build #74119: ABORTED in 45 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/74119/
[10:05:13] Phabricator, LDAP-Access-Requests, SRE, SRE-Access-Requests, Patch-For-Review: Migrate dev user accounts for bvibber - https://phabricator.wikimedia.org/T358044#9571399 (ayounsi) a:ayounsi
[10:24:50] Release-Engineering-Team (Now this 🫠), Scap, MW-on-K8s, SRE, serviceops: Find a way to address canary releases directly - https://phabricator.wikimedia.org/T358117#9571471 (Clement_Goubert) >>! In T358117#9570020, @thcipriani wrote: > ... > Running httpbb against an mwdebug server before roll...
[10:49:04] o/ hi from ml-team, I need some help with a 500 error when CI is pushing the model image.
here is the log: https://integration.wikimedia.org/ci/job/inference-services-pipeline-revertrisk-multilingual-publish/69/execution/node/59/log/ - I have no clue what caused the error
[11:05:00] Project beta-update-databases-eqiad build #74120: ABORTED in 45 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/74120/
[11:08:03] Beta-Cluster-Infrastructure: beta-update-databases-eqiad job times out - https://phabricator.wikimedia.org/T358329#9571610 (TheresNoTime)
[11:08:41] Beta-Cluster-Infrastructure: beta-update-databases-eqiad job times out - https://phabricator.wikimedia.org/T358329#9571623 (TheresNoTime) related to {T358236} ?
[11:13:34] Project beta-update-databases-eqiad build #74121: ABORTED in 36 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/74121/
[11:46:08] Beta-Cluster-Infrastructure: deployment-webperf21 puppet failure: Could not find class role::webperf::processors_and_site - https://phabricator.wikimedia.org/T358332#9571699 (TheresNoTime)
[11:55:17] Project beta-update-databases-eqiad build #74122: ABORTED in 35 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/74122/
[12:06:26] Could someone stop the timer for `beta-update-databases-eqiad` until whatever is wrong is sorted, else it'll be going all weekend >.<
[12:30:16] TheresNoTime: 👋 just did that
[12:30:32] thank you :)
[12:30:34] !log temporarily disabled https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/ until it can be troubleshooted
[12:30:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:41:01] Project beta-update-databases-eqiad build #74123: ABORTED in 21 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/74123/
[12:41:45] (^^ last build was still running, disregard)
[12:54:04] aiko: has the size of the images you're pushing in that job increased significantly?
it's possible you're hitting a layer limit
[12:54:37] I'd suggest you ask in #-serviceops, RelEng doesn't really maintain the docker registry and probably they'll be able to help you better
[13:09:28] TheresNoTime, jnuche: is there a task for broken beta CI
[13:09:49] T358329
[13:09:49] T358329: beta-update-databases-eqiad job times out - https://phabricator.wikimedia.org/T358329
[13:49:11] jnuche: yes, the size of the image increased by 2G compared to the previous one
[13:49:40] thanks! I'll ask in service ops
[14:07:35] Continuous-Integration-Infrastructure, SRE, collaboration-services, vm-requests: Ganeti VM for contint migration - https://phabricator.wikimedia.org/T358237#9572029 (hashar) > What do you want to use as the host name, something like zuul1001? I'd go with `contint1003`. Daniel mentioned using the...
[14:13:03] Continuous-Integration-Infrastructure, SRE, collaboration-services, vm-requests: Ganeti VM for contint migration - https://phabricator.wikimedia.org/T358237#9572032 (ayounsi) For testing hosts I'd prefer running on private IPs as those tend to have puppet disabled for longer period of time and "e...
[14:46:13] Release-Engineering-Team (Now this 🫠), Patch-For-Review: gitlab-cloud-runner: Roll back pending helm releases before running terraform apply - https://phabricator.wikimedia.org/T354787#9572127 (CodeReviewBot) sandeeps updated https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/...
[14:57:55] !log sudo rm -fR /srv/castor/castor-mw-ext-and-skins/master/mwgate-node18-docker/ # T358312
[14:58:21] hm, no stashbot around?
[14:58:30] Apparently not. Will post manually.
[14:58:47] Release-Engineering-Team, castor, ci-test-error: Wikibase CI blocked by castor cache corruption issue - https://phabricator.wikimedia.org/T358312#9572169 (Jdforrester-WMF) I've run `sudo rm -fR /srv/castor/castor-mw-ext-and-skins/master/mwgate-node18-docker/` on integration-castor05.integration.eqiad...
[15:00:57] wb stashbot
[15:16:29] Thanks for fixing that, Lucas_WMDE.
[15:16:59] and thank you for fixing the castor cache :)
[15:18:37] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.41.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T330214#9572227 (Jdforrester-WMF)
[15:29:31] Release-Engineering-Team, castor, ci-test-error: Wikibase CI blocked by castor cache corruption issue - https://phabricator.wikimedia.org/T358312#9572302 (Lucas_Werkmeister_WMDE) Open→Resolved a:Jdforrester-WMF Seems to be working now, thank you!
[15:29:54] Something is wrong with gerritbot in https://phabricator.wikimedia.org/T300334#9572243, why is the highlighted box a completely unrelated thing? Or is it just phab automagic matching change numbers?
[15:32:59] Nikerabbit: change numbers are long enough now that phabricator's automatic linking logic sometimes thinks they're short commit hashes. it was discussed here a few days ago, but I don't recall if anyone had any good solutions
[15:33:54] gotcha
[15:34:26] we can migrate to svn and start again from number 1
[15:34:30] Or GitLab.
[15:34:33] * James_F coughs.
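Editor's note on the autolinking issue discussed at 15:32:59: a change number made only of decimal digits is also a valid hexadecimal string, so once it is long enough it can pass for an abbreviated commit hash. The Python sketch below illustrates that collision only. The 7-character minimum, the function names, and the prefix-matching rule are assumptions made for illustration, not Phabricator's actual linking code. The hash prefix `10060129ea6d` is taken from the `rMW10060129ea6d` example that comes up later in this log; the second hash is made up.

```python
import re

# Assumption for illustration: any run of 7 to 40 lowercase hex characters is
# treated as a possible abbreviated commit hash. This is NOT taken from
# Phabricator's source; it only models the behaviour described in the chat.
SHORT_HASH_RE = re.compile(r"[0-9a-f]{7,40}")


def looks_like_short_hash(token: str) -> bool:
    """Return True if the token could be read as an abbreviated commit hash."""
    return SHORT_HASH_RE.fullmatch(token) is not None


def matching_commits(token: str, known_hashes: list[str]) -> list[str]:
    """Return the known commit hashes that start with the given token."""
    if not looks_like_short_hash(token):
        return []
    return [h for h in known_hashes if h.startswith(token)]


# "10060129ea6d" appears in this log; "abc1234def56" is a made-up hash.
known = ["10060129ea6d", "abc1234def56"]

print(looks_like_short_hash("10060129"))    # True: 8 decimal digits are valid hex
print(matching_commits("10060129", known))  # ['10060129ea6d']
print(looks_like_short_hash("803525"))      # False: 6 digits is below the assumed minimum
```

Under these assumptions, shorter change numbers such as 803525 never collided, while eight-digit ones can match a commit prefix by chance.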
[15:35:50] the problem with beta cluster db might be caused by my trying to add a wiki three times in a row - every run was failing due to some problems (first run failed due to missing sql files in Math extension, second time failed due to some tables already implemented - that was the time when I dropped the `test2wiki` db, third time failed due to moved sql files in Linter extension, here I dropped the test2wiki again (both drops on
[15:35:51] db11), then it failed a third time when it tried to fill in the data - failed because the `test2wiki` db doesn’t exist - so I assumed that for some reason replication failed)
[15:36:43] And now I’m a little bit stuck and I don’t know what to do next with that nice ticket - and I’m too afraid to fix it - e.g. ssh to beta cluster/db11 as it is Friday
[15:37:09] and so far my work on the beta cluster machine leads only to more errors
[15:48:56] (CR) Daimona Eaytoy: "I'd like to help with this, but I've got a question. Would it be possible to have a single place in CI that runs PHPUnit for core/extensio" [integration/config] - https://gerrit.wikimedia.org/r/803525 (https://phabricator.wikimedia.org/T90875) (owner: Kosta Harlan)
[15:51:18] (CR) Jforrester: "There's no reason we can't migrate to new calls, but quibble exists to consolidate CI calls, so theoretically anything that is not using q" [integration/config] - https://gerrit.wikimedia.org/r/803525 (https://phabricator.wikimedia.org/T90875) (owner: Kosta Harlan)
[15:52:18] Beta-Cluster-Infrastructure: beta-update-databases-eqiad job times out - https://phabricator.wikimedia.org/T358329#9572395 (pmiazga) T358236 could cause this issue as the `addWiki.php` script failed in the middle of the process and I had to drop the `test2wiki` multiple times manually (as the script was fail...
[15:58:18] Beta-Cluster-Infrastructure: beta-update-databases-eqiad job times out - https://phabricator.wikimedia.org/T358329#9572403 (Jdforrester-WMF) Job now disabled as of 2024-02-23Z12:29:35: https://sal.toolforge.org/log/kIjy1Y0BxE1_1c7sK1mK
[16:00:36] (CR) Arlolra: [C: +2] Stop suppressing edit section links + hide cdx-info-chip divs (1 comment) [integration/visualdiff] - https://gerrit.wikimedia.org/r/1005855 (owner: Subramanya Sastry)
[16:01:40] (Merged) jenkins-bot: Stop suppressing edit section links + hide cdx-info-chip divs [integration/visualdiff] - https://gerrit.wikimedia.org/r/1005855 (owner: Subramanya Sastry)
[16:24:47] Beta-Cluster-Infrastructure: beta-update-databases-eqiad job times out - https://phabricator.wikimedia.org/T358329#9572480 (pmiazga) This job runs the`wmf-beta-update-databases.py` script - I wonder if I can run it manually and see the output. I compared the last sucessfull run and first failed runs: Last s...
[16:26:06] !log executing mwscript update.php --wiki=aawiki --quick --skip-config-validation to check if this is going to timeout as it timeouts in `beta-update-databases-eqiad` Jenkins job
[16:26:08] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:26:23] pmiazga: Good luck!
[16:26:32] Project beta-scap-sync-world build #143896: FAILURE in 1 min 17 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/143896/
[16:28:49] Beta-Cluster-Infrastructure: beta-update-databases-eqiad job times out - https://phabricator.wikimedia.org/T358329#9572489 (pmiazga) So I executed the script by hand and it looks like it gets stuck. ` pmiazga@deployment-deploy03:~$ mwscript update.php --wiki=aawiki --quick --skip-config-validation #!/usr/bi...
[16:31:15] Project beta-scap-sync-world build #143897: STILL FAILING in 1 min 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/143897/
[16:36:11] Project beta-scap-sync-world build #143898: STILL FAILING in 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/143898/
[16:38:02] Yippee, build fixed!
[16:38:02] Project beta-scap-sync-world build #143899: FIXED in 1 min 14 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/143899/
[16:38:34] ^ that was my fault, I rebooted `deployment-deploy03` and didn't rearm keyholder
[16:42:42] In short, regarding the beta cluster db job issue - the `mwscript update.php` times out
[16:43:07] pmiazga: do we know what part of update.php
[16:43:11] I don’t see any changes in mw core around that time that could affect the script
[16:44:05] RhinosF1 - when it goes through sql files one by one. I attached the output in phab ticket: https://phabricator.wikimedia.org/T358329#9572489
[16:45:07] The command I executed on deploy03 -> `mwscript update.php --wiki=aawiki --quick --skip-config-validation` - the command should be harmless, we used to run it every hour but now the job is disabled
[16:47:24] Ye beta runs that all the time
[16:47:36] It's weird it suddenly started failing
[16:59:29] Previously I stopped the script after two or three minutes. Let me run it for ~15min and I’ll see if it moves forward at all
[17:10:49] TheresNoTime: poor keyholder :]
[17:12:51] and reading some backlog
[17:13:03] Phabricator auto linking Gerrit change number is definitely a bug
[17:13:28] Release-Engineering-Team (Now this 🫠), Scap, MW-on-K8s, SRE, serviceops: Find a way to address canary releases directly - https://phabricator.wikimedia.org/T358117#9572625 (thcipriani) so ` httpbb /srv/deployment/httpbb-tests/appserver/* --hosts=mwdebug.discovery.wmnet --https_port=4444` from...
[17:13:30] aka change 10060129 ends up rendered as a commit matching it: Change rMW10060129ea6d had a related patch set uploaded (by Nikerabbit; author: Nikerabbit):
[17:13:38] Ok, so it looks like the script works, but it takes like 4-5 minutes to run a single sql file
[17:14:19] After 4 minutes it managed to do `..rev_actor field in revision table`, `.watchlist_expiry table already exist` and `page_restrictions field does not exist in page table, skipping modify field patch`
[17:14:23] 14* minutes
[17:16:19] Release-Engineering-Team (Now this 🫠), Scap, MW-on-K8s, SRE, serviceops: Find a way to address canary releases directly - https://phabricator.wikimedia.org/T358117#9572630 (dancy) >>! In T358117#9572625, @thcipriani wrote: > so ` httpbb /srv/deployment/httpbb-tests/appserver/* --hosts=mwdebug...
[17:16:25] Beta-Cluster-Infrastructure: beta-update-databases-eqiad job times out - https://phabricator.wikimedia.org/T358329#9572631 (pmiazga) I left the script working for 15 minutes and looks like it's slowly progressing: ` pmiazga@deployment-deploy03:~$ mwscript update.php --wiki=aawiki --quick --skip-config-valid...
[17:37:14] Release-Engineering-Team (Now this 🫠), Release, Train Deployments: 1.42.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T354437#9572707 (VolkanUral89)
[17:38:59] Release-Engineering-Team (Now this 🫠), Release, Train Deployments: 1.42.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T354437#9572719 (taavi)
[18:23:02] Beta-Cluster-Infrastructure, Beta-Cluster-reproducible: Edits not saved on beta cluster - https://phabricator.wikimedia.org/T358364#9572844 (taavi)
[18:25:40] ^ sounds awfully like a database replication issue which would line up with update.php slowness
[18:29:41] Beta-Cluster-Infrastructure, Beta-Cluster-reproducible: Edits not saved on beta cluster - https://phabricator.wikimedia.org/T358364#9572882 (TheresNoTime) Given the recent database issues ({T358329}), probably related
[18:32:55] taavi: could something with replication have broken when a new wiki failed at creating
[18:38:53] Release-Engineering-Team (Now this 🫠), Scap, serviceops-radar, Python3-Porting: git-fat replacement/removal - https://phabricator.wikimedia.org/T279509#9572900 (dancy) a:dancy
[20:50:34] Continuous-Integration-Config, I18n, Language-Team (Language-2024-January-March), Language-Technical Support (Language-Technical Support (Current) ), affects-translatewiki.net: Automatically allow tag in message translations - https://phabricator.wikimedia.org/T357670#9573357 (Amire80)
[20:50:36] Continuous-Integration-Config, I18n, Language-Team (Language-2024-January-March), Language-Technical Support (Language-Technical Support (Current) ), affects-translatewiki.net: Automatically allow id HTML attribute in message translations - https://phabricator.wikimedia.org/T357086#9573358 (Am...
[20:50:42] Continuous-Integration-Config, I18n, Language-Team (Language-2024-January-March), Language-Technical Support (Language-Technical Support (Current) ), affects-translatewiki.net: Automatically allow
in message translations - https://phabricator.wikimedia.org/T356548#9573360 (Amire80)
[21:00:13] Scap, Data-Engineering: Can't deploy airflow-dags/research anymore - https://phabricator.wikimedia.org/T311336#9573392 (dancy)
[21:34:31] GitLab (Upstream pit of despair 🕳️), Release-Engineering-Team, Patch-For-Review, Upstream: GitLab header logo blocks top portion of page - https://phabricator.wikimedia.org/T358234#9573541 (CodeReviewBot) brennen merged https://gitlab.wikimedia.org/repos/releng/gitlab-settings/-/merge_requests/57...
[21:46:37] Beta-Cluster-Infrastructure, MediaWiki-Core-Preferences: User preferences no longer working: mw.user.options is not reflecting database on beta cluster - https://phabricator.wikimedia.org/T358393#9573567 (Jdlrobson)
[21:48:59] Beta-Cluster-Infrastructure, MediaWiki-Core-Preferences: User preferences no longer working: mw.user.options is not reflecting database on beta cluster - https://phabricator.wikimedia.org/T358393#9573581 (Jdlrobson) Also seeing this on https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page - when I select...
[21:49:07] Beta-Cluster-Infrastructure, MediaWiki-Core-Preferences: User preferences no longer working: mw.user.options is not reflecting database on beta cluster - https://phabricator.wikimedia.org/T358393#9573582 (Jdlrobson)
[21:50:47] Beta-Cluster-Infrastructure, MediaWiki-Core-Preferences: User preferences no longer working: mw.user.options is not reflecting database on beta cluster - https://phabricator.wikimedia.org/T358393#9573567 (Jdlrobson)
[21:51:10] Beta-Cluster-Infrastructure, Beta-Cluster-reproducible: Edits not saved on beta cluster - https://phabricator.wikimedia.org/T358364#9573588 (taavi)
[21:51:16] Beta-Cluster-Infrastructure, MediaWiki-Core-Preferences: User preferences no longer working: mw.user.options is not reflecting database on beta cluster - https://phabricator.wikimedia.org/T358393#9573593 (taavi)
[22:12:36] Scap, Data-Engineering, Patch-For-Review: Can't deploy airflow-dags/research anymore - https://phabricator.wikimedia.org/T311336#9573628 (CodeReviewBot) dancy opened https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/224 Fix longstanding bug in git.next_deploy_tag()
[23:01:39] Scap, Data-Engineering, Patch-For-Review: Can't deploy airflow-dags/research anymore - https://phabricator.wikimedia.org/T311336#9573705 (CodeReviewBot) thcipriani merged https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/224 Fix longstanding bug in git.next_deploy_tag()