[00:31:08] Project beta-update-databases-eqiad build #69015: STILL FAILING in 11 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69015/
[01:30:45] Project beta-update-databases-eqiad build #69016: STILL FAILING in 10 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69016/
[01:40:03] dduvall: awesome work on the phatality stack traces patch. I tried to comment in gerrit but my old login doesn't work there.
[02:30:54] Project beta-update-databases-eqiad build #69017: STILL FAILING in 10 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69017/
[03:30:54] Project beta-update-databases-eqiad build #69018: STILL FAILING in 10 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69018/
[04:31:00] Project beta-update-databases-eqiad build #69019: STILL FAILING in 10 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69019/
[05:30:59] Project beta-update-databases-eqiad build #69020: STILL FAILING in 10 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69020/
[06:28:24] Continuous-Integration-Config, VPS-project-Extdist, ci-test-error: Tests failing due to python3.4 not found - https://phabricator.wikimedia.org/T341718 (hashar) For the record, python3.4 vanished when the job got changed from the `releng/tox` Docker image to the `releng/tox-buster` Docker image back...
[06:30:54] Project beta-update-databases-eqiad build #69021: STILL FAILING in 10 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69021/
[07:30:57] Project beta-update-databases-eqiad build #69022: STILL FAILING in 10 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69022/
[08:30:56] Project beta-update-databases-eqiad build #69023: STILL FAILING in 10 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69023/
[08:53:15] thcipriani: dancy: tasks concerning a PHP segfault can be filed against #php-segfault in Phabricator
[08:54:00] whatever causes the failure must have been introduced in the 1 hour window between the Jenkins job invocations
[08:55:01] between 19:20:00 and 20:31:09 UTC ( taking into account the 11 minutes build duration )
[08:56:15] Start-Date: 2023-07-25 23:26:01
[08:56:15] Commandline: apt install systemd-coredump
[08:56:15] Install: systemd-coredump:amd64 (241-7~deb10u10)
[08:56:15] Error: Sub-process /usr/bin/dpkg returned an error code (1)
[08:56:29] from /var/log/apt/history.log
[08:56:34] which is unrelated to the php segfault I guess
[08:56:50] I give up at this point, that is already two rabbit holes tasks to investigate
[08:58:16] Jul 25 23:54:09 deployment-deploy03 puppet-agent[1108]: (Git::Clone[beta-mediawiki-core]) Scheduling refresh of Exec[/bin/rm -r /srv/mediawiki-staging/php-master/extensions]
[08:58:19] for a third one :)
[08:58:41] <_joe_> hashar, jnuche: I have a patch for repos/releng/release; do I need to make a fork in my own space, push there, open a merge request?
[08:59:59] _joe_: I push a feature branch to the repo and request a merge request
[09:00:07] but maybe I can push a feature branch cause I am an admin/owner of the repo
[09:00:11] <_joe_> hashar: yeah I don't have rights
[09:00:28] else the model is to fork the repo, push the feature branch to your fork and then request a merge request
[09:00:30] ala GitHub
[09:00:34] <_joe_> (which I used to have on gerrit, but maybe that was due to being a gerritadmin)
[09:01:49] it used to be mediawiki/tools/release in gerrit where all mediawiki/* +2'ers had access. I've asked whether the reduction in access was intentional and received no response
[09:06:39] <_joe_> taavi: that means yes :)
[09:06:50] <_joe_> jokes aside, it's not a problem
[09:06:54] <_joe_> hashar: https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/38
[09:07:11] <_joe_> I'm blocked on that
[09:09:36] taavi: the release tools have been around for a while. They were maintained by the old Platform Engineering team which eventually had some members moved to the newly created Release Engineering team
[09:09:44] and Platform Engineering eventually got disbanded
[09:12:36] _joe_: that merge request would be for dancy , I don't know anything about the generation of those container images
[09:14:48] GitLab, Release-Engineering-Team: GitLab merge request pages show an error when logged out - https://phabricator.wikimedia.org/T340062 (hashar)
[09:16:42] GitLab (Upstream pit of despair 🕳️), Release-Engineering-Team, Upstream: GitLab merge request pages show an error when logged out - https://phabricator.wikimedia.org/T340062 (hashar) #upstream task (filed against Gitlab 16.1.1) https://gitlab.com/gitlab-org/gitlab/-/issues/416863
[09:20:48] jnuche, hashar: I'm planning to decommission releases1002, releases2002 this morning. Is there anything that you're aware of that would stop me from doing this?
[09:20:50] GitLab (Upstream pit of despair 🕳️), Release-Engineering-Team, Upstream: GitLab merge request pages show an error when logged out - https://phabricator.wikimedia.org/T340062 (hashar)
[09:21:01] GitLab (Upstream pit of despair 🕳️), Release-Engineering-Team, Upstream: GitLab merge request pages show an error when logged out - https://phabricator.wikimedia.org/T340062 (hashar) p:Triage→Medium
[09:23:00] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.41.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T340247 (jnuche)
[09:23:27] eoghan: fine to decommission from my side
[09:24:04] Continuous-Integration-Infrastructure, Release-Engineering-Team, Zuul, Patch-For-Review: Refresh integration/zuul/deploy to work on Debian Bullseye - https://phabricator.wikimedia.org/T342346 (hashar) https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/940161 creates a P...
[09:30:56] Project beta-update-databases-eqiad build #69024: STILL FAILING in 10 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69024/
[09:49:08] eoghan: I think it is good to go yes :-]
[09:52:24] Wonderful.
[09:56:46] <_joe_> hashar: heh I thought the patch was simple enough. Ok, I'll wait for dancy then :)
[10:04:43] GitLab (Infrastructure), collaboration-services: Create alerting for GitLab CI failures - https://phabricator.wikimedia.org/T339370 (Jelto)
[10:09:13] GitLab (Infrastructure), collaboration-services: Create alerting for GitLab CI failures - https://phabricator.wikimedia.org/T339370 (Jelto) In progress→Resolved I've done tests and triggered failing jobs (exit 1), see T342736#9043761 and https://gitlab.wikimedia.org/jelto/test-project/-/jobs. Th...
[10:30:54] Project beta-update-databases-eqiad build #69025: STILL FAILING in 10 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69025/
[11:01:20] Release-Engineering-Team, bacula, collaboration-services: All backups failing for releases1002, releases2002 - https://phabricator.wikimedia.org/T342743 (jcrespo)
[11:21:46] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.41.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T340247 (jnuche) Rolled back to group0 since there is a chance T342733 could be impacting users: T342733#9043895
[11:30:55] Project beta-update-databases-eqiad build #69026: STILL FAILING in 10 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69026/
[11:33:25] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.41.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T340247 (Urbanecm)
[11:55:53] hashar: I'm trying to look into the problem with the beta Jenkins job, but I can't even find the node running the job in Horizon
[11:56:28] would you have some time for pairing on this at some point today?
would like to pick your brain on how that is set up
[11:58:53] I'm disabling the job for now to avoid the noise
[11:59:59] !log disabled failing https://integration.wikimedia.org/ci/view/Beta/job/beta-update-databases-eqiad/ temporarily
[12:00:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:15:49] (PS5) Hashar: Use gear from upstream and bump it to 0.16.0 [integration/zuul/deploy] - https://gerrit.wikimedia.org/r/940166 (https://phabricator.wikimedia.org/T258630)
[12:15:51] (PS1) Hashar: Add wheels for Debian Bullseye [integration/zuul/deploy] - https://gerrit.wikimedia.org/r/941904 (https://phabricator.wikimedia.org/T342346)
[12:19:49] (CR) Hashar: [C: -1] "I'd like the wheel to have the date of the Zuul application which is I90cceab6b3f42d4a9660c795e8acf4285037447a" [integration/zuul/deploy] - https://gerrit.wikimedia.org/r/941904 (https://phabricator.wikimedia.org/T342346) (owner: Hashar)
[12:32:51] Release-Engineering-Team, bacula, collaboration-services: All backups failing for releases1002, releases2002 - https://phabricator.wikimedia.org/T342743 (jcrespo) Open→Resolved a:jcrespo This was solved by T334435 (hosts were being decommissioned).
[12:45:39] (CR) Hashar: Link to git blames for each of the stacktrace frames (2 comments) [releng/phatality] - https://gerrit.wikimedia.org/r/940265 (https://phabricator.wikimedia.org/T342400) (owner: Dduvall)
[13:17:12] Release-Engineering-Team, sre-alert-triage: Alert triage: overdue critical alert - https://phabricator.wikimedia.org/T342755 (JMeybohm) something scap ` Jul 25 03:01:23 deploy1002 scap[30573]: At least one patch failed to apply Jul 25 03:01:23 deploy1002 scap[30573]: 03:01:23 stage-train failed: Continuous-Integration-Config, MediaWiki-Special-pages, MediaWiki-User-management: Add testing for Special:UserRights's interwiki mode - https://phabricator.wikimedia.org/T342763 (Urbanecm)
[13:21:17] jnuche: sorry I have missed your ping earlier. I have selected to ignore the issue
[13:22:10] the instance is `deployment-deploy03.deployment-prep.eqiad1.wikimedia.cloud` which is attached as an agent to the CI Jenkins controller
[13:22:25] the scripts run as jenkins-deploy user there which has a few sudo rules to let it run mwscript etc
[13:22:33] Release-Engineering-Team, Machine-Learning-Team: ML-Team will soon stop using LFS on Gerrit (for ORES deployment) - https://phabricator.wikimedia.org/T342765 (klausman)
[13:23:02] then it is PHP segfaulting, so I guess it is "all about" retrieving the core dump, adding the php debug symbols and running gdb to retrieve the stacktrace
[13:26:05] * hashar files a task
[13:38:47] (PS1) Krinkle: zuul: Comment-out CirrusSearch and ProofreadPage from wmf_gate [integration/config] - https://gerrit.wikimedia.org/r/941935 (https://phabricator.wikimedia.org/T256626)
[13:39:53] (CR) CI reject: [V: -1] zuul: Comment-out CirrusSearch and ProofreadPage from wmf_gate [integration/config] - https://gerrit.wikimedia.org/r/941935 (https://phabricator.wikimedia.org/T256626) (owner: Krinkle)
[13:42:06] hashar: thx, the host is indeed in the `deployment-prep` project and I have ssh access
[13:47:49] Beta-Cluster-Infrastructure, Release-Engineering-Team, WikiLambda, Wikifunctions, and 2 others: Beta update.php fails due to PHP segfault in libpcre2-8.so.0.7.1 - https://phabricator.wikimedia.org/T342769 (hashar)
[13:49:34] (PS2) Krinkle: zuul: Comment-out CirrusSearch and ProofreadPage from wmf_gate [integration/config] - https://gerrit.wikimedia.org/r/941935 (https://phabricator.wikimedia.org/T256626)
[13:50:43] (CR) CI reject: [V: -1] zuul: Comment-out CirrusSearch and ProofreadPage from wmf_gate [integration/config] - https://gerrit.wikimedia.org/r/941935 (https://phabricator.wikimedia.org/T256626) (owner: Krinkle)
[13:51:20] (Abandoned) Krinkle: zuul: Comment-out CirrusSearch and ProofreadPage from wmf_gate [integration/config] - https://gerrit.wikimedia.org/r/941935 (https://phabricator.wikimedia.org/T256626) (owner: Krinkle)
[13:54:26] Jul 26 13:53:50 deployment-deploy03 puppet-agent[27089]: (/Stage[main]/Systemd::Coredump/File[/etc/systemd/coredump.conf]/content) -KeepFree=4G
[13:54:26] Jul 26 13:53:50 deployment-deploy03 puppet-agent[27089]: (/Stage[main]/Systemd::Coredump/File[/etc/systemd/coredump.conf]/content) +KeepFree=20%
[13:54:29] damn puppet
[13:57:28] Beta-Cluster-Infrastructure, Release-Engineering-Team, WikiLambda, Wikifunctions, and 2 others: Beta update.php fails due to PHP segfault in libpcre2-8.so.0.7.1 - https://phabricator.wikimedia.org/T342769 (hashar) I have changed `/etc/systemd/coredump.conf`: ` - KeepFree=20% + KeepFree=4G ` But t...
[13:57:34] jnuche: I think I fixed systemd coredump thing (and disabled puppet)
[13:57:49] may you reenable the job and trigger it?
That should result in a core file this time
[13:58:53] nice, will do
[13:59:43] then I am not sure what should be passed to `CoreDump.KeepFree`, I am assuming 4G is fine
[14:00:11] !log reenabled https://integration.wikimedia.org/ci/view/Beta/job/beta-update-databases-eqiad/ so we can (hopefully) get a code dump
[14:00:11] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:00:25] *core, bah
[14:01:40] kicked off the job manually
[14:02:03] Setting up systemd-coredump (241-7~deb10u10) ...
[14:02:04] adduser: The user `systemd-coredump' already exists, but is not a system user. Exiting.
[14:02:04] dpkg: error processing package systemd-coredump (--configure):
[14:02:09] PFFFFFF THAT IS NEVER ENDING
[14:02:16] at least I got the dbg packages installed
[14:02:48] !log deployment-prep: on deployment-deploy03, installed php / libpcre debugging symbols for T342769: `sudo apt-get install php7.4-common-dbgsym php7.4-cli-dbgsym libpcre2-dbg`
[14:02:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:02:51] T342769: Beta update.php fails due to PHP segfault in libpcre2-8.so.0.7.1 - https://phabricator.wikimedia.org/T342769
[14:03:47] GitLab (Infrastructure), Release-Engineering-Team, collaboration-services, Patch-For-Review: Upgrade GitLab to major version 16 - https://phabricator.wikimedia.org/T338460 (Jelto)
[14:04:38] well, maybe I could have run the update.php --wiki=wikifunctionswiki script directly
[14:05:52] GitLab (Infrastructure), Release-Engineering-Team, collaboration-services, Patch-For-Review: Upgrade GitLab to major version 16 - https://phabricator.wikimedia.org/T338460 (Jelto)
[14:07:08] I bet nothing will core dump this time :D
[14:07:11] just to annoy us
[14:07:53] and maybe we need to set the core dump limit to unlimited :/
[14:08:08] fingers crossed
[14:08:31] I am guessing it defaults to being disabled
[14:09:17] so we need a `ulimit -c unlimited` before invoking
the command, which can probably be inserted in the shell line of the jenkins job
[14:09:41] I am pretty sure I wrote some doc about that
[14:11:49] I found some old gem from 2017 ( https://gist.github.com/hashar/a8dbb3bee8a0272350320136e9566100 )
[14:12:01] unrelated
[14:12:13] Project beta-update-databases-eqiad build #69027: STILL FAILING in 10 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69027/
[14:13:13] AHHH I HAVE FOUND ONE https://phabricator.wikimedia.org/T296539#7531235
[14:13:40] but that time I ran the command directly under gdb
[14:14:34] and of course there is no coredump due to the ulimit for core files being set to 0 blocks
[14:15:30] hi folks. does anyone know if we have some issues with the debian-glue for bookworm builds?
[14:15:53] I am seeing a lot of "00:00:14.434 ERROR: ld.so: object 'libeatmydata.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored." that further result in unexpected build errors
[14:16:00] https://integration.wikimedia.org/ci/job/debian-glue/2308/console <-- live example
[14:16:08] happy to open a task but wanted to check here first, thanks!
[14:16:33] GitLab (Project Migration), Release-Engineering-Team (Priority Backlog 📥), Data Engineering and Event Platform Team, Data-Platform-SRE, Patch-For-Review: Migrate analytics/datahub pipeline to GitLab - https://phabricator.wikimedia.org/T341194 (CodeReviewBot) btullis opened https://gitlab.wiki...
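The core-file limit discussed above can also be raised from a wrapper rather than a shell line. A minimal Python sketch, assuming a Linux/Unix host; `raise_core_limit` and `run_with_core_dumps` are hypothetical helper names, not part of the Jenkins job or of any Wikimedia tooling:

```python
import resource
import subprocess
import sys


def raise_core_limit():
    """Raise the soft core-file size limit to the hard limit; return the new soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit only up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)[0]


def run_with_core_dumps(argv):
    """Hypothetical wrapper: run argv in a child whose core limit was raised first."""
    # preexec_fn runs in the child between fork() and exec(),
    # which is where `ulimit -c` would take effect in a shell line.
    return subprocess.run(argv, preexec_fn=raise_core_limit).returncode
```

If the hard limit itself is 0, this changes nothing and the limit has to be lifted by whatever starts the job. Whether a core file is then actually kept still depends on the system's core dump handling (here, systemd-coredump and its `KeepFree` setting).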
[14:16:42] > AHHH I HAVE FOUND ONE https://phabricator.wikimedia.org/T296539#7531235
[14:16:42] still useful, I'm bookmarking that
[14:17:48] GitLab (Infrastructure), Release-Engineering-Team, collaboration-services, Patch-For-Review: Upgrade GitLab to major version 16 - https://phabricator.wikimedia.org/T338460 (Jelto)
[14:18:19] yeah
[14:18:28] so we need to pass `--skip-config-validation` to update.php
[14:18:31] anyway
[14:18:37] on the host: `sudo -s -u www-data gdb --args /usr/bin/php /srv/mediawiki-staging/multiversion/MWScript.php update.php --wiki=wikifunctionswiki --quick --skip-config-validation`
[14:18:59] then I get the gdb prompt in which I enter `run` and press enter
[14:19:21] that starts the update, something freezes eventually or is doing some processing
[14:19:43] something something with mysql
[14:20:03] 14:03:16
[14:20:03] 14:12:12 Done in 1.9 s.
[14:20:53] that is from the jenkins console, the update step takes roughly 9 minutes. it should probably be batched and provide some progress report to the user :)
[14:21:45] James_F: Wikilambda somehow triggers a segfault in php/libpcre when doing the update.php for wikifunctions :/
[14:21:47] https://phabricator.wikimedia.org/T342769
[14:22:05] hashar: Oh no.
[14:22:16] I am not sure what update.php is currently doing though
[14:22:34] there is no progress report but the thread dump shows it is in mysql so I assume it updates stuff in the datase
[14:22:37] database
[14:22:38] That's where it's trying to inject the pre-defined content.
[14:22:40] Yeah.
[14:22:52] shouldn't the update be batched / showing some progress?
[14:23:06] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.41.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T340247 (Urbanecm_WMF)
[14:23:07] It should spam out on each page it edits.
[14:23:13] I am pretty sure we have support for that, or at least at some point we did and that was used when changing the images checksum from md5 to sha1
[14:23:46] maybe it is stuck before having an opportunity to report
[14:23:51] Yeah.
[14:24:12] I'm tempted to delete the wiki and start again. ;-)
[14:24:22] AH I GOT IT
[14:24:57] Beta-Cluster-Infrastructure, Release-Engineering-Team, Abstract Wikipedia team, WikiLambda, and 3 others: Beta update.php fails due to PHP segfault in libpcre2-8.so.0.7.1 - https://phabricator.wikimedia.org/T342769 (Jdforrester-WMF)
[14:27:35] hashar: What breaks?
[14:27:59] let me copy paste :)
[14:28:04] Sure.
[14:28:56] Project beta-update-databases-eqiad build #69028: STILL FAILING in 8 min 55 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69028/
[14:31:35] GitLab (Project Migration), Release-Engineering-Team (Priority Backlog 📥), Data Engineering and Event Platform Team, Data-Platform-SRE, Patch-For-Review: Migrate analytics/datahub pipeline to GitLab - https://phabricator.wikimedia.org/T341194 (BTullis) I have now prepared a merge request to e...
[14:35:07] jnuche: James_F: trace pasted at https://phabricator.wikimedia.org/T342769#9044861
[14:35:29] but that is where I stop, I don't know how to inspect further in gdb
[14:35:36] Beta-Cluster-Infrastructure, Release-Engineering-Team, Abstract Wikipedia team, WikiLambda, and 3 others: Beta update.php fails due to PHP segfault in libpcre2-8.so.0.7.1 - https://phabricator.wikimedia.org/T342769 (hashar) So Instead of messing up with systemd / coredump / ulimit I went to invok...
[14:35:45] my knowledge is limited to invoking gdb then entering `run` and then `bt`
[14:36:02] That's more knowledge than me.
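The interactive `run` then `bt` sequence described above can also be scripted. A small Python sketch that only builds the command line, using gdb's standard `-batch`, `-ex` and `--args` options; `gdb_backtrace_argv` is a hypothetical helper name, and the `sudo -s -u www-data` prefix used on the host is left out:

```python
import shlex


def gdb_backtrace_argv(program_argv):
    """Build a gdb command line that runs a program and prints a backtrace when it stops."""
    if not program_argv:
        raise ValueError("program_argv must name the program to debug")
    return [
        "gdb",
        "-batch",        # exit once the listed commands have run
        "-ex", "run",    # same as typing `run` at the gdb prompt
        "-ex", "bt",     # same as typing `bt` once the program has stopped
        "--args",        # everything after this is the debugged program and its arguments
        *program_argv,
    ]


# Program and arguments copied from the command quoted in the log.
php_update = [
    "/usr/bin/php",
    "/srv/mediawiki-staging/multiversion/MWScript.php",
    "update.php",
    "--wiki=wikifunctionswiki",
    "--quick",
    "--skip-config-validation",
]

# Print a shell-ready form of the command for copy-pasting.
print(shlex.join(gdb_backtrace_argv(php_update)))
```

Building the list rather than a string keeps each argument intact if it is later handed to `subprocess.run`. The sketch does not run gdb itself, since that needs the host, the debug symbols and the right user.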
[14:37:23] TIL, gonna make notes of all of this
[14:37:48] there is some macro to enhance the trace with zend/php knowledge
[14:37:57] jnuche: BTW, fixes for the train's blockers are both now merged in wmf.19 but not master due to a blocker to all merges there (?).
[14:39:14] there's a third contender, but someone is already on that one, hopefully it'll be backported soon too: https://phabricator.wikimedia.org/T342744
[14:39:28] but there's a blocker to merge to master? hum
[14:39:44] Fun times.
[14:40:35] jnuche: Yeah. I'll file now.
[14:40:47] I got the trace :)
[14:41:03] appreciated
[14:41:48] [0x7fffcb44b710] preg_match("\7^Z[1-9]\d*$\7u", "Z1002") [internal function]
[14:41:48] [0x7fffcb44aba0] Opis\JsonSchema\Validator->validateString(reference, reference, array(0)[0x7fffcb44ac10], array(7)[0x7fffcb44ac20], object[0x7fffcb44ac30], object[0x7fffcb44ac40], object[0x7fffcb44ac50]) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:1219
[14:41:48] [0x7fffcb44a760] Opis\JsonSchema\Validator->validateProperties(reference, reference, array(0)[0x7fffcb44a7d0], array(7)[0x7fffcb44a7e0], object[0x7fffcb44a7f0], object[0x7fffcb44a800], object[0x7fffcb44a810], NULL) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:943
[14:42:00] I will paste the details and the first few lines on the task
[14:42:47] jnuche: T342775
[14:42:47] T342775: Code merge blocker: MediaWiki\Auth\AuthManagerTest::testSecuritySensitiveOperationStatus with data set #0 (true) - https://phabricator.wikimedia.org/T342775
[14:43:05] I guess the regex does an infinite loop
[14:43:20] or whatever really I don't know
[14:46:18] hashar: OK, so that's probably coming from extensions/WikiLambda/function-schemata/data/CANONICAL/Z9.yaml
[14:53:14] Beta-Cluster-Infrastructure, Release-Engineering-Team, Abstract Wikipedia team, WikiLambda, and 4 others: Beta update.php fails due to PHP segfault in libpcre2-8.so.0.7.1 -
https://phabricator.wikimedia.org/T342769 (CodeReviewBot) jforrester opened https://gitlab.wikimedia.org/repos/abstract-wiki...
[14:55:17] Beta-Cluster-Infrastructure, Release-Engineering-Team, Abstract Wikipedia team, WikiLambda, and 4 others: Beta update.php fails due to PHP segfault in libpcre2-8.so.0.7.1 - https://phabricator.wikimedia.org/T342769 (Jdforrester-WMF) ` [10:41:48] <+hashar> [0x7fffcb44b710] preg_match("\7^Z[1-9]\d...
[15:02:20] Beta-Cluster-Infrastructure, Release-Engineering-Team, Abstract Wikipedia team, WikiLambda, and 4 others: Beta update.php fails due to PHP segfault in libpcre2-8.so.0.7.1 - https://phabricator.wikimedia.org/T342769 (hashar) After I have remembered PHP provides some gdb helpers ( [[ https://raw.gi...
[15:02:27] hashar: I think this is real. :-(
[15:02:43] James_F: I have pasted the php trace at https://phabricator.wikimedia.org/T342769#9045024
[15:03:00] jnuche: you can mute the job again
[15:03:03] and I am reenabling puppet
[15:03:13] hashar: Thanks.
[15:03:36] so to me there is a loop somewhere in the json schema validation
[15:03:53] which I am surprised is not caught earlier in CI or whatever, then I guess all the content imported is not necessarily validated on CI
[15:04:00] or that is user data rather than code
[15:04:13] I am guessing it should be reproducible, at least update.php takes 7 minutes before the segfault
[15:04:22] !log disabled https://integration.wikimedia.org/ci/view/Beta/job/beta-update-databases-eqiad/ after debugging is done.
See https://phabricator.wikimedia.org/T342769
[15:04:25] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:04:36] hashar: We enforce a dependency graph to avoid exactly this, but…
[15:04:42] as for why libpcre2 segfaults, I guess it is because the stacktrace is too long/large and the kernel bails out
[15:05:13] and probably the json schema validator should have an increasing counter and eventually bail out when it detects a loop
[15:05:52] I think last summer I have read something about implementing a json schema validator causing some challenges
[15:06:02] Yeah.
[15:06:05] cause some instructions in the schema are cycles (as intended)
[15:06:09] and that is tricky to implement
[15:06:14] don't quote me :]
[15:07:07] my experience is with a java validator and looking at its document page it does mention "endless loops" ( doc https://victools.github.io/jsonschema-generator/#subtype-resolvers )
[15:08:03] * hashar suggests to move to XML & XML DTD 🧌
[15:08:12] hashar: I'm force-reloading the pre-defined content and it's… maybe working?
[15:09:00] jnuche: so I think the tldr is to install the debugging symbols ( I went full nuclear: `apt-get install php7.4-common-dbgsym php7.4-cli-dbgsym libpcre2-dbg`)
[15:09:11] jnuche: then run the php command with `gdb` in front of it
[15:09:41] then download https://raw.githubusercontent.com/php/php-src/php-7.4.30/.gdbinit to ~/
[15:10:08] load it in gdb with: `source ~/.gdbinit` (if it recognizes `~`)
[15:10:12] zbacktrace
[15:10:29] set attention set to developer :]
[15:11:19] James_F: if you could revert the WikiLambda code that is causing the issue it would let us reenable the database updates on the beta cluster and mark T342769 resolved
[15:11:20] T342769: Beta update.php fails due to PHP segfault in libpcre2-8.so.0.7.1 - https://phabricator.wikimedia.org/T342769
[15:11:39] hashar: We've fixed it. User error on-wiki combined with fragile code. You can re-enable now, I think.
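The "increasing counter" idea suggested above can be sketched as a depth budget on recursive reference resolution. This is a toy Python illustration, not the Opis JSON Schema API and not WikiLambda code: the `references` dict, `resolve` and `ReferenceLoopError` are hypothetical, and only the `^Z[1-9]\d*$` pattern is taken from the trace. The log's guess that a loop caused the crash is likewise unconfirmed here.

```python
import re

# Pattern taken from the preg_match frame in the trace above.
ZID_PATTERN = re.compile(r"^Z[1-9]\d*$")


class ReferenceLoopError(Exception):
    """Raised when following references exceeds the depth budget."""


def resolve(zid, references, max_depth=64, _depth=0):
    """Follow zid -> references[zid] until a non-reference value is found.

    `references` is a hypothetical dict standing in for stored content; a value
    that is itself a valid ZID string is treated as a reference to follow.
    """
    if _depth > max_depth:
        # Bail out with a normal exception instead of recursing until the stack is exhausted.
        raise ReferenceLoopError(f"gave up after {max_depth} hops at {zid}")
    if not ZID_PATTERN.match(zid):
        raise ValueError(f"not a valid ZID: {zid!r}")
    value = references[zid]
    if isinstance(value, str) and ZID_PATTERN.match(value):
        return resolve(value, references, max_depth, _depth + 1)
    return value
```

A depth budget catches cycles without tracking visited nodes, at the cost of also rejecting legitimately deep chains. A visited-set check is the stricter alternative when cycles are never valid.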
[15:11:45] then I guess get another task for addressing content inducing an infinite loop in the json schema validation?
[15:11:52] ah
[15:11:55] Yeah, I'll file a task
[15:12:02] <3
[15:12:13] hashar: all noted, thanks!
[15:12:21] and looks like you can reenable the job
[15:12:49] and I think thcipriani is right, I should blog more
[15:12:58] maybe that's something we could document in a wiki somewhere
[15:13:00] instead of crawling through old tasks
[15:13:07] or a wiki yeah
[15:13:20] https://wikitech.wikimedia.org/wiki/GDB_with_PHP :D
[15:13:28] finally found it (and it was in my browser history)
[15:14:00] I have learned it all from Tim Starling (who wrote that page in 2006)
[15:14:11] short break, I have meetings
[15:15:09] Phabricator, Release-Engineering-Team (Escape Goats🐐), Wikimedia-Phabricator-Extensions, PHP 7.4 support, Patch-For-Review: Exception in PhutilMediaWikiAuthAdapter.php due to array offset access with curly braces - https://phabricator.wikimedia.org/T342007 (thcipriani)
[15:16:21] Phabricator, Release-Engineering-Team (Escape Goats🐐), collaboration-services, Patch-For-Review, User-brennen: Migrate phabricator.wikimedia.org to Phorge as upstream - https://phabricator.wikimedia.org/T333885 (thcipriani)
[15:18:43] Beta-Cluster-Infrastructure, Release-Engineering-Team, Abstract Wikipedia team, WikiLambda, and 4 others: Beta update.php fails due to PHP segfault in libpcre2-8.so.0.7.1 - https://phabricator.wikimedia.org/T342769 (Jdforrester-WMF) Open→Resolved a:Jdforrester-WMF Possibly caused by e...
[15:18:51] jnuche: Does it look likely group1 will roll forward in the next 30 mins?
I really don't want to create wikifunctions.org with wmf.18…
[15:19:38] James_F: thanks for the follow up tasks
[15:20:11] James_F: you still have the possibility of another rollback if the train does roll forward
[15:20:39] taavi: Yes, but the *creation* bit is a one-time thing, and I want the i18n from wmf.19 when I do so as that's forever.
[15:21:00] James_F: could happen, but it's a bit tight, still waiting on this: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/941882
[15:21:16] jnuche: Should I backport it now? :-)
[15:21:39] if you feel confident approving it, you'll get no objection from me :)
[15:22:00] It passes CI, which is impressive for a Cirrus patch.
[15:23:14] hehehe
[15:28:06] GitLab (Misc), Release-Engineering-Team (Escape Goats🐐), User-aborrero: Investigate and document stacked merge requests - https://phabricator.wikimedia.org/T300819 (thcipriani) p:Low→Medium
[15:28:22] GitLab (Integrations), Phabricator, Release-Engineering-Team (Escape Goats🐐): Get GitLab to render `T{\d}+` in MR overviews, comments, etc. as links to Phabricator - https://phabricator.wikimedia.org/T337570 (thcipriani)
[15:29:52] !log reenabled https://integration.wikimedia.org/ci/view/Beta/job/beta-update-databases-eqiad/ Should be fixed now! https://phabricator.wikimedia.org/T342769
[15:29:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:52:08] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.41.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T340247 (jnuche) All three blockers have fixes backported for 1.41.0-wmf.19. Rolling forward to group1 again.
[16:16:12] (PS3) Dduvall: Link to git blames for each of the stacktrace frames [releng/phatality] - https://gerrit.wikimedia.org/r/940265 (https://phabricator.wikimedia.org/T342400)
[16:16:14] (PS1) Dduvall: Provide a test Dockerfile target for local testing [releng/phatality] - https://gerrit.wikimedia.org/r/941960
[16:23:42] (CR) Dduvall: Link to git blames for each of the stacktrace frames (5 comments) [releng/phatality] - https://gerrit.wikimedia.org/r/940265 (https://phabricator.wikimedia.org/T342400) (owner: Dduvall)
[16:30:26] Yippee, build fixed!
[16:30:26] Project beta-update-databases-eqiad build #69029: FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69029/
[16:30:33] James_F: ping me when time to add wikifunctions to wikistats
[16:30:33] James_F: ^
[16:30:42] somehow the update is working now :]]]]
[16:30:46] RhinosF1: Ha. Now? Maybe.
[16:31:01] hashar: Magic!
[16:31:16] James_F: i get domain not configured
[16:31:49] Beta-Cluster-Infrastructure, Release-Engineering-Team, Abstract Wikipedia team, WikiLambda, and 4 others: Beta update.php fails due to PHP segfault in libpcre2-8.so.0.7.1 - https://phabricator.wikimedia.org/T342769 (hashar) And eventually it is fixed: ` lang=irc 16:30:26 Project bet...
[16:32:06] 16:30:25 Done in 11 s.
[16:32:08] RhinosF1: Yeah, think I'll need more DNS fun. But the wiki is there.
[16:32:22] James_F: wikistats needs api working
[16:32:36] RhinosF1: Ah well, I'll ask SRE.
[16:33:49] twentyafterfour: hey, thanks!
great to see your nick :)
[16:41:08] Release-Engineering-Team (They Live 🕶️🧟): Update buildkitd wmf/v0.11 branch with latest upstream v0.11 changes - https://phabricator.wikimedia.org/T337976 (thcipriani) Open→Resolved
[16:42:11] Release-Engineering-Team (Escape Goats🐐), Patch-For-Review, Release Pipeline (Blubber): Implement acceptance tests for Blubber as executable examples - https://phabricator.wikimedia.org/T338160 (thcipriani)
[16:47:58] GitLab (Project Migration), Release-Engineering-Team (They Live 🕶️🧟), User-brennen: Migrate mediawiki/ namespace from Gerrit to GitLab - https://phabricator.wikimedia.org/T335921 (thcipriani)
[16:48:04] GitLab (Project Migration), Release-Engineering-Team (They Live 🕶️🧟), User-brennen: Define a permissions model for the /repos/mediawiki/ namespace on GitLab - https://phabricator.wikimedia.org/T336807 (thcipriani) Open→Resolved What's done: * We created https://gitlab.wikimedia.org/repos/medi...
[16:51:36] hashar: Actually Beta Cluster Wikifunctions now looks very poorly indeed. :-(
[16:51:47] :-\
[16:51:55] Table 'wikifunctionswiki.revtag' doesn't exist
[16:52:07] Possibly because it's now hidden by prod config?
[16:52:13] I guess some table lacks the maintenance magic for update.php to create it?
[16:52:16] ah yeah
[16:52:24] if it is not there, it is not created
[16:52:37] But how was it working 50 minutes ago?
[16:52:44] no idea? cache?
[16:59:22] GitLab (Project Migration), Release-Engineering-Team (They Live 🕶️🧟), User-brennen, User-dduvall: Write a GitLab "Migrating a Project" runbook / manual - https://phabricator.wikimedia.org/T307538 (thcipriani)
[16:59:44] GitLab (Project Migration), Release-Engineering-Team (Escape Goats🐐), User-brennen, User-dduvall: Write a GitLab "Migrating a Project" runbook / manual - https://phabricator.wikimedia.org/T307538 (thcipriani)
[17:05:25] Gerrit, GitLab (Project Migration), Release-Engineering-Team (Escape Goats🐐), User-Kizule, User-brennen: Script closing/archiving of migrated repositories on Gerrit - https://phabricator.wikimedia.org/T330345 (thcipriani) a:brennen→None
[17:12:12] GitLab, ExtensionDistributor: Add Gitlab Provider for ExtensionDistributor - https://phabricator.wikimedia.org/T340523 (thcipriani)
[17:15:12] GitLab (Auth & Access), Release-Engineering-Team (Radar), CAS-SSO, Infrastructure-Foundations, and 4 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (thcipriani)
[17:26:29] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.41.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T340247 (Urbanecm)
[17:32:37] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.41.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T340247 (Urbanecm_WMF)
[17:34:31] RhinosF1: OK, *now* the wiki's properly there, as long as you use www.wikifunctions.org for now.
[17:35:22] James_F: ok, is wikifunctions.org multilingual or english
[17:36:44] RhinosF1: Multi-lingual, but differently so.
[17:36:54] (And also not fully working yet.)
[17:42:49] James_F: magic :)
[18:14:51] GitLab (CI & Job Runners), Release-Engineering-Team, mwcli: GitLab CI jobs failing with "You have reached your pull rate limit.
You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit" - https://phabricator.wikimedia.org/T329216 (dancy)
[18:15:26] GitLab (CI & Job Runners), Release-Engineering-Team (They Live 🕶️🧟), collaboration-services, serviceops-radar, Patch-For-Review: Set up mirror of the docker hub registry for gitlab-runners - https://phabricator.wikimedia.org/T329679 (dancy) Open→Resolved I'm resolving this task as "do...
[19:03:14] (CR) Ahmon Dancy: Provide a test Dockerfile target for local testing (1 comment) [releng/phatality] - https://gerrit.wikimedia.org/r/941960 (owner: Dduvall)
[19:06:25] James_F: can I leverage wikifunctions as a Makefile ? :-b
[19:06:47] hashar: No. :-P
[19:07:20] as a functional programming language maybe?
[19:08:14] let's re-implement MediaWiki CI on top of Wikifunctions
[19:11:47] deal
[20:56:49] GitLab, Movement-Insights: GitLab Private Repository Request for: Data Center Switch Analysis - https://phabricator.wikimedia.org/T342482 (thcipriani) Hey @nshahquinn-wmf I'm afraid as an administrator I can't create a project in your user namespace. So I have two options for you: 1. You can create an e...
[21:20:34] GitLab, Movement-Insights: GitLab Private Repository Request for: Data Center Switch Analysis - https://phabricator.wikimedia.org/T342482 (mpopov) In this specific case I would recommend a private repo within Neil's namespace (following Tyler's instructions for next steps) because I'm not sure that it's...
[21:23:10] (PS2) Dduvall: Provide a test Dockerfile target for local testing [releng/phatality] - https://gerrit.wikimedia.org/r/941960
[21:23:19] (CR) Dduvall: Provide a test Dockerfile target for local testing (1 comment) [releng/phatality] - https://gerrit.wikimedia.org/r/941960 (owner: Dduvall)
[21:25:29] (PS3) Dduvall: Provide a test Dockerfile target for local testing [releng/phatality] - https://gerrit.wikimedia.org/r/941960
[21:25:56] (CR) Dduvall: "Ad" [releng/phatality] - https://gerrit.wikimedia.org/r/941960 (owner: Dduvall)
[21:46:01] (CR) Ahmon Dancy: [C: +1] Provide a test Dockerfile target for local testing (1 comment) [releng/phatality] - https://gerrit.wikimedia.org/r/941960 (owner: Dduvall)
[22:12:29] GitLab (Misc), Release-Engineering-Team (Escape Goats🐐), User-aborrero: Investigate and document stacked merge requests - https://phabricator.wikimedia.org/T300819 (cscott) This seems related to the general 'Depends-On' feature in gerrit/jenkins, which I think is also substantially missing/needs-hack...