[00:12:51] Phabricator (Upstream), Release-Engineering-Team, Upstream: Task Form becomes inaccessible after edit - https://phabricator.wikimedia.org/T312614 (Dzahn) Upstream confirmed here https://secure.phabricator.com/T13685#257106) Thanks a lot for the quick fix and everything! As a follow-up we did talk...
[02:28:51] GitLab (Initialization), observability, serviceops, Patch-For-Review: Define monitoring for gitlab - https://phabricator.wikimedia.org/T275170 (Dzahn) Adding the history of changes that we should have all linked here. Also this is a way to share information with @thcipriani because we have talke...
[02:30:32] GitLab (Initialization), observability, serviceops, Patch-For-Review: Define monitoring for gitlab - https://phabricator.wikimedia.org/T275170 (Dzahn) Optionally we could reopen this ticket for just a short time until we declare it done and link the alert dashboard.
[03:44:13] Beta-Cluster-Infrastructure, Editing-team: "Service Temporarily Unavailable" shows up when trying to add a new section/topic to a talk page on Beta cluster - https://phabricator.wikimedia.org/T312689 (Ryasmeen)
[04:19:33] Beta-Cluster-Infrastructure, Editing-team: "Service Temporarily Unavailable" shows up when trying to add a new section/topic to a talk page on Beta cluster - https://phabricator.wikimedia.org/T312689 (Krinkle) It's an interesting issue as it takes a lot for the webserver to fail so fatally that it can't...
[05:34:38] Beta-Cluster-Infrastructure, Editing-team: "Service Temporarily Unavailable" shows up when trying to add a new section/topic to a talk page on Beta cluster - https://phabricator.wikimedia.org/T312689 (ori) The kernel message buffer (via `sudo dmesg -T`) shows php-fpm7.2 is segfaulting: ` [Sat Jul 9 05:...
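A minimal Python sketch for reading the kernel segfault reports quoted in this log (such as `php-fpm7.2[16611]: segfault at 7ffc864cffe8 ip 00007f6cba1f6cd4 sp 00007ffc864cffc0 error 6`). It assumes the x86 report layout `comm[pid]: segfault at ADDR ip IP sp SP error CODE`, with the error code printed in hex, and decodes the standard x86 page-fault error bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The regex and the `Segfault`/`parse_segfault` names are hypothetical, not part of any Wikimedia tooling.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the x86 kernel segfault report, e.g.
# "php-fpm7.2[16611]: segfault at 7ffc864cffe8 ip 00007f6cba1f6cd4 sp 00007ffc864cffc0 error 6"
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
)


@dataclass
class Segfault:
    comm: str   # process name, e.g. php-fpm7.2
    pid: int
    addr: int   # faulting memory address
    ip: int     # instruction pointer at the time of the fault
    sp: int     # stack pointer at the time of the fault
    error: int  # x86 page-fault error code

    def describe_error(self) -> str:
        # Bit 0: 1 = protection violation, 0 = page not present
        cause = "protection violation" if self.error & 0x1 else "page not present"
        # Bit 1: 1 = write access, 0 = read access
        access = "write" if self.error & 0x2 else "read"
        # Bit 2: 1 = the fault happened in user mode
        mode = "user" if self.error & 0x4 else "kernel"
        return f"{mode}-mode {access}, {cause}"


def parse_segfault(line: str) -> Optional[Segfault]:
    """Return the parsed report, or None if the line is not a segfault report."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return Segfault(
        comm=m["comm"],
        pid=int(m["pid"]),
        addr=int(m["addr"], 16),
        ip=int(m["ip"], 16),
        sp=int(m["sp"], 16),
        error=int(m["error"], 16),  # the kernel prints the error code in hex
    )


line = ("Jul 9 05:53:05 deployment-mediawiki12 kernel: [12386141.583005] "
        "php-fpm7.2[16611]: segfault at 7ffc864cffe8 ip 00007f6cba1f6cd4 "
        "sp 00007ffc864cffc0 error 6")
fault = parse_segfault(line)
print(fault.comm, fault.pid, fault.describe_error())  # php-fpm7.2 16611 user-mode write, page not present
print(hex(fault.addr - fault.sp))  # 0x28
```

For the 05:53:05 report the faulting address lies 0x28 bytes above the stack pointer, which is consistent with the `<48> 89 44 24 28` bytes (`mov %rax,0x28(%rsp)`) marked as the faulting instruction in the accompanying `Code:` line.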
[05:50:44] !log krinkle@deployment-mediawiki-12$ sudo apt-get install systemd-coredump # ref T312689
[05:50:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[05:50:51] T312689: "Service Temporarily Unavailable" shows up when trying to add a new section/topic to a talk page on Beta cluster - https://phabricator.wikimedia.org/T312689
[05:51:44] adduser: The user `systemd-coredump' already exists, but is not a system user. Exiting.
[05:51:44] dpkg: error processing package systemd-coredump (--configure):
[05:51:44] installed systemd-coredump package post-installation script subprocess returned error exit status 1
[05:52:47] Beta-Cluster-Infrastructure, Editing-team: "Service Temporarily Unavailable" shows up when trying to add a new section/topic to a talk page on Beta cluster - https://phabricator.wikimedia.org/T312689 (Krinkle) ` krinkle@deployment-mediawiki12:~$ sudo apt-get install systemd-coredump Reading package list...
[05:54:13] ori: this is kinda outside my comfort zone, but fwiw, I do see it in syslog as well:
[05:54:14] Jul 9 05:53:05 deployment-mediawiki12 kernel: [12386141.583005] php-fpm7.2[16611]: segfault at 7ffc864cffe8 ip 00007f6cba1f6cd4 sp 00007ffc864cffc0 error 6
[05:54:14] Jul 9 05:53:05 deployment-mediawiki12 kernel: [12386141.583017] Code: 10 48 83 e8 01 48 89 44 24 40 48 89 dd 48 89 d8 48 8b 58 08 4c 8b 78 18 48 8b 08 8b 40 40 48 8b 71 08 4c 8b 71 10 48 83 c0 01 <48> 89 44 24 28 eb 04 48 83 c3 01 49 3b df 73 14 48 0f b6 03 48 83
[05:54:14] Jul 9 05:53:05 deployment-mediawiki12 kernel: [12386141.695738] php-fpm7.2[20407]: segfault at 7ffc864cffe8 ip 00007f6cba047f8c sp 00007ffc864cffc0 error 6
[05:54:14] Jul 9 05:53:05 deployment-mediawiki12 kernel: [12386141.695741] Code: 10 48 83 e8 01 48 89 44 24 40 48 89 dd 48 89 d8 48 8b 58 08 4c 8b 78 18 48 8b 08 8b 40 40 48 8b 71 08 4c 8b 71 10 48 83 c0 01 <48> 89 44 24 28 eb 04 48 83 c3 01 49 3b df 73 14 48 0f b6 03 48 83
[05:55:58] I guess that's where it ends up after `dmesg`, okay, so not surprising that it's the same and not more info.
[06:03:35] let's delete the systemd-coredump user and retry installing the package?
[06:03:50] it doesn't look like the user was provisioned by puppet
[06:06:16] systemd-coredump:x:999:999:systemd Core Dumper:/:/usr/sbin/nologin
[06:06:23] sure
[06:06:25] wait
[06:06:31] i just did it, writing the !log line
[06:06:47] how did you determine that it wasn't from puppet?
[06:06:53] was*
[06:06:58] wasn't*
[06:07:03] i grepped the puppet repo for systemd-coredump?
[06:07:09] right
[06:07:09] okay
[06:07:15] but why would apt-get care?
[06:07:44] the package's postinst script wants to create the user
[06:08:02] !log ori@deployment-mediawiki12: userdel systemd-coredump, followed by apt install systemd-coredump
[06:08:03] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[06:08:20] ah so it fails even if it exists in precisely the way it would end up? the error message reads as if it's "ensure present" semantics but found an incompatible existing entry that is "not a system user"
[06:08:28] it seems 999 would be the (last) UID for system users though
[06:09:11] based on a quick search: https://systemd.io/UIDS-GIDS/#summary
[06:09:19] anyway, it has 120 now.
[06:09:20] before i ran userdel it was:
[06:09:21] systemd-coredump:x:999:999:systemd Core Dumper:/:/usr/sbin/nologin
[06:09:23] yeah
[06:10:21] looks like we've got some details in syslog now
[06:10:27] tideways is segfaulting?
[06:11:20] maybe part of the php74 provisioning broke tideways with something too new or smth
[06:11:34] still odd that it triggers only on those requests though
[06:12:30] php-fpm7.2 is the process that segfaults, but some of the crashes occur in tideways code
[06:13:15] could it be a red herring? E.g. it's just always on the stack with its no-op hooks
[06:13:35] * Krinkle tries to disable it
[06:14:14] commented out in /etc/php/7.2/fpm/conf.d/30-tideways-xhprof.ini
[06:14:50] the four stack traces i've looked at so far all indicated the crash occurred in tideways
[06:15:48] (using `sudo coredumpctl list` to see the list of available stack traces and `sudo coredumpctl info ` to see the traces (where comes from the output of coredumpctl list)
[06:16:07] !log krinkle@mediawiki12$ sudo disable-puppet
[06:16:08] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[06:17:04] ori: now I get HTTP 500
[06:17:06] Jul 9 06:15:39 deployment-mediawiki12 php7.2-fpm: PHP Fatal error: Allowed memory size of 698351616 bytes exhausted (tried to allocate 20480 bytes) in /srv/mediawiki/php-master/includes/HookContainer/HookRunner.php on line 1759
[06:17:54] "#0 /srv/mediawiki/php-master/includes/user/UserOptionsManager.php(603): unknown()\n#1 /srv/mediawiki/php-master/includes/user/UserOptionsManager.php(498): MediaWiki\\User\\UserOptionsManager->loadOriginalOptions()\n#2 /srv/mediawiki/php-master/includes/user/UserOptionsManager.php(148): MediaWiki\\User\\UserOptionsManager->loadUserOptions()\n#3 /srv/mediawiki/php-master/extensions/DiscussionTools/includes/Hooks/HookUtils.php(357):
[06:18:18] that's progress
[06:18:42] i'm afraid i have to abandon you in the field of battle though, 2am here
[06:19:47] sure :)
[06:19:49] ttyl
[06:20:00] night!
[06:30:08] Beta-Cluster-Infrastructure, Editing-team: "Service Temporarily Unavailable" shows up when trying to add a new section/topic to a talk page on Beta cluster - https://phabricator.wikimedia.org/T312689 (Krinkle) After @ori removed deleted the `systemd-coredump` user and installed `systemd-coredump`. With t...
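A minimal Python sketch of the range check discussed around the adduser error: it parses a passwd(5) entry such as the `systemd-coredump:x:999:999:...` line quoted above and tests its UID against a system-UID range. The bounds 100 and 999 are an assumption (the FIRST_SYSTEM_UID/LAST_SYSTEM_UID defaults in Debian's /etc/adduser.conf); the exact rule adduser applies, and the configuration on deployment-mediawiki12, may differ, so a plain range check like this does not by itself explain why 999 was rejected.

```python
from typing import NamedTuple

# Assumed bounds: the FIRST_SYSTEM_UID / LAST_SYSTEM_UID defaults shipped in
# Debian's /etc/adduser.conf. The host's actual configuration may differ.
FIRST_SYSTEM_UID = 100
LAST_SYSTEM_UID = 999


class PasswdEntry(NamedTuple):
    name: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    # passwd(5) layout: name:password:UID:GID:GECOS:home:shell
    name, _password, uid, gid, gecos, home, shell = line.rstrip("\n").split(":")
    return PasswdEntry(name, int(uid), int(gid), gecos, home, shell)


def is_system_uid(uid: int, first: int = FIRST_SYSTEM_UID, last: int = LAST_SYSTEM_UID) -> bool:
    # Inclusive range check, so the upper bound itself counts as a system UID.
    return first <= uid <= last


before = parse_passwd_line("systemd-coredump:x:999:999:systemd Core Dumper:/:/usr/sbin/nologin")
print(before.name, before.uid, is_system_uid(before.uid))  # systemd-coredump 999 True
print(is_system_uid(120))   # True: the UID assigned after reinstalling
print(is_system_uid(1000))  # False: just past LAST_SYSTEM_UID
```

Under these assumed bounds both the old UID (999) and the new one (120) count as system UIDs, which matches the observation in the log that 999 looks like the last system UID.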
[06:30:12] Beta-Cluster-Infrastructure, DiscussionTools, Editing-team: "Service Temporarily Unavailable" shows up when trying to add a new section/topic to a talk page on Beta cluster - https://phabricator.wikimedia.org/T312689 (Krinkle) p:Triage→High a:Esanders
[06:32:48] Beta-Cluster-Infrastructure, DiscussionTools, Editing-team: "Service Temporarily Unavailable" shows up when trying to add a new section/topic to a talk page on Beta cluster - https://phabricator.wikimedia.org/T312689 (Krinkle) My estimate is this is induced by DiscussionTools, and would likely affect...
[07:23:56] Beta-Cluster-Infrastructure, DiscussionTools, Editing-team: "Service Temporarily Unavailable" shows up when trying to add a new section/topic to a talk page on Beta cluster - https://phabricator.wikimedia.org/T312689 (RhinosF1) T308074 is the next train task which this should probably block if it wil...
[08:11:46] Beta-Cluster-Infrastructure, DiscussionTools, Editing-team: "Service Temporarily Unavailable" shows up when trying to add a new section/topic to a talk page on Beta cluster - https://phabricator.wikimedia.org/T312689 (Esanders) We spotted an out of memory exception on a Patch Demo too: https://patchd...
[08:16:17] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.39.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T308072 (hashar) Open→Resolved That has been successful
[08:17:03] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.39.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T308074 (hashar)
[08:17:05] Beta-Cluster-Infrastructure, DiscussionTools, Editing-team: "Service Temporarily Unavailable" shows up when trying to add a new section/topic to a talk page on Beta cluster - https://phabricator.wikimedia.org/T312689 (hashar)
[08:17:52] Ty hashar
[08:17:59] Beta-Cluster-Infrastructure, DiscussionTools, Editing-team: "Service Temporarily Unavailable" shows up when trying to add a new section/topic to a talk page on Beta cluster - https://phabricator.wikimedia.org/T312689 (hashar) Week of July 11th 2022 doesn't have a MediaWiki train, next one will be 1.3...
[08:18:34] * RhinosF1 likes to leave to product owners and releng to make things blockers when it's so far away
[08:20:06] RhinosF1: :]]]
[08:20:36] RhinosF1: i am pretty sure Ed and developers of DiscussionTools will fix it next week, then I also want to make sure we dont forget :]
[08:20:45] Yeah very likely
[08:21:01] Have a good weekend too! You should be enjoying your Saturday!
[08:21:07] that one is a nasty bug, it definitely has the possibility to bring down the whole site
[08:21:16] I do more or less :]
[08:21:24] woke up early with an envy to write some java code ...
[08:21:45] OOMs can be awful
[08:21:51] I hate debugging them too
[08:22:18] Been a while since I touched any java
[09:03:59] (PS2) Hashar: Add new dependency to BlueSpicePageTemplates [integration/config] - https://gerrit.wikimedia.org/r/812323 (owner: Robert Vogel)
[09:04:08] (CR) Hashar: [C: +2] Add new dependency to BlueSpicePageTemplates [integration/config] - https://gerrit.wikimedia.org/r/812323 (owner: Robert Vogel)
[09:06:00] (Merged) jenkins-bot: Add new dependency to BlueSpicePageTemplates [integration/config] - https://gerrit.wikimedia.org/r/812323 (owner: Robert Vogel)
[09:10:38] Beta-Cluster-Infrastructure, DiscussionTools, Editing-team, Patch-For-Review: "Service Temporarily Unavailable" shows up when trying to add a new section/topic to a talk page on Beta cluster - https://phabricator.wikimedia.org/T312689 (Zabe) Seems to have been caused by https://gerrit.wikimedia.o...
[09:12:17] found the causing patch
[11:51:23] Beta-Cluster-Infrastructure, DiscussionTools, Editing-team, Patch-For-Review: "Service Temporarily Unavailable" shows up when trying to add a new section/topic to a talk page on Beta cluster - https://phabricator.wikimedia.org/T312689 (Esanders) p:High→Unbreak!
[12:59:10] Release-Engineering-Team, CampaignEvents, Campaign-Registration, Campaign-Tools (Campaign-Tools-Sprint-16), Patch-For-Review: Release V0 of the CampaignEvents extension to the Beta Cluster - https://phabricator.wikimedia.org/T311752 (Daimona)
[13:03:15] (PS1) Daimona Eaytoy: zuul: [mediawiki/extensions/CampaignEvents] Promote to Wikimedia production section [integration/config] - https://gerrit.wikimedia.org/r/812446 (https://phabricator.wikimedia.org/T311752)
[13:05:00] (CR) CI reject: [V: -1] zuul: [mediawiki/extensions/CampaignEvents] Promote to Wikimedia production section [integration/config] - https://gerrit.wikimedia.org/r/812446 (https://phabricator.wikimedia.org/T311752) (owner: Daimona Eaytoy)
[13:05:50] (PS2) Daimona Eaytoy: zuul: [mediawiki/extensions/CampaignEvents] Promote to Wikimedia prod section [integration/config] - https://gerrit.wikimedia.org/r/812446 (https://phabricator.wikimedia.org/T311752)
[13:18:14] (CR) Hashar: [C: +2] zuul: [mediawiki/extensions/CampaignEvents] Promote to Wikimedia prod section [integration/config] - https://gerrit.wikimedia.org/r/812446 (https://phabricator.wikimedia.org/T311752) (owner: Daimona Eaytoy)
[13:20:05] (Merged) jenkins-bot: zuul: [mediawiki/extensions/CampaignEvents] Promote to Wikimedia prod section [integration/config] - https://gerrit.wikimedia.org/r/812446 (https://phabricator.wikimedia.org/T311752) (owner: Daimona Eaytoy)
[13:22:20] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.39.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T308074 (taavi)
[13:22:40] Beta-Cluster-Infrastructure, DiscussionTools, Editing-team, User-Ryasmeen: "Service Temporarily Unavailable" shows up when trying to add a new section/topic to a talk page on Beta cluster - https://phabricator.wikimedia.org/T312689 (taavi) Open→Resolved
[13:22:45] Release-Engineering-Team, CampaignEvents, Campaign-Registration, Campaign-Tools (Campaign-Tools-Sprint-16), Patch-For-Review: Release V0 of the CampaignEvents extension to the Beta Cluster - https://phabricator.wikimedia.org/T311752 (Daimona)
[14:34:02] (CR) Thcipriani: [C: +2] Start branching CampaignEvents for Wikimedia production [tools/release] - https://gerrit.wikimedia.org/r/811689 (https://phabricator.wikimedia.org/T311752) (owner: Daimona Eaytoy)
[14:35:00] (Merged) jenkins-bot: Start branching CampaignEvents for Wikimedia production [tools/release] - https://gerrit.wikimedia.org/r/811689 (https://phabricator.wikimedia.org/T311752) (owner: Daimona Eaytoy)
[14:38:52] Release-Engineering-Team, CampaignEvents, Campaign-Registration, Campaign-Tools (Campaign-Tools-Sprint-16), Patch-For-Review: Release V0 of the CampaignEvents extension to the Beta Cluster - https://phabricator.wikimedia.org/T311752 (Daimona)
[17:25:43] !log Cherry-picked Ief73cc553 (varnish: use libvmod-querysort on Beta Cluster) on deployment-prep Puppetmaster. Can be reverted if there are any issues.
[17:25:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:39:33] !log ori@deployment-mediawiki12:~$ sudo apt install php-tideways-xhprof-dbgsym
[20:39:35] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:44:20] (PS1) Jbond: WIP: add files for custom image for beaker builds [integration/config] - https://gerrit.wikimedia.org/r/812463
[20:46:19] (PS2) Jbond: WIP: add files for custom image for beaker builds [integration/config] - https://gerrit.wikimedia.org/r/812463
[20:48:46] (CR) CI reject: [V: -1] WIP: add files for custom image for beaker builds [integration/config] - https://gerrit.wikimedia.org/r/812463 (owner: Jbond)
[20:48:55] (PS1) Umherirrender: [zuul] Archive skin BlueSpiceSkin [integration/config] - https://gerrit.wikimedia.org/r/812464 (https://phabricator.wikimedia.org/T203215)
[20:49:30] (PS2) Umherirrender: [zuul] Archive skin BlueSpiceSkin [integration/config] - https://gerrit.wikimedia.org/r/812464 (https://phabricator.wikimedia.org/T203215)
[20:50:37] (PS1) Umherirrender: [zuul] Archive extension BlueSpiceBookshelfUI [integration/config] - https://gerrit.wikimedia.org/r/812465 (https://phabricator.wikimedia.org/T268085)
[20:51:27] (CR) CI reject: [V: -1] [zuul] Archive skin BlueSpiceSkin [integration/config] - https://gerrit.wikimedia.org/r/812464 (https://phabricator.wikimedia.org/T203215) (owner: Umherirrender)
[20:52:52] (PS1) Umherirrender: [zuul] Archive extension BlueSpiceExtensions [integration/config] - https://gerrit.wikimedia.org/r/812466
[20:53:02] (CR) CI reject: [V: -1] [zuul] Archive extension BlueSpiceBookshelfUI [integration/config] - https://gerrit.wikimedia.org/r/812465 (https://phabricator.wikimedia.org/T268085) (owner: Umherirrender)
[20:54:46] (CR) CI reject: [V: -1] [zuul] Archive extension BlueSpiceExtensions [integration/config] - https://gerrit.wikimedia.org/r/812466 (owner: Umherirrender)
[21:02:57] (PS1) Umherirrender: [zuul] Add extension SemanticRESTAPI [integration/config] - https://gerrit.wikimedia.org/r/812467 (https://phabricator.wikimedia.org/T311226)
[21:07:18] (PS2) Umherirrender: [zuul] Archive extension BlueSpiceExtensions [integration/config] - https://gerrit.wikimedia.org/r/812466
[21:08:59] (PS2) Umherirrender: [zuul] Archive extension BlueSpiceBookshelfUI [integration/config] - https://gerrit.wikimedia.org/r/812465 (https://phabricator.wikimedia.org/T268085)
[21:10:17] (PS3) Umherirrender: [zuul] Archive skin BlueSpiceSkin [integration/config] - https://gerrit.wikimedia.org/r/812464 (https://phabricator.wikimedia.org/T203215)