[01:02:45] Phabricator, Release-Engineering-Team (Bonus Level 🕹️), serviceops, serviceops-collab, Patch-For-Review: Setup rsync for phab data on disk - https://phabricator.wikimedia.org/T313360 (Dzahn)
[01:03:07] Phabricator, Release-Engineering-Team (Bonus Level 🕹️), serviceops, serviceops-collab, Patch-For-Review: Setup rsync for phab data on disk - https://phabricator.wikimedia.org/T313360 (Dzahn)
[01:03:54] Phabricator, Release-Engineering-Team (Bonus Level 🕹️), serviceops, serviceops-collab, Patch-For-Review: Setup rsync for phab data on disk - https://phabricator.wikimedia.org/T313360 (Dzahn)
[03:55:05] PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[03:57:27] RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 14 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[04:18:40] (CR) Hashar: [C: +2] "I am very happy to see bits from https://deploy-commands.toolforge.org/bacc being incorporated in scap CLI. Even copy pasting is error pro" [tools/scap] - https://gerrit.wikimedia.org/r/824552 (https://phabricator.wikimedia.org/T315444) (owner: Ahmon Dancy)
[04:22:56] (Merged) jenkins-bot: scap backport: Include bug ids in SAL messages [tools/scap] - https://gerrit.wikimedia.org/r/824552 (https://phabricator.wikimedia.org/T315444) (owner: Ahmon Dancy)
[04:26:36] (CR) Hashar: [C: +1] "I have deployed the change 🎉" [integration/docroot] - https://gerrit.wikimedia.org/r/810979 (https://phabricator.wikimedia.org/T307405) (owner: Stang)
[06:10:21] Continuous-Integration-Infrastructure, Release-Engineering-Team, Zuul: zuul-merger on contint1001 does not run anymore - https://phabricator.wikimedia.org/T315586 (hashar)
[08:00:08] (CR) Hashar: "Should we deploy that or do we need to wait for something else?" [integration/config] - https://gerrit.wikimedia.org/r/803291 (owner: Jforrester)
[08:06:20] (CR) Hashar: [C: -1] jjb: Use composer phpunit:entrypoint (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/803525 (https://phabricator.wikimedia.org/T90875) (owner: Kosta Harlan)
[08:29:58] (CR) Jaime Nuche: [C: +2] Use train-blockers.toolforge for scap stage-train auto information [tools/scap] - https://gerrit.wikimedia.org/r/824288 (https://phabricator.wikimedia.org/T310395) (owner: Ahmon Dancy)
[08:34:13] (Merged) jenkins-bot: Use train-blockers.toolforge for scap stage-train auto information [tools/scap] - https://gerrit.wikimedia.org/r/824288 (https://phabricator.wikimedia.org/T310395) (owner: Ahmon Dancy)
[08:34:46] thcipriani: I have generated the deployment calendar for next week https://wikitech.wikimedia.org/wiki/Deployments#Week_of_August_22
[08:35:19] had to hardcode dan's PHID in the script, he is next week's backup conductor and somehow the script would not query phab to resolve the name
[08:35:32] that was to unblock j.o.e asking to schedule a deploy next week
[08:58:05] Release-Engineering-Team (Bonus Level 🕹️), Scap: Refactor Scap to use TimeoutLock as the sole locking mechanism - https://phabricator.wikimedia.org/T315531 (jnuche)
[08:58:07] Release-Engineering-Team (Bonus Level 🕹️), MW-on-K8s, serviceops, Patch-For-Review: Make scap deploy to kubernetes together with the legacy systems - https://phabricator.wikimedia.org/T299648 (jnuche) a:jnuche
[10:30:55] Is there any policy on importing a bigger set of pages (xmlDump from en.wikipedia, 500 pages, latest revision only, no templates, not overwriting existing articles) to the en.beta cluster?
[10:31:07] I would use importDump.php
[10:31:52] Need some more pages that have coordinates for testing GeoData search.
[10:32:12] The alternative would be creating random article stubs with random coordinates.
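The stub alternative raised at 10:32 (pages that exist mainly to carry coordinates for GeoData search testing) could be produced with a short script. A minimal sketch, assuming GeoData's `{{#coordinates:lat|lon|primary}}` parser function marks each page's main coordinate; the title pattern, stub text, and the `make_stub`/`make_stubs` helpers are hypothetical, and getting the pages into the beta wiki (for example via an XML file for importDump.php) is not shown.

```python
import random
from typing import List, Tuple


def make_stub(index: int, rng: random.Random) -> Tuple[str, str]:
    """Return (title, wikitext) for one hypothetical GeoData test stub."""
    # Valid ranges: latitude -90..90, longitude -180..180, rounded to 4 decimals.
    lat = round(rng.uniform(-90.0, 90.0), 4)
    lon = round(rng.uniform(-180.0, 180.0), 4)
    title = f"GeoData test stub {index}"
    # Doubled braces in the f-string emit literal {{ and }} for the parser function.
    text = (
        f"'''{title}''' is a generated test page with random coordinates.\n\n"
        f"{{{{#coordinates:{lat}|{lon}|primary}}}}\n"
    )
    return title, text


def make_stubs(count: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Generate `count` stubs; a fixed seed makes reruns produce identical pages."""
    rng = random.Random(seed)
    return [make_stub(i, rng) for i in range(1, count + 1)]


if __name__ == "__main__":
    for title, text in make_stubs(3):
        print(title)
        print(text)
```

Seeding the generator keeps the set reproducible, which helps if the same stubs ever need to be regenerated.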
[10:37:39] as long as you did it correctly with licensing etc, it shouldn't be an overall issue
[10:38:02] maybe just hoping that the betacluster doesn't catch alight and crash might be your biggest one >.>
[10:38:53] Project-Admins: Create subproject + milestone - https://phabricator.wikimedia.org/T315654 (Olea)
[10:48:54] Yeah thinking about it, I might just condense the import to stubs with the coordinates. Thanks though!
[10:51:12] Project-Admins: Create subproject + milestone for Wikimedia España - https://phabricator.wikimedia.org/T315654 (Aklapper)
[10:52:30] Project-Admins: Create subproject + milestone for Wikimedia España - https://phabricator.wikimedia.org/T315654 (Aklapper) @Olea: He, should they both be directly under #wikimedia-españa ? Or should the milestone be a milestone of the subproject? Also, could you please [provide short project descriptions](htt...
[11:04:38] Project-Admins: Create subproject + milestone for Wikimedia España - https://phabricator.wikimedia.org/T315654 (Olea) I think the proper way is a milestone of the subproject, yes. Description: * subproject: * Name: Jornadas WMES * Description ES: Encuentro anual de la comunidad WMES [Annual gathering of the WMES community] * Description EN:...
[11:07:03] Release-Engineering-Team (Doing), Scap, Documentation, Patch-For-Review: scap documentation is no more generated - https://phabricator.wikimedia.org/T315541 (hashar)
[11:13:40] Continuous-Integration-Config, Release-Engineering-Team, Utilities-code-utils: Add CI to mediawiki/tools/code-utils - https://phabricator.wikimedia.org/T309099 (hashar) Open→Resolved We now have `composer test` for php 7.2 to 8.1 and `shellcheck` on top of that.
[14:10:22] (CR) Jforrester: Zuul: [mediawiki/extensions/ImageSuggestions] Add to extension gate (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/803291 (owner: Jforrester)
[14:23:26] Beta-Cluster-Infrastructure, WMDE-GeoInfo-FocusArea, Maps (Kartographer), WMDE-TechWish-Sprint-2022-08-17: Add test articles to the beta cluster - https://phabricator.wikimedia.org/T315673 (WMDE-Fisch)
[14:23:38] Beta-Cluster-Infrastructure, WMDE-GeoInfo-FocusArea, Maps (Kartographer), WMDE-TechWish-Sprint-2022-08-17: Add test articles to the beta cluster - https://phabricator.wikimedia.org/T315673 (WMDE-Fisch)
[14:24:15] Beta-Cluster-Infrastructure, WMDE-GeoInfo-FocusArea, Maps (Kartographer), Unplanned-Sprint-Work, WMDE-TechWish-Sprint-2022-08-17: Add test articles to the beta cluster - https://phabricator.wikimedia.org/T315673 (WMDE-Fisch)
[14:24:39] Beta-Cluster-Infrastructure, WMDE-GeoInfo-FocusArea, Maps (Kartographer), Unplanned-Sprint-Work, WMDE-TechWish-Sprint-2022-08-17: Add test articles to the beta cluster - https://phabricator.wikimedia.org/T315673 (WMDE-Fisch) a:WMDE-Fisch
[14:54:22] Release-Engineering-Team (Bonus Level 🕹️), Scap: `scap backport` should include phabricator task in SAL messages - https://phabricator.wikimedia.org/T315444 (dancy) Open→Resolved The new behavior will be in the next scap release (probably 4.14.0)
[15:00:49] (PS1) Jforrester: dockerfiles: [php80] Upgrade to PHP 8.0.22 and cascade [integration/config] - https://gerrit.wikimedia.org/r/824755
[15:07:34] (PS2) Jforrester: dockerfiles: [php80] Upgrade to PHP 8.0.22 and cascade [integration/config] - https://gerrit.wikimedia.org/r/824755 (https://phabricator.wikimedia.org/T315167)
[15:07:36] (PS1) Jforrester: jjb: Bump all PHP80 jobs to use images with PHP 8.0.22 [integration/config] - https://gerrit.wikimedia.org/r/824756 (https://phabricator.wikimedia.org/T315167)
[15:08:23] (CR) Jforrester: [C: +2] dockerfiles: [php80] Upgrade to PHP 8.0.22 and cascade [integration/config] - https://gerrit.wikimedia.org/r/824755 (https://phabricator.wikimedia.org/T315167) (owner: Jforrester)
[15:10:25] (Merged) jenkins-bot: dockerfiles: [php80] Upgrade to PHP 8.0.22 and cascade [integration/config] - https://gerrit.wikimedia.org/r/824755 (https://phabricator.wikimedia.org/T315167) (owner: Jforrester)
[15:11:07] !log Docker: Building and publishing images with PHP 8.0.22 for T315167
[15:11:09] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:11:09] T315167: CI job mediawiki-quibble-composer-mysql-php80-docker on mediawiki/core gate-and-submit is flaky failing with Segmentation fault - https://phabricator.wikimedia.org/T315167
[15:15:00] !log Upgrading scap to latest code revision in beta cluster
[15:15:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:15:51] (PS1) Ahmon Dancy: Release 4.14.0-1 [tools/scap] - https://gerrit.wikimedia.org/r/824758
[15:15:53] (CR) Ahmon Dancy: [C: +2] Release 4.14.0-1 [tools/scap] - https://gerrit.wikimedia.org/r/824758 (owner: Ahmon Dancy)
[15:17:02] Continuous-Integration-Infrastructure, MediaWiki-Core-Tests, PHP 8.0 support, Patch-For-Review, ci-test-error: CI job mediawiki-quibble-composer-mysql-php80-docker on mediawiki/core gate-and-submit is flaky failing with Segmentation fault - https://phabricator.wikimedia.org/T315167 (Reedy) Be...
[15:22:14] (Merged) jenkins-bot: Release 4.14.0-1 [tools/scap] - https://gerrit.wikimedia.org/r/824758 (owner: Ahmon Dancy)
[15:37:37] thanks for getting the new deployment calendar up hashar --- I fixed part of the script that does it on toolforge, but the pywikibot piece is still unhappy :\
[15:50:17] !log Docker: Build stalled out for 30 minutes; terminated and re-started.
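Earlier in the log, gerrit 824552 ("scap backport: Include bug ids in SAL messages", for T315444) was merged, with the behavior expected in the next scap release. A minimal sketch of that idea, assuming task IDs are read from Wikimedia-style `Bug: T12345` footer lines in a commit message; `bug_ids` and `sal_message` are hypothetical helpers and not scap's actual implementation.

```python
import re
from typing import List

# Assumed convention: one "Bug: T<number>" footer per line in the commit message.
BUG_FOOTER = re.compile(r"^Bug:\s*(T\d+)\s*$", re.MULTILINE)


def bug_ids(commit_message: str) -> List[str]:
    """Return the unique task IDs from Bug: footers, in order of first appearance."""
    seen: List[str] = []
    for task in BUG_FOOTER.findall(commit_message):
        if task not in seen:
            seen.append(task)
    return seen


def sal_message(summary: str, commit_message: str) -> str:
    """Hypothetical SAL line: the summary, followed by task IDs when there are any."""
    tasks = bug_ids(commit_message)
    return f"{summary} ({', '.join(tasks)})" if tasks else summary
```

Matching whole footer lines, rather than any `T\d+` in the text, avoids picking up task numbers that are only mentioned in passing in the commit body.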
[15:50:18] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:07:19] (CR) Jforrester: [C: +2] "Deployed. All looks fine so far." [integration/config] - https://gerrit.wikimedia.org/r/824756 (https://phabricator.wikimedia.org/T315167) (owner: Jforrester)
[16:09:10] (Merged) jenkins-bot: jjb: Bump all PHP80 jobs to use images with PHP 8.0.22 [integration/config] - https://gerrit.wikimedia.org/r/824756 (https://phabricator.wikimedia.org/T315167) (owner: Jforrester)
[16:17:45] Continuous-Integration-Infrastructure, MediaWiki-Core-Tests, PHP 8.0 support, Patch-For-Review, ci-test-error: CI job mediawiki-quibble-composer-mysql-php80-docker on mediawiki/core gate-and-submit is flaky failing with Segmentation fault - https://phabricator.wikimedia.org/T315167 (Jdforreste...
[17:42:15] Release-Engineering-Team (Bonus Level 🕹️), Scap: scap: add progress reporting to php-fpm-restarts - https://phabricator.wikimedia.org/T302631 (dancy) a:dancy
[17:48:24] thcipriani: the script had a failure when I first ran it
[17:48:44] no assignee?
[17:49:28] that is what I thought so I have assigned the N+2 train to Dan
[17:49:50] 10:09:40 secondary = '{{ircnick|%s|%s}}' % (secondary.ircnick, secondary.fullname)
[17:49:50] 10:09:40 AttributeError: 'NoneType' object has no attribute 'ircnick'
[17:50:47] in the TrainFinder I had to add `task.users["PHID-USER-oetk6bbl6omm354ejz3b"] = "dduvall"`
[17:51:09] even though there was a secondary with Dan's PHID-USER
[17:51:50] it was not added by `_populate_users` or `_train_tasks`
[17:52:01] the print statements were to pinpoint the cause
[17:52:13] then I added the id/name manually and called it a day :-]
[17:53:28] huh
[17:53:52] James_F: I have noticed docker-pkg downloads all versions of the parent images which surely takes ages (https://phabricator.wikimedia.org/T310458)
[17:54:18] James_F: I haven't investigated much though
[17:54:35] thcipriani: that deployment-calendar script probably deserves a rewrite of some sort
[17:54:47] then it works ™
[17:56:40] James_F: eg on contint2001 `docker-pkg` has fetched all the sury-php images
[17:56:42] docker images|grep -c /sury-php
[17:56:42] 12
[17:56:44] :-\
[17:56:54] anyway thanks for the php 8.0.22 update
[17:57:57] I am off :-] merry week-end
[18:05:14] enjoy
[18:27:25] (PS1) Ahmon Dancy: Add progress reporting to php-fpm-restarts [tools/scap] - https://gerrit.wikimedia.org/r/824774 (https://phabricator.wikimedia.org/T302631)
[18:30:30] (PS2) Ahmon Dancy: Add progress reporting to php-fpm-restarts [tools/scap] - https://gerrit.wikimedia.org/r/824774 (https://phabricator.wikimedia.org/T302631)
[18:31:05] anyone have branch delete rights in gerrit? i mistakenly created the `es7` branch in `mediawiki/vendor` when needing an `es710` branch to match what exists in the extensions. No big deal, but if someone has the rights to `git push origin --delete es7` in mediawiki/vendor it would help keep things clean. Nothing was ever merged to es7; it maps directly to a commit in master
[18:31:15] ebernhardson: should be able to...
[18:32:32] Hmm. Nope
[18:32:42] Probably gonna need someone from releng (or with gerrit root)
[18:33:04] or bd808
[18:33:20] ok, thanks for checking!
[18:34:38] o/
[18:34:43] * dancy reads
[18:34:56] * taavi tries
[18:34:57] I can apparently delete wmf/ branches, but not others :D
[18:36:28] I deleted it
[18:36:35] somehow looks like only 'platform-engineering' can force push to all of mediawiki/*
[18:37:03] dancy: excellent, keeping our stuff just a little cleaner :) appreciate it
[18:37:14] taavi: welcome to gerrit rights
[18:37:39] maybe the VandalFighters group can too, but i'm never entirely sure when reading gerrit access rights
[19:47:11] Beta-Cluster-Infrastructure: Add basic alerting to the Beta Cluster - https://phabricator.wikimedia.org/T315695 (TheresNoTime)
[19:49:38] Beta-Cluster-Infrastructure, SRE-OnFire, Sustainability (Incident Followup): Add basic alerting to the Beta Cluster - https://phabricator.wikimedia.org/T315695 (TheresNoTime)
[20:32:02] Gerrit: Users with a different name in the cn field compared to uid field cannot use http auth - https://phabricator.wikimedia.org/T225308 (thcipriani) Stalled→Declined >>! In T225308#8159399, @Aklapper wrote: > Can this task be resolved, or declined, or is there more to do here? (Asking as tasks sho...
[20:35:50] and evidently platform-engineering is an archived group.
[20:35:52] fun.
[20:39:49] wasn't CPT?
[20:40:42] nope
[20:41:03] the group pre-dates that. Think "platform" as in "gerrit is a platform"
[20:43:20] ah, you mean gerrit group, not Phab tag
[20:43:28] yep
[20:43:53] sorry, late response to scrollback from like > 1hr ago without context :)
[21:09:49] Beta-Cluster-Infrastructure: prometheus-beta.wmcloud.org 504 timeout - https://phabricator.wikimedia.org/T315699 (TheresNoTime)
[21:13:08] Beta-Cluster-Infrastructure: prometheus-beta.wmcloud.org 504 timeout - https://phabricator.wikimedia.org/T315699 (TheresNoTime) p:Triage→Low Considering nothing appears to be complaining, and that it's fairly likely this hasn't worked for a while, I'm going to set the priority to **low** (else people...
[21:14:13] ^ no need to panic ^
[21:36:50] (CR) Thcipriani: [C: +2] deployments-calendar: Add TheresNoTime [tools/release] - https://gerrit.wikimedia.org/r/823724 (owner: Samtar)
[21:37:41] (Merged) jenkins-bot: deployments-calendar: Add TheresNoTime [tools/release] - https://gerrit.wikimedia.org/r/823724 (owner: Samtar)
[21:43:56] PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[21:46:18] RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 19 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[22:05:04] Beta-Cluster-Infrastructure: prometheus-beta.wmcloud.org 504 timeout - https://phabricator.wikimedia.org/T315699 (TheresNoTime) Prometheus is running, on port `9903` ` samtar@deployment-prometheus02:~$ sudo netstat -ltnp | grep "prometheus" tcp 0 0 127.0.0.1:9903 0.0.0.0:*...
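The deployment-calendar failure discussed around 17:49 was an `AttributeError` because the lookup for the secondary (backup) conductor returned `None`; the workaround was hardcoding the PHID in `task.users`. A minimal sketch of a lookup that consults a manual override table and fails with a message naming the unresolved PHID instead of crashing on `None`; `Deployer`, `resolve`, and the `overrides` mapping are hypothetical names, not the script's real structure.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Deployer:
    # Hypothetical stand-in for the script's user objects.
    ircnick: str
    fullname: str


def resolve(phid: str, users: Dict[str, Deployer], overrides: Dict[str, Deployer]) -> Deployer:
    """Look up a PHID, falling back to manual overrides; raise a clear error if neither has it."""
    person: Optional[Deployer] = users.get(phid) or overrides.get(phid)
    if person is None:
        # The original code formatted the template on None and raised AttributeError.
        raise LookupError(f"no user resolved for {phid}; add it to overrides or fix the lookup")
    return person


def ircnick_template(person: Deployer) -> str:
    """Render the {{ircnick|nick|Full Name}} wikitext template."""
    return "{{ircnick|%s|%s}}" % (person.ircnick, person.fullname)
```

Keeping overrides in a separate table makes a manual fix like the one at 17:50 explicit, instead of patching the fetched user data in place.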
[22:16:50] this is annoying right now:
[22:16:55] Error: Found 1 dependency cycle:
[22:17:01] (Exec[Refresh sysusers] => User[scap] => Exec[bootstrap-scap-target] => Class[Scap] => Scap::Target[phabricator/deployment] => Package[phabricator/deployment] => Class[Phabricator::Phd] => Systemd::Sysuser[phd] => File[/etc/sysusers.d/phd.conf] => Exec[Refresh sysusers])
[22:17:09] can't even see those in the compiler before merge
[22:17:28] and fixed like 3 other issues before.. thought finally it works.. then this new one
[22:17:57] and it's just because I am trying to be a good user and use systemd::sysuser to create system users (phd) now
[22:18:00] but scap does as well
[22:18:36] and then there is that circular dependency somehow.. but I don't want to touch all of scap either if it can be avoided.. so yea...
[22:19:33] that's why it's _still_ a problem with the phab role on new phab hosts.. after the 3 other things with LVS and whatnot
[22:29:02] oof.
[22:36:00] brennen: I found it's probably because of a change that happened within the last 24 hours, heh
[22:36:19] in the context of this ticket https://phabricator.wikimedia.org/T315568
[22:37:13] a "require => Exec['Refresh sysusers']," was added to systemd::sysuser in gerrit:824696
[22:37:39] well. I am sure we have comments soon
[22:38:53] the good part, I can just flip a switch in Hiera to disable use of systemd::sysuser.. because .. we just added that
[22:38:56] https://gerrit.wikimedia.org/r/c/operations/puppet/+/824796/1/hieradata/hosts/phab2002.yaml#1
[22:39:14] this does not give us the reserved UID though as it should be
[22:39:46] and that's why this is all on-topic for https://phabricator.wikimedia.org/T313360 (Setup rsync for phab data on disk) .. that's how it goes
[22:40:05] afterwards we will have a "proper UID" for phd, 920
[22:40:23] and 498 will go away with the old hosts.. and we weren't supposed to use one under 900
[22:47:10] GitLab (Project Migration), Phabricator, Release-Engineering-Team (Bonus Level 🕹️), Epic, and 2 others: Migrate active repositories in Phabricator Differential to GitLab - https://phabricator.wikimedia.org/T191182 (bd808)
[22:49:19] yea.. so .. good test.. I got the phab role applied on phab2002 (without systemd::sysuser, without LVS etc)
[22:49:30] but it still tries to make sshd listen on the wrong IP
[22:49:47] so that's next but it's one step closer
[22:49:59] Beta-Cluster-Infrastructure: deployment-mwlog01 full mwlog-data-01 volume - https://phabricator.wikimedia.org/T315707 (TheresNoTime)
[22:50:23] Beta-Cluster-Infrastructure: deployment-mwlog01 full mwlog-data-01 volume - https://phabricator.wikimedia.org/T315707 (TheresNoTime) a:TheresNoTime
[22:52:43] it's because there were 2 sshds, git-ssh and regular, and both were supposed to get port 22, on their own IP
[22:54:07] how much will Beta explode if I shut down ^ mwlog01 to extend the /srv volume?
[22:55:14] one way to find out
[22:57:12] aa
[22:57:29] !log shutting down deployment-mwlog01 for T315707
[22:57:31] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:57:32] T315707: deployment-mwlog01 full mwlog-data-01 volume - https://phabricator.wikimedia.org/T315707
[22:58:51] mwlog could mean it's like centrallog or it's like oxygen or something else
[22:59:37] oh, ignore that comment, like actual mwlog probably
[23:04:21] !log resized deployment-mwlog01's /srv volume, restarted
[23:04:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[23:06:23] Beta-Cluster-Infrastructure: deployment-mwlog01 full mwlog-data-01 volume - https://phabricator.wikimedia.org/T315707 (TheresNoTime) Open→Resolved Resized from 5GB to 15GB ` samtar@deployment-mwlog01:~$ sudo resize2fs /dev/sdb resize2fs 1.44.5 (15-Dec-2018) Filesystem at /dev/sdb is mounted on /srv;...
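The Puppet error at 22:16 reports its dependency cycle as a `A => B => C => A` chain of resources. As an illustration only (Puppet performs its own cycle detection when building the catalog), a minimal sketch that turns such a chain into an adjacency list and finds a cycle with a depth-first search; `chain_to_edges` and `find_cycle` are hypothetical helpers.

```python
from typing import Dict, List, Optional


def chain_to_edges(chain: str) -> Dict[str, List[str]]:
    """Turn Puppet-style '(A => B => C)' text into an adjacency list of 'A before B' edges."""
    nodes = [n.strip() for n in chain.strip("() ").split("=>")]
    edges: Dict[str, List[str]] = {}
    for a, b in zip(nodes, nodes[1:]):
        edges.setdefault(a, []).append(b)
    return edges


def find_cycle(edges: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a node list (first node repeated at the end), or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color: Dict[str, int] = {}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        stack.append(node)
        for nxt in edges.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path, so the path from nxt back to it is a loop.
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for start in list(edges):
        if color.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None
```

Removing any single edge on the returned path breaks the loop, which matches the fix discussed in the log of disabling the systemd::sysuser path via Hiera.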
[23:07:51] Phabricator, Release-Engineering-Team (Bonus Level 🕹️), serviceops, serviceops-collab, Patch-For-Review: Setup rsync for phab data on disk - https://phabricator.wikimedia.org/T313360 (Dzahn) after we removed the LVS (git-ssh) setup part, worked on reserving a proper UID for the phd systemuser...
[23:08:18] Thanks for doing the timeline TheresNoTime
[23:09:01] RhinosF1: oh no worries, thought I'd at least fill in the time it started
[23:09:02] Phabricator, Release-Engineering-Team (Bonus Level 🕹️), serviceops, serviceops-collab, Patch-For-Review: move phabricator to new hardware generation - https://phabricator.wikimedia.org/T280597 (Dzahn) after we **removed the LVS (git-ssh) setup part**, worked on reserving a **proper UID for th...
[23:10:37] RhinosF1: one day I'll pick https://uptime.wmcloud.org/status/beta back up again
[23:14:13] TheresNoTime: is that your tool?
[23:15:29] RhinosF1: it's a hacked on version of what I run at https://uptime.theresnotime.io/status/wmf-tools
[23:17:25] Ah!