[07:51:28] Hallo. Is there a deployment window ending soon? Is it possible, by any chance, to deploy another thing?
[07:51:44] Or should I schedule for another one and patiently wait? :)
[07:52:55] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.39.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T308067 (RhinosF1)
[07:56:03] ... I guess I should wait :)
[08:41:47] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.39.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T308067 (Lucas_Werkmeister_WMDE)
[08:56:08] Beta-Cluster-Infrastructure, Wikidata, Wikidata-Termbox, wdwb-tech, and 4 others: Move Termbox SSR for Beta Wikidata into deployment-prep project - https://phabricator.wikimedia.org/T304328 (ItamarWMDE) > Please fix. All deployment-prep instances must be fully configured via Puppet and not by han...
[09:27:11] <_joe_> jnuche: I might need help understanding why a scap configuration is not working
[09:27:40] <_joe_> I added php_fpm_always_restart: true to the [deploy1002.eqiad.wmnet] stanza
[09:27:46] <_joe_> but somehow running scap I still see
[09:28:09] <_joe_> 08:42:33 Running '/usr/local/sbin/check-and-restart-php php7.2-fpm 100' on 315 host(s)
[09:28:18] <_joe_> while the number at the end should be sys.maxsize
[09:29:27] <_joe_> per https://github.com/wikimedia/scap/blob/master/scap/php_fpm.py#L52
[09:29:49] GitLab (Infrastructure), Data-Persistence-Backup, serviceops, Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Jelto) Backup size decreased after cleanup of big projects. Thanks again to @brennen and @Dzahn for finding and coordinating this! We...
[09:30:40] _joe_: I'm not familiar with that part of the code, but I will take a look
[09:31:19] <_joe_> jnuche: ah I thought it might be a brainfart in the configuration actually that I'm not seeing
[09:36:48] _joe_: so it seems that log you're seeing is taking the value from the list of `mw_web_clusters` -> https://github.com/wikimedia/scap/blob/master/scap/main.py#L506
[09:37:13] which I assume contains all those hosts with a PHP process to restart
[09:37:49] the `sys.maxsize` value seems to be used simply as a very upper bound to make sure the number of restarted hosts is not limited
[09:38:01] *very high upper bound
[09:38:05] <_joe_> jnuche: yes
[09:38:07] <_joe_> but somehow
[09:38:17] <_joe_> php_fpm.INSTANCE.cmd doesn't contain the right threshold
[09:38:39] <_joe_> but the default one
[09:38:46] <_joe_> the current configuration reads
[09:39:20] <_joe_> https://github.com/wikimedia/puppet/blob/production/modules/scap/templates/scap.cfg.erb#L149-L154
[09:39:24] <_joe_> so as I read the code
[09:39:38] <_joe_> when php_fpm_always_restart is true
[09:40:00] <_joe_> the php_fpm.INSTANCE.cmd should include sys.maxsize and not 100 in the command
[09:41:03] I see, gotcha
[09:42:55] (looking a bit more)
[09:44:06] <_joe_> thanks
[09:44:20] <_joe_> I've looked at this quite a bit and I still can't figure out what is going on
[09:45:57] _joe_: in the meantime, have you tried adding `php_fpm_always_restart: true` to the `[global]` stanza in the config?
[09:46:18] <_joe_> jnuche: no, I can do it manually and re-try a deployment
[09:50:13] <_joe_> jnuche: sigh I think I know what the problem is
[09:50:15] <_joe_> testing
[09:50:57] <_joe_> 09:50:42 Running '/usr/local/sbin/check-and-restart-php php7.2-fpm 9223372036854775807' on 315 host(s)
[09:51:01] <_joe_> sigh
[09:51:02] <_joe_> True
[09:51:04] <_joe_> not true
[09:51:12] * _joe_ bangs head against the desk
[09:51:15] 😄
[09:51:38] that happens, no harm no foul :)
[10:05:58] <_joe_> jnuche: https://gerrit.wikimedia.org/r/c/operations/puppet/+/801345 :)
[10:07:13] <_joe_> thanks for the help btw, I was going crazy
[10:20:11] _joe_: of course, happy to help you rubber duck the issue ;)
[10:20:15] (PS1) Samwilson: Add `skins/Vector` as dependency to `WikiEditor` [integration/config] - https://gerrit.wikimedia.org/r/801349 (https://phabricator.wikimedia.org/T307725)
[10:49:01] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.39.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T308067 (Lucas_Werkmeister_WMDE)
[11:46:13] !log apply gitlab-settings to gitlab1003 - T307142
[11:46:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:46:16] T307142: bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142
[11:47:24] !log apply gitlab-settings to gitlab1004 - T307142
[11:47:26] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:52:16] GitLab (Infrastructure), serviceops, Patch-For-Review: bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142 (Jelto)
[13:03:31] GitLab (Infrastructure), serviceops, Patch-For-Review: bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142 (Jelto) `gitlab1003` and `gitlab1004` are configured as GitLab replicas now and are serving https://gitlab-replica-new.wikimedia.org/ and https://gitlab...
[13:15:49] Release-Engineering-Team, SRE, SRE-OnFire, Sustainability: Remove old scap repositories from deploy1002 - https://phabricator.wikimedia.org/T309162 (MoritzMuehlenhoff) p:Triage→Medium
[13:25:02] Phabricator: De-link my aodit@wikimedia.org staff email from personal volunteer profile - https://phabricator.wikimedia.org/T305919 (AOdit_WMF) Yes, all sorted, thank you Andre
[14:41:06] <_joe_> whoever runs the train this week: we've enabled restarts of php-fpm across the infrastructure with every deployment
[14:53:23] Release-Engineering-Team (Radar), Scap, Patch-For-Review, User-jijiki: Update Scap to perform rolling restart for all MW deploy - https://phabricator.wikimedia.org/T266055 (Joe) Current status is: - Each deployment will restart php-fpm - api and appserver canaries have opcache revalidation turned...
[15:40:05] Beta-Cluster-Infrastructure, Wikidata, Wikidata-Termbox, wdwb-tech, and 3 others: Move Termbox SSR for Beta Wikidata into deployment-prep project - https://phabricator.wikimedia.org/T304328 (ItamarWMDE) After a quick discussion with @Jakob_WMDE, I think we should consider if we even require termb...
[15:40:16] Beta-Cluster-Infrastructure, Wikidata, Wikidata-Termbox, wdwb-tech, and 3 others: Move Termbox SSR for Beta Wikidata into deployment-prep project - https://phabricator.wikimedia.org/T304328 (Majavah) >>! In T304328#7966461, @ItamarWMDE wrote: >> Please fix. All deployment-prep instances must be f...
[16:22:18] Beta-Cluster-Infrastructure, Wikidata, Wikidata-Termbox, wdwb-tech, and 3 others: Move Termbox SSR for Beta Wikidata into deployment-prep project - https://phabricator.wikimedia.org/T304328 (Lucas_Werkmeister_WMDE) >>! In T304328#7967931, @ItamarWMDE wrote: > After a quick discussion with @Jakob_...
[16:27:34] Beta-Cluster-Infrastructure, Wikidata, Wikidata-Termbox, wdwb-tech, and 3 others: Move Termbox SSR for Beta Wikidata into deployment-prep project - https://phabricator.wikimedia.org/T304328 (Majavah) >>! In T304328#7968095, @Lucas_Werkmeister_WMDE wrote: >>>! In T304328#7967941, @Majavah wrote: >...
[19:16:10] maintenance-disconnect-full-disks build 390625 integration-agent-docker-1036 (/: 30%, /srv: 98%, /var/lib/docker: 51%): OFFLINE due to disk space
[19:21:00] maintenance-disconnect-full-disks build 390626 integration-agent-docker-1036 (/: 30%, /srv: 73%, /var/lib/docker: 50%): RECOVERY disk space OK
[19:35:58] > 0:35 <•wikibugs> (CR) Krinkle: [WIP] phpunit: Remove suite.xml (1 comment) [core] - https://gerrit.wikimedia.org/r/741970 (https://phabricator.wikimedia.org/T227900) (owner: Kosta Harlan)
[19:36:02] kostajh: hope that helps :)
[19:37:36] Krinkle: ty!
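
Note on the 09:27-09:51 scap exchange: the log only establishes that `php_fpm_always_restart: true` left the threshold at 100 while `True` produced sys.maxsize. A minimal sketch of one way a config value could behave like that, assuming a case-sensitive string comparison; `strict_bool`, `restart_threshold` and `restart_command` are hypothetical names for illustration and are not taken from scap's code:

```python
import sys

# Threshold seen in the 09:28:09 log line; _joe_ calls it "the default one".
DEFAULT_THRESHOLD = 100


def strict_bool(raw: str) -> bool:
    """Hypothetical case-sensitive coercion: only the exact string 'True' is truthy."""
    return raw == "True"


def restart_threshold(raw_flag: str) -> int:
    """Return sys.maxsize when the flag is recognised as true, else the default."""
    return sys.maxsize if strict_bool(raw_flag) else DEFAULT_THRESHOLD


def restart_command(raw_flag: str, service: str = "php7.2-fpm") -> str:
    """Build the command string in the shape shown in the log."""
    threshold = restart_threshold(raw_flag)
    return f"/usr/local/sbin/check-and-restart-php {service} {threshold}"


if __name__ == "__main__":
    # Lowercase 'true' silently falls back to the default threshold.
    print(restart_command("true"))
    # Capitalised 'True' yields the sys.maxsize threshold.
    print(restart_command("True"))
```

Under this assumption a misspelled boolean is not rejected but quietly treated as false, which would match the symptom described above; how scap actually coerces the value is not shown in the log.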