[00:00:36] bd808: will give it a closer look. For now I'm affirming that your use-case is worthy of the moniker "epic saga."
[00:07:33] bd808: I find it ugly
[00:07:54] but that doesn't mean there's a prettier solution :/
[00:15:50] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (brennen)
[00:22:11] Platonides: well, I find literally everything about Dockerfiles ugly so I won't disagree
[00:23:43] If you mean the code example I gave is ugly, most of the ugly is already generated by Blubber -- https://github.com/wikimedia/toolhub/blob/main/.pipeline/local-python.Dockerfile
[00:47:05] Release-Engineering-Team (Yak Shaving 🐃🪒), User-brennen: A tool for quickly answering what groups an extension is deployed to - https://phabricator.wikimedia.org/T296050 (brennen)
[00:47:45] Release-Engineering-Team (Yak Shaving 🐃🪒), User-brennen: A tool for quickly answering what groups an extension is deployed to - https://phabricator.wikimedia.org/T296050 (brennen)
[01:03:28] Release-Engineering-Team (Yak Shaving 🐃🪒), User-brennen: A tool for quickly answering what groups an extension is deployed to - https://phabricator.wikimedia.org/T296050 (bd808) One truly horrible way would be by scraping Special:Version on all/representative wikis: `lang=javascript,name=[[w:en:Special:V...
[01:05:40] Release-Engineering-Team (Yak Shaving 🐃🪒), User-brennen: A tool for quickly answering what groups an extension is deployed to - https://phabricator.wikimedia.org/T296050 (AntiCompositeNumber) I'm surprised [[ https://bash.toolforge.org/quip/AVWoDg8ZgCrwkbTdmcjL | you of all people ]] are advocating scree...
[01:07:58] Release-Engineering-Team (Yak Shaving 🐃🪒), User-brennen: A tool for quickly answering what groups an extension is deployed to - https://phabricator.wikimedia.org/T296050 (bd808) >>! In T296050#7515637, @AntiCompositeNumber wrote: > I'm surprised [[ https://bash.toolforge.org/quip/AVWoDg8ZgCrwkbTdmcjL | y...
[01:20:14] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (TheDJ) I was reading Daniel's comment and opened the graph board: https://grafana.wikimedia.org/d/GpL5R8CGz/mysql-query-rate...
[01:33:20] GitLab (Auth & Access), Release-Engineering-Team (Done by Wed 24 Nov 🔥), User-brennen, cloud-services-team (Kanban): Create top level 'cloud' group on Gitlab - https://phabricator.wikimedia.org/T293741 (brennen) Resolved→In progress > What about us who have advanced access on WMCS but don...
[01:39:45] GitLab (Auth & Access), Release-Engineering-Team (Done by Wed 24 Nov 🔥), User-brennen, cloud-services-team (Kanban): Create top level 'cloud' group on Gitlab - https://phabricator.wikimedia.org/T293741 (brennen) p:Triage→Medium
[01:40:09] GitLab (Administration, Settings & Policy), Release-Engineering-Team (Done by Wed 24 Nov 🔥), User-brennen: GitLab should not display ads for paid versions - https://phabricator.wikimedia.org/T295453 (brennen) p:Triage→Medium
[02:45:45] GitLab (Auth & Access), Release-Engineering-Team (Priority Backlog 📥): Create a top level wmde group on Gitlab - https://phabricator.wikimedia.org/T291388 (brennen) In progress→Resolved @Addshore I've added you as an owner of `people/wmde`, which as @thcipriani mentioned grants access to `repos/w...
[06:35:31] Project beta-scap-sync-world build #27851: FAILURE in 1 min 8 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27851/
[06:45:37] Project beta-scap-sync-world build #27852: STILL FAILING in 1 min 12 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27852/
[06:56:30] Project beta-scap-sync-world build #27853: STILL FAILING in 2 min 5 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27853/
[07:05:41] Project beta-scap-sync-world build #27854: STILL FAILING in 1 min 15 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27854/
[07:16:05] Project beta-scap-sync-world build #27855: STILL FAILING in 1 min 35 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27855/
[07:23:44] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (Marostegui) p:Medium→Unbreak! This has caused a huge increase on queries, this is an example of an `enwiki` replica:...
[07:24:59] dduvall, thcipriani, jeena: ^
[07:25:26] duesen, Pchelolo: ^
[07:26:16] 07:25:40 RhinosF1: thanks - I am fine if someone wants to investigate but if not (or it is not easy to find) I would prefer to revert
[07:26:31] Project beta-scap-sync-world build #27856: STILL FAILING in 2 min 2 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27856/
[07:27:19] RhinosF1: looking
[07:29:34] I think the best thing to do at the moment is roll back to testwikis
[07:30:12] jeena: unless you can get hold of someone confident enough to revert individual patch then probably
[07:31:44] jeena: o/ db arch here, I'm also looking
[07:32:01] if there is a patch that we can revert, I can take over and do it
[07:32:13] I'm not sure what patch it is
[07:32:17] but I need to investigate more
[07:32:32] The risky one says it could cause this
[07:32:41] Oh, that one
[07:33:08] Amir1: do you mean don't roll back while you investigate?
[07:33:12] Bugs could go two ways: failure to cache may cause extra load on database servers.
[07:33:44] Amir1: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/699067/ is a possibility and should be revertable according to the comment
[07:33:51] jeena: maybe for a bit
[07:33:52] there was a bug in logging I think caused by rolling back on tuesday
[07:33:55] okay
[07:34:36] It goes clean Amir1 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/739840
[07:34:45] I love how a patch to reduce db read causes it to quadruple
[07:35:04] lol
[07:36:01] so there are some notes: 1- We are not sure this is really the cause, we can definitely revert and see if it reduces the load, it can be anything in wmf.9
[07:36:45] 2- the revert patch has not passed CI yet, I'm not comfortable merging it right now, if jenkins is green, then it's okay to merge
[07:37:02] Project beta-scap-sync-world build #27857: STILL FAILING in 1 min 47 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27857/
[07:37:03] I can stand by if we want to try that
[07:37:12] isn't like crazy late there?
[07:37:37] I'm usually up at this hour anyway :P
[07:37:43] haha, okay then
[07:37:57] I can be here for like 25 minutes but then I got to go for college, might have some free time when I get there but I'm in for one hell of a day so i doubt it
[07:38:06] I'll investigate in the meantime to see if I can spot any obvious mistake in the code
[07:38:13] okay, sounds good
[07:38:46] But revert is uploaded and waiting on CI if you're happy to go and just do it, if I'm around then I'll see emails + IRC pings
[07:39:25] I'll wait for CI I guess. Is there a cherry pick?
[07:39:39] I can do cherry pick
[07:39:52] there is now https://gerrit.wikimedia.org/r/c/mediawiki/core/+/739841
[07:39:55] thanks :)
[07:40:13] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (brennen)
[07:40:40] Me + Amir raced
[07:40:49] But https://gerrit.wikimedia.org/r/c/mediawiki/core/+/739841 ye
[07:41:14] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (Ladsgroup) p:Unbreak!→Medium Changing priority back as {T296063} is UBN now.
[07:41:56] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (brennen) > This really needs to be investigated and/or reverted, enwiki is having 4x times the amount of queries it used to...
[07:42:15] Should be 15-20 minutes before CI gets green on both
[07:43:08] brennen: that's an interesting point
[07:43:22] Which means rolling back the train is gonna be a pain
[07:43:36] So I guess this revert passing and working is best option
[07:43:38] ah yeah that's what I was mentioning about the errors for logging
[07:43:51] majavah: ^
[07:44:04] left a comment
[07:44:08] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (Majavah) >>! In T293950#7515983, @brennen wrote: > I am wondering what impact T295930 - PHP Notice: Array to string conversi...
[07:44:12] jeena: just read the ticket and actually clicked on the impact
[07:44:34] I didn't explain very well
[07:44:36] majavah: what about logs? That generated quite a lot in 15 minutes
[07:44:41] * brennen here
[07:44:50] i might not be much more useful than "here", but i'm here. :)
[07:44:58] no worries brennen
[07:45:01] it'll cause a short spike, it should get better when those entries are no longer in special:RecentChanges
[07:45:17] It's about 1/s so probably not that bad
[07:45:26] Logstash I'd assume can take that
[07:45:28] i'm around but can't focus very much due to $IMPORTANT_REAL_LIFE_THING
[07:46:04] I've got 15 minutes until life reminds me it's a weekday
[07:46:22] Also yesterday very much not a good day so definitely will be busy
[07:47:04] Project beta-scap-sync-world build #27858: STILL FAILING in 2 min 3 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27858/
[07:47:24] ^ seems unrelated
[07:47:36] That's a permission error
[07:47:44] I assume chmod or chown will fix
[07:48:02] it's related to d.ancy's work
[07:49:41] Amir1: and others, please ping if you need me
[07:49:59] thanks majavah.
[07:49:59] thanks majavah :)
[07:50:58] you're awesome majavah
[07:51:12] agreed
[07:51:26] Definitely
[07:52:52] sidebar: lunar eclipse is just starting to get good from this locality.
[07:53:31] I think it might be too cloudy here
[07:54:16] * RhinosF1 can't see much
[07:54:37] That's probably because the sun is far too bright this morning
[07:56:00] Project beta-scap-sync-world build #27859: STILL FAILING in 1 min 37 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27859/
[07:56:15] hah
[08:02:31] Amir1: master passed CI
[08:02:56] +2ing
[08:04:03] K
[08:04:10] WMF says 0 min
[08:05:37] Amir1: WMF branch is a go too
[08:05:54] Both are being submitted :)
[08:06:05] Project beta-scap-sync-world build #27860: STILL FAILING in 1 min 40 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27860/
[08:06:17] I'll probably have to go before Jenkins merges
[08:06:31] thanks for your help!
[08:06:46] ^
[08:08:05] Amir1: sounds good on the master patch but make sure to block next train too
[08:08:37] I know, just give me an hour to see if I can fix the issue
[08:08:45] or should I hand it over
[08:09:12] Ok
[08:09:13] :)
[08:15:45] Project beta-scap-sync-world build #27861: STILL FAILING in 1 min 34 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27861/
[08:20:10] i'm calling it for the night. thanks everybody.
[08:23:01] bye brenne.n
[08:23:08] I'm off now
[08:23:13] toodles
[08:23:14] Have a good end to the week
[08:23:22] you too!
[08:25:00] Amir1: should I sync it now?
[08:25:09] I'm doing it
[08:25:57] oh okay
[08:26:02] Project beta-scap-sync-world build #27862: STILL FAILING in 1 min 38 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27862/
[08:29:03] jeena: the reads fixed https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=16&orgId=1&from=now-1h&to=now&var-job=All&var-server=db1163&var-port=9104
[08:29:08] go get rest!!!!
[08:29:23] hooray! glad it worked. And thanks for taking it on
[08:30:32] I'm off to sleep. Have a good weekend!
[08:31:17] jeena: good night ;)
[08:31:52] Amir1: I am around if needed
[08:32:12] not sure how my american colleagues ended up still being around past midnight though
[08:32:46] Thanks. I think we are out of the woods now but it's not reverted on master yet. I'll try to talk to Daniel and Petr
[08:33:54] then if stuff got rolled back, we have the day to stabilize
[08:34:06] and I can step in on monday morning to move us forward again
[08:36:16] Project beta-scap-sync-world build #27863: STILL FAILING in 1 min 54 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27863/
[08:36:19] Release-Engineering-Team (Next), Release, Train Deployments: 1.38.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T293952 (Ladsgroup)
[08:37:07] hashar: we are at wmf.9 still, the revert got cherry-picked
[08:37:27] awesome!
[08:38:05] sal died and gives 500 error https://sal.toolforge.org/production
[08:38:16] works for me
[08:43:51] fun that gives me a server side error :/
[08:46:04] Project beta-scap-sync-world build #27864: STILL FAILING in 1 min 37 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27864/
[08:49:51] Ty all
[08:49:56] Sal fine here too
[08:50:02] You been amazing
[08:55:58] Project beta-scap-sync-world build #27865: STILL FAILING in 1 min 35 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27865/
[09:06:06] Project beta-scap-sync-world build #27866: STILL FAILING in 1 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27866/
[09:13:23] Project-Admins, User-dcaro: Create tag projects worktype-project, origin-user, origin-alert, origin-team - https://phabricator.wikimedia.org/T295692 (dcaro) > So, to reiterate, I would say if the teams for which you have assignments (and who's work would benefit from this tracking) is limited to 2 or few...
[09:15:59] Project beta-scap-sync-world build #27867: STILL FAILING in 1 min 36 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27867/
[09:17:47] Project-Admins, User-dcaro: Create tag projects worktype-project, origin-user, origin-alert, origin-team - https://phabricator.wikimedia.org/T295692 (dcaro) > FWIW, I think that this approach isn't exclusive of personal tasks or otherwise. You could, for example, have "SRE-origin-user" and track that on...
[09:18:25] hashar: I got a 500 too now
[09:18:44] Might be worth telling bd.808 when they awake
[09:19:17] ah yeah I'm seeing it too
[09:20:04] It's intermittent
[09:20:16] A refresh and it came back again
[09:20:39] I'll create a task
[09:23:28] https://phabricator.wikimedia.org/T296072
[09:26:04] GitLab (Auth & Access), Release-Engineering-Team (Done by Wed 24 Nov 🔥), User-brennen, cloud-services-team (Kanban): Create top level 'cloud' group on Gitlab - https://phabricator.wikimedia.org/T293741 (aborrero) Thanks! I tried creating a subgroup: https://gitlab.wikimedia.org/repos/cloud/tool...
[09:26:21] Project beta-scap-sync-world build #27868: STILL FAILING in 1 min 51 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27868/
[09:36:07] Project beta-scap-sync-world build #27869: STILL FAILING in 1 min 38 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27869/
[09:46:06] Project beta-scap-sync-world build #27870: STILL FAILING in 1 min 40 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27870/
[09:56:00] Project beta-scap-sync-world build #27871: STILL FAILING in 1 min 36 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27871/
[10:06:10] Project beta-scap-sync-world build #27872: STILL FAILING in 1 min 40 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27872/
[10:16:21] Project beta-scap-sync-world build #27873: STILL FAILING in 1 min 55 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27873/
[10:23:12] hmm
[10:24:17] 00:01:55.013 cp: cannot create regular file '/srv/mediawiki-staging/php-master/cache/l10n/l10n_cache-ab.cdb': Permission denied
[10:24:18] joy
[10:25:43] probably due to https://gerrit.wikimedia.org/r/c/operations/puppet/+/739620 for T295304
[10:25:43] T295304: Improve efficiency of scap l10n operations - https://phabricator.wikimedia.org/T295304
[10:26:09] Project beta-scap-sync-world build #27874: STILL FAILING in 1 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27874/
[10:28:14] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Scap, Patch-For-Review: Improve efficiency of scap l10n operations - https://phabricator.wikimedia.org/T295304 (hashar) Resolved→Open The deployment-prep job is broken since 11/19 06:35 UTC https://integration.wikimedia.org/ci/job/beta-scap-syn...
[10:36:03] Project beta-scap-sync-world build #27875: STILL FAILING in 1 min 37 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27875/
[10:42:47] Project beta-scap-sync-world build #27876: STILL FAILING in 5 min 26 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27876/
[10:44:40] Project beta-scap-sync-world build #27877: STILL FAILING in 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27877/
[10:48:53] Project beta-scap-sync-world build #27878: STILL FAILING in 1 min 37 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27878/
[10:54:59] Project beta-scap-sync-world build #27879: ABORTED in 4 min 36 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27879/
[11:08:30] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), ci-test-error (WMF-deployed Build Failure): TAR_ENTRY_ERROR ENOSPC: no space left on device - https://phabricator.wikimedia.org/T292729 (hashar) I have moved /srv/mediawiki-staging/php-master/cache/l10n to a l10n-old. Ran Puppet...
[11:08:59] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Scap, Patch-For-Review: Improve efficiency of scap l10n operations - https://phabricator.wikimedia.org/T295304 (hashar) I have moved /srv/mediawiki-staging/php-master/cache/l10n to a l10n-old. Ran Puppet which did not do anything. I have triggered a ne...
[11:09:27] !log deployment-prep: fixed l10n permission issue that caused scap to abort early since 6:35 UTC # T295304
[11:09:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:09:31] T295304: Improve efficiency of scap l10n operations - https://phabricator.wikimedia.org/T295304
[11:11:25] Yippee, build fixed!
[11:11:25] Project beta-scap-sync-world build #27880: FIXED in 15 min: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27880/
[11:12:16] success!
[11:12:44] dancy: somehow the l10n directory on deployment-prep was owned by www-data rather than l10nupdate. I manually fixed the permission and that seems to work
[11:13:09] if there is no l10n directory available, scap or some script seems to create a l10n regular file and everything explodes :D
[11:15:24] afk
[11:28:56] jerkins seems backlogged
[11:53:44] one more train issue maybe? T296077
[11:53:45] T296077: CentralNotice banners not showing in Minerva - https://phabricator.wikimedia.org/T296077
[12:06:17] boldly added wmf.9 blockers as parent task
[12:06:26] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (Lucas_Werkmeister_WMDE)
[12:15:59] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (Ladsgroup)
[13:33:28] rebase hell again
[13:34:29] (PS2) Hashar: jjb: move docker run --volume option(s) at end [integration/config] - https://gerrit.wikimedia.org/r/739888
[13:36:45] (CR) Hashar: [C: +2] jjb: move docker run --volume option(s) at end [integration/config] - https://gerrit.wikimedia.org/r/739888 (owner: Hashar)
[13:39:05] (Merged) jenkins-bot: jjb: move docker run --volume option(s) at end [integration/config] - https://gerrit.wikimedia.org/r/739888 (owner: Hashar)
[14:09:46] Release-Engineering-Team (Deployment Training Requests): Deployment training request for JKieserman - https://phabricator.wikimedia.org/T296024 (JKieserman) Thank you! Would it be possible to attend the earlier session instead?
[15:01:13] (PS8) Hashar: jjb: play with Jinja2 [integration/config] - https://gerrit.wikimedia.org/r/739282
[15:04:15] (PS1) Hashar: jjb: move another --volume option at end [integration/config] - https://gerrit.wikimedia.org/r/740180
[15:05:41] (PS9) Hashar: jjb: play with Jinja2 [integration/config] - https://gerrit.wikimedia.org/r/739282
[15:10:01] (PS4) Hashar: jjb: migrate docker --volume to "volumes" yaml map [integration/config] - https://gerrit.wikimedia.org/r/739583
[15:10:24] pfff
[15:10:25] finally
[15:10:31] got stuff almost in a noop state :]
[15:20:28] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (thcipriani) >>! In T293950#7477468, @daniel wrote: > ##### Risky Patch! 🚂🔥 > > * **Change**: https://gerrit.wikimedia.org...
[15:29:22] (CR) Hashar: [C: +2] jjb: move another --volume option at end [integration/config] - https://gerrit.wikimedia.org/r/740180 (owner: Hashar)
[15:30:33] (CR) Hashar: "It is a noop in jjb at the expense of some crazy templating ;D The idea is to make the job-templates slightly easier and more yamlish." [integration/config] - https://gerrit.wikimedia.org/r/739282 (owner: Hashar)
[15:32:28] (Merged) jenkins-bot: jjb: move another --volume option at end [integration/config] - https://gerrit.wikimedia.org/r/740180 (owner: Hashar)
[15:45:40] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Scap, Patch-For-Review: Improve efficiency of scap l10n operations - https://phabricator.wikimedia.org/T295304 (dancy) As of https://gerrit.wikimedia.org/r/c/mediawiki/tools/scap/+/738453 (which is included in 4.0.3-1+0~20211117232016.94~1.gbpafe1e5) t...
[15:51:06] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Scap, Patch-For-Review: Improve efficiency of scap l10n operations - https://phabricator.wikimedia.org/T295304 (Majavah) Not sure why this has happened on some hosts but not all, but the explanation is simple-ish: ` taavi@deployment-deploy01:~$ apt-cac...
[15:52:54] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Scap, Patch-For-Review: Improve efficiency of scap l10n operations - https://phabricator.wikimedia.org/T295304 (dancy) >>! In T295304#7516860, @dancy wrote: > The primary problem here is that scap was auto-downgraded. It looks like that happened on so...
[16:03:30] Continuous-Integration-Infrastructure, Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review: beta-build-scap-deb failing - https://phabricator.wikimedia.org/T295719 (dancy) Open→Resolved
[16:09:03] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Release, Train Deployments: 1.38.0-wmf.9 seems to have introduced a memory leak - https://phabricator.wikimedia.org/T296098 (akosiaris) p:Triage→Medium
[16:09:28] dancy: good morning! there have been some issues with the scap upgrade on deployment-prep this morning ;D
[16:09:39] but i see you answer
[16:09:40] ed
[16:10:11] Saw that. Looking for a solution now (trying to see how I can influence the package version that the debian glue builder creates)
[16:10:53] it should get it from the debian/changelog
[16:11:06] + some git magic string appended to it
[16:11:43] hmm.. in that case I'll just bump the revision from 1 to 2. That should do the trick.
[16:12:26] or I could change from 4.0.3 to 4.1.0 in preparation for the next release.
[16:13:10] there might also be some difference between stretch and buster which I guess is going to be a rabbit hole
[16:13:15] so yeah bump minor maybe
[16:13:37] alright, going with that.
[16:14:00] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Release, Train Deployments: 1.38.0-wmf.9 seems to have introduced a memory leak - https://phabricator.wikimedia.org/T296098 (akosiaris) p:Medium→Unbreak!
[16:15:49] Project-Admins, User-dcaro: Create tag projects worktype-project, origin-user, origin-alert, origin-team - https://phabricator.wikimedia.org/T295692 (MBinder_WMF) > All my tasks are tagged with 1 team (cloud services), so you could say it's limited to 1. They might have other team tags too, though I don'...
[16:16:04] dancy: good luck! I am reaching week-end time unfortunately :]
[16:16:18] the CI madness from yesterday seems to be solved (well not really but back to the previous half broken state)
[16:16:21] No problem. I have a handle on it. Thanks for the backup!
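Bumping the Debian revision works because apt and dpkg rank package versions with dpkg's comparison rules rather than plain string order. The following is a minimal Python sketch of those rules, written only to illustrate the ordering; it is not code from scap, apt or git-buildpackage, and `compare_versions` and its helpers are hypothetical names. A version is split into epoch, upstream version and Debian revision; the epoch is compared as an integer, and the other two parts are compared as alternating runs of non-digits (where `~` sorts before everything, even the end of the string, and letters sort before other symbols) and digits (compared numerically).

```python
import string
from functools import cmp_to_key
from typing import Tuple


def _char(s: str, i: int) -> str:
    # Character at index i, or "" once past the end of s.
    return s[i] if i < len(s) else ""


def _is_digit(c: str) -> bool:
    return c != "" and c in string.digits


def _order(c: str) -> int:
    # Weight of one character in dpkg ordering:
    # "~" < end of string (and digits) < letters < any other symbol.
    if c == "" or _is_digit(c):
        return 0
    if c == "~":
        return -1
    if c in string.ascii_letters:
        return ord(c)
    return ord(c) + 256


def _compare_fragment(a: str, b: str) -> int:
    """Compare two upstream versions (or two Debian revisions) dpkg-style."""
    i = j = 0
    while i < len(a) or j < len(b):
        # 1. Compare the non-digit runs character by character.
        while (_char(a, i) and not _is_digit(_char(a, i))) or (
            _char(b, j) and not _is_digit(_char(b, j))
        ):
            diff = _order(_char(a, i)) - _order(_char(b, j))
            if diff:
                return diff
            i += 1
            j += 1
        # 2. Compare the digit runs numerically, ignoring leading zeros.
        while _char(a, i) == "0":
            i += 1
        while _char(b, j) == "0":
            j += 1
        first_diff = 0
        while _is_digit(_char(a, i)) and _is_digit(_char(b, j)):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if _is_digit(_char(a, i)):
            return 1  # a's number has more digits, so it is larger
        if _is_digit(_char(b, j)):
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str) -> Tuple[int, str, str]:
    # Layout: [epoch:]upstream_version[-debian_revision]
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(v1: str, v2: str) -> int:
    """Return <0, 0 or >0 as v1 sorts before, equal to, or after v2."""
    e1, u1, r1 = _split(v1)
    e2, u2, r2 = _split(v2)
    if e1 != e2:
        return -1 if e1 < e2 else 1
    return _compare_fragment(u1, u2) or _compare_fragment(r1, r2)


if __name__ == "__main__":
    candidates = [
        "4.0.3-2",
        "4.0.3-2+0~20211119163357.100~1.gbp08fad4",
        "4.0.3-1+0~20211117232016.94~1.gbpafe1e5",
    ]
    # Print from lowest to highest, in the order dpkg would rank them.
    for v in sorted(candidates, key=cmp_to_key(compare_versions)):
        print(v)
```

Under these assumptions the sketch ranks the release `4.0.3-2` above the earlier snapshot `4.0.3-1+0~20211117232016.94~1.gbpafe1e5`, and the later snapshot `4.0.3-2+0~20211119163357.100~1.gbp08fad4` above `4.0.3-2`, which is consistent with dancy's later note on T295304. On a real host, `dpkg --compare-versions A gt B` (exit status 0 when the relation holds) is the authoritative check.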
[16:16:24] dduvall: if around, please see the mem leak above
[16:16:26] will look at rebalancing instances partitions next week
[16:17:28] oh joy a mem leak of doom :-\
[16:18:11] hashar: on a friday too
[16:18:18] yeah well it happens
[16:18:22] Looks like we might have to rollback
[16:18:42] thcipriani: for the record given that next week is thanksgiving in the USA, I am more happy to complete the train dance next week if we end up rolling back and need to push stuff next week
[16:18:56] cause next week is a regular week for me :-]
[16:19:06] (PS1) Ahmon Dancy: Advanced version to 4.0.3-2 [tools/scap] - https://gerrit.wikimedia.org/r/740193
[16:19:11] Yeah otherwise 11 would be awful
[16:19:14] noting it on the task
[16:19:19] oh yeah, I'm off all next week!
[16:19:25] ^ same
[16:19:28] hashar: <3
[16:19:49] That would be 4 weeks if we went straight .7 -> .11
[16:19:55] As there was no .8
[16:20:09] (CR) jerkins-bot: [V: -1] Advanced version to 4.0.3-2 [tools/scap] - https://gerrit.wikimedia.org/r/740193 (owner: Ahmon Dancy)
[16:20:49] good luck
[16:20:52] gah!
[16:20:53] and good rollback :-\
[16:21:00] &
[16:21:00] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (hashar) Next week is thanksgiving in the US, I am volunteering to run any follow up train actions that would need to happen...
[16:21:19] ouch
[16:22:28] Is anyone leading the rollback now?
[16:22:58] Not as far as I can tell.
[16:23:11] no, still looking at the task
[16:24:09] (CR) Ahmon Dancy: "recheck" [tools/scap] - https://gerrit.wikimedia.org/r/740193 (owner: Ahmon Dancy)
[16:25:27] (CR) Ahmon Dancy: [C: +2] Advanced version to 4.0.3-2 [tools/scap] - https://gerrit.wikimedia.org/r/740193 (owner: Ahmon Dancy)
[16:26:06] (Merged) jenkins-bot: Advanced version to 4.0.3-2 [tools/scap] - https://gerrit.wikimedia.org/r/740193 (owner: Ahmon Dancy)
[16:29:50] is it safe to rollback to group1 wikis?
[16:30:05] (PS1) Ahmon Dancy: Change 4.0.3-1 to 4.0.3-2 in debian/changelog [tools/scap] - https://gerrit.wikimedia.org/r/740197
[16:30:15] wasn't there an explosion the last time we tried to rollback this train?
[16:30:23] (CR) Ahmon Dancy: [C: +2] Change 4.0.3-1 to 4.0.3-2 in debian/changelog [tools/scap] - https://gerrit.wikimedia.org/r/740197 (owner: Ahmon Dancy)
[16:30:58] When jeena rolled back to .7 earlier in the week, there were more errors
[16:31:26] These: https://phabricator.wikimedia.org/T295930
[16:31:27] wmf.9 centralauth creates log entries that wmf.7 centralauth does not understand, there'll be a spike of logspam that'll get better when those log entries are no longer visible in special:recentchanges
[16:32:21] majavah: thanks for that explanation
[16:32:37] thcipriani: why group1, will 0+1 together not cause memory issues?
[16:32:44] (Merged) jenkins-bot: Change 4.0.3-1 to 4.0.3-2 in debian/changelog [tools/scap] - https://gerrit.wikimedia.org/r/740197 (owner: Ahmon Dancy)
[16:33:08] thcipriani: oh, and the centralauth thing only affects metawiki
[16:33:13] RhinosF1: I was looking at the timing and it matched group1->group2, but looking more closely I see they all rolled out in 30 minutes.
[16:33:20] which is group1
[16:33:43] thcipriani: yeah it was very close together
[16:34:05] so I guess I'll roll back to group0 since that's "safe"
[16:35:28] It didn't seem to be an issue there so that's best
[16:35:32] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Release, Train Deployments: 1.38.0-wmf.9 seems to have introduced a memory leak - https://phabricator.wikimedia.org/T296098 (thcipriani) a:jeena→thcipriani
[16:35:34] I hope
[16:38:32] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Scap, Patch-For-Review: Improve efficiency of scap l10n operations - https://phabricator.wikimedia.org/T295304 (dancy) I upgraded the scap package on beta hosts to 4.0.3-2+0~20211119163357.100~1.gbp08fad4. This version number is greater than 4.0.3-2,...
[16:45:31] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Release, Train Deployments: 1.38.0-wmf.9 seems to have introduced a memory leak - https://phabricator.wikimedia.org/T296098 (jcrespo) I double-checked and the timestamp where heavy memory usage growth corresponds quite well with an increase in MySQL qu...
[16:49:08] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Release, Train Deployments: 1.38.0-wmf.9 seems to have introduced a memory leak - https://phabricator.wikimedia.org/T296098 (thcipriani) a:thcipriani→None Unassigning now that this is rolled back. >>! In T296098#7517064, @jcrespo wrote: > I do...
[18:44:35] GitLab (Auth & Access), Release-Engineering-Team (Done by Wed 24 Nov 🔥), User-brennen, cloud-services-team (Kanban): Create top level 'cloud' group on Gitlab - https://phabricator.wikimedia.org/T293741 (brennen) > Or is there some kind of inheritance from the parent group? One thing that's non-o...
[18:56:31] GitLab (Auth & Access), Release-Engineering-Team (Done by Wed 24 Nov 🔥), User-brennen, cloud-services-team (Kanban): Create top level 'cloud' group on Gitlab - https://phabricator.wikimedia.org/T293741 (brennen) (Added `people/volunteer-group-cloud-admin` as maintainers in the meanwhile.)
[19:13:26] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), ci-test-error (WMF-deployed Build Failure): TAR_ENTRY_ERROR ENOSPC: no space left on device - https://phabricator.wikimedia.org/T292729 (dancy) This happened again today: https://integration.wikimedia.org/ci/job/wmf-quibble-se...
[19:15:18] Project-Admins, User-dcaro: Create tag projects worktype-project, origin-user, origin-alert, origin-team - https://phabricator.wikimedia.org/T295692 (dcaro) The above is good for me :)
[19:28:16] Project-Admins, User-dcaro: Create tag projects worktype-project, origin-user, origin-alert, origin-team - https://phabricator.wikimedia.org/T295692 (MBinder_WMF)
[19:34:46] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (cjming)
[19:35:36] Project-Admins, User-dcaro: Create tag projects worktype-project, origin-user, origin-alert, origin-team - https://phabricator.wikimedia.org/T295692 (MBinder_WMF) Done! #cloud-services-worktype-project #cloud-services-worktype-maintenance #cloud-services-worktype-unplanned #cloud-services-origin-user...
[19:39:33] Project-Admins, User-dcaro: Create tag projects worktype-project, origin-user, origin-alert, origin-team - https://phabricator.wikimedia.org/T295692 (mmodell) Is there any reason not to use the {icon tag color=yellow}tag icon, even if not really intended for global use?
[19:47:36] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (stjn)
[20:06:47] Project-Admins, User-dcaro: Create tag projects worktype-project, origin-user, origin-alert, origin-team - https://phabricator.wikimedia.org/T295692 (MBinder_WMF) Good question! I am a dinosaur when it comes to the naming/coloring/icon conventions established early on in WMF's use of Phab, and though I k...
[20:07:59] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Release, Train Deployments: 1.38.0-wmf.9 seems to have introduced a memory leak - https://phabricator.wikimedia.org/T296098 (Ladsgroup) We can test seeing if it's caused by bad GC (plus the db bug) by restarting php-fpm and seeing if the memory usage r...
[20:27:51] Project-Admins, User-dcaro: Create tag projects worktype-project, origin-user, origin-alert, origin-team - https://phabricator.wikimedia.org/T295692 (mmodell) The tag icon is entirely semantic but it can be used in search and I find that helpful for filtering out projects that are not relevant to a parti...
[20:30:44] Release-Engineering-Team (Deployment Training Requests): Deployment training request for JKieserman - https://phabricator.wikimedia.org/T296024 (thcipriani) >>! In T296024#7516616, @JKieserman wrote: > Thank you! Would it be possible to attend the earlier session instead? Sure thing, I added you to the UTC-...
[20:32:48] (PS1) Hashar: jjb: expand -v docker run option and move to end [integration/config] - https://gerrit.wikimedia.org/r/740245
[20:34:52] (CR) jerkins-bot: [V: -1] jjb: expand -v docker run option and move to end [integration/config] - https://gerrit.wikimedia.org/r/740245 (owner: Hashar)
[20:35:23] Release-Engineering-Team (Deployment Training Requests): Deployment training request for JKieserman - https://phabricator.wikimedia.org/T296024 (JKieserman) Awesome, thank you!
[20:36:13] oh a shellcheck error
[20:38:22] Project-Admins, User-dcaro: Create tag projects worktype-project, origin-user, origin-alert, origin-team - https://phabricator.wikimedia.org/T295692 (MBinder_WMF) Word. By all means, change it if that's generally accepted :)
[20:38:25] (PS2) Hashar: jjb: expand -v docker run option and move to end [integration/config] - https://gerrit.wikimedia.org/r/740245
[20:38:27] (PS10) Hashar: jjb: play with Jinja2 [integration/config] - https://gerrit.wikimedia.org/r/739282
[20:38:29] (PS5) Hashar: jjb: migrate docker --volume to "volumes" yaml map [integration/config] - https://gerrit.wikimedia.org/r/739583
[20:38:51] ebernhardson: first time I get a shellcheck error while editing jenkins job, and that definitely caught an issue that would have caused major havoc! thanks :]
[20:45:48] GitLab (Auth & Access): Create subgroup for 'wikisp' - https://phabricator.wikimedia.org/T296110 (Galahad)
[21:37:50] (CR) Jeena Huneidi: "A .pipeline/config file is needed in the servicelib-node repository" [integration/config] - https://gerrit.wikimedia.org/r/739908 (https://phabricator.wikimedia.org/T295994) (owner: Nikki Nikkhoui)
[21:52:05] Beta-Cluster-Infrastructure, SRE, Traffic, Beta-Cluster-reproducible, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on November 18, 2021. - https://phabricator.wikimedia.org/T296113 (AlexisJazz)
[21:52:36] Beta-Cluster-Infrastructure, Quality-and-Test-Engineering-Team (QTE), SRE, Traffic, and 2 others: [epic] The SSL certificate for Beta cluster domains fails to properly renew & deploy - https://phabricator.wikimedia.org/T293585 (AlexisJazz)
[21:52:44] Beta-Cluster-Infrastructure, SRE, Traffic, Beta-Cluster-reproducible, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on November 18, 2021. - https://phabricator.wikimedia.org/T296113 (AlexisJazz)
[21:52:55] Beta-Cluster-Infrastructure, SRE, Traffic, Beta-Cluster-reproducible, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on November 18, 2021. - https://phabricator.wikimedia.org/T296113 (RhinosF1)
[21:53:03] Beta-Cluster-Infrastructure: 'en.wikipedia.beta.wmflabs.org' Certificate has expired - https://phabricator.wikimedia.org/T296000 (RhinosF1)
[21:53:36] Beta-Cluster-Infrastructure, SRE, Traffic, Beta-Cluster-reproducible, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on November 18, 2021. - https://phabricator.wikimedia.org/T296113 (RhinosF1)
[21:53:44] Beta-Cluster-Infrastructure, Quality-and-Test-Engineering-Team (QTE), SRE, Traffic, and 2 others: [epic] The SSL certificate for Beta cluster domains fails to properly renew & deploy - https://phabricator.wikimedia.org/T293585 (RhinosF1)
[21:55:08] Beta-Cluster-Infrastructure, SRE, Traffic, Beta-Cluster-reproducible, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on November 18, 2021. - https://phabricator.wikimedia.org/T296113 (RhinosF1) Closing as duplicate of linked task. I assume it's the standard restart of st...
[21:55:21] Beta-Cluster-Infrastructure, SRE, Traffic, Beta-Cluster-reproducible, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on November 18, 2021. - https://phabricator.wikimedia.org/T296113 (AlexisJazz) @RhinosF1 : en.wikipedia.beta.wmflabs.org is already working again but uplo...
[21:55:34] Beta-Cluster-Infrastructure: 'en.wikipedia.beta.wmflabs.org' Certificate has expired - https://phabricator.wikimedia.org/T296000 (RhinosF1) @Urbanecm: can you check the upload hosts? I believe something needs restarting for them.
[21:56:18] Beta-Cluster-Infrastructure, SRE, Traffic, Beta-Cluster-reproducible, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on November 18, 2021. - https://phabricator.wikimedia.org/T296113 (RhinosF1) >>! In T296113#7517656, @AlexisJazz wrote: > @RhinosF1 : en.wikipedia.beta.wm...
[21:56:40] Beta-Cluster-Infrastructure: *.beta.wmflabs.org Certificate has expired - https://phabricator.wikimedia.org/T296000 (RhinosF1)
[21:57:06] urbanecm: if around can you check ^
[21:57:19] looking
[21:57:29] hopefully it should be not that hard now
[21:58:11] server has the new cert, reloading...
[21:58:24] Beta-Cluster-Infrastructure, Beta-Cluster-reproducible, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on November 18, 2021. - https://phabricator.wikimedia.org/T296113 (RhinosF1) Removing SRE + Traffic as it's not managed by them
[21:58:26] Beta-Cluster-Infrastructure: *.beta.wmflabs.org Certificate has expired (November 2021 edition) - https://phabricator.wikimedia.org/T296000 (RhinosF1)
[21:58:31] Beta-Cluster-Infrastructure, Beta-Cluster-reproducible, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on November 18, 2021. - https://phabricator.wikimedia.org/T296113 (AlexisJazz) >>! In T296113#7517658, @RhinosF1 wrote: >>>! In T296113#7517656, @AlexisJazz wrote: >> @RhinosF1...
[21:58:33] !log urbanecm@deployment-cache-upload06:~$ sudo systemctl reload trafficserver-tls.service # T296000
[21:58:35] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:58:36] T296000: *.beta.wmflabs.org Certificate has expired (November 2021 edition) - https://phabricator.wikimedia.org/T296000
[21:59:29] RhinosF1: wfm now
[21:59:32] Beta-Cluster-Infrastructure, Beta-Cluster-reproducible, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on November 18, 2021. - https://phabricator.wikimedia.org/T296113 (RhinosF1) Upload hosts are just more broken than the rest. Yes I'm 1000% sure it's the exact same issue and i...
[22:00:07] urbanecm: yes that's it
[22:00:13] I forgot which service
[22:00:18] I'm pretty tired
[22:00:23] Beta-Cluster-Infrastructure: *.beta.wmflabs.org Certificate has expired (November 2021 edition) - https://phabricator.wikimedia.org/T296000 (Urbanecm) >>! In T296000#7517657, @RhinosF1 wrote: > @Urbanecm: can you check the upload hosts? I believe something needs restarting for them. Reloaded trafficserver-t...
[22:00:31] trafficserver-tls, the one i just restarted :)
[22:00:38] (but i have to admit i looked it up in SAL too :D)
[22:01:22] * urbanecm loves when a systemctl reload/restart fixes stuff
[22:01:46] Beta-Cluster-Infrastructure, Beta-Cluster-reproducible, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on November 18, 2021. - https://phabricator.wikimedia.org/T296113 (AlexisJazz) >>! In T296113#7517666, @RhinosF1 wrote: > Upload hosts are just more broken than the rest. Yes I...
[22:03:03] Beta-Cluster-Infrastructure, User-Urbanecm: upload.wikimedia.beta.wmflabs.org certificate expired (October 2021) - https://phabricator.wikimedia.org/T293070 (Urbanecm) Open→Resolved a:Urbanecm Works now.
[22:03:06] Beta-Cluster-Infrastructure, Quality-and-Test-Engineering-Team (QTE), SRE, Traffic, and 2 others: [epic] The SSL certificate for Beta cluster domains fails to properly renew & deploy - https://phabricator.wikimedia.org/T293585 (Urbanecm)
[22:06:28] Beta-Cluster-Infrastructure, Beta-Cluster-reproducible, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on November 18, 2021. - https://phabricator.wikimedia.org/T296113 (RhinosF1) Yes I've noticed that but it still impacted the whole of beta. Someone can dig on the task from yes...
[22:08:07] urbanecm: ^ is a fair point
[22:08:21] Beta-Cluster-Infrastructure: *.beta.wmflabs.org Certificate has expired (November 2021 edition) - https://phabricator.wikimedia.org/T296000 (AlexisJazz) >>! In T296000#7517667, @Urbanecm wrote: >>>! In T296000#7517657, @RhinosF1 wrote: >> @Urbanecm: can you check the upload hosts? I believe something needs r...
[22:08:23] It's only about 6 weeks since the last renewal
[22:09:16] we added a new domain?
[22:09:22] might be old cert getting invalidated somehow
[22:09:27] Could be
[22:09:29] (just a guess)
[22:09:58] If issuing a new revokes old
[22:10:04] not sure
[22:11:27] Beta-Cluster-Infrastructure, Beta-Cluster-reproducible, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on November 18, 2021. - https://phabricator.wikimedia.org/T296113 (AlexisJazz) >>! In T296113#7517683, @RhinosF1 wrote: > Yes I've noticed that but it still impacted the whole...
[22:12:29] Beta-Cluster-Infrastructure, Beta-Cluster-reproducible, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on November 18, 2021. - https://phabricator.wikimedia.org/T296113 (RhinosF1) It's the same certificate for all. It's just that upload requires an extra step to deploy because f...
[22:32:37] (CR) Nikki Nikkhoui: jjb, Zuul: [mediawiki/services/servicelib-node/spec] add test pipeline (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/739908 (https://phabricator.wikimedia.org/T295994) (owner: Nikki Nikkhoui)
[23:46:16] (PS1) BryanDavis: macros: Use numeric gid when creating a user [blubber] - https://gerrit.wikimedia.org/r/740282
[23:50:49] (CR) BryanDavis: "Small bug I noticed while playing around with the runs and lives configuration settings. I stopped myself from also removing the unnecessa" [blubber] - https://gerrit.wikimedia.org/r/740282 (owner: BryanDavis)