[04:39:39] GitLab: Account pending approval banner shown on initial sign-in gives no context for how to proceed - https://phabricator.wikimedia.org/T353752 (deni) I hate to be negative, but this does not feel ready for production. I have now been trying for over half an hour to figure out why I am getting this message...
[04:48:37] (DatasourceError) firing: Queue (Jenkins jobs + Zuul functions) alert - https://grafana.wikimedia.org/alerting/grafana/iS0FSjJ4z/view - https://wikitech.wikimedia.org/wiki/Monitoring/DatasourceError - https://alerts.wikimedia.org/?q=alertname%3DDatasourceError
[04:53:37] (DatasourceError) resolved: Queue (Jenkins jobs + Zuul functions) alert - https://grafana.wikimedia.org/alerting/grafana/iS0FSjJ4z/view - https://wikitech.wikimedia.org/wiki/Monitoring/DatasourceError - https://alerts.wikimedia.org/?q=alertname%3DDatasourceError
[07:31:16] Continuous-Integration-Config, Gerrit, NetworkSession: New repo mediawiki/extensions/NetworkSession wrongly inherits from All-Projects, not mediawiki/extensions - https://phabricator.wikimedia.org/T355186 (QChris) It seems the repo got added manually. I've: * set the parent to mediawiki/extensions, *...
[08:21:32] GitLab (Pipeline Services Migration🐤), Wikimedia-Design, collaboration-services, Design, Patch-For-Review: move design.wikimedia.org to kubernetes - https://phabricator.wikimedia.org/T350791 (Jelto) >>! In T350791#9462683, @DDeSouza wrote: >> And the design team deployers (@Volker_E and @DDeS...
[08:26:34] Gerrit, Release-Engineering-Team, Data-Engineering, Upstream: git clone and git pull commands fail for refinery repo - https://phabricator.wikimedia.org/T355173 (hashar) I have uploaded at https://people.wikimedia.org/~hashar/T355173/ : | [[ https://people.wikimedia.org/~hashar/T355173/local-clo...
[10:05:00] Project beta-update-databases-eqiad build #73229: ABORTED in 45 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/73229/
[10:29:55] (PS5) WQuarshie: Extensions and Hooks [tools/code-utils] - https://gerrit.wikimedia.org/r/989910 (https://phabricator.wikimedia.org/T354654)
[10:30:27] (PS2) WQuarshie: support stdout [tools/code-utils] - https://gerrit.wikimedia.org/r/990632 (https://phabricator.wikimedia.org/T354654)
[10:51:25] Gerrit, Release-Engineering-Team, Data-Engineering, Upstream: git clone and git pull commands fail for refinery repo - https://phabricator.wikimedia.org/T355173 (gmodena) @thcipriani just wanted to give an ack that I managed to reproduce `jgit clone` case. However, `jgit fetch` failed on the `dep...
[11:05:01] Project beta-update-databases-eqiad build #73230: ABORTED in 45 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/73230/
[11:19:36] (CR) Daniel Kinzler: Extensions and Hooks (5 comments) [tools/code-utils] - https://gerrit.wikimedia.org/r/989910 (https://phabricator.wikimedia.org/T354654) (owner: WQuarshie)
[11:19:47] (CR) Daniel Kinzler: [C: +2] Extensions and Hooks [tools/code-utils] - https://gerrit.wikimedia.org/r/989910 (https://phabricator.wikimedia.org/T354654) (owner: WQuarshie)
[11:20:27] (Merged) jenkins-bot: Extensions and Hooks [tools/code-utils] - https://gerrit.wikimedia.org/r/989910 (https://phabricator.wikimedia.org/T354654) (owner: WQuarshie)
[11:27:48] Project beta-scap-sync-world build #138561: FAILURE in 22 min: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/138561/
[11:37:13] Project beta-scap-sync-world build #138562: ABORTED in 7 min 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/138562/
[11:39:18] Release-Engineering-Team, Wikimedia-Extension-setup, Russian-Sites: Add Extension:PlaceNewSection to Russian Wikipedia - https://phabricator.wikimedia.org/T344501 (Iniquity)
[11:43:50] Project beta-scap-sync-world build #138563: STILL FAILING in 4 min 23 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/138563/
[11:45:28] ^ `mwdeploy@deployment-mwmaint02.deployment-prep.eqiad1.wikimedia.cloud: Permission denied (publickey).`
[11:45:40] * TheresNoTime in a meeting, so can't dig atm
[11:54:41] Beta-Cluster-Infrastructure: beta-scap-sync-world: mwdeploy@deployment-mwmaint02.deployment-prep.eqiad1.wikimedia.cloud: Permission denied (publickey) - https://phabricator.wikimedia.org/T355218 (TheresNoTime)
[11:55:18] Beta-Cluster-Infrastructure, ci-test-error (WMF-deployed Build Failure): beta-scap-sync-world: mwdeploy@deployment-mwmaint02.deployment-prep.eqiad1.wikimedia.cloud: Permission denied (publickey) - https://phabricator.wikimedia.org/T355218 (TheresNoTime)
[11:56:32] I can’t `ssh deployment-mwmaint02.deployment-prep.eqiad1.wikimedia.cloud` either
[11:56:59] perhaps it’s so overloaded that it can’t respond to SSH connections? IIRC there have been similar issues in the past (though I’m not 100% sure if that was on beta or elsewhere)
[11:59:07] I think https://sal.toolforge.org/log/kZylcYsBxE1_1c7sHLZu might have been a similar case
[12:02:05] !log soft-rebooted deployment-mwmaint02.deployment-prep.eqiad1.wikimedia.cloud T355218
[12:02:08] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:02:08] T355218: beta-scap-sync-world: mwdeploy@deployment-mwmaint02.deployment-prep.eqiad1.wikimedia.cloud: Permission denied (publickey) - https://phabricator.wikimedia.org/T355218
[12:03:26] Lucas_WMDE: welp, that didn't work.. maybe should have done a hard reboot (:
[12:03:36] damn :(
[12:03:52] oh wait..
[12:04:18] hm, it has a nonempty log in horizon now at least
[12:04:27] but from ssh I get “Connection closed by UNKNOWN port 65535” o_O
[12:05:22] I love when `A start job is running for Create V…s and Directories (34s / no limit)` /j
[12:05:51] * urbanecm went to log a reboot of mwmaint
[12:05:55] and noticed TheresNoTime already tried
[12:06:30] well the log shows its at the prompt now, so maybe ssh will work
[12:06:47] depends on what "work" means
[12:06:51] it does not end with a timeout
[12:06:56] but it doesn't let me in either
[12:07:16] Project beta-scap-sync-world build #138564: ABORTED in 21 min: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/138564/
[12:07:26] neither does deploy03, ftr
[12:07:34] huh, I can ssh in o.o
[12:07:54] samtar@deployment-mwmaint02:~$ uptime
[12:07:54] 12:07:40 up 2 min, 3 users, load average: 0.46, 0.17, 0.06
[12:07:55] hm, me too now
[12:08:19] * Lucas_WMDE runs `who` and waves at TheresNoTime and taavi
[12:08:33] silly me. i was trying to SSH from a different cloud host and was wondering why i'm getting permission errors
[12:08:38] when i ssh from my laptop, it works! :D
[12:08:42] \o/
[12:08:43] :D
[12:08:48] computers were a mistake, the beta cluster doubly so
[12:08:57] (/j)
[12:09:34] and given i ran some scripts at mwmaint, and this sometimes happen when cloud servers get overloaded, i'm thinking that i caused this bug to happen :D
[12:09:39] TheresNoTime: https://bash.toolforge.org/quip/IulTF40BxE1_1c7sqx96
[12:09:45] feel free to edit in the /j if you want :P
[12:09:49] smh :P
[12:09:57] TheresNoTime: congratulations! :)
[12:10:07] `beta-scap-sync-world` running now, so hopefully will all be ok
[12:10:29] question of the day: do i want to (re)start the same userOptions.php run that possibly caused this outage? :D
[12:11:00] Yippee, build fixed!
[12:11:00] Project beta-scap-sync-world build #138565: FIXED in 1 min 30 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/138565/
[12:11:04] yay!
[12:11:07] can you do it with a nicer niceness? (or would that not really help here?)
[12:11:38] depends what resources were eaten by it. if it was RAM, niceness wouldn't help
[12:11:52] maybe i can start by not running four instances of it at the same time :D
[12:12:14] apparently we don’t have a persistent journal on that machine, that’s unfortunate
[12:12:23] Beta-Cluster-Infrastructure, ci-test-error (WMF-deployed Build Failure): beta-scap-sync-world: mwdeploy@deployment-mwmaint02.deployment-prep.eqiad1.wikimedia.cloud: Permission denied (publickey) - https://phabricator.wikimedia.org/T355218 (TheresNoTime) Open→Resolved a:TheresNoTime Further pr...
[12:12:33] (so idk how to find out whether it was RAM or CPU or something else)
[12:13:28] * TheresNoTime doesn't actually remember what the niceness does
[12:13:39] Lucas_WMDE: definitely RAM https://usercontent.irccloud-cdn.com/file/6q8lUEIu/image.png
[12:14:18] wget downloadmoreram.com
[12:14:37] and i started it at ~10:00 CET, so that is very likely it
[12:15:27] Beta-Cluster-Infrastructure, ci-test-error (WMF-deployed Build Failure): beta-scap-sync-world: mwdeploy@deployment-mwmaint02: Permission denied (publickey) - https://phabricator.wikimedia.org/T355218 (TheresNoTime)
[12:15:51] dunno if you want to add that root cause ^, I just said it was "overloaded" and closed it
[12:15:52] Lucas_WMDE: or curl https://horizon.wikimedia.org/something/something. that is more likely to succeed here.
[12:15:55] urbanecm: are you sure your script does not have a memory leak?
[12:16:35] taavi: good question. it was userOptions.php, which I (possibly incorrectly) expected to be reasonably written
[12:16:38] also the apis are under openstack.eqiad1.wikimediacloud.org and not horizon.wikimedia.org
[12:18:07] (https://wikitech.wikimedia.org/wiki/Help:Using_OpenStack_APIs)
[12:18:18] detail :D
[12:18:46] but thanks, might come helpful :)
[12:20:02] DevOps action for today complete (turned a VM off and back on)
[12:20:11] twice! :)
[12:21:03] * urbanecm goes to try running the script again, this time more carefully and with an eye on the RAM chart.
[12:25:51] !log deployment-prep: `foreachwiki userOptions.php --old-is-default --old=2 --new 1 --nowarn echo-subscriptions-web-reverted`(T353225)
[12:25:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:25:55] T353225: Echo: Make use of conditional user defaults - https://phabricator.wikimedia.org/T353225
[12:45:18] Beta-Cluster-Infrastructure, ci-test-error (WMF-deployed Build Failure): beta-scap-sync-world: mwdeploy@deployment-mwmaint02: Permission denied (publickey) - https://phabricator.wikimedia.org/T355218 (Urbanecm_WMF) FTR, the likely root cause was me running several userOptions.php scripts at T353225, resu...
[12:47:07] (PS1) WQuarshie: stdin [tools/code-utils] - https://gerrit.wikimedia.org/r/991321 (https://phabricator.wikimedia.org/T354654)
[12:56:03] Hey releng! Could any one point me at or explain in more detail how we build the localisation caches when doing deploys? We already looked in https://gitlab.wikimedia.org/repos/releng/scap/-/blob/47f7311df48b756f3cafa620a25ae296464aff4c/scap/tasks.py and are trying to understand how the list of extensions that will have their caches built is determined. Our end goal is to try and build a container image with a built localisation cache
[12:56:03] for a list of extensions
[13:34:51] tarrow: https://noc.wikimedia.org/conf/highlight.php?file=wmf-config/extension-list
[13:36:36] Reedy: and where is that fed into rebuildLocalisationCache.php or equivalent?
[13:40:09] tarrow: https://gitlab.wikimedia.org/repos/releng/scap/-/blob/47f7311df48b756f3cafa620a25ae296464aff4c/scap/tasks.py#L718-723
[13:40:45] probably a few levels of indirection
[13:41:58] Cheers! That looks very close to what we're looking for; still trying to understand if I care about MessageFileList vs the Dirs equivalent
[13:51:49] taavi: i think the memory usage is reasonably static once the script starts, and that i simply started too many of them for beta mwmaint to handle. so far no issues. so, all good i guess.
[14:56:22] Gerrit, Release-Engineering-Team, Data-Engineering, Upstream: git clone and git pull commands fail for refinery repo - https://phabricator.wikimedia.org/T355173 (hashar) I don't know what is going on since when I serve the bare repository with cgit (`git daemon --export-all`) and then fetch from...
[15:21:35] Gerrit, Release-Engineering-Team, Data-Engineering, Upstream: git clone and git pull commands fail for refinery repo - https://phabricator.wikimedia.org/T355173 (hashar)
[15:28:03] or maybe not...
[16:00:21] GitLab: GitLab Private Repository Request for: future-audiences/takehome-task - https://phabricator.wikimedia.org/T354665 (thcipriani) a:thcipriani @NBaca-WMF should be done and you should be the owner: https://gitlab.wikimedia.org/repos/future-audiences/takehome-task let me know if anything is amiss!
[16:01:06] GitLab: GitLab Private Repository Request for: future-audiences/takehome-task - https://phabricator.wikimedia.org/T354665 (thcipriani) Open→Resolved
[16:08:09] Release-Engineering-Team (Seen), MW-on-K8s, SRE, Traffic, serviceops: ThumbnailRender job calls $wgImageMagickConvertCommand - https://phabricator.wikimedia.org/T355243 (Clement_Goubert)
[16:08:43] Release-Engineering-Team (Seen), MW-on-K8s, SRE, Traffic, and 2 others: Move MediaWiki jobs to mw-on-k8s - https://phabricator.wikimedia.org/T349796 (Clement_Goubert)
[16:13:49] Release-Engineering-Team, Data-Engineering, Data-Platform, Data-Platform-SRE, Discovery-Search (Current work): SonarQube build are failing with Java 11 - https://phabricator.wikimedia.org/T355122 (CodeReviewBot) pfischer merged https://gitlab.wikimedia.org/repos/search-platform/cirrus-streami...
[16:27:02] (Abandoned) Ebernhardson: Access change for mediawiki/extensions/NetworkSession [extensions/NetworkSession] (refs/meta/config) - https://gerrit.wikimedia.org/r/991051 (owner: Ebernhardson)
[16:38:34] (CR) Jforrester: [C: +2] Remove 'WikimediaUI Base' and replace with 'Codex Design Tokens' [integration/docroot] - https://gerrit.wikimedia.org/r/991040 (https://phabricator.wikimedia.org/T354310) (owner: VolkerE)
[16:39:29] (Merged) jenkins-bot: Remove 'WikimediaUI Base' and replace with 'Codex Design Tokens' [integration/docroot] - https://gerrit.wikimedia.org/r/991040 (https://phabricator.wikimedia.org/T354310) (owner: VolkerE)
[16:40:02] Phabricator: Change my username - https://phabricator.wikimedia.org/T355245 (Bugreporter)
[16:40:51] Gerrit, Release-Engineering-Team, Data-Engineering, Upstream: git clone and git pull commands fail for refinery repo - https://phabricator.wikimedia.org/T355173 (gmodena) >>! In T355173#9464084, @hashar wrote: [...] > The old git protocol has the same issue (`-c protocol.version=0`). I have no id...
[16:41:00] Phabricator: Change my username - https://phabricator.wikimedia.org/T355245 (taavi) Open→Resolved a:taavi Done.
[17:07:01] Continuous-Integration-Config, Gerrit, NetworkSession: New repo mediawiki/extensions/NetworkSession wrongly inherits from All-Projects, not mediawiki/extensions - https://phabricator.wikimedia.org/T355186 (Jdforrester-WMF) Thank you!
[17:19:02] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.42.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T354432 (Jdforrester-WMF)
[17:30:03] Release-Engineering-Team (Seen), MW-on-K8s, SRE, Traffic, and 2 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (kamila)
[17:30:11] Release-Engineering-Team (Seen), MW-on-K8s, SRE, Traffic, and 2 others: Migrate mobileapps to k8s - https://phabricator.wikimedia.org/T350846 (kamila) Open→Resolved All traffic is now going to k8s \o/ I will keep an eye on php workers saturation, but it should be fine, so I'm calling it...
[17:43:50] Release-Engineering-Team (Quid Pro Crow 🦃), Security-Team, SecTeam-Processed: Notify MediaWiki security tasks as soon as an uploaded patch fails to apply - https://phabricator.wikimedia.org/T350065 (thcipriani) a:dduvall→jnuche
[17:46:50] (CR) Daniel Kinzler: stdin (6 comments) [tools/code-utils] - https://gerrit.wikimedia.org/r/991321 (https://phabricator.wikimedia.org/T354654) (owner: WQuarshie)
[17:54:03] (CR) Daniel Kinzler: support stdout (1 comment) [tools/code-utils] - https://gerrit.wikimedia.org/r/990632 (https://phabricator.wikimedia.org/T354654) (owner: WQuarshie)
[17:58:02] Project beta-scap-sync-world build #138601: ABORTED in 7.3 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/138601/
[18:27:29] Phabricator, collaboration-services: automate data syncing between phabricator servers - https://phabricator.wikimedia.org/T354221 (Dzahn) In progress→Resolved > Currently a Phabricator server has 3 rsync modules to sync data. One each for these pathes: [x] /srv/dump - code not replaced by rsync...
[18:55:56] Gerrit: Gerrit notification emails are missing the content of inline comments sometimes - https://phabricator.wikimedia.org/T355259 (matmarex)
[18:57:13] Gerrit, Upstream: Gerrit does not send email when a new patchset is created using the web interface - https://phabricator.wikimedia.org/T343471 (matmarex) Not fixed :(
[18:58:50] Phabricator, collaboration-services: automate data syncing between phabricator servers - https://phabricator.wikimedia.org/T354221 (Dzahn) cc: @eoghan @brennen @Jelto now we have this new setup for Phabricator servers: on the active host, rsyncd listens: ` [phab1004:~] $ sudo grep -E '(path|hosts)'...
[19:21:56] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.42.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T354432 (jeena) Isn't a deployment of changeprop in kubernetes needed here? I don't think scap does this.
[20:25:27] Gerrit, Release-Engineering-Team, Data-Engineering, Upstream: git clone and git pull commands fail for refinery repo - https://phabricator.wikimedia.org/T355173 (gmodena) @hashar @thcipriani the version of `git` installed on deploy2002 (2.20.1) does not support `--refetch`: ` gmodena@deploy2002...
[21:20:56] Gerrit, Release-Engineering-Team, Data-Engineering, Upstream: git clone and git pull commands fail for refinery repo - https://phabricator.wikimedia.org/T355173 (hashar) @gmodena I forgot the deployment server have an old version of git :-\ The issue is on the server side anyway and the bare rep...
[21:26:56] Release-Engineering-Team, Data-Engineering, Gerrit (Gerrit 3.7), Upstream: git clone and git pull commands fail for refinery repo - https://phabricator.wikimedia.org/T355173 (hashar) Open→Stalled I have worked around the issue by doing a fresh clone on the deployment server, though that f...
[22:31:22] Release-Engineering-Team, Tech-Docs-Team, Documentation: Review critical-path deployment pipeline documentation - https://phabricator.wikimedia.org/T352259 (dancy) I made some edits to https://www.mediawiki.org/wiki/GitLab/pipeline_conversion and https://gitlab.wikimedia.org/repos/releng/pipeline-to...
[23:14:05] GitLab (Project Migration), Release-Engineering-Team, User-brennen: Add a /repos/gadgets namespace in Wikimedia Gitlab - https://phabricator.wikimedia.org/T353491 (brennen) Open→Resolved a:brennen > Not that I'm aware of, but GitLab issues is a thing and works just like GitHub issues. You...
[23:19:24] Release-Engineering-Team, Data-Engineering, Gerrit (Gerrit 3.7), Upstream: git clone and git pull commands fail for refinery repo - https://phabricator.wikimedia.org/T355173 (thcipriani) >>! In T355173#9467192, @hashar wrote: > So that is worked around for now and I am marking this task stalled p...
[23:21:31] GitLab (Project Migration), Release-Engineering-Team, User-brennen: Add a /repos/gadgets namespace in Wikimedia Gitlab - https://phabricator.wikimedia.org/T353491 (Novem_Linguae) Give me access too if you want :)