[00:07:21] (PS1) Subramanya Sastry: WIP: Add script-generated CSS [integration/visualdiff] - https://gerrit.wikimedia.org/r/869335
[00:23:06] (PS2) Subramanya Sastry: WIP: Add script-generated CSS [integration/visualdiff] - https://gerrit.wikimedia.org/r/869335
[01:55:52] (PS3) Subramanya Sastry: WIP: Add script-generated CSS [integration/visualdiff] - https://gerrit.wikimedia.org/r/869335
[01:55:54] (PS1) Subramanya Sastry: Temporary Cite CSS fixes [integration/visualdiff] - https://gerrit.wikimedia.org/r/869339
[04:11:06] (PS2) Subramanya Sastry: Followup to 6fb18f41: DT adds multiple comments now [integration/visualdiff] - https://gerrit.wikimedia.org/r/868455
[04:11:08] (PS2) Subramanya Sastry: Temporary Cite CSS fixes [integration/visualdiff] - https://gerrit.wikimedia.org/r/869339
[04:11:10] (PS4) Subramanya Sastry: WIP: Add script-generated CSS [integration/visualdiff] - https://gerrit.wikimedia.org/r/869335
[04:11:12] (PS1) Subramanya Sastry: cache_purge_adaptor: Follow redirects while purging [integration/visualdiff] - https://gerrit.wikimedia.org/r/869343
[08:35:27] Project-Admins: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706 (Zache) >>! In T706#8462887, @Zache wrote: > Hi, I Am Zache (Kimmo Virtanen) and work at Wikimedia Finland. I would like to get subproject creation right for project #WMFI which is Wi...
[08:44:07] GitLab (CI & Job Runners), serviceops-collab: runner-1025.gitlab-runners.eqiad1.wikimedia.cloud cannot connect to docker daemon errors - https://phabricator.wikimedia.org/T325604 (Jelto) Open→Resolved p:Triage→Medium a:Jelto Thanks for bringing this up! I can confirm that jobs are fai...
[09:02:28] Phabricator: Create Growth team blog on Phame - https://phabricator.wikimedia.org/T293240 (kostajh) Open→Declined @Aklapper no thanks.
[10:26:18] (PS1) TheDJ: Add Lectrician1 [integration/config] - https://gerrit.wikimedia.org/r/869731
[10:26:55] (PS2) TheDJ: Add Lectrician1 to CI allow list [integration/config] - https://gerrit.wikimedia.org/r/869731
[11:01:47] Would oneone object if I +2ed this patch so I can deploy it to beta for testing? I assume +2ing it will not deploy it to production. https://gerrit.wikimedia.org/r/c/mediawiki/services/mathoid/+/865787. _joe_ taavi Reedy?
[11:01:57] s/oneone/anyone
[11:05:11] dwalden: +2 on that repo won't automatically deploy to production. so as long as you don't leave master in a broken state, that should be fine
[11:08:35] Continuous-Integration-Config, Wikidata, wdwb-tech, wmde-wikidata-tech: Run CI tests daily on master for ungated extensions - https://phabricator.wikimedia.org/T285049 (ItamarWMDE)
[11:09:32] taavi: Thanks. I will +2 and test on beta. If it seems broken, I will revert the patch.
[11:28:05] Continuous-Integration-Config, Wikidata, wdwb-tech, wmde-wikidata-tech: Run CI tests daily on master for ungated extensions - https://phabricator.wikimedia.org/T285049 (ItamarWMDE) Task Review Notes: 1) It appears that tests for Cognate are already running 2) We are receiving emails for failures...
[11:31:24] Continuous-Integration-Config, Wikidata, wdwb-tech, wmde-wikidata-tech, User-ItamarWMDE: Run CI tests daily on master for ungated extensions - https://phabricator.wikimedia.org/T285049 (ItamarWMDE) Open→Resolved a:ItamarWMDE
[11:44:07] GitLab, serviceops, serviceops-collab, Kubernetes, Patch-For-Review: Trusted gitlab runner containers need access to staging k8s cluster - https://phabricator.wikimedia.org/T325385 (akosiaris) We already had that functionality for the deployment pipeline on gerrit, it should be around for Git...
[11:50:48] Beta-Cluster-Infrastructure, Release-Engineering-Team, SRE, Patch-For-Review: git: detected dubious ownership in repository at '/srv/mediawiki-staging' - https://phabricator.wikimedia.org/T325128 (akosiaris) Switching from #serviceops to #SRE for greater visibility within the SRE team, this could...
[12:01:17] GitLab, serviceops, serviceops-collab, Kubernetes, Patch-For-Review: Trusted gitlab runner containers need access to staging k8s cluster - https://phabricator.wikimedia.org/T325385 (Jelto) Ah sorry for the confusion, I missed that! Sounds good to recreate all of the functionality in the `ci`...
[12:02:06] (CR) Jaime Nuche: [C: +1] ""situtaions" spelled wrong in the commit message. Other than that, LGTM" [tools/release] - https://gerrit.wikimedia.org/r/869248 (https://phabricator.wikimedia.org/T325576) (owner: Ahmon Dancy)
[13:07:50] (CR) Isabelle Hurbain-Palatin: [C: +1] "I think it's fine, but I'm not going to put a +2 on a patch on a repository the first time I see said repository 😄" [integration/visualdiff] - https://gerrit.wikimedia.org/r/868455 (owner: Subramanya Sastry)
[13:58:02] GitLab, Release-Engineering-Team, serviceops-collab: Align the GitLab runner tags - https://phabricator.wikimedia.org/T325069 (Jelto) Regarding change of `protected` tag to `trusted`: Most projects migrated and MR were merged. 4 projects have to migrate before we can remove the `protected` tag. http...
[14:27:05] GitLab, serviceops, serviceops-collab, Kubernetes, Patch-For-Review: Trusted gitlab runner containers need access to staging k8s cluster - https://phabricator.wikimedia.org/T325385 (akosiaris) >>! In T325385#8481064, @Jelto wrote: > Ah sorry for the confusion, I missed that! No worries! > I...
[15:25:24] Continuous-Integration-Config, ci-test-error: Flake8 5.0.0+ release breaking CI jobs - https://phabricator.wikimedia.org/T314577 (taavi)
[15:32:36] (PS3) Ahmon Dancy: Remove new-train-version logic [tools/release] - https://gerrit.wikimedia.org/r/869248 (https://phabricator.wikimedia.org/T325576)
[15:37:57] GitLab (CI & Job Runners), serviceops-collab: runner-1025.gitlab-runners.eqiad1.wikimedia.cloud cannot connect to docker daemon errors - https://phabricator.wikimedia.org/T325604 (dancy) Thanks @jelto!
[15:42:51] GitLab, Release-Engineering-Team, serviceops-collab: Align the GitLab runner tags - https://phabricator.wikimedia.org/T325069 (dancy) >>! In T325069#8481408, @Jelto wrote: > Regarding change of `protected` tag to `trusted`: Most projects migrated and MR were merged. 4 projects have to migrate before...
[15:46:28] GitLab, serviceops, serviceops-collab, Kubernetes, Patch-For-Review: Trusted gitlab runner containers need access to staging k8s cluster - https://phabricator.wikimedia.org/T325385 (dancy) >>! In T325385#8481064, @Jelto wrote: > @dancy What is your plan of providing the kubeconfig? Is mathoid...
[15:48:46] Project beta-scap-sync-world build #82523: FAILURE in 2 min 9 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/82523/
[15:52:38] (CR) Ahmon Dancy: [C: +2] Remove new-train-version logic (1 comment) [tools/release] - https://gerrit.wikimedia.org/r/869248 (https://phabricator.wikimedia.org/T325576) (owner: Ahmon Dancy)
[15:53:14] (Merged) jenkins-bot: Remove new-train-version logic [tools/release] - https://gerrit.wikimedia.org/r/869248 (https://phabricator.wikimedia.org/T325576) (owner: Ahmon Dancy)
[15:54:06] (PS4) Ahmon Dancy: multiversion-base: Include /usr/share/{GeoIP,GeoIPInfo} from host if available [tools/release] - https://gerrit.wikimedia.org/r/868122 (https://phabricator.wikimedia.org/T288375)
[15:56:06] Yippee, build fixed!
[15:56:06] Project beta-scap-sync-world build #82524: FIXED in 1 min 15 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/82524/
[17:05:01] Project beta-update-databases-eqiad build #63820: ABORTED in 45 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/63820/
[17:31:08] GitLab, Striker: Toolsadmin unable to deal with deleted tool repository - https://phabricator.wikimedia.org/T323768 (bd808) p:Triage→Medium > What should have happened instead?: The repo should be deleted from the database. I do agree that mismatched state between Striker's database and GitLab s...
[19:01:48] (PS5) Subramanya Sastry: WIP: Add script-generated Cite CSS [integration/visualdiff] - https://gerrit.wikimedia.org/r/869335
[19:19:06] Project beta-scap-sync-world build #82544: FAILURE in 2 min 4 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/82544/
[19:25:54] Yippee, build fixed!
[19:25:54] Project beta-scap-sync-world build #82545: FIXED in 1 min 6 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/82545/
[20:00:48] Continuous-Integration-Infrastructure, Cloud-VPS, Wikidata, wdwb-tech, and 3 others: Wikibase selenium tests timeout, seemingly due to "memory compaction" events on CI VMs - https://phabricator.wikimedia.org/T281122 (Umherirrender) My core patch https://gerrit.wikimedia.org/r/c/mediawiki/core/+/8...
[20:07:54] Project beta-scap-sync-world build #82549: FAILURE in 3 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/82549/
[20:16:00] Yippee, build fixed!
[20:16:00] Project beta-scap-sync-world build #82550: FIXED in 1 min 5 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/82550/
[20:59:03] Project beta-scap-sync-world build #82554: FAILURE in 4 min 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/82554/
[21:06:02] Yippee, build fixed!
[21:06:02] Project beta-scap-sync-world build #82555: FIXED in 1 min 8 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/82555/
[21:43:07] that's flaky
[21:44:13] Nod
[21:46:06] 20:56:17 20:56:17 ['/usr/bin/scap', 'pull', '--no-php-restart', '--no-update-l10n', '--exclude-wikiversions.php', 'deployment-deploy03.deployment-prep.eqiad1.wikimedia.cloud'] (ran as mwdeploy@deployment-mwmaint02.deployment-prep.eqiad1.wikimedia.cloud) returned [255]: Connection to 172.16.3.92 port 22 timed out
[21:46:08] is it always that?
[21:46:42] Possibly related: https://phabricator.wikimedia.org/P42730
[21:47:37] is it on an unhappy hypervisor?
[21:47:41] * Reedy asks in cloud
[21:49:45] Project beta-scap-sync-world build #82559: FAILURE in 4 min 51 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/82559/
[21:50:24] Interesting. So that last failure is deployment-deploy03.deployment-prep.eqiad1.wikimedia.cloud timing out trying to connect to itself.
[21:51:06] No recent interesting dmesg entries on deployment-deploy03
[21:52:14] I know this is cliche, but have you tried turning it off and on again?
[21:52:34] I was just about to :P
[21:52:50] !log reboot deployment-mwmaint02.deployment-prep.eqiad1.wikimedia.cloud
[21:52:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:53:26] hrm, well, was this a failure of deployment-deploy03 for both? not mwmaint?
[21:53:48] like there's a timeout connecting to deployment-deploy03
[21:54:10] also, I note, the disk is getting kinda full there
[21:54:13] Connecting to deployment-deploy03 is the common theme
[21:54:30] fwiw, netcat from deployment-deploy03 works for the rsync port
[21:54:49] and ssh port
[21:54:54] from itself (last error)
[21:56:09] Yippee, build fixed!
[21:56:09] Project beta-scap-sync-world build #82560: FIXED in 1 min 16 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/82560/
[21:56:26] ok ¯\_(ツ)_/¯
[21:56:28] :)
[21:56:49] Not a new problem, but particularly annoying today.
[21:56:59] is it always deploy03?
[21:57:03] yes
[21:57:07] That's the only deploy server in beta
[21:57:24] I wonder if it's running out of space for temporary rsync storage?
[21:57:43] well
[21:57:45] I doubt it.
[21:58:44] nod. more of a problem for targets with delayed updates.
[21:59:15] well. weird.
[22:37:31] Project beta-scap-sync-world build #82564: FAILURE in 2 min 38 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/82564/
[22:38:57] ^ error is ssh connect timeout to deployment-mwmaint02.deployment-prep.eqiad1.wikimedia.cloud again. Maybe it needs a reboot too?
[22:41:14] hmmm.. "Dec 20 22:37:35 deployment-mwmaint02 sshd[9892]: Connection closed by authenticating user mwdeploy 172.16.4.233 port 44240 [preauth]" -- it thinks the scap process hung up on it
[22:42:22] * bd808 thought that -deploy03 was rebooted before, but it was -mwmaint02
[22:42:27] *shrug*
[22:46:05] Yippee, build fixed!
[22:46:06] Project beta-scap-sync-world build #82565: FIXED in 1 min 10 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/82565/
[23:12:33] I collected some vmstat data from the last failure.
[23:14:57] Annotating it now. I'll post a paste
[23:32:26] (PS2) Krinkle: Emphasize "too long" warning label [releng/phatality] - https://gerrit.wikimedia.org/r/814009
[23:32:54] (PS2) Krinkle: Switch kbnDocViewerTable -> osdDocViewerTable [releng/phatality] - https://gerrit.wikimedia.org/r/814010
[23:36:38] I'll get straight to the conclusion. /dev/sda i/o latency is way too high (consistently >100ms)
[23:36:55] (on deploy03)
[23:51:41] https://phabricator.wikimedia.org/P42731
[23:56:52] It comes and goes. e.g, it feels find now.
[23:57:43] but I do get weird figures out of iostat.
I'm going to reboot.
[23:58:23] !log reboot deployment-deploy03.deployment-prep.eqiad1.wikimedia.cloud
[23:58:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL