[00:15:05] (PS1) Krinkle: doc: Move CSSJanus demo from GitHub to doc.wikimedia.org [integration/docroot] - https://gerrit.wikimedia.org/r/908380
[00:18:15] (CR) Krinkle: [C: +2] doc: Move CSSJanus demo from GitHub to doc.wikimedia.org [integration/docroot] - https://gerrit.wikimedia.org/r/908380 (owner: Krinkle)
[00:19:31] (Merged) jenkins-bot: doc: Move CSSJanus demo from GitHub to doc.wikimedia.org [integration/docroot] - https://gerrit.wikimedia.org/r/908380 (owner: Krinkle)
[02:19:55] (CR) Hashar: "The root cause is a119a9edc639085b59b00280388ee65934373785 from September 2017. I guess we can drop the `cd /src` from all images (after " [integration/config] - https://gerrit.wikimedia.org/r/908351 (owner: Ahmon Dancy)
[02:29:41] GitLab (Project Migration), Release-Engineering-Team: Create new GitLab project group: ci-tools - https://phabricator.wikimedia.org/T334616 (Legoktm) +1 to the overall goal (thank you James!) Is it easy to rename groups in GitLab down the road? Or do we need to bikeshed it now? (My personal preference w...
[02:41:01] GitLab (Project Migration), CSSJanus, MediaWiki-Codesniffer, MinusX, and 6 others: Consolidate all Wikimedia CI tools into a single Wikimedia GitLab project group - https://phabricator.wikimedia.org/T334615 (Legoktm) Great idea. Here are some more repositories that might be in scope * integratio...
[06:35:51] GitLab (Infrastructure), SRE, ops-eqiad, serviceops-collab: Install additional SSDs on gitlab1004.wikimedia.org (B1) - https://phabricator.wikimedia.org/T333997 (Jelto) Thanks @Jclark-ctr , I can confirm disks are available on `gitlab1004`: ` sdc 8:32 0...
[06:38:59] (CR) Hashar: [C: +2] Add new Extension:SectionAnchors [integration/config] - https://gerrit.wikimedia.org/r/908240 (owner: Robert Vogel)
[06:40:05] (Merged) jenkins-bot: Add new Extension:SectionAnchors [integration/config] - https://gerrit.wikimedia.org/r/908240 (owner: Robert Vogel)
[06:43:07] jnuche: good morning, I'd like to convert production jenkins-ci dsh group to be populated by Puppet with https://gerrit.wikimedia.org/r/c/operations/puppet/+/893484
[06:43:25] GitLab (Infrastructure), serviceops-collab: Troubleshoot partman config for two additional disks on GitLab hosts - https://phabricator.wikimedia.org/T333674 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin2002 for host gitlab2003.wikimedia.org with OS bullseye
[06:43:32] that will remove the currently defined host cause none of the CI Jenkins have scap::target['releng/jenkins-deploy']
[06:44:11] as they get promoted to use scap (with $use_scap3_deployment) the hosts will be added to jenkins-ci by Puppet automagically
[06:44:27] if that makes sense, may you +1 it and I will check with SRE to get it deployed
[06:47:00] also I have merge requests pending for each of scap3-dev and jenkins-deploy but I CANT FIND THEM IN GITLAB FOR **** SAKE
[06:51:39] (the sidebar was collapsed :D )
[06:53:07] https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/21
[06:53:39] and https://gitlab.wikimedia.org/repos/releng/scap3-dev/-/merge_requests/23
[06:54:16] (CR) Hashar: [C: +2] "Deployed" [integration/config] - https://gerrit.wikimedia.org/r/908240 (owner: Robert Vogel)
[07:14:25] GitLab (Infrastructure), serviceops-collab: Troubleshoot partman config for two additional disks on GitLab hosts - https://phabricator.wikimedia.org/T333674 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin2002 for host gitlab2003.wikimedia.org with OS bullseye completed: -...
[07:27:32] Phabricator: Remove phabricator Multi-factor Auth for Atieno - https://phabricator.wikimedia.org/T334480 (Atieno)
[08:10:54] hashar: 👋 bonjour, I've +1'd your Puppet change and also reviewed everything else :)
[08:22:39] :]]]
[10:52:33] GitLab (Infrastructure), SRE, ops-eqiad, serviceops-collab: Install additional SSDs on gitlab1003.wikimedia.org (A3) - https://phabricator.wikimedia.org/T333996 (Jclark-ctr) Open→Resolved drives installed into gitlab1003
[11:03:39] Phabricator: Remove phabricator Multi-factor Auth for Atieno - https://phabricator.wikimedia.org/T334480 (Atieno) Hi @sbassett so can I schedule some time on your calendar for a video chat or can I slack you as @Aklapper has suggested. Though, I might go the video chat route it looks faster.
[11:23:52] Release-Engineering-Team (Seen), MW-on-K8s, SRE, Traffic, and 3 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Clement_Goubert)
[12:04:06] GitLab (Project Migration), CSSJanus, MediaWiki-Codesniffer, MinusX, and 6 others: Consolidate all Wikimedia CI tools into a single Wikimedia GitLab project group - https://phabricator.wikimedia.org/T334615 (Jdforrester-WMF) >>! In T334615#8777864, @Legoktm wrote: > Great idea. Here are some more...
[12:06:37] GitLab (Project Migration), Release-Engineering-Team: Create new GitLab project group: ci-tools - https://phabricator.wikimedia.org/T334616 (Jdforrester-WMF) >>! In T334616#8777832, @Legoktm wrote: > +1 to the overall goal (thank you James!) > > Is it easy to rename groups in GitLab down the road? Or do...
[12:09:53] Scap: Improve behavior around global Scap lock + communicate changes - https://phabricator.wikimedia.org/T330756 (jnuche) Open→Resolved Some improvements added: - An operator trying to get a lock already acquired will be shown details of the lock including the owner, reason for the lock and acquisi...
[12:36:35] (Queue (Jenkins jobs + Zuul functions) alert) firing: - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[12:46:13] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [400.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[12:50:02] GitLab (Infrastructure), serviceops-collab: Troubleshoot partman config for two additional disks on GitLab hosts - https://phabricator.wikimedia.org/T333674 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin2002 for host gitlab2003.wikimedia.org with OS bullseye
[12:53:42] hi hashar or anyone else who understands Jenkins + LDAP acls, how are per-job permissions controlled? I'd like to allow another LDAP group to trigger builds for operations-puppet-catalog-compiler
[12:56:35] (Queue (Jenkins jobs + Zuul functions) alert) resolved: - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[12:59:19] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [200.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[13:18:26] my best guess at this point is that this is in the jenkins UI itself rather than in a config file? as i've codesearched quite a bit and haven't found anything :)
[13:20:14] cdanis: my understanding is that those permissions are global, not per-job. and in the UI i can see that the 'start a new build' permission is granted to wmf and nda members
[13:20:56] not seeing anything in the job settings that would override that
[13:22:15] taavi: ah, I'm guessing you can see the global settings because you're in ciadmin ?
[13:22:39] yes
[13:23:10] thanks
[13:23:17] Continuous-Integration-Infrastructure, Jenkins, Release-Engineering-Team (Priority Backlog 📥): Add job definitions to CasC configuration of releases Jenkins - https://phabricator.wikimedia.org/T334669 (jnuche)
[13:24:15] taavi: would you feel comfortable adding ldap group sre-admins to that list?
[13:26:47] cdanis: yes jnuche from releng :]
[13:27:57] cdanis: I don't see any issues with that, although it feels easier to add people to nda or wmf than to go through every service currently using those two to add sre-admins to the list (unless I'm misunderstanding the purpose of that group)
[13:28:44] yeah, there are some things on nda/wmf that we'd rather not grant access to all members of that group
[13:30:05] please file a task against #continuous-integration-infrastructure
[13:30:19] but in short with our current system the permission to build a job is an all or nothing kind
[13:31:09] and being able to run any job grants a ton of rights
[13:32:09] so essentially that requires an nda, which adds the person to nda and grants the permission to build
[13:32:29] for the puppet compiler, mortals can use `check experimental` against a change pending in Gerrit
[13:33:11] which is a somewhat terrible interface, I'd like some kind of button in the web UI to make it easier to trigger it
[13:38:09] (CR) Hashar: "The changelog entry was made in https://gerrit.wikimedia.org/r/c/integration/config/+/908257" [integration/config] - https://gerrit.wikimedia.org/r/908248 (https://phabricator.wikimedia.org/T334211) (owner: Jaime Nuche)
[13:40:58] Release-Engineering-Team: wmf/branch_cut_pretest branch on mediawiki/core causes multiple problems - https://phabricator.wikimedia.org/T334322 (hashar) Open→Resolved a:hashar The branch cut script did not work but I tracked that on the train blocker task T330209#8753387 The other issue was with...
[13:43:41] GitLab (Infrastructure), serviceops-collab: Troubleshoot partman config for two additional disks on GitLab hosts - https://phabricator.wikimedia.org/T333674 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin2002 for host gitlab2003.wikimedia.org with OS bullseye executed wit...
[14:14:01] GitLab (Project Migration), Release-Engineering-Team (Priority Backlog 📥), Anti-Harassment, Cloud-Services, and 16 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (MSantos)
[14:31:23] Gerrit: Reviewer-bot option to be added as CC instead of reviewer - https://phabricator.wikimedia.org/T334118 (hashar) Open→Declined For the specific case of being added as a CC, lets use the Gerrit plugin so. Maybe one day we can migrate all the rules from the Wiki page toward the Gerrit plugin but...
[14:50:14] Release-Engineering-Team (Priority Backlog 📥), Scap, SRE, Python3-Porting: git-fat replacement/removal - https://phabricator.wikimedia.org/T279509 (hashar)
[14:50:20] Gerrit, Release-Engineering-Team: Migrate Gerrit deployment from git-fat to git-lfs - https://phabricator.wikimedia.org/T333465 (hashar) Open→Resolved I have updated the doc for the {nav git-fat > git-lfs} change. https://wikitech.wikimedia.org/w/index.php?title=Gerrit%2FUpgrade&diff=prev&oldid=2...
[14:51:37] Release-Engineering-Team (Seen), Scap, SRE, Python3-Porting: git-fat replacement/removal - https://phabricator.wikimedia.org/T279509 (hashar) a:demon→None
[14:51:53] Release-Engineering-Team (Seen), Scap, SRE, Python3-Porting: git-fat replacement/removal - https://phabricator.wikimedia.org/T279509 (hashar)
[15:09:04] Phabricator, SRE: Remove phabricator Multi-factor Auth for Atieno - https://phabricator.wikimedia.org/T334480 (sbassett) >>! In T334480#8778684, @Atieno wrote: > Hi @sbassett so can I schedule some time on your calendar for a video chat or can I slack you as @Aklapper has suggested. Though, I might go t...
[15:23:53] Release-Engineering-Team (Seen), MW-on-K8s, SRE, Traffic, and 3 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Clement_Goubert)
[15:25:28] GitLab, Release-Engineering-Team: Let's Encrypt certificate for registry.cloud.releng.team did not automatically renew - https://phabricator.wikimedia.org/T334679 (dancy)
[15:42:17] GitLab, Release-Engineering-Team: Let's Encrypt certificate for registry.cloud.releng.team did not automatically renew - https://phabricator.wikimedia.org/T334679 (dancy) https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/230
[15:51:00] Beta-Cluster-Infrastructure, Performance-Team (Radar): DBError: Access denied for user 'wikiadmin'@'172.16.%' to database 'mainstash' on Beta Cluster - https://phabricator.wikimedia.org/T322469 (thcipriani)
[15:52:20] Beta-Cluster-Infrastructure: Could not enqueue jobs for stream mediawiki.job.cirrusSearchCheckerJob - https://phabricator.wikimedia.org/T325594 (thcipriani)
[15:54:43] Beta-Cluster-Infrastructure, Growth-Team, PageTriage, Wikimedia-production-error: PHP Notice: Undefined index: afc_state - https://phabricator.wikimedia.org/T323647 (thcipriani)
[15:56:53] GitLab, Release-Engineering-Team: Let's Encrypt certificate for registry.cloud.releng.team did not automatically renew - https://phabricator.wikimedia.org/T334679 (dancy) Open→Resolved Deploying https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/230 caused the letsencr...
[17:35:33] thcipriani: your updated Blubber links are live at https://developer.wikimedia.org/contribute/by-language/#blubber. Thanks for the patch!
[18:13:40] GitLab: kubectl commands using SPDY requests (e.g. cp) on CI runners through the Gitlab k8s-proxy fail - https://phabricator.wikimedia.org/T334690 (SDunlap)
[18:21:50] Gerrit, Release-Engineering-Team, serviceops-collab, Patch-For-Review, Sustainability (Incident Followup): Move Gerrit data out of root partition - https://phabricator.wikimedia.org/T333143 (hashar) The LFS plugin stores the data under `/srv/gerrit/plugins/lfs` which will clash with the `$GER...
[19:33:57] GitLab: kubectl commands using SPDY requests (e.g. cp) on CI runners through the Gitlab k8s-proxy fail - https://phabricator.wikimedia.org/T334690 (SDunlap)
[19:43:24] (PS2) Hashar: node16-test 0.2.0: Don't cd /src [integration/config] - https://gerrit.wikimedia.org/r/908351 (owner: Ahmon Dancy)
[19:46:26] (CR) Hashar: [C: +2] "I have cascaded the update to child images by running:" [integration/config] - https://gerrit.wikimedia.org/r/908351 (owner: Ahmon Dancy)
[19:47:32] (Merged) jenkins-bot: node16-test 0.2.0: Don't cd /src [integration/config] - https://gerrit.wikimedia.org/r/908351 (owner: Ahmon Dancy)
[20:04:39] (CR) Hashar: [C: +2] "Successfully published image docker-registry.discovery.wmnet/releng/node16-test:0.2.0" [integration/config] - https://gerrit.wikimedia.org/r/908351 (owner: Ahmon Dancy)
[20:07:06] (CR) Ahmon Dancy: node16-test 0.2.0: Don't cd /src (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/908351 (owner: Ahmon Dancy)
[20:09:39] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.41.0-wmf.4 deployment blockers - https://phabricator.wikimedia.org/T330210 (brennen)
[20:11:03] dancy: the node16-test has been published to the registry :)
[20:11:08] I will update the jenkins jobs tomorrow
[20:11:20] Here's a gitlab-ci job using it: https://gitlab.wikimedia.org/dancy/deleteme/-/jobs/91535
[20:13:43] niceeeee
[20:14:19] what I am wondering is whether we could have developers use kokkuri to define the nodejs image
[20:14:37] this way they get more control about the node/npm versions or extra dependencies they might need
[20:14:55] food for later :-]
[20:15:08] https://gitlab.wikimedia.org/dancy/deleteme/-/blob/f676541c06af3913510dde3f54b660f851225a8e/.gitlab-ci.yml
[20:15:17] that may or may not be what you're talking about
[20:15:58] btw my deleteme repo is a copy of https://gerrit.wikimedia.org/r/mediawiki/services/function-orchestrator (at the moment)
[20:16:02] I thought about using a base image like bullseye then:
[20:16:03] apt:
[20:16:06] - nodejs
[20:16:08] - npm
[20:16:11] - libfoobar
[20:16:57] (and this https://gitlab.wikimedia.org/dancy/deleteme/-/blob/f676541c06af3913510dde3f54b660f851225a8e/.pipeline/blubber.yaml )
[20:18:06] You might mean something else. Lemme know!
[20:18:14] or use one of the SRE maintained images like docker-registry.wikimedia.org/nodejs16-slim as a base image which will be suitable for prod deployment
[20:18:39] ah yeah that last link looks good :]
[20:21:00] dancy: also James.F knows those releng/node* images better than me and can surely provide assistance if needed ;)
[20:23:23] I am off &
[20:39:53] what do i run again if i want to change wikiversions on a debug box?
[20:42:01] ...change json on deploy box and scap pull?
[20:43:30] brennen: I usually edit the php file on the mwdebug box and then run the magical command to restart php-fpm
[20:44:28] taavi: ah, yeah, that makes sense.
[20:46:36] Continuous-Integration-Infrastructure, OOUI, Patch-For-Review, Regression: OOUI PHP demos page is broken (again) - https://phabricator.wikimedia.org/T322357 (Kizule) >>! In T322357#8780382, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#wikimedia-operations), href=https://sal.toolforg...
[20:56:19] upgrading PHP version from 7.3 to 7.4 on backends of doc.wikimedia.org - needs a few minutes maintenance
[20:56:31] 3 puppet runs are needed
[20:56:44] and some manual removal
[20:59:20] Continuous-Integration-Infrastructure, OOUI, Patch-For-Review, Regression: OOUI PHP demos page is broken (again) - https://phabricator.wikimedia.org/T322357 (Dzahn) @Kizule In progress, I only did the passive host first. Few minutes downtime and on it..
[21:03:00] Continuous-Integration-Infrastructure, OOUI, Patch-For-Review, Regression: OOUI PHP demos page is broken (again) - https://phabricator.wikimedia.org/T322357 (Dzahn) @Kizule please try now
[21:04:30] Continuous-Integration-Infrastructure, OOUI, Patch-For-Review, Regression: OOUI PHP demos page is broken (again) - https://phabricator.wikimedia.org/T322357 (Dzahn) **Both backends are now upgraded to PHP 7.4**. Some manual steps were needed but all tests still passing. ` [deploy1002:~] $ ht...
[21:09:07] Continuous-Integration-Infrastructure, OOUI, Patch-For-Review, Regression: OOUI PHP demos page is broken (again) - https://phabricator.wikimedia.org/T322357 (Kizule) >>! In T322357#8780482, @Dzahn wrote: > @Kizule please try now Thank you @Dzahn. It's working perfectly now!
[21:10:59] Continuous-Integration-Infrastructure, OOUI, Patch-For-Review, Regression: OOUI PHP demos page is broken (again) - https://phabricator.wikimedia.org/T322357 (Dzahn) Open→Resolved Glad to hear that!:)
[21:17:50] both doc* machines are now running on PHP7.4, 7.3 is purged. all tests are still passing.
[21:18:41] and it resolved the ticket above. I am looking forward to this also unblocking that we switch to bullseye machines that already exist.
[21:31:30] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.41.0-wmf.4 deployment blockers - https://phabricator.wikimedia.org/T330210 (brennen)
[21:32:48] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.41.0-wmf.4 deployment blockers - https://phabricator.wikimedia.org/T330210 (brennen) Open→Resolved Logs look good, blockers resolved.
[22:04:31] bd808: belated \o/ thanks for the deploy :)
[22:14:00] yw. #someday I would love to talk with your folks and the right SREs about how to auto-deploy things as simple as the developer-portal service.
[22:14:51] I know a lot of things around here are more complicated, but a single container with nginx and some static html should be a good test case for CD to go with our CI
[22:48:07] Continuous-Integration-Infrastructure, serviceops-collab, Patch-For-Review: Migrate doc hosts to Bullseye - https://phabricator.wikimedia.org/T319477 (Dzahn) With the original reason being T322357, PHP on the existing _buster_ doc* machines has now been upgraded from PHP 7.3 to PHP 7.4. It solved th...