[02:11:34] why can't I log in to integration-agent-docker-1029? integration-agent-docker-1032 works but 1029 gives permission denied
[02:13:23] can someone give me the auth.log extract? is it rejecting my authorized_keys?
[02:24:42] * legoktm looks
[02:24:47] it rejected me too, but my root key worked
[02:25:20] Dec 12 01:58:23 integration-agent-docker-1029 sshd[1192375]: Invalid user tstarling from 172.16.5.8 port 37600
[02:25:27] Dec 12 02:24:30 integration-agent-docker-1029 sshd[1218173]: Connection closed by invalid user legoktm 172.16.3.145 port 49152 [preauth]
[02:26:36] TimStarling: I think it's an issue with the host rather than you specifically
[02:31:53] I saw the root keys on integration-agent-docker-1032, unfortunately I'm not in it
[02:32:35] does this work? /usr/sbin/ssh-key-ldap-lookup tstarling
[02:32:51] when I run that as non-root on 1032 it gives me keys
[02:33:30] root@integration-agent-docker-1029:~# /usr/sbin/ssh-key-ldap-lookup tstarling
[02:33:30] ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/gmwMhe6S4EualYJVcisxJ+kH/VQdqtV0j0OHdj3ZBGtCop50DzMwDaVj5Hc/H+yxOjghd8lOODg5t5TT+GcBCRkbYA0ICspkpWepjHLVdYK/Y+hm3+UcWZ3yJMn6gL01KxvMQtvWqfpoGANitocteMiUh6quJ7uhU2DDdbs2wvocpZ/EvTo2kJoQqP3snf9qwDOhr5oES031asV8TZG6Zn9AQDOyrrYaVaxabYKgAz9gQfHsIi+xGYLQHDxG7AULbHQfStZvYHhyuuJt9i45fb7z1k9oRCb3XBaICjyhBFgTRLTPtdcOU5yHDRbpIZBmhZhARE4diek6JN0XJDhl yubikey2
[02:33:30] ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCnLNE9kJNEQcSCkYBBUmgvT4KrThfW7V7g2g2EefR9gzOehHcAlyocQ5IR1RwIwdrNfaX7IR1r9qAQMIpQM93hp7eU9sTKxSWyXSQItbDWGhGQugahuqvZ9JNedW5PdojM2W5bb2FBCjoS9wHIAlbgteB1KLfqBF0STWZKk6QtgFLRdarQ0UIojN3jfeNiiX3WbFj+gV+4elMOQj+OrV0V58+oaSdUulQU102C0jNBXxXdgP+TAqh2oL5I2dTMYeDl3hnIRVyQkBRWZAHLBgVcWPjSYxV2H7aHBKgr2QVVTCc9IxL5bh4tJsBX+Ku1JXBGlTrW+PiegIk1Mxa/NHfx yubikey
[02:34:17] And the root keys file is labs/private if you want to add yourself
[02:35:53] restarting sshd didn't help fwiw
[04:42:53] I'm in now, seems like a pam failure rather than SSH
[04:43:53] auth.log says "Invalid user" and getent doesn't find me
[04:46:27] refreshing my memory about this stuff, nsswitch.conf etc.
[04:48:40] Project-Admins, WMDE-TechWish: #WMDE-TechWish-Maintenance archived under #German-Community-Wishlist - https://phabricator.wikimedia.org/T324929 (Aklapper)
[04:51:29] Project-Admins, WMDE-TechWish: #WMDE-TechWish-Maintenance and #WMDE-Blueprint-tickets archived under #German-Community-Wishlist - https://phabricator.wikimedia.org/T324929 (Aklapper)
[04:53:27] (2022-12-12 4:53:00): [nss] [cache_req_common_process_dp_reply] (0x0040): CR #242094: Could not get account info [1432158212]: SSSD is offline
[05:02:26] Release-Engineering-Team (Radar), MediaWiki-Vagrant, RESTBase, Parsoid (Tracking): Support RESTBase using Parsoid/PHP in MediaWiki-Vagrant - https://phabricator.wikimedia.org/T259989 (Aklapper) Open→Declined Boldly declining per https://www.mediawiki.org/wiki/RESTBase/deprecation
[06:29:38] (PS2) Physikerwelt: Archive texvcinfo [integration/config] - https://gerrit.wikimedia.org/r/866664 (https://phabricator.wikimedia.org/T324924)
[06:36:33] I filed https://phabricator.wikimedia.org/T324934
[06:56:37] (CR) Physikerwelt: Archive texvcinfo (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/866664 (https://phabricator.wikimedia.org/T324924) (owner: Physikerwelt)
[08:46:57] (CR) Hashar: [C: +2] Archive texvcinfo [integration/config] - https://gerrit.wikimedia.org/r/866664 (https://phabricator.wikimedia.org/T324924) (owner: Physikerwelt)
[08:48:42] (Merged) jenkins-bot: Archive texvcinfo [integration/config] - https://gerrit.wikimedia.org/r/866664 (https://phabricator.wikimedia.org/T324924) (owner: Physikerwelt)
[09:54:29] (CR) Hashar: [C: +2] jjb: Update jobs to PHP images without xdebug [integration/config] - https://gerrit.wikimedia.org/r/866618 (https://phabricator.wikimedia.org/T319495) (owner: Jforrester)
[09:55:08] !log Switching PHP based jobs to container images without XDebug | https://gerrit.wikimedia.org/r/c/integration/config/+/866618 | T319495
[09:55:10] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:55:11] T319495: Drop xdebug from all RelEng images - https://phabricator.wikimedia.org/T319495
[09:56:17] (Merged) jenkins-bot: jjb: Update jobs to PHP images without xdebug [integration/config] - https://gerrit.wikimedia.org/r/866618 (https://phabricator.wikimedia.org/T319495) (owner: Jforrester)
[09:57:49] Release-Engineering-Team, Scap, MW-on-K8s: Scap Mediawiki K8s deployments - https://phabricator.wikimedia.org/T318536 (jnuche)
[09:57:52] Release-Engineering-Team, Scap: Treat K8s deployment errors as soft errors in scap - https://phabricator.wikimedia.org/T324574 (jnuche) Open→Resolved a:jnuche
[10:24:43] (CR) Hashar: [C: +2] "Well done!" [integration/config] - https://gerrit.wikimedia.org/r/866570 (https://phabricator.wikimedia.org/T323586) (owner: Ilias Sarantopoulos)
[10:26:28] (Merged) jenkins-bot: inference-services: add revscoring pipelines [integration/config] - https://gerrit.wikimedia.org/r/866570 (https://phabricator.wikimedia.org/T323586) (owner: Ilias Sarantopoulos)
[10:28:08] (PS3) Ladsgroup: jjb: Make wikimedia-portals-build job rebase [integration/config] - https://gerrit.wikimedia.org/r/856033
[10:28:58] (CR) Ladsgroup: "I'm going to merge this since there is no objection in the past couple of days." [integration/config] - https://gerrit.wikimedia.org/r/856033 (owner: Ladsgroup)
[10:29:54] (CR) Ladsgroup: [C: +2] jjb: Make wikimedia-portals-build job rebase [integration/config] - https://gerrit.wikimedia.org/r/856033 (owner: Ladsgroup)
[10:31:43] (Merged) jenkins-bot: jjb: Make wikimedia-portals-build job rebase [integration/config] - https://gerrit.wikimedia.org/r/856033 (owner: Ladsgroup)
[10:58:31] (CR) Hashar: [C: +2] Document how to test a JavaScript Gerrit plugin [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/860885 (https://phabricator.wikimedia.org/T214068) (owner: Hashar)
[10:59:06] (Merged) jenkins-bot: Document how to test a JavaScript Gerrit plugin [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/860885 (https://phabricator.wikimedia.org/T214068) (owner: Hashar)
[11:15:08] (PS4) Dom Walden: Run selenium tests for extensions against test2wiki. [integration/config] - https://gerrit.wikimedia.org/r/866548 (https://phabricator.wikimedia.org/T303739)
[11:16:30] (CR) Dom Walden: "I have removed the tests for Newsletter (which is not installed on test2wiki) and CentralNotice (which I think requires further config to " [integration/config] - https://gerrit.wikimedia.org/r/866548 (https://phabricator.wikimedia.org/T303739) (owner: Dom Walden)
[13:30:09] Continuous-Integration-Infrastructure, Patch-For-Review: Drop xdebug from all RelEng images - https://phabricator.wikimedia.org/T319495 (Jdforrester-WMF) Open→Resolved a:Jdforrester-WMF
[13:35:29] Release-Engineering-Team (Seen), serviceops-collab: contint1001 hardware failures (remove contint1001 from production) - https://phabricator.wikimedia.org/T324698 (jcrespo) Hi, backups of conting1001 on the 3rd, 4th and 5th of December failed: ` id: 486071, ts: 2022-12-03 07:50:39, type: I, status: f, by...
[13:41:08] <_joe_> Say I want to add a special one-off deployment to https://wikitech.wikimedia.org/wiki/Deployments, how would that be done?
[13:41:17] <_joe_> just edit the page?
[13:41:24] yes
[13:42:01] <_joe_> So that won't create issues to DeploymentCalendarTool?
[13:42:06] <_joe_> that was my only worry
[13:42:12] The tool only creates the original version of the page.
[13:42:25] It doesn't edit weeks after they're created.
[13:42:37] <_joe_> but it removes them
[13:42:46] <_joe_> anyways, thanks :)
[13:43:10] True, it also archives, but that's a different tool under the same account.
[13:43:12] Any time.
[13:50:50] <_joe_> https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&diff=2040182&oldid=2040126 :)
[14:03:49] (CR) Hashar: [C: +2] Replace ESLint built-in jsdoc by the plugin version [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/860976 (owner: Hashar)
[14:04:21] (Merged) jenkins-bot: Replace ESLint built-in jsdoc by the plugin version [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/860976 (owner: Hashar)
[14:22:40] (Queue (Jenkins jobs + Zuul functions) alert) firing: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[14:27:34] (PS2) Jforrester: dockerfiles: [php82] Upgrade PHP to 8.2.0 now GM is out [integration/config] - https://gerrit.wikimedia.org/r/866389 (https://phabricator.wikimedia.org/T314093)
[14:28:40] (CR) Jforrester: [C: +2] dockerfiles: [php82] Upgrade PHP to 8.2.0 now GM is out [integration/config] - https://gerrit.wikimedia.org/r/866389 (https://phabricator.wikimedia.org/T314093) (owner: Jforrester)
[14:30:24] (Merged) jenkins-bot: dockerfiles: [php82] Upgrade PHP to 8.2.0 now GM is out [integration/config] - https://gerrit.wikimedia.org/r/866389 (https://phabricator.wikimedia.org/T314093) (owner: Jforrester)
[14:31:19] !log Docker: Publishing PHP 8.2 images now based on 8.2.0 GM for T314093
[14:31:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:31:22] T314093: Create PHP 8.2 CI images and jobs for early testing - https://phabricator.wikimedia.org/T314093
[14:32:08] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [400.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[14:37:40] (Queue (Jenkins jobs + Zuul functions) alert) firing: (2) Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[14:38:28] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [200.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[14:57:40] (Queue (Jenkins jobs + Zuul functions) alert) resolved: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[15:06:11] (PS18) Hashar: Replace CI results table by Gerrit Check API [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/859083 (https://phabricator.wikimedia.org/T214068)
[15:06:13] (PS8) Hashar: Add unit testing with QUnit [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/861486
[15:16:40] (CR) Hashar: "Kosta reported https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageTriage/+/865596/ was showing some errors even though everything p" [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/859083 (https://phabricator.wikimedia.org/T214068) (owner: Hashar)
[15:18:40] (CR) Hashar: Add unit testing with QUnit (1 comment) [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/861486 (owner: Hashar)
[15:20:58] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.40.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T320519 (hashar) I have been sworn in as the last train conductor of the year 2022!
[15:38:33] hashar: last of the year sounds strange
[15:38:55] we don't deploy the next two weeks
[15:40:21] Yeah I know
[15:40:37] Just can't believe 2022 is nearly over
[15:43:56] GitLab, serviceops-collab: Optimize Gitlab Backups - https://phabricator.wikimedia.org/T324506 (Jelto)
[15:44:08] GitLab (Infrastructure), Data-Persistence-Backup, serviceops-collab, Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Jelto)
[15:45:48] (PS1) Skizzerz: Add CI for OurWorldInData [integration/config] - https://gerrit.wikimedia.org/r/867194
[15:51:40] Release-Engineering-Team (Seen), serviceops-collab: contint1001 hardware failures (remove contint1001 from production) - https://phabricator.wikimedia.org/T324698 (Dzahn) @jcrespo Pretty sure we can disable backup checking until decom, but also CCing @hashar
[15:58:39] (CR) Hashar: [C: +2] Add CI for OurWorldInData [integration/config] - https://gerrit.wikimedia.org/r/867194 (owner: Skizzerz)
[16:00:25] (Merged) jenkins-bot: Add CI for OurWorldInData [integration/config] - https://gerrit.wikimedia.org/r/867194 (owner: Skizzerz)
[16:23:57] (CR) Zfilipin: [C: +2] Run selenium tests for extensions against test2wiki. [integration/config] - https://gerrit.wikimedia.org/r/866548 (https://phabricator.wikimedia.org/T303739) (owner: Dom Walden)
[16:24:10] (CR) Zfilipin: [C: +2] "The jobs are deployed." [integration/config] - https://gerrit.wikimedia.org/r/866548 (https://phabricator.wikimedia.org/T303739) (owner: Dom Walden)
[16:25:46] (Merged) jenkins-bot: Run selenium tests for extensions against test2wiki. [integration/config] - https://gerrit.wikimedia.org/r/866548 (https://phabricator.wikimedia.org/T303739) (owner: Dom Walden)
[16:31:56] Release-Engineering-Team (Seen), serviceops-collab: contint1001 hardware failures (remove contint1001 from production) - https://phabricator.wikimedia.org/T324698 (jcrespo) I will send a patch and add you both as reviewers. Thanks.
[16:55:37] GitLab, serviceops-collab: Optimize Gitlab Backups - https://phabricator.wikimedia.org/T324506 (LSobanski) p:Triage→Medium
[16:55:48] Continuous-Integration-Infrastructure, SRE, serviceops-collab: contint2002 service implementation tracking - https://phabricator.wikimedia.org/T324659 (LSobanski) p:Triage→Medium
[17:03:45] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.40.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T320519 (ssastry) ##### Risky Patch! 🚂🔥 * **Change**: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Cite/+/853079 * **Summary**: This (and the pr...
[17:35:31] (PS19) Hashar: Replace CI results table by Gerrit Check API [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/859083 (https://phabricator.wikimedia.org/T214068)
[17:35:34] (PS9) Hashar: Add unit testing with QUnit [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/861486
[17:40:07] (CR) Hashar: "non voting jobs with a message (such as operations-mw-config-php74-composer-diffConfig-docker ) were not recognized and defaulted to ERROR" [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/859083 (https://phabricator.wikimedia.org/T214068) (owner: Hashar)
[17:41:17] (CR) Hashar: "Updated tests for https://gerrit.wikimedia.org/r/c/operations/software/gerrit/+/859083/18..19" [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/861486 (owner: Hashar)
[17:57:37] Release-Engineering-Team (Seen), serviceops-collab, Patch-For-Review: contint1001 hardware failures (remove contint1001 from production) - https://phabricator.wikimedia.org/T324698 (Dzahn) a:Dzahn
[20:32:02] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.40.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T320519 (daniel) This revert should go in before the branch is cut: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/867232 This is not critical, it will...
[22:58:30] Phabricator, Release-Engineering-Team: phabricator-prod-1001 Puppet failure - https://phabricator.wikimedia.org/T324915 (Dzahn) a:Dzahn
[22:58:41] Phabricator, Release-Engineering-Team, serviceops-collab: phabricator-prod-1001 Puppet failure - https://phabricator.wikimedia.org/T324915 (Dzahn)
[23:02:47] Phabricator, Release-Engineering-Team, serviceops-collab: phabricator-prod-1001 Puppet failure - https://phabricator.wikimedia.org/T324915 (Dzahn) Fixed! I stopped phd, ran puppet and started phd again and that fixed it. ` systemctl stop phd puppet agent -tv ` ` Notice: /Stage[main]/Phabricat...
[23:03:10] Phabricator, Release-Engineering-Team, serviceops-collab: phabricator-prod-1001 Puppet failure - https://phabricator.wikimedia.org/T324915 (Dzahn) Open→Resolved
[23:12:41] Continuous-Integration-Infrastructure, Jenkins, SRE, SRE-Access-Requests, serviceops-collab: New Keyholder identity for RelEng Jenkins service - https://phabricator.wikimedia.org/T324014 (Dzahn) The existing format is "deploy_$service", deploy_zuul, deploy_ci_docroot, deploy_service,... so I...
[23:24:26] Continuous-Integration-Infrastructure, Jenkins, SRE, SRE-Access-Requests, serviceops-collab: New Keyholder identity for RelEng Jenkins service - https://phabricator.wikimedia.org/T324014 (Dzahn) A new keypair / identity has been created in the private puppet repo: ` remote: modules/secret/...
[23:27:41] Continuous-Integration-Infrastructure, Jenkins, SRE, SRE-Access-Requests, serviceops-collab: New Keyholder identity for RelEng Jenkins service - https://phabricator.wikimedia.org/T324014 (Dzahn) Next this needs to be used either from scap hiera data or directly with the `keyholder::agent` cla...
[23:54:59] Continuous-Integration-Infrastructure, Jenkins, SRE, SRE-Access-Requests, and 2 others: New Keyholder identity for RelEng Jenkins service - https://phabricator.wikimedia.org/T324014 (Dzahn) cc: @jelto I followed https://wikitech.wikimedia.org/wiki/Keyholder#Generating_a_key_for_a_new_identity_or_...