[00:06:28] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:11:28] (PuppetAgentStaleLastRun) resolved: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:36:59] (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[00:40:07] (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[00:41:18] (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[00:49:55] (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[00:51:16] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[00:51:52] (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[01:05:56] (SystemdUnitDown) firing: (2) The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[03:10:19] cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772 (Andrew) NEW
[03:10:34] cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9686788 (Andrew)
[03:25:00] (NovafullstackSustainedFailures) firing: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[03:25:10] cloud-services-team: NovafullstackSustainedFailures The automated tests were unable to create, provision and decommission a VM in the last 5h - https://phabricator.wikimedia.org/T361773 (phaultfinder) NEW
[03:25:39] cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9686800 (Andrew) So I have at least two questions: 1) Why didn't clients automatically renew /var/lib/puppet/ssl/certs/ca.pem on expira...
[03:39:47] cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9686821 (Andrew) This is my favorite kind of joke: ` root@cloudinfra-cloudvps-puppetserver-1:/srv/puppet/server/ssl/public_keys# puppe...
[04:00:56] (SystemdUnitDown) firing: (3) The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:01:08] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T360279#9686840 (phaultfinder)
[04:05:56] (SystemdUnitDown) firing: (4) The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:06:00] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T360279#9686841 (phaultfinder)
[04:13:19] cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9686842 (Andrew) I'm sure there's a perfectly reasonable, linear path to getting this fixed but it's going to have to wait until I get s...
[04:51:01] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:58:30] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[05:03:30] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[05:04:30] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[05:09:30] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[06:53:26] Toolforge (Toolforge iteration 08), Patch-For-Review: [k8s] Add node anti-affinity topologySpreadConstraints to infrastructure components where relevant - https://phabricator.wikimedia.org/T358203#9686932 (Slst2020) Open→In progress
[07:20:32] Toolforge (Toolforge iteration 08), Software-Licensing: [builds-api] builds-api is missing a software license - https://phabricator.wikimedia.org/T361007#9686986 (Slst2020) a:Slst2020
[07:21:58] Toolforge (Toolforge iteration 08), Software-Licensing: [builds-api] builds-api is missing a software license - https://phabricator.wikimedia.org/T361007#9686988 (Slst2020) Open→In progress
[07:37:44] Toolforge (Toolforge iteration 08), Software-Licensing: [builds-api] builds-api is missing a software license - https://phabricator.wikimedia.org/T361007#9687079 (Slst2020) To clarify, do we want AGPL-3.0-only or AGPL-3.0-or-later? The latter option seems more sensible to me, see https://www.gnu.org/lic...
[07:42:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:57:41] (CloudVPSDesignateLeaks) resolved: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:13:41] (SystemdUnitDown) firing: (3) The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1001-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1001-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:21:28] (PuppetSyncFailure) resolved: Failed to update Puppet repository /srv/git/operations/puppet on instance metricsinfra-puppetserver-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[09:24:41] cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9687568 (aborrero)
[09:33:28] (PuppetAgentFailure) firing: (2) Puppet agent failure detected on instance cloudinfra-cloudvps-puppetserver-1 in project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[09:33:43] Cloud Services Proposals: Decision request template - Update python team best practices - https://phabricator.wikimedia.org/T361804 (dcaro) NEW
[09:34:45] Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9687649 (dcaro) p:Triage→Medium
[09:35:23] Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9687646 (dcaro)
[09:40:38] cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9687681 (aborrero) reading https://www.puppet.com/docs/puppet/7/ssl_regenerate_certificates#regenerate_ca_and_all_certificates
[09:42:32] Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9687689 (dcaro)
[09:48:28] (PuppetAgentFailure) firing: (2) Puppet agent failure detected on instance cloudinfra-cloudvps-puppetserver-1 in project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[09:48:53] Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9687705 (dcaro)
[09:58:48] Toolforge (Toolforge iteration 08), Software-Licensing: [builds-api] builds-api is missing a software license - https://phabricator.wikimedia.org/T361007#9687722 (dcaro) >>! In T361007#9687079, @Slst2020 wrote: > To clarify, do we want AGPL-3.0-only or AGPL-3.0-or-later? > > The latter option seems more...
[10:05:22] cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9687732 (aborrero) my intervention today: * saw a failing puppet agent on a canary VM: `lang=shell-session root@canary1039-3:~# run-pup...
[10:14:13] cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9687752 (taavi) This is the "correct" certificate that instances should have: `lines=10 Certificate: Data: Version: 3 (0x2)...
[10:54:28] (PuppetAgentNoResources) resolved: No Puppet resources found on instance gitlab-runners-puppetserver-01 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:55:28] (PuppetAgentNoResources) resolved: No Puppet resources found on instance metricsinfra-puppetserver-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:55:28] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance bastion on project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:02:28] (PuppetAgentNoResources) resolved: No Puppet resources found on instance clouddb-services-puppetserver-1 on project clouddb-services - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:04:28] (PuppetAgentNoResources) resolved: No Puppet resources found on instance project-proxy-puppetserver-1 on project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:05:28] (PuppetAgentNoResources) resolved: (2) No Puppet resources found on instance bastion on project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:06:28] (PuppetAgentNoResources) resolved: No Puppet resources found on instance cloudinfra-internal-puppetserver-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:07:30] (NovafullstackSustainedFailures) resolved: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[11:54:09] !log taavi@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1039']
[11:54:13] !log taavi@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1039']
[11:54:25] !log taavi@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate True, for hosts list: ['cloudvirt1039']
[11:54:47] !log taavi@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate True, for hosts list: ['cloudvirt1039']
[12:55:55] cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9687944 (taavi) I'm going with (1) since I realized that (2) will have some chicken-and-egg problems. So what I did was: 1. Restore the...
[12:58:41] superset.wmcloud.org, Regression: Lost saved queries - https://phabricator.wikimedia.org/T361822 (Snaevar) NEW
[12:58:57] (CR) Muehlenhoff: [V:+2 C:+2] Remove dummy cert for debmonitor [labs/private] - https://gerrit.wikimedia.org/r/1016726 (https://phabricator.wikimedia.org/T357750) (owner: Muehlenhoff)
[12:59:38] Toolforge (Toolforge iteration 08): [jobs-cli] Allow exporting jobs list in YAML format - https://phabricator.wikimedia.org/T320575#9688096 (aborrero) In progress→Resolved documentation updated: * https://wikitech.wikimedia.org/w/index.php?title=Help:Toolforge/Jobs_framework&diff=prev&oldi...
[13:00:20] cloud-services-team, Toolforge: [infra] Replace PodSecurityPolicy in Toolforge Kubernetes - https://phabricator.wikimedia.org/T279110#9688125 (aborrero) Started: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/RBAC_and_PSP/PSP_migration
[13:07:56] Toolforge (Toolforge iteration 08), Patch-For-Review: [harbor] upgrade to 2.10.1 - https://phabricator.wikimedia.org/T354507#9688320 (Slst2020)
[13:08:54] Cloud-VPS: cloudcumin can't reach bastion-restricted itself - https://phabricator.wikimedia.org/T361831 (taavi) NEW
[13:09:43] cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9688382 (taavi) I ran this via Cumin on all the instances: ` echo "1940237fec9f9ca9e5084fe0c4c4e60ac7ee17bd9b79835efb6812302fa9062d /va...
[13:09:47] Toolforge (Toolforge iteration 08), Patch-For-Review, Software-Licensing: [builds-api] builds-api is missing a software license - https://phabricator.wikimedia.org/T361007#9688376 (CodeReviewBot) sstefanova opened https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/83 dev:...
[13:10:11] Toolforge (Toolforge iteration 08), Patch-For-Review, Software-Licensing: [builds-api] builds-api is missing a software license - https://phabricator.wikimedia.org/T361007#9688392 (Slst2020) >>! In T361007#9688376, @CodeReviewBot wrote: > sstefanova opened https://gitlab.wikimedia.org/repos/cloud/too...
[13:10:47] Toolforge (Toolforge iteration 08), Patch-For-Review, Software-Licensing: [builds-api] builds-api is missing a software license - https://phabricator.wikimedia.org/T361007#9688420 (taavi) I tend to prefer -only for my own projects but am also totally fine with -or-later.
[13:11:45] cloud-services-team, Cloud-VPS, Cumin, Infrastructure-Foundations, Patch-For-Review: [cumin] [openstack] Openstack backend fails when project is not set - https://phabricator.wikimedia.org/T346453#9688446 (fnegri)
[13:26:47] cloud-services-team, VPS-Projects, collaboration-services, Puppet (Puppet 7.0): Update devtools project puppetmaster - https://phabricator.wikimedia.org/T360470#9688488 (taavi) >>! In T360470#9685143, @Dzahn wrote: > ` > Error 500 on SERVER: Server Error: Could not find class role::puppetserver::...
[13:34:55] PROBLEM - toolschecker: Redis set/get on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/redis - 236 bytes in 0.013 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[13:35:51] (ProbeDown) firing: (2) Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:40:55] RECOVERY - toolschecker: Redis set/get on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 0.015 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[13:42:41] (CloudVPSDesignateLeaks) firing: (2) Detected 18 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:45:52] (ProbeDown) resolved: (2) Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:47:41] (CloudVPSDesignateLeaks) firing: (3) Detected 18 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:03:28] (PuppetAgentFailure) resolved: Puppet agent failure detected on instance cloudinfra-cloudvps-puppetserver-2 in project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[14:06:28] (InstanceDown) firing: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:07:28] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 1 deleted instances on cloudinfra-internal-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[14:11:28] (InstanceDown) resolved: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:11:35] !log taavi@cloudcumin1001 cloudinfra START - Cookbook wmcs.vps.remove_instance for instance cloudinfra-internal-puppetmaster-02
[14:11:47] !log taavi@cloudcumin1001 cloudinfra END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance cloudinfra-internal-puppetmaster-02
[14:12:07] !log taavi@cloudcumin1001 cloudinfra START - Cookbook wmcs.vps.remove_instance for instance cloudinfra-acme-chief-01
[14:12:55] !log taavi@cloudcumin1001 cloudinfra END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance cloudinfra-acme-chief-01
[14:42:09] Toolforge (Toolforge iteration 08): [harbor, maintain-harbor] Harbor upgrade 2.10 breaks delete-stale-toolforge-artifacts cron job - https://phabricator.wikimedia.org/T361842 (Slst2020) NEW
[14:42:19] Toolforge (Toolforge iteration 08): [harbor, maintain-harbor] Harbor upgrade 2.10 breaks delete-stale-toolforge-artifacts cron job - https://phabricator.wikimedia.org/T361842#9688733 (Slst2020) a:Slst2020
[14:49:11] superset.wmcloud.org, Regression: Lost saved queries - https://phabricator.wikimedia.org/T361822#9688904 (rook) I'm guessing these were lost during an upgrade to a newer version. Which was made from a backup of the db, suggesting that a backup would not recover them. I've been concerned that we would dis...
[15:10:58] (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[15:12:11] Toolforge (Toolforge iteration 08): [harbor, maintain-harbor] Harbor upgrade 2.10 breaks delete-stale-toolforge-artifacts cron job - https://phabricator.wikimedia.org/T361842#9689045 (Slst2020) Upon further investigation, this seems to have been an artifact of my lima-kilo harbor instance, where I had a few...
[15:13:08] Quarry: Update helm for quarry on pr - https://phabricator.wikimedia.org/T349031#9689068 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/quarry/pull/33
[15:13:19] vivian-rook opened https://github.com/toolforge/quarry/pull/33
[15:16:55] vivian-rook opened https://github.com/toolforge/quarry/pull/34
[15:25:02] vivian-rook opened https://github.com/vivian-rook/quarry/pull/1
[15:25:37] vivian-rook closed https://github.com/vivian-rook/quarry/pull/1
[15:27:40] vivian-rook opened https://github.com/vivian-rook/quarry/pull/2
[15:34:26] (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[15:38:52] Toolforge (Toolforge iteration 08), Patch-For-Review: [maintain-harbor] Improvements to subcommands and config validation - https://phabricator.wikimedia.org/T353059#9689160 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/23 [maint...
[15:44:29] Quarry: Update helm for quarry on pr - https://phabricator.wikimedia.org/T349031#9689174 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/33
[15:44:42] vivian-rook closed https://github.com/toolforge/quarry/pull/33
[15:45:00] Quarry: Update helm for quarry on pr - https://phabricator.wikimedia.org/T349031#9689176 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/quarry/pull/35
[15:45:11] vivian-rook opened https://github.com/toolforge/quarry/pull/35
[15:47:02] Quarry: Update helm for quarry on pr - https://phabricator.wikimedia.org/T349031#9689185 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/35
[15:47:07] Toolforge (Toolforge iteration 08), Patch-For-Review: [maintain-harbor] Improvements to subcommands and config validation - https://phabricator.wikimedia.org/T353059#9689186 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/24 [maint...
[15:47:10] vivian-rook closed https://github.com/toolforge/quarry/pull/35
[15:47:19] vivian-rook closed https://github.com/toolforge/quarry/pull/34
[15:49:05] vivian-rook opened https://github.com/toolforge/quarry/pull/36
[15:52:18] vivian-rook closed https://github.com/toolforge/quarry/pull/36
[15:53:04] Quarry: Update helm for quarry on pr - https://phabricator.wikimedia.org/T349031#9689215 (rook) Open→Resolved a:rook
[16:02:00] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[16:06:48] (PuppetZeroResources) firing: Puppet has failed generate resources on cloudbackup1001-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[16:27:41] (CloudVPSDesignateLeaks) resolved: (3) Detected 17 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:43:12] cloud-services-team, Cloud-VPS, Patch-For-Review: Use cloudbackup100[12]-dev for cinder backup test/dev - https://phabricator.wikimedia.org/T358855#9689530 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudbackup1001-dev.eqiad.wmnet with OS bookworm
[16:46:20] Toolforge, wikitech.wikimedia.org, Diffusion, Documentation: Document diffusion->github mirroring to https://github.com/toolforge/ on wikitech - https://phabricator.wikimedia.org/T361859 (bd808) NEW
[16:52:54] cloud-services-team (FY2023/2024-Q3-Q4), Infrastructure-Foundations, Spicerack, SRE-tools, Data-Platform-SRE (2024.03.25 - 2024.04.14): create and deploy new Elastic Curator deb package - https://phabricator.wikimedia.org/T361105#9689591 (bking) Resolved→Declined
[16:54:57] Wikibugs: Hashar does not like grey foreground color for distinguishing closed status events - https://phabricator.wikimedia.org/T360353#9689602 (Nikerabbit) This is how it looks to me: {F44520662} I feel the main issue here is that changing the color of the main text is not as intuitive as colors that are...
[16:58:49] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudbackup1002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[17:08:59] Wikibugs: Hashar does not like grey foreground color for distinguishing closed status events - https://phabricator.wikimedia.org/T360353#9689656 (bd808) >>! In T360353#9689602, @Nikerabbit wrote: > This is how it looks to me: I think that "dimming" of the title and url is what @Danny_B was hoping for in the...
[17:19:51] Toolforge, wikitech.wikimedia.org, Diffusion, Documentation: Document diffusion->github mirroring to https://github.com/toolforge/ on wikitech - https://phabricator.wikimedia.org/T361859#9689712 (bd808) The mystery of who poked the account and triggered the first email has been solved. Nothing ne...
[17:31:36] Toolforge, wikitech.wikimedia.org, Diffusion, Documentation: Document diffusion->github mirroring to https://github.com/toolforge/ on wikitech - https://phabricator.wikimedia.org/T361859#9689794 (Aklapper) I might misunderstand the task scope but see also T347577#9689792
[17:39:40] ToolforgeBundle, Community-Tech, CopyPatrol: Session can't be invalidated, causing problems with language selection - https://phabricator.wikimedia.org/T357821#9689829 (dmaza)
[17:48:51] cloud-services-team, Cloud-VPS, Patch-For-Review: Use cloudbackup100[12]-dev for cinder backup test/dev - https://phabricator.wikimedia.org/T358855#9689866 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudbackup1001-dev.eqiad.wmnet with OS bookworm c...
[17:49:37] cloud-services-team, Cloud-VPS, Patch-For-Review: Use cloudbackup100[12]-dev for cinder backup test/dev - https://phabricator.wikimedia.org/T358855#9689884 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
[18:19:26] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[18:20:00] cloud-services-team, Cloud-VPS, Patch-For-Review: Use cloudbackup100[12]-dev for cinder backup test/dev - https://phabricator.wikimedia.org/T358855#9690037 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm c...
[18:23:58] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[18:24:10] PROBLEM - nova-compute proc minimum on cloudvirt1031 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:28:10] RECOVERY - nova-compute proc minimum on cloudvirt1031 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:31:11] PROBLEM - nova-compute proc minimum on cloudvirt1031 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:35:11] PROBLEM - nova-compute proc maximum on cloudvirt1031 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:04:13] RECOVERY - nova-compute proc minimum on cloudvirt1031 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:04:13] RECOVERY - nova-compute proc maximum on cloudvirt1031 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:07:41] (SystemdUnitDown) firing: The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1001-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1001-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:22:10] (ProjectProxyMainProxyDown) firing: Proxy on proxy-04 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProjectProxyMainProxyDown
[19:27:10] (ProjectProxyMainProxyDown) resolved: (2) Proxy on proxy-03 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProjectProxyMainProxyDown
[19:31:49] cloud-services-team, Cloud-VPS: Request to add catalyst-qte.wmcloud.org webproxy subdomain for the catalyst-qte CloudVPS project - https://phabricator.wikimedia.org/T361517#9690334 (Andrew) Open→Resolved a:Andrew I added catalyst-qte.wmcloud.org to the domain config by following T...
[19:36:38] Cloud-VPS: ProjectProxyMainProxyDown should have response page - https://phabricator.wikimedia.org/T361873 (rook) NEW
[19:38:49] Cloud-VPS: nova-compute proc minimum section in response page - https://phabricator.wikimedia.org/T361874 (rook) NEW
[20:00:12] (PS1) Andrea Denisse: Delete dummy TLS certificate for the performance host [labs/private] - https://gerrit.wikimedia.org/r/1017146 (https://phabricator.wikimedia.org/T333615)
[20:00:36] (CR) Andrea Denisse: [C:+2] Delete dummy TLS certificate for the performance host [labs/private] - https://gerrit.wikimedia.org/r/1017146 (https://phabricator.wikimedia.org/T333615) (owner: Andrea Denisse)
[20:00:42] (CR) Andrea Denisse: [V:+2 C:+2] Delete dummy TLS certificate for the performance host [labs/private] - https://gerrit.wikimedia.org/r/1017146 (https://phabricator.wikimedia.org/T333615) (owner: Andrea Denisse)
[20:02:00] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[20:05:26] (SystemdUnitDown) firing: (2) The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1002-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[20:07:48] (PuppetFailure) firing: Puppet has failed on cloudbackup1002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[20:07:55] cloud-services-team: PuppetFailure Puppet failure on cloudbackup1002-dev:9100 - https://phabricator.wikimedia.org/T361880 (phaultfinder) NEW
[20:15:45] cloud-services-team, Cloud-VPS: cloud-init timeout too short on Bookworm - https://phabricator.wikimedia.org/T361749#9690649 (Andrew) I started with with a VM from a raw debian image, and cloud-init is installed and configured: ` debian@nopuppetbookworm:~$ systemctl show cloud-init.service | grep Timeo...
[20:21:18] cloud-services-team, Cloud-VPS: cloud-init timeout too short on Bookworm - https://phabricator.wikimedia.org/T361749#9690661 (Andrew) So puppet is removing eject, and since cloud-init requires it, puppet also removes it. that is not how I expect dependencies to work! But, in any case, the offending pupp...
[20:22:56] (SystemdUnitDown) firing: The systemd unit postgresql@15-main.service on node cloudbackup1002-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[20:23:03] cloud-services-team: SystemdUnitDown Unit postgresql@15-main.service on node cloudbackup1002-dev has been down for long. - https://phabricator.wikimedia.org/T361882 (phaultfinder) NEW
[20:30:25] cloud-services-team, Cloud-VPS: cloud-init timeout too short on Bookworm - https://phabricator.wikimedia.org/T361749#9690675 (MoritzMuehlenhoff) >>! In T361749#9690661, @Andrew wrote: > So puppet is removing eject, and since cloud-init requires it, puppet also removes it. that is not how I expect depende...
[21:02:56] (SystemdUnitDown) firing: (3) The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[21:03:03] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T360279#9690848 (phaultfinder)
[21:10:25] Toolforge-standards-committee: Adoption request for Yapperbot - https://phabricator.wikimedia.org/T361426#9690874 (DavidTornheim) While you all are decided what to do with the adoption process, could you please unprotect the code files in yapperbot? In order to maintain the code--either at its original loca...
[21:12:24] Cloud-VPS (Project-requests): Reassign cloud VPS project "media-streaming" to bvibber - https://phabricator.wikimedia.org/T361730#9690889 (bvibber) Thanks. Confirmed I'm into the admin interface on new account and can take it from here. :D
[21:29:17] cloud-services-team, Cloud-VPS, Patch-For-Review: cloud-init timeout too short on Bookworm - https://phabricator.wikimedia.org/T361749#9690937 (Andrew) Open→Resolved
[21:57:40] Toolforge (Toolforge iteration 08), Patch-For-Review: [jobs-api,jobs-cli] Support job health checks - https://phabricator.wikimedia.org/T335592#9691002 (Raymond_Ndibe) In progress→Resolved
[21:58:47] Toolforge (Toolforge iteration 08): [maintain-harbor] Improvements to subcommands and config validation - https://phabricator.wikimedia.org/T353059#9691006 (Raymond_Ndibe) In progress→Resolved
[22:14:19] Toolforge: [jobs-cli,jobs-api] quota shows different units for limit and usage - https://phabricator.wikimedia.org/T361120#9691052 (Raymond_Ndibe) a:Raymond_Ndibe
[22:14:26] Toolforge (Toolforge iteration 08): [jobs-cli,jobs-api] quota shows different units for limit and usage - https://phabricator.wikimedia.org/T361120#9691053 (Raymond_Ndibe)
[22:26:04] Toolforge: [jobs-api,jobs-cli] Support multiple replicas of continuous jobs - https://phabricator.wikimedia.org/T341066#9691068 (Raymond_Ndibe)
[22:27:07] Toolforge: [jobs-api,jobs-cli] Support multiple replicas of continuous jobs - https://phabricator.wikimedia.org/T341066#9691067 (Raymond_Ndibe) how will this affect the current `3 continuous jobs` limit? does 2 replicas of a continuous job count as 1 or 2 when considering limits?
[22:27:22] Toolforge: [jobs-api,jobs-cli] API read timeout exception crashes `toolforge jobs logs --follow NAME` after a few seconds - https://phabricator.wikimedia.org/T358534#9691069 (Raymond_Ndibe)
[22:34:17] Toolforge, Kubernetes: [jobs-api] Allow Toolforge scheduled jobs to have a maximum runtime - https://phabricator.wikimedia.org/T306391#9691074 (Raymond_Ndibe) we now have a `--health-check-script` argument that allows you to provide a custom health script that kubernetes uses to decide when your workload...
[22:36:25] cloud-services-team, Toolforge: [builds-cli] --debug option behaviour is confusing - https://phabricator.wikimedia.org/T354726#9691076 (Raymond_Ndibe)
[22:38:56] Toolforge: [toolforge,jobs] "toolforge jobs logs" fails when job has not started yet - https://phabricator.wikimedia.org/T349775#9691081 (Raymond_Ndibe)
[22:39:46] Toolforge: [toolforge,jobs] toolforge jobs logs read timeout error - https://phabricator.wikimedia.org/T356503#9691082 (Raymond_Ndibe)
[22:40:00] Toolforge, Kubernetes: [jobs-api] Allow Toolforge scheduled jobs to have a maximum runtime - https://phabricator.wikimedia.org/T306391#9691083 (AntiCompositeNumber) That's not particularly useful for this task about CronJobs.
[22:41:56] Toolforge, Documentation: Create a high-level overview of Toolforge system architecture - https://phabricator.wikimedia.org/T327760#9691087 (Raymond_Ndibe)
[22:42:33] Horizon: project owidm volume owidm-static can not be removed from attachment to owidm-instance and can not be attached to another instance - https://phabricator.wikimedia.org/T361893#9691088 (Tim-moody)
[22:43:15] Toolforge: [maintain-harbor] investigate how the tools deletion process currently works and how that can be handled in maintain-harbor - https://phabricator.wikimedia.org/T336813#9691100 (Raymond_Ndibe)
[22:44:16] cloud-services-team, Toolforge, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, Epic: [toolforge-cli.build] Implement a --json flag to output machine-readable output - https://phabricator.wikimedia.org/T334589#9691102 (Raymond_Ndibe)
[22:44:17] Horizon: project owidm volume owidm-static can not be removed from attachment to owidm-instance and can not be attached to another instance - https://phabricator.wikimedia.org/T361893#9691101 (Tim-moody) @Aklapper I hope that is the right tag, as I'm not very familiar with your project classification.
[22:46:58] Toolforge (Toolforge iteration 08), Patch-For-Review: [jobs-api,jobs-cli] Support job health checks - https://phabricator.wikimedia.org/T335592#9691103 (bd808) @Raymond_Ndibe I think this feature deserves a section on https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework and an ema...
[22:54:22] Toolforge (Toolforge iteration 08), Patch-For-Review: [jobs-api,jobs-cli] Support job health checks - https://phabricator.wikimedia.org/T335592#9691106 (bd808) >>! In T335592#9691103, @bd808 wrote: > @Raymond_Ndibe I think this feature deserves a section on https://wikitech.wikimedia.org/wiki/...
[22:55:24] Cloud-VPS: project owidm volume owidm-static can not be removed from attachment to owidm-instance and can not be attached to another instance - https://phabricator.wikimedia.org/T361893#9691107 (Tim-moody)
[23:02:38] Cloud-VPS (Quota-requests): owidm storage quota request - https://phabricator.wikimedia.org/T361895 (Tim-moody) NEW
[23:05:56] Toolforge (Toolforge iteration 08): remove `File log:` column from toolforge jobs list -o long output - https://phabricator.wikimedia.org/T361896 (Raymond_Ndibe) NEW
[23:06:33] Toolforge (Toolforge iteration 08): remove "File log:" column from toolforge jobs list -o long output - https://phabricator.wikimedia.org/T361896#9691137 (Raymond_Ndibe)
[23:06:54] Toolforge (Toolforge iteration 08): remove "File log:" column from toolforge jobs list -o long output - https://phabricator.wikimedia.org/T361896#9691138 (Raymond_Ndibe)
[23:07:06] Toolforge (Toolforge iteration 08): remove "File log:" column from toolforge jobs list -o long output - https://phabricator.wikimedia.org/T361896#9691139 (Raymond_Ndibe)
[23:12:03] Toolforge (Toolforge iteration 08): remove "File log:" column from toolforge jobs list -o long output - https://phabricator.wikimedia.org/T361896#9691163 (Raymond_Ndibe)
[23:12:56] Cloud-VPS: project iiab public key error - https://phabricator.wikimedia.org/T361898 (Tim-moody) NEW
[23:17:32] VPS-project-Codesearch: Let codesearch-frontend reques to local Hound instances directly - https://phabricator.wikimedia.org/T361899 (Krinkle) NEW
[23:18:18] Cloud-VPS: project iiab public key error - https://phabricator.wikimedia.org/T361898#9691187 (bd808) It looks like this new instance fell victim to {T361749}. I forced a puppet run there and I think you should be able to login now.
[23:18:47] Cloud-VPS: ssh to new instance "med.iiab.eqiad1.wikimedia.cloud" fails for user timmoody - https://phabricator.wikimedia.org/T361898#9691189 (bd808)
[23:20:48] VPS-project-Codesearch: Let codesearch-frontend reques to local Hound instances directly - https://phabricator.wikimedia.org/T361899#9691194 (Krinkle)
[23:20:56] VPS-project-Codesearch: Let codesearch-frontend reques to local Hound instances directly - https://phabricator.wikimedia.org/T361899#9691192 (Krinkle) @cmooney suggested I run these commamds for some detail: * `iptables -L -v --line -n` * `iptables -L -v --line -n -t nat` * `ip netns list` {P59607}
[23:21:46] (PS2) Krinkle: frontend: Change Dockerport to expose port 3003 instead of port 80 [labs/codesearch] - https://gerrit.wikimedia.org/r/1016887 (https://phabricator.wikimedia.org/T361899)
[23:22:20] Cloud-VPS: ssh to new instance "med.iiab.eqiad1.wikimedia.cloud" fails for user timmoody - https://phabricator.wikimedia.org/T361898#9691211 (bd808) Open→Resolved a:bd808 Please do reopen if the puppet run didn't fix things.
[23:40:02] (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[23:40:34] Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9691221 (bd808) test
[23:43:35] Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9691224 (bd808) test
[23:44:22] Toolforge (Toolforge iteration 08): [builds-api] replace all error message models with ResponseMessages - https://phabricator.wikimedia.org/T361901 (Raymond_Ndibe) NEW
[23:44:34] Toolforge (Toolforge iteration 08): [builds-api] replace all error message models with ResponseMessages - https://phabricator.wikimedia.org/T361901#9691238 (Raymond_Ndibe) a:Raymond_Ndibe
[23:44:44] Toolforge (Toolforge iteration 08): remove "File log:" column from toolforge jobs list -o long output - https://phabricator.wikimedia.org/T361896#9691239 (Raymond_Ndibe) a:Raymond_Ndibe
[23:45:53] Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9691240 (bd808) test
[23:46:41] Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9691241 (bd808) test
[23:47:58] Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9691242 (bd808) test
[23:55:36] Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9691246 (bd808) test
[23:55:50] Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9691249 (bd808) test