[00:06:28] <wmcs-alerts>	 (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:11:28] <wmcs-alerts>	 (PuppetAgentStaleLastRun) resolved: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:36:59] <wikibugs>	 (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[00:40:07] <wikibugs>	 (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[00:41:18] <wikibugs>	 (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[00:49:55] <wikibugs>	 (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[00:51:16] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[00:51:52] <wikibugs>	 (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[01:05:56] <jinxer-wm>	 (SystemdUnitDown) firing: (2) The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[03:10:19] <wikibugs>	 cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772 (Andrew) NEW
[03:10:34] <wikibugs>	 cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9686788 (Andrew)
[03:25:00] <jinxer-wm>	 (NovafullstackSustainedFailures) firing: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[03:25:10] <wikibugs>	 cloud-services-team: NovafullstackSustainedFailures  The automated tests were unable to create, provision and decommission a VM in the last 5h - https://phabricator.wikimedia.org/T361773 (phaultfinder) NEW
[03:25:39] <wikibugs>	 cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9686800 (Andrew) So I have at least two questions:  1) Why didn't clients automatically renew /var/lib/puppet/ssl/certs/ca.pem on expira...
[03:39:47] <wikibugs>	 cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9686821 (Andrew) This is my favorite kind of joke:   ` root@cloudinfra-cloudvps-puppetserver-1:/srv/puppet/server/ssl/public_keys# puppe...
[04:00:56] <jinxer-wm>	 (SystemdUnitDown) firing: (3) The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:01:08] <wikibugs>	 cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T360279#9686840 (phaultfinder)
[04:05:56] <jinxer-wm>	 (SystemdUnitDown) firing: (4) The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:06:00] <wikibugs>	 cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T360279#9686841 (phaultfinder)
[04:13:19] <wikibugs>	 cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9686842 (Andrew) I'm sure there's a perfectly reasonable, linear path to getting this fixed but it's going to have to wait until I get s...
[04:51:01] <jinxer-wm>	 (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:58:30] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[05:03:30] <jinxer-wm>	 (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[05:04:30] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[05:09:30] <jinxer-wm>	 (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[06:53:26] <wikibugs>	 Toolforge (Toolforge iteration 08), Patch-For-Review: [k8s] Add node anti-affinity topologySpreadConstraints to infrastructure components where relevant - https://phabricator.wikimedia.org/T358203#9686932 (Slst2020) Open→In progress
[07:20:32] <wikibugs>	 Toolforge (Toolforge iteration 08), Software-Licensing: [builds-api] builds-api is missing a software license - https://phabricator.wikimedia.org/T361007#9686986 (Slst2020) a:Slst2020
[07:21:58] <wikibugs>	 Toolforge (Toolforge iteration 08), Software-Licensing: [builds-api] builds-api is missing a software license - https://phabricator.wikimedia.org/T361007#9686988 (Slst2020) Open→In progress
[07:37:44] <wikibugs>	 Toolforge (Toolforge iteration 08), Software-Licensing: [builds-api] builds-api is missing a software license - https://phabricator.wikimedia.org/T361007#9687079 (Slst2020) To clarify, do we want AGPL-3.0-only or AGPL-3.0-or-later?  The latter option seems more sensible to me, see https://www.gnu.org/lic...
[07:42:41] <jinxer-wm>	 (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:57:41] <jinxer-wm>	 (CloudVPSDesignateLeaks) resolved: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:13:41] <jinxer-wm>	 (SystemdUnitDown) firing: (3) The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1001-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1001-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:21:28] <wmcs-alerts>	 (PuppetSyncFailure) resolved: Failed to update Puppet repository /srv/git/operations/puppet on instance metricsinfra-puppetserver-1 in project metricsinfra   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[09:24:41] <wikibugs>	 cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9687568 (aborrero)
[09:33:28] <wmcs-alerts>	 (PuppetAgentFailure) firing: (2) Puppet agent failure detected on instance cloudinfra-cloudvps-puppetserver-1 in project cloudinfra   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[09:33:43] <wikibugs>	 Cloud Services Proposals: Decision request template - Update python team best practices - https://phabricator.wikimedia.org/T361804 (dcaro) NEW
[09:34:45] <wikibugs>	 Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9687649 (dcaro) p:Triage→Medium
[09:35:23] <wikibugs>	 Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9687646 (dcaro)
[09:40:38] <wikibugs>	 cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9687681 (aborrero) reading https://www.puppet.com/docs/puppet/7/ssl_regenerate_certificates#regenerate_ca_and_all_certificates
[09:42:32] <wikibugs>	 Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9687689 (dcaro)
[09:48:28] <wmcs-alerts>	 (PuppetAgentFailure) firing: (2) Puppet agent failure detected on instance cloudinfra-cloudvps-puppetserver-1 in project cloudinfra   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[09:48:53] <wikibugs>	 Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9687705 (dcaro)
[09:58:48] <wikibugs>	 Toolforge (Toolforge iteration 08), Software-Licensing: [builds-api] builds-api is missing a software license - https://phabricator.wikimedia.org/T361007#9687722 (dcaro) >>! In T361007#9687079, @Slst2020 wrote: > To clarify, do we want AGPL-3.0-only or AGPL-3.0-or-later? >  > The latter option seems more...
[10:05:22] <wikibugs>	 cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9687732 (aborrero) my intervention today: * saw a failing puppet agent on a canary VM:  `lang=shell-session root@canary1039-3:~# run-pup...
[10:14:13] <wikibugs>	 cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9687752 (taavi) This is the "correct" certificate that instances should have: `lines=10 Certificate:     Data:         Version: 3 (0x2)...
[10:54:28] <wmcs-alerts>	 (PuppetAgentNoResources) resolved: No Puppet resources found on instance gitlab-runners-puppetserver-01 on project gitlab-runners   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:55:28] <wmcs-alerts>	 (PuppetAgentNoResources) resolved: No Puppet resources found on instance metricsinfra-puppetserver-1 on project metricsinfra   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:55:28] <wmcs-alerts>	 (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance bastion on project paws   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:02:28] <wmcs-alerts>	 (PuppetAgentNoResources) resolved: No Puppet resources found on instance clouddb-services-puppetserver-1 on project clouddb-services   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:04:28] <wmcs-alerts>	 (PuppetAgentNoResources) resolved: No Puppet resources found on instance project-proxy-puppetserver-1 on project project-proxy   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:05:28] <wmcs-alerts>	 (PuppetAgentNoResources) resolved: (2) No Puppet resources found on instance bastion on project paws   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:06:28] <wmcs-alerts>	 (PuppetAgentNoResources) resolved: No Puppet resources found on instance cloudinfra-internal-puppetserver-1 on project cloudinfra   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:07:30] <jinxer-wm>	 (NovafullstackSustainedFailures) resolved: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[11:54:09] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1039']
[11:54:13] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1039']
[11:54:25] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate True, for hosts list: ['cloudvirt1039']
[11:54:47] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate True, for hosts list: ['cloudvirt1039']
[12:55:55] <wikibugs>	 cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9687944 (taavi) I'm going with (1) since I realized that (2) will have some chicken-and-egg problems.  So what I did was: 1. Restore the...
[12:58:41] <wikibugs>	 superset.wmcloud.org, Regression: Lost saved queries - https://phabricator.wikimedia.org/T361822 (Snaevar) NEW
[12:58:57] <wikibugs>	 (CR) Muehlenhoff: [V:+2 C:+2] Remove dummy cert for debmonitor [labs/private] - https://gerrit.wikimedia.org/r/1016726 (https://phabricator.wikimedia.org/T357750) (owner: Muehlenhoff)
[12:59:38] <wikibugs>	 Toolforge (Toolforge iteration 08): [jobs-cli] Allow exporting jobs list in YAML format - https://phabricator.wikimedia.org/T320575#9688096 (aborrero) In progress→Resolved documentation updated: * https://wikitech.wikimedia.org/w/index.php?title=Help:Toolforge/Jobs_framework&diff=prev&oldi...
[13:00:20] <wikibugs>	 cloud-services-team, Toolforge: [infra] Replace PodSecurityPolicy in Toolforge Kubernetes - https://phabricator.wikimedia.org/T279110#9688125 (aborrero) Started: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/RBAC_and_PSP/PSP_migration
[13:07:56] <wikibugs>	 Toolforge (Toolforge iteration 08), Patch-For-Review: [harbor] upgrade to 2.10.1 - https://phabricator.wikimedia.org/T354507#9688320 (Slst2020)
[13:08:54] <wikibugs>	 Cloud-VPS: cloudcumin can't reach bastion-restricted itself - https://phabricator.wikimedia.org/T361831 (taavi) NEW
[13:09:43] <wikibugs>	 cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9688382 (taavi) I ran this via Cumin on all the instances: ` echo "1940237fec9f9ca9e5084fe0c4c4e60ac7ee17bd9b79835efb6812302fa9062d  /va...
[13:09:47] <wikibugs>	 Toolforge (Toolforge iteration 08), Patch-For-Review, Software-Licensing: [builds-api] builds-api is missing a software license - https://phabricator.wikimedia.org/T361007#9688376 (CodeReviewBot) sstefanova opened https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/83  dev:...
[13:10:11] <wikibugs>	 Toolforge (Toolforge iteration 08), Patch-For-Review, Software-Licensing: [builds-api] builds-api is missing a software license - https://phabricator.wikimedia.org/T361007#9688392 (Slst2020) >>! In T361007#9688376, @CodeReviewBot wrote: > sstefanova opened https://gitlab.wikimedia.org/repos/cloud/too...
[13:10:47] <wikibugs>	 Toolforge (Toolforge iteration 08), Patch-For-Review, Software-Licensing: [builds-api] builds-api is missing a software license - https://phabricator.wikimedia.org/T361007#9688420 (taavi) I tend to prefer -only for my own projects but am also totally fine with -or-later.
[13:11:45] <wikibugs>	 cloud-services-team, Cloud-VPS, Cumin, Infrastructure-Foundations, Patch-For-Review: [cumin] [openstack] Openstack backend fails when project is not set - https://phabricator.wikimedia.org/T346453#9688446 (fnegri)
[13:26:47] <wikibugs>	 cloud-services-team, VPS-Projects, collaboration-services, Puppet (Puppet 7.0): Update devtools project puppetmaster - https://phabricator.wikimedia.org/T360470#9688488 (taavi) >>! In T360470#9685143, @Dzahn wrote: > ` > Error 500 on SERVER: Server Error: Could not find class role::puppetserver::...
[13:34:55] <icinga-wm>	 PROBLEM - toolschecker: Redis set/get on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/redis - 236 bytes in 0.013 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[13:35:51] <wmcs-alerts>	 (ProbeDown) firing: (2) Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:40:55] <icinga-wm>	 RECOVERY - toolschecker: Redis set/get on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 0.015 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[13:42:41] <jinxer-wm>	 (CloudVPSDesignateLeaks) firing: (2) Detected 18 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:45:52] <wmcs-alerts>	 (ProbeDown) resolved: (2) Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:47:41] <jinxer-wm>	 (CloudVPSDesignateLeaks) firing: (3) Detected 18 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:03:28] <wmcs-alerts>	 (PuppetAgentFailure) resolved: Puppet agent failure detected on instance cloudinfra-cloudvps-puppetserver-2 in project cloudinfra   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[14:06:28] <wmcs-alerts>	 (InstanceDown) firing: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-2 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:07:28] <wmcs-alerts>	 (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 1 deleted instances on cloudinfra-internal-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[14:11:28] <wmcs-alerts>	 (InstanceDown) resolved: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-2 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:11:35] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 cloudinfra START - Cookbook wmcs.vps.remove_instance for instance cloudinfra-internal-puppetmaster-02
[14:11:47] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 cloudinfra END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance cloudinfra-internal-puppetmaster-02
[14:12:07] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 cloudinfra START - Cookbook wmcs.vps.remove_instance for instance cloudinfra-acme-chief-01
[14:12:55] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 cloudinfra END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance cloudinfra-acme-chief-01
[14:42:09] <wikibugs>	 Toolforge (Toolforge iteration 08): [harbor, maintain-harbor] Harbor upgrade 2.10 breaks delete-stale-toolforge-artifacts cron job - https://phabricator.wikimedia.org/T361842 (Slst2020) NEW
[14:42:19] <wikibugs>	 Toolforge (Toolforge iteration 08): [harbor, maintain-harbor] Harbor upgrade 2.10 breaks delete-stale-toolforge-artifacts cron job - https://phabricator.wikimedia.org/T361842#9688733 (Slst2020) a:Slst2020
[14:49:11] <wikibugs>	 superset.wmcloud.org, Regression: Lost saved queries - https://phabricator.wikimedia.org/T361822#9688904 (rook) I'm guessing these were lost during an upgrade to a newer version. Which was made from a backup of the db, suggesting that a backup would not recover them. I've been concerned that we would dis...
[15:10:58] <wikibugs>	 (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[15:12:11] <wikibugs>	 Toolforge (Toolforge iteration 08): [harbor, maintain-harbor] Harbor upgrade 2.10 breaks delete-stale-toolforge-artifacts cron job - https://phabricator.wikimedia.org/T361842#9689045 (Slst2020) Upon further investigation, this seems to have been an artifact of my lima-kilo harbor instance, where I had a few...
[15:13:08] <wikibugs>	 Quarry: Update helm for quarry on pr - https://phabricator.wikimedia.org/T349031#9689068 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/quarry/pull/33
[15:13:19] <notefromgithub>	 vivian-rook opened https://github.com/toolforge/quarry/pull/33
[15:16:55] <notefromgithub>	 vivian-rook opened https://github.com/toolforge/quarry/pull/34
[15:25:02] <notefromgithub>	 vivian-rook opened https://github.com/vivian-rook/quarry/pull/1
[15:25:37] <notefromgithub>	 vivian-rook closed https://github.com/vivian-rook/quarry/pull/1
[15:27:40] <notefromgithub>	 vivian-rook opened https://github.com/vivian-rook/quarry/pull/2
[15:34:26] <wikibugs>	 (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[15:38:52] <wikibugs>	 Toolforge (Toolforge iteration 08), Patch-For-Review: [maintain-harbor] Improvements to subcommands and config validation - https://phabricator.wikimedia.org/T353059#9689160 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/23  [maint...
[15:44:29] <wikibugs>	 Quarry: Update helm for quarry on pr - https://phabricator.wikimedia.org/T349031#9689174 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/33
[15:44:42] <notefromgithub>	 vivian-rook closed https://github.com/toolforge/quarry/pull/33
[15:45:00] <wikibugs>	 Quarry: Update helm for quarry on pr - https://phabricator.wikimedia.org/T349031#9689176 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/quarry/pull/35
[15:45:11] <notefromgithub>	 vivian-rook opened https://github.com/toolforge/quarry/pull/35
[15:47:02] <wikibugs>	 Quarry: Update helm for quarry on pr - https://phabricator.wikimedia.org/T349031#9689185 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/35
[15:47:07] <wikibugs>	 Toolforge (Toolforge iteration 08), Patch-For-Review: [maintain-harbor] Improvements to subcommands and config validation - https://phabricator.wikimedia.org/T353059#9689186 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/24  [maint...
[15:47:10] <notefromgithub>	 vivian-rook closed https://github.com/toolforge/quarry/pull/35
[15:47:19] <notefromgithub>	 vivian-rook closed https://github.com/toolforge/quarry/pull/34
[15:49:05] <notefromgithub>	 vivian-rook opened https://github.com/toolforge/quarry/pull/36
[15:52:18] <notefromgithub>	 vivian-rook closed https://github.com/toolforge/quarry/pull/36
[15:53:04] <wikibugs>	 Quarry: Update helm for quarry on pr - https://phabricator.wikimedia.org/T349031#9689215 (rook) Open→Resolved a:rook
[16:02:00] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[16:06:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on cloudbackup1001-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[16:27:41] <jinxer-wm>	 (CloudVPSDesignateLeaks) resolved: (3) Detected 17 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:43:12] <wikibugs>	 cloud-services-team, Cloud-VPS, Patch-For-Review: Use cloudbackup100[12]-dev for cinder backup test/dev - https://phabricator.wikimedia.org/T358855#9689530 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudbackup1001-dev.eqiad.wmnet with OS bookworm
[16:46:20] <wikibugs>	 Toolforge, wikitech.wikimedia.org, Diffusion, Documentation: Document diffusion->github mirroring to https://github.com/toolforge/ on wikitech - https://phabricator.wikimedia.org/T361859 (bd808) NEW
[16:52:54] <wikibugs>	 cloud-services-team (FY2023/2024-Q3-Q4), Infrastructure-Foundations, Spicerack, SRE-tools, Data-Platform-SRE (2024.03.25 - 2024.04.14): create and deploy new Elastic Curator deb package - https://phabricator.wikimedia.org/T361105#9689591 (bking) Resolved→Declined
[16:54:57] <wikibugs>	 Wikibugs: Hashar does not like grey foreground color for distinguishing closed status events - https://phabricator.wikimedia.org/T360353#9689602 (Nikerabbit) This is how it looks to me: {F44520662}  I feel the main issue here is that changing the color of the main text is not as intuitive as colors that are...
[16:58:49] <jinxer-wm>	 (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudbackup1002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[17:08:59] <wikibugs>	 Wikibugs: Hashar does not like grey foreground color for distinguishing closed status events - https://phabricator.wikimedia.org/T360353#9689656 (bd808) >>! In T360353#9689602, @Nikerabbit wrote: > This is how it looks to me:  I think that "dimming" of the title and url is what @Danny_B was hoping for in the...
[17:19:51] <wikibugs>	 Toolforge, wikitech.wikimedia.org, Diffusion, Documentation: Document diffusion->github mirroring to https://github.com/toolforge/ on wikitech - https://phabricator.wikimedia.org/T361859#9689712 (bd808) The mystery of who poked the account and triggered the first email has been solved. Nothing ne...
[17:31:36] <wikibugs>	 Toolforge, wikitech.wikimedia.org, Diffusion, Documentation: Document diffusion->github mirroring to https://github.com/toolforge/ on wikitech - https://phabricator.wikimedia.org/T361859#9689794 (Aklapper) I might misunderstand the task scope but see also T347577#9689792
[17:39:40] <wikibugs>	 ToolforgeBundle, Community-Tech, CopyPatrol: Session can't be invalidated, causing problems with language selection - https://phabricator.wikimedia.org/T357821#9689829 (dmaza)
[17:48:51] <wikibugs>	 cloud-services-team, Cloud-VPS, Patch-For-Review: Use cloudbackup100[12]-dev for cinder backup test/dev - https://phabricator.wikimedia.org/T358855#9689866 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudbackup1001-dev.eqiad.wmnet with OS bookworm c...
[17:49:37] <wikibugs>	 cloud-services-team, Cloud-VPS, Patch-For-Review: Use cloudbackup100[12]-dev for cinder backup test/dev - https://phabricator.wikimedia.org/T358855#9689884 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
[18:19:26] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[18:20:00] <wikibugs>	 cloud-services-team, Cloud-VPS, Patch-For-Review: Use cloudbackup100[12]-dev for cinder backup test/dev - https://phabricator.wikimedia.org/T358855#9690037 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm c...
[18:23:58] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[18:24:10] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1031 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:28:10] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1031 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:31:11] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1031 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:35:11] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1031 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:04:13] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1031 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:04:13] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1031 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:07:41] <jinxer-wm>	 (SystemdUnitDown) firing: The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1001-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1001-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:22:10] <wmcs-alerts>	 (ProjectProxyMainProxyDown) firing: Proxy on proxy-04 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProjectProxyMainProxyDown
[19:27:10] <wmcs-alerts>	 (ProjectProxyMainProxyDown) resolved: (2) Proxy on proxy-03 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProjectProxyMainProxyDown
[19:31:49] <wikibugs>	 06cloud-services-team, 10Cloud-VPS: 14Request to add catalyst-qte.wmcloud.org webproxy subdomain for the catalyst-qte CloudVPS project - 14https://phabricator.wikimedia.org/T361517#9690334 (10Andrew) 05Open→03Resolved a:03Andrew 14I added catalyst-qte.wmcloud.org  to the domain config by following T...
[19:36:38] <wikibugs>	 10Cloud-VPS: ProjectProxyMainProxyDown should have response page - https://phabricator.wikimedia.org/T361873 (10rook) 03NEW
[19:38:49] <wikibugs>	 10Cloud-VPS: nova-compute proc minimum section in response page - https://phabricator.wikimedia.org/T361874 (10rook) 03NEW
[20:00:12] <wikibugs>	 (03PS1) 10Andrea Denisse: Delete dummy TLS certificate for the performance host [labs/private] - 10https://gerrit.wikimedia.org/r/1017146 (https://phabricator.wikimedia.org/T333615)
[20:00:36] <wikibugs>	 (03CR) 10Andrea Denisse: [C:03+2] Delete dummy TLS certificate for the performance host [labs/private] - 10https://gerrit.wikimedia.org/r/1017146 (https://phabricator.wikimedia.org/T333615) (owner: 10Andrea Denisse)
[20:00:42] <wikibugs>	 (03CR) 10Andrea Denisse: [V:03+2 C:03+2] Delete dummy TLS certificate for the performance host [labs/private] - 10https://gerrit.wikimedia.org/r/1017146 (https://phabricator.wikimedia.org/T333615) (owner: 10Andrea Denisse)
[20:02:00] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[20:05:26] <jinxer-wm>	 (SystemdUnitDown) firing: (2) The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1002-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[20:07:48] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on cloudbackup1002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[20:07:55] <wikibugs>	 06cloud-services-team: PuppetFailure  Puppet failure on cloudbackup1002-dev:9100 - https://phabricator.wikimedia.org/T361880 (10phaultfinder) 03NEW
[20:15:45] <wikibugs>	 06cloud-services-team, 10Cloud-VPS: cloud-init timeout too short on Bookworm - https://phabricator.wikimedia.org/T361749#9690649 (10Andrew) I started with a VM from a raw debian image, and cloud-init is installed and configured:   ` debian@nopuppetbookworm:~$ systemctl show cloud-init.service | grep Timeo...
[20:21:18] <wikibugs>	 06cloud-services-team, 10Cloud-VPS: cloud-init timeout too short on Bookworm - https://phabricator.wikimedia.org/T361749#9690661 (10Andrew) So puppet is removing eject, and since cloud-init requires it, puppet also removes it. that is not how I expect dependencies to work!  But, in any case, the offending pupp...
[20:22:56] <jinxer-wm>	 (SystemdUnitDown) firing: The systemd unit postgresql@15-main.service on node cloudbackup1002-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[20:23:03] <wikibugs>	 06cloud-services-team: SystemdUnitDown  Unit postgresql@15-main.service on node cloudbackup1002-dev has been down for long. - https://phabricator.wikimedia.org/T361882 (10phaultfinder) 03NEW
[20:30:25] <wikibugs>	 06cloud-services-team, 10Cloud-VPS: cloud-init timeout too short on Bookworm - https://phabricator.wikimedia.org/T361749#9690675 (10MoritzMuehlenhoff) >>! In T361749#9690661, @Andrew wrote: > So puppet is removing eject, and since cloud-init requires it, puppet also removes it. that is not how I expect depende...
[21:02:56] <jinxer-wm>	 (SystemdUnitDown) firing: (3) The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[21:03:03] <wikibugs>	 06cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T360279#9690848 (10phaultfinder)
[21:10:25] <wikibugs>	 06Toolforge-standards-committee: Adoption request for Yapperbot - https://phabricator.wikimedia.org/T361426#9690874 (10DavidTornheim) While you all are deciding what to do with the adoption process, could you please unprotect the code files in yapperbot?  In order to maintain the code--either at its original loca...
[21:12:24] <wikibugs>	 10Cloud-VPS (Project-requests): 14Reassign cloud VPS project "media-streaming" to bvibber - 14https://phabricator.wikimedia.org/T361730#9690889 (10bvibber) 14Thanks. Confirmed I'm into the admin interface on new account and can take it from here. :D
[21:29:17] <wikibugs>	 06cloud-services-team, 10Cloud-VPS, 13Patch-For-Review: 14cloud-init timeout too short on Bookworm - 14https://phabricator.wikimedia.org/T361749#9690937 (10Andrew) 05Open→03Resolved
[21:57:40] <wikibugs>	 10Toolforge (Toolforge iteration 08), 13Patch-For-Review: 14[jobs-api,jobs-cli] Support job health checks - 14https://phabricator.wikimedia.org/T335592#9691002 (10Raymond_Ndibe) 05In progress→03Resolved
[21:58:47] <wikibugs>	 10Toolforge (Toolforge iteration 08): 14[maintain-harbor] Improvements to subcommands and config validation - 14https://phabricator.wikimedia.org/T353059#9691006 (10Raymond_Ndibe) 05In progress→03Resolved
[22:14:19] <wikibugs>	 10Toolforge: [jobs-cli,jobs-api] quota shows different units for limit and usage - https://phabricator.wikimedia.org/T361120#9691052 (10Raymond_Ndibe) a:03Raymond_Ndibe
[22:14:26] <wikibugs>	 10Toolforge (Toolforge iteration 08): [jobs-cli,jobs-api] quota shows different units for limit and usage - https://phabricator.wikimedia.org/T361120#9691053 (10Raymond_Ndibe)
[22:26:04] <wikibugs>	 10Toolforge: [jobs-api,jobs-cli] Support multiple replicas of continuous jobs - https://phabricator.wikimedia.org/T341066#9691068 (10Raymond_Ndibe)
[22:27:07] <wikibugs>	 10Toolforge: [jobs-api,jobs-cli] Support multiple replicas of continuous jobs - https://phabricator.wikimedia.org/T341066#9691067 (10Raymond_Ndibe) how will this affect the current `3 continuous jobs` limit? does 2 replicas of a continuous job count as 1 or 2 when considering limits?
[22:27:22] <wikibugs>	 10Toolforge: [jobs-api,jobs-cli] API read timeout exception crashes `toolforge jobs logs --follow NAME` after a few seconds - https://phabricator.wikimedia.org/T358534#9691069 (10Raymond_Ndibe)
[22:34:17] <wikibugs>	 10Toolforge, 07Kubernetes: [jobs-api] Allow Toolforge scheduled jobs to have a maximum runtime - https://phabricator.wikimedia.org/T306391#9691074 (10Raymond_Ndibe) we now have a `--health-check-script` argument that allows you to provide a custom health script that kubernetes uses to decide when your workload...
[22:36:25] <wikibugs>	 06cloud-services-team, 10Toolforge: [builds-cli] --debug option behaviour is confusing - https://phabricator.wikimedia.org/T354726#9691076 (10Raymond_Ndibe)
[22:38:56] <wikibugs>	 10Toolforge: [toolforge,jobs] "toolforge jobs logs" fails when job has not started yet - https://phabricator.wikimedia.org/T349775#9691081 (10Raymond_Ndibe)
[22:39:46] <wikibugs>	 10Toolforge: [toolforge,jobs] toolforge jobs logs read timeout error - https://phabricator.wikimedia.org/T356503#9691082 (10Raymond_Ndibe)
[22:40:00] <wikibugs>	 10Toolforge, 07Kubernetes: [jobs-api] Allow Toolforge scheduled jobs to have a maximum runtime - https://phabricator.wikimedia.org/T306391#9691083 (10AntiCompositeNumber) That's not particularly useful for this task about CronJobs.
[22:41:56] <wikibugs>	 10Toolforge, 07Documentation: Create a high-level overview of Toolforge system architecture - https://phabricator.wikimedia.org/T327760#9691087 (10Raymond_Ndibe)
[22:42:33] <wikibugs>	 10Horizon: project owidm volume owidm-static can not be removed from attachment to owidm-instance and can not be attached to another instance - https://phabricator.wikimedia.org/T361893#9691088 (10Tim-moody)
[22:43:15] <wikibugs>	 10Toolforge: [maintain-harbor] investigate how the tools deletion process currently works and how that can be handled in maintain-harbor - https://phabricator.wikimedia.org/T336813#9691100 (10Raymond_Ndibe)
[22:44:16] <wikibugs>	 06cloud-services-team, 10Toolforge, 05Cloud-Services-Origin-Team, 07Cloud-Services-Worktype-Project, 07Epic: [toolforge-cli.build] Implement a --json flag to output machine-readable output - https://phabricator.wikimedia.org/T334589#9691102 (10Raymond_Ndibe)
[22:44:17] <wikibugs>	 10Horizon: project owidm volume owidm-static can not be removed from attachment to owidm-instance and can not be attached to another instance - https://phabricator.wikimedia.org/T361893#9691101 (10Tim-moody) @Aklapper I hope that is the right tag, as I'm not very familiar with your project classification.
[22:46:58] <wikibugs>	 10Toolforge (Toolforge iteration 08), 13Patch-For-Review: 14[jobs-api,jobs-cli] Support job health checks - 14https://phabricator.wikimedia.org/T335592#9691103 (10bd808) 14@Raymond_Ndibe I think this feature deserves a section on https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework and an ema...
[22:54:22] <wikibugs>	 10Toolforge (Toolforge iteration 08), 13Patch-For-Review: 14[jobs-api,jobs-cli] Support job health checks - 14https://phabricator.wikimedia.org/T335592#9691106 (10bd808) 14>>! In T335592#9691103, @bd808 wrote: > @Raymond_Ndibe I think this feature deserves a section on https://wikitech.wikimedia.org/wiki/...
[22:55:24] <wikibugs>	 10Cloud-VPS: project owidm volume owidm-static can not be removed from attachment to owidm-instance and can not be attached to another instance - https://phabricator.wikimedia.org/T361893#9691107 (10Tim-moody)
[23:02:38] <wikibugs>	 10Cloud-VPS (Quota-requests): owidm storage quota request - https://phabricator.wikimedia.org/T361895 (10Tim-moody) 03NEW
[23:05:56] <wikibugs>	 10Toolforge (Toolforge iteration 08): remove `File log:` column from toolforge jobs list -o long output - https://phabricator.wikimedia.org/T361896 (10Raymond_Ndibe) 03NEW
[23:06:33] <wikibugs>	 10Toolforge (Toolforge iteration 08): remove "File log:" column from toolforge jobs list -o long output - https://phabricator.wikimedia.org/T361896#9691137 (10Raymond_Ndibe)
[23:06:54] <wikibugs>	 10Toolforge (Toolforge iteration 08): remove "File log:" column from toolforge jobs list -o long output - https://phabricator.wikimedia.org/T361896#9691138 (10Raymond_Ndibe)
[23:07:06] <wikibugs>	 10Toolforge (Toolforge iteration 08): remove "File log:" column from toolforge jobs list -o long output - https://phabricator.wikimedia.org/T361896#9691139 (10Raymond_Ndibe)
[23:12:03] <wikibugs>	 10Toolforge (Toolforge iteration 08): remove "File log:" column from toolforge jobs list -o long output - https://phabricator.wikimedia.org/T361896#9691163 (10Raymond_Ndibe)
[23:12:56] <wikibugs>	 10Cloud-VPS: project iiab public key error - https://phabricator.wikimedia.org/T361898 (10Tim-moody) 03NEW
[23:17:32] <wikibugs>	 10VPS-project-Codesearch: Let codesearch-frontend reques to local Hound instances directly - https://phabricator.wikimedia.org/T361899 (10Krinkle) 03NEW
[23:18:18] <wikibugs>	 10Cloud-VPS: project iiab public key error - https://phabricator.wikimedia.org/T361898#9691187 (10bd808) It looks like this new instance fell victim to {T361749}. I forced a puppet run there and I think you should be able to login now.
[23:18:47] <wikibugs>	 10Cloud-VPS: ssh to new instance "med.iiab.eqiad1.wikimedia.cloud" fails for user timmoody - https://phabricator.wikimedia.org/T361898#9691189 (10bd808)
[23:20:48] <wikibugs>	 10VPS-project-Codesearch: Let codesearch-frontend reques to local Hound instances directly - https://phabricator.wikimedia.org/T361899#9691194 (10Krinkle)
[23:20:56] <wikibugs>	 10VPS-project-Codesearch: Let codesearch-frontend reques to local Hound instances directly - https://phabricator.wikimedia.org/T361899#9691192 (10Krinkle) @cmooney suggested I run these commands for some detail: * `iptables -L -v --line -n` * `iptables -L -v --line -n -t nat` * `ip netns list`  {P59607}
[23:21:46] <wikibugs>	 (03PS2) 10Krinkle: frontend: Change Dockerport to expose port 3003 instead of port 80 [labs/codesearch] - 10https://gerrit.wikimedia.org/r/1016887 (https://phabricator.wikimedia.org/T361899)
[23:22:20] <wikibugs>	 10Cloud-VPS: 14ssh to new instance "med.iiab.eqiad1.wikimedia.cloud" fails for user timmoody - 14https://phabricator.wikimedia.org/T361898#9691211 (10bd808) 05Open→03Resolved a:03bd808 14Please do reopen if the puppet run didn't fix things.
[23:40:02] <wikibugs>	 (03CR) 10BryanDavis: [C:04-2] "test" [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: 10BryanDavis)
[23:40:34] <wikibugs>	 10Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9691221 (10bd808) test
[23:43:35] <wikibugs>	 10Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9691224 (10bd808) test
[23:44:22] <wikibugs>	 10Toolforge (Toolforge iteration 08): [builds-api] replace all error message models with ResponseMessages - https://phabricator.wikimedia.org/T361901 (10Raymond_Ndibe) 03NEW
[23:44:34] <wikibugs>	 10Toolforge (Toolforge iteration 08): [builds-api] replace all error message models with ResponseMessages - https://phabricator.wikimedia.org/T361901#9691238 (10Raymond_Ndibe) a:03Raymond_Ndibe
[23:44:44] <wikibugs>	 10Toolforge (Toolforge iteration 08): remove "File log:" column from toolforge jobs list -o long output - https://phabricator.wikimedia.org/T361896#9691239 (10Raymond_Ndibe) a:03Raymond_Ndibe
[23:45:53] <wikibugs>	 10Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9691240 (10bd808) test
[23:46:41] <wikibugs>	 10Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9691241 (10bd808) test
[23:47:58] <wikibugs>	 10Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9691242 (10bd808) test
[23:55:36] <wikibugs>	 10Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9691246 (10bd808) test
[23:55:50] <wikibugs>	 10Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9691249 (10bd808) test