[00:07:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:08:39] (OpenstackAPIResponse) firing: (7) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[00:17:03] (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[02:24:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[03:08:27] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[03:28:27] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:08:39] (OpenstackAPIResponse) firing: (6) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[05:12:03] (InstanceDown) firing: Project tools instance tools-db-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[05:17:03] (InstanceDown) resolved: Project tools instance tools-db-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[05:24:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[07:28:27] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[08:08:39] (OpenstackAPIResponse) firing: (6) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[08:24:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[09:17:33] Cloud Services Proposals, Toolforge Build Service, cloud-services-team, Cloud-Services-Origin-Team, and 3 others: [Epic] Make Toolforge a proper platform as a service with push-to-deploy and build packs - https://phabricator.wikimedia.org/T194332 (dcaro)
[09:17:41] Toolforge (Toolforge iteration 01), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [buildservice] Create a buildservice API and move any logic from the client to it - https://phabricator.wikimedia.org/T334590 (dcaro) In progre...
[09:27:56] (ToolsToolsDBWritableState) firing: There should be exactly one writable MariaDB instance instead of 0 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBWritableState
[09:31:12] Toolforge (Toolforge iteration 01): Decision request – Toolforge CLI consolidation - https://phabricator.wikimedia.org/T348749 (Slst2020) >>! In T348749#9272678, @dcaro wrote: > I vote for option 3, as it's the one that will require less effort duplication, given that the api definition is something that we w...
[09:39:50] (TfInfraTestApplyFailed) firing: (2) Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[09:42:56] (ToolsToolsDBWritableState) resolved: There should be exactly one writable MariaDB instance instead of 0 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBWritableState
[09:44:50] (TfInfraTestApplyFailed) firing: (2) Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[09:51:03] (PuppetAgentFailure) firing: Puppet agent failure detected on instance tools-sgewebgen-10-2 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[09:56:03] (InstanceDown) firing: Project tools instance tools-sgebastion-11 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:01:03] (InstanceDown) resolved: Project tools instance tools-sgebastion-11 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:06:03] (PuppetAgentFailure) resolved: Puppet agent failure detected on instance tools-sgewebgen-10-2 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[10:09:33] (PuppetAgentFailure) firing: (2) Puppet agent failure detected on instance tools-sgewebgen-10-2 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[10:42:30] (CR) Jbond: [C: +1] "lgtm" [labs/private] - https://gerrit.wikimedia.org/r/967519 (https://phabricator.wikimedia.org/T343486) (owner: Dwisehaupt)
[11:03:39] (CR) Nikerabbit: [V: +2] Localisation updates from https://translatewiki.net. [labs/tools/massmailer] - https://gerrit.wikimedia.org/r/967183 (owner: L10n-bot)
[11:03:55] (CR) Nikerabbit: [V: +2] Localisation updates from https://translatewiki.net. [labs/tools/watch-translations] - https://gerrit.wikimedia.org/r/967184 (owner: L10n-bot)
[11:28:28] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[11:30:13] (DiskSpace) firing: Disk space cloudbackup1004:9100:/ 5.956% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[11:37:45] (CR) Nikerabbit: [V: +2] Localisation updates from https://translatewiki.net. [labs/tools/massmailer] - https://gerrit.wikimedia.org/r/967892 (owner: L10n-bot)
[11:38:03] (CR) Nikerabbit: [V: +2] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - https://gerrit.wikimedia.org/r/967890 (owner: L10n-bot)
[11:38:18] (CR) Nikerabbit: [V: +2] Localisation updates from https://translatewiki.net. [labs/tools/watch-translations] - https://gerrit.wikimedia.org/r/967895 (owner: L10n-bot)
[11:45:13] (DiskSpace) resolved: Disk space cloudbackup1004:9100:/ 5.955% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[11:46:19] Quarry: Quarry not restarting off main branch - https://phabricator.wikimedia.org/T349603 (rook)
[11:48:48] Quarry: Quarry not restarting off main branch - https://phabricator.wikimedia.org/T349603 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/quarry/pull/30
[11:48:57] vivian-rook opened https://github.com/toolforge/quarry/pull/30
[11:49:37] Quarry: Remove quarry.wsgi on move to k8s - https://phabricator.wikimedia.org/T349605 (rook)
[11:50:06] Quarry: Move quarry to magnum - https://phabricator.wikimedia.org/T349029 (rook)
[11:50:09] Quarry: Remove quarry.wsgi on move to k8s - https://phabricator.wikimedia.org/T349605 (rook)
[11:50:36] Quarry: Quarry not restarting off main branch - https://phabricator.wikimedia.org/T349603 (rook) T349605 created and linked to T349029
[11:55:17] Quarry: Quarry not restarting off main branch - https://phabricator.wikimedia.org/T349603 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/30
[11:55:23] vivian-rook closed https://github.com/toolforge/quarry/pull/30
[11:59:54] Quarry: Quarry not restarting off main branch - https://phabricator.wikimedia.org/T349603 (rook) Open→Resolved
[12:00:59] cloud-services-team, MediaWiki-Engineering: Get platform engineering team green light for Cloud NAT to wikis change - https://phabricator.wikimedia.org/T273738 (Bmueller) @Andrew @nskaggs - what's the status of this project? - Do you still need this? - if yes, what would be the timeline? Thanks!
[12:04:40] Grid-Engine-to-K8s-Migration: Migrate rebot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319996 (komla) Open→Resolved
[12:08:39] (OpenstackAPIResponse) firing: (6) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[12:11:18] Tool-Pageviews, Data-Engineering, Data Products (Data Products (Sprint 03)): Mediarequests returning "file not found" for filenames with specific characters - https://phabricator.wikimedia.org/T347899 (WDoranWMF)
[12:17:46] Tool-Pageviews, Data-Engineering, Data Products (Sprint 02): Mediarequests returning "file not found" for filenames with specific characters - https://phabricator.wikimedia.org/T347899 (WDoranWMF)
[12:18:50] Tool-Pageviews, Data-Engineering, Data Products (Sprint 02): Mediarequests returning "file not found" for filenames with specific characters - https://phabricator.wikimedia.org/T347899 (WDoranWMF)
[12:19:22] Tool-Pageviews, Data-Engineering, Data Products (Sprint 02): Mediarequests returning "file not found" for filenames with specific characters - https://phabricator.wikimedia.org/T347899 (hnowlan) This change has been deployed, and 404 errors have greatly dropped off. Please update if you see any persi...
[12:19:29] Grid-Engine-to-K8s-Migration: Migrate ytcleaner from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320199 (komla) >>! In T320199#8337416, @Mbch331 wrote: > I need to adjust my script so it sends mail using the smtp server for tools. > And I need to migrate my wget command...
[12:34:33] (PuppetAgentFailure) resolved: Puppet agent failure detected on instance tools-sgeweblight-10-28 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[12:36:03] (PuppetAgentStaleLastRun) firing: (2) Last Puppet run was over 24 hours ago on instance quarry-dev-03 in project quarry - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[12:37:03] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance toolsbeta-harbor-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[12:43:31] Cloud-VPS, cloud-services-team, Sustainability (Incident Followup): Monitoring for the main project-proxy instance going down - https://phabricator.wikimedia.org/T316981 (taavi) Open→Resolved
[12:43:37] Cloud-VPS, cloud-services-team (Kanban): cloud vps web proxy is down - https://phabricator.wikimedia.org/T316975 (taavi)
[12:43:39] cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, Epic, and 2 others: Streamline WMCS Alerting and Paging - https://phabricator.wikimedia.org/T313444 (taavi)
[12:44:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[12:46:37] (CephSlowOps) firing: Ceph cluster in eqiad has 1 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[12:46:42] cloud-services-team: CephSlowOps Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T349502 (phaultfinder)
[12:50:57] Quarry, Patch-For-Review: Create minikube deploy for quarry - https://phabricator.wikimedia.org/T301469 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/28
[12:51:37] (CephSlowOps) resolved: Ceph cluster in eqiad has 4 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[12:51:41] Quarry: Update helm for quarry on pr - https://phabricator.wikimedia.org/T349031 (rook)
[12:51:43] Quarry, Patch-For-Review: Create minikube deploy for quarry - https://phabricator.wikimedia.org/T301469 (rook) Open→Resolved
[12:55:47] PAWS: Remove old cluster - https://phabricator.wikimedia.org/T349551 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/343
[12:55:56] vivian-rook opened https://github.com/toolforge/paws/pull/343
[13:03:39] PAWS: Remove old cluster - https://phabricator.wikimedia.org/T349551 (rook) Open→Resolved
[13:03:45] PAWS: Remove old cluster - https://phabricator.wikimedia.org/T349551 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/343
[13:03:53] vivian-rook closed https://github.com/toolforge/paws/pull/343
[13:10:58] Toolforge (Toolforge iteration 01): [gitlab,toolforge-deploy] Create a process to open an MR to toolforge-deploy when a new release of a component happens - https://phabricator.wikimedia.org/T347392 (Raymond_Ndibe) @dcaro we should mark this as resolved no?
[13:12:58] Toolforge (Toolforge iteration 01): Add `toolforge envvars quota` - https://phabricator.wikimedia.org/T341087 (Raymond_Ndibe) a:Raymond_Ndibe
[13:14:40] Toolforge (Toolforge iteration 01), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [builds-api] catch harbor timeout when creating repository - https://phabricator.wikimedia.org/T345903 (Raymond_Ndibe) a:Raymond_Ndibe
[13:16:03] (PuppetAgentFailure) firing: Puppet agent failure detected on instance tools-sgeweblight-10-28 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[14:11:42] Cloud-VPS, cloud-services-team, Observability-Alerting, SRE-OnFire, and 2 others: monitoring: find out how we could have been paged for outage "Multiple CloudVPS instances lost their IPs" - https://phabricator.wikimedia.org/T347694 (lmata)
[14:18:46] Cloud-VPS, cloud-services-team, Observability-Metrics, User-fgiunchedi: Move labs/wmcs (OpenStack) Prometheus instance off cloudmetrics hosts to prometheus* hosts - https://phabricator.wikimedia.org/T336854 (fgiunchedi)
[15:28:28] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[15:30:15] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[15:31:02] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[15:33:05] PROBLEM - Check systemd state on cloudservices1005 is CRITICAL: CRITICAL - degraded: The following units failed: labs-ip-alias-dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:33:15] PROBLEM - Check systemd state on cloudservices1006 is CRITICAL: CRITICAL - degraded: The following units failed: labs-ip-alias-dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:36:03] (PuppetAgentStaleLastRun) firing: (2) Last Puppet run was over 24 hours ago on instance quarry-dev-03 in project quarry - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[15:36:33] (SystemdUnitDown) firing: The service unit labs-ip-alias-dump.service is in failed status on host cloudservices1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:37:03] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance toolsbeta-harbor-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[15:37:33] (SystemdUnitDown) firing: The service unit labs-ip-alias-dump.service is in failed status on host cloudservices1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:44:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[16:06:03] (PuppetAgentFailure) resolved: Puppet agent failure detected on instance tools-sgeweblight-10-28 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[16:08:40] (OpenstackAPIResponse) firing: (6) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[16:30:15] RECOVERY - Check systemd state on cloudservices1005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:30:29] RECOVERY - Check systemd state on cloudservices1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:31:33] (SystemdUnitDown) resolved: The service unit labs-ip-alias-dump.service is in failed status on host cloudservices1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[16:32:33] (SystemdUnitDown) resolved: The service unit labs-ip-alias-dump.service is in failed status on host cloudservices1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:12:16] Cloud-VPS, Data-Services, cloud-services-team, User-Marostegui: Support Trove + Swift integration - https://phabricator.wikimedia.org/T349651 (Andrew)
[17:16:01] Quarry: Deploy magnum cluster for quarry - https://phabricator.wikimedia.org/T349032 (rook) a:rook
[17:44:50] Toolforge (Toolforge iteration 01): [tools,harbor] Cleanup old production images - https://phabricator.wikimedia.org/T348538 (Raymond_Ndibe) did a little research on this and from what I was able to find, there is currently no way to delete an image in a project with `immutable` policy set without first dis...
[17:46:07] Toolforge (Toolforge iteration 01): [tools,harbor] Cleanup old production images - https://phabricator.wikimedia.org/T348538 (Raymond_Ndibe) wdyt @dcaro, @Slst2020 ?
[17:51:02] Toolforge Build Service (Beta release), cloud-services-team (FY2022/2023-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 2 others: [buildservice-api] Create a build POST endpoint to start a new build - https://phabricator.wikimedia.org/T337218 (Raymond_Ndibe)
[17:51:09] Toolforge (Toolforge iteration 01): Add `toolforge envvars quota` - https://phabricator.wikimedia.org/T341087 (Raymond_Ndibe) Open→In progress
[17:51:14] Toolforge (Toolforge iteration 01), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [builds-api] catch harbor timeout when creating repository - https://phabricator.wikimedia.org/T345903 (Raymond_Ndibe) Open→In progress
[18:36:03] (PuppetAgentStaleLastRun) firing: (2) Last Puppet run was over 24 hours ago on instance quarry-dev-03 in project quarry - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[18:37:03] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance toolsbeta-harbor-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[18:38:40] (OpenstackAPIResponse) firing: (7) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[18:44:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[19:00:45] VPS-project-Wikistats: Add 'wikitide' to wikistats - https://phabricator.wikimedia.org/T349660 (Reception123)
[19:02:21] VPS-project-Wikistats, User-RhinosF1: Add 'wikitide' to wikistats - https://phabricator.wikimedia.org/T349660 (RhinosF1) p:Triage→Medium
[19:03:42] VPS-project-Wikistats, collaboration-services, User-RhinosF1: Add 'wikitide' to wikistats - https://phabricator.wikimedia.org/T349660 (RhinosF1) @arnoldokoth: This will need puppet work too
[19:04:38] VPS-project-Wikistats, collaboration-services, User-RhinosF1: Add 'wikitide' to wikistats - https://phabricator.wikimedia.org/T349660 (RhinosF1)
[19:06:54] VPS-project-Wikistats, collaboration-services, User-RhinosF1: Add 'wikitide' to wikistats - https://phabricator.wikimedia.org/T349660 (RhinosF1) p:Medium→Low @Reception123: given the Miraheze setup is a broken mess, is anyone willing to put the engineering in to develop a proper wrapper aroun...
[19:08:40] (OpenstackAPIResponse) firing: (8) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[19:23:24] VPS-project-Wikistats, collaboration-services, User-RhinosF1: Add 'wikitide' to wikistats - https://phabricator.wikimedia.org/T349660 (Agent_Isai) What exactly is broken? We use WikiDiscover too as Miraheze as mentioned.
[19:24:29] VPS-project-Wikistats, collaboration-services, User-RhinosF1: Add 'wikitide' to wikistats - https://phabricator.wikimedia.org/T349660 (RhinosF1) The current import scripts do not have any method to remove closed/deleted/made private wikis
[19:28:28] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[19:37:40] (PS1) Anne Tomasevich: releases: Bump Codex to 1.0.0 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/968347
[19:46:42] (CR) VolkerE: [C: +2] releases: Bump Codex to 1.0.0 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/968347 (owner: Anne Tomasevich)
[21:37:03] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance toolsbeta-harbor-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[21:44:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[21:50:55] (CR) Jforrester: [C: +2] releases: Bump Codex to 1.0.0 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/968347 (owner: Anne Tomasevich)
[21:51:45] (Merged) jenkins-bot: releases: Bump Codex to 1.0.0 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/968347 (owner: Anne Tomasevich)
[22:37:49] Grid-Engine-to-K8s-Migration, Community-Tech, Event Metrics: Migrate grantmetrics from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319779 (MusikAnimal)
[23:08:40] (OpenstackAPIResponse) firing: (8) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[23:23:48] Grid-Engine-to-K8s-Migration, Community-Tech, Event Metrics: Migrate grantmetrics from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319779 (MusikAnimal) Stalled→Open No longer stalled. I think for this task, a normal Toolforge job will work, since T254636...
[23:28:28] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[23:51:38] Grid-Engine-to-K8s-Migration, Community-Tech, Event Metrics: Migrate grantmetrics from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319779 (MusikAnimal) As per above I was hoping to use a simple Toolforge job, but it looks like there's something amiss with the `ma...