[00:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[00:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[00:16:04] (TfInfraTestDestroyFailed) resolved: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[00:19:46] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[00:20:34] (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[01:49:30] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[02:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[02:33:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[02:49:30] (OpenstackAPIResponse) firing: (4) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[03:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[03:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[04:59:02] Tool-bub2: Redesign the FAQs page - https://phabricator.wikimedia.org/T340385 (Ed-Gah) Sir @PMenon-WMF. I have gotten acquainted with the codebase. Should I go ahead with the implementation of this task with respect to what is written on the task description?
[05:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[05:33:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[06:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[06:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[06:45:21] Quarry, cloud-services-team (FY2023/2024-Q1), superset.wmcloud.org: Replace Quarry with an installation of Superset - https://phabricator.wikimedia.org/T169452 (Aklapper) > So is it correct that we're looking for a new maintainer, but only in the capacity of migrating all usage of Quarry to Superset?...
[06:49:46] (OpenstackAPIResponse) firing: (4) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[07:11:55] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.drain_node
[07:12:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:12:14] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99)
[07:12:16] Cloud-VPS, cloud-services-team, Infrastructure-Foundations, SRE, and 3 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (dcaro)
[07:12:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:15:35] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.drain_node
[07:15:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:21:42] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99)
[07:21:44] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.drain_node
[07:21:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:21:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:24:11] Toolforge (Toolforge iteration 00), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-User, Cloud-Services-Worktype-Maintenance, User-dcaro: [cert-manager] Pods are not being restarted after the certificate renewal - https://phabricator.wikimedia.org/T346130 (CodeReviewBot) dcaro me...
[07:26:00] Toolforge, cloud-services-team, GitLab (Pipeline Services Migration🐀), Release-Engineering-Team (Priority Backlog 📥): Move Toolforge PipelineLib repositories to GitLab - https://phabricator.wikimedia.org/T334399 (dcaro)
[07:26:06] Toolforge (Toolforge iteration 00), cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project: [toolforge] repositories move to gitlab - https://phabricator.wikimedia.org/T327057 (dcaro)
[07:26:48] Toolforge (Toolforge iteration 00), cloud-services-team (FY2023/2024-Q1), Goal, Patch-For-Review, User-aborrero: [toolforge] Move all the components to the gitlab ci/cd flow - https://phabricator.wikimedia.org/T341084 (dcaro) In progress→Resolved
[07:37:33] !log toolsbeta dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
[07:37:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[07:38:01] !log toolsbeta dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
[07:38:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[07:40:01] !log tools dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
[07:40:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[07:40:32] !log tools dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
[07:40:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[07:42:30] Toolforge (Toolforge iteration 00), Patch-For-Review: [tbs][buildpacks] ensure apt buildpack runs before others - https://phabricator.wikimedia.org/T347985 (Slst2020) Open→In progress
[07:48:00] Toolforge (Toolforge iteration 00): [tbs][builder] provide a way to remove buildpacks - https://phabricator.wikimedia.org/T348110 (Slst2020)
[07:56:10] Toolforge (Toolforge iteration 00), Patch-For-Review: [tbs][buildpacks] ensure apt buildpack runs before others - https://phabricator.wikimedia.org/T347985 (CodeReviewBot) sstefanova merged https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/13 inject-buildpacks: ensure th...
[08:16:33] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): Integrate Wikimedia Ecosystem within BUB2 tool - https://phabricator.wikimedia.org/T346386 (Joannetich) hello everyone i am sorry to ask this here despite of the rules @wassan.anmol117 about the "Integrate Wikimedia...
[08:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[08:30:58] Toolforge (Toolforge iteration 00), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-User, Cloud-Services-Worktype-Maintenance, User-dcaro: [cert-manager] Pods are not being restarted after the certificate renewal - https://phabricator.wikimedia.org/T346130 (CodeReviewBot) dcaro me...
[08:33:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[08:40:23] Toolforge (Toolforge iteration 00): [tbs][builder] provide a way to remove buildpacks - https://phabricator.wikimedia.org/T348110 (dcaro) We don't have a way to specifying the buildpacks that you want, how does this happen? For example, once you remove the Aptfile from your repository, the apt buildpack wil...
[08:42:20] Toolforge (Toolforge iteration 00), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-User, Cloud-Services-Worktype-Maintenance, User-dcaro: [cert-manager] Pods are not being restarted after the certificate renewal - https://phabricator.wikimedia.org/T346130 (dcaro) In progress→...
[08:43:19] Cloud-VPS, cloud-services-team, Data-Platform-SRE, ops-eqiad, User-aborrero: Move cloudvirt-wdqs hosts - https://phabricator.wikimedia.org/T346948 (taavi) a:Jclark-ctr Hi! Can we please have `cloudvirt-wdqs100[1-3]` moved to the WMCS racks, preferrably `E4` or `F4`? They will all need a s...
[08:43:57] Toolforge (Toolforge iteration 00): [tbs][builder] provide a way to remove buildpacks - https://phabricator.wikimedia.org/T348110 (Slst2020) Don't have any logs at hand, but can try to reproduce. During testing, I removed the nodejs buildpack I had previously injected, but it was still included in the next b...
[08:46:56] Toolforge (Toolforge iteration 00): [tbs][builder] provide a way to remove buildpacks - https://phabricator.wikimedia.org/T348110 (dcaro) Removed it from where? From the tekton pipeline? I usually run two different branches of the same code, with and without Aptfile to test the apt buildpack selection (http...
[08:47:36] Toolforge (Toolforge iteration 00), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [envvars-cli] build packages using gitlab ci - https://phabricator.wikimedia.org/T347580 (dcaro) Open→In progress
[08:47:42] Toolforge (Toolforge iteration 00), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [envvars-cli] build packages using gitlab ci - https://phabricator.wikimedia.org/T347580 (dcaro) a:dcaro
[08:50:44] Toolforge (Toolforge iteration 00): [tbs][builder] provide a way to remove buildpacks - https://phabricator.wikimedia.org/T348110 (Slst2020) >>! In T348110#9223854, @dcaro wrote: > Removed it from where? From the tekton pipeline? Yes. I also had some other issues with helm sometimes not actually updating k8...
[09:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[09:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[09:39:34] Tool-bub2: Redesign the FAQs page - https://phabricator.wikimedia.org/T340385 (Ed-Gah) Open→In progress
[09:39:39] Tool-bub2: Redesign the UI to be more minimalistic and cleaner - https://phabricator.wikimedia.org/T340387 (Ed-Gah)
[09:47:06] Toolforge (Toolforge iteration 00): [tbs][builder] provide a way to remove buildpacks - https://phabricator.wikimedia.org/T348110 (dcaro) >>! In T348110#9223864, @Slst2020 wrote: >>>! In T348110#9223854, @dcaro wrote: >> Removed it from where? From the tekton pipeline? > > Yes. I also had some other issues...
[09:49:32] Toolforge (Toolforge iteration 00): [tbs][builder] provide a way to remove buildpacks - https://phabricator.wikimedia.org/T348110 (Slst2020) >>! In T348110#9223854, @dcaro wrote: > Ack, helm will not update changes you did manually to the objects, it will check against it's 'recorded state', that is a secret...
[10:08:02] Cloud Services Proposals, Toolforge (Toolforge iteration 00), cloud-services-team, Cloud-Services-Origin-Team, and 2 others: Decision request – Toolforge (re)architecture - https://phabricator.wikimedia.org/T346153 (dcaro) I'd go for option 3, with a focus on getting a unified openapi definition...
[10:21:33] Tool-bub2, Outreach-Programs-Projects, Outreachy (Round 27): Add search bar in queue - https://phabricator.wikimedia.org/T315134 (Okerekechinweotito) @wassan.anmol117 @Aklapper I have opened a PR to fix this issue. Opened here - [[ https://github.com/coderwassananmol/BUB2/pull/180 | implement sear...
[10:29:30] (OpenstackAPIResponse) firing: (5) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[11:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[11:33:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[11:42:06] Striker, Release-Engineering-Team (Priority Backlog 📥): Striker-created Diffusion mirrors of GitLab repos are empty (due to master vs main branch name mismatch) - https://phabricator.wikimedia.org/T348131 (Aklapper)
[11:42:19] Striker, Release-Engineering-Team (Priority Backlog 📥): Striker-created Diffusion mirrors of GitLab repos are empty (due to master vs main branch name mismatch) - https://phabricator.wikimedia.org/T348131 (Aklapper) p:Triage→Medium a:Aklapper
[11:43:19] (PS1) Aklapper: Set default branch to "main" for GitLab repos mirrored to Diffusion [labs/striker] - https://gerrit.wikimedia.org/r/963292 (https://phabricator.wikimedia.org/T348131)
[12:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[12:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[12:08:07] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99)
[12:08:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:09:19] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.drain_node
[12:09:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:10:21] VPS-Projects, WMDE-TechWish-Maintenance-2023: Scraper: destroy Cloud VPS runner instance - https://phabricator.wikimedia.org/T345411 (thiemowmde)
[12:15:33] VPS-Projects, WMDE-TechWish-Maintenance-2023: Scraper: destroy Cloud VPS runner instance - https://phabricator.wikimedia.org/T345411 (thiemowmde)
[12:17:43] (PS1) Majavah: Add support for querying logs [cloud/toolforge/jobs-framework-cli] - https://gerrit.wikimedia.org/r/963295 (https://phabricator.wikimedia.org/T336057)
[12:54:30] (OpenstackAPIResponse) firing: (6) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[12:55:56] cloud-services-team, User-aborrero: cloudgw improvements - https://phabricator.wikimedia.org/T347469 (aborrero)
[12:56:10] cloud-services-team, User-aborrero: cloudgw improvements - https://phabricator.wikimedia.org/T347469 (aborrero) p:Triage→High
[12:57:09] Toolforge Build Service (Beta release), User-Raymond_Ndibe, User-dcaro: Add a way to wait for a Toolforge build to finish - https://phabricator.wikimedia.org/T337043 (dcaro)
[12:57:11] Toolforge (Toolforge iteration 00), Patch-For-Review, User-dcaro: `toolforge build logs`: add follow options - https://phabricator.wikimedia.org/T339922 (dcaro) Stalled→In progress
[12:57:28] Toolforge (Toolforge iteration 00), cloud-services-team, Patch-For-Review: Add commands to `webservice` and `jobs` to query logs from Kubernetes - https://phabricator.wikimedia.org/T336057 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/23 Ad...
[12:57:48] Toolforge (Toolforge iteration 00), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [builds-cli] build packages using gitlab ci - https://phabricator.wikimedia.org/T347579 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/...
[13:02:44] Toolforge (Toolforge iteration 00), cloud-services-team, Patch-For-Review: Add commands to `webservice` and `jobs` to query logs from Kubernetes - https://phabricator.wikimedia.org/T336057 (CodeReviewBot) taavi opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_request...
[13:03:35] Toolforge (Toolforge iteration 00), cloud-services-team, Patch-For-Review: Add commands to `webservice` and `jobs` to query logs from Kubernetes - https://phabricator.wikimedia.org/T336057 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_request...
[13:03:57] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[13:04:08] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[13:05:37] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[13:05:50] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[13:08:31] Cloud-VPS, Infrastructure-Foundations, SRE, netops: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (cmooney) p:Triage→Low
[13:08:40] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'python3-toolforge-weld' version '1.3.0'
[13:08:52] cloud-services-team, Patch-For-Review, User-aborrero: cloudgw improvements - https://phabricator.wikimedia.org/T347469 (cmooney)
[13:08:55] Cloud-VPS, Infrastructure-Foundations, SRE, netops: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (cmooney)
[13:08:56] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'python3-toolforge-weld' version '1.3.0'
[13:09:17] (CR) David Caro: [C: +1] "A couple questions, bun nothing blocker" [cloud/toolforge/jobs-framework-cli] - https://gerrit.wikimedia.org/r/963295 (https://phabricator.wikimedia.org/T336057) (owner: Majavah)
[13:10:25] cloud-services-team, Patch-For-Review, User-aborrero: cloudgw improvements - https://phabricator.wikimedia.org/T347469 (aborrero)
[13:12:07] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW
[13:13:01] (PS2) Majavah: Add support for querying logs [cloud/toolforge/jobs-framework-cli] - https://gerrit.wikimedia.org/r/963295 (https://phabricator.wikimedia.org/T336057)
[13:13:31] (CR) Majavah: Add support for querying logs (2 comments) [cloud/toolforge/jobs-framework-cli] - https://gerrit.wikimedia.org/r/963295 (https://phabricator.wikimedia.org/T336057) (owner: Majavah)
[13:16:25] Cloud Services Proposals, Toolforge (Toolforge iteration 00), cloud-services-team, Cloud-Services-Origin-Team, and 2 others: Decision request – Toolforge (re)architecture - https://phabricator.wikimedia.org/T346153 (fnegri) I think I have a //slight// preference for option 1, as it seems a good i...
[13:16:42] Toolforge (Toolforge iteration 00), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [envvars-cli] build packages using gitlab ci - https://phabricator.wikimedia.org/T347580 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org...
[13:17:14] Toolforge (Toolforge iteration 00), Patch-For-Review: [tbs][buildpacks] ensure apt buildpack runs before others - https://phabricator.wikimedia.org/T347985 (CodeReviewBot) sstefanova opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/109 builds-builder: bump to 0...
[13:34:08] Cloud-VPS, Infrastructure-Foundations, SRE, netops, Patch-For-Review: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (cmooney)
[13:37:29] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade codfw hosts to bookworm - https://phabricator.wikimedia.org/T345810 (fnegri)
[13:37:31] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [codfw1dev] DNS fails to resolve some addresses - https://phabricator.wikimedia.org/T347861 (fnegri) In progress→Resolved @aborrero I am resolving this task, please reopen if you still encounter issues.
[13:37:37] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar
[13:37:45] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node
[13:38:25] !log fnegri@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99)
[13:40:01] !log admin fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
[13:42:47] cloud-services-team, Data-Platform-SRE, Dumps-Generation, Patch-For-Review: clouddumps100[12] puppet alert: "Puppet performing a change on every puppet run" - https://phabricator.wikimedia.org/T346165 (jbond) > So why are some returning with uppercase padded zeros, while others are returned witho...
[13:45:44] (SystemdUnitCrashLoop) firing: (3) cinder-api.service crashloop on cloudcontrol2004-dev:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[13:49:56] !log admin fran@wmf3169 END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) (T341285)
[13:50:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:50:03] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[13:54:07] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), Goal, Patch-For-Review: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285 (fnegri) The cookbook failed with the following error ` Database expansion failed. Database expansion should have brought the dat...
[13:59:44] (SystemdUnitCrashLoop) firing: (2) neutron-api.service crashloop on cloudcontrol2005-dev:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:08:49] (PS1) Andrew Bogott: Container dashboard: inject polite error page for projects w/out object support [openstack/horizon/horizon] - https://gerrit.wikimedia.org/r/963323 (https://phabricator.wikimedia.org/T341509)
[14:09:44] (SystemdUnitCrashLoop) firing: (2) neutron-api.service crashloop on cloudcontrol2005-dev:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:10:44] (SystemdUnitCrashLoop) firing: (3) cinder-api.service crashloop on cloudcontrol2004-dev:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:15:44] (SystemdUnitCrashLoop) firing: (3) cinder-api.service crashloop on cloudcontrol2004-dev:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:21:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[14:23:32] cloud-services-team (Hardware), DC-Ops, Data-Platform-SRE, SRE, ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin1001 for host cloudelastic1007.e...
[14:29:44] (SystemdUnitCrashLoop) firing: (3) cinder-api.service crashloop on cloudcontrol2005-dev:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:29:45] PROBLEM - puppet last run on cloudcontrol1005 is CRITICAL: CRITICAL: Puppet last ran 20 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[14:33:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[14:33:13] PROBLEM - puppet last run on cloudcontrol1006 is CRITICAL: CRITICAL: Puppet last ran 21 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[14:34:44] (SystemdUnitCrashLoop) firing: (3) cinder-api.service crashloop on cloudcontrol2005-dev:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:38:19] RECOVERY - puppet last run on cloudcontrol1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[14:39:44] (SystemdUnitCrashLoop) firing: (3) cinder-api.service crashloop on cloudcontrol2005-dev:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:39:51] RECOVERY - puppet last run on cloudcontrol1005 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[14:40:33] cloud-services-team (Hardware), DC-Ops, Data-Platform-SRE, SRE, ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538 (bking) Open→In progress p:Medium→Low a:bking
[14:40:44] (SystemdUnitCrashLoop) firing: (3) cinder-api.service crashloop on cloudcontrol2004-dev:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:40:50] !log admin fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
[14:40:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:40:55] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[14:41:11] !log admin fran@wmf3169 END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) (T341285)
[14:41:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:41:42] cloud-services-team (Hardware), DC-Ops, Data-Platform-SRE, SRE, ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538 (bking) Taking this back, as I was able to get the host to boot by changing the boot option for the 2nd NIC interfac...
[14:42:44] 10Cloud-VPS, 10cloud-services-team (FY2023/2024-Q1), 10Goal, 10Patch-For-Review: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285 (10fnegri) Running the failing command manually worked just fine: ` root@cloudcontrol2001-dev:~# glance-manage db sync 2023-10-04 1... [14:44:41] !log admin fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285) [14:44:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [14:51:53] 10Cloud-VPS (Project-requests): Request creation of impact-visualizer VPS project - https://phabricator.wikimedia.org/T347905 (10Andrew) +1 sgtm [14:53:56] 10Cloud-VPS (Project-requests): Request creation of impact-visualizer VPS project - https://phabricator.wikimedia.org/T347905 (10taavi) +1, although due to {T341509} we would prefer to avoid using `-`s in project names. [14:54:16] 10Toolforge (Quota-requests): Request increased quota for deltabot Toolforge tool - https://phabricator.wikimedia.org/T347951 (10taavi) +1 [14:54:53] !log admin fran@wmf3169 END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) (T341285) [14:54:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [14:54:59] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285 [14:55:59] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder [14:56:13] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder [14:56:17] 10cloud-services-team (Hardware), 10DC-Ops, 10Data-Platform-SRE, 10SRE, 10ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin1001 for host 
cloudelastic1007.eqiad... [14:56:18] !log impact_visualizer dcaro@urcuchillay START - Cookbook wmcs.vps.create_project for project impact_visualizer in eqiad1 (T347905) [14:56:18] wm-bot2: Unknown project "impact_visualizer" [14:56:19] T347905: Request creation of impact-visualizer VPS project - https://phabricator.wikimedia.org/T347905 [14:56:22] !log impact_visualizer dcaro@urcuchillay END (FAIL) - Cookbook wmcs.vps.create_project (exit_code=99) for project impact_visualizer in eqiad1 (T347905) [14:56:23] wm-bot2: Unknown project "impact_visualizer" [14:57:00] Cloud-VPS (Project-requests): Request creation of impact-visualizer VPS project - https://phabricator.wikimedia.org/T347905 (Ragesoss) No objection to `impactvisualizer` as the name, if that avoids a possible problem. [14:58:57] !log impactvisualizer dcaro@urcuchillay START - Cookbook wmcs.vps.create_project for project impactvisualizer in eqiad1 (T347905) [14:58:57] wm-bot2: Unknown project "impactvisualizer" [14:58:59] !log impactvisualizer dcaro@urcuchillay END (FAIL) - Cookbook wmcs.vps.create_project (exit_code=99) for project impactvisualizer in eqiad1 (T347905) [14:58:59] wm-bot2: Unknown project "impactvisualizer" [15:00:16] !log impactvisualizer dcaro@urcuchillay START - Cookbook wmcs.vps.create_project for project impactvisualizer in eqiad1 (T347905) [15:00:18] !log impactvisualizer dcaro@urcuchillay END (FAIL) - Cookbook wmcs.vps.create_project (exit_code=99) for project impactvisualizer in eqiad1 (T347905) [15:00:20] cloud-services-team (Hardware), DC-Ops, Data-Platform-SRE, SRE, ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin1001 for host cloudelastic1007.e...
[15:00:23] wm-bot2: Unknown project "impactvisualizer" [15:00:23] wm-bot2: Unknown project "impactvisualizer" [15:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed [15:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown [15:03:22] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW [15:06:51] Toolforge (Quota-requests), User-dcaro: Request increased quota for deltabot Toolforge tool - https://phabricator.wikimedia.org/T347951 (dcaro) [15:06:55] Toolforge (Quota-requests), User-dcaro: Request increased quota for deltabot Toolforge tool - https://phabricator.wikimedia.org/T347951 (dcaro) a:dcaro [15:07:01] Cloud-VPS (Project-requests): Request creation of impact-visualizer VPS project - https://phabricator.wikimedia.org/T347905 (dcaro) a:dcaro [15:07:59] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (Jclark-ctr) [15:16:13] Tool-bub2, Outreach-Programs-Projects, Outreachy (Round 27): Use API:EmailUser to send Emails to the users - https://phabricator.wikimedia.org/T338267 (Joannetich) hello please can this issue be assigned to me [15:17:28] cloud-services-team (Hardware), DC-Ops, Data-Platform-SRE, SRE, ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538
(ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin1001 for host cloudelastic1007.eqiad... [15:21:29] cloud-services-team (Hardware), DC-Ops, Data-Platform-SRE, SRE, ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin1001 for host cloudelastic1007.e... [15:22:34] (ProbeDown) firing: (2) Service toolsbeta-test-k8s-haproxy-3:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [15:23:11] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), Goal: keystone: segfaults in debian bookworm - https://phabricator.wikimedia.org/T348157 (aborrero) [15:34:30] (OpenstackAPIResponse) firing: (6) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [15:37:35] cloud-services-team (Hardware), DC-Ops, Data-Platform-SRE, SRE, ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin1001 for host cloudelastic1007.eqiad... [15:40:03] cloud-services-team (Hardware), DC-Ops, Data-Platform-SRE, SRE, ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin1001 for host cloudelastic1007.e...
[15:55:51] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99) [15:55:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [16:07:31] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad, Patch-For-Review: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1063.eqiad.wmnet with OS b... [16:07:35] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad, Patch-For-Review: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1062.eqiad.wmnet with OS b... [16:07:41] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad, Patch-For-Review: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1064.eqiad.wmnet with OS b... [16:07:47] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad, Patch-For-Review: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1065.eqiad.wmnet with OS b...
[16:09:57] Toolforge Build Service, cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 2 others: [tbs.beta] Create a toolforge build service beta release - https://phabricator.wikimedia.org/T267374 (dcaro) [16:10:36] Toolforge Build Service (Beta release), User-Raymond_Ndibe, User-dcaro: Add a way to wait for a Toolforge build to finish - https://phabricator.wikimedia.org/T337043 (dcaro) Open→In progress [16:10:56] Toolforge Build Service (Beta release), cloud-services-team (FY2023/2024-Q1), Goal: Toolforge Build Service Beta Rollout To Selected Users - https://phabricator.wikimedia.org/T335249 (dcaro) In progress→Resolved [16:17:22] !log toolsbeta dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api [16:17:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [16:17:49] !log toolsbeta dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api [16:17:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [16:20:21] !log tools dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api [16:20:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [16:20:53] !log tools dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api [16:20:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [16:23:21] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1066.eqiad.wmnet with OS bullseye [16:23:25] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad:
Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1067.eqiad.wmnet with OS bullseye [16:29:23] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), Goal: keystone: segfaults in debian bookworm - https://phabricator.wikimedia.org/T348157 (fnegri) The file `/var/log/keystone/keystone.log` haven't been updated in the past 2 hours, so the segfault is not happening on each restart, and might have happe... [16:39:11] cloud-services-team (Hardware), DC-Ops, Data-Platform-SRE, SRE, ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538 (bking) Hello DC Ops, I've confirmed that our new partman recipe works in T342463 , but the reimage for `cloudelas... [16:39:34] cloud-services-team (Hardware), DC-Ops, Data-Platform-SRE, SRE, ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538 (bking) p:Low→Medium a:bking→None [16:52:50] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.drain_node [16:52:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [16:53:05] !log toolsbeta dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api [16:53:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [16:53:32] !log toolsbeta dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api [16:53:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [16:54:08] !log tools dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api [16:54:10] Tool-bub2: Fix README.md - https://phabricator.wikimedia.org/T344123 (Akanksha.t05) Open→In progress
[16:54:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [16:54:39] !log tools dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api [16:54:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [16:55:04] cloud-services-team (Hardware), DC-Ops, Data-Platform-SRE, SRE, ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin1001 for host cloudelastic1007.eqiad... [17:01:37] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar [17:16:20] cloud-services-team, Patch-For-Review, User-aborrero: Open swift port (28080) to the public internet - https://phabricator.wikimedia.org/T341380 (Andrew) I've moved this service to port 443, which is open in eqiad1.
[17:21:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [17:23:47] Cloud-VPS, SRE: cloudlb2001-dev and cloudlb2002-dev connected at different speeds - https://phabricator.wikimedia.org/T348173 (cmooney) p:Triage→Low [17:27:42] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1064.eqiad.wmnet with OS bullseye executed with erro... [17:27:46] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1063.eqiad.wmnet with OS bullseye executed with erro... [17:30:08] cloud-services-team, Patch-For-Review, User-aborrero: Open swift port (28080) to the public internet - https://phabricator.wikimedia.org/T341380 (cmooney) Looking on one of the cloudlb hosts in codfw it doesn't look like port 443 is open to the world: `lines=10 cmooney@cloudlb2001-dev:~$ sudo iptabl...
[17:33:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [17:33:44] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1062.eqiad.wmnet with OS bullseye [17:33:47] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1064.eqiad.wmnet with OS bullseye [17:33:49] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1063.eqiad.wmnet with OS bullseye [17:33:53] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1065.eqiad.wmnet with OS bullseye [17:34:05] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1066.eqiad.wmnet with OS bullseye [17:43:40] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537
(ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1067.eqiad.wmnet with OS bullseye executed with erro... [17:52:24] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1067.eqiad.wmnet with OS bullseye [17:56:37] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW [18:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed [18:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown [18:22:34] (ProbeDown) firing: (2) Service toolsbeta-test-k8s-haproxy-3:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [18:35:37] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar [18:51:50] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (rook) [18:53:28] (PS1) Krinkle: Upgrade to toollabs-base v2, add Phan, PHP 8 compat, remove GitHub ref [labs/tools/blankpages] - https://gerrit.wikimedia.org/r/963398 [18:54:00] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1064.eqiad.wmnet with OS bullseye executed with erro... [18:54:06] (CR) Krinkle: [C: +2] Upgrade to toollabs-base v2, add Phan, PHP 8 compat, remove GitHub ref [labs/tools/blankpages] - https://gerrit.wikimedia.org/r/963398 (owner: Krinkle) [18:54:08] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1063.eqiad.wmnet with OS bullseye executed with erro... [18:54:37] (Merged) jenkins-bot: Upgrade to toollabs-base v2, add Phan, PHP 8 compat, remove GitHub ref [labs/tools/blankpages] - https://gerrit.wikimedia.org/r/963398 (owner: Krinkle) [18:58:36] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (rook) Added ` root@cloudcontrol1005:~# openstack role add --project quarry --user sd member root@cloudcontrol1005:~# openstack role add --project quarry --user sd reader root@cloudcontrol1005:~# openstack role add --project quarry...
[19:06:06] (PS1) Krinkle: Restore apmaxsize=0 parameter in API request [labs/tools/blankpages] - https://gerrit.wikimedia.org/r/963399 [19:06:08] (PS1) Krinkle: Improve 'selected namespace' styling [labs/tools/blankpages] - https://gerrit.wikimedia.org/r/963400 [19:06:17] (CR) Krinkle: [C: +2] Restore apmaxsize=0 parameter in API request [labs/tools/blankpages] - https://gerrit.wikimedia.org/r/963399 (owner: Krinkle) [19:06:21] (CR) Krinkle: [C: +2] Improve 'selected namespace' styling [labs/tools/blankpages] - https://gerrit.wikimedia.org/r/963400 (owner: Krinkle) [19:06:35] (CR) CI reject: [V: -1] Improve 'selected namespace' styling [labs/tools/blankpages] - https://gerrit.wikimedia.org/r/963400 (owner: Krinkle) [19:06:53] (Merged) jenkins-bot: Restore apmaxsize=0 parameter in API request [labs/tools/blankpages] - https://gerrit.wikimedia.org/r/963399 (owner: Krinkle) [19:10:13] Tool-bub2, Internet-Archive: Author is not being sent to Internet Archive for Google Books - https://phabricator.wikimedia.org/T348186 (wassan.anmol117) [19:11:37] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): Author is not being sent to Internet Archive for Google Books - https://phabricator.wikimedia.org/T348186 (wassan.anmol117) [19:12:39] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1067.eqiad.wmnet with OS bullseye executed with erro...
[19:16:16] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): For PDL, download and stream the PDF if available - https://phabricator.wikimedia.org/T348188 (wassan.anmol117) [19:27:45] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): Add max character limit while creating identifier in Internet Archive and remove some special characters - https://phabricator.wikimedia.org/T348192 (wassan.anmol117) [19:34:47] (OpenstackAPIResponse) firing: (5) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [19:36:55] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): Author is not being sent to Internet Archive for Google Books - https://phabricator.wikimedia.org/T348186 (Joannetich) @wassan.anmol117 are the skill set the same as js nextjs and nodejs or it is different [19:39:56] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): Author is not being sent to Internet Archive for Google Books - https://phabricator.wikimedia.org/T348186 (Joannetich) hello assign me this task [19:57:11] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): Integrate Wikimedia Ecosystem within BUB2 tool - https://phabricator.wikimedia.org/T346386 (Maryann-Onyinye) a:Robovaughan→DO-NOT-CHANGE [19:58:30] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): Integrate Wikimedia Ecosystem within BUB2 tool - https://phabricator.wikimedia.org/T346386 (Maryann-Onyinye) >>! In T346386#9217981, @Robovaughan wrote: > @wassan.anmol117 I will love to work on this task. hI @Robova...
[19:59:33] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): Integrate Wikimedia Ecosystem within BUB2 tool - https://phabricator.wikimedia.org/T346386 (Maryann-Onyinye) [20:05:37] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW [20:08:11] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): Author is not being sent to Internet Archive for Google Books - https://phabricator.wikimedia.org/T348186 (Joannetich) a:Joannetich [20:10:06] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): Add max character limit while creating identifier in Internet Archive and remove some special characters - https://phabricator.wikimedia.org/T348192 (Ed-Gah) a:Ed-Gah [20:15:40] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): Add max character limit while creating identifier in Internet Archive and remove some special characters - https://phabricator.wikimedia.org/T348192 (Maryann-Onyinye) a:Ed-Gah→DO-NOT-CHANGE [20:16:36] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): Author is not being sent to Internet Archive for Google Books - https://phabricator.wikimedia.org/T348186 (Maryann-Onyinye) a:Joannetich→DO-NOT-CHANGE [20:18:18] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack [20:18:20] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99) [20:21:03] (TfInfraTestApplyFailed) firing: Terraform failed to
apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [20:29:30] (OpenstackAPIResponse) firing: (5) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [20:33:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [20:45:57] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1062.eqiad.wmnet with OS bullseye [20:46:06] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1063.eqiad.wmnet with OS bullseye [20:46:15] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1064.eqiad.wmnet with OS bullseye [20:46:21] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host
cloudvirt1065.eqiad.wmnet with OS bullseye [20:46:27] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1066.eqiad.wmnet with OS bullseye [20:46:33] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1067.eqiad.wmnet with OS bullseye [21:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed [21:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown [21:02:39] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1062.eqiad.wmnet with OS bullseye [21:04:30] (OpenstackAPIResponse) firing: (4) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [21:18:37] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar [21:22:34] (ProbeDown) firing: (2) Service toolsbeta-test-k8s-haproxy-3:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [21:40:26] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1062.eqiad.wmnet with OS bullseye completed: - cloud... [21:46:11] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1065.eqiad.wmnet with OS bullseye [21:54:53] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1066.eqiad.wmnet with OS bullseye completed: - cloud... [22:03:37] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW [22:03:52] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar [22:04:07] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW [22:05:20] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1065.eqiad.wmnet with OS bullseye [22:06:21] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1063.eqiad.wmnet with OS bullseye executed with erro...
[22:06:26] 10cloud-services-team (Hardware), 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1064.eqiad.wmnet with OS bullseye executed with erro... [22:06:35] 10cloud-services-team (Hardware), 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1067.eqiad.wmnet with OS bullseye executed with erro... [22:09:40] 10Tool-bub2, 10Internet-Archive, 10Outreach-Programs-Projects, 10Outreachy (Round 27): Author is not being sent to Internet Archive for Google Books - https://phabricator.wikimedia.org/T348186 (10Okerekechinweotito) @wassan.anmol117 I have opened a PR that fixes this issue. Opened here - [[ https://githu... [22:11:53] 10cloud-services-team (Hardware), 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1063.eqiad.wmnet with OS bullseye [22:16:37] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar [22:18:47] 10cloud-services-team (Hardware), 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1064.eqiad.wmnet with OS bullseye [22:24:01] 10cloud-services-team (Hardware), 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1067.eqiad.wmnet with OS bullseye [22:40:57] 10cloud-services-team (Hardware), 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1065.eqiad.wmnet with OS bullseye completed: - cloud... [22:46:37] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW [23:04:30] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [23:05:37] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar [23:21:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [23:24:30] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [23:32:07] 10cloud-services-team (Hardware), 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1063.eqiad.wmnet with OS bullseye executed with erro... 
[23:33:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [23:39:03] 10cloud-services-team (Hardware), 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1064.eqiad.wmnet with OS bullseye executed with erro... [23:44:17] 10cloud-services-team (Hardware), 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1067.eqiad.wmnet with OS bullseye executed with erro...