[00:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[00:07:19] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[00:33:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[02:26:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[02:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[03:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[03:07:19] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[03:29:42] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[03:33:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[03:58:32] (OpenstackAPIResponse) firing: (4) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:56:29] Tool-bub2, Test-Coverage: Write unit test cases - https://phabricator.wikimedia.org/T344117 (Akanksha.t05) Made a PR for it - https://github.com/coderwassananmol/BUB2/pull/188
[05:12:23] Tool-bub2, Outreach-Programs-Projects, Outreachy (Round 27): Change UploadedItems.js component to stateless functional components - https://phabricator.wikimedia.org/T348416 (ReemBsrat) Made a pull request for this task.
[05:16:17] cloud-services-team: git repo for kolla - https://phabricator.wikimedia.org/T348459 (Aklapper) Removing tag for a calendar quarter in Jan-Mar 2023; adding generic #cloud-services-team as there is no tag for the current quarter.
[05:26:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[05:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[05:44:11] tool-wscontest: Incorrect stats on landing page - https://phabricator.wikimedia.org/T348210 (PMenon-WMF) Open→Resolved
[05:48:32] (OpenstackAPIResponse) firing: (5) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[06:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[06:07:19] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[06:18:32] (OpenstackAPIResponse) firing: (5) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[06:33:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[06:33:32] (OpenstackAPIResponse) firing: (5) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[06:58:32] (OpenstackAPIResponse) firing: (6) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[07:11:16] Tool-refill: Pick a capitalization: reFill? Refill? ReFill? - https://phabricator.wikimedia.org/T340506 (Curb_Safe_Charmer) Open→Declined p:Triage→Low
[07:29:42] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[07:45:23] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[07:45:29] Toolforge (Toolforge iteration 00), Toolforge Jobs framework, Patch-For-Review: jobs: Add option to disable NFS mounts - https://phabricator.wikimedia.org/T348250 (CodeReviewBot) taavi opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/114 jobs-api: bump vers...
[07:45:35] Toolforge (Toolforge iteration 00), Toolforge Jobs framework, Patch-For-Review: jobs: Add option to disable NFS mounts - https://phabricator.wikimedia.org/T348250 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/114 jobs-api: bump vers...
[07:45:35] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[07:58:39] Cloud-VPS, cloud-services-team: postgresql is stuck on cloudbackup2001 - https://phabricator.wikimedia.org/T348431 (taavi) Open→Resolved It seems like Postgres is back up. Andrew also did something to the data directory and now it's about halfway full instead of 99% full.
[08:12:59] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.undrain_node
[08:13:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:13:05] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[08:13:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:14:54] Toolforge: Access denied to Toolforge DB - https://phabricator.wikimedia.org/T348502 (Criscod)
[08:15:03] cloud-services-team, wikitech.wikimedia.org: wikitech-static is out of disk - https://phabricator.wikimedia.org/T348503 (taavi)
[08:17:02] Tool-bub2, Patch-For-Review: Make the queue refresh automatically - https://phabricator.wikimedia.org/T344119 (PMenon-WMF)
[08:17:05] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.undrain_node
[08:17:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:17:10] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[08:17:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:17:42] cloud-services-team, wikitech.wikimedia.org: wikitech-static is out of disk - https://phabricator.wikimedia.org/T348503 (taavi) The archived images directory seems to be taking most of the space: ` root@wikitech-static:/srv/mediawiki/images/wikitech# du -sh archive/ 41G archive/ `
[08:17:46] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[08:17:48] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.undrain_node
[08:17:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:17:53] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[08:17:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:17:58] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[08:18:14] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.undrain_node
[08:18:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:18:19] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[08:18:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:18:29] Cloud-VPS, cloud-services-team: pdns auth metrics unreachable on prod network - https://phabricator.wikimedia.org/T348437 (fgiunchedi) Open→Resolved a:fgiunchedi
[08:19:24] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.undrain_node
[08:19:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:26:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[08:30:11] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[08:30:28] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[08:32:27] Tool-bub2: Use .jsx for files that content JSX syntax - https://phabricator.wikimedia.org/T348505 (ThierryW23)
[08:32:37] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar
[08:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[09:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[09:04:52] Toolforge (Toolforge iteration 00): [tbs][builder] provide a way to remove buildpacks - https://phabricator.wikimedia.org/T348110 (Slst2020) Open→Invalid Closing this for now, as the issues I was observing most likely were caused by my dev environment.
[09:05:17] Toolforge (Toolforge iteration 00): [tbs][builder] provide a way to remove buildpacks - https://phabricator.wikimedia.org/T348110 (Slst2020) Invalid→Resolved
[09:05:34] Toolforge (Toolforge iteration 00): [tbs][builder] provide a way to remove buildpacks - https://phabricator.wikimedia.org/T348110 (Slst2020) Resolved→Invalid
[09:07:19] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[09:09:52] Toolforge (Toolforge iteration 00), Toolforge Jobs framework, Patch-For-Review: jobs: Add option to disable NFS mounts - https://phabricator.wikimedia.org/T348250 (CodeReviewBot) taavi opened https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/2 Add --mount option
[09:29:33] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), Goal, Patch-For-Review: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285 (fnegri) https://gerrit.wikimedia.org/r/964858 fixed the Puppet constant change in `cloudservices200[4-5]-dev`. I'm proceeding wit...
[09:31:03] cloud-services-team, wikitech.wikimedia.org: wikitech-static is out of disk - https://phabricator.wikimedia.org/T348503 (taavi) Open→Resolved a:taavi
[09:31:22] cloud-services-team, Data-Platform-SRE, Dumps-Generation, Patch-For-Review: clouddumps100[12] puppet alert: "Puppet performing a change on every puppet run" - https://phabricator.wikimedia.org/T346165 (jbond) @BTullis i have merged a patch and ran puppet on the two clouddumps host 5 times now and...
[09:32:13] (DiskSpace) firing: Disk space cloudbackup1004:9100:/ 5.929% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[09:33:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[09:43:43] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (T341285)
[09:43:48] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[09:47:13] (DiskSpace) resolved: Disk space cloudbackup1004:9100:/ 5.927% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[09:50:22] !log fnegri@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) (T341285)
[09:50:28] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[09:53:50] (PS1) FNegri: live_upgrade_openstack: add runtime description [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/964872
[09:54:58] Toolforge: Access denied to Toolforge DB - https://phabricator.wikimedia.org/T348502 (dcaro) Hi @Criscod, I think you are using the wrong database name (probably), the replica databases end with `_p`. I was able to connect from your user account: ` cristinasarasua@tools-sgebastion-10:~$ mariadb --defaults-fi...
[09:56:36] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (T341285)
[09:56:41] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[09:56:54] (CR) CI reject: [V: -1] live_upgrade_openstack: add runtime description [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/964872 (owner: FNegri)
[10:01:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:03:20] !log fnegri@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) (T341285)
[10:03:26] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[10:06:31] cloud-services-team, Data-Platform-SRE, Dumps-Generation, Patch-For-Review: clouddumps100[12] puppet alert: "Puppet performing a change on every puppet run" - https://phabricator.wikimedia.org/T346165 (BTullis) Great! Many thanks indeed @jbond - I'll monitor for a day or so, as you suggest.
[10:08:03] (TfInfraTestDestroyFailed) resolved: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[10:11:03] (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:16:29] Toolforge: Access denied to Toolforge DB - https://phabricator.wikimedia.org/T348502 (Criscod) Thanks for the quick response! That was the problem. Thank you very much and sorry for the oversight! Best, Cristina
[10:17:29] Toolforge: Access denied to Toolforge DB - https://phabricator.wikimedia.org/T348502 (Criscod) Open→Resolved a:Criscod
[10:30:45] Toolforge Jobs framework: Add health check support to toolforge-jobs - https://phabricator.wikimedia.org/T348512 (taavi)
[10:52:45] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (T341285)
[10:52:50] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[10:58:33] (OpenstackAPIResponse) firing: (6) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[10:59:43] !log fnegri@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) (T341285)
[10:59:48] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[11:00:35] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (T341285)
[11:00:42] !log fnegri@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=99) (T341285)
[11:26:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[11:29:42] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[11:32:35] !log admin fran@wmf3169 START - Cookbook wmcs.openstack.cloudvirt.safe_reboot
[11:32:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:33:05] !log admin fran@wmf3169 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=99)
[11:33:06] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (T341285)
[11:33:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:33:13] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[11:36:43] Tool-bub2: Use .jsx for files that containt JSX syntax - https://phabricator.wikimedia.org/T348505 (ThierryW23)
[11:38:38] !log fnegri@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) (T341285)
[11:38:43] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[11:48:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[11:48:21] Toolforge: Deprecated PHP 5.6 container not working with `webservice` command - https://phabricator.wikimedia.org/T341524 (taavi) Open→Resolved a:taavi `lang=shell-session tools.taavi-test-tool@tools-sgebastion-11:~$ webservice --backend=kubernetes php5.6 shell DEPRECATED: 'php5.6' type is deprec...
[11:57:14] !log admin dcaro@urcuchillay END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[11:57:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:00:07] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (rook) Resolved→In progress
[12:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[12:04:07] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW
[12:07:19] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[12:09:08] Quarry, Patch-For-Review: investigate quarry on k8s - https://phabricator.wikimedia.org/T301469 (rook) >>! In T301469#9237091, @Audiodude wrote: > I'm completely new to Kubernetes but have been reading through https://wikitech.wikimedia.org/wiki/Kubernetes/Kubernetes_Workshop. Does WM Cloud provide k8s c...
[12:23:55] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.undrain_node
[12:23:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:36:28] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), DC-Ops: HDD failure in cloudvirt2004-dev - https://phabricator.wikimedia.org/T348531 (fnegri)
[12:37:35] PROBLEM - Check unit status of backup_cinder_volumes on cloudbackup2001 is CRITICAL: CRITICAL: Status of the systemd unit backup_cinder_volumes https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[12:38:12] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), DC-Ops, ops-codfw: HDD failure in cloudvirt2004-dev - https://phabricator.wikimedia.org/T348531 (RhinosF1)
[12:40:57] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (T341285)
[12:41:03] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[12:41:37] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar
[12:44:27] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), DC-Ops, ops-codfw: HDD failure in cloudvirt2004-dev - https://phabricator.wikimedia.org/T348531 (fnegri)
[12:46:32] !log fnegri@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) (T341285)
[12:46:37] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[12:48:39] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), DC-Ops, SRE, ops-codfw: hw troubleshooting: disk failure for cloudvirt2004-dev.codfw.wmnet - https://phabricator.wikimedia.org/T348531 (fnegri)
[13:00:10] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), DC-Ops, SRE, ops-codfw: hw troubleshooting: disk failure for cloudvirt2004-dev.codfw.wmnet - https://phabricator.wikimedia.org/T348531 (fnegri) This is not urgent and can wait a few days if necessary.
[13:05:47] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (SD0001) Thanks for the details. I had figured out the manual deployment process but had been confused about the role of Puppet – we don't use Puppet at all for this project? >>! In T348184#9236836, @rook wrote: > The primary thin...
[13:09:13] (PS1) Muehlenhoff: Add dummy keytabs for apt1002/apt2002 [labs/private] - https://gerrit.wikimedia.org/r/964900 (https://phabricator.wikimedia.org/T331613)
[13:27:57] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1063.eqiad.wmnet with OS bullseye
[13:49:49] Toolforge (Toolforge iteration 00): [toolforge] add changelog page to send small updates for projects - https://phabricator.wikimedia.org/T348537 (dcaro)
[13:52:27] Toolforge (Toolforge iteration 00): [tools,harbor] Cleanup old production images - https://phabricator.wikimedia.org/T348538 (dcaro)
[13:52:31] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin1001 for host cloudvirt1064.eqiad.wmnet with OS bullseye
[13:53:25] Data-Services: Access denied to Toolforge DB - https://phabricator.wikimedia.org/T348502 (JJMC89) Resolved→Invalid a:Criscod→None
[13:58:27] Toolforge (Toolforge iteration 01), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 3 others: [tbs][builder] Inject nodejs buildpack - https://phabricator.wikimedia.org/T346635 (dcaro)
[13:59:03] Cloud Services Proposals, Toolforge (Toolforge iteration 01), cloud-services-team, Cloud-Services-Origin-Team, and 2 others: Toolforge beyond build service - https://phabricator.wikimedia.org/T342077 (dcaro)
[13:59:08] Toolforge (Toolforge iteration 01), Toolforge Jobs framework, Patch-For-Review: jobs: Add option to disable NFS mounts - https://phabricator.wikimedia.org/T348250 (dcaro)
[13:59:11] Toolforge (Toolforge iteration 01), cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [builds-api] Automatically deploy the webservice when the image is built - https://phabricator.wikimedia.org/T341065 (dcaro)
[13:59:18] Toolforge (Toolforge iteration 01), Patch-For-Review, User-dcaro: `toolforge build logs`: add follow options - https://phabricator.wikimedia.org/T339922 (dcaro)
[13:59:21] Toolforge (Toolforge iteration 01), Patch-For-Review: [tbs.build.logs] Show a more user-friendly error message when logs are not ready - https://phabricator.wikimedia.org/T341059 (dcaro)
[14:00:03] Toolforge (Toolforge iteration 01), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 3 others: [tbs] User story - I can use multiple language stacks for my application - https://phabricator.wikimedia.org/T325799 (dcaro)
[14:00:21] Toolforge (Toolforge iteration 01), Patch-For-Review, User-Raymond_Ndibe: toolforge build start: default to tailing the build as it progresses with the option of -d/--detached - https://phabricator.wikimedia.org/T340079 (dcaro)
[14:00:34] Cloud Services Proposals, Toolforge (Toolforge iteration 01), cloud-services-team, Cloud-Services-Origin-Team, and 2 others: Decision request – Toolforge (re)architecture - https://phabricator.wikimedia.org/T346153 (dcaro)
[14:00:45] Toolforge (Toolforge iteration 01), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 3 others: [builds-api.start] Add statistics - https://phabricator.wikimedia.org/T337390 (dcaro)
[14:01:02] Toolforge (Toolforge iteration 01), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [tbs.maintain-harbor] Document current setup and admin procedures - https://phabricator.wikimedia.org/T329176 (dcaro)
[14:01:04] Toolforge (Toolforge iteration 01): [tools,harbor] Cleanup old production images - https://phabricator.wikimedia.org/T348538 (dcaro)
[14:01:06] Toolforge (Toolforge iteration 01): [toolforge] add changelog page to send small updates for projects - https://phabricator.wikimedia.org/T348537 (dcaro)
[14:01:08] Toolforge (Toolforge iteration 01), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 2 others: [buildservice] Create GET /build/latest endpoint in the buildservice API - https://phabricator.wikimedia.org/T345675 (dcaro)
[14:01:11] Toolforge (Toolforge iteration 01), Toolforge Build Service, cloud-services-team: toolsbeta harbor instance ran out of disk - https://phabricator.wikimedia.org/T348337 (dcaro)
[14:01:13] Toolforge (Toolforge iteration 01), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-User, Cloud-Services-Worktype-Maintenance, User-dcaro: [webservice] Error shown when restarting buildpack-based tool - https://phabricator.wikimedia.org/T348312 (dcaro)
[14:01:15] Toolforge (Toolforge iteration 01): [tbs] migrate sample tools to Gitlab - https://phabricator.wikimedia.org/T348213 (dcaro)
[14:01:23] Toolforge (Toolforge iteration 01): decide on which kubernetes bootstrapper to focus on between minikube and kind - https://phabricator.wikimedia.org/T347723 (dcaro)
[14:01:25] Toolforge (Toolforge iteration 01): [envvars-api] Add statistics - https://phabricator.wikimedia.org/T346228 (dcaro)
[14:01:27] Toolforge (Toolforge iteration 01), Documentation, Kubernetes: [buildservice] Add docs on how to run a ruby based tool using buildpacks - https://phabricator.wikimedia.org/T347402 (dcaro)
[14:01:29] Toolforge (Toolforge iteration 01): Upgrade harbor - https://phabricator.wikimedia.org/T346241 (dcaro)
[14:01:31] Toolforge (Toolforge iteration 01): Add `toolforge envvars quota` - https://phabricator.wikimedia.org/T341087 (dcaro)
[14:01:33] Toolforge (Toolforge iteration 01): Allow listing and managing images of a tool - https://phabricator.wikimedia.org/T341067 (dcaro)
[14:01:35] Toolforge (Toolforge iteration 01): Add `toolforge build quota` command - https://phabricator.wikimedia.org/T341068 (dcaro)
[14:01:37] Toolforge (Toolforge iteration 01): `webservice restart` sometimes timing out for buildservice images - https://phabricator.wikimedia.org/T341057 (dcaro)
[14:01:39] Toolforge (Toolforge iteration 01), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [buildservice] Create a buildservice API and move any logic from the client to it - https://phabricator.wikimedia.org/T334590 (dcaro)
[14:01:41] Toolforge (Toolforge iteration 01), cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [builds-api] Add triggering support - https://phabricator.wikimedia.org/T334587 (dcaro)
[14:01:43] Cloud Services Proposals, Toolforge (Toolforge iteration 01), cloud-services-team, Cloud-Services-Origin-Team, and 2 others: [toolforge-envvars.api,toolforge-build.api] Support flagging environment variables to be injected at build time - https://phabricator.wikimedia.org/T338142 (dcaro)
[14:01:45] Toolforge (Toolforge iteration 01): [gitlab,toolforge-deploy] Create a process to open an MR to toolforge-deploy when a new release ofa component happens - https://phabricator.wikimedia.org/T347392 (dcaro)
[14:01:47] Toolforge (Toolforge iteration 01), cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 2 others: tbs: user-story 10: I want to know how to manage the service - https://phabricator.wikimedia.org/T325166 (dcaro)
[14:01:49] Toolforge (Toolforge iteration 01), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [builds-api] catch harbor timeout when creating repository - https://phabricator.wikimedia.org/T345903 (dcaro)
[14:01:52] Toolforge (Toolforge iteration 01): Expose tool-labs service names via environment variables - https://phabricator.wikimedia.org/T151002 (dcaro)
[14:01:58] Toolforge (Toolforge iteration 01), Documentation, Kubernetes: Add a easy way to run a ruby webservice on tools - https://phabricator.wikimedia.org/T141388 (dcaro)
[14:13:13] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (Papaul) on cloudvirt1064 during install i am getting when you reboot the server on console you get the server login prompt but since the system didn't comp...
[14:26:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[14:48:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[14:58:33] (OpenstackAPIResponse) firing: (6) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[15:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[15:07:19] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[15:08:15] (PS1) Arturo Borrero Gonzalez: aborrero: drop access [labs/private] - https://gerrit.wikimedia.org/r/964926
[15:10:22] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW
[15:23:16] Toolforge Build Service (Beta release), cloud-services-team (FY2023/2024-Q1), Goal: Toolforge Build Service Beta Rollout To Selected Users - https://phabricator.wikimedia.org/T335249 (komla) The draft for open beta announcement will be shared with cloud-admin mailing list before the final announcement.
[15:29:42] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [15:32:40] 10Cloud-VPS, 10cloud-services-team (FY2023/2024-Q1), 10DC-Ops, 10SRE, 10ops-codfw: hw troubleshooting: disk failure for cloudvirt2004-dev.codfw.wmnet - https://phabricator.wikimedia.org/T348531 (10Jhancock.wm) @fnegri I'm also seeing a potentially failed DIMM. is it safe to power down the server for trou... [15:39:52] 10Cloud-VPS, 10cloud-services-team (FY2023/2024-Q1), 10DC-Ops, 10SRE, 10ops-codfw: hw troubleshooting: disk failure for cloudvirt2004-dev.codfw.wmnet - https://phabricator.wikimedia.org/T348531 (10fnegri) @Jhancock.wm yes you can power it down. [15:40:26] 10Cloud-VPS, 10cloud-services-team (FY2023/2024-Q1), 10DC-Ops, 10SRE, 10ops-codfw: hw troubleshooting: disk failure for cloudvirt2004-dev.codfw.wmnet - https://phabricator.wikimedia.org/T348531 (10fnegri) [15:42:28] 10Toolforge (Toolforge iteration 00), 10cloud-services-team, 10Cloud-Services-Origin-Team, 10Cloud-Services-Worktype-Project, 10Patch-For-Review: [toolforge] repositories move to gitlab - https://phabricator.wikimedia.org/T327057 (10CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/too... [15:45:32] 10cloud-services-team, 10User-aborrero: cloudgw improvements - https://phabricator.wikimedia.org/T347469 (10aborrero) [15:46:24] 10cloud-services-team (Hardware), 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin1001 for host cloudvirt1064.eqiad.wmnet with OS bullseye [15:47:01] 10cloud-services-team, 10User-aborrero: cloudgw improvements - https://phabricator.wikimedia.org/T347469 (10aborrero) a:05aborrero→03cmooney I guess @cmooney can drive the future work to complete the remaining bits in this task. 
[15:47:58] cloud-services-team, Patch-For-Review, User-aborrero: cloudgw: replace keepalived with BGP - https://phabricator.wikimedia.org/T347687 (aborrero) a:aborrero→cmooney I think @cmooney can continue this work in the future.
[15:48:22] cloud-services-team, User-aborrero: cloud: consider creating a reproducible local development environment for openstack-helm-based Cloud VPS - https://phabricator.wikimedia.org/T346785 (aborrero) a:aborrero→None This is something for the WMCS team to decide.
[15:49:12] cloud-services-team, User-aborrero: cloud: introduce eqiad2dev region for openstack-in-kubernetes PoC via openstack-helm - https://phabricator.wikimedia.org/T346665 (aborrero) a:aborrero→None
[15:49:53] cloud-services-team (Kanban), Infrastructure-Foundations, SRE, netops: cloud: decide on general idea for having cloud-dedicated hardware provide service in the cloud realm & the internet - https://phabricator.wikimedia.org/T296411 (aborrero)
[15:51:31] Toolforge (Toolforge iteration 01), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 2 others: [buildservice] Create GET /build/latest endpoint in the buildservice API - https://phabricator.wikimedia.org/T345675 (aborrero) a:aborrero→dc...
[15:51:36] cloud-services-team (FY2023/2024-Q1), User-aborrero: eqiad1: fix PTR delegations for 185.15.56.0/24 - https://phabricator.wikimedia.org/T341338 (aborrero) a:aborrero→taavi I think @taavi can take care of this change.
[15:52:27] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), User-aborrero: Cloud VPS: refresh openstack resources grafana dashboard - https://phabricator.wikimedia.org/T333975 (aborrero) a:aborrero→None This is mostly done.
[15:52:31] cloud-services-team, User-aborrero, User-dcaro: cloud: introduce a kubernetes undercloud to run openstack (via openstack-helm) - https://phabricator.wikimedia.org/T342750 (aborrero) a:aborrero→None
[15:53:57] cloud-services-team (FY2023/2024-Q1), Patch-For-Review, User-aborrero: cloudgw: add cloud-private subnet support - https://phabricator.wikimedia.org/T338334 (aborrero) a:aborrero→taavi I guess @cmooney and/or @taavi can follow up on this.
[16:03:18] cloud-services-team (FY2023/2024-Q1), Goal: have cloud hardware servers in the cloud realm using a dedicated LB layer - https://phabricator.wikimedia.org/T297596 (Aklapper)
[16:17:00] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), DC-Ops, SRE, ops-codfw: hw troubleshooting: disk failure for cloudvirt2004-dev.codfw.wmnet - https://phabricator.wikimedia.org/T348531 (Jhancock.wm) Re: DIMM I've swapped B1 and B7. if the error recurs in B7, it is the stick. If it recurs in B1...
[16:25:07] Tool-bub2, Outreach-Programs-Projects, Outreachy (Round 27): Redesign HTML template as per Codex design - https://phabricator.wikimedia.org/T348413 (Okerekechinweotito) I have made a PR for this issue Available here - [[ https://github.com/coderwassananmol/BUB2/pull/206 | Redesign HTML template as pe...
[16:42:33] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), DC-Ops, SRE, ops-codfw: hw troubleshooting: disk failure for cloudvirt2004-dev.codfw.wmnet - https://phabricator.wikimedia.org/T348531 (Jhancock.wm) new error popped up after rebooting T348550
[16:47:11] Tool-bub2: Use .jsx for files that containt JSX syntax - https://phabricator.wikimedia.org/T348505 (SamMintah) a:SamMintah
[17:25:38] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[17:25:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:26:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[17:48:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[18:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[18:07:19] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[18:58:13] cloud-services-team (Hardware), DC-Ops, Data-Platform-SRE, SRE, ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538 (Jclark-ctr) @Papaul I see cloudelasticservers in site.pp it was added by Bking previously node /^cloudelastic1...
[18:58:33] (OpenstackAPIResponse) firing: (6) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[19:04:53] Tool-refill: Pick a capitalization: reFill? Refill? ReFill? - https://phabricator.wikimedia.org/T340506 (Novem_Linguae) If this task weren't declined, I would suggest renaming everything to "ReFill" so that everything could be consistent.
[19:10:03] RECOVERY - Check unit status of backup_cinder_volumes on cloudbackup2001 is OK: OK: Status of the systemd unit backup_cinder_volumes https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[19:29:42] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[20:04:45] (ProbeDown) firing: Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:09:45] (ProbeDown) firing: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:14:45] (ProbeDown) resolved: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:26:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[20:36:15] Cloud-VPS, cloud-services-team: Neutron default security group provisioning is broken - https://phabricator.wikimedia.org/T348581 (taavi)
[20:37:37] Cloud-VPS, cloud-services-team: Neutron policy does not allow the admin role to modify security groups - https://phabricator.wikimedia.org/T348582 (taavi)
[20:38:19] Cloud-VPS, cloud-services-team: Neutron default security group provisioning is broken - https://phabricator.wikimedia.org/T348581 (taavi) p:Triage→High
[20:48:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[20:54:55] Tool-bub2, Outreach-Programs-Projects, Outreachy (Round 27): Change Books.js component to React hooks component - https://phabricator.wikimedia.org/T348414 (ThierryW23) GitHub PR [[ https://github.com/coderwassananmol/BUB2/pull/211 | here ]]
[21:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[21:07:19] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[21:55:26] cloud-services-team (Hardware), DC-Ops, Data-Platform-SRE, SRE, ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538 (Papaul) @Jclark-ctr ok then the only thing left is to change it in netbox to use the public VLAN
[22:13:09] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin1001 for host cloudvirt1064.eqiad.wmnet with OS bullseye
[22:15:27] Tool-bub2: Switching Header.js to Functional stateless components creates an application error because withSession.js HOC expects a Class component - https://phabricator.wikimedia.org/T348471 (Peter_Kampete) Hello @Spykelionel , please I would like to work on this issue, it is interesting and I have the proj...
[22:17:48] Tool-bub2: Switching Header.js to Functional stateless components creates an application error because withSession.js HOC expects a Class component - https://phabricator.wikimedia.org/T348471 (Peter_Kampete) Open→In progress a:Peter_Kampete
[22:26:45] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (Papaul) @MoritzMuehlenhoff i was getting the error above on cloudvirt1064 and wanted to drop in the virtual console to see the syslog but when i restart th...
[22:41:39] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (Papaul) @cmooney @ayounsi i check the virtual console on clouvirt1064 to see the reason i was getting the 2 above errors. it end up being the server is not...
[22:46:23] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (Papaul) looking at the gerrit history about the late command i see also that there where some changes made today @jbond @Volans can you please also see if...
[22:48:21] Tool-bub2: Use .jsx for files that containt JSX syntax - https://phabricator.wikimedia.org/T348505 (SamMintah) Open→In progress
[22:58:33] (OpenstackAPIResponse) firing: (6) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[23:29:42] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[23:31:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[23:48:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates