[17:55:56] cloud-services-team (FY2023/2024-Q1): Add #wikimedia-cloud-admin and -cloud-feed to public IRC logs - https://phabricator.wikimedia.org/T346382 (fnegri) Stalled→Resolved Logging is now enabled also in `#wikimedia-cloud-feed`.
[18:00:22] cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [builds-cli] build packages using gitlab ci - https://phabricator.wikimedia.org/T347830 (fnegri) Invalid→Resolved
[18:00:24] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade codfw hosts to bookworm - https://phabricator.wikimedia.org/T345810 (fnegri)
[18:00:48] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): codfw1dev: we lost the PDNS database content - https://phabricator.wikimedia.org/T347856 (fnegri) Invalid→Resolved
[18:00:56] Cloud-VPS, Toolforge, cloud-services-team, SRE Observability, Patch-For-Review: grafana-cloud: Browser access to Prometheus is deprecated - https://phabricator.wikimedia.org/T307465 (fnegri)
[18:01:16] cloud-services-team (FY2023/2024-Q1), Epic, Goal: Move WMCS dashboards to grafana.wmcloud.org - https://phabricator.wikimedia.org/T333568 (fnegri) Declined→Resolved
[18:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[18:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[18:31:38] Toolforge, Documentation, good first task: Add doc type categories to Toolforge user docs - https://phabricator.wikimedia.org/T348047 (TBurmeister)
[18:31:48] Toolforge, Documentation, good first task: Add doc type categories to Toolforge user docs - https://phabricator.wikimedia.org/T348047 (TBurmeister) p:Triage→Low
[18:35:44] Cloud-VPS, Documentation, good first task: Add doc type categories to Cloud VPS user docs - https://phabricator.wikimedia.org/T348049 (TBurmeister)
[18:35:52] Cloud-VPS, Documentation, good first task: Add doc type categories to Cloud VPS user docs - https://phabricator.wikimedia.org/T348049 (TBurmeister) p:Triage→Low
[18:42:19] Tool-inteGraality: Support wikibase-lexeme as datatype for grouping - https://phabricator.wikimedia.org/T348053 (JeanFred)
[18:48:44] Toolforge, Documentation: Update and Improve Toolforge and Cloud VPS Technical Documentation - https://phabricator.wikimedia.org/T203131 (TBurmeister)
[18:48:46] Toolforge, Documentation: Create a "my first Python webservice" tutorial for Toolforge - https://phabricator.wikimedia.org/T134494 (TBurmeister)
[18:49:02] Toolforge, Documentation: Update and Improve Toolforge and Cloud VPS Technical Documentation - https://phabricator.wikimedia.org/T203131 (TBurmeister)
[18:49:04] Toolforge, Documentation: Create a "my first PHP webservice" tutorial for Toolforge - https://phabricator.wikimedia.org/T134493 (TBurmeister)
[18:49:19] Toolforge, Documentation: Update and Improve Toolforge and Cloud VPS Technical Documentation - https://phabricator.wikimedia.org/T203131 (TBurmeister)
[18:49:21] Toolforge, Documentation, User-srishakatux: Create a "my first React app" tutorial for Toolforge - https://phabricator.wikimedia.org/T231950 (TBurmeister)
[18:49:36] Toolforge, Documentation: Update and Improve Toolforge and Cloud VPS Technical Documentation - https://phabricator.wikimedia.org/T203131 (TBurmeister)
[19:02:33] cloud-services-team, Tech-Docs-Team, Documentation, Goal: Redesign Cloud Services documentation information architecture - https://phabricator.wikimedia.org/T327319 (TBurmeister)
[19:38:21] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99)
[19:38:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[20:14:00] (ProbeDown) firing: Service toolsbeta-proxy-3:443 has failed probes (http_toolsbeta_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-proxy-3:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:19:30] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[20:21:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[20:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[20:25:34] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:33:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[20:44:00] (ProbeDown) resolved: Service toolsbeta-proxy-3:443 has failed probes (http_toolsbeta_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-proxy-3:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:53:55] Cloud-VPS, cloud-services-team, Security: Add prefix to novaobserver password to make it obvious it's intended to be public - https://phabricator.wikimedia.org/T348067 (taavi) As documented [[ https://gerrit.wikimedia.org/g/operations/puppet/+/16906c693da99eacdf7be557cc19e110a30c96f1/hieradata/cloud/...
[20:54:01] Cloud-VPS, cloud-services-team, Security: Add prefix to novaobserver password to make it obvious it's intended to be public - https://phabricator.wikimedia.org/T348067 (taavi)
[20:58:03] (PuppetAgentFailure) firing: Puppet agent failure detected on instance toolsbeta-proxy-3 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[21:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[21:01:57] Cloud-VPS, cloud-services-team, Security: Add prefix to novaobserver password to make it obvious it's intended to be public - https://phabricator.wikimedia.org/T348067 (taavi) p:Medium→Low
[21:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[21:13:03] (PuppetAgentFailure) resolved: Puppet agent failure detected on instance toolsbeta-proxy-3 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[22:28:07] Cloud-VPS, observability, Security: Ingest Cloud VPS audit logs into production logging pipeline - https://phabricator.wikimedia.org/T348075 (Southparkfan)
[22:29:09] Cloud-VPS, cloud-services-team, Sustainability (Incident Followup), User-dcaro: Move Cloud VPS auth.logs to central logging - https://phabricator.wikimedia.org/T127717 (Southparkfan)
[22:29:11] Cloud-VPS, observability, Security: Ingest Cloud VPS audit logs into production logging pipeline - https://phabricator.wikimedia.org/T348075 (Southparkfan)
[23:07:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-prometheus-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[23:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[23:21:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[23:23:42] Quarry, cloud-services-team (FY2023/2024-Q1), superset.wmcloud.org: Replace Quarry with an installation of Superset - https://phabricator.wikimedia.org/T169452 (Audiodude) So is it correct that we're looking for a new maintainer, but only in the capacity of migrating all usage of Quarry to Superset?...
[23:25:34] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[23:33:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[23:47:03] (InstanceDown) resolved: Project toolsbeta instance toolsbeta-prometheus-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown