[00:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[00:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[00:06:52] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar
[00:07:59] Tool-bub2, Test-Coverage: Write unit test cases - https://phabricator.wikimedia.org/T344117 (Aklapper)
[00:38:32] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[02:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[02:24:10] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): For PDL, download and stream the PDF if available - https://phabricator.wikimedia.org/T348188 (Razeetech) a:Razeetech
[02:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[02:44:03] Tool-bub2, Outreach-Programs-Projects, Outreachy (Round 27): Add search bar in queue - https://phabricator.wikimedia.org/T315134 (Razeetech) Do i have your permission to work on this task @Sujith116
[03:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[03:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[03:29:42] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:06:52] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar
[04:38:32] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[05:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[05:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[06:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[06:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[06:16:36] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): Fix preview for books with long description - https://phabricator.wikimedia.org/T348411 (Shreyashidabral) Github PR [[ https://github.com/coderwassananmol/BUB2/pull/196 | here ]]
[06:37:13] Tool-bub2: Fix peer dependencies and remove deprecation warnings - https://phabricator.wikimedia.org/T344116 (Okerekechinweotito) I have made a PR for this issue PR here - [[ https://github.com/coderwassananmol/BUB2/pull/195 | Fix peer dependencies and remove deprecation warnings ]]
[06:52:00] (PS1) Majavah: d/changelog: prepare for new release 14 [cloud/toolforge/jobs-framework-cli] - https://gerrit.wikimedia.org/r/964331
[06:57:24] (CR) Majavah: [C: +2] d/changelog: prepare for new release 14 [cloud/toolforge/jobs-framework-cli] - https://gerrit.wikimedia.org/r/964331 (owner: Majavah)
[06:58:48] (Merged) jenkins-bot: d/changelog: prepare for new release 14 [cloud/toolforge/jobs-framework-cli] - https://gerrit.wikimedia.org/r/964331 (owner: Majavah)
[07:13:36] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'toolforge-jobs-framework-cli' version '14'
[07:13:52] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'toolforge-jobs-framework-cli' version '14'
[07:25:00] PROBLEM - puppet last run on cloudbackup2001 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[07:29:42] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[07:30:41] Cloud-VPS, cloud-services-team: postgresql is stuck on cloudbackup2001 - https://phabricator.wikimedia.org/T348431 (taavi) p:Triage→High
[07:31:19] Cloud-VPS, cloud-services-team: postgresql is stuck on cloudbackup2001 - https://phabricator.wikimedia.org/T348431 (taavi) Not sure if related, but the host is also almost out of disk space: ` /dev/mapper/backup-cinder--backups 80T 75T 1.5T 99% /srv/cinder-backups `
[07:32:19] ACKNOWLEDGEMENT - Disk space on cloudbackup2001 is CRITICAL: DISK CRITICAL - free space: /srv/cinder-backups 1549482 MB (1% inode=98%): Majavah https://phabricator.wikimedia.org/T348431 - The acknowledgement expires at: 2023-10-10 07:31:58. https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudbackup2001&var-datasource=codfw+prometheus/ops
[07:32:49] ACKNOWLEDGEMENT - puppet last run on cloudbackup2001 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago Majavah https://phabricator.wikimedia.org/T348431 - The acknowledgement expires at: 2023-10-10 07:32:40. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[07:33:53] Cloud-VPS, cloud-services-team: postgresql is stuck on cloudbackup2001 - https://phabricator.wikimedia.org/T348431 (taavi) the postgres log is full of: ` 2023-10-09 01:44:03 GMT LOG: using stale statistics instead of current ones because stats collector is not responding 2023-10-09 01:44:13 GMT LOG: us...
[07:49:22] PROBLEM - Check unit status of backup_cinder_volumes on cloudbackup2001 is CRITICAL: CRITICAL: Status of the systemd unit backup_cinder_volumes https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[07:50:07] tool-wscontest: The score command throws deprecated warning - https://phabricator.wikimedia.org/T348270 (Samwilson) This deprecation warning should be fixed (by upgrading Symfony), but it's not likely to be related to the score command failing. Try with `./bin/score -vv` and see if there's anything being pro...
[07:50:25] Cloud-VPS, cloud-services-team: postgresql is stuck on cloudbackup2001 - https://phabricator.wikimedia.org/T348431 (taavi) a:taavi I restarted Postgres. It's clearly doing something on an 80G `pgsql_tmp` directory according to a `strace`, but that's taking a while. I'll come back to it later.
[07:54:37] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[07:54:50] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[08:03:16] RECOVERY - puppet last run on cloudbackup2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:06:52] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar
[08:15:39] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[08:15:54] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[08:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[08:38:34] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[08:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[08:58:10] Tool-Pageviews, Data-Engineering: None result with some chars in the file name - https://phabricator.wikimedia.org/T347899 (Lokal_Profil) >>! In T347899#9217630, @MusikAnimal wrote: > Well first, the file was only uploaded 22 hours ago, so the data might simply [[ https://pageviews.wmcloud.org/mediaviews...
[09:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[09:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[09:05:36] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.undrain_node
[09:05:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:05:48] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[09:05:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:05:53] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.undrain_node
[09:05:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:06:57] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[09:07:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:09:07] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.undrain_node
[09:09:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:09:54] !log admin dcaro@urcuchillay END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[09:09:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:26:46] cloud-services-team, Infrastructure-Foundations, SRE, netops: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184 (aborrero) This is the patch to enable the single NIC setup on ceph nodes: https://gerrit.wikimedia.org/r/c/operations/puppet/+/856675/ Is marked as abando...
[09:28:45] Cloud-VPS, cloud-services-team, Infrastructure-Foundations, SRE, and 3 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (dcaro) Unfortunately, it seems that the cluster has grown in the last few days :/, as draining the last 21 osd d...
[09:39:24] Cloud-VPS, cloud-services-team: pdns auth metrics unreachable on prod network - https://phabricator.wikimedia.org/T348437 (fgiunchedi)
[10:05:33] Toolforge (Toolforge iteration 00), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [envvars-cli] build packages using gitlab ci - https://phabricator.wikimedia.org/T347580 (dcaro) In progress→Resolved
[10:19:42] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): Handle PDL library failures - https://phabricator.wikimedia.org/T348412 (Okerekechinweotito) I have made a PR for this issue PR here - [[ https://github.com/coderwassananmol/BUB2/pull/198 | Handle PDL library failures ]]
[10:22:05] cloud-services-team (FY2023/2024-Q1), wikitech.wikimedia.org: [wikitech] administrator rights for WMCS - https://phabricator.wikimedia.org/T347557 (fnegri) I see Arturo, Andrew and Bryan are in that list, so maybe we don't need extra permissions, and adding a link to the bureaucrat list is enough. Someth...
[10:23:19] Cloud-VPS, cloud-services-team: pdns auth metrics unreachable on prod network - https://phabricator.wikimedia.org/T348437 (taavi) It seems like the pdns web server [[ https://github.com/PowerDNS/pdns/issues/960 | can't listen on multiple interfaces ]], and we need it on the cloud-private address for desi...
[10:24:47] cloud-services-team (FY2023/2024-Q1), wikitech.wikimedia.org: [wikitech] administrator rights for WMCS - https://phabricator.wikimedia.org/T347557 (taavi) Any [[ https://wikitech.wikimedia.org/wiki/Special:ListUsers/sysop | admins ]] can edit protected pages, 'crat is only needed for granting admin acces...
[10:26:23] cloud-services-team (FY2023/2024-Q1), wikitech.wikimedia.org: [wikitech] administrator rights for WMCS - https://phabricator.wikimedia.org/T347557 (fnegri) Thanks, then maybe the note in the page could be "This page is protected, if you need to edit it please [contact an admin](https://wikitech.wikimedia...
[10:29:43] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'python3-toolforge-weld' version '1.4.0'
[10:29:57] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'python3-toolforge-weld' version '1.4.0'
[10:36:43] Cloud-VPS (Quota-requests): Quota increase for linkwatcher - https://phabricator.wikimedia.org/T348441 (TheresNoTime)
[10:37:42] Toolforge: Standardize Toolfroge CLI user interface looks - https://phabricator.wikimedia.org/T348442 (taavi)
[10:40:34] Cloud-VPS (Quota-requests), linkwatcher: Quota increase for linkwatcher - https://phabricator.wikimedia.org/T348441 (TheresNoTime)
[10:52:18] Toolforge (Toolforge iteration 00), cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, Patch-For-Review: [toolforge] repositories move to gitlab - https://phabricator.wikimedia.org/T327057 (CodeReviewBot) dcaro updated https://gitlab.wikimedia.org/repos/cloud/to...
[10:52:31] Toolforge (Toolforge iteration 00), cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, Patch-For-Review: [toolforge] repositories move to gitlab - https://phabricator.wikimedia.org/T327057 (dcaro)
[10:54:30] (CR) Samtar: [C: +2] "+Vv set on wikibugs, self-serve" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/964459 (owner: Samtar)
[10:55:31] (Merged) jenkins-bot: Add wikimedia-external-links to follow linkwatcher [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/964459 (owner: Samtar)
[11:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[11:29:42] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[11:34:11] (CR) CI reject: [V: -1] Localisation updates from https://translatewiki.net. [labs/tools/map-of-monuments] - https://gerrit.wikimedia.org/r/964515 (owner: L10n-bot)
[11:34:12] (CR) CI reject: [V: -1] Localisation updates from https://translatewiki.net. [labs/tools/weapon-of-mass-description] - https://gerrit.wikimedia.org/r/964516 (owner: L10n-bot)
[11:35:19] Cloud-VPS, cloud-services-team: pdns auth metrics unreachable on prod network - https://phabricator.wikimedia.org/T348437 (fgiunchedi) Thank you, that's a bummer re: pdns not listening on multiple interfaces. I don't feel strongly about either implementing something like `socat` you mentioned or move pd...
[11:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[12:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[12:02:09] vivian-rook opened https://github.com/toolforge/paws/pull/337
[12:02:10] PAWS: New upstream release 8.4.0 for Pywikibot - https://phabricator.wikimedia.org/T348372 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/337
[12:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[12:22:39] Cloud-Services, SRE, User-aborrero: cloudservices1006 using 10. address to send DNS NOTIFYs to cloudservices1005 - https://phabricator.wikimedia.org/T346385 (MoritzMuehlenhoff) The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.or...
[12:24:38] Cloud-VPS, cloud-services-team, SRE, User-aborrero: cloudservices1006 using 10. address to send DNS NOTIFYs to cloudservices1005 - https://phabricator.wikimedia.org/T346385 (taavi)
[12:33:32] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.undrain_node
[12:33:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:34:20] Tools, Privacy: enkore.toolforge.org violates Privacy Policy by loading third-party resources - https://phabricator.wikimedia.org/T348445 (Aklapper)
[12:35:30] Toolforge (Toolforge iteration 00), cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, Patch-For-Review: [toolforge] repositories move to gitlab - https://phabricator.wikimedia.org/T327057 (dcaro)
[12:39:21] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[12:54:55] Cloud-VPS, cloud-services-team, Observability-Metrics, Patch-For-Review, User-fgiunchedi: Move labs/wmcs (OpenStack) Prometheus instance off cloudmetrics hosts to prometheus* hosts - https://phabricator.wikimedia.org/T336854 (fgiunchedi)
[12:57:22] Cloud-VPS, cloud-services-team, Observability-Metrics, Patch-For-Review, User-fgiunchedi: Move labs/wmcs (OpenStack) Prometheus instance off cloudmetrics hosts to prometheus* hosts - https://phabricator.wikimedia.org/T336854 (fgiunchedi) The Prometheus `cloud` instance is live at https://prom...
[12:58:38] Cloud-VPS, cloud-services-team, Observability-Metrics, Patch-For-Review, User-fgiunchedi: Move labs/wmcs (OpenStack) Prometheus instance off cloudmetrics hosts to prometheus* hosts - https://phabricator.wikimedia.org/T336854 (taavi) The openstack exporter was fixed in https://gerrit.wikimedia...
[13:00:37] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar
[13:04:18] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[13:04:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:12:11] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.undrain_node
[13:12:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:13:08] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[13:13:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:14:16] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), Goal, Patch-For-Review: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285 (fnegri) Something is still broken in cloudcontrol2001-dev, the service `cinder-scheduler` is failing with `Unable to connect to A...
[13:22:07] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW
[13:36:37] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar
[13:38:34] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.undrain_node
[13:38:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:40:34] Cloud-VPS, cloud-services-team, Observability-Metrics, User-fgiunchedi: Move labs/wmcs (OpenStack) Prometheus instance off cloudmetrics hosts to prometheus* hosts - https://phabricator.wikimedia.org/T336854 (fgiunchedi)
[13:41:13] Cloud-VPS, cloud-services-team, Observability-Metrics, User-fgiunchedi: Move labs/wmcs (OpenStack) Prometheus instance off cloudmetrics hosts to prometheus* hosts - https://phabricator.wikimedia.org/T336854 (fgiunchedi) >>! In T336854#9235385, @taavi wrote: > The openstack exporter was fixed in h...
[13:41:37] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW
[13:41:52] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar
[13:42:07] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW
[13:44:31] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), Goal, Patch-For-Review: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285 (fnegri) The error above was fixed by restarting `rabbitmq-server` in `cloudcontrol2005-dev` (which is the host corresponding to `...
[13:45:16] !log admin fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
[13:45:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:45:22] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[13:48:32] Toolforge (Toolforge iteration 00), cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, Patch-For-Review: [toolforge] repositories move to gitlab - https://phabricator.wikimedia.org/T327057 (dcaro)
[13:49:37] Cloud Services Proposals, Toolforge Build Service, cloud-services-team, Cloud-Services-Origin-Team, and 3 others: [Epic] Make Toolforge a proper platform as a service with push-to-deploy and build packs - https://phabricator.wikimedia.org/T194332 (dcaro)
[13:50:50] Toolforge (Toolforge iteration 00), cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, Patch-For-Review: [toolforge] repositories move to gitlab - https://phabricator.wikimedia.org/T327057 (dcaro) In progress→Resolved
[13:55:03] !log admin fran@wmf3169 END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) (T341285)
[13:55:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:55:09] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[14:01:37] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar
[14:08:26] tool-wscontest, Patch-For-Review: Incorrect stats on landing page - https://phabricator.wikimedia.org/T348210 (PMenon-WMF) Made a [[ https://github.com/wikisource/wscontest/pull/68 | PR ]] here. This stat is still a //bit// ambiguous, but it's still a good quick fix!
[14:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[14:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[15:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[15:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[15:23:41] Tool-bub2, Outreach-Programs-Projects, Outreachy (Round 27): Change UploadedItems.js component to stateless functional components - https://phabricator.wikimedia.org/T348416 (Spykelionel) Hello, I wish to work on this issue.
[15:26:11] !log admin fran@wmf3169 END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) (T341285)
[15:26:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[15:26:17] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[15:29:29] Toolforge (Toolforge iteration 00), Patch-For-Review: [tbs.build.logs] Show a more user-friendly error message when logs are not ready - https://phabricator.wikimedia.org/T341059 (dcaro) In progress→Stalled
[15:29:42] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[15:29:59] Tool-bub2, Outreach-Programs-Projects, Outreachy (Round 27): Change UploadedItems.js component to stateless functional components - https://phabricator.wikimedia.org/T348416 (Spykelionel) Alright. I am currently working on this.
[15:30:01] Toolforge (Toolforge iteration 00), Patch-For-Review, User-Raymond_Ndibe: toolforge build start: default to tailing the build as it progresses with the option of -d/--detached - https://phabricator.wikimedia.org/T340079 (dcaro) In progress→Stalled
[15:30:07] cloud-services-team (FY2022/2023-Q3): kolla-ansible poc - https://phabricator.wikimedia.org/T348457 (rook)
[15:30:30] cloud-services-team (FY2022/2023-Q3): ldap for kolla - https://phabricator.wikimedia.org/T348458 (rook)
[15:30:53] cloud-services-team (FY2022/2023-Q3): git repo for kolla - https://phabricator.wikimedia.org/T348459 (rook)
[15:31:14] cloud-services-team (FY2022/2023-Q3): git repo for kolla - https://phabricator.wikimedia.org/T348459 (rook) https://gitlab.wikimedia.org/repos/cloud/cloud-vps/cloud-deploy
[15:31:22] cloud-services-team (FY2022/2023-Q3): kolla-ansible poc - https://phabricator.wikimedia.org/T348457 (rook)
[15:31:24] cloud-services-team (FY2022/2023-Q3): git repo for kolla - https://phabricator.wikimedia.org/T348459 (rook) Open→Resolved
[15:31:46] cloud-services-team (FY2022/2023-Q3): kolla ceph integration - https://phabricator.wikimedia.org/T348460 (rook)
[15:32:00] cloud-services-team (FY2022/2023-Q3): bare metal deploy poc - https://phabricator.wikimedia.org/T348461 (rook)
[15:32:56] Toolforge (Toolforge iteration 00), Toolforge Build Service, cloud-services-team: toolsbeta harbor instance ran out of disk - https://phabricator.wikimedia.org/T348337 (dcaro) I think that there's a few things that can be done: * Increase the size of the volume * Reduce the disk usage: ** Reduce the...
[15:33:29] Toolforge (Toolforge iteration 00), Toolforge Build Service, cloud-services-team: toolsbeta harbor instance ran out of disk - https://phabricator.wikimedia.org/T348337 (dcaro) Note that in tools harbor the problem will exist too, also there the cleanup does not remove images as they are immutable.
[15:35:57] cloud-services-team (FY2022/2023-Q3): ldap for kolla - https://phabricator.wikimedia.org/T348458 (rook)
[15:36:21] cloud-services-team (FY2022/2023-Q3): kolla ceph integration - https://phabricator.wikimedia.org/T348460 (rook)
[15:36:48] cloud-services-team (FY2022/2023-Q3): bare metal deploy poc - https://phabricator.wikimedia.org/T348461 (rook)
[15:37:21] cloud-services-team (FY2022/2023-Q3): bare metal deploy poc - https://phabricator.wikimedia.org/T348461 (rook) Bare metal deploys are hindered by a lack of web access in prod. https://wikitech.wikimedia.org/wiki/HTTP_proxy may help.
[15:41:56] !log admin fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
[15:42:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[15:42:02] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[15:42:19] cloud-services-team, MediaWiki-Platform-Team: Get platform engineering team green light for Cloud NAT to wikis change - https://phabricator.wikimedia.org/T273738 (Aklapper) > we would like to get a green light from the Platform Engineering team. #platform_engineering does not exist anymore. Adding #Med...
[15:49:55] !log admin fran@wmf3169 END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) (T341285)
[15:50:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[15:50:01] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[15:50:11] PAWS: Upgrade openrefine to 3.7.6 - https://phabricator.wikimedia.org/T348464 (rook)
[15:50:41] PAWS: New upstream release 8.4.0 for Pywikibot - https://phabricator.wikimedia.org/T348372 (rook) Open→Resolved a:rook
[15:50:42] vivian-rook closed https://github.com/toolforge/paws/pull/337
[15:50:47] PAWS: New upstream release 8.4.0 for Pywikibot - https://phabricator.wikimedia.org/T348372 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/337
[16:13:21] vivian-rook opened https://github.com/toolforge/paws/pull/338
[16:13:21] PAWS: Upgrade openrefine to 3.7.6 - https://phabricator.wikimedia.org/T348464 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/338
[16:18:10] !log admin fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
[16:18:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:18:16] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[16:26:19] !log admin fran@wmf3169 END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) (T341285)
[16:26:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:26:25] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[16:33:54] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), Goal, Patch-For-Review: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285 (fnegri) `cloudservices200[45]-dev` have been upgraded. Puppet is not showing errors, but in both hosts it's showing a corrective...
[16:43:34] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[17:02:48] RECOVERY - Disk space on cloudbackup2001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudbackup2001&var-datasource=codfw+prometheus/ops
[17:13:59] cloud-services-team (FY2022/2023-Q3): bare metal deploy poc - https://phabricator.wikimedia.org/T348461 (rook)
[17:15:13] cloud-services-team (FY2022/2023-Q3): kolla-ansible poc - https://phabricator.wikimedia.org/T348457 (rook)
[17:16:18] !log admin dcaro@urcuchillay END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[17:16:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:20:22] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW
[17:20:23] cloud-services-team (FY2022/2023-Q3): bare metal deploy poc - https://phabricator.wikimedia.org/T348461 (rook)
[17:20:25] Tool-bub2, Outreach-Programs-Projects, Outreachy (Round 27): Change Books.js component to React hooks component - https://phabricator.wikimedia.org/T348414 (Ademola04) Hello i'm an outreachy applicant i would like to work on this issue
[17:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[17:25:58] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (Audiodude) Looking at that wiki page I linked, it seems at least somewhat out of date. I'd like to work on upgrading Python to at least 3.11, since 3.7 is EOL since June of 2023. Of course this might require upgrading dependencies...
[17:30:50] cloud-services-team: kolla ceph integration - https://phabricator.wikimedia.org/T348460 (Aklapper) Removing tag for a calendar quarter in Jan-Mar 2023; adding generic #cloud-services-team as no tag seems to exist for the current quarter (FY2023/24-Q2).
[17:30:53] cloud-services-team: ldap for kolla - https://phabricator.wikimedia.org/T348458 (Aklapper) Removing tag for a calendar quarter in Jan-Mar 2023; adding generic #cloud-services-team as no tag seems to exist for the current quarter (FY2023/24-Q2).
[17:30:57] cloud-services-team: bare metal deploy poc - https://phabricator.wikimedia.org/T348461 (Aklapper) Removing tag for a calendar quarter in Jan-Mar 2023; adding generic #cloud-services-team as no tag seems to exist for the current quarter (FY2023/24-Q2).
[17:30:59] cloud-services-team: kolla-ansible poc - https://phabricator.wikimedia.org/T348457 (Aklapper) Removing tag for a calendar quarter in Jan-Mar 2023; adding generic #cloud-services-team as no tag seems to exist for the current quarter (FY2023/24-Q2).
[17:32:07] cloud-services-team: [research] kolla-ansible poc - https://phabricator.wikimedia.org/T348457 (rook)
[17:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[18:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[18:02:00] Tool-bub2: Switching Header.js to Functional stateless components creates an application error because withSession.js HOC depends expects a Class component - https://phabricator.wikimedia.org/T348471 (Spykelionel)
[18:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[18:02:54] Tool-bub2: Switching Header.js to Functional stateless components creates an application error because withSession.js HOC expects a Class component - https://phabricator.wikimedia.org/T348471 (Spykelionel)
[18:11:28] Tool-bub2: Switching Header.js to Functional stateless components creates an application error because withSession.js HOC expects a Class component - https://phabricator.wikimedia.org/T348471 (Spykelionel) I wish to work on this issue if accepted as part my Outreachy contribution task.
[18:13:44] MediaWiki-extensions-OpenStackManager, cloud-services-team, wikitech.wikimedia.org, MW-1.35-notes (1.35.0-wmf.8; 2019-11-26): Remove OpenStackManager from Wikitech - https://phabricator.wikimedia.org/T161553 (Pppery)
[18:18:01] MediaWiki-extensions-OpenStackManager, wikitech.wikimedia.org: Update wikitech customised shell account name registration instructions - https://phabricator.wikimedia.org/T88092 (Pppery) Open→Declined Developer account creation is no longer done on Wikitech. Closing as obsolete.
[18:18:19] cloud-services-team (Kanban), wikitech.wikimedia.org, User-bd808: Update messages on Wikitech account creation screen - https://phabricator.wikimedia.org/T190412 (Pppery)
[19:09:42] RECOVERY - Check unit status of backup_cinder_volumes on cloudbackup2001 is OK: OK: Status of the systemd unit backup_cinder_volumes https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[19:29:42] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[19:32:02] Quarry: git-crypt for config.yaml files - https://phabricator.wikimedia.org/T348476 (rook)
[19:32:36] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (rook) >>! In T348184#9233043, @SD0001 wrote: > @rook Are there any docs on how to do deployments once a GitHub PR gets merged? The document you found describes the process. https://wikitech.wikimedia.org/wiki/Portal:Data_Services...
[19:58:32] (OpenstackAPIResponse) firing: (4) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[20:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[20:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[21:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[21:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[21:34:22] cloud-services-team, superset.wmcloud.org, MediaWiki-extensions-OAuth, Security-Team, and 3 others: Superset shows me logged in as another user - https://phabricator.wikimedia.org/T336994 (sbassett) Open→Invalid p:Triage→Low
[21:47:30] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (Audiodude) Thank you for all the information, it is very helpful! We can stick to asynchronous communication if that's what works best, no problem. I guess we can keep using this ticket for Q&A? Anyways looking at T301469, anothe...
[21:51:09] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (rook) Ah yes when the k8s investigation ticket was opened the quarry source was hosted in Gerrit. The source has since moved to GitHub and GitHub would be the correct place to do development. I can add some container building logi...
[22:45:48] Quarry, Patch-For-Review: investigate quarry on k8s - https://phabricator.wikimedia.org/T301469 (Audiodude) I'm completely new to Kubernetes but have been reading through https://wikitech.wikimedia.org/wiki/Kubernetes/Kubernetes_Workshop. Does WM Cloud provide k8s clusters, or is it expected that we woul...
[23:26:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[23:29:42] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[23:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[23:58:32] (OpenstackAPIResponse) firing: (4) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse