[00:11:12] Striker, Patch-For-Review, Release-Engineering-Team (Social Piranhas 🐟): Striker-created Diffusion mirrors of GitLab repos are empty (due to master vs main branch name mismatch) - https://phabricator.wikimedia.org/T348131 (Aklapper) Thanks! Note to myself: I found at least [one toolforge-repos projec...
[02:19:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[02:54:50] (PuppetAgentDisabled) firing: Puppet agent disabled on instance quarry-web-02 in project quarry - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentDisabled
[05:19:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[05:54:50] (PuppetAgentDisabled) firing: Puppet agent disabled on instance quarry-web-02 in project quarry - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentDisabled
[07:04:50] (PuppetAgentDisabled) resolved: Puppet agent disabled on instance quarry-web-02 in project quarry - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentDisabled
[07:50:48] (CR) David Caro: general: move to spicerack>8 (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/967244 (https://phabricator.wikimedia.org/T348726) (owner: David Caro)
[07:52:28] (PS4) David Caro: general: move to spicerack>8 [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/967244
[07:53:55] (PS5) David Caro: general: move to spicerack>8 [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/967244
[07:54:50] (PS2) David Caro: mypy: skip build directory [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/966132
[07:54:56] (PS2) David Caro: alerts: don't fail if host already downtimed or uptimed [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/966133
[07:55:00] (PS2) David Caro: openstack: don't pass the new project when creating it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/966134 (https://phabricator.wikimedia.org/T346427)
[07:55:05] (PS2) David Caro: ceph: Adapt to multi-level crush tree [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/966135 (https://phabricator.wikimedia.org/T331145)
[07:55:09] (PS2) David Caro: ceph: add drain/undrain host and rack cookbooks [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/966136 (https://phabricator.wikimedia.org/T329709)
[07:57:57] (CR) CI reject: [V: -1] alerts: don't fail if host already downtimed or uptimed [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/966133 (owner: David Caro)
[07:58:03] (CR) CI reject: [V: -1] openstack: don't pass the new project when creating it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/966134 (https://phabricator.wikimedia.org/T346427) (owner: David Caro)
[07:58:09] (CR) CI reject: [V: -1] ceph: Adapt to multi-level crush tree [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/966135 (https://phabricator.wikimedia.org/T331145) (owner: David Caro)
[07:58:18] (CR) CI reject: [V: -1] ceph: add drain/undrain host and rack cookbooks [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/966136 (https://phabricator.wikimedia.org/T329709) (owner: David Caro)
[08:11:02] cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, Epic, and 2 others: Streamline WMCS Alerting and Paging - https://phabricator.wikimedia.org/T313444 (dcaro)
[08:11:04] cloud-services-team, Patch-For-Review, User-dcaro, User-fgiunchedi: [wmcs][alerting] Allow volunteer admins silencing alerts from cloudvps/toolforge/paws/quarry - https://phabricator.wikimedia.org/T320973 (dcaro) In progress→Stalled
[08:11:23] cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [wmcs][alerting] Integrate metricsinfra alertmanager with victorops - https://phabricator.wikimedia.org/T323713 (dcaro) Open→Resolved
[08:11:26] cloud-services-team, Patch-For-Review, User-dcaro, User-fgiunchedi: [wmcs][alerting] Allow volunteer admins silencing alerts from cloudvps/toolforge/paws/quarry - https://phabricator.wikimedia.org/T320973 (dcaro)
[08:11:28] cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, Epic, and 2 others: Streamline WMCS Alerting and Paging - https://phabricator.wikimedia.org/T313444 (dcaro)
[08:11:37] cloud-services-team: cloudgw improvements - https://phabricator.wikimedia.org/T347469 (dcaro)
[08:11:49] cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, User-dcaro: [ceph] export number of bad sectors per-disk - https://phabricator.wikimedia.org/T348716 (dcaro) Open→In progress
[08:12:32] cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Unplanned, User-dcaro: [cloudvps] grafana stats for haproxy response time give different data on refresh - https://phabricator.wikimedia.org/T343872 (dcaro) Open→Resolved
[08:12:43] Cloud-VPS, Infrastructure-Foundations, SRE, netops, and 2 others: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (dcaro) Open→In progress
[08:19:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[08:28:50] cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, User-dcaro: [ceph] export number of bad sectors per-disk - https://phabricator.wikimedia.org/T348716 (dcaro) It will take at least 6 days to get any predictions: Oct 20 08:24:34 cloudcephmon2005-dev ceph-mgr[174...
[08:48:37] (CephClusterInWarning) firing: Ceph cluster in is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[08:49:55] cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, User-dcaro: [ceph.eqiad] upgrade all hosts to 15 - https://phabricator.wikimedia.org/T349363 (dcaro) p:Triage→High
[08:50:02] cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, User-dcaro: [ceph.eqiad] upgrade all hosts to 15 - https://phabricator.wikimedia.org/T349363 (dcaro) Open→In progress
[08:50:13] cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, User-dcaro: [ceph.eqiad] upgrade all hosts to 15 - https://phabricator.wikimedia.org/T349363 (dcaro) Upgraded osd.50...
[09:14:50] Cloud-VPS, cloud-services-team: cloud/instance-puppet.git updater is broken - https://phabricator.wikimedia.org/T349195 (taavi) Open→Resolved
[09:28:27] (CR) FNegri: [C: +1] general: move to spicerack>8 (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/967244 (owner: David Caro)
[09:33:13] (DiskSpace) firing: Disk space cloudbackup1004:9100:/ 5.95% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[09:40:21] Tool-bub2, Outreach-Programs-Projects, Outreachy (Round 27): Use SMTP to send Emails to the users - https://phabricator.wikimedia.org/T338267 (Ibinaboadiela) Good day everyone, I have made a pull request for this task. Please @wassan.anmol117 kindly review 🙏 Here's the link: https://github.com/coderw...
[10:24:56] PROBLEM - Check systemd state on cloudcephosd1001 is CRITICAL: CRITICAL - degraded: The following units failed: ceph-osd@5.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:39:06] (CR) David Caro: [C: +2] general: move to spicerack>8 [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/967244 (owner: David Caro)
[10:42:51] (Merged) jenkins-bot: general: move to spicerack>8 [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/967244 (owner: David Caro)
[10:45:16] Toolforge (Toolforge iteration 01): [tbs][harbor] Improve Harbor admin docs - https://phabricator.wikimedia.org/T349313 (dcaro) sketch of the upgrade process: === Migrate config and upload to puppet === === prepare script === Get the newer version 'online' install scripts, that will pull a zip with a `prep...
[10:58:47] Striker, Release-Engineering-Team (Social Piranhas 🐟): Striker-created Diffusion mirrors of GitLab repos are empty (due to master vs main branch name mismatch) - https://phabricator.wikimedia.org/T348131 (Aklapper) Open→Resolved
[10:59:20] RECOVERY - Check systemd state on cloudcephosd1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:19:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[11:24:55] (PS2) FNegri: live_upgrade_openstack: add runtime description [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/964872
[11:51:18] Tool-bub2, Outreach-Programs-Projects, Outreachy (Round 27): Use API:EmailUser to send Emails to the users - https://phabricator.wikimedia.org/T338267 (wassan.anmol117)
[12:03:01] Tool-Pageviews, Data-Engineering, Data Products (Sprint 02), Patch-For-Review: Mediarequests returning "file not found" for filenames with specific characters - https://phabricator.wikimedia.org/T347899 (Sfaci) a:SGupta-WMF→EChukwukere-WMF
[12:32:15] Tool-Pageviews, Data-Engineering, Data Products (Sprint 02), Patch-For-Review: Mediarequests returning "file not found" for filenames with specific characters - https://phabricator.wikimedia.org/T347899 (Lokal_Profil) >>! In T347899#9258560, @Sfaci wrote: > Just wondering, for example, why this i...
[12:41:32] Cloud-VPS (Project-requests): Request creation of catalyst-qte-admin VPS project - https://phabricator.wikimedia.org/T349378 (Slst2020)
[12:44:02] Cloud-VPS (Project-requests): Request creation of catalyst-qte-admin VPS project - https://phabricator.wikimedia.org/T349378 (Slst2020)
[12:46:07] Toolforge (Toolforge iteration 01), Patch-For-Review: Upgrade harbor from 2.5 to 2.9 - https://phabricator.wikimedia.org/T346241 (Slst2020)
[12:47:38] Toolforge (Toolforge iteration 01): [tbs][harbor] Improve Harbor admin docs - https://phabricator.wikimedia.org/T349313 (Slst2020) a:Slst2020
[12:51:17] Tool-Pageviews, Data-Engineering, Data Products (Sprint 02), Patch-For-Review: Mediarequests returning "file not found" for filenames with specific characters - https://phabricator.wikimedia.org/T347899 (Sfaci) >>! In T347899#9268234, @Lokal_Profil wrote: >>>! In T347899#9258560, @Sfaci wrote: >>...
[12:58:01] (PS1) Majavah: vps: create_project: ask before using possibly-problematic name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/967443
[13:01:33] (CR) CI reject: [V: -1] vps: create_project: ask before using possibly-problematic name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/967443 (owner: Majavah)
[13:05:53] (PS2) Majavah: vps: create_project: ask before using possibly-problematic name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/967443
[13:07:51] (PS1) Jbond: add bin file [labs/private] - https://gerrit.wikimedia.org/r/967446
[13:19:25] (CR) David Caro: live_upgrade_openstack: add runtime description (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/964872 (owner: FNegri)
[13:19:40] (CR) David Caro: live_upgrade_openstack: add runtime description (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/964872 (owner: FNegri)
[13:22:57] (CR) Majavah: [C: +1] live_upgrade_openstack: add runtime description (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/964872 (owner: FNegri)
[13:24:28] (CR) David Caro: [C: +1] "LGTM" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/967443 (owner: Majavah)
[13:26:24] (PS3) Majavah: vps: create_project: ask before using possibly-problematic name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/967443
[13:27:02] (CR) Majavah: [C: +2] vps: create_project: ask before using possibly-problematic name (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/967443 (owner: Majavah)
[13:31:08] (Merged) jenkins-bot: vps: create_project: ask before using possibly-problematic name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/967443 (owner: Majavah)
[13:31:39] Toolforge (Toolforge iteration 01): [tbs][harbor] Improve Harbor admin docs - https://phabricator.wikimedia.org/T349313 (Slst2020) @dcaro Added to https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Harbor. Can you review?
[13:33:14] (DiskSpace) firing: Disk space cloudbackup1004:9100:/ 5.974% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[13:36:12] Toolforge, Patch-For-Review: Turn wmcs-k8s-node-upgrade.py into a set of cookbooks - https://phabricator.wikimedia.org/T343869 (taavi) Open→Resolved
[13:39:33] Toolforge (Toolforge iteration 01): [tbs][harbor] Improve Harbor admin docs - https://phabricator.wikimedia.org/T349313 (dcaro) Done, looks good :), tweaked a couple little things
[13:42:53] (CR) FNegri: [C: +2] live_upgrade_openstack: add runtime description (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/964872 (owner: FNegri)
[13:44:12] Toolforge (Toolforge iteration 01): [tbs][harbor] Improve Harbor admin docs - https://phabricator.wikimedia.org/T349313 (Slst2020) >>! In T349313#9268435, @dcaro wrote: > Done, looks good :), tweaked a couple little things Thanks!
[13:46:18] Cloud-VPS (Project-requests): Request creation of catalyst-qte-admin VPS project - https://phabricator.wikimedia.org/T349378 (dcaro) +1
[13:50:48] Cloud-VPS (Project-requests): Request creation of catalyst VPS project - https://phabricator.wikimedia.org/T349378 (Slst2020)
[13:50:52] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudcontrol100[8-10]-dev cloudnet100[7-8]-dev - https://phabricator.wikimedia.org/T342455 (cmooney) @Jclark-ctr these hosts are for a new proof-of-concept cloud openstack deployment. As such the [[ https://wikitech.wi...
[13:51:30] Cloud-VPS (Project-requests): Request creation of catalyst VPS project - https://phabricator.wikimedia.org/T349378 (Slst2020) renaming to just "catalyst" from "catalyst-qte-admin" for simplicity and to avoid using dashes.
[13:53:24] Cloud-VPS (Project-requests): Request creation of catalyst VPS project - https://phabricator.wikimedia.org/T349378 (taavi) +1
[13:57:30] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudcontrol100[8-10]-dev cloudnet100[7-8]-dev - https://phabricator.wikimedia.org/T342455 (cmooney) >>! In T342455#9268504, @cmooney wrote: > @Jclark-ctr these hosts are for a new proof-of-concept cloud openstack deplo...
[14:02:57] (CR) David Caro: [C: -1] "We moved this to gitlab, I think I missed it when I did the move, can you migrate it if needed? thanks!" [software/tools-webservice] - https://gerrit.wikimedia.org/r/929009 (owner: Majavah)
[14:03:42] (Abandoned) David Caro: tests/test_job: test job from_k8s_object() parsing routine [cloud/toolforge/jobs-framework-api] - https://gerrit.wikimedia.org/r/908791 (owner: Arturo Borrero Gonzalez)
[14:09:13] Toolforge, cloud-services-team, Acme-chief: toolforge acme-chief: Failed to generate additional resources using 'eval_generate': Could not intern_multiple from application/json: 416: unexpected token at '{"checksum":{"type":"md5","val' - https://phabricator.wikimedia.org/T349384 (taavi)
[14:13:57] (PS3) FNegri: live_upgrade_openstack: add runtime description [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/964872
[14:19:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[15:23:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-puppetdb-02 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:28:03] (InstanceDown) resolved: Project toolsbeta instance toolsbeta-puppetdb-02 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:34:26] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[16:10:18] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), Patch-For-Review, User-dcaro: [wmcs-cookbooks] tox is failing - https://phabricator.wikimedia.org/T348726 (fnegri) In progress→Resolved
[16:13:13] Quarry, Toolforge, cloud-services-team (FY2023/2024-Q1): Create db user for Quarry with readonly access to public ToolsDB databases - https://phabricator.wikimedia.org/T348407 (fnegri) p:Triage→Medium There were no objections in the WMCS meeting, so we can proceed with creating the DNS record...
[16:13:22] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), User-dcaro: [wmcs-cookbooks] tox is failing - https://phabricator.wikimedia.org/T348726 (fnegri)
[16:13:38] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[16:19:35] Toolforge, cloud-services-team, Acme-chief: toolforge acme-chief: Failed to generate additional resources using 'eval_generate': Could not intern_multiple from application/json: 416: unexpected token at '{"checksum":{"type":"md5","val' - https://phabricator.wikimedia.org/T349384 (Vgutierrez) acme-chi...
[16:22:57] Toolforge (Quota-requests): Request increased quota for Montage Toolforge tool - https://phabricator.wikimedia.org/T348894 (fnegri) > I'd definitely be interested in increasing resources @taavi can I get your +1 on doubling the CPU and Memory quotas for this tool? > it occurs to me that too many workers co...
[16:24:32] Toolforge, cloud-services-team, Acme-chief: toolforge acme-chief: Failed to generate additional resources using 'eval_generate': Could not intern_multiple from application/json: 416: unexpected token at '{"checksum":{"type":"md5","val' - https://phabricator.wikimedia.org/T349384 (Vgutierrez) At the s...
[16:52:09] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad1 cluster to Antelope - https://phabricator.wikimedia.org/T348843 (fnegri) p:Triage→Medium
[16:54:25] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), Cumin, Infrastructure-Foundations, Patch-For-Review: [cumin] [openstack] Openstack backend fails when project is not set - https://phabricator.wikimedia.org/T346453 (fnegri) p:Medium→High
[16:58:35] Toolforge (Quota-requests): Request increased quota for Montage Toolforge tool - https://phabricator.wikimedia.org/T348894 (fnegri) p:Triage→Medium a:fnegri
[17:02:06] Tool-Pageviews, Data-Engineering, Data Products (Sprint 02), Patch-For-Review: Mediarequests returning "file not found" for filenames with specific characters - https://phabricator.wikimedia.org/T347899 (EChukwukere-WMF) Test status: //**QA PASS**// tested response and data ( compared with AQS 1...
[17:08:13] (DiskSpace) resolved: Disk space cloudbackup1004:9100:/ 5.951% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[17:19:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[17:31:38] cloud-services-team: cloudgw improvements - https://phabricator.wikimedia.org/T347469 (cmooney) p:High→Low
[17:33:08] cloud-services-team: cloudgw improvements - https://phabricator.wikimedia.org/T347469 (cmooney) Changed this to low priority for now. While in general BGP would be an improvement to VRRP on the cloudgw, the current setup in Eqiad is ok for now. There are no massive wins in moving to BGP, although it is a n...
[17:34:03] (PuppetAgentFailure) firing: Puppet agent failure detected on instance tools-sgeweblight-10-24 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[17:54:03] (PuppetAgentFailure) resolved: Puppet agent failure detected on instance tools-sgeweblight-10-24 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[19:22:57] (PS1) Dwisehaupt: Add dummy db password for community_civicrm [labs/private] - https://gerrit.wikimedia.org/r/967519 (https://phabricator.wikimedia.org/T343486)
[20:13:38] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[20:19:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[20:47:31] cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, User-dcaro: [ceph.eqiad] upgrade all hosts to 15 - https://phabricator.wikimedia.org/T349363 (dcaro) Finished, all eqiad ceph hosts are on 15.2.6 now: ` root@cloudcephmon1001:~# ceph versions {...
[20:47:36] cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, User-dcaro: [ceph.eqiad] upgrade all hosts to 15 - https://phabricator.wikimedia.org/T349363 (dcaro) In progress→Resolved
[23:19:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[23:53:26] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse