[00:09:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:19:03] (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[02:04:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[03:49:25] (OpenstackAPIResponse) firing: (6) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:39:25] (OpenstackAPIResponse) firing: (6) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:48:35] (OpenstackAPIResponse) firing: (6) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[05:03:35] (OpenstackAPIResponse) firing: (5) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[05:04:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[05:12:59] Tool-Pageviews, Data-Engineering, Data Products (Sprint 02): Mediarequests returning "file not found" for filenames with specific characters - https://phabricator.wikimedia.org/T347899 (SGupta-WMF) Upon investigation , we concluded that this is a bug in AQS 2.0 media analytics service . It's missing...
[05:44:25] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[06:43:35] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[06:44:25] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[06:48:35] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[06:59:25] (OpenstackAPIResponse) resolved: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[07:10:17] Tool-bub2, Outreach-Programs-Projects, Outreachy (Round 27): Use SMTP to send Emails to the users - https://phabricator.wikimedia.org/T338267 (Pamisijohn) Hello @wassan.anmol117, would this task be renamed back to //Use API:EmailUser to send Emails to the users// from //Use SMTP to send Emails to the...
[07:41:33] Toolforge (Toolforge iteration 01), Patch-For-Review: [tbs][builder] Refactor task yaml template - https://phabricator.wikimedia.org/T348750 (CodeReviewBot) sstefanova merged https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/17 dev: Refactor shell scripts
[08:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[08:16:00] Toolforge (Toolforge iteration 01): [tbs][builder] Refactor task yaml template - https://phabricator.wikimedia.org/T348750 (Slst2020) In progress→Resolved
[08:18:42] Tool-Pageviews, Data-Engineering, Data Products (Sprint 02): Mediarequests returning "file not found" for filenames with specific characters - https://phabricator.wikimedia.org/T347899 (Sfaci) Hi, In the description of this ticket there is a list with some items and the text "is correct" or "is not...
[08:21:05] Toolforge (Toolforge iteration 01), Patch-For-Review: [tbs][builder] Add shellcheck to pre-commit - https://phabricator.wikimedia.org/T348961 (CodeReviewBot) sstefanova opened https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/18 dev: add shellcheck
[08:52:58] Cloud-VPS, cloud-services-team, Observability-Metrics, User-fgiunchedi: Move labs/wmcs (OpenStack) Prometheus instance off cloudmetrics hosts to prometheus* hosts - https://phabricator.wikimedia.org/T336854 (taavi) >>! In T336854#9255656, @cmooney wrote: > There is an option to control what IP th...
[09:47:10] siddharthvp opened https://github.com/toolforge/quarry/pull/29
[09:53:09] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): Fix preview for books with long description - https://phabricator.wikimedia.org/T348411 (Akanksha.t05) Made pr on github - https://github.com/coderwassananmol/BUB2/pull/225
[09:54:51] Cloud Services Proposals, Toolforge: Decision request - Toolforge external infrastructure domain usage - https://phabricator.wikimedia.org/T306039 (dcaro) @taavi we can follow the decision making process for this, that will get you both exposure and a resolution https://www.mediawiki.org/wiki/Wikimedia_C...
[09:54:56] Quarry, Toolforge, cloud-services-team (FY2023/2024-Q1): Create db user for Quarry with readonly access to public ToolsDB databases - https://phabricator.wikimedia.org/T348407 (fnegri) a:fnegri I think the best solution here (both for security and performance) is to let Quarry connect to the read...
[10:15:56] Cloud-VPS, Infrastructure-Foundations, SRE, netops, and 2 others: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (cmooney)
[10:28:28] Grid-Engine-to-K8s-Migration, MediaWiki-Platform-Team: Migrate ruprecht from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320021 (Aklapper) Boldly adding #mediawiki-platform-team to open Ruprecht tasks as https://toolsadmin.wikimedia.org/tools/id/ruprecht does not pro...
[10:37:34] Quarry, Toolforge, cloud-services-team (FY2023/2024-Q1): Create db user for Quarry with readonly access to public ToolsDB databases - https://phabricator.wikimedia.org/T348407 (SD0001) >>! In T348407#9257164, @fnegri wrote: > but I also want to consider another thing before opening this access: do we...
[10:40:47] Toolforge, cloud-services-team (FY2023/2024-Q1), Goal: Upgrade Toolforge Kubernetes to version 1.23 - https://phabricator.wikimedia.org/T298005 (taavi)
[11:06:36] vivian-rook closed https://github.com/toolforge/quarry/pull/29
[11:11:33] Cloud-VPS (Quota-requests), Infrastructure-Foundations, Puppet CI: Request Addtional resources for puppet-diffs project - https://phabricator.wikimedia.org/T349006 (taavi) +1
[11:14:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[11:21:13] (DiskSpace) firing: Disk space cloudbackup1004:9100:/ 5.494% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[12:39:04] !log puppet-diffs dcaro@urcuchillay START - Cookbook wmcs.openstack.quota_increase (T349006)
[12:39:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet-diffs/SAL
[12:39:09] T349006: Request Addtional resources for puppet-diffs project - https://phabricator.wikimedia.org/T349006
[12:39:28] !log puppet-diffs dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T349006)
[12:39:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet-diffs/SAL
[12:43:26] Cloud-VPS (Quota-requests), Infrastructure-Foundations, Puppet CI: Request Addtional resources for puppet-diffs project - https://phabricator.wikimedia.org/T349006 (dcaro) Open→Resolved a:dcaro Done, let me know if you have any issues :)
[12:59:06] Cloud-VPS, Infrastructure-Foundations, SRE, netops, and 2 others: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (cmooney) Stab in the dark guessing what commands are needed in codfw, based on man page and some guides (including info Artur...
[13:27:21] Cloud-VPS, cloud-services-team, Sustainability (Incident Followup): Add external meta-monitoring for metricsinfra - https://phabricator.wikimedia.org/T288053 (taavi) a:taavi I am going to be working on this. The general plan is that there'll be a VM in metricsinfra that hosts a toolschecker-style...
[13:44:30] Toolforge (Toolforge iteration 01): Upgrade harbor - https://phabricator.wikimedia.org/T346241 (Slst2020)
[13:50:14] Toolforge (Toolforge iteration 01): Upgrade harbor - https://phabricator.wikimedia.org/T346241 (Slst2020) Upgrading to 2.9 directly works fine locally (lima-kilo/vagrant). There seem to be no breaking changes or deprecations that affect us, so I think I'll YOLO it without first upgrading to 2.7 in between.
[14:14:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[14:43:23] Cloud-VPS, Infrastructure-Foundations, SRE, netops, and 2 others: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (cmooney) Ok we seem to have muddled through, for the record commands needed as follows: ` wmcs-openstack port unset 1290224c-...
[14:55:54] Tool-Pageviews, Data-Engineering, Data Products (Sprint 02): Mediarequests returning "file not found" for filenames with specific characters - https://phabricator.wikimedia.org/T347899 (Ladsgroup) >>! In T347899#9256819, @Sfaci wrote: > Hi, > > In the description of this ticket there is a list with...
[14:58:25] Cloud-VPS, Infrastructure-Foundations, SRE, netops, and 2 others: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (dcaro) Note that we have to merge and deploy this first: https://gerrit.wikimedia.org/r/c/operations/puppet/+/965708
[15:00:20] cloud-services-team (Hardware), SRE, ops-codfw, User-dcaro: cloud: prepare codfw for expansion (racks, switches, ceph) - https://phabricator.wikimedia.org/T346661 (nskaggs) @Papaul Is it possible there is 1 more rack that could be dedicated in the current setup (so 2 total WMCS racks, 1 existing...
[15:16:30] PROBLEM - Host cloudvirt1051 is DOWN: PING CRITICAL - Packet loss = 100%
[15:17:45] (ProbeDown) firing: (2) Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:20:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-acme-chief-01 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:21:03] (InstanceDown) firing: (2) Project cloudinfra instance cloud-puppetmaster-03 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:21:03] (InstanceDown) firing: (6) Project tools instance tools-k8s-haproxy-4 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:21:13] (DiskSpace) firing: Disk space cloudbackup1004:9100:/ 5.849% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[15:21:25] (NodeDown) firing: The node cloudvirt1051 is unreachable. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1051 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[15:21:25] (NodeDown) firing: #page The cloudvirt node cloudvirt1051 is unreachable. This is a
[15:21:30] cloud-services-team: NodeDown - https://phabricator.wikimedia.org/T349109 (phaultfinder)
[15:23:49] Cloud-VPS, cloud-services-team: NodeDown - https://phabricator.wikimedia.org/T349109 (taavi) p:Triage→Unbreak! a:taavi
[15:24:13] Cloud-VPS, cloud-services-team: NodeDown - https://phabricator.wikimedia.org/T349109 (taavi) affected VMs: {P52997}
[15:25:03] (InstanceDown) firing: (2) Project toolsbeta instance toolsbeta-acme-chief-01 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:25:52] PROBLEM - toolschecker: All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 177 bytes in 0.114 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[15:28:23] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain (T349109)
[15:28:28] T349109: NodeDown - https://phabricator.wikimedia.org/T349109
[15:29:03] !log taavi@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) (T349109)
[15:41:06] Cloud-VPS, cloud-services-team: NodeDown - https://phabricator.wikimedia.org/T349109 (taavi) ` mysql:root@localhost [nova_eqiad1]> update instances set host = 'cloudvirt1058' where uuid = '1c3e4b8a-9076-4c8c-b2e6-51606c0b1fb8'; taavi@cloudcontrol1006 ~ $ sudo OS_PROJECT_ID=tools wmcs-openstack server re...
[15:41:57] Tool-Pageviews, Data-Engineering, Data Products (Sprint 02): Mediarequests returning "file not found" for filenames with specific characters - https://phabricator.wikimedia.org/T347899 (Sfaci) Just wondering, for example, why this item `File:)(_-_Flickr_-_Time.Captured..jpg` is included as "is not co...
[15:42:22] Grid-Engine-to-K8s-Migration, MediaWiki-Engineering: Migrate ruprecht from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320021 (larissagaulia)
[15:42:45] (ProbeDown) resolved: (2) Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:44:33] Cloud-VPS, cloud-services-team: NodeDown - https://phabricator.wikimedia.org/T349109 (taavi) ` mysql:root@localhost [nova_eqiad1]> update instances set host = 'cloudvirt1058' where host = 'cloudvirt1051' and deleted = 0; Query OK, 25 rows affected (0.007 sec) Rows matched: 25 Changed: 25 Warnings: 0 m...
[15:44:46] RECOVERY - Host cloudvirt1051 is UP: PING OK - Packet loss = 0%, RTA = 8.48 ms
[15:46:03] (InstanceDown) firing: (6) Project tools instance tools-k8s-haproxy-4 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:46:25] (NodeDown) resolved: The node cloudvirt1051 is unreachable. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1051 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[15:46:25] (NodeDown) resolved: #page The cloudvirt node cloudvirt1051 is unreachable. This is a
[15:46:46] PROBLEM - ensure kvm processes are running on cloudvirt1051 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:48:03] (WidespreadPuppetAgentFailure) firing: Widespread puppet agent failures in project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[15:49:08] Cloud-VPS, cloud-services-team: NodeDown - https://phabricator.wikimedia.org/T349109 (taavi) ` select group_concat(CONCAT("sudo OS_PROJECT_ID=", project_id, " wmcs-openstack server reboot ", hostname, " --hard") SEPARATOR "\n") from instances where host = 'cloudvirt1058' and deleted = 0; `
[15:50:03] (InstanceDown) firing: (2) Project toolsbeta instance toolsbeta-acme-chief-01 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:51:03] (InstanceDown) resolved: (2) Project cloudinfra instance cloud-puppetmaster-03 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:51:03] (InstanceDown) resolved: (6) Project tools instance tools-k8s-haproxy-4 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:52:52] RECOVERY - toolschecker: All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 0.113 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[15:55:03] (InstanceDown) resolved: (2) Project toolsbeta instance toolsbeta-acme-chief-01 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:56:55] Cloud-VPS, cloud-services-team: NodeDown - https://phabricator.wikimedia.org/T349109 (taavi) ` mysql:root@localhost [cinder]> select nova_eqiad1.instances.uuid as instance_uuid, -> volume_attachment.volume_id, volumes.status, -> volume_attachment.attach_status, volume_attachment.mou...
[16:01:41] Cloud-VPS, cloud-services-team: NodeDown - https://phabricator.wikimedia.org/T349109 (taavi) cloudvirt1051 moved from `ceph` aggregate to `maintenance`, cloudvirt1058 moved from `spare` to `ceph`
[16:02:30] Cloud-VPS, cloud-services-team: NodeDown - https://phabricator.wikimedia.org/T349109 (taavi) p:Unbreak!→High
[16:03:03] (PuppetAgentFailure) firing: Puppet agent failure detected on instance toolsbeta-docker-registry-02 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[16:05:03] (PuppetAgentNoResources) firing: No Puppet resources found on instance quarry-worker-03 on project quarry - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[16:08:03] (WidespreadPuppetAgentFailure) resolved: Widespread puppet agent failures in project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[16:18:03] (PuppetAgentFailure) resolved: Puppet agent failure detected on instance toolsbeta-docker-registry-02 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[16:20:03] (PuppetAgentNoResources) resolved: No Puppet resources found on instance quarry-worker-03 on project quarry - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[16:23:23] Quarry, Toolforge, cloud-services-team (FY2023/2024-Q1): Create db user for Quarry with readonly access to public ToolsDB databases - https://phabricator.wikimedia.org/T348407 (fnegri) > Only the public tool databases (the ones with names ending in _p) are planned to be made accessible from Quarry....
[16:28:24] Cloud-VPS, cloud-services-team: NodeDown - https://phabricator.wikimedia.org/T349109 (dcaro) From the web console logs, it just says it was turned off: ` 2023-10-17 15:41:50 PSU0800 Power Supply 1: Status = 0x1, IOUT = 0x0, VOUT= 0x0, TEMP= 0x0, FAN = 0x0, INPUT= 0x0. -------- 2023-10-17 16:14:23...
[17:11:13] (DiskSpace) resolved: Disk space cloudbackup1004:9100:/ 5.894% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[17:14:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[18:39:17] (PS1) Andrew Bogott: container panel: add a policy check [openstack/horizon/horizon] (2023.1) - https://gerrit.wikimedia.org/r/966599 (https://phabricator.wikimedia.org/T348885)
[18:45:17] (CR) Andrew Bogott: [V: +2 C: +2] container panel: add a policy check [openstack/horizon/horizon] (2023.1) - https://gerrit.wikimedia.org/r/966599 (https://phabricator.wikimedia.org/T348885) (owner: Andrew Bogott)
[19:11:12] Tool-bub2, Outreach-Programs-Projects, Outreachy (Round 27): Change Header.js component to React hooks component - https://phabricator.wikimedia.org/T348415 (wassan.anmol117) Open→Resolved
[19:11:52] Tool-bub2, Outreach-Programs-Projects, Outreachy (Round 27): Change Books.js component to React hooks component - https://phabricator.wikimedia.org/T348414 (wassan.anmol117) Open→Resolved
[19:12:58] PROBLEM - puppet last run on cloudcontrol1005 is CRITICAL: CRITICAL: Puppet last ran 23 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[19:14:10] PROBLEM - puppet last run on cloudcontrol1006 is CRITICAL: CRITICAL: Puppet last ran 23 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[19:16:24] PROBLEM - puppet last run on cloudcontrol1007 is CRITICAL: CRITICAL: Puppet last ran 23 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[19:23:26] Tool-Pageviews, Data-Engineering, Data Products (Sprint 02): Mediarequests returning "file not found" for filenames with specific characters - https://phabricator.wikimedia.org/T347899 (Ladsgroup) I don't know that file (I didn't report it) so I can't say it was among the incorrect ones or not. The...
[19:45:41] Quarry, cloud-services-team: Should quarry use our standard secrets management - https://phabricator.wikimedia.org/T290184 (rook)
[19:45:51] Quarry: git-crypt for config.yaml files - https://phabricator.wikimedia.org/T348476 (rook)
[19:47:10] Quarry, Toolforge, cloud-services-team (FY2023/2024-Q1): Create db user for Quarry with readonly access to public ToolsDB databases - https://phabricator.wikimedia.org/T348407 (nskaggs) Is this a feature we want to also make accessible to Superset? I suspect it could use the same technical implementa...
[19:48:29] Quarry: Deduplicate config load - https://phabricator.wikimedia.org/T349135 (rook)
[19:51:24] Quarry, Toolforge, cloud-services-team (FY2023/2024-Q1): Create db user for Quarry with readonly access to public ToolsDB databases - https://phabricator.wikimedia.org/T348407 (rook) >>! In T348407#9259469, @nskaggs wrote: > Is this a feature we want to also make accessible to Superset? I suspect it...
[20:14:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[21:33:31] Cloud-VPS, Data-Services, cloud-services-team, User-Marostegui: Horizon Object Storage UI should not display for readers - https://phabricator.wikimedia.org/T348885 (Andrew) Open→Resolved
[21:33:33] Cloud-VPS, Data-Services, cloud-services-team, User-Marostegui: Support Openstack Swift APIs via the radosgw - https://phabricator.wikimedia.org/T276961 (Andrew)
[23:14:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed