[00:10:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [01:10:03] (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [01:14:24] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [02:49:28] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [03:18:12] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [03:36:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [03:41:03] (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [03:43:12] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [03:50:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [04:00:03] (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [04:14:24] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [04:14:29] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [05:11:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [05:21:03] (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [06:52:46] 10Cloud-VPS (Project-requests), 10WMF-Communications: Request creation of FoundationMemory VPS project - https://phabricator.wikimedia.org/T350760 (10Varnent) [06:53:13] 10Cloud-VPS (Project-requests), 10WMF-Communications: Request creation of FoundationMemory VPS project - https://phabricator.wikimedia.org/T350760 (10Varnent) [07:00:10] (03PS1) 10Catrope: releases: Bump Codex to 1.0.1 [labs/libraryupgrader/config] - 10https://gerrit.wikimedia.org/r/972688 [07:04:11] RECOVERY - Check unit status of backup_cinder_volumes on cloudbackup2002 is OK: OK: Status of the systemd unit backup_cinder_volumes https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [07:11:11] PROBLEM - Check unit status of remove_dangling_cinder_snapshots on cloudbackup2002 is CRITICAL: CRITICAL: Status of the systemd unit remove_dangling_cinder_snapshots https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [07:14:24] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [07:18:12] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [07:38:31] (03CR) 10VolkerE: [C: 03+2] releases: Bump Codex to 1.0.1 [labs/libraryupgrader/config] - 10https://gerrit.wikimedia.org/r/972688 (owner: 10Catrope) [07:39:18] (03Merged) 10jenkins-bot: releases: Bump Codex to 1.0.1 [labs/libraryupgrader/config] - 10https://gerrit.wikimedia.org/r/972688 (owner: 10Catrope) [07:43:12] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [10:14:24] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [11:12:08] 10Cloud-VPS, 10cloud-services-team: Check if nfs-maps.wikimedia.org is still in use - https://phabricator.wikimedia.org/T350259 (10taavi) a:03taavi [11:14:56] (ToolsToolsDBWritableState) firing: There should be exactly one writable MariaDB instance instead of 0 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsToolsDBWritableState - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBWritableState [11:18:12] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [11:24:56] (ToolsToolsDBWritableState) resolved: There should be exactly one writable MariaDB instance instead of 0 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsToolsDBWritableState - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBWritableState [11:43:12] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [12:00:05] 10Data-Services, 10DBA, 10Data Engineering and Event Platform Team, 10Data-Platform-SRE: Prepare and check storage layer for fonwiki - https://phabricator.wikimedia.org/T347938 (10BTullis) 05Open→03Resolved a:03BTullis This is done now. Once again it took two executions of the cookbook, but the opens... [12:35:28] 10Wikibugs: Better message than "This change is ready for review" when patch stops being WIP - https://phabricator.wikimedia.org/T350778 (10jhsoby) [13:14:25] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [13:16:15] 10Cloud-VPS, 10cloud-services-team: Check if nfs-maps.wikimedia.org is still in use - https://phabricator.wikimedia.org/T350259 (10taavi) 05Open→03Resolved Removed from Netbox and DNS. Closing. [14:09:38] (03PS1) 10Brouberol: Index the data-engineering/airflow-tags gitlab repo [labs/codesearch] - 10https://gerrit.wikimedia.org/r/972829 (https://phabricator.wikimedia.org/T337468) [14:10:50] 10Cloud-VPS, 10cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fnegri@cumin1001 for host cloudservices1006.eqiad.wmnet with OS bookworm [14:13:17] (03CR) 10Ladsgroup: [C: 03+2] Index the data-engineering/airflow-tags gitlab repo [labs/codesearch] - 10https://gerrit.wikimedia.org/r/972829 (https://phabricator.wikimedia.org/T337468) (owner: 10Brouberol) [14:14:09] (03Merged) 10jenkins-bot: Index the data-engineering/airflow-tags gitlab repo [labs/codesearch] - 10https://gerrit.wikimedia.org/r/972829 (https://phabricator.wikimedia.org/T337468) (owner: 10Brouberol) [14:14:19] (HAProxyBackendUnavailable) firing: HAProxy service designate-api_backend backend cloudservices1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [14:15:08] 10VPS-project-Codesearch, 10Patch-For-Review: Please could we add the data-engineering/airflow_dags repo to codesearch? - https://phabricator.wikimedia.org/T337468 (10Ladsgroup) 05Open→03Resolved a:03brouberol It'll be accessible in a day or so. [14:17:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [14:21:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [14:24:25] (TfInfraTestApplyFailed) resolved: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [14:43:38] 10PAWS: Cannot deploy ingress-nginx chart 4.8.0 - https://phabricator.wikimedia.org/T347506 (10github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/346 [14:43:48] vivian-rook opened https://github.com/toolforge/paws/pull/346 [14:49:19] (HAProxyBackendUnavailable) resolved: HAProxy service designate-api_backend backend cloudservices1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [14:50:19] (HAProxyBackendUnavailable) firing: HAProxy service designate-api_backend backend cloudservices1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [14:55:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [14:59:34] (HAProxyBackendUnavailable) resolved: HAProxy service designate-api_backend backend cloudservices1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [15:18:12] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [15:36:03] (TfInfraTestDestroyFailed) resolved: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [15:43:12] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [15:44:48] 10Cloud-VPS (Project-requests), 10WMF-Communications: Request creation of foundationmemory VPS project - https://phabricator.wikimedia.org/T350760 (10taavi) [15:50:03] (TfInfraTestApplyFailed) resolved: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [15:52:03] (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [16:05:46] PROBLEM - Check systemd state on clouddb1021 is CRITICAL: CRITICAL - degraded: The following units failed: mariadb.service,prometheus-mysqld-exporter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [16:10:50] 10Toolforge (Quota-requests): Request increased quota for anchor-corrector Toolforge tool - https://phabricator.wikimedia.org/T350484 (10Andrew) +1 anything to move jobs off the grid :) [16:11:10] 10Cloud-VPS (Project-requests), 10WMF-Communications: Request creation of foundationmemory VPS project - https://phabricator.wikimedia.org/T350760 (10Andrew) a:03Andrew [16:11:23] 10Cloud-VPS (Project-requests), 10WMF-Communications: Request creation of foundationmemory VPS project - https://phabricator.wikimedia.org/T350760 (10taavi) +1, discussed and approved in the WMCS meeting today. [16:18:13] 10Toolforge (Quota-requests): Request increased quota for anchor-corrector Toolforge tool - https://phabricator.wikimedia.org/T350484 (10taavi) @Kanashimi I've increased the quota for this tool to max 32Gi of memory at once, with up to 6Gi for a single pod. However, there are two changes you need to make your j... [16:18:46] PROBLEM - mysqld processes on clouddb1021 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting [16:19:38] RECOVERY - Check systemd state on clouddb1021 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [16:23:40] RECOVERY - mysqld processes on clouddb1021 is OK: PROCS OK: 8 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting [16:32:59] (PuppetFailure) firing: Puppet has failed on cloudservices1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [16:33:04] 10cloud-services-team: PuppetFailure cloudservices1006:9100 Puppet failure on cloudservices1006:9100 - https://phabricator.wikimedia.org/T350808 (10phaultfinder) [16:33:33] (SystemdUnitDown) firing: (2) The service unit labs-ip-alias-dump.service is in failed status on host cloudservices1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [16:36:41] 10Cloud-VPS (Project-requests), 10WMF-Communications: Request creation of foundationmemory VPS project - https://phabricator.wikimedia.org/T350760 (10Andrew) 05Open→03Resolved All set! [17:18:34] (SystemdUnitDown) resolved: The service unit mariadb.service is in failed status on host cloudservices1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [17:20:03] 10PAWS, 10good first task: Remove variables if unused. - https://phabricator.wikimedia.org/T350812 (10rook) [17:24:33] (SystemdUnitDown) firing: The service unit mariadb.service is in failed status on host cloudservices1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [17:52:19] (HAProxyBackendUnavailable) firing: (4) HAProxy service keystone-admin-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [17:54:31] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack [17:57:19] (HAProxyBackendUnavailable) resolved: (4) HAProxy service keystone-admin-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [17:59:24] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) [18:10:55] PROBLEM - Check systemd state on cloudservices1006 is CRITICAL: CRITICAL - degraded: The following units failed: mariadb.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [18:16:33] RECOVERY - Check systemd state on cloudservices1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [18:17:59] (PuppetFailure) resolved: Puppet has failed on cloudservices1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [18:19:33] (SystemdUnitDown) resolved: The service unit mariadb.service is in failed status on host cloudservices1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [18:30:10] 10Cloud-VPS, 10cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fnegri@cumin1001 for host cloudservices1006.eqiad.wmnet with OS bookworm completed: - cloudservices... [18:43:41] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-Raymond_Ndibe: add pre-commit to maintain-harbor - https://phabricator.wikimedia.org/T350452 (10CodeReviewBot) raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/17 [maintain-harbor] add pr... [19:06:37] RECOVERY - Check unit status of backup_cinder_volumes on cloudbackup2001 is OK: OK: Status of the systemd unit backup_cinder_volumes https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [19:07:01] 10cloud-services-team (FY2023/2024-Q1), 10Cloud-Services-Origin-User, 10Cloud-Services-Worktype-Maintenance, 10User-dcaro: [horizon] Log in timing out due to nutcracker being stopped - https://phabricator.wikimedia.org/T333561 (10Krinkle) [19:07:04] 10Horizon, 10Striker, 10cloud-services-team, 10wikitech.wikimedia.org: Move labweb hosts from nutcracker to mcrouter? - https://phabricator.wikimedia.org/T202431 (10Krinkle) [19:13:03] PROBLEM - Check unit status of remove_dangling_cinder_snapshots on cloudbackup2001 is CRITICAL: CRITICAL: Status of the systemd unit remove_dangling_cinder_snapshots https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [19:18:12] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [19:43:12] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [20:02:37] (CephSlowOps) firing: Ceph cluster in eqiad has 59 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [20:02:42] 10cloud-services-team: CephSlowOps Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T349502 (10phaultfinder) [20:06:37] (CephClusterInWarning) firing: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [20:12:37] (CephSlowOps) resolved: Ceph cluster in eqiad has 44 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [20:16:37] (CephClusterInWarning) resolved: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [21:06:27] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [21:31:26] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [22:29:38] 10wikitech.wikimedia.org, 10MediaWiki-Blocks, 10MediaWiki-extensions-OAuth, 10Wikimedia-production-error: OAuth login to wikitech fails when running MediaWiki 1.42.0-wmf.4 - https://phabricator.wikimedia.org/T350836 (10bd808) [22:35:16] 10wikitech.wikimedia.org, 10MediaWiki-Blocks, 10MediaWiki-extensions-OAuth, 10Wikimedia-production-error: OAuth login to wikitech fails when running MediaWiki 1.42.0-wmf.4 - https://phabricator.wikimedia.org/T350836 (10bd808) This backtrace was the breadcrumb that lead to realizing there was a double unstu... [22:45:59] 10wikitech.wikimedia.org, 10MediaWiki-Blocks, 10MediaWiki-extensions-OAuth, 10Wikimedia-production-error: OAuth login to wikitech fails when running MediaWiki 1.42.0-wmf.4 - https://phabricator.wikimedia.org/T350836 (10bd808) This is certainly {T350080} related, but I'm not sure if it is worth blocking the... [23:18:13] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [23:34:40] 10wikitech.wikimedia.org, 10MediaWiki-Blocks, 10MediaWiki-extensions-OAuth, 10Patch-For-Review, 10Wikimedia-production-error: OAuth login to wikitech fails when running MediaWiki 1.42.0-wmf.4 - https://phabricator.wikimedia.org/T350836 (10bd808) [23:37:47] 10wikitech.wikimedia.org, 10MediaWiki-Blocks, 10MediaWiki-extensions-OAuth, 10Patch-For-Review, 10Wikimedia-production-error: OAuth login to wikitech fails when running MediaWiki 1.42.0-wmf.4 - https://phabricator.wikimedia.org/T350836 (10bd808) p:05Triage→03High [23:41:27] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [23:43:12] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange