[00:10:03] <wmcs-alerts>	 (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[01:10:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[01:14:24] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[02:49:28] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[03:18:12] <jinxer-wm>	 (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[03:36:03] <wmcs-alerts>	 (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[03:41:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[03:43:12] <jinxer-wm>	 (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[03:50:03] <wmcs-alerts>	 (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[04:00:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[04:14:24] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[04:14:29] <jinxer-wm>	 (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[05:11:03] <wmcs-alerts>	 (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[05:21:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[06:52:46] <wikibugs>	 10Cloud-VPS (Project-requests), 10WMF-Communications: Request creation of FoundationMemory VPS project - https://phabricator.wikimedia.org/T350760 (10Varnent)
[06:53:13] <wikibugs>	 10Cloud-VPS (Project-requests), 10WMF-Communications: Request creation of FoundationMemory VPS project - https://phabricator.wikimedia.org/T350760 (10Varnent)
[07:00:10] <wikibugs>	 (03PS1) 10Catrope: releases: Bump Codex to 1.0.1 [labs/libraryupgrader/config] - 10https://gerrit.wikimedia.org/r/972688
[07:04:11] <icinga-wm>	 RECOVERY - Check unit status of backup_cinder_volumes on cloudbackup2002 is OK: OK: Status of the systemd unit backup_cinder_volumes https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[07:11:11] <icinga-wm>	 PROBLEM - Check unit status of remove_dangling_cinder_snapshots on cloudbackup2002 is CRITICAL: CRITICAL: Status of the systemd unit remove_dangling_cinder_snapshots https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[07:14:24] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[07:18:12] <jinxer-wm>	 (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[07:38:31] <wikibugs>	 (03CR) 10VolkerE: [C: 03+2] releases: Bump Codex to 1.0.1 [labs/libraryupgrader/config] - 10https://gerrit.wikimedia.org/r/972688 (owner: 10Catrope)
[07:39:18] <wikibugs>	 (03Merged) 10jenkins-bot: releases: Bump Codex to 1.0.1 [labs/libraryupgrader/config] - 10https://gerrit.wikimedia.org/r/972688 (owner: 10Catrope)
[07:43:12] <jinxer-wm>	 (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[10:14:24] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[11:12:08] <wikibugs>	 10Cloud-VPS, 10cloud-services-team: Check if nfs-maps.wikimedia.org is still in use - https://phabricator.wikimedia.org/T350259 (10taavi) a:03taavi
[11:14:56] <wmcs-alerts>	 (ToolsToolsDBWritableState) firing: There should be exactly one writable MariaDB instance instead of 0 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsToolsDBWritableState  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBWritableState
[11:18:12] <jinxer-wm>	 (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[11:24:56] <wmcs-alerts>	 (ToolsToolsDBWritableState) resolved: There should be exactly one writable MariaDB instance instead of 0 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsToolsDBWritableState  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBWritableState
[11:43:12] <jinxer-wm>	 (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[12:00:05] <wikibugs>	 10Data-Services, 10DBA, 10Data Engineering and Event Platform Team, 10Data-Platform-SRE: Prepare and check storage layer for fonwiki - https://phabricator.wikimedia.org/T347938 (10BTullis) 05Open→03Resolved a:03BTullis This is done now. Once again it took two executions of the cookbook, but the opens...
[12:35:28] <wikibugs>	 10Wikibugs: Better message than "This change is ready for review" when patch stops being WIP - https://phabricator.wikimedia.org/T350778 (10jhsoby)
[13:14:25] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[13:16:15] <wikibugs>	 10Cloud-VPS, 10cloud-services-team: Check if nfs-maps.wikimedia.org is still in use - https://phabricator.wikimedia.org/T350259 (10taavi) 05Open→03Resolved Removed from Netbox and DNS. Closing.
[14:09:38] <wikibugs>	 (03PS1) 10Brouberol: Index the data-engineering/airflow-tags gitlab repo [labs/codesearch] - 10https://gerrit.wikimedia.org/r/972829 (https://phabricator.wikimedia.org/T337468)
[14:10:50] <wikibugs>	 10Cloud-VPS, 10cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fnegri@cumin1001 for host cloudservices1006.eqiad.wmnet with OS bookworm
[14:13:17] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] Index the data-engineering/airflow-tags gitlab repo [labs/codesearch] - 10https://gerrit.wikimedia.org/r/972829 (https://phabricator.wikimedia.org/T337468) (owner: 10Brouberol)
[14:14:09] <wikibugs>	 (03Merged) 10jenkins-bot: Index the data-engineering/airflow-tags gitlab repo [labs/codesearch] - 10https://gerrit.wikimedia.org/r/972829 (https://phabricator.wikimedia.org/T337468) (owner: 10Brouberol)
[14:14:19] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: HAProxy service designate-api_backend backend cloudservices1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[14:15:08] <wikibugs>	 10VPS-project-Codesearch, 10Patch-For-Review: Please could we add the data-engineering/airflow_dags repo to codesearch? - https://phabricator.wikimedia.org/T337468 (10Ladsgroup) 05Open→03Resolved a:03brouberol It'll be accessible in a day or so.
[14:17:03] <wmcs-alerts>	 (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:21:03] <wmcs-alerts>	 (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[14:24:25] <wmcs-alerts>	 (TfInfraTestApplyFailed) resolved: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[14:43:38] <wikibugs>	 10PAWS: Cannot deploy ingress-nginx chart 4.8.0 - https://phabricator.wikimedia.org/T347506 (10github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/346
[14:43:48] <notefromgithub>	 vivian-rook opened https://github.com/toolforge/paws/pull/346
[14:49:19] <jinxer-wm>	 (HAProxyBackendUnavailable) resolved: HAProxy service designate-api_backend backend cloudservices1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[14:50:19] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: HAProxy service designate-api_backend backend cloudservices1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[14:55:03] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[14:59:34] <jinxer-wm>	 (HAProxyBackendUnavailable) resolved: HAProxy service designate-api_backend backend cloudservices1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[15:18:12] <jinxer-wm>	 (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[15:36:03] <wmcs-alerts>	 (TfInfraTestDestroyFailed) resolved: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[15:43:12] <jinxer-wm>	 (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[15:44:48] <wikibugs>	 10Cloud-VPS (Project-requests), 10WMF-Communications: Request creation of foundationmemory VPS project - https://phabricator.wikimedia.org/T350760 (10taavi)
[15:50:03] <wmcs-alerts>	 (TfInfraTestApplyFailed) resolved: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[15:52:03] <wmcs-alerts>	 (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[16:05:46] <icinga-wm>	 PROBLEM - Check systemd state on clouddb1021 is CRITICAL: CRITICAL - degraded: The following units failed: mariadb.service,prometheus-mysqld-exporter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:10:50] <wikibugs>	 10Toolforge (Quota-requests): Request increased quota for anchor-corrector Toolforge tool - https://phabricator.wikimedia.org/T350484 (10Andrew) +1 anything to move jobs off the grid :)
[16:11:10] <wikibugs>	 10Cloud-VPS (Project-requests), 10WMF-Communications: Request creation of foundationmemory VPS project - https://phabricator.wikimedia.org/T350760 (10Andrew) a:03Andrew
[16:11:23] <wikibugs>	 10Cloud-VPS (Project-requests), 10WMF-Communications: Request creation of foundationmemory VPS project - https://phabricator.wikimedia.org/T350760 (10taavi) +1, discussed and approved in the WMCS meeting today.
[16:18:13] <wikibugs>	 10Toolforge (Quota-requests): Request increased quota for anchor-corrector Toolforge tool - https://phabricator.wikimedia.org/T350484 (10taavi) @Kanashimi I've increased the quota for this tool to max 32Gi of memory at once, with up to 6Gi for a single pod.  However, there are two changes you need to make your j...
[16:18:46] <icinga-wm>	 PROBLEM - mysqld processes on clouddb1021 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[16:19:38] <icinga-wm>	 RECOVERY - Check systemd state on clouddb1021 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:23:40] <icinga-wm>	 RECOVERY - mysqld processes on clouddb1021 is OK: PROCS OK: 8 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[16:32:59] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on cloudservices1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[16:33:04] <wikibugs>	 10cloud-services-team: PuppetFailure cloudservices1006:9100 Puppet failure on cloudservices1006:9100 - https://phabricator.wikimedia.org/T350808 (10phaultfinder)
[16:33:33] <jinxer-wm>	 (SystemdUnitDown) firing: (2) The service unit labs-ip-alias-dump.service is in failed status on host cloudservices1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[16:36:41] <wikibugs>	 10Cloud-VPS (Project-requests), 10WMF-Communications: Request creation of foundationmemory VPS project - https://phabricator.wikimedia.org/T350760 (10Andrew) 05Open→03Resolved All set!
[17:18:34] <jinxer-wm>	 (SystemdUnitDown) resolved: The service unit mariadb.service is in failed status on host cloudservices1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:20:03] <wikibugs>	 10PAWS, 10good first task: Remove variables if unused. - https://phabricator.wikimedia.org/T350812 (10rook)
[17:24:33] <jinxer-wm>	 (SystemdUnitDown) firing: The service unit mariadb.service is in failed status on host cloudservices1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:52:19] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (4) HAProxy service keystone-admin-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:54:31] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[17:57:19] <jinxer-wm>	 (HAProxyBackendUnavailable) resolved: (4) HAProxy service keystone-admin-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:59:24] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[18:10:55] <icinga-wm>	 PROBLEM - Check systemd state on cloudservices1006 is CRITICAL: CRITICAL - degraded: The following units failed: mariadb.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:16:33] <icinga-wm>	 RECOVERY - Check systemd state on cloudservices1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:17:59] <jinxer-wm>	 (PuppetFailure) resolved: Puppet has failed on cloudservices1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:19:33] <jinxer-wm>	 (SystemdUnitDown) resolved: The service unit mariadb.service is in failed status on host cloudservices1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:30:10] <wikibugs>	 10Cloud-VPS, 10cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fnegri@cumin1001 for host cloudservices1006.eqiad.wmnet with OS bookworm completed: - cloudservices...
[18:43:41] <wikibugs>	 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-Raymond_Ndibe: add pre-commit to maintain-harbor - https://phabricator.wikimedia.org/T350452 (10CodeReviewBot) raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/17  [maintain-harbor] add pr...
[19:06:37] <icinga-wm>	 RECOVERY - Check unit status of backup_cinder_volumes on cloudbackup2001 is OK: OK: Status of the systemd unit backup_cinder_volumes https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[19:07:01] <wikibugs>	 10cloud-services-team (FY2023/2024-Q1), 10Cloud-Services-Origin-User, 10Cloud-Services-Worktype-Maintenance, 10User-dcaro: [horizon] Log in timing out due to nutcracker being stopped - https://phabricator.wikimedia.org/T333561 (10Krinkle)
[19:07:04] <wikibugs>	 10Horizon, 10Striker, 10cloud-services-team, 10wikitech.wikimedia.org: Move labweb hosts from nutcracker to mcrouter? - https://phabricator.wikimedia.org/T202431 (10Krinkle)
[19:13:03] <icinga-wm>	 PROBLEM - Check unit status of remove_dangling_cinder_snapshots on cloudbackup2001 is CRITICAL: CRITICAL: Status of the systemd unit remove_dangling_cinder_snapshots https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[19:18:12] <jinxer-wm>	 (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[19:43:12] <jinxer-wm>	 (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[20:02:37] <jinxer-wm>	 (CephSlowOps) firing: Ceph cluster in eqiad has 59 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[20:02:42] <wikibugs>	 10cloud-services-team: CephSlowOps  Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T349502 (10phaultfinder)
[20:06:37] <jinxer-wm>	 (CephClusterInWarning) firing: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[20:12:37] <jinxer-wm>	 (CephSlowOps) resolved: Ceph cluster in eqiad has 44 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[20:16:37] <jinxer-wm>	 (CephClusterInWarning) resolved: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[21:06:27] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[21:31:26] <jinxer-wm>	 (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[22:29:38] <wikibugs>	 10wikitech.wikimedia.org, 10MediaWiki-Blocks, 10MediaWiki-extensions-OAuth, 10Wikimedia-production-error: OAuth login to wikitech fails when running MediaWiki 1.42.0-wmf.4 - https://phabricator.wikimedia.org/T350836 (10bd808)
[22:35:16] <wikibugs>	 10wikitech.wikimedia.org, 10MediaWiki-Blocks, 10MediaWiki-extensions-OAuth, 10Wikimedia-production-error: OAuth login to wikitech fails when running MediaWiki 1.42.0-wmf.4 - https://phabricator.wikimedia.org/T350836 (10bd808) This backtrace was the breadcrumb that lead to realizing there was a double unstu...
[22:45:59] <wikibugs>	 10wikitech.wikimedia.org, 10MediaWiki-Blocks, 10MediaWiki-extensions-OAuth, 10Wikimedia-production-error: OAuth login to wikitech fails when running MediaWiki 1.42.0-wmf.4 - https://phabricator.wikimedia.org/T350836 (10bd808) This is certainly {T350080} related, but I'm not sure if it is worth blocking the...
[23:18:13] <jinxer-wm>	 (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[23:34:40] <wikibugs>	 10wikitech.wikimedia.org, 10MediaWiki-Blocks, 10MediaWiki-extensions-OAuth, 10Patch-For-Review, 10Wikimedia-production-error: OAuth login to wikitech fails when running MediaWiki 1.42.0-wmf.4 - https://phabricator.wikimedia.org/T350836 (10bd808)
[23:37:47] <wikibugs>	 10wikitech.wikimedia.org, 10MediaWiki-Blocks, 10MediaWiki-extensions-OAuth, 10Patch-For-Review, 10Wikimedia-production-error: OAuth login to wikitech fails when running MediaWiki 1.42.0-wmf.4 - https://phabricator.wikimedia.org/T350836 (10bd808) p:05Triage→03High
[23:41:27] <jinxer-wm>	 (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[23:43:12] <jinxer-wm>	 (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange