[01:54:28] (PuppetCertificateAboutToExpire) firing: Puppet CA certificate Puppet CA: clouddb-services-puppetmaster-01.clouddb-services.eqiad.wmflabs is about to expire in 27d 11h 58m 43s - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[02:10:31] (ToolsToolsDBReplicationLagIsTooHigh) firing: ToolsDB replication on tools-db-2 is lagging behind the primary, the current lag is 14722 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[04:54:28] (PuppetCertificateAboutToExpire) firing: Puppet CA certificate Puppet CA: clouddb-services-puppetmaster-01.clouddb-services.eqiad.wmflabs is about to expire in 27d 8h 58m 43s - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[05:15:31] (ToolsToolsDBReplicationLagIsTooHigh) firing: ToolsDB replication on tools-db-2 is lagging behind the primary, the current lag is 25900 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[05:23:50] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[05:52:51] (ProbeDown) firing: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[05:57:50] (ProbeDown) resolved: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[07:54:28] (PuppetCertificateAboutToExpire) firing: Puppet CA certificate Puppet CA: clouddb-services-puppetmaster-01.clouddb-services.eqiad.wmflabs is about to expire in 27d 5h 58m 43s - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[08:15:31] (ToolsToolsDBReplicationLagIsTooHigh) firing: ToolsDB replication on tools-db-2 is lagging behind the primary, the current lag is 36700 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[08:33:56] (ProbeDown) firing: Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:34:22] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[08:38:56] (ProbeDown) resolved: Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:39:22] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[08:56:42] Toolforge Build Service: Support monorepos with the Multi Procfile buildpack - https://phabricator.wikimedia.org/T355329 (dcaro) That buildpack has been >4 years without any commits, and it does not seem widely used either, I would prefer not having to support it if it does not solve a blocking issue or help...
[09:24:03] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[09:27:23] (PS4) Eugene233: Fix capitalization in some ISA messages [labs/tools/Isa] (m2c) - https://gerrit.wikimedia.org/r/990675 (https://phabricator.wikimedia.org/T354920)
[09:39:12] Toolforge (Toolforge iteration 03): Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (dcaro) I'll close this then, but feel free to open another task with any issue you find :+1:
[09:39:17] Grid-Engine-to-K8s-Migration: Migrate botsister from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319605 (dcaro)
[09:39:19] Grid-Engine-to-K8s-Migration: Migrate botorder from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319604 (dcaro)
[09:39:22] Grid-Engine-to-K8s-Migration: Migrate bothasava from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319601 (dcaro)
[09:39:54] Toolforge (Toolforge iteration 03): Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (dcaro) In progress→Resolved
[10:54:28] (PuppetCertificateAboutToExpire) firing: Puppet CA certificate Puppet CA: clouddb-services-puppetmaster-01.clouddb-services.eqiad.wmflabs is about to expire in 27d 2h 58m 43s - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[10:58:44] cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Unplanned, User-dcaro: [clouddb-service-puppetmaster-2] Renew puppet CA certificates - https://phabricator.wikimedia.org/T355410 (dcaro) p:Triage→High
[10:58:51] cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Unplanned, User-dcaro: [clouddb-service-puppetmaster-2] Renew puppet CA certificates - https://phabricator.wikimedia.org/T355410 (dcaro) Open→In progress
[11:07:11] cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Unplanned, User-dcaro: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-01-19 - https://phabricator.wikimedia.org/T355411 (dcaro) p:Triage→High
[11:07:17] cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Unplanned, User-dcaro: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-01-19 - https://phabricator.wikimedia.org/T355411 (dcaro) Open→In progress
[11:13:24] Toolforge (Toolforge iteration 03): [jobs-api,toolforge-deploy] allow using local harbor instance - https://phabricator.wikimedia.org/T355299 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/57 harbor: allow specifying the protocol
[11:58:48] (PuppetConstantChange) resolved: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[12:08:26] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[12:08:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:08:58] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[12:09:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:10:28] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[12:10:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:11:02] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[12:11:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:15:40] Toolforge (Toolforge iteration 03): [jobs-api,toolforge-deploy] allow using local harbor instance - https://phabricator.wikimedia.org/T355299 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/178 jobs-api: use the local harbor for the local env
[12:29:19] Cloud-VPS, cloud-services-team: Move Cloud VPS internal flows from cloud-hosts to cloud-private - https://phabricator.wikimedia.org/T355416 (taavi)
[12:29:47] Cloud-VPS, cloud-services-team: Move Cloud VPS internal flows from cloud-hosts to cloud-private - https://phabricator.wikimedia.org/T355416 (taavi)
[12:29:50] Cloud-VPS, cloud-services-team: Use cloud-private and cfssl certs for instance live migrations - https://phabricator.wikimedia.org/T355145 (taavi)
[12:29:53] Cloud-VPS, cloud-services-team: Move Cloud VPS internal flows from cloud-hosts to cloud-private - https://phabricator.wikimedia.org/T355416 (taavi) a:taavi→None
[12:30:45] Cloud-VPS, cloud-services-team: Move cloudcontrol memcached flows to cloud-private - https://phabricator.wikimedia.org/T355417 (taavi)
[12:33:10] Cloud-VPS, cloud-services-team: Move Galera clustering to cloud-private - https://phabricator.wikimedia.org/T355418 (taavi)
[12:51:03] Toolforge, cloud-services-team: tools-nfs-2 almost out of disk space (October 2023 edition) - https://phabricator.wikimedia.org/T349895 (taavi) Open→Resolved
[13:00:11] cloud-services-team (FY2023/2024-Q1-Q2), Epic, Goal: openstack eqiad1: introduce cloud-private and cloudlb - https://phabricator.wikimedia.org/T341060 (taavi)
[13:00:24] Cloud-VPS, cloud-services-team (FY2023/2024-Q1-Q2), Epic, Goal: openstack eqiad1: introduce cloud-private and cloudlb - https://phabricator.wikimedia.org/T341060 (taavi)
[14:23:44] Toolforge (Toolforge iteration 03), Toolforge Build Service, Patch-For-Review: [apt-buildpack] Not sourcing /layers/fagiani_apt/apt/.profile.d/000_apt.sh - https://phabricator.wikimedia.org/T355214 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_...
[15:38:07] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[15:38:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[15:38:38] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[15:38:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[15:40:10] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[15:40:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:40:42] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[15:40:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:53:42] Cloud-VPS, cloud-services-team: Rescue DBapp trove instance in glamwikidashboard project - https://phabricator.wikimedia.org/T355138 (Andrew) Open→Resolved >>! In T355138#9467520, @YonatanWMIL wrote: > I have stopped the daily service from adding more data to the DB to prevent it from filling up...
[18:10:43] Tool-bub2, Outreach-Programs-Projects, Outreachy (Round 27): Use API:EmailUser to send Emails to the users - https://phabricator.wikimedia.org/T338267 (theprotonade) Open→Resolved Merged [[ https://github.com/coderwassananmol/BUB2/pull/227 | PR here ]]
[20:40:00] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[21:08:58] Cloud-VPS, cloud-services-team, Patch-For-Review: Move Galera clustering to cloud-private - https://phabricator.wikimedia.org/T355418 (Andrew) check experimental
[21:25:22] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[21:30:22] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[21:39:00] Cloud-VPS (Quota-requests): Quota increase for reading-web-staging - https://phabricator.wikimedia.org/T355453 (Jdlrobson)
[23:04:48] (PuppetFailure) firing: Puppet has failed on cloudcontrol2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[23:04:59] cloud-services-team: PuppetFailure Puppet failure on cloudcontrol2004-dev:9100 - https://phabricator.wikimedia.org/T355458 (phaultfinder)
[23:24:48] (PuppetFailure) firing: (2) Puppet has failed on cloudcontrol2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[23:24:54] cloud-services-team: PuppetFailure - https://phabricator.wikimedia.org/T355460 (phaultfinder)
[23:29:48] (PuppetFailure) firing: (2) Puppet has failed on cloudcontrol2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[23:32:08] Cloud-VPS (Quota-requests): Quota increase for reading-web-staging - https://phabricator.wikimedia.org/T355453 (Andrew) +1 sounds good but will probably not implement until Monday
[23:32:30] Cloud-VPS, cloud-services-team: Move Galera clustering to cloud-private - https://phabricator.wikimedia.org/T355418 (Andrew) Galera on codfw1dev is now using private addresses. Let's see if it stays happy over the weekend.
[23:33:11] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[23:36:14] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[23:44:48] (PuppetFailure) resolved: (2) Puppet has failed on cloudcontrol2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure