[00:06:31] Toolforge (Software install/update): Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (bd808) >>! In T311466#9392107, @Hawkeye7 wrote: > When I try to use the admin console (https://toolsadmin.wikimedia.org/tools/id/milhistbot/repos/create) to create a new repos...
[01:08:48] Cloud-VPS, cloud-services-team, PostgreSQL: Consider removing Postgres support from Trove - https://phabricator.wikimedia.org/T353018 (rook) Of these I believe three are wmcs projects: `tf-infra-test`, `tools` and `toolsbeta` Many are listed as `ERROR` or `SHUTDOWN` for their `Status` or `Operating...
[01:55:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[01:55:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[03:18:27] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:43:27] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:55:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[04:55:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[06:08:27] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[06:14:24] Grid-Engine-to-K8s-Migration: Migrate panoviewer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319953 (tstarling) >>! In T319953#9392014, @Fuzheado wrote: >Therefore I have only recently noticed that it is at risk of being deleted. I don't think it is at risk of bein...
[07:13:09] Cloud-VPS, cloud-services-team, PostgreSQL: Consider removing Postgres support from Trove - https://phabricator.wikimedia.org/T353018 (Slst2020) `harbordb` and `tools-harbordb` are used by the toolforge buildservice harbor (open source registry) instances. Alas, harbor doesn't support any other db, s...
[07:55:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[07:55:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[08:26:48] Toolforge (Toolforge iteration 02): [tbs][builds-cli] Fix tabulate issue - https://phabricator.wikimedia.org/T353043 (Slst2020)
[08:27:03] Toolforge (Toolforge iteration 02): [tbs][builds-cli] Fix tabulate issue - https://phabricator.wikimedia.org/T353043 (Slst2020) Open→In progress
[08:30:18] Toolforge (Toolforge iteration 02), Patch-For-Review: [tbs][builds-cli] Fix tabulate issue - https://phabricator.wikimedia.org/T353043 (CodeReviewBot) sstefanova opened https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/32 Fix: don't use colalign in tabulate
[08:40:24] Toolforge (Toolforge iteration 02): [builds-cli][ci] Investigate discrepancy between different CI envs - https://phabricator.wikimedia.org/T353044 (Slst2020)
[08:54:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[08:59:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[09:02:54] Toolforge (Toolforge iteration 02), Patch-For-Review: [tbs][builds-cli] Fix tabulate issue - https://phabricator.wikimedia.org/T353043 (CodeReviewBot) sstefanova merged https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/32 Fix: don't use colalign in tabulate
[09:11:08] Quarry: CSV files not being written in UTF-8 - https://phabricator.wikimedia.org/T353047 (Novem_Linguae)
[09:29:36] Quarry: CSV files not being written in UTF-8 - https://phabricator.wikimedia.org/T353047 (SD0001) Open→Invalid The text/csv response does specify chatset=utf-8. {F41574205} I couldn't reproduce this – the downloaded file looks alright to me: {F41574052} Please check if this is a problem with the lo...
[09:31:28] Quarry: CSV files not being written in UTF-8 - https://phabricator.wikimedia.org/T353047 (Novem_Linguae) Yeah I think you're right. It looks OK in Notepad++. Interesting! {F41574241}
[09:31:32] !log admin dcaro@urcuchillay START - Cookbook wmcs.openstack.restart_openstack (T345084)
[09:31:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:31:38] T345084: OpenStack API response time gets slower over time - https://phabricator.wikimedia.org/T345084
[09:32:03] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99) (T345084)
[09:32:05] !log admin dcaro@urcuchillay START - Cookbook wmcs.openstack.restart_openstack (T345084)
[09:32:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:32:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:38:39] !log admin dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) (T345084)
[09:38:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:38:45] T345084: OpenStack API response time gets slower over time - https://phabricator.wikimedia.org/T345084
[09:54:24] cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance, User-dcaro: [puppetmaster-02.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud] puppet failing to run - https://phabricator.wikimedia.org/T353048 (dcaro) p:Triage→High
[10:08:28] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[10:24:38] Toolforge (Toolforge iteration 02), Patch-For-Review: [tbs][builds-cli] Fix tabulate issue - https://phabricator.wikimedia.org/T353043 (CodeReviewBot) sstefanova opened https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/34 d/changelog: bump to 0.0.8
[10:25:45] Toolforge (Toolforge iteration 02), Patch-For-Review: [tbs][builds-cli] Fix tabulate issue - https://phabricator.wikimedia.org/T353043 (CodeReviewBot) sstefanova merged https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/34 d/changelog: bump to 0.0.8
[10:46:01] Cloud-VPS, cloud-services-team (FY2023/2024-Q1-Q2), User-dcaro: OpenStack API response time gets slower over time - https://phabricator.wikimedia.org/T345084 (dcaro) I modified the limits a bit (trove -> 5s, nova-api -> 3s, others -> 1.5s), updated the dashboard too, they still fail from time to time,
[10:53:40] Toolforge (Toolforge iteration 02), Toolforge Build Service (Beta release), User-Raymond_Ndibe, User-dcaro: Add a way to wait for a Toolforge build to finish - https://phabricator.wikimedia.org/T337043 (Slst2020) My understanding is that 'waiting' on the start command to finish is now the default...
[10:55:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[10:55:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[10:56:15] Cloud-VPS, cloud-services-team, PostgreSQL: Consider removing Postgres support from Trove - https://phabricator.wikimedia.org/T353018 (dcaro) Is there an option to remove public usage of trove, instead of removing it altogether? (so we don't have to maintain our own postgres deployment for harbor).
[10:57:41] Toolforge (Toolforge iteration 02), Toolforge Build Service (Beta release), User-Raymond_Ndibe, User-dcaro: Add a way to wait for a Toolforge build to finish - https://phabricator.wikimedia.org/T337043 (dcaro) >>! In T337043#9392710, @Slst2020 wrote: > My understanding is that 'waiting' on the st...
[11:03:28] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
[11:03:41] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
[11:08:31] Toolforge (Toolforge iteration 02): [tbs][builds-cli] Fix tabulate issue - https://phabricator.wikimedia.org/T353043 (Slst2020) In progress→Resolved
[11:17:00] Toolforge (Toolforge iteration 02): [ci] Add shellcheck to pre-commit where missing - https://phabricator.wikimedia.org/T353052 (Slst2020)
[11:42:09] Toolforge (Toolforge iteration 02), Patch-For-Review: Add `toolforge build quota` command - https://phabricator.wikimedia.org/T341068 (CodeReviewBot) sstefanova merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/144 builds-api: bump to 0.0.112-20231207081230-2d86...
[11:42:15] Toolforge (Toolforge iteration 02), Patch-For-Review, User-Raymond_Ndibe: [apis] nginx fails to reload on config change - https://phabricator.wikimedia.org/T350928 (CodeReviewBot) sstefanova merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/144 builds-api: b...
[11:42:22] Toolforge (Toolforge iteration 02): [tbs] Improve Harbor quota handling and docs - https://phabricator.wikimedia.org/T351092 (Slst2020)
[11:42:58] Toolforge (Toolforge iteration 02), Patch-For-Review: Add `toolforge build quota` command - https://phabricator.wikimedia.org/T341068 (Slst2020) In progress→Resolved
[11:44:45] Toolforge (Toolforge iteration 02): [tbs][builds-api] Refactor `internal/builds.go` - https://phabricator.wikimedia.org/T352762 (Slst2020) In progress→Open a:Slst2020→None unassigning myself for now as I won't have the time to do it until after the holidays
[11:54:54] cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance, User-dcaro: [etcd-discovery-1.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud] puppet failing for a month - https://phabricator.wikimedia.org/T353055 (dcaro) p:Triage→High
[11:55:03] cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance, User-dcaro: [etcd-discovery-1.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud] puppet failing for a month - https://phabricator.wikimedia.org/T353055 (dcaro) Open→In progress
[11:55:21] cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance, User-dcaro: [etcd-discovery-1.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud] puppet failing for a month - https://phabricator.wikimedia.org/T353055 (dcaro) The puppetmaster was changed here...
[11:57:32] !log admin dcaro@urcuchillay START - Cookbook wmcs.vps.refresh_puppet_certs on etcd-discovery-1.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud
[11:57:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:57:59] !log admin dcaro@urcuchillay END (ERROR) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=97) on etcd-discovery-1.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud
[11:58:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:58:33] !log admin dcaro@urcuchillay START - Cookbook wmcs.vps.refresh_puppet_certs on etcd-discovery-1.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud (T353055)
[11:58:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:58:38] T353055: [etcd-discovery-1.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud] puppet failing for a month - https://phabricator.wikimedia.org/T353055
[12:00:28] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on etcd-discovery-1.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud (T353055)
[12:00:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:01:52] cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance, User-dcaro: [etcd-discovery-1.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud] puppet failing for a month - https://phabricator.wikimedia.org/T353055 (dcaro) In progress→Resolved
[12:43:42] Cloud-VPS, cloud-services-team (FY2023/2024-Q1-Q2), User-dcaro: OpenStack API response time gets slower over time - https://phabricator.wikimedia.org/T345084 (dcaro) There's an issue with trove related to requests, as from cloudlb1002 we don't get almost any, the 12h mean gets very skewed. Will have...
[12:46:28] cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance, User-dcaro: [acme-chief-2.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud] puppet failing - https://phabricator.wikimedia.org/T353056 (dcaro) p:Triage→High
[12:46:31] cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance, User-dcaro: [acme-chief-2.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud] puppet failing - https://phabricator.wikimedia.org/T353056 (dcaro) Open→In progress
[12:55:40] cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance, User-dcaro: [acme-chief-2.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud] puppet failing - https://phabricator.wikimedia.org/T353056 (dcaro) # The problem is that we removed the support wit...
[12:57:44] cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance, User-dcaro: [acme-chief-2.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud] puppet failing - https://phabricator.wikimedia.org/T353056 (dcaro) Yep, that was it: https://gerrit.wikimedia.org/r...
[12:57:49] cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance, User-dcaro: [acme-chief-2.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud] puppet failing - https://phabricator.wikimedia.org/T353056 (dcaro) In progress→Resolved
[13:51:11] Toolforge (Toolforge iteration 02): [maintain-harbor] Improvements to subcommands and config validation - https://phabricator.wikimedia.org/T353059 (Slst2020)
[13:55:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[13:55:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[14:08:28] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[14:21:20] Toolforge (Toolforge iteration 02), Patch-For-Review: [maintain-harbor] Manage project quotas via maintain-harbor - https://phabricator.wikimedia.org/T352417 (Slst2020) In progress→Open tested in prod: there's unfortunately no way to get/set the default quota without giving the maintain-harbor us...
[14:21:22] Toolforge (Toolforge iteration 02): [tbs] Improve Harbor quota handling and docs - https://phabricator.wikimedia.org/T351092 (Slst2020)
[14:29:16] Grid-Engine-to-K8s-Migration: Migrate wikihistory from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320157 (nskaggs) @Wurgl it's hard for me to know exactly what you are seeing, but you have the basic idea correct. Pick a buildpack for one of the languages you use, and us...
[14:35:22] Cloud-VPS (Quota-requests): Please delete meet and chat VPS projects - https://phabricator.wikimedia.org/T352727 (dcaro) a:Slst2020→dcaro
[14:38:29] Grid-Engine-to-K8s-Migration: Migrate wikihistory from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320157 (dcaro) > Question 2: In the persondata-Tool I start my jobs with option --image php7.4 or --image php8.2 Do I simple use one of these options too? Or do I need to a...
[14:42:09] vivian-rook closed https://github.com/toolforge/paws/pull/358
[14:42:14] PAWS, OpenRefine: Upgrade OpenRefine on PAWS to 3.7.7. - https://phabricator.wikimedia.org/T353021 (rook) Open→Resolved a:rook
[14:42:19] PAWS, OpenRefine: Upgrade OpenRefine on PAWS to 3.7.7. - https://phabricator.wikimedia.org/T353021 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/358
[15:05:56] Grid-Engine-to-K8s-Migration: Migrate wikihistory from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320157 (Wurgl) Sorry, I am too dumb to understand what I should do. Source for the C#-Program of Wikihistory can be found at https://persondata.toolforge.org/WikiHistory_sr...
[15:23:04] Cloud-VPS, cloud-services-team, PostgreSQL: Consider removing Postgres support from Trove - https://phabricator.wikimedia.org/T353018 (bd808) >>! In T353018#9392274, @rook wrote: > Of these I believe three are wmcs projects: `tf-infra-test`, `tools` and `toolsbeta` > > Many are listed as `ERROR` or...
[15:31:59] PAWS: Repair codfw1dev deploy - https://phabricator.wikimedia.org/T353063 (rook)
[15:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[16:50:16] PAWS: update fcos version in PAWS - https://phabricator.wikimedia.org/T353070 (rook)
[16:50:27] PAWS: update fcos version in PAWS - https://phabricator.wikimedia.org/T353070 (rook)
[16:51:05] PAWS: Repair codfw1dev deploy - https://phabricator.wikimedia.org/T353063 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/359
[16:51:11] vivian-rook opened https://github.com/toolforge/paws/pull/359
[16:52:18] PAWS: Repair codfw1dev deploy - https://phabricator.wikimedia.org/T353063 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/359
[16:52:20] PAWS: Repair codfw1dev deploy - https://phabricator.wikimedia.org/T353063 (rook) Open→Resolved
[16:52:28] vivian-rook closed https://github.com/toolforge/paws/pull/359
[16:55:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[16:56:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:00:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[17:01:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[18:08:43] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[18:14:02] Grid-Engine-to-K8s-Migration, Pywikibot: Migrate pywikibot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319981 (komla) >>! In T319981#9392100, @Xqt wrote: > @komla: I have no glue, what to do here. @Xqt pywikibot is [[ https://grid-deprecation.toolforge.org/t/py...
[18:34:21] Cloud-VPS: enable lists.wikimedia.org or wikimedia.org email addresses to receive dmarc reports for *.wmflabs.org - https://phabricator.wikimedia.org/T352902 (jhathaway) @herron & @jsn.sherman what do you think about the following: ==== Email ==== * RFC5321.MailFrom: [[mailto:bounce@wmflabs.org|bounce@wmfl...
[18:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[19:38:05] Cloud-VPS, cloud-services-team, PostgreSQL: Consider removing Postgres support from Trove - https://phabricator.wikimedia.org/T353018 (Andrew) >>! In T353018#9392712, @dcaro wrote: > Is there an option to remove public usage of trove, instead of removing it altogether? (so we don't have to maintain o...
[19:43:28] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[19:55:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[19:57:54] (PS1) Eevans: keys & certs for missing restbase nodes [labs/private] - https://gerrit.wikimedia.org/r/981601 (https://phabricator.wikimedia.org/T352468)
[20:00:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[20:07:17] (PS2) Eevans: add keys & certs for missing restbase nodes [labs/private] - https://gerrit.wikimedia.org/r/981601 (https://phabricator.wikimedia.org/T352468)
[20:09:56] (PS3) Eevans: restbase: add missing keys & certs, remove obsolete [labs/private] - https://gerrit.wikimedia.org/r/981601 (https://phabricator.wikimedia.org/T352468)
[20:20:51] PAWS: update fcos version in PAWS in codfw1dev - https://phabricator.wikimedia.org/T353070 (rook)
[20:21:34] PAWS: update fcos version in PAWS in eqiad1 - https://phabricator.wikimedia.org/T353077 (rook)
[20:21:44] PAWS: update fcos version in PAWS in eqiad1 - https://phabricator.wikimedia.org/T353077 (rook)
[20:21:46] PAWS: update fcos version in PAWS in codfw1dev - https://phabricator.wikimedia.org/T353070 (rook)
[20:23:41] PAWS: update fcos version in PAWS in eqiad1 - https://phabricator.wikimedia.org/T353077 (rook)
[20:58:49] PAWS: update fcos version in PAWS in codfw1dev - https://phabricator.wikimedia.org/T353070 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/360
[20:58:54] vivian-rook opened https://github.com/toolforge/paws/pull/360
[20:59:08] PAWS: update fcos version in PAWS in codfw1dev - https://phabricator.wikimedia.org/T353070 (rook)
[21:01:30] PAWS: update fcos version in PAWS in codfw1dev - https://phabricator.wikimedia.org/T353070 (rook) Open→Resolved
[21:01:32] PAWS: update fcos version in PAWS in eqiad1 - https://phabricator.wikimedia.org/T353077 (rook)
[21:01:34] PAWS: update fcos version in PAWS in codfw1dev - https://phabricator.wikimedia.org/T353070 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/360
[21:01:38] vivian-rook closed https://github.com/toolforge/paws/pull/360
[21:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[22:55:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[23:00:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[23:40:20] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[23:43:28] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[23:45:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable