[00:15:28] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:20:28] (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:52:27] Toolforge (Toolforge iteration 08), Patch-For-Review: [builds-api,jobs-api,envvars-api,api-gateway] Figure out and document how to do non-backwards compatible changes - https://phabricator.wikimedia.org/T356974#9705635 (CodeReviewBot) raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforg...
[05:40:25] Quarry: Error 500 when clicking "stop query" - https://phabricator.wikimedia.org/T362213#9705788 (SD0001) I don't think that's the issue. We persist the db process id in the query_run table, so even a different pod is able to execute KILL on the db to get the query to stop. The issue I suspect is that...
[06:17:34] cloud-services-team, wikitech.wikimedia.org, Epic: Set up a bitu instance for codfw1dev - https://phabricator.wikimedia.org/T360795#9705799 (SLyngshede-WMF) Just to ensure that everyone agrees on what we need. This will be one Ganeti VM, running Bitu, and an LDAP instance. Do we need to have do a da...
[06:49:55] cloud-services-team, wikitech.wikimedia.org, Epic: Set up a bitu instance for codfw1dev - https://phabricator.wikimedia.org/T360795#9705826 (MoritzMuehlenhoff) The labtest LDAP currently runs on cloudservices2004/2005-dev. I think we have two options: Either a separate Ganeti VM for the labtest/Bitu...
[08:05:56] cloud-services-team, Toolforge: Decision Request - Toolforge policy agent - https://phabricator.wikimedia.org/T362233#9705907 (dcaro) > we evaluate dropping kyverno in favor of VAPs Can we add an option where this is "drop kyverno" instead?
[08:07:19] cloud-services-team, Toolforge: Decision Request - Toolforge policy agent - https://phabricator.wikimedia.org/T362233#9705910 (aborrero) >>! In T362233#9705907, @dcaro wrote: >> we evaluate dropping kyverno in favor of VAPs > > Can we add an option where this is "drop kyverno" instead? This is very spe...
[08:08:30] Toolforge (Toolforge iteration 08): [api-gateway] Add a python server to serve consolidated openapi docs - https://phabricator.wikimedia.org/T362299 (Slst2020) NEW
[08:14:21] cloud-services-team, Toolforge: Decision Request - Toolforge policy agent - https://phabricator.wikimedia.org/T362233#9705940 (dcaro) >>! In T362233#9705910, @aborrero wrote: >>>! In T362233#9705907, @dcaro wrote: >>> we evaluate dropping kyverno in favor of VAPs >> >> Can we add an option where this is...
[08:28:58] Toolforge (Toolforge iteration 08): [api-gateway] Add a python server to serve consolidated openapi docs - https://phabricator.wikimedia.org/T362299#9705965 (dcaro) For the deployment we can reuse the [[ https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/blob/main/deployment/chart/templates/ngi...
[08:33:43] Toolforge (Toolforge iteration 08): [api-gateway] Add a python server to serve consolidated openapi docs - https://phabricator.wikimedia.org/T362299#9705982 (dcaro) p:Triage→High
[08:34:13] Toolforge (Toolforge iteration 08): remove "File log:" column from toolforge jobs list -o long output - https://phabricator.wikimedia.org/T361896#9705983 (dcaro) p:Triage→Low
[08:37:21] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
[08:37:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[08:37:39] !log dcaro@urcuchillay toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-builder
[08:37:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[08:38:06] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
[08:38:07] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services, Goal: [toolsdb] Upgrade to MariaDB 10.6 - https://phabricator.wikimedia.org/T352206#9705991 (fnegri) a:fnegri
[08:38:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[08:38:35] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
[08:38:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[08:39:19] (PS1) Majavah: tools: Allow configuring webservice domain [labs/striker] - https://gerrit.wikimedia.org/r/1018938
[08:39:19] (PS1) Majavah: tools: Generate toolinfo prefix from web domain [labs/striker] - https://gerrit.wikimedia.org/r/1018939
[08:41:33] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
[08:41:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:42:10] !log
dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
[08:42:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:46:27] (CR) David Caro: [C:+2] ceph: use timedelta instead of integers [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990975 (owner: David Caro)
[08:47:41] (CR) Majavah: [C:+1] "not a huge fan of the inner function, but I think it's fine here" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990976 (owner: David Caro)
[08:48:32] (CR) Majavah: "can this validate that the cluster name matches the given node names?" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990977 (owner: David Caro)
[08:49:16] Toolforge (Toolforge iteration 08): [api-gateway] Add a python server to serve consolidated openapi docs - https://phabricator.wikimedia.org/T362299#9706001 (Slst2020) >>! In T362299#9705965, @dcaro wrote: > For the deployment we can reuse the [[ https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway...
[08:49:42] (CR) Majavah: [C:+1] ceph: add missing cumin params [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990979 (owner: David Caro)
[08:50:28] (Merged) jenkins-bot: ceph: use timedelta instead of integers [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990975 (owner: David Caro)
[08:55:20] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services, Goal: [toolsdb] Upgrade to MariaDB 10.6 - https://phabricator.wikimedia.org/T352206#9706026 (Marostegui) I would also encourage migrate to Bookworm too :)
[08:55:24] (PS1) Majavah: labsauth: Update UI labels to use 'developer account' term [labs/striker] - https://gerrit.wikimedia.org/r/1018942
[08:57:10] Toolforge (Toolforge iteration 08): [api-gateway] Add a python server to serve consolidated openapi docs - https://phabricator.wikimedia.org/T362299#9706048 (Slst2020)
[09:00:39] Toolforge (Toolforge iteration 08): [api-gateway] Add a python server to serve consolidated openapi docs - https://phabricator.wikimedia.org/T362299#9706054 (dcaro) >>! In T362299#9706001, @Slst2020 wrote: >>>! In T362299#9705965, @dcaro wrote: >> For the deployment we can reuse the [[ https://gitlab.wikimed...
[09:06:22] cloud-services-team, Toolforge: Decision Request - Toolforge policy agent - https://phabricator.wikimedia.org/T362233#9706061 (aborrero) >>! In T362233#9705940, @dcaro wrote: > Because any option that ends without a 3rd party in the mid-run (it's actually just one k8s upgrade away!) is way better than an...
[09:16:36] Quarry: Error 500 when clicking "stop query" - https://phabricator.wikimedia.org/T362213#9706071 (taavi) >>! In T362213#9705787, @SD0001 wrote: > The issue I suspect is that `*.analytics.db.svc.eqiad.wmflabs` are LB endpoints behind which there could be multiple replicas (@taavi - would you be able to confir...
[09:16:39] cloud-services-team, Toolforge: Decision Request - Toolforge policy agent - https://phabricator.wikimedia.org/T362233#9706072 (dcaro) >>! In T362233#9706061, @aborrero wrote: >>>! In T362233#9705940, @dcaro wrote: >> Because any option that ends without a 3rd party in the mid-run (it's actually just one...
[09:18:41] Toolforge (Toolforge iteration 08): [api-gateway] Add a python server to serve consolidated openapi docs - https://phabricator.wikimedia.org/T362299#9706084 (aborrero) >>! In T362299#9706054, @dcaro wrote: >>>! In T362299#9706001, @Slst2020 wrote: >>>>! In T362299#9705965, @dcaro wrote: >>> For the deploymen...
[09:19:01] Toolforge (Toolforge iteration 08): [api-gateway] Add a python server to serve consolidated openapi docs - https://phabricator.wikimedia.org/T362299#9706085 (aborrero)
[09:24:23] Toolforge (Toolforge iteration 08): [api-gateway] Add a python server to serve consolidated openapi docs - https://phabricator.wikimedia.org/T362299#9706093 (dcaro) >>! In T362299#9706084, @aborrero wrote: >>>! In T362299#9706054, @dcaro wrote: >>>>! In T362299#9706001, @Slst2020 wrote: >>>>>! In T362299#970...
[09:29:25] Toolforge: [component-api] Develop the webhook mechanism to trigger a deploment - https://phabricator.wikimedia.org/T362066#9706120 (dcaro)
[09:29:39] cloud-services-team, Toolforge, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project: [builds-api] Add triggering support - https://phabricator.wikimedia.org/T334587#9706118 (dcaro) →Duplicate dup:T362066
[09:31:18] Toolforge: [component-api] add one-off, scheduled and continuous jobs support to the yaml + api - https://phabricator.wikimedia.org/T362075#9706144 (dcaro)
[09:31:27] cloud-services-team, Toolforge (Toolforge iteration 08), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project: [builds-api,components-api] Automatically deploy the webservice when the image is built - https://phabricator.wikimedia.org/T341065#9706142 (dcaro) →Duplicate dup:T...
[09:36:53] Cloud-VPS (Quota-requests): owidm storage quota request - https://phabricator.wikimedia.org/T361895#9706180 (Slst2020) +1
[09:37:04] Cloud-VPS (Quota-requests): owidm storage quota request - https://phabricator.wikimedia.org/T361895#9706183 (aborrero) looks good to me.
[09:51:57] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services, Goal: [toolsdb] test failover procedure - https://phabricator.wikimedia.org/T344719#9706264 (fnegri)
[09:51:57] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services, Goal: [toolsdb] Upgrade to MariaDB 10.6 - https://phabricator.wikimedia.org/T352206#9706263 (fnegri)
[09:53:26] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services, Goal: [toolsdb] Upgrade to MariaDB 10.6 - https://phabricator.wikimedia.org/T352206#9706279 (fnegri)
[09:56:53] (PS2) Majavah: tools: Generate toolinfo prefix from web domain [labs/striker] - https://gerrit.wikimedia.org/r/1018939
[09:56:53] (PS2) Majavah: labsauth: Update UI labels to use 'developer account' term [labs/striker] - https://gerrit.wikimedia.org/r/1018942
[09:57:59] (CR) Majavah: [C:+2] tools: Allow configuring webservice domain [labs/striker] - https://gerrit.wikimedia.org/r/1018938 (owner: Majavah)
[10:00:12] (CR) Majavah: [C:+2] tools: Generate toolinfo prefix from web domain [labs/striker] - https://gerrit.wikimedia.org/r/1018939 (owner: Majavah)
[10:00:34] (Merged) jenkins-bot: tools: Allow configuring webservice domain [labs/striker] - https://gerrit.wikimedia.org/r/1018938 (owner: Majavah)
[10:01:41] (Merged) jenkins-bot: tools: Generate toolinfo prefix from web domain [labs/striker] - https://gerrit.wikimedia.org/r/1018939 (owner: Majavah)
[10:13:03] Toolforge (Toolforge iteration 08): [api-gateway] Add a python server to serve consolidated openapi docs - https://phabricator.wikimedia.org/T362299#9706292 (dcaro) > For the CI, we might want to use the same as builds-api for example, that does pre-commit + blubber test. That would mean that we have to add...
[10:15:30] Striker: Archive Phabricator project tags for disabled tools - https://phabricator.wikimedia.org/T362313 (taavi) NEW
[10:16:00] Toolforge (Toolforge iteration 08): [cicd,infra] pre-cache all the pre-commit hooks - https://phabricator.wikimedia.org/T362314 (dcaro) NEW
[10:16:15] Toolforge (Toolforge iteration 08): [cicd,infra] pre-cache all the pre-commit hooks - https://phabricator.wikimedia.org/T362314#9706317 (dcaro) Open→In progress p:Triage→Medium
[10:17:38] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [toolsdb] set gtid_domain_id to 0 - https://phabricator.wikimedia.org/T357341#9706328 (fnegri) p:Medium→Low
[10:22:57] (PS3) Majavah: labsauth: Add field for SUL account ID [labs/striker] - https://gerrit.wikimedia.org/r/1009310 (https://phabricator.wikimedia.org/T359428)
[10:22:57] (PS4) Majavah: labsauth: Store SUL user ID like username [labs/striker] - https://gerrit.wikimedia.org/r/1009311 (https://phabricator.wikimedia.org/T359428)
[10:38:41] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: cumin and cloud-vps instances not working - https://phabricator.wikimedia.org/T347428#9706398 (fnegri) Stalled→Resolved I am marking this task as Resolved, as the commands in the description are now working fine both in cloudcumin1001...
[10:40:44] Cloud-VPS: nginx /var/lib/nginx accidentaly mounted on tmpfs in WMCS - https://phabricator.wikimedia.org/T347432#9706408 (fnegri) @Andrew do you still want to do a restart of nginx servers? Cumin is now working fine (though only thanks to your manually-applied patch, upstream Cumin is still broken).
[10:42:11] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Cumin, Infrastructure-Foundations, Patch-For-Review: [cumin] [openstack] Openstack backend fails when project is not set - https://phabricator.wikimedia.org/T346453#9706412 (fnegri)
[10:45:19] Striker: Add Bitu container to Striker development environment - https://phabricator.wikimedia.org/T362318 (taavi) NEW
[10:55:04] Cloud-VPS: nginx /var/lib/nginx accidentaly mounted on tmpfs in WMCS - https://phabricator.wikimedia.org/T347432#9706520 (taavi) Open→Invalid I think this is moot at this point.
[11:10:28] cloud-services-team, Toolforge: Decision Request - Toolforge policy agent - https://phabricator.wikimedia.org/T362233#9706547 (aborrero) >>! In T362233#9706072, @dcaro wrote: > > Please add an option in which we decide to drop kyverno with the 1.26 upgrade. > Please feel free to add it yourself :-) >...
[11:22:06] cloud-services-team, Toolforge: Decision Request - Toolforge policy agent - https://phabricator.wikimedia.org/T362233#9706574 (dcaro) >>! In T362233#9706547, @aborrero wrote: >>>! In T362233#9706072, @dcaro wrote: >> >> Please add an option in which we decide to drop kyverno with the 1.26 upgrade. >> >...
[11:40:12] Toolforge (Toolforge iteration 08): [api-gateway] Add a python server to serve consolidated openapi docs - https://phabricator.wikimedia.org/T362299#9706660 (aborrero) >>! In T362299#9706093, @dcaro wrote: > > It also complicates debugging, as now your pods will potentially run in different hosts, in differ...
[12:17:55] cloud-services-team, Toolforge: Decision Request - Toolforge policy agent - https://phabricator.wikimedia.org/T362233#9706790 (dcaro) >>! In T362233#9706547, @aborrero wrote: >>>! In T362233#9706072, @dcaro wrote: >> >> Please add an option in which we decide to drop kyverno with the 1.26 upgrade. >> >...
[12:23:51] cloud-services-team, Toolforge: Decision Request - Toolforge policy agent - https://phabricator.wikimedia.org/T362233#9706799 (dcaro)
[12:36:10] wikitech.wikimedia.org, MW-on-K8s, serviceops: Migrate Wikitech to Kubernetes - https://phabricator.wikimedia.org/T292707#9706839 (jijiki) a:jijiki
[13:41:59] Toolforge (Toolforge iteration 08), Patch-For-Review: [cicd,infra] pre-cache all the pre-commit hooks - https://phabricator.wikimedia.org/T362314#9707027 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/32 precommit: add code to open MRs to update pre...
[13:54:47] (CR) FNegri: [C:+1] "LGTM!" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018244 (owner: Majavah)
[13:57:54] (CR) FNegri: openstack: cloudcontrol: Update for designate on cloudcontrols (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018243 (owner: Majavah)
[13:59:14] (CR) FNegri: [C:+1] "LGTM" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018242 (owner: Majavah)
[14:32:24] (CR) Majavah: [C:+2] openstack: cloudnet: Migrate to spicerack logging and alerting [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018242 (owner: Majavah)
[14:34:16] (CR) Majavah: openstack: cloudcontrol: Update for designate on cloudcontrols (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018243 (owner: Majavah)
[14:36:27] (Merged) jenkins-bot: openstack: cloudnet: Migrate to spicerack logging and alerting [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018242 (owner: Majavah)
[14:37:08] (PS4) Majavah: openstack: cloudcontrol: Migrate to spicerack logging and alerting [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018244
[14:37:09] (PS4) Majavah: openstack: cloudcontrol: Update for designate on cloudcontrols [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018243
[14:37:39] (CR)
Majavah: [C:+2] openstack: cloudcontrol: Migrate to spicerack logging and alerting [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018244 (owner: Majavah)
[14:41:06] (Merged) jenkins-bot: openstack: cloudcontrol: Migrate to spicerack logging and alerting [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018244 (owner: Majavah)
[14:42:23] cloud-services-team, Striker, Data-Persistence-Backup, DBA, Patch-For-Review: Create a database for Striker test instance - https://phabricator.wikimedia.org/T360149#9707467 (ABran-WMF) Resolved→Open a:ABran-WMF→jcrespo
[14:44:28] (CR) Majavah: "just fyi, these cookbooks need to be eventually migrated from `wmcs_libs.alerts` to `spicerack.alertmanager`" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1013369 (https://phabricator.wikimedia.org/T329709) (owner: David Caro)
[14:54:06] cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9707520 (Andrew) Resolved→Open Oops, 'puppetserver ca' is still not working on the puppet server (or perhaps it was working but...
[15:29:18] Toolforge (Toolforge iteration 08): [api-gateway] Add a python server to serve consolidated openapi docs - https://phabricator.wikimedia.org/T362299#9707683 (Slst2020)
[15:31:25] Toolforge (Toolforge iteration 08): [api-gateway] Add a python server to serve consolidated openapi docs - https://phabricator.wikimedia.org/T362299#9707685 (Slst2020) Open→In progress
[15:32:18] Toolforge (Toolforge iteration 08): [harbor, maintain-harbor] Harbor upgrade 2.10 breaks delete-stale-toolforge-artifacts cron job - https://phabricator.wikimedia.org/T361842#9707697 (Slst2020) Resolved→Invalid
[15:32:29] cloud-services-team, Toolforge (Toolforge iteration 08), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project: [builds-api,components-api] Automatically deploy the webservice when the image is built - https://phabricator.wikimedia.org/T341065#9707693 (Slst2020) Duplicate→Res...
[15:42:57] cloud-services-team, Striker, Data-Persistence-Backup, DBA, Patch-For-Review: Create a database for Striker test instance - https://phabricator.wikimedia.org/T360149#9707733 (jcrespo) There is some issues on the already provided user grants. I don't think we should create databases with under...
[15:52:07] cloud-services-team, Striker, Data-Persistence-Backup, DBA, Patch-For-Review: Create a database for Striker test instance - https://phabricator.wikimedia.org/T360149#9707738 (jcrespo) a:jcrespo→ABran-WMF Please see my last comment. Other than that, my work is done.
[15:58:27] cloud-services-team, wikitech.wikimedia.org, Epic: Set up a bitu instance for codfw1dev - https://phabricator.wikimedia.org/T360795#9707776 (Andrew) I'd prefer that it go on its own ganeti VM, as I'm trying to pare down on the total number if weird things that run on cloudservices.
[15:59:08] Cloud-VPS: nginx /var/lib/nginx accidentaly mounted on tmpfs in WMCS - https://phabricator.wikimedia.org/T347432#9707781 (Andrew) >>! In T347432#9706520, @taavi wrote: > I think this is moot at this point. yep, agreed
[16:05:02] cloud-services-team, Striker, Data-Persistence-Backup, DBA, Patch-For-Review: Create a database for Striker test instance - https://phabricator.wikimedia.org/T360149#9707810 (taavi) Oops - sorry about that. If you prefer a version without underscores, from my side it's totally fine to rename...
[16:06:49] cloud-services-team, Striker, Data-Persistence-Backup, DBA, Patch-For-Review: Create a database for Striker test instance - https://phabricator.wikimedia.org/T360149#9707818 (jcrespo) No need. I just wanted to warn the DBAs- althought you may find it interesting, as the last issue was with wi...
[16:08:43] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node
[16:15:42] Cloud Services Proposals, cloud-services-team, Toolforge: Decision Request - Toolforge policy agent - https://phabricator.wikimedia.org/T362233#9707854 (fnegri)
[16:22:54] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[16:23:37] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[16:26:56] (CR) FNegri: openstack: cloudcontrol: Update for designate on cloudcontrols (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018243 (owner: Majavah)
[16:31:49] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
[16:34:20] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node
[16:43:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks -
https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:48:41] (CloudVPSDesignateLeaks) firing: (2) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:50:17] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0)
[16:52:36] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node
[16:53:41] (CloudVPSDesignateLeaks) firing: (2) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:58:41] (CloudVPSDesignateLeaks) resolved: (2) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:06:31] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[17:12:26] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[17:13:01] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[17:13:41] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[17:23:48] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[17:23:55] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[17:29:39]
(ProbeDown) firing: Service toolsbeta-test-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:33:20] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
[17:34:39] (ProbeDown) resolved: Service toolsbeta-test-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:57:34] Toolforge: [infra,builds-builder] "failed to create fsnotify watcher: too many open files" - https://phabricator.wikimedia.org/T361519#9708237 (bd808) `lang=shell-session tools.wikibugs-testing@tools-bastion-12:~$ toolforge build start --ref work/bd808/expect-phorge-api-errors https://gitlab.wikimedia.org/to...
[18:05:26] Cloud-VPS (Debian Buster Deprecation), collaboration-services: replace buster machines in devtools project - https://phabricator.wikimedia.org/T360964#9708245 (Dzahn) regarding the puppetdb server - it was created in 2022 by @jbond and I couldn't remember why exactly we did it - to be able to run cumin i...
[18:09:13] Toolforge: [infra,builds-builder] "failed to create fsnotify watcher: too many open files" - https://phabricator.wikimedia.org/T361519#9708247 (bd808) >>! In T361519#9708237, @bd808 wrote: > 3rd retry (4th call overall) worked. This keeps faking me out. The 4th try failed deep into the build with ` [step-ex...
[18:34:38] cloud-services-team, VPS-Projects, collaboration-services, Puppet (Puppet 7.0), Release-Engineering-Team (Now this 🫠): Update devtools project puppetmaster - https://phabricator.wikimedia.org/T360470#9708295 (brennen) Open→Resolved I changed `puppetmaster-1003` to `role::pu...
[18:49:09] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[18:58:39] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
[19:03:34] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node
[19:16:57] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[19:17:47] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node
[19:33:57] Quarry: store quarry state in object storage - https://phabricator.wikimedia.org/T360233#9708375 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/quarry/pull/39
[19:34:06] vivian-rook opened https://github.com/toolforge/quarry/pull/39
[19:34:16] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[19:42:28] (PS1) Andrea Denisse: Revert "ssl: Delete dummy TLS key for the Prometheus hosts" [labs/private] - https://gerrit.wikimedia.org/r/1018978
[19:42:44] (CR) Andrea Denisse: [V:+2 C:+2] Revert "ssl: Delete dummy TLS key for the Prometheus hosts" [labs/private] - https://gerrit.wikimedia.org/r/1018978 (owner: Andrea Denisse)
[19:46:14] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[19:47:01] Quarry: store quarry state in object storage - https://phabricator.wikimedia.org/T360233#9708412 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/39
[19:47:13] vivian-rook closed https://github.com/toolforge/quarry/pull/39
[19:48:47] Quarry: store quarry state
in object storage - https://phabricator.wikimedia.org/T360233#9708415 (rook) Docs in: https://wikitech.wikimedia.org/wiki/Help:Object_storage_user_guide#S3_API https://wikitech.wikimedia.org/wiki/Help:Using_OpenTofu_on_Cloud_VPS#State_management
[19:50:05] Quarry: store quarry state in object storage - https://phabricator.wikimedia.org/T360233#9708416 (rook) Open→Resolved a:rook
[19:58:37] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[19:59:34] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[20:00:08] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[20:01:38] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[20:02:11] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[20:02:53] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[20:03:29] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[20:04:43] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[20:05:17] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[20:16:28] (PuppetAgentFailure) firing: Puppet agent failure detected on instance toolsbeta-test-k8s-etcd-28 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[20:17:23] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[20:17:57] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[20:18:19] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[20:27:35] !log andrew@cloudcumin1001 toolsbeta END (FAIL) -
Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[20:27:53] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[20:38:14] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
[20:39:20] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[20:41:28] (PuppetAgentFailure) resolved: Puppet agent failure detected on instance toolsbeta-test-k8s-etcd-28 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[20:42:41] (CloudVPSDesignateLeaks) firing: (2) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:49:17] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
[20:52:41] (CloudVPSDesignateLeaks) firing: (2) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:56:28] (InstanceDown) firing: Project toolsbeta instance toolsbeta-test-k8s-etcd-27 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:57:41] (CloudVPSDesignateLeaks) resolved: (2) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[21:01:28] (InstanceDown) resolved: Project toolsbeta instance toolsbeta-test-k8s-etcd-27 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[21:13:48] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[22:00:36] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[22:01:12] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[22:03:00] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[22:10:54] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[22:11:06] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[22:12:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:17:41] (CloudVPSDesignateLeaks) firing: (2) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:18:56] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
[22:22:41] (CloudVPSDesignateLeaks) resolved: (2) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:36:37] Cloud-VPS (Quota-requests), VPS-project-devtools,
collaboration-services, Release-Engineering-Team (Radar): Increase instance and volume quota in devtools project for puppetmaster upgrade - https://phabricator.wikimedia.org/T360823#9708712 (brennen)
[22:36:54] cloud-services-team, VPS-project-devtools, collaboration-services, Patch-For-Review, and 2 others: Update devtools project puppetmaster - https://phabricator.wikimedia.org/T360470#9708713 (brennen)
[22:54:52] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node
[23:06:28] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[23:23:25] wikitech.wikimedia.org, DiscussionTools, Editing-team (Kanban Board), MW-1.42-notes (1.42.0-wmf.25; 2024-04-02), Verified: Page state routing triggers DiscussionTools warning, e.g. #!/deploycal/current - https://phabricator.wikimedia.org/T361322#9708828 (Ryasmeen)
[23:46:28] (PuppetAgentFailure) firing: Puppet agent failure detected on instance toolsbeta-test-k8s-etcd-27 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure