[00:37:56] FIRING: SystemdUnitDown: The service unit keystone_sync_keys_from_cloudcontrol1005.private.eqiad.wikimedia.cloud.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [00:42:56] FIRING: [3x] SystemdUnitDown: The service unit keystone_sync_keys_from_cloudcontrol1005.private.eqiad.wikimedia.cloud.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [01:52:56] FIRING: [4x] SystemdUnitDown: The service unit keystone_sync_keys_from_cloudcontrol1005.private.eqiad.wikimedia.cloud.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [02:02:56] FIRING: [5x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [02:12:56] FIRING: [5x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [02:22:56] FIRING: [4x] SystemdUnitDown: The service unit keystone_sync_keys_from_cloudcontrol1005.private.eqiad.wikimedia.cloud.service is in failed status on host cloudcontrol1006. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[02:32:56] FIRING: [3x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[02:32:56] FIRING: SystemdUnitDown: The systemd unit keystone_sync_keys_from_cloudcontrol1005.private.eqiad.wikimedia.cloud.service on node cloudcontrol1007 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[02:33:01] cloud-services-team: SystemdUnitDown The systemd unit keystone_sync_keys_from_cloudcontrol1005.private.eqiad.wikimedia.cloud.service on node cloudcontrol1007 has been failing for more than two hours. - https://phabricator.wikimedia.org/T391424 (phaultfinder) NEW
[02:37:56] FIRING: [3x] SystemdUnitDown: The systemd unit keystone_sync_keys_from_cloudcontrol1005.private.eqiad.wikimedia.cloud.service on node cloudcontrol1006 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[02:38:08] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T391425 (phaultfinder) NEW
[02:52:56] RESOLVED: [5x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [02:58:26] RESOLVED: [3x] SystemdUnitDown: The systemd unit keystone_sync_keys_from_cloudcontrol1005.private.eqiad.wikimedia.cloud.service on node cloudcontrol1006 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [03:15:56] FIRING: SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [03:17:58] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services [03:19:17] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99) on deployment eqiad1 for all services [03:20:56] FIRING: [3x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [03:21:48] (03PS1) 10Andrew Bogott: Replace cloudcontrol1005 with cloudcontrol1011 [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1135149 (https://phabricator.wikimedia.org/T391413) [03:22:18] (03CR) 10Andrew Bogott: [C:03+2] upgrade_openstack_node: don't lock tables when backing up [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1133432 (owner: 10Andrew Bogott) [03:23:25] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services [03:24:39] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99) on deployment eqiad1 for all services [03:25:56] RESOLVED: SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [03:26:14] (03Merged) 10jenkins-bot: upgrade_openstack_node: don't lock tables when backing up [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1133432 (owner: 10Andrew Bogott) [03:29:27] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services [03:30:26] (03PS2) 10Andrew Bogott: Replace cloudcontrol1005 with cloudcontrol1011 [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1135149 (https://phabricator.wikimedia.org/T391413) [03:30:27] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.openstack.restart_openstack (exit_code=97) on deployment eqiad1 for all services [03:30:31] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all 
services [03:30:56] FIRING: SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [03:31:49] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99) on deployment eqiad1 for all services [03:35:56] RESOLVED: SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [03:46:12] FIRING: SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [03:48:00] FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures [03:48:05] 06cloud-services-team: NovafullstackSustainedFailures Novafullstack tests have been failing for more than 5hours in eqiad - https://phabricator.wikimedia.org/T391428 (10phaultfinder) 03NEW [03:50:56] RESOLVED: SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [03:54:16] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services [03:55:36] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99) on deployment eqiad1 for all services [03:59:49] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services [04:00:56] FIRING: SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [04:05:56] RESOLVED: SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [04:09:28] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for all services [04:15:32] (03CR) 10Andrew Bogott: [C:03+2] Replace cloudcontrol1005 with cloudcontrol1011 [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1135149 (https://phabricator.wikimedia.org/T391413) (owner: 10Andrew Bogott) [04:19:18] (03Merged) 10jenkins-bot: Replace cloudcontrol1005 with cloudcontrol1011 [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1135149 (https://phabricator.wikimedia.org/T391413) (owner: 10Andrew Bogott) [04:48:22] FIRING: [3x] HAProxyBackendUnavailable: HAProxy service designate-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [04:49:22] FIRING: [2x] HAProxyServiceUnavailable: HAProxy service designate-api_backend has no available backends on cloudlb1001:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable [04:49:27] 06cloud-services-team: HAProxyServiceUnavailable - https://phabricator.wikimedia.org/T391430 (10phaultfinder) 03NEW [05:03:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [05:04:52] RESOLVED: [3x] HAProxyBackendUnavailable: HAProxy service designate-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [05:04:52] RESOLVED: HAProxyServiceUnavailable: HAProxy service designate-api_backend has no available backends on cloudlb1001:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable [05:10:30] RESOLVED: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures [05:13:41] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [05:13:56] FIRING: SystemdUnitDown: The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[05:14:01] cloud-services-team: SystemdUnitDown The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. - https://phabricator.wikimedia.org/T391431 (phaultfinder) NEW
[08:38:39] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for main branch
[08:39:11] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for main branch
[08:41:37] cloud-services-team: SystemdUnitDown The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. - https://phabricator.wikimedia.org/T391431#10724793 (aborrero) Open→Resolved a:aborrero some openstack API endpoints were briefly unavailable:...
[08:43:42] cloud-services-team, Toolforge, Documentation, Kubernetes: Figure out and document how to call the Kubernetes API as your tool user from inside a pod - https://phabricator.wikimedia.org/T321919#10724797 (Addshore) So, touching on this from the context of https://gitlab.wikimedia.org/repos/cloud/t...
[08:43:56] RESOLVED: SystemdUnitDown: The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[08:45:19] FIRING: HighIOWaitStalling: High iowait detected on clouddumps1002:9100.
- https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling [08:55:19] RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling [08:57:11] 06cloud-services-team, 10Toolforge, 07Documentation, 07Kubernetes: Figure out and document how to call the Kubernetes API as your tool user from inside a pod - https://phabricator.wikimedia.org/T321919#10724833 (10aborrero) >>! In T321919#10724797, @Addshore wrote: > It would likely also be trivialish to e... [09:19:07] (03update) 10raymond-ndibe: [jobs-api] move core logic to seperate core module [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/153 (https://phabricator.wikimedia.org/T359804 https://phabricator.wikimedia.org/T390135) [09:20:15] (03update) 10raymond-ndibe: [jobs-api] custom resource definition deployment templates [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/101 (https://phabricator.wikimedia.org/T359650) [09:20:33] (03update) 10raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (save_business_models_to_db) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [09:22:09] (03update) 10raymond-ndibe: [jobs-api] move custom validations out of api models [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/150 (https://phabricator.wikimedia.org/T389118) [09:22:34] (03update) 
10raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] (move_most_custom_validations_out_of_api_models) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T389118) [09:22:42] (03approved) 10fnegri: [jobs-api] move core logic to seperate core module [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/153 (https://phabricator.wikimedia.org/T359804 https://phabricator.wikimedia.org/T390135) (owner: 10raymond-ndibe) [09:25:19] (03update) 10raymond-ndibe: [jobs-api] move core logic to seperate core module [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/153 (https://phabricator.wikimedia.org/T359804 https://phabricator.wikimedia.org/T390135) [09:26:12] (03approved) 10raymond-ndibe: [jobs-api] move core logic to seperate core module [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/153 (https://phabricator.wikimedia.org/T359804 https://phabricator.wikimedia.org/T390135) [09:26:14] (03update) 10raymond-ndibe: [jobs-api] move core logic to seperate core module [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/153 (https://phabricator.wikimedia.org/T359804 https://phabricator.wikimedia.org/T390135) [09:26:20] (03merge) 10raymond-ndibe: [jobs-api] move core logic to seperate core module [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/153 (https://phabricator.wikimedia.org/T359804 https://phabricator.wikimedia.org/T390135) [09:26:21] (03update) 10raymond-ndibe: [jobs-api] custom resource definition deployment templates [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/101 
(https://phabricator.wikimedia.org/T359650) [09:26:22] (03update) 10raymond-ndibe: [jobs-api] move custom validations out of api models [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/150 (https://phabricator.wikimedia.org/T389118) [09:28:54] (03open) 10group_203_bot_4866fc124f4b41659f667468a6115cf3: jobs-api: bump to 0.0.365-20250409092629-77469f38 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/742 (https://phabricator.wikimedia.org/T359804 https://phabricator.wikimedia.org/T390135) [09:29:50] 06cloud-services-team, 10Toolforge, 07Documentation, 07Kubernetes: Figure out and document how to call the Kubernetes API as your tool user from inside a pod - https://phabricator.wikimedia.org/T321919#10724936 (10Addshore) >>! In T321919#10724833, @aborrero wrote: > I like this idea, and the semantics tha... [09:32:15] (03open) 10aborrero: tofu-infra: add initial support for quotas [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/175 (https://phabricator.wikimedia.org/T371391) [09:32:51] (03update) 10aborrero: tofu-infra: add initial support for quotas [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/175 (https://phabricator.wikimedia.org/T371391) [09:33:31] (03update) 10aborrero: tofu-infra: add initial support for quotas [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/175 (https://phabricator.wikimedia.org/T371391) [09:35:08] 06cloud-services-team, 10Toolforge, 07Documentation, 07Kubernetes: Figure out and document how to call the Kubernetes API as your tool user from inside a pod - https://phabricator.wikimedia.org/T321919#10724963 (10aborrero) >>! In T321919#10724936, @Addshore wrote: >>>! 
In T321919#10724833, @aborrero wrote... [09:37:26] (03update) 10aborrero: tofu-infra: add initial support for quotas [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/175 (https://phabricator.wikimedia.org/T371391) [09:39:11] (03update) 10aborrero: tofu-infra: add initial support for quotas [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/175 (https://phabricator.wikimedia.org/T371391) [09:47:15] (03update) 10aborrero: tofu-infra: add initial support for quotas [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/175 (https://phabricator.wikimedia.org/T371391) [09:48:18] (03update) 10aborrero: tofu-infra: add initial support for quotas [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/175 (https://phabricator.wikimedia.org/T371391) [09:51:33] (03update) 10aborrero: tofu-infra: add initial support for quotas [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/175 (https://phabricator.wikimedia.org/T371391) [09:52:03] (03update) 10raymond-ndibe: [jobs-api] stream logs from all containers in all pods [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/148 (https://phabricator.wikimedia.org/T388274) [09:56:23] (03update) 10aborrero: tofu-infra: add initial support for quotas [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/175 (https://phabricator.wikimedia.org/T371391) [09:58:08] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api [09:58:14] (03update) 10aborrero: tofu-infra: add initial support for quotas [repos/cloud/cloud-vps/tofu-infra] - 
https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/175 (https://phabricator.wikimedia.org/T371391)
[10:06:10] (approved) fnegri: tofu-infra: add initial support for quotas [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/175 (https://phabricator.wikimedia.org/T371391) (owner: aborrero)
[10:06:44] (merge) aborrero: tofu-infra: add initial support for quotas [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/175 (https://phabricator.wikimedia.org/T371391)
[10:06:51] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[10:07:35] (approved) aborrero: Upgrade Kubernetes to 1.29 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/227 (https://phabricator.wikimedia.org/T362868) (owner: fnegri)
[10:08:20] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[10:09:28] (open) aborrero: codfw1dev: testlabs: bump floating IP quota to 3 [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/176 (https://phabricator.wikimedia.org/T391325)
[10:09:43] (merge) fnegri: Upgrade Kubernetes to 1.29 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/227 (https://phabricator.wikimedia.org/T362868)
[10:09:45] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[10:11:28] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 19), Patch-For-Review: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.29 - https://phabricator.wikimedia.org/T362868#10725264 (fnegri)
[10:11:36] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 19), Patch-For-Review: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.29 - https://phabricator.wikimedia.org/T362868#10725265 (fnegri) In progress→Resolved
[10:12:22] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[10:16:14] (approved) fnegri: codfw1dev: testlabs: bump floating IP quota to 3 [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/176 (https://phabricator.wikimedia.org/T391325) (owner: aborrero)
[10:17:41] (merge) aborrero: codfw1dev: testlabs: bump floating IP quota to 3 [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/176 (https://phabricator.wikimedia.org/T391325)
[10:17:43] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[10:18:23] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[10:28:26] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[10:32:26] (update) raymond-ndibe: jobs-api: bump to 0.0.365-20250409092629-77469f38 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/742 (https://phabricator.wikimedia.org/T359804 https://phabricator.wikimedia.org/T390135) (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[10:32:27] (approved) raymond-ndibe: jobs-api: bump to 0.0.365-20250409092629-77469f38 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/742 (https://phabricator.wikimedia.org/T359804 https://phabricator.wikimedia.org/T390135) (owner:
group_203_bot_4866fc124f4b41659f667468a6115cf3)
[10:32:35] (merge) raymond-ndibe: jobs-api: bump to 0.0.365-20250409092629-77469f38 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/742 (https://phabricator.wikimedia.org/T359804 https://phabricator.wikimedia.org/T390135) (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[11:31:21] (PS4) Arturo Borrero Gonzalez: wmcs.toolforge.add_k8s_node: add smarter default image [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1135067
[11:43:09] (CR) Arturo Borrero Gonzalez: "recheck" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1135067 (owner: Arturo Borrero Gonzalez)
[11:46:48] (Merged) jenkins-bot: wmcs.toolforge.add_k8s_node: add smarter default image [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1135067 (owner: Arturo Borrero Gonzalez)
[12:04:41] (open) aborrero: networktests_infra: allocate and attach floating IPs to VMs [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/3 (https://phabricator.wikimedia.org/T391325)
[12:05:26] (update) aborrero: networktests_infra: allocate and attach floating IPs to VMs [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/3 (https://phabricator.wikimedia.org/T391325)
[12:13:08] cloud-services-team, Cloud-VPS: gitlab ci: validate secrets settings in pipeline for tofu integration - https://phabricator.wikimedia.org/T391467 (aborrero) NEW
[12:13:25] cloud-services-team, Cloud-VPS: gitlab ci: validate secrets settings in pipeline for tofu integration - https://phabricator.wikimedia.org/T391467#10725685 (aborrero) p:Triage→High
[12:14:01] cloud-services-team, Cloud-VPS: gitlab ci: validate secrets settings in
pipeline for tofu integration - https://phabricator.wikimedia.org/T391467#10725689 (aborrero)
[12:14:05] cloud-services-team, Cloud-VPS, Epic: tofu-infra: introduce additional gitlab-ci automation - https://phabricator.wikimedia.org/T370652#10725690 (aborrero)
[12:14:30] cloud-services-team, Cloud-VPS: gitlab ci: validate secrets settings in pipeline for tofu integration - https://phabricator.wikimedia.org/T391467#10725692 (aborrero)
[12:14:36] cloud-services-team, Toolforge: bootstrap Toolforge IaC automation - https://phabricator.wikimedia.org/T390057#10725693 (aborrero)
[12:14:42] cloud-services-team, Cloud-VPS: gitlab ci: validate secrets settings in pipeline for tofu integration - https://phabricator.wikimedia.org/T391467#10725694 (taavi) Which credentials is that using in the first place?
[12:16:02] cloud-services-team, Cloud-VPS: gitlab ci: validate secrets settings in pipeline for tofu integration - https://phabricator.wikimedia.org/T391467#10725700 (aborrero) >>! In T391467#10725694, @taavi wrote: > Which credentials is that using in the first place? An user called `srv-networktests` @ codfw1dev...
[12:19:53] cloud-services-team, Cloud-VPS: gitlab ci: validate secrets settings in pipeline for tofu integration - https://phabricator.wikimedia.org/T391467#10725723 (aborrero) >>! In T391467#10725694, @taavi wrote: > Which credentials is that using in the first place? As we plan to expand and replicate this setup...
[12:22:34] (update) aborrero: networktests_infra: allocate and attach floating IPs to VMs [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/3 (https://phabricator.wikimedia.org/T391325)
[12:27:02] (update) aborrero: networktests_infra: allocate and attach floating IPs to VMs [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/3 (https://phabricator.wikimedia.org/T391325)
[12:29:22] (merge) aborrero: networktests_infra: allocate and attach floating IPs to VMs [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/3 (https://phabricator.wikimedia.org/T391325)
[12:33:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:34:03] (open) aborrero: networktests_infra: fix port reference [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/4
[12:43:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:03:20] (PS1) Btullis: Add a dummy password for rsyncing mediawiki-dumps-legacy [labs/private] - https://gerrit.wikimedia.org/r/1135425 (https://phabricator.wikimedia.org/T390738)
[13:04:07] (CR) Btullis: [V:+2 C:+2] Add a dummy
password for rsyncing mediawiki-dumps-legacy [labs/private] - https://gerrit.wikimedia.org/r/1135425 (https://phabricator.wikimedia.org/T390738) (owner: Btullis) [13:21:42] cloud-services-team, Cloud-VPS: gitlab ci: validate secrets settings in pipeline for tofu integration - https://phabricator.wikimedia.org/T391467#10726025 (aborrero) >>! In T391467#10725700, @aborrero wrote: >>>! In T391467#10725694, @taavi wrote: >> Which credentials is that using in the first place? >... [13:27:59] cloud-services-team (FY2024/2025-Q3-Q4), DC-Ops, ops-eqiad: Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10726042 (fnegri) In progress→Stalled [13:30:56] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Toolforge: If the inactive clouddumps host goes down, it causes a ripple effect on Cloud VPS and Toolforge - https://phabricator.wikimedia.org/T391369#10726068 (fnegri) @Andrew do you have any thoughts on this? I think ideally we would find a way to... [13:38:48] (PS1) Btullis: Rename mediawiki-dumps-legacy rsync password [labs/private] - https://gerrit.wikimedia.org/r/1135442 (https://phabricator.wikimedia.org/T390738) [13:39:34] (approved) raymond-ndibe: [jobs-api] stream logs from all containers in all pods [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/148 (https://phabricator.wikimedia.org/T388274) [13:42:02] (CR) Btullis: [V:+2 C:+2] Rename mediawiki-dumps-legacy rsync password [labs/private] - https://gerrit.wikimedia.org/r/1135442 (https://phabricator.wikimedia.org/T390738) (owner: Btullis) [13:49:49] cloud-services-team: Move WMCS servers out of eqiad row B - https://phabricator.wikimedia.org/T330479#10726140 (Andrew) Open→Resolved a:Andrew None of the servers listed here are racked anymore, so I think this can be closed. 
[13:54:51] (update) fnegri: tools-db DNS: update comments and descriptions [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/144 (https://phabricator.wikimedia.org/T352206 https://phabricator.wikimedia.org/T374953) [13:54:54] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Toolforge: If the inactive clouddumps host goes down, it causes a ripple effect on Cloud VPS and Toolforge - https://phabricator.wikimedia.org/T391369#10726161 (Andrew) This is because we keep the primary and secondary hosts mounted at the same time... [14:05:15] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Toolforge: If the inactive clouddumps host goes down, it causes a ripple effect on Cloud VPS and Toolforge - https://phabricator.wikimedia.org/T391369#10726218 (fnegri) > This is because we keep the primary and secondary hosts mounted at the same ti... [14:13:15] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Data-Services: tofu-infra: replace wmcs-wikireplica-dns.py with tofu - https://phabricator.wikimedia.org/T374953#10726265 (fnegri) [14:13:16] cloud-services-team, Cloud-VPS: [designate] migrate DNS records in noauth-project to cloudinfra project - https://phabricator.wikimedia.org/T380484#10726266 (fnegri) [14:16:15] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: Migrate "eqiad.wmflabs." DNS zone to cloudinfra project - https://phabricator.wikimedia.org/T380488#10726287 (fnegri) Open→In progress a:fnegri [14:21:09] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: Migrate "eqiad.wmflabs." DNS zone to cloudinfra project - https://phabricator.wikimedia.org/T380488#10726319 (fnegri) ` fnegri@cloudcontrol1011:~$ sudo wmcs-openstack zone transfer request create --sudo-project-id noauth-project --target-project-id clou... 
[14:28:41] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: Update scripts that are referencing "noauth-project" for Designate - https://phabricator.wikimedia.org/T391486 (fnegri) NEW [14:29:59] cloud-services-team, Cloud-VPS: [designate] migrate DNS records in noauth-project to cloudinfra project - https://phabricator.wikimedia.org/T380484#10726367 (fnegri) [14:30:07] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: Migrate "eqiad.wmflabs." DNS zone to cloudinfra project - https://phabricator.wikimedia.org/T380488#10726368 (fnegri) In progress→Resolved [14:34:49] cloud-services-team, Toolforge: bootstrap Toolforge IaC automation - https://phabricator.wikimedia.org/T390057#10726420 (Chuckonwumelu) [14:39:25] (update) chuckonwumelu: Draft: Start [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/1 [14:44:35] cloud-services-team: Move WMCS servers out of eqiad row B - https://phabricator.wikimedia.org/T330479#10726455 (ayounsi) [14:45:12] (merge) aborrero: networktests_infra: fix port reference [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/4 [14:58:38] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] (move_most_custom_validations_out_of_api_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T389118) [15:03:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [15:09:12] (open) aborrero: gitlab-ci: remap env vars for codfw1dev 
[repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/5 (https://phabricator.wikimedia.org/T391325) [15:13:30] (update) aborrero: gitlab-ci: remap env vars for codfw1dev [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/5 (https://phabricator.wikimedia.org/T391325) [15:13:41] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [15:13:50] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: Migrate "svc.eqiad.wmflabs." DNS zone to cloudinfra project - https://phabricator.wikimedia.org/T380490#10726516 (fnegri) Open→In progress [15:15:48] (update) aborrero: gitlab-ci: remap env vars for codfw1dev [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/5 (https://phabricator.wikimedia.org/T391325) [15:16:55] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: Migrate "svc.eqiad.wmflabs." DNS zone to cloudinfra project - https://phabricator.wikimedia.org/T380490#10726528 (fnegri) ` fnegri@cloudcontrol1011:~$ sudo wmcs-openstack zone transfer request create --sudo-project-id noauth-project --target-project-id... [15:17:32] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: Migrate "svc.eqiad.wmflabs." DNS zone to cloudinfra project - https://phabricator.wikimedia.org/T380490#10726530 (fnegri) In progress→Resolved a:fnegri [15:19:02] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: Migrate web.db.svc.eqiad.wmflabs. and analytics.db.svc.eqiad.wmflabs. 
to cloudinfra project - https://phabricator.wikimedia.org/T380493#10726547 (fnegri) Open→In progress a:fnegri [15:21:53] (update) aborrero: gitlab-ci: remap env vars for codfw1dev [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/5 (https://phabricator.wikimedia.org/T391325) [15:22:37] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: Migrate web.db.svc.eqiad.wmflabs. and analytics.db.svc.eqiad.wmflabs. to cloudinfra project - https://phabricator.wikimedia.org/T380493#10726560 (fnegri) ` fnegri@cloudcontrol1011:~$ sudo wmcs-openstack zone transfer request create --sudo-project-id noa... [15:30:53] (merge) aborrero: gitlab-ci: remap env vars for codfw1dev [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/5 (https://phabricator.wikimedia.org/T391325) [15:38:14] cloud-services-team, Cloud-VPS: Migrate "16.172.in-addr.arpa." DNS zone to cloudinfra project - https://phabricator.wikimedia.org/T380495#10726636 (fnegri) ` fnegri@cloudcontrol1011:~$ sudo wmcs-openstack zone transfer request create --sudo-project-id noauth-project --target-project-id cloudinfra 16.172.... [15:38:48] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: Migrate "16.172.in-addr.arpa." DNS zone to cloudinfra project - https://phabricator.wikimedia.org/T380495#10726638 (fnegri) Open→In progress a:fnegri [15:41:03] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Toolforge: If the inactive clouddumps host goes down, it causes a ripple effect on Cloud VPS and Toolforge - https://phabricator.wikimedia.org/T391369#10726650 (Andrew) My recollection is that we hard-mount nfs servers to prevent data corruption but... 
[15:44:41] (open) raymond-ndibe: [jobs-cli] health_check and quota refactor [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/97 (https://phabricator.wikimedia.org/T389118) [15:45:06] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] (move_most_custom_validations_out_of_api_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T389118) [15:45:17] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] (move_most_custom_validations_out_of_api_models) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T389118) [15:45:36] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: Migrate "16.172.in-addr.arpa." DNS zone to cloudinfra project - https://phabricator.wikimedia.org/T380495#10726692 (fnegri) [15:46:05] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: Migrate "16.172.in-addr.arpa." DNS zone to cloudinfra project - https://phabricator.wikimedia.org/T380495#10726693 (fnegri) I verified that creating a new VM, the PTR record is created correctly in the `cloudinfra` project. [15:46:16] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: Migrate "16.172.in-addr.arpa." 
DNS zone to cloudinfra project - https://phabricator.wikimedia.org/T380495#10726694 (fnegri) In progress→Resolved [15:48:22] (open) aborrero: README: refresh with instructions [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/6 (https://phabricator.wikimedia.org/T391325) [15:51:11] (merge) aborrero: README: refresh with instructions [repos/cloud/cloud-vps/networktests-tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/networktests-tofu-provisioning/-/merge_requests/6 (https://phabricator.wikimedia.org/T391325) [16:26:30] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: Migrate web.db.svc.eqiad.wmflabs. and analytics.db.svc.eqiad.wmflabs. to cloudinfra project - https://phabricator.wikimedia.org/T380493#10726783 (fnegri) In progress→Resolved [16:26:51] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Patch-For-Review: Update scripts that are referencing "noauth-project" for Designate - https://phabricator.wikimedia.org/T391486#10726792 (fnegri) Open→In progress p:Triage→High [16:29:09] (CR) Arendpieter: [C:+1] Switch username validation to Bitu API [labs/striker] - https://gerrit.wikimedia.org/r/1134724 (https://phabricator.wikimedia.org/T364605) (owner: Arendpieter) [16:29:16] (CR) Arendpieter: [C:+1] "Acknowledged" [labs/striker] - https://gerrit.wikimedia.org/r/1134724 (https://phabricator.wikimedia.org/T364605) (owner: Arendpieter) [16:30:18] (CR) Majavah: "recheck" [labs/striker] - https://gerrit.wikimedia.org/r/1134724 (https://phabricator.wikimedia.org/T364605) (owner: Arendpieter) [16:32:28] (CR) CI reject: [V:-1] Switch username validation to Bitu API [labs/striker] - https://gerrit.wikimedia.org/r/1134724 (https://phabricator.wikimedia.org/T364605) (owner: Arendpieter) [16:33:50] Striker, Patch-For-Review: Add Bitu container to Striker 
development environment - https://phabricator.wikimedia.org/T362318#10726823 (Arendpieter) @SLyngshede-WMF Thank you. Why does this issue block T364605? I tried in [this patch](https://gerrit.wikimedia.org/r/c/labs/striker/+/1134724) to set `BITU_U... [16:37:49] (CR) Majavah: [C:-1] Add Bitu container (4 comments) [labs/striker] - https://gerrit.wikimedia.org/r/1035718 (https://phabricator.wikimedia.org/T362318) (owner: Slyngshede) [16:42:01] (PS1) Btullis: Revert "Add a dummy password for rsyncing mediawiki-dumps-legacy" [labs/private] - https://gerrit.wikimedia.org/r/1135472 [16:42:09] (CR) Btullis: [V:+2 C:+2] Revert "Add a dummy password for rsyncing mediawiki-dumps-legacy" [labs/private] - https://gerrit.wikimedia.org/r/1135472 (owner: Btullis) [16:43:18] (PS2) Btullis: Revert "Add a dummy password for rsyncing mediawiki-dumps-legacy" [labs/private] - https://gerrit.wikimedia.org/r/1135472 [16:43:28] (CR) Btullis: [V:+2] Revert "Add a dummy password for rsyncing mediawiki-dumps-legacy" [labs/private] - https://gerrit.wikimedia.org/r/1135472 (owner: Btullis) [17:33:57] (PS1) Josefanthony: Create documentation instruction on windows setup Bug: T390421 [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1135476 (https://phabricator.wikimedia.org/T390421) [17:40:26] (CR) Josefanthony: "Kindly review the submitted patch for T390421. 
Thanks" [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1135476 (https://phabricator.wikimedia.org/T390421) (owner: Josefanthony) [18:33:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [18:45:30] (CR) Dzahn: [C:+2] community_civicrm: add stub for dovecot_passwd [labs/private] - https://gerrit.wikimedia.org/r/1124204 (https://phabricator.wikimedia.org/T383715) (owner: Dwisehaupt) [18:45:31] (CR) Dzahn: [V:+2 C:+2] community_civicrm: add stub for dovecot_passwd [labs/private] - https://gerrit.wikimedia.org/r/1124204 (https://phabricator.wikimedia.org/T383715) (owner: Dwisehaupt) [19:28:52] cloud-services-team, Cloud-VPS, Tool-spacemedia, video2commons, Upstream: Cloud Services shared IP (static NAT for external communications) often rate limited by YouTube for video downloads - https://phabricator.wikimedia.org/T236446#10727517 (bvibber) YouTube terms of service don't appear to... 
[19:33:23] (PS5) Arendpieter: Switch username validation to Bitu API [labs/striker] - https://gerrit.wikimedia.org/r/1134724 (https://phabricator.wikimedia.org/T364605) [19:43:41] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [20:29:37] Tool-inteGraality: Missing a space in the query - https://phabricator.wikimedia.org/T391523 (Soylacarli) NEW [20:30:34] Tool-inteGraality: Missing a space in the query - https://phabricator.wikimedia.org/T391523#10727672 (Soylacarli) [20:42:01] (PS1) QChris: Allow “Gerrit Managers” to import history [labs/tools/mostvisitedarticle] (refs/meta/config) - https://gerrit.wikimedia.org/r/1135492 [20:42:01] (CR) QChris: [V:+2 C:+2] Allow “Gerrit Managers” to import history [labs/tools/mostvisitedarticle] (refs/meta/config) - https://gerrit.wikimedia.org/r/1135492 (owner: QChris) [20:42:51] (PS1) QChris: Import done. Revoke import grants [labs/tools/mostvisitedarticle] (refs/meta/config) - https://gerrit.wikimedia.org/r/1135493 [20:42:51] (CR) QChris: [V:+2 C:+2] Import done. Revoke import grants [labs/tools/mostvisitedarticle] (refs/meta/config) - https://gerrit.wikimedia.org/r/1135493 (owner: QChris) [20:49:19] (PS1) QChris: Allow “Gerrit Managers” to import history [labs/tools/WdTmCollab] (refs/meta/config) - https://gerrit.wikimedia.org/r/1135494 [20:49:19] (CR) QChris: [V:+2 C:+2] Allow “Gerrit Managers” to import history [labs/tools/WdTmCollab] (refs/meta/config) - https://gerrit.wikimedia.org/r/1135494 (owner: QChris) [20:49:52] (PS1) QChris: Import done. 
Revoke import grants [labs/tools/WdTmCollab] (refs/meta/config) - https://gerrit.wikimedia.org/r/1135495 [20:49:52] (CR) QChris: [V:+2 C:+2] Import done. Revoke import grants [labs/tools/WdTmCollab] (refs/meta/config) - https://gerrit.wikimedia.org/r/1135495 (owner: QChris) [21:51:48] FIRING: PuppetConstantChange: Puppet performing a change on every puppet run on cloudcephmon1004:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [21:52:11] cloud-services-team, Cloud-VPS: Options/thoughts for faster VM provisioning - https://phabricator.wikimedia.org/T390822#10727945 (Andrew) Three things! 1) This patch should greatly speed up VM creation, cutting maybe 90 seconds or so from each launch. https://gerrit.wikimedia.org/r/c/operations/puppet/... [22:31:48] RESOLVED: [2x] PuppetConstantChange: Puppet performing a change on every puppet run on cloudcephmon1004:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [22:51:11] cloud-services-team, Toolforge: become command not working properly on Toolforge - https://phabricator.wikimedia.org/T391538 (Ykhwong) NEW [22:56:40] cloud-services-team, Cloud-VPS, Tool-spacemedia, video2commons, Upstream: Cloud Services shared IP (static NAT for external communications) often rate limited by YouTube for video downloads - https://phabricator.wikimedia.org/T236446#10728114 (Jeff_G) @bvibber The tasks are submitted manually... 
[23:11:51] FIRING: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [23:18:15] cloud-services-team, Toolforge: `become` command not working properly on login-buster.toolforge.org - https://phabricator.wikimedia.org/T391538#10728199 (bd808) [23:21:09] cloud-services-team, Toolforge: `become` command not working properly on login-buster.toolforge.org - https://phabricator.wikimedia.org/T391538#10728217 (bd808) Load average on tools-sgebastion-10 (login-buster.toolforge.org) is 24. My guess is that the NFS connection for the tool home directories is mes... [23:26:31] cloud-services-team, Toolforge: `become` command not working properly on login-buster.toolforge.org - https://phabricator.wikimedia.org/T391538#10728226 (bd808) `shutdown -r now` reboot was unresponsive so I did a hard reboot via Horizon. The instance's console log also had: ` [185770.523700] Memory cgro... [23:26:51] RESOLVED: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [23:27:51] FIRING: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [23:29:50] cloud-services-team, Toolforge: `become` command not working properly on login-buster.toolforge.org - https://phabricator.wikimedia.org/T391538#10728233 (Ykhwong) Thanks so much, it's working perfectly now! 
Really appreciate your help and quick support. [23:32:17] cloud-services-team, Toolforge: `become` command not working properly on login-buster.toolforge.org - https://phabricator.wikimedia.org/T391538#10728235 (bd808) Open→Resolved a:bd808 [23:32:51] RESOLVED: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown