[00:04:49] (TfInfraTestDestroyFailed) resolved: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[00:06:28] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:11:28] (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:20:55] (PawsJupyterHubDown) firing: PAWS JupyterHub is down https://wikitech.wikimedia.org/wiki/PAWS/Admin - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPawsJupyterHubDown
[00:23:25] Paws recovered quickly
[00:25:55] (PawsJupyterHubDown) resolved: PAWS JupyterHub is down https://wikitech.wikimedia.org/wiki/PAWS/Admin - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPawsJupyterHubDown
[00:34:31] (ToolsGridQueueProblem) firing: Grid queue webgrid-lighttpd@tools-sgeweblight-10-25.tools.eqiad1.wikimedia.cloud is in state E - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsGridQueueProblem - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsGridQueueProblem
[01:52:57] Grid-Engine-to-K8s-Migration: Migrate yapperbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320195 (Sj) Just asked on yapperbot's talk if it was down due to gridengine migration; I see it is. @komla are you around?
[03:34:31] (ToolsGridQueueProblem) firing: Grid queue webgrid-lighttpd@tools-sgeweblight-10-25.tools.eqiad1.wikimedia.cloud is in state E - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsGridQueueProblem - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsGridQueueProblem
[06:34:31] (ToolsGridQueueProblem) firing: Grid queue webgrid-lighttpd@tools-sgeweblight-10-25.tools.eqiad1.wikimedia.cloud is in state E - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsGridQueueProblem - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsGridQueueProblem
[08:03:27] Cloud-VPS, Toolforge: Create a dataset of edits per Toolforge / Cloud VPS project - https://phabricator.wikimedia.org/T356029 (Tgr)
[08:05:23] Cloud-VPS, Toolforge: Create a dataset of edits per Toolforge / Cloud VPS project - https://phabricator.wikimedia.org/T356029 (Tgr) (Hat tip to @Ainali for the idea.)
[08:16:04] Toolforge, cloud-services-team, User-Raymond_Ndibe: add on-wiki edits of toolforge tools to toolviews report - https://phabricator.wikimedia.org/T317953 (dcaro) This raised up also here {T356029}
[08:51:00] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[09:28:44] (CR) Kosta Harlan: Append coverage value (2 comments) [labs/tools/sonarqubebot] - https://gerrit.wikimedia.org/r/992929 (https://phabricator.wikimedia.org/T355803) (owner: Pwangai)
[09:34:31] (ToolsGridQueueProblem) firing: Grid queue webgrid-lighttpd@tools-sgeweblight-10-25.tools.eqiad1.wikimedia.cloud is in state E - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsGridQueueProblem - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsGridQueueProblem
[09:57:20] Toolforge (Quota-requests): Requesting additional Harbor / Toolforge Build Service disk quota for wd-shex-infer tool - https://phabricator.wikimedia.org/T355997 (Slst2020) > It’s perhaps worth noting that for some reason, 317.19Mi or 317.20Mi of storage are still reported by toolforge build quota even after...
[10:08:14] Toolforge (Toolforge iteration 04), Toolforge Build Service, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [tbs.maintain-harbor] Document current setup and admin procedures - https://phabricator.wikimedia.org/T329176 (Slst2020) @Raymond_Ndibe do you still plan to wo...
[10:17:29] Grid-Engine-to-K8s-Migration: Migrate yapperbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320195 (taavi) >>! In T320195#9492993, @Sj wrote: > @komla are you around? [[ https://www.nohello.com/ | Do you have a question? ]]
[10:19:42] Toolforge Build Service, Documentation: [tbs] Improve Harbor quota handling and docs - https://phabricator.wikimedia.org/T351092 (Slst2020) >>! In T351092#9382124, @dcaro wrote: > For the docs on the current quota setup this might be {T329176} I think this task is specifically for documenting the functi...
[10:22:24] Toolforge (Software install/update): Toolforge bastion hosts need updated python - https://phabricator.wikimedia.org/T356021 (taavi) Open→Invalid We rely on Debian for security updates, the last time [[ https://tracker.debian.org/pkg/python3.7 | python3.7 ]] was updated was in October. Once https://w...
[10:31:05] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors
[10:31:08] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0)
[10:31:42] !log taavi@runko tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[10:31:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:34:31] (ToolsGridQueueProblem) resolved: Grid queue webgrid-lighttpd@tools-sgeweblight-10-25.tools.eqiad1.wikimedia.cloud is in state E - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsGridQueueProblem - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsGridQueueProblem
[10:44:16] !log tools increased harbor quota for lucaswerkmeister-test to 2GiB
[10:44:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:45:29] !log tools.lucaswerkmeister-test increased harbor quota to 2GiB
[10:45:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lucaswerkmeister-test/SAL
[10:46:25] !log tools increased harbor quota for wd-shex-infer to 2GiB
[10:46:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:47:01] !log tools.wd-shex-infer increased harbor quota to 2GiB
[10:47:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wd-shex-infer/SAL
[10:51:43] !log taavi@runko tools Added a new k8s worker-nfs tools-k8s-worker-nfs-3.tools.eqiad1.wikimedia.cloud to the cluster
[10:51:43] !log taavi@runko tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[10:51:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:51:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:56:13] !log taavi@runko tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[10:56:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:56:21] Grid-Engine-to-K8s-Migration: Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320140 (Slst2020)
[10:56:32] Toolforge (Quota-requests): Requesting additional Harbor / Toolforge Build Service disk quota for wd-shex-infer tool - https://phabricator.wikimedia.org/T355997 (Slst2020) Open→In progress a:Slst2020 Done – Both projects now have 2Gi harbor storage quota each.
[10:56:37] Grid-Engine-to-K8s-Migration: Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320140 (Slst2020)
[10:56:41] Toolforge (Quota-requests): Requesting additional Harbor / Toolforge Build Service disk quota for wd-shex-infer tool - https://phabricator.wikimedia.org/T355997 (Slst2020) In progress→Resolved
[10:57:12] !log taavi@runko tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[10:57:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:59:51] !log taavi@runko tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-30
[10:59:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:01:37] !log taavi@runko tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-31
[11:01:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:03:06] Toolforge (Toolforge iteration 04): [harbor] cleanup execution + task tables - https://phabricator.wikimedia.org/T356037 (dcaro)
[11:04:22] Toolforge (Toolforge iteration 04): [harbor] cleanup execution + task tables - https://phabricator.wikimedia.org/T356037 (dcaro) Some maybe relevant upstream bugs: https://github.com/goharbor/harbor/issues/17611 -> closed, failing to cleanup execution + job_logs https://github.com/goharbor/harbor/issues/1725...
[11:04:39] !log taavi@runko tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-32
[11:04:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:05:07] Toolforge (Toolforge iteration 04): [harbor] cleanup execution + task tables - https://phabricator.wikimedia.org/T356037 (dcaro)
[11:06:01] !log taavi@runko tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-32
[11:06:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:07:33] (PS3) Majavah: wmcs_libs: k8s: Fix Kubernetes role usage [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992924
[11:07:35] (PS3) Majavah: Add worker-nfs Toolforge Kubernetes role/prefix [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992925 (https://phabricator.wikimedia.org/T355883)
[11:07:35] !log taavi@runko tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-33
[11:07:37] (PS3) Majavah: toolforge: add_k8s_node: Allow passing --network [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992926 (https://phabricator.wikimedia.org/T284656)
[11:07:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:07:39] (PS3) Majavah: toolforge: add_k8s_node: Update hiera for control and ingress nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/993071 (https://phabricator.wikimedia.org/T274499)
[11:07:41] (PS1) Majavah: toolforge: k8s: depool_and_remove_node: Fix END logging [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/993670
[11:09:00] !log taavi@runko tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-33
[11:09:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:09:08] !log taavi@runko tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-34
[11:09:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:10:29] !log taavi@runko tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-34
[11:10:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:10:38] !log taavi@runko tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-35
[11:10:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:11:13] (CR) CI reject: [V: -1] Add worker-nfs Toolforge Kubernetes role/prefix [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992925 (https://phabricator.wikimedia.org/T355883) (owner: Majavah)
[11:11:15] (CR) CI reject: [V: -1] toolforge: add_k8s_node: Update hiera for control and ingress nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/993071 (https://phabricator.wikimedia.org/T274499) (owner: Majavah)
[11:11:17] (CR) CI reject: [V: -1] toolforge: k8s: depool_and_remove_node: Fix END logging [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/993670 (owner: Majavah)
[11:11:23] (CR) CI reject: [V: -1] toolforge: add_k8s_node: Allow passing --network [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992926 (https://phabricator.wikimedia.org/T284656) (owner: Majavah)
[11:12:02] !log taavi@runko tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-35
[11:12:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:12:28] (InstanceDown) firing: Project tools instance tools-k8s-worker-32 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:12:31] !log taavi@runko tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[11:12:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:17:28] (InstanceDown) resolved: (2) Project tools instance tools-k8s-worker-32 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:22:31] (CR) David Caro: [C: +1] "lgtm" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992924 (owner: Majavah)
[11:22:57] !log taavi@runko tools Added a new k8s worker-nfs tools-k8s-worker-nfs-4.tools.eqiad1.wikimedia.cloud to the cluster
[11:22:58] !log taavi@runko tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[11:23:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:23:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:23:13] (PS4) David Caro: quota_show: fix change in openstack cli return value [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992085
[11:26:17] !log taavi@runko tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[11:26:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:26:36] (CR) Majavah: [C: +2] wmcs_libs: k8s: Fix Kubernetes role usage [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992924 (owner: Majavah)
[11:28:51] (CR) David Caro: [C: +2] quota_show: fix change in openstack cli return value [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992085 (owner: David Caro)
[11:31:41] (Merged) jenkins-bot: wmcs_libs: k8s: Fix Kubernetes role usage [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992924 (owner: Majavah)
[11:37:13] !log taavi@runko tools Added a new k8s worker-nfs tools-k8s-worker-nfs-5.tools.eqiad1.wikimedia.cloud to the cluster
[11:37:13] !log taavi@runko tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[11:37:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:37:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:38:41] (CloudVPSDesignateLeaks) firing: Detected 10 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:39:35] (PS5) Majavah: quota_show: fix change in openstack cli return value [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992085 (owner: David Caro)
[11:39:39] (CR) Majavah: [C: +2] quota_show: fix change in openstack cli return value [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992085 (owner: David Caro)
[11:42:09] (CephSlowOps) firing: Ceph cluster in eqiad has 2 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[11:42:22] cloud-services-team: CephSlowOps Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T352570 (phaultfinder)
[11:43:41] (CloudVPSDesignateLeaks) firing: (2) Detected 10 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:45:18] (Merged) jenkins-bot: quota_show: fix change in openstack cli return value [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992085 (owner: David Caro)
[11:47:09] (CephSlowOps) resolved: Ceph cluster in eqiad has 7 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[11:48:59] Toolforge (Toolforge iteration 04): [harbor] cleanup execution + task tables - https://phabricator.wikimedia.org/T356037 (dcaro)
[11:49:19] Toolforge (Toolforge iteration 04): [harbor] cleanup execution + task tables - https://phabricator.wikimedia.org/T356037 (dcaro)
[11:51:20] !log taavi@runko tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[11:51:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:51:30] !log taavi@runko tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[11:51:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:51:37] PAWS: jupyterlab to 4.0.11 - https://phabricator.wikimedia.org/T355890 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/369
[11:51:46] vivian-rook opened https://github.com/toolforge/paws/pull/369
[11:51:54] PAWS: jupyterlab to 4.0.11 - https://phabricator.wikimedia.org/T355890 (rook) a:rook
[11:55:23] !log taavi@runko tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[11:55:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:58:01] (PS1) Stevemunene: Add dummy keytabs for new an-worker1157-1175 [labs/private] - https://gerrit.wikimedia.org/r/993675 (https://phabricator.wikimedia.org/T353776)
[12:00:22] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[12:05:22] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[12:06:49] !log taavi@runko tools Added a new k8s worker-nfs tools-k8s-worker-nfs-6.tools.eqiad1.wikimedia.cloud to the cluster
[12:06:49] !log taavi@runko tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[12:06:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:06:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:27:33] (CR) CI reject: [V: -1] Localisation updates from https://translatewiki.net. [labs/tools/Isa] - https://gerrit.wikimedia.org/r/993683 (owner: L10n-bot)
[12:37:19] (CR) FNegri: "This looks good, but I'm confused by the difference with the "-standalone" images. I left a comment in the task." [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/991595 (https://phabricator.wikimedia.org/T355231) (owner: Majavah)
[12:37:52] Toolforge (Toolforge iteration 04), cloud-services-team, Kubernetes, Patch-For-Review: Create Bookworm-based standalone webservice image - https://phabricator.wikimedia.org/T355231 (fnegri) I'm trying to understand the use cases of the different images we have. The use case for this image seems t...
[12:44:27] PAWS: jupyterlab to 4.0.11 - https://phabricator.wikimedia.org/T355890 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/369
[12:44:37] PAWS: jupyterlab to 4.0.11 - https://phabricator.wikimedia.org/T355890 (rook) Open→Resolved
[12:44:39] vivian-rook closed https://github.com/toolforge/paws/pull/369
[12:51:16] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[12:52:25] Toolforge (Toolforge iteration 04): [harbor] cleanup execution + task tables - https://phabricator.wikimedia.org/T356037 (dcaro) Easy ones first, cleaning up old executions without a task: ` DELETE from execution where execution.id not in (select execution_id from task); `
[12:54:08] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[12:54:47] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[12:57:27] Cloud-VPS, Toolforge (Toolforge iteration 04), cloud-services-team: Ensure Toolforge and Cloud VPS comply with Google's new email sender guidelines - https://phabricator.wikimedia.org/T354112 (taavi)
[12:57:29] Toolforge, cloud-services-team: Upgrade Toolforge mail server to Debian Bullseye or later - https://phabricator.wikimedia.org/T311910 (taavi)
[12:57:50] Toolforge, cloud-services-team: Upgrade Toolforge mail server to Debian Bullseye or later - https://phabricator.wikimedia.org/T311910 (taavi) p:Triage→High a:taavi
[13:03:08] Toolforge (Quota-requests): Requesting additional Harbor / Toolforge Build Service disk quota for wd-shex-infer tool - https://phabricator.wikimedia.org/T355997 (dcaro) I did notice that also at some point, it's gone now though, I'm guessing that there might be some dangling layers or similar that a cleanup...
[13:04:24] !log taavi@runko toolsbeta START - Cookbook wmcs.vps.refresh_puppet_certs on toolsbeta-mail-2.toolsbeta.eqiad1.wikimedia.cloud
[13:04:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:06:27] !log taavi@runko toolsbeta END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on toolsbeta-mail-2.toolsbeta.eqiad1.wikimedia.cloud
[13:06:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:22:06] Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, and 2 others: [Design] UX exploration and wireframes - https://phabricator.wikimedia.org/T354531 (KColeman-WMF)
[13:43:42] Toolforge (Quota-requests): Requesting additional Harbor / Toolforge Build Service disk quota for wd-shex-infer tool - https://phabricator.wikimedia.org/T355997 (LucasWerkmeister) Alright, thank you!
[13:59:18] Cloud-VPS (Project-requests): Request creation of mariadb-test VPS project - https://phabricator.wikimedia.org/T343341 (ABran-WMF)
[14:06:11] Cloud-VPS, cloud-services-team, Goal: Gather feedback from users of the 'unmanaged' debian-12.0-nopuppet image - https://phabricator.wikimedia.org/T355963 (fgiunchedi) Thank you @Andrew for working on this! I tested the image today on the `monitoring` project and I'm happy to report that it works as...
[14:16:11] cloud-services-team (FY2023/2024-Q1-Q2), Infrastructure-Foundations: Remove wmcs-admin access from production cumin hosts - https://phabricator.wikimedia.org/T347979 (MoritzMuehlenhoff) Open→Stalled Given this is blocked on T347490, I'm marking it as Stalled.
[14:34:24] !log taavi@runko tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-mail-4.tools.eqiad1.wikimedia.cloud
[14:34:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:36:25] !log taavi@runko tools END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-mail-4.tools.eqiad1.wikimedia.cloud
[14:36:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:40:36] Striker, Infrastructure-Foundations, LDAP: Store Wikimedia unified account name (SUL) in LDAP directory - https://phabricator.wikimedia.org/T148048 (SLyngshede-WMF) In progress→Resolved
[15:40:43] cloud-services-team, wikitech.wikimedia.org, Epic: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859 (SLyngshede-WMF)
[15:42:24] Cloud-VPS, cloud-services-team, SRE, Patch-For-Review: Restrict traffic from instances to private IPs on cloudgw level - https://phabricator.wikimedia.org/T350132 (joanna_borun)
[15:43:42] (CloudVPSDesignateLeaks) firing: (2) Detected 24 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:49:14] Cloud-VPS, cloud-services-team (FY2023/2024-Q1-Q2), Goal: Support 'unmanaged' projects in cloud-vps - https://phabricator.wikimedia.org/T326818 (Andrew) Docs at https://wikitech.wikimedia.org/wiki/Help:Unmanaged_Cloud_VPS_instances
[15:51:55] Cloud-VPS, cloud-services-team, Puppet (Puppet 7.0): Update designate-sink cert cleaning hook to work with Puppet 7 CA changes - https://phabricator.wikimedia.org/T351455 (joanna_borun)
[16:01:44] wikitech.wikimedia.org, Gerrit: Can't login into Gerrit with a Wikimedia Developer account with non-unique email address - https://phabricator.wikimedia.org/T270233 (hashar) Open→Declined Declining since the root cause was two accounts having the same email addresses while Gerrit requires account...
[16:02:52] cloud-services-team, wikitech.wikimedia.org: Developer account creation without OpenStackManager - https://phabricator.wikimedia.org/T196171 (SLyngshede-WMF)
[16:03:03] cloud-services-team, wikitech.wikimedia.org, Epic: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859 (SLyngshede-WMF)
[16:03:11] cloud-services-team, Bitu, Infrastructure-Foundations, SRE, LDAP: Create a single application to provision and manage developer (LDAP) accounts - https://phabricator.wikimedia.org/T179463 (SLyngshede-WMF) Open→Declined We're already working on Bitu, which has at least some overlap wit...
[16:03:38] (PS4) Majavah: Add worker-nfs Toolforge Kubernetes role/prefix [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992925 (https://phabricator.wikimedia.org/T355883)
[16:03:40] (PS4) Majavah: toolforge: add_k8s_node: Allow passing --network [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992926 (https://phabricator.wikimedia.org/T284656)
[16:03:41] Cloud-VPS, cloud-services-team, Patch-For-Review, Puppet (Puppet 7.0): Andrew tries to make a cloud-vps puppet7 server - https://phabricator.wikimedia.org/T351468 (joanna_borun)
[16:03:44] (PS4) Majavah: toolforge: add_k8s_node: Update hiera for control and ingress nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/993071 (https://phabricator.wikimedia.org/T274499)
[16:03:46] (PS2) Majavah: toolforge: k8s: depool_and_remove_node: Fix END logging [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/993670
[16:03:51] Cloud-VPS, cloud-services-team, Puppet (Puppet 7.0): Migrate Cloud VPS puppet infrastructure to Puppet 7 - https://phabricator.wikimedia.org/T351450 (joanna_borun)
[16:04:16] Cloud-VPS, cloud-services-team, Documentation, Puppet (Puppet 7.0): Update Wikitech documentation on per-project Puppet servers - https://phabricator.wikimedia.org/T351509 (joanna_borun)
[16:04:23] Grid-Engine-to-K8s-Migration: Migrate arambot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319562 (Aram) **Done, all jobs migrated!** Just three questions: # After toolforge commands, I get `/usr/bin/toolforge-jobs:15: DeprecationWarning: pkg_resources is deprecat...
[16:04:25] Cloud-VPS, cloud-services-team, Puppet (Puppet 7.0): Migrate Cloud VPS central puppet server to Puppet 7 - https://phabricator.wikimedia.org/T351451 (joanna_borun)
[16:04:38] Cloud-VPS, cloud-services-team, Puppet (Puppet 7.0): Build new Bullseye and Bookworm base images with Puppet 7 - https://phabricator.wikimedia.org/T351510 (joanna_borun)
[16:04:46] VPS-Projects, cloud-services-team, Puppet (Puppet 7.0): Migrate per-project Puppet servers to Puppet 7 - https://phabricator.wikimedia.org/T351452 (joanna_borun)
[16:05:07] VPS-Projects, cloud-services-team, Puppet (Puppet 7.0): Migrate Puppet servers in Cloud Services team managed projects to Puppet 7 - https://phabricator.wikimedia.org/T351453 (joanna_borun)
[16:05:26] Cloud-VPS, cloud-services-team, Patch-For-Review, Puppet (Puppet 7.0): Write script or cookbook to migrate data from a Puppet 5 puppetmaster to a Puppet 7 puppetserver - https://phabricator.wikimedia.org/T351454 (joanna_borun)
[16:07:53] Cloud-VPS (Project-requests): Request creation of mariadb-test VPS project - https://phabricator.wikimedia.org/T343341 (ABran-WMF)
[16:08:10] Cloud-VPS (Project-requests): Request creation of mariadb-test VPS project - https://phabricator.wikimedia.org/T343341 (ABran-WMF)
[16:11:34] Grid-Engine-to-K8s-Migration: Migrate arambot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319562 (JJMC89) >>! In T319562#9494715, @Aram wrote: > **Done, all jobs migrated!** Just three questions: > > # After toolforge commands, I get `/usr/bin/toolforge-jobs:15:...
[16:19:36] (CR) Nikerabbit: [V: +2] Localisation updates from https://translatewiki.net. [labs/tools/Isa] - https://gerrit.wikimedia.org/r/993683 (owner: L10n-bot)
[16:51:16] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[17:06:27] Grid-Engine-to-K8s-Migration: Migrate arambot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319562 (Aram) @JJMC89 Great! And huge thanks to you!
[17:07:19] Grid-Engine-to-K8s-Migration: Migrate arambot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319562 (Aram) Open→Resolved
[17:20:54] Toolforge (Toolforge iteration 04): [harbor] cleanup execution + task tables - https://phabricator.wikimedia.org/T356037 (dcaro)
[17:58:48] Grid-Engine-to-K8s-Migration: Migrate croptool from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319653 (Fuzheado) FYI, some conversations about finding folks to help migrate it over are taking place here: https://commons.wikimedia.org/wiki/Commons_talk:CropTool#Not_work...
[18:00:31] Grid-Engine-to-K8s-Migration: Migrate croptool from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319653 (Soda) I'm still working on this off and on, but migrating the app is a bit more involved since php7.4 is not available when building with buildpacks.
[18:01:59] Cloud-VPS (Quota-requests): Request temporary quota increase for videowiki - https://phabricator.wikimedia.org/T356089 (Harej)
[18:02:05] Cloud-VPS (Quota-requests): Request temporary quota increase for owidm - https://phabricator.wikimedia.org/T356090 (Harej)
[18:43:09] (CR) Andrew Bogott: [C: +1] Add worker-nfs Toolforge Kubernetes role/prefix [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992925 (https://phabricator.wikimedia.org/T355883) (owner: Majavah)
[18:44:06] (CR) Andrew Bogott: [C: +1] toolforge: add_k8s_node: Allow passing --network [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992926 (https://phabricator.wikimedia.org/T284656) (owner: Majavah)
[18:45:12] (CR) Majavah: [C: +2] Add worker-nfs Toolforge Kubernetes role/prefix [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992925 (https://phabricator.wikimedia.org/T355883) (owner: Majavah)
[18:45:29] (CR) Majavah: [C: +2] toolforge: add_k8s_node: Allow passing --network [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992926 (https://phabricator.wikimedia.org/T284656) (owner: Majavah)
[18:46:16] (CR) Andrew Bogott: [C: +1] toolforge: add_k8s_node: Update hiera for control and ingress nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/993071 (https://phabricator.wikimedia.org/T274499) (owner: Majavah)
[18:46:56] (CR) Majavah: [C: +2] toolforge: add_k8s_node: Update hiera for control and ingress nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/993071 (https://phabricator.wikimedia.org/T274499) (owner: Majavah)
[18:48:42] (CloudVPSDesignateLeaks) firing: (2) Detected 24 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:53:42] (CloudVPSDesignateLeaks) resolved: (2) Detected 24 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:54:28] (Merged) jenkins-bot: Add worker-nfs Toolforge Kubernetes role/prefix [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992925 (https://phabricator.wikimedia.org/T355883) (owner: Majavah)
[18:54:30] (Merged) jenkins-bot: toolforge: add_k8s_node: Allow passing --network [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992926 (https://phabricator.wikimedia.org/T284656) (owner: Majavah)
[18:54:32] (Merged) jenkins-bot: toolforge: add_k8s_node: Update hiera for control and ingress nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/993071 (https://phabricator.wikimedia.org/T274499) (owner: Majavah)
[18:57:09] Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, and 2 others: [Design] UX exploration and wireframes - https://phabricator.wikimedia.org/T354531 (KColeman-WMF)
[19:04:39] Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, Design: [Design] Prototype and user testing plan - https://phabricator.wikimedia.org/T356099 (KColeman-WMF)
[19:05:02] Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, Design: [Design] Prototype and user testing plan - https://phabricator.wikimedia.org/T356099 (KColeman-WMF)
[19:05:43] Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, and 2 others: [Design] Prototype and user testing plan - https://phabricator.wikimedia.org/T356099 (KColeman-WMF)
[19:07:47] Tool-gitlab-account-approval: gitlab-account-approval bot stalled on 2024-01-09 - https://phabricator.wikimedia.org/T356097 (Legoktm)
[19:08:51] Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, Design: [Design EPIC] Global User Contributions - https://phabricator.wikimedia.org/T349901 (KColeman-WMF)
[19:46:28] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-36
[19:46:28] !log andrew@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-36
[19:46:51] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-36
[20:38:41] (CloudVPSDesignateLeaks) firing: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:43:41] (CloudVPSDesignateLeaks) firing: (2) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:51:16] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[21:01:43] Grid-Engine-to-K8s-Migration, Wikimedia-Medicine, User-Harej: Migrate mdwiki from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319887 (Harej) a:Harej
[21:07:08] Grid-Engine-to-K8s-Migration: Migrate yapperbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320195 (Sj) Nice to see you @taavi, hello to you too. That was my question!
@komla seems lightly active elsewhere on Phab, and afaict currently presumed to be maintaining t... [21:12:45] 10Cloud-VPS (Quota-requests): Request temporary quota increase for owidm - https://phabricator.wikimedia.org/T356090 (10dcaro) +1 [21:13:12] 10Cloud-VPS (Quota-requests): Request temporary quota increase for videowiki - https://phabricator.wikimedia.org/T356089 (10dcaro) +1 [21:56:02] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [22:35:36] 10Tools, 10Wikimedia-Medicine: Integrate "Content Translation" into the "Not in the other language" tool - https://phabricator.wikimedia.org/T195432 (10Harej) 05Open→03Resolved