[00:03:08] PAWS: New upstream release for OpenRefine - https://phabricator.wikimedia.org/T378674#10279540 (LibUp-bot)
[00:03:10] PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T378675#10279542 (LibUp-bot)
[00:03:13] cloud-services-team, Toolforge: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T378676#10279544 (LibUp-bot)
[01:22:13] (update) raymond-ndibe: [maintain-harbor] do not clean up images currently running in production [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/35 (https://phabricator.wikimedia.org/T377854)
[01:22:36] FIRING: CloudVPSDesignateLeaks: Detected 5 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:24:31] (update) raymond-ndibe: [maintain-harbor] do not clean up images currently running in production [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/35 (https://phabricator.wikimedia.org/T377854)
[02:41:29] (update) raymond-ndibe: [toolforge-deploy] deploy maintain-harbor [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/563 (https://phabricator.wikimedia.org/T358225)
[03:31:40] Tool-video-answer-tool, Future-Audiences: [Spike] Other video tool tweaks: narration speedup - https://phabricator.wikimedia.org/T378623#10279678 (derenrich) see sample videos here https://drive.google.com/drive/u/0/folders/1b39BrfYE3k9mSy2qWDZCFNYKN--uYc6L
[03:35:15] Tool-video-answer-tool, Future-Audiences: [Spike] Other video tool tweaks: narration speedup - https://phabricator.wikimedia.org/T378623#10279679 (derenrich) Open→In progress a:derenrich
[05:22:36] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:04:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-27 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[07:09:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-27 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:07:37] cloud-services-team, Toolforge, Language and Product Localization: https://cxdebugger.toolforge.org/ has become very slow - https://phabricator.wikimedia.org/T367022#10279923 (Nikerabbit) Can confirm: {F57662802}
[09:22:36] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:49:20] Tool-refill: refill: review citoid usage - https://phabricator.wikimedia.org/T378686 (aborrero) NEW
[09:49:31] Tool-refill: refill: review citoid usage - https://phabricator.wikimedia.org/T378686#10280036 (aborrero)
[09:50:49] Tool-refill: refill: review citoid usage - https://phabricator.wikimedia.org/T378686#10280037 (aborrero) p:Triage→High
[09:54:31] cloud-services-team, Citoid: Investigate Wikipedia bot/userscript usage of citoid impacting www.pro-football-reference.com - https://phabricator.wikimedia.org/T378461#10280043 (aborrero)
[10:01:26] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services: maintain_meta-p prints warning related to labtestwikitech - https://phabricator.wikimedia.org/T378687 (fnegri) NEW
[10:01:27] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services: maintain_meta-p prints warning related to labtestwikitech - https://phabricator.wikimedia.org/T378687#10280072 (fnegri) p:Triage→Medium
[10:01:35] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services: maintain_meta-p prints warning related to labtestwikitech - https://phabricator.wikimedia.org/T378687#10280084 (fnegri) a:fnegri
[10:03:41] (open) dcaro: add token validation [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/32 (https://phabricator.wikimedia.org/T362066)
[10:04:11] (update) dcaro: add token validation [repos/cloud/toolforge/components-api] (add_creation_date_to_token) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/32 (https://phabricator.wikimedia.org/T362066)
[10:04:21] (update) dcaro: add token validation [repos/cloud/toolforge/components-api] (add_creation_date_to_token) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/32 (https://phabricator.wikimedia.org/T362066)
[10:15:42] cloud-services-team, Citoid: Investigate Wikipedia bot/userscript usage of citoid impacting www.pro-football-reference.com - https://phabricator.wikimedia.org/T378461#10280150 (Curb_Safe_Charmer) a:Curb_Safe_Charmer
[10:15:45] (CR) FNegri: [C:+1] "Personally I'm not a fan of automating code generation and MR generation, because I think the long-term maintenance cost can outweigh the " [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1084054 (owner: David Caro)
[10:25:37] (CR) David Caro: "> Personally I'm not a fan of automating code generation and MR generation, because I think the long-term maintenance cost can outweigh th" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1084054 (owner: David Caro)
[10:26:07] (approved) fnegri: restish: add autocompletion config [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/205 (owner: dcaro)
[10:35:16] (CR) David Caro: "btw. a really good example is creating the packages and release notes automatically, I find it is a huge win, every time there's an issue " [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1084054 (owner: David Caro)
[10:35:25] cloud-services-team, Cloud-VPS: [cloud-vps] creating a new project can override existing DNS entries in the wmcloud.org domain - https://phabricator.wikimedia.org/T360294#10280208 (aborrero)
[10:36:20] cloud-services-team, Cloud-VPS: [cloud-vps] creating a new project can override existing DNS entries in the wmcloud.org domain - https://phabricator.wikimedia.org/T360294#10280206 (aborrero) The proper fix for this is to have different domains for each thing. Otherwise there will always be potential con...
[10:37:17] (CR) David Caro: "I think that the key there is not only automating the process, but making it simpler, this is, automate + hide defaults so the most common" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1084054 (owner: David Caro)
[10:37:49] (merge) dcaro: restish: add autocompletion config [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/205
[10:38:29] FIRING: PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-idp-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:44:47] Cloud-VPS (Project-requests), Data-Platform-SRE, Wikidata, Wikidata-Query-Service: Request creation of wikiqlever VPS project - https://phabricator.wikimedia.org/T377655#10280238 (Seppl2013) @bking - thanks for the hint i'll try it out on the upcoming imports
[10:49:05] (approved) dcaro: components-api: configure for local [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/556
[10:49:05] (CR) FNegri: [C:+1] "" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1084054 (owner: David Caro)
[10:49:08] (merge) dcaro: components-api: configure for local [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/556
[10:56:26] (CR) FNegri: [C:+1] "> not only automating the process, but making it simpler, this is, automate + hide defaults" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1084054 (owner: David Caro)
[11:00:11] (CR) David Caro: "> we could achieve the same simplicity for end users (or even more!) with less code to maintain and fewer level of indirection." [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1084054 (owner: David Caro)
[11:01:56] (CR) David Caro: "I there's a way to make any admin action a minimal patch in a repo, then MRs can be the entry point, if not, then something else has to (t" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1084054 (owner: David Caro)
[11:11:03] (CR) FNegri: [C:+1] "> I there's a way to make any admin action a minimal patch in a repo, then MRs can be the entry point, if not, then something else has to " [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1084054 (owner: David Caro)
[11:21:08] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services: maintain_meta-p prints warning related to labtestwikitech - https://phabricator.wikimedia.org/T378687#10280266 (fnegri)
[11:21:16] wikitech.wikimedia.org, MW-1.44-notes (1.44.0-wmf.2; 2024-11-05), Patch-For-Review, Wiki-Setup (Delete / Redirect): Retire labtestwiki - https://phabricator.wikimedia.org/T378260#10280267 (fnegri)
[11:23:40] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services: maintain_meta-p prints warning related to labtestwikitech - https://phabricator.wikimedia.org/T378687#10280264 (fnegri) I think this will be fixed by {T378260} and specifically by this patch https://gerrit.wikimedia.org/r/c/operations/mediawiki-conf...
[11:43:47] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services: maintain_meta-p prints warning related to labtestwikitech - https://phabricator.wikimedia.org/T378687#10280329 (fnegri) Open→Stalled
[11:56:09] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, DBA, Data-Platform-SRE (2024.10.19 - 2024.11.08): Prepare and check storage layer for rskwiki - https://phabricator.wikimedia.org/T375016#10280370 (fnegri) a:fnegri
[11:57:07] Data-Services, DBA, Data-Platform-SRE (2024.10.19 - 2024.11.08): Prepare and check storage layer for annwiki - https://phabricator.wikimedia.org/T377118#10280374 (fnegri) a:fnegri
[11:57:21] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, DBA, Data-Platform-SRE (2024.10.19 - 2024.11.08): Prepare and check storage layer for annwiki - https://phabricator.wikimedia.org/T377118#10280375 (fnegri)
[12:03:11] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, DBA, Data-Platform-SRE (2024.10.19 - 2024.11.08): Prepare and check storage layer for rskwiki - https://phabricator.wikimedia.org/T375016#10280382 (fnegri) Open→Resolved
[12:04:19] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, DBA, Data-Platform-SRE (2024.10.19 - 2024.11.08): Prepare and check storage layer for annwiki - https://phabricator.wikimedia.org/T377118#10280387 (fnegri) Open→Resolved
[12:22:03] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1085369 (owner: L10n-bot)
[12:45:15] (CR) David Caro: "> personally prefer to interact with the two tools (cookbooks and Tofu) separately" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1084054 (owner: David Caro)
[12:47:17] (open) aborrero: py3.11-bookworm-tox/Dockerfile: install gcc [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/46
[13:08:15] Cloud Services Proposals, cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: Decision Request - How to do the Cloud VPS VXLAN/IPv6 migration - https://phabricator.wikimedia.org/T377467#10280732 (aborrero) I think there is some agreement on option 2 being the best course of action. I'll leave the t...
[13:09:22] (merge) aborrero: py3.11-bookworm-tox/Dockerfile: install gcc [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/46
[13:22:36] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:48:56] (update) dcaro: [toolforge-deploy] deploy maintain-harbor [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/563 (https://phabricator.wikimedia.org/T358225) (owner: raymond-ndibe)
[13:52:38] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate deployment-poolcounter06.deployment-prep.eqiad.wmflabs is about to expire in 27d 23h 58m 30s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[13:55:54] (CR) FNegri: [C:+1] "> So part of your push to tofu, is to remove as many cookbooks as possible?" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1084054 (owner: David Caro)
[13:58:20] PAWS: Can gitlab build docker images? - https://phabricator.wikimedia.org/T373896#10280915 (rook) @Jelto In T357612 is it being suggested that one can build docker images in a runner with Dockerfile at this point? Or is it describing that images built with Dockerfile can be run and it is more about where the...
[14:00:33] PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T378675#10280918 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/456
[14:00:46] vivian-rook opened https://github.com/toolforge/paws/pull/456
[14:02:54] PAWS: remove 126a cluster - https://phabricator.wikimedia.org/T378718 (rook) NEW
[14:03:11] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance toolsbeta-test-k8s-haproxy-6 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[14:03:40] PAWS: remove 126a cluster - https://phabricator.wikimedia.org/T378718#10280938 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/457
[14:03:52] vivian-rook opened https://github.com/toolforge/paws/pull/457
[14:07:00] PAWS: jupyterlab to 4.3.0 - https://phabricator.wikimedia.org/T378719 (rook) NEW
[14:18:58] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, DBA, Data-Platform-SRE (2024.10.19 - 2024.11.08): Prepare and check storage layer for tcywiktionary - https://phabricator.wikimedia.org/T378462#10280982 (fnegri) a:fnegri
[14:19:00] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, DBA, Data-Platform-SRE (2024.10.19 - 2024.11.08): Prepare and check storage layer for bclwikisource - https://phabricator.wikimedia.org/T377087#10280987 (fnegri) a:fnegri
[14:21:30] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, DBA, Data-Platform-SRE (2024.10.19 - 2024.11.08): Prepare and check storage layer for tcywikisource - https://phabricator.wikimedia.org/T378469#10280980 (fnegri) a:fnegri
[14:21:38] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, DBA, Data-Platform-SRE (2024.10.19 - 2024.11.08): Prepare and check storage layer for ibawiki - https://phabricator.wikimedia.org/T376571#10280985 (fnegri) a:fnegri
[14:22:57] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, DBA, Data-Platform-SRE (2024.10.19 - 2024.11.08): Prepare and check storage layer for bclwikisource - https://phabricator.wikimedia.org/T377087#10281017 (fnegri) Open→Resolved
[14:24:41] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, DBA, Data-Platform-SRE (2024.10.19 - 2024.11.08): Prepare and check storage layer for tcywiktionary - https://phabricator.wikimedia.org/T378462#10281029 (fnegri) Open→Resolved
[14:25:11] PAWS: remove 126a cluster - https://phabricator.wikimedia.org/T378718#10281042 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/457
[14:25:18] PAWS: remove 126a cluster - https://phabricator.wikimedia.org/T378718#10281055 (rook) Open→Resolved
[14:25:20] vivian-rook closed https://github.com/toolforge/paws/pull/457
[14:25:28] wikitech.wikimedia.org, MW-1.44-notes (1.44.0-wmf.2; 2024-11-05), Patch-For-Review, Wiki-Setup (Delete / Redirect): Retire labtestwiki - https://phabricator.wikimedia.org/T378260#10281022 (Count_Count) It needs to be removed from the sitematrix as well. Code which tries to access the associated U...
[14:26:01] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, DBA, Data-Platform-SRE (2024.10.19 - 2024.11.08): Prepare and check storage layer for ibawiki - https://phabricator.wikimedia.org/T376571#10281024 (fnegri) Open→Resolved
[14:32:49] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, DBA, Data-Platform-SRE (2024.10.19 - 2024.11.08): Prepare and check storage layer for tcywikisource - https://phabricator.wikimedia.org/T378469#10281063 (fnegri) Open→Resolved
[15:19:04] cloud-services-team: project-proxy: maps have puppet acme-chief problems - https://phabricator.wikimedia.org/T378726 (aborrero) NEW
[15:20:07] cloud-services-team: project-proxy: maps have puppet acme-chief problems - https://phabricator.wikimedia.org/T378726#10281274 (aborrero) p:Triage→High
[15:21:28] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[15:22:28] FIRING: [2x] PuppetAgentFailure: Puppet agent failure detected on instance maps-proxy-03 in project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[15:23:12] cloud-services-team: project-proxy: maps have puppet acme-chief problems - https://phabricator.wikimedia.org/T378726#10281305 (taavi) Open→Resolved a:taavi
[15:24:44] cloud-services-team, Toolforge, Elasticsearch, Epic: Deploy multi-tenant OpenSearch cluster as replacement for Elasticsearch - https://phabricator.wikimedia.org/T348943#10281308 (fnegri)
[15:42:28] RESOLVED: PuppetAgentFailure: Puppet agent failure detected on instance maps-proxy-03 in project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[16:04:42] PAWS: New upstream release for OpenRefine - https://phabricator.wikimedia.org/T378674#10281596 (rook) Open→Resolved a:rook
[16:04:42] PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T378675#10281598 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/456
[16:04:50] vivian-rook closed https://github.com/toolforge/paws/pull/456
[16:05:32] PAWS: New upstream release for OpenRefine - https://phabricator.wikimedia.org/T378674#10281605 (rook) Resolved→Open
[16:06:07] PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T378675#10281603 (rook) Open→Resolved a:rook
[16:06:37] PAWS: jupyterlab to 4.3.0 - https://phabricator.wikimedia.org/T378719#10281611 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/458
[16:06:49] vivian-rook opened https://github.com/toolforge/paws/pull/458
[16:37:57] PAWS: jupyterlab to 4.3.0 - https://phabricator.wikimedia.org/T378719#10281747 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/458
[16:37:58] vivian-rook closed https://github.com/toolforge/paws/pull/458
[16:38:44] PAWS: jupyterlab to 4.3.0 - https://phabricator.wikimedia.org/T378719#10281749 (rook) Open→Resolved a:rook
[16:40:34] vivian-rook opened https://github.com/toolforge/paws/pull/459
[16:46:54] cloud-services-team, Cloud-VPS: [cloud-vps] creating a new project can override existing DNS entries in the wmcloud.org domain - https://phabricator.wikimedia.org/T360294#10281822 (bd808) We knew this could happen when we started delegating the project name related subdomain to the project. It feels like...
[16:57:03] PAWS: New upstream release for OpenRefine - https://phabricator.wikimedia.org/T378674#10281883 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/459
[16:57:12] PAWS: New upstream release for OpenRefine - https://phabricator.wikimedia.org/T378674#10281884 (rook) Open→Resolved
[16:57:12] vivian-rook closed https://github.com/toolforge/paws/pull/459
[17:04:23] (PS1) Majavah: Drop unused passwords::wikitech [labs/private] - https://gerrit.wikimedia.org/r/1085441
[17:04:23] (PS1) Majavah: Drop unused Wikitech settings [labs/private] - https://gerrit.wikimedia.org/r/1085442
[17:04:24] (PS1) Majavah: Drop bunch of unused Hiera [labs/private] - https://gerrit.wikimedia.org/r/1085443
[17:21:11] Toolforge (Toolforge iteration 16): [components-api] Add support for pre-built images (ex. python3.11, to refine) - https://phabricator.wikimedia.org/T362076#10282059 (dcaro) p:Triage→High
[17:22:33] Toolforge (Toolforge iteration 16): [components-api] Add support for pre-built images (ex. python3.11, to refine) - https://phabricator.wikimedia.org/T362076#10282057 (dcaro) This was already done xd
[17:22:36] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:47:38] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services: maintain_meta-p prints warning related to labtestwikitech - https://phabricator.wikimedia.org/T378687#10282197 (taavi) Still happening?
[18:20:10] (update) dcaro: token: add created_at field to the token [repos/cloud/toolforge/components-api] (rename_deploy_token) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/31
[18:26:44] cloud-services-team, Citoid: Investigate Wikipedia bot/userscript usage of citoid impacting www.pro-football-reference.com - https://phabricator.wikimedia.org/T378461#10282291 (Novem_Linguae) > The reFill tool is just a userscript so we don't really have any way of getting more information about the cont...
[18:48:48] cloud-services-team, Citoid: Investigate Wikipedia bot/userscript usage of citoid impacting www.pro-football-reference.com - https://phabricator.wikimedia.org/T378461#10282329 (Curb_Safe_Charmer) a:Curb_Safe_Charmer→None I probably should have claimed T378686 rather than this one. I don't believ...
[18:49:14] Tool-refill: refill: review citoid usage - https://phabricator.wikimedia.org/T378686#10282325 (Curb_Safe_Charmer) a:Curb_Safe_Charmer I claimed the wrong task - this is the correct one
[19:01:26] (open) raymond-ndibe: [toolforge-weld] refactor parse_quantity [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/64 (https://phabricator.wikimedia.org/T361120)
[19:01:35] (update) raymond-ndibe: [toolforge-weld] refactor parse_quantity [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/64 (https://phabricator.wikimedia.org/T361120)
[19:02:02] (update) raymond-ndibe: [toolforge-weld] refactor parse_quantity [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/64 (https://phabricator.wikimedia.org/T361120)
[19:06:35] (update) raymond-ndibe: [toolforge-weld] refactor parse_quantity [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/64 (https://phabricator.wikimedia.org/T361120)
[19:07:16] (update) raymond-ndibe: [toolforge-weld] refactor parse_quantity [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/64 (https://phabricator.wikimedia.org/T361120)
[19:25:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-27 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[19:30:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-27 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[19:56:34] (update) raymond-ndibe: [toolforge-weld] refactor parse_quantity [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/64 (https://phabricator.wikimedia.org/T361120)
[19:59:44] (update) raymond-ndibe: [jobs-api] convert all quotas to appropriate units [repos/cloud/toolforge/jobs-api] (refactor_validate_kube_quant) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361120)
[20:21:26] (update) raymond-ndibe: [jobs-api] convert all quotas to appropriate units [repos/cloud/toolforge/jobs-api] (refactor_validate_kube_quant) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361120)
[20:21:56] (update) raymond-ndibe: [jobs-api] convert all quotas to appropriate units [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361120)
[20:22:02] (update) raymond-ndibe: [jobs-api] convert all quotas to appropriate units [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361120)
[20:33:09] Tool-refill: refill: review citoid usage - https://phabricator.wikimedia.org/T378686#10282694 (Curb_Safe_Charmer) Open→In progress While it is possible for someone to call the reFill API directly, and therefore to misuse the API to bombard a website with requests, that would be reflected in the uwsgi...
[20:35:41] Tool-refill: refill: review citoid usage - https://phabricator.wikimedia.org/T378686#10282710 (Curb_Safe_Charmer) Note that the uwsgi.log records the Wikipedia page, and language code for the version of Wikipedia, that the user has requested citoid to expand references for, rather than the URL of those refer...
[20:57:44] Tool-refill: refill: review citoid usage - https://phabricator.wikimedia.org/T378686#10282784 (Curb_Safe_Charmer) Looking at English Wikipedia, there are over 5,000 articles that reference pro-football-reference.com and that were edited between 19 and 31 October 2024. It seems to me to be not beyond the imag...
[21:08:37] Tool-refill: refill: review citoid usage - https://phabricator.wikimedia.org/T378686#10282814 (Curb_Safe_Charmer) There is certainly nothing in the service log that indicates that reFill was called 30,000 times in any 12 hours.
[21:08:53] Tool-refill: refill: review citoid usage - https://phabricator.wikimedia.org/T378686#10282815 (Curb_Safe_Charmer) In progress→Resolved
[21:10:26] cloud-services-team, Citoid: Investigate Wikipedia bot/userscript usage of citoid impacting www.pro-football-reference.com - https://phabricator.wikimedia.org/T378461#10282818 (Curb_Safe_Charmer) I have completed my review of usage of the reFill service during the dates in question and have documented my...
[21:17:27] cloud-services-team, SRE Observability (FY2024/2025-Q2): cloud: prometheus: investigate weirdness with metrics and alertmanager - https://phabricator.wikimedia.org/T374599#10282844 (lmata)
[21:18:11] FIRING: PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-harbor-1 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[21:20:07] cloud-services-team, Citoid: Investigate Wikipedia bot/userscript usage of citoid impacting www.pro-football-reference.com - https://phabricator.wikimedia.org/T378461#10282856 (Curb_Safe_Charmer) I've added @AManWithNoPlan as a subscriber. They are one of the maintainers of CitationBot.
[21:22:36] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[21:42:58] cloud-services-team, Citoid: Investigate Wikipedia bot/userscript usage of citoid impacting www.pro-football-reference.com - https://phabricator.wikimedia.org/T378461#10282956 (Curb_Safe_Charmer) I couldn't access the https://turnilo.wikimedia.org link - what access do we need?
[23:08:29] (PS3) Brian Wolff: Use a CSP policy to reduce risk of XSS [labs/codesearch] - https://gerrit.wikimedia.org/r/1080836 (https://phabricator.wikimedia.org/T377168)
[23:11:24] (CR) Brian Wolff: "@krinkle@fastmail.com Is PS3 better? It centralizes most of the CSP, but overrides it for the one page - thus there is much less repetitiv" [labs/codesearch] - https://gerrit.wikimedia.org/r/1080836 (https://phabricator.wikimedia.org/T377168) (owner: Brian Wolff)
[23:34:23] FIRING: OOM: OOM killer active on cloudcontrol2006-dev:9100 - TODO - https://grafana.wikimedia.org/d/-OcleDKIz/oom-kill - https://alerts.wikimedia.org/?q=alertname%3DOOM
[23:39:23] RESOLVED: OOM: OOM killer active on cloudcontrol2006-dev:9100 - TODO - https://grafana.wikimedia.org/d/-OcleDKIz/oom-kill - https://alerts.wikimedia.org/?q=alertname%3DOOM