[00:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[00:46:16] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[02:11:01] Cloud-VPS (Project-requests), cloud-services-team (Kanban), OurWorldInData, Security-Team, Wiki-Project-Med: Request creation of OurWorldinData VPS project - https://phabricator.wikimedia.org/T301044 (Harej)
[02:19:17] Cloud-VPS (Quota-requests), Wiki-Project-Med: Quota request for VideoWiki - https://phabricator.wikimedia.org/T300750 (Harej)
[02:21:34] Cloud-VPS (Quota-requests), cloud-services-team (Kanban), Wiki-Project-Med: videowiki temporary quota increase - https://phabricator.wikimedia.org/T299314 (Harej)
[02:25:24] Cloud-VPS, cloud-services-team (Kanban), Wikimedia-Medicine: Give admin rights to Doc_James and Pratik to VideoWiki project on Cloud VPS - https://phabricator.wikimedia.org/T275104 (Harej)
[02:26:13] Cloud-VPS (Project-requests), Wikimedia-Medicine: Request creation of VPS project - https://phabricator.wikimedia.org/T211523 (Harej)
[02:33:20] Cloud-VPS (Quota-requests), cloud-services-team (Kanban), Wikimedia-Medicine: iiab temporary quota can be released - https://phabricator.wikimedia.org/T299313 (Harej)
[02:34:19] Cloud-VPS (Quota-requests), cloud-services-team (Kanban), Wikimedia-Medicine: iiab temporary quota increase - https://phabricator.wikimedia.org/T297909 (Harej)
[02:34:47] Cloud-VPS (Quota-requests), cloud-services-team (Kanban), Wikimedia-Medicine: Request increased quota for iiab Cloud VPS project - https://phabricator.wikimedia.org/T277758 (Harej)
[02:35:14] Cloud-VPS (Project-requests), Wikimedia-Medicine, User-bd808: Request creation of IIAB VPS project - https://phabricator.wikimedia.org/T176926 (Harej)
[02:35:34] Cloud-VPS (Quota-requests), cloud-services-team (Kanban), Wikimedia-Medicine: Revert: iiab temporary m1.xlarge increase - https://phabricator.wikimedia.org/T188176 (Harej)
[02:35:47] Cloud-VPS, cloud-services-team (Kanban), Wikimedia-Medicine: should attached volumes automatically mount? - https://phabricator.wikimedia.org/T298544 (Harej)
[02:37:31] Cloud-VPS, cloud-services-team (Kanban), Internet-Archive, Wikimedia-Medicine: puppet failure on unknown instance - https://phabricator.wikimedia.org/T298466 (Harej)
[03:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[03:26:56] Grid-Engine-to-K8s-Migration: Migrate betacommand-dev from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319587 (Betacommand) 1) I dont use Git, I use SVN 2) The code is hosted in a private repo to avoid leaking of credentials/non-public information. 3) None of the listed...
[03:38:06] Tools, Wikimedia-Medicine: Integrate "Content Translation" into the "Not in the other language" tool - https://phabricator.wikimedia.org/T195432 (Harej)
[04:23:05] Tools: [dplbot] uncategorized articles not working - https://phabricator.wikimedia.org/T355014 (Pppery) >>! In T355014#9458242, @Peachey88 wrote: > The page in question, directs (in the footer) contact queries to https://en.wikipedia.org/wiki/User_talk:JaGa. Please raise as appropriate on JaGa's talk page....
[04:46:01] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:52:30] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:57:30] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[05:02:30] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[05:07:30] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[05:11:21] Toolforge (Toolforge iteration 03), Toolforge Build Service: [tbs][builds-api] Refactor `internal/builds.go` - https://phabricator.wikimedia.org/T352762 (Raymond_Ndibe) Open→In progress
[05:17:35] Toolforge (Toolforge iteration 03), Toolforge Build Service, Patch-For-Review: [tbs][builds-api] Refactor `internal/builds.go` - https://phabricator.wikimedia.org/T352762 (CodeReviewBot) raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/71 [builds-ap...
[05:33:06] Toolforge (Toolforge iteration 03), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: [builds-api,logs] Increase pod starting timeout to the same as the request - https://phabricator.wikimedia.org/T354856 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/...
[05:35:07] Toolforge (Toolforge iteration 03), Patch-For-Review: [ci] Add shellcheck to pre-commit where missing - https://phabricator.wikimedia.org/T353052 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/51 [builds-cli] add shellcheck
[05:35:22] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[05:35:55] Toolforge (Toolforge iteration 03), Patch-For-Review: [ci] Add shellcheck to pre-commit where missing - https://phabricator.wikimedia.org/T353052 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/22 [envvars-cli] add shellcheck
[05:35:59] Toolforge (Toolforge iteration 03), Patch-For-Review: [ci] Add shellcheck to pre-commit where missing - https://phabricator.wikimedia.org/T353052 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/175 [toolforge-deploy] add shellcheck
[05:40:22] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[05:41:09] Toolforge (Toolforge iteration 03), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: [builds-api,logs] Increase pod starting timeout to the same as the request - https://phabricator.wikimedia.org/T354856 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https...
[05:44:07] Toolforge (Toolforge iteration 03), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: [builds-api,logs] Increase pod starting timeout to the same as the request - https://phabricator.wikimedia.org/T354856 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/...
[06:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[07:09:48] (PS2) Eugene233: Fix capitalization in some ISA messages [labs/tools/Isa] (m2c) - https://gerrit.wikimedia.org/r/990675 (https://phabricator.wikimedia.org/T354920)
[07:11:12] (CR) Eugene233: Fix capitalization in some ISA messages (3 comments) [labs/tools/Isa] (m2c) - https://gerrit.wikimedia.org/r/990675 (https://phabricator.wikimedia.org/T354920) (owner: Eugene233)
[07:25:43] (PS2) Eugene233: Suggestions for improvements in some ISA messages [labs/tools/Isa] (m2c) - https://gerrit.wikimedia.org/r/990679 (https://phabricator.wikimedia.org/T354921)
[08:26:29] Tools: [dplbot] uncategorized articles not working - https://phabricator.wikimedia.org/T355014 (Peachey88) (Also, if you would a phabricator project to track tasks, You can self service via striker to create one)
[08:55:58] Toolforge, cloud-services-team: php 8.2 crashes when using XMLReader - https://phabricator.wikimedia.org/T352886 (taavi) a:taavi
[09:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[09:12:00] (PS1) Majavah: vps: create_instance: wait for cloud-init marker [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991288
[09:16:02] Toolforge, cloud-services-team: php 8.2 crashes when using XMLReader - https://phabricator.wikimedia.org/T352886 (taavi) Ah, I was wrong. `8.2.7` is the newest on Bookworm (stable), the newer versions are in testing. However weirdly enough I can't reproduce in a standalone VM on 8.2.7.
[09:23:20] Toolforge (Toolforge iteration 03), Toolforge Build Service: apt buildpack (Aptfile support): not installing dependencies of packages already present on the build image - https://phabricator.wikimedia.org/T353847 (dcaro) >>! In T353847#9463709, @LucasWerkmeister wrote: > Thanks, but this doesn’t resolve...
[10:06:03] (CR) FNegri: [C: +1] "LGTM" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991288 (owner: Majavah)
[10:07:47] Toolforge (Toolforge iteration 03): [build-service] php + nodejs projects get detected as nodejs only - https://phabricator.wikimedia.org/T355207 (dcaro)
[10:12:43] (CR) Majavah: [C: +2] vps: create_instance: wait for cloud-init marker [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991288 (owner: Majavah)
[10:16:02] (Merged) jenkins-bot: vps: create_instance: wait for cloud-init marker [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991288 (owner: Majavah)
[11:08:32] Toolforge (Software install/update): Build Bookworm based Toolforge Kubernetes images - https://phabricator.wikimedia.org/T335507 (Count_Count) @taavi Is there a separate task for the webservice images?
[11:30:00] Toolforge (Toolforge iteration 03), Toolforge Build Service: [apt-buildpack] Not sourcing /layers/fagiani_apt/apt/.profile.d/000_apt.sh - https://phabricator.wikimedia.org/T355214 (dcaro)
[11:30:33] Toolforge (Toolforge iteration 03), Patch-For-Review: [build-service] php + nodejs projects get detected as nodejs only - https://phabricator.wikimedia.org/T355207 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/30 inject_buildpacks: ensure...
[11:31:26] Toolforge (Toolforge iteration 03), Toolforge Build Service: [apt-buildpack] alternatives aren’t being set up - https://phabricator.wikimedia.org/T355215 (dcaro)
[11:33:20] Toolforge (Toolforge iteration 03), Toolforge Build Service: [apt-buildpack] some packages install broken links - https://phabricator.wikimedia.org/T355217 (dcaro)
[11:51:30] Grid-Engine-to-K8s-Migration: Migrate betacommand-dev from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319587 (dcaro) >>! In T319587#9464438, @Betacommand wrote: > 1) I dont use Git, I use SVN We don't support SVN for the build service :/ If staying on svn is a necessity...
[11:52:54] Grid-Engine-to-K8s-Migration: Migrate blogconverter from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319597 (dcaro) The maintainer said that the tool can be disabled (see https://www.mediawiki.org/wiki/User_talk:HaeB#Migrating_blogconverter_tool_to_k8s).
[12:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[12:27:37] Toolforge (Software install/update): Build Bookworm based Toolforge Kubernetes images - https://phabricator.wikimedia.org/T335507 (taavi) No.
[12:47:10] Grid-Engine-to-K8s-Migration: Migrate betacommand-dev from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319587 (Betacommand) > The code should be publicly accessible Per RFC 2119 > 3. SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reas...
[12:50:47] Grid-Engine-to-K8s-Migration: Migrate betacommand-dev from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319587 (taavi) From [[ https://wikitech.wikimedia.org/wiki/Help:Toolforge/Rules | Help:Toolforge/Rules ]]: > 2. All code in the Tools project **must be published** unde...
[13:04:42] Grid-Engine-to-K8s-Migration: Migrate betacommand-dev from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319587 (Betacommand) Ive reverted the unilateral change in policy about source code publication. Providing a OSI license with the code has met the toolserver and labs/c...
[13:05:38] (PS1) Ladsgroup: Drop unused nagios sql pass [labs/private] - https://gerrit.wikimedia.org/r/991323
[13:08:49] (CR) Marostegui: [C: +1] Drop unused nagios sql pass [labs/private] - https://gerrit.wikimedia.org/r/991323 (owner: Ladsgroup)
[13:09:42] (CR) Ladsgroup: [V: +2 C: +2] Drop unused nagios sql pass [labs/private] - https://gerrit.wikimedia.org/r/991323 (owner: Ladsgroup)
[13:24:55] (MaxConntrack) firing: Max conntrack at 89.37% on cloudvirt1043:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[13:25:55] (MaxConntrack) firing: Max conntrack at 100% on cloudvirt1043:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[13:26:01] cloud-services-team: MaxConntrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1043:9100 - https://phabricator.wikimedia.org/T355222 (phaultfinder)
[13:36:57] Cloud-VPS, cloud-services-team: MaxConntrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1043:9100 - https://phabricator.wikimedia.org/T355222 (taavi) a:taavi This is a repeat of {T355061} and I assume it's one of the migrated VMs that's causing it: `lang=shell...
[13:47:19] cloud-services-team, Infrastructure-Foundations, netops: Remove cloud-support1-c-eqiad VLAN - https://phabricator.wikimedia.org/T355115 (taavi) Open→Resolved a:taavi
[13:47:55] Data-Services, cloud-services-team, Patch-For-Review: Move wiki replicas behind cloudlb - https://phabricator.wikimedia.org/T346947 (taavi) Open→Resolved This is all done, I think.
[13:47:58] cloud-services-team, Infrastructure-Foundations, netops: Remove cloud-support1-c-eqiad VLAN - https://phabricator.wikimedia.org/T355115 (taavi)
[13:48:07] Data-Services, cloud-services-team, Data-Platform-SRE, Patch-For-Review: Automate maintain-views replica depooling - https://phabricator.wikimedia.org/T300427 (taavi)
[13:48:17] cloud-services-team (FY2023/2024-Q1-Q2), Goal: have cloud hardware servers in the cloud realm using a dedicated LB layer - https://phabricator.wikimedia.org/T297596 (taavi)
[13:55:58] Grid-Engine-to-K8s-Migration: Migrate betacommand-dev from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319587 (dcaro) >>! In T319587#9465459, @Betacommand wrote: >> The code should be publicly accessible > Per RFC 2119 >> 3. SHOULD This word, or the adjective "RECOMMEN...
[14:23:41] Toolforge: [builds-builder] Consider building our own bash image with extra tooling for toml parsing - https://phabricator.wikimedia.org/T355228 (dcaro)
[14:25:39] Toolforge (Toolforge iteration 03), Patch-For-Review: [build-service] php + nodejs projects get detected as nodejs only - https://phabricator.wikimedia.org/T355207 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/30 inject_buildpacks: ensure...
[14:27:04] Toolforge (Toolforge iteration 03), Patch-For-Review: [build-service] php + nodejs projects get detected as nodejs only - https://phabricator.wikimedia.org/T355207 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/...
[14:27:56] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
[14:28:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:28:31] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
[14:28:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:31:45] Toolforge (Software install/update), Kubernetes: Bookworm based Toolforge Kubernetes webservice image - https://phabricator.wikimedia.org/T355231 (Count_Count)
[14:32:54] Toolforge (Software install/update): Build Bookworm based Toolforge Kubernetes images - https://phabricator.wikimedia.org/T335507 (Count_Count) OK, I opened T355231.
[14:34:00] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
[14:34:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:34:36] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
[14:34:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:36:15] Toolforge (Software install/update), Kubernetes: Bookworm based Toolforge Kubernetes webservice image - https://phabricator.wikimedia.org/T355231 (taavi) I'm not sure I understand this request here - the majority of the [[ https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes#Available_container_...
[14:36:40] Toolforge (Toolforge iteration 03), Patch-For-Review: [build-service] php + nodejs projects get detected as nodejs only - https://phabricator.wikimedia.org/T355207 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/177 builds-builder: bump t...
[14:37:42] Toolforge (Toolforge iteration 03), Patch-For-Review: [build-service] php + nodejs projects get detected as nodejs only - https://phabricator.wikimedia.org/T355207 (dcaro) Open→Resolved
[14:42:27] Toolforge (Software install/update), Kubernetes: Bookworm based Toolforge Kubernetes webservice image - https://phabricator.wikimedia.org/T355231 (Count_Count) The golang1.11 webservice image is on Debian buster. The jdk17 webservice image is on bullseye: ` tools.spamcheck@tools-sgebastion-10:~$ cat serv...
[14:50:30] (PS1) Majavah: t5: ignore delete log entries [labs/tools/majavah-bot] - https://gerrit.wikimedia.org/r/991359
[14:52:03] (CR) Majavah: [C: +2] t5: ignore delete log entries [labs/tools/majavah-bot] - https://gerrit.wikimedia.org/r/991359 (owner: Majavah)
[14:53:06] (Merged) jenkins-bot: t5: ignore delete log entries [labs/tools/majavah-bot] - https://gerrit.wikimedia.org/r/991359 (owner: Majavah)
[15:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[15:14:04] Cloud-VPS, cloud-services-team, Patch-For-Review: MaxConntrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1043:9100 - https://phabricator.wikimedia.org/T355222 (taavi) Open→Resolved
[15:15:05] Toolforge (Toolforge iteration 03), Patch-For-Review: [build-service] php + nodejs projects get detected as nodejs only - https://phabricator.wikimedia.org/T355207 (pwangai) My PHP tool encountered this problem where nodejs would be installed but PHP wasn't. The issue is now fixed, and my tool builds and...
[15:15:55] (MaxConntrack) resolved: Max conntrack at 99.99% on cloudvirt1043:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[15:16:09] (PS1) Majavah: t5: fix array access [labs/tools/majavah-bot] - https://gerrit.wikimedia.org/r/991367
[15:17:01] Cloud-VPS, cloud-services-team, Patch-For-Review: MaxConntrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1043:9100 - https://phabricator.wikimedia.org/T355222 (fnegri) I would expect to see a high number of open connections in one of those hosts, but diffscan...
[15:18:28] (CR) Majavah: [C: +2] t5: fix array access [labs/tools/majavah-bot] - https://gerrit.wikimedia.org/r/991367 (owner: Majavah)
[15:19:27] (Merged) jenkins-bot: t5: fix array access [labs/tools/majavah-bot] - https://gerrit.wikimedia.org/r/991367 (owner: Majavah)
[15:20:52] PROBLEM - Check systemd state on clouddb1015 is CRITICAL: CRITICAL - degraded: The following units failed: wmf-pt-kill@s4.service,wmf-pt-kill@s6.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:22:48] Toolforge (Software install/update), Kubernetes: Bookworm based Toolforge Kubernetes webservice image - https://phabricator.wikimedia.org/T355231 (Count_Count)
[15:23:52] RECOVERY - Check systemd state on clouddb1015 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:27:00] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883 (nskaggs) @Ghuron thank you for the wonderful summary! That is helpful. I agree with you. Let's see if we can't close this knowledge gap. Could link to the source code for...
[15:27:51] Cloud-VPS, cloud-services-team, Patch-For-Review: MaxConntrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1043:9100 - https://phabricator.wikimedia.org/T355222 (fnegri) I found another way that confirms your guess is correct and most connections are from `diff...
[15:38:02] Cloud-VPS, cloud-services-team: MaxConntrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1043:9100 - https://phabricator.wikimedia.org/T355222 (dcaro) >>! In T355222#9466043, @fnegri wrote: > I found another way that confirms your guess is correct and most connecti...
[15:39:51] Cloud-VPS, cloud-services-team: MaxConntrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1043:9100 - https://phabricator.wikimedia.org/T355222 (taavi) >>! In T355222#9465995, @fnegri wrote: > Maybe we can check if any hosts has some conntrack error in the logs? Acc...
[15:45:33] vivian-rook opened https://github.com/toolforge/superset-deploy/pull/17
[15:46:02] Cloud-VPS, cloud-services-team: MaxConntrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1043:9100 - https://phabricator.wikimedia.org/T355222 (fnegri) > I would suspect the current check for conntrack being full would be enough? Yep sorry, I was wrongly assuming...
[15:46:56] vivian-rook closed https://github.com/toolforge/superset-deploy/pull/17
[15:47:06] Cloud-VPS, cloud-services-team: MaxConntrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1043:9100 - https://phabricator.wikimedia.org/T355222 (fnegri) > The fact that the number of flow entries shows is more or less the same even if specifics for diffscan02 means...
[15:49:40] Cloud-VPS, cloud-services-team: MaxConntrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1043:9100 - https://phabricator.wikimedia.org/T355222 (dcaro) >>! In T355222#9466141, @fnegri wrote: >> The fact that the number of flow entries shows is more or less the same...
[15:56:57] Toolforge (Toolforge iteration 03), Toolforge Build Service: apt buildpack (Aptfile support): not installing dependencies of packages already present on the build image - https://phabricator.wikimedia.org/T353847 (dcaro)
[15:57:35] Toolforge (Toolforge iteration 03), Toolforge Build Service: [apt-buildpack] some packages install broken links - https://phabricator.wikimedia.org/T355217 (dcaro) Open→In progress
[16:00:39] cloud-services-team (FY2023/2024-Q1-Q2), Observability-Alerting, Goal: Move WMCS off of Icinga and introduce alertmanager - https://phabricator.wikimedia.org/T328502 (taavi) Open→In progress
[16:56:12] Cloud-VPS, cloud-services-team: MaxConntrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1043:9100 - https://phabricator.wikimedia.org/T355222 (fnegri) `diffscan02` was never at 6k afaict, 6k in my previous comment was the total number //excluding// that host (`gre...
[17:38:53] Toolforge Build Service: [apt-buildpak] Some APT packages are not installed during the image build, but it l the image build - https://phabricator.wikimedia.org/T355252 (Dapete)
[17:40:12] Toolforge Build Service: [apt-buildpak] Some APT packages are not installed during the image build - https://phabricator.wikimedia.org/T355252 (Dapete)
[17:51:53] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883 (MBH) > Could link to the source code for the tool? Any of [[ https://github.com/Saisengen/wikibots/tree/main/web-services | these]] `.cs` tools. For example, [[ https://gi...
[18:05:40] Data-Services, cloud-services-team: ToolsDB: simplify volume chain - https://phabricator.wikimedia.org/T335593 (fnegri)
[18:05:42] Data-Services, cloud-services-team (FY2023/2024-Q1-Q2): [toolsdb] test failover procedure - https://phabricator.wikimedia.org/T344719 (fnegri)
[18:05:44] Data-Services, cloud-services-team (FY2023/2024-Q1-Q2): [toolsdb] test creating a new replica host - https://phabricator.wikimedia.org/T344717 (fnegri)
[18:09:40] Data-Services, cloud-services-team: ToolsDB: simplify volume chain - https://phabricator.wikimedia.org/T335593 (fnegri) I've added T344717 and T344719 as subtasks, after those two tasks are completed the volume chain should be simplified and we can avoid following the procedure detailed in the descriptio...
[18:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[18:13:59] !log fnegri@cloudcumin1001 tools START - Cookbook wmcs.openstack.quota_increase (T344717)
[18:14:02] !log fnegri@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.openstack.quota_increase (exit_code=99) (T344717)
[18:14:04] T344717: [toolsdb] test creating a new replica host - https://phabricator.wikimedia.org/T344717
[19:14:40] Toolforge (Toolforge iteration 03), Toolforge Build Service: [apt-buildpack] Not sourcing /layers/fagiani_apt/apt/.profile.d/000_apt.sh - https://phabricator.wikimedia.org/T355214 (LucasWerkmeister)
[19:23:51] Toolforge (Toolforge iteration 03), Toolforge Build Service: [apt-buildpack] some packages install broken links - https://phabricator.wikimedia.org/T355217 (LucasWerkmeister) It’s probably doable to do a post-extract check that looks for any absolute symlinks below `/layers/fagiani_apt/apt/` and adjusts...
[20:06:24] Cloud-VPS, cloud-services-team (Hardware), SRE, ops-eqiad: Cloudvirt1063.eqiad.wmnet overheating - https://phabricator.wikimedia.org/T353408 (Jclark-ctr) @Andrew before i change from PerformancePerWatt to PerformanceOptimized do you have any hesitations with that change? Thank you for logs p...
[20:24:02] PAWS: Move prometheus inside of the cluster - https://phabricator.wikimedia.org/T355179 (rook) Open→In progress a:rook
[20:31:45] wikitech.wikimedia.org, LDAP: Change my username on Wikitech - https://phabricator.wikimedia.org/T355249 (taavi) Open→Stalled [[ https://wikitech.wikimedia.org/wiki/SRE/LDAP/Renaming_users | Renaming developer accounts is currently not possible ]]. Also, please note that we prefer to work in publ...
[20:34:10] Toolforge (Software install/update), cloud-services-team, Kubernetes: Create Bookworm-based standalone webservice image - https://phabricator.wikimedia.org/T355231 (taavi)
[20:44:34] Cloud-VPS, cloud-services-team (Hardware), SRE, ops-eqiad: Cloudvirt1063.eqiad.wmnet overheating - https://phabricator.wikimedia.org/T353408 (Andrew) >>! In T353408#9467046, @Jclark-ctr wrote: > @Andrew before i change from PerformancePerWatt to PerformanceOptimized do you have any hesitations...
[21:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[21:17:06] Cloud-VPS, cloud-services-team: Rescue DBapp trove instance in glamwikidashboard project - https://phabricator.wikimedia.org/T355138 (Reedy)
[22:13:17] Grid-Engine-to-K8s-Migration: Migrate dvorapabot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319709 (Dvorapa) No need for that, per [[ https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python#Jobs ]] I can just create venv prior and the recreate it every time I nee...
[22:13:37] Grid-Engine-to-K8s-Migration: Migrate dvorapabot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319709 (Dvorapa) Open→Resolved BTW solved
[22:22:03] (PS1) BCornwall: Add markmonitor API username/password [labs/private] - https://gerrit.wikimedia.org/r/991426 (https://phabricator.wikimedia.org/T355190)
[22:22:22] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[22:23:03] (CR) BCornwall: [V: +2 C: +2] Add markmonitor API username/password [labs/private] - https://gerrit.wikimedia.org/r/991426 (https://phabricator.wikimedia.org/T355190) (owner: BCornwall)
[22:24:48] Grid-Engine-to-K8s-Migration: Migrate enwikt-translations from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319724 (Erutuon) I looked back at my first post and got the `test.sh` to run in `toolforge-jobs` when I set `PATH=/data/project/rustup/rustup/.rustup/toolchains/sta...
[22:27:22] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[22:32:35] (CR) Dzahn: "Interesting to see this, I literally have an ancient ToDo to "check out MarkMonitor API, ask Traffic team if they are still interested in " [labs/private] - https://gerrit.wikimedia.org/r/991426 (https://phabricator.wikimedia.org/T355190) (owner: BCornwall)
[23:39:28] Grid-Engine-to-K8s-Migration, Pywikibot: Migrate pywikibot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319981 (Dvorapa) Open→Resolved a:Dvorapa Simplified a bit (not using venv branch, rather creating venv from scratch every time) and deployed