[00:01:46] Cloud-VPS (Quota-requests), cloud-services-team: Increase disk qouta for math - https://phabricator.wikimedia.org/T354579 (bd808) +1
[00:09:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[00:57:00] Striker, Patch-For-Review: Striker dev environment needs a new Phabricator base image - https://phabricator.wikimedia.org/T340080 (bd808) If nothing else turns up, making an entirely local Dockerfile in the spirit of https://gitlab.wikimedia.org/repos/releng/scap3-dev/-/blob/master/docker/targets/phorge/...
[00:59:47] Striker: Set description to tool URL when creating project tags - https://phabricator.wikimedia.org/T320916 (bd808)
[01:00:06] (PS7) BryanDavis: Set description when creating Phabricator projects [labs/striker] - https://gerrit.wikimedia.org/r/987145 (https://phabricator.wikimedia.org/T344610) (owner: Aklapper)
[01:01:41] Striker, Patch-For-Review: Set description to tool URL when creating project tags - https://phabricator.wikimedia.org/T320916 (bd808) p:Triage→Medium a:Aklapper
[01:02:55] (CR) BryanDavis: [C: +2] check_username_create: Guard against missing response keys [labs/striker] - https://gerrit.wikimedia.org/r/981356 (owner: BryanDavis)
[01:04:34] (Merged) jenkins-bot: check_username_create: Guard against missing response keys [labs/striker] - https://gerrit.wikimedia.org/r/981356 (owner: BryanDavis)
[01:06:05] Striker, ARM support, Patch-For-Review, User-bd808: "Operation not supported: AH00023: Couldn't create the mpm-accept mutex" Apache2 crash under QEMU emulation - https://phabricator.wikimedia.org/T354468 (bd808)
[01:10:46] Tool-spacemedia, Toolforge: NFS broken - Cannot build nor run my tool anymore - https://phabricator.wikimedia.org/T354581 (Don-vip)
[03:52:38] Cloud-VPS (Quota-requests), cloud-services-team: Increase disk qouta for math - https://phabricator.wikimedia.org/T354579 (Andrew) Open→Resolved a:Andrew done! Please re-open if/when you free up the extra disk space.
[05:21:30] Toolforge (Software install/update): Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (Hawkeye7) The documentation says to become milihistbot and run the build from there $ become mytool $ toolforge build start https://gitlab.wikimedia.org/toolforge-repos/
Grid-Engine-to-K8s-Migration: Migrate fastilybot-reports from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319741 (Fastily) Open→Resolved
[07:25:45] (ProbeDown) firing: Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[07:30:45] (ProbeDown) resolved: Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[07:42:45] (ProbeDown) firing: Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[07:47:45] (ProbeDown) resolved: Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:28:31] cloud-services-team, Infrastructure-Foundations, LDAP, Patch-Needs-Improvement: Rename ldap-labs cluster - https://phabricator.wikimedia.org/T295150 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff→Andrew This is complete (there are still SNIs for the certs for the old name,...
[08:57:03] (CR) Nikerabbit: [V: +2] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - https://gerrit.wikimedia.org/r/988477 (owner: L10n-bot)
[09:25:45] (ProbeDown) firing: Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-3:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:30:45] (ProbeDown) resolved: Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-3:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:31:05] Cloud-VPS (Quota-requests), cloud-services-team: Increase disk quota for math - https://phabricator.wikimedia.org/T354579 (Aklapper)
[09:40:45] (ProbeDown) firing: Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:44:24] Tool-spacemedia, Toolforge: NFS broken - Cannot build nor run my tool anymore - https://phabricator.wikimedia.org/T354581 (taavi) As far as I can tell, the logs directory can be written into: `lang=shell-session tools.spacemedia@tools-sgebastion-10:~$ touch logs/test tools.spacemedia@tools-sgebastion-10:...
[09:45:00] Toolforge (Toolforge iteration 02): Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (dcaro)
[09:45:45] (ProbeDown) resolved: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:46:45] (ProbeDown) firing: Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-3:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:47:26] Toolforge (Toolforge iteration 02): Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (dcaro) It's failing already when starting to build using dotnet, at that point the project has been cloned already, looking into the specific scripts of the upstream buildpack t...
[09:51:45] (ProbeDown) resolved: Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-3:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:05:41] Striker, Patch-For-Review: Set description to tool URL when creating project tags - https://phabricator.wikimedia.org/T320916 (fnegri) Hmm, I think the best fit for the "description" field in Phabricator would be the "description" value in Striker. I see how the URL of the service can also be useful, but...
[10:24:28] Tool-spacemedia, Toolforge: NFS broken - Cannot build nor run my tool anymore - https://phabricator.wikimedia.org/T354581 (Don-vip) My logback configuration file is /data/project/spacemedia/conf/logback-spring-toolforge.xml Originally the file was configured to log on file system: `
(ProbeDown) firing: Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-3:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:54:45] (ProbeDown) resolved: Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-3:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:56:45] (ProbeDown) firing: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:01:45] (ProbeDown) resolved: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:08:46] Data-Services, cloud-services-team, Data-Persistence: Improve LVS config for wikireplicas (dbproxy1018/dbproxy1019) - https://phabricator.wikimedia.org/T322658 (taavi) Open→Declined Declined in favour of the cloudlb work that happened in {T300427} and {T346947}.
[11:31:36] Toolforge (Toolforge iteration 02), Patch-For-Review: Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/27 inject_buildpacks: use shimmed dotnet bu...
[12:47:47] Toolforge (Toolforge iteration 02), Patch-For-Review: Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (dcaro) I think that the issue is here: https://gitlab.wikimedia.org/repos/cloud/toolforge/buildpacks/dotnetcore-buildpack/-/blob/move_to_cnb/bin/compile?re...
[13:06:33] Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, and 2 others: [Design] Comparative review - https://phabricator.wikimedia.org/T349907 (KColeman-WMF)
[13:06:44] Toolforge (Toolforge iteration 02), Patch-For-Review: Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (dcaro) a:dcaro
[13:07:49] Toolforge (Toolforge iteration 02), Patch-For-Review: Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (dcaro) Open→In progress
[13:07:51] Grid-Engine-to-K8s-Migration: Migrate botsister from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319605 (dcaro)
[13:07:53] Grid-Engine-to-K8s-Migration: Migrate botorder from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319604 (dcaro)
[13:07:56] Grid-Engine-to-K8s-Migration: Migrate bothasava from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319601 (dcaro)
[13:09:33] Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, and 2 others: [Design] Comparative review - https://phabricator.wikimedia.org/T349907 (KColeman-WMF) Open→Resolved
[13:09:41] Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, Design: [Design EPIC] Global User Contributions - https://phabricator.wikimedia.org/T349901 (KColeman-WMF)
[13:15:50] Cloud-VPS, cloud-services-team (FY2023/2024-Q1-Q2), DC-Ops, SRE, ops-eqiad: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643 (dcaro) They denied the new drives request, so will try to gather information in a per-host basis with: * Smartct...
[13:17:53] Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, and 2 others: [Design] Create user flows for different GUC scenarios - https://phabricator.wikimedia.org/T349902 (KColeman-WMF)
[13:18:06] Cloud-VPS, cloud-services-team (FY2023/2024-Q1-Q2), DC-Ops, SRE, ops-eqiad: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643 (dcaro) Helper graph here https://grafana-rw.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?forceLogin&forceLogi...
[13:55:05] Toolforge, Patch-For-Review: Nullroute tool mail if no maintainers have valid email addresses - https://phabricator.wikimedia.org/T341006 (taavi) Open→Resolved a:taavi
[13:55:19] Grid-Engine-to-K8s-Migration: Migrate potd from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319974 (komla) Thanks, @Legoktm Can this be closed now?
[14:13:47] Toolforge (Toolforge iteration 02), Toolforge Jobs framework, Patch-For-Review: Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537 (taavi) The shell wrapper thing still needs some thinking, so moving back to next up until I have some time to work on this.
[14:20:21] Toolforge (Toolforge iteration 02), Toolforge Build Service: `build quota` fails if tool has no builds - https://phabricator.wikimedia.org/T353701 (taavi)
[14:20:44] Toolforge (Toolforge iteration 02), Toolforge Build Service, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [tbs.maintain-harbor] Document current setup and admin procedures - https://phabricator.wikimedia.org/T329176 (taavi)
[14:21:35] Toolforge (Toolforge iteration 02), Toolforge Build Service, cloud-services-team (FY2023/2024-Q1-Q2), User-dcaro: [harbor] Redis using all available memory - https://phabricator.wikimedia.org/T354176 (taavi)
[14:22:02] Toolforge (Toolforge iteration 02), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: builds log streaming times out when time between two loglines exceeds ~1min - https://phabricator.wikimedia.org/T354189 (taavi)
[14:23:35] Toolforge (Toolforge iteration 02): Indicate when long envvars are cutoff when listing - https://phabricator.wikimedia.org/T353287 (taavi)
[14:25:47] Toolforge (Toolforge iteration 02), Toolforge Build Service: [harbor] upgrade to 2.10.x - https://phabricator.wikimedia.org/T354507 (taavi)
[14:25:50] Toolforge (Toolforge iteration 02), Toolforge Build Service: [builds-builder] move buildpacks under their own namespace in gitlab - https://phabricator.wikimedia.org/T354349 (taavi)
[14:25:53] Toolforge (Toolforge iteration 02), Toolforge Build Service: [harbor] Investigate new robot account permissions in Harbor 2.10 - https://phabricator.wikimedia.org/T354270 (taavi)
[14:25:55] Toolforge (Toolforge iteration 02), Toolforge Build Service: [bulids-builder] upgrade builder to lastest with >=2.6.1 buildpacks - https://phabricator.wikimedia.org/T354330 (taavi)
[14:25:58] Toolforge (Toolforge iteration 02), Toolforge Build Service: [builds-builder] Clojure support was dropped from the builder image - https://phabricator.wikimedia.org/T354252 (taavi)
[14:26:01] Toolforge (Toolforge iteration 02), Toolforge Build Service, User-Raymond_Ndibe: alert users when they are about to exceed their harbor quota - https://phabricator.wikimedia.org/T353535 (taavi)
[14:26:05] Toolforge (Toolforge iteration 02), Toolforge Build Service, Patch-For-Review: [build-serviece,clojure] Current supported heroku builder does not yet include clojure support - https://phabricator.wikimedia.org/T353575 (taavi)
[14:26:07] Toolforge (Toolforge iteration 02), Toolforge Build Service: [maintain-harbor] Improvements to subcommands and config validation - https://phabricator.wikimedia.org/T353059 (taavi)
[14:26:09] Toolforge (Toolforge iteration 02), Toolforge Build Service, Patch-For-Review: [builds-cli] delete --all gets 0 builds to delete - https://phabricator.wikimedia.org/T353519 (taavi)
[14:26:11] Toolforge (Toolforge iteration 02), Toolforge Build Service: [tbs, builds-api] change local environment to use admin account - https://phabricator.wikimedia.org/T352770 (taavi)
[14:26:13] Toolforge (Toolforge iteration 02), Toolforge Build Service: [tbs] Add dashboards with the new statistics - https://phabricator.wikimedia.org/T352764 (taavi)
[14:26:15] Toolforge (Toolforge iteration 02), Toolforge Build Service: [tbs] cleanup robot account related code - https://phabricator.wikimedia.org/T352763 (taavi)
[14:26:21] Toolforge (Toolforge iteration 02), Toolforge Build Service, Patch-For-Review, User-dcaro: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 (taavi)
[14:26:23] Toolforge (Toolforge iteration 02), Toolforge Build Service: Add command/arguments to allow a script to wait on build completion/failure - https://phabricator.wikimedia.org/T352561 (taavi)
[14:26:25] Toolforge (Toolforge iteration 02), Toolforge Build Service: [tbs][builds-api] Refactor `internal/builds.go` - https://phabricator.wikimedia.org/T352762 (taavi)
[14:26:29] Toolforge (Toolforge iteration 02), Toolforge Build Service: [tbs][builder] Explore adding support for third-party buildpacks - https://phabricator.wikimedia.org/T352389 (taavi)
[14:26:32] Toolforge (Toolforge iteration 02), Toolforge Build Service: [builds-api] Use admin user credentials for Harbor API auth in dev - https://phabricator.wikimedia.org/T352022 (taavi)
[14:26:37] Toolforge (Toolforge iteration 02), Toolforge Build Service, Patch-For-Review, Upstream: [maintain-harbor] Manage project quotas via maintain-harbor - https://phabricator.wikimedia.org/T352417 (taavi)
[14:26:43] Toolforge (Toolforge iteration 02), Toolforge Build Service, User-Raymond_Ndibe: [tbs] Give a meaningful error message when a user exceeds their Harbor quota - https://phabricator.wikimedia.org/T351178 (taavi)
[14:26:46] Toolforge (Toolforge iteration 02), Toolforge Build Service: [tbs.builds][api, cli] Bring back the ability to specify an image name - https://phabricator.wikimedia.org/T351516 (taavi)
[14:26:48] Toolforge (Toolforge iteration 02), Toolforge Build Service: Give builds-api access to system admin credentials - https://phabricator.wikimedia.org/T352007 (taavi)
[14:26:50] Toolforge (Toolforge iteration 02), Toolforge Build Service, User-Raymond_Ndibe: add pre-commit to maintain-harbor - https://phabricator.wikimedia.org/T350452 (taavi)
[14:26:52] Toolforge (Toolforge iteration 02), Toolforge Build Service, Documentation: [tbs] Create a tutorial on compiling static frontend assets at build time - https://phabricator.wikimedia.org/T351082 (taavi)
[14:26:54] Toolforge (Toolforge iteration 02), Toolforge Build Service, Documentation: Create an ASGI tutorial for buildservice - https://phabricator.wikimedia.org/T350692 (taavi)
[14:26:56] Toolforge (Toolforge iteration 02), Toolforge Build Service, Documentation: [tbs] Improve Harbor quota handling and docs - https://phabricator.wikimedia.org/T351092 (taavi)
[14:26:59] Toolforge (Toolforge iteration 02), Toolforge Build Service: [tbs] migrate sample tools to Gitlab - https://phabricator.wikimedia.org/T348213 (taavi)
[14:27:01] Toolforge (Toolforge iteration 02), Toolforge Build Service: [tools,harbor] Cleanup old production images - https://phabricator.wikimedia.org/T348538 (taavi)
[14:27:03] Toolforge (Toolforge iteration 02), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: move from single script to multi-script approach in maintain-harbor - https://phabricator.wikimedia.org/T350410 (taavi)
[14:27:05] Toolforge (Toolforge iteration 02), Toolforge Build Service, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [builds-api] catch harbor timeout when creating repository - https://phabricator.wikimedia.org/T345903 (taavi)
[14:27:07] Toolforge (Toolforge iteration 02), Toolforge Build Service, Documentation: [tbs] Create a tutorial on how to deploy a ruby on rails tool using build service - https://phabricator.wikimedia.org/T347402 (taavi)
[14:27:09] Toolforge (Toolforge iteration 02), Toolforge Build Service: [tbs.build.logs] Show a more user-friendly error message when logs are not ready - https://phabricator.wikimedia.org/T341059 (taavi)
[14:27:11] Toolforge (Toolforge iteration 02), Toolforge Build Service, cloud-services-team, Cloud-Services-Origin-Team, and 2 others: [builds-api] Automatically deploy the webservice when the image is built - https://phabricator.wikimedia.org/T341065 (taavi)
[14:27:14] Toolforge (Toolforge iteration 02), Toolforge Build Service, Patch-For-Review: [builds-cli,builds-api] Allow build service to cleanup images to free quota - https://phabricator.wikimedia.org/T341067 (taavi)
[14:27:18] Toolforge (Toolforge iteration 02), Toolforge Build Service: `webservice restart` sometimes timing out for buildservice images - https://phabricator.wikimedia.org/T341057 (taavi)
[14:27:20] Toolforge (Toolforge iteration 02), Toolforge Build Service, User-dcaro: `toolforge build logs`: add follow options - https://phabricator.wikimedia.org/T339922 (taavi)
[14:27:22] Toolforge (Toolforge iteration 02), Toolforge Build Service, User-Raymond_Ndibe: toolforge build start: default to tailing the build as it progresses with the option of -d/--detached - https://phabricator.wikimedia.org/T340079 (taavi)
[14:27:28] Toolforge (Toolforge iteration 02), Toolforge Build Service, User-Raymond_Ndibe, User-dcaro: Add a way to wait for a Toolforge build to finish - https://phabricator.wikimedia.org/T337043 (taavi)
[14:27:30] Cloud Services Proposals, Toolforge (Toolforge iteration 02), Toolforge Build Service, cloud-services-team, and 3 others: [toolforge-envvars.api,toolforge-build.api] Support using custom environment variables at build time - https://phabricator.wikimedia.org/T338142 (taavi)
[14:27:39] Toolforge (Toolforge iteration 02), Toolforge Build Service, cloud-services-team (FY2023/2024-Q1-Q2): [tbs] Create a tutorial on how to deploy a Node.js app using Build Service - https://phabricator.wikimedia.org/T353313 (taavi)
[14:27:42] Toolforge (Toolforge iteration 02), Toolforge Build Service, cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Team, and 4 others: [builds-api.start] Add statistics - https://phabricator.wikimedia.org/T337390 (taavi)
[14:30:58] Toolforge (Toolforge iteration 02), cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-User, Cloud-Services-Worktype-Maintenance, User-dcaro: [webservice] Error shown when restarting buildpack-based tool - https://phabricator.wikimedia.org/T348312 (taavi) > `If this persists pleas...
[14:42:33] Toolforge Build Service: [tbs][dev] decide on which kubernetes bootstrapper to focus on between minikube and kind - https://phabricator.wikimedia.org/T347723 (fnegri)
[14:42:46] Toolforge Build Service: [tbs][dev] find an alternative to Vagrant - https://phabricator.wikimedia.org/T348960 (fnegri)
[14:42:55] Toolforge Build Service, cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: tbs: user-story 11: Add section to admin docs on how to debug the service, how to pin-point the failing component and how to get the ... - https://phabricator.wikimedia.org/T325174
[14:43:11] Toolforge Build Service, cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: tbs: user-story 14: Run a set of security checks on the full service - https://phabricator.wikimedia.org/T325208 (fnegri)
[14:45:16] Toolforge: [dev] find an alternative to Vagrant - https://phabricator.wikimedia.org/T348960 (taavi)
[14:46:46] Toolforge: [dev] find an alternative to Vagrant - https://phabricator.wikimedia.org/T348960 (Slst2020) a:Slst2020
[14:47:38] Toolforge (Toolforge iteration 02), Toolforge Build Service: `build quota` fails if tool has no builds - https://phabricator.wikimedia.org/T353701 (Slst2020) In progress→Open
[15:28:16] Toolforge (Toolforge iteration 03), Toolforge Build Service, Patch-For-Review: [build-serviece,clojure] Current supported heroku builder does not yet include clojure support - https://phabricator.wikimedia.org/T353575 (dcaro)
[15:28:33] Toolforge (Toolforge iteration 03): Indicate when long envvars are cutoff when listing - https://phabricator.wikimedia.org/T353287 (dcaro)
[15:28:37] Toolforge (Toolforge iteration 03), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: builds log streaming times out when time between two loglines exceeds ~1min - https://phabricator.wikimedia.org/T354189 (dcaro)
[15:28:52] Toolforge (Toolforge iteration 03), Patch-For-Review: Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (dcaro)
[15:29:04] Toolforge (Toolforge iteration 03), Patch-For-Review: [webservice] php 7.4 containers don't pass through the environment variables to the scripts - https://phabricator.wikimedia.org/T354320 (dcaro)
[15:29:16] Toolforge (Toolforge iteration 03), Toolforge Build Service, Patch-For-Review, Upstream: [maintain-harbor] Manage project quotas via maintain-harbor - https://phabricator.wikimedia.org/T352417 (dcaro)
[15:29:26] Toolforge (Toolforge iteration 03), Toolforge Build Service: [harbor] upgrade to 2.10.x - https://phabricator.wikimedia.org/T354507 (dcaro)
[15:29:33] Toolforge (Toolforge iteration 03), Toolforge Build Service, cloud-services-team, Cloud-Services-Origin-Team, and 2 others: [builds-api] Automatically deploy the webservice when the image is built - https://phabricator.wikimedia.org/T341065 (dcaro)
[15:29:44] Toolforge (Toolforge iteration 03), Patch-For-Review, User-Raymond_Ndibe: [gitlab,toolforge-deploy] Create a process to open an MR to toolforge-deploy when a new release ofa component happens - https://phabricator.wikimedia.org/T347392 (dcaro)
[15:29:46] Cloud Services Proposals, Toolforge (Toolforge iteration 03): Decision request – Toolforge CLI consolidation - https://phabricator.wikimedia.org/T348749 (dcaro)
[15:29:54] Toolforge (Toolforge iteration 03), Toolforge Build Service, Patch-For-Review: Toolforge Build Service: add the locale buildpack - https://phabricator.wikimedia.org/T354128 (dcaro)
[15:30:04] Toolforge (Toolforge iteration 03): [toolforge-cd] gitlab-ci refactor - https://phabricator.wikimedia.org/T353514 (dcaro)
[15:30:08] Toolforge (Toolforge iteration 03), User-Raymond_Ndibe: [toolforge-cd] find out why we run two gitlab ci/cd pipelines after merge - https://phabricator.wikimedia.org/T353563 (dcaro)
[15:30:28] Toolforge (Toolforge iteration 03): [dev] Investigate lima-vm as an alternative to Vagrant for lima-kilo - https://phabricator.wikimedia.org/T354406 (dcaro)
[15:30:30] Toolforge (Toolforge iteration 03), Toolforge Build Service, cloud-services-team (FY2023/2024-Q1-Q2), User-dcaro: [harbor] Redis using all available memory - https://phabricator.wikimedia.org/T354176 (dcaro)
[15:30:44] Toolforge (Toolforge iteration 03): [envvars-cli] move pytest from tox to pre-commit - https://phabricator.wikimedia.org/T351476 (dcaro)
[15:30:51] Toolforge (Toolforge iteration 03): [toolforge-cd] discuss the possibility of removing tests from merge request ci/cd pipelines - https://phabricator.wikimedia.org/T353740 (dcaro)
[15:30:53] Toolforge (Toolforge iteration 03), Toolforge Build Service, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [tbs.maintain-harbor] Document current setup and admin procedures - https://phabricator.wikimedia.org/T329176 (dcaro)
[15:30:56] Toolforge (Toolforge iteration 03), Toolforge Jobs framework, Patch-For-Review: Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537 (dcaro)
[15:30:58] Toolforge (Toolforge iteration 03), Toolforge Build Service, User-Raymond_Ndibe: alert users when they are about to exceed their harbor quota - https://phabricator.wikimedia.org/T353535 (dcaro)
[15:31:00] Toolforge (Toolforge iteration 03), Toolforge Build Service: `build quota` fails if tool has no builds - https://phabricator.wikimedia.org/T353701 (dcaro)
[15:31:02] Toolforge (Toolforge iteration 03), Toolforge Build Service: [maintain-harbor] Improvements to subcommands and config validation - https://phabricator.wikimedia.org/T353059 (dcaro)
[15:31:04] Toolforge (Toolforge iteration 03): [ci] Add shellcheck to pre-commit where missing - https://phabricator.wikimedia.org/T353052 (dcaro)
[15:31:06] Toolforge (Toolforge iteration 03), Toolforge Build Service, cloud-services-team (FY2023/2024-Q1-Q2): [tbs] Create a tutorial on how to deploy a Node.js app using Build Service - https://phabricator.wikimedia.org/T353313 (dcaro)
[15:31:11] Toolforge (Toolforge iteration 03): [ci] Investigate discrepancy between different CI envs - https://phabricator.wikimedia.org/T353044 (dcaro)
[15:31:13] Toolforge (Toolforge iteration 03): Decide what abstractions we want to expose to Toolforge users in the longer term - https://phabricator.wikimedia.org/T352857 (dcaro)
[15:31:15] Toolforge (Toolforge iteration 03): [docs] Update Toolforge component README's - https://phabricator.wikimedia.org/T352964 (dcaro)
[15:31:17] Toolforge (Toolforge iteration 03), Toolforge Build Service: [tbs] Add dashboards with the new statistics - https://phabricator.wikimedia.org/T352764 (dcaro)
[15:31:19] Toolforge (Toolforge iteration 03), Toolforge Build Service: [tbs] cleanup robot account related code - https://phabricator.wikimedia.org/T352763 (dcaro)
[15:31:21] Toolforge (Toolforge iteration 03), Toolforge Build Service: [tbs][builder] Explore adding support for third-party buildpacks - https://phabricator.wikimedia.org/T352389 (dcaro)
[15:31:23] Toolforge (Toolforge iteration 03), Toolforge Build Service, User-Raymond_Ndibe: [tbs] Give a meaningful error message when a user exceeds their Harbor quota - https://phabricator.wikimedia.org/T351178 (dcaro)
[15:31:25] Toolforge (Toolforge iteration 03), Toolforge Build Service, Documentation: [tbs] Improve Harbor quota handling and docs - https://phabricator.wikimedia.org/T351092 (dcaro)
[15:31:27] Toolforge (Toolforge iteration 03), Toolforge Build Service, Documentation: [tbs] Create a tutorial on compiling static frontend assets at build time - https://phabricator.wikimedia.org/T351082 (dcaro)
[15:31:29] Toolforge (Toolforge iteration 03), Technical-blog-posts: Publish a blog post about buildservice on the Tech Blog - https://phabricator.wikimedia.org/T350691 (dcaro)
[15:31:31] Toolforge (Toolforge iteration 03), cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-User, Cloud-Services-Worktype-Maintenance, User-dcaro: [webservice] Error shown when restarting buildpack-based tool - https://phabricator.wikimedia.org/T348312 (dcaro)
[15:31:33] Toolforge (Toolforge iteration 03), Toolforge Build Service: `webservice restart` sometimes timing out for buildservice images - https://phabricator.wikimedia.org/T341057 (dcaro)
[15:31:35] Toolforge (Toolforge iteration 03), cloud-services-team, Kubernetes, Patch-For-Review: Upgrade cadvisor - https://phabricator.wikimedia.org/T349795 (dcaro)
[15:31:38] Toolforge (Toolforge iteration 03): Expose tool-labs service names via environment variables - https://phabricator.wikimedia.org/T151002 (dcaro)
[15:31:42] Toolforge (Toolforge iteration 03), Toolforge Build Service, cloud-services-team, Cloud-Services-Origin-Team, and 3 others: tbs: user-story 10: I want to know how to manage the service - https://phabricator.wikimedia.org/T325166 (dcaro)
[15:31:46] Toolforge (Toolforge iteration 03), cloud-services-team, Kubernetes, Patch-For-Review: Toolforge k8s: Migrate workers to Containerd and Bookworm - https://phabricator.wikimedia.org/T284656 (dcaro)
[15:53:51] PROBLEM - Check systemd state on cloudrabbit1003 is CRITICAL: CRITICAL - degraded: The following units failed: rabbitmq_detect_partition.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:56:33] (SystemdUnitDown) firing: The service unit rabbitmq_detect_partition.service is in failed status on host cloudrabbit1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudrabbit1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[16:10:51] Cloud-VPS, cloud-services-team, SRE, ops-eqiad, Patch-For-Review: cloudrabbit: connect them via cloudsw and cloud-private - https://phabricator.wikimedia.org/T345610 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by taavi@cumin1002 for hosts: `cloudrabbit1003.wikimedia.org` -...
[16:12:04] Cloud-VPS, cloud-services-team, SRE, ops-eqiad, Patch-For-Review: cloudrabbit: connect them via cloudsw and cloud-private - https://phabricator.wikimedia.org/T345610 (taavi) >>! In T345610#9409823, @taavi wrote: > This can move forward now, although due to the nature of Rabbit this needs to b...
[16:13:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:16:00] Tool-spacemedia, Toolforge: NFS broken - Cannot build nor run my tool anymore - https://phabricator.wikimedia.org/T354581 (taavi) That's odd, I indeed can't delete that directory from any Toolforge bastion node. Moving the directory is possible, however: `lang=shell-session tools.spacemedia@tools-sgebast...
[16:17:17] Tool-spacemedia, Toolforge: NFS broken - Cannot build nor run my tool anymore - https://phabricator.wikimedia.org/T354581 (taavi) It's that specific directory that's cursed, not the name: `lang=shell-session tools.spacemedia@tools-sgebastion-11:~$ mv /data/project/spacemedia/spacemedia/target /data/proje...
[16:18:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [16:25:21] 10Cloud-VPS, 10cloud-services-team, 10DC-Ops, 10SRE, and 2 others: cloudrabbit: connect them via cloudsw and cloud-private - https://phabricator.wikimedia.org/T345610 (10taavi) a:05taavi→03None [16:47:09] 10Tool-ducttape, 10Abstract Wikipedia team: Add documentation for retriggering a failed pipeline to AW developer cheatsheet - https://phabricator.wikimedia.org/T333192 (10Etonkovidova) 05In progress→03Resolved [16:57:22] 10PAWS: update opentofu version - https://phabricator.wikimedia.org/T351402 (10github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/353 [16:57:28] vivian-rook closed https://github.com/toolforge/paws/pull/353 [16:58:04] 10PAWS: move to tofu - https://phabricator.wikimedia.org/T354671 (10rook) [16:59:56] 10PAWS: move to tofu - https://phabricator.wikimedia.org/T354671 (10github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/362 [17:00:00] vivian-rook opened https://github.com/toolforge/paws/pull/362 [17:03:01] vivian-rook closed https://github.com/toolforge/paws/pull/362 [17:04:58] 10PAWS: move to tofu - https://phabricator.wikimedia.org/T354671 (10rook) 05Open→03Resolved [17:05:28] 10PAWS: update opentofu version - https://phabricator.wikimedia.org/T351402 (10rook) 05Open→03Resolved [17:06:25] 10Toolforge (Toolforge iteration 03), 10Patch-For-Review: Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (10CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/27 inject_buildpacks: use shimmed dotnet bu... 
[17:07:48] 10Toolforge (Toolforge iteration 03), 10Patch-For-Review: Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (10CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_request... [17:17:27] !log toolsbeta dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder [17:17:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:18:02] !log toolsbeta dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder [17:18:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:21:00] 10Cloud Services Proposals: Decision Request - Incident Response Process - https://phabricator.wikimedia.org/T348887 (10fnegri) Getting back to this after a while... I like option 2.1, and I think everything mentioned there is in scope for this task. 
[17:25:23] 10Cloud Services Proposals, 10cloud-services-team (FY2023/2024-Q1-Q2): Decision Request - Incident Response Process - https://phabricator.wikimedia.org/T348887 (10fnegri) a:03fnegri [17:25:31] 10Cloud Services Proposals, 10cloud-services-team (FY2023/2024-Q1-Q2): Decision Request - Incident Response Process - https://phabricator.wikimedia.org/T348887 (10fnegri) [17:30:35] !log tools dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder [17:30:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:31:11] !log tools dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder [17:31:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:32:18] PROBLEM - Check systemd state on cloudservices1006 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-node-textfile-wmcs-dnsleaks.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [17:33:36] I have a fix pending for ^, just waiting for Jenkins to catch up [17:36:33] (SystemdUnitDown) firing: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudservices1006. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [17:42:52] !log fran@wmf3169 admin %(message)s (T346631) [17:42:52] !log fran@wmf3169 admin %(message)s (T346631) [17:42:52] !log fran@wmf3169 admin %(message)s (T346631) [17:42:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:42:59] T346631: [wmcs-cookbooks] SAL messages are shown differently when logging via wm-bot - https://phabricator.wikimedia.org/T346631 [17:43:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:43:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:46:47] 10Toolforge (Toolforge iteration 03), 10Patch-For-Review: Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (10dcaro) @Hawkeye7 Just deployed a fix for that, can you try again? Note that the compiled binaries are under `heroku_output/`, so for your proc... [17:47:35] 10Toolforge (Toolforge iteration 03), 10Patch-For-Review: Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (10CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/166 builds-builder: bump to 0.0.85-202401... 
[17:49:52] !log fran@wmf3169 admin START - Cookbook wmcs.do_log_msg (T346631) [17:49:52] !log fran@wmf3169 admin test message2 from local cookbook (T346631) [17:49:52] !log fran@wmf3169 admin END (PASS) - Cookbook wmcs.do_log_msg (exit_code=0) (T346631) [17:49:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:49:58] T346631: [wmcs-cookbooks] SAL messages are shown differently when logging via wm-bot - https://phabricator.wikimedia.org/T346631 [17:50:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:50:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:51:54] (03PS1) 10FNegri: SAL logging: invert user and project [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/989219 [17:55:19] (03CR) 10CI reject: [V: 04-1] SAL logging: invert user and project [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/989219 (owner: 10FNegri) [18:04:34] PROBLEM - Check systemd state on cloudservices1005 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-node-textfile-wmcs-dnsleaks.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [18:05:11] (03PS2) 10FNegri: SAL logging: invert user and project [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/989219 [18:06:33] (SystemdUnitDown) firing: (2) The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudservices1005. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [18:08:09] (03CR) 10CI reject: [V: 04-1] SAL logging: invert user and project [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/989219 (owner: 10FNegri) [18:10:23] (03PS3) 10FNegri: SAL logging: invert user and project [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/989219 [18:14:20] (03PS1) 10Btullis: Add dummy keytabs for new hadoop master servers [labs/private] - 10https://gerrit.wikimedia.org/r/989222 (https://phabricator.wikimedia.org/T332573) [18:14:49] (03CR) 10Btullis: [V: 03+2 C: 03+2] Add dummy keytabs for new hadoop master servers [labs/private] - 10https://gerrit.wikimedia.org/r/989222 (https://phabricator.wikimedia.org/T332573) (owner: 10Btullis) [19:21:26] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [19:30:24] RECOVERY - Check systemd state on cloudservices1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:34:19] (HAProxyBackendUnavailable) firing: (2) HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [19:39:19] (HAProxyBackendUnavailable) resolved: (2) HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [19:49:51] 10Tool-spacemedia, 10Toolforge: NFS broken - Cannot build nor run my tool anymore - https://phabricator.wikimedia.org/T354581 
(10Don-vip) Thank you! I am now able to build my tool again. I've done the same with the logs folder (mv logs logs-old && mkdir logs) and now my tool can start and log again, as before... [19:56:05] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-Raymond_Ndibe: [gitlab,toolforge-deploy] Create a process to open an MR to toolforge-deploy when a new release of a component happens - https://phabricator.wikimedia.org/T347392 (10Raymond_Ndibe) [19:56:36] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-Raymond_Ndibe: [gitlab,toolforge-deploy] Create a process to open an MR to toolforge-deploy when a new release of a component happens - https://phabricator.wikimedia.org/T347392 (10Raymond_Ndibe) 05Stalled→03Resolved [20:01:32] RECOVERY - Check systemd state on cloudservices1005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:14:03] (SystemdUnitDown) resolved: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudservices1005. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [21:07:38] 10PAWS: tofu state file to object storage - https://phabricator.wikimedia.org/T352164 (10rook) 05Open→03Resolved [21:07:40] 10PAWS: tofu state file to object storage - https://phabricator.wikimedia.org/T352164 (10github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/354 [21:07:46] vivian-rook closed https://github.com/toolforge/paws/pull/354 [21:21:09] 10Striker, 10wikitech.wikimedia.org, 10MediaWiki-extensions-OATHAuth: Wikitech 2FA does not appear to allow recovery with recovery codes - https://phabricator.wikimedia.org/T204682 (10Reedy) [22:08:06] (03PS1) 10LWatson: releases: Bump Codex to 1.2.1 [labs/libraryupgrader/config] - 10https://gerrit.wikimedia.org/r/989252 [22:29:53] (03CR) 10Eric Gardner: [C: 03+2] releases: Bump Codex to 1.2.1 [labs/libraryupgrader/config] - 10https://gerrit.wikimedia.org/r/989252 (owner: 10LWatson) [22:30:30] (03Merged) 10jenkins-bot: releases: Bump Codex to 1.2.1 [labs/libraryupgrader/config] - 10https://gerrit.wikimedia.org/r/989252 (owner: 10LWatson) [22:34:03] (PuppetAgentFailure) firing: Puppet agent failure detected on instance tools-k8s-worker-98 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure [23:11:25] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds.builder [23:11:34] !log andrew@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds.builder [23:12:02] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder [23:12:10] !log andrew@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for 
component builds-builder [23:21:27] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [23:27:35] (HarborDown) firing: Harbor is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborDown [23:36:32] 10Toolforge Build Service, 10cloud-services-team: harbor is failing, breaking many toolforge workflows - https://phabricator.wikimedia.org/T354714 (10Andrew) [23:49:35] 10Toolforge Build Service, 10cloud-services-team: harbor is failing, breaking many toolforge workflows - https://phabricator.wikimedia.org/T354714 (10Andrew) this is the harbor DB volume running out of space [23:50:00] 10Toolforge Build Service, 10cloud-services-team: harbor is failing, breaking many toolforge workflows - https://phabricator.wikimedia.org/T354714 (10bd808) `lang=irc [23:36] < bd808> "Jan  9 23:35:32 172.18.0.1 core[628]: 2024-01-09T23:35:32Z [FATAL] [/core/main.go:180]: failed to initialize database: reg... [23:50:24] (HAProxyBackendUnavailable) firing: HAProxy service nova-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [23:50:52] 10Toolforge Build Service, 10cloud-services-team: harbor is failing, breaking many toolforge workflows - https://phabricator.wikimedia.org/T354714 (10bd808) p:05Triage→03High [23:54:55] 10Toolforge Build Service, 10cloud-services-team: harbor is failing, breaking many toolforge workflows - https://phabricator.wikimedia.org/T354714 (10bd808) `lang=irc [23:27] (HarborDown) firing: Harbor is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborDown  - h...