[00:06:09] Toolforge Build Service, cloud-services-team: harbor is failing, breaking many toolforge workflows - https://phabricator.wikimedia.org/T354714 (Andrew) I resized the posgres volume (the hard way) and harbor seems happier now.
[00:07:35] (HarborDown) resolved: Harbor is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborDown
[00:09:26] Toolforge Build Service, cloud-services-team: harbor is failing, breaking many toolforge workflows - https://phabricator.wikimedia.org/T354714 (bd808) It is alive again! `lang=shell-session,lines=10 $ toolforge-build -v --debug start https://gitlab.wikimedia.org/toolforge-repos/gitlab-account-approval DE...
[00:10:19] (HAProxyBackendUnavailable) resolved: HAProxy service nova-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[00:11:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[00:12:51] Toolforge Build Service, cloud-services-team: harbor is failing, breaking many toolforge workflows - https://phabricator.wikimedia.org/T354714 (bd808)
[00:25:59] Toolforge Build Service, cloud-services-team: harbor is failing, breaking many toolforge workflows - https://phabricator.wikimedia.org/T354714 (bd808) p:High→Medium The user facing breakage seems to be over. I think there are a few things that we could consider following up on: * An end-to-end te...
[01:14:40] Toolforge (Toolforge iteration 03): Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (Hawkeye7) Making some progress. Build succeeded. tools.milhistbot@tools-sgebastion-10:~/bin$ toolforge build show Build ID: milhistbot-buildpacks-pipelinerun-x5lv7 Start Time:...
[01:28:18] Toolforge (Toolforge iteration 03): Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (bd808) >>! In T311466#9448631, @Hawkeye7 wrote: > What is the image called? From your `toolforge build show` output, "Destination Image: tools-harbor.wmcloud.org/tool-milhistbo...
[01:34:03] (PuppetAgentFailure) firing: Puppet agent failure detected on instance tools-k8s-worker-98 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[01:34:19] Toolforge (Toolforge iteration 03): Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (Hawkeye7) That's what it says, but it is giving me an error message saying it does not exist: $ toolforge jobs run --image tools-harbor.wmcloud.org/tool-milhistbot/liftwing:lat...
[01:40:20] Toolforge (Toolforge iteration 03): Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (JJMC89) Based on https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Job, it would be `--image tool-milhistbot/liftwing:latest`.
[02:17:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[02:19:55] Toolforge (Toolforge iteration 03): Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (Hawkeye7) @JJMC89 That seems correct. Now I have an error: tools.milhistbot@tools-sgebastion-10:~$ toolforge jobs run --image tool-milhistbot/liftwing:latest --command "web '...
[02:22:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[03:11:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[03:11:22] Toolforge (Toolforge iteration 03): Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (Hawkeye7) tools.milhistbot@tools-sgebastion-10:~$ cat milhistbot-liftwing/Procfile web: heroku_output/Liftwing tools.milhistbot@tools-sgebastion-10:~$ toolforge jobs run --ima...
[03:21:28] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:34:03] (PuppetAgentFailure) firing: Puppet agent failure detected on instance tools-k8s-worker-98 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[05:36:57] Toolforge Jobs framework, Kubernetes: Transient cronjob scheduling failures on Toolforge k8s - https://phabricator.wikimedia.org/T338006 (Legoktm) So I've seen what I suspect is the same issue, the potd tool's "send" job was never triggered on 2024-01-06 at 2:00; and the tfaprotbot tool's "tfasemibot" jo...
[05:38:58] Grid-Engine-to-K8s-Migration: Migrate potd from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319974 (Legoktm) Open→Resolved I think so, there's still an issue that a job was skipped one day, but we can track that at T338006#9448777.
[05:56:28] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[05:56:56] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[06:01:43] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[06:11:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[06:19:06] Toolforge Jobs framework: toolforge-jobs --wait will only wait 5 minutes - https://phabricator.wikimedia.org/T352945 (Legoktm) Open→Resolved Thank you!
[07:34:03] (PuppetAgentFailure) firing: Puppet agent failure detected on instance tools-k8s-worker-98 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[09:11:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[09:19:03] (PuppetAgentFailure) resolved: Puppet agent failure detected on instance tools-k8s-worker-98 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[09:29:22] Toolforge Build Service, cloud-services-team: builds-cli --debug option behaviour is confusing - https://phabricator.wikimedia.org/T354726 (taavi)
[09:34:53] Toolforge (Toolforge iteration 03), Toolforge Build Service, cloud-services-team: builds-cli loses body text from 503 errors - https://phabricator.wikimedia.org/T354727 (taavi)
[09:37:05] Toolforge Build Service, cloud-services-team: harbor is failing, breaking many toolforge workflows - https://phabricator.wikimedia.org/T354714 (dcaro) >>! In T354714#9448601, @bd808 wrote: > The user facing breakage seems to be over. I think there are a few things that we could consider following up on:...
[09:37:21] Cloud-VPS, cloud-services-team, Upstream: Trove does not expose amount of disk space used - https://phabricator.wikimedia.org/T354728 (taavi)
[09:41:06] Toolforge Build Service, cloud-services-team: builds-api should log errors leading up to 5xx errors - https://phabricator.wikimedia.org/T354731 (taavi)
[09:42:03] Toolforge Build Service, cloud-services-team: [harbor,trove] trove database full mave harbor-core fail, breaking any toolforge buildservice related flow - https://phabricator.wikimedia.org/T354714 (dcaro)
[09:42:53] Toolforge Build Service, cloud-services-team: [harbor,trove] Trove DB filled disk and caused toolforge-build to fail as a result - https://phabricator.wikimedia.org/T354714 (dcaro)
[09:50:14] Toolforge (Toolforge iteration 03), Toolforge Build Service, cloud-services-team: [harbor] Use `harbor_up` for alerting about harbor components being down - https://phabricator.wikimedia.org/T354736 (dcaro)
[09:53:37] Toolforge Build Service, cloud-services-team: [harbor] Update HarborDown runbook with the incident debugging details - https://phabricator.wikimedia.org/T354739 (dcaro)
[10:02:47] Toolforge (Toolforge iteration 03), Toolforge Build Service, cloud-services-team: builds-cli loses body text from 503 errors - https://phabricator.wikimedia.org/T354727 (taavi)
[10:05:42] Toolforge Build Service, cloud-services-team: builds-cli --debug option behaviour is confusing - https://phabricator.wikimedia.org/T354726 (taavi)
[10:40:27] Toolforge Build Service, cloud-services-team, Patch-For-Review: [harbor,trove] Trove DB filled disk and caused toolforge-build to fail as a result - https://phabricator.wikimedia.org/T354714 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/8 har...
[10:51:08] Toolforge (Toolforge iteration 03): [toolforge API] Investigate ways to present our openapi definitions to users - https://phabricator.wikimedia.org/T354745 (dcaro)
[11:27:42] (PS4) FNegri: SAL logging: invert user and project [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/989219 (https://phabricator.wikimedia.org/T346631)
[11:28:32] Toolforge (Toolforge iteration 03): Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (dcaro) For some reason that image did not pick up the Procfile properly, it should have generated a `/cnb/lifecycle/web` binary that's not there: ` dcaro@urcuchillay$ podman run...
[11:30:19] Toolforge (Toolforge iteration 03): [jobs-cli] AttributeError: module 'requests.exceptions' has no attribute 'InvalidJSONError' when getting 5xx from the server - https://phabricator.wikimedia.org/T354748 (dcaro)
[11:32:43] Toolforge (Toolforge iteration 03), Toolforge Jobs framework: [jobs-cli] AttributeError: module 'requests.exceptions' has no attribute 'InvalidJSONError' when getting 5xx from the server - https://phabricator.wikimedia.org/T354748 (taavi) This looks like a case of the `requests` version from Buster that'...
[11:33:09] Toolforge (Toolforge iteration 03), Toolforge Jobs framework: [jobs-cli] AttributeError: module 'requests.exceptions' has no attribute 'InvalidJSONError' when getting 5xx from the server - https://phabricator.wikimedia.org/T354748 (taavi) (The actual reason for the 5xx error is T349775)
[11:34:06] Toolforge (Toolforge iteration 03), Toolforge Jobs framework: [jobs-cli] AttributeError: module 'requests.exceptions' has no attribute 'InvalidJSONError' when getting 5xx from the server - https://phabricator.wikimedia.org/T354748 (Slst2020) I ran into the same error yesterday with builds-cli when trying...
[11:34:17] PAWS: Upgrade Jupyterlab - https://phabricator.wikimedia.org/T354749 (rook)
[11:35:51] Toolforge Build Service, cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: tbs: user-story 11: Add section to admin docs on how to debug the service, how to pin-point the failing component and how to get the ... - https://phabricator.wikimedia.org/T325174
[11:35:57] Toolforge Build Service, cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 2 others: [tbs.beta] Create a toolforge build service beta release - https://phabricator.wikimedia.org/T267374 (dcaro)
[11:36:25] Toolforge (Toolforge iteration 03), Toolforge Build Service, cloud-services-team, Cloud-Services-Origin-Team, and 3 others: tbs: user-story 10: I want to know how to manage the service - https://phabricator.wikimedia.org/T325166 (dcaro) Open→Resolved a:dcaro
[11:40:10] Toolforge (Toolforge iteration 03), Toolforge Jobs framework: [jobs-cli] AttributeError: module 'requests.exceptions' has no attribute 'InvalidJSONError' when getting 5xx from the server - https://phabricator.wikimedia.org/T354748 (dcaro) a:dcaro→None
[11:42:11] Toolforge Build Service, Documentation: [tbs] Improve Harbor quota handling and docs - https://phabricator.wikimedia.org/T351092 (dcaro)
[11:43:20] Toolforge Build Service, Technical-blog-posts: Publish a blog post about buildservice on the Tech Blog - https://phabricator.wikimedia.org/T350691 (dcaro)
[11:43:33] Toolforge Build Service, Documentation: [tbs] Create a tutorial on compiling static frontend assets at build time - https://phabricator.wikimedia.org/T351082 (dcaro)
[11:43:47] Toolforge Build Service: [tbs][builder] Explore adding support for third-party buildpacks - https://phabricator.wikimedia.org/T352389 (dcaro)
[11:44:05] Toolforge Build Service: `webservice restart` sometimes timing out for buildservice images - https://phabricator.wikimedia.org/T341057 (dcaro)
[11:44:44] Toolforge: [ci] Investigate discrepancy between different CI envs - https://phabricator.wikimedia.org/T353044 (dcaro)
[11:45:02] Toolforge: [docs] Update Toolforge component README's - https://phabricator.wikimedia.org/T352964 (dcaro)
[12:01:46] Toolforge (Toolforge iteration 03), Toolforge Jobs framework: [jobs-api] Migrate to Poetry - https://phabricator.wikimedia.org/T354751 (taavi)
[12:02:46] (ProbeDown) firing: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:04:18] Toolforge (Toolforge iteration 03), Toolforge Jobs framework: [jobs-api] Migrate to Gunicorn - https://phabricator.wikimedia.org/T354752 (taavi)
[12:05:26] Toolforge (Toolforge iteration 03): Toolforge next user stories - 2024 version - https://phabricator.wikimedia.org/T352857 (dcaro)
[12:07:46] (ProbeDown) resolved: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:09:52] PAWS: Upgrade Jupyterlab - https://phabricator.wikimedia.org/T354749 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/363
[12:09:58] vivian-rook opened https://github.com/toolforge/paws/pull/363
[12:11:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[12:15:41] Toolforge (Toolforge iteration 03), Toolforge Jobs framework, Patch-For-Review: [jobs-api] Migrate to Gunicorn - https://phabricator.wikimedia.org/T354752 (CodeReviewBot) taavi opened https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/53 build: Migrate to Poetry
[12:16:02] Toolforge (Toolforge iteration 03), Toolforge Build Service, cloud-services-team: builds-cli loses body text from 503 errors - https://phabricator.wikimedia.org/T354727 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/47 build: Fix response...
[12:49:52] Toolforge (Toolforge iteration 03), Toolforge Build Service: [builds-api] Improve error message when logs time out - https://phabricator.wikimedia.org/T354755 (Slst2020)
[12:53:54] Toolforge (Toolforge iteration 03), Toolforge Build Service: [builds-api] Improve error message when logs time out - https://phabricator.wikimedia.org/T354755 (Slst2020)
[13:11:33] Toolforge (Toolforge iteration 03), Toolforge Jobs framework, Patch-For-Review: [jobs-api] Migrate to Gunicorn - https://phabricator.wikimedia.org/T354752 (CodeReviewBot) taavi opened https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/55 build: Use Gunicorn instead of uWSGI
[13:24:41] (PrometheusRestarted) firing: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[13:25:50] PAWS: Upgrade Jupyterlab - https://phabricator.wikimedia.org/T354749 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/363
[13:25:57] vivian-rook closed https://github.com/toolforge/paws/pull/363
[13:27:12] PAWS: Upgrade Jupyterlab - https://phabricator.wikimedia.org/T354749 (rook) Open→Resolved a:rook
[13:29:04] Toolforge: Setup Pint for tools-prometheus - https://phabricator.wikimedia.org/T354760 (taavi)
[13:34:24] Cloud-VPS, cloud-services-team: Linting problems found for NovafullstackSustainedFailures - https://phabricator.wikimedia.org/T351698 (Andrew) a:taavi→Andrew
[13:35:40] Cloud-VPS (Project-requests): Request creation of OpenVAS VPS project - https://phabricator.wikimedia.org/T354192 (KHurd-WMF) Resolved→Open Hey Francisco, I have a question regarding SSH access which may lead to me saying the wrong username. For me to use Khurd and not Khurd1, I would need to reques...
[13:49:41] (PrometheusRestarted) resolved: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[13:51:39] (CloudVPSDesignateLeaks) firing: (2) Detected 5 stray dns records on - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:09:18] Striker, Infrastructure-Foundations, LDAP: Store Wikimedia unified account name (SUL) in LDAP directory - https://phabricator.wikimedia.org/T148048 (bd808) I have local draft changes which confirm that a migration is needed to add an `objectClass: wikimediaPerson` attribute to each existing Developer...
[14:25:52] Toolforge Build Service: Add builder support for Perl runtime projects - https://phabricator.wikimedia.org/T353744 (dcaro) All the options look quite dire to be fair, the most active seems to be https://github.com/miyagawa/heroku-buildpack-perl (2 years since the last commit). We might have to end up adopti...
[14:28:56] Toolforge Jobs framework, Kubernetes: Transient cronjob scheduling failures on Toolforge k8s - https://phabricator.wikimedia.org/T338006 (taavi) >>! In T338006#9448777, @Legoktm wrote: > So I've seen what I suspect is the same issue, the potd tool's "send" job was never triggered on 2024-01-06 at 2:00; a...
[14:39:11] (PS1) MVernon: hiera: add fake swift passwords for netbox_dev user [labs/private] - https://gerrit.wikimedia.org/r/989531 (https://phabricator.wikimedia.org/T354766)
[14:44:52] cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Unplanned, Continuous-Integration-Config, User-dcaro: [ci,operations-puppet] upgrade to tox 4 in order to detect changed requirement files - https://phabricator.wikimedia.org/T345152 (hashar) Open...
[14:48:43] !log fnegri@cloudcumin1001 openvas START - Cookbook wmcs.vps.add_user_to_project for user 'kelhurd' in role 'member' (T354192)
[14:48:47] T354192: Request creation of OpenVAS VPS project - https://phabricator.wikimedia.org/T354192
[14:49:28] !log fnegri@cloudcumin1001 openvas END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'kelhurd' in role 'member' (T354192)
[14:50:16] Cloud-VPS (Project-requests): Request creation of OpenVAS VPS project - https://phabricator.wikimedia.org/T354192 (fnegri) Open→Resolved No problem, I have now added both accounts to the project, so you can use either. The "shell username" used for SSH is different from your developer account userna...
[14:51:39] (CloudVPSDesignateLeaks) resolved: (2) Detected 5 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:54:12] (CR) BryanDavis: [C: +2] dev(docker): Update blubber buildkit to support Apple Silicon [labs/striker] - https://gerrit.wikimedia.org/r/988121 (https://phabricator.wikimedia.org/T318866) (owner: BryanDavis)
[14:54:18] (CR) BryanDavis: [C: +2] dev(docker): Force linux/amd64 runtime selection [labs/striker] - https://gerrit.wikimedia.org/r/988122 (https://phabricator.wikimedia.org/T354467) (owner: BryanDavis)
[14:54:52] (CR) BryanDavis: [C: +2] dev(docker): Workaround Apache2 crash under QEMU emulation [labs/striker] - https://gerrit.wikimedia.org/r/988123 (https://phabricator.wikimedia.org/T354468) (owner: BryanDavis)
[14:55:22] (CR) BryanDavis: [C: +2] dev(docker): Declare that ldapwiki depends on sulwiki [labs/striker] - https://gerrit.wikimedia.org/r/988127 (owner: BryanDavis)
[14:55:57] (CR) BryanDavis: [C: +2] dev(docker): Temporarily use quay.io/bd808/bitnami/phabricator:2021 [labs/striker] - https://gerrit.wikimedia.org/r/988744 (https://phabricator.wikimedia.org/T340080) (owner: BryanDavis)
[14:57:22] (Merged) jenkins-bot: dev(docker): Update blubber buildkit to support Apple Silicon [labs/striker] - https://gerrit.wikimedia.org/r/988121 (https://phabricator.wikimedia.org/T318866) (owner: BryanDavis)
[14:57:36] (Merged) jenkins-bot: dev(docker): Force linux/amd64 runtime selection [labs/striker] - https://gerrit.wikimedia.org/r/988122 (https://phabricator.wikimedia.org/T354467) (owner: BryanDavis)
[14:58:14] (Merged) jenkins-bot: dev(docker): Workaround Apache2 crash under QEMU emulation [labs/striker] - https://gerrit.wikimedia.org/r/988123 (https://phabricator.wikimedia.org/T354468) (owner: BryanDavis)
[14:58:16] (Merged) jenkins-bot: dev(docker): Fix typo in Keystone configuration [labs/striker] - https://gerrit.wikimedia.org/r/988124 (owner: BryanDavis)
[14:58:18] (Merged) jenkins-bot: dev(docker): Use stable MediaWiki vendor branch source [labs/striker] - https://gerrit.wikimedia.org/r/988125 (owner: BryanDavis)
[15:00:39] (Merged) jenkins-bot: dev(docker): Update phabricator extension source [labs/striker] - https://gerrit.wikimedia.org/r/988126 (https://phabricator.wikimedia.org/T340080) (owner: BryanDavis)
[15:00:41] (Merged) jenkins-bot: dev(docker): Declare that ldapwiki depends on sulwiki [labs/striker] - https://gerrit.wikimedia.org/r/988127 (owner: BryanDavis)
[15:00:57] (Merged) jenkins-bot: dev(docker): Temporarily use quay.io/bd808/bitnami/phabricator:2021 [labs/striker] - https://gerrit.wikimedia.org/r/988744 (https://phabricator.wikimedia.org/T340080) (owner: BryanDavis)
[15:06:44] Data-Services, cloud-services-team, Patch-For-Review: Move wiki replicas behind cloudlb - https://phabricator.wikimedia.org/T346947 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by taavi@cumin1002 for hosts: `dbproxy[1018-1019].eqiad.wmnet` - dbproxy1018.eqiad.wmnet (**PASS**) - Do...
[15:11:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[15:35:06] (CR) BryanDavis: [V: -1 C: -1] "Comments inline on how to collect a Tool's toolinfo description and an idea for giving the user more control over this initial description" [labs/striker] - https://gerrit.wikimedia.org/r/987145 (https://phabricator.wikimedia.org/T344610) (owner: Aklapper)
[15:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:37:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[15:41:03] (TfInfraTestDestroyFailed) resolved: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[15:42:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:42:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[15:42:37] (CR) David Caro: [C: +1] "LGTM" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/989219 (https://phabricator.wikimedia.org/T346631) (owner: FNegri)
[15:44:33] (CR) Majavah: [C: +1] "the Formatter is now the same on both types of loggers, so you could simplify the code a bit even further" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/989219 (https://phabricator.wikimedia.org/T346631) (owner: FNegri)
[15:47:03] (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:47:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:13:24] Toolforge (Toolforge iteration 03), Toolforge Build Service: [builds-api] Improve error message when logs time out - https://phabricator.wikimedia.org/T354755 (Raymond_Ndibe) @Slst2020 we have patches here https://phabricator.wikimedia.org/T354189 that increases the timeout to `10 minutes` (it was `1 min...
[16:22:29] Toolforge (Toolforge iteration 03), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: builds log streaming times out when time between two loglines exceeds ~1min - https://phabricator.wikimedia.org/T354189 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/...
[16:22:54] Toolforge (Toolforge iteration 03), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: builds log streaming times out when time between two loglines exceeds ~1min - https://phabricator.wikimedia.org/T354189 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/...
[16:32:58] Toolforge (Toolforge iteration 03), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: builds log streaming times out when time between two loglines exceeds ~1min - https://phabricator.wikimedia.org/T354189 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https...
[16:39:49] Cloud-VPS, cloud-services-team: nova-api seems to die after a while, complains of a full listen queue - https://phabricator.wikimedia.org/T354483 (fnegri)
[16:41:48] Cloud-VPS, cloud-services-team: nova-api seems to die after a while, complains of a full listen queue - https://phabricator.wikimedia.org/T354483 (Andrew) for now, the way to resolve this is to connect to the affected cloudcontrol and ` $ sudo systemctl restart nova-api ` As long as only one of our th...
[16:45:41] (PrometheusRestarted) firing: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=codfw%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[16:48:43] !log fran@wmf3169 admin START - Cookbook wmcs.do_log_msg (T346631)
[16:48:43] !log fran@wmf3169 admin test message3 from local cookbook (T346631)
[16:48:43] !log fran@wmf3169 admin END (PASS) - Cookbook wmcs.do_log_msg (exit_code=0) (T346631)
[16:48:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:48:50] T346631: [wmcs-cookbooks] SAL messages are shown differently when logging via wm-bot - https://phabricator.wikimedia.org/T346631
[16:48:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:48:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:49:30] (PS5) FNegri: SAL logging: invert user and project [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/989219 (https://phabricator.wikimedia.org/T346631)
[16:56:07] (PS6) FNegri: SAL logging: invert user and project [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/989219 (https://phabricator.wikimedia.org/T346631)
[16:56:51] (CR) FNegri: SAL logging: invert user and project (2 comments) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/989219 (https://phabricator.wikimedia.org/T346631) (owner: FNegri)
[16:57:34] (PS7) FNegri: SAL logging: invert user and project [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/989219 (https://phabricator.wikimedia.org/T346631)
[17:10:41] (PrometheusRestarted) resolved: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=codfw%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[17:37:41] (CloudVPSDesignateLeaks) firing: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:42:41] (CloudVPSDesignateLeaks) firing: (2) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:53:31] Toolforge, cloud-services-team: Send "are you there?" email to tool labs members every 3 months to revalidate email address - https://phabricator.wikimedia.org/T148792 (bd808) 5+ years later we continue to have the same problem of unreachable tool maintainers from the #Toolforge admin side. The process o...
[17:55:15] Toolforge, cloud-services-team: Send "are you there?" email to Toolforge members every 3 months to revalidate email address - https://phabricator.wikimedia.org/T148792 (bd808)
[17:55:17] Toolforge (Toolforge iteration 03), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: builds log streaming times out when time between two loglines exceeds ~1min - https://phabricator.wikimedia.org/T354189 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/...
[18:07:47] Quarry: build container on PR - https://phabricator.wikimedia.org/T316958 (rook) Open→Resolved
[18:07:53] Quarry, GitLab (Project Migration): Move Quarry from Gerrit to GitHub - https://phabricator.wikimedia.org/T308978 (rook)
[18:08:59] cloud-services-team (FY2023/2024-Q1-Q2), Openstack-Magnum, Goal: Magnum in Horizon (magnum-ui) in codfw1dev - https://phabricator.wikimedia.org/T328711 (rook) Open→Resolved
[18:12:43] Cloud-VPS: Have testlabs-terraform run somewhere and alert on failures - https://phabricator.wikimedia.org/T338636 (rook) This is running on the bastion node in the tf-infra-test project and alerts on alerts.wikimedia.org via /var/lib/prometheus/node.d/ files
[18:13:05] (CR) Subramanya Sastry: [C: +1] Update channel config for #mediawiki-parsoid (1 comment) [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/987992 (owner: Reedy)
[18:13:13] Cloud-VPS: Have testlabs-terraform run somewhere and alert on failures - https://phabricator.wikimedia.org/T338636 (rook) Open→Resolved a:rook
[18:13:27] Cloud-VPS: Cannot remove things in testlabs - https://phabricator.wikimedia.org/T339012 (rook) Open→Resolved
[18:14:48] superset.wmcloud.org: Process for replicating DBs between clusters - https://phabricator.wikimedia.org/T343527 (rook) Process described in README
[18:14:58] superset.wmcloud.org: Process for replicating DBs between clusters - https://phabricator.wikimedia.org/T343527 (rook) Open→Resolved a:rook
[18:15:00] superset.wmcloud.org: Move superset DB back inside k8s - https://phabricator.wikimedia.org/T343526 (rook)
[18:15:02] superset.wmcloud.org: enable on Async Queries via Celery on https://superset.wmcloud.org/ - https://phabricator.wikimedia.org/T340623 (rook)
[18:15:42] superset.wmcloud.org: automate superset db backup - https://phabricator.wikimedia.org/T342699 (rook) Open→Resolved a:rook
[18:15:44] superset.wmcloud.org: Move superset DB back inside k8s - https://phabricator.wikimedia.org/T343526 (rook)
[18:16:08] Cloud-VPS: codfw1dev not updating dns? - https://phabricator.wikimedia.org/T345734 (rook) Open→Resolved
[18:17:00] Cloud-VPS: Unable to delete proxy codfw1dev - https://phabricator.wikimedia.org/T345739 (rook) Open→Resolved
[18:17:20] Cloud-VPS: kolla to bookworm - https://phabricator.wikimedia.org/T347715 (rook) Open→Declined
[18:17:30] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (rook) In progress→Resolved
[18:17:47] (CR) Reedy: Update channel config for #mediawiki-parsoid (1 comment) [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/987992 (owner: Reedy)
[18:18:33] cloud-services-team: bare metal deploy poc - https://phabricator.wikimedia.org/T348461 (rook) Stalled→Declined
[18:18:35] cloud-services-team: [research] kolla-ansible poc - https://phabricator.wikimedia.org/T348457 (rook)
[18:18:39] cloud-services-team: kolla ceph integration - https://phabricator.wikimedia.org/T348460 (rook) Open→Declined
[18:18:41] cloud-services-team: [research] kolla-ansible poc - https://phabricator.wikimedia.org/T348457 (rook)
[18:18:46] cloud-services-team: [research] kolla-ansible poc - https://phabricator.wikimedia.org/T348457 (rook)
[18:18:48] cloud-services-team: ldap for kolla - https://phabricator.wikimedia.org/T348458 (rook) Open→Declined
[18:18:55] cloud-services-team: [research] kolla-ansible poc -
https://phabricator.wikimedia.org/T348457 (10rook) 05Open→03Resolved [18:19:11] 10cloud-services-team: bare metal deploy poc - https://phabricator.wikimedia.org/T348461 (10rook) [18:19:13] 10cloud-services-team: Need baremetal system(s) with internet access - https://phabricator.wikimedia.org/T349003 (10rook) 05Open→03Declined [18:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [18:37:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [18:38:34] 10Toolforge: Setup Pint for tools-prometheus - https://phabricator.wikimedia.org/T354760 (10taavi) a:03taavi Still need to wire up an alert. [18:41:46] (03PS1) 10Majavah: channels: Route #openstack-magnum phab tag to -cloud-feed [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/989571 [19:47:41] (CloudVPSDesignateLeaks) firing: (2) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [19:52:41] (CloudVPSDesignateLeaks) resolved: (2) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [20:56:10] (03CR) 10Majavah: [C: 03+2] channels: Route #openstack-magnum phab tag to -cloud-feed [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/989571 (owner: 10Majavah) [20:56:45] (03Merged) 10jenkins-bot: channels: Route #openstack-magnum phab tag to -cloud-feed 
[labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/989571 (owner: 10Majavah) [21:29:33] (SystemdUnitDown) firing: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [21:34:33] (SystemdUnitDown) firing: (2) The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [21:36:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [21:37:03] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [21:44:33] (SystemdUnitDown) firing: (2) The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [21:50:41] 10Toolforge: Webservice crashes loudly when out of deployment quota - https://phabricator.wikimedia.org/T354808 (10taavi) [21:54:33] (SystemdUnitDown) resolved: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [22:02:31] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers [22:02:43] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers [22:23:37] (03PS3) 10Reedy: Update channel config for #mediawiki-parsoid [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/987992 [22:35:33] (SystemdUnitDown) firing: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [22:40:33] (SystemdUnitDown) firing: (2) The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [22:55:33] (SystemdUnitDown) resolved: (2) The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:37:33] (SystemdUnitDown) firing: (2) The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:42:33] (SystemdUnitDown) resolved: (2) The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:45:33] (SystemdUnitDown) firing: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:50:33] (SystemdUnitDown) firing: (2) The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:55:33] (SystemdUnitDown) resolved: (2) The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [23:55:46] 10Toolforge (Toolforge iteration 03): Create a kubernetes container with mono and dotnet - https://phabricator.wikimedia.org/T311466 (10Hawkeye7) I attach a copy of the build output. {F41664508}