[01:11:01] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)
[01:11:08] T401693: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693
[01:16:09] PROBLEM - Host cloudcephosd1045 is DOWN: PING CRITICAL - Packet loss = 100%
[01:17:37] RECOVERY - Host cloudcephosd1045 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[01:20:41] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T401693)
[01:20:48] T401693: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693
[01:20:49] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)
[02:03:18] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on cloudweb1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:40:02] cloud-services-team, decommission-hardware: decommission cloudcephosd1004-10015 - https://phabricator.wikimedia.org/T402881 (Andrew) NEW
[03:52:26] cloud-services-team, Cloud-VPS: [tofu-cloudvps] cloudvps_puppet_prefix.hiera settings show dirty diffs based on YAML canonicalization - https://phabricator.wikimedia.org/T398643#11117661 (bd808) >>! In T398643#10973146, @taavi wrote: > I'm fairly sure the provider actually transforms the YAML string into...
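The T398643 report is about comparing hiera settings as raw YAML text: two strings that differ only in key order or whitespace describe the same data, yet still show up as a diff. A minimal Python sketch of the general remedy, comparing a canonical serialization instead of the raw text. It is not the provider's or tofu-cloudvps's code; `canonicalize` and `has_real_diff` are hypothetical names, and the inputs are JSON (valid YAML 1.2) only so that the sketch needs nothing beyond the standard library.

```python
import json


def canonicalize(text: str) -> str:
    """Parse a settings document and re-emit it in one canonical form.

    Hypothetical helper: JSON input (a subset of YAML 1.2) keeps this
    sketch dependency-free; a real implementation would parse YAML.
    """
    data = json.loads(text)
    # Sorted keys and fixed separators make the output independent of
    # the original key order and whitespace.
    return json.dumps(data, sort_keys=True, separators=(",", ":"))


def has_real_diff(desired: str, actual: str) -> bool:
    """Report a diff only when the parsed data differs, not the layout."""
    return canonicalize(desired) != canonicalize(actual)


desired = '{"profile::foo": 1, "profile::bar": [1, 2]}'
actual = '{\n  "profile::bar": [1, 2],\n  "profile::foo": 1\n}'
print(has_real_diff(desired, actual))  # False: same data, different layout
```

Later in the log, bd808's terraform-cloudvps MR ("Generate YAML with `yamlencode` equivalent") appears to attack the same problem from the generating side instead.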
[03:53:38] cloud-services-team, Cloud-VPS: [tofu-cloudvps] cloudvps_puppet_prefix.hiera settings show dirty diffs based on YAML canonicalization - https://phabricator.wikimedia.org/T398643#11117662 (bd808) Open→In progress a:bd808
[08:07:53] (update) vriaa: feat: Add banner code generation feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/8
[08:19:05] PROBLEM - nova-compute proc minimum on cloudvirt1061 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[08:20:05] RECOVERY - nova-compute proc minimum on cloudvirt1061 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[08:38:21] cloud-services-team, Cloud-VPS: Use cloud-private network and cfssl certs for instance live migrations - https://phabricator.wikimedia.org/T355145#11117933 (fgiunchedi) Open→Resolved a:fgiunchedi This is done -- we're now using cfssl certs and private hostnames for live migrations
[09:17:46] (update) vriaa: feat: Add banner code generation feature [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/8
[10:09:56] (update) dcaro: api: add `include_unset` parameter to get_job and get_jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/205 (https://phabricator.wikimedia.org/T402569)
[10:31:28] FIRING: InstanceDown: Project tools instance tools-harbor-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:31:35] (CR) FNegri: vps: Add cookbook to delete a project (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1139027 (https://phabricator.wikimedia.org/T391836) (owner: Majavah)
[10:46:54] Tools, Wiki-Loves-Monuments: Redirect toolserver.org/~erfgoed/stream/ - https://phabricator.wikimedia.org/T175671#11118502 (Ciell) Open→Resolved a:Ciell
[10:51:34] (PS1) Muehlenhoff: Remove obsolete stub keytabs [labs/private] - https://gerrit.wikimedia.org/r/1182114 (https://phabricator.wikimedia.org/T396487)
[10:53:15] (CR) Muehlenhoff: [V:+2 C:+2] Remove obsolete stub keytabs [labs/private] - https://gerrit.wikimedia.org/r/1182114 (https://phabricator.wikimedia.org/T396487) (owner: Muehlenhoff)
[11:16:28] RESOLVED: InstanceDown: Project tools instance tools-harbor-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[12:23:19] (update) dcaro: dump: skip unset keys [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/124
[12:27:47] (open) raymond-ndibe: [tool-config] handle unset and default arguments consistently [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/123 (https://phabricator.wikimedia.org/T402572)
[12:31:08] (update) dcaro: dump: skip unset keys [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/124
[12:39:24] cloud-services-team, Toolforge: [components-api] split source from config - https://phabricator.wikimedia.org/T402790#11118916 (dcaro) I think it's too many different places to configure, one of the goals of the `tool config` is to have everything in one place (say a yaml file), so you don't need to go d...
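Several of the changes above (`include_unset` for get_job/get_jobs, "dump: skip unset keys", and handling unset and default arguments consistently) depend on telling a field the user never set apart from one explicitly set to its default value. A minimal Python sketch of one common way to do this, a private sentinel object; the `JobConfig` fields, `DEFAULTS` and `dump` are hypothetical and not taken from jobs-api, jobs-cli or components-api.

```python
from dataclasses import dataclass, fields
from typing import Any


class _Unset:
    """Sentinel type marking a field the user never provided."""

    def __repr__(self) -> str:
        return "UNSET"


UNSET: Any = _Unset()


@dataclass
class JobConfig:
    # Hypothetical fields; the real job schema may differ.
    name: str
    schedule: Any = UNSET
    retry: Any = UNSET


# Hypothetical defaults used only when unset fields are requested.
DEFAULTS = {"schedule": None, "retry": 0}


def dump(cfg: JobConfig, include_unset: bool = False) -> dict:
    """Serialize cfg, skipping unset fields unless include_unset is True.

    With include_unset=True, unset fields are filled in from DEFAULTS.
    """
    out = {}
    for f in fields(cfg):
        value = getattr(cfg, f.name)
        if value is UNSET:
            if not include_unset:
                continue
            value = DEFAULTS[f.name]
        out[f.name] = value
    return out


cfg = JobConfig(name="myjob", retry=0)  # retry explicitly set to its default
print(dump(cfg))  # {'name': 'myjob', 'retry': 0}
print(dump(cfg, include_unset=True))  # {'name': 'myjob', 'schedule': None, 'retry': 0}
```

A plain `None` default cannot make this distinction when `None` is itself a valid value, which is why a dedicated sentinel is used here.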
[12:49:11] (open) dcaro: components api feature/add lima kilo to readme [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/124
[12:49:20] (close) dcaro: Add basic instructions for deploying into lima-kilo [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/120 (owner: damian)
[12:50:22] (update) dcaro: lima-kilo: update readme and local deploy settings [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/124
[12:55:31] (update) raymond-ndibe: [tool-config] handle unset and default arguments consistently [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/123 (https://phabricator.wikimedia.org/T402572)
[13:06:36] Toolforge (Toolforge iteration 23): [builds-service] builds not working due to access issues in tools - https://phabricator.wikimedia.org/T402923 (dcaro) NEW
[13:06:51] Toolforge (Toolforge iteration 23): [builds-service] builds not working due to access issues in tools - https://phabricator.wikimedia.org/T402923#11119021 (dcaro) Open→In progress p:Triage→High
[13:08:18] cloud-services-team, DC-Ops, ops-eqiad, SRE: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693#11119026 (Andrew) Open→Resolved
[13:18:34] Toolforge (Toolforge iteration 23): [builds-service] builds not working due to access issues in tools - https://phabricator.wikimedia.org/T402923#11119064 (Raymond_Ndibe) looking at this
[13:20:02] Toolforge (Toolforge iteration 23): [builds-service] builds not working due to access issues in tools - https://phabricator.wikimedia.org/T402923#11119065 (dcaro) Trying to pull + push a single image from a tool repo using the robot account ends in 500 error: ` dcaro@acme$ podman login https://tools-harbor.w...
[13:23:43] Toolforge (Toolforge iteration 23): [builds-service] builds not working due to access issues in tools - https://phabricator.wikimedia.org/T402923#11119089 (dcaro) Using the admin account also fails, so the error is not permissions, but something else
[13:31:30] Toolforge (Toolforge iteration 23): [builds-service] builds not working due to access issues in tools - https://phabricator.wikimedia.org/T402923#11119127 (Raymond_Ndibe) might be worth it to check the storage quota of `harborstorage` s3 bucket. The fact that is it was working intially but stopped suddenly m...
[13:31:54] (update) fnegri: Setup pytest, add first test [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/4
[13:33:57] Guest31:
[13:35:42] (update) fnegri: Setup pytest, add first test [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/4
[13:42:11] Toolforge (Toolforge iteration 23): [builds-service] builds not working due to access issues in tools - https://phabricator.wikimedia.org/T402923#11119180 (Raymond_Ndibe) yeaaaa, I think I see where the problem is coming from. ` raymond-ndibe@cloudcontrol1006:~$ sudo radosgw-admin user info --uid tools\$tool...
[13:43:07] Toolforge (Toolforge iteration 23): [builds-service] builds not working due to access issues in tools - https://phabricator.wikimedia.org/T402923#11119186 (Raymond_Ndibe) >>! In T402923#11119183, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#wikimedia-cloud), href=https://sal.toolforge.org/log/W...
[13:43:44] Toolforge (Toolforge iteration 23): [builds-service] builds not working due to access issues in tools - https://phabricator.wikimedia.org/T402923#11119189 (dcaro) @Raymond_Ndibe I increased the quota too, for issues like this, can you drop to irc instead? it's way easier to coordinate
[13:44:47] VPS-Projects, Content-Transform-Team (Work In Progress), Essential-Work: Request new VPS for Content Transform Team Visual Diff testing - https://phabricator.wikimedia.org/T402836#11119196 (cscott)
[13:44:58] (update) fnegri: Setup pytest, add first test [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/4
[13:45:28] Toolforge (Toolforge iteration 23): [builds-service] builds not working due to access issues in tools - https://phabricator.wikimedia.org/T402923#11119203 (Raymond_Ndibe) >>! In T402923#11119189, @dcaro wrote: > @Raymond_Ndibe I increased the quota too, for issues like this, can you drop to irc instead? it's...
[13:50:58] (approved) dcaro: lima-kilo: update readme and local deploy settings [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/124
[13:51:02] (merge) dcaro: lima-kilo: update readme and local deploy settings [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/124
[13:51:57] (approved) dcaro: harbor - only download and setup once [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/265 (owner: damian)
[13:52:08] (merge) dcaro: harbor - only download and setup once [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/265 (owner: damian)
[13:53:05] (approved) dcaro: harbor - move restart to handler [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/267 (owner: damian)
[13:53:33] (update) dcaro: harbor - move restart to handler [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/267 (owner: damian)
[13:53:57] (approved) dcaro: kubectl alias - use blockinfile [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/262 (owner: damian)
[13:55:12] cloud-services-team, Toolforge: [components-api] split source from config - https://phabricator.wikimedia.org/T402790#11119256 (DamianZaremba) >>! In T402790#11118916, @dcaro wrote: > I think it's too many different places to configure, one of the goals of the `tool config` is to have everything in one p...
[13:56:36] (merge) dcaro: harbor - move restart to handler [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/267 (owner: damian)
[13:58:33] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.144-20250826135230-8cd71749 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/934
[14:02:35] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0) (T401693)
[14:02:43] T401693: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693
[14:16:24] (PS1) Andrew Bogott: osd undrain_node: change default batch size to 2 [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1182153
[14:22:51] cloud-services-team, Cloud-VPS, DNS, SRE, Traffic: PDNS in cloud can return inconsistent answers - https://phabricator.wikimedia.org/T281700#11119387 (ssingh) Open→Resolved a:ssingh Some quick notes: - We are running `pdns-recursor` 4.8 in production, with an upgrade to 5 in...
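The undrain_node patch above only changes a default, but the batching it controls can be pictured as walking a node's OSDs in fixed-size groups. A minimal Python sketch, assuming the cookbook processes a list of OSD ids this way; `batched_osds` and the ids are hypothetical, not the wmcs-cookbooks implementation.

```python
from typing import Iterator, List, Sequence


def batched_osds(osd_ids: Sequence[int], batch_size: int = 2) -> Iterator[List[int]]:
    """Yield OSD ids in consecutive groups of at most batch_size.

    Hypothetical helper; the default of 2 mirrors the new default in the patch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(osd_ids), batch_size):
        yield list(osd_ids[start:start + batch_size])


print(list(batched_osds([10, 11, 12, 13, 14])))  # [[10, 11], [12, 13], [14]]
```

A smaller batch means fewer OSDs change state at once, at the cost of more rounds for the whole node.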
[15:02:25] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[15:05:09] (CR) David Caro: [C:+1] "LGTM" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1182153 (owner: Andrew Bogott)
[15:14:25] (update) dcaro: kubectl alias - use blockinfile [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/262 (owner: damian)
[15:15:32] (merge) dcaro: kubectl alias - use blockinfile [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/262 (owner: damian)
[15:17:22] (update) dcaro: docker - move restart to handler [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/264 (owner: damian)
[15:17:33] (approved) dcaro: docker - move restart to handler [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/264 (owner: damian)
[15:17:38] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence, Patch-For-Review: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11119601 (fnegri) p:Triage→Medium
[15:18:32] (merge) dcaro: docker - move restart to handler [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/264 (owner: damian)
[15:23:05] Toolforge (Toolforge iteration 23): [harbor,infra] gather stats about object storage qutoa usage and add an alert when tools is getting out of quota - https://phabricator.wikimedia.org/T402932 (dcaro) NEW
[15:31:23] (CR) Andrew Bogott: [C:+2] osd undrain_node: change default batch size to 2 [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1182153 (owner: Andrew Bogott)
[15:31:34] (update) raymond-ndibe: [tool-config] handle unset and default arguments consistently [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/123 (https://phabricator.wikimedia.org/T402572)
[15:33:32] cloud-services-team: KernelErrors Server cloudcephosd1048 logged kernel errors - https://phabricator.wikimedia.org/T402699#11119763 (fnegri) Open→Resolved a:fnegri I think this is {T402646} that re-triggered because a silence expired.
[15:33:49] (update) raymond-ndibe: [config] support port protocol [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T401994)
[15:35:20] (Merged) jenkins-bot: osd undrain_node: change default batch size to 2 [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1182153 (owner: Andrew Bogott)
[15:35:45] (update) damian: Add validated type for git urls [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/121
[15:36:40] cloud-services-team, DC-Ops, SRE: cloudcephosd10[48-52] service implementation - https://phabricator.wikimedia.org/T395910#11119794 (Andrew)
[15:37:09] cloud-services-team, Toolforge (Toolforge iteration 23): [components-api] allow specifying `source_repo`+`ref` for the config - https://phabricator.wikimedia.org/T402764#11119800 (DamianZaremba) I had to make some changes due to no writeable temp directory, but the above appears to work. Some testing ou...
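T402932 (filed after the harbor builds outage, where the `harborstorage` bucket quota was the suspected cause and the quota was raised) asks for quota-usage stats plus an alert before tools runs out. A minimal Python sketch of the alerting rule only; `BucketUsage`, `quota_ratio`, `should_alert` and the 90% threshold are hypothetical, the usage figures are assumed to be gathered elsewhere, and nothing here queries radosgw or Prometheus.

```python
from dataclasses import dataclass


@dataclass
class BucketUsage:
    # Hypothetical shape: bytes used and the quota ceiling for one bucket
    # or user, assumed to be collected by some separate stats job.
    name: str
    used_bytes: int
    quota_bytes: int


def quota_ratio(usage: BucketUsage) -> float:
    """Fraction of the quota in use.

    Assumption for this sketch: a non-positive quota means "no quota set",
    so it never triggers the alert.
    """
    if usage.quota_bytes <= 0:
        return 0.0
    return usage.used_bytes / usage.quota_bytes


def should_alert(usage: BucketUsage, threshold: float = 0.9) -> bool:
    """Fire while there is still headroom, before writes start failing."""
    return quota_ratio(usage) >= threshold


harbor = BucketUsage("harborstorage", used_bytes=95 * 2**30, quota_bytes=100 * 2**30)
print(should_alert(harbor))  # True (95% of the quota used)
```

Alerting on a ratio rather than on absolute free space keeps one rule usable across buckets with very different quotas.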
[15:38:06] cloud-services-team: PuppetFailure - https://phabricator.wikimedia.org/T402562#11119808 (fnegri) Open→Resolved a:fnegri
[15:38:23] (update) dcaro: tool home dir - update permissions [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/268 (owner: damian)
[15:38:27] (approved) dcaro: tool home dir - update permissions [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/268 (owner: damian)
[15:39:38] (merge) dcaro: tool home dir - update permissions [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/268 (owner: damian)
[15:39:48] (update) raymond-ndibe: [config] support port protocol [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T401994)
[15:39:56] (update) raymond-ndibe: [config] support port protocol [repos/cloud/toolforge/components-api] (handle_unset_and_default_arguments_consistently) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T401994)
[15:40:06] (update) raymond-ndibe: [config] support port protocol [repos/cloud/toolforge/components-api] (handle_unset_and_default_arguments_consistently) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T401994)
[15:41:04] (update) raymond-ndibe: [tool-config] handle unset and default arguments consistently [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/123 (https://phabricator.wikimedia.org/T402572)
[15:49:18] FIRING: KernelErrors: Server cloudcephosd1052 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1052 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[15:49:24] cloud-services-team: KernelErrors Server cloudcephosd1052 logged kernel errors - https://phabricator.wikimedia.org/T402938 (phaultfinder) NEW
[15:54:55] cloud-services-team: KernelErrors Server cloudcephosd1052 logged kernel errors - https://phabricator.wikimedia.org/T402938#11119950 (fnegri) Some errors were logged while setting up this new host. The lvm2 errors can be ignored (race condition, same as T402475). The firmware errors might require more inves...
[16:00:44] (update) dcaro: dump: skip unset keys [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/124
[16:00:51] cloud-services-team: KernelErrors Server cloudcephosd1052 logged kernel errors - https://phabricator.wikimedia.org/T402938#11120001 (Andrew) a:Jclark-ctr @Jclark-ctr We're pretty sure those lvm issues can be ignored, but the firmware errors are concerning; do you know what fw version we want to run for t...
[16:03:36] cloud-services-team, DC-Ops, SRE, Patch-For-Review: cloudcephosd10[48-52] service implementation - https://phabricator.wikimedia.org/T395910#11120025 (Andrew)
[16:05:44] (update) dcaro: api: add `include_unset` parameter to get_job and get_jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/205 (https://phabricator.wikimedia.org/T402569)
[16:22:54] (update) dcaro: api: add `include_unset` parameter to get_job and get_jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/205 (https://phabricator.wikimedia.org/T402569)
[16:23:37] (update) dcaro: api: add `include_unset` parameter to get_job and get_jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/205 (https://phabricator.wikimedia.org/T402569)
[17:08:53] cloud-services-team, Toolforge: Duplicated utils script - https://phabricator.wikimedia.org/T402949 (fnegri) NEW
[17:09:04] cloud-services-team, Toolforge: Duplicated utils scripts - https://phabricator.wikimedia.org/T402949#11120440 (fnegri)
[17:09:43] cloud-services-team, Toolforge: Duplicated utils scripts - https://phabricator.wikimedia.org/T402949#11120441 (fnegri)
[17:30:17] cloud-services-team: KernelErrors Server cloudcephosd1052 logged kernel errors - https://phabricator.wikimedia.org/T402938#11120503 (Jclark-ctr) Firmware was 23.0.8 Updating to 24.0.5
[17:31:46] PROBLEM - Host cloudcephosd1052 is DOWN: PING CRITICAL - Packet loss = 100%
[17:35:47] FIRING: NodeDown: Node cloudcephosd1052 is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1052 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[17:36:14] RECOVERY - Host cloudcephosd1052 is UP: PING OK - Packet loss = 0%, RTA = 0.45 ms
[17:37:32] cloud-services-team: KernelErrors Server cloudcephosd1052 logged kernel errors - https://phabricator.wikimedia.org/T402938#11120554 (Jclark-ctr) @Andrew If errors continue, we might need to upgrade to a newer version of Debian. I imaged the server with Bullseye today
[17:40:47] RESOLVED: NodeDown: Node cloudcephosd1052 is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1052 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[18:11:03] cloud-services-team, Toolforge (Toolforge iteration 23): [jobs-api] make job status an enum, with clearly defined states - https://phabricator.wikimedia.org/T401172#11120726 (Raymond_Ndibe) >>! In T401172#11110418, @DamianZaremba wrote: > This sounds like a good improvement. > > Just a question regardin...
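T401172 asks for the job status to become an enum with clearly defined states. A minimal Python sketch of that shape; the state names, which of them count as terminal, and the fallback to UNKNOWN are all hypothetical, since the actual set of states is still being worked out in the task discussion.

```python
from enum import Enum


class JobStatus(str, Enum):
    """Hypothetical job states, not the agreed jobs-api list."""

    PENDING = "pending"      # e.g. pod still waiting to be scheduled
    RUNNING = "running"
    SUCCEEDED = "succeeded"  # in this sketch, only meaningful for one-off jobs
    FAILED = "failed"
    UNKNOWN = "unknown"      # anything the API cannot classify


# Terminal states: a job in one of these will not change state on its own.
TERMINAL = frozenset({JobStatus.SUCCEEDED, JobStatus.FAILED})


def parse_status(raw: str) -> JobStatus:
    """Map a raw status string to the enum, falling back to UNKNOWN."""
    try:
        return JobStatus(raw.lower())
    except ValueError:
        return JobStatus.UNKNOWN


print(parse_status("Running").value)  # running
print(parse_status("bogus") is JobStatus.UNKNOWN)  # True
```

Subclassing `str` keeps the enum members directly serializable as their plain string values, which suits a JSON API.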
[18:12:08] (open) bd808: puppet_prefix: Generate YAML with `yamlencode` equivalent [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/10 (https://phabricator.wikimedia.org/T398643)
[18:12:27] (update) bd808: puppet_prefix: Generate YAML with `yamlencode` equivalent [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/10 (https://phabricator.wikimedia.org/T398643)
[18:12:33] (update) bd808: puppet_prefix: Generate YAML with `yamlencode` equivalent [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/10 (https://phabricator.wikimedia.org/T398643)
[18:25:38] cloud-services-team, DC-Ops, ops-eqiad: KernelErrors Server cloudcephosd1052 logged kernel errors - https://phabricator.wikimedia.org/T402938#11120820 (Jclark-ctr)
[18:42:55] cloud-services-team, Toolforge (Toolforge iteration 23): [jobs-api] make job status an enum, with clearly defined states - https://phabricator.wikimedia.org/T401172#11120916 (Raymond_Ndibe) * **one-off | continuous jobs**: for one-off jobs this can mean the pod is still waiting to be scheduled, pod...
[19:35:07] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=97)
[19:39:52] (update) bd808: puppet_prefix: Generate YAML with `yamlencode` equivalent [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/10 (https://phabricator.wikimedia.org/T398643)
[20:04:40] (PS1) Wandji collins: Remove the backend of this project to a new gitlab repo Update and test this repo to work as expected. [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1182211
[20:05:17] (CR) CI reject: [V:-1] Remove the backend of this project to a new gitlab repo Update and test this repo to work as expected. [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1182211 (owner: Wandji collins)
[20:07:31] (PS2) Wandji collins: Remove the backend of this project to a new gitlab repo Update and test this repo to work as expected. [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1182211
[20:07:57] (CR) CI reject: [V:-1] Remove the backend of this project to a new gitlab repo Update and test this repo to work as expected. [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1182211 (owner: Wandji collins)
[20:08:47] (PS3) Wandji collins: Remove the backend of this project to a new gitlab repo Update and test this repo to work as expected. [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1182211
[20:57:29] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[22:11:13] (PS4) Wandji collins: Remove the backend of this project to a new gitlab repo Update and test this repo to work as expected. [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1182211
[22:34:54] (PS1) Bovimacoco: T398344: Fix Firefox compatibility issue [labs/tools/mostvisitedarticle] - https://gerrit.wikimedia.org/r/1182233
[22:35:06] (PS5) Wandji collins: Remove the backend of this project to a new gitlab repo Update and test this repo to work as expected. [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1182211
[23:37:33] (update) pepepiton: Fix search field alignment and improve dropdown selection UX [toolforge-repos/paulina] - https://gitlab.wikimedia.org/toolforge-repos/paulina/-/merge_requests/7 (owner: josefanthony)
[23:57:42] Tool-paulina: Add autocomplete suggestions to search bar using Wikidata wbsearchentities API - https://phabricator.wikimedia.org/T402458#11121943 (Pepe_piton) Great progress! I just changed a little thing in the API call so that the autocomplete is displayed in the user's language. One thing it would be awe...
[23:59:51] (update) bd808: puppet_prefix: Generate YAML with `yamlencode` equivalent [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/10 (https://phabricator.wikimedia.org/T397994 https://phabricator.wikimedia.org/T398643)