[00:21:56] FIRING: SystemdUnitDown: The service unit kiwix-mirror-update.service is in failed status on host clouddumps1002. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1002 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:34:33] FIRING: KernelErrors: Server cloudcephosd1048 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1048 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[01:24:41] (update) samwilson: Update repo URL in toolinfo.json [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/3 (https://phabricator.wikimedia.org/T395398)
[01:41:59] Cloud-VPS (Project-requests): Request creation of eseap VPS project - https://phabricator.wikimedia.org/T401957#11112307 (Chlod) Likewise, I'm aware of the effort and responsibilities, and I'll help out with managing project resources.
[02:16:56] FIRING: SystemdUnitDown: The systemd unit kiwix-mirror-update.service on node clouddumps1002 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1002 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[02:17:06] cloud-services-team: SystemdUnitDown The systemd unit kiwix-mirror-update.service on node clouddumps1002 has been failing for more than two hours. - https://phabricator.wikimedia.org/T402708 (phaultfinder) NEW
[03:43:49] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 6.546% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[04:02:17] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)
[04:02:24] T401693: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693
[04:06:02] PROBLEM - Host cloudcephosd1043 is DOWN: PING CRITICAL - Packet loss = 100%
[04:08:27] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T401693)
[04:08:30] RECOVERY - Host cloudcephosd1043 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms
[04:08:34] T401693: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693
[04:09:10] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[04:13:34] RESOLVED: DiskSpace: Disk space cloudbackup1004:9100:/srv 6.932% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[04:21:20] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)
[04:21:28] T401693: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693
[04:29:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[06:17:11] FIRING: SystemdUnitDown: The systemd unit kiwix-mirror-update.service on node clouddumps1002 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1002 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[08:16:56] RESOLVED: SystemdUnitDown: The service unit kiwix-mirror-update.service is in failed status on host clouddumps1002. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1002 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[08:16:56] RESOLVED: SystemdUnitDown: The systemd unit kiwix-mirror-update.service on node clouddumps1002 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1002 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:57:00] cloud-services-team, Toolforge: "toolforge-jobs list" error - "TjfCliError: Unable to find image in the supported list or harbor" - https://phabricator.wikimedia.org/T402724 (Bamyers99) NEW
[18:52:44] (PS1) Rehan_khan_78: Split main.css into multiple CSS files by page and update templates [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1181245
[19:15:11] (update) don-vip: Draft: DVIDS: incremental update service [toolforge-repos/spacemedia] - https://gitlab.wikimedia.org/toolforge-repos/spacemedia/-/merge_requests/4
[19:24:04] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0) (T401693)
[19:24:11] T401693: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693
[19:27:30] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)
[19:33:35] cloud-services-team, DC-Ops, ops-eqiad, SRE: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693#11112748 (Andrew)
[20:01:45] cloud-services-team, Toolforge (Toolforge iteration 23): [components-api,beta] Config not updated from remote source - https://phabricator.wikimedia.org/T401868#11112749 (DamianZaremba) Sorry, happened again.... Previous build from https://github.com/cluebotng/component-configs/actions/runs/17179445014/...
[20:05:09] cloud-services-team, Toolforge (Toolforge iteration 23): [components-api,beta] Config not updated from remote source - https://phabricator.wikimedia.org/T401868#11112750 (DamianZaremba) This endpoint is being cached: ` via: 1.1 varnish x-served-by: cache-lis1490020-LIS x-cache: HIT ` `raw.githubusercont...
[20:11:23] cloud-services-team, Toolforge (Toolforge iteration 23): [components-api,beta] Config not updated from remote source - https://phabricator.wikimedia.org/T401868#11112752 (DamianZaremba) Basic cache busting does not appear to work: ` $ for x in {1..3}; do curl -si https://raw.githubusercontent.com/cluebot...
[20:13:31] cloud-services-team, Toolforge (Toolforge iteration 23): [components-api,beta] Config not updated from remote source - https://phabricator.wikimedia.org/T401868#11112755 (DamianZaremba) I see 2 reasonable paths forward: * Deploy API accepts sha and/or url which is then used in combination/place of `sour...