[00:00:18] FIRING: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[00:30:18] RESOLVED: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[01:00:18] FIRING: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[01:30:18] RESOLVED: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[02:00:18] FIRING: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[02:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[02:30:18] RESOLVED: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[02:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:00:18] FIRING: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[03:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:30:18] RESOLVED: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[03:35:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:00:18] FIRING: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[04:30:18] RESOLVED: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[05:00:18] FIRING: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[05:30:18] RESOLVED: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[06:00:18] FIRING: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[06:30:18] RESOLVED: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[07:00:18] FIRING: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[07:30:18] RESOLVED: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[08:00:18] FIRING: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[08:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:30:18] RESOLVED: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[08:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:59:26] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10495818 (Silvan_WMDE) >>! In T376267#10493210, @Ladsgroup wrote: > Renamed and force attached your account. Thank you!
[09:00:18] FIRING: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[09:04:03] cloud-services-team, Toolforge (Toolforge iteration 17): toolsbeta: maintain-kubeusers not running because ImagePullBackOff - https://phabricator.wikimedia.org/T384809 (aborrero) NEW
[09:04:10] cloud-services-team, Toolforge (Toolforge iteration 17): toolsbeta: maintain-kubeusers not running because ImagePullBackOff - https://phabricator.wikimedia.org/T384809#10495844 (aborrero) p:Triage→Medium
[09:13:30] cloud-services-team, Toolforge (Toolforge iteration 17): toolsbeta: maintain-kubeusers not running because ImagePullBackOff - https://phabricator.wikimedia.org/T384809#10495860 (aborrero) Just now I saw: `` Failed to pull image "toolsbeta-harbor.wmcloud.org/toolforge/maintain-kubeusers │ │ :image-0.0.17...
[09:17:22] RESOLVED: MaintainKubeusersDown: maintain-kubeusers is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainKubeusersDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DMaintainKubeusersDown
[09:21:58] (open) hashar: phorge: move datacenter tasks to DC-Ops channel [toolforge-repos/wikibugs2] - https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/49 (https://phabricator.wikimedia.org/T384804)
[09:30:18] RESOLVED: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[09:40:12] cloud-services-team, Toolforge (Toolforge iteration 17): toolsbeta: maintain-kubeusers not running because ImagePullBackOff - https://phabricator.wikimedia.org/T384809#10495953 (aborrero) There seems to be something different in toolsbeta harbor compared to tools: `lang=shell-session arturo@nostromo:~ $...
[09:42:50] cloud-services-team, Toolforge (Toolforge iteration 17): toolsbeta: maintain-kubeusers not running because ImagePullBackOff - https://phabricator.wikimedia.org/T384809#10495973 (aborrero) more differences! `lang=shell-session aborrero@tools-harbor-1:/srv/ops/harbor$ sudo docker-compose ps Name...
[09:43:55] (open) hashar: Pin Flask to 3.0.3 due to incompat with Quart 0.19.4 [toolforge-repos/wikibugs2] - https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/50
[10:00:18] FIRING: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[10:07:22] cloud-services-team, Toolforge (Toolforge iteration 17): toolsbeta: maintain-kubeusers not running because ImagePullBackOff - https://phabricator.wikimedia.org/T384809#10496109 (Raymond_Ndibe) Hello Arturo, welcome back! Yea thanks for reporting this. This is partly my fault. I was working on something o...
[10:10:18] RESOLVED: AlertLintProblem: Linting problems found for MaintainKubeusersHang - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem
[10:35:17] cloud-services-team, Toolforge (Toolforge iteration 17): toolsbeta: maintain-kubeusers not running because ImagePullBackOff - https://phabricator.wikimedia.org/T384809#10496277 (aborrero)
[10:35:25] Toolforge (Toolforge iteration 17), Patch-For-Review: [infra,harbor] upgrade harbor v2.10.1 ---> v2.12.2 - https://phabricator.wikimedia.org/T384327#10496278 (aborrero)
[10:37:33] cloud-services-team, Toolforge (Toolforge iteration 17): toolsbeta: maintain-kubeusers not running because ImagePullBackOff - https://phabricator.wikimedia.org/T384809#10496289 (aborrero) p:Medium→Low
[11:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:11:32] FIRING: ToolsNfsAlmostFull: Toolforge NFS is 0.8580219143742968/1 full - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNfsAlmostFull - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNfsAlmostFull
[12:20:29] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - https://gerrit.wikimedia.org/r/1114368 (owner: L10n-bot)
[12:46:24] (update) raymond-ndibe: Draft: [jobs-api] further group code into api, business and runtime logic [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/91 (https://phabricator.wikimedia.org/T359804)
[12:47:26] (update) raymond-ndibe: [jobs-api] further group code into api, business and runtime logic [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/91 (https://phabricator.wikimedia.org/T359804)
[12:51:54] (update) raymond-ndibe: [jobs-api] create seperate api.py and move flask things there [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/91 (https://phabricator.wikimedia.org/T359804)
[12:52:16] (update) raymond-ndibe: [jobs-api] create seperate api.py and move flask things there [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/91 (https://phabricator.wikimedia.org/T359804)
[12:52:49] (update) raymond-ndibe: [jobs-api] create seperate api.py and move flask things there [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/91 (https://phabricator.wikimedia.org/T359804)
[13:10:43] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10496809 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin100...
[13:11:12] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10496811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumi...
[13:21:26] (open) raymond-ndibe: [jobs-api] remove wait_for_job from runtime methods [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/138 (https://phabricator.wikimedia.org/T359804)
[13:27:02] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[13:30:34] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[13:34:12] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[13:37:32] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[13:47:32] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[13:50:10] RESOLVED: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate pontoon-puppet-01.monitoring.eqiad.wmflabs is about to expire in 19d 2h 27m 27s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[13:51:02] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[13:52:20] (CR) Abijeet Patro: [V:+2] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - https://gerrit.wikimedia.org/r/1114368 (owner: L10n-bot)
[13:52:32] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-api
[13:52:39] !log raymond-ndibe@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api
[13:53:00] (Abandoned) Abijeet Patro: Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - https://gerrit.wikimedia.org/r/1113784 (owner: L10n-bot)
[13:53:58] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10496953 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumi...
[14:31:10] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10497111 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin100...
[14:31:34] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[14:35:13] PROBLEM - Host cloudcephosd1013 is DOWN: PING CRITICAL - Packet loss = 100%
[14:35:49] RECOVERY - Host cloudcephosd1013 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms
[14:38:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[14:39:38] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99)
[14:42:56] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node
[14:43:39] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0)
[14:46:03] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[14:46:09] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[14:46:48] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[14:46:51] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[14:55:21] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[14:55:23] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[14:55:58] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[14:56:00] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[14:59:31] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[14:59:34] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[15:00:14] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[15:00:16] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[15:03:12] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[15:03:15] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[15:04:08] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[15:04:10] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[15:04:56] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[15:04:58] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[15:11:27] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[15:11:29] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[15:11:59] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[15:12:02] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[15:24:09] Tools: [ErinnerMichBot] Query current page title before posting reminder - https://phabricator.wikimedia.org/T381563#10497390 (Tkarcher) Invalid→Open Back to Phabricator after failed attempt to use Gitlab issues...
[15:24:56] Tool-erinnermich: [ErinnerMichBot] Query current page title before posting reminder - https://phabricator.wikimedia.org/T381563#10497392 (Tkarcher) a:Tkarcher
[15:30:12] Tool-erinnermich: Possible support for other languages and projects? - https://phabricator.wikimedia.org/T384842 (Tkarcher) NEW
[15:31:08] Tool-erinnermich: [ErinnerMichBot] Possible support for other languages and projects? - https://phabricator.wikimedia.org/T384842#10497422 (Tkarcher)
[15:32:26] Tool-erinnermich: [ErinnerMichBot] Possible support for other languages and projects? - https://phabricator.wikimedia.org/T384842#10497425 (Tkarcher) Comment from @M-J copied over here from Gitlab: about 1: I think a bot operator is sufficient, who can support. a native speaker is also neccessary to transla...
[15:45:16] Toolforge (Toolforge iteration 17): [components-api] skip functional tests for tools - https://phabricator.wikimedia.org/T384843 (Raymond_Ndibe) NEW
[15:45:28] Toolforge (Toolforge iteration 17): [components-api] skip functional tests for tools - https://phabricator.wikimedia.org/T384843#10497464 (Raymond_Ndibe) a:Raymond_Ndibe
[15:47:10] (open) raymond-ndibe: [components-api] skip tests for tools [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/657 (https://phabricator.wikimedia.org/T384843)
[15:48:38] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
[15:56:44] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
[15:57:35] (approved) raymond-ndibe: [components-api] skip tests for tools [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/657 (https://phabricator.wikimedia.org/T384843)
[15:57:39] (merge) raymond-ndibe: [components-api] skip tests for tools [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/657 (https://phabricator.wikimedia.org/T384843)
[15:58:21] Toolforge (Toolforge iteration 17), Patch-For-Review: [components-api] skip functional tests for tools - https://phabricator.wikimedia.org/T384843#10497513 (Raymond_Ndibe) Open→Resolved
[15:58:48] (update) raymond-ndibe: jobs-api: bump to 0.0.346-20250123135045-edb3fcc8 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/656 (https://phabricator.wikimedia.org/T361120 https://phabricator.wikimedia.org/T362621) (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[15:59:55] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[16:07:53] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[16:08:04] (approved) raymond-ndibe: jobs-api: bump to 0.0.346-20250123135045-edb3fcc8 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/656 (https://phabricator.wikimedia.org/T361120 https://phabricator.wikimedia.org/T362621) (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[16:08:09] (merge) raymond-ndibe: jobs-api: bump to 0.0.346-20250123135045-edb3fcc8 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/656 (https://phabricator.wikimedia.org/T361120 https://phabricator.wikimedia.org/T362621) (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[16:18:18] FIRING: [2x] KernelErrors: Server cloudcephosd1013 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1013 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[17:38:57] FIRING: HarborDown: Harbor is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborDown
[18:03:57] RESOLVED: HarborDown: Harbor is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborDown
[18:06:28] FIRING: PuppetAgentFailure: Puppet agent failure detected on instance toolsbeta-harbor-2 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[18:34:58] Tool-events-impact-report: Set up the Phabricator workboard for the EIR Toolforge tool - https://phabricator.wikimedia.org/T384864 (Arinaigu) NEW
[18:35:56] Tool-events-impact-report: Set up the Phabricator workboard for the EIR Toolforge tool - https://phabricator.wikimedia.org/T384864#10498332 (Arinaigu)
[18:35:57] Tool-events-impact-report: Create a Phabricator board for the public EIR web app - https://phabricator.wikimedia.org/T384863#10498333 (Arinaigu)
[18:37:57] Tool-events-impact-report: Create a Phabricator board for the public EIR web app - https://phabricator.wikimedia.org/T384863#10498334 (Arinaigu) Open→Resolved I created this Phabricator board via the [[ https://toolsadmin.wikimedia.org/tools/id/events-impact-report | Toolforge admin console ]].
[18:55:09] Tool-wikiqanda, Future-Audiences, Spike: [Spike] Explore integrating with/using WikiChat methodology to serve bot Q&A - https://phabricator.wikimedia.org/T382020#10498391 (DLin-WMF) Open→Resolved a:DLin-WMF
[19:00:13] Tool-wikiqanda, Future-Audiences, Design: Bot user personas - https://phabricator.wikimedia.org/T381009#10498428 (DLin-WMF) Open→Resolved a:DLin-WMF
[19:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:06:37] cloud-services-team, DC-Ops, ops-eqiad, SRE: Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10498642 (RobH) After Andrew pinged about this today in IRC, I can see on the system it has the alarms on idrac: System Inlet Temperature 35 °C (95.0 °F) w...
[20:13:56] (CR) Thiemo Kreuz (WMDE): Fixed the typo in Top contributor campaign page (1 comment) [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1053046 (https://phabricator.wikimedia.org/T358396) (owner: Hridyesh_Gupta)
[20:18:08] cloud-services-team, DC-Ops, ops-eqiad, SRE: Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10498659 (RobH) >>! In T383723#10498642, @RobH wrote: > After Andrew pinged about this today in IRC, I can see on the system it has the alarms on idrac: Sys...
[20:46:22] FIRING: HAProxyBackendUnavailable: HAProxy service nova-metadata-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[20:56:22] RESOLVED: HAProxyBackendUnavailable: HAProxy service nova-metadata-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[21:44:31] PROBLEM - Host cloudcephosd1013 is DOWN: PING CRITICAL - Packet loss = 100%
[21:45:49] RECOVERY - Host cloudcephosd1013 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms
[21:49:10] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[21:52:00] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=97)
[21:52:23] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[21:52:25] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[21:56:49] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.unset_cluster_maintenance
[21:56:49] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.unset_cluster_maintenance (exit_code=0)
[22:09:39] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[22:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:35:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:41:41] Tool-inteGraality: InteGraality bug - https://phabricator.wikimedia.org/T384882 (Palotabarat) NEW
[22:51:28] RESOLVED: PuppetAgentFailure: Puppet agent failure detected on instance toolsbeta-harbor-2 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[23:09:57] FIRING: HarborDown: Harbor is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborDown
[23:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[23:22:35] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad, SRE: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks and rename - https://phabricator.wikimedia.org/T382412#10499217 (VRiley-WMF) The servers were getting the IP address from private 1-C and private 1-D, and not from th...
[23:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[23:34:57] RESOLVED: HarborDown: Harbor is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborDown
[23:42:39] Tool-inteGraality: pywikibot.exceptions.ServerError: HTTPSConnectionPool(host='query.wikidata.org', port=443): Read timed out. (read timeout=45) - https://phabricator.wikimedia.org/T384882#10499233 (Izno)