[00:02:29] PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T371337#10025852 (LibUp-bot)
[00:02:31] Toolforge: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T370115#10025854 (LibUp-bot) A new upstream version of Pywikibot is now available: 9.3.0. * https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Pywikibot_image * https://gerrit.wikimedia.org/g/pywikibot/core/+/refs/tags/...
[00:15:28] FIRING: InstanceDown: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:20:28] RESOLVED: InstanceDown: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[03:02:46] VPS-Projects: GLAMWiki Dashboard not loading - https://phabricator.wikimedia.org/T355082#10026092 (Pppery) Open→Resolved Works for me now.
[03:19:56] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:31:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-19 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[05:21:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-19 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[06:17:57] Toolforge (Toolforge iteration 13): [lima-kilo] add ingress-admission - https://phabricator.wikimedia.org/T370774#10026228 (Slst2020) In progress→Resolved
[07:19:56] FIRING: CloudVPSDesignateLeaks: Detected 5 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:40:12] (approved) dcaro: [jobs-cli] move jobs load to backend [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/44 (https://phabricator.wikimedia.org/T366209) (owner: raymond-ndibe)
[07:47:11] (approved) dcaro: kind: upgrade k8s to 1.26 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/177 (https://phabricator.wikimedia.org/T370244) (owner: sstefanova)
[07:47:52] (update) dcaro: kind: upgrade k8s to 1.26 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/177 (https://phabricator.wikimedia.org/T370244) (owner: sstefanova)
[07:51:34] (merge) dcaro: toolforge_get_versions: add component bang to the mr number [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/457
[07:54:48] (open) dcaro: run_functional_tests: don't tabulate the versions [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/460
[07:57:22] (update) dcaro: helpers: add toolforge_redeploy_components.sh [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/166 (owner: aborrero)
[08:00:26] (open) dcaro: toolforge_deploy_mr: use the correct name when registering an mr [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/179
[08:08:13] Toolforge (Toolforge iteration 13): [jobs-cli,builds-cli,envvars-cli] consolidate user agent - https://phabricator.wikimedia.org/T370393#10026411 (dcaro) In progress→Resolved
[08:32:35] Toolforge (Toolforge iteration 13): [harbor] 2024-07-24 Tools harbor db out of space - https://phabricator.wikimedia.org/T370843#10026497 (dcaro) Restarting the jobservice container and looking at the logs shows some errors on startup, that affect the `EXECUTION_SWEEP` job (that should be cleaning up that ta...
[08:35:11] RESOLVED: CloudVPSDesignateLeaks: Detected 5 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:01:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-19 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:11:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-19 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[10:01:53] wikitech.wikimedia.org, MW-on-K8s, serviceops: ☂ Migrate Wikitech to Kubernetes - https://phabricator.wikimedia.org/T292707#10026777 (jijiki)
[10:03:07] wikitech.wikimedia.org, MW-on-K8s, serviceops: ☂ Migrate Wikitech to Kubernetes - https://phabricator.wikimedia.org/T292707#10026770 (jijiki) Open→In progress p:Medium→High
[10:05:24] cloud-services-team, wikitech.wikimedia.org, LDAP: Replace wikitech as source of two-factor auth protection for developer accounts - https://phabricator.wikimedia.org/T359551#10026789 (jijiki)
[10:05:27] cloud-services-team, wikitech.wikimedia.org: Disable SSH key management on Wikitech - https://phabricator.wikimedia.org/T359544#10026790 (jijiki)
[10:05:30] cloud-services-team, wikitech.wikimedia.org, Epic, Security: sustainability of wikitech.wikimedia.org - https://phabricator.wikimedia.org/T363125#10026788 (jijiki)
[10:05:32] cloud-services-team, wikitech.wikimedia.org: Move Wikitech onto the production MW cluster - https://phabricator.wikimedia.org/T237773#10026791 (jijiki)
[10:05:38] cloud-services-team, MediaWiki-extensions-OpenStackManager, wikitech.wikimedia.org: Remove OpenStackManager from Wikitech - https://phabricator.wikimedia.org/T161553#10026793 (jijiki)
[10:05:41] cloud-services-team, wikitech.wikimedia.org, Infrastructure-Foundations, Epic: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#10026792 (jijiki)
[10:05:42] cloud-services-team, wikitech.wikimedia.org, Epic, Security: sustainability of wikitech.wikimedia.org - https://phabricator.wikimedia.org/T363125#10026794 (jijiki)
[10:06:14] wikitech.wikimedia.org: Remove and empty useless user groups - https://phabricator.wikimedia.org/T237890#10026797 (jijiki)
[10:06:15] wikitech.wikimedia.org, serviceops, SRE: Install php-ldap on all MW appservers - https://phabricator.wikimedia.org/T237889#10026798 (jijiki)
[10:06:16] cloud-services-team, wikitech.wikimedia.org: Move Wikitech onto the production MW cluster - https://phabricator.wikimedia.org/T237773#10026796 (jijiki)
[10:06:39] wikitech.wikimedia.org: Make Wikitech a normal wiki - https://phabricator.wikimedia.org/T237771#10026814 (jijiki)
[10:06:41] cloud-services-team, wikitech.wikimedia.org: Move Wikitech onto the production MW cluster - https://phabricator.wikimedia.org/T237773#10026812 (jijiki)
[10:06:47] cloud-services-team, wikitech.wikimedia.org, Infrastructure-Foundations, Epic: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#10026815 (jijiki)
[10:07:14] cloud-services-team, wikitech.wikimedia.org, Epic, Security: sustainability of wikitech.wikimedia.org - https://phabricator.wikimedia.org/T363125#10026781 (jijiki) Open→Resolved a:jijiki Plan has been draften in the "Wikitech Migration Plan" document, and in the interest of not h...
[10:30:18] Toolforge (Toolforge iteration 13): [harbor] 2024-07-24 Tools harbor db out of space - https://phabricator.wikimedia.org/T370843#10026865 (dcaro) I did a quick cleanup of the db manually: ` harbor=> delete from task where vendor_type in('IMAGE_SCAN', 'RETENTION') AND status not in('Pending', 'Scheduled', 'Ru...
[10:45:14] cloud-services-team, Toolforge: toolforge: maintain-kubeusers: review & correct kubernetes templated resource names - https://phabricator.wikimedia.org/T371355 (aborrero) NEW
[10:55:16] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb hosts - https://phabricator.wikimedia.org/T367778#10026909 (fnegri) The schema change took about 27 hours to complete in db1155 (Sanitarium host): {F56784369} But it's taking more than 48 hours in c...
[11:17:10] cloud-services-team, Cloud-VPS: openstack: consider reducing log pressure - https://phabricator.wikimedia.org/T371356 (aborrero) NEW
[11:17:17] cloud-services-team, Cloud-VPS: openstack: consider reducing log pressure - https://phabricator.wikimedia.org/T371356#10026940 (aborrero) a:Andrew
[11:27:55] cloud-services-team: openstack: codfw1dev: nova-compute can't contact rabbitmq - https://phabricator.wikimedia.org/T371242#10026964 (aborrero)
[11:28:21] cloud-services-team: openstack: codfw1dev: nova-compute can't contact rabbitmq - https://phabricator.wikimedia.org/T371242#10026958 (aborrero) Open→Resolved a:Andrew Thanks @Andrew everything seems working now. I assume all you did was restarting everything multiple times?
[11:38:55] wikitech.wikimedia.org, MW-on-K8s, serviceops: Traffic: Direct wikitech.wikimedia.org to mw-on-k8s - https://phabricator.wikimedia.org/T371358 (jijiki) NEW
[11:40:25] wikitech.wikimedia.org, MW-on-K8s, serviceops: Migrate Wikitech's Jobqueue - https://phabricator.wikimedia.org/T371359 (jijiki) NEW
[11:41:29] wikitech.wikimedia.org, MW-on-K8s, serviceops: Apache: Include Wikitech in mw-on-k8s' virtual hosts - https://phabricator.wikimedia.org/T371360 (jijiki) NEW
[12:33:27] cloud-services-team: openstack: codfw1dev: nova-compute can't contact rabbitmq - https://phabricator.wikimedia.org/T371242#10027326 (Andrew) I did a full reset and rebuild of rabbitmq. I definitely do not know why that helped :(
[12:39:05] Toolforge (Toolforge iteration 13): [k8s,infra,cookbook] change the hiera under the -k8s-control prefix whet adding/removing an etcd node - https://phabricator.wikimedia.org/T371370 (dcaro) NEW
[12:54:37] wikitech.wikimedia.org, MW-on-K8s, serviceops: mediawiki-config: consolidate labswiki - https://phabricator.wikimedia.org/T371374 (jijiki) NEW
[12:59:27] wikitech.wikimedia.org, MW-on-K8s, serviceops: mediawiki-config: consolidate labswiki - https://phabricator.wikimedia.org/T371374#10027469 (jijiki)
[13:04:59] wikitech.wikimedia.org, MW-on-K8s, serviceops: Cleanup: Wikitech code leftovers - https://phabricator.wikimedia.org/T371378 (jijiki) NEW
[13:22:28] cloud-services-team, Cloud-VPS: Collect access metrics from cloud-vps web proxy - https://phabricator.wikimedia.org/T371382 (Andrew) NEW
[13:29:27] Toolforge (Toolforge iteration 13), Patch-For-Review: [builds-api,jobs-api,envvars-api,api-gateway] Figure out and document how to do non-backwards compatible changes - https://phabricator.wikimedia.org/T356974#10027614 (Raymond_Ndibe) Stalled→In progress
[13:46:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-23 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[13:46:45] Toolforge (Toolforge iteration 13): [toolforge-weld] support back python 3.7 - https://phabricator.wikimedia.org/T370932#10027687 (dcaro) Open→In progress
[13:47:14] Toolforge: [builds-builder, builds-api] upgrade tekton version - https://phabricator.wikimedia.org/T370869#10027692 (dcaro)
[13:47:48] Toolforge (Toolforge iteration 13): [harbor] Investigate how to deactivate wal from trove for postrges databases - https://phabricator.wikimedia.org/T370845#10027696 (dcaro)
[13:52:20] Toolforge: [cli] the generic cli swallows the `--` from other commands - https://phabricator.wikimedia.org/T370184#10027716 (dcaro)
[13:56:28] cloud-services-team, Toolforge: toolforge: integrate fourohfour as a custom component, rather than a normal tool - https://phabricator.wikimedia.org/T369364#10027730 (dcaro)
[13:58:36] cloud-services-team, Toolforge (Toolforge iteration 13): [api-gateway] add alert for uptime - https://phabricator.wikimedia.org/T348633#10027734 (dcaro) a:Slst2020→None
[14:01:12] cloud-services-team, Toolforge: toolforge: maintain-kubeusers: review & correct kubernetes templated resource names - https://phabricator.wikimedia.org/T371355#10027773 (dcaro) p:Triage→Medium
[14:04:16] Toolforge (Toolforge iteration 13): [k8s,infra,cookbook] change the hiera under the -k8s-control prefix whet adding/removing an etcd node - https://phabricator.wikimedia.org/T371370#10027790 (dcaro) p:Triage→Low
[14:04:26] Toolforge (Toolforge iteration 13): [gateway-api] something is caching the openapi docs - https://phabricator.wikimedia.org/T371033#10027791 (dcaro) p:Triage→Medium
[14:12:43] (update) raymond-ndibe: [builds-cli] remove _display_messages [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/69
[14:12:59] (update) raymond-ndibe: [toolforge-weld] move _display_message into toolforge weld [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/46
[14:17:15] (update) raymond-ndibe: Tag v1.5.0 release [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/44 (owner: taavi)
[14:17:53] (approved) raymond-ndibe: Tag v1.6.0 release [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/52
[14:17:56] (merge) raymond-ndibe: Tag v1.6.0 release [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/52
[14:22:15] cloud-services-team, Cloud-VPS: Cloud VPS: extend tofu-infra to cover quotas - https://phabricator.wikimedia.org/T371391 (aborrero) NEW
[14:23:24] (update) raymond-ndibe: [jobs-cli] move jobs load to backend [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/44 (https://phabricator.wikimedia.org/T366209)
[14:25:10] cloud-services-team, Cloud-VPS: Cloud VPS: extend tofu-infra to cover projects, users and roles - https://phabricator.wikimedia.org/T371393 (aborrero) NEW
[14:25:32] Toolforge: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T370115#10027941 (dcaro) p:Triage→Medium
[14:26:42] cloud-services-team, Cloud-VPS: Cloud VPS: extend tofu-infra to cover quotas - https://phabricator.wikimedia.org/T371391#10027943 (aborrero)
[14:28:30] Toolforge, Kubernetes: toolforge-jobs and packbuild images - https://phabricator.wikimedia.org/T369786#10027961 (dcaro) Open→Resolved a:dcaro Glad to hear it's working :) > launcher is present now - is it required? Works okay with it. Yep, it's ok, you can remove it too, it will be add...
[14:28:56] FIRING: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[14:30:09] Toolforge: fagiani/apt buildpack very slow when processing a large collection of packages - https://phabricator.wikimedia.org/T369563#10028017 (dcaro) p:Triage→Low This might be solved using a similar setup than {T350307} (adding link just in case)
[14:31:29] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb hosts - https://phabricator.wikimedia.org/T367778#10028027 (fnegri) An even better comparison is between clouddb1019 (the host struggling with replication lag) and clouddb1015 (the "web" s4 wikireplic...
[14:33:56] RESOLVED: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[14:48:12] cloud-services-team, Toolforge (Toolforge iteration 13): [infra,k8s] review kubelet flags before 1.26 upgrade - https://phabricator.wikimedia.org/T370245#10028166 (Slst2020) Both `--container-runtime` and `--pod-infra-container-image` are set in `/var/lib/kubelet/kubeadm-flags.env`. This file is not man...
[15:44:30] (update) raymond-ndibe: [jobs-cli] move jobs load to backend [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/44 (https://phabricator.wikimedia.org/T366209)
[15:46:55] (PS1) Krinkle: frontend: Enable php-opcache, Debian 11 Bullseye to 12 Bookworm, PHP 8.1 to 8.3 [labs/codesearch] - https://gerrit.wikimedia.org/r/1058203
[15:47:49] (CR) Krinkle: "Tested locally, commands in frontend/README.md" [labs/codesearch] - https://gerrit.wikimedia.org/r/1058203 (owner: Krinkle)
[16:03:58] (update) raymond-ndibe: [jobs-cli] move jobs load to backend [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/44 (https://phabricator.wikimedia.org/T366209)
[16:04:25] (update) raymond-ndibe: [jobs-cli] move jobs load to backend [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/44 (https://phabricator.wikimedia.org/T366209)
[16:17:54] (open) raymond-ndibe: [toolforge-weld] require python 3.9 [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/53
[16:19:20] (approved) dcaro: [toolforge-weld] require python 3.9 [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/53 (owner: raymond-ndibe)
[16:19:33] (approved) raymond-ndibe: [toolforge-weld] require python 3.9 [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/53
[16:19:36] (merge) raymond-ndibe: [toolforge-weld] require python 3.9 [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/53
[16:24:27] (open) raymond-ndibe: bump to v1.6.1 [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/54
[16:25:40] (approved) raymond-ndibe: bump to v1.6.1 [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/54
[16:26:30] (approved) dcaro: bump to v1.6.1 [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/54 (owner: raymond-ndibe)
[16:26:36] (merge) raymond-ndibe: bump to v1.6.1 [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/54
[16:28:19] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23
[16:28:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:34:04] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-23
[16:34:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:44:21] (update) raymond-ndibe: [jobs-cli] move jobs load to backend [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/44 (https://phabricator.wikimedia.org/T366209)
[16:45:27] (update) raymond-ndibe: [jobs-cli] move jobs load to backend [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/44 (https://phabricator.wikimedia.org/T366209)
[16:53:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-23 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[16:54:03] (approved) raymond-ndibe: [jobs-cli] move jobs load to backend [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/44 (https://phabricator.wikimedia.org/T366209)
[16:54:11] (merge) raymond-ndibe: [jobs-cli] move jobs load to backend [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/44 (https://phabricator.wikimedia.org/T366209)
[16:58:01] FIRING: ToolsToolsNFSDown: No tools nfs services running found - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNFSDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsNFSDown
[16:58:55] FIRING: PawsPawsNFSDown: No paws nfs services running found - https://wikitech.wikimedia.org/wiki/PAWS/Admin - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPawsPawsNFSDown
[17:01:55] (open) raymond-ndibe: d/changelog: bump to 16.1.0 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/58 (https://phabricator.wikimedia.org/T366209)
[17:02:15] (update) raymond-ndibe: d/changelog: bump to 16.1.0 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/58 (https://phabricator.wikimedia.org/T366209)
[17:05:59] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
[17:06:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[17:07:05] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli
[17:07:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[17:07:15] (PS10) David Caro: toolforge.deploy: support deploying packages [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1057847
[17:09:49] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb hosts - https://phabricator.wikimedia.org/T367778#10028841 (fnegri) > there was a big increase in traffic to clouddb1019 starting from 2024-06-12 when I rebooted the host, but the "Network activity" g...
[17:27:29] (open) raymond-ndibe: [jobs-api] remove image validation from DefinedJob [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/110
[17:28:00] (approved) dcaro: [jobs-api] remove image validation from DefinedJob [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/110 (owner: raymond-ndibe)
[17:29:45] (approved) raymond-ndibe: [jobs-api] remove image validation from DefinedJob [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/110
[17:30:53] (merge) raymond-ndibe: [jobs-api] remove image validation from DefinedJob [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/110
[17:33:58] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: jobs-api: bump to 0.0.326-20240730173106-a8852dcd [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/461
[17:34:47] !log raymond@ubuntu toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[17:34:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[17:36:08] !log raymond@ubuntu tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[17:36:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:37:09] !log raymond@ubuntu tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[17:37:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:39:42] !log raymond@ubuntu tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
[17:39:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:40:10] !log raymond@ubuntu tools END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli
[17:40:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:46:32] cloud-services-team, wikitech.wikimedia.org, Epic, Security: sustainability of wikitech.wikimedia.org - https://phabricator.wikimedia.org/T363125#10028983 (nshahquinn-wmf) >>! In T363125#10026781, @jijiki wrote: > Plan has been draften in the "Wikitech Migration Plan" document Thank you—very...
[17:47:32] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/18 (owner: l10n-bot)
[17:49:20] !log raymond@ubuntu tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
[17:49:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:49:49] !log raymond@ubuntu tools END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli
[17:49:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:55:31] FIRING: ToolsNFSDown: No tools nfs services running found - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNFSDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNFSDown
[17:59:05] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
[17:59:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:59:14] !log dcaro@urcuchillay tools END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli
[17:59:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:00:31] RESOLVED: ToolsNFSDown: No tools nfs services running found - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNFSDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNFSDown
[18:01:41] !log raymond@ubuntu tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
[18:01:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:02:01] !log raymond@ubuntu tools END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component jobs-cli
[18:02:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:02:52] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
[18:02:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:02:57] !log dcaro@urcuchillay tools END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli
[18:02:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:05:07] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
[18:05:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:05:27] PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T371337#10029033 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/443
[18:05:33] vivian-rook opened https://github.com/toolforge/paws/pull/443
[18:06:16] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli
[18:06:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:06:22] !log raymond@ubuntu tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
[18:06:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:08:04] !log raymond@ubuntu tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli
[18:08:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:13:57] (CR) Raymond Ndibe: [C:+2] "used it to deploy jobs-cli to tools and it worked perfectly." [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1057847 (owner: David Caro)
[18:15:20] (approved) raymond-ndibe: jobs-api: bump to 0.0.326-20240730173106-a8852dcd [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/461 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[18:15:23] (merge) raymond-ndibe: jobs-api: bump to 0.0.326-20240730173106-a8852dcd [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/461 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[18:15:41] (merge) raymond-ndibe: d/changelog: bump to 16.1.0 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/58 (https://phabricator.wikimedia.org/T366209)
[18:15:58] (update) raymond-ndibe: d/changelog: bump to 16.1.0 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/58 (https://phabricator.wikimedia.org/T366209)
[18:16:36] cloud-services-team, Toolforge: [NFS] Add monitoring and alerting to the new NFS system - https://phabricator.wikimedia.org/T293804#10029068 (dcaro) Open→Resolved Added a couple basic alerts to check if the systemd unit is up for tools, toolsbeta and paws, that should be enough. We have other...
[18:17:45] (Merged) jenkins-bot: toolforge.deploy: support deploying packages [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1057847 (owner: David Caro)
[18:19:39] cloud-services-team, Toolforge: [NFS] Add monitoring and alerting to the new NFS system - https://phabricator.wikimedia.org/T293804#10029083 (Andrew) Thanks @dcaro!
[18:21:27] FIRING: ToolsbetaNFSDown: No toolsbeta nfs services running found - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNFSDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsbetaNFSDown
[19:12:57] RESOLVED: ToolsbetaNFSDown: No toolsbeta nfs services running found - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNFSDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsbetaNFSDown
[19:14:15] cloud-services-team, Cloud-VPS: openstack: consider reducing log pressure - https://phabricator.wikimedia.org/T371356#10029387 (Andrew) Would this be best addressed by changing the default logstash board settings to filter info and debug logs? Then they'd be there if we needed them.
[19:16:24] cloud-services-team, Cloud-VPS: openstack: consider reducing log pressure - https://phabricator.wikimedia.org/T371356#10029406 (Andrew) >>! In T371356#10029387, @Andrew wrote: > Would this be best addressed by changing the default logstash board settings to filter info and debug logs? Then they'd be ther...
[19:51:54] cloud-services-team, Toolforge (Toolforge iteration 13): toolforge: puppetserver got OOMkilled - https://phabricator.wikimedia.org/T369797#10029661 (Andrew) Open→Declined Closing this for now because it doesn't seem to be happening repeatedly.
[20:05:48] Cloud-VPS (Debian Buster Deprecation), Wikispore: Rebuild Wikispore Vagrant boxes on Bullseye or Bookworm - https://phabricator.wikimedia.org/T365934#10029691 (Andrew) Thank you for paying attention to this, @Tgr. Do you still hope to work on this transfer?
[20:13:44] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-restbase04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370460#10029734 (Andrew) @Jgiannelos do you have any suggestions about this...
[20:50:14] Data-Services: View 'centralauth_p.globalblocks' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them - https://phabricator.wikimedia.org/T371437 (AntiCompositeNumber) NEW
[20:51:53] Data-Services: View 'centralauth_p.globalblocks' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them - https://phabricator.wikimedia.org/T371437#10029916 (AntiCompositeNumber)
[20:55:04] wikitech.wikimedia.org, MW-on-K8s, serviceops: Migrate Wikitech's Jobqueue - https://phabricator.wikimedia.org/T371359#10029921 (bd808) The "very special" bits are I think just the legacy setup from being on it's own strange pair of hosts detached from all other MediaWiki deployments. I can't think o...
[21:06:23] Cloud-VPS (Debian Buster Deprecation), Wikispore: Rebuild Wikispore Vagrant boxes on Bullseye or Bookworm - https://phabricator.wikimedia.org/T365934#10029964 (Tgr) I would still like to do this if possible but had a series of distractions. Sorry for the delay.
[21:57:48] (update) raymond-ndibe: [toolforge-weld] move _display_message into toolforge weld [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/46
[22:18:50] (update) raymond-ndibe: [builds-cli] remove _display_messages [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/69
[22:19:21] (update) raymond-ndibe: [toolforge-weld] move _display_message into toolforge weld [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/46
[22:39:41] (update) raymond-ndibe: [builds-cli] remove _display_messages [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/69
[23:31:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-22 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses