[00:07:56] FIRING: MaxConntrack: Max conntrack at 80.36% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:22:56] RESOLVED: MaxConntrack: Max conntrack at 80.96% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:35:56] FIRING: MaxConntrack: Max conntrack at 80.23% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:43:11] RESOLVED: MaxConntrack: Max conntrack at 80.51% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:43:52] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[00:47:56] FIRING: MaxConntrack: Max conntrack at 80.16% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:48:52] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[00:54:52] FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[00:59:52] RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:02:56] RESOLVED: MaxConntrack: Max conntrack at 81.09% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[01:24:52] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:29:52] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:33:52] FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:38:52] RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:41:01] Toolforge: attempting to create a python virtual environment on the bastion has a confusing error message - https://phabricator.wikimedia.org/T369477 (AntiCompositeNumber) NEW
[02:20:52] FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[02:25:52] RESOLVED: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[02:34:04] Toolforge: attempting to create a python virtual environment on the bastion has a confusing error message - https://phabricator.wikimedia.org/T369477#9959524 (Bugreporter) See also current way to create venv: T363071#9732167 although this is not obvious.
[03:13:52] FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:18:52] RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:50:52] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:55:52] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[04:06:56] FIRING: SystemdUnitDown: The service unit purge_vm_rbd_images.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:07:52] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[04:12:52] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[04:32:52] FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[04:37:52] RESOLVED: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[05:34:52] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[05:39:52] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:01:57] FIRING: SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[06:02:04] cloud-services-team: SystemdUnitDown Unit purge_vm_rbd_images.service on node cloudcontrol1005 has been down for long. - https://phabricator.wikimedia.org/T369479 (phaultfinder) NEW
[06:07:52] FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:12:52] RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:25:52] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:29:47] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "mediawiki-vagrant" project Buster deprecation - https://phabricator.wikimedia.org/T367541#9959653 (hashar)
[06:30:34] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "mediawiki-vagrant" project Buster deprecation - https://phabricator.wikimedia.org/T367541#9959654 (hashar) I have removed myself as an administrator of the project since I know nothing about it.
[06:30:52] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:35:23] Tool-yearinreview, Indic MediaWiki Developers UG, Indic-TechCom, good first task: Shareable image download issue - https://phabricator.wikimedia.org/T364642#9959663 (KCVelaga)
[06:36:42] Tool-yearinreview, Indic MediaWiki Developers UG, Indic-TechCom: add support for punjabi in yearinreview tool - https://phabricator.wikimedia.org/T369465#9959682 (KCVelaga)
[06:37:41] Data-Services, Data-Persistence, Data-Platform-SRE (2024.06.17 - 2024.07.07), Patch-For-Review: Modify db-mysql to connect to an-redacteddb1001 from cumin hosts - https://phabricator.wikimedia.org/T368354#9959691 (ABran-WMF) we've not seen any regression since you released the update, I think you...
[06:45:52] (PS2) David Caro: bootstrap_and_add: ignore osds that already were there [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1052094
[06:48:46] (CR) CI reject: [V:-1] bootstrap_and_add: ignore osds that already were there [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1052094 (owner: David Caro)
[06:49:32] (approved) dcaro: api: rename params for clarity [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/100 (owner: sstefanova)
[06:49:34] (update) dcaro: api: rename params for clarity [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/100 (owner: sstefanova)
[06:51:41] (update) dcaro: [lima-kilo] fix toolforge_deploy_mr restore [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/163 (owner: raymond-ndibe)
[06:51:44] (approved) dcaro: [lima-kilo] fix toolforge_deploy_mr restore [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/163 (owner: raymond-ndibe)
[06:51:46] (update) dcaro: [lima-kilo] fix toolforge_deploy_mr restore [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/163 (owner: raymond-ndibe)
[06:57:39] (update) dcaro: toolforge: add webservice configuration [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/127
[07:22:49] cloud-services-team, Toolforge (Toolforge iteration 12): toolforge: integrate fourohfour as a custom component, rather than a normal tool - https://phabricator.wikimedia.org/T369364#9959759 (dcaro)
[07:24:00] cloud-services-team, Toolforge (Toolforge iteration 12): toolforge: get a working setup for ingress-nginx and webservices in lima-kilo - https://phabricator.wikimedia.org/T369363#9959760 (dcaro)
[07:41:07] (open) dcaro: Draft: ingress-nginx: deploy without fourohfour locally [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/396
[07:41:26] cloud-services-team, Toolforge (Toolforge iteration 12): toolforge: get a working setup for ingress-nginx and webservices in lima-kilo - https://phabricator.wikimedia.org/T369363#9959785 (dcaro)
[07:44:01] (merge) sstefanova: api: rename params for clarity [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/100
[07:49:42] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:51:23] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: builds-api: bump to 0.0.159-20240708074416-a1b6f7d5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/397
[08:04:42] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:12:17] tool-wscontest, Indic MediaWiki Developers UG, Indic-TechCom: Create wscontest tool frontend - https://phabricator.wikimedia.org/T369402#9959802 (KCVelaga)
[08:13:10] Tool-dabfix, Indic MediaWiki Developers UG, Indic-TechCom: Fix Text Overflow - https://phabricator.wikimedia.org/T369380#9959804 (KCVelaga)
[08:15:34] Tool-dabfix, Indic MediaWiki Developers UG, Indic-TechCom: Improve Footer Styling and Fix Position - https://phabricator.wikimedia.org/T369379#9959817 (KCVelaga)
[08:15:56] Tool-yearinreview, Indic MediaWiki Developers UG, Indic-TechCom: search button while selecting language in yearinreview tool - https://phabricator.wikimedia.org/T369400#9959818 (KCVelaga)
[08:16:08] tool-wscontest, Indic MediaWiki Developers UG, Indic-TechCom: Create wscontest tool backend - https://phabricator.wikimedia.org/T369410#9959820 (KCVelaga)
[08:18:55] Tool-yearinreview, Indic MediaWiki Developers UG, Indic-TechCom: Restore caching of user stats - https://phabricator.wikimedia.org/T364697#9959835 (KCVelaga)
[08:18:57] Tool-toolwatch, Indic MediaWiki Developers UG, Indic-TechCom: Sort tools based on tool Title - https://phabricator.wikimedia.org/T353579#9959836 (KCVelaga)
[08:29:35] cloud-services-team, Toolforge (Toolforge iteration 12): toolforge: get a working setup for ingress-nginx and webservices in lima-kilo - https://phabricator.wikimedia.org/T369363#9959890 (Slst2020)
[08:30:27] Toolforge: [lima-kilo] add the ingress-admission-controller - https://phabricator.wikimedia.org/T369355#9959888 (Slst2020) →Duplicate dup:T369363
[08:35:52] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
[08:36:04] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
[08:42:17] cloud-services-team, Toolforge (Toolforge iteration 12), Patch-For-Review: toolforge: prepare deb packages for k8s 1.25 - https://phabricator.wikimedia.org/T369163#9959941 (aborrero) Open→In progress p:Triage→Medium
[08:46:15] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
[08:46:27] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
[08:48:33] cloud-services-team, Toolforge (Toolforge iteration 12), Patch-For-Review: toolforge: prepare deb packages for k8s 1.25 - https://phabricator.wikimedia.org/T369163#9959952 (aborrero) In progress→Resolved a:aborrero this should be ready to go: `lang=shell-session aborrero@toolsbeta-te...
[08:49:02] (approved) aborrero: registry-admission: bump to 0.0.44-20240705083909-fbafef28 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/390 (https://phabricator.wikimedia.org/T329671) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[08:49:12] (approved) aborrero: wmcs-k8s-metrics: bump kube-state-metrics version [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/392 (https://phabricator.wikimedia.org/T329671) (owner: sstefanova)
[08:49:23] (approved) aborrero: volume-admission: bump to 0.0.50-20240705111023-80cfa300 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/393 (https://phabricator.wikimedia.org/T329671) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[08:50:47] (update) sstefanova: volume-admission: bump to 0.0.50-20240705111023-80cfa300 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/393 (https://phabricator.wikimedia.org/T329671) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[08:52:10] (update) sstefanova: wmcs-k8s-metrics: bump kube-state-metrics version [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/392 (https://phabricator.wikimedia.org/T329671)
[08:52:29] (update) sstefanova: volume-admission: bump to 0.0.50-20240705111023-80cfa300 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/393 (https://phabricator.wikimedia.org/T329671) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[08:53:55] (update) sstefanova: registry-admission: bump to 0.0.44-20240705083909-fbafef28 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/390 (https://phabricator.wikimedia.org/T329671) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[08:53:56] (update) sstefanova: registry-admission: bump to 0.0.44-20240705083909-fbafef28 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/390 (https://phabricator.wikimedia.org/T329671) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[08:55:25] (update) sstefanova: builds-api: bump to 0.0.159-20240708074416-a1b6f7d5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/397 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[08:55:27] (approved) sstefanova: builds-api: bump to 0.0.159-20240708074416-a1b6f7d5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/397 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[08:55:38] (merge) sstefanova: builds-api: bump to 0.0.159-20240708074416-a1b6f7d5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/397 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[08:57:53] (update) sstefanova: registry-admission: bump to 0.0.44-20240705083909-fbafef28 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/390 (https://phabricator.wikimedia.org/T329671) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[08:59:07] (update) sstefanova: tekton: update apiVersion to autoscaling/v2 [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/50 (https://phabricator.wikimedia.org/T369164)
[09:04:22] (approved) aborrero: tekton: update apiVersion to autoscaling/v2 [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/50 (https://phabricator.wikimedia.org/T369164) (owner: sstefanova)
[09:05:17] Data-Services, Data-Persistence, Data-Platform-SRE (2024.06.17 - 2024.07.07), Patch-For-Review: Modify db-mysql to connect to an-redacteddb1001 from cumin hosts - https://phabricator.wikimedia.org/T368354#9959977 (MatthewVernon) >>! In T368354#9950437, @BTullis wrote: > It would be nice if we cou...
[09:06:29] (merge) sstefanova: tekton: update apiVersion to autoscaling/v2 [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/50 (https://phabricator.wikimedia.org/T369164)
[09:07:49] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: builds-builder: bump to 0.0.109-20240708090642-a7a583cb [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/398 (https://phabricator.wikimedia.org/T369164)
[09:14:08] (update) sstefanova: volume-admission: bump to 0.0.50-20240705111023-80cfa300 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/393 (https://phabricator.wikimedia.org/T329671) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[09:28:33] cloud-services-team, Toolforge (Toolforge iteration 12), Patch-For-Review: [infra,k8s] package k9s for use in kubernetes - https://phabricator.wikimedia.org/T366061#9960009 (aborrero) Open→Resolved done: `lang=shell-session aborrero@tools-k8s-control-7:~$ sudo -i k9s ``
[09:57:47] Data-Services, Data-Persistence, Data-Platform-SRE (2024.06.17 - 2024.07.07), Patch-For-Review: Modify db-mysql to connect to an-redacteddb1001 from cumin hosts - https://phabricator.wikimedia.org/T368354#9960122 (BTullis) Open→Resolved OK, thanks all. I've deployed the updated packag...
[10:02:12] FIRING: SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:19:42] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:22:13] Tool-spacemedia, Server-side-upload-request: Server-side upload request for OptimusPrimeBot (INPE DPI) - https://phabricator.wikimedia.org/T366353#9960282 (Urbanecm_WMF) Stalled→Open a:Urbanecm_WMF Hi @Don-vip! The issue doesn't appear to be within the WMF network; a 403 Forbidden error mean...
[10:22:16] Data-Services, Data-Persistence, Data-Platform-SRE (2024.06.17 - 2024.07.07), Patch-For-Review: Modify db-mysql to connect to an-redacteddb1001 from cumin hosts - https://phabricator.wikimedia.org/T368354#9960290 (Marostegui) Thank you Ben!
[10:29:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:33:57] (open) sstefanova: Revert "k8s: deploy registry-admission" [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/164
[10:33:58] Tool-spacemedia, Server-side-upload-request: Server-side upload request for OptimusPrimeBot (INPE DPI) - https://phabricator.wikimedia.org/T366353#9960344 (Urbanecm_WMF) Open→Resolved In any case, this is now done: ` [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonsw...
[10:34:04] (approved) sstefanova: Revert "k8s: deploy registry-admission" [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/164
[10:34:11] (merge) sstefanova: Revert "k8s: deploy registry-admission" [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/164
[10:36:19] (update) sstefanova: kind: upgrade to k8s 1.25 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/161 (https://phabricator.wikimedia.org/T369165)
[10:38:52] (update) sstefanova: kind: upgrade to k8s 1.25 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/161 (https://phabricator.wikimedia.org/T369165)
[11:29:01] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
[11:29:18] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
[11:59:58] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
[12:00:16] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
[12:05:41] (update) sstefanova: builds-builder: bump to 0.0.109-20240708090642-a7a583cb [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/398 (https://phabricator.wikimedia.org/T369164) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[12:06:12] (approved) sstefanova: builds-builder: bump to 0.0.109-20240708090642-a7a583cb [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/398 (https://phabricator.wikimedia.org/T369164) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[12:06:17] (merge) sstefanova: builds-builder: bump to 0.0.109-20240708090642-a7a583cb [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/398 (https://phabricator.wikimedia.org/T369164) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[12:07:37] (update) sstefanova: registry-admission: bump to 0.0.44-20240705083909-fbafef28 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/390 (https://phabricator.wikimedia.org/T329671) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[12:12:22] (update) sstefanova: build: update dependencies [repos/cloud/toolforge/ingress-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/6 (https://phabricator.wikimedia.org/T329671)
[12:12:28] (merge) sstefanova: build: update dependencies [repos/cloud/toolforge/ingress-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/6 (https://phabricator.wikimedia.org/T329671)
[12:15:37] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: ingress-admission: bump to 0.0.46-20240708121241-4dd9c743 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/399 (https://phabricator.wikimedia.org/T329671)
[12:22:50] (update) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/14
[12:24:11] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [labs/tools/massmailer] - https://gerrit.wikimedia.org/r/1052725 (owner: L10n-bot)
[12:26:19] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission
[12:26:30] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission
[12:49:47] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission
[12:49:57] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission
[12:52:17] (PS11) David Caro: ceph.osd.drain_node: force passing the cluster name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990977
[12:52:18] (PS11) David Caro: ceph.osd.undrain_node: fix help and default batch param [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990978
[12:52:18] (PS12) David Caro: ceph: add missing cumin params [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990979
[12:52:18] (PS17) David Caro: ceph: drain and undrain in chunks [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1013369 (https://phabricator.wikimedia.org/T329709)
[12:52:19] (PS8) David Caro: alerts: use spicerack provided code [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1050335
[12:52:21] (PS2) David Caro: ceph: don't fail for single ceph status failures [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1052070
[12:52:25] (PS2) David Caro: bootstrap_and_add: add cluster-wide silences [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1052071
[12:52:29] (PS3) David Caro: bootstrap_and_add: allow passing the batch size [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1052072
[12:52:33] (PS3) David Caro: bootstrap_and_add: ignore osds that already were there [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1052094
[12:54:02] (CR) David Caro: [C:+2] ceph.osd.drain_node: force passing the cluster name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990977 (owner: David Caro)
[12:54:22] (CR) David Caro: [C:+2] ceph.osd.undrain_node: fix help and default batch param [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990978 (owner: David Caro)
[12:54:31] (CR) David Caro: [C:+2] ceph: add missing cumin params [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990979 (owner: David Caro)
[12:55:55] (CR) CI reject: [V:-1] alerts: use spicerack provided code [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1050335 (owner: David Caro)
[12:56:05] (CR) CI reject: [V:-1] ceph: don't fail for single ceph status failures [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1052070 (owner: David Caro)
[12:56:18] (CR) CI reject: [V:-1] bootstrap_and_add: allow passing the batch size [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1052072 (owner: David Caro)
[12:56:29] (CR) CI reject: [V:-1] bootstrap_and_add: ignore osds that already were there [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1052094 (owner: David Caro)
[12:56:33] (CR) CI reject: [V:-1] bootstrap_and_add: add cluster-wide silences [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1052071 (owner: David Caro)
[12:57:07] (CR) David Caro: [C:+2] ceph: drain and undrain in chunks [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1013369 (https://phabricator.wikimedia.org/T329709) (owner: David Caro)
[12:57:11] (Merged) jenkins-bot: ceph.osd.drain_node: force passing the cluster name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990977 (owner: David Caro)
[12:57:54] (Merged) jenkins-bot: ceph.osd.undrain_node: fix help and default batch param [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990978 (owner: David Caro)
[12:57:56] (Merged) jenkins-bot: ceph: add missing cumin params [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990979 (owner: David Caro)
[13:00:22] (Merged) jenkins-bot: ceph: drain and undrain in chunks [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1013369 (https://phabricator.wikimedia.org/T329709) (owner: David Caro)
[13:00:27] (update) sstefanova: ingress-admission: bump to 0.0.46-20240708121241-4dd9c743 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/399 (https://phabricator.wikimedia.org/T329671) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[13:00:27] (PS9) David Caro: alerts: use spicerack provided code [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1050335
[13:00:27] (PS3) David Caro: ceph: don't fail for single ceph status failures [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1052070
[13:00:28] (PS3) David Caro: bootstrap_and_add: add cluster-wide silences [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1052071
[13:00:29] (PS4) David Caro: bootstrap_and_add: allow passing the batch size [cloud/wmcs-cookbooks] -
10https://gerrit.wikimedia.org/r/1052072 [13:00:30] (03PS4) 10David Caro: bootstrap_and_add: ignore osds that already were there [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052094 [13:00:39] (03approved) 10sstefanova: ingress-admission: bump to 0.0.46-20240708121241-4dd9c743 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/399 (https://phabricator.wikimedia.org/T329671) (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [13:00:44] (03merge) 10sstefanova: ingress-admission: bump to 0.0.46-20240708121241-4dd9c743 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/399 (https://phabricator.wikimedia.org/T329671) (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [13:01:49] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS, 05Cloud-Services-Origin-Team, 07Cloud-Services-Worktype-Maintenance, and 2 others: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#9960714 (10dcaro) [13:01:56] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy (T309789) [13:02:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [13:02:02] T309789: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789 [13:02:32] (03update) 10sstefanova: volume-admission: bump to 0.0.50-20240705111023-80cfa300 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/393 (https://phabricator.wikimedia.org/T329671) (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [13:03:55] (03CR) 10CI reject: [V:04-1] ceph: don't fail for single ceph status failures [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052070 (owner: 10David Caro) [13:04:15] (03CR) 10CI reject: [V:04-1] bootstrap_and_add: allow passing the batch size 
[cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052072 (owner: 10David Caro) [13:04:48] (03CR) 10CI reject: [V:04-1] bootstrap_and_add: ignore osds that already were there [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052094 (owner: 10David Caro) [13:04:48] (03CR) 10CI reject: [V:04-1] bootstrap_and_add: add cluster-wide silences [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052071 (owner: 10David Caro) [13:05:55] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission [13:06:07] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission [13:09:10] (03open) 10aborrero: kubecerts: have certificates lifetime to be max 10 days, renew them often [repos/cloud/toolforge/maintain-kubeusers] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/55 (https://phabricator.wikimedia.org/T365681) [13:17:12] (03CR) 10David Caro: [C:03+2] alerts: use spicerack provided code [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1050335 (owner: 10David Caro) [13:19:48] (03PS4) 10David Caro: ceph: don't fail for single ceph status failures [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052070 [13:19:48] (03PS4) 10David Caro: bootstrap_and_add: add cluster-wide silences [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052071 [13:19:48] (03PS5) 10David Caro: bootstrap_and_add: allow passing the batch size [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052072 [13:19:49] (03PS5) 10David Caro: bootstrap_and_add: ignore osds that already were there [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052094 [13:20:00] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission [13:20:11] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission [13:21:04] (03Merged) 10jenkins-bot: alerts: use spicerack provided code [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1050335 (owner: 10David Caro) [13:23:31] (03CR) 10CI reject: [V:04-1] bootstrap_and_add: allow passing the batch size [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052072 (owner: 10David Caro) [13:23:51] (03CR) 10CI reject: [V:04-1] bootstrap_and_add: ignore osds that already were there [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052094 (owner: 10David Caro) [13:25:31] (03CR) 10David Caro: [C:03+2] ceph: don't fail for single ceph status failures [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052070 (owner: 10David Caro) [13:25:34] (03CR) 10David Caro: [C:03+2] bootstrap_and_add: add cluster-wide silences [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052071 (owner: 10David Caro) [13:27:23] (03merge) 10sstefanova: volume-admission: bump to 0.0.50-20240705111023-80cfa300 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/393 (https://phabricator.wikimedia.org/T329671) (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [13:27:38] (03PS6) 10David Caro: bootstrap_and_add: allow passing the batch size [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052072 [13:27:38] (03PS6) 10David Caro: bootstrap_and_add: ignore osds that already were there [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052094 [13:28:09] (03update) 10sstefanova: registry-admission: bump to 0.0.44-20240705083909-fbafef28 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/390 (https://phabricator.wikimedia.org/T329671) (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [13:28:42] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook 
wmcs.toolforge.k8s.component.deploy for component registry-admission [13:28:53] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission [13:29:27] (03Merged) 10jenkins-bot: ceph: don't fail for single ceph status failures [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052070 (owner: 10David Caro) [13:29:28] (03Merged) 10jenkins-bot: bootstrap_and_add: add cluster-wide silences [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052071 (owner: 10David Caro) [13:32:06] (03CR) 10David Caro: [C:03+2] bootstrap_and_add: allow passing the batch size [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052072 (owner: 10David Caro) [13:32:09] (03CR) 10David Caro: [C:03+2] bootstrap_and_add: ignore osds that already were there [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052094 (owner: 10David Caro) [13:35:59] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission [13:36:09] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission [13:36:17] (03Merged) 10jenkins-bot: bootstrap_and_add: allow passing the batch size [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052072 (owner: 10David Caro) [13:36:34] (03Merged) 10jenkins-bot: bootstrap_and_add: ignore osds that already were there [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1052094 (owner: 10David Caro) [13:41:57] RESOLVED: SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [13:51:16] (03merge) 10sstefanova: registry-admission: bump to 0.0.44-20240705083909-fbafef28 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/390 (https://phabricator.wikimedia.org/T329671) (owner: 10project_1317_bot_df3177307bed93c3f34e421e26c86e38) [13:53:26] (03update) 10sstefanova: wmcs-k8s-metrics: bump kube-state-metrics version [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/392 (https://phabricator.wikimedia.org/T329671) [13:56:13] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.vps.remove_instance for instance tools-elastic-1 [13:56:33] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-elastic-1 [13:56:57] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.vps.remove_instance for instance tools-elastic-2 [13:57:16] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-elastic-2 [13:57:23] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.vps.remove_instance for instance tools-elastic-3 [13:57:43] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-elastic-3 [13:59:07] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics [13:59:21] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics [14:01:28] FIRING: PuppetStaleCertificates: Found non-revoked Puppet certificates for 3 deleted instances on 
tools-puppetserver-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [14:08:50] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics [14:09:04] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics [14:11:28] RESOLVED: PuppetStaleCertificates: Found non-revoked Puppet certificates for 3 deleted instances on tools-puppetserver-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [14:16:27] (03merge) 10sstefanova: wmcs-k8s-metrics: bump kube-state-metrics version [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/392 (https://phabricator.wikimedia.org/T329671) [14:18:31] (03update) 10sstefanova: kind: upgrade to k8s 1.25 [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/161 (https://phabricator.wikimedia.org/T369165) [14:22:27] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0) (T309789) [14:22:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [14:22:32] T309789: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789 [14:45:09] 10Toolforge (Toolforge iteration 12): [envvars-api, envvars-cli] Prefix all endpoints with `/tool/` - https://phabricator.wikimedia.org/T363809#9961308 (10Slst2020) 05In progress→03Resolved [14:46:31] 06cloud-services-team, 10Toolforge (Toolforge iteration 12), 13Patch-For-Review: toolforge: upgrade all Kubernetes components to versions supporting Kubernetes 1.25 - 
https://phabricator.wikimedia.org/T329671#9961315 (10Slst2020) 05In progress→03Resolved [14:48:56] FIRING: [8x] SystemdUnitDown: The service unit ceph-osd@80.service is in failed status on host cloudcephosd1011. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1011 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [14:49:41] FIRING: CloudVPSDesignateLeaks: Detected 12 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [14:51:07] (03merge) 10aborrero: deployment: remove PSP reference [repos/cloud/toolforge/registry-admission] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/registry-admission/-/merge_requests/8 (https://phabricator.wikimedia.org/T368142) [14:54:21] (03open) 10project_1317_bot_df3177307bed93c3f34e421e26c86e38: registry-admission: bump to 0.0.45-20240708145115-17015d83 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/400 (https://phabricator.wikimedia.org/T368142) [14:59:25] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS, 05Cloud-Services-Origin-Team, 07Cloud-Services-Worktype-Maintenance, and 2 others: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#9961401 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by root@cum... [15:04:42] 10cloud-services-team (Hardware), 05Goal: eqiad1: procure 1 additional cloudlb server - https://phabricator.wikimedia.org/T341062#9961424 (10Andrew) This will be a peer with existing cloudlb servers which are A-10G so I think this can be A as well. 
[15:09:17] Tool-yearinreview, good first task: Please attribute the original, add disclaimer and add the LICENSE - https://phabricator.wikimedia.org/T366114#9961427 (theprotonade) Thank you for tagging this task with #good_first_task for Wikimedia newcomers! Newcomers often may not be aware of things that may seem... [15:17:39] cloud-services-team, Toolforge (Toolforge iteration 12), Patch-For-Review: toolforge: review k8s API usage by custom components for 1.25 upgrade - https://phabricator.wikimedia.org/T369164#9961460 (Slst2020) Open→In progress [15:18:50] cloud-services-team, Toolforge (Toolforge iteration 12): toolforge: refresh kubernetes cookbooks for the 1.25 upgrade - https://phabricator.wikimedia.org/T369166#9961481 (Slst2020) a:Slst2020 [15:29:42] RESOLVED: CloudVPSDesignateLeaks: Detected 12 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [15:30:03] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: eqiad1: fix PTR delegations for 185.15.56.0/24 - https://phabricator.wikimedia.org/T341338#9961545 (Andrew) It's not clear to me that I can delete 56.15.185.in-addr.arpa. while 0-25.56.15.185.in-addr.arpa. exists: ` root@cloudcontrol1007:~# openstack... [15:52:09] cloud-services-team, Technical-blog-posts: Tech blog post: "Wikimedia Toolforge: migrating Kubernetes from PodSecurityPolicy to kyverno" - https://phabricator.wikimedia.org/T368948#9961687 (debt) @aborrero done! 
:) [15:58:10] cloud-services-team, Toolforge (Toolforge iteration 12): toolforge: lima-kilo: deploy registry admission - https://phabricator.wikimedia.org/T369527 (aborrero) NEW [15:58:40] (open) aborrero: registryadmission: allow to exclude namespaces from checks [repos/cloud/toolforge/registry-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/registry-admission/-/merge_requests/9 (https://phabricator.wikimedia.org/T369527) [16:08:54] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, and 2 others: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#9961796 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by root@cumin10... [16:09:42] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, and 2 others: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#9961797 (ops-monitoring-bot) Host rebooted by dcaro@cumin1002 with reason: upgraded packa... 
[16:14:39] FIRING: [2x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [16:19:39] RESOLVED: [2x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [16:47:39] FIRING: [2x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [16:54:19] 10Tool-bridgebot: IgnoreNicks setting applied to a gateway does not work as expected - https://phabricator.wikimedia.org/T369534 (10bd808) 03NEW [16:57:39] RESOLVED: [2x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [16:57:44] 10Tool-bridgebot: IgnoreNicks setting applied to a gateway does not work as expected - https://phabricator.wikimedia.org/T369534#9962052 (10bd808) p:05Triage→03Medium I was just wondering as I wrote this up if the `IgnoreNicks` should be added to the Telegram side of the bridge rather than the IRC side. The... 
[17:10:51] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T309789) [17:10:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:10:57] T309789: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789 [17:18:10] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [17:31:03] (03update) 10dcaro: Cache ldap connection and config loading [toolforge-repos/fourohfour] - 10https://gitlab.wikimedia.org/toolforge-repos/fourohfour/-/merge_requests/2 (https://phabricator.wikimedia.org/T335680) [17:31:18] (03update) 10dcaro: Cache ldap connection and config loading [toolforge-repos/fourohfour] - 10https://gitlab.wikimedia.org/toolforge-repos/fourohfour/-/merge_requests/2 (https://phabricator.wikimedia.org/T335680) [17:35:59] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0) (T309789) [17:36:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:36:06] T309789: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789 [17:38:10] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [17:48:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-37 has many processes stuck on IO (probably NFS) - 
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [18:00:11] (03approved) 10lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - 10https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/14 (owner: 10l10n-bot) [18:00:49] (03merge) 10lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - 10https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/14 (owner: 10l10n-bot) [18:00:58] (03open) 10bd808: Make image useful for Brad [toolforge-repos/bd808-buildpack-perl-bastion] - 10https://gitlab.wikimedia.org/toolforge-repos/bd808-buildpack-perl-bastion/-/merge_requests/1 [18:03:12] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "wikidumpparse" project Buster deprecation - https://phabricator.wikimedia.org/T367561#9962436 (10Andrew) Congrats on the new kid! For the quota increase, please file a quota request here: https://phabricator.wikimedia.org/project/view/2880/ [18:05:27] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "wikidumpparse" project Buster deprecation - https://phabricator.wikimedia.org/T367561#9962437 (10Andrew) Oh, btw, it might be useful for you to subscribe to https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.org/ so that you get timely n... [18:17:30] 10Tool-Global-user-contributions, 06Stewards-and-global-tools, 07Epic, 10Temporary accounts (Create/update essential tools/anti-abuse management): [Epic] Implement global user contributions feature - https://phabricator.wikimedia.org/T337089#9962459 (10Bugreporter) Note: If this feature can only view contr... 
[18:20:26] Cloud-VPS (Quota-requests): Humaniki - https://phabricator.wikimedia.org/T369545 (TheEugeniaKim) NEW [18:22:19] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "wikidumpparse" project Buster deprecation - https://phabricator.wikimedia.org/T367561#9962503 (TheEugeniaKim) @notconfusing Congratulations on the new kid! Enjoy the travels. I opened a quota request here https://phabricator.wikimedia.org/T369545 . Let me kno... [18:32:48] Tool-spacemedia, Server-side-upload-request: Server-side upload request for OptimusPrimeBot (INPE DPI) - https://phabricator.wikimedia.org/T366353#9962525 (Don-vip) Thank you @Urbanecm_WMF! I see the files have been imported, but the thumbnails have not been generated: {F56294387} Is there somethin... [18:34:36] Cloud-VPS (Quota-requests): Humaniki - https://phabricator.wikimedia.org/T369545#9962551 (JJMC89) [18:34:38] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "wikidumpparse" project Buster deprecation - https://phabricator.wikimedia.org/T367561#9962552 (JJMC89) [19:06:34] Tool-yearinreview, good first task: Please attribute the original, add disclaimer and add the LICENSE - https://phabricator.wikimedia.org/T366114#9962664 (Jdlrobson) p:Triage→High Hi there @theprotonade! Attribution is quite a serious matter. It's important to me from an ethical standpoint that i... [19:48:44] Cloud-VPS (Quota-requests): Humaniki - https://phabricator.wikimedia.org/T369545#9962810 (bd808) > g2.cores8.ram16.disk1120.custom "Fat" disk images have been retired. The recommended solution today is Cinder volumes as explained at https://wikitech.wikimedia.org/wiki/Help:Adding_disk_space_to_Cloud_VPS_ins... [19:55:23] Cloud-VPS (Quota-requests): Humaniki - https://phabricator.wikimedia.org/T369545#9962832 (Andrew) Trove might be worth a try for this although iirc it has some scaling issues that cause it to misbehave in the hundreds-of-gigabytes db size. 
We can provide either a cinder quota or a trove quota, whichever you... [20:03:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-37 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [20:03:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-37 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [20:04:24] 10Cloud-VPS (Quota-requests): Storage quota increase request for project wikidumpparse - https://phabricator.wikimedia.org/T369545#9962858 (10Andrew) [20:08:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-37 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [20:09:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-37 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [20:09:48] RESOLVED: 
ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-37 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [20:14:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-37 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [20:14:45] Quarry: Deduplicate config load - https://phabricator.wikimedia.org/T349135#9962915 (SD0001) Open→Resolved a:SD0001 [20:16:20] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-37 [20:16:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:17:26] Quarry, good first task, Regression: Bad resultset number case is not handled - https://phabricator.wikimedia.org/T218470#9962923 (SD0001) [20:17:48] Quarry: Error in web instances. 
- https://phabricator.wikimedia.org/T362157#9962921 (SD0001) →Duplicate dup:T218470 [20:22:14] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-37 [20:22:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:33:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-37 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [21:19:11] Toolforge: Regression dev.toolforge.org is missing a C compiler - https://phabricator.wikimedia.org/T369408#9963088 (bd808) Open→Invalid This behavior change was [[https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/UAMLGQ42CVHLRZ5W2CZBJDJFRNSBT4DC/|announced to... 
[21:32:52] (update) bd808: Make image useful for Brad [toolforge-repos/bd808-buildpack-perl-bastion] - https://gitlab.wikimedia.org/toolforge-repos/bd808-buildpack-perl-bastion/-/merge_requests/1 [21:57:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-9 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [22:16:59] Toolforge: fagiani/apt buildpack very slow when processing a large collection of packages - https://phabricator.wikimedia.org/T369563 (bd808) NEW [22:24:25] Toolforge: fagiani/apt buildpack very slow when processing a large collection of packages - https://phabricator.wikimedia.org/T369563#9963320 (bd808) [23:07:18] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-9 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [23:12:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-9 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[23:13:48] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-9 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [23:14:48] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-9 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [23:20:38] Toolforge: `webservice` requires effective user to be the tool user and listed in NSS passwd data - https://phabricator.wikimedia.org/T369569 (bd808) NEW [23:25:40] Toolforge: `webservice` requires effective user to be the tool user and listed in NSS passwd data - https://phabricator.wikimedia.org/T369569#9963504 (bd808) [23:28:35] Toolforge: `toolforge jobs` requires current user to be the tool user and listed in NSS passwd data - https://phabricator.wikimedia.org/T369573 (bd808) NEW [23:29:21] Toolforge: `webservice` requires effective user to be the tool user and listed in NSS passwd data - https://phabricator.wikimedia.org/T369569#9963548 (bd808) [23:37:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-9 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses