[00:49:25] Cloud-VPS (Project-requests): Request creation of mdwiki-offline VPS project - https://phabricator.wikimedia.org/T358023#9562034 (Bugreporter) I am not sure it is proper to use cloud VPS to support a non-WMF project. It also hosts a Commons-like site for NC/ND files which do not meet https://founda...
[00:50:42] Cloud-VPS (Project-requests), Wikimedia-Medicine: Request creation of mdwiki-offline VPS project - https://phabricator.wikimedia.org/T358023#9562038 (Bugreporter)
[05:47:20] Grid-Engine-to-K8s-Migration: Migrate yapperbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320195#9562334 (Legoktm) Luckily these are all statically linked golang binaries, so moving them to the grid is straightforward: ` tools.yapperbot@tools-sgebastion-10:~$ cat...
[06:36:17] Grid-Engine-to-K8s-Migration: Migrate yapperbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320195#9562349 (Legoktm) Hmm, something is wrong: ` 2024/02/21 06:30:40 Error editing user talk for Compassionate727 meant they couldn't be notified and were ignored. The err...
[09:06:27] Cloud-VPS: GLAMWiki Dashboard not loading - https://phabricator.wikimedia.org/T355082#9562490 (Peachey88)
[09:07:14] VPS-Projects: GLAMWiki Dashboard not loading - https://phabricator.wikimedia.org/T355082#9562491 (taavi)
[09:10:44] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the tools cluster
[09:13:18] Cloud-VPS (Project-requests), Wikimedia-Medicine: Request creation of mdwiki-offline VPS project - https://phabricator.wikimedia.org/T358023#9562508 (dcaro) Giving support to non-WMF projects is perfectly fine, as long as they benefit the Wikimedia movement (https://meta.wikimedia.org/wiki/Wikimedia_move...
[09:20:06] !log taavi@cloudcumin1001 tools Added a new k8s control tools-k8s-control-7.tools.eqiad1.wikimedia.cloud to the cluster
[09:20:07] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a control role in the tools cluster
[09:39:29] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-control-4
[09:40:08] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-control-4
[09:46:28] (InstanceDown) firing: Project tools instance tools-k8s-control-4 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:51:28] (InstanceDown) resolved: Project tools instance tools-k8s-control-4 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:12:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:17:41] (CloudVPSDesignateLeaks) firing: (2) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:44:22] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1033.eqiad.wmnet' (T319184)
[11:44:28] T319184: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184
[12:03:14] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1033.eqiad.wmnet' (T319184)
[12:03:20] T319184: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184
[12:13:18] Toolforge, cloud-services-team: Toolforge: potential improvements for labs/toollabs.git - https://phabricator.wikimedia.org/T279308#9563146 (dcaro)
[12:13:49] Cloud-Services, Toolforge: Rename 'misctools' toollabs package to something more appropriate - https://phabricator.wikimedia.org/T91879#9563144 (dcaro) Open→Invalid The parent task needs re-thinking given the latest changes in the platform, will re-create this task if needed.
[12:22:59] cloud-services-team, Infrastructure-Foundations, SRE, netbox, and 2 others: Netbox: Add support for our complex host network setups in provision script - https://phabricator.wikimedia.org/T346428#9563208 (ayounsi) {T358096} for the Cassandra/extra IPs usecase.
[12:27:59] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9563229 (Slst2020) >>! In T354745#9563125, @dcaro wrote: > The big advantag...
[12:28:59] Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, and 2 others: [Design] Synthesis user testing results - https://phabricator.wikimedia.org/T358098#9563231 (KColeman-WMF)
[12:31:39] Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, Design: [Design EPIC] Global User Contributions - https://phabricator.wikimedia.org/T349901#9563243 (KColeman-WMF)
[12:36:09] Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, and 2 others: [Design] Synthesise user testing results - https://phabricator.wikimedia.org/T358098#9563261 (KColeman-WMF)
[12:50:30] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1033.eqiad.wmnet' (T319184)
[12:50:36] T319184: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184
[12:50:56] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1033.eqiad.wmnet' (T319184)
[12:51:04] !log aborrero@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary
[12:51:22] !log aborrero@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0)
[12:53:04] cloud-services-team, Infrastructure-Foundations, SRE, netops, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9563291 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1033.eqiad.wmnet with OS book...
[12:53:46] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9563294 (dcaro) >>! In T354745#9563229, @Slst2020 wrote: >>>! In T354745#95...
[12:57:49] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9563318 (Slst2020) >>! In T354745#9563294, @dcaro wrote: > We can do it lit...
[13:02:07] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9563322 (dcaro) >>! In T354745#9563318, @Slst2020 wrote: >>>! In T354745#95...
[13:05:26] Toolforge, cloud-services-team, community-labs-monitoring, User-Matthewrbowker: Establish an internal system or a recommended external system for monitoring user-created Toolforge web services - https://phabricator.wikimedia.org/T53434#9563326 (dcaro) We can implement this soon using metricsinfra...
[13:09:08] Toolforge (Toolforge iteration 06), cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [maintain-kubeusers] Allow setting the requests cpu and mem quota - https://phabricator.wikimedia.org/T357881#9563333 (taavi) I don't know how...
[13:12:07] Toolforge: Listeria bot sometimes gets stuck with 104 errors from Wikimedia APIs - https://phabricator.wikimedia.org/T356160#9563340 (Magnus) Thanks, that works for me. It is much slower than before, as it needs to run full compilation every time, instead of just the changed bits. Ah well.
[13:12:58] Toolforge: Listeria bot sometimes gets stuck with 104 errors from Wikimedia APIs - https://phabricator.wikimedia.org/T356160#9563341 (Magnus) Open→Resolved a:Magnus
[13:13:03] Toolforge (Toolforge iteration 06), User-aborrero: [toolforge] several tools get periods of connection refused (104) when connecting to wikis - https://phabricator.wikimedia.org/T356164#9563343 (Magnus)
[13:13:27] Toolforge (Toolforge iteration 06), Patch-For-Review, User-dcaro: Tool Labs users .bashrc file does not exist for tools accounts - https://phabricator.wikimedia.org/T131561#9563344 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/13...
[13:16:10] Toolforge (Toolforge iteration 06), Patch-For-Review, User-dcaro: Tool Labs users .bashrc file does not exist for tools accounts - https://phabricator.wikimedia.org/T131561#9563351 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolfo...
[13:16:41] Toolforge (Toolforge iteration 06), User-aborrero: [toolforge API] expose all backend APIs OpenAPI specs - https://phabricator.wikimedia.org/T358100#9563353 (Slst2020)
[13:16:46] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[13:16:48] !log dcaro@urcuchillay toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component maintain-kubeusers
[13:16:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:16:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:17:28] Toolforge (Toolforge iteration 06), Toolforge Jobs framework, Patch-For-Review, User-aborrero: toolforge: introduce OpenAPI to jobs framework - https://phabricator.wikimedia.org/T356523#9563365 (Slst2020)
[13:17:33] Toolforge (Toolforge iteration 06), User-aborrero: [toolforge API] expose all backend APIs OpenAPI specs - https://phabricator.wikimedia.org/T358100#9563366 (Slst2020)
[13:19:54] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (T319184)
[13:19:59] T319184: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184
[13:20:00] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=0) (T319184)
[13:20:12] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[13:20:14] !log dcaro@urcuchillay toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component maintain-kubeusers
[13:20:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:20:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:20:52] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[13:20:54] !log dcaro@urcuchillay toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component maintain-kubeusers
[13:20:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:20:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:24:14] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[13:24:17] !log dcaro@urcuchillay toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component maintain-kubeusers
[13:24:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:24:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:25:52] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[13:25:54] !log dcaro@urcuchillay toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component maintain-kubeusers
[13:25:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:25:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:26:55] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[13:26:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:27:24] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[13:27:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:29:20] cloud-services-team, User-aborrero: nova-compute: error running local ceph command - https://phabricator.wikimedia.org/T358101#9563387 (aborrero)
[13:29:46] Toolforge, Infrastructure-Foundations, Mail: Set up alerts for mail queue - https://phabricator.wikimedia.org/T60871#9563397 (dcaro) p:Medium→Low
[13:30:05] Cloud-Services, Toolforge: Provide regular report on tools with single owner - https://phabricator.wikimedia.org/T86432#9563385 (dcaro) The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a...
[13:31:07] Toolforge, Infrastructure-Foundations, Mail: Set up alerts for mail queue - https://phabricator.wikimedia.org/T60871#606527 (dcaro) This would now be on prometheus + alertmanager/metricsinfra
[13:31:15] Toolforge: Listeria bot sometimes gets stuck with 104 errors from Wikimedia APIs - https://phabricator.wikimedia.org/T356160#9563410 (dcaro) Yep, we still have to add some caching capabilities to the build system, but it might take a bit to get there (we have to enable k8s volumes first).
[13:31:40] Toolforge, Infrastructure-Foundations, Mail: [toolforge.infra] Set up alerts for mail queue - https://phabricator.wikimedia.org/T60871#9563411 (dcaro)
[13:37:08] cloud-services-team, Infrastructure-Foundations, SRE, netops, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9563419 (aborrero)
[13:38:08] cloud-services-team, User-aborrero: openstack: nova refuses to admit a compute node after a reimage - https://phabricator.wikimedia.org/T357631#9563421 (aborrero) Open→Resolved a:aborrero The patch solved the problem!
[13:38:15] cloud-services-team, Infrastructure-Foundations, SRE, netops, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#8416726 (aborrero)
[13:39:05] (Abandoned) Arturo Borrero Gonzalez: openstack: cloudvirt: add pre-reimage cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765) (owner: Arturo Borrero Gonzalez)
[13:39:12] (Abandoned) Arturo Borrero Gonzalez: openstack: cloudvirt: add post-reimage cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004116 (https://phabricator.wikimedia.org/T357765) (owner: Arturo Borrero Gonzalez)
[13:40:08] cloud-services-team, User-aborrero: openstack: nova refuses to admit a compute node after a reimage - https://phabricator.wikimedia.org/T357631#9563435 (aborrero)
[13:41:56] cloud-services-team, Patch-For-Review, User-aborrero: openstack: create cookbooks to run common pre/post reimage actions on hypervisors - https://phabricator.wikimedia.org/T357765#9563432 (aborrero) Open→Declined The solution in task {T357631} (persist compute id) solves the actual problem th...
[13:43:44] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance (T319184)
[13:43:50] T319184: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184
[13:44:15] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0) (T319184)
[13:46:46] Cloud-VPS, cloud-services-team, User-aborrero: nova-compute: error running local ceph command - https://phabricator.wikimedia.org/T358101#9563451 (taavi)
[13:46:55] cloud-services-team: Replace or deprecate WMCS uses of report updater - https://phabricator.wikimedia.org/T357856#9563452 (fnegri)
[13:50:54] cloud-services-team, Infrastructure-Foundations, SRE, netops, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9563463 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1033.eqiad.wmnet with OS...
[14:05:06] (ProbeDown) firing: Service toolserver-proxy-01:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolserver-proxy-01:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:06:54] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9563495 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/clo...
[14:10:06] (ProbeDown) resolved: Service toolserver-proxy-01:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolserver-proxy-01:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:10:59] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9563509 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 o...
[14:17:42] (CloudVPSDesignateLeaks) firing: (2) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:20:49] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[14:20:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:21:22] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[14:21:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:30:05] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] expose all backend APIs OpenAPI specs - https://phabricator.wikimedia.org/T358100#9563624 (CodeReviewBot) sstefanova opened https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/23 api: exp...
[14:32:16] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[14:32:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:32:29] Toolforge (Toolforge iteration 06), Patch-For-Review, User-dcaro: Tool Labs users .bashrc file does not exist for tools accounts - https://phabricator.wikimedia.org/T131561#9563637 (CodeReviewBot) dcaro closed https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/201 m...
[14:32:40] !log dcaro@urcuchillay toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component maintain-kubeusers
[14:32:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:33:06] Toolforge (Toolforge iteration 06), Patch-For-Review, User-dcaro: Tool Labs users .bashrc file does not exist for tools accounts - https://phabricator.wikimedia.org/T131561#9563638 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/14...
[14:34:26] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[14:34:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:34:56] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[14:34:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:36:21] Toolforge (Toolforge iteration 06), Patch-For-Review, User-dcaro: Tool Labs users .bashrc file does not exist for tools accounts - https://phabricator.wikimedia.org/T131561#9563646 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/14...
[14:38:43] Toolforge (Toolforge iteration 06), Patch-For-Review, User-dcaro: Tool Labs users .bashrc file does not exist for tools accounts - https://phabricator.wikimedia.org/T131561#9563658 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolfo...
[14:39:57] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[14:40:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:40:20] cloud-services-team, Infrastructure-Foundations, SRE, netops, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9563666 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1033.eqiad.wmnet with OS book...
[14:40:26] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[14:40:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:40:36] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[14:40:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:41:07] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[14:41:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:42:59] Cloud-VPS (Project-requests), Wikimedia-Medicine: Request creation of mdwiki-offline VPS project - https://phabricator.wikimedia.org/T358023#9563682 (Tim-moody) In general no media will be stored on this WMCS instance, and any content stored will be CC BY or CC BY-SA 4.0.
[14:45:18] Toolforge (Toolforge iteration 06), Patch-For-Review, User-dcaro: Tool Labs users .bashrc file does not exist for tools accounts - https://phabricator.wikimedia.org/T131561#9563687 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/203 m...
[14:47:28] Toolforge (Toolforge iteration 06), Patch-For-Review, User-dcaro: Tool Labs users .bashrc file does not exist for tools accounts - https://phabricator.wikimedia.org/T131561#9563688 (dcaro) In progress→Resolved
[14:51:04] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
[14:51:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:51:34] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
[14:51:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[15:06:56] (SystemdUnitDown) firing: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudservices1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:20:24] PAWS: Optimize for capacity - https://phabricator.wikimedia.org/T218228#9563761 (rook) https://grafana.wmcloud.org/d/eV0M3UyVk/paws-usage-statistics?orgId=1&from=now-7d&to=now&forceLogin The limiting factor is RAM. Some time ago I switched to RAM heavy systems to help even the usage between CPU and RAM. Sti...
[15:20:33] PAWS: Optimize for capacity - https://phabricator.wikimedia.org/T218228#9563762 (rook) Open→Resolved a:rook
[15:20:37] PAWS: Increase prometheus retention time - https://phabricator.wikimedia.org/T357786#9563765 (rook)
[15:26:41] Cloud-VPS (Project-requests), Wikimedia-Medicine: Request creation of mdwiki-offline VPS project - https://phabricator.wikimedia.org/T358023#9563777 (dcaro) LGTM +1 thanks!
[15:27:41] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9563806 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/clo...
[15:28:13] Cloud-VPS (Project-requests), Wikimedia-Medicine: Request creation of mdwiki-offline VPS project - https://phabricator.wikimedia.org/T358023#9563809 (aborrero) Have you considered using Toolforge instead?
[15:31:56] (SystemdUnitDown) resolved: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudservices1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:32:20] PAWS: Increase prometheus retention time - https://phabricator.wikimedia.org/T357786#9563841 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/379
[15:32:30] vivian-rook opened https://github.com/toolforge/paws/pull/379
[15:32:41] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9563842 (CodeReviewBot) dcaro closed https://gitlab.wikimedia.org/repos/clo...
[15:32:42] (CloudVPSDesignateLeaks) firing: (2) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:33:19] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9563846 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/clo...
[15:37:08] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9563854 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 o...
[15:39:19] PAWS: Increase prometheus retention time - https://phabricator.wikimedia.org/T357786#9563856 (rook) a:rook
[15:47:36] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
[15:47:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[15:48:06] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
[15:48:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[15:48:22] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
[15:48:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:48:52] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
[15:48:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:49:25] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9563886 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/clo...
[15:52:42] (CloudVPSDesignateLeaks) resolved: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:54:20] (CR) Jforrester: releases: Bump Code to 1.3.3 (1 comment) [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/1005174 (owner: VolkerE)
[15:56:31] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9563909 (CodeReviewBot) dcaro updated https://gitlab.wikimedia.org/repos/cl...
[15:57:48] Tool-toolwatch, Technical-Tool-Request: Tool Request: ToolForge Health Dashboard Tool (ToolWatch) - https://phabricator.wikimedia.org/T341379#9563905 (fnegri) Open→Resolved I'm marking this as resolved (see my previous comment). Feel free to reopen if you think there is still work to do as part o...
[16:02:25] Toolforge, Documentation: Update and Improve Toolforge and Cloud VPS Technical Documentation - https://phabricator.wikimedia.org/T203131#9563925 (dcaro)
[16:02:31] Toolforge, Documentation: Run a documentation sprint for Cloud VPS and Toolforge - https://phabricator.wikimedia.org/T101659#9563926 (dcaro)
[16:02:54] Toolforge, Documentation: Document how to install Python modules in a tool's home directory/virtual environment - https://phabricator.wikimedia.org/T63824#9563922 (dcaro) Open→Resolved a:dcaro I have double checked the current tutorials and updated them with the latest instructions, I think t...
[16:07:07] PAWS: Increase prometheus retention time - https://phabricator.wikimedia.org/T357786#9564003 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/379
[16:07:14] vivian-rook closed https://github.com/toolforge/paws/pull/379
[16:07:46] PAWS: Increase prometheus retention time - https://phabricator.wikimedia.org/T357786#9564009 (rook) Open→Resolved
[16:08:37] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9564011 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/clo...
[16:10:55] PROBLEM - Host wikitech-static.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [16:17:01] RECOVERY - Host wikitech-static.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 22.26 ms [16:19:09] (PS1) Arturo Borrero Gonzalez: openstack.cloudvirt.lib.ensure_canary: refresh for openstack API changes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1005557 (https://phabricator.wikimedia.org/T357970) [16:21:03] PROBLEM - Host wikitech-static.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [16:24:20] Toolforge, User-aborrero: [toolforge,jobs] current image aliases - https://phabricator.wikimedia.org/T357388#9564134 (dcaro) [16:25:23] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9564143 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 o... [16:25:34] Toolforge, User-aborrero: jobs-api: Periodically refresh image-config data - https://phabricator.wikimedia.org/T357112#9564140 (dcaro) p:Triage→Medium [16:27:47] Toolforge: Add a new output format for toolforge jobs list command which returns the input command for scheduled jobs - https://phabricator.wikimedia.org/T356581#9564152 (dcaro) p:Triage→Low [16:30:24] Toolforge Jobs framework: [toolforge,jobs,docs] Document how to force a rerun of a scheduled cron (just restart) - https://phabricator.wikimedia.org/T356580#9564165 (dcaro) [16:32:59] Toolforge Jobs framework: [toolforge,jobs,docs] Document how to force a rerun of a scheduled cron (just restart) - https://phabricator.wikimedia.org/T356580#9564176 (dcaro) Open→Resolved p:Triage→Low a:dcaro Just updated the docs (was easy enough): https://wikitech.wikimedia.org/wiki/Help... 
[16:35:52] RECOVERY - Host wikitech-static.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 22.31 ms [16:38:53] Toolforge, Patch-For-Review, User-aborrero: toolforge: introduce OpenAPI to jobs framework - https://phabricator.wikimedia.org/T356523#9564196 (dcaro) [16:39:24] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: toolforge: introduce OpenAPI to jobs framework - https://phabricator.wikimedia.org/T356523#9509999 (dcaro) [16:44:26] Toolforge: toolforge jobs logs read timeout error - https://phabricator.wikimedia.org/T356503#9564208 (dcaro) p:Triage→Medium [16:45:45] Data-Services: [toolsdb] Replica is frequently lagging behind the primary - https://phabricator.wikimedia.org/T357624#9564217 (fnegri) One thing we should probably check is how long the problematic queries take to complete on the primary host: * Do they take longer to complete in the replica because of RBR r... [16:46:26] Toolforge, User-aborrero: [jobs-api] when running a command with wrong quoting, no logs nor useful feedback is given to the user - https://phabricator.wikimedia.org/T356267#9564215 (dcaro) p:Triage→Low [16:46:44] Toolforge (Toolforge iteration 06): Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537#9564226 (dcaro) [16:49:07] Toolforge: Replace already completed one-off jobs when starting a new one - https://phabricator.wikimedia.org/T352989#9564257 (dcaro) p:Triage→Medium [16:50:09] Toolforge: "toolforge jobs logs" fails when job has not started yet - https://phabricator.wikimedia.org/T349775#9564274 (dcaro) p:Triage→Medium [16:50:39] Toolforge: [toolforge,jobs] Replace already completed one-off jobs when starting a new one - https://phabricator.wikimedia.org/T352989#9564282 (dcaro) [16:50:49] Toolforge, User-aborrero: [jobs-api] Periodically refresh image-config data - https://phabricator.wikimedia.org/T357112#9564284 (dcaro) [16:51:01] 
Toolforge: [toolforge,jobs] Add a new output format for toolforge jobs list command which returns the input command for scheduled jobs - https://phabricator.wikimedia.org/T356581#9564287 (dcaro) [16:51:54] Toolforge: [toolforge,jobs] toolforge jobs logs read timeout error - https://phabricator.wikimedia.org/T356503#9564292 (dcaro) [16:52:13] Toolforge: [toolforge,jobs] "toolforge jobs logs" fails when job has not started yet - https://phabricator.wikimedia.org/T349775#9564293 (dcaro) [17:05:24] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api [17:05:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:05:55] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api [17:05:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:06:41] Cloud-VPS, Data-Services, cloud-services-team (FY2023/2024-Q3-Q4), Patch-For-Review: [toolsdb] [cinder] [ceph] Deleting snapshot does not work - https://phabricator.wikimedia.org/T356904#9564358 (fnegri) Simplified scenario: * create a new volume A * wait for the `wmcs-backup` script to create a... 
[17:07:05] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api [17:07:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:07:49] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api [17:07:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:08:52] Toolforge (Toolforge iteration 06), Patch-For-Review, User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9564378 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/clo... [17:11:47] (CR) Andrew Bogott: [C: +1] "odd" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1005557 (https://phabricator.wikimedia.org/T357970) (owner: Arturo Borrero Gonzalez) [17:15:12] Toolforge (Toolforge iteration 06), cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [maintain-kubeusers] Allow setting the requests cpu and mem quota - https://phabricator.wikimedia.org/T357881#9564417 (dcaro) I'm also on the f... [17:17:12] cloud-services-team: NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T353406#9564426 (Jclark-ctr) [17:17:14] cloud-services-team: NodeDown - https://phabricator.wikimedia.org/T352595#9564427 (Jclark-ctr) [17:18:52] Cloud-VPS, cloud-services-team (Hardware), SRE, ops-eqiad: Cloudvirt1063.eqiad.wmnet overheating - https://phabricator.wikimedia.org/T353408#9564424 (Jclark-ctr) Open→Resolved closing ticket 7 days no faults [17:53:01] Toolforge, cloud-services-team: Elasticsearch credential request for capacity-exchange - https://phabricator.wikimedia.org/T357227#9564630 (Albertoleoncio) @Slst2020 I'm having a problem with credentials. 
I don't know if I'm running it incorrectly, or if I'm missing some permission. ` tools.capacity-exc... [18:29:53] Tool-gitlab-account-approval: Investigate OAuth 2 Resource owner password credentials flow as a replacement for Personal Access Token auth - https://phabricator.wikimedia.org/T358134#9564770 (bd808) p:Triage→Medium [18:32:03] Tool-gitlab-account-approval: Investigate OAuth 2 Resource owner password credentials flow as a replacement for Personal Access Token auth - https://phabricator.wikimedia.org/T358134#9564781 (bd808) [[https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html#create-a-service-account-personal-access... [19:07:52] Tool-gitlab-account-approval: Investigate OAuth 2 Resource owner password credentials flow as a replacement for Personal Access Token auth - https://phabricator.wikimedia.org/T358134#9564901 (bd808) >>! In T358134#9564781, @bd808 wrote: > [[https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html#... [19:41:41] (CloudVPSDesignateLeaks) firing: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [19:46:41] (CloudVPSDesignateLeaks) firing: (2) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [21:03:49] (CR) VolkerE: releases: Bump Code to 1.3.3 (1 comment) [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/1005174 (owner: VolkerE) [21:54:22] (PS1) Eevans: restbase: (phony) keys & certs for missing hosts [labs/private] - https://gerrit.wikimedia.org/r/1005608 (https://phabricator.wikimedia.org/T354560) [21:54:56] (PS2) Eevans: 
restbase: (phony) keys & certs for missing/new hosts [labs/private] - https://gerrit.wikimedia.org/r/1005608 (https://phabricator.wikimedia.org/T354560) [22:18:58] (CR) Eevans: [V: +2 C: +2] restbase: (phony) keys & certs for missing/new hosts [labs/private] - https://gerrit.wikimedia.org/r/1005608 (https://phabricator.wikimedia.org/T354560) (owner: Eevans) [22:41:48] (PuppetZeroResources) firing: Puppet has failed generate resources on cloudcephosd1008:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [22:41:53] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudcephosd1008:9100 - https://phabricator.wikimedia.org/T358156#9565529 (phaultfinder) [22:46:48] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on cloudcephosd1008:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [22:46:53] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9565586 (phaultfinder) [22:51:48] (PuppetZeroResources) firing: (3) Puppet has failed generate resources on cloudcephosd1008:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [22:51:53] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9565587 (phaultfinder) [22:54:45] (WidespreadPuppetFailure) firing: Puppet has failed on wmcs cluster - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=3&var-cluster=wmcs - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure [22:56:48] (PuppetZeroResources) firing: (3) Puppet has failed generate 
resources on cloudcephosd1008:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [23:01:48] (PuppetZeroResources) firing: (4) Puppet has failed generate resources on cloudcephosd1008:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [23:01:55] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9565640 (phaultfinder) [23:16:48] (PuppetZeroResources) firing: (3) Puppet has failed generate resources on cloudcephosd1011:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [23:16:54] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9565722 (phaultfinder) [23:24:45] (WidespreadPuppetFailure) resolved: Puppet has failed on wmcs cluster - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=3&var-cluster=wmcs - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure [23:26:48] (PuppetZeroResources) firing: (4) Puppet has failed generate resources on cloudcephosd1011:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [23:26:53] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9565735 (phaultfinder) [23:31:48] (PuppetZeroResources) firing: (4) Puppet has failed generate resources on cloudcephosd1011:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources 
[23:36:48] (PuppetZeroResources) firing: (5) Puppet has failed generate resources on cloudcephmon1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [23:36:52] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9565747 (phaultfinder) [23:41:48] (PuppetZeroResources) firing: (4) Puppet has failed generate resources on cloudcephmon1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [23:46:42] (CloudVPSDesignateLeaks) firing: (2) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [23:46:48] (PuppetZeroResources) firing: (4) Puppet has failed generate resources on cloudcephmon1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources