[00:07:28] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:12:28] (PuppetAgentStaleLastRun) resolved: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:20:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[02:38:28] (PuppetStaleCertificates) firing: (2) Found non-revoked Puppet certificates for 1 deleted instances on project-proxy-puppetmaster-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[03:20:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[04:10:41] (CloudVPSDesignateLeaks) firing: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:15:41] (CloudVPSDesignateLeaks) firing: (5) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:36:21] Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9656642 (bd808) test
[05:38:28] (PuppetStaleCertificates) firing: (2) Found non-revoked Puppet certificates for 1 deleted instances on project-proxy-puppetmaster-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[06:20:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[07:29:45] Toolforge (Quota-requests): Request increased quota for pm20-* Toolforge tool - https://phabricator.wikimedia.org/T359785#9656674 (Jneubert) Thanks, @Andrew ! I was able to log into Horizon, but I see no "Launch Instance" button: {F43301799} Apparently, I also do not have access to the project:...
[07:30:25] Toolforge (Quota-requests): Request increased quota for pm20-* Toolforge tool - https://phabricator.wikimedia.org/T359785#9656675 (Jneubert) Resolved→Open
[07:51:09] Toolforge (Quota-requests): Request increased quota for pm20-* Toolforge tool - https://phabricator.wikimedia.org/T359785#9656678 (Jneubert) Open→Resolved Ok, figured it out: Had to switch from "bastion" to "pm20database" project ...
[08:15:41] (CloudVPSDesignateLeaks) firing: (5) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:38:28] (PuppetStaleCertificates) firing: (2) Found non-revoked Puppet certificates for 1 deleted instances on project-proxy-puppetmaster-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[09:10:25] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project: [cookbooks] adapt to having an extra level of buckets in the crushmap - https://phabricator.wikimedia.org/T331145#9656763 (dcaro) In progress→Resolved
[09:10:57] (PS12) David Caro: ceph: drain and undrain in chunks [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1013369 (https://phabricator.wikimedia.org/T329709)
[09:20:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[09:39:07] Toolforge (Toolforge iteration 07), Patch-For-Review: [builds-builder,builds-admission] Remove direct access to tekton from tools and remove the admission controller - https://phabricator.wikimedia.org/T360329#9656808 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolfor...
[09:40:46] Toolforge (Toolforge iteration 07), Patch-For-Review: [builds-builder,builds-admission] Remove direct access to tekton from tools and remove the admission controller - https://phabricator.wikimedia.org/T360329#9656810 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-ki...
[09:48:13] Toolforge (Toolforge iteration 07), Patch-For-Review: [builds-builder,builds-admission] Remove direct access to tekton from tools and remove the admission controller - https://phabricator.wikimedia.org/T360329#9656844 (dcaro)
[10:00:29] Toolforge: I can't connect to Toolforge DB replicas from my PC using MySQL Workbench - https://phabricator.wikimedia.org/T360839#9656863 (dcaro) The instructions are only for mysql workbench, as in, only mysql workbench will be the one connecting, not your local code. To connect your local running code to t...
[10:01:28] cloud-services-team, Toolforge (Toolforge iteration 07), Patch-For-Review: Upgrade Toolforge legacy URL redirectors to Debian Bullseye or later - https://phabricator.wikimedia.org/T311909#9656864 (dcaro) p:Triage→Medium
[10:01:56] Toolforge: Toolforge should not re-invent profile::mail::default_mail_relay - https://phabricator.wikimedia.org/T360651#9656865 (dcaro) p:Triage→Medium
[10:04:20] Toolforge (Toolforge iteration 07): I can't connect to Toolforge DB replicas from my PC using MySQL Workbench - https://phabricator.wikimedia.org/T360839#9656868 (dcaro) Open→In progress p:Triage→Medium a:dcaro
[10:05:13] Toolforge: [webservice] should have more easily understandable error messages when run as a non-tool user - https://phabricator.wikimedia.org/T360478#9656878 (dcaro)
[10:05:22] Toolforge: [webservice] should have more easily understandable error messages when run as a non-tool user - https://phabricator.wikimedia.org/T360478#9656876 (dcaro) p:Triage→Low This will change automatically once it gets merged into the jobs side of things (as we use a common error handling library...
[10:06:57] Toolforge: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641#9656881 (dcaro) p:Triage→Medium
[10:08:37] Toolforge: Provide metrics about build service and non-NFS adoption - https://phabricator.wikimedia.org/T360190#9656898 (dcaro)
[10:08:37] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 07): [builds-api] Add dashboards with the new statistics - https://phabricator.wikimedia.org/T352764#9656899 (dcaro)
[10:08:56] Toolforge: [infra,builds-api,jobs-api,webservice] Provide metrics about build service and non-NFS adoption - https://phabricator.wikimedia.org/T360190#9656900 (dcaro)
[10:09:20] Toolforge: [infra,builds-api,jobs-api,webservice] Provide metrics about build service and non-NFS adoption - https://phabricator.wikimedia.org/T360190#9656890 (dcaro) p:Triage→Medium
[10:09:48] Toolforge (Toolforge iteration 07), Patch-For-Review: [builds-builder,builds-admission] Remove direct access to tekton from tools and remove the admission controller - https://phabricator.wikimedia.org/T360329#9656902 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-ki...
[10:33:28] (PuppetStaleCertificates) firing: (2) Found non-revoked Puppet certificates for 1 deleted instances on project-proxy-puppetmaster-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[10:43:28] (PuppetStaleCertificates) resolved: (2) Found non-revoked Puppet certificates for 1 deleted instances on project-proxy-puppetmaster-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[11:20:41] (CloudVPSDesignateLeaks) firing: (5) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:25:41] (CloudVPSDesignateLeaks) resolved: (5) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:29:28] (InstanceDown) firing: Project metricsinfra instance metricsinfra-haproxy-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:34:28] (InstanceDown) resolved: Project metricsinfra instance metricsinfra-haproxy-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:59:43] (InstanceDown) firing: Project metricsinfra instance metricsinfra-alertmanager-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[12:02:42] cloud-services-team, wikitech.wikimedia.org: Disable email address changes in Wikitech - https://phabricator.wikimedia.org/T360883 (taavi) NEW
[12:04:43] (InstanceDown) resolved: Project metricsinfra instance metricsinfra-alertmanager-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[12:08:01] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS (Debian Buster Deprecation), Patch-For-Review: Migrate metricsinfra off buster - https://phabricator.wikimedia.org/T360630#9657377 (taavi) Open→Resolved
[12:10:41] (CloudVPSDesignateLeaks) firing: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:15:41] (CloudVPSDesignateLeaks) firing: (4) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:20:41] (CloudVPSDesignateLeaks) firing: (5) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:25:50] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [labs/tools/weapon-of-mass-description] - https://gerrit.wikimedia.org/r/1013987 (owner: L10n-bot)
[12:39:24] vivian-rook closed https://github.com/toolforge/paws/pull/393
[12:41:12] vivian-rook opened https://github.com/toolforge/paws/pull/394
[12:44:10] PAWS: update helm chart and jupyterhub - https://phabricator.wikimedia.org/T360643#9657442 (rook) Github issue resolved. Try chart version 4.1.1
[12:45:30] PAWS: Update nb_serverproxy_openrefine - https://phabricator.wikimedia.org/T360798#9657465 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/393
[12:45:42] PAWS: Update nb_serverproxy_openrefine - https://phabricator.wikimedia.org/T360798#9657466 (rook) Open→Resolved
[12:46:04] PAWS: Update jupyter-rsession-proxy - https://phabricator.wikimedia.org/T360800#9657467 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/394
[12:59:30] cloud-services-team, wikitech.wikimedia.org: Disable SSH key management on Wikitech - https://phabricator.wikimedia.org/T359544#9657493 (taavi) a:taavi
[12:59:51] PAWS: Update jupyter-rsession-proxy - https://phabricator.wikimedia.org/T360800#9657496 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/394
[12:59:58] vivian-rook closed https://github.com/toolforge/paws/pull/394
[13:00:41] MediaWiki-extensions-OpenStackManager, wikitech.wikimedia.org: `Nova Resource:` namespace should be declared in wmf-config, not in Extension:OpenStackManager - https://phabricator.wikimedia.org/T338477#9657502 (taavi) a:taavi
[13:01:08] cloud-services-team, MediaWiki-extensions-OpenStackManager, wikitech.wikimedia.org: Remove OpenStackManager from Wikitech - https://phabricator.wikimedia.org/T161553#9657498 (taavi) a:taavi Claiming to implement this once a few fixes for the idm.wikimedia.org SSH key management interface has been...
[13:02:16] PAWS: Use upstream jupyter-rsession-proxy - https://phabricator.wikimedia.org/T360800#9657509 (rook) Open→Resolved
[13:35:17] Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, and 2 others: [Design] Synthesise user testing results - https://phabricator.wikimedia.org/T358098#9657603 (KColeman-WMF)
[14:10:18] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE, Patch-For-Review: Q#:rack/setup/install (2) cloudbackup hosts - https://phabricator.wikimedia.org/T356216#9657687 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudbackup2003.codfw....
[14:10:39] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE, Patch-For-Review: Q#:rack/setup/install (2) cloudbackup hosts - https://phabricator.wikimedia.org/T356216#9657688 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudbackup2003.codfw.wmne...
[14:18:51] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
[14:18:53] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
[14:19:11] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
[14:20:29] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
[14:22:32] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE, Patch-For-Review: Q#:rack/setup/install (2) cloudbackup hosts - https://phabricator.wikimedia.org/T356216#9657711 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudbackup2003.codfw....
[14:25:41] (CloudVPSDesignateLeaks) resolved: (5) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:26:42] Toolforge (Quota-requests): Request increased quota for pm20-* Toolforge tool - https://phabricator.wikimedia.org/T359785#9657728 (Andrew) >>! In T359785#9656678, @Jneubert wrote: > Ok, figured it out: Had to switch from "bastion" to "pm20database" project ... That fits -- you only have read-only...
[14:27:35] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
[14:29:53] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
[15:40:14] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9658063 (MBH) @dcaro I can open https://mbh.toolforge.org/clusters.html , but not https://mbh.toolforge.org/elections.txt (404), but both files is in the new stat...
[15:59:14] vivian-rook closed https://github.com/toolforge/paws/pull/392
[15:59:27] PAWS: update helm chart and jupyterhub - https://phabricator.wikimedia.org/T360643#9658159 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/392
[15:59:38] PAWS: update helm chart and jupyterhub - https://phabricator.wikimedia.org/T360643#9658188 (rook) Open→Resolved
[16:00:50] PAWS: Remove paws-123-12 cluster - https://phabricator.wikimedia.org/T360916 (rook) NEW
[16:07:38] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9658253 (dcaro) The static files are under https://tools-static.wmflabs.org/mbh/, not https://mbh.toolforge.org (under mbh.toolforge.org are the files that are co...
[16:10:41] (CloudVPSDesignateLeaks) firing: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:14:30] PAWS: Remove paws-123-12 cluster - https://phabricator.wikimedia.org/T360916#9658280 (rook) a:rook
[16:15:41] (CloudVPSDesignateLeaks) firing: (4) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:20:41] (CloudVPSDesignateLeaks) firing: (5) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:32:18] PAWS: add worker to paws - https://phabricator.wikimedia.org/T359591#9658342 (rook) Open→Resolved a:rook
[17:01:29] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE, Patch-For-Review: Q#:rack/setup/install (2) cloudbackup hosts - https://phabricator.wikimedia.org/T356216#9658550 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudbackup2003.codfw....
[18:04:16] cloud-services-team (FY2023/2024-Q3-Q4), Infrastructure-Foundations, Spicerack, SRE-tools, Patch-For-Review: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337#9658807 (Volans) Will you take care also of debian packaging it and any...
[18:07:46] Tool-Global-user-contributions, Stewards-and-global-tools, XTools, Design, and 2 others: [Design] Communicate to users why there gaps in IP data on Special:Contributions - https://phabricator.wikimedia.org/T360928 (KColeman-WMF) NEW
[18:14:43] Tool-Global-user-contributions, Stewards-and-global-tools, XTools, Design, and 2 others: [Design] Communicate to users why there gaps in IP data on Special:Contributions - https://phabricator.wikimedia.org/T360928#9658826 (KColeman-WMF) @Urbanecm as discussed, please can you add a comment on the...
[18:22:30] Quarry, Toolforge, ChangeProp, collaboration-services, and 9 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9658856 (bd808)
[18:23:32] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.remove_instance for instance tools-legacy-redirector
[18:24:23] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-legacy-redirector
[18:30:28] (InstanceDown) firing: Project tools instance tools-legacy-redirector is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[18:35:28] (InstanceDown) resolved: Project tools instance tools-legacy-redirector is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[18:45:22] (CR) Majavah: [C:+2] openstack: cloudvirt: safe_reboot: Downtime during reboot [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991016 (https://phabricator.wikimedia.org/T347490) (owner: Majavah)
[18:48:32] Cloud-VPS, Spicerack, SRE-tools: Support downtiming metricsinfra alerts in wmcs-cookbooks - https://phabricator.wikimedia.org/T360932 (taavi) NEW
[18:49:11] (Merged) jenkins-bot: openstack: cloudvirt: safe_reboot: Downtime during reboot [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991016 (https://phabricator.wikimedia.org/T347490) (owner: Majavah)
[18:53:46] Cloud-VPS, Infrastructure-Foundations, Spicerack, SRE-tools, Patch-For-Review: Support downtiming metricsinfra alerts in wmcs-cookbooks - https://phabricator.wikimedia.org/T360932#9659005 (taavi) a:taavi
[18:55:26] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.vps.remove_instance for instance toolsbeta-legacy-redirector
[18:56:15] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance toolsbeta-legacy-redirector
[19:01:38] (ProbeDown) firing: Service toolsbeta-test-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:06:38] (ProbeDown) resolved: Service toolsbeta-test-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:20:28] Tool-Global-user-contributions, Stewards-and-global-tools, XTools, Design, Temporary accounts (Create/update essential tools/anti-abuse management): [Design] Communicate to users why there gaps in IP data on Special:Contributions - https://phabricator.wikimedia.org/T360928#9659097 (JJMC89)
[19:45:28] (InstanceDown) firing: Project cloudinfra instance mx-out04 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[19:50:28] (InstanceDown) resolved: (2) Project cloudinfra instance mx-out03 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:00:39] cloud-services-team, VPS-Projects, Puppet (Puppet 7.0): Update traffic project puppetmaster - https://phabricator.wikimedia.org/T360940 (Andrew) NEW
[20:20:41] (CloudVPSDesignateLeaks) firing: (5) Detected 12 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:21:03] Cloud-VPS (Quota-requests), collaboration-services, Release-Engineering-Team (Radar): Increase instance and volume quota in devtools project for puppetmaster upgrade - https://phabricator.wikimedia.org/T360823#9659346 (brennen) Thanks!
[20:26:11] cloud-services-team, VPS-Projects, Traffic, Puppet (Puppet 7.0): Update traffic project puppetmaster - https://phabricator.wikimedia.org/T360940#9659356 (Andrew) Open→Resolved a:Andrew I've built this new server (traffic-puppetserver-bookworm) and moved hosts over to it fro...
[20:33:20] cloud-services-team, VPS-Projects, Puppet (Puppet 7.0): Migrate per-project Puppet servers to Puppet 7 - https://phabricator.wikimedia.org/T351452#9659390 (Andrew)
[20:33:48] Quarry, Toolforge, ChangeProp, collaboration-services, and 9 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9659379 (Tgr) >>! In T360596#9652082, @Krinkle wrote: > In MediaWiki (as deployed at WMF), there exists 1 use of...
[20:41:28] (InstanceDown) firing: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[21:01:28] (InstanceDown) resolved: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[21:05:06] (ProbeDown) firing: (2) Service toolsbeta-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:11:06] (ProbeDown) firing: (2) Service toolsbeta-legacy-redirector-2:80 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-legacy-redirector-2:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:44:56] Tools: 'hoiscript' tool uses an unreasonable amount of disk space - https://phabricator.wikimedia.org/T349913#9659643 (bd808) I found and killed a `sha1sum` process running on tools-sgebastion-10.tools.eqiad1.wikimedia.cloud a few minutes ago that seemed to be attempting to compute the sha1 of all 533G of PD...
[22:40:08] (PS1) Majavah: Relicense as AGPL-3.0-only [labs/tools/train-blockers] - https://gerrit.wikimedia.org/r/1014125
[22:47:30] (CR) Majavah: [V:+2 C:+2] Relicense as AGPL-3.0-only [labs/tools/train-blockers] - https://gerrit.wikimedia.org/r/1014125 (owner: Majavah)
[22:54:29] (PS1) Majavah: Relicense as AGPL-3.0-only [cloud/metricsinfra/prometheus-configurator] - https://gerrit.wikimedia.org/r/1014126
[22:57:54] (PS1) Majavah: Relicense as AGPL-3.0-only [cloud/metricsinfra/prometheus-manager] - https://gerrit.wikimedia.org/r/1014128
[22:57:54] (PS3) Krinkle: Fix race condition around midnight, prefer HTTP gzip, bump UA version [labs/tools/coverme] - https://gerrit.wikimedia.org/r/940489
[22:58:18] (CR) Majavah: [C:+2] Relicense as AGPL-3.0-only [cloud/metricsinfra/prometheus-configurator] - https://gerrit.wikimedia.org/r/1014126 (owner: Majavah)
[22:58:58] (CR) Majavah: [C:+2] Relicense as AGPL-3.0-only [cloud/metricsinfra/prometheus-manager] - https://gerrit.wikimedia.org/r/1014128 (owner: Majavah)
[22:59:16] (Merged) jenkins-bot: Relicense as AGPL-3.0-only [cloud/metricsinfra/prometheus-configurator] - https://gerrit.wikimedia.org/r/1014126 (owner: Majavah)
[22:59:31] (Merged) jenkins-bot: Relicense as AGPL-3.0-only [cloud/metricsinfra/prometheus-manager] - https://gerrit.wikimedia.org/r/1014128 (owner: Majavah)
[23:01:43] (PS4) Krinkle: Fix race condition around midnight, prefer HTTP gzip, bump UA version [labs/tools/coverme] - https://gerrit.wikimedia.org/r/940489
[23:02:04] (CR) Krinkle: [C:+2] Fix race condition around midnight, prefer HTTP gzip, bump UA version [labs/tools/coverme] - https://gerrit.wikimedia.org/r/940489 (owner: Krinkle)
[23:02:41] (Merged) jenkins-bot: Fix race condition around midnight, prefer HTTP gzip, bump UA version [labs/tools/coverme] - https://gerrit.wikimedia.org/r/940489 (owner: Krinkle)
[23:02:52] (PS1) Dzahn: delete ticket-test.discovery.wmnet dummy key, not used [labs/private] - https://gerrit.wikimedia.org/r/1014131 (https://phabricator.wikimedia.org/T360413)
[23:04:21] (CR) Dzahn: [V:+2 C:+2] delete ticket-test.discovery.wmnet dummy key, not used [labs/private] - https://gerrit.wikimedia.org/r/1014131 (https://phabricator.wikimedia.org/T360413) (owner: Dzahn)
[23:04:38] (PS2) Dzahn: delete ticket-test.discovery.wmnet dummy key, not used [labs/private] - https://gerrit.wikimedia.org/r/1014131 (https://phabricator.wikimedia.org/T360413)
[23:05:00] (CR) Dzahn: [V:+2 C:+2] delete ticket-test.discovery.wmnet dummy key, not used [labs/private] - https://gerrit.wikimedia.org/r/1014131 (https://phabricator.wikimedia.org/T360413) (owner: Dzahn)
[23:29:21] (PS1) Dzahn: delete apt-staging.discovery dummy key [labs/private] - https://gerrit.wikimedia.org/r/1014133 (https://phabricator.wikimedia.org/T360413)
[23:33:46] (PS2) Dzahn: delete apt-staging.discovery dummy key [labs/private] - https://gerrit.wikimedia.org/r/1014133 (https://phabricator.wikimedia.org/T360413)
[23:41:43] (CR) Dzahn: [V:+2 C:+2] delete apt-staging.discovery dummy key [labs/private] - https://gerrit.wikimedia.org/r/1014133 (https://phabricator.wikimedia.org/T360413) (owner: Dzahn)