[00:05:55] FIRING: MaxConntrack: Max conntrack at 91.13% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:06:03] cloud-services-team: MaxConntrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1050:9100 - https://phabricator.wikimedia.org/T372693#10075959 (phaultfinder)
[00:07:28] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:08:55] FIRING: MaxConntrack: Max conntrack at 89.95% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:10:55] RESOLVED: MaxConntrack: Max conntrack at 90.77% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:12:28] RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:12:55] FIRING: MaxConntrack: Max conntrack at 90.1% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:13:02] cloud-services-team: MaxConntrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1050:9100 - https://phabricator.wikimedia.org/T372693#10075966 (phaultfinder)
[00:27:44] Tools, Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10075998 (Dzahn) I think now we just need to know what the actual date of the change is.
[00:27:55] RESOLVED: MaxConntrack: Max conntrack at 90.12% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:28:25] FIRING: MaxConntrack: Max conntrack at 90.42% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:28:31] cloud-services-team: MaxConntrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1050:9100 - https://phabricator.wikimedia.org/T372693#10075999 (phaultfinder)
[00:33:10] RESOLVED: MaxConntrack: Max conntrack at 90.12% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:33:25] FIRING: MaxConntrack: Max conntrack at 90.28% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:33:33] cloud-services-team: MaxConntrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1050:9100 - https://phabricator.wikimedia.org/T372693#10076015 (phaultfinder)
[00:41:25] RESOLVED: MaxConntrack: Max conntrack at 89.55% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:46:55] FIRING: MaxConntrack: Max conntrack at 89.93% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:48:10] RESOLVED: MaxConntrack: Max conntrack at 90.3% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:51:55] RESOLVED: MaxConntrack: Max conntrack at 81.62% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[03:18:42] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[06:11:12] (CR) Lokal Profil: [C:+2] Set file permissions on database configuration file in Docker build [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1063859 (owner: Jean-Frédéric)
[06:12:58] (Merged) jenkins-bot: Set file permissions on database configuration file in Docker build [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1063859 (owner: Jean-Frédéric)
[06:49:54] VPS-project-devtools, collaboration-services, GitLab, Release-Engineering-Team, Patch-For-Review: https://gitlab.devtools.wmcloud.org is being indexed by google (and scoring pretty high) - https://phabricator.wikimedia.org/T372538#10076347 (Jelto) Open→Resolved a:Jelto @dzahn...
[07:08:29] FIRING: InstanceDown: Project tools instance tools-prometheus-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[07:34:10] (CR) Kosta Harlan: [C:+2] Exempt Test Group Repositories [labs/tools/sonarqubebot] - https://gerrit.wikimedia.org/r/1063008 (https://phabricator.wikimedia.org/T372565) (owner: Pwangai)
[07:34:38] (Merged) jenkins-bot: Exempt Test Group Repositories [labs/tools/sonarqubebot] - https://gerrit.wikimedia.org/r/1063008 (https://phabricator.wikimedia.org/T372565) (owner: Pwangai)
[08:33:29] RESOLVED: InstanceDown: Project tools instance tools-prometheus-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[08:37:05] Data-Services: [wikireplicas] Log very slow queries - https://phabricator.wikimedia.org/T372859 (fnegri) NEW
[08:37:09] Data-Services: [wikireplicas] Log very slow queries - https://phabricator.wikimedia.org/T372859#10076474 (fnegri) p:Triage→Low
[08:37:13] Data-Services: Querying the logging table on labs is slow - https://phabricator.wikimedia.org/T131266#10076477 (fnegri) p:Lowest→Low
[08:37:29] Data-Services, DBA, Patch-For-Review: Make watchlist table available as curated foo_p.watchlist_count on labsdb - https://phabricator.wikimedia.org/T59617#10076478 (fnegri) p:Lowest→Low
[09:41:15] cloud-services-team (FY2024/2025-Q1-Q2), Quarry: Allow Quarry to query its own database - https://phabricator.wikimedia.org/T367415#10076746 (fnegri) I created the quarry_readonly user manually because Trove doesn't let me create read-only users: ` MariaDB [(none)]> CREATE USER quarry_readonly@'172.16.%...
[09:42:48] cloud-services-team (FY2024/2025-Q1-Q2), Quarry: Allow Quarry to query its own database - https://phabricator.wikimedia.org/T367415#10076764 (fnegri) The `quarry_p` database was also created manually, but I've also added the schema to `schema.sql` in https://github.com/toolforge/quarry/pull/61.
[10:48:23] Cloud-VPS (Debian Buster Deprecation), linkwatcher: Cloud VPS "linkwatcher" project Buster deprecation - https://phabricator.wikimedia.org/T367536#10076979 (Epantaleo) Hello, I would like the VM etytree-a to be kept as a copy, as I am planning an upgrade of the operating system, but I need the old copy a...
[10:58:03] vivian-rook opened https://github.com/toolforge/paws/pull/445
[11:12:28] FIRING: InstanceDown: Project tools instance tools-prometheus-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:20:07] vivian-rook closed https://github.com/toolforge/paws/pull/445
[11:21:43] PAWS: update prometheus - https://phabricator.wikimedia.org/T366182#10077028 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/445
[11:21:47] PAWS: update prometheus - https://phabricator.wikimedia.org/T366182#10077029 (rook) Open→Resolved a:rook
[11:26:19] Toolforge: Toolforge CNF build pack jobs are not terminated gracefully. - https://phabricator.wikimedia.org/T372879 (Count_Count) NEW
[11:26:21] Toolforge: Toolforge CNF build pack jobs are not terminated gracefully. - https://phabricator.wikimedia.org/T372879#10077094 (Count_Count)
[11:27:10] Toolforge: Toolforge CNF build pack jobs are not terminated gracefully. - https://phabricator.wikimedia.org/T372879#10077107 (Count_Count) This works fine with non-custom images by using commands like this in the job definition: ` command: exec ./xlinks.sh `
[11:37:28] RESOLVED: InstanceDown: Project tools instance tools-prometheus-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[12:14:03] Quarry: deploy.sh shows warnings - https://phabricator.wikimedia.org/T372881 (fnegri) NEW
[12:25:51] PAWS: Remedy warning in install - https://phabricator.wikimedia.org/T372883 (rook) NEW
[12:25:59] Quarry: Remedy warning in install - https://phabricator.wikimedia.org/T372884 (rook) NEW
[12:27:40] PAWS: Remove warning message - https://phabricator.wikimedia.org/T372885 (rook) NEW
[12:27:44] Quarry: Remove warning message - https://phabricator.wikimedia.org/T372886 (rook) NEW
[12:28:00] PAWS, Quarry: deploy.sh shows warnings - https://phabricator.wikimedia.org/T372881#10077284 (rook)
[12:38:09] PAWS: Remove warning message - https://phabricator.wikimedia.org/T372885#10077308 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/446
[12:38:21] vivian-rook opened https://github.com/toolforge/paws/pull/446
[12:39:45] vivian-rook closed https://github.com/toolforge/paws/pull/446
[12:40:59] PAWS: Remove warning message - https://phabricator.wikimedia.org/T372885#10077315 (rook) Open→Resolved a:rook
[12:43:34] vivian-rook opened https://github.com/toolforge/quarry/pull/64
[12:47:46] Quarry: Remove warning message - https://phabricator.wikimedia.org/T372886#10077321 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/64
[12:47:55] vivian-rook closed https://github.com/toolforge/quarry/pull/64
[12:50:00] Quarry: Remove warning message - https://phabricator.wikimedia.org/T372886#10077322 (rook) Open→Resolved a:rook
[13:09:28] FIRING: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:15:46] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10077393 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fnegri@cumin1002 for host clouddb1014.eqiad.wmnet with OS bookworm
[13:29:28] RESOLVED: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:59:34] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10077628 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fnegri@cumin1002 for host clouddb1014.eqiad.wmnet with OS bookworm completed: - cl...
[14:05:11] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10077651 (fnegri)
[14:10:30] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10077676 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fnegri@cumin1002 for host clouddb1013.eqiad.wmnet with OS bookworm
[14:11:40] Cloud-VPS (Debian Buster Deprecation): Cloud VPS "etytree" project Buster deprecation - https://phabricator.wikimedia.org/T367529#10077675 (Andrew) Ester writes: > Hello, I would like the VM etytree-a to be kept as a copy, as I am planning an upgrade of the operating system, but I need the old copy as a r...
[14:13:56] Cloud-VPS (Debian Buster Deprecation), linkwatcher: Cloud VPS "linkwatcher" project Buster deprecation - https://phabricator.wikimedia.org/T367536#10077693 (Andrew) >>! In T367536#10076979, @Epantaleo wrote: > Hello, I would like the VM etytree-a to be kept as a copy, as I am planning an upgrade of the o...
[14:17:19] Data-Services, Wikimedia-Slow-DB-Query: [wikireplicas] Log very slow queries - https://phabricator.wikimedia.org/T372859#10077723 (Aklapper)
[14:18:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:10:28] FIRING: InstanceDown: Project tools instance tools-prometheus-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:26:05] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10078060 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fnegri@cumin1002 for host clouddb1013.eqiad.wmnet with OS bookworm completed: - cl...
[15:26:13] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10078065 (fnegri)
[15:40:28] RESOLVED: InstanceDown: Project tools instance tools-prometheus-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:50:24] Tools, Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10078183 (Htriedman) As of a few weeks ago I am no longer an employee of WMF, so feel free to change it over...
[16:14:30] vivian-rook opened https://github.com/toolforge/paws/pull/447
[16:16:02] vivian-rook closed https://github.com/toolforge/paws/pull/447
[16:16:52] PAWS: Remedy warning in install - https://phabricator.wikimedia.org/T372883#10078333 (rook) Open→Resolved a:rook
[16:20:06] vivian-rook opened https://github.com/toolforge/quarry/pull/65
[16:20:39] Quarry: Remedy warning in install - https://phabricator.wikimedia.org/T372884#10078379 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/quarry/pull/65
[16:22:28] FIRING: InstanceDown: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[16:26:05] vivian-rook closed https://github.com/toolforge/quarry/pull/65
[16:26:15] Quarry: Remedy warning in install - https://phabricator.wikimedia.org/T372884#10078440 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/65
[16:26:31] Quarry: Remedy warning in install - https://phabricator.wikimedia.org/T372884#10078441 (rook) Open→Resolved a:rook
[16:26:47] Tools, Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10078448 (Dzahn)
[16:28:41] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:30:41] PAWS, Quarry: deploy.sh shows warnings - https://phabricator.wikimedia.org/T372881#10078462 (rook) Open→Resolved
[16:30:42] Tools, Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10078464 (Dzahn) @Htriedman You have now been removed from the "wmf" LDAP group and added to the "nda" LDAP...
[16:47:28] RESOLVED: InstanceDown: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[16:47:28] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate coibot.linkwatcher.eqiad.wmflabs is about to expire in 18d 7h 32m 37s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[20:36:54] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Patch-For-Review: Remove or replace deployment-restbase04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370460#10079257 (Eevans) >>! In T370460#10070859, @Ee...
[21:10:28] FIRING: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[21:25:28] RESOLVED: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[21:26:14] (PS1) JHathaway: puppet8: add db_user [labs/private] - https://gerrit.wikimedia.org/r/1064113 (https://phabricator.wikimedia.org/T372664)
[21:26:44] (CR) JHathaway: "check experimental" [labs/private] - https://gerrit.wikimedia.org/r/1064113 (https://phabricator.wikimedia.org/T372664) (owner: JHathaway)
[21:28:03] (CR) JHathaway: "check experimental" [labs/private] - https://gerrit.wikimedia.org/r/1064113 (https://phabricator.wikimedia.org/T372664) (owner: JHathaway)
[21:30:28] FIRING: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[22:45:28] RESOLVED: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[22:54:59] (update) raymond-ndibe: [jobs-api] convert all quotas to appropriate units [repos/cloud/toolforge/jobs-api] (refactor_validate_kube_quant) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361120)