[00:07:29] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:12:29] RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:16:28] FIRING: InstanceDown: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:21:28] RESOLVED: InstanceDown: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:36:55] FIRING: MaxConntrack: Max conntrack at 80.04% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:41:55] RESOLVED: MaxConntrack: Max conntrack at 80.08% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[07:58:48] Toolforge: disable-tool trying to archive toolsbeta accounts on tools NFS server - https://phabricator.wikimedia.org/T372701#10071492 (taavi)
[09:21:38] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[09:34:46] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29254 bytes in 7.099 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[10:55:52] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[10:57:44] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29253 bytes in 1.031 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[13:11:14] (open) raymond-ndibe: [jobs-api] refactor validate_kube_quant [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/118 (https://phabricator.wikimedia.org/T361120)
[21:10:28] FIRING: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[22:10:28] RESOLVED: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[23:12:28] FIRING: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[23:17:28] RESOLVED: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[23:27:02] Toolforge (Toolforge iteration 14): Possible error in jobs and cronjobs quotas in maintain-kubeusers - https://phabricator.wikimedia.org/T372720 (Raymond_Ndibe) NEW
[23:41:11] (open) raymond-ndibe: [jobs-api] refactor validate_kube_quant [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361120)
[23:41:17] (update) raymond-ndibe: [jobs-api] refactor validate_kube_quant [repos/cloud/toolforge/jobs-api] (refactor_validate_kube_quant) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361120)
[23:55:32] Toolforge (Toolforge iteration 14): Possible error in jobs and cronjobs quotas in maintain-kubeusers - https://phabricator.wikimedia.org/T372720#10071887 (JJMC89) The current quota is fine. 50 cronjobs can be scheduled but only 15 jobs can run concurrently.