[00:00:47] RECOVERY - Check systemd state on centrallog1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:01:07] RECOVERY - Check systemd state on centrallog2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:01:45] (SystemdUnitFailed) firing: (2) man-db.service Failed on relforge1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:38:48] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/981433
[00:38:54] (CR) TrainBranchBot: [C: +2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/981433 (owner: TrainBranchBot)
[00:41:13] PROBLEM - Check systemd state on centrallog2002 is CRITICAL: CRITICAL - degraded: The following units failed: logrotate.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:42:19] PROBLEM - Check systemd state on centrallog1002 is CRITICAL: CRITICAL - degraded: The following units failed: logrotate.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:44:31] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 7 day(s) (Sun 17 Dec 2023 03:07:37 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:45:59] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:58:54] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/981433 (owner: TrainBranchBot)
[01:33:59] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_event.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:15:06] (CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance cloudelastic1006-cloudelastic-psi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[02:39:09] (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:00:28] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[03:09:09] (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:29:28] (DiskSpace) firing: Disk space relforge1003:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=relforge1003 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[03:36:53] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 6 day(s) (Sun 17 Dec 2023 03:07:37 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[03:38:23] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[03:58:51] PROBLEM - BGP status on cr2-drmrs is CRITICAL: BGP CRITICAL - ASunknown/IPv4: Idle https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[04:01:46] (SystemdUnitFailed) firing: (2) man-db.service Failed on relforge1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:15:06] (CirrusSearchHighOldGCFrequency) resolved: Elasticsearch instance cloudelastic1006-cloudelastic-psi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[04:22:51] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 6 day(s) (Sun 17 Dec 2023 03:07:37 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:24:21] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[07:00:28] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[07:29:29] (DiskSpace) firing: Disk space relforge1003:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=relforge1003 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[07:39:31] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 6 day(s) (Sun 17 Dec 2023 03:07:37 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[07:41:01] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[08:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231210T0800)
[08:02:02] (SystemdUnitFailed) firing: (2) man-db.service Failed on relforge1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:10:13] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 6 day(s) (Sun 17 Dec 2023 03:07:37 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:11:43] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:51:47] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 6 day(s) (Sun 17 Dec 2023 03:07:37 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:54:47] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[11:05:07] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[11:29:29] (DiskSpace) firing: Disk space relforge1003:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=relforge1003 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[12:02:02] (SystemdUnitFailed) firing: (2) man-db.service Failed on relforge1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:18:45] (Primary outbound port utilisation over 80% #page) firing: Alert for device cr2-eqiad.wikimedia.org - Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[12:18:45] (Primary outbound port utilisation over 80% #page) firing: Alert for device cr2-eqiad.wikimedia.org - Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[12:20:20] here-ish
[12:23:10] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[12:23:45] (Primary outbound port utilisation over 80% #page) resolved: Device cr2-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[12:23:45] (Primary outbound port utilisation over 80% #page) resolved: Device cr2-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[12:53:09] (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[14:39:09] (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:54:09] (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:05:07] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[15:29:30] (DiskSpace) firing: Disk space relforge1003:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=relforge1003 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[15:38:49] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 6 day(s) (Sun 17 Dec 2023 03:07:37 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[15:41:47] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[15:46:17] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 6 day(s) (Sun 17 Dec 2023 03:07:37 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[15:47:45] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:02:03] (SystemdUnitFailed) firing: (2) man-db.service Failed on relforge1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:50:09] (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 49.81% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[16:51:08] SRE-swift-storage, UploadWizard: Internal error: The server could not save the temporary file - https://phabricator.wikimedia.org/T353068 (Ahonc) Similar issue with PDF files over 100MB: I got error after trying to upload temporary file: {"errors":[{"code":"stashfailed","html":"Внутрішня помилка: сервер...
[16:55:09] (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 49.81% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[17:08:17] (PS1) Andrew Bogott: nova vendor-data: more puppet-switching attempts [puppet] - https://gerrit.wikimedia.org/r/981669 (https://phabricator.wikimedia.org/T326818)
[17:08:35] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/981669 (https://phabricator.wikimedia.org/T326818) (owner: Andrew Bogott)
[17:28:48] (PS2) Andrew Bogott: nova vendor-data: more puppet-switching attempts [puppet] - https://gerrit.wikimedia.org/r/981669 (https://phabricator.wikimedia.org/T326818)
[17:29:01] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/981669 (https://phabricator.wikimedia.org/T326818) (owner: Andrew Bogott)
[17:39:13] (PS3) Andrew Bogott: nova vendor-data: more puppet-switching attempts [puppet] - https://gerrit.wikimedia.org/r/981669 (https://phabricator.wikimedia.org/T326818)
[17:42:27] (CR) Andrew Bogott: [C: +2] nova vendor-data: more puppet-switching attempts [puppet] - https://gerrit.wikimedia.org/r/981669 (https://phabricator.wikimedia.org/T326818) (owner: Andrew Bogott)
[17:49:09] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[17:54:09] (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[19:05:29] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[19:29:30] (DiskSpace) firing: Disk space relforge1003:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=relforge1003 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[19:43:16] (PS1) Andrew Bogott: vendordata: don't specify puppet server on first run [puppet] - https://gerrit.wikimedia.org/r/981676 (https://phabricator.wikimedia.org/T326818)
[19:45:03] (CR) Andrew Bogott: [C: +2] vendordata: don't specify puppet server on first run [puppet] - https://gerrit.wikimedia.org/r/981676 (https://phabricator.wikimedia.org/T326818) (owner: Andrew Bogott)
[19:49:04] (CR) Gergő Tisza: Add virtual domain for botpasswords (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/976787 (https://phabricator.wikimedia.org/T351559) (owner: Ladsgroup)
[20:02:03] (SystemdUnitFailed) firing: (2) man-db.service Failed on relforge1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:15:51] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 6 day(s) (Sun 17 Dec 2023 03:07:37 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:17:56] (PS1) Andrew Bogott: nova: move the puppet cert cleanup from vendordata to wmcs-image-create [puppet] - https://gerrit.wikimedia.org/r/981677 (https://phabricator.wikimedia.org/T326818)
[20:18:30] (CR) CI reject: [V: -1] nova: move the puppet cert cleanup from vendordata to wmcs-image-create [puppet] - https://gerrit.wikimedia.org/r/981677 (https://phabricator.wikimedia.org/T326818) (owner: Andrew Bogott)
[20:18:51] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:19:28] (PS2) Andrew Bogott: nova: move the puppet cert cleanup from vendordata to wmcs-image-create [puppet] - https://gerrit.wikimedia.org/r/981677 (https://phabricator.wikimedia.org/T326818)
[20:20:12] (CR) Andrew Bogott: [C: +2] nova: move the puppet cert cleanup from vendordata to wmcs-image-create [puppet] - https://gerrit.wikimedia.org/r/981677 (https://phabricator.wikimedia.org/T326818) (owner: Andrew Bogott)
[20:23:19] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 6 day(s) (Sun 17 Dec 2023 03:07:37 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:27:49] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:39:28] (CR) Pols12: Make wiktionary and mw.org provide og:site_name (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/981636 (https://phabricator.wikimedia.org/T348203) (owner: Pols12)
[21:06:31] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 6 day(s) (Sun 17 Dec 2023 03:07:37 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[21:07:59] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[21:56:55] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 6 day(s) (Sun 17 Dec 2023 03:07:37 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[21:58:25] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[22:09:46] (CR) Andrew Bogott: [C: +2] keystone fernet_keys: remove old absent section [puppet] - https://gerrit.wikimedia.org/r/980893 (owner: Andrew Bogott)
[22:48:55] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 6 day(s) (Sun 17 Dec 2023 03:07:37 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[22:50:25] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[23:03:51] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Certificate lists.wikimedia.org expires in 6 day(s) (Sun 17 Dec 2023 03:07:37 AM GMT +0000). https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[23:05:21] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[23:05:29] (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[23:34:14] (DiskSpace) firing: Disk space relforge1003:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=relforge1003 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace