[00:01:41] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 25.35, 24.54, 23.29 [00:19:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.35, 21.58, 19.34 [00:21:56] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.20, 20.15, 19.09 [00:31:41] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.45, 20.84, 19.88 [00:36:03] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.51, 20.15, 18.59 [00:37:58] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 17.18, 19.06, 18.39 [00:39:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:39:27] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 19.80, 20.37, 20.04 [00:41:48] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.52, 20.89, 19.33 [00:44:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:45:37] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 17.08, 19.92, 19.41 [00:48:14] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.27, 20.66, 20.15 [00:50:11] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.44, 21.58, 20.53 [00:52:07] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.05, 20.63, 20.37 [01:01:50] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.40, 19.00, 20.00 [01:09:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 29.10, 23.77, 21.55 [01:13:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [01:13:31] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.58, 23.64, 22.13 [01:31:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.11, 19.45, 20.39 [01:31:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.29, 20.51, 18.53 [01:33:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.07, 22.04, 19.32 [01:35:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.24, 21.62, 21.06 [01:35:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.60, 21.49, 19.42 [01:45:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 17.47, 19.95, 19.72 [01:49:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.23, 23.14, 23.27 [02:13:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.52, 22.74, 22.20 [02:15:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.93, 22.38, 22.09 [02:19:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.69, 22.51, 22.15 [02:27:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.18, 21.45, 18.42 [02:31:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.15, 23.37, 23.43 [02:31:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 16.17, 21.26, 19.24 [02:33:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 14.15, 18.98, 18.66 [02:45:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.68, 22.22, 21.85 [02:47:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.18, 21.49, 
21.62 [02:49:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.46, 22.88, 22.13 [02:51:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.78, 21.06, 21.53 [02:57:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 19.67, 19.04, 20.35 [03:01:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.03, 22.91, 21.51 [03:01:24] RECOVERY - mon181 Backups Grafana on mon181 is OK: FILE_AGE OK: /var/log/grafana-backup.log is 65 seconds old and 93 bytes [03:02:30] PROBLEM - kagaga.jp - LetsEncrypt on sslhost is CRITICAL: No address associated with hostname HTTP CRITICAL - Unable to open TCP socket [03:05:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.90, 23.16, 22.08 [03:06:04] PROBLEM - kagaga.jp - reverse DNS on sslhost is WARNING: rDNS WARNING - reverse DNS entry for kagaga.jp could not be found [03:08:30] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:15:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.36, 18.38, 20.18 [03:21:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.54, 20.21, 20.47 [03:23:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.72, 22.11, 21.09 [03:26:09] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 26.57, 22.57, 19.30 [03:27:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.92, 23.50, 22.07 [03:27:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:31:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.31, 23.37, 22.27 [03:33:03] PROBLEM - mw181 Current Load on mw181 is WARNING: 
LOAD WARNING - total load average: 21.56, 23.21, 22.39 [03:35:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.17, 24.05, 22.82 [03:37:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.19, 23.31, 22.68 [03:39:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.67, 24.30, 23.15 [03:41:26] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.47, 22.56, 22.12 [03:45:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.31, 22.57, 22.87 [03:47:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.50, 23.65, 23.19 [03:47:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.67, 21.64, 21.48 [03:49:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.61, 22.98, 22.99 [03:49:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.25, 21.48, 21.46 [03:51:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.14, 24.02, 23.39 [03:52:25] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:53:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 15.14, 20.96, 22.37 [03:53:20] PROBLEM - wiki.gab.pt.eu.org - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - wiki.gab.pt.eu.org All nameservers failed to answer the query. 
[03:55:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 19.06, 19.08, 20.37 [04:07:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.26, 21.76, 21.54 [04:08:09] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 28.77, 23.54, 21.42 [04:10:03] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.87, 22.54, 21.30 [04:11:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.37, 22.30, 21.93 [04:17:42] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 15.22, 18.31, 19.89 [04:21:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.80, 22.88, 22.02 [04:22:11] RECOVERY - wiki.gab.pt.eu.org - reverse DNS on sslhost is OK: SSL OK - wiki.gab.pt.eu.org reverse DNS resolves to cp36.wikitide.net - CNAME OK [04:24:11] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [04:26:11] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 323 system event log (SEL) entries present] [04:31:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.43, 22.46, 22.82 [04:33:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.02, 22.85, 22.90 [04:35:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.27, 22.76, 22.87 [04:37:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.70, 23.25, 23.01 [04:39:03] PROBLEM - mw181 
Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.60, 22.04, 22.59 [04:42:25] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:45:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.33, 22.25, 22.40 [04:47:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.32, 22.71, 22.57 [04:55:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 16.65, 17.79, 20.28 [05:02:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:03:55] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.41, 3.20, 1.32 [05:05:56] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.90, 2.71, 1.36 [05:07:41] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 17.72, 20.59, 23.41 [05:09:55] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.26, 5.12, 2.66 [05:12:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:13:55] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.74, 3.30, 2.47 [05:16:37] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[05:17:41] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 13.26, 16.00, 20.10 [05:36:19] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [05:38:20] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 325 system event log (SEL) entries present] [05:45:08] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:14:58] PROBLEM - cloud18 Puppet on cloud18 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[ulogd2] [06:15:49] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[06:38:24] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [06:40:25] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 327 system event log (SEL) entries present] [06:40:58] RECOVERY - cloud18 Puppet on cloud18 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [06:45:46] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:46:10] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.19, 20.73, 18.99 [06:50:07] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 17.07, 19.21, 18.76 [07:24:41] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 25.64, 22.22, 19.54 [07:30:36] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.29, 22.42, 20.76 [07:32:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.36, 24.13, 21.57 [07:33:58] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 29.52, 24.00, 19.94 [07:36:31] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 16.44, 21.79, 21.31 [07:39:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 15.51, 22.84, 21.16 [07:40:28] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.37, 22.23, 21.51 [07:42:27] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.55, 21.00, 21.11 [07:44:25] PROBLEM - mw152 Current Load on 
mw152 is CRITICAL: LOAD CRITICAL - total load average: 26.81, 22.93, 21.78 [07:46:24] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 23.77, 23.69, 22.24 [07:50:21] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 9.25, 17.17, 20.05 [07:51:27] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 12.76, 17.61, 19.67 [08:05:07] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 26.99, 23.78, 21.23 [08:13:51] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.90, 21.32, 19.55 [08:21:37] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 16.94, 19.81, 19.71 [08:25:32] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.26, 23.75, 21.28 [08:29:25] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.25, 23.48, 21.77 [08:31:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:39:08] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.34, 23.14, 21.91 [08:41:04] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.02, 21.66, 21.50 [08:47:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 15.24, 18.30, 20.11 [08:49:43] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:50:37] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [08:51:42] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset 0.0003411769867 secs [08:52:37] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 330 system event log (SEL) entries present] [08:53:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.18, 21.31, 20.75 [08:55:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.68, 22.08, 21.12 [08:58:49] PROBLEM - cp27 Varnish Backends on cp27 is CRITICAL: 1 backends are down. 
mw152 [09:00:44] RECOVERY - cp27 Varnish Backends on cp27 is OK: All 19 backends are healthy [09:01:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.96, 22.73, 21.53 [09:05:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.86, 21.44, 21.25 [09:06:30] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [09:11:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 16.85, 19.07, 20.26 [09:17:14] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [09:21:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.93, 24.02, 21.91 [09:21:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.30, 19.16, 17.40 [09:23:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.59, 19.69, 17.82 [09:31:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 26.04, 22.18, 19.73 [09:33:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.48, 21.24, 19.68 [09:34:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [09:35:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.60, 23.55, 23.76 [09:35:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 19.73, 20.39, 19.53 [09:45:17] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:47:07] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ] [09:49:06] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset 0.0002382099628 secs [10:02:44] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [10:03:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 29.06, 23.47, 21.72 [10:04:44] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [331 system event log (SEL) entries present] [10:06:30] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 26.56, 20.93, 19.58 [10:07:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.59, 22.79, 21.99 [10:08:24] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 19.19, 19.90, 19.36 [10:13:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.33, 23.53, 22.26 [10:15:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.39, 22.89, 22.17 [10:17:24] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[git_pull_dns] [10:23:03] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.41, 18.77, 20.30 [10:29:30] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [10:35:47] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.55, 20.79, 19.92 [10:37:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 23.45, 21.70, 19.13 [10:37:44] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.52, 21.01, 20.10 [10:41:37] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 16.07, 19.36, 19.68 [10:44:08] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [10:45:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 18.39, 19.92, 19.55 [10:50:41] [puppet] redbluegreenhat created branch pywikibot-beebot - https://github.com/miraheze/puppet [10:50:43] [puppet] redbluegreenhat pushed 2 commits to pywikibot-beebot [+0/-0/±2] https://github.com/miraheze/puppet/compare/b7c7cfab6454^...6a8a88e508e8 [10:50:45] [puppet] redbluegreenhat b7c7cfa - Pywikibot: Switch accounts to BeeBot [10:50:47] [puppet] redbluegreenhat 6a8a88e - Remove Windows line endings [10:51:42] [puppet] redbluegreenhat opened pull request #3887: Pywikibot: Switch accounts to BeeBot & remove Windows line endings - https://github.com/miraheze/puppet/pull/3887 [10:51:50] [puppet] coderabbitai[bot] commented on pull request #3887: Pywikibot: Switch accounts to BeeBot & remove Windows line endings - https://github.com/miraheze/puppet/pull/3887#issuecomment-2250040087 [10:52:59] miraheze/puppet - redbluegreenhat the build passed. 
[10:54:19] [puppet] redbluegreenhat closed pull request #3887: Pywikibot: Switch accounts to BeeBot & remove Windows line endings - https://github.com/miraheze/puppet/pull/3887 [10:54:20] [puppet] redbluegreenhat pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/fe0398388a58...59e69cffb762 [10:54:22] [puppet] redbluegreenhat 59e69cf - Pywikibot: Switch accounts to BeeBot & remove Windows line endings (#3887) [10:54:25] [puppet] redbluegreenhat deleted branch pywikibot-beebot - https://github.com/miraheze/puppet [10:54:28] [puppet] redbluegreenhat deleted branch pywikibot-beebot [10:55:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.05, 21.37, 19.99 [10:55:41] miraheze/puppet - redbluegreenhat the build passed. [10:57:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 18.71, 20.32, 19.79 [11:04:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:05:55] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.40, 2.96, 1.34 [11:06:03] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 29.24, 24.19, 21.50 [11:07:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.78, 2.87, 1.49 [11:07:58] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.97, 22.64, 21.27 [11:09:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:09:55] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.86, 2.56, 1.53 [11:11:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:11:47] RECOVERY - mw151 Current Load on mw151 is OK: 
LOAD OK - total load average: 11.27, 18.18, 19.88 [11:12:54] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [11:14:55] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [332 system event log (SEL) entries present] [11:16:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:16:36] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [11:43:52] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [11:53:20] PROBLEM - wiki.gab.pt.eu.org - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - wiki.gab.pt.eu.org All nameservers failed to answer the query. [12:11:57] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.39, 21.54, 19.46 [12:12:03] [CreateWiki] translatewiki pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/CreateWiki/compare/208bb2b96ad9...0e1b3a9cd50e [12:12:04] [CreateWiki] translatewiki 0e1b3a9 - Localisation updates from https://translatewiki.net. [12:12:05] [ErrorPages] translatewiki pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ErrorPages/compare/9de2e8d1ece2...7366fa181abe [12:12:07] [ErrorPages] translatewiki 7366fa1 - Localisation updates from https://translatewiki.net. 
[12:12:10] [GlobalNewFiles] translatewiki pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/GlobalNewFiles/compare/f5b4c0204b2c...a874f9855593 [12:12:12] [GlobalNewFiles] translatewiki a874f98 - Localisation updates from https://translatewiki.net. [12:12:14] [ImportDump] translatewiki pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ImportDump/compare/41a5c6de5575...3efbd8731e06 [12:12:17] [ImportDump] translatewiki 3efbd87 - Localisation updates from https://translatewiki.net. [12:12:19] [IncidentReporting] translatewiki pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/IncidentReporting/compare/4ced7ddde652...822a1a7b2c3e [12:12:21] [IncidentReporting] translatewiki 822a1a7 - Localisation updates from https://translatewiki.net. [12:12:22] [DataDump] translatewiki pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/DataDump/compare/5eb641ab8c8a...afe0c3712afa [12:12:23] [DataDump] translatewiki afe0c37 - Localisation updates from https://translatewiki.net. [12:12:25] [MatomoAnalytics] translatewiki pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/MatomoAnalytics/compare/bbb2138eb379...700e0db6b397 [12:12:28] [MatomoAnalytics] translatewiki 700e0db - Localisation updates from https://translatewiki.net. [12:12:31] [landing] translatewiki pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/landing/compare/dc12b1ddf728...ef8564e6e666 [12:12:33] [landing] translatewiki ef8564e - Localisation updates from https://translatewiki.net. [12:12:36] [PDFEmbed] translatewiki pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/PDFEmbed/compare/19f6204a0af3...777072a9f6d8 [12:12:38] [PDFEmbed] translatewiki 777072a - Localisation updates from https://translatewiki.net. 
[12:12:43] [RequestSSL] translatewiki pushed 1 commit to master [+1/-0/±1] https://github.com/miraheze/RequestSSL/compare/e366378a9539...ca36e1069bf0 [12:12:45] [RequestSSL] translatewiki ca36e10 - Localisation updates from https://translatewiki.net. [12:12:46] [MirahezeMagic] translatewiki pushed 1 commit to master [+0/-0/±4] https://github.com/miraheze/MirahezeMagic/compare/7592532880b6...6dc7f6fe84cb [12:12:47] [MirahezeMagic] translatewiki 6dc7f6f - Localisation updates from https://translatewiki.net. [12:12:49] [RemovePII] translatewiki pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/RemovePII/compare/18114922594f...63e58cb9df9e [12:12:52] [RemovePII] translatewiki 63e58cb - Localisation updates from https://translatewiki.net. [12:12:55] [RottenLinks] translatewiki pushed 1 commit to master [+0/-0/±2] https://github.com/miraheze/RottenLinks/compare/1b34956fc390...b2a23a38f2ea [12:12:57] [RottenLinks] translatewiki b2a23a3 - Localisation updates from https://translatewiki.net. [12:12:58] [SpriteSheet] translatewiki pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/SpriteSheet/compare/90e10b30d752...2d68df532fa3 [12:12:59] [SpriteSheet] translatewiki 2d68df5 - Localisation updates from https://translatewiki.net. [12:13:00] [WikiDiscover] translatewiki pushed 1 commit to master [+0/-0/±2] https://github.com/miraheze/WikiDiscover/compare/364965e15e4f...d2c71a911108 [12:13:02] [WikiDiscover] translatewiki d2c71a9 - Localisation updates from https://translatewiki.net. [12:15:10] miraheze/landing - translatewiki the build passed. [12:15:17] miraheze/ErrorPages - translatewiki the build passed. [12:15:32] miraheze/CreateWiki - translatewiki the build has errored. [12:15:47] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 15.89, 19.65, 19.24 [12:15:51] miraheze/RequestSSL - translatewiki the build has errored. 
[12:16:20] miraheze/MirahezeMagic - translatewiki the build has errored. [12:16:51] miraheze/GlobalNewFiles - translatewiki the build passed. [12:17:06] miraheze/DataDump - translatewiki the build passed. [12:17:10] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [12:17:14] miraheze/IncidentReporting - translatewiki the build passed. [12:17:16] miraheze/MatomoAnalytics - translatewiki the build passed. [12:17:23] miraheze/PDFEmbed - translatewiki the build passed. [12:18:57] miraheze/SpriteSheet - translatewiki the build has errored. [12:19:10] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [333 system event log (SEL) entries present] [12:19:51] miraheze/RemovePII - translatewiki the build has errored. [12:20:02] miraheze/RottenLinks - translatewiki the build passed. [12:20:26] miraheze/WikiDiscover - translatewiki the build passed. [12:20:29] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.59, 21.01, 19.37 [12:21:35] miraheze/ImportDump - translatewiki the build passed. 
[12:22:11] RECOVERY - wiki.gab.pt.eu.org - reverse DNS on sslhost is OK: SSL OK - wiki.gab.pt.eu.org reverse DNS resolves to cp36.wikitide.net - CNAME OK [12:23:36] !log [@test151] starting deploy of {'folders': '1.42/extensions/MirahezeMagic'} to test151 [12:23:37] !log [@test151] finished deploy of {'folders': '1.42/extensions/MirahezeMagic'} to test151 - SUCCESS in 0s [12:23:42] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:23:48] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:23:51] !log [@test151] starting deploy of {'folders': '1.43/extensions/MirahezeMagic'} to test151 [12:23:52] !log [@test151] finished deploy of {'folders': '1.43/extensions/MirahezeMagic'} to test151 - SUCCESS in 0s [12:23:59] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:24:06] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:24:23] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.63, 22.07, 20.14 [12:25:30] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ] [12:26:19] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.79, 23.38, 20.90 [12:27:30] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.0002585947514 secs [12:30:09] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 27.90, 22.35, 20.04 [12:30:12] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.63, 24.06, 21.58 [12:32:03] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 18.72, 20.90, 19.80 [12:32:08] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.54, 23.91, 21.89 [12:33:04] !log [@mwtask181] starting deploy of {'folders': '1.42/extensions/MirahezeMagic'} 
to all [12:33:13] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:33:17] !log [@mwtask181] finished deploy of {'folders': '1.42/extensions/MirahezeMagic'} to all - SUCCESS in 13s [12:33:27] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:34:05] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.72, 24.50, 22.34 [12:36:01] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.17, 22.65, 21.92 [12:37:47] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 17.80, 19.33, 19.57 [12:38:04] !log [@mwtask171] starting deploy of {'folders': '1.42/extensions/MirahezeMagic'} to all [12:38:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:38:16] !log [@mwtask171] finished deploy of {'folders': '1.42/extensions/MirahezeMagic'} to all - SUCCESS in 11s [12:38:21] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:45:44] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.58, 19.02, 20.20 [12:48:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 18.53, 20.68, 19.97 [12:49:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:49:39] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.87, 21.62, 21.03 [12:50:13] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 18.40, 20.12, 19.86 [12:51:36] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 14.97, 19.05, 20.16 [13:07:26] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.57, 20.21, 19.16 [13:09:21] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 19.43, 19.42, 18.97 [13:10:28] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:12:27] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset 0.0003176033497 secs [13:13:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.65, 20.57, 19.51 [13:15:55] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 15.35, 18.55, 18.89 [13:31:30] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.52, 22.95, 20.26 [13:33:26] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.71, 22.25, 20.34 [13:37:19] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.61, 20.26, 19.97 [13:50:00] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.64, 21.55, 20.27 [13:51:31] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.01, 20.40, 18.29 [13:54:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [13:55:20] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 17.72, 19.99, 18.68 [13:55:50] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.15, 22.71, 21.41 [14:03:36] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 14.40, 18.17, 19.98 [14:04:30] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [14:13:41] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 16.55, 18.94, 23.44 [14:21:41] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.79, 21.82, 22.55 [14:24:59] !log [oa@mwtask181] starting deploy of {'l10n': True, 'versions': '1.42', 'upgrade_extensions': 'MirahezeMagic'} to all [14:25:06] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [14:25:07] !log [oa@mwtask181] DEPLOY ABORTED: Non-Zero Exit Code in prep, see output. [14:25:14] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [14:28:13] !log [oa@mwtask181] starting deploy of {'world': True, 'versions': '1.42', 'upgrade_extensions': 'HeaderTabs'} to all [14:28:18] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [14:28:35] hmm headertabs seems to be already up to date, must be a bug [14:29:41] !log [oa@mwtask181] finished deploy of {'world': True, 'versions': '1.42', 'upgrade_extensions': 'HeaderTabs'} to all - SUCCESS in 87s [14:29:46] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [14:31:37] is ManageWiki deployed yet? 
[14:31:41] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 17.73, 21.63, 23.11 [14:35:42] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.83, 19.96, 18.38 [14:37:37] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 16.32, 18.96, 18.23 [14:41:41] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 13.26, 16.62, 20.19 [14:54:30] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [15:15:43] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [15:17:46] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ] [15:19:45] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.0002893209457 secs [15:58:45] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:00:44] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset 0.000480234623 secs [16:04:37] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 29.09, 20.08, 15.79 [16:31:35] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 1 backends are down. mw152 [16:33:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [16:33:30] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy [16:33:32] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 23.46, 22.11, 19.17 [16:33:53] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.80, 21.55, 19.72 [16:41:39] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 19.21, 20.27, 19.86 [16:44:38] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:45:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 16.24, 19.28, 19.48 [16:53:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [16:53:21] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.86, 20.47, 19.86 [16:57:14] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.99, 20.13, 19.93 [16:57:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.14, 21.96, 20.36 [17:01:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.49, 21.00, 20.29 [17:01:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.87, 22.62, 21.12 [17:03:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.29, 22.95, 21.35 [17:05:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.08, 23.28, 21.31 [17:06:16] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [17:07:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.58, 23.68, 21.72 [17:07:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 11.84, 19.75, 20.71 [17:09:03] 
PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.84, 24.82, 22.37 [17:11:16] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [17:15:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 20.31, 19.14, 20.15 [17:21:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.61, 21.05, 20.70 [17:23:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.47, 22.08, 21.10 [17:25:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.94, 21.21, 20.90 [17:29:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.66, 22.81, 23.98 [17:31:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 18.01, 19.55, 20.24 [17:35:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.83, 21.50, 20.83 [17:37:19] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.26, 22.49, 21.28 [17:39:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.08, 22.15, 22.96 [17:39:19] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 18.12, 21.01, 20.91 [17:41:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.61, 22.00, 22.76 [17:41:19] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 15.11, 19.37, 20.35 [17:43:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.30, 22.90, 23.01 [17:45:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.30, 22.32, 22.79 [17:45:28] [puppet] redbluegreenhat created branch T11704 - https://github.com/miraheze/puppet [17:45:31] [puppet] redbluegreenhat pushed 1 commit to T11704 
[+0/-0/±1] https://github.com/miraheze/puppet/commit/4381a5639a65 [17:45:32] [puppet] redbluegreenhat 4381a56 - T11704: Install espeak and LAME on the MediaWiki appservers [17:47:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.71, 23.83, 23.28 [17:50:09] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.97, 20.23, 20.18 [17:50:29] [puppet] redbluegreenhat opened pull request #3888: T11704: Install espeak and LAME on the MediaWiki appservers - https://github.com/miraheze/puppet/pull/3888 [17:50:33] [puppet] coderabbitai[bot] commented on pull request #3888: T11704: Install espeak and LAME on the MediaWiki appservers - https://github.com/miraheze/puppet/pull/3888#issuecomment-2251077411 [17:51:22] miraheze/puppet - redbluegreenhat the build has errored. [17:51:44] [puppet] dependabot[bot] created branch dependabot/bundler/modules/graylog/rexml-3.3.2 - https://github.com/miraheze/puppet [17:51:46] [puppet] dependabot[bot] pushed 1 commit to dependabot/bundler/modules/graylog/rexml-3.3.2 [+0/-0/±1] https://github.com/miraheze/puppet/commit/2ca1017a42f6 [17:51:49] [puppet] dependabot[bot] 2ca1017 - build(deps-dev): bump rexml from 3.2.8 to 3.3.2 in /modules/graylog [17:51:52] [puppet] dependabot[bot] opened pull request #3889: build(deps-dev): bump rexml from 3.2.8 to 3.3.2 in /modules/graylog - https://github.com/miraheze/puppet/pull/3889 [17:51:54] [puppet] dependabot[bot] labeled pull request #3889: build(deps-dev): bump rexml from 3.2.8 to 3.3.2 in /modules/graylog - https://github.com/miraheze/puppet/pull/3889 [17:51:57] [puppet] dependabot[bot] labeled pull request #3889: build(deps-dev): bump rexml from 3.2.8 to 3.3.2 in /modules/graylog - https://github.com/miraheze/puppet/pull/3889 [17:51:58] [puppet] coderabbitai[bot] commented on pull request #3889: build(deps-dev): bump rexml from 3.2.8 to 3.3.2 in 
/modules/graylog - https://github.com/miraheze/puppet/pull/3889#issuecomment-2251079432 [17:52:03] [puppet] redbluegreenhat pushed 1 commit to T11704 [+0/-0/±1] https://github.com/miraheze/puppet/compare/4381a5639a65...6954d0b69d3a [17:52:06] [puppet] redbluegreenhat 6954d0b - spaces [17:52:08] [puppet] redbluegreenhat synchronize pull request #3888: T11704: Install espeak and LAME on the MediaWiki appservers - https://github.com/miraheze/puppet/pull/3888 [17:53:44] [puppet] redbluegreenhat commented on pull request #3889: build(deps-dev): bump rexml from 3.2.8 to 3.3.2 in /modules/graylog - https://github.com/miraheze/puppet/pull/3889#issuecomment-2251082708 [17:53:48] [puppet] dependabot[bot] closed pull request #3889: build(deps-dev): bump rexml from 3.2.8 to 3.3.2 in /modules/graylog - https://github.com/miraheze/puppet/pull/3889 [17:53:51] [puppet] dependabot[bot] pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/59e69cffb762...dd67cf10bd35 [17:53:53] [puppet] dependabot[bot] dd67cf1 - build(deps-dev): bump rexml from 3.2.8 to 3.3.2 in /modules/graylog (#3889) [17:53:55] [puppet] dependabot[bot] deleted branch dependabot/bundler/modules/graylog/rexml-3.3.2 - https://github.com/miraheze/puppet [17:53:56] [puppet] dependabot[bot] deleted branch dependabot/bundler/modules/graylog/rexml-3.3.2 [17:53:58] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 16.71, 19.06, 19.77 [17:57:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.99, 23.76, 23.92 [17:57:55] PROBLEM - wiki.kingmingshrimp.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.kingmingshrimp.com' expires in 15 day(s) (Sat 10 Aug 2024 05:28:27 PM GMT +0000). 
[17:58:07] [ssl] WikiTideSSLBot pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ssl/compare/f599e48daa06...8562d014ab35 [17:58:09] [ssl] WikiTideSSLBot 8562d01 - Bot: Update SSL cert for wiki.kingmingshrimp.com [17:59:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.95, 24.39, 24.11 [18:03:03] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.38, 23.49, 23.81 [18:11:03] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.41, 24.10, 23.76 [18:15:07] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H