[00:00:50] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.17, 3.56, 3.91 [00:02:52] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.74, 3.20, 3.74 [00:04:51] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.92, 4.17, 4.03 [00:06:51] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.04, 3.66, 3.87 [00:08:51] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.25, 4.70, 4.22 [00:11:57] [puppet] The-Voidwalker pushed 1 commit to patch-mwpatches [+0/-0/±1] https://github.com/miraheze/puppet/compare/b4b2162f923f...29ee047b4a1a [00:11:58] [puppet] The-Voidwalker 29ee047 - resolve one TODO, remove other [00:12:01] [puppet] The-Voidwalker synchronize pull request #3885: mwdeploy.py: add support for local patches - https://github.com/miraheze/puppet/pull/3885 [00:13:13] miraheze/puppet - The-Voidwalker the build has errored. [00:14:50] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.45, 3.67, 3.96 [00:19:19] [mw-config] OAuthority synchronize pull request #5631: Enable edit recovery default - https://github.com/miraheze/mw-config/pull/5631 [00:20:24] miraheze/mw-config - OAuthority the build passed. 
[00:20:29] [mw-config] OAuthority closed pull request #5631: Enable edit recovery default - https://github.com/miraheze/mw-config/pull/5631 [00:20:30] [mw-config] OAuthority pushed 1 commit to master [+0/-0/±2] https://github.com/miraheze/mw-config/compare/ad8c7c2317a1...e89395fcc7c0 [00:20:32] [mw-config] pixDeVl e89395f - Enable edit recovery default (#5631) [00:20:35] yaaaaaay [00:20:41] thanks OA [00:20:50] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.46, 2.66, 3.39 [00:21:24] miraheze/mw-config - OAuthority the build passed. [00:23:51] !log [@test151] starting deploy of {'config': True} to test151 [00:23:52] !log [@test151] finished deploy of {'config': True} to test151 - SUCCESS in 0s [00:23:58] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:24:04] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:24:23] just got deployed [00:24:32] lemme see rq [00:24:52] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.90, 19.43, 17.51 [00:24:56] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 8.43, 4.74, 3.97 [00:25:24] uuuh not seeing it [00:25:26] you? [00:25:33] [puppet] The-Voidwalker pushed 1 commit to patch-mwpatches [+0/-0/±1] https://github.com/miraheze/puppet/compare/29ee047b4a1a...cbe95f614972 [00:25:36] [puppet] The-Voidwalker cbe95f6 - use try around open for CI [00:25:37] [puppet] The-Voidwalker synchronize pull request #3885: mwdeploy.py: add support for local patches - https://github.com/miraheze/puppet/pull/3885 [00:25:42] what am I looking for [00:25:51] potentially hasn't gone round to mw appservers yet and is just on test? [00:26:49] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.70, 18.79, 17.51 [00:27:02] miraheze/puppet - The-Voidwalker the build has errored. 
[00:32:03] !log [@mwtask181] starting deploy of {'config': True} to all [00:32:08] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:32:15] !log [@mwtask181] finished deploy of {'config': True} to all - SUCCESS in 11s [00:32:21] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:32:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:34:32] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.08, 23.10, 19.80 [00:37:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:37:46] !log [@mwtask171] starting deploy of {'config': True} to all [00:37:51] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:38:01] !log [@mwtask171] finished deploy of {'config': True} to all - SUCCESS in 15s [00:38:13] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:40:45] [puppet] The-Voidwalker pushed 1 commit to patch-mwpatches [+0/-0/±1] https://github.com/miraheze/puppet/compare/cbe95f614972...10c5688ca69d [00:40:47] [puppet] The-Voidwalker 10c5688 - add user agent to requests headers [00:40:49] [puppet] The-Voidwalker synchronize pull request #3885: mwdeploy.py: add support for local patches - https://github.com/miraheze/puppet/pull/3885 [00:42:06] miraheze/puppet - The-Voidwalker the build has errored. 
[00:42:19] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.36, 22.77, 21.41 [00:42:51] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.30, 3.38, 3.87 [00:43:07] [puppet] The-Voidwalker pushed 1 commit to patch-mwpatches [+0/-0/±1] https://github.com/miraheze/puppet/compare/10c5688ca69d...06a0af3e5b55 [00:43:10] [puppet] The-Voidwalker 06a0af3 - >:( [00:43:11] [puppet] The-Voidwalker synchronize pull request #3885: mwdeploy.py: add support for local patches - https://github.com/miraheze/puppet/pull/3885 [00:44:26] miraheze/puppet - The-Voidwalker the build has errored. [00:44:51] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 3.86, 3.91, 4.02 [00:47:13] > potentially hasn't gone round to mw appservers yet and is just on test? [00:47:13] Seems so [00:47:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:47:34] I see the button on dev [00:47:36] checking meta [00:48:34] and shoo off icinga [00:48:57] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [00:50:09] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 14.01, 17.91, 19.74 [00:50:57] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [00:51:02] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. 
[00:52:20] confirmed working on meta [00:52:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:53:08] Looks like it just got pushed to production [00:56:39] [puppet] The-Voidwalker pushed 1 commit to patch-mwpatches [+0/-0/±1] https://github.com/miraheze/puppet/compare/06a0af3e5b55...d9b69a96ebde [00:56:40] [puppet] The-Voidwalker d9b69a9 - append -> extend [00:56:43] [puppet] The-Voidwalker synchronize pull request #3885: mwdeploy.py: add support for local patches - https://github.com/miraheze/puppet/pull/3885 [00:56:52] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [00:57:25] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.079 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [00:58:12] miraheze/puppet - The-Voidwalker the build passed. [01:02:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [01:02:27] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 52 packages available for upgrade (0 critical updates). [01:03:11] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:05:39] miraheze/puppet - The-Voidwalker the build passed. [01:05:42] miraheze/puppet - redbluegreenhat the build passed. 
[01:07:24] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [01:07:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [01:20:51] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.20, 2.78, 3.90 [01:22:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [01:22:50] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.78, 4.35, 4.34 [01:26:51] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.42, 3.19, 3.92 [01:27:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [01:30:57] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.33, 4.01, 4.04 [01:32:54] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.88, 3.30, 3.77 [01:33:00] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.91, 20.71, 19.37 [01:34:51] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.51, 4.22, 4.06 [01:34:56] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.76, 19.76, 19.20 [01:36:50] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.32, 3.71, 3.91 
[01:40:50] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.88, 4.91, 4.28 [01:48:51] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.62, 3.50, 3.84 [01:52:51] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.60, 4.01, 3.94 [01:58:51] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.80, 3.77, 3.95 [01:59:25] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [02:00:50] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 3.17, 3.95, 4.01 [02:01:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present] [02:02:50] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.96, 3.82, 3.94 [02:08:51] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.42, 2.44, 3.30 [02:22:25] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [02:26:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [02:32:30] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.53, 4.56, 3.66 [02:38:17] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - 
total load average: 2.68, 3.35, 3.47 [02:40:14] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.42, 2.84, 3.25 [02:41:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [02:43:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [02:46:06] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.93, 3.44, 3.42 [02:48:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.09, 4.13, 3.67 [02:48:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [02:51:17] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [02:51:25] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [02:53:17] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 52 packages available for upgrade (0 critical updates). 
[02:55:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [02:56:33] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.25, 21.74, 18.71 [02:56:58] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [02:59:27] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 14 minutes ago with 0 failures [03:00:20] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:01:41] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [03:03:38] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present] [03:05:20] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:05:33] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 7.473 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [03:10:20] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:12:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.19, 22.93, 22.91 [03:16:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:21:30] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:22:09] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 13.04, 16.69, 19.92 [03:22:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:27:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:29:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:34:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:34:50] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.07, 3.46, 3.96 [03:40:50] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.47, 3.48, 3.75 [03:43:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:45:40] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:46:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.48, 19.94, 17.89 [03:47:38] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 
(protocol 2.0) [03:48:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:49:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:51:29] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [03:52:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.89, 23.86, 20.38 [03:53:25] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [03:53:31] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 2.324 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [03:54:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:55:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present] [04:00:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.22, 24.06, 21.69 [04:01:24] What does icinga do anyway [04:01:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:02:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.74, 23.00, 21.61 [04:08:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.78, 23.86, 22.29 [04:08:18] !log [void@puppet181] Upgraded packages openjdk-17-jdk, openjdk-17-jdk-headless, openjdk-17-jre, and openjdk-17-jre-headless on graylog161 [04:08:24] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:08:37] Does every error from these bots mean something big? [04:08:47] !log [void@puppet181] Upgraded packages openjdk-17-jre, and openjdk-17-jre-headless on kafka181 [04:08:50] RECOVERY - graylog161 APT on graylog161 is OK: APT OK: 56 packages available for upgrade (0 critical updates). [04:08:52] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:09:09] RECOVERY - kafka181 APT on kafka181 is OK: APT OK: 38 packages available for upgrade (0 critical updates). 
[04:10:15] !log [void@puppet181] Upgraded packages openjdk-17-jdk, openjdk-17-jdk-headless, openjdk-17-jre, and openjdk-17-jre-headless on os151 [04:10:20] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:10:40] !log [void@puppet181] Upgraded packages openjdk-17-jdk, openjdk-17-jdk-headless, openjdk-17-jre, and openjdk-17-jre-headless on os162 [04:10:46] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:10:51] RECOVERY - os151 APT on os151 is OK: APT OK: 54 packages available for upgrade (0 critical updates). [04:11:20] !log [void@puppet181] Upgraded packages openjdk-17-jdk, openjdk-17-jdk-headless, openjdk-17-jre, and openjdk-17-jre-headless on os161 [04:11:29] RECOVERY - os161 APT on os161 is OK: APT OK: 53 packages available for upgrade (0 critical updates). [04:11:33] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:11:52] !log [void@puppet181] Upgraded packages openjdk-17-jdk, openjdk-17-jdk-headless, openjdk-17-jre, and openjdk-17-jre-headless on puppet181 [04:11:53] RECOVERY - puppet181 APT on puppet181 is OK: APT OK: 61 packages available for upgrade (0 critical updates). [04:11:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:12:16] RECOVERY - os162 APT on os162 is OK: APT OK: 54 packages available for upgrade (0 critical updates). [04:16:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:17:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:18:53] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [04:20:52] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.070 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [04:22:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:24:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:34:20] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:44:20] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:45:25] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [04:45:33] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.33, 21.86, 18.90 [04:46:36] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:47:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present] [04:51:36] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:53:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:53:33] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 15.47, 21.61, 20.52 [04:55:33] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK 
- total load average: 16.08, 19.84, 20.00 [04:56:51] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.32, 2.92, 3.81 [04:58:20] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:02:50] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.71, 3.64, 3.80 [05:04:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:05:21] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:06:40] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [05:06:46] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [05:07:47] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [05:09:35] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [05:09:38] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 23 minutes ago with 0 failures [05:09:46] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.219 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [05:11:36] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 52 packages available for upgrade (0 critical updates). 
[05:12:50] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 0.34, 3.17, 3.94 [05:14:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 15.06, 19.23, 23.24 [05:14:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:16:50] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.14, 1.54, 3.09 [05:24:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.09, 21.11, 22.39 [05:30:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.04, 23.57, 23.26 [05:32:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.09, 24.19, 23.55 [05:34:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 16.02, 21.12, 22.50 [05:43:25] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [05:45:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present] [05:52:09] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.80, 18.77, 20.30 [05:53:34] [mw-config] BlankEclair opened pull request #5632: T12437: Add $wgTabberNeueUseCodex - https://github.com/miraheze/mw-config/pull/5632 [05:54:29] miraheze/mw-config - BlankEclair the build passed. 
[05:56:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.75, 20.97, 20.86 [06:02:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.57, 23.13, 21.72 [06:04:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.63, 22.83, 21.79 [06:16:14] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [06:18:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.82, 23.34, 22.42 [06:20:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.27, 22.72, 22.32 [06:23:16] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:25:15] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.0001912415028 secs [06:26:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.14, 23.05, 22.40 [06:28:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 15.88, 20.57, 21.59 [06:29:39] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [06:31:36] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present] [06:36:09] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.89, 19.15, 20.38 [06:44:55] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.comUsage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ] [06:49:04] RECOVERY - ns2 NTP 
time on ns2 is OK: NTP OK: Offset -8.490681648e-05 secs [07:07:16] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.16, 21.21, 20.11 [07:13:06] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.75, 19.74, 19.89 [07:16:25] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:27:33] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.34, 21.01, 20.51 [07:29:29] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.22, 21.90, 20.85 [07:31:26] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.10, 20.01, 20.29 [07:56:40] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.53, 20.84, 19.96 [08:07:12] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.comUsage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ] [08:09:21] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:11:21] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.0008590817451 secs [08:18:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.02, 24.02, 22.07 [08:20:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.46, 23.64, 22.18 [08:28:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.12, 24.80, 23.10 [08:50:01] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.05, 19.54, 18.09 [08:51:58] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 16.41, 18.43, 17.87 [08:57:47] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.00, 21.47, 19.29 [08:59:44] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load 
average: 27.34, 23.52, 20.29 [09:01:41] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.68, 22.97, 20.55 [09:02:07] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [09:03:39] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 15.03, 20.25, 19.84 [09:06:50] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.47, 3.01, 1.45 [09:07:07] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [09:12:51] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.54, 2.48, 1.72 [09:15:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [09:16:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.89, 21.74, 23.89 [09:18:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.19, 23.08, 24.08 [09:20:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [09:22:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.59, 23.13, 23.87 [09:25:50] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [09:30:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [09:40:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.11, 23.73, 22.69 [09:41:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [09:44:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 16.62, 21.41, 22.13 [09:45:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sensors failed with return code 1.-> /usr/sbin/ipmi-sensors was executed with the following parameters: [HIDDEN]sudo /usr/sbin/ipmi-sensors --exclude-sensor-types Drive_Slot,Entity_Presence --quiet-cache --sdr-cache-recreate --interpret-oem-data --output-senso [09:45:27] ignore-not-available-sensors --output-sensor-thresholds [09:46:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [09:47:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present] [09:54:37] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [10:04:37] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 
[10:08:03] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [10:11:57] PROBLEM - wiki.thelastdimension.net - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.thelastdimension.net' expires in 15 day(s) (Fri 23 Aug 2024 09:46:31 AM GMT +0000). [10:12:07] [ssl] WikiTideSSLBot pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ssl/compare/30540adce6a2...9382899b3140 [10:12:10] [ssl] WikiTideSSLBot 9382899 - Bot: Update SSL cert for wiki.thelastdimension.net [10:13:03] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [10:16:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.29, 21.98, 21.23 [10:18:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.34, 22.85, 21.69 [10:20:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.87, 24.23, 22.32 [10:21:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [10:23:10] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [10:23:21] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.69, 4.96, 3.81 [10:24:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.68, 23.98, 22.69 [10:25:17] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.26, 3.98, 3.57 [10:26:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 29.66, 26.14, 23.64 [10:27:14] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.12, 5.09, 4.03 [10:27:21] RECOVERY - prometheus151 PowerDNS Recursor 
on prometheus151 is OK: DNS OK: 1.380 second response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [10:32:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.60, 23.92, 23.50 [10:33:25] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [10:34:59] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.59, 3.63, 3.82 [10:35:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present] [10:36:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [10:41:36] RECOVERY - wiki.thelastdimension.net - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.thelastdimension.net' will expire on Tue 05 Nov 2024 09:13:32 AM GMT +0000. [10:42:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.21, 20.81, 21.51 [10:43:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [10:44:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.50, 20.66, 21.41 [10:46:15] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[10:48:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.68, 22.54, 21.90 [10:48:50] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [10:48:50] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 3.07, 2.86, 3.24 [10:50:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.70, 22.54, 22.03 [10:52:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.22, 23.88, 22.57 [10:57:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:02:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.20, 22.22, 22.79 [11:02:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:03:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:04:42] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.08, 4.42, 3.69 [11:05:27] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ] [11:07:26] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.000575274229 secs [11:08:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.91, 23.25, 22.83 [11:08:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:08:33] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 0.98, 3.39, 3.52 [11:10:29] RECOVERY - prometheus151 Current Load on prometheus151 
is OK: LOAD OK - total load average: 0.24, 2.34, 3.12 [11:12:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.19, 21.92, 22.47 [11:14:15] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [11:25:25] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [11:26:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.66, 21.25, 21.31 [11:27:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present] [11:28:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.99, 21.79, 21.47 [11:30:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.29, 23.64, 22.15 [11:32:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.35, 23.23, 22.17 [11:36:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.86, 23.19, 22.32 [11:38:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.61, 21.32, 21.75 [11:42:09] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 15.78, 17.69, 20.11 [11:46:31] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:48:31] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.0002519786358 secs [11:56:46] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.57, 21.95, 21.07 [12:08:26] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD 
CRITICAL - total load average: 25.41, 22.49, 21.41 [12:10:22] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.98, 21.42, 21.15 [12:15:25] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [12:16:12] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.27, 19.22, 20.35 [12:17:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present] [12:22:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.36, 21.11, 20.75 [12:24:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.05, 21.99, 21.08 [12:28:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.59, 22.93, 21.72 [12:36:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.17, 22.07, 21.55 [12:40:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.47, 21.96, 21.63 [12:42:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.93, 23.32, 22.16 [12:48:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.33, 22.77, 22.56 [12:52:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.71, 23.32, 22.77 [12:54:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.72, 22.33, 22.43 [12:56:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.69, 24.09, 23.05 [13:04:25] PROBLEM - 
pso2ngs.wiki - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'pso2ngs.wiki' expires in 15 day(s) (Fri 23 Aug 2024 12:45:00 PM GMT +0000). [13:04:37] [ssl] WikiTideSSLBot pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ssl/compare/9382899b3140...3f7131a10e4c [13:04:40] [ssl] WikiTideSSLBot 3f7131a - Bot: Update SSL cert for pso2ngs.wiki [13:05:25] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [13:07:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present] [13:13:06] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.47, 20.55, 19.07 [13:15:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.90, 22.08, 19.79 [13:18:58] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.21, 22.87, 20.71 [13:22:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [13:22:52] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 16.67, 19.81, 19.92 [13:33:44] RECOVERY - pso2ngs.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'pso2ngs.wiki' will expire on Tue 05 Nov 2024 12:06:00 PM GMT +0000. [13:46:20] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[14:04:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 13.39, 20.15, 23.59 [14:10:09] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 13.18, 15.06, 20.10 [14:14:30] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:22:25] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [14:46:14] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [14:49:25] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [14:51:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present] [15:14:50] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [15:41:25] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [15:43:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present] [15:43:44] !log [macfan@mwtask181] sudo -u www-data php /srv/mediawiki/1.42/maintenance/run.php /srv/mediawiki/1.42/maintenance/importImages.php --wiki=baldisbasicswiki /home/macfan/images --search-recursively (START) 
[15:43:49] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:47:19] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:49:19] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.0001862347126 secs [16:04:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.54, 19.54, 16.53 [16:06:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.11, 21.52, 17.63 [16:08:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.19, 20.95, 17.89 [16:10:09] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.83, 19.89, 17.88 [16:14:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.26, 23.17, 19.66 [16:16:58] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [16:17:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [16:18:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.86, 23.26, 20.56 [16:20:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 29.16, 25.51, 21.71 [16:30:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.61, 23.81, 22.95 [16:32:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.77, 24.15, 23.17 [16:33:25] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all 
[16:35:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present] [16:40:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 16.02, 22.13, 23.12 [16:44:30] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:50:04] !log [macfan@mwtask181] sudo -u www-data php /srv/mediawiki/1.42/maintenance/run.php /srv/mediawiki/1.42/maintenance/importImages.php --wiki=baldisbasicswiki /home/macfan/images --search-recursively (END - exit=0) [16:50:09] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [16:54:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.98, 21.51, 21.47 [16:56:09]