[00:00:50] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.17, 3.56, 3.91
[00:02:52] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.74, 3.20, 3.74
[00:04:51] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.92, 4.17, 4.03
[00:06:51] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.04, 3.66, 3.87
[00:08:51] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.25, 4.70, 4.22
[00:11:57] [puppet] The-Voidwalker pushed 1 commit to patch-mwpatches [+0/-0/±1] https://github.com/miraheze/puppet/compare/b4b2162f923f...29ee047b4a1a
[00:11:58] [puppet] The-Voidwalker 29ee047 - resolve one TODO, remove other
[00:12:01] [puppet] The-Voidwalker synchronize pull request #3885: mwdeploy.py: add support for local patches - https://github.com/miraheze/puppet/pull/3885
[00:13:13] miraheze/puppet - The-Voidwalker the build has errored.
[00:14:50] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.45, 3.67, 3.96
[00:19:19] [mw-config] OAuthority synchronize pull request #5631: Enable edit recovery default - https://github.com/miraheze/mw-config/pull/5631
[00:20:24] miraheze/mw-config - OAuthority the build passed.
[00:20:29] [mw-config] OAuthority closed pull request #5631: Enable edit recovery default - https://github.com/miraheze/mw-config/pull/5631
[00:20:30] [mw-config] OAuthority pushed 1 commit to master [+0/-0/±2] https://github.com/miraheze/mw-config/compare/ad8c7c2317a1...e89395fcc7c0
[00:20:32] [mw-config] pixDeVl e89395f - Enable edit recovery default (#5631)
[00:20:35] yaaaaaay
[00:20:41] thanks OA
[00:20:50] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.46, 2.66, 3.39
[00:21:24] miraheze/mw-config - OAuthority the build passed.
[00:23:51] !log [@test151] starting deploy of {'config': True} to test151
[00:23:52] !log [@test151] finished deploy of {'config': True} to test151 - SUCCESS in 0s
[00:23:58] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[00:24:04] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[00:24:23] just got deployed
[00:24:32] lemme see rq
[00:24:52] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.90, 19.43, 17.51
[00:24:56] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 8.43, 4.74, 3.97
[00:25:24] uuuh not seeing it
[00:25:26] you?
[00:25:33] [puppet] The-Voidwalker pushed 1 commit to patch-mwpatches [+0/-0/±1] https://github.com/miraheze/puppet/compare/29ee047b4a1a...cbe95f614972
[00:25:36] [puppet] The-Voidwalker cbe95f6 - use try around open for CI
[00:25:37] [puppet] The-Voidwalker synchronize pull request #3885: mwdeploy.py: add support for local patches - https://github.com/miraheze/puppet/pull/3885
[00:25:42] what am I looking for
[00:25:51] potentially hasn't gone round to mw appservers yet and is just on test?
[00:26:49] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.70, 18.79, 17.51
[00:27:02] miraheze/puppet - The-Voidwalker the build has errored.
[00:32:03] !log [@mwtask181] starting deploy of {'config': True} to all
[00:32:08] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[00:32:15] !log [@mwtask181] finished deploy of {'config': True} to all - SUCCESS in 11s
[00:32:21] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[00:32:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:34:32] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.08, 23.10, 19.80
[00:37:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:37:46] !log [@mwtask171] starting deploy of {'config': True} to all
[00:37:51] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[00:38:01] !log [@mwtask171] finished deploy of {'config': True} to all - SUCCESS in 15s
[00:38:13] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[00:40:45] [puppet] The-Voidwalker pushed 1 commit to patch-mwpatches [+0/-0/±1] https://github.com/miraheze/puppet/compare/cbe95f614972...10c5688ca69d
[00:40:47] [puppet] The-Voidwalker 10c5688 - add user agent to requests headers
[00:40:49] [puppet] The-Voidwalker synchronize pull request #3885: mwdeploy.py: add support for local patches - https://github.com/miraheze/puppet/pull/3885
[00:42:06] miraheze/puppet - The-Voidwalker the build has errored.
[00:42:19] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.36, 22.77, 21.41
[00:42:51] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.30, 3.38, 3.87
[00:43:07] [puppet] The-Voidwalker pushed 1 commit to patch-mwpatches [+0/-0/±1] https://github.com/miraheze/puppet/compare/10c5688ca69d...06a0af3e5b55
[00:43:10] [puppet] The-Voidwalker 06a0af3 - >:(
[00:43:11] [puppet] The-Voidwalker synchronize pull request #3885: mwdeploy.py: add support for local patches - https://github.com/miraheze/puppet/pull/3885
[00:44:26] miraheze/puppet - The-Voidwalker the build has errored.
[00:44:51] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 3.86, 3.91, 4.02
[00:47:13] > potentially hasn't gone round to mw appservers yet and is just on test?
[00:47:13] Seems so
[00:47:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:47:34] I see the button on dev
[00:47:36] checking meta
[00:48:34] and shoo off icinga
[00:48:57] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[00:50:09] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 14.01, 17.91, 19.74
[00:50:57] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[00:51:02] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[00:52:20] confirmed working on meta
[00:52:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:53:08] Looks like it just got pushed to production
[00:56:39] [puppet] The-Voidwalker pushed 1 commit to patch-mwpatches [+0/-0/±1] https://github.com/miraheze/puppet/compare/06a0af3e5b55...d9b69a96ebde
[00:56:40] [puppet] The-Voidwalker d9b69a9 - append -> extend
[00:56:43] [puppet] The-Voidwalker synchronize pull request #3885: mwdeploy.py: add support for local patches - https://github.com/miraheze/puppet/pull/3885
[00:56:52] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[00:57:25] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.079 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[00:58:12] miraheze/puppet - The-Voidwalker the build passed.
[01:02:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:02:27] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 52 packages available for upgrade (0 critical updates).
[01:03:11] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:05:39] miraheze/puppet - The-Voidwalker the build passed.
[01:05:42] miraheze/puppet - redbluegreenhat the build passed.
[01:07:24] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[01:07:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:20:51] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.20, 2.78, 3.90
[01:22:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:22:50] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.78, 4.35, 4.34
[01:26:51] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.42, 3.19, 3.92
[01:27:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:30:57] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.33, 4.01, 4.04
[01:32:54] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.88, 3.30, 3.77
[01:33:00] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.91, 20.71, 19.37
[01:34:51] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.51, 4.22, 4.06
[01:34:56] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.76, 19.76, 19.20
[01:36:50] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.32, 3.71, 3.91
[01:40:50] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.88, 4.91, 4.28
[01:48:51] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.62, 3.50, 3.84
[01:52:51] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.60, 4.01, 3.94
[01:58:51] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.80, 3.77, 3.95
[01:59:25] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[02:00:50] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 3.17, 3.95, 4.01
[02:01:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present]
[02:02:50] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.96, 3.82, 3.94
[02:08:51] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.42, 2.44, 3.30
[02:22:25] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:26:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:32:30] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.53, 4.56, 3.66
[02:38:17] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.68, 3.35, 3.47
[02:40:14] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.42, 2.84, 3.25
[02:41:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:43:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:46:06] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.93, 3.44, 3.42
[02:48:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.09, 4.13, 3.67
[02:48:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:51:17] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[02:51:25] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[02:53:17] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 52 packages available for upgrade (0 critical updates).
[02:55:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:56:33] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.25, 21.74, 18.71
[02:56:58] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[02:59:27] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 14 minutes ago with 0 failures
[03:00:20] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:01:41] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[03:03:38] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present]
[03:05:20] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:05:33] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 7.473 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[03:10:20] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:12:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.19, 22.93, 22.91
[03:16:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:21:30] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:22:09] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 13.04, 16.69, 19.92
[03:22:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:27:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:29:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:34:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:34:50] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.07, 3.46, 3.96
[03:40:50] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.47, 3.48, 3.75
[03:43:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:45:40] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:46:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.48, 19.94, 17.89
[03:47:38] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[03:48:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:49:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:51:29] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[03:52:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.89, 23.86, 20.38
[03:53:25] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[03:53:31] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 2.324 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[03:54:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:55:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present]
[04:00:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.22, 24.06, 21.69
[04:01:24] What does icinga do anyway
[04:01:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:02:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.74, 23.00, 21.61
[04:08:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.78, 23.86, 22.29
[04:08:18] !log [void@puppet181] Upgraded packages openjdk-17-jdk, openjdk-17-jdk-headless, openjdk-17-jre, and openjdk-17-jre-headless on graylog161
[04:08:24] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[04:08:37] Does every error from these bots mean something big?
[04:08:47] !log [void@puppet181] Upgraded packages openjdk-17-jre, and openjdk-17-jre-headless on kafka181
[04:08:50] RECOVERY - graylog161 APT on graylog161 is OK: APT OK: 56 packages available for upgrade (0 critical updates).
[04:08:52] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[04:09:09] RECOVERY - kafka181 APT on kafka181 is OK: APT OK: 38 packages available for upgrade (0 critical updates).
[04:10:15] !log [void@puppet181] Upgraded packages openjdk-17-jdk, openjdk-17-jdk-headless, openjdk-17-jre, and openjdk-17-jre-headless on os151
[04:10:20] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[04:10:40] !log [void@puppet181] Upgraded packages openjdk-17-jdk, openjdk-17-jdk-headless, openjdk-17-jre, and openjdk-17-jre-headless on os162
[04:10:46] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[04:10:51] RECOVERY - os151 APT on os151 is OK: APT OK: 54 packages available for upgrade (0 critical updates).
[04:11:20] !log [void@puppet181] Upgraded packages openjdk-17-jdk, openjdk-17-jdk-headless, openjdk-17-jre, and openjdk-17-jre-headless on os161
[04:11:29] RECOVERY - os161 APT on os161 is OK: APT OK: 53 packages available for upgrade (0 critical updates).
[04:11:33] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[04:11:52] !log [void@puppet181] Upgraded packages openjdk-17-jdk, openjdk-17-jdk-headless, openjdk-17-jre, and openjdk-17-jre-headless on puppet181
[04:11:53] RECOVERY - puppet181 APT on puppet181 is OK: APT OK: 61 packages available for upgrade (0 critical updates).
[04:11:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[04:12:16] RECOVERY - os162 APT on os162 is OK: APT OK: 54 packages available for upgrade (0 critical updates).
[04:16:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:17:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:18:53] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[04:20:52] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.070 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[04:22:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:24:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:34:20] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:44:20] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:45:25] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[04:45:33] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.33, 21.86, 18.90
[04:46:36] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:47:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present]
[04:51:36] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:53:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:53:33] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 15.47, 21.61, 20.52
[04:55:33] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 16.08, 19.84, 20.00
[04:56:51] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.32, 2.92, 3.81
[04:58:20] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:02:50] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.71, 3.64, 3.80
[05:04:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:05:21] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:06:40] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[05:06:46] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[05:07:47] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[05:09:35] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[05:09:38] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 23 minutes ago with 0 failures
[05:09:46] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.219 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[05:11:36] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 52 packages available for upgrade (0 critical updates).
[05:12:50] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 0.34, 3.17, 3.94
[05:14:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 15.06, 19.23, 23.24
[05:14:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:16:50] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.14, 1.54, 3.09
[05:24:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.09, 21.11, 22.39
[05:30:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.04, 23.57, 23.26
[05:32:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.09, 24.19, 23.55
[05:34:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 16.02, 21.12, 22.50
[05:43:25] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[05:45:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present]
[05:52:09] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.80, 18.77, 20.30
[05:53:34] [mw-config] BlankEclair opened pull request #5632: T12437: Add $wgTabberNeueUseCodex - https://github.com/miraheze/mw-config/pull/5632
[05:54:29] miraheze/mw-config - BlankEclair the build passed.
[05:56:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.75, 20.97, 20.86
[06:02:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.57, 23.13, 21.72
[06:04:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.63, 22.83, 21.79
[06:16:14] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[06:18:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.82, 23.34, 22.42
[06:20:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.27, 22.72, 22.32
[06:23:16] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:25:15] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.0001912415028 secs
[06:26:09] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.14, 23.05, 22.40
[06:28:09] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 15.88, 20.57, 21.59
[06:29:39] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[06:31:36] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present]
[06:36:09] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.89, 19.15, 20.38
[06:44:55] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o