[00:10:17] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 2001:41d0:800:170b::5/cpweb, 2607:5300:201:3100::1d3/cpweb [00:12:17] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [00:13:12] PROBLEM - graylog2 Current Load on graylog2 is CRITICAL: CRITICAL - load average: 5.30, 4.02, 3.00 [00:24:49] PROBLEM - graylog2 Current Load on graylog2 is WARNING: WARNING - load average: 3.64, 3.89, 3.58 [00:30:38] RECOVERY - graylog2 Current Load on graylog2 is OK: OK - load average: 2.38, 3.04, 3.29 [01:05:10] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.15, 3.68, 3.27 [01:07:05] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.99, 3.74, 3.34 [01:12:51] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.56, 3.18, 3.23 [01:21:31] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 5.06, 4.14, 3.59 [01:37:25] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 16.76, 20.35, 23.84 [01:39:32] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.04, 3.85, 3.93 [01:41:25] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 28.79, 23.25, 24.10 [01:41:31] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 6.09, 4.71, 4.23 [01:45:25] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 17.66, 22.71, 23.94 [01:49:25] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 28.13, 24.86, 24.46 [02:01:25] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 16.60, 21.70, 23.80 [02:02:02] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.08, 4.88, 5.88 [02:08:02] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.22, 3.85, 5.08 [02:11:26] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 30.71, 20.80, 21.48 [02:11:31] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.52, 3.59, 3.96 [02:13:25] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 21.34, 20.27, 21.18 [02:13:31] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.32, 3.66, 3.93 [02:14:01] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.47, 5.55, 5.47 [02:15:25] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 30.73, 22.45, 21.76 [02:15:31] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.08, 3.29, 3.75 [02:16:01] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 11.36, 7.58, 6.21 [02:19:25] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 21.84, 22.58, 22.04 [02:21:25] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 29.52, 25.01, 22.98 [02:21:31] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.16, 3.45, 3.64 [02:23:25] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 15.95, 21.88, 22.11 [02:23:31] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.37, 3.32, 3.57 [02:26:02] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.19, 5.42, 5.96 [02:27:25] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 12.53, 16.69, 19.93 [02:27:31] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.88, 3.00, 3.37 [02:32:02] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.21, 3.95, 5.10 [02:34:22] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.31, 3.36, 3.41 [02:36:17] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.84, 3.19, 3.35 [02:47:49] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.48, 3.30, 3.22 [02:49:44] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.91, 3.87, 3.44 [02:51:39] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.83, 3.92, 3.51 [03:03:31] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.12, 3.73, 3.55 [03:05:31] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.00, 3.63, 3.55 [03:13:31] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.37, 2.97, 3.29 [03:14:01] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.38, 4.60, 4.14 [03:16:02] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.87, 4.44, 4.12 [03:17:31] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.94, 3.41, 3.41 [03:21:31] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.61, 3.15, 3.31 [03:27:31] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.42, 3.57, 3.45 [03:29:31] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 3.31, 3.30, 3.35 [04:28:32] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.82, 3.67, 3.32 [04:30:27] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.50, 3.26, 3.22 [04:34:18] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.75, 3.63, 3.39 [04:38:09] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.44, 3.10, 3.24 [05:00:22] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.27, 3.68, 3.46 [05:02:17] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.71, 3.68, 3.49 [05:04:12] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.86, 3.98, 3.61 [05:06:07] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 2.28, 3.34, 3.42 [05:11:52] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.11, 3.60, 3.50 [05:13:48] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.33, 3.52, 3.48 [05:27:31] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.58, 3.02, 3.30 [06:04:35] PROBLEM - db11 Current Load on db11 is CRITICAL: CRITICAL - load average: 8.93, 6.02, 3.73 [06:07:13] alerting : [FIRING:2] (mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [06:12:12] ok : [RESOLVED] (mediawiki) https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki [06:12:35] RECOVERY - db11 Current Load on db11 is OK: OK - load average: 1.84, 6.73, 5.94 [06:21:39] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.68, 2.99, 2.78 [06:23:34] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 3.30, 3.17, 2.88 [06:24:52] !log [reception@mwtask1] sudo -u www-data php /srv/mediawiki/w/maintenance/importDump.php --wiki=senrankagurawiki --username-prefix wikia:senrankagura --report 1 --no-updates /home/reception/kagura_pages_full.xml (START) [06:24:56] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [06:25:52] !log [reception@mwtask1] sudo -u www-data php /srv/mediawiki/w/maintenance/importDump.php --wiki=henryedwardsenkafanonwiki --username-prefix wikia:senran-kagura-stonehammer-and-henrys-fanon --report 1 --no-updates /home/reception/senrankagurastonehammerandhenrysfanon_pages_full.xml (START) [06:25:55] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [06:27:08] !log [reception@mwtask1] sudo -u www-data php /srv/mediawiki/w/maintenance/importDump.php --wiki=henryedwardsenkafanonwiki --username-prefix wikia:senran-kagura-stonehammer-and-henrys-fanon --report 1 --no-updates /home/reception/senrankagurastonehammerandhenrysfanon_pages_full.xml (END - exit=0) [06:27:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [06:27:35] !log [reception@mwtask1] sudo -u www-data php /srv/mediawiki/w/maintenance/importDump.php --wiki=henryedwardsenkafanonwiki --username-prefix wikia:senran-kagura-stonehammer-and-henrys-fanon --report 1 --no-updates /home/reception/senrankagurastonehammerandhenrysfanon_pages_full.xml (START) [06:27:39] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [06:28:14] !log [reception@mwtask1] sudo -u www-data php /srv/mediawiki/w/maintenance/importDump.php --wiki=henryedwardsenkafanonwiki --username-prefix wikia:senran-kagura-stonehammer-and-henrys-fanon --report 1 --no-updates /home/reception/senrankagurastonehammerandhenrysfanon_pages_full.xml (END - exit=0) [06:28:20] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [06:28:59] !log [reception@mwtask1] sudo -u www-data php /srv/mediawiki/w/maintenance/rebuildall.php --wiki=henryedwardsenkafanonwiki (START) [06:29:05] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [06:34:15] !log [reception@mwtask1] sudo -u www-data php /srv/mediawiki/w/maintenance/rebuildall.php --wiki=henryedwardsenkafanonwiki (END - exit=0) [06:34:18] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [06:54:06] [02miraheze/mw-config] 07Reception123 pushed 031 commit to 03Reception123-patch-1 [+0/-0/±1] 13https://git.io/JMsEg [06:54:07] [02miraheze/mw-config] 07Reception123 03c565f60 - temporarily switch ReCaptcha back to v2 due to persistent errors (T8335) [06:54:09] [02mw-config] 07Reception123 created branch 03Reception123-patch-1 - 13https://git.io/vbvb3 [06:54:10] [02mw-config] 07Reception123 opened pull request 03#4235: temporarily switch ReCaptcha back to v2 due to persistent errors (T8335) - 13https://git.io/JMsE2 [06:55:12] miraheze/mw-config - Reception123 the build passed. [06:58:01] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 8.56, 5.70, 4.13 [06:58:14] [02mw-config] 07Reception123 closed pull request 03#4235: temporarily switch ReCaptcha back to v2 due to persistent errors (T8335) - 13https://git.io/JMsE2 [06:58:16] [02miraheze/mw-config] 07Reception123 pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JMsuX [06:58:17] [02miraheze/mw-config] 07Reception123 034086164 - temporarily switch ReCaptcha back to v2 due to persistent errors (T8335) (#4235) [06:58:19] [02mw-config] 07Reception123 deleted branch 03Reception123-patch-1 - 13https://git.io/vbvb3 [06:58:20] [02miraheze/mw-config] 07Reception123 deleted branch 03Reception123-patch-1 [06:59:02] !log [reception@mw11] starting deploy of {'config': True} to all [06:59:05] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [06:59:11] !log [reception@mw11] finished deploy of {'config': True} to all - SUCCESS in 9s [06:59:14] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [06:59:19] miraheze/mw-config - Reception123 the build passed. [07:00:02] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.56, 5.54, 4.27 [07:01:56] !log [@mw11] starting deploy of {'config': True} to all [07:02:00] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [07:02:01] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.43, 4.72, 4.12 [07:02:03] !log [@mw11] finished deploy of {'config': True} to all - SUCCESS in 6s [07:02:06] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [07:04:01] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 14.13, 8.04, 5.14 [07:05:59] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 4.60, 6.48, 4.91 [07:10:01] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.80, 5.21, 4.61 [07:14:01] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.41, 4.62, 4.53 [07:24:07] !log [@test3] starting deploy of {'config': True} to skip [07:24:08] !log [@test3] finished deploy of {'config': True} to skip - SUCCESS in 0s [07:24:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [07:24:12] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [07:58:42] !log [reception@mwtask1] sudo -u www-data php /srv/mediawiki/w/maintenance/importDump.php --wiki=senrankagurawiki --username-prefix wikia:senrankagura --report 1 --no-updates /home/reception/kagura_pages_full.xml (END - exit=0) [07:58:47] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [07:59:09] !log [reception@mwtask1] sudo -u www-data php /srv/mediawiki/w/maintenance/rebuildAll.php --wiki=senrankagurawiki (END - exit=256) [07:59:12] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [07:59:28] !log [reception@mwtask1] sudo -u www-data php /srv/mediawiki/w/maintenance/rebuildall.php --wiki=senrankagurawiki (START) [07:59:30] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [08:00:28] !log [reception@mwtask1] sudo -u www-data php /srv/mediawiki/w/maintenance/rebuildall.php --wiki=senrankagurawiki (END - exit=256) [08:00:31] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [08:55:21] PROBLEM - ambient.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Certificate 'ambient.wiki' expires in 7 day(s) (Sun 05 Dec 2021 08:49:56 GMT +0000). [09:54:31] PROBLEM - lcn.zfc.id.lv - reverse DNS on sslhost is WARNING: rDNS WARNING - reverse DNS entry for lcn.zfc.id.lv could not be found [10:01:21] RECOVERY - lcn.zfc.id.lv - reverse DNS on sslhost is OK: SSL OK - lcn.zfc.id.lv reverse DNS resolves to cp13.miraheze.org - CNAME OK [10:22:13] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.80, 19.31, 16.73 [10:24:08] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 15.60, 17.87, 16.51 [13:19:03] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.33, 5.27, 4.41 [13:20:58] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.62, 4.97, 4.40 [14:00:01] PROBLEM - mw12 APT on mw12 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:01:53] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 6.87, 7.64, 5.38 [14:01:57] RECOVERY - mw12 APT on mw12 is OK: APT OK: 1 packages available for upgrade (0 critical updates). [14:03:51] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 3.89, 6.31, 5.16 [14:21:34] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 8.58, 5.65, 4.48 [14:22:10] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.63, 19.76, 17.36 [14:23:29] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.49, 5.46, 4.56 [14:24:04] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 15.48, 18.40, 17.15 [14:25:24] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.51, 4.92, 4.45 [14:53:25] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 31.49, 23.35, 18.56 [14:55:25] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.70, 22.48, 18.87 [14:59:53] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 9.03, 6.74, 5.36 [15:01:25] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 24.03, 22.36, 19.69 [15:01:51] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 4.99, 6.10, 5.30 [15:03:25] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 21.33, 21.83, 19.81 [15:04:01] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.19, 5.41, 4.84 [15:05:25] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 15.05, 19.06, 19.02 [15:06:01] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.74, 5.03, 4.76 [15:21:25] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 22.50, 18.96, 18.10 [15:23:25] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 19.23, 18.86, 18.16 [15:27:56] PROBLEM - cp12 Current Load on cp12 is CRITICAL: CRITICAL - load average: 1.39, 2.09, 1.40 [15:29:56] RECOVERY - cp12 Current Load on cp12 is OK: OK - load average: 0.98, 1.65, 1.32 [15:46:39] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.80, 5.28, 4.79 [15:48:35] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 5.04, 4.96, 4.72 [15:51:55] PROBLEM - mw9 Current Load on mw9 is CRITICAL: CRITICAL - load average: 9.24, 6.06, 4.41 [15:52:17] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.25, 3.41, 3.04 [15:53:53] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 3.88, 5.17, 4.28 [15:56:07] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 2.88, 3.42, 3.15 [15:58:02] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.90, 3.16, 3.08 [16:11:31] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.59, 3.65, 3.25 [16:14:02] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.59, 5.03, 4.74 [16:15:31] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.25, 3.07, 3.13 [16:18:02] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.37, 4.31, 4.52 [16:21:27] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 6.88, 5.42, 4.32 [16:22:04] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 26.24, 22.68, 19.58 [16:23:25] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 4.07, 4.95, 4.29 [16:23:58] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 16.12, 20.26, 19.08 [16:53:20] Alright, thanks. [16:57:31] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.45, 3.55, 3.38 [17:01:31] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.87, 3.18, 3.28 [17:10:49] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.17, 5.57, 4.65 [17:11:31] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.67, 3.89, 3.51 [17:11:58] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 22.66, 20.59, 18.42 [17:12:44] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.62, 4.90, 4.51 [17:13:31] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 2.26, 3.45, 3.41 [17:13:49] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 7.27, 6.08, 4.40 [17:15:47] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 24.32, 22.26, 19.58 [17:15:49] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 6.06, 6.06, 4.60 [17:16:02] PROBLEM - mw8 Current Load on mw8 is WARNING: WARNING - load average: 7.30, 6.24, 4.77 [17:17:42] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.65, 21.34, 19.54 [17:18:02] RECOVERY - mw8 Current Load on mw8 is OK: OK - load average: 5.15, 5.91, 4.83 [17:18:32] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.76, 5.36, 4.82 [17:19:37] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 28.06, 23.45, 20.49 [17:20:27] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.03, 4.96, 4.74 [17:20:40] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.50, 3.91, 3.63 [17:21:36] PROBLEM - cp20 APT on cp20 is CRITICAL: APT CRITICAL: 19 packages available for upgrade (3 critical updates). [17:22:27] PROBLEM - mw9 Current Load on mw9 is CRITICAL: CRITICAL - load average: 7.22, 8.23, 6.16 [17:22:30] PROBLEM - cp20 Puppet on cp20 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 9 minutes ago with 1 failures. Failed resources (up to 3 shown): User[johnflewis] [17:22:33] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 7.12, 7.35, 5.98 [17:22:39] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.60, 3.75, 3.61 [17:24:18] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 12.59, 8.17, 6.03 [17:24:22] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 4.02, 6.84, 5.90 [17:24:30] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 4.83, 6.62, 5.89 [17:24:38] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 1.63, 3.05, 3.37 [17:26:17] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 3.34, 5.80, 5.63 [17:28:34] PROBLEM - cp20 Current Load on cp20 is CRITICAL: connect to address 51.195.220.68 port 5666: Connection refusedconnect to host 51.195.220.68 port 5666: Connection refused [17:28:50] PROBLEM - cp20 conntrack_table_size on cp20 is CRITICAL: connect to address 51.195.220.68 port 5666: Connection refusedconnect to host 51.195.220.68 port 5666: Connection refused [17:28:57] PROBLEM - cp20 PowerDNS Recursor on cp20 is CRITICAL: connect to address 51.195.220.68 port 5666: Connection refusedconnect to host 51.195.220.68 port 5666: Connection refused [17:29:12] PROBLEM - cp20 ferm_active on cp20 is CRITICAL: connect to address 51.195.220.68 port 5666: Connection refusedconnect to host 51.195.220.68 port 5666: Connection refused [17:29:20] PROBLEM - cp20 NTP time on cp20 is CRITICAL: connect to address 51.195.220.68 port 5666: Connection refusedconnect to host 51.195.220.68 port 5666: Connection refused [17:29:23] PROBLEM - cp20 Disk Space on cp20 is CRITICAL: connect to address 51.195.220.68 port 5666: Connection refusedconnect to host 51.195.220.68 port 5666: Connection refused [17:30:45] RECOVERY - cp20 conntrack_table_size on cp20 is OK: OK: nf_conntrack is 0 % full [17:30:57] RECOVERY - cp20 PowerDNS Recursor on cp20 is OK: DNS OK: 0.172 seconds response time. miraheze.org returns 2001:41d0:800:170b::5,2001:41d0:801:2000::58af,51.38.69.175,54.38.211.199 [17:31:12] RECOVERY - cp20 ferm_active on cp20 is OK: OK ferm input default policy is set [17:31:18] RECOVERY - cp20 NTP time on cp20 is OK: NTP OK: Offset 6.139278412e-05 secs [17:31:23] RECOVERY - cp20 Disk Space on cp20 is OK: DISK OK - free space: / 36584 MB (93% inode=97%); [17:33:19] cp20? [17:33:20] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.82, 23.45, 23.99 [17:34:44] PROBLEM - cp20 conntrack_table_size on cp20 is CRITICAL: connect to address 51.195.220.68 port 5666: Connection refusedconnect to host 51.195.220.68 port 5666: Connection refused [17:34:57] PROBLEM - cp20 PowerDNS Recursor on cp20 is CRITICAL: connect to address 51.195.220.68 port 5666: Connection refusedconnect to host 51.195.220.68 port 5666: Connection refused [17:35:12] PROBLEM - cp20 ferm_active on cp20 is CRITICAL: connect to address 51.195.220.68 port 5666: Connection refusedconnect to host 51.195.220.68 port 5666: Connection refused [17:35:21] PROBLEM - cp20 NTP time on cp20 is CRITICAL: connect to address 51.195.220.68 port 5666: Connection refusedconnect to host 51.195.220.68 port 5666: Connection refused [17:35:23] PROBLEM - cp20 Disk Space on cp20 is CRITICAL: connect to address 51.195.220.68 port 5666: Connection refusedconnect to host 51.195.220.68 port 5666: Connection refused [17:37:21] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 24.77, 22.60, 23.39 [17:39:20] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 13.95, 19.20, 22.04 [17:40:10] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.92, 5.25, 5.90 [17:45:21] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 14.99, 16.31, 19.84 [17:46:10] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.70, 4.02, 5.09 [17:50:54] PROBLEM - mw9 Current Load on mw9 is CRITICAL: CRITICAL - load average: 12.53, 6.93, 5.34 [17:51:33] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.60, 19.87, 20.17 [17:52:48] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 5.20, 6.01, 5.19 [17:53:33] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 12.95, 17.36, 19.22 [18:17:09] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.13, 3.64, 3.26 [18:18:51] PROBLEM - mw11 Current Load on mw11 is WARNING: WARNING - load average: 6.82, 6.42, 5.00 [18:19:04] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.16, 3.43, 3.23 [18:20:50] RECOVERY - mw11 Current Load on mw11 is OK: OK - load average: 5.58, 6.07, 5.04 [18:21:23] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.59, 19.11, 17.69 [18:21:55] PROBLEM - mw9 Current Load on mw9 is WARNING: WARNING - load average: 7.05, 6.24, 4.94 [18:23:23] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 16.23, 18.02, 17.46 [18:23:55] RECOVERY - mw9 Current Load on mw9 is OK: OK - load average: 4.52, 5.42, 4.79 [18:24:47] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 3.10, 3.35, 3.26 [18:28:36] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.63, 3.63, 3.42 [18:32:24] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.46, 3.35, 3.37 [18:39:04] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.12, 3.82, 3.57 [18:42:43] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 5.42, 5.46, 4.71 [18:44:37] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.92, 4.87, 4.58 [18:44:47] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.30, 3.81, 3.67 [18:49:10] majavah: we're rehashing out front end setup to reduce costs. We're replacing our existing monthly contracted servers with ones on a 24-month contract [18:50:10] ah, interesting - I was just wondering why there are alerts about a server not in puppet site.pp [18:50:30] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.28, 3.72, 3.65 [18:51:01] Yeah, they're not in DNS or site.pp yet - I'm installing them on the base directory only currently [18:51:38] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 20.94, 19.12, 18.03 [18:52:05] PROBLEM - cp20 conntrack_table_size on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:52:14] PROBLEM - cp20 SMART on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:52:20] PROBLEM - cp20 PowerDNS Recursor on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:52:22] PROBLEM - cp20 NTP time on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:52:24] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.54, 3.69, 3.65 [18:52:28] PROBLEM - cp20 APT on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:52:30] PROBLEM - cp20 Puppet on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:52:37] PROBLEM - cp20 Disk Space on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:52:53] PROBLEM - cp20 Current Load on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:52:58] PROBLEM - cp20 ferm_active on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:53:32] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 15.92, 17.82, 17.68 [18:54:04] RECOVERY - cp20 conntrack_table_size on cp20 is OK: OK: nf_conntrack is 0 % full [18:54:13] PROBLEM - cp20 SMART on cp20 is UNKNOWN: UNKNOWN: [/dev/sda] - No health status line found| [18:54:20] RECOVERY - cp20 PowerDNS Recursor on cp20 is OK: DNS OK: 0.253 seconds response time. miraheze.org returns 2001:41d0:800:170b::5,2001:41d0:801:2000::58af,51.38.69.175,54.38.211.199 [18:54:21] RECOVERY - cp20 NTP time on cp20 is OK: NTP OK: Offset -0.0002700984478 secs [18:54:28] RECOVERY - cp20 Puppet on cp20 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [18:54:36] RECOVERY - cp20 Disk Space on cp20 is OK: DISK OK - free space: / 36850 MB (95% inode=97%); [18:54:52] RECOVERY - cp20 Current Load on cp20 is OK: OK - load average: 0.83, 0.64, 0.30 [18:54:57] RECOVERY - cp20 ferm_active on cp20 is OK: OK ferm input default policy is set [18:56:19] RECOVERY - cp20 APT on cp20 is OK: APT OK: 0 packages available for upgrade (0 critical updates). [18:57:42] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.31, 5.25, 4.62 [18:59:14] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 24.20, 21.46, 19.11 [18:59:38] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.83, 5.02, 4.61 [19:01:08] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 14.21, 18.85, 18.45 [19:04:02] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.82, 3.11, 3.39 [19:09:19] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 3.92, 5.25, 4.83 [19:09:21] > cp20 PowerDNS Recursor on cp20 is OK: DNS OK: 0.253 seconds response time. miraheze.org returns 2001:41d0:800:170b::5,2001:41d0:801:2000::58af,51.38.69.175,54.38.211.199 [19:09:21] ^ I haven't seen that before. Have we added PowerDNS to our DNS stack? [19:10:31] @dmehus Oh, you're here. [19:10:33] It's been around as a local server DNS cache for a few months now [19:10:52] it cuts DNS queries down from 100ms+ to a few microseconds [19:11:14] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 5.04, 5.06, 4.80 [19:11:37] [02miraheze/dns] 07JohnFLewis pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JMnEY [19:11:38] [02miraheze/dns] 07JohnFLewis 03569e8f0 - push cp2* and cp3* hostnames [19:13:35] JohnLewis, ah, that's what was done. I remember when that was being discussed and implemented, but didn't realize we were using PowerDNS for it [19:17:12] pDNS is a big player for recursor - it's more light weight and flexible than ones such as bind which was a big appeal [19:22:11] PROBLEM - ping6 on cp30 is CRITICAL: CRITICAL - Destination Unreachable (fe80::f816:3eff:fe26:997a) [19:22:16] PROBLEM - ping6 on cp21 is CRITICAL: CRITICAL - Destination Unreachable (fe80::f816:3eff:fe14:7073) [19:22:18] PROBLEM - cp21 APT on cp21 is CRITICAL: APT CRITICAL: 19 packages available for upgrade (3 critical updates). [19:22:21] PROBLEM - cp31 APT on cp31 is CRITICAL: APT CRITICAL: 19 packages available for upgrade (3 critical updates). [19:22:23] PROBLEM - ping6 on cp31 is CRITICAL: CRITICAL - Destination Unreachable (fe80::f816:3eff:fe55:736d) [19:23:20] PROBLEM - cp14 Stunnel Http for mw9 on cp14 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 344 bytes in 0.009 second response time [19:23:23] PROBLEM - cp15 Stunnel Http for mw9 on cp15 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 344 bytes in 0.259 second response time [19:23:23] PROBLEM - mw9 MediaWiki Rendering on mw9 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 344 bytes in 0.007 second response time [19:23:36] PROBLEM - cp12 Varnish Backends on cp12 is CRITICAL: 1 backends are down. mw9 [19:23:52] PROBLEM - cp12 Stunnel Http for mw9 on cp12 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 344 bytes in 0.245 second response time [19:23:56] PROBLEM - cp13 Stunnel Http for mw9 on cp13 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 344 bytes in 0.005 second response time [19:24:01] PROBLEM - cp14 Varnish Backends on cp14 is CRITICAL: 1 backends are down. mw9 [19:24:16] RECOVERY - cp31 APT on cp31 is OK: APT OK: 0 packages available for upgrade (0 critical updates). [19:24:18] PROBLEM - cp15 Varnish Backends on cp15 is CRITICAL: 1 backends are down. mw9 [19:24:29] PROBLEM - cp13 Varnish Backends on cp13 is CRITICAL: 1 backends are down. mw9 [19:28:06] JohnLewis, yeah I've used PowerDNS many many years ago. Good DNS provider [19:28:41] they're based in the EU, iirc [19:28:56] but I forget in which country they're headquartered [19:32:02] RECOVERY - cp21 APT on cp21 is OK: APT OK: 0 packages available for upgrade (0 critical updates). [19:40:21] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.92, 5.42, 4.51 [19:42:20] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.86, 5.15, 4.53 [19:46:21] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 6.87, 6.41, 5.18 [19:48:52] RECOVERY - ping6 on cp21 is OK: PING OK - Packet loss = 0%, RTA = 3.14 ms [19:50:14] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 5.11, 4.15, 3.50 [19:52:11] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.28, 5.48, 5.22 [19:54:06] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.48, 4.92, 5.05 [19:59:50] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.91, 3.90, 3.73 [20:01:45] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.18, 4.10, 3.83 [20:03:48] RECOVERY - ping6 on cp30 is OK: PING OK - Packet loss = 0%, RTA = 91.47 ms [20:04:10] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.47, 3.80, 3.75 [20:04:43] RECOVERY - ping6 on cp31 is OK: PING OK - Packet loss = 0%, RTA = 80.46 ms [20:11:54] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.52, 3.81, 3.70 [20:13:50] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 2.09, 3.18, 3.49 [20:15:30] [02miraheze/puppet] 07JohnFLewis pushed 031 commit to 03master [+4/-0/±1] 13https://git.io/JMnPK [20:15:31] [02miraheze/puppet] 07JohnFLewis 036b40979 - add cp20/21/30/31 [20:17:43] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.56, 3.04, 3.38 [20:21:35] PROBLEM - cp20 Varnish Backends on cp20 is CRITICAL: 5 backends are down. mw8 mw9 mw10 mw11 mw12 [20:21:41] PROBLEM - cp20 Stunnel Http for mw9 on cp20 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 344 bytes in 0.010 second response time [20:21:44] PROBLEM - cp30 Varnish Backends on cp30 is CRITICAL: 5 backends are down. mw8 mw9 mw10 mw11 mw12 [20:21:53] PROBLEM - cp21 Stunnel Http for mw11 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:22:00] PROBLEM - cp21 Stunnel Http for mw8 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:22:00] PROBLEM - cp31 Stunnel Http for mw9 on cp31 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 344 bytes in 0.240 second response time [20:22:01] PROBLEM - cp21 Varnish Backends on cp21 is CRITICAL: 5 backends are down. mw8 mw9 mw10 mw11 mw12 [20:22:01] PROBLEM - cp21 Stunnel Http for mw9 on cp21 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 344 bytes in 0.006 second response time [20:22:10] PROBLEM - cp20 Stunnel Http for mw8 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:22:11] PROBLEM - cp31 Stunnel Http for mw8 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:22:11] PROBLEM - cp30 Stunnel Http for mw11 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:22:13] PROBLEM - cp21 HTTP 4xx/5xx ERROR Rate on cp21 is WARNING: WARNING - NGINX Error Rate is 50% [20:22:14] PROBLEM - cp21 Stunnel Http for mw12 on cp21 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:22:17] PROBLEM - cp30 Stunnel Http for mw9 on cp30 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 344 bytes in 0.233 second response time [20:22:20] PROBLEM - cp30 Stunnel Http for mw12 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:22:20] PROBLEM - cp31 Varnish Backends on cp31 is CRITICAL: 4 backends are down. mw8 mw9 mw11 mw12 [20:22:22] PROBLEM - cp30 Stunnel Http for mw8 on cp30 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:22:25] PROBLEM - cp20 Stunnel Http for mw11 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:22:29] PROBLEM - cp31 Stunnel Http for mw12 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:22:43] PROBLEM - cp31 Stunnel Http for mw11 on cp31 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:22:43] PROBLEM - cp20 Stunnel Http for mw12 on cp20 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:24:11] RECOVERY - cp21 HTTP 4xx/5xx ERROR Rate on cp21 is OK: OK - NGINX Error Rate is 33% [20:25:28] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.66, 3.53, 3.43 [20:29:20] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.28, 3.88, 3.59 [20:30:02] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 7.33, 6.08, 4.99 [20:30:11] RECOVERY - cp14 Stunnel Http for mw9 on cp14 is OK: HTTP OK: HTTP/1.1 200 OK - 15860 bytes in 0.082 second response time [20:30:18] RECOVERY - cp30 Stunnel Http for mw9 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 15868 bytes in 0.303 second response time [20:30:21] RECOVERY - cp21 Stunnel Http for mw12 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 15869 bytes in 0.006 second response time [20:30:37] RECOVERY - cp30 Stunnel Http for mw12 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.353 second response time [20:30:45] RECOVERY - cp12 Stunnel Http for mw9 on cp12 is OK: HTTP OK: HTTP/1.1 200 OK - 15860 bytes in 0.308 second response time [20:30:47] RECOVERY - cp31 Stunnel Http for mw12 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.319 second response time [20:30:50] RECOVERY - cp13 Stunnel Http for mw9 on cp13 is OK: HTTP OK: HTTP/1.1 200 OK - 15860 bytes in 0.019 second response time [20:30:51] RECOVERY - cp14 Varnish Backends on cp14 is OK: All 9 backends are healthy [20:30:56] and now hopefully everything goes green [20:31:01] RECOVERY - mw9 MediaWiki Rendering on mw9 is OK: HTTP OK: HTTP/1.1 200 OK - 20007 bytes in 0.200 second response time [20:31:11] RECOVERY - cp20 Stunnel Http for mw12 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 3.051 second response time [20:31:15] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.17, 3.65, 3.54 [20:31:23] JohnLewis, oh that's from your last commit [20:31:24] RECOVERY - cp20 Stunnel Http for mw9 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 15860 bytes in 0.020 second response time [20:31:31] RECOVERY - cp12 Varnish Backends on cp12 is OK: All 9 backends are healthy [20:31:32] I'm surprised you didn't manually run Puppet :P [20:31:33] RECOVERY - cp15 Varnish Backends on cp15 is OK: All 9 backends are healthy [20:31:39] RECOVERY - cp21 Stunnel Http for mw9 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 15860 bytes in 1.661 second response time [20:31:44] RECOVERY - cp13 Varnish Backends on cp13 is OK: All 9 backends are healthy [20:31:51] RECOVERY - cp15 Stunnel Http for mw9 on cp15 is OK: HTTP OK: HTTP/1.1 200 OK - 15860 bytes in 0.572 second response time [20:31:57] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 3.52, 5.15, 4.78 [20:32:00] I would need to run puppet on half our servers, it running manually is fine and lets me see things are working more easily [20:32:03] RECOVERY - cp31 Stunnel Http for mw9 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 15860 bytes in 3.925 second response time [20:32:15] JohnLewis, oh, yeah, that makes sense [20:32:21] true [20:33:04] PROBLEM - gluster3 Current Load on gluster3 is CRITICAL: CRITICAL - load average: 7.56, 5.47, 4.47 [20:35:03] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 3.94, 4.87, 4.38 [20:40:54] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 2.56, 3.00, 3.32 [20:45:24] RECOVERY - cp31 Stunnel Http for mw8 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 15860 bytes in 0.327 second response time [20:45:27] RECOVERY - cp30 Stunnel Http for mw8 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 15860 bytes in 0.356 second response time [20:45:38] PROBLEM - mwtask1 Puppet on mwtask1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_JobRunner] [20:46:42] RECOVERY - cp21 Stunnel Http for mw8 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 15860 bytes in 0.053 second response time [20:47:17] RECOVERY - cp20 Stunnel Http for mw8 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 15860 bytes in 2.139 second response time [20:48:18] PROBLEM - mw8 Puppet on mw8 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_JobRunner] [20:49:11] and GitHub looks like it is experiencing problems [20:49:36] PROBLEM - cloud4 Current Load on cloud4 is CRITICAL: CRITICAL - load average: 24.68, 21.45, 18.46 [20:50:34] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.48, 3.53, 3.29 [20:51:13] PROBLEM - mw12 Current Load on mw12 is CRITICAL: CRITICAL - load average: 8.31, 7.79, 6.01 [20:51:33] PROBLEM - cloud4 Current Load on cloud4 is WARNING: WARNING - load average: 21.68, 22.52, 19.30 [20:51:47] PROBLEM - jobchron1 Puppet on jobchron1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_JobRunner] [20:53:08] PROBLEM - mw12 Current Load on mw12 is WARNING: WARNING - load average: 6.92, 7.59, 6.16 [20:54:25] PROBLEM - mon2 Current Load on mon2 is WARNING: WARNING - load average: 3.56, 3.57, 3.38 [20:54:31] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_dns] [20:55:27] RECOVERY - cloud4 Current Load on cloud4 is OK: OK - load average: 17.73, 19.87, 18.93 [20:56:50] oh [20:56:59] RECOVERY - mw12 Current Load on mw12 is OK: OK - load average: 5.54, 6.58, 6.06 [20:57:05] PROBLEM - mw13 Puppet on mw13 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_JobRunner] [20:57:38] PROBLEM - gluster3 Current Load on gluster3 is WARNING: WARNING - load average: 4.78, 5.36, 4.93 [20:57:44] RECOVERY - cp20 Stunnel Http for mw11 on cp20 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.013 second response time [20:57:47] RECOVERY - cp21 Stunnel Http for mw11 on cp21 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.016 second response time [20:57:56] PROBLEM - mw9 Puppet on mw9 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_JobRunner] [20:58:04] RECOVERY - cp31 Stunnel Http for mw11 on cp31 is OK: HTTP OK: HTTP/1.1 200 OK - 15869 bytes in 0.326 second response time [20:58:16] RECOVERY - cp30 Stunnel Http for mw11 on cp30 is OK: HTTP OK: HTTP/1.1 200 OK - 15861 bytes in 0.332 second response time [20:58:16] PROBLEM - mon2 Current Load on mon2 is CRITICAL: CRITICAL - load average: 4.07, 3.83, 3.52 [20:58:30] RECOVERY - cp20 Varnish Backends on cp20 is OK: All 9 backends are healthy [20:59:57] PROBLEM - mw11 Puppet on mw11 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 2 minutes ago with 4 failures. Failed resources (up to 3 shown): Exec[git_pull_JobRunner],Exec[git_pull_MediaWiki config],Exec[git_pull_landing],Exec[git_pull_ErrorPages] [20:59:59] RECOVERY - cp21 Varnish Backends on cp21 is OK: All 9 backends are healthy [21:00:10] RECOVERY - cp31 Varnish Backends on cp31 is OK: All 9 backends are healthy [21:00:12] RECOVERY - mon2 Current Load on mon2 is OK: OK - load average: 3.13, 3.33, 3.36 [21:00:15] RECOVERY - cp30 Varnish Backends on cp30 is OK: All 9 backends are healthy [21:03:01] PROBLEM - test3 Puppet on test3 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 2 minutes ago with 3 failures. Failed resources (up to 3 shown): Exec[git_pull_MediaWiki config],Exec[git_pull_landing],Exec[git_pull_ErrorPages] [21:03:29] RECOVERY - gluster3 Current Load on gluster3 is OK: OK - load average: 4.47, 4.61, 4.75 [21:04:56] PROBLEM - phab2 Puppet on phab2 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_phabricator-extensions] [21:14:16] RECOVERY - mw8 Puppet on mw8 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:14:45]