[00:00:07] PROBLEM - cp26 HTTP 4xx/5xx ERROR Rate on cp26 is UNKNOWN: UNKNOWN - NGINX Error Rate is UNKNOWN
[00:00:22] PROBLEM - cp27 HTTP 4xx/5xx ERROR Rate on cp27 is UNKNOWN: UNKNOWN - NGINX Error Rate is UNKNOWN
[00:02:02] PROBLEM - cp26 HTTP 4xx/5xx ERROR Rate on cp26 is CRITICAL: CRITICAL - NGINX Error Rate is 100%
[00:02:22] PROBLEM - cp27 HTTP 4xx/5xx ERROR Rate on cp27 is CRITICAL: CRITICAL - NGINX Error Rate is 100%
[00:03:45] RECOVERY - cp36 Disk Space on cp36 is OK: DISK OK - free space: / 21694MiB (24% inode=98%);
[00:05:25] RECOVERY - cp37 Disk Space on cp37 is OK: DISK OK - free space: / 21682MiB (24% inode=98%);
[00:07:47] PROBLEM - cp26 HTTP 4xx/5xx ERROR Rate on cp26 is WARNING: WARNING - NGINX Error Rate is 57%
[00:09:42] PROBLEM - cp26 HTTP 4xx/5xx ERROR Rate on cp26 is CRITICAL: CRITICAL - NGINX Error Rate is 65%
[00:18:41] PROBLEM - changeprop151 changeprop on changeprop151 is CRITICAL: connect to address 10.0.15.148 and port 7200: Connection refused
[00:28:56] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/377845a00041...0269a59738a8
[00:28:59] [miraheze/puppet] Universal-Omega 0269a59 - Fix
[00:29:34] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/0269a59738a8...57721e2ed1fe
[00:29:37] [miraheze/puppet] Universal-Omega 57721e2 - Revert
[00:30:06] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/57721e2ed1fe...0fe1a2f349e1
[00:30:08] [miraheze/puppet] Universal-Omega 0fe1a2f - Add check if composer is already defined
[00:30:36] [Grafana] !sre FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:31:35] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/0fe1a2f349e1...b597867b8652
[00:31:37] [miraheze/puppet] Universal-Omega b597867 - Fix
[00:32:48] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/b597867b8652...4b747af145b6
[00:32:50] [miraheze/puppet] Universal-Omega 4b747af - Fix
[00:35:36] [Grafana] !sre RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:38:01] PROBLEM - test151 JobRunner Service on test151 is CRITICAL: PROCS CRITICAL: 0 processes with args 'redisJobRunnerService'
[00:38:29] [miraheze/dns] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/dns/compare/58d812a6d3c5...76d5bd344f0b
[00:38:32] [miraheze/dns] Universal-Omega 76d5bd3 - Set jobrunner to test151 for now
[00:38:49] RECOVERY - jobchron171 Puppet on jobchron171 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:41:44] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/4b747af145b6...78b1659acf88
[00:41:47] [miraheze/puppet] Universal-Omega 78b1659 - Add rpc
[00:45:11] PROBLEM - wiki.andreijiroh.uk.eu.org - reverse DNS on sslhost is WARNING: NoNameservers: All nameservers failed to answer the query wiki.andreijiroh.uk.eu.org. IN CNAME: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL
[00:46:06] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/78b1659acf88...39d74aa65cff
[00:46:09] [miraheze/puppet] Universal-Omega 39d74aa - jobrunner: add PHP FPM socket
[00:52:43] PROBLEM - test151 Puppet on test151 is WARNING: WARNING: Puppet is currently disabled, message: reason not specified, last run 5 minutes ago with 3 failures
[00:53:12] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/39d74aa65cff...b0444fe1fb0f
[00:53:14] [miraheze/puppet] Universal-Omega b0444fe - jobrunner: fix socket
[00:58:46] [miraheze/puppet] Universal-Omega created branch port https://github.com/miraheze/puppet/commit/39d74aa65cff148138d48d9d5a41d40981e7d9ea
[00:58:49] [puppet] Universal-Omega created branch port - https://github.com/miraheze/puppet
[01:03:06] [miraheze/puppet] Universal-Omega pushed 1 commit to port [+0/-0/±4] https://github.com/miraheze/puppet/compare/39d74aa65cff...fe8caab97486
[01:03:09] [miraheze/puppet] Universal-Omega fe8caab - jobrunner: remove default ports from apache2
[01:03:30] [puppet] Universal-Omega opened pull request #3789: jobrunner: remove default ports from apache2 - https://github.com/miraheze/puppet/pull/3789
[01:03:55] [miraheze/puppet] Universal-Omega pushed 1 commit to port [+0/-0/±4] https://github.com/miraheze/puppet/compare/fe8caab97486...e01c908e8a6f
[01:03:57] [miraheze/puppet] Universal-Omega e01c908 - jobrunner: remove default ports from apache2
[01:04:00] [puppet] Universal-Omega synchronize pull request #3789: jobrunner: remove default ports from apache2 - https://github.com/miraheze/puppet/pull/3789
[01:04:32] [puppet] Universal-Omega closed pull request #3789: jobrunner: remove default ports from apache2 - https://github.com/miraheze/puppet/pull/3789
[01:04:34] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±4] https://github.com/miraheze/puppet/compare/b0444fe1fb0f...9d462417316e
[01:04:37] [miraheze/puppet] Universal-Omega 9d46241 - jobrunner: remove default ports from apache2
[01:04:39] [miraheze/puppet] Universal-Omega deleted branch port
[01:04:41] [puppet] Universal-Omega deleted branch port - https://github.com/miraheze/puppet
[01:06:22] PROBLEM - test151 Puppet on test151 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 12 seconds ago with 3 failures. Failed resources (up to 3 shown): Exec[git_clone_MediaWiki-REL1_41 SemanticMediaWiki],Exec[git_checkout_MediaWiki-REL1_41 UnlinkedWikibase],Exec[git_clone_MediaWiki-master SemanticMediaWiki]
[01:07:50] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/9d462417316e...6cf51de70491
[01:07:51] [miraheze/puppet] Universal-Omega 6cf51de - Fix
[01:08:07] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/6cf51de70491...035d29195b95
[01:08:08] [miraheze/puppet] Universal-Omega 035d291 - changeprop: remove realm
[01:09:48] [miraheze/mw-config] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/mw-config/compare/1313e602da89...73841051e453
[01:09:50] [miraheze/mw-config] Universal-Omega 7384105 - beta: enable EventBus JobQueue
[01:10:40] miraheze/mw-config - Universal-Omega the build passed.
[01:14:24] PROBLEM - wiki.andreijiroh.uk.eu.org - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - wiki.andreijiroh.uk.eu.org All nameservers failed to answer the query.
[01:16:21] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/035d29195b95...17f186e1736b
[01:16:23] [miraheze/puppet] Universal-Omega 17f186e - kafka: add changeprop to firewall
[01:26:29] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/17f186e1736b...eba2e9c2cd79
[01:26:30] [miraheze/puppet] Universal-Omega eba2e9c - jobrunner: allow changeprop to access port 9006
[01:43:41] PROBLEM - wiki.andreijiroh.uk.eu.org - reverse DNS on sslhost is WARNING: LifetimeTimeout: The resolution lifetime expired after 5.403 seconds: Server 2606:4700:4700::1111 UDP port 53 answered The DNS operation timed out.; Server 2606:4700:4700::1111 UDP port 53 answered The DNS operation timed out.; Server 2606:4700:4700::1111 UDP port 53 answered The DNS operation timed out.
[01:48:27] PROBLEM - changeprop151 ferm_active on changeprop151 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[02:05:49] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/eba2e9c2cd79...c053c971d7b7
[02:05:51] [miraheze/puppet] Universal-Omega c053c97 - changeprop: require libssl1.1
[02:07:04] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/c053c971d7b7...fca0c80fb48b
[02:07:07] [miraheze/puppet] Universal-Omega fca0c80 - eventgate: require libssl1.1
[02:12:55] PROBLEM - wiki.andreijiroh.uk.eu.org - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - wiki.andreijiroh.uk.eu.org All nameservers failed to answer the query.
[02:13:45] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/fca0c80fb48b...6f297b56ad74
[02:13:46] [miraheze/puppet] Universal-Omega 6f297b5 - Fix some comments
[02:15:36] [Grafana] !sre FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:28:31] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/6f297b56ad74...e0446c1fc79b
[02:28:32] [miraheze/puppet] Universal-Omega e0446c1 - Temp remove low_traffic_jobs
[02:29:39] RECOVERY - changeprop151 changeprop on changeprop151 is OK: TCP OK - 0.000 second response time on 10.0.15.148 port 7200
[03:00:36] [Grafana] !sre RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:06:27] RECOVERY - changeprop151 ferm_active on changeprop151 is OK: OK ferm input default policy is set
[03:06:29] PROBLEM - swiftobject181 ferm_active on swiftobject181 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[03:07:02] PROBLEM - swiftobject171 ferm_active on swiftobject171 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[03:07:31] PROBLEM - swiftproxy171 ferm_active on swiftproxy171 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[03:08:06] PROBLEM - swiftproxy161 ferm_active on swiftproxy161 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[03:11:02] RECOVERY - swiftobject171 ferm_active on swiftobject171 is OK: OK ferm input default policy is set
[03:11:31] RECOVERY - swiftproxy171 ferm_active on swiftproxy171 is OK: OK ferm input default policy is set
[03:12:06] RECOVERY - swiftproxy161 ferm_active on swiftproxy161 is OK: OK ferm input default policy is set
[03:12:29] RECOVERY - swiftobject181 ferm_active on swiftobject181 is OK: OK ferm input default policy is set
[05:13:35] [dns] Universal-Omega closed pull request #501: Add volunteerforukraine.com zone - https://github.com/miraheze/dns/pull/501
[05:13:36] [miraheze/dns] Universal-Omega pushed 1 commit to master [+1/-0/±0] https://github.com/miraheze/dns/compare/76d5bd344f0b...b0459545c7c8
[05:13:38] [miraheze/dns] MacFan4000 b045954 - Add volunteerforukraine.com zone (#501)
[06:11:24] PROBLEM - swiftac171 Current Load on swiftac171 is CRITICAL: LOAD CRITICAL - total load average: 24.24, 11.77, 5.76
[06:13:23] RECOVERY - swiftac171 Current Load on swiftac171 is OK: LOAD OK - total load average: 5.90, 8.88, 5.40
[06:16:02] !log [reception@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/deleteBatch.php --wiki=herotvdatabasewiki /home/reception/herotvdatabasedel.txt --r=Requested at T11961 (START)
[06:16:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[08:32:51] PROBLEM - wiki.andreijiroh.uk.eu.org - reverse DNS on sslhost is WARNING: NoNameservers: All nameservers failed to answer the query uk.eu.org. IN NS: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL
[09:02:06] PROBLEM - wiki.andreijiroh.uk.eu.org - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - wiki.andreijiroh.uk.eu.org All nameservers failed to answer the query.
[09:31:20] PROBLEM - wiki.andreijiroh.uk.eu.org - reverse DNS on sslhost is WARNING: NoNameservers: All nameservers failed to answer the query wiki.andreijiroh.uk.eu.org. IN CNAME: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL
[10:21:25] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/importDump.php --wiki=themanaworldwiki --username-prefix=themanaworld /home/reception/themanaworld.xml --report 1 (END - exit=2)
[10:21:30] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/importDump.php --wiki=themanaworldwiki --username-prefix=themanaworld /home/reception/themanaworld.xml --report 1 (START)
[10:21:33] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[10:21:40] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[10:29:49] RECOVERY - wiki.andreijiroh.uk.eu.org - reverse DNS on sslhost is OK: SSL OK - wiki.andreijiroh.uk.eu.org reverse DNS resolves to cp36.wikitide.net - CNAME OK
[10:39:31] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/importDump.php --wiki=themanaworldwiki --username-prefix=themanaworld /home/reception/themanaworld.xml --report 1 (END - exit=2)
[10:39:39] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[10:39:39] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/importDump.php --wiki=themanaworldwiki --username-prefix=themanaworld /home/reception/themanaworld.xml --report 1 --no-updates (START)
[10:39:47] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[10:46:09] PROBLEM - ping6 on cp51 is CRITICAL: PING CRITICAL - Packet loss = 16%, RTA = 178.90 ms
[10:50:17] RECOVERY - ping6 on cp51 is OK: PING OK - Packet loss = 0%, RTA = 166.90 ms
[11:27:17] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/importDump.php --wiki=themanaworldwiki --username-prefix=themanaworld /home/reception/themanaworld.xml --report 1 --no-updates (END - exit=0)
[11:27:24] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[11:27:39] 😳
[11:27:42] it's done?
[11:29:44] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/rebuildall.php --wiki=themanaworldwiki (START)
[11:29:51] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[11:36:49] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/rebuildall.php --wiki=themanaworldwiki (END - exit=0)
[11:36:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[13:15:09] PROBLEM - wiki.mcjones.gay - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - wiki.mcjones.gay All nameservers failed to answer the query.
[13:44:29] RECOVERY - wiki.mcjones.gay - reverse DNS on sslhost is OK: SSL OK - wiki.mcjones.gay reverse DNS resolves to cp37.wikitide.net - CNAME OK
[13:56:25] PROBLEM - wiki.andreijiroh.uk.eu.org - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - wiki.andreijiroh.uk.eu.org All nameservers failed to answer the query.
[14:25:38] PROBLEM - wiki.andreijiroh.uk.eu.org - reverse DNS on sslhost is WARNING: NoNameservers: All nameservers failed to answer the query wiki.andreijiroh.uk.eu.org. IN CNAME: Server 2606:4700:4700::1111 UDP port 53 answered SERVFAIL
[14:40:35] [Grafana] !sre FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[15:12:56] PROBLEM - ping6 on cp51 is CRITICAL: PING CRITICAL - Packet loss = 16%, RTA = 168.39 ms
[15:15:00] RECOVERY - ping6 on cp51 is OK: PING OK - Packet loss = 0%, RTA = 168.94 ms
[15:29:46] [CreateWiki] redbluegreenhat synchronize pull request #487: T10683: Expose the canned responses to JavaScript - https://github.com/miraheze/CreateWiki/pull/487
[15:37:55] miraheze/CreateWiki - redbluegreenhat the build passed.
[15:40:21] [CreateWiki] redbluegreenhat closed pull request #487: T10683: Expose the canned responses to JavaScript - https://github.com/miraheze/CreateWiki/pull/487
[15:40:22] [miraheze/CreateWiki] redbluegreenhat pushed 1 commit to master [+0/-0/±2] https://github.com/miraheze/CreateWiki/compare/3aea2b9ddada...fe70426a35b1
[15:40:23] [miraheze/CreateWiki] redbluegreenhat fe70426 - T10683: Expose the canned responses to JavaScript (#487)
[15:40:44] !log [alex@test151] starting deploy of {'versions': '1.41', 'upgrade_extensions': 'CreateWiki'} to test151
[15:40:45] !log [alex@test151] finished deploy of {'versions': '1.41', 'upgrade_extensions': 'CreateWiki'} to test151 - SUCCESS in 1s
[15:40:52] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[15:40:59] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[15:46:26] miraheze/CreateWiki - redbluegreenhat the build has errored.
[15:53:20] PROBLEM - wiki.andreijiroh.uk.eu.org - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - wiki.andreijiroh.uk.eu.org All nameservers failed to answer the query.
[16:08:06] [CreateWiki] redbluegreenhat opened pull request #488: Reorder `use` statements - https://github.com/miraheze/CreateWiki/pull/488
[16:08:11] [CreateWiki] redbluegreenhat closed pull request #488: Reorder `use` statements - https://github.com/miraheze/CreateWiki/pull/488
[16:08:13] [miraheze/CreateWiki] redbluegreenhat pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/CreateWiki/compare/fe70426a35b1...190fcc59d0ca
[16:08:15] [miraheze/CreateWiki] redbluegreenhat 190fcc5 - Reorder `use` statements (#488)
[16:08:27] !log [alex@test151] starting deploy of {'versions': '1.41', 'upgrade_extensions': 'CreateWiki'} to test151
[16:08:28] !log [alex@test151] finished deploy of {'versions': '1.41', 'upgrade_extensions': 'CreateWiki'} to test151 - SUCCESS in 1s
[16:08:35] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[16:08:43] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[16:10:58] PROBLEM - ping6 on cp51 is CRITICAL: PING CRITICAL - Packet loss = 37%, RTA = 170.16 ms
[16:13:02] RECOVERY - ping6 on cp51 is OK: PING OK - Packet loss = 0%, RTA = 193.29 ms
[16:16:01] miraheze/CreateWiki - redbluegreenhat the build passed.
[16:16:05] miraheze/CreateWiki - redbluegreenhat the build passed.
[16:22:34] RECOVERY - wiki.andreijiroh.uk.eu.org - reverse DNS on sslhost is OK: SSL OK - wiki.andreijiroh.uk.eu.org reverse DNS resolves to cp37.wikitide.net - CNAME OK
[16:35:35] [Grafana] !sre RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[16:54:29] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/e0446c1fc79b...95066bd2d972
[16:54:32] [miraheze/puppet] Universal-Omega 95066bd - nginx: fix
[16:55:23] !log [@mwtask181] starting deploy of {'config': True} to all
[16:55:30] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[16:55:33] !log [@mwtask181] finished deploy of {'config': True} to all - SUCCESS in 9s
[16:55:41] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[16:59:39] PROBLEM - ping6 on cp51 is CRITICAL: PING CRITICAL - Packet loss = 16%, RTA = 173.55 ms
[17:03:48] RECOVERY - ping6 on cp51 is OK: PING OK - Packet loss = 0%, RTA = 174.07 ms
[17:06:51] !log [@mwtask171] starting deploy of {'config': True} to all
[17:06:59] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[17:06:59] !log [@mwtask171] finished deploy of {'config': True} to all - SUCCESS in 8s
[17:07:07] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[17:25:38] !log [universalomega@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/importDump.php --wiki=sagan4alphawiki --no-updates s4awiki.xml (START)
[17:25:46] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[17:32:34] [mw-config] songnguxyz opened pull request #5511: Update ManageWikiSettings.php - https://github.com/miraheze/mw-config/pull/5511
[17:32:54] [mw-config] songnguxyz edited pull request #5511: Adding function for UnlinkedWikibase - https://github.com/miraheze/mw-config/pull/5511
[17:33:34] miraheze/mw-config - songnguxyz the build passed.
[17:44:32] PROBLEM - changeprop151 Puppet on changeprop151 is WARNING: WARNING: Puppet is currently disabled, message: reason not specified, last run 24 minutes ago with 0 failures
[17:47:39] PROBLEM - changeprop151 changeprop on changeprop151 is CRITICAL: connect to address 10.0.15.148 and port 7200: Connection refused
[17:53:39] RECOVERY - changeprop151 changeprop on changeprop151 is OK: TCP OK - 0.000 second response time on 10.0.15.148 port 7200
[17:54:48] PROBLEM - wiki.gab.pt.eu.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.gab.pt.eu.org' expires in 15 day(s) (Tue 09 Apr 2024 05:31:08 PM GMT +0000).
[17:57:39] PROBLEM - changeprop151 changeprop on changeprop151 is CRITICAL: connect to address 10.0.15.148 and port 7200: Connection refused
[18:13:39] RECOVERY - changeprop151 changeprop on changeprop151 is OK: TCP OK - 0.000 second response time on 10.0.15.148 port 7200
[18:19:51] PROBLEM - test151 Puppet on test151 is WARNING: WARNING: Puppet is currently disabled, message: reason not specified, last run 25 minutes ago with 3 failures
[18:20:01] PROBLEM - test151 ferm_active on test151 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[18:29:42] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/95066bd2d972...35b8915caaf5
[18:29:45] [miraheze/puppet] Universal-Omega 35b8915 - kafka: set offsets.topic.replication.factor for broker to 1
[18:34:32] RECOVERY - changeprop151 Puppet on changeprop151 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[18:40:57] [mw-config] redbluegreenhat opened pull request #5512: Setup wgPrivilegedGroups - https://github.com/miraheze/mw-config/pull/5512
[18:41:44] miraheze/mw-config - redbluegreenhat the build has errored.
[18:41:57] [mw-config] redbluegreenhat synchronize pull request #5512: Setup wgPrivilegedGroups - https://github.com/miraheze/mw-config/pull/5512
[18:42:50] miraheze/mw-config - redbluegreenhat the build passed.
[18:46:32] PROBLEM - changeprop151 Puppet on changeprop151 is WARNING: WARNING: Puppet is currently disabled, message: reason not specified, last run 12 minutes ago with 0 failures
[18:50:50] PROBLEM - changeprop151 changeprop on changeprop151 is CRITICAL: connect to address 10.0.15.148 and port 7200: Connection refused
[18:54:22] [miraheze/puppet] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/puppet/compare/35b8915caaf5...527d4b3db5c4
[18:54:23] [miraheze/puppet] Universal-Omega 527d4b3 - kafka: add some topics config
[18:56:42] RECOVERY - changeprop151 changeprop on changeprop151 is OK: TCP OK - 0.000 second response time on 10.0.15.148 port 7200
[19:00:34] PROBLEM - changeprop151 changeprop on changeprop151 is CRITICAL: connect to address 10.0.15.148 and port 7200: Connection refused
[19:06:25] RECOVERY - changeprop151 changeprop on changeprop151 is OK: TCP OK - 0.000 second response time on 10.0.15.148 port 7200
[19:18:53] PROBLEM - kafka181 Puppet on kafka181 is WARNING: WARNING: Puppet is currently disabled, message: reason not specified, last run 7 minutes ago with 0 failures
[19:28:57] !log [universalomega@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/importDump.php --wiki=sagan4alphawiki --no-updates s4awiki.xml (END - exit=0)
[19:29:05] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[19:33:41] PROBLEM - changeprop151 changeprop on changeprop151 is CRITICAL: connect to address 10.0.15.148 and port 7200: Connection refused
[19:35:02] !log [universalomega@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/rebuildall.php --wiki=sagan4alphawiki (START)
[19:35:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[19:35:39] RECOVERY - changeprop151 changeprop on changeprop151 is OK: TCP OK - 0.000 second response time on 10.0.15.148 port 7200
[19:41:39] PROBLEM - changeprop151 changeprop on changeprop151 is CRITICAL: connect to address 10.0.15.148 and port 7200: Connection refused
[19:43:39] RECOVERY - changeprop151 changeprop on changeprop151 is OK: TCP OK - 0.000 second response time on 10.0.15.148 port 7200
[19:57:57] PROBLEM - wiki.aetherexplorers.science - reverse DNS on sslhost is WARNING: rDNS WARNING - reverse DNS entry for wiki.aetherexplorers.science could not be found
[20:23:39] PROBLEM - changeprop151 changeprop on changeprop151 is CRITICAL: connect to address 10.0.15.148 and port 7200: Connection refused
[20:31:39] RECOVERY - changeprop151 changeprop on changeprop151 is OK: TCP OK - 0.000 second response time on 10.0.15.148 port 7200
[20:33:11] Good afternoon, is anyone around to help?
[20:34:13] Anyone?
[20:35:35] [Grafana] !sre FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[20:35:39] PROBLEM - changeprop151 changeprop on changeprop151 is CRITICAL: connect to address 10.0.15.148 and port 7200: Connection refused
[20:36:22] ?
[20:37:05] I need a little help with my page. I would like to be able to upload multiple files at a time, is there a way for me to do so?
[20:38:13] This isn't really the channel ask but I believe yes there's options available, #miraheze is probably where you want to go, most of the support people hang over there here on IRC/Discord
[20:38:37] All right, thank you
[20:39:01] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 21.22, 14.20, 7.74
[20:39:21] PROBLEM - cp36 HTTPS on cp36 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[20:39:23] PROBLEM - cp36 HTTP 4xx/5xx ERROR Rate on cp36 is CRITICAL: CRITICAL - NGINX Error Rate is 72%
[20:39:32] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 23.49, 15.67, 8.62
[20:40:08] PROBLEM - cp37 Current Load on cp37 is WARNING: LOAD WARNING - total load average: 6.96, 4.63, 2.56
[20:40:13] PROBLEM - cp26 HTTPS on cp26 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[20:40:14] PROBLEM - mw151 HTTPS on mw151 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[20:40:19] PROBLEM - cp27 Varnish Backends on cp27 is CRITICAL: 4 backends are down. mw152 mw161 mw162 mw181
[20:40:25] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 5 backends are down. mw151 mw152 mw162 mw172 mw182
[20:40:28] PROBLEM - cp26 Varnish Backends on cp26 is CRITICAL: 2 backends are down. mw162 mw181
[20:40:32] PROBLEM - cp36 Varnish Backends on cp36 is CRITICAL: 5 backends are down. mw151 mw152 mw161 mw162 mw182
[20:40:38] PROBLEM - cp41 Varnish Backends on cp41 is CRITICAL: 2 backends are down. mw171 mw182
[20:40:38] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 20.61, 16.96, 9.36
[20:40:41] Icinga sure loves to complain
[20:40:48] PROBLEM - cp37 Varnish Backends on cp37 is CRITICAL: 1 backends are down. mw172
[20:41:01] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 15.95, 15.10, 8.90
[20:41:14] PROBLEM - cp37 Disk Space on cp37 is WARNING: DISK WARNING - free space: / 8879MiB (10% inode=98%);
[20:41:21] RECOVERY - cp36 HTTPS on cp36 is OK: HTTP OK: HTTP/2 404 - Status line output matched "HTTP/2 404" - 3615 bytes in 0.070 second response time
[20:41:22] RECOVERY - cp36 HTTP 4xx/5xx ERROR Rate on cp36 is OK: OK - NGINX Error Rate is 2%
[20:41:32] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 9.42, 13.54, 8.73
[20:42:08] RECOVERY - cp37 Current Load on cp37 is OK: LOAD OK - total load average: 3.07, 4.01, 2.59
[20:42:13] RECOVERY - cp26 HTTPS on cp26 is OK: HTTP OK: HTTP/2 404 - Status line output matched "HTTP/2 404" - 3635 bytes in 0.917 second response time
[20:42:13] PixDeVl: thats its job
[20:42:14] PROBLEM - cp36 Disk Space on cp36 is WARNING: DISK WARNING - free space: / 8515MiB (9% inode=98%);
[20:42:15] RECOVERY - mw151 HTTPS on mw151 is OK: HTTP OK: HTTP/2 404 - Status line output matched "HTTP/2 404" - 285 bytes in 0.063 second response time
[20:42:19] RECOVERY - cp27 Varnish Backends on cp27 is OK: All 19 backends are healthy
[20:42:26] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy
[20:42:28] RECOVERY - cp26 Varnish Backends on cp26 is OK: All 19 backends are healthy
[20:42:32] RECOVERY - cp36 Varnish Backends on cp36 is OK: All 19 backends are healthy
[20:42:38] RECOVERY - cp41 Varnish Backends on cp41 is OK: All 19 backends are healthy
[20:42:38] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 6.07, 12.71, 8.73
[20:42:48] RECOVERY - cp37 Varnish Backends on cp37 is OK: All 19 backends are healthy
[20:42:57] True. Though normally I look in here and it's complaining while everything seem just fine
[20:43:50] PixDeVl: thats also its job
[20:43:58] If users notice first, icinga gets told off
[20:44:07] We want monitoring to detect as early as possible
[20:44:29] PROBLEM - eventgate181 Puppet on eventgate181 is WARNING: WARNING: Puppet is currently disabled, message: reason not specified, last run 15 minutes ago with 0 failures
[20:44:52] RhinosF1 hmmmmm. yup checks out
[20:45:39] RECOVERY - changeprop151 changeprop on changeprop151 is OK: TCP OK - 0.000 second response time on 10.0.15.148 port 7200
[21:12:27] PROBLEM - kafka181 ferm_active on kafka181 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[21:16:54] job queue kinda funky
[21:17:12] over 5000 unclaimed jobs
[21:19:20] Oh dear
[21:19:30] I thought it was 500?
[21:19:41] no, 500_0_
[21:20:03] cause seems to be the recent import of sagan4alphawiki
[21:20:09] also whats evergate181? I don't see a Meta page for it
[21:20:30] new tech that I don't even know about
[21:20:34] PixDeVL: Kafka is not called kafka for nothing
[21:20:36] has to do with Kafka
[21:21:47] I'm going to babysit the jobqueue for a while, see if I can push the numbers down
[21:22:30] https://tenor.com/view/kafka-honkai-star-rail-gif-27232678
[21:22:47] I know what you mean by Kafka but that character always comes to mind when it's mentioned lol
[21:24:10] benshin impact but SPACE
[21:26:23] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki (START)
[21:26:30] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[21:33:47] Impossible
[21:36:32] RECOVERY - changeprop151 Puppet on changeprop151 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[21:36:35] RECOVERY - eventgate181 Puppet on eventgate181 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[21:36:53] RECOVERY - kafka181 Puppet on kafka181 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[21:39:34] Orange_Star: it's been doing fairly well at recovering recently
[21:39:51] PROBLEM - test151 Puppet on test151 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 2 minutes ago with 3 failures. Failed resources (up to 3 shown): Exec[git_clone_MediaWiki-REL1_41 SemanticMediaWiki],Exec[git_checkout_MediaWiki-REL1_41 UnlinkedWikibase],Exec[git_clone_MediaWiki-master SemanticMediaWiki]
[21:40:01] until it faces an import involving a SMW wiki it seems
[21:40:03] Orange_Star: am I going blind https://grafana.miraheze.org/
[21:40:08] yes
[21:40:09] Why that no work
[21:40:15] grafana.wikitide.net
[21:40:28] it is now only available for SRE members too
[21:40:36] Oh
[21:40:39] not public like it used to be
[21:40:40] That's shit
[21:40:51] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki (END - exit=65280)
[21:40:59] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[21:41:01] How am I supposed to sigh at the job queue
[21:41:16] There is no reason for grafana to be private
[21:41:20] with vague "over 500 jobs" alerts hehe
[21:41:30] also ran out of memory :/
[21:41:36] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki (START)
[21:41:39] Oh it's one of them jobs
[21:41:41] They are fun
[21:41:44] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[21:42:01] Once you set the memory right, it should go happy quickly
[21:42:36] Making grafana private is a big back step
[21:42:58] We now have no public monitoring except what's relayed to irc
[21:43:25] Cosmic says he will fix
[21:43:59] I'm getting dejavu, running runJobs.php for this wiki
[21:44:03] I will fix Grafana yeah it was an accident. Sorry...
[21:44:07] brings back memories, I remember doing this before
[21:44:25] > Making grafana private is a big back step
[21:44:25] > We now have no public monitoring except what's relayed to irc
[21:44:26] agreed
[21:44:40] dmehus: I don't need a parrot
[21:44:40] seconded
[21:44:48] Cosmic has agreed to fix
[21:45:08] thanks, CosmicAlpha
[21:45:09] Orange_Star: try increasing the memory on the cli script
[21:45:27] let's see if it happens again and I'll adjust
[21:45:27] RhinosF1: okay, sorry, not what I was intending
[21:46:06] Orange_Star: we are not far off Kafka
[21:46:14] yeah
[21:46:16] Which should make this a whole different kettle of fish
[21:46:21] I never thought I would see this tbh
[21:46:30] Hopefully a slightly more resilient kettle of fish
[21:46:51] Cause at the moment, when the JobQueue does explode, it doesn't recover without help
[21:47:05] WikiTide's servers have a much higher barrier before they go bang though
[21:47:13] I remember the good old days of RamNode
[21:47:30] It was fairly easy to trigger the JobQueue into a tantrum then
[21:47:37] Didn't MH used to be hosted on VPSs on OVH tho?
[21:47:49] or at least that's what I remember reading
[21:47:53] Ramnode is before OVH
[21:48:21] Ramnode -> OVLON -> SCSVG -> Whatever new one is
[21:48:32] Ramnode was pre 6 character DC names
[21:48:33] SLCFS iirc?
[21:48:44] Salt Lake City Fiber State
[21:48:57] FSSCL
[21:49:00] Other way around
[21:49:10] OVh LONdon
[21:49:10] what
[21:49:14] FSSLC
[21:49:22] Server Choice SteVenaGe
[21:49:27] how do you pronounce that
[21:49:33] With difficulty
[21:49:36] You dont
[21:49:44] OVLON is the only nice one to pronounce
[21:49:53] yeah
[21:49:55] Fssss-l-k
[21:50:42] we're now at 12000 jobs ladies and gentlemen
[21:51:02] PROBLEM - test151 MediaWiki Rendering on test151 is UNKNOWN: HTTP UNKNOWN: Failed to unchunk message body
[21:51:05] Ramnode would be RNAMS
[21:51:07] I guess
[21:51:15] If we'd have had 6 chars then
[21:51:23] RNAMS & OVLON are nice names
[21:51:35] Is- is that higher or lower?
[21:51:37] We've been unlucky
[21:51:37] PROBLEM - test151 HTTPS on test151 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 500
[21:51:41] Lower
[21:51:51] Most of WMF's DC names are pronounceable
[21:52:01] yeah
[21:52:08] codfw isn't really
[21:52:11] I can’t remember their spelling tho
[21:52:17] drmrs is officially dreamers
[21:53:08] Eqiad is nice, Esams is easy, ulsfo kinda works, eqsin meh
[21:53:59] KNAMS works too
[21:54:16] But I can't pronounce SCSVG or FSSLC as a word
[21:54:22] FSSLC sounds awful
[21:54:31] And SCSVG is just not workable
[21:54:32] PROBLEM - changeprop151 Puppet on changeprop151 is WARNING: WARNING: Puppet is currently disabled, message: reason not specified, last run 4 minutes ago with 0 failures
[21:54:36] !log [universalomega@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/rebuildall.php --wiki=sagan4alphawiki (END - exit=0)
[21:54:44] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[21:54:58] Oh that makes sense
[21:55:07] Orange_Star: ^ is why jobs is high
[21:55:15] yeah, I figured
[21:55:16] Now that's not running, it will go down
[21:55:23] Jobs are weird
[21:55:24] hopefully
[21:55:24] _chuckles in redis_
[21:55:30] They can spike very high before they drop
[21:55:39] Because jobs spawn jobs that spawn jobs
[21:55:48] Then they all run really quick
[21:55:54] Kaftka moment
[21:55:58] unless it's htmlCacheUpdate
[21:56:02] that one is sure taking its time
[21:56:08] jobceptio
[21:56:11] It's probably memory issues
[21:56:13] n
[21:56:19] We haven't had them for a while
[21:56:31] refreshLinks used to be a bit leaky
[21:56:55] that some nice memories huh
[21:57:02] PROBLEM - test151 MediaWiki Rendering on test151 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 3229 bytes in 0.465 second response time
[21:57:13] Orange_Star: nice isn't the word I'd use
[21:57:21] I was being sarcastic
[21:57:32] But when we decom jobchron, let's say it'll have a special place in the hearts of me and Zppix
[21:57:34] has it been fixed since then?
[21:57:39] Mitigated
[21:58:27] @zppix
[21:59:02] RECOVERY - test151 MediaWiki Rendering on test151 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.375 second response time
[21:59:15] Half of my time in SRE was running runJobs.php across each jobrunner just to keep indefinite jobs from queueing
[22:02:28] Don't indef jobs get automatically shanked? Or am I thinking of retrying after a failed run?
[22:02:57] PixDeVl: they are some controls
[22:03:04] But they aren't very good
[22:03:11] Cause jobchron is pretty dumb software
[22:03:23] it's literally a while loop iirc
[22:03:35] it is even described as that in the README
[22:03:38] Orange_Star: basically ye
[22:03:59] It's not quite while true: run jobs;
[22:04:04] But it's not far off
[22:05:48] I'm going to bed in a minute
[22:06:10] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki (END - exit=65280)
[22:06:11] They were some of your best evenings / mornings
[22:06:17] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:06:20] As it never failed at lunchtime
[22:06:41] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki (START)
[22:06:49] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:07:49] !log [agent@mwtask171] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki --procs 12 --memory-limit max (START)
[22:07:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:08:26] @bluemoon0332 you must run it with procs
[22:08:52] 'k
[22:08:54] ..procs?
[22:08:56] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki (END - exit=2)
[22:09:04] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:09:11] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki --procs=12 --memory-limit=max (START)
[22:09:19] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:09:19] [miraheze/mw-config] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/mw-config/compare/73841051e453...9f512dd479aa
[22:09:20] [miraheze/mw-config] Universal-Omega 9f512dd - Fix RunSingleJob
[22:09:24] number of processes to spawn
[22:09:35] makes stuff go very fast
[22:10:13] miraheze/mw-config - Universal-Omega the build passed.
[22:11:19] nº of queued jobs is going down steadily
[22:11:37] RECOVERY - test151 HTTPS on test151 is OK: HTTP OK: HTTP/2 404 - Status line output matched "HTTP/2 404" - 285 bytes in 0.066 second response time
[22:12:13] Kafka is running on beta and it is correctly pushing jobs!
[22:12:56] nice
[22:15:33] the graphs on Grafana really make it easy to see that job queueing jobs thing in action
[22:16:32] RECOVERY - changeprop151 Puppet on changeprop151 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[22:16:51] Thats the last thing blocking moving production to Kafka I think, getting similar monitoring for Change Propagation jobs
[22:17:36] Oh is Kaftka only on test171 currently(181?)
[22:17:48] it's 151
[22:17:54] *its
[22:17:57] !log [agent@mwtask171] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki --procs 12 --memory-limit max (END - exit=0)
[22:17:58] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki --procs=12 --memory-limit=max (END - exit=0)
[22:18:05] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:18:12] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:18:32] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki --procs=12 --memory-limit=max (START)
[22:18:35] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki --procs=12 --memory-limit=max (END - exit=0)
[22:18:37] `--procs` works
[22:18:40] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:18:41] testament to it
[22:18:45] a few more SMW jobs got queued
[22:18:47] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:18:48] not done quite yet
[22:18:54] !log [agent@mwtask171] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki --procs 12 --memory-limit max (START)
[22:18:57] !log [agent@mwtask171] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki --procs 12 --memory-limit max (END - exit=0)
[22:19:02] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:19:09] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:19:18] PixDeVl: Kafka is its own server, kafka181, but only test151 has it configured to purge the redis job queue and use the kafka job queue
[22:19:45] you sure it's only one server?
[22:20:01] it feels like you created a thousand servers for Kafka ngl
[22:20:52] Only one server for kafka itself, then eventgate181, and changeprop151 for other things needed for it also.
[22:21:10] nah only one server
[22:21:19] and by server we of course mean only one cloud
[22:21:47] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki --procs=12 --memory-limit=max (START)
[22:21:48] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki --procs=12 --memory-limit=max (END - exit=0)
[22:21:54] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:22:02] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:22:04] CosmicAlpha you sure we can't just put them on the same VM?
[22:22:07] runJobs.php is throwing exceptions
[22:22:25] Wikimedia\Services\ContainerDisabledException: Container disabled!
[22:22:32] what is that about?
[22:22:44] On what test151?
[22:22:50] mwtask181
[22:23:08] sagan4alphawiki again queued thousands of SMW jobs
[22:23:22] did I mess up task also?
[22:23:33] I swear nothing was changed there...
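For context on the setup discussed above (the "beta: enable EventBus JobQueue" mw-config commit, and test151 being the only host pointed at the Kafka-backed queue): such a switch is normally a per-wiki MediaWiki configuration toggle. The sketch below is only an illustration under assumptions, using the EventBus extension's JobQueueEventBus class as deployed elsewhere; the wiki database name, the eventgate URL, and the port are placeholders, not Miraheze's actual configuration.

```php
<?php
// Hypothetical sketch: route one test wiki's jobs through EventBus/Kafka
// while every other wiki keeps the default job queue backend.
// 'betawiki' and the eventgate URL/port below are illustrative placeholders.

if ( $wgDBname === 'betawiki' ) {
	// HTTP intake service (eventgate) that produces the job events into Kafka.
	$wgEventServices = [
		'eventbus' => [ 'url' => 'http://eventgate181.example.net:8192/v1/events' ],
	];
	$wgEventServiceDefault = 'eventbus';

	// Send all job types to the EventBus-backed queue instead of the default one.
	$wgJobTypeConf['default'] = [
		'class' => 'JobQueueEventBus',
		'readOnlyReason' => false,
	];
}
```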
[22:23:55] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki --procs=12 --memory-limit=max (START)
[22:23:57] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki --procs=12 --memory-limit=max (END - exit=0)
[22:24:03] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:24:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:24:38] @Orange_Star I wonder if it'd be a good experiment to enable Kafka on the mwtask backup server (mwtask171) and switch that wiki to it to see how it handles?
[22:24:55] Seems like a good chance to test it to me
[22:32:14] !log [@mwtask181] starting deploy of {'config': True} to all
[22:32:22] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:32:24] !log [@mwtask181] finished deploy of {'config': True} to all - SUCCESS in 9s
[22:32:32] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:37:42] !log [@mwtask171] starting deploy of {'config': True} to all
[22:37:49] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:37:50] !log [@mwtask171] finished deploy of {'config': True} to all - SUCCESS in 8s
[22:37:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:43:22] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki --procs=12 --memory-limit=max (START)
[22:43:25] !log [alex@mwtask181] sudo -u www-data php /srv/mediawiki/1.41/maintenance/run.php /srv/mediawiki/1.41/maintenance/runJobs.php --wiki=sagan4alphawiki --procs=12 --memory-limit=max (END - exit=0)
[22:43:30] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:43:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:44:27] [miraheze/mw-config] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/mw-config/compare/9f512dd479aa...9195d6ea6217
[22:44:28] [miraheze/mw-config] Universal-Omega 9195d6e - Configure $wgEventRelayerConfig
[22:45:20] miraheze/mw-config - Universal-Omega the build passed.
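For readers unfamiliar with the setting named in the commit above: $wgEventRelayerConfig in MediaWiki core maps event channels (for example CDN URL purges) to relayer classes. The contents of the Miraheze change are not visible in this log, so the snippet below is only a generic illustration of the setting's shape using core's default EventRelayerNull; a Kafka-backed deployment would substitute a relayer class provided by its event pipeline.

```php
<?php
// Generic illustration of the $wgEventRelayerConfig shape (MediaWiki core default).
// Each key is an event channel name; 'default' covers any channel not listed.
// EventRelayerNull simply drops events. A Kafka-backed setup would point a
// channel at its own relayer class instead; that class is not shown here
// because the actual Miraheze change is not part of this log.
$wgEventRelayerConfig = [
	'default' => [
		'class' => EventRelayerNull::class,
	],
];
```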
[22:53:27] !log [@test151] starting deploy of {'config': True} to test151
[22:53:28] !log [@test151] finished deploy of {'config': True} to test151 - SUCCESS in 0s
[22:53:35] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[22:53:43] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[23:02:10] !log [@mwtask181] starting deploy of {'config': True} to all
[23:02:18] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[23:02:19] !log [@mwtask181] finished deploy of {'config': True} to all - SUCCESS in 9s
[23:02:27] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[23:07:41] !log [@mwtask171] starting deploy of {'config': True} to all
[23:07:49] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[23:07:49] !log [@mwtask171] finished deploy of {'config': True} to all - SUCCESS in 8s
[23:07:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[23:36:52] RhinosF1: FYI Grafana should be fixed
[23:38:22] CosmicAlpha: nope
[23:38:28] at least for me
[23:38:38] CosmicAlpha: wow, that's great, thanks :)
[23:38:54] Miraheze and WikiTide think its a wiki that don't exist
[23:39:09] PixDeVl: grafana.wikitide.net
[23:39:29] PixDeVl, could be a DNS caching issue; give it time to propagate and also try clearing your local cache
[23:39:47] True
[23:40:28] All internal services are moved off miraheze.org to wikitide.net, grafana, icinga, matomo, graylog, graphite are all on wikitide.net now. If it says wiki doesn't exist it is using miraheze.org
[23:40:30] Is it moved perm moved to wikitide.org?
[23:40:38] .net
[23:40:48] but yes permanently
[23:40:51] .net or .org?
[23:40:54] .net
[23:41:03] new domain?
[23:41:34] Or was it always net
[23:41:52] Yup it works on net
[23:41:59] Its new for these servers (like a month ago), it is the same domain we use for all server hostnames now, so we don't have to use production domains wikitide.org or miraheze.org for internal stuff.
[23:42:06] *services
[23:42:07] oh
[23:46:11] .net seems to be used for technical configuration tools and servers from the looks of it
[23:46:15] makes sense, I like that
[23:46:46] .net*work*
[23:47:01] speaking of which, we need to fix your MediaWiki gadget on `devwiki` CosmicAlpha
[23:47:19] it's displaying the wikitide.net domain two or three times
[23:47:22] Which one
[23:47:39] Backend Tools or something
[23:47:41] let me find it
[23:48:25] [[mh:dev:MediaWiki:BackendInformation.js]]
[23:48:25] https://mh.wikipedia.org/wiki/dev:MediaWiki:BackendInformation.js
[23:49:00] I see wm-bot has yet again reverted back to the default URL configuration :(
[23:49:09] huh
[23:49:10] oof
[23:49:16] trusted
[23:49:19] not cool WM bot
[23:49:39] So what does the script do?
[23:49:51] I trust: .*@miraheze/.* (trusted), .*@wikimedia/paladox (admin),
[23:49:51] @trusted #miraheze-sre
[23:50:02] heh that needs to be updated :P
[23:50:19] [[meta:Wm-bot]]
[23:50:19] https://meta.wikimedia.org/wiki/Wm-bot
[23:50:50] good thing we have MacFan4000 here; he's one of the wm-bot admins
[23:51:38] Well that's convinent
[23:52:00] convenient*
[23:52:19] what does trusted allow?
[23:52:22] oh wait, we conveniently set it up to add `trusted` level to anyone with a Miraheze hostmask
[23:52:35] https://meta.wikimedia.org/wiki/Wm-bot#Trusted
[23:53:54] Permission denied
[23:53:54] @configure default-link-wiki miraheze
[23:54:10] Ah, right that is an `admin` level command, shoot :(
[23:55:28] Many wikis will use wm-bot for providing an RC feed for their wiki
[23:59:39] PROBLEM - cp41 Disk Space on cp41 is WARNING: DISK WARNING - free space: / 10228MiB (10% inode=98%);