[00:57:35] Hey guys, I've been working on moving the iabot tool over to its own VM, and while I have it set up to run, I'm hitting odd 502 Bad Gateway errors on iabot.wmcloud.org. Half the time it loads, the other half it doesn't. I'm not seeing a single error from php-fpm or nginx on the VM, and hammering it on the VM itself yields only 200 OKs. I have to conclude there might be an issue with the web proxy itself.
[00:57:35] Might be a misconfiguration on my part, but I was wondering if someone can help me out with this.
[01:21:38] Never mind, I simply recreated the proxy in Horizon and the problem magically disappeared.
[07:22:10] let's see
[07:22:41] !log tools.krinklebot $ sudo qdel 2153961
[07:22:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.krinklebot/SAL
[07:23:07] hm, this time with -f
[07:23:13] > root forced the deletion of job 2153961
[07:27:28] AntiComposite, Krinkle: successfully whacked.
[07:46:19] Are there known problems with api.svc.tools.eqiad1.wikimedia.cloud?
[07:47:31] I'm just running "toolforge jobs restart " from Toolforge and I get a timeout there from the Python toolforge_weld client
[07:52:02] !log tools.itwiki Ran 'jobs restart itwiki-orphanizerbot' but got a timeout from api.svc.tools.eqiad1.wikimedia.cloud, multiple times. But it worked.
[07:52:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.itwiki/SAL
[07:52:17] !log tools.wudele Webservice restart
[07:52:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wudele/SAL
[07:52:40] valerio-bozzolan: I think the issue is more that the timeout is too short for some of the operations, rather than the API itself being too slow. I'll get that fixed
[07:54:10] Yes, it may be just that, since I've seen the timeout hit after just a few seconds (<30 for sure)
[09:33:33] Hey, are those 504s related to the recent outages? Do they just need a restart?
https://ipcheck.toolforge.org and https://whois-referral.toolforge.org
[09:36:32] !log tools.ipcheck webservice restart
[09:36:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ipcheck/SAL
[09:37:07] !log tools.whois-referral webservice restart
[09:37:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.whois-referral/SAL
[09:37:28] Titore: gave them a poke, looks like it worked :)
[09:38:12] Yep, thanks!
[10:12:10] !log tools deploy builds-api 0.0.96
[10:12:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:16:37] !log toolsbeta deploy builds-api 0.0.96
[10:16:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:02:03] legoktm: `HTTP status server error (502 Bad Gateway) for url (https://www.mediawiki.org/api/rest_v1/page/html/Project%20talk%3AMastodon?redirect=false)` for masto-collab, though https://www.mediawiki.org/api/rest_v1/page/html/Project%20talk%3AMastodon?redirect=false is 502ing itself, so..
[12:02:53] (wrong channel, oops)
[12:11:12] !log admin enable puppet on cloudservices1006 to drop local NAT hacks and enable new DNS auth IP address (T346042)
[12:11:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:11:15] T346042: cloudservices1005: move to new setup - https://phabricator.wikimedia.org/T346042
[12:12:44] TheresNoTime: did you report it elsewhere?
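The intermittent 502s discussed above ("half the time it loads") are easiest to characterize by probing the URL repeatedly and tallying the status codes, both through the proxy and directly on the VM. A minimal stdlib sketch (the URL is the one from the report; the attempt count and timeout are arbitrary choices, not anything from the log):

```python
"""Probe a URL repeatedly and tally HTTP status codes, to spot
intermittent 502s. Sketch only: attempts/timeout values are examples."""
import collections
import urllib.error
import urllib.request


def probe(url, attempts=20, timeout=10):
    """Return a Counter of status codes (or 'network-error') seen."""
    seen = collections.Counter()
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                seen[resp.status] += 1
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses arrive as HTTPError; count their codes too
            seen[err.code] += 1
        except urllib.error.URLError:
            # DNS failure, connection refused, timeout, etc.
            seen["network-error"] += 1
    return seen


# Example: probe("https://iabot.wmcloud.org/", attempts=20)
# A mixed tally like {200: 11, 502: 9} through the proxy but all-200
# direct to the VM points at the proxy layer, as concluded above.
```

Running the same probe against the backend and against the public name is what narrows the fault to the web proxy rather than php-fpm/nginx.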
[12:13:07] legoktm: see -sre, restbase is b0rked or something
[12:13:27] ty
[12:13:46] * TheresNoTime will *manually* request a Wikipedia post /sigh
[12:13:50] :p
[14:40:55] !status DNS operations ongoing
[14:42:05] !log admin DNS operation: route 208.80.154.148 to cloudservices1006 in anticipation of cloudservices1005 decom (T346042)
[14:42:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:42:09] T346042: cloudservices1005: move to new setup - https://phabricator.wikimedia.org/T346042
[15:16:09] !log tools populating db credential envvars for tools that do not have them (T345742)
[15:19:14] Temporary failure in name resolution in ...
[15:19:16] I am getting this error trying to access my user database
[15:20:49] wm-bb: do you have any more info on that?
[15:21:00] what DNS lookup from where is failing?
[15:28:45] Hi :)
[15:29:09] mix'n'match is down again, catalogues are not loading. Could it be a server problem?
[15:31:15] DNS? I am trying to access it from my tool account (re @wmtelegram_bot: what DNS lookup from where is failing?)
[15:31:25] JonathanG: we just reverted a test move we had in place
[15:31:33] can you advise if it's changed now?
[15:32:23] wm-bb: thanks, I'm a bit of a cloud outsider so I don't fully understand what's broken
[15:32:32] yes, it is working now. thanks!
[15:32:34] "name resolution" indicates a DNS issue - and we were making some DNS changes
[15:32:59] but I'm not sure what or where the "user database" is, in terms of which DNS names it relies on that might be causing the first error message
[15:37:09] have a nice day, everyone!
[15:37:15] and thanks for your work!
[15:39:02] It is working now (re @wmtelegram_bot: "name resolution" indicates a DNS issue - and we were making some DNS changes)
[15:39:31] wm-bb: ok, thanks for confirming
[16:10:47] !log admin DNS operation: add new DNS entry for ns0.openstack.eqiad1.wikimediacloud.org.
on wikimedia authdns, pointing to 185.15.56.162
[16:10:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:11:41] !log tools increasing secrets quota to 30 (T339916)
[16:11:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:11:44] T339916: [envvars-api] Review and potentially increase the quota for secrets - https://phabricator.wikimedia.org/T339916
[16:12:41] !log admin DNS operation: remove old DNS entry for ns0.openstack.eqiad1.wikimediacloud.org. on wikimedia authdns (was pointing to 208.80.154.148)
[16:12:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:32:36] Hi guys! I have a problem with this tool: https://mp.toolforge.org/
[16:32:37] Why does it say "This Grid Engine web service cannot be reached" and can't be reached?
[16:32:38] It clearly says:
[16:32:40] --> webservice status
[16:32:41] Your webservice of type php7.4 is running on backend kubernetes
[16:38:46] @EK_aka_EK: the grid engine sometimes thinks a tool is still running on it after it's been stopped. I can clear that, one second
[16:39:10] taavi: nice error page ^.^
[16:40:41] https://mp.toolforge.org/ is back
[16:49:38] thanks!!
[17:45:08] Can an admin kill 2141868 for Geograph? It's supposed to be killed automatically if it exceeds 24 hours, but that doesn't seem to have happened and a manual kill doesn't work.
[17:48:55] !log tools.geograph tools-sgegrid-master:~ $ sudo qdel -f 2141868
[17:48:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.geograph/SAL
[17:54:07] Still stuck, it seems (re @wmtelegram_bot: !log tools.geograph tools-sgegrid-master:~ $ sudo qdel -f 2141868)
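The "Temporary failure in name resolution" report earlier in the log is a plain DNS lookup failure on the client host, which is why it cleared as soon as the DNS test move was reverted. A quick way to check from the affected account what a name actually resolves to (a sketch; the hostname in the comment is only an illustrative example, since the log never names which "user database" host was failing):

```python
"""Check whether a hostname resolves, for debugging
'Temporary failure in name resolution' errors. Sketch only."""
import socket


def resolves(hostname):
    """Return the sorted list of IPs for hostname, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # getaddrinfo raises gaierror on 'temporary failure in name
        # resolution' and NXDOMAIN alike
        return []
    return sorted({info[4][0] for info in infos})


# Example (hypothetical hostname): run from the tool account that saw
# the error, e.g. resolves("tools.db.svc.wikimedia.cloud"); an empty
# result while other names resolve points at the DNS change, not the DB.
```

Comparing the result before and after a DNS change (or against a known-good name) separates "the resolver is broken" from "this one record is wrong".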