[08:00:47] !log wikistats deploy latest code for T262148
[08:00:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikistats/SAL
[08:00:51] T262148: remove referata table? - https://phabricator.wikimedia.org/T262148
[08:03:40] When I type in "webservice status" I get a whole lot of lines, something is wrong here. Please have a look
[08:03:54] Wurgl: can you paste the error?
[08:04:11] Syntax error in python
[08:04:19] !log wikistats deploy latest code for T262148 again
[08:04:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikistats/SAL
[08:04:27] could not find expected ':'
[08:04:35] in "", line 33, column 1:
[08:04:52] Just the final two lines
[08:04:59] Wurgl: maybe use https://paste.toolforge.org/
[08:05:35] which tool?
[08:06:13] https://paste.toolforge.org/view/5181dee0
[08:07:03] that may be a problem in toolforge_weld
[08:07:15] try now?
[08:07:24] !log wikistats drop referata table after dumping it
[08:07:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikistats/SAL
[08:07:47] thx
[08:07:57] What was the reason?
[08:08:54] T344289
[08:08:55] T344289: Corrupt $HOME/.kube/config preventing use of Kubernetes for wikiquantos, wikiroupas, and possibly more tools - https://phabricator.wikimedia.org/T344289
[08:09:05] !log wikistats drop referata table after dumping it T262148
[08:09:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikistats/SAL
[08:09:07] T262148: remove referata table? - https://phabricator.wikimedia.org/T262148
[08:09:28] so, a bug causing the kubernetes config file to have invalid yaml syntax
[08:10:10] It was not my fault?
[08:10:15] no
[08:10:23] +1
[08:14:24] python is still a mystery for me …
[08:43:07] !log tools reboot tools-sgebastion-10 due to stuck NFS mounts
[08:43:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:02:28] !log tools restart a bunch of sge nodes due to NFS lockups
[09:02:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:41:11] !log wikistats drop wikia where http = 404 T215534
[09:41:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikistats/SAL
[09:41:15] T215534: wikistats does not work for wikia sites - https://phabricator.wikimedia.org/T215534
[10:24:17] Hi :)
[10:24:51] So, Toolforge has been 504 Gateway Time-out for hours, what gives?
[10:27:16] JonathanG: hi, which tool? right now we're not aware of any toolforge-wide issues, but some individual tools seem to have got stuck due to a file system hiccup
[10:27:34] mix'n'match
[10:27:55] https://mix-n-match.toolforge.org/#/
[10:29:07] !log tools.mix-n-match `webservice restart`, reports of it being stuck
[10:29:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.mix-n-match/SAL
[10:30:29] that seems to have improved it
[10:30:47] yeah, that did the trick
[10:30:53] complaining sometimes helps :)
[10:31:00] Thank you very much!
[10:37:04] You have a great day!
[14:24:30] the iabot tool is 504ing after just 3 seconds for some reason. I can't see any obvious reasons why. Can someone help me out here? I did a web service restart yesterday for the same reason, but it's back again this morning.
[14:34:41] possibly related: T346126, T346141
[14:34:42] T346141: Global contribution tools down, again. - https://phabricator.wikimedia.org/T346141
[14:34:42] T346126: Some of my tools (eg wikidata-todo) just start throwing 504 errors - https://phabricator.wikimedia.org/T346126
[14:48:26] Hi! I'm currently having some issues with SSH-ing to Toolforge as well as with availability of https://vector-dark.toolforge.org/ (both are DNS errors). Is it a known issue?
[14:49:27] Msz2001: we're not aware of any DNS issues atm, although there is some DNS reshuffling going on (cc arturo)
[14:53:06] I'm asking because it was reported to me almost 9 hours ago that the tool is unavailable, and the problem still persists. It may be an issue with DNS replication then (`dig`ging toolforge.org succeeds only if the DNS server is explicitly set to 8.8.8.8; my default one may not have caught the update yet).
[14:53:20] taavi: There are 504 reports in the tasks I mentioned above. Given the wider spread, it seems like an infra issue (as opposed to the tools themselves).
[14:59:19] JJMC89: there was a short nfs issue this morning which caused some tools to get stuck and need a manual restart. but that's not a DNS issue, so Msz2001 has something different
[15:21:49] JJMC89: I'm going to assume it is. I can't find any reason why my tool would be doing this. Even simple txt files are 504ing.
[15:25:06] !log tools rebooting tools-sgeweblight-10-26.tools.eqiad1.wikimedia.cloud, oom
[15:25:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:34:26] hi folks. I am trying to debug an issue with Wikimedia DNS SERVFAILing for queries to wmcloud.org, toolforge.org
[15:34:43] still trying to figure out what changed (since on WDNS we haven't done any recent changes)
[15:34:55] have there been some changes on the auth side of things here?
[15:46:37] there have been changes. Can you tell me a specific test case to try?
[15:46:49] arturo, topranks ^^
[15:47:48] (here only for 10 minutes more)
[15:48:52] sukhe: can you tell me what to try and where?
[15:48:59] you can log in to a doh* host, say doh1001: dig @127.0.0.1 wmfcloud.org
[15:49:09] or, if you have knot-dnsutils installed
[15:49:22] kdig +tls @wikimedia-dns.org www.toolforge.org A
[15:50:35] andrewbogott: changes have been on the ACL side I am guessing, maybe homer?
[15:50:55] sukhe: please open a ticket and I will investigate tomorrow
[15:51:00] arturo: sure, thanks
[15:51:13] I will also do some debugging later and then file a proper taks
[15:51:15] *task
[15:52:15] basically it seems like anything to ns[01].openstack.eqiad1.wikimediacloud.org is failing
[15:52:26] from Wikimedia DNS, seems reachable from other places
[15:52:32] (other DNS recursors)
[15:53:52] to both of them?
[15:54:29] sukhe: also note that wmfcloud.org (note the f) isn't expected to return anything, the domain you want is wmcloud.org
[15:55:53] `taavi@doh1001 ~ $ dig dev.toolforge.org @ns0.openstack.eqiad1.wikimediacloud.org` seems to work fine, same with ns1
[15:59:31] that doesn't go through WDNS though, it goes through the anycasted DNS resolver
[16:00:20] unless I'm mistaken that tests how WDNS would connect to the wmcs auth servers
[16:03:00] no, WDNS has its own pdns-rec instance running that does a full recursion. if you just do dig something, that goes through 10.3.0.1
[16:04:10] I think other than that, you can just run this locally
[16:05:43] did we change something, because now it works :)
[16:07:27] 12:07:21 [sukhe@azadi ~] kdig +tls @wikimedia-dns.org toolforge.org A +short
[16:07:30] 185.15.56.11
[16:07:31] was failing before
[16:09:29] i believe ns0 was down for a bit.. but hopefully it would have retried on ns1?
[16:09:44] taavi: yeah...
[16:10:05] it should have. and maybe other DNS recursors working could have been just them having it cached
[16:10:11] anywhere I can read up on the ns0 thing?
[16:11:05] T346042
[16:11:06] T346042: cloudservices1005: move to new setup - https://phabricator.wikimedia.org/T346042
[16:11:37] thanks!
[17:15:01] taavi: any other relevant changes you might recall?
[17:15:26] not asking in the way that you did them but because you linked to the above
[17:15:46] I just got a failure on 8.8.8.8 as well, so yeah, it's the auth servers I am guessing
[17:16:32] sukhe: ns1 is now routed via cloudsw's and bgp, instead of being a simple extra IP on a public vlan
[17:18:34] hmm
[17:20:41] ;; From 8.8.8.8@53(UDP) in 3049.5 ms
[17:20:45] that's high as it is
[17:55:39] i'm getting DNS problems this morning for https://outreachdashboard.wmflabs.org/
[17:56:11] `ERR_NAME_NOT_RESOLVED` but the server is running and the proxy is unchanged
[17:56:17] ragesoss: heya. those are being investigated
[17:56:45] thanks
[17:57:19] !status Some DNS issues
[17:59:01] !log admin "designate-manage pool update' on cloudservices1005 to remove cloudservices1004 from the pool
[17:59:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:59:11] !log admin mysql:root@localhost [pdns]> update domains set master='185.15.56.163:5354 208.80.154.11:5354';
[17:59:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:59:17] (I also can't reach that server via SSH: "ssh: Could not resolve hostname bastion.wmcloud.org: Name or service not known")
[18:02:26] taavi, ragesoss, any better now?
[18:02:58] no
[18:03:17] web and ssh both are still out for me.
[18:03:18] I suspect the issue ragesoss is seeing is a separate one, which is being investigated on -tech
[18:03:37] ok
[18:03:50] and dns likes caching which means many issues take a while to appear and disappear :/
[18:06:42] !log admin update domains set master='185.15.56.163:5354 208.80.154.11:5354 10.64.151.4:5354'; on cloudservices1005 + cloudservices1006
[18:06:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[18:10:15] !status DNS issues (T346177)
[18:10:16] T346177: Certain systems failing to resolve DNS entries under toolforge.org - https://phabricator.wikimedia.org/T346177
[18:34:24] it's fixed for me now.
[20:04:45] !log wikistats deploying to drop lxde
[20:04:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikistats/SAL
[20:08:16] !log wikistats START: maintenance, expect outages
[20:08:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikistats/SAL
[20:08:57] that dropped the database size by 50%
[20:12:05] !log wikistats END maintenance; kernel upgrade and database optimised
[20:12:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikistats/SAL
[20:18:03] !log wikistats final update today (to update versions list)
[20:18:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikistats/SAL