[07:30:32] hey folks, I had a report from j.oe about problems reaching Wikipedia from a Telecom Italia / Seabone DSL link where he is
[07:30:53] I'm not finding a smoking gun in our logging or our routing back based on the info he passed on
[07:31:15] I'll follow up with him when he's back online but if anyone else is having connectivity problems please ping me
[07:31:18] thanks :)
[08:07:11] topranks: o/ from Fastweb in Italy all good afaics
[08:07:54] elukey: ok thanks for the info!
[13:45:46] I'm seeking a reviewer for an easy one: https://gerrit.wikimedia.org/r/c/operations/dns/+/851632
[13:47:45] cheers sukhe !
[13:47:57] hth!
[13:49:26] it does, thank you
[15:06:42] Looking at failures in pristine-tar, I think I've concluded that tar really needs a -print0 or similar option for -t; I don't think currently there's any set of runes you can pass to "tar -t" that results in a file list you can guarantee to be safe to pass to "tar -T"
[15:08:36] because if you use any of the quoting options to -t then you can't use --verbatim-files-from (and so lose with any paths starting with -), and if you don't quote your paths then you can use --verbatim-files-from (but then lose with any paths with newlines in).
[15:10:35] I think pristine-tar can be coerced into working by having it take quoted paths from tar -t, unquoting them, and storing them in the manifest separated by NUL, and then using tar --null -T
[16:02:30] What am I missing in http://paste.debian.net/1259162/ ? tar complains "tar: -hyphentest/foo.txt: Not found in archive" but that file is in the archive, and indeed tar has successfully extracted it
[16:41:10] Answer is "tar: -hyphentest/foo.txt: Not found in archive" means "you already extracted -hyphentest/". Obviously.
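A minimal sketch of the NUL-separated manifest idea from the tar discussion above. It is not pristine-tar's code: it uses Python's standard `tarfile` module instead of parsing `tar -t` output, and the helper names (`build_archive`, `nul_manifest`, `parse_manifest`) are made up for illustration. It only shows that paths starting with `-` or containing newlines survive a round trip when NUL is the separator.

```python
import io
import tarfile


def build_archive(paths):
    """Create an in-memory tar archive holding one tiny file per path."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path in paths:
            data = b"x"
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def nul_manifest(archive_bytes):
    """Return the archive's member names as NUL-terminated bytes."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        names = tar.getnames()
    # NUL cannot occur inside a path, so it is an unambiguous separator.
    return b"".join(
        name.encode("utf-8", "surrogateescape") + b"\0" for name in names
    )


def parse_manifest(manifest):
    """Split a NUL-terminated manifest back into a list of paths."""
    return [
        entry.decode("utf-8", "surrogateescape")
        for entry in manifest.split(b"\0")
        if entry
    ]


# Hypothetical test paths: a leading hyphen and an embedded newline.
tricky = ["-hyphentest/foo.txt", "new\nline.txt", "plain.txt"]
manifest = nul_manifest(build_archive(tricky))
recovered = parse_manifest(manifest)
```

A manifest written this way is the kind of file the chat proposes handing to `tar --null -T`; whether that covers every pristine-tar case is not something this sketch checks.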
[17:46:50] since this affects things outside of Traffic's scope as well
[17:46:55] sukhe@cumin2002:~$ sudo cumin 'R:Class = Haproxy'
[17:46:55] 123 hosts will be targeted:
[17:46:55] cloudcontrol[2001,2004-2005]-dev.wikimedia.org,cloudcontrol[1005-1007].wikimedia.org,cp[2027-2042].codfw.wmnet,cp[6001-6016].drmrs.wmnet,cp[1075-1090].eqiad.wmnet,cp[5002-5016].eqsin.wmnet,cp[3050-3065].esams.wmnet,cp[4037-4052].ulsfo.wmnet,dbproxy[2001-2004].codfw.wmnet,dbproxy[1012-1021].eqiad.wmnet,thumbor[2003-2006].codfw.wmnet,thumbor[1001-1002,1005-1006].eqiad.wmnet
[17:47:09] looking for a review on a small change: https://gerrit.wikimedia.org/r/c/operations/puppet/+/851689
[17:47:12] thanks!
[17:47:40] we in Traffic are trying to alleviate any and all Puppet issues we are having with reimaging/rebooting, and hence this and other related changes
[17:51:18] sukhe: +1!
[17:51:24] thanks! <3
[17:51:50] oh wait, guess we should remove this comment too "FIXME: Migrate to systemd::tmpfile"
[17:51:53] updating
[18:10:26] sukhe: just got a widespread puppet failures alert for cache_text nodes, I think that patch might be the culprit
[18:11:05] ah
[18:11:08] Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Could not find resource 'File[/run/esitest]' in parameter 'require' (file: /etc/puppet/modules/esitest/manifests/init.pp, line: 35) on node cp3050.esams.wmnet
[18:16:36] oh no
[18:16:49] that's me
[18:16:49] fixing
[18:18:04] so the haproxy patch is fine
[18:18:18] the esitest depends on the file but it should depend on Systemd::Tmpfile instead
[18:18:21] updating
[18:29:20] thanks for letting me know cdanis and bblack! the Puppet failures should be resolving soon
[19:35:38] Anyone know how I obtain etcd credentials for confctl? I'm trying to pool some kubernetes controller nodes, https://wikitech.wikimedia.org/wiki/Conftool#Insufficient_credentials.
[19:41:29] ok, I just ran it on the puppetmasters with sudo, perhaps that was what I was supposed to do?
[19:44:29] jhathaway: yeah, it is, ah, 'convention', that all mutating confctl operations run as root
[19:45:10] ok thanks, this bit confused me, https://wikitech.wikimedia.org/wiki/Conftool#Insufficient_credentials
[19:45:34] where it told me to create an etcdrc file if I didn't have sufficient permissions
[19:46:56] those instructions are not great heh
[19:47:23] I think actually plumbing users down to etcd acls was considered at some point, then abandoned
[19:47:33] I made a small edit
[19:48:19] cdanis: thank you!
[19:49:00] thanks for pointing out rough edges :)
[20:10:32] !log restarting pybal on lvs1020.eqiad.net
[20:10:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:07] !log restarting pybal on lvs1019.eqiad.net
[20:16:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:53:38] jhathaway: sadly I believe the experts are go.dog and Empero.r so EU times
[21:54:10] ok thanks, not sure what the issue is yet, other than that swift is swamped
[21:55:02] I can't offer anything else, my knowledge of Swift is minimal
[21:55:08] jhathaway: see my doc
[21:55:13] ERROR with Account server 10.64.32.117:6002/sda3 re: Trying to HEAD /v1/AUTH_mw: ConnectionTimeout (0.5s) (txn: tx7fc70e4647cb4a01a26e0-00636195a7)
[21:55:18] is what I see in the proxy logs
[21:55:27] given errors seem to be happening only on eqiad, I suggest we should depool eqiad's swift cluster
[21:55:47] sounds reasonable jynus
[21:56:04] please help me find that, will be on wikitech or gerrit old commit
[21:56:30] looking, what doc are you referring to?
[21:57:22] I don't know, that is why I am asking for help
[21:59:53] I am going to mention the issue on status page
[22:00:17] jynus: thanks
[22:00:23] haven't found any docs yet
[22:03:55] jynus: https://docs.google.com/document/d/1eD0kTGLCvl_x5pGNy_7_woFKUJf9-rD3xFzEla71qlc/edit#heading=h.vg6rb6x2eccy
[22:04:19] log filippo@cumin1001 conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
[22:04:31] ok, that helps
[22:05:14] that supposedly depools for reads, based on the incident doc
[22:05:41] https://wikitech.wikimedia.org/wiki/Conftool#Depool_all_nodes_in_a_specific_datacenter
[22:05:47] that points me to that
[22:07:23] sounds worth trying
[22:07:26] I wonder what is the service name for that
[22:07:50] for dnsdisc? or something else?
[22:08:20] cd is eqiad, cluster is swift, what is the service name?
[22:08:25] *dc
[22:09:49] swift-fe maybe
[22:10:00] yeah, looks like it at least in service::catalog
[22:10:06] hmm you cannot depool all the servers there
[22:10:14] Pybal won't let you
[22:10:18] I see
[22:10:22] I just realized that
[22:10:32] You need to depool eqiad from the dnsdisc for swift
[22:10:37] shouldn't this work? : confctl --object-type discovery select 'dnsdisc=swift,name=codfw' set/pooled=false
[22:10:48] s/codfw/eqiad
[22:11:03] https://sal.toolforge.org/log/6qQcGYAB6FQ6iqKi-s-e
[22:11:16] We also have swift-ro and swift-rw FWIW
[22:11:22] how can I query the current state?
[22:11:53] https://config-master.wikimedia.org/pybal/eqiad/swift
[22:11:54] get worked
[22:12:00] both dcs are pooled
[22:12:13] https://config-master.wikimedia.org/discovery/discovery-basic.yaml
[22:12:21] confctl --object-type discovery select 'dnsdisc=swift,name=eqiad' set/pooled=no
[22:12:25] ^ok?
[22:12:33] +1
[22:12:36] or false?
[22:12:48] ah you are right, false
[22:13:01] well wait not sure
[22:13:11] vgutierrez ok to disagree now
[22:13:33] mvernon did set/pooled=no, so I assume that worked
[22:13:52] false
[22:14:01] Per https://wikitech.wikimedia.org/wiki/DNS/Discovery
[22:15:48] jynus: I say go with false
[22:15:54] and give it a try
[22:16:04] conversation is happening on other channel
[22:16:27] roger roger, thanks
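For reference, a small sketch that assembles the discovery commands quoted in the log as argument lists, so the selector never has to be shell-quoted by hand. The helper name `discovery_cmd` is hypothetical. The `get` action and the `set/pooled=false` value are taken from the conversation above ("get worked", "go with false") and are not independently verified here. The sketch only builds and prints the command; it does not run confctl.

```python
import shlex


def discovery_cmd(dnsdisc, datacenter, action="get"):
    """Build a confctl argv for a discovery object, as quoted in the log."""
    selector = f"dnsdisc={dnsdisc},name={datacenter}"
    return ["confctl", "--object-type", "discovery", "select", selector, action]


# Query the current state first, then depool eqiad for swift.
check = discovery_cmd("swift", "eqiad")
depool = discovery_cmd("swift", "eqiad", "set/pooled=false")

for argv in (check, depool):
    # shlex.join renders the argv as a copy-pasteable shell line.
    print(shlex.join(argv))
```

Per the earlier exchange in this log, mutating operations like the second command are conventionally run as root.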