[00:14:50] ebernhardson: so I'm taking a look at https://phabricator.wikimedia.org/T323066 and could use some help understanding the turnilo data
[00:15:30] I'm guessing from the name `webrequest_sampled_128` that the turnilo data in the ticket is a sampling of 1 out of every 128 web requests, but I'm not sure
[00:15:59] There is some decent documentation here that I'm looking through: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest#Pipeline
[08:21:47] ryankemper: yes, `webrequest_sampled_128` samples 1 out of every 128 requests, so if you want the actual number of requests you need to multiply by 128
[08:56:46] inflatador: I think cloudelastic needs a rolling restart after you merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/859094
[08:58:42] hi all
[08:58:48] following dcausse's suggestions about my encoding issue, yesterday I downloaded the latest available Wikidata dump again to start over from the munge process, but it seems the issue is in the dump itself...
[08:58:57] zgrep � latest-all.ttl.gz
[08:59:04] pr:P1476 "���s�{���v�l��"@ja .
[08:59:04] ouch
[08:59:04] pq:P1810 "Fi� allo Scilliar" .
[08:59:04] pr:P1476 "���s�{���v�l��"@ja .
[08:59:18] that's worse than I thought then...
[08:59:23] looking
[09:03:01] madbob_: the dumps seem fine to me, do you have a link?
[09:03:28] https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz
[09:05:09] "curl -s https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz | zcat | grep pr:P1476 | head -n 100" shows proper strings to me (e.g. pr:P1476 "Nationale feestdagen in België"@nl)
[09:05:41] might it be an issue with displaying the strings in your terminal, perhaps?
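The question raised above — corrupt bytes vs. a literal U+FFFD replacement character already stored in the data vs. a terminal display problem — can be told apart programmatically. A minimal sketch (hypothetical code, not from the chat; the function name `classify` is invented for illustration):

```python
def classify(line_bytes: bytes) -> str:
    """Decide whether a raw line is corrupt bytes, contains a literal
    U+FFFD replacement character, or is clean UTF-8."""
    try:
        text = line_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return "invalid-utf8"       # truly corrupt bytes in the file
    if "\ufffd" in text:
        return "replacement-char"   # U+FFFD is stored in the source data
    return "clean"

assert classify("Nationale feestdagen in België".encode("utf-8")) == "clean"
# b"\xef\xbf\xbd" is the UTF-8 encoding of U+FFFD itself:
assert classify(b"Fi\xef\xbf\xbd allo Scilliar") == "replacement-char"
assert classify(b"Fi\xff allo Scilliar") == "invalid-utf8"
```

As the later messages confirm, the Scilliar case is the middle category: the replacement character is literally present in the Wikidata item, so the dump and the terminal are both faithful.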
[09:05:58] zgrep pr:P1476 latest-all.ttl.gz | grep Nationale
[09:06:06] pr:P1476 "Nationale feestdagen in België"@nl ;
[09:06:11] this looks right for me too
[09:06:17] not all strings are broken
[09:06:32] try grepping "Scilliar"
[09:07:36] looking, but if it's deep inside the file it might take a while
[09:08:57] zgrep Scilliar latest-all.ttl.gz -n
[09:09:04] 30359417: pq:P1810 "Fi� allo Scilliar" .
[09:10:57] madbob_: it's broken in Wikidata itself: https://www.wikidata.org/wiki/Q381366 (Fi� allo Scilliar)
[09:11:09] data quality issue
[09:11:17] it's not a bug here
[09:13:08] actually the dataset it comes from has the same data issue: https://spelunker.whosonfirst.org/id/101792261/
[09:13:43] I guess the bot that replicated this data simply replicated the same mistake
[09:14:29] I think that grepping � will yield many false positives
[09:16:09] oops... ok, sorry for the false alarm :-\
[09:16:26] no worries!
[09:17:04] I'm trying to grep some strings known to be broken in my latest import, in both the dump and the munged files, to double-check them before loading the data into Blazegraph
[09:17:43] madbob_: this seems like a good approach
[12:11:50] lunch
[13:21:32] Hi all! I'm looking at a 1.37 -> 1.38 upgrade for a wiki, and seeing an error in jobs for Elastic things: `[CirrusSearch] Exception thrown while running DataSender::sendData in cluster default: No enabled connection`
[13:22:06] any hints on what might be up / what I should be looking at? Generally speaking I haven't changed any Elastic config, the same Elastic version is running, I just updated the extension etc.
[13:24:16] hmm, hold that thought, maybe I spotted my issue...
[15:54:10] will not make Weds mtg, dealing with car issues
[15:59:29] I'll be a bit late to the Wednesday meeting. My current meeting will be running late and I need a quick break
[17:24:33] dinner
[17:55:42] ryankemper thanks for the cloudelastic restart.
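The double-checking approach described above — scan the dump and the munged files for known-broken strings before loading into Blazegraph — can be scripted instead of run as ad-hoc `zgrep` calls. A sketch under assumptions (the helper `find_replacement_chars` is hypothetical, not from the chat), mirroring `zgrep Scilliar latest-all.ttl.gz -n`:

```python
import gzip

def find_replacement_chars(path, limit=10):
    """Report line numbers in a gzipped TTL dump whose text contains
    U+FFFD, up to `limit` hits. With errors="replace", bytes that fail
    to decode as UTF-8 also surface as U+FFFD, so this catches both
    stored replacement characters and genuinely corrupt bytes."""
    hits = []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if "\ufffd" in line:
                hits.append(lineno)
                if len(hits) >= limit:
                    break
    return hits
```

As noted in the chat, hits are often false positives (data-quality issues upstream in Wikidata rather than dump or munge bugs), so the line numbers are starting points for inspection, not proof of corruption.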
[17:55:42] You might check relforge too if you haven't already
[17:56:56] my car's fixed, but I need to pack some stuff. Will see you after the break!
[18:02:53] relforge is good
[18:03:04] looks like it was just 3 hosts in cloudelastic
[18:05:00] (scanned via `sudo -E cumin -b 6 A:cloudelastic 'ps auxf | grep -v grep | grep -q Xmx8G && echo needs restart || echo no restart needed'`)
[19:28:40] ryankemper: I'll be 3 minutes late
[19:28:46] gehel: ack
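The cumin one-liner above decides per host: if any running process (ignoring the grep itself) still carries the old `Xmx8G` heap flag on its command line, the JVM has not been restarted since the config change. The same logic as a sketch (hypothetical helper, not from the chat; it assumes the merged change moved the heap away from 8G):

```python
def needs_restart(ps_lines, old_flag="Xmx8G"):
    """Given lines of `ps auxf` output, return True if any process
    (other than a grep for the flag) still runs with the old heap
    flag, i.e. the host still needs a rolling restart."""
    return any(old_flag in line and "grep" not in line
               for line in ps_lines)

assert needs_restart(["java -Xms8G -Xmx8G org.elasticsearch.bootstrap.Elasticsearch"]) is True
assert needs_restart(["java -Xms16G -Xmx16G org.elasticsearch.bootstrap.Elasticsearch"]) is False
assert needs_restart(["root  grep Xmx8G"]) is False  # the grep itself doesn't count
```

The `grep -v grep` in the shell version serves the same purpose as the `"grep" not in line` check: without it, the scanning command would always match its own process and report every host as needing a restart.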