[00:36:45] That should be amended to "a human who is competent in the language the article is in". For smaller langs we're already seeing corpora collected by people who don't know the lang, where a good chunk of the texts were written by people who don't actually know the lang either. (re @conrick: The Wikimedia bottom line so far is that no content may be saved explicitly unless it has been checked by a human)
[00:37:20] Think about what garbage would have come of using the Scots WP as a data set, for instance...
[05:32:42] Not all projects have a policy about that, though (and for bot-created content, only the code and samples are checked a priori, not every article)
[08:24:52] Real sources from the web. It sometimes uses Reddit or forums as sources, but if you explicitly tell it to use particular sources, or more reliable ones, it tends to comply, in my experience. (re @Mārtiņš: Are these real sources or generated ones that look like real ones?)