[10:10:55] lunch
[10:27:26] lunch 2
[12:53:11] dcausse: I ran the Hive query regarding deduplication and refined it to compare file_size, too. As a result, there are 2014 files in en wiki and commons that match in title and file_size.
[13:00:16] pfischer: out of curiosity, why would you be interested in file_size being the same?
[13:12:40] Well, I wanted to narrow the criteria for a duplicate. An identical file size + title would be a better indicator of one and the same file in two different places. By just looking at the file name, it might be two completely different things that accidentally share the same name.
[13:15:36] pfischer: the current index time behavior that we'd like to remove only takes the page title into consideration
[14:12:56] I know and I’m trying to challenge this approach. Wouldn’t we risk hiding valid search results by just looking at the title?
[14:19:29] possibly, but I'm not sure how we can force MW to go to commons if the same title is locally available on the wiki
[15:00:50] dcausse: Wednesday meeting is starting: https://meet.google.com/eki-rafx-cxi
[15:02:19] oops
[16:12:51] dr0ptp4kt: you froze
[16:27:22] dinner
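
The difference between the two duplicate criteria discussed above (title only vs. title + file_size) can be sketched roughly as follows. This is only an illustration, not the actual Hive query: the `FileRecord` shape, the `find_duplicates` helper, and the sample rows are all hypothetical.

```python
from collections import namedtuple

# Hypothetical record shape; the field names are assumptions,
# not the schema used by the real Hive query.
FileRecord = namedtuple("FileRecord", ["wiki", "title", "file_size"])


def find_duplicates(local_files, commons_files, use_size=True):
    """Return local records that have a match on commons under the chosen key."""
    def key(rec):
        # Title-only key mirrors the current index-time behavior;
        # adding file_size narrows the match.
        return (rec.title, rec.file_size) if use_size else (rec.title,)

    commons_keys = {key(rec) for rec in commons_files}
    return [rec for rec in local_files if key(rec) in commons_keys]


# Made-up sample data, for illustration only.
local = [
    FileRecord("enwiki", "Logo.png", 1200),
    FileRecord("enwiki", "Map.svg", 5000),
]
commons = [
    FileRecord("commonswiki", "Logo.png", 1200),
    FileRecord("commonswiki", "Map.svg", 7300),
]

by_title = find_duplicates(local, commons, use_size=False)
by_title_and_size = find_duplicates(local, commons, use_size=True)
```

With this sample data, the title-only key flags both local files, while the title + file_size key flags only `Logo.png`; `Map.svg` shares a name with a commons file but has a different size, so it could be a different file that would be hidden by title-only deduplication.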