[00:01:51] Danny_B: maybe you can write the urls to filter out already reversed? [00:05:49] I think it would be worth to install on cloud a SQL function that could reverse these back [00:06:25] Platonides: if you can figure out how to write it, I'm game to figure out how to deploy it. [00:07:15] I think we can cram it into the maintain-views script for the replicas [00:08:15] I've never done it, but I think the database supports functions coded in C [00:09:16] * Platonides notes the server is 10.4.22-MariaDB [00:09:55] a loadable function is trickier magic than I was thinking about. [00:10:28] I'm not sure it's possible just with SQL functions [00:11:58] I've done much trickier things in PL/SQL, but I haven't written super complicated mariadb stored procs before. [00:12:28] my PL/SQL is 10 years out of use too so would take a lot of staring into the void to recall :) [00:13:38] hmm, it is probably possible to do it in PL/SQL [00:13:45] albeit I don't know about efficiency [00:14:36] it would have been easier if we had reversed the domain pieces as well, not only their order: https://mariadb.com/kb/en/reverse/ [00:19:08] Platonides: exactly [00:19:30] at least [00:20:04] though i kinda doubt that it's even worth to do the reversing (pieces or chars) at all [00:22:12] chars reversing would actually allow easier seaching for wildcard even within the chunk [00:23:13] *x.com - i want all domains ending with x in tld com (not that it would be useful example) [00:23:44] however still chr reversing both on php and sql side is easier than chunks [00:24:14] you might try convincing Amir1 to change the format of the new el_to_domain_index :P [00:25:13] though to comply with current functionality when we can prepend protocol, it would require to be el_id, el_from, el_protocol, el_domain_reversed, el_path [00:26:47] not sure why protocol was left in el_domain_index [00:27:41] maybe for allowing to search old http links? [00:28:10] perhaps to support all kind of protocols [00:29:49] Platonides: you seemed to think it was needed in the past -- https://phabricator.wikimedia.org/T312666#8147009 [00:31:39] the optimization of index could have been done by going towards NF [00:32:41] bd808: that only was about storing the protocol as it was on the page [00:33:04] not regarding if it should be in the same column as el_domain or a different one [00:33:20] *nod* [00:35:50] Amir1 seemed to plan a further split into a table with external link domains and another with the paths [00:36:34] the best way would be e_id, el_from, el_protocol_id, el_domain_id, el_path and domain_id, domain_name, domain_reversed and protocol_id, protocol_name [00:36:37] a full reversing could be done there [00:36:51] that would allow to search by protocol only too [00:38:09] protocol could be an ENUM(), removing the need for a protocol table [00:38:25] but wgUrlProtocols can get new protocols in LocalSettings [00:38:48] anything exotic added there would require an ALTER TABLE [00:39:51] I opened T344877, btw [00:39:51] T344877: SQL function to recover the normal hostname, to install on wmfcloud db replicas - https://phabricator.wikimedia.org/T344877 [00:47:58] btw searching by path can be also useful ie to find cloned sites [00:48:24] * Danny_B hopes Amir1 will read the backlog and take it into consideration [01:08:57] Platonides: re the enum of protocol name and possible altering table - if the enum contained all currently registered/known protocols, the need for possible alter with new protocol would be highly rare. enum can contain all while $wg... will narrow them to subset for the given wiki [01:44:49] I'm sure some people would "need" to add some made-up protocols :) [01:45:05] we might decide they're on their own, though [04:32:11] did the train get pushed to group 2 then rolled back? saw a new patch appear then disappear [06:02:22] train should be wmf.22 on group 2 so I don't think so [06:03:46] https://phabricator.wikimedia.org/T343725 is the tracking task for the deployment [06:13:30] correct per https://versions.toolforge.org/ [10:48:55] I have a question about our cache setup. I assumed the local server object cache (APCu) gets wiped on each deployment / php-fpm restart. Is that true? [12:18:13] is there any comfortable way how to get the list of interlanguage prefixes? (preferably raw one line one code) [12:46:11] I'd start from https://www.mediawiki.org/wiki/Special:ApiSandbox#action=sitematrix&format=json&formatversion=2