[00:03:44] if I give that +2, it will deploy itself to beta and hopefully fix the fatal error, right Reedy? [00:03:56] yup [00:04:18] but you should also pull/deploy in prod too otherwise people will get grouchy [00:04:35] yeah but best to test in beta [00:08:34] tested "mwscript purgeMessageBlobStore.php", that works now [00:10:15] looks like scap worked too [00:11:29] good, I'll deploy it to production now [00:13:32] thanks [03:12:52] TimStarling: RE: Defines.php, was it needed because the value of a constant changed (e.g. regular operation), or because of a hot-fix to define an undefined constant used in wmf-config, but then the hotfix not working? [03:13:52] ah, I didn't notice the added 0 [03:13:54] https://gerrit.wikimedia.org/r/c/mediawiki/core/+/684142/13/includes/Defines.php [03:15:05] I guess that's another reason to consider removing the cache. I was close to doing that at some point, but got distracted with other projects. [03:15:35] at one point it looked as if the file read and json parse was more expensive than the call to wgConf [03:16:23] having a third mtime, might tip that more consistently in the direction of not worth it. [05:07:55] the value of a constant changed [19:16:09] Pchelolo: is there a way to reproduce it locally? [19:16:20] Amir1: I didn't manage to [19:16:39] Pchelolo: yeah, it's hard usually. Let me give it a try. [19:16:58] it only happens in prod like once an hour now, so you need to be VERY luck [19:17:11] I know a little bit about MySQL, might manage to find the underlying problem [19:17:36] nah, I wouldn't try to reproduce it, this kind of issues are really hard [19:17:46] *to reproduce [19:18:50] what's very suprising is that if you look at https://phabricator.wikimedia.org/T286521#7213294 - it seems to deadlock between different user_ids and different preferences names [19:20:32] lovely [19:27:23] Pchelolo: one thing is that row level locking doesn't practically exist, it's usually bigger than one row and mostly mitigated through MVCC, one classic case is that when you read a range for locking, rows in between that doesn't exist also get locked and can't be inserted [19:27:32] I triple check and see what can be the reason [19:28:09] So, what I was doing before with this is removing locking on selects for user_properties [19:29:04] and I'm fairly sure there's no more SELECT .. FOR UPDATE, or SELECT .. LOCK IN SHARED MODE anymore. [19:30:03] that's not 100% correct not to lock for update, but we have a plan there to trully only insert properties that were modified. I'm working on it, it's going to take a bit [19:32:12] I think I know what's happening, let me dig a bit deeper [19:38:39] Pchelolo: so the actual deadlock is happening on user table, it's basically first user locking user id on the table and waiting for the second thread to insert the options, the second thread is waiting for the first thread on user_options [19:39:03] the important bit is that all of work in one request happens during one massive transaction [19:39:52] This section is very useful and gives a new meaning to the word "Basic" [19:39:52] https://www.mediawiki.org/wiki/Database_transactions#Basic_transaction_usage [19:40:03] I have to know how did you arrive to this conclusion :) [19:40:50] compare two most recent cases in https://logstash.wikimedia.org/goto/033a8bfb8015153e1218823e8f1504c6 [19:41:07] User::saveSettings() [19:41:27] first creates the user then saves options [19:42:06] moving this to a deferred update would fix it. [19:45:06] would logically moving the saving user options to a deferred update be okay? [19:45:24] Amir1: I think so. yes. [19:45:43] it would take it out of the main transaction [19:45:59] let me make a patch then [19:46:24] right now we're calling User::saveSettings to save options sometimes, but the goal is to eventually only call UserOptionsManager::saveOptions when you update the preferences. [19:47:02] can't do it yet, UserOptionsManager::saveOptions needs to update user_touched. [19:48:08] yeah that sounds fun [19:48:53] Pchelolo: if you ask me, make UserOptionsManager::saveOptions changing user_touched a job or defered update (preferably a job) [19:49:58] yup, either onTransactionCommitOrIdle, or deferred update, or a job, donno yet, depending on how consistent we need to be there. [19:51:10] yeah, I don't think that's a critical feature [20:04:20] Pchelolo: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/704601/ [20:05:41] oki, will have a look in a min. THANK YOU! [22:17:28] Pchelolo: awesome comment, thanks. That makes much more sense [22:18:19] hehe, it's a theory, but I'm not very sure about it. [22:21:55] it's really hard to test [22:28:28] Pchelolo: I suggest we patch only wmf.14 for it and if it works, run it on master [22:28:44] but not right now, I need to leave, tomorrow! [22:29:03] thank you for your help [23:44:26] TimStarling: if you got some spare time today, could you have a look at my theory at T286521#7213689 and my two proposals at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/704642 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/704639 ? [23:44:26] T286521: Deadlock found when trying to get lock (UserOptionsManager::saveOptionsQuery) - https://phabricator.wikimedia.org/T286521 [23:50:10] moving all saving of user options to onTransactionPreCommitOrIdle seemed like quite an aggressive step [23:50:16] and self-merged? [23:50:36] oh right, actually approved after half an hour [23:50:53] no, approved after 13 minutes [23:54:40] right, and Timo pointing out problems with it within the hour [23:59:27] reverting my patch would be fine to try and certainly safer than totally changing the order of operations after 13 minutes of review