[10:08:23] mysql grants make me sad [11:13:12] Hello. I'm planning to do T334055 today, at long last. I've got the outline of a plan here if you'd like to review: https://phabricator.wikimedia.org/T334055#9027251 [11:13:12] T334055: Replace db1108 with db1208 - https://phabricator.wikimedia.org/T334055 [11:17:09] hello :) I am wondering how many bits per characters are used in the wiki `user.user_name` field :) [11:17:15] Intuitively I would expect 8 bits per character given the CI MariaDB spawns with `charachter_set_server=binary` [11:17:43] based on that assumption the 9 characters long username `User-0.12` would 9 chars × 8 bits / chars = 72 bits [11:18:05] the reason I ask is that when doing an INSERT innodb established a 72 bits lock on the field [11:18:24] hashar: it's complicated- utf-8 lets you use up to 4 bytes [11:18:35] but then there is compression + length storage [11:18:55] and as I understand it that causes two simultanenous INSERT statement sharing the same 72 bits prefix of `User-0.12` to share the same lock leading to a deadlock [11:20:04] you are mixing too many things on the same question. My counter question would be- what do you need from the start? [11:20:17] hehe sorry :-( [11:20:47] so the problem statement is we are trying to run Selenium tests concurrently, each test creates a one off wiki user [11:21:16] and we end up with some user creations being rejected due to a deadlock when inserting the user in the database, which cause the user creation to fail and thus a test to fail [11:21:58] ok, that make sense- do you have real deadlocks or just insert timeouts due to locks? [11:22:00] that got filed like 5 years ago and start hitting us as we try to make tests to run concurrently. The best analysis I have found is from 2018 at https://phabricator.wikimedia.org/T199393#4420174 [11:22:30] which shows a deadlock report written when using `mysqld --innodb-print-all-deadlocks` [11:23:01] ok, the select lock is important :-D [11:23:15] otherwise it would be quite difficult to get a deadlock [11:24:36] Tim mentioned changing the sql statement used to insert the field. My workaround is to change the randomly generated usernames to give them more variance over the first 72 bits and hopefully reduce the chance of sharing the same gap :) [11:25:25] eg instead of having the shared `User-0.` I can at least strip the `0.` which gives me two less characters :] [11:25:45] I don't get that last part. What I would do is avoid gap locking by reducing the lock constrains [11:26:04] yeah that as well I guess [11:26:23] I don't think how large the index is will change how much gap locking happens [11:26:36] it may, but I wouldn't trust that at all [11:27:55] so ROW binlog format tx_isolation=READ COMMITTED [11:28:28] the problem is you are having an issue while testing but you are changing the testing environment because you want to test in parallel [11:29:12] and I don't have any immediate solution to that- other than having 100 precreated dbs [11:34:20] jynus: what would 100 created dbs solve? [11:34:26] one db per test? [11:36:03] yeah, allowing concurrency but suposing isolation between tests in data [11:36:45] if they are precreated shouldn't be *that* bad, we have the tooling to setup that quickly (xtrabackup) but I understand not being an ideal solution, just to be clear [11:37:40] I just don't see any perfect solution, other than suggesting the lowering of consistency level (which also may not be ideal) [11:37:48] that is theorically doable but I am afraid that will be a bit messy with mediawiki/selenium/quibble etc :) [11:38:33] so my assumption is that assuming the gap lock is done on the first 72 bits, if I concurrent insert users named `A` and `B` they don't share the same first 72 bits and thus never race for the lock [11:38:59] but usernames sharing the same 9 characters (72 bits) do share the same gap lock leading to the issue [11:47:00] the problem is that is now how it works (bits) [11:47:16] if you make it longer or change the values, the bits will increase accordingly [11:47:35] you have to focus on what it is being locked (the gaps) logically, not how it is implemented [11:47:59] that is why I think you reasoning will not lead to success (but I cannot know for sure) [11:48:34] cause the number of bits used for locking is determinated when creating the lock? [11:49:19] I have an intreview, but please do some testing based on that, we can continue discussing after that [11:49:32] thank you!! have a good interview :) [11:50:09] it is one of those things that I think won't work but I cannot 100% ensure it without testing [11:51:15] yeah I have a test running in CI we shall see :) [12:05:43] ah now I understand your previous message `tx_isolation=READ COMMITTED` after reading https://www.percona.com/blog/innodbs-gap-locks/ :) [12:06:08] looks like the 72 bits is a red hearing and it lock the whole gap between existing records [12:06:09] fun [12:34:33] FWIW, innodb gap locking is usually the biggest reason behind deadlocks and similar issues. If you see a deadlock, it's due to gap locking unless proven otherwise [12:53:06] and I guess I mis understood how gap locking. I assumed it locked the space "<72 bits>*" [12:53:20] when it seems to lock anything between where the record is inserted and the next/previous record