[02:46:14] TimStarling: I made a series of small rdbms patches with parts of https://gerrit.wikimedia.org/r/c/mediawiki/core/+/771749 [02:46:29] first is https://gerrit.wikimedia.org/r/c/mediawiki/core/+/771754/1 [02:47:12] ok [02:48:15] did you isolate T303887 or have a theory on it? [02:48:15] T303887: Wikimedia\Rdbms\DBTransactionError: Transaction round stage must be 'cursory' (not 'within-rollback-session') - https://phabricator.wikimedia.org/T303887 [02:48:53] I don't want to just regress it when we get to the part of the sequence that caused it [02:56:44] TimStarling: those are all of the "side changes", which should be lower risk [02:57:04] right [02:57:06] TimStarling: I thought T303887 was fixed [02:57:07] T303887: Wikimedia\Rdbms\DBTransactionError: Transaction round stage must be 'cursory' (not 'within-rollback-session') - https://phabricator.wikimedia.org/T303887 [02:57:13] I never deployed the backport though [02:58:06] (it would have needed your patch to pass CI) [02:58:59] TimStarling: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/771475 would also help with errors like the LogicException one [03:05:14] yesterday in "PET Technical Discussion" with Amir I suggested that some of these errors could be resolved by just removing assertions [03:05:58] for example the whole owner ID concept could go away -- I checked the original commit and there was no linked task, so it didn't seem like it was fixing anything [03:07:12] Amir seemed enthusiastic about that, and about anything that will reduce the code size and complexity of libs/rdbms [03:18:32] TimStarling: the owner ID stuff is to avoid misuse. It's really annoying to find and clean up random callers doing things that only the LB/LBF should do. [03:18:46] * AaronSchulz is a bit jaded in that regard [03:19:18] yeah but is the cure worse than the disease? [03:20:18] I don't think one bug in first method in Database would justify removing the concept from LB/LBF. That seems a bit too reactive. [03:21:34] the bugs in exception handling in unit tests and empty unit tests made it take way longer to debug. Those are general problems that have patches to the test code (yours was merged already). [03:22:10] The actual id mismatch bug was easy to fix. I spent forever on the LogicException one though. [03:23:08] Not being able to see all the failing tests before that exception is annoying. [03:23:23] you can see them in the debug log [03:24:32] yes, there are PhpUnit entries. I spent a lot of time grepping through that 8.3mb log file) [03:26:05] the parsertest ones looked suspect, which made me suspect something about the DB [03:27:44] (also, run() setup errors are annoying for local testing since you have to enable a debug log file and tail/grep for stuff that could have easily been shown in the output) [03:28:36] it would have been handy to have had more notes from you, we have a big timezone difference, so we can tag team to some extent [03:30:03] I spent maybe 5 hours isolating that bug, including a lot of time doing the debug log analysis reported at https://phabricator.wikimedia.org/T292239#7784221 [03:30:37] it seemed like people had theories, but all I saw was a few unexplained commits [03:37:03] I was debugging and making patches all day, so we must have had a few hours of time overlap. Maybe we could have had a google hangout too, talking as soon we see something meaningful.