[08:55:12] o/
[08:55:26] Morning
[08:56:05] marostegui: hi, can you optimize flaggedtemplates on ruwiki when you have time? I think it should be smaller than its current size (180GB)
[08:56:43] I also found a hotspot for parsercache that needs attention :D I will get to it when I have some free time
[08:57:07] * Amir1 waves at sobanski
[08:59:54] sure Amir1, I will do it in a bit
[09:00:07] give me 10 mins
[09:00:14] sure!
[09:00:18] Thanks
[09:03:09] Amir1: running it, current size on db1168: 168GB
[09:03:12] Will report once done
[09:03:53] awesome
[09:14:02] jynus: alright, about to start replication on db2118 🤞
[09:14:08] (i took a full copy of /srv/sqldata before doing this)
[09:14:14] cool :-)
[09:14:31] it takes ~40mins to make a copy, which is slow, but a damn sight better than 11h :)
[09:15:07] snapshotting would be nice for things like this
[09:15:11] yeah
[09:15:17] as in, file-system snapshotting
[09:32:29] kormat, still replicating?
[09:32:48] so far, yep
[09:32:55] it's got ~5 days of catching up to do
[09:37:49] Amir1: https://phabricator.wikimedia.org/P17106 pretty good
[09:38:12] marostegui: Can I get a beer?
[09:39:12] Amir1: -rw-rw---- 1 mysql mysql 104G Aug 30 09:38 /srv/sqldata/enwiki/templatelinks.ibd -> That means no
[09:39:31] db2097 lag is me, I will disable the check
[09:39:57] :(
[09:39:59] Amir1: -rw-rw---- 1 mysql mysql 160G Aug 30 09:39 /srv/sqldata/enwiki/pagelinks.ibd -> another no!
[09:40:09] I'm on it :D
[09:40:19] it takes some time
[09:40:35] buzzkiller
[09:40:57] that's marostegui alright
[09:53:43] jynus: failed again.
ok, our method is not correct
[10:33:22] marostegui: i'll write up how we got to where we are as a comment on the task, and then let you know
[10:35:08] I think I know the issue, looking at the binlogs
[10:36:02] The first written event after you restarted replication is "210825 6:02:27", which I believe is the time of the switchover
[10:36:25] so you seem to be right about "server_id" ignoring events generated on itself
[10:36:41] welp
[10:36:51] my suggestion: stop, restore your backup, change the server id, retry
[10:37:12] maybe s/restore/copy/ just in case :-)
[10:37:54] feel free to check my conclusion, db2118-bin.002588 I think was the place you were writing this morning
[10:38:10] and the first meaningful write is from that date
[10:38:40] *I am assuming
[10:39:42] you or manuel, ofc
[10:45:48] https://mariadb.com/kb/en/mysqld-options/#-replicate-same-server-id
[10:45:58] that too
[10:46:01] I'll take a look after lunch
[10:46:19] good news is, with the copy there is room for experimentation :-)
[10:46:40] "Can't be set to 1 if log-slave-updates" :-(
[10:47:09] although as long as it is temporary, anything goes
[10:48:31] jynus: Not sure if we are talking about the same thing, but replication was started with 'db2121-bin.002436' at position 130479946 (I guess this is the logical restore)
[10:48:48] (I don't have all the context of what has been done and when)
[10:48:54] marostegui, yes, I am pointing to where that was written to on the replica
[10:48:58] on the binlog
[10:49:07] to try to understand the duplicate key error
[10:49:11] Yeah, same
[10:49:29] I think logs from dump time to master switch were ignored, that's my theory
[10:49:41] based not on the original binlog you mention
[10:49:46] when was this logical dump taken?
[10:50:01] let me get you the date
[10:50:42] approximately 2021-08-24 00:00:02
[10:51:11] in coords, that would be db2118-bin.002581:110134450
[10:51:25] That's the logical dump that was placed in db2118 and now gave a duplicate entry?
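To make the loop-prevention behaviour discussed above concrete: a replica normally drops, silently and without error, any binlog event whose server_id matches its own, unless replicate-same-server-id is enabled. A minimal hypothetical simulation (made-up server ids and queries, not MariaDB internals):

```python
def apply_events(events, my_server_id, replicate_same_server_id=False):
    """Return the queries a replica would actually apply.

    By default a replica skips any event whose server_id matches its
    own -- the standard replication loop-prevention rule.  This is the
    theory above: db2118, restored with its old server_id, silently
    ignored every event it had originally generated, i.e. everything
    between the dump time and the switchover.
    """
    applied = []
    for server_id, query in events:
        if server_id == my_server_id and not replicate_same_server_id:
            continue  # event originated here: skipped, no error raised
        applied.append(query)
    return applied


# Hypothetical timeline: pre-switchover writes carry the replica's own
# server_id; post-switchover writes carry the new master's.
events = [
    (180355206, "pre-switchover write made on db2118"),
    (180355206, "another pre-switchover write made on db2118"),
    (180367395, "post-switchover write made on db2121"),
]
print(apply_events(events, my_server_id=180355206))
# -> ['post-switchover write made on db2121']
```

With replicate_same_server_id=True all three events would be applied, which is why re-applying the pre-switchover rows then surfaces as a duplicate key error instead of silence.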
[10:51:35] The switchover was the 25th?
[10:51:54] I am trying to reverse-engineer this :)
[10:52:06] yes, S to confirm, coordinate with her please when she's back
[10:52:15] (as I am also going for lunch)
[10:52:21] Sure, no problem
[10:52:52] I remember checking, but maybe because things have been written at least 3 times
[10:53:08] I got confused about when it was written (the original time vs the recovery)
[10:53:21] that is why I prefer someone else to check what is, at the moment, only a theory
[10:53:43] What I don't get is... the dump that was restored (the one that gave the duplicate entry now), was that taken BEFORE the switchover?
[10:53:51] yes
[10:53:55] Interesting
[10:54:34] And you both correlated those coordinates to see where they'd be on the new master?
[10:54:51] so lots of things get confusing since then, as mysql expects not to find the same events twice, as if we had gone back in time
[10:55:32] yeah, but if I'm right, events will get ignored until the switchover, and once there are no more duplicate ids, replication starts for real
[10:56:12] Uff yeah, but I think that's a recipe for issues
[10:56:29] I thought a new dump was generated with the current master coordinates and that was the one restored
[10:56:37] well, of course it does
[10:56:39] and for some reason it was failing
[10:56:43] it was the right master at the time
[10:56:58] as I said many times, backups cannot know about the future
[10:57:16] but the issue is not the coords - those can be found (we did) - what I called problem 1
[10:57:24] No, I mean if the switchover was the 25th and db2118 failed that day once it wasn't the master, I thought the dump that was placed (and now broke) was a new one that was generated ad-hoc
[10:57:40] problem 2 is recovering to the same server id, replicating from events initially generated on itself
[10:58:03] no, that should have the good coords
[10:58:40] and no prob2, because all events will come from the new master
[10:59:02] the issue is we cannot just document "recover
from the most recent backup"
[10:59:18] because what if we need the state from 1 month ago - we need that to work too
[11:00:28] another solution would mean applying the binlog out-of-band, from binlog backups
[11:02:07] please check my guess, I was wrong once, I can be wrong twice :-/
[11:02:34] I am trying to check if I can find where the first row that is now giving a duplicate was inserted
[12:38:39] jynus: can you confirm that https://phabricator.wikimedia.org/T289856#7318581 is what we did?
[12:41:10] broadly yes, but I can clarify your questions later (why it is the next transaction)
[12:41:52] marostegui: ok, that's the process we followed ^
[12:42:17] kormat: cool, I sort of reverse-engineered it and I think I got it, yep, but this is useful
[12:42:27] See my comments on the task
[12:45:11] the thing that worried me, kormat, and maybe you can double-check me, is that I would expect the latest binlogs after the restart to contain data starting from approximately the dump time
[12:45:26] but I think they start from a bit after 6
[12:45:43] but again, someone needs to check that, as it is very confusing
[12:45:49] which restart?
[12:46:48] there was a restart of db2118 for making a backup of the restored data, as documented in the description
[12:50:29] so on the 27th we had the exact same duplicate entry error
[12:50:36] same row
[12:53:30] marostegui: yep
[12:54:11] marostegui: q: if you start replication, and it takes A While before "seconds behind master" is non-zero, is that suspicious?
[12:55:03] (i don't remember exactly how long, but i'd say a couple of minutes)
[12:58:44] marostegui: https://phabricator.wikimedia.org/T289856#7318444 - that doesn't look right
[12:59:05] kormat: what do you mean?
[12:59:10] the first binlog you reference (db2121-bin.002434) is _after_ the second one you reference (db2121-bin.002440)
[13:00:32] -rw-rw---- 1 mysql mysql 1001M Aug 23 14:05 db2121-bin.002434
[13:00:35] -rw-rw---- 1 mysql mysql 1001M Aug 25 20:41 db2121-bin.002440
[13:01:17] oh crap, sorry :)
[13:01:46] i'm really struggling to follow the rest tbh
[13:02:18] Essentially, looking at the binlogs, I don't understand why it didn't break everywhere
[13:02:55] I am scanning the row-based ones, but I am reaching the same conclusion (not finished yet)
[13:07:19] there really should be a query language for searching binlogs
[13:09:16] also, do binlogs really not have datestamps?
[13:14:11] kormat, they do, but depending on the context they could be the timestamp of the original event, not when it was applied
[13:15:24] plus we have a most confusing case here, where it is possible the same event will be recorded up to 3 times in the same binlog set (original, replication 1 and replication 2)
[13:15:53] plus the topology changes on top :-)
[13:40:07] jynus: realised after - i wasn't seeing it as a date because it's YYMMDD with 2-digit years
[13:40:10] which is just Wrong.
[13:40:30] jynus: marostegui: ok, i'm going to restore db2118 to its freshly-restored-from-dumps state
[13:40:36] be glad it is not MM/DD/YY
[13:40:37] yeah, I am guessing some obscure format that saves 2 bytes or something
[13:40:55] did you find something?
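For reference, the YYMMDD-with-2-digit-year header format kormat ran into can be decoded like this. A small illustrative helper, assuming mysqlbinlog's default text output; `parse_binlog_timestamp` is not an existing tool:

```python
from datetime import datetime


def parse_binlog_timestamp(header_line):
    """Parse the leading timestamp of a mysqlbinlog event header.

    mysqlbinlog prints event headers like
        #210825  6:01:48 server id 180355206  end_log_pos 465 ...
    where '210825' is YYMMDD with a 2-digit year -- easy to misread
    as an offset rather than a date, as happened above.
    """
    fields = header_line.lstrip("#").split()
    # fields[0] = '210825' (date), fields[1] = '6:01:48' (time)
    return datetime.strptime(fields[0] + " " + fields[1], "%y%m%d %H:%M:%S")


line = "#210825  6:01:48 server id 180355206  end_log_pos 465"
print(parse_binlog_timestamp(line))
# -> 2021-08-25 06:01:48
```

Note that, as jynus points out above, this timestamp may be the time the event was originally generated, not the time it was applied on the replica.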
[13:41:20] jynus: i want to triple-check the binlog position we're using by comparing transactions to the state of the db,
[13:41:28] ok to me
[13:41:30] and i want to test out the replicate-same-server-id setting
[13:41:36] (or whatever the exact name is)
[13:42:00] I can briefly explain the offset vs gtid thingy
[13:44:33] jynus: yeah, please
[13:44:45] so the idea is that gtid tracks executed transactions
[13:44:46] (i do understand binlog offsets, at least)
[13:44:54] "gtid_executed"
[13:45:23] so if it says it is at gtid X-Y-Z, it means that the transaction marked with that id
[13:45:32] I am thinking... what if we are not cleaning up gtid_current_pos and it is starting on the wrong one?
[13:45:37] has been already, well, executed
[13:45:57] marostegui: i did also wonder about that. i should look at the entry on db2118 actually
[13:46:21] jynus: which transaction is 'marked' with an id? the one immediately before? or the one immediately after?
[13:46:22] so compared with the binlog, one will mark the "gaps" between transactions
[13:46:40] in the binlog it will have it marked at the beginning and at the end
[13:46:54] while the binlog only says "# at X" once
[13:47:00] in the "gaps"
[13:47:06] jynus: the GTID only appears once in the binlog
[13:47:37] let me go to the example you pasted
[13:50:00] it will say GTID 180367395-180367395-554641873 trans
[13:50:07] just before START TRANSACTION
[13:50:35] marostegui: the binlog coords for the dump are for the end of the transaction marked with the GTID in the metadata
[13:50:37] it will also set SET @@session.gtid_seq_no=554641873
[13:51:08] so, the GTID appears at the _start_ of the transaction, not the end.
[13:51:09] do stuff, COMMIT, then start GTID 180367395-180367395-554641874,
[13:51:39] the important stuff is that if you mean gtid_executed=180367395-180367395-554641873
[13:51:51] executed implies: after that transaction
[13:51:56] jynus: i think i got it, at this point?
[13:52:06] and that would be pos 130479946
[13:53:48] but I always check the data anyway :-)
[13:53:58] kormat: so you mean the metadata shows 180367395-180367395-554641873 as the last executed one, so we need to start replication from the next one, which is 554641873+1=554641874?
[13:54:06] marostegui: yep
[13:54:11] makes sense to me, yep
[13:54:26] i'll double-check the data when the copy is finished, just to be extra sure
[13:54:37] between 130479291 and 130479946 there's nothing touching the table that breaks with the duplicate entry, no?
[13:54:37] +1
[13:54:40] I do it anyway
[13:54:56] I mean that I normally do that
[13:55:20] marostegui: nope: https://phabricator.wikimedia.org/P17107
[13:55:21] in this case, the one transaction is a REPLACE, which is IDEMPOTENT, so it is not the issue
[13:55:42] jynus: the transaction contains a REPLACE + 2 DELETEs
[13:55:47] ah
[13:55:47] so the mystery still stands... why that table twice...
[13:55:50] ok
[13:55:56] but they're not on the table that breaks the replication
[13:56:28] what I mean is I remember checking with the data too
[13:57:35] What I discussed with kormat on a call was to check the status of that particular table right after finishing loading the dump
[13:57:43] But I am pretty sure it will have the affected rows already
[13:57:46] as they come from earlier binlogs
[13:58:15] but if that is the case, why didn't replication break on the backup source too?
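The start-from-the-next-GTID arithmetic settled above can be written down as a tiny helper. Illustrative only; `next_gtid` is not part of any existing tooling:

```python
def next_gtid(gtid_executed):
    """Given the gtid_executed recorded in dump metadata, return the
    GTID replication should resume from.

    A MariaDB GTID has the form domain-server_id-sequence.  "Executed"
    means that transaction has already been applied, so the restore
    must resume at sequence + 1 -- the point agreed in the discussion
    above (554641873 executed, therefore start at 554641874).
    """
    domain, server, seq = gtid_executed.split("-")
    return "%s-%s-%d" % (domain, server, int(seq) + 1)


print(next_gtid("180367395-180367395-554641873"))
# -> 180367395-180367395-554641874
```

In binlog-offset terms, that "next" position is the `# at` offset just after the executed transaction's COMMIT, which is why the metadata's binlog coordinates point to the end of that transaction.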
[13:59:50] cannot say
[14:00:02] but the last thing I wanted to mention, before I leave you to it
[14:00:10] (as too many people won't be productive)
[14:00:30] is what I said about db2118-bin.002588
[14:00:40] which I believe is the binlog from today
[14:00:59] which I would expect to have writes since the dump date, but it only has them since the switchover
[14:01:16] yeah, but the breakage comes from earlier than today
[14:01:19] so independently of how that row got deleted
[14:01:22] I think it is from the 26th or something
[14:01:31] that, to me, seems the most fishy
[14:01:54] jynus: that sounds like exactly the server_id issue we discussed
[14:01:57] yes
[14:02:06] but please double-check, because I no longer trust myself
[14:02:13] will do.
[14:02:44] last time I checked I saw the transactions written - but it is possible I was looking at very old binlogs
[14:02:52] it is all very confusing and I don't trust myself
[14:03:21] so I think I am going to move to backups unless I get called, to not add more confusion
[14:03:32] jynus: thanks for your help
[14:03:46] let's see how kormat's new tests go
[14:03:48] you are welcome, I don't think I was really that useful, I am afraid
[14:03:53] :-(
[14:04:58] can confirm: the first transaction in the latest binlog for db2118 was:
[14:05:02] #210825  6:01:48 server id 180355206 end_log_pos 465 CRC32 0x307c1fe7 GTID 180355206-180355206-6 trans
[14:05:08] other than heartbeat, etc.
[14:05:09] which would be exactly when the switchover happened
[14:05:13] which I didn't pay attention to
[14:05:25] I searched with grep BTW
[14:05:47] but yeah, that date is the one I wanted to bring up, thank you
[14:14:30] uh. how do you tell what database a given query in the binlog was run against?
[14:14:56] you should have a use statement at some point before
[14:15:07] but it could be from a previous transaction, I think
[14:15:44] jynus: that would imply that it's conn-based; how can you tell which transactions are from the same connection?
[14:15:53] thread_id perhaps?
[14:16:19] probably, but binlogs are a serialization of things that are not really serial in origin
[14:16:46] jynus: it has to be deterministic, though, otherwise replication would break
[14:16:49] they promise to generate deterministic results, so I'm not sure if that is a rule
[14:17:08] in other words
[14:17:26] I cannot say if a use will be written when a connection id is found for the first time
[14:17:46] I think so, but I don't know
[14:18:04] jynus: i'm not searching for the first entry for the thread_id
[14:18:09] as that's almost certainly a trap
[14:18:20] i'm looking for the previous entry for that thread_id that mentions use
[14:18:25] in this case, `arwiki` it is
[14:18:40] yep
[20:18:45] hey marostegui et al., I'm hoping you can join for the Database on the Outside watch party and/or discussion on Wednesday
[20:18:46] you should have invites
[20:18:55] sorry*
[20:19:01] Database Inside Out*