[05:56:17] Amir1: Do you think we should go ahead and optimize flaggedtemplates in ruwiki eqiad (or everywhere else if you tell me so) before the switch back?
[07:28:29] s5 and s6 had around a 7% reduction in size for dumps
[07:32:29] that might be the cleanup on flaggedtemplates from Amir1
[07:32:52] This is s65 for instance: https://phabricator.wikimedia.org/P17106
[07:32:55] s6
[07:33:48] marostegui: yes, can you check the ticket and do anything that's marked as done?
[07:34:29] Amir1: this ticket? https://phabricator.wikimedia.org/T289249
[07:34:41] jynus: yup, that's very likely because of the flaggedtemplates cleanup
[07:34:53] Yup
[07:35:10] Amir1: I will create a task for the optimize part as a subtask of that one
[07:35:17] Awesome
[07:35:24] Thanks
[07:35:29] Is this the current status? https://phabricator.wikimedia.org/T289249#7311985 ?
[07:35:39] Yup
[07:35:43] Meaning I can optimize huwiki, ruwiki, cewiki, plwiki
[07:35:45] right?
[07:35:50] Yup
[07:36:10] Do you think that cleanup will be finished before the 7th?
[07:36:12] I might need to update and some more
[07:36:21] *add
[07:36:35] marostegui: very likely yes
[07:36:39] sweeet
[07:36:47] Let me create the optimize task then
[07:40:39] If everything goes well, I'll create the ticket to drop flaggedimages next week
[07:40:53] That'd be a couple hundred GBs
[07:41:17] oh sweet
[07:53:24] it's slightly odd coming back after a bank holiday when everyone else was working yesterday :)
[07:53:38] * Emperor has spent a while catching up on email et al
[07:53:41] I know the feeling
[08:27:18] marostegui: eureka: https://phabricator.wikimedia.org/T289856#7321005
[08:28:19] that nukes the page_restrictions entry, it's between the dump and the primary switchover, and it has the server_id of db2118, so it would skip it by default
[08:29:09] Aha!
[08:30:04] That explains the duplicate entry then
[08:30:17] The fact that it was skipping that transaction because of the server_id
[08:30:17] yep
[08:30:28] ok. so i think we're good to try again.
[08:30:36] did you change the mariadb config?
[08:30:55] just about to :)
[08:31:44] mm. looks like it requires setting log_slave_updates to 0, to
[08:31:46] *too
[08:31:52] Yeah, that makes sense
[08:32:10] It is scary though to let it replicate its own things, but oh well
[08:32:35] my main concern at this point is that, even if we get db2118 working,
[08:32:46] can we trust that its binlogs are sane?
[08:33:01] e.g. if we promote it back to primary, and then restore another host,
[08:33:06] will things be ok?
[08:33:18] We can always reset all the binlogs
[08:33:27] But yeah, we need to check its data
[08:33:28] for sure
[08:36:34] let me archive them to a backup host and then you can purge them
[08:36:44] so we can keep them for a while
[08:37:02] jynus: which set of binlogs are you thinking of?
[08:37:04] jynus: you mean db2118 ones?
[08:37:12] yes
[08:37:33] let's see if it finally catches up entirely before doing a reset master
[08:37:54] oh, of course, I meant before "We can always reset all the binlogs"
[08:37:56] but we need to also do a compare on the revision tables and friends
[08:37:58] ah sure
[08:40:45] marostegui: does https://phabricator.wikimedia.org/T289856#7321022 look ok?
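For context, a minimal sketch of the configuration change being discussed above. The log only says that the replica has to stop skipping events carrying its own server_id and that log_slave_updates must go to 0; the assumption that this is done via MariaDB's replicate-same-server-id option, and the my.cnf-style values shown, are illustrative, not a copy of the real puppet config:

    -- A replica normally discards replicated events whose server_id matches its
    -- own, which is why db2118 silently skipped the page_restrictions transaction.
    -- Presumably the fix is replicate-same-server-id, which in turn requires
    -- log_slave_updates=0 to avoid binlog loops (hence the exchange above):
    --   [mysqld]
    --   replicate_same_server_id = 1
    --   log_slave_updates        = 0
    -- Sanity-check the running values before starting replication:
    SELECT @@server_id, @@log_slave_updates;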
[08:40:56] checking
[08:41:21] binlog backup is actually part of our strategy for backups, but I never got to it in the past, because I didn't know how to implement it on topology changes
[08:42:32] kormat: Before starting mariadb I would do: systemctl set-environment MYSQLD_OPTS="--skip-slave-start" just to be extra safe. After the mysqlcheck I would also include a db-compare for main_tables.txt and a reset master
[08:43:07] (that option is already set)
[08:43:46] good
[08:44:05] marostegui: should i be starting replication again before doing db-compare?
[08:44:44] yes, we need it in sync with the master (and do it while it runs), otherwise there will be drifts
[08:45:00] ok, that was my guess
[08:45:42] steps updated
[08:46:25] I love it when I have 5 tabs in a row with exactly the same ticket
[08:46:27] checking now kormat
[08:46:42] it looks good
[08:53:06] ok, that's a good sign - the seconds_behind_master field _immediately_ went to a big number
[08:59:51] ahh. i was confused why it wasn't writing its own binlogs. then i remembered what log_slave_updates=0 does :)
[09:07:52] marostegui: you know, this makes me wonder if we could have 'just' used GTID here. because the GTID attempt failed in exactly the same way
[09:08:36] yeah at this point I don't really trust GTID
[09:11:20] marostegui: my point is the opposite - i think in this case we _should_ trust in GTID, because it would greatly simplify things
[09:11:29] trying to find the right binlog offset on the new primary is awful
[09:11:41] can't be automated, very easy to screw up
[09:12:13] kormat: yeah what I mean is that I don't know if it would have worked by just setting the GTID position
[09:12:18] i'm going to test this, because this opportunity is too good to pass up
[09:12:27] or broken in some weird way
[09:12:27] iff replication does catch up correctly,
[09:12:32] i'm going to reset back to the fresh-dump situation,
[09:12:36] with the same config changes,
[09:12:43] and try starting replication just using GTID
[09:12:57] either that works, or we learn something new and exciting
[09:12:57] using the --74 position?
[09:13:23] ish. i _think_ we set the slave gtid pos to the --73 one, and then let it run
[09:13:40] y'know, it would be nice if mariadb actually _documented_ how this stuff works
[09:13:41] sure, GTID will refuse to execute something twice anyways
[09:14:08] marostegui: my understanding is that the binlog offset is to the next transaction to execute. the gtid slave pos is to the last transaction executed
[09:14:15] (i could be wrong, etc)
[09:14:29] yes, when setting it up it should be as you say
[09:14:34] as you can't necessarily predict the next GTID position
[09:14:44] yes, what I mean is that even if you set it to the position of an already executed transaction, it will not run it twice
[09:14:53] ie, you set up -72
[09:15:00] gtid will not try to re-apply -73
[09:15:12] marostegui: right, ack
[09:15:21] where does mariadb store which gtids it has executed?
[09:15:49] show global variables like 'gtid%';
[09:16:00] physically in a few mysql.* tables
[09:16:08] but it also scans binlogs on start, I think
[09:16:16] oh, heh. well in that case it doesn't actually _know_ which GTIDs it has executed in our case
[09:16:47] we provide that info by setting slave_gtid_pos
[09:16:49] no, I think it leads a file with the pos written to disk
[09:16:52] and it trusts us
[09:16:54] *loads
[09:17:01] jynus: where is that in the dump?
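To make the "where does mariadb store which gtids it has executed" exchange concrete, a short sketch of the MariaDB GTID state being talked about; the variable and table names come from the conversation and the MariaDB docs, while the position string is made up:

    -- The GTID bookkeeping lives in server variables and a mysql.* table:
    SHOW GLOBAL VARIABLES LIKE 'gtid%';   -- gtid_slave_pos, gtid_binlog_pos, gtid_current_pos, ...
    SELECT * FROM mysql.gtid_slave_pos;   -- persisted per-domain "last applied" position
    -- A restored dump only carries whatever gtid_slave_pos value was recorded at
    -- dump time; seeding it by hand (with replication stopped) looks like this,
    -- and the server simply trusts the value it is given
    -- (format: domain_id-server_id-sequence; the value below is illustrative):
    SET GLOBAL gtid_slave_pos = '0-180363268-123456';
    CHANGE MASTER TO MASTER_USE_GTID = slave_pos;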
[09:17:09] shouldn't be on the dumps
[09:17:17] ok, so as i was saying then
[09:17:19] if you set it manually, it should obey you
[09:18:25] kormat: you have all the info in that show variables output, and the table is gtid_slave_pos in the mysql db
[09:18:30] the question is, when it connects to the primary
[09:18:37] it tracks the transactions per domain_id
[09:18:44] depending on how you set it up manually it could do unexpected things
[09:18:47] and it is a bit of a mess cause we have lots of domain_ids
[09:18:52] what manuel says
[09:19:13] marostegui: when we restore from a dump, my understanding is that we do not tell mariadb what the gtid status is
[09:19:21] e.g. it is at pos X for one serverid-domainid, so let's apply the others from 0 (or not)
[09:19:27] marostegui: we then either give it binlog+pos, and it figures it out from the primary,
[09:19:36] or we give it slave_gtid_pos, and it trusts that that is correct
[09:20:10] to put it another way: a mariadb server normally has a bunch of GTID context. this is not dumped during backups.
[09:20:17] just the latest slave_gtid_pos value
[09:20:21] (please correct me if i'm wrong)
[09:20:33] kormat: yeah, that's why I started the docs with binlog+file, as when I was debugging the gtid+multisource mariadb bug, it was a super mess with the slave_gtid_pos not doing anything sane
[09:20:42] hah, ack
[09:20:52] So starting with binlog+pos and THEN switching to SLAVE_POS would do the right thing
[09:21:03] rather than starting with a weird old domain_id slave_pos combination
[09:21:10] in this one specific case, that's what i'm trying to avoid
[09:21:17] because binlog+pos is a nightmare
[09:21:24] in every other case, i agree :)
[09:21:35] In theory GTID will refuse to do anything crazy (ie execute transactions that are old or from an incorrect domain_id) but at that point I didn't trust anything
[09:21:50] kormat, the counter issue is
[09:22:09] it is possible someone has done maintenance with binlog enabled
[09:22:27] that will be reflected on the binlog and gtid
[09:22:44] marostegui: i'm not sure that is true. if we tell a freshly restored mariadb the wrong (old) slave_gtid_pos, i think it will try to re-execute old transactions
[09:22:44] so the danger is that by setting a good gtid
[09:22:46] kormat: this is the very long bug I filed https://jira.mariadb.org/browse/MDEV-12012
[09:22:56] it is possible to do bad, unintended maintenance
[09:23:23] it is our fault, true, but it is quite difficult to avoid
[09:23:48] jynus: we do maintenance all the time with binlogs enabled - what's the issue there?
[09:23:56] kormat: it will, but it should fail before applying them (ie: not corrupting data), but God knows
[09:24:05] marostegui: right, agreed on that :)
[09:24:08] so this is the effect-
[09:24:11] kormat: He means out of band I think
[09:24:18] you drop tables with binlog enabled from a replica
[09:24:27] those get a gtid
[09:24:28] ie: I alter a candidate master before promoting it to master
[09:24:45] then you reload those tables with binlog disabled (you just did that when recovering a dump)
[09:24:50] marostegui: that's also a normal procedure for us..?
[09:24:53] then you promote that server to primary
[09:25:14] other hosts are not aware of gtids for the maintenance, so they detect it as missing
[09:25:33] it then forces the drop on the replicas, to "sync state"
[09:26:25] which is, I believe, why manuel does it with binlog, as it assumes the master and replica are in the same state
[09:26:37] and why we end up with such long gtid strings
[09:26:41] I wish MariaDB had a crash-safe flag not tied to GTID, like mysql :(
[09:26:44] from binlog-enabled maintenance
[09:27:17] jynus: no, I do all the alters (and similar) with binlog disabled, to avoid messing up GTID sequences
[09:27:28] yes, but it just happens
[09:27:34] for example, on db2118
[09:27:45] yeah, they are hard to avoid sometimes
[09:27:57] in some cases you do mess with GTID anyways, ie: altering a sanitarium master
[09:28:14] but that's why GTID should keep track of per-domain_id transactions
[09:28:16] there will be 2 or 3 gtid events because the recovery wasn't done "cleanly"
[09:28:21] And be smart about it
[09:28:33] jynus: i'm.. dubious about the scenario you're describing. if i understand correctly, for it to be an issue, when a replica sees a new primary and a new GTID index (the last part of the domain-server-index GTID) of, say, 500, it would then somehow request -1 through -499?
[09:28:34] for the same thing
[09:28:53] kormat, yep
[09:28:59] because mariadb does not have that capability, afaict
[09:29:05] in reality, replication most likely will break
[09:29:12] which is why we had all that manual bullshit to do to find a gtid in a binlog
[09:29:28] we had to do that because using binlog
[09:29:33] mariadb can autoposition
[09:29:47] but sometimes with unintended consequences
[09:29:53] kormat: the exact issue you are describing is the reason I filed the bug, the domain_id wasn't able to distinguish transactions, which is its primary mission
[09:29:55] this is documented behaviour, let me search it
[09:30:12] we could work around it if there was a way to clean up those gtid artifacts
[09:30:32] e.g. in this case, we would like to keep only events from db2118 and the old master
[09:34:39] the strict mode doc kinda goes over it: gtid assumes the binlog to be identical on all hosts
[09:36:24] jynus: AFAICT, auto-positioning is a mysql-gtid feature. not mariadb.
[09:36:29] bad things could happen if not - which for us translates into topology changes breaking because of missing transactions, etc
[09:36:35] ?
[09:36:47] if we enabled strict mode that would break all over I think
[09:37:07] no, you can do a CHANGE MASTER TO in gtid mode, and it will be autopositioned
[09:37:13] it will also likely break for us
[09:37:23] but it works in lab conditions, I tested it :-)
[09:37:51] jynus: what do you mean by 'autopositioned'? that's not a term in the mariadb docs (that i can find)
[09:37:58] sorry, I mean
[09:38:10] http://mariadb.com/kb/en/gtid/#changing-a-replica-to-replicate-from-a-different-primary
[09:38:44] Ah, I think we have a different understanding of autopositioning
[09:38:56] that works, as long as you have a pure usage of gtid
[09:39:13] no extraneous events in binlogs, all binlogs are identical on all servers, etc
[09:41:04] the problem with the above is what we were seeing on db2118, that it never worked, right? that was the first thing that was attempted last week?
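A sketch of the two positioning approaches being compared above: explicit binlog coordinates first (what the catch-up is using now), then flipping to GTID once the replica has built up sane GTID state of its own, per Manuel's "binlog+pos and THEN switching to SLAVE_POS" comment. Host, file and position values are placeholders, not the real db2118 coordinates:

    -- 1. classic positioning: the coordinates name the next event to execute
    CHANGE MASTER TO
      MASTER_HOST     = 'current-primary.example',  -- placeholder
      MASTER_LOG_FILE = 'db2118-bin.000073',        -- placeholder
      MASTER_LOG_POS  = 4;                          -- placeholder
    START SLAVE;
    -- 2. while applying the stream, the replica keeps gtid_slave_pos up to date,
    --    so once it has caught up, switching to GTID-based positioning is safe:
    STOP SLAVE;
    CHANGE MASTER TO MASTER_USE_GTID = slave_pos;
    START SLAVE;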
[09:41:06] my reading of that section of the doc says the above scenario is not an issue
[09:42:05] marostegui: what we know: using the dump metadata gtid_slave_pos failed because it skipped the transactions that were executed against db2118 between the dump and the primary switchover
[09:42:17] (i.e. the exact same failure-case we got with binlog positions yesterday)
[09:42:49] what we don't know is if db2118 skipped those transactions when using gtid because of the domain_id, or because of the normal (non-gtid) server_id
[09:42:49] right
[09:42:54] yeah
[09:42:58] that's what i want to know
[09:43:01] i'm hoping it's the latter
[09:43:07] in which case doing what we're doing now will work
[09:43:08] they are kinda- the same
[09:43:16] jynus: they are absolutely not the same
[09:43:22] gtid uses server_id as the first part of the id
[09:43:28] i'm aware.
[09:43:30] I know what you mean
[09:43:41] ok, so you know what i mean, but you're saying the opposite. useful. :)
[09:43:57] I said kinda- :-D
[09:44:23] let me see if we're past the usual breaking point yet
[09:48:56] not yet. another 12h of replication-traffic to go.
[10:30:17] Amir1: done with all the wikis that have completed the cleanup for flaggedtemplates, the reduction is around 50%, so pretty nice
[10:30:50] awesome. I hope I can get arwiki done, that'd be a lot
[10:30:57] let me know when done
[10:31:08] 177GB jeeez
[10:32:55] basically at any speed I ran it, it caused lag :(((
[10:33:12] I started really slow, I increased the speed a bit yesterday
[10:35:55] marostegui, every time I see you comment on a plan by Amir1, I read you in this voice in my mind: https://www.youtube.com/watch?v=Ia7513Kn7as :-D
[10:36:10] jajajaja
[10:36:31] lol
[10:36:38] I should rewatch the whole thing
[10:37:16] this would fit perfectly, changing Rick for Amir: https://getyarn.io/yarn-clip/83bea365-6dae-4740-b186-eb6943f3591e
[10:37:58] Wait until we start dropping tables :D
[10:47:22] marostegui: I think arwiki's flaggedtemplates is bigger than the biggest table of enwiki (pagelinks)?
[10:47:30] let me check
[10:47:56] haha indeed, pagelinks on enwiki is 160G
[10:48:32] the flagged thing is for wikis that manually choose which version from the history to show, right?
[10:49:02] or does it have nothing to do with that?
[10:49:31] jynus: yup, it's "pending changes"
[10:49:55] not which version though, the most recent "stable" version
[10:50:21] flaggedrevs truly is a masterpiece, like you have to work really hard to be this bad
[10:50:45] I am guessing that, as it is not a global thing, it doesn't have as good maintenance as other parts?
[10:51:40] you're touching on another terrible aspect of this extension. It has such a flexible configuration that it can basically turn into a completely different extension
[10:52:07] it is only deployed on 50-ish wikis but most of them are pretty big, like enwiki
[10:52:23] oh, enwiki has that, I didn't know
[10:52:29] I thought it was only on smaller wikis
[10:52:35] the part where we got lucky was that enwiki has it in "protect mode", which is basically only a couple thousand pages
[10:53:06] I see, so it is flags that, depending on the wiki, are used for one thing or another
[10:53:09] but the wikis that have it on full mode are dewiki, ruwiki, arwiki (If I'm not mistaken), etc.
[10:53:21] ok, that is the part I have head about
[10:53:26] *heard
[10:53:52] some wikis are actually moving away, I changed it on idwiki from full mode to protect mode
[10:54:09] and I honestly think protect mode is a good thing (we have it on fawiki too)
[10:54:23] what's protected mode?
[10:54:26] what does protect mode do, that is what I am not familiar with
[10:54:28] just autoconfirmed users or what?
[10:55:06] no, basically flaggedrevs, but only on pages that an admin protects
[10:55:11] let me grab an example
[10:55:48] yeah, without an example I am not getting it- compared to page_props
[10:55:53] https://en.wikipedia.org/wiki/Arizona
[10:56:12] you see the magnifier at the top?
[10:56:16] yup
[10:56:19] I see, it is a soft protect
[10:56:38] yeah
[10:56:52] and flaggedrevs is per wiki entirely?
[10:56:55] pages that have it: https://en.wikipedia.org/wiki/Special:StablePages
[10:57:44] marostegui: you can configure it to be on the whole wiki. dewiki is like that, meaning every edit (not by an "editor" user) has to be reviewed
[10:58:10] that's why it's massive in dewiki https://en.wikipedia.org/wiki/Special:PendingChanges
[10:58:20] compare to https://de.wikipedia.org/wiki/Spezial:Seiten_mit_ungesichteten_Versionen
[10:58:22] wow
[10:58:36] editor meaning autoconfirmed users?
[10:58:44] depends on the wiki
[10:58:51] on dewiki?
[10:59:07] in dewiki, someone has to manually give the right to you
[10:59:16] :-/
[10:59:19] at least that was how I got it
[11:00:28] there is also a whole concept of dimensions, tiers and levels, but don't get me started on those.
[11:00:49] and now I wonder how it interacts with the translation feature
[11:01:17] I have been trying to get it into a more bearable extension by dropping a couple thousand lines of code https://phabricator.wikimedia.org/T277883
[11:02:17] jynus: my guess is that it doesn't :D I don't think meta and other multi-lingual wikis have this extension
[11:03:08] this is mostly new to me, as I used to edit only on eswiki and commons, and very rarely on en
[11:03:40] I knew it was on de, but hadn't explored it
[11:05:07] Very interesting, thanks for the explanation
[11:10:52] I recently gave a list of its biggest issues to a PM in WMF. It's hilarious. e.g. it uses action=ajax, which was the predecessor to the API in mediawiki and has been deprecated for more than ten years now
[11:11:05] I think it's the only blocker to removing that action
[11:21:37] good news, I think? https://phabricator.wikimedia.org/T262668#7321326
[11:37:18] I now have to remember to increase the size of the underlying file volume, because I created them with only 50 TB, and at this speed they will soon be full
[12:38:38] marostegui: well, replication has gotten way past the primary switchover now, so our current method definitely does work
[12:38:54] (which is a pleasant change :)
[12:38:55] sweeeet
[12:39:05] Let's see if the db-compare runs fine once it has caught up
[12:51:50] jynus: do you have the s4 buster backup ready for the Monday switchover to be flipped this week?
[13:55:28] it won't take me long to prepare it
[13:57:27] great!
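For the catch-up checks mentioned above and below ("way past the primary switchover", "1.1 days of lag left"), a minimal sketch of the standard way to see how far the replica has gotten; the field names are standard MariaDB ones, nothing here is specific to db2118:

    -- How far along the catch-up is, and whether we are past the old
    -- primary switchover point:
    SHOW SLAVE STATUS\G
    -- Relevant fields: Seconds_Behind_Master (the lag), Relay_Master_Log_File /
    -- Exec_Master_Log_Pos (what has actually been applied), and Gtid_IO_Pos.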
[15:47:56] 1.1 days of replication lag left to catch up on for db2118
[15:48:08] it will easily finish overnight, at least
[16:33:00] PROBLEM - MariaDB sustained replica lag on m1 on db1117 is CRITICAL: 245.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[16:34:30] PROBLEM - MariaDB sustained replica lag on m1 on db2078 is CRITICAL: 224.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2078&var-port=13321
[16:40:14] RECOVERY - MariaDB sustained replica lag on m1 on db1117 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[16:41:44] RECOVERY - MariaDB sustained replica lag on m1 on db2078 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2078&var-port=13321
[18:52:57] marostegui: (for tomorrow). Please record arwiki's before and after. It's the biggest one
[18:56:23] definitely!!!
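A hedged sketch of how the before/after for arwiki could be recorded, along the lines of the optimize pass discussed earlier in the day; the database and table names follow the naming used in the chat, and the query reports InnoDB's own size estimates rather than exact on-disk usage:

    -- before: record the current footprint of flaggedtemplates on arwiki
    SELECT table_schema, table_name,
           ROUND((data_length + index_length) / 1024 / 1024 / 1024, 1) AS size_gb
    FROM information_schema.tables
    WHERE table_schema = 'arwiki' AND table_name = 'flaggedtemplates';
    -- rebuild the table to reclaim the space freed by the cleanup
    -- (OPTIMIZE TABLE on InnoDB is effectively a full table rebuild):
    OPTIMIZE TABLE arwiki.flaggedtemplates;
    -- after: run the same SELECT again and note both numbers in the task.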