[01:10:35] PROBLEM - MariaDB sustained replica lag on m1 on db2078 is CRITICAL: 8.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2078&var-port=13321
[01:12:35] RECOVERY - MariaDB sustained replica lag on m1 on db2078 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2078&var-port=13321
[08:11:22] I am investigating PROBLEM - Check systemd state on cumin2001 is CRITICAL: CRITICAL - degraded: The following units failed: database-backups-snapshots.service
[08:12:00] it seems to be s8 snapshot
[08:14:50] Good morning everyone, trying my luck during EU business hours. Reposting from yesterday evening/night:
[08:14:51] 23:33 Hello everyone, could someone explain the difference between the runtimes and EXPLAINs for the queries shown above? Of course, the right thing to do is to cast the IDs to integers before querying, but I'm wondering what makes MariaDB ignore the index when it gets mixed types, but use it in both the all-strings and all-integers scenarios. https://www.irccloud.com/pastebin/i5j7mvX8/
[08:15:00] 23:33 Context: Blogpost written at https://phabricator.wikimedia.org/T293452, draft available to WMF staff at https://docs.google.com/document/d/1OTcog8suaJ0CHkCDBhGDUUj44cY6Sook5auBM7X8eIA/edit# for now
[08:23:09] urbanecm: I can check later and maybe provide a trace of the query optimizer, but I am busy now with the backups thing
[08:23:41] sure, this is by no means urgent :). Backups are definitely more important.
[08:23:57] It's more of "we made it go faster by changing this, but...why does it work?"
[08:24:25] well, essentially the last one is using the key and the other one is doing a full scan XD
[08:24:51] sure, but...why? :D
[08:36:51] the usual reason is 'spite'
[08:40:18] Another q from last night that's definitely not important: how does parsercache being shorter for the main page work?
[08:40:31] ottomata: re: restoring a single db, it's supported to do that from the logical dumps (which is also probably what you want given the scale of the data and the reason for doing the restore)
[08:40:36] So it always purges when the day rolls over and new content appears on enwiki
[08:41:30] Spookreeeno: i'm not sure if this answers your question, but when a pc entry is written, an expiry time is added to the row
[08:41:43] a daily purge job goes through all rows and removes any with an expire time in the past
[08:41:54] Right
[08:42:05] however, mw already won't access/see an entry that's past its expiry time
[08:42:18] Which by standard is 7 days for me when it expires, and probably different for me
[08:42:35] But how do we make that expiry time different for specific pages
[08:42:41] Spookreeeno: the current numbers, iirc, are 21 days for normal pages, and 7 days for talk pages
[08:43:19] being able to set a different expiry length for those 2 different types of pages is a very recent thing
[08:43:56] i don't know of anything more specific than that
[08:43:58] but i'm no expert on the code side of this :)
[08:44:41] ottomata: you'd be following this procedure here, but also supplying a `--database` arg to `recover-dump`: https://wikitech.wikimedia.org/wiki/MariaDB/Backups#Recovering_the_data
[08:48:14] kormat: exptime=20211021092129 is in the past right
[08:50:42] * kormat goes crosseyed trying to figure it out
[08:51:18] Spookreeeno: no that's in 30mins, if i'm not completely brain-broken
[08:52:18] kormat: timezones exist
[08:52:27] I picked a bad example
[08:52:30] Spookreeeno: right, i'm looking at utc
[08:52:36] But you stumbled me on to something
[09:03:44] marostegui: damnit, just found one last schema change that i forgot about: T278619
[09:03:45] T278619: ipb_timestamp is varbinary(14) in old wikis while being binary(14) in the code since 2007 - https://phabricator.wikimedia.org/T278619
[09:04:03] ≤o/
[09:10:36] kormat: parsercache is evil
[09:10:49] Spookreeeno: no argument here :)
[09:10:49] * Spookreeeno should not have tried to understand i
[09:10:51] it
[09:22:34] kormat: i found a way of doing this. I make no guarantees any part of it is sane.
[09:23:07] * Emperor fails to parse ≤o/
[09:23:35] Emperor: It was a mistake, I wanted to write \o/
[09:23:36] XD
[09:23:45] But then I was like: nah, no need to fix that
[09:23:47] lol
[09:24:28] sigh, turns out doing `git config --global gitreview.remote origin` on day one (per the gerrit quickstart) and then forgetting all about it doesn't end well if you later try to interact with anyone else's gerrit /o\
[09:28:00] Emperor: anything you forget will come back and bite you
[09:35:32] also, the error messages from `git review -s` are ... unhelpful
[13:20:56] kormat: that is good news! thank you.
[13:26:57] Emperor: re: T294016, i think you wanted to add data-persistence as a tag, not a subscriber
[13:26:58] T294016: Swift-recon -d overstates disk capacity and usage - https://phabricator.wikimedia.org/T294016
[13:28:49] kormat: done, ta. Sorry, still find all the various labels and suchlike in phabricator a bit confusing
[13:29:10] Emperor: no worries. i still get confused by it at times
[13:29:29] Emperor: still less confusing than emacs
[13:29:40] 🔥
[13:31:03] https://twitter.com/wcrichton/status/1450871977552629767 ;-p
[13:31:20] haha
[13:33:34] hahahaha
[13:34:36] ottomata: re: https://gerrit.wikimedia.org/r/c/operations/puppet/+/732369/, can i ask what the motivation behind this is?
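
(Aside on the earlier `exptime=20211021092129` exchange.) A minimal Python sketch of the expiry logic as it was described in the chat: an expiry time is stamped on the row at write time, readers ignore entries past it, and a daily job purges them. Everything named here (`TTL_DAYS`, `make_exptime`, `is_expired`, `purge`) is hypothetical and is not MediaWiki's actual code. The TTLs are the "iirc" numbers quoted above. The sample "now" is an assumption too: the log does not state its date, so the sketch guesses 2021-10-21 and treats the 08:51:18 message time as UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumed TTLs, taken from the numbers quoted "iirc" in the chat.
TTL_DAYS = {"normal": 21, "talk": 7}

# 14-digit timestamp (YYYYMMDDHHMMSS), interpreted as UTC.
TS_FORMAT = "%Y%m%d%H%M%S"


def parse_exptime(exptime: str) -> datetime:
    """Parse a 14-digit timestamp such as '20211021092129' as UTC."""
    return datetime.strptime(exptime, TS_FORMAT).replace(tzinfo=timezone.utc)


def make_exptime(written_at: datetime, page_kind: str) -> str:
    """Expiry stamped on a row at write time (hypothetical helper)."""
    expires = written_at + timedelta(days=TTL_DAYS[page_kind])
    return expires.astimezone(timezone.utc).strftime(TS_FORMAT)


def is_expired(exptime: str, now: datetime) -> bool:
    """Readers skip an entry once its expiry time is in the past."""
    return parse_exptime(exptime) <= now


def purge(rows: dict, now: datetime) -> dict:
    """Daily purge: keep only rows whose expiry is still in the future."""
    return {key: exp for key, exp in rows.items() if not is_expired(exp, now)}


# Assumed "now": the date is a guess, the time is the 08:51:18 message, read as UTC.
now = datetime(2021, 10, 21, 8, 51, 18, tzinfo=timezone.utc)
remaining = parse_exptime("20211021092129") - now
print(is_expired("20211021092129", now), remaining)  # False 0:30:11
```

Under those assumptions the entry is not yet expired and has roughly 30 minutes left, which matches the "that's in 30mins" reading once both sides are compared in UTC.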
[13:35:00] the change on its own looks useful, but it makes this multiinstance class very different from the numerous other multiinstance classes we already have to support
[13:35:21] and i, at least, don't have the bandwidth to go converting all the rest of them so they're all similar 'shapes' again
[13:39:29] IWBNI we had fewer slightly-different database classes IMO
[13:40:22] Emperor: sure. but it's highly non-trivial, sadly.
[13:40:57] it's on my todo list, but not for anytime soon
[13:41:31] i once spent a while trying to make a mariadb::common class that would contain stuff that everyone used
[13:41:39] except that turned out to be the empty set in the end
[13:42:15] "yay"
[13:42:17] some of it comes down to limitations of puppet
[13:42:24] or interactions with our style guide
[13:42:38] (which in the end is basically the same issue)
[13:43:00] Emperor: believe me, no yays were issued in this journey
[13:43:18] (IWBNI = "it would be nice if", for those playing along at home)
[13:44:06] thanks for explaining the acronym! It really helps
[13:45:25] MTAAS at your service
[13:45:32] :(
[13:45:41] (Matthew Translator As A Service)
[13:45:47] XDDDD
[13:51:57] Sorry, I do seem to have a bunch of acronyms in my idiolect, and just keep using them
[14:01:36] Emperor: it's one of the issues that comes with moving to a workplace that has a high proportion of non-native-english speakers :)
[14:07:14] yeah, the issue with the acronyms, in my case is that it breaks the focus as if you are reading a conversation, then you need to switch, go to a browser, google it, read it, and then come back to the conversation and apply it on the fly
[14:07:23] kormat: mostly DRYing code, the analytics one was a big copy/paste
[14:07:34] thought it would be nice to have a reusable class for which all you had to do was provide configs
[14:07:36] rather than writing puppet
[14:08:35] kormat: perhaps what I should do is make a generic mariadb::multiinstance class (not rewrite the dbstore one)
[14:08:48] 😬 😬 😬
[14:08:50] there be dragons
[14:09:00] and note in the docs that if you need something more than it can do, you make your own class
[14:09:26] kormat: haha which part? a generic multiinstance class?
[14:09:32] yep!
[14:09:38] say more!
[14:10:10] the existing multiinstance classes are all _similar_. but they're not the same. and the differences will drive you mad
[14:10:33] it really needs a _major_ refactoring before i'd even contemplate doing that
[14:11:05] refactoring each one might be a pain, but, perhaps starting a new generic one that covered some cases (like dbstore and analytics) would work? then in the future, if you need multiinstance, you could use that class?
[14:11:18] (analytics-meta)
[14:14:03] ottomata: e.g., have a look at modules/profile/manifests/mariadb/misc/multiinstance.pp
[14:14:25] sanitarium looks the same too
[14:14:39] specifically the block that starts with `if $section == 'm3' {`
[14:14:42] marostegui: do feel free to keep asking me to explain - either I'll get out of the habit, or you'll remember the more common ones :)
[14:15:00] hahaha
[14:15:08] Emperor: will do!
[14:15:14] that one looks pretty similar except for the phabricator stuff..
[14:15:25] you could use a generic class, but then include another class that adds the stuff you need
[14:15:51] conditional inclusion of another class depending on the section you're iterating over?
[14:16:01] marostegui: ....perhaps it would make the most sense to stop using settings in /etc/my.cnf on multiinstance
[14:16:02] nono
[14:16:03] get rid of that
[14:16:10] generic multiinstance profile
[14:16:23] then, role includes that, plus other things you need like etc/mysql/phabricator-init.sq
[14:16:40] ottomata: but what's the goal of all this?
[14:16:53] oh, the template works fine with the generic setup, that is a param to mariadb::instance
[14:16:59] marostegui: DRY reusable code?
[14:16:59] marostegui: to take over the world
[14:17:04] easier for folks to set up mariadb instances?
[14:17:42] (oops sorry marostegui didn't mean to ping you was responding to kor mat, but hello too! )
[14:19:45] ottomata: The implications of refactoring all this code are massive (many things to keep in mind, many things can break) and I am not sure if it is worth the effort (as of today) given the amount of new and completely different instances of mariadb we spawn per year (which are normally 0). I am not saying this isn't something needed, I am saying that I don't think we are in a position where we can commit to spend
[14:19:45] time
[14:19:45] of this any time soon
[14:19:58] Thanks irc client
[14:20:13] ^ what marostegui said
[14:21:11] would it be ok to just make a generic ::multiinstance class that accepted instance parameters as configs, and used it for analytics meta, without refactoring the others (ok mayyyybe dbstore if you like it)
[14:22:13] ottomata: so.. there's already profile::mariadb::misc::analytics::multiinstance, which you're free to do as you wish with. is it not currently suitable?
[14:23:07] it's suitable but analytics specific, and i wanted to use it to get rid of profile::analytics::database::meta
[14:24:21] i'd like to be able to use a class in cloud vps to set up these mariadb instances with hiera only
[14:27:19] i'm probably missing something here :) you say it's analytics specific, but.. you're in the analytics team, and working on analytics stuff
[14:28:55] (i'm guessing that 'analytics' means different things in different contexts here)
[14:29:12] if possible, i'd like to not have to write puppet code to set up a new mariadb instance. that's what modules are for, isn't it? to DRY up your stuff so you can generically and reproducibly recreate environments?
[14:29:15] we have a test cluster as well
[14:29:23] the setup will be different there, with different configs
[14:29:35] different replication setup (no backup, maybe different nodes, colocated stuff)
[14:29:40] we also test out things in cloud vps
[14:30:42] we used to have several different ways of using puppet to set up http servers, but we've successfully unified and generified those I think, was just hoping to improve the experience for mariadb too
[14:36:27] ottomata: so, to be clear, the aim is laudable (though i doubt that single-instance and multi-instance will ever be covered with a single class). there's just a Huge amount of inertia with puppet language constraints, style guide constraints, and the existing codebase. i do want to tackle this, but i'd estimate it's on the order of 2-3 months of work
[14:37:26] kormat: to do a refactor? or just to make a generic multi instance class to use for my current work?
[14:37:29] one of my main previous attempts: https://gerrit.wikimedia.org/r/c/operations/puppet/+/622578
[14:38:53] ottomata: by all means, make a generic multi instance class under profile::analytics::database. knock yourself out
[14:39:02] i agree a big refactor of everything would be hard, and maybe not that useful for things that have very specific needs. But, for cases where we can start fresh and the needs are pretty similar, a generic class seems possible
[14:39:10] ah ha you don't want it in a non 'analytics' namespace eh?
[14:39:16] too tempting for others to use (and then abuse?)
[14:39:37] i don't want it in the 'mariadb' namespace, specifically :) there's already Far too much stuff in there that i have to maintain
[14:39:46] oh hm. i see.
[14:40:00] if at some point i have the bandwidth to look at refactoring our stuff, then i can take a look at your new generic base class, and see if it would work
[14:40:11] ok lemme try some stuff then, i'll change the patch (leave dbstore alone) and see what you think
[14:40:22] thanks
[14:40:28] as it is, whenever i make changes to how mariadb is managed, i go through everyone's puppet classes and make the changes (mariadb, wmcs, analytics)
[14:40:46] so another class means one more place for me to look at, but so be it
[14:41:04] well i'll remove to 'analytics' classes in the process so -1 total? :)
[14:41:06] two*
[14:43:33] haha actually, taking it out of mariadb and not connecting to existing stuff it means i can make it even more generic...uh oh...