[02:31:31] (PrometheusMysqldExporterFailed) firing: Prometheus-mysqld-exporter failed (db1108:13351) - https://grafana.wikimedia.org/d/000000278/mysql-aggregated - https://alerts.wikimedia.org/?q=alertname%3DPrometheusMysqldExporterFailed
[05:52:20] ^I wonder if that could be my fixes for zarcillo?
[06:03:08] pff
[06:03:10] x2 switchover is a mess
[06:03:32] our current tooling isn't ready for this sort of multi-master scenario
[06:03:37] I am glad we don't care that much about the data anyways
[06:05:23] you should just switch topology
[06:05:42] in theory, like with pc, it should be able to write to both servers at the same time
[06:05:51] no, it is not that easy, trust me
[06:06:01] I would have done it that way if it were
[06:06:08] and if it doesn't, you surface a bug
[06:06:27] just set circular replication in write-write and change the app
[06:07:00] is it heartbeat that fails?
[06:07:31] no, everything failed
[06:07:41] ?
[06:08:45] <_joe_> "change the app" is easier said than done, given x2 is serving an omnipresent interface in mediawiki right now. I'd rather focus on what tooling needs improvement
[06:09:05] no
[06:09:14] I meant just updating the configuration
[06:09:23] without setting read-only at the db layer
[06:09:42] "repoint mw" would have been better wording
[06:09:57] again, not that easy if the topology isn't ready
[06:10:36] cannot circular replication be set?
[06:11:00] in theory x2 should be ok with multiple masters
[06:12:14] https://phabricator.wikimedia.org/T313811#8110957
[06:13:37] yeah
[06:16:29] that is why I say switch-replication shouldn't be used, that case is not supported
[06:16:50] it is meant only for single-master cases, where you have to set read-only
[06:18:01] as I understood it, x2 is more like parsercache: you just set up circular replication and repoint dbctl with no read-only
[06:18:16] and in theory it will write everything consistently
[06:18:31] you still need to change the topology
[06:18:32] anyways
[06:18:36] I am going to get breakfast
[06:19:00] yeah, circular replication is supposed to be what it is doing all the time, as it will be active-active
[06:19:11] I don't think orchestrator will like that though
[06:20:21] that's what I'm saying: we still need to fix our tooling
[06:21:44] the issue is that having a "clean" heartbeat table is not possible in a multi-primary environment
[06:22:55] anyways I'm off
[06:24:26] heartbeat is not the problem
[06:31:31] (PrometheusMysqldExporterFailed) firing: Prometheus-mysqld-exporter failed (db1108:13351) - https://grafana.wikimedia.org/d/000000278/mysql-aggregated - https://alerts.wikimedia.org/?q=alertname%3DPrometheusMysqldExporterFailed
[06:36:16] (PrometheusMysqldExporterFailed) resolved: Prometheus-mysqld-exporter failed (db1108:13351) - https://grafana.wikimedia.org/d/000000278/mysql-aggregated - https://alerts.wikimedia.org/?q=alertname%3DPrometheusMysqldExporterFailed
[06:36:39] ^I restarted the systemd unit, I think that worked
[08:09:25] s5 eqiad snapshot wrong_size 7 hours ago 630.8 GB -13.0 % The previous backup had a size of 724.9 GB, a change larger than 5.0%.
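A minimal sketch of the multi-primary heartbeat point raised above, not WMF tooling; it assumes a pt-heartbeat-style heartbeat.heartbeat table with ts and server_id columns and uses pymysql. The point it illustrates: once circular replication makes a section active-active, each primary writes its own heartbeat row, so there is no single "clean" row and a lag check has to select the row for the specific primary it is measuring against.

from datetime import datetime, timezone

import pymysql


def heartbeat_lag_seconds(host: str, primary_server_id: int,
                          user: str, password: str) -> float:
    """Replication delay on `host`, measured against one specific primary."""
    conn = pymysql.connect(host=host, user=user, password=password,
                           database="heartbeat")
    try:
        with conn.cursor() as cur:
            # Filter by server_id: in a circular (multi-primary) topology
            # every primary maintains its own row, so taking "the" latest
            # row could silently measure the wrong replication link.
            cur.execute("SELECT ts FROM heartbeat WHERE server_id = %s",
                        (primary_server_id,))
            row = cur.fetchone()
    finally:
        conn.close()
    if row is None:
        raise RuntimeError(f"no heartbeat row for server_id {primary_server_id}")
    # pt-heartbeat stores an ISO-8601 timestamp (assumed UTC here).
    written = datetime.fromisoformat(row[0]).replace(tzinfo=timezone.utc)
    return (datetime.now(timezone.utc) - written).total_seconds()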
[10:25:29] marostegui: Thank you for all the troubleshooting work on 10.6 <3 <3
[10:25:35] ping me if you need anything from me
[10:25:44] sure, will do
[11:24:26] so this is not in a rush, but I would like your (DBAs') input on the 2 points I mentioned in the weekly meeting, and Am|r's feedback on the db_inventory grant cleanup + other fixes 0:-)
[11:33:33] marostegui: okay if I drop the old columns of templatelinks in testwiki?
[11:34:13] that way any code paths still using them implicitly (that I missed in my search) start to error and I can fix them
[11:56:22] sure
[11:57:57] wohooo
[12:35:34] Amir1: quick question: would the schema change deployment script repool a host if the host is already depooled?
[12:36:02] marostegui: it depends on the configuration looked up at the start of the script
[12:36:10] it is depooled now
[12:36:17] And I don't want the script to repool it
[12:36:38] it'll probably repool it if it was pooled when the script started
[12:36:45] no no, it won't be
[12:37:00] I mean, the host is depooled and I haven't started the script
[12:37:13] then it treats it as a random depooled host and won't touch its pool status
[12:37:17] sweet
[15:20:28] marostegui: jynus https://phabricator.wikimedia.org/T299417#8112279
[15:20:53] nice
[15:21:08] did you see my previous comment about s5 BTW?
[15:21:27] where?
[15:21:30] (not expecting you to comment if it is expected, but please raise an alarm if it is weird)
[15:21:39] here on IRC, let me paste it again
[15:21:50] s5 eqiad snapshot wrong_size 14 hours ago 630.8 GB -13.0 % The previous backup had a size of 724.9 GB, a change larger than 5.0%.
[15:22:53] hmm, it could be the flaggedrevs clean up I'm doing but I can't say for sure. x1 redaction is expected but I don't have anything running for s5 to my knowledge
[15:23:21] I can check, I now have a non-productionized shiny dashboard that makes checking stuff easier!
[15:23:27] oh nice
[15:23:43] let me know what table had the biggest redaction, I'd take a look
[15:39:32] Amir1: I believe it to be templatelinks: https://phabricator.wikimedia.org/P32065
[15:40:24] jynus: hmm, I think it's because the alter caused an optimize; did the logical size change?
[15:40:54] I haven't dropped anything there yet, I just started making it nullable
[15:41:07] I think I won't know until next week
[15:41:27] okay, I think this is sorta expected, the templatelinks table grows and shrinks a lot
[15:41:40] this is data from 14 hours ago, literally :-D
[16:36:46] Amir1, I see some Error 1054: Unknown column 'tl_title' in 'where clause' in testwiki
[16:37:15] cool, which codebase is still on it
[16:37:17] I'll check
[16:38:21] I'm sure I fixed ApiQueryBacklinks, sigh
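A rough illustration of the repool behaviour discussed above, not the actual schema change deployment script; the dbctl-like interface and names (get_pooled_hosts, depool, repool, SchemaChangeRun) are hypothetical. The idea is that pool state is snapshotted once when the run starts, and only hosts the script itself depooled are repooled afterwards, so a host that was already depooled before the run stays depooled.

from typing import Set


class SchemaChangeRun:
    def __init__(self, dbctl) -> None:
        self.dbctl = dbctl
        # Snapshot of pool state, taken once at the start of the run.
        self.initially_pooled: Set[str] = set(dbctl.get_pooled_hosts())

    def alter_host(self, host: str) -> None:
        was_pooled = host in self.initially_pooled
        if was_pooled:
            self.dbctl.depool(host)
        try:
            self.run_alter(host)
        finally:
            # Only repool hosts this run depooled itself; a host that was
            # depooled before the run started is treated as "someone
            # else's" and its pool status is left untouched.
            if was_pooled:
                self.dbctl.repool(host)

    def run_alter(self, host: str) -> None:
        # Placeholder for the actual ALTER TABLE execution on `host`.
        raise NotImplementedError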