[07:33:16] jynus: I want to switch over the es5 master; from what I can see all the backups there are done, but can you also confirm?
[07:34:02] they are indeed
[07:34:57] Cool!
[08:13:49] between 70 and 90 GB less for s3 and s7 snapshots
[08:14:42] s4 too
[08:19:24] nice!
[08:19:35] in terms of time, do you know how much that is, approximately?
[08:20:13] for backup, probably not much
[08:20:32] for recovery, around 7%
[08:20:39] oh sweet
[08:21:13] jynus: do you have any query handy to get how much unique metadata we store? (external store is easy to figure out)
[08:21:16] it is difficult to give an exact absolute number because recovery has some fixed costs
[08:21:42] I think I had a spreadsheet
[08:21:58] Ah, that'd be nice; if not I can just double-check with the masters
[08:22:36] this is the summary from last year: https://docs.google.com/presentation/d/1w0MkfTQL3HB3vMSuVV4PnO5fDojoy_XAOpAjfnf1FkQ/edit#slide=id.g1616b7b70b_0_195
[08:23:58] are you planning to include a slide like that in the state of the union presentation for your area?
[08:24:03] Sept 2021: https://wikitech.wikimedia.org/w/index.php?title=File:Haciendo_copias_de_seguridad_de_todo_el_conocimiento_humano_con_Python_y_software_libre.pdf&page=5
[08:24:05] if that's the case I can omit mine
[08:24:16] yeah, I wanted to refresh it and share the data with you
[08:24:20] ah, excellent
[08:24:23] Sounds good
[08:24:37] We can sync once you've got it (no rush at all)
[08:25:19] feel free to do some db digging and we can share it at https://docs.google.com/spreadsheets/d/1aAo8COkz3_P3NS73i-ZZXu0gocx1J6mlzA79Drwo8CA
[08:26:18] Cool, I will see what I can do!
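The "recovery has some fixed costs" remark above is why the saving is quoted as roughly 7% rather than an absolute time. A toy model makes the point; every constant below is hypothetical, and only the ~7% figure and the 70-90 GB snapshot reduction come from the conversation:

```python
# Toy model of recovery time: a fixed setup cost plus a per-GB restore cost.
# The fixed cost is why a smaller snapshot yields a smaller *percentage*
# saving on total recovery than on raw data size. All constants below are
# made up; only the ~7% outcome mirrors the estimate in the chat.

FIXED_COST_MIN = 30     # hypothetical prepare/verify overhead, in minutes
MINUTES_PER_GB = 0.5    # hypothetical restore throughput

def recovery_minutes(size_gb: float) -> float:
    return FIXED_COST_MIN + size_gb * MINUTES_PER_GB

before = recovery_minutes(1100)        # hypothetical snapshot size before
after = recovery_minutes(1100 - 80)    # ~80 GB smaller (midpoint of 70-90 GB)

saving = 1 - after / before
print(f"recovery time saving: {saving:.1%}")  # prints "recovery time saving: 6.9%"
```

Because the fixed cost never shrinks, the exact percentage depends on the snapshot size, which is why no single absolute number can be given.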
[08:26:46] I want to give some rough numbers for metadata+es hosts
[08:26:55] And not spend more than 30 seconds on it
[08:27:05] that will help check whether it matches previous calculations
[08:29:19] for example, in 2020 the data said: (1110 + 948 + 1833 + 1760 + 1044 + 1212 + 2093 + 1389 + 328) * (1024 * 1024 * 1024) - (E14/3)
[08:29:27] which is the size of the metadata dbs minus binlogs
[08:29:39] 1110 is enwiki?
[08:29:48] my guess
[08:29:50] This is our current enwiki: | 856.36 |
[08:29:50] +------------+
[08:30:27] how is that calculated? Note the "approximate dataset size, original format"
[08:30:42] I just checked information_schema table sizes
[08:30:44] so that means raw ibd sizes on the spreadsheet
[08:30:44] and did a sum
[08:31:05] calculating data size is complicated, as there are many ways to do it
[08:31:11] i know
[08:31:13] all correct, but different
[08:31:21] We can simply also do: du -sh /srv
[08:31:22] XD
[08:31:25] yeah, just stating the obvious
[08:31:45] Which is probably a lot easier for this presentation, so we present metadata+binlogs at once
[08:32:07] I separated binlogs because those are planned for backup in the next quarters
[08:32:28] I think I will include them in the first slide
[08:32:44] It is some rough numbers
[08:33:05] we can have 2 sets of numbers
[08:33:14] after all, backup size != database size
[08:33:24] I keep more than one copy :-)
[08:33:33] yeah
[08:35:02] if I find the time (unsure) I may create some grafana graphs to automate this
[09:22:20] marostegui: maybe that'd be useful. clouddb1021 is the only db host that has s1-s8 (all of them). I use its storage as a rough estimation of the size: https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=28&orgId=1&var-server=clouddb1021&var-datasource=thanos&var-cluster=mysql&from=now-90d&to=now
[09:22:37] Amir1: yeah, but that host has sanitized data, so I don't use it
[09:22:49] ah yeah
[09:48:03] btullis: hi, is there documentation to depool wikireplicas through confctl?
[09:48:34] Amir1: it needs to be done via the proxies
[09:48:59] Ah, I get what you mean, I think
[09:49:57] Amir1: I think I've failed to add docs to Wikitech. It should be here, I think: https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Wiki_Replicas#Physical_proxy_layer
[09:50:42] Amir1: These are the steps that I used last time: https://phabricator.wikimedia.org/T298940#8122811
[09:51:08] let's add that to wikitech, it is useful for quick depools
[09:51:09] I can add them to wikitech, unless you think that there is a better way, like a cookbook.
[09:51:20] nah, wikitech is much better
[09:51:22] thanks!
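The "checked information_schema table sizes and did a sum" approach from earlier in the log looks roughly like this. The SQL in the comment is the standard `information_schema.tables` query; the sample rows and byte counts below are invented stand-ins, since the real query runs against a production replica:

```python
# Rough dataset size per schema, summed from table sizes as described
# in the chat. In production this would be a query against MariaDB, e.g.:
#
#   SELECT table_schema,
#          SUM(data_length + index_length) AS bytes
#   FROM information_schema.tables
#   GROUP BY table_schema;
#
# Here we fake the query result with made-up rows to show the aggregation.
from collections import defaultdict

# (schema, data_length, index_length) -- hypothetical sample values
rows = [
    ("enwiki", 600 * 2**30, 250 * 2**30),
    ("enwiki", 5 * 2**30, 1 * 2**30),
    ("wikidatawiki", 700 * 2**30, 300 * 2**30),
]

totals = defaultdict(int)
for schema, data_len, index_len in rows:
    totals[schema] += data_len + index_len

for schema, size in sorted(totals.items()):
    print(f"{schema}: {size / 2**30:.2f} GiB")
```

As the chat notes, this is only one of several valid measures: it approximates the raw ibd sizes, whereas `du -sh /srv` also counts binlogs and other on-disk files, so the two sets of numbers will differ.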
[09:51:34] btullis: ideally a cookbook would be the way to go, but meanwhile, having that on wikitech is good
[11:47:59] new backup aliases: https://phabricator.wikimedia.org/P34724
[12:05:44] pro-tip (in case it was what you wanted to do ;) ): in cumin, if you don't pass a command (or if you use the --dry-run flag), it will just print the matching hosts and exit
[12:07:01] I know, I actually applied that for controlling kernel upgrades
[12:12:00] ack, maybe it's useful for some of the latest additions to the team :)
[21:06:32] (MysqlReplicationLag) firing: (2) MySQL instance db1139:13311 has too large replication lag (1h 29m 37s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[21:11:32] (MysqlReplicationLag) resolved: (2) MySQL instance db1139:13311 has too large replication lag (10m 13s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[21:21:32] (MysqlReplicationLag) firing: (3) MySQL instance db1139:13311 has too large replication lag (10m 13s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[21:26:32] (MysqlReplicationLag) firing: (4) MySQL instance db1139:13311 has too large replication lag (10m 13s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[21:31:32] (MysqlReplicationLag) resolved: (2) MySQL instance db1150:13314 has too large replication lag (27m 57s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[21:47:32] (MysqlReplicationLag) firing: MySQL instance db1171:13318 has too large replication lag (1h 56m 3s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1171&var-port=13318 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[21:52:17] (MysqlReplicationLag) firing: (2) MySQL instance db1171:13318 has too large replication lag (21m 13s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[21:57:17] (MysqlReplicationLag) resolved: (2) MySQL instance db1171:13318 has too large replication lag (40m 31s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[22:57:54] hi! How do I determine the correct mysql port to use from client to server? For example, when I look at db2160 it has iptables rules that allow connections to 3321, 3322, 3342, etc.; it's not the default 3306
[22:58:12] in this case I want to use m3-slave.codfw.wmnet
[22:58:26] so that is db2160 and not a dbproxy
[22:58:54] while m3-master.eqiad.wmnet would go via dbproxies
[22:59:36] Oh, I think I just found it in the process list
[23:00:36] well, the process that uses --defaults-group-suffix=@m3 among the other mysql processes, I guess
[23:01:42] it listens on 3323 and 3343; trying the lower one, and it's tcp6 only
[23:19:52] there is not a "correct" one; there are several instances assigned to different services
[23:21:00] db2160 has more than one db instance running
[23:22:06] yeah, and the fact that there is more than one instance running makes me ask which one it is :)
[23:22:13] but I found it should be 3323
[23:22:30] except we do need to ask for grants after all, it seems
[23:23:08] because I can connect from my eqiad host to m3-slave.eqiad.wmnet but not from my codfw host to m3-slave.codfw.wmnet with identical credentials
[23:23:19] let me reopen the ticket we had about the grants for phab
[23:23:59] don't worry at this time, I will comment on the ticket T315713, and thank you!
[23:23:59] T315713: sort out mysql privileges for phab1004/phab2002 - https://phabricator.wikimedia.org/T315713
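The trick used above, picking out the right instance on a multi-instance host by its --defaults-group-suffix flag in the process list, can be sketched like this. The sample command line is a fabricated stand-in for real `ps` output; only the @m3 suffix itself comes from the conversation:

```python
# On a multi-instance host like db2160, each mysqld is started with
# --defaults-group-suffix=@<section>, so grepping the process list for
# that flag tells you which process (and hence which listening port,
# confirmed with e.g. `ss -tlnp`) belongs to which section.
# The command line below is a made-up example.
import re

sample_cmdline = (
    "/usr/sbin/mysqld --defaults-group-suffix=@m3 "
    "--basedir=/opt/wmf-mariadb"  # hypothetical path
)

def instance_section(cmdline):
    """Extract the section name from a mysqld command line, if any."""
    m = re.search(r"--defaults-group-suffix=@(\S+)", cmdline)
    return m.group(1) if m else None

print(instance_section(sample_cmdline))  # prints "m3"
```

From there, the listening ports for that instance can be checked on the host itself, which is how 3323 was identified for m3 on db2160 in the chat.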