[07:19:36] jynus: I assume we don't have to specifically include labswiki anywhere within the backups for s6, right? as it is just backed up entirely. Although I do assume we need to remove labswiki from m5 backups (maybe after next week), right?
[07:20:15] true on both, although it wouldn't hurt checking things are as they should be after backups run
[07:20:32] Yeah, so far I am not going to remove it from m5
[07:20:40] Let me include the step on the pending things list
[07:20:49] they didn't run last night, so we can check on Monday/Tuesday
[07:21:08] yeah, thanks, I can take care of the grants removal
[07:21:17] just ping me on the right ticket
[07:21:23] Sounds good, added to the list: https://phabricator.wikimedia.org/T167973#7359504
[07:21:57] I think we can do them as the rest of the grants are also cleaned up
[07:22:07] sounds good
[07:22:51] not sure what is going on with phab, or my browser, but lately, a comment link doesn't move my browser focus to the right comment :-(
[07:23:04] It
does for me on Chrome
[07:23:09] Just tested with the above link
[07:23:22] it may be me then
[07:24:23] let's add a "cleanup labswiki grants on m5 (mediawiki and backups)"? what do you think?
[07:24:28] just did :)
[07:24:31] oh
[07:24:49] he he, I loaded a previous version
[07:25:19] so that would be dump grants- but only for labswiki
[07:25:25] yep
[07:25:36] cool
[07:25:49] The grants are so messed up, look at this
[07:25:54] `labswiki_eqiad`.* TO `wikiadmin`@`%`
[07:26:04] I wonder what that database used to be XD
[07:26:04] yeah, I saw them briefly, also for s6
[07:26:49] but I think you are well aware of my worry about grant handling :-)
[07:27:12] :)
[07:27:18] which reminds me of something
[07:27:24] if you have 1 spare minute
[07:27:30] sure, what's up
[07:27:48] so I sent some days ago a patch with grants related to mediabackups
[07:27:53] and I got a +1 from you
[07:28:05] but what I really wanted is to have a small discussion
[07:28:19]
https://gerrit.wikimedia.org/r/c/operations/puppet/+/712993
[07:29:06] Sure, that's what we discussed on a Monday meeting, no?
[07:29:13] so the dumps-part is trivial- those are not used, but are committed as documentation
[07:29:18] yep
[07:29:45] there are 2 weird things here, however
[07:30:38] I am giving the media backups software access to mediawiki metadata
[07:30:45] to scan the image et al. tables
[07:30:50] which is normally a big no
[07:31:13] but it is how I can get unblocked before doing it with a php mediawiki maintenance script
[07:31:51] and the other thing is the production grants, which are not really added to the misc puppet code
[07:32:15] I am wondering if we can narrow those grants to only the image table
[07:32:16] I want to make sure that was an understood +1, because honestly, even I get lost on grants
[07:32:22] marostegui, I wanted to
[07:32:43] but the problem is there are no wildcards for wikis
[07:33:06] ah right it cannot be %wik%.image?
[07:33:09] e.g.
the alternative is writing 1000 db names and maintaining that on every wiki addition
[07:33:15] yeah, that's crazy
[07:33:34] maybe, what I can do
[07:33:37] And *.image?
[07:34:03] I think I tried and couldn't, but if you don't mind trying on a test db?
[07:34:13] I totally trust you!
[07:34:25] We can try on db1124/db1125 though
[07:34:29] that's the thing, I am asking for review because I don't trust myself
[07:35:04] I thought mysql supported *.table_name
[07:36:32] certainly the opposite works
[07:37:16] I think I tried a few combinations of % and * and I bailed out, but again, I wouldn't mind a second opinion!
[07:37:19] let me try on db1124
[07:38:09] I think I tried `%wik%`.* and that didn't work
[07:39:21] sorry
[07:39:23] I meant
[07:39:31] `%wik%`.image
[07:40:03] and maybe because it is allowed on replication filters
[07:40:17] I got confused
[07:40:54] It is silly that mysql checks to see if the table exists everywhere
[07:43:33] I am reading docs and they say it is impossible
[07:43:40] either db.* or *.*
[07:43:43] yeah, I am not finding any combination :(
[07:43:43] yeah
[07:43:49] "there's no facility for dynamic matching of table names to granted privileges"
[07:44:15] yeah, so what you have in the patch is fine
[07:44:29] now my justification for that would be: 1) this is an internal access, and it is to a dump host, it can leak info but doesn't access actual mw tables
[07:44:39] 2) it is temporary until we do the scanning with a mw script
[07:44:46] and those hosts do not have any public IP anyways, right?
[07:44:51] nope
[07:45:24] but can you see why I wanted to talk to you, and not just take your +1?
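Since the docs quoted above rule out wildcards for table-level grants, the "write 1000 db names" alternative would have to be generated from a list of wikis rather than typed by hand. A minimal sketch in Python of what that generation could look like; the account name, host pattern and wiki list are hypothetical placeholders and are not taken from the actual patch:

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote a MySQL/MariaDB identifier, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def table_grants(wikis, table, user, host, privilege="SELECT"):
    """Build one table-level GRANT statement per wiki database.

    Duplicated database names are collapsed and the output is sorted,
    so regenerating after a wiki addition gives a stable, diffable list.
    """
    account = f"{quote_identifier(user)}@{quote_identifier(host)}"
    return [
        f"GRANT {privilege} ON {quote_identifier(db)}.{quote_identifier(table)} TO {account};"
        for db in sorted(set(wikis))
    ]


if __name__ == "__main__":
    # Hypothetical inputs: a real list would come from a centralized wiki list.
    for statement in table_grants(["enwiki", "commonswiki", "labswiki"], "image",
                                  "example_mediabackup", "10.%"):
        print(statement)
```

The function only builds statement text; applying the statements, and revoking grants for wikis that no longer exist, would still need to be handled separately.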
[07:45:37] it is not my proudest patch :-(
[07:45:38] Yep, totally
[07:45:46] there is another topic
[07:45:58] this should be less problematic
[07:46:27] on the grants for production, they are not referenced by puppet
[07:46:37] unlike other misc servers
[07:46:46] that's https://gerrit.wikimedia.org/r/c/operations/puppet/+/712993/2/modules/role/templates/mariadb/grants/production-mediabackupstemp.sql.erb
[07:47:08] they are just documentation
[07:47:14] yeah, makes sense
[07:47:45] ok, so I wanted basically to be sure you were aware of all the compromises (codeword for bad decisions) I made
[07:47:50] 0:-)
[07:48:30] hopefully I can soon slowly improve them
[07:48:47] yeah, I think that's all we can do for now
[07:48:49] and also in the future
[07:48:52] To keep moving
[07:49:04] we could maintain a list of grants dynamically on a system
[07:49:16] e.g.
the list of wikis on a centralized location
[07:49:30] to do what you asked
[07:49:44] yeah, although that's a very unique case
[07:49:47] thank you for the help
[07:50:38] yes, but in general, things are going to be dynamic: list of hosts
[07:50:43] clients, servers, etc.
[07:51:12] yes, that for sure
[07:51:23] but I mean generating ad-hoc grants (i.e. per-table grants)
[07:51:37] yeah, that is right now not normal
[07:52:11] but if things were automated in a super-duper system, it wouldn't be too crazy :-D
[07:52:20] yeah indeed
[07:52:32] I think some of the current grants are broad just for the sake of maintenance
[07:52:57] as, in general, not looking at anything in particular
[07:53:48] yeah, once we have that system we will be able to do a major clean up
[07:58:32] [I know it's Friday, but...] it wasn't clear to me from Fabian's mail what kind of storage they wanted (archive, filesystem, object store, database...); is that context already well-known?
[08:45:50] don't worry, Emperor, I think the context was me offering the use of media backups metadata to some researchers
[08:46:14] so I answered based on that, but for anything more than that, a manager should definitely be involved
[10:52:45] time to download more disk from the Internets :-): https://grafana.wikimedia.org/d/000000607/cluster-overview?viewPanel=2849&orgId=1&var-site=eqiad&var-cluster=backup&var-instance=All&var-datasource=thanos&from=1631703147145&to=1631875947145
[10:53:34] that link doesn't work for that dashboard
[10:53:38] I meant to share: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=backup1004&var-datasource=thanos&var-cluster=backup&from=1631865199479&to=1631875999479&viewPanel=12
[11:36:37] What's that in amount of TB used?
[11:38:11] right now around 330TB in total
[11:38:30] my initial calculations were of 380TB total
[11:38:36] (when finished)
[11:39:12] jeeez
[11:39:27] (that's why it take so much over ethernet :-))
[11:40:22] *takes
[12:03:23] we just need faster ethernet ;-p
[12:09:38] Ceph nodes at Sanger had ConnectX-5 cards, 2x100Gb ports
[12:10:22] to be fair, I am more bottlenecked by not doing a lot of production requests than by ethernet
[12:11:14] bandwidth peaked at 160MB/s
[12:11:24] or maybe, write speed on slow HDs
[12:12:07] that's a bit misleading, 160MB/s, per host
[12:20:54] 122 GB on the db already, which was within the range I thought it was going to be (200GB)
[13:31:57] Emperor: Happy to discuss requirements for faster network at any time! Unlikely we'd be able to deliver 100G to hosts in the short-term, although our plans are to only use SFP28 based top-of-racks for the next expansion, which support 25G optics (and potentially we can bond multiple interfaces if needed).
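For scale, the 160MB/s per-host peak and the 330TB/380TB totals quoted above can be put in link terms with some back-of-the-envelope arithmetic. A rough sketch in Python, assuming decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), a 10 Gbit/s link for comparison, and a single stream at the peak rate; the log does not say how many hosts transfer in parallel, so the duration is only an upper-bound style illustration:

```python
def mb_per_s_to_gbit_per_s(mb_per_s: float) -> float:
    """Convert decimal megabytes per second to gigabits per second."""
    return mb_per_s * 8 / 1000


def link_utilisation(mb_per_s: float, link_gbit_per_s: float) -> float:
    """Fraction of a link's nominal capacity used at the given throughput."""
    return mb_per_s_to_gbit_per_s(mb_per_s) / link_gbit_per_s


def transfer_days(terabytes: float, mb_per_s: float) -> float:
    """Days needed to move `terabytes` at a sustained `mb_per_s` (single stream)."""
    seconds = terabytes * 1_000_000 / mb_per_s  # TB -> MB, then divide by rate
    return seconds / 86_400


if __name__ == "__main__":
    print(f"{mb_per_s_to_gbit_per_s(160):.2f} Gbit/s")       # throughput in link units
    print(f"{link_utilisation(160, 10):.1%} of a 10G link")   # share of nominal capacity
    print(f"{transfer_days(380 - 330, 160):.1f} days for the remaining 50TB")
```

At that rate a host uses roughly an eighth of a 10G link, so raw link speed alone would not explain the transfer time.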
[13:32:29] Jynus: At 160MB/s sounds like it's not maxing out 10G yet, so perhaps there are other things at play also.
[13:36:57] topranks, there are indeed
[13:37:31] to be fair, I am more bottlenecked by not doing a lot of production requests than by ethernet
[13:37:45] or maybe, write speed on slow HDs
[13:38:38] Yeah write speed on the disk is an obvious one alright. We also have a not insignificant number of drops on our ASW->CR links in eqiad, which will impact TCP throughput between rows and sites.
[13:38:53] this project is a bit different than past ones
[13:39:00] we have a lot of overhead per object
[13:39:30] many small objects, and that is not only on transmission- I have to hash every object twice, classify it, etc.
[13:40:02] Ok gotcha, so there is a lot more involved in each, cpu, memory access etc. I'm sure.
[13:40:15] although as expected CPU >> network overhead, here by overhead I mean things that are layer 7 reasons :-)
[13:40:15] We're currently making the case for spend on new network kit / a new design to support growth.
So do let us know about any network pain points or improvements you'd like to see, it will help us make our case :)
[13:40:32] you already know my pains with raw bandwidth :-)
[13:40:53] BTW, should we check now that we are on eqiad, if the issue is still ongoing?
[13:41:14] I can have a look at next week's graphs to see how we are doing, I believe you did some work there
[13:41:16] Actually yes. We do still have discards unfortunately, but they are less than before, so maybe we have seen an improvement.
[13:41:43] Yeah we re-allocated the buffer partition, which will mean less drops, although it's not entirely eliminated the problem.
[13:41:44] I will be able to tell next week then, big transmissions happen I think the night from Tuesday to Wednesday
[13:42:00] and will update the ticket with what I can find
[13:42:11] great thanks!
[13:42:22] thanks to you, for helping us so much!
[13:42:38] you are "saving (cross-dc) backups" in a way