[08:34:54] FYI, Last snapshot for s4 at codfw (db2139) taken on 2022-08-07 21:31:34 is 1711 GiB, but the previous one was 1857 GiB, a change of -7.8 %
[10:30:39] I amended the patch and asked you about 2 more, probably leftovers of old maintenance
[11:00:57] I will be around to make sure there are no surprise notifications with the puppet changes
[13:56:41] I think I know what the issue is: when I plug my earphones into my laptop, it expects them to have a mic too, and that wasn't the case
[13:56:55] (I normally don't use my laptop for meetings)
[13:57:53] urandom: this is what I meant https://www.mediawiki.org/wiki/Manual:Sql.php
[13:58:25] it still provides normal mariadb connections, but hides password management and configuration behind a wrapper
[13:58:45] maybe something like that could be set up for data engineering & cassandra
[13:59:01] (I don't know the details, it was a blind suggestion)
[13:59:44] actually I think I meant to link to: https://www.mediawiki.org/wiki/Manual:Mysql.php
[14:04:09] jynus: so users can execute the script, but not read LocalSettings.php?
[14:05:21] yeah, it doesn't impede access, but allows us (for some meaning of us) to change everything around it: grants, config, real servers, etc
[14:05:44] it abstracts the connection so users don't have to know the server details/changes
[14:06:31] oh, so they can read them, but don't directly depend on them
[14:06:32] that is one example; a similar model with a different implementation exists for key authentication:
[14:06:51] ssh key access is not provided directly, but a service is given that provides the access
[14:07:11] search for keyholder on wikitech (that is focused on ssh access)
[14:07:37] it makes it MUCH easier to do the admin bits, after abstraction
[14:07:55] and users keep having essentially the same access
[14:08:06] ideally, I'd like for the credentials to be opaque to the user/developer, as is the case for services that once deployed have their credentials templated from private.git
[14:08:26] yes, that is the idea
[14:08:41] people cannot lose passwords they don't have in the first place!
[14:09:03] but keep the privileges they need
[14:10:48] in the absence of that, just having a canonical configuration -even if transparent- would be an improvement...
[14:10:58] yeah
[14:11:07] but we're talking about 3 different kinds of services that haven't yet been configured in this way
[14:11:30] my intention with this is to give you inspiration
[14:11:58] from other places where this was done: mw database access, ssh access for deployment/remote execution
[14:12:12] and a 3rd I can think of is clouddbs
[14:12:31] where there was a dot file provided for each user
[14:12:41] and later I think they integrated a proxy service for HA
[14:13:06] again, mentioning this not to tell you to do this, but in case it helps as a reference
[14:13:31] (you are not alone :-D)
[14:13:36] no no, examples help, thank you!
[14:54:33] re. T314789, please feel free to yell directly at me on IRC ^^ (also Data Persistence *is* under SRE right..?)
[14:54:33] T314789: SRE/Data Persistence consultation — use of FSFileBackend for caching audio files - https://phabricator.wikimedia.org/T314789
[15:00:14] TheresNoTime: sorry, I have no context, but I would make sure to add service ops for the app layer and then data persistence distributed storage
[15:02:24] I think that is the kind of usage that wants to be supported and there is ongoing work for that, but I have no idea if it is offered yet. Matthew or Filippo will know; add them to the ticket
[15:04:47] that would be #serviceops and #SRE-swift-storage tags
[15:10:09] Thank you! :)
[15:37:22] PROBLEM - pt-heartbeat-wikimedia service on es2021 is CRITICAL: CRITICAL - Expecting active but unit pt-heartbeat-wikimedia is failed https://wikitech.wikimedia.org/wiki/MariaDB/pt-heartbeat
[15:52:27] Amir1: was it just a shutdown or did it power crash? If the latter, I can do a data check
[15:53:17] jynus: I think it's a power crash
[15:53:25] T314559
[15:53:25] :-(
[15:53:25] T314559: es2021 (B3) lost power supply redundancy - https://phabricator.wikimedia.org/T314559
[16:28:43] RECOVERY - pt-heartbeat-wikimedia service on es2021 is OK: OK - pt-heartbeat-wikimedia is active https://wikitech.wikimedia.org/wiki/MariaDB/pt-heartbeat
[16:30:54] Amir1: unless you tell me not to, I will be doing a data check tomorrow, as content servers are quite key for our infra. It is unlikely any data corruption happened, but I want to make that a 0% probability
[16:31:15] jynus: I was about to ask you to do it. Thanks!
[16:31:25] (and it is part of my job to make sure data is consistent and not lost)
[16:34:20] this is now a bit late, but it would help maintenance to create a new cluster from time to time to keep table sizes limited, even within the same server
[16:34:42] e.g. regular backups would shrink in size as most of the data would be on read only clusters
[16:35:10] this is too ambitious to be here on IRC, so I am just speaking aloud, and not a big deal
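
Note on the 13:57 to 14:13 discussion: a minimal sketch of the "wrapper hides the credentials" idea, not how mysql.php or keyholder is actually implemented. Everything here is an assumption made for illustration: the `connect` function, the `Connection` type, the INI layout, and the `cassandra-analytics` service name are all hypothetical. Callers ask for a logical service name; host, user and password come from an admin-managed configuration, so admins can change grants or servers without users handling credentials themselves.

```python
import configparser
from dataclasses import dataclass, field

# Hypothetical admin-managed configuration. In practice this would live in a
# file maintained by admins; it is inlined here only to keep the sketch
# self-contained.
EXAMPLE_CONFIG = """
[cassandra-analytics]
host = cassandra.example.invalid
user = analytics_ro
password = not-a-real-password
"""


@dataclass(frozen=True)
class Connection:
    """Connection parameters resolved by the wrapper (hypothetical type)."""
    host: str
    user: str
    # Excluded from repr() so the secret does not leak into logs or tracebacks.
    _password: str = field(repr=False)


def connect(service: str, config_text: str = EXAMPLE_CONFIG) -> Connection:
    """Resolve a logical service name to connection parameters.

    Users only pass the service name; server details and credentials stay in
    the configuration that admins control.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if service not in parser:
        raise KeyError(f"unknown service: {service}")
    section = parser[service]
    return Connection(section["host"], section["user"], section["password"])
```

A user would call `connect("cassandra-analytics")` and pass the result to whatever client library is in use. Moving the service to another host then only requires an edit to the admin-managed configuration.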