[09:03:38] 10DBA: ipb_timestamp is varbinary(14) in old wikis while being binary(14) in the code since 2007 - https://phabricator.wikimedia.org/T278619 (10Kormat)
[09:08:00] 10DBA: ipb_timestamp is varbinary(14) in old wikis while being binary(14) in the code since 2007 - https://phabricator.wikimedia.org/T278619 (10Kormat)
[09:11:19] 10DBA, 10Data-Services: Prepare and check storage layer for dagwiki - https://phabricator.wikimedia.org/T286683 (10LSobanski) p:05Triage→03Medium Thanks, let us know when the database is created, so we can sanitize it.
[09:11:31] 10DBA, 10Data-Services: Prepare and check storage layer for banwikisource - https://phabricator.wikimedia.org/T286684 (10LSobanski) p:05Triage→03Medium Thanks, let us know when the database is created, so we can sanitize it.
[09:16:48] 10DBA: ipb_timestamp is varbinary(14) in old wikis while being binary(14) in the code since 2007 - https://phabricator.wikimedia.org/T278619 (10Kormat)
[09:21:01] 10DBA: ipb_timestamp is varbinary(14) in old wikis while being binary(14) in the code since 2007 - https://phabricator.wikimedia.org/T278619 (10Kormat)
[09:59:27] Ohia! :)
[09:59:41] 👻
[09:59:43] In Wikibase we just finished adding 2 new db groups that get sent to mediawiki
[09:59:51] "from-client" and "from-repo"
[10:00:16] 😮
[10:00:18] This comes from a very very long actionable from incidents, and also from wanting to know more about the db traffic from wikibase code
[10:00:50] So, in theory now, all wikipediay wikidata traffic vs all wikidatay traffic (db traffic) could be split to dedicated hosts
[10:01:12] ie, if wikidata.org repo gets bad code that kills the dbs, wikipedia could in theory continue to operate etc
[10:01:18] https://phabricator.wikimedia.org/T263127
[10:01:32] i've been lobbying to _get rid_ of db groups
[10:01:40] so you're now my mortal enemy
[10:01:55] hahahhaaa
[10:02:36] (the group bit itself was not much work) we had to do a refactoring for other reasons
[10:02:50] so if we do decide that we want them to go away or not use them, that can also be fine
[10:04:49] the tldr of this is that it gives us the option to segregate the traffic of 2 "different apps", so that one will not take down the other; there are certainly other ways to do that too
[10:05:28] i like it on a conceptual level
[10:05:36] i just hate it on a practical level. ;)
[10:05:38] :P
[10:05:51] db groups are a giant headache for maintenance for us
[10:06:04] and make automating a lot of stuff basically infeasible
[10:07:08] in an "ideal" world, all replicas would be ~identical drones
[10:07:24] so the only decision is "do we have enough capacity across the rest of the replicas to take this specific one down for maintenance?"
[10:07:36] yup
[10:08:22] I guess another framing of this would be: Wikipedia should have a read-only service (backed by whatever resources are needed) for DB access to wikidata data, and Wikidata should have its own master / replica services for its application
[10:08:27] said ideal world would probably also have a dedicated loadbalancer service between MW and the dbs
[10:09:06] s/MW/all the things that want to hurt us/
[10:09:16] in an ideal world, in some of our opinions, this direct DB access shouldn't happen anyway, and WP should access WD data via the API
[10:10:23] right :)
[10:10:36] cool, well, sounds like you may not want to use these groups, which is fine!
           I forget which incident this came from exactly so can't link to it for context
[10:10:54] As they are there, is there any way to get any metrics around which db groups are used at all?
[10:11:19] Another half-reason it was added was to see what % of requests to s8 actually come from wikidata itself, vs from all the other wikimedia sites
[10:11:20] on our side we have no way of telling. i'd _hope_ the MW side has relevant metrics, but i wouldn't have any idea where to look
[10:11:32] ack!, thanks!
[10:12:09] (which is one of the complications for us. "ok, this host is in these groups, and getting 15k QPS of queries. which groups are responsible for what fraction of traffic? uhh... nfc"
[10:12:11] )
[10:12:26] yeah, that's one of the other things we would love to know
[10:12:38] hypothetically right now wikidata only accounts for 5% of traffic to s8 etc :P
[10:12:42] but we have no idea
[10:12:45] maybe it is 90%
[10:13:10] would love to know the answer to that
[10:16:05] turns out black boxes are only good ideas on aircraft ;)
[11:56:25] 10DBA: ipb_timestamp is varbinary(14) in old wikis while being binary(14) in the code since 2007 - https://phabricator.wikimedia.org/T278619 (10Kormat)
[11:56:54] 10DBA, 10Datasets-General-or-Unknown, 10Patch-For-Review, 10Sustainability (Incident Followup), and 2 others: Detect object, schema and data drifts between mediawiki HEAD, production masters and replicas - https://phabricator.wikimedia.org/T104459 (10Kormat)
[11:56:59] 10DBA: ipb_timestamp is varbinary(14) in old wikis while being binary(14) in the code since 2007 - https://phabricator.wikimedia.org/T278619 (10Kormat) 05Open→03Stalled Stalling until we switch back to eqiad.
[14:51:15] addshore: I looked at those numbers in december, Wikipedia is around 60%
[14:51:33] through performance schema
[14:51:38] aaaah nice
[14:52:29] my suggestion is to actually move terms to its own dedicated group, that would take good advantage of the InnoDB buffer pool
[14:53:02] and it would also be easily doable once wikidata grows too big and we have to put terms in its own section, but that's for later
[15:21:11] jynus: hi, do you know when the next backup of the image table of commons is?
[15:21:23] next?
[15:21:40] next start or is it hot backup?
[15:21:57] there should be one in a few hours, around 19h UTC or so
[15:22:08] is it daily?
[15:22:27] almost daily, we skip a few days for logical backups or backing up remote storage
[15:23:02] which also increases the retention with less space used
[15:23:26] noted, when you have time or if possible can you check if it started to get smaller?
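(The "around 60%" figure above came from performance_schema. Below is a minimal sketch of one way such a traffic breakdown can be produced; the per-account grouping is an assumption for illustration, not necessarily the query that produced the December numbers.)

```sql
-- Hedged sketch: aggregate statement counts and total statement time per
-- account from performance_schema, as one possible way to attribute s8
-- traffic to its callers. Assumes performance_schema is enabled;
-- sum_timer_wait is reported in picoseconds, hence the division by 1e12.
SELECT
    user,
    host,
    SUM(count_star)                      AS statements,
    ROUND(SUM(sum_timer_wait) / 1e12, 1) AS total_time_s
FROM performance_schema.events_statements_summary_by_account_by_event_name
GROUP BY user, host
ORDER BY statements DESC
LIMIT 20;
```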
[15:23:27] ah, but if rows have been removed
[15:23:35] you will want the logical backups
[15:23:48] which happen weekly on mon->tues nights
[15:23:49] the rows have changed, not removed
[15:24:07] yeah, but the table size won't have changed unless optimize is run
[15:24:16] it will show only on the logical backups
[15:24:40] okay then, we have to wait for a couple of weeks to see
[15:24:44] technically also on the physical after compression
[15:24:50] or logical backup
[15:25:01] but the logical process is now very slow
[15:25:04] or was
[15:25:09] let me give you numbers
[15:25:58] from what I can see the file usage didn't increase on s4 master for days now https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=12&from=now-90d&orgId=1&refresh=5m&to=now&var-server=db2090&var-datasource=codfw%20prometheus%2Fops&var-cluster=mysql
[15:26:07] independently of that, if lots of space has been freed
[15:26:18] consider opening a conversation with dbas to potentially run optimize
[15:26:48] they will know if/when to do that, if necessary
[15:26:53] yeah, it'll definitely be needed, my assumption is that it'll go from 380GB (compressed) to 300GB
[15:27:10] but not right now, so far only 5% is done
[15:27:22] ah, so it is WIP, I see
[15:27:27] nope sorry, from 300GB to 80GB
[15:27:29] I don't have much context, sorry
[15:27:37] 300GB will be gone
[15:27:44] I would like to be aware of the progress
[15:27:57] as it will affect backups of s4 but also backups of es* hosts, I guess
[15:28:00] oh definitely, I sent an email to ops-l on Monday
[15:28:07] yeah, I saw it
[15:28:21] I mean add me to a ticket if there is one for the details of progress
[15:28:39] T275268
[15:28:40] T275268: Address "image" table capacity problems by storing pdf/djvu text outside file metadata - https://phabricator.wikimedia.org/T275268
[15:28:50] already there
[15:29:03] that way I can monitor the backup time and make sure es backups have enough space, etc.
[15:29:06] thanks
[15:29:55] I am getting you the backup times for s4
[15:29:57] one thing is that in the old system it wasn't compressed in the database (only on disk), but the data in es is compressed on the application side
[15:30:00] I can paste them there if you want
[15:30:07] ah, cool
[15:30:27] so it shouldn't be that big
[15:30:39] for backup storage everything is compressed, but the download took a long time
[15:31:08] as image has text PKs, the tool is not clever enough to parallelize its dump
[15:31:16] unlike other tables
[15:31:54] I have a patch to add img_id but the table is basically not alterable at this size
[15:32:57] once this is cleaned (and then djvu, which would remove an extra 30GB) I will work on the img_id part
[15:35:48] having the backup numbers would be great, here or in the ticket
[15:41:48] working on it :-)
[15:43:20] Thanks!
[15:43:28] 10DBA, 10Commons, 10MediaWiki-File-management, 10MW-1.37-notes (1.37.0-wmf.14; 2021-07-12), and 4 others: Address "image" table capacity problems by storing pdf/djvu text outside file metadata - https://phabricator.wikimedia.org/T275268 (10jcrespo) I offered on IRC to paste the "backup time" of s4, this is...
[15:43:36] ^I think this is sort of what you want
[15:43:49] I've put the queries so anyone can do it later, doesn't have to be me
[15:44:00] we store all that data indefinitely for now
[15:45:33] there are some important *s: the size of image on dumps is post-compression and on snapshots is pre-compression
[15:45:43] although there could be InnoDB compression on the source
[15:46:21] I wonder if there was some related "image table is too slow to backup" task
[15:46:28] that I could set as a dependency of that task
[15:47:47] 10DBA, 10Znuny, 10database-backups: OTRS database is "too large" - https://phabricator.wikimedia.org/T138915 (10jcrespo)
[20:00:46] 10DBA, 10Data-Services: Prepare and check storage layer for dagwiki - https://phabricator.wikimedia.org/T286683 (10Urbanecm) i'm sorry, the bot creating the tasks created a duplicate. This is actually fully done as {T284456}.
[20:00:59] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for dagwiki - https://phabricator.wikimedia.org/T284456 (10Urbanecm)
[20:01:13] 10DBA, 10Data-Services: Prepare and check storage layer for dagwiki - https://phabricator.wikimedia.org/T286683 (10Urbanecm)
[20:01:34] 10DBA, 10Data-Services: Prepare and check storage layer for banwikisource - https://phabricator.wikimedia.org/T286684 (10Urbanecm)
[20:01:37] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for dagwiki - https://phabricator.wikimedia.org/T284456 (10Urbanecm)
[20:45:02] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for shiwiki - https://phabricator.wikimedia.org/T284928 (10nskaggs) 05Open→03Resolved Tested from toolforge.
[22:55:56] 10DBA, 10Toolhub, 10User-bd808: Discuss database needs with the DBA team - https://phabricator.wikimedia.org/T271480 (10bd808) a:03bd808 Adding the #DBA tag here as I think I'm ready to talk with the DBA team about deployment needs. The django backend has very similar storage needs compared to #striker. S...
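(For reference on the image-table cleanup discussed from 15:23 onwards: a minimal sketch, assuming the commonswiki schema name, of how the table's logical size and reclaimable free space could be checked, and of the OPTIMIZE step mentioned at 15:26. Whether and when to actually run the rebuild is the DBAs' call, as stated in the conversation.)

```sql
-- Hedged sketch: report the data, index and free (reclaimable) space of the
-- commonswiki image table. Space freed by shrinking rows only returns to the
-- filesystem after a table rebuild such as OPTIMIZE TABLE, as discussed above.
SELECT
    table_schema,
    table_name,
    ROUND(data_length  / 1024 / 1024 / 1024, 1) AS data_gb,
    ROUND(index_length / 1024 / 1024 / 1024, 1) AS index_gb,
    ROUND(data_free    / 1024 / 1024 / 1024, 1) AS free_gb
FROM information_schema.tables
WHERE table_schema = 'commonswiki'
  AND table_name   = 'image';

-- The rebuild itself; on a table of this size it is a long, disruptive
-- operation, so it is left commented out here and would be scheduled by the DBAs.
-- OPTIMIZE TABLE commonswiki.image;
```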