[12:22:08] something weird happened at 10:22, s1 mw query traffic doubled
[12:22:53] https://grafana.wikimedia.org/d/000000278/mysql-aggregated?orgId=1&refresh=1m&var-site=eqiad&var-group=core&var-shard=s1&var-role=All&from=1637043768628&to=1637065368630
[12:25:29] there was an increase in writes until 11:48, but after that reads stayed high - only on s1
[12:31:47] I see similar growth in memcache requests at the same time - it seems traffic-driven; there is a relatively big impact on app server latency - lots of new uncached requests?
[12:33:07] https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&from=1637056430575&to=1637061375321
[12:38:30] lots of loadSlotRecords, which tells me, without looking at logs, that someone may be scraping enwiki with lots of uncached requests
[12:40:47] Should we get someone from ServiceOps to take a look?
[12:43:08] Looking at the last 30 days, it's not an unusual spike.
[12:43:51] not at peak time, but we are supposedly not at peak yet
[12:44:33] plus some of those may be misleading - dumps are isolated to one or two hosts, same for backups
[12:45:00] not saying this is worrying yet, but something to keep in mind
[12:45:52] In that case it's back to my original question :)
[12:49:24] seems to be going down, so maybe it is "normal"
[12:51:15] sobanski, probably those spikes were less normal before last month: https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?viewPanel=26&orgId=1&from=1631632182121&to=1637066993888
[12:52:10] that's a baseline increase I wasn't used to
[12:55:51] nothing that cannot be solved with "moar s1 instances"
[13:08:52] I pinged Alex to see if it's something they're aware of.
[13:16:19] alternatively it could be useful, because sometimes a memcache issue can cause higher db traffic
[13:17:25] but based on the graph it looks like just an increase in uncached traffic
[13:59:45] mentioning this analytics backup issue here so you are aware (but no action needed from DBAs): https://phabricator.wikimedia.org/T284150
[14:00:00] (at least, not yet)
[16:14:53] The AHT Team is working on this investigation https://phabricator.wikimedia.org/T293263
[16:14:54] Some of the potential solutions were:
[16:14:54] A. Explore the possibility of not purging ip_blocks and find ways to reuse that table
[16:14:55] B. Create a new table tailored for this purpose
[16:14:55] However, both of these may potentially result in tables with a huge number of rows,
[16:14:56] so the team advised that we reach out to the DBA team to get a sense of what an acceptable table size is in terms of number of rows.
[16:35:08] hey, Tsepo, not sure if DBAs are around right now - the best way is for you to ask the question on the ticket; I just added the DBAs/architects so hopefully one can advise you when available
[16:54:18] Tsepo, IRC is best for real-time communication, but I think people will prefer a ticket comment for async consulting
[16:57:10] Tsepo: I will get to it asap
[16:57:45] Thank you everyone.
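
A minimal sketch of how the 12:38 suspicion (loadSlotRecords-style reads driving the extra uncached traffic) could be cross-checked against a replica's statement digests. performance_schema.events_statements_summary_by_digest is standard MySQL; the hostname, credentials, schema filter, and LIMIT/ordering below are illustrative placeholders, not WMF procedure.

# Sketch: list the busiest statement digests on an s1 replica to see whether
# a single read pattern (e.g. slot/revision lookups) dominates the growth.
# Host, user, password and the 'enwiki' filter are placeholders.
import pymysql

QUERY = """
SELECT DIGEST_TEXT,
       COUNT_STAR            AS executions,
       SUM_ROWS_EXAMINED     AS rows_examined,
       SUM_TIMER_WAIT / 1e12 AS total_seconds
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME = 'enwiki'
ORDER BY COUNT_STAR DESC
LIMIT 20
"""

def top_digests(host: str) -> list[tuple]:
    # Read-only connection to one replica; performance_schema is queried
    # fully qualified, so the default database only sets the session context.
    conn = pymysql.connect(host=host, user="readonly", password="...", database="enwiki")
    try:
        with conn.cursor() as cur:
            cur.execute(QUERY)
            return cur.fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    for row in top_digests("db-s1-replica.example"):  # placeholder hostname
        print(row)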
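
For the 16:14 question about acceptable table sizes, a minimal sketch of how the footprint of an existing table could be measured before the DBAs weigh in. information_schema.TABLES is standard MySQL, but TABLE_ROWS is only an InnoDB estimate, so an exact COUNT(*) is also taken; the hostname, credentials, and the 'ipblocks' table name (the chat's "ip_blocks") are placeholders to adjust to the actual schema.

# Sketch: estimate row count and on-disk size of a table, then take an exact
# count. Run against a replica, since COUNT(*) can be slow on huge tables.
import pymysql

def table_footprint(host: str, schema: str, table: str) -> tuple[int, float, int]:
    conn = pymysql.connect(host=host, user="readonly", password="...", database=schema)
    try:
        with conn.cursor() as cur:
            # Optimizer estimate plus data+index size from the data dictionary.
            cur.execute(
                """
                SELECT TABLE_ROWS,
                       (DATA_LENGTH + INDEX_LENGTH) / POW(1024, 3) AS size_gib
                FROM information_schema.TABLES
                WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s
                """,
                (schema, table),
            )
            approx_rows, size_gib = cur.fetchone()
            # Exact row count for the decision about reuse vs. a new table.
            cur.execute(f"SELECT COUNT(*) FROM `{table}`")
            (exact_rows,) = cur.fetchone()
            return int(approx_rows), float(size_gib), int(exact_rows)
    finally:
        conn.close()

if __name__ == "__main__":
    # Placeholder host, schema and table name.
    print(table_footprint("db-s1-replica.example", "enwiki", "ipblocks"))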