[05:27:09] marostegui: I love this new memory leak: every time db load increases, memory on appservers goes weeeeee
[05:27:30] T297667
[05:27:30] T297667: mysqli/mysqlnd memory leak - https://phabricator.wikimedia.org/T297667
[05:29:16] yeah, just saw that
[06:53:58] marostegui: I'm sure I found the problem. Fixing it OTOH...
[06:58:51] Oh cool!
[08:18:11] And found the underlying problem and made two patches. Let me find someone to merge them
[09:10:29] Thanks Amir1!
[09:32:42] afk for a bit
[12:59:21] PROBLEM - MariaDB sustained replica lag on m1 on db1117 is CRITICAL: 39.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[13:15:53] RECOVERY - MariaDB sustained replica lag on m1 on db1117 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[13:27:18] back
[13:28:53] :(
[15:47:47] Amir1: can you review some of the tasks on the Refine column? I think we have 2-3 that can be merged into the last one you created
[15:48:08] or else we are going to end up with very similar tasks trying to address the same things, I reckon
[15:48:17] marostegui: sure sure, let me push the ubn thingy out first
[15:48:28] marostegui: so technically they are not the same thing
[15:48:48] absolutely! no rush!
[15:56:06] marostegui: ooof https://grafana.wikimedia.org/d/000000278/mysql-aggregated?orgId=1&refresh=1m&from=now-6h&to=now
[15:58:01] Amir1: I might be blind, but I don't see anything specific?
[15:58:16] I just pushed the change, the load is going down
[15:59:50] ah ok, still not seeing a big pattern change, let's give it a few mins
[16:00:41] yeah, it'll take half an hour for the cache to actually warm up
[16:09:15] Amir1: will this affect external store? https://grafana.wikimedia.org/d/000000278/mysql-aggregated?orgId=1&refresh=1m&from=now-6h&to=now&var-site=eqiad&var-group=core&var-shard=es5&var-role=All
[16:09:22] just checking if that was expected
[16:20:45] it looks back to normal
[16:34:25] marostegui: sorry, came back from a meeting; no, it shouldn't :D
[16:37:20] it hasn't fully recovered yet, but that's partially because all of the caches need to warm up. The full effect will be visible in a day (given the TTL of a day)