[00:10:34] RESOLVED: DiskSpace: Disk space centrallog2002:9100:/srv 3.939% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=centrallog2002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [18:51:34] FIRING: DiskSpace: Disk space centrallog2002:9100:/srv 3.967% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=centrallog2002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [19:43:50] cwhite: fun recursion bug :/ https://phabricator.wikimedia.org/T406170 [20:14:37] Krinkle: hmmm... I wonder what happens if we moved setLabel() to just before the stop()? [20:15:03] yeah that should work, that'll be the short term fix (I suggested that in CR) [20:15:10] I wonder if we could make it Just Work, though. [20:15:23] maybe start() keeps its own state until observe. [20:15:30] or maybe revisit the need for a cache [20:16:00] instrumented a pageview and a load.php request as example [20:16:11] https://performance.wikimedia.org/xhgui/run/symbol?id=68fbd7cf2c15650c3c6bcdb0&symbol=Wikimedia%5CStats%5CStatsFactory%3A%3AgetMetric [20:16:12] https://performance.wikimedia.org/xhgui/run/symbol?id=68fbd7cd077f975e9a2feacf&symbol=Wikimedia%5CStats%5CStatsFactory%3A%3AgetMetric [20:17:17] it seems the get/set adds about as much as we save from constructors, and presumably memory higher overall either way [20:17:27] object creation is pretty cheap nowadays [20:22:53] Possibly RunningTimer should be smarter about label intentions. [20:27:06] I don't think we can get away from a cache completely. If it's a smart cache (like StatsCache is now) or a simple array with MetricInterface instances, we still want to put the overhead of writing udp packets at the end of the request. [21:17:25] cwhite: yes, a buffer of observations. [21:17:46] I forgot that we store samples in the Metric object [21:17:47] makese sne [21:19:41] Looks like the Metric object also serves to enforce label pair consistency. If nothing else by proxy of them remaining from a previous invocation. [21:21:01] actually no, that's taken care of by fresh() [21:23:29] That part of the shared state is quite useful. Okay so not it's that simple to remove. I thikn logically there's a reasonable expectation that getCounter/getMetric doesn't share "surprising" state with a previous one, which is mostly true, given fresh(), except when the same metric has two overlapping calls and/or recursion. It feels like each one is thought of as separate with shared state being limited to buffering and [21:23:29] validation. [21:23:36] Anyway, food for thought. [21:25:17] potentially such validation could be done after the fact, e.g. when consuming the samples we could verify that samples with the same metric name set the same set of labels and a log a warning. That has the downside of not having the stack trace attributed to the call, and making it log-file only instead of e.g. php-xdebug on the web page at runtime. [21:26:02] so maybe a two-tier is justified one shared cache but then empeheral objects that only create one sample. [21:26:39] I suspect this model is inspiured by long-runing applications where yu're likely to set up a metric object and then increment it multiple times. [21:27:00] as opposed to CGI where a metric is rarely called twice or if it is, it'll go through the same code again to set up the labels. [21:33:45] Yeah, that's the case. Other Prometheus clients expect that you instantiate your metrics prior to doing anything else and then pass the instance(s) around if they're needed in other parts of the code. AFAICT, one cannot fetch a metric from the registry (StatsCache) in other clients. [21:34:07] In python, trying to redefine a metric once instantiated throws a ValueError. [21:38:13] It's a valid expectation that the labels will follow the RunningTimer. [22:51:49] FIRING: DiskSpace: Disk space centrallog2002:9100:/srv 2.776% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=centrallog2002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace