[07:01:42] hello folks
[07:01:52] kafka logging seems to have /srv partitions getting filled up
[07:02:47] most of the space used is for udp_localhost-{warning,info} partitions
[07:03:30] there is some space on the pvs, but it may not be enough for the long term, maybe some topics are getting too big
[07:05:24] ah yes I see that we have some special retention overrides already
[07:06:47] Topic:udp_localhost-warning PartitionCount:6 ReplicationFactor:3 Configs:retention.bytes=300000000000
[07:07:14] we could do the same for udp_localhost-info, it is currently ~440G (every partition I mean)
[07:09:46] https://wikitech.wikimedia.org/wiki/Kafka/Administration#Temporarily_Modify_Per_Topic_Retention_Settings
[07:12:18] basically
[07:12:19] kafka configs --alter --entity-type topics --entity-name udp_localhost-info --add-config retention.bytes=300000000000
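A minimal sketch of the retention change being discussed, assuming the `kafka` wrapper on the brokers passes these flags straight through to the stock kafka-configs tool; the describe and revert invocations are illustrative additions, not commands taken from the log:

    # check which per-topic overrides are already set
    kafka configs --describe --entity-type topics --entity-name udp_localhost-info

    # cap retention at ~300GB per partition, matching udp_localhost-warning
    kafka configs --alter --entity-type topics --entity-name udp_localhost-info \
      --add-config retention.bytes=300000000000

    # later, to fall back to the cluster-wide default, drop the override again
    kafka configs --alter --entity-type topics --entity-name udp_localhost-info \
      --delete-config retention.bytes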
[08:43:04] thanks for taking a look elukey, yeah uniform space retention SGTM
[08:43:21] godog: buongiorno
[08:43:34] buongiorno indeed!
[08:43:54] lemme know if it is ok to apply the change or if you prefer to wait
[08:44:09] in theory it should drop some hundreds of GBs once applied
[08:45:11] yeah I think we're good to go ahead
[08:45:25] molto bene, proceeding
[08:46:25] thank you, appreciate it
[08:46:28] applied
[08:46:30] <3
[08:49:10] super, now on kafka-logging1001 we have 1.3T of free space in /srv (+1TB of free space)
[08:52:01] very nice, yeah definitely will need some deeper investigation as space has been filling up since jan :|
[08:52:05] https://grafana.wikimedia.org/d/000000607/cluster-overview?orgId=1&var-site=eqiad&var-cluster=logstash&var-instance=All&var-datasource=thanos&viewPanel=2929&from=1638866928675&to=1646642928675
[13:43:09] speaking of unified kafka settings: https://phabricator.wikimedia.org/T276088
[13:43:20] :)
[13:49:18] godog: o/ yt?
[13:49:26] wanted to talk about https://phabricator.wikimedia.org/T294420#7754240
[13:50:47] ottomata: sure, reading
[13:51:05] q: is it okay to delete all metrics for a job in pushgateway when the job starts, and then re-emit metrics at the end?
[13:51:41] the most frequently these jobs will run is about every 10 minutes, but most of them will run about once per hour. Sometimes, in cases of failure or maintenance, they might be paused for many hours
[13:55:38] yeah I think that might be okay
[13:59:28] ottomata: re-reading the task it seems the unique label was mapping task id to its kafka topic + partition
[13:59:35] this bit
[13:59:37] > Gobblin reports metrics per job and per task, each task being the import of a single kafka-partition. We are gonna use the kafka-partition (topic name + partition number) as the prometheus tag by which we report our task metrics.
[13:59:48] yes
[13:59:51] and that works godog
[13:59:59] but only for some of the metrics
[14:00:01] not all of them
[14:00:44] there are some that we could hack an association with toppar
[14:00:58] ok, yeah looks like some metrics are just job specific, not kafka specific, that makes sense
[14:00:58] if they happen to be emitted in a task that emits metrics with toppars
[14:01:17] but, some are more global and emitted from the launcher process, afaict
[14:02:01] global to the job ?
[14:04:13] like gobblin_last_successful_run and gobblin_job_duration
[14:04:20] Hi folks - I just joined and therefore have no backlog - I'll ask silly questions if I don't understand something :)
[14:04:25] these are not emitted from a worker task
[14:04:28] so they have no kafka info
[14:04:36] hi joal ! thanks for joining
[14:04:41] joal: so'k, not much except for what is in the task
[14:04:54] these two comments: https://phabricator.wikimedia.org/T294420#7753856
[14:07:12] joal: i can't get toppar for all metrics; some metrics don't come from workers. actually hm
[14:07:17] i guess that is only the two i just mentioned
[14:07:20] the first two in the list
[14:07:23] hm
[14:07:34] but they are also not emitted from a task
[14:07:44] they only come from the launcher, so are only emitted once
[14:07:46] so that's ok
[14:07:49] indeed ottomata
[14:08:00] once per job-run
[14:08:03] i guess....perhaps i can get toppar for all the others? hmmmm actually no
[14:08:03] so ok
[14:08:05] i can't get them for metrics.
[14:08:10] you should!
[14:08:18] some metrics have toppar
[14:08:26] oh? unless i share what I get out of KafkaExtractorTopicMetadata
[14:08:30] which is a gobblin event
[14:08:37] maybe they are some we have not kept for simplicity
[14:08:45] afaik the only metrics with toppar are in the KafkaExtractorTopicMetadata gobblin event
[14:08:46] OH
[14:08:50] i only looked at what you had listed.
[14:08:50] hm
[14:09:14] so there might be a metric that has kafka toppar in the task, and i could hold onto that
[14:09:26] the thing is: you have toppar only on events/metrics related to the kafka extractor
[14:09:29] and emit it as a label/grouping key at end of task for metrics that don't have it
[14:09:32] yes
[14:09:43] That's exactly my idea ottomata
[14:09:46] i didn't look for kafka extractor metrics, but only because they weren't listed here
[14:09:53] i'll see if i can find that
[14:10:02] okay, if i can do that then I think i can make this work.
[14:10:04] hm
[14:10:06] thank you for reading my brain when I don't actually dump it properly :)
[14:10:16] grouping keys
[14:10:17] ok so
[14:10:32] i should just include topic, partition in grouping keys
[14:10:38] and that will suffice to solve my problem
[14:10:44] the only case where that might be weird
[14:10:53] is if we ever reduce partitions for a topic
[14:11:03] and that only happens if we delete and recreate a topic
[14:11:03] yup
[14:11:09] which is never unless we are being weird
[14:11:30] or
[14:11:45] if we remove a topic from the gobblin job
[14:11:47] which does happen.
[14:12:01] in that case the metrics for that toppar will never be re-pushed
[14:12:05] and will sit in PushGateway forever
[14:12:06] yup
[14:12:14] is ^ correct godog ?
[14:12:28] that's fine I think, the last success metric will eventually alert I think
[14:12:39] no, last success will still update
[14:12:40] but yeah it is correct afaik
[14:12:49] a single job ingests many kafka topics
[14:12:50] hm actually no ottomata, if we "PUT" instead of POST with a grouping key we should be good, no?
[14:12:59] not if toppar is in the grouping key
[14:13:23] we'd end up never calling PUT for the removed topic
[14:14:43] if the grouping key is "gobblin_events" or "gobblin_metrics", we'd erase all of the current values as we send new ones
[14:14:47] ottomata: --^
[14:15:04] right but
[14:15:12] that will call PUT in every task
[14:15:19] resulting in each task deleting the other's metrics
[14:15:28] AH! good call
[14:15:41] it would mean toppar as partition key indeed
[14:17:25] right, and in that case, we need to figure out what to do if a topic is no longer being ingested
[14:17:33] which will certainly
[14:17:34] happen
[14:17:58] i can't see what else to do other than call a big ol delete before pushing new metrics
[14:18:18] mwarf I get it :(
[14:18:23] so that would be a grouping key with jobname and, as you say, gobblin_events or gobblin_metrics
[14:18:52] delete before pushing metrics
[14:18:53] I misspoke earlier, so each group gets its push_time_seconds updated
[14:18:56] and using POST / pushAdd instead
[14:19:13] that's the metric from pushgateway itself, in other words we can detect stale groups
[14:19:18] which we want to do anyways
[14:19:49] hm
[14:19:53] okay
[14:19:55] hm
[14:20:11] so
[14:20:23] godog: does prometheus eventually stop pulling those out of pushgateway?
[14:20:34] or, are they just eventually deleted from pushgateway?
[14:20:37] godog: if we could have that automatically done that'd be so awesome :) it'd allow us to send "temporary" metrics that we knew would be deleted after!
[14:21:04] ottomata: no, we have to delete stale groups
[14:21:19] hm
[14:21:48] godog: this will happen fairly frequently, e.g. when an instrumentation event stream is decommissioned
[14:21:54] not every day
[14:22:01] but possibly every month or so
[14:22:20] and we don't do it manually
[14:22:35] instrumentation owners just remove the stream from stream config
[14:23:20] I see, in that case yeah in the background we could scan what's still in pushgateway and what's in the config and remove what's extra
[14:23:30] say once a day or sth
[14:24:17] (I'm verifying that "scan what's still in pushgateway" is a thing)
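Scanning Pushgateway is indeed a thing: newer Pushgateway releases expose a query API listing every pushed group, alongside the usual per-group DELETE. A hedged sketch of the once-a-day cleanup being floated here; the host, job name and grouping-key labels are placeholders:

    # list the grouping keys of everything pushgateway currently holds
    curl -s http://pushgateway.example.org:9091/api/v1/metrics | jq '.data[].labels'

    # drop a group whose topic is no longer in the gobblin/stream config
    curl -X DELETE \
      http://pushgateway.example.org:9091/metrics/job/gobblin_webrequest/topic/some_removed_topic/partition/0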
[14:24:18] godog: would it be easier just to have the job delete its own metrics when it runs?
[14:26:09] ottomata: you'd need a list of all grouping keys though to do so ?
[14:26:18] godog: not if we don't use toppar in the grouping key
[14:26:28] which we don't need to do if we always delete everything first
[14:27:01] ottomata: that wouldn't work for metrics would it?
[14:27:16] (I don't know what toppar is) but sure, in that case the grouping key is the job only?
[14:27:53] godog: sorry, toppar == kafka topic, kafka partition
[14:28:04] grouping key is job and, um, metric type
[14:28:07] which only has two values
[14:28:13] 'event' or 'metric'
[14:28:16] I have a suggestion: could we use "task-number" as a grouping key ottomata? this one at least we own and we know when we change them
[14:28:36] joal: that would work but we'd have the same problem if we change task numbers
[14:28:37] ottomata: ack, thanks
[14:28:56] then we'd have to manually clear out pushgateway somehow
[14:29:25] yes ottomata, but that would be much less frequent than kafka-topics I think - and, as it would be "us" doing it, we could update the push-gateway accordingly
[14:29:35] yeah that's true
[14:29:43] but...i guess... is there a reason not to do it the delete way?
[14:29:50] it is the most automated way afaict
[14:29:56] I think I don't understand the delete way ottomata :)
[14:30:20] if you don't add toppar as a PK, then your different tasks overwrite each other's metrics
[14:30:44] no
[14:30:54] we only delete once in the launcher for the job
[14:31:25] ...not sure how to accomplish that tho :)
[14:31:31] Ah - we scan the gateway for metrics in the form gobblin_*** and we drop them all
[14:31:38] i guess some code outside of the reporter
[14:31:44] no need to scan joal
[14:31:47] just delete for grouping key
[14:31:55] how do you know the grouping keys?
[14:32:03] https://prometheus.github.io/client_java/io/prometheus/client/exporter/PushGateway.html#delete(java.lang.String,java.util.Map)
[14:32:09] the grouping keys are only 2
[14:32:15] jobName, metric_type
[14:32:21] and metric type is either 'event' or 'metric'
[14:32:26] so it is a total of 2 delete calls
[14:32:46] well, not when the data is sent from different tasks
[14:32:57] we need a grouping key per task
[14:33:07] oh no, we'd use pushAdd in the tasks
[14:33:08] POST
[14:33:15] so
[14:33:23] start job: delete all metrics for job and event and metric reporter types
[14:33:33] I think I get it
[14:33:35] then, in each task, use pushAdd (and topic partitions in labels)
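A minimal sketch of that flow against the Java client linked above: two deletes in the launcher (one per grouping key), then pushAdd per task with topic/partition carried as plain labels. The address, job name, grouping-key label name and gauge are placeholders, not the real Gobblin reporter code:

    import io.prometheus.client.CollectorRegistry;
    import io.prometheus.client.Gauge;
    import io.prometheus.client.exporter.PushGateway;
    import java.io.IOException;
    import java.util.Collections;

    public class GobblinPushSketch {
        // placeholder pushgateway address and job name
        private static final PushGateway PG = new PushGateway("pushgateway.example.org:9091");
        private static final String JOB = "gobblin_webrequest";

        // launcher, at job start: wipe whatever the previous run pushed, one
        // delete per grouping key, so toppars we no longer ingest don't linger
        static void deletePreviousRun() throws IOException {
            PG.delete(JOB, Collections.singletonMap("metric_type", "event"));
            PG.delete(JOB, Collections.singletonMap("metric_type", "metric"));
        }

        // each task, at task end: pushAdd (HTTP POST) rather than push (PUT),
        // with the toppar as labels on the series; whether same-named series
        // from other tasks survive the POST is the name-vs-label question
        // raised just below
        static void pushTaskMetrics(String topic, int partition, double recordsPulled) throws IOException {
            CollectorRegistry registry = new CollectorRegistry();
            Gauge pulled = Gauge.build()
                    .name("gobblin_kafka_records_pulled")   // hypothetical metric name
                    .help("Records pulled from one kafka topic-partition in the last run")
                    .labelNames("topic", "partition")
                    .register(registry);
            pulled.labels(topic, String.valueOf(partition)).set(recordsPulled);
            PG.pushAdd(registry, JOB, Collections.singletonMap("metric_type", "metric"));
        }
    }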
[14:34:02] IIRC "POST" doesn't override when metrics have different names - not sure if labels come into play here
[14:34:31] in my tests pushAdd only adds metrics, and a metric is distinct with its labels
[14:34:50] I think I get it too now btw
[14:35:37] haha great! but will it work okay? will pushgateway/prometheus be mad if it scrapes while metrics are deleted?
[14:35:47] the doc says "POST works exactly like the PUT method but only metrics with the same name as the newly pushed metrics are replaced (among those with the same grouping key)" --> it depends if name here means "full metric name including labels" or just "metric-name"
[14:36:19] i'm pretty sure labels matter; i don't see how i could have ever used pushAdd and seen metrics for multiple topics otherwise
[14:36:28] (pushAdd == POST iiuc)
[14:36:37] --^ true
[14:36:41] ottomata: I think the views from prometheus' perspective are atomic so shouldn't be a problem, but at the same time we're building experience together on pushgateway
[14:37:16] ottomata: you have pushAdd only once per job, all metrics at once :)
[14:37:23] once per task
[14:37:28] Ah!
[14:37:40] oh
[14:37:42] although true
[14:37:48] in my test i'm only running with one task!
[14:37:52] i will need to test that then.
[14:38:09] oh no
[14:38:11] there are 2 pushes
[14:38:14] and, weren't you having issues with not having metrics for tasks as of now?
[14:38:16] in the launcher
[14:38:20] and in the task
[14:38:28] ottomata: but with different metric names
[14:38:32] ?
[14:38:45] when i was using pushAdd (POST) everything was working (for events at least)
[14:38:52] yup
[14:39:17] for me the question is about what "name" means in the gateway doc
[14:39:22] godog: what does prometheus do if it scrapes something and metrics it previously scraped are now gone?
[14:39:46] okay, so i'll proceed with this idea for now and see how it goes
[14:39:47] ottomata: nothing, the metric is gone
[14:40:17] godog: ok, will it be weird if the scrape happens in the middle of the job run and the metric is gone?
[14:40:47] i guess it doesn't matter, because we'll only be alerting on latest values stored in prom, and as long as the job finishes we should be okay
[14:41:06] ottomata: the dashboards will show the gap for sure, potentially even be considered different metrics (not sure, say displayed in different colors in grafana)
[14:41:08] most of the time pushgateway will have metrics to scrape (and usually they will be unchanging...until the next job run)
[14:41:14] hm okay
[14:44:17] taking a break, bbiab
[14:45:10] k thanks godog will update task with discussion
[14:48:26] joal: and just checking, IF we needed task number (we don't i think), we'd have to parse it out of the taskId, right?
[14:48:49] correct ottomata
[14:51:57] k lemme know if https://phabricator.wikimedia.org/T294420#7757004 looks like a good summary of our convo
[14:55:22] ottomata: adding my note about the name/label potential issue to the ticket
[14:56:10] k
[14:56:31] yeah right i need to run multiple tasks to test that for sure
[14:56:42] will try to test that first before going too far :)
[14:59:49] right and joal ok, if that is true, we can use toppar in groupingkey
[14:59:53] but figuring out how to delete will be harder
[14:59:55] we'd need to scan
[15:00:28] indeed ottomata - therefore possibly using task_id - making it "on our side" only
[15:00:33] right.
[15:00:44] and wait, a task # only ever works on a single toppar?
[15:01:53] nope ottomata
[15:01:56] oh
[15:02:01] the opposite is true
[15:02:09] task 1 may work on many toppars?
[15:02:24] Ah! actually no my bad ottomata
[15:02:33] you're right: task = toppar for us
[15:02:44] okay
[15:02:56] same worker can work multiple tasks, 1 task per toppar
[15:03:05] k
[15:03:20] need to drop for kids - back at standup :)
[15:04:08] k ty joal
[15:10:42] yeah summary LGTM
[15:16:30] heh joal i suppose if we have to use task # in grouping key, we could just always do a big giant delete for a large number of tasks
[15:16:38] and not have to think about it if we reduce them manually
[15:16:47] so always delete grouping keys for tasks up to 1000 or something
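If that task-number fallback were ever needed, the blanket delete could just loop the same client call over a generous range of task numbers. A sketch reusing the imports from the block above; the "task" label name and the 1000 ceiling come from the idea being floated here, not from any real configuration:

    // launcher-side cleanup for the task-number-in-grouping-key fallback:
    // delete every (job, task) group up to an arbitrary ceiling; deleting a
    // group that was never pushed is harmless
    static void deleteAllTaskGroups(PushGateway pg, String job) throws IOException {
        for (int task = 0; task < 1000; task++) {
            pg.delete(job, Collections.singletonMap("task", String.valueOf(task)));
        }
    }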
[15:30:11] using task # in the groupingKey is unfortunate though, it means that we are emitting a useless label
[15:30:21] which will increase cardinality
[15:30:32] afaik the association of a kafka topic partition with a task # is random
[15:31:03] so each time the job runs the metric+groupingKey+label could be different for the same topic partition
[15:31:11] so you'll get all possible combinations in prometheus over time
[15:31:27] even if at any given time in pushgateway only the current ones are stored