[10:04:10] lunch + errands
[10:15:04] lunch 2
[13:15:54] o/
[13:55:13] o/
[16:00:07] workout, back in ~40
[16:03:49] hmm, wonder if something changed in yarn. cluster is mostly idle but getting `user capacity reached`
[16:37:11] dinner
[16:43:07] back
[17:13:39] dr0ptp4kt: realized the uniques numbers might have been slightly misleading, they were both already converted to per day numbers. it was ~72M/day over april according to unique devices estimate, and ~82M/day according to actor hashing
[17:14:02] ty ebernhardson
[17:22:58] daily values have a correlation between unique devices and actors of .67. Not terrible, some linear relationship but not a strong match
[17:30:11] I've updated https://gerrit.wikimedia.org/r/c/operations/alerts/+/1025453 per d-causse suggestions if anyone wants to take a look
[17:42:13] lunch, back in time for pairing
[17:49:23] meh, expanding out to a full month the correlation dropped to 0.53 :P Curiously they look more correlated than that to my eye. I guess that's why we use math and not my eyes :)
[18:13:22] back
[18:33:50] ryankemper we're in pairing if you wanna join
[19:19:39] hmm, looks like logstash's missing the last couple of hours for rdf-streaming-updater in staging https://logstash.wikimedia.org/goto/03cc6fc6299a7a58213cbca6d6bd40c7
[19:21:04] aaaand here's a link that doesn't cut off the last hour https://logstash.wikimedia.org/goto/386d707856fd6f4d94043299c77c67d5
[19:25:02] seems to be happening for all namespaces in eqiad staging. I pinged observability and service ops
[19:29:25] I reverted the patch for now, will check w d-causse for next steps on https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1025764 when he gets in
[19:32:16] * ebernhardson sighs ... in this hour i have 91k click events, but only 5k visits for autocomplete. vs 8k fulltext clicks and 9k fulltext visits. Nothing ever lines up :
[19:41:24] re: missing logs, it's apparently a known issue. T363856
[19:41:24] T363856: datahub-mae-consumer producing logs at excessive rate - https://phabricator.wikimedia.org/T363856
[19:46:13] btw, I think we are OK in the flink update. Forgot that they started calling checkpoints "Savepoints" in the logs, even though they're different
[19:46:45] will update the docs, but basically checkpoints are short-lived, automatically-triggered and have incrementing numbers,
[19:46:59] savepoints live until you delete, are manually triggered, and have UUIDs
[19:59:46] anyway, these log lines make it clear that it's using the previous checkpoint...checkpoint number would've reset if not https://phabricator.wikimedia.org/P61498
[21:13:59] merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1023813 for moving to CFSSL certs. All is well on canary (elastic2100) except I need to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/1024481 or we'll get a ton of alerts
[21:33:16] I'm off! See you all in July!
[21:33:22] Trey314159: enjoy!
[21:33:30] Thanks!
[21:40:42] Alert patch is merged...rolling out CFSSL to the rest of the ES hosts
[22:07:46] OK, new certs are provisioned and everything looks happy
[22:07:52] see ya tomorrow!