[04:16:07] i.nflatador / g.ehel: I wonder if we can rip out https://gerrit.wikimedia.org/g/operations/puppet/+/ebfb23ce343a9d8adc6abbcb730f437ca65fa3dd/modules/profile/manifests/query_service/common.pp#79 since we're not using the netcat method to xfer journal files anymore
[04:55:35] If we do still want it, then we need to add wdqs20[13-22] here, which we missed: https://gerrit.wikimedia.org/g/operations/puppet/+/ebfb23ce343a9d8adc6abbcb730f437ca65fa3dd/hieradata/role/common/wdqs/public.yaml#33 (but I think we can prob just tear this stuff out)
[09:10:56] o/ dcausse: Following up on yesterday's meeting, I looked into the S3 request for the enrichment application (https://phabricator.wikimedia.org/T330693) to sort out the requirements for zookeeper (ZK) and a file system (S3). IIUC, ZK is for coordination amongst the flink coordinators (job managers & task managers) and S3 is needed to persist the state of stateful operators, aka checkpoints. Would kafka offsets be such state of the source operator?
[09:13:28] Would ZK (or ConfigMaps, as of now) be used to hold a reference to such a checkpoint?
[09:26:01] pfischer: yes, swift will store: the job artifacts (jars), the flink state (kafka offsets, the operator state, which is mainly the dedup window)
[09:26:21] so mainly we have to estimate the size of the dedup window
[09:30:33] this will also store savepoints in addition to checkpoints; savepoints can be taken manually or automatically by the flink-k8s-operator, where you can tell it e.g.: take a savepoint every 10 minutes and keep the last 10 savepoints
[09:34:58] ZK/configmaps only stores pointers to checkpoints and does leader election for flink components (e.g. which jobmanager a taskmanager should use)
[09:49:41] Alright, thanks! I already tried to research how to estimate the snapshot size. As expected, this is highly dependent on the application. Do you think running a local experiment with avg.-sized revision-based events at 30 events/s + non-revision events at 300 events/s (with a 7.5% deduplication rate) would give us an estimate when looking at the locally saved checkpoints?
[09:52:39] pfischer: I think we could even take the (enriched) event size and do the math instead of running an experiment?
[09:54:19] window_length_in_sec * rate * avg_event_size
[10:01:44] doing quick math and assuming a 50k enriched event size: enriching everything, guesstimating something around 5G; enriching only rev_based, we'd be at 500m; enriching nothing, somewhere in the 100m range
[10:05:55] So we checkpoint every 5s?
[10:08:28] 5s seems low but possible?
[10:08:42] haven't thought too much about the checkpoint rate
[10:10:15] but this should not affect the storage needs much, unless we start to have multiple checkpoints taken in parallel (which might happen if we take a checkpoint every 5s)
[10:10:45] I was just doing the reverse math: 5000000 kb / 50 kb / 307 (ops/s) -> 5s
[10:11:08] ah, you mean the window size
[10:11:16] hm, my math is wrong then :)
[10:11:53] Yes, sorry, that would be the window size.
[10:13:34] lunch
[10:14:31] so 50kb * 330 ops/s * 300 sec = 4950000 kb (ignoring dedup for now, for simplicity)
[10:15:44] https://docs.google.com/spreadsheets/d/1Fp44MdLxUVlxi03MBD_64m0zQErny-9jUD5C6RGf_bU/edit#gid=2037151092
[10:16:11] thanks, will be way easier in a spreadsheet indeed! :)
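
For reference, the sizing math above as a runnable Java sketch (Java since the updater is a Flink application); the window length, event rate, and event size are the guesstimates from the discussion, not measured values:

  // Back-of-the-envelope estimate of the dedup-window state size:
  // window_length_in_sec * rate * avg_event_size.
  public class DedupWindowStateEstimate {
      public static void main(String[] args) {
          long windowLengthSec = 300; // assumed 5-minute dedup window
          long eventsPerSec = 330;    // ~30 rev-based + ~300 other events/s
          long avgEventSizeKb = 50;   // guesstimated enriched event size
          long stateKb = windowLengthSec * eventsPerSec * avgEventSizeKb;
          // Prints ~4950000 kB (~4.95 GB), matching the ~5G guesstimate above.
          System.out.printf("~%d kB (~%.2f GB), ignoring dedup%n",
                  stateKb, stateKb / 1_000_000.0);
      }
  }
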
[10:16:35] But state would pile up inside the dedupe window, since we only flush at the end
[10:26:27] pfischer: not sure I understand?
[10:28:35] So if we use full (not diff) checkpoints, with a min pause of 1 s, we would write 300 checkpoints within a dedupe window. Each checkpoint would be the size of its predecessor + its own size.
[10:29:18] If we use diff checkpoints, the math should be as you suggested above.
[10:33:30] for full checkpoints (and allowing parallel checkpoints) it depends on how long a checkpoint takes to be taken, so no, it's unlikely that 300 checkpoints will be pending
[10:34:12] if we don't allow parallel checkpoints, only one full checkpoint + the pending one will take up storage
[10:34:43] checkpoint rate and window size are not closely related
[10:37:09] by parallel I mean tuning the "number of concurrent checkpoints" from https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/
[10:37:38] lunch
[11:15:32] Yes, I saw that. I think I got it now. What confused me was checkpoint retention. Since checkpoints can be dropped once a new one has been written successfully, our max storage requirement would be determined by the maximum parallel checkpoints * the number of retained checkpoints.
[11:24:54] I created https://phabricator.wikimedia.org/T342620; I'll tag SRE-swift-storage once you've had a chance to look at it.
[11:57:45] Lunch + errands
[13:26:10] pfischer: thanks! made some adjustments, but all this is up for debate (i.e. the # of retained savepoints and the headroom %); I just picked these numbers based on my experience operating flink with WDQS
[13:41:30] Sure, thanks dcausse!
[13:44:23] pfischer I added tags to the phab task, thanks for creating it
[14:02:23] inflatador: are we doing the DPE K8s check-in?
[14:04:22] nvm, re-reading emails I believe we discussed removing it
[14:10:06] dcausse I thought I had deleted it already, sorry!
[14:10:12] np!
[14:13:53] But... if dcausse and pfischer do have time to meet, I have a quick question on https://phabricator.wikimedia.org/T341625, as it seems that kafka/swift usage are interdependent. I wanted to respond to the ticket with that, but just want to make sure my response is accurate
[14:14:19] draft response here: https://etherpad.wikimedia.org/p/kafkaswift . If you can QC/edit and get back to me, it would be appreciated
[14:14:59] Just a sec.
[14:15:09] inflatador: happy to jump into a meet to discuss this
[14:16:12] dcausse cool, up at https://meet.google.com/oom-cfms-yar
[16:57:05] pfischer: was about to merge https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/5 but sonar is complaining; is this something we can ignore?
[16:58:40] relatedly, we should try to keep https://gerrit.wikimedia.org/r/c/schemas/event/primary/+/856507 up to date with the flink code
[16:59:38] and when you have a moment to review it, I'd like to deploy https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/465 to unblock a broken job
[17:00:04] dinner
[17:23:12] lunch, back in ~45
[17:59:18] dcausse: the failure seems to be with the build itself, not with sonar. I wonder why it is passing on the main build. Maybe a different JDK?
[18:11:52] back
[18:33:01] ryankemper we're in SRE pairing
[18:33:06] ryankemper: https://meet.google.com/eki-rafx-cxi
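
For reference, the checkpointing knobs discussed above (min pause, number of concurrent checkpoints, checkpoint storage) map onto Flink's CheckpointConfig roughly as in this minimal Java sketch; the interval, pause, and storage URI are placeholders, not the settings the team settled on:

  import org.apache.flink.streaming.api.environment.CheckpointConfig;
  import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

  public class CheckpointConfigSketch {
      public static void main(String[] args) {
          StreamExecutionEnvironment env =
                  StreamExecutionEnvironment.getExecutionEnvironment();
          // Checkpoint interval in ms (placeholder; the rate was left open above).
          env.enableCheckpointing(300_000);
          CheckpointConfig cfg = env.getCheckpointConfig();
          // Minimum pause between the end of one checkpoint and the start of
          // the next, in ms (the "min pause of 1 s" from the discussion).
          cfg.setMinPauseBetweenCheckpoints(1_000);
          // With no parallel checkpoints, at most one full checkpoint plus the
          // pending one takes up storage at any time.
          cfg.setMaxConcurrentCheckpoints(1);
          // Checkpoint storage, e.g. the S3-compatible swift endpoint
          // (placeholder URI, not the real bucket).
          cfg.setCheckpointStorage("s3://example-bucket/checkpoints");
      }
  }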