[06:54:15] o/
[06:55:36] o/
[07:06:03] dcausse: Apparently the weighted tags (kafka representation) were created on Friday. I am just a bit surprised to see the DAG finished in ~3h instead of the expected 7h. According to the kafka dashboard the rate peaked at ~10 records per second, which is also below the default rate-limit of 20. Looking into it.
[07:07:45] pfischer: the SUP misbehaved badly, so depending on what metrics you're looking at perhaps that's expected?
[07:09:34] dcausse: Ouch, I have not checked SUP yet. What happened? Are the produced records ill-formatted?
[07:11:05] I just checked the source kafka topic the DAG is producing to, so SUP shouldn't affect that.
[07:11:41] ok
[07:12:16] pfischer: I should have written an incident report, but there's a short description at: https://phabricator.wikimedia.org/T372912#11152737
[07:13:14] was pretty much bad luck, hit by 3 consecutive nasty problems while trying to recover
[07:14:41] pfischer: to confirm a theory about the bad-state problem: did you encounter issues while deploying flink 1.20 saying that the state was incompatible and have to switch to upgradeMode: stateless to drop the state?
[07:19:34] if this theory is right, https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/190 should help capture such problems in the future; we only test that the state bytes can be read again but we did not check that the two serializers are "compatible" by flink standards
[07:22:54] dcausse: yes, state could not be restored when I upgraded to 1.20, so I changed to stateless (temporarily).
[07:23:44] ok that explains it, thanks
[07:26:04] How did that become a problem with eventutilities? Its POM still relies on flink 1.17, I assume, but that should be overruled by SUP's POM, which requests 1.20, right?
[07:26:46] pfischer: it's because we needed 1.4 of eventutilities, because of https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/1079506/11/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/EventRowTypeInfo.java
[07:27:02] in short, they changed the signature of the createSerializer function
[07:27:58] without it (SUP with flink-1.20 but eventutilities 1.3) flink used the new signature, which was not overridden by EventRowTypeInfo and thus called the parent RowTypeInfo::createSerializer
[07:30:12] we had a hint when upgrading to flink 1.20 in https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/187 : the UpdateEventTest failed and we had to create a "new version". I think I did not pay enough attention as to why; UpdateEventTest should not have failed
[07:30:41] Ah, stupid me, I changed those signatures in SUP code but forgot eventutilities… 🤦
[07:31:24] Sorry for that and thanks for taking care!
[07:31:28] np! the root cause imo is the fragility of EventRowTypeInfo, which depends on RowTypeInfo; it's very trappy
[07:33:58] Hm, I am still surprised that no test covered that so far.
[07:34:40] yes... https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/190 should hopefully help
[07:35:03] and probably some more tests in eventutilities as well
[10:40:04] lunch
[13:19:57] o/
[13:43:03] \o
[13:43:45] .o/
[13:51:44] o/
[14:15:23] pfischer: I think I lost you...
[14:15:45] gehel: yes, and I can't re-connect…
[14:16:23] pfischer: I'll move a few tasks around while waiting for you...
[14:19:12] you've left again!
[14:19:48] gehel: aaaargh. I'll restart.
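The createSerializer breakage discussed above (07:27:02 through 07:31:28) comes down to an overload trap: a framework release adds a new createSerializer signature, the subclass still only overrides the old one, everything keeps compiling, and any caller that switches to the new signature silently gets the parent's generic serializer. The sketch below illustrates just that mechanism with simplified stand-ins; the names (RowTypeInfoLike, EventRowTypeInfoLike, ExecutionConfigLike, SerializerConfigLike) are invented for illustration and are not the real Flink or eventutilities API.

```java
// Minimal standalone sketch of the createSerializer trap discussed above.
// All class names here are invented stand-ins, not the real Flink/eventutilities API.

class ExecutionConfigLike { }   // stand-in for the config type used by the old signature
class SerializerConfigLike { }  // stand-in for the config type used by the new signature

class Serializer { }                           // stand-in for a plain, generic serializer
class CustomSerializer extends Serializer { }  // the serializer the subclass actually wants

class RowTypeInfoLike {
    // Old signature: the subclass below overrides this one.
    public Serializer createSerializer(ExecutionConfigLike config) {
        return new Serializer();
    }

    // New signature added by a framework upgrade. The subclass never overrides it,
    // so any caller using this entry point gets the parent's generic serializer.
    public Serializer createSerializer(SerializerConfigLike config) {
        return new Serializer();
    }
}

class EventRowTypeInfoLike extends RowTypeInfoLike {
    @Override
    public Serializer createSerializer(ExecutionConfigLike config) {
        return new CustomSerializer(); // only the old entry point is customized
    }
}

public class SerializerTrapSketch {
    public static void main(String[] args) {
        RowTypeInfoLike typeInfo = new EventRowTypeInfoLike();

        // Old call path: the subclass's custom serializer is used, as intended.
        System.out.println(typeInfo.createSerializer(new ExecutionConfigLike())
                .getClass().getSimpleName()); // CustomSerializer

        // New call path (the one the upgraded framework uses): the parent's generic
        // serializer comes back because the subclass never overrode this overload.
        System.out.println(typeInfo.createSerializer(new SerializerConfigLike())
                .getClass().getSimpleName()); // Serializer
    }
}
```

This also matches the point made at 07:19:34: testing that old state bytes can still be read is not the same as checking that the old and new serializers are "compatible" by Flink standards, which is the gap merge request 190 is meant to close.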
[14:23:31] gehel: Okay, it’s working again, google meet, with camera…
[16:01:24] workout, back in ~40
[16:45:44] back
[17:36:55] proposed scoring metric: -1: incorrect or missing rewrite of a bad query, 0: rewriting a good query, 0.5: not rewriting a good query, 1: plausibly correct rewrite
[17:42:54] perhaps unsurprisingly, on this metric: default: -24.5, expensive_1: -11, expensive_1_variant: 6, expensive_2: -27, expensive_2_variant: -10, variant: -8.5
[17:44:31] so on this totally arbitrary metric... variant and expensive_2_variant seem plausibly better than default at least
[17:45:14] You can keep them categorical and not try to weight them, though the weighting system seems fine, except that I don't think missing a rewrite is as bad as an incorrect rewrite. Looking at "tayps of wlding daifets" and giving up is a reasonable thing to do. So maybe missing a rewrite is -0.5... but again, this system is good enough to get the general idea of what's going on.
[17:46:16] Also... I think we have Stats Chat on Wednesday, unless we are skipping it for the P&T meeting.
[17:46:22] yea it's hard to say... i guess i'm leaning towards the "rewrite zero result queries" end, which is where we think it will have the most interaction
[17:46:39] in that case i think missing a rewrite of a bad query is probably bad
[17:51:52] as for categorical, i suppose, but then i'm not sure how to sum and sort :P
[17:54:46] lunch, back in ~40
[17:58:09] dinner
[18:08:30] The interpretation of the categorical data is more subjective (but assigning weights to various categories is also subjective). You hope to see an increase in good categories and a decrease in bad categories, which is a sign that it works.
[18:08:31] That's better than a big increase in a good category covering for a small increase in a bad category. It's still subjective, but multi-dimensionally subjective. (This is related to my dislike of F₀ for recall and precision. More dimensions is better dimensions when the total is < 5.)
[18:17:16] yea, i suppose summing the categories is reasonable if there aren't too many (and this isn't that many)
[18:17:42] separately... having some siding replaced and the cacophony is real. i knew it would be loud but somehow not this loud :P
[18:40:29] back
[19:18:49] Having a roof replaced is probably similar. The roofers were super fast and did it in one day... but that meant the number of hammer strikes per minute was... 𝑡ℎ𝑟𝑜𝑢𝑔ℎ 𝑡ℎ𝑒 𝑟𝑜𝑜𝑓! (badum-tiss)
[19:25:20] {◕ ◡ ◕}
[20:08:11] plausible analysis of dym complete, mixed feelings. the old `suggest` field with prefix_len=1 might be the most reasonable way forward, taking latency into account. It moves more bad rewrites to good, but doesn't fix any of the "silly suggestions to reasonable queries"
[20:08:13] https://phabricator.wikimedia.org/T403826#11159648
[20:18:10] ryankemper we may wanna look at those `PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100` alerts at pairing. Maybe even submit a patch to fix the grammar ;)
[20:25:02] inflatador: +1
[20:34:45] cool, we could also go over https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1184572 and deploy a namespace so we can install the OpenSearch operator CRDs (or both)
[21:04:35] inflatador: 5’
[21:11:44] ryankemper ACK, I'm here
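For the scoring discussion above (17:36:55 onward), here is a rough sketch of how the two evaluation styles could be computed side by side: the single weighted sum used to rank the profiles, and the per-category counts suggested as the more multi-dimensional alternative. The DymEvaluation class, its category names, and its methods are invented for illustration; only the weights (-1, 0, 0.5, 1) come from the proposal in the chat.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Sketch of the two evaluation styles discussed above: one weighted score per
// profile vs. raw per-category counts. Weights follow the 17:36:55 proposal;
// everything else is illustrative only.
public class DymEvaluation {

    enum Outcome {
        BAD_QUERY_NOT_FIXED,   // incorrect or missing rewrite of a bad query
        GOOD_QUERY_REWRITTEN,  // rewriting a good query
        GOOD_QUERY_LEFT_ALONE, // not rewriting a good query
        BAD_QUERY_FIXED        // plausibly correct rewrite
    }

    // Weights from the proposed metric: -1, 0, 0.5, 1.
    static final Map<Outcome, Double> WEIGHTS = Map.of(
        Outcome.BAD_QUERY_NOT_FIXED, -1.0,
        Outcome.GOOD_QUERY_REWRITTEN, 0.0,
        Outcome.GOOD_QUERY_LEFT_ALONE, 0.5,
        Outcome.BAD_QUERY_FIXED, 1.0
    );

    // Weighted-sum view: one number per profile, easy to sort, hides trade-offs.
    static double weightedScore(List<Outcome> outcomes) {
        return outcomes.stream().mapToDouble(WEIGHTS::get).sum();
    }

    // Categorical view: per-category counts, harder to rank but shows whether a
    // gain in one category is paid for by a regression in another.
    static Map<Outcome, Long> categoryCounts(List<Outcome> outcomes) {
        Map<Outcome, Long> counts = new EnumMap<>(Outcome.class);
        for (Outcome o : outcomes) {
            counts.merge(o, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Outcome> sample = List.of(
            Outcome.BAD_QUERY_FIXED,
            Outcome.GOOD_QUERY_LEFT_ALONE,
            Outcome.BAD_QUERY_NOT_FIXED,
            Outcome.GOOD_QUERY_REWRITTEN
        );
        System.out.println("weighted score: " + weightedScore(sample)); // 0.5
        System.out.println("per category:   " + categoryCounts(sample));
    }
}
```

The weighted sum gives one sortable number per profile (the figures quoted at 17:42:54, from -27 up to 6, are of this form), while the per-category counts make it visible when an increase in plausibly correct rewrites is paid for by new bad rewrites, which is the trade-off raised in the 18:08 messages.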