[09:21:23] o/ dcausse: SUP has been up all weekend and so far it looks promising: https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater?orgId=1&refresh=5m&from=now-2d&to=now - back pressure also looks fine
[09:21:23] https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus%2Fk8s-staging&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-search&var-flink_job_name=cirrus_streaming_updater_consumer_search_staging&var-operator_name=All
[09:22:14] pfischer: \o/
[09:22:51] re: back-pressure and throughput I think we're just OK, might not be enough if we add more wikis
[09:25:19] Hm, at least the back pressure is not increasing over time under load, it’s rather stable <500ms
[09:26:10] Could we add another wiki with relforge as sink?
[09:26:18] yes it allowed us to keep the kafka lag low but unsure that'll be enough after adding commons and wikidata
[09:26:32] with relforge that might hard?
[09:26:39] be*
[09:27:10] But the envoy TLS issue should be solved, so we might as well use cloudelastic, right?
[09:27:11] Erik added the nullsink for testing
[09:27:20] Oh, okay
[09:28:07] yes we can try shipping to cloudelastic as well but it requires shipping MW config changes
[09:28:09] So I’d deploy with processing events from commons?
[09:28:20] And the null sink
[09:28:44] sure, you can add wikidata as well? (lemme check what Erik had in mind)
[09:29:01] Both commons and Wikidata were part of the patch
[09:29:18] (For enabling publication of the events)
[09:29:32] yes from T352335: wikidatawiki, commonswiki, frwiki, itwiki, testwiki
[09:29:33] T352335: Deploy the new Cirrus Updater to update select wikis in cloudelastic - https://phabricator.wikimedia.org/T352335
[09:35:10] we should perhaps run a quick compare-clusters to see if the numbers are sane on fr and it wiki
[09:35:52] Sure, where would you execute that? statXXXX?
[09:35:58] last time Erik checked it was pretty far off ~24k but it was pretty much backlogged
[09:36:00] unsure
[09:36:32] I’ll run it.
[09:36:50] it's in cirrus so I'd say somewhere on a mw machine but could well be copied via scp
[09:37:21] pfischer: be sure to take https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/980913
[09:38:15] Oh, that’s still open. 👀 I’ll merge that first.
[09:38:55] sure
[11:23:19] Lunch
[11:28:22] lunch 2
[14:04:17] o/
[14:04:50] o/
[14:35:41] dcausse: Took me a while to get stats out of the compare-clusters.py but here we go: https://phabricator.wikimedia.org/P54487
[14:36:55] pfischer: thanks!
[14:37:52] it improved I think, now I guess we have to monitor if it gets worse
[14:38:33] we could possibly copy fr wiki again (after doing some test with the nullsink)
[14:38:46] Yes, that’s at least one order of magnitude lower than before, roughly 0.1% mismatches for it/fr
[14:39:21] Okay, I’ll turn on processing of Wikidata and commons then.
[15:44:21] Need to run to pharmacy, missing triage
[16:01:33] ryankemper: I'll skip our 1:1 today, not feeling super well
[17:17:28] pfischer dcausse I just set page rerender topic partitions to 5 in kafka-jumbo, LMK if you see any problems
[17:17:44] inflatador: thanks!
[18:34:13] lunch, back in ~40
[18:56:54] dinner
[19:15:23] back
[20:04:51] appointment, back in ~90
[21:28:40] back