[07:15:55] Data-Engineering: Consider classifying www.wikipedia.org from "internal" referrer - https://phabricator.wikimedia.org/T422584 (Krinkle) NEW
[08:14:12] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic: Surge in webrequest validation check - https://phabricator.wikimedia.org/T422030#11797902 (Fabfur) >>! In T422030#11794907, @Vgutierrez wrote: > It looks like the root cause is [[ https://github.com/haproxy/haproxy/commit/0b7a5a64eb51ce4b22866...
[08:17:59] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic: Surge in webrequest validation check - https://phabricator.wikimedia.org/T422030#11797907 (JAllemandou) Thanks for confirming the invalid-events change @Vgutierrez. There still is something I don't understand: * The pattern we see in v3.0 seem...
[08:26:40] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic: Surge in webrequest validation check - https://phabricator.wikimedia.org/T422030#11797939 (Vgutierrez) Yes, sequence numbers are enerated by haproxy itself, even if it results in a SSL handshake error where the sequence number doesn't reach ha...
[08:47:07] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic: Surge in webrequest validation check - https://phabricator.wikimedia.org/T422030#11798013 (JAllemandou) >>! In T422030#11797939, @Vgutierrez wrote: > Yes, sequence numbers are enerated by haproxy itself, even if it results in a SSL handshake e...
[09:37:52] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic: Surge in webrequest validation check - https://phabricator.wikimedia.org/T422030#11798111 (Vgutierrez) I've replicated locally a SSL handshake failure using haproxy with `log-format-sd %{+E}o\ [haproxykafka@0\ %[capture.req.hdr(0),json(ascii)]...
[09:52:19] (CR) Lucas Werkmeister (WMDE): [C:-1] Update repo dev dependencies in preparation for further work (6 comments) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1268581 (https://phabricator.wikimedia.org/T422510) (owner: Andrew McAllister (WMDE))
[10:22:55] (PS6) Andrew McAllister (WMDE): Update repo dev dependencies in preparation for further work [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1268581 (https://phabricator.wikimedia.org/T422510)
[10:29:52] (CR) Lucas Werkmeister (WMDE): [C:+1] Update repo dev dependencies in preparation for further work (6 comments) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1268581 (https://phabricator.wikimedia.org/T422510) (owner: Andrew McAllister (WMDE))
[11:10:44] (CR) Lucas Werkmeister (WMDE): [C:+2] Update repo dev dependencies in preparation for further work [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1268581 (https://phabricator.wikimedia.org/T422510) (owner: Andrew McAllister (WMDE))
[11:11:38] (Merged) jenkins-bot: Update repo dev dependencies in preparation for further work [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1268581 (https://phabricator.wikimedia.org/T422510) (owner: Andrew McAllister (WMDE))
[11:14:12] (PS1) Andrew McAllister (WMDE): Update repo dev dependencies in preparation for further work [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1268922 (https://phabricator.wikimedia.org/T422510)
[11:14:51] (CR) Andrew McAllister (WMDE): [C:+2] Update repo dev dependencies in preparation for further work [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1268922 (https://phabricator.wikimedia.org/T422510) (owner: Andrew McAllister (WMDE))
[11:15:45] (Merged) jenkins-bot: Update repo dev dependencies in preparation for further work [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1268922 (https://phabricator.wikimedia.org/T422510) (owner: Andrew McAllister (WMDE))
[12:44:42] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11798866 (JMonton-WMF) The previous config started failing after 1 hour, I wasn't able to check the exact reason, alt...
[12:46:16] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Wikidata, Wikidata-Query-Service, Patch-For-Review: Add a --output-dir argument to wikibase rdf and json dumps - https://phabricator.wikimedia.org/T401296#11798876 (xcollazo) >>! In T401296#11798862, @CodeReviewBot wrote: > xcollazo **merged**...
[12:57:57] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Perform a one-time clean up of retained data sets in event_sanitize - https://phabricator.wikimedia.org/T417694#11798990 (xcollazo) Open→In progress p:Triage→Medium
[12:58:07] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Perform a one-time clean up of retained data sets in event_sanitize - https://phabricator.wikimedia.org/T417694#11798993 (xcollazo) a:xcollazo
[13:09:32] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform: page_change.v1 increate partitions to 3 - https://phabricator.wikimedia.org/T422511#11799050 (Ottomata) Linking some really good thoughts from Javier from the parent task: T421216#11792886 ...and responding here. --- I was unsure if i...
[13:20:47] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11799302 (Ottomata) > The checkpoint was already exactly_once, the Sink delivery guarantee was at_least_once, Oh! I...
[13:26:25] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11799363 (Ottomata) checkpointing.. > making it "unaligned" which seems to improve in scenarios with big latencies,...
[13:29:25] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Backfill datasets affected by Nov 2025 automated traffic incident - https://phabricator.wikimedia.org/T421735#11799388 (mforns)
[13:37:49] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11799472 (Ottomata) > As a rule of thumb, we need to do 40 HTTP calls per second. If many of them take 10 seconds, we...
[13:42:48] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Perform a one-time clean up of retained data sets in event_sanitize - https://phabricator.wikimedia.org/T417694#11799606 (xcollazo) == Investigation findings == We investigated two angles: the top offenders by file count and size (from the ticket descript...
[13:50:44] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Add new API rate limiting fields from webrequest_logs to Turnilo view - https://phabricator.wikimedia.org/T419736#11799689 (HCoplin-WMF) @Ahoelzl & @GGoncalves-WMF -- do you have a rough idea for when you might be able to pull this in?
[14:03:09] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic: Surge in webrequest validation check - https://phabricator.wikimedia.org/T422030#11799759 (JAllemandou) Summarizing here a talk we had on slack with @Vgutierrez and @Fabfur : * In v3.0 we were experiencing unexpected sequence-id increment. Thi...
[15:08:08] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11800172 (JMonton-WMF) New test on `-next` staging release with these configs: helmfile apply -i -e dse-k8s-eqiad \...
[15:45:58] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11800307 (JMonton-WMF) I'm trying this setup on the `-next` deployment: ` helmfile apply -i -e dse-k8s-eqiad \ --...
[15:57:49] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform: page_change.v1 increate partitions to 3 - https://phabricator.wikimedia.org/T422511#11800370 (Ahoelzl) @dcausse @gmodena @gkyziridis please provide your respective team input.
[16:58:21] Data-Engineering, Data-Engineering-Radar, Data-Platform, Growth-Team, and 3 others: Image Suggestions uses AI-generated images from Commons when adding images on English Wikipedia - https://phabricator.wikimedia.org/T422513#11800636 (Ahoelzl)
[17:11:47] Data-Engineering, Data-Engineering-Radar, Data-Platform, Growth-Team, and 3 others: Image Suggestions uses AI-generated images from Commons when adding images on English Wikipedia - https://phabricator.wikimedia.org/T422513#11800750 (Ahoelzl) @mfossati @APizzata-WMF can this be easily done in the...
[17:47:50] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-Mediawiki-Content: Missing/inconsistent page_redirect_target field for redirects in Mediawiki content current v1 dumps - https://phabricator.wikimedia.org/T400632#11801119 (xcollazo) a:APizzata-WMF
[18:09:57] (PS1) Andrew McAllister (WMDE): Split user changes by namespace to perm/tmp users [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1269033 (https://phabricator.wikimedia.org/T422500)
[18:10:48] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Essential-Work: Perform a one-time clean up of retained data sets in event_sanitize - https://phabricator.wikimedia.org/T417694#11801237 (Ahoelzl)
[18:10:49] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11801236 (Ottomata) Hm! prod failed with a message to large error but in the error sink! ` Caused by: org.apache.fl...
[18:12:33] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Essential-Work: Perform a one-time clean up of retained data sets in event_sanitize - https://phabricator.wikimedia.org/T417694#11801238 (mforns) I'd be totally in favor of setting a long-time retention period for event_sanitized. The overall plan look...
[18:13:44] Data-Engineering, Dumps-Generation, Wikidata: Json wikidata dumps incomplete - https://phabricator.wikimedia.org/T422303#11801242 (Ahoelzl) @Lydia_Pintscher is this something you are investigating?
[18:31:40] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Essential-Work: Perform a one-time clean up of retained data sets in event_sanitize - https://phabricator.wikimedia.org/T417694#11801337 (xcollazo) @SNowick_WMF and @phuedx: You both seem to be the owners of the original work items that decommissioned...
[19:24:30] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Research, Event-Platform, Patch-For-Review: eventutilties-python - support synchronous Flink process function mode - https://phabricator.wikimedia.org/T421965#11801514 (Ottomata) I published docker v1.49.0.dev30 today with synch mode. Didn't h...
[19:27:02] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Essential-Work: Perform a one-time clean up of retained data sets in event_sanitize - https://phabricator.wikimedia.org/T417694#11801530 (xcollazo)
[22:46:22] Data-Engineering, Data-Engineering-Radar, QuickSurveys, WMDE-TechWish: QuickSurveys should show an error when response is blocked - https://phabricator.wikimedia.org/T256463#11802317 (Jdlrobson-WMF)