[01:53:41] Data-Engineering, DBA, MediaWiki-Core-Revision-backend, Schema-change: Rethink rev_sha1 field - https://phabricator.wikimedia.org/T389026#10650098 (Krinkle) ### Timeline * 2009: Original request by the Halfak/Research team in {T23860}, which details the originally envisioned use cases for a SHA...
[06:23:53] Data-Engineering (Q3 2025 January 1st - March 31th), Traffic, DPE HAProxy Migration, Patch-For-Review: Fix `webrequest_frontend` kafka timestamp mismatch with in-data `dt` field - https://phabricator.wikimedia.org/T388397#10650404 (Vgutierrez) Maybe I'm misreading the task description but from >...
[09:05:10] Data-Engineering, Machine-Learning-Team, Research, Event-Platform: Emit revision revert risk scores as a stream and expose in EventStreams API - https://phabricator.wikimedia.org/T326179#10650718 (kevinbazira) Stalled→In progress a:kevinbazira
[09:15:21] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2025.03.01 - 2025.03.21), Patch-For-Review: Unable to trigger dag with config - https://phabricator.wikimedia.org/T384805#10650740 (dcausse) It is possible to run such dags with `airflow dags trigger -c {config}`. Ideally manual dags sh...
[09:50:12] Data-Engineering, Data-Platform-SRE, Discovery-Search, Java-Scala-Standardization: Create a template for building Maven projects - https://phabricator.wikimedia.org/T389248#10650908 (Gehel) →Duplicate dup:T386406
[09:50:13] Data-Engineering, Java-Scala-Standardization, Discovery-Search (2025.03.01 - 2025.03.21): Create Gitlab CI templates for JVM packages - https://phabricator.wikimedia.org/T386406#10650910 (Gehel)
[09:50:29] Data-Engineering, Java-Scala-Standardization, Discovery-Search (2025.03.01 - 2025.03.21): Create Gitlab CI templates for JVM packages - https://phabricator.wikimedia.org/T386406#10650912 (Gehel)
[09:50:30] Data-Engineering, Data-Platform-SRE, Java-Scala-Standardization, Discovery-Search (2025.03.01 - 2025.03.21): Migrate existing Java packages to deploying to Gitlab, including new version of parent pom, validation that all dependencies are available,... - https://phabricator.wikimedia.org/T367405#10650913
[09:51:00] Data-Engineering, Java-Scala-Standardization, Discovery-Search (2025.03.01 - 2025.03.21): Create Gitlab CI templates for JVM packages - https://phabricator.wikimedia.org/T386406#10650914 (Gehel) a:amastilovic
[09:56:29] Data-Engineering (Q3 2025 January 1st - March 31th), Traffic, DPE HAProxy Migration, Patch-For-Review: Fix `webrequest_frontend` kafka timestamp mismatch with in-data `dt` field - https://phabricator.wikimedia.org/T388397#10650919 (JAllemandou) >>! In T388397#10650404, @Vgutierrez wrote: > Maybe...
[10:10:16] Data-Engineering (Q3 2025 January 1st - March 31th), Traffic, DPE HAProxy Migration, Patch-For-Review: Fix `webrequest_frontend` kafka timestamp mismatch with in-data `dt` field - https://phabricator.wikimedia.org/T388397#10650959 (JAllemandou) I answered to a comment on the gitlab PR (https://gi...
[10:12:50] (CR) Joal: [C:+1] "One mini-nit in commit message the code is good! Merge at will." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1128801 (https://phabricator.wikimedia.org/T354694) (owner: Aqu)
[11:30:38] Data-Engineering, DPE-Mediawiki-Content, Dumps-Generation, SRE, Epic: Dumps generation cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#10651347 (BTullis) Just to follow up on this, we have confirmed that there is a performance regression when using d...
[12:05:43] Data-Engineering: NEW BUG REPORT - https://phabricator.wikimedia.org/T389352 (Aklapper) NEW
[12:07:12] Data-Engineering: NEW BUG REPORT - https://phabricator.wikimedia.org/T389352#10651549 (Aklapper) Open→Invalid
[12:14:11] Data-Engineering: 21 new wikis missing from mediawiki_history - https://phabricator.wikimedia.org/T386649#10651593 (dr0ptp4kt)
[12:21:08] (PS1) Dr0ptp4kt: WIP DNM: Add 21 wikis for sqoop, mediawiki_history eligibility [analytics/refinery] - https://gerrit.wikimedia.org/r/1129234 (https://phabricator.wikimedia.org/T386649)
[12:25:55] (CR) Dr0ptp4kt: "I noticed in c1129234 a merge conflict here for `grouped_wikis.csv`. The `grouped_wikis.csv` here seems smaller than the current one. Is t" [analytics/refinery] - https://gerrit.wikimedia.org/r/496560 (https://phabricator.wikimedia.org/T215550) (owner: Joal)
[12:26:42] (CR) Dr0ptp4kt: "Ugh, I meant in c1129234 I noticed the merge conflict." [analytics/refinery] - https://gerrit.wikimedia.org/r/496560 (https://phabricator.wikimedia.org/T215550) (owner: Joal)
[12:28:46] (CR) Dr0ptp4kt: "Er, https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1129234" [analytics/refinery] - https://gerrit.wikimedia.org/r/496560 (https://phabricator.wikimedia.org/T215550) (owner: Joal)
[12:36:11] Data-Engineering, DPE-Mediawiki-Content, Dumps-Generation, SRE, Epic: Dumps generation cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#10651667 (Marostegui) >>! In T368098#10651347, @BTullis wrote: > Just to follow up on this, we have confirmed that...
[12:37:22] (CR) Dr0ptp4kt: "I think I may see what's going on: is this here patch starting with an older base branch?" [analytics/refinery] - https://gerrit.wikimedia.org/r/496560 (https://phabricator.wikimedia.org/T215550) (owner: Joal)
[12:55:32] Data-Engineering (Q3 2025 January 1st - March 31th), Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 6 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10651747 (BTullis)
[12:57:40] Data-Engineering (Q3 2025 January 1st - March 31th), Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 6 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10651754 (BTullis) For reference, the current `...
[13:36:18] Data-Engineering (Q3 2025 January 1st - March 31th), Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 6 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10651906 (xcollazo) >>! In T386255#10648681, @x...
[14:03:52] (Abandoned) Joal: Update refinery sqoop parallel execution [analytics/refinery] - https://gerrit.wikimedia.org/r/496560 (https://phabricator.wikimedia.org/T215550) (owner: Joal)
[14:06:06] Data-Engineering, Event-Platform, MW-1.44-notes (1.44.0-wmf.21; 2025-03-18), Wikimedia-production-error: Wikimedia\Assert\PreconditionException: Precondition failed: Cannot end a span that has not been started - https://phabricator.wikimedia.org/T389331#10652022 (jnuche) In progress→Re...
[14:06:36] Data-Engineering, Patch-For-Review: 21 new wikis missing from mediawiki_history - https://phabricator.wikimedia.org/T386649#10652027 (dr0ptp4kt) These 21 wikis do appear to be part of [[ https://gerrit.wikimedia.org/r/plugins/gitiles/analytics/refinery/+/929a74e7a220826d5a40e5e723d0cd1d0a994c9c/static_da...
[14:07:06] Data-Engineering, Event-Platform, MW-1.44-notes (1.44.0-wmf.21; 2025-03-18), Wikimedia-production-error: Wikimedia\Assert\PreconditionException: Precondition failed: Cannot end a span that has not been started - https://phabricator.wikimedia.org/T389331#10652028 (jnuche) a:jnuche→None
[14:07:24] (PS2) Dr0ptp4kt: Add 21 wikis for sqoop, mediawiki_history eligibility [analytics/refinery] - https://gerrit.wikimedia.org/r/1129234 (https://phabricator.wikimedia.org/T386649)
[14:11:43] Data-Engineering, Patch-For-Review: 21 new wikis missing from mediawiki_history - https://phabricator.wikimedia.org/T386649#10652049 (dr0ptp4kt) a:dr0ptp4kt
[14:15:05] Data-Engineering, Event-Platform, MW-1.44-notes (1.44.0-wmf.21; 2025-03-18), Wikimedia-production-error: Wikimedia\Assert\PreconditionException: Precondition failed: Cannot end a span that has not been started - https://phabricator.wikimedia.org/T389331#10652057 (kostajh) a:mszabo
[14:21:15] Data-Engineering (Q3 2025 January 1st - March 31th), Traffic, DPE HAProxy Migration, Patch-For-Review: Fix `webrequest_frontend` kafka timestamp mismatch with in-data `dt` field - https://phabricator.wikimedia.org/T388397#10652133 (Ottomata) BTW, there was a request to do this for varnishkafka, b...
[14:24:58] Data-Engineering, DPE-Mediawiki-Content, Data-Platform-SRE (2025.03.01 - 2025.03.21): Consider writing Spark files to Ceph (S3) instead of Hadoop - https://phabricator.wikimedia.org/T384500#10652150 (BTullis) Open→Resolved a:BTullis >>! In T384500#10491440, @Gehel wrote: > Given our c...
[14:37:46] Data-Engineering (Q3 2025 January 1st - March 31th), Essential-Work: [Data Quality] Implement wiki completeness check for MediaWiki History - https://phabricator.wikimedia.org/T365203#10652209 (Snwachukwu) To solve the missing wikis issue, we decided it's best to automate sqoop list. There are 3 source o...
[14:37:55] Data-Engineering, Data-Engineering-Radar, ConfirmEdit (CAPTCHA extension), MediaWiki-extensions-EventLogging, and 2 others: Send captcha API response data to event logging - https://phabricator.wikimedia.org/T379179#10652210 (acooper)
[14:51:47] Data-Engineering (Q3 2025 January 1st - March 31th), Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 6 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10652390 (dr0ptp4kt) It probably won't matter a...
[14:56:35] Data-Engineering (Q3 2025 January 1st - March 31th), Essential-Work: [Data Quality] Implement wiki completeness check for MediaWiki History - https://phabricator.wikimedia.org/T365203#10652444 (Ottomata) Thanks Sandra! I am not 100% on all the pros and cons of the solutions, but I'm sure you and Dan a...
[15:38:23] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content, Image-Suggestions, Section-Level-Image-Suggestions, Structured-Data-Backlog (Current Work): [SPIKE] Check the Wikimedia content history dataset - https://phabricator.wikimedia.org/T385787#10652695 (Cparle) Here's how...
[15:38:32] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content, Image-Suggestions, Section-Level-Image-Suggestions, Structured-Data-Backlog (Current Work): [SPIKE] Check the Wikimedia content history dataset - https://phabricator.wikimedia.org/T385787#10652707 (Cparle)
[15:39:04] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content, Image-Suggestions, Section-Level-Image-Suggestions, Structured-Data-Backlog (Current Work): [SPIKE] Check the Wikimedia content history dataset - https://phabricator.wikimedia.org/T385787#10652726 (Cparle)
[15:57:19] Data-Engineering (Q3 2025 January 1st - March 31th), Data-Platform-SRE (2025.03.01 - 2025.03.21), Essential-Work: Update canary_events DAG to use an internal domain and/or the service mesh to obtain its eventstream config - https://phabricator.wikimedia.org/T384329#10652890 (Ottomata) @brouberol lots...
[15:58:17] Data-Engineering (Q3 2025 January 1st - March 31th), Data-Platform-SRE (2025.03.01 - 2025.03.21), Essential-Work: Update canary_events DAG to use an internal domain and/or the service mesh to obtain its eventstream config - https://phabricator.wikimedia.org/T384329#10652905 (brouberol) Open→...
[16:07:08] Data-Engineering (Q3 2025 January 1st - March 31th), Patch-For-Review: Timeout hive-metastore locks - https://phabricator.wikimedia.org/T365563#10652962 (Antoine_Quhen) Open→Resolved Resolved in T386114
[16:38:35] Data-Engineering (Q3 2025 January 1st - March 31th), Essential-Work: [Data Quality] Implement wiki completeness check for MediaWiki History - https://phabricator.wikimedia.org/T365203#10653165 (Snwachukwu) Okay @Andrew. Thank you!
[17:19:36] Data-Engineering (Q3 2025 January 1st - March 31th), MW-Interfaces-Team: Make DomainEvents serializable - https://phabricator.wikimedia.org/T379936#10653367 (Ottomata) Open→Invalid We aren't doing any active development work on this, and the plan may change significantly when we do. How this...
[18:05:47] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content, Image-Suggestions, Section-Level-Image-Suggestions, Structured-Data-Backlog (Current Work): [SPIKE] Check the Wikimedia content history dataset - https://phabricator.wikimedia.org/T385787#10653633 (xcollazo) The propo...
[18:11:31] Data-Engineering, DPE-Mediawiki-Content: Use the Spark-Iceberg built in CDC mechanism to PoC a replacement for wikimedia_wikitext_current - https://phabricator.wikimedia.org/T366544#10653687 (xcollazo) Noting here that there are multiple immediate use cases for having a `wmf_content.mediawiki_content_cur...
[18:14:10] Data-Engineering, DPE-Mediawiki-Content: Use the Spark-Iceberg built in CDC mechanism to PoC a replacement for wmf.wikimedia_wikitext_current - https://phabricator.wikimedia.org/T366544#10653743 (xcollazo)
[18:20:05] Data-Engineering, DPE-Mediawiki-Content: Use the Spark-Iceberg built in CDC mechanism to PoC a replacement for wmf.wikimedia_wikitext_current - https://phabricator.wikimedia.org/T366544#10653805 (Ottomata) +1 for `wmf_content.mediawiki_content_current_v1` @JAllemandou how might this look if we did the...
[19:27:48] Data-Engineering, DPE-Mediawiki-Content: Use the Spark-Iceberg built in CDC mechanism to PoC a replacement for wmf.wikimedia_wikitext_current - https://phabricator.wikimedia.org/T366544#10654155 (Ottomata) (BTW, maybe the more [[ https://www.thoughtspot.com/data-trends/data-modeling/star-schema-vs-snowfl...
[20:28:23] Data-Engineering (Q3 2025 January 1st - March 31th), Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 6 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10654324 (xcollazo) >It probably won't matter a...
[21:29:52] (PS2) TChin: Support inserting ResultKey into DeequVerificationSuiteToDataQualityAlerts [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1127964 (https://phabricator.wikimedia.org/T384962)
[21:29:57] (PS2) TChin: Add columns to data_quality_alerts to support inserting ResultKey [analytics/refinery] - https://gerrit.wikimedia.org/r/1127967 (https://phabricator.wikimedia.org/T384962)
[21:46:32] Data-Engineering (Q3 2025 January 1st - March 31th), Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 6 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10654702 (dr0ptp4kt) > Thanks @dr0ptp4kt . Took...
[22:22:07] Data-Engineering, DBA, MediaWiki-Core-Revision-backend, Schema-change: Rethink rev_sha1 field - https://phabricator.wikimedia.org/T389026#10654827 (Ladsgroup) I'm all for dropping the column and compute that from content hash on the fly in the API, I'm sure for revert detection inside mediawiki i...