[04:49:56] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History: Add row_update_dt watermark column to mediawiki_history_incremental_v1 - https://phabricator.wikimedia.org/T428503#12008102 (AKhatun_WMF) Q: Are the daily back-patch MERGEs in MWHistoryDeltaWriter meant to guard again...
[04:50:13] Data-Engineering (Q4 FS25/26 April 1st - June 30st): WE5.3.3b: Contributor Count Per Page [Attribution API] - https://phabricator.wikimedia.org/T426316#12008105 (AKhatun_WMF) ### Date Interval Consideration: Incremental MWH data interval is (T428503#12003010): - Daily runs = `data_interval_end` (Data for the...
[05:31:36] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History: Add row_update_dt watermark column to mediawiki_history_incremental_v1 - https://phabricator.wikimedia.org/T428503#12008157 (AKhatun_WMF) There seems to be one missed dat time format thing in control map: https://gerr...
[08:39:52] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History, Patch-For-Review: Accelerate sqoop landing for MediaWiki History private tables - https://phabricator.wikimedia.org/T424355#12008692 (APizzata-WMF) the task requests: > Verify savings over two monthly runs before...
[09:26:09] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Enable Airflow DAG trigger config dialog by default - https://phabricator.wikimedia.org/T428872 (amastilovic) NEW
[09:26:33] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Enable Airflow DAG trigger config dialog by default - https://phabricator.wikimedia.org/T428872#12008895 (amastilovic)
[09:48:29] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE, Traffic: Provide a scheduled data download service from Google Cloud Storage - https://phabricator.wikimedia.org/T427457#12008995 (Gehel)
[12:06:50] Data-Engineering: WMF Data Engineering Request: Additional Columns to Geoeditors Tables - https://phabricator.wikimedia.org/T428888 (catherine.kelsey.wmde) NEW
[12:09:15] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Display DbtSkeinOperator Skein resource config in task notes - https://phabricator.wikimedia.org/T428889 (amastilovic) NEW
[12:20:30] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE, Traffic: Provide a scheduled data download service from Google Cloud Storage - https://phabricator.wikimedia.org/T427457#12009615 (ayounsi) @Antoine_Quhen Our network can 100% handle that kind of load, but we have some question...
[12:38:12] Data-Engineering, Sustainability: Create a wmf.webrequest_sampled_128 Hive table - https://phabricator.wikimedia.org/T427978#12009712 (mforns) @Ottomata Do you mean that we should sample after webrequest 2.0? Or that webrequest 2.0 should be good enough to query without creating the sampled table? I...
[13:38:26] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History, Patch-For-Review: Accelerate sqoop landing for MediaWiki History private tables - https://phabricator.wikimedia.org/T424355#12010036 (xcollazo) > We can wait the beginning of July and close the task? Ah yes, make...
[13:40:45] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Delete `wmf_raw.mediawiki_content` table after a few months - https://phabricator.wikimedia.org/T427441#12010047 (xcollazo)
[13:40:51] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History, Patch-For-Review: Accelerate sqoop landing for MediaWiki History private tables - https://phabricator.wikimedia.org/T424355#12010049 (xcollazo)
[13:49:20] Data-Engineering (Q4 FS25/26 April 1st - June 30st): WE5.3.3b: Contributor Count Per Page [Attribution API] - https://phabricator.wikimedia.org/T426316#12010075 (Ottomata) > I am also wondering if we even need a updated_at and if only keeping a loaded_at works? Maybe not, but if it is easy to keep `updated_...
[13:54:22] Data-Engineering, Data-Engineering-Radar, Content-Transform-Team, MW-Interfaces-Team, and 2 others: Expose MediaWiki Parser render_id as a response header in relevant MW REST API endpoints - https://phabricator.wikimedia.org/T418792#12010105 (Ottomata) @cscott if not too late, would you consider...
[13:59:07] Data-Engineering, Sustainability: Create a wmf.webrequest_sampled_128 Hive table - https://phabricator.wikimedia.org/T427978#12010149 (Ottomata) I don't know what I mean! If we had Spark 4 maybe https://issues.apache.org/jira/browse/SPARK-55978 is all we would need? > The latter, we would have to...
[14:12:09] Data-Engineering, DPE-MediaWiki-Incremental-History, MW-Interfaces-Team: MediaWiki DomainEvents - Include LogEntry - https://phabricator.wikimedia.org/T427815#12010255 (xcollazo) @Ottomata did you intent to move this one to Phase II?
[14:50:09] Data-Engineering, Data-Engineering-Radar, Content-Transform-Team, MW-Interfaces-Team, and 2 others: Expose MediaWiki Parser render_id as a response header in relevant MW REST API endpoints - https://phabricator.wikimedia.org/T418792#12010419 (cscott) >>! In T418792#12010105, @Ottomata wrote: > @c...
[14:52:59] Data-Engineering, DBA, DiscussionTools, Schema-change-in-production: Drop DiscussionTools database tables from wikis which don't need them - https://phabricator.wikimedia.org/T426341#12010437 (JAllemandou) This will cause a problem indeed: sqoop will try to gather tables that don't exist and fail...
[14:54:35] Data-Engineering, Data-Engineering-Radar, Content-Transform-Team, MW-Interfaces-Team, and 2 others: Expose MediaWiki Parser render_id as a response header in relevant MW REST API endpoints - https://phabricator.wikimedia.org/T418792#12010450 (Ottomata) Thank you!
[14:55:56] Data-Engineering, DBA, DiscussionTools, Schema-change-in-production: Adapt sqoop configs to account for discussiontools tables only present on some wiki databases - https://phabricator.wikimedia.org/T428916 (Ottomata) NEW
[14:56:27] Data-Engineering, DBA, DiscussionTools, Schema-change-in-production: Drop DiscussionTools database tables from wikis which don't need them - https://phabricator.wikimedia.org/T426341#12010468 (Ottomata) Thank you, I created a subtask for us. @Dreamy_Jazz what is the timeline for this change?
[14:56:57] Data-Engineering, DBA, DiscussionTools, Schema-change-in-production: Drop DiscussionTools database tables from wikis which don't need them - https://phabricator.wikimedia.org/T426341#12010469 (Dreamy_Jazz) No idea what the timeline is, that's for #dba to decide
[15:02:30] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Metrics-Sprint-2026-2027, Patch-For-Review: DE3.1 - Logged-out Wikipedia reader 21-day retention on web - https://phabricator.wikimedia.org/T424706#12010508 (Milimetric)
[15:02:46] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Metrics-Sprint-2026-2027, Patch-For-Review: DE3.1 - Logged-out Wikipedia reader 21-day retention on web - https://phabricator.wikimedia.org/T424706#12010509 (Milimetric)
[15:10:08] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Metrics-Sprint-2026-2027: DE3.2 - Logged-in Wikipedia reader 2nd week retention on web - https://phabricator.wikimedia.org/T424708#12010564 (Milimetric)
[15:14:47] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Metrics-Sprint-2026-2027, Patch-For-Review: DE3.1 - Logged-out Wikipedia reader 21-day retention on web - https://phabricator.wikimedia.org/T424706#12010596 (Milimetric)
[15:15:20] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Metrics-Sprint-2026-2027, Patch-For-Review: DE3.1 - Logged-out Wikipedia 21-day retention on mobile web - https://phabricator.wikimedia.org/T424706#12010610 (Milimetric)
[15:28:08] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Metrics-Sprint-2026-2027, Patch-For-Review: DE3.1 - Logged-out Wikipedia 21-day retention on mobile web - https://phabricator.wikimedia.org/T424706#12010704 (Milimetric)
[15:29:39] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Metrics-Sprint-2026-2027: DE3.2 - Logged-in Wikipedia reader 2nd week retention on web - https://phabricator.wikimedia.org/T424708#12010733 (Milimetric)
[15:38:55] Data-Engineering, Sustainability: Create a wmf.webrequest_sampled_128 Hive table - https://phabricator.wikimedia.org/T427978#12010770 (xcollazo) Note that TABLESAMPLE is not deterministic, so we may want to do our own way of sampling here so that reruns are repeatable? (Also this task is closed as d...
[15:46:10] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History, Epic, Patch-For-Review: Incremental MediaWiki History Phase I - https://phabricator.wikimedia.org/T424350#12010828 (xcollazo)
[15:46:54] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE, Traffic: Provide a scheduled data download service from Google Cloud Storage - https://phabricator.wikimedia.org/T427457#12010832 (Gehel) >>! In T427457#12009615, @ayounsi wrote: > * What's the timeline for the project ? Ideally...
[15:54:07] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History, Patch-For-Review: Airflow DAGs for mediawiki_history_incremental_v1 writers - https://phabricator.wikimedia.org/T425730#12010882 (xcollazo) Both DAGs WAD in production at https://airflow.wikimedia.org/home?tag...
[15:54:15] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History, Patch-For-Review: Airflow DAGs for mediawiki_history_incremental_v1 writers - https://phabricator.wikimedia.org/T425730#12010883 (xcollazo) In progress→Resolved
[15:55:48] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History, Epic, Patch-For-Review: Incremental MediaWiki History Phase I - https://phabricator.wikimedia.org/T424350#12010905 (APizzata-WMF)
[16:09:36] FIRING: [3x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[16:09:36] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[16:12:01] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History: Use Iceberg branches (i.e WAP) to write atomically - https://phabricator.wikimedia.org/T428288#12011011 (xcollazo) Open→Resolved WAD in production. Love the new branching pattern, we should do similarly fo...
[17:09:03] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE, Traffic: Provide a scheduled data download service from Google Cloud Storage - https://phabricator.wikimedia.org/T427457#12011346 (BCornwall) @Gehel Thanks for looping in traffic. If I'm reading this correctly, the Hadoop cluster...
[17:50:29] Data-Engineering, Test Kitchen: Master's Thesis Proposal: Contributing to Wikimedia's Data Platform - https://phabricator.wikimedia.org/T428674#12011517 (Ahoelzl) Thanks @JaimeAvaloss for reaching out. We are discussing your request internally and will get back to you shortly. Do you have a timeline in m...
[17:59:00] (PS1) Xcollazo: MWHistoryDeltaWriter: fix CAST format bug and stale comment [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1300876 (https://phabricator.wikimedia.org/T428503)
[18:02:07] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History, Patch-For-Review: Add row_update_dt watermark column to mediawiki_history_incremental_v1 - https://phabricator.wikimedia.org/T428503#12011593 (xcollazo) >>! In T428503#12008157, @AKhatun_WMF wrote: > There seems t...
[18:03:04] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History: Convert `control_map MAP` to a typed struct in `mediawiki_history_incremental_v1` - https://phabricator.wikimedia.org/T427862#12011596 (xcollazo) Declined→Open Re-opening and attaching to Phase...
[18:03:18] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History: Convert `control_map MAP` to a typed struct in `mediawiki_history_incremental_v1` - https://phabricator.wikimedia.org/T427862#12011601 (xcollazo)
[18:03:23] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History, Epic, Patch-For-Review: Incremental MediaWiki History Phase I - https://phabricator.wikimedia.org/T424350#12011603 (xcollazo)
[18:07:12] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History: Quality verification for mediawiki_history_incremental_v1 using Iceberg time travel - https://phabricator.wikimedia.org/T425734#12011609 (xcollazo) a:APizzata-WMF For seeing this work thru, I am boldly assigning i...
[18:29:38] (CR) Joal: [C:+1] "LGTM" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1300876 (https://phabricator.wikimedia.org/T428503) (owner: Xcollazo)
[18:32:30] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Metrics-Sprint-2026-2027: DE3.2 - Logged-in Wikipedia reader 2nd week retention on web - https://phabricator.wikimedia.org/T424708#12011708 (Milimetric)
[18:34:36] FIRING: [3x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[18:34:36] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[18:39:36] FIRING: [3x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[18:39:36] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[19:26:31] Data-Engineering (Q4 FS25/26 April 1st - June 30st): [dbt] Contributor metrics for May off by 100x - https://phabricator.wikimedia.org/T428785#12011862 (amastilovic) OK so we've investigated the issue and can provide a comprehensive explanation of how the data ended up the way it did. ## Post-mortem summary...
[19:55:40] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic, Patch-For-Review: Add X-Provenance data to webrequest_sampled_live - https://phabricator.wikimedia.org/T427068#12011986 (CDanis) The MR looks good to me! I'm happy to help with the haproxy patch portion of possibility 1, which is my prefe...
[20:25:28] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History: Review remaining 90-day windows in MWHistoryDeltaWriter - https://phabricator.wikimedia.org/T428961 (xcollazo) NEW
[20:26:17] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History: Review remaining 90-day windows in MWHistoryDeltaWriter - https://phabricator.wikimedia.org/T428961#12012087 (xcollazo) Opening this as part of Phase I for consideration, but I think it would be totally fine if we mov...
[20:29:57] (CR) AKhatun: [C:+1] MWHistoryDeltaWriter: fix CAST format bug and stale comment [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1300876 (https://phabricator.wikimedia.org/T428503) (owner: Xcollazo)
[20:39:26] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History: Right-size Spark resource config using History Server data - https://phabricator.wikimedia.org/T428966 (xcollazo) NEW
[20:39:39] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History: Right-size Spark resource config using History Server data - https://phabricator.wikimedia.org/T428966#12012167 (xcollazo) Opening this as part of Phase I for consideration, but I think it would be totally fine if we...
[21:10:32] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History: Add source='snapshot+events' for incrementally-enriched snapshot rows - https://phabricator.wikimedia.org/T428969 (xcollazo) NEW
[21:13:47] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History: Add source='snapshot+events' for incrementally-enriched snapshot rows - https://phabricator.wikimedia.org/T428969#12012303 (xcollazo) While drafting this ticket, I realized that an implementation of this is significan...
[21:15:07] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History: Add source='snapshot+events' for incrementally-enriched snapshot rows - https://phabricator.wikimedia.org/T428969#12012305 (xcollazo) CC @AKhatun_WMF, as FYI, and to acknowledge that the whole `row_update_dt` dance fr...
[21:17:52] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Bliviero - https://phabricator.wikimedia.org/T428815#12012309 (BCornwall)
[21:22:24] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Bliviero - https://phabricator.wikimedia.org/T428815#12012317 (BCornwall)
[21:24:59] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Bliviero - https://phabricator.wikimedia.org/T428815#12012333 (BCornwall) Open→Resolved a:BCornwall Hi, @BLiviero-WMF! The access has been granted and should be in ef...
[21:38:36] Data-Engineering (Q4 FS25/26 April 1st - June 30st): [dbt] Contributor metrics for May off by 100x - https://phabricator.wikimedia.org/T428785#12012370 (Ahoelzl) Good findings. Some follow ups: 1. For the interval convention, would a dbt macro `half_open_month(ds)` make sense? 2. Let's use this incident to...
[21:38:52] Data-Engineering (Q4 FS25/26 April 1st - June 30st): [dbt] Contributor metrics for May off by 100x - https://phabricator.wikimedia.org/T428785#12012373 (Ahoelzl) p:Triage→High
[21:49:57] Data-Engineering: Automate ingestion of netflow event stream - https://phabricator.wikimedia.org/T248865#12012417 (BCornwall)
[22:39:36] FIRING: [3x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[22:39:36] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[23:49:36] FIRING: [3x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[23:49:36] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[23:59:36] FIRING: [3x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[23:59:36] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag