[02:10:02] Data-Engineering, tech-decision-forum, Event-Platform: MediaWiki Event Carried State Transfer - Problem Statement - https://phabricator.wikimedia.org/T291120#11540079 (Ottomata)
[02:10:06] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Research, Event-Platform, Patch-For-Review: Implement stream of HTML content on mw.page_change event - https://phabricator.wikimedia.org/T360794#11540080 (Ottomata)
[02:23:25] Data-Engineering, Event-Platform: Common event data model for data derived from parsed page revision content - https://phabricator.wikimedia.org/T415158 (Ottomata) NEW
[02:30:56] Data-Engineering, cloud-services-team, Data-Persistence, Data-Services: Add datetime versions of timestamp fields to Wikireplica databases - https://phabricator.wikimedia.org/T414199#11540129 (Huji) >>! In T414199#11511823, @Marostegui wrote: > Maybe you can explore virtual columns? https://maria...
[05:37:16] Data-Engineering, DBA, Schema-change-in-production: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164#11540284 (Marostegui) In s8 it takes around 24 hours per host.
[05:37:47] Data-Engineering, DBA, Schema-change-in-production: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163#11540285 (Marostegui)
[05:38:03] Data-Engineering, DBA, Schema-change-in-production: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163#11540287 (Marostegui)
[05:38:14] Data-Engineering, DBA, Schema-change-in-production: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164#11540288 (Marostegui)
[06:01:53] Data-Engineering, DBA, Schema-change-in-production: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164#11540309 (ops-monitoring-bot) Starting pool of db1160 by marostegui@cumin1003: After schema change
[06:02:04] Data-Engineering, DBA, Schema-change-in-production: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164#11540310 (ops-monitoring-bot) Starting pool of db2179 by marostegui@cumin1003: After schema change
[06:47:16] Data-Engineering, DBA, Schema-change-in-production: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164#11540340 (ops-monitoring-bot) Completed pooling of db1160 by marostegui@cumin1003: After schema change
[06:47:30] Data-Engineering, DBA, Schema-change-in-production: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164#11540341 (ops-monitoring-bot) Completed pooling of db2179 by marostegui@cumin1003: After schema change
[07:51:57] (CR) Joal: [V:+2 C:+2] add translation_difficulty_levelto allow list [analytics/refinery] - https://gerrit.wikimedia.org/r/1207489 (owner: Conniecc1)
[07:52:55] (CR) Joal: [V:+2 C:+2] Add kaj.wikipedia to pageview allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/1229091 (https://phabricator.wikimedia.org/T415038) (owner: Gerrit maintenance bot)
[07:53:11] (CR) Joal: [V:+2 C:+2] Add ppl.wikipedia to pageview allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/1229095 (https://phabricator.wikimedia.org/T415046) (owner: Gerrit maintenance bot)
[07:56:49] (PS1) Joal: Update pageview allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/1229507
[07:58:05] (PS2) Joal: Update pageview allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/1229507
[07:58:34] (CR) A-pizzata: [C:+1] Update pageview allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/1229507 (owner: Joal)
[08:02:25] (CR) Joal: [V:+2 C:+2] Update pageview allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/1229507 (owner: Joal)
[08:22:27] (CR) Joal: "Adding Dan as a reviewer to dicuss this. I don't like this hack. IMO we should try to have the same data in the underlying mediarequest an" [analytics/refinery] - https://gerrit.wikimedia.org/r/1228304 (https://phabricator.wikimedia.org/T198628) (owner: Ladsgroup)
[09:14:26] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Publish Dumps 2 to dumps.wikimedia.org and provide only monthly dumps - https://phabricator.wikimedia.org/T414389#11540482 (Poslovitch) Hi, is this why the mid-month dump run (20260120) has not started?
[09:46:46] Data-Engineering, Event-Platform, Wikimedia-production-error: Wikimedia\Rdbms\DBTransactionError: Transaction round stage must be 'cursory' (not 'within-commit') - https://phabricator.wikimedia.org/T415169 (Aklapper) NEW p:Triage→Unbreak!
[09:50:38] Data-Engineering, Event-Platform, Wikimedia-production-error: Wikimedia\Rdbms\DBTransactionError: Transaction round stage must be 'cursory' (not 'within-commit') - https://phabricator.wikimedia.org/T415169#11540544 (Aklapper)
[10:05:24] Data-Engineering, DBA, Schema-change-in-production: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163#11540572 (Marostegui)
[10:05:28] Data-Engineering, DBA, Schema-change-in-production: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164#11540576 (Marostegui)
[10:11:03] Data-Engineering, cloud-services-team, Data-Persistence, Data-Services: Add datetime versions of timestamp fields to Wikireplica databases - https://phabricator.wikimedia.org/T414199#11540600 (fnegri) > The //real// solution here is to have OLAP databases provided on #data-services which are not...
[10:51:37] Data-Engineering (Q3 FY25/26 January 1st - March 31th), MW-Interfaces-Team, RESTBase-API, ServiceOps new, OKR-Work: AQS Wikimedia REST API - new API version - https://phabricator.wikimedia.org/T407863#11540698 (Blake)
[10:53:21] Data-Engineering (Q3 FY25/26 January 1st - March 31th), MW-Interfaces-Team, RESTBase-API, ServiceOps new, and 2 others: AQS Wikimedia REST API - new API version - https://phabricator.wikimedia.org/T407863#11540700 (Blake)
[11:04:57] Data-Engineering (Q3 FY25/26 January 1st - March 31th), MW-Interfaces-Team, RESTBase-API, ServiceOps new, and 2 others: AQS Wikimedia REST API - new API version - https://phabricator.wikimedia.org/T407863#11540737 (Blake)
[11:13:25] Data-Engineering, cloud-services-team, Data-Persistence, Data-Services: Add datetime versions of timestamp fields to Wikireplica databases - https://phabricator.wikimedia.org/T414199#11540745 (Ladsgroup) Yeah, to me this feels a bit of x/y problem. A proper solution would be having an OLAP infra...
[11:16:30] (CR) Ladsgroup: "I talked to Dan about it yesterday, I was planning to create a ticket. His suggested way is to add a column in HDFS like "import_to_aqs" w" [analytics/refinery] - https://gerrit.wikimedia.org/r/1228304 (https://phabricator.wikimedia.org/T198628) (owner: Ladsgroup)
[12:54:49] Data-Engineering (Q2 FY25/26 October 1st - December 31th): Test the dbt+skein approach to running dbt Spark jobs in K8s - https://phabricator.wikimedia.org/T414784#11541060 (amastilovic) >>! In T414784#11537990, @JMonton-WMF wrote: > - About `profiles.yml`, I think we could consider the `profiles.yml` a defa...
[12:57:50] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Test the dbt+skein approach to running dbt Spark jobs in K8s - https://phabricator.wikimedia.org/T414784#11541076 (amastilovic)
[12:59:07] !log Test Kitchen mw-user experiment (poll 29665) - adds: none; removes: growthexperiments-revise-tone; fields: none - xLab/MPIC/TK tips at https://w.wiki/FwuD
[12:59:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:36:47] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Test the dbt+skein approach to running dbt Spark jobs in K8s - https://phabricator.wikimedia.org/T414784#11541169 (amastilovic) >>! In T414784#11538893, @Ottomata wrote: > I know close to zero about dbt, but if dbt is launching a spark job, then these a...
[13:38:32] Data-Engineering (Q3 FY25/26 January 1st - March 31th), MediaWiki-Page-derived-data, OKR-Work: Global Editor Metrics - backfill pageview metric data - https://phabricator.wikimedia.org/T405040#11541170 (amastilovic) Open→Resolved
[13:49:02] Data-Engineering, Datasets-General-or-Unknown: Get dump mirrors to use new dumps-rsync service name - https://phabricator.wikimedia.org/T415193 (taavi) NEW
[13:50:10] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Create a `DbtSkeinOperator` in the Airflow `wmf_airflow_common` library - https://phabricator.wikimedia.org/T415194 (amastilovic) NEW
[13:54:39] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Review SLIS image suggestion pipeline - https://phabricator.wikimedia.org/T415195 (APizzata-WMF) NEW
[14:04:05] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Create a `DbtSkeinOperator` in the Airflow `wmf_airflow_common` library - https://phabricator.wikimedia.org/T415194#11541296 (amastilovic)
[14:04:15] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Movement-Insights, Data-Platform-SRE (2026.01.05 - 2026.01.23), Essential-Work, Patch-For-Review: Run dbt from Airflow - https://phabricator.wikimedia.org/T410268#11541297 (amastilovic)
[14:50:46] Data-Engineering, cloud-services-team, Data-Persistence, Data-Services: Add datetime versions of timestamp fields to Wikireplica databases - https://phabricator.wikimedia.org/T414199#11542026 (Ottomata) +1 to #Data-Services OLAP infra. BTW, We started a project in 2018 to do this, but canned i...
[14:54:19] Data-Engineering, cloud-services-team, Data-Persistence, Data-Services: Add datetime versions of timestamp fields to Wikireplica databases - https://phabricator.wikimedia.org/T414199#11542059 (taavi) Open→Declined OLAP infrastructure is tracked elsewhere as indicated above. It seems l...
[14:59:20] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Create a `DbtSkeinOperator` in the Airflow `wmf_airflow_common` library - https://phabricator.wikimedia.org/T415194#11542081 (Ottomata) Suggestion: Instead a new `DbtSkeinOperator` inheriting from `SimpleSkeinOperator`, consider making a `DbtOperator` t...
[15:03:00] Data-Engineering, cloud-services-team, Data-Persistence, Data-Services: Add datetime versions of timestamp fields to Wikireplica databases - https://phabricator.wikimedia.org/T414199#11542118 (Ottomata) Suggestion: If OLAP infra in Cloud Services is something a significant portion of the deve...
[15:04:20] Data-Engineering, Data-Engineering-Icebox, cloud-services-team, Data-Services, Epic: Plan a replacement for wiki replicas that is better suited to typical OLAP use cases than the MediaWiki OLTP schema - https://phabricator.wikimedia.org/T215858#11542120 (Ottomata) Suggestion: If OLAP infra in...
[15:07:22] Data-Engineering, AQS2.0: Introduce a new AQS endpoint to expose video plays - https://phabricator.wikimedia.org/T415202 (Ladsgroup) NEW
[15:07:49] Analytics, Data-Engineering, Test Kitchen, Patch-For-Review: Count the number of video plays - https://phabricator.wikimedia.org/T198628#11542143 (Ladsgroup)
[15:17:26] Data-Engineering, Event-Platform: Common event data model for data derived from parsed page revision content - https://phabricator.wikimedia.org/T415158#11542181 (Ottomata)
[15:50:18] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Create a `DbtSkeinOperator` in the Airflow `wmf_airflow_common` library - https://phabricator.wikimedia.org/T415194#11542366 (amastilovic) @Ottomata I'm looking into that, thanks for the suggestion!
[15:59:15] Data-Engineering, Infrastructure-Foundations, Traffic, Patch-For-Review: Export development_network_probe data to Puppet servers for CDN deployment - https://phabricator.wikimedia.org/T402512#11542398 (elukey) Thanks a lot for the detailed explanation Ben! I tried to work on the Puppet part in ht...
[16:04:48] Data-Engineering, Data-Platform, Moderator-Tools-Team, Product-Analytics (Kanban): Personal Dashboard Instrumentation Superset Dashboard - https://phabricator.wikimedia.org/T412137#11542410 (MNeisler) a:MNeisler
[16:34:04] FIRING: GobblinKafkaRecordsExtractedNotEqualRecordsExpected: Gobblin job webrequest_sampled ingested an unexpected number of records for a Kafka topic partition. ...
[16:34:08] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest_sampled&var-kafka_topic=webrequest_sampled&viewPanel=24 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected
[16:40:21] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for kareid - https://phabricator.wikimedia.org/T413364#11542548 (KReid-WMF) Hi - I've checked and I'm able to log in and see the test kitchen staging environment. Thanks!
[16:44:03] RESOLVED: GobblinKafkaRecordsExtractedNotEqualRecordsExpected: Gobblin job webrequest_sampled ingested an unexpected number of records for a Kafka topic partition. ...
[16:44:03] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest_sampled&var-kafka_topic=webrequest_sampled&viewPanel=24 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected
[16:45:03] FIRING: GobblinKafkaRecordsExtractedNotEqualRecordsExpected: Gobblin job webrequest_sampled ingested an unexpected number of records for a Kafka topic partition. ...
[16:45:04] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest_sampled&var-kafka_topic=webrequest_sampled&viewPanel=24 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected
[16:50:03] FIRING: [2x] GobblinKafkaRecordsExtractedNotEqualRecordsExpected: Gobblin job webrequest_sampled ingested an unexpected number of records for a Kafka topic partition. ...
[16:50:03] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest_sampled&var-kafka_topic=webrequest_sampled&viewPanel=24 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected
[17:02:49] Data-Engineering, MediaWiki-extensions-EventLogging, Test Kitchen: Deprecated and remove mw.eventLog.submitClick() - https://phabricator.wikimedia.org/T415210 (Sfaci) NEW
[17:03:02] Data-Engineering, MediaWiki-extensions-EventLogging, Test Kitchen: Deprecated and remove mw.eventLog.submitClick() - https://phabricator.wikimedia.org/T415210#11542625 (Sfaci)
[17:03:05] Data-Engineering, MediaWiki-extensions-EventLogging, Test Kitchen, Essential-Work, Goal: [GOAL] Tidy up EventLogging - https://phabricator.wikimedia.org/T408059#11542626 (Sfaci)
[17:06:12] Data-Engineering, MediaWiki-extensions-EventLogging, Test Kitchen: Deprecated and remove mw.eventLog.submitClick() - https://phabricator.wikimedia.org/T415210#11542641 (Sfaci)
[17:10:25] !log Test Kitchen mw-user experiment (poll 30413) - adds: growthexperiments-revise-tone; removes: none; fields: none - xLab/MPIC/TK tips at https://w.wiki/FwuD
[17:10:26] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:24:16] Data-Engineering, Event-Platform: Common event data model for data derived from parsed page revision content - https://phabricator.wikimedia.org/T415158#11542707 (Ottomata)
[17:24:30] Data-Engineering, tech-decision-forum, Event-Platform: MediaWiki Event Carried State Transfer - Problem Statement - https://phabricator.wikimedia.org/T291120#11542708 (Ottomata)
[17:24:31] Data-Engineering, Event-Platform: Common event data model for data derived from parsed page revision content - https://phabricator.wikimedia.org/T415158#11542709 (Ottomata)
[17:24:39] Data-Engineering, Machine-Learning-Team, Event-Platform: Create new mediawiki.page_links_change stream based on fragment/mediawiki/state/change/page - https://phabricator.wikimedia.org/T331399#11542711 (Ottomata)
[17:24:42] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Research, Event-Platform, Patch-For-Review: Implement stream of HTML content on mw.page_change event - https://phabricator.wikimedia.org/T360794#11542710 (Ottomata)
[17:25:03] FIRING: [2x] GobblinKafkaRecordsExtractedNotEqualRecordsExpected: Gobblin job webrequest_sampled ingested an unexpected number of records for a Kafka topic partition. ...
[17:25:03] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest_sampled&var-kafka_topic=webrequest_sampled&viewPanel=24 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected
[17:26:06] Data-Engineering, Content-Transform-Team, MW-Interfaces-Team, Event-Platform: Common event data model for data derived from parsed page revision content - https://phabricator.wikimedia.org/T415158#11542715 (Ottomata)
[18:10:58] Data-Engineering, Event-Platform, Wikimedia-Performance-recommendation: StreamConfig::validate() eating 0.5% of index.php time - https://phabricator.wikimedia.org/T413350#11542799 (ori) Open→Resolved a:ori Before [[ https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventStreamConf...
[18:17:17] Data-Engineering, Event-Platform, Wikimedia-Performance-recommendation: StreamConfig::validate() eating 0.5% of index.php time - https://phabricator.wikimedia.org/T413350#11542819 (Ottomata) Hooray! Thank you!
[18:50:03] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for kareid - https://phabricator.wikimedia.org/T413364#11542936 (FCeratto-WMF) In progress→Resolved Thanks, closing task.
[19:03:42] Data-Engineering: Secret management on airflow for the automated transfer of (public) datasets from stats infra --> WME AWS - https://phabricator.wikimedia.org/T415208#11543026 (Aklapper)
[19:17:59] Data-Engineering, cloud-services-team, Data-Persistence, Data-Services, and 3 others: Set up x1 replication to Wiki Replicas - https://phabricator.wikimedia.org/T395881#11543080 (Ladsgroup) Open→Stalled Blocked on {T415219}
[20:21:13] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Clean up artifacts.yaml - https://phabricator.wikimedia.org/T405379#11543267 (Ahoelzl) a:Snwachukwu
[20:51:49] Data-Engineering, MediaWiki-General: Update pingback MediaWiki versions to include new values - https://phabricator.wikimedia.org/T413349#11543337 (xcollazo) We ran into a couple issues trying to backfill: * The SQL UNION ALLs existing data with the newly calculated data. This means the SQL expects the...
[20:52:44] Data-Engineering, MediaWiki-General: Update pingback MediaWiki versions to include new values - https://phabricator.wikimedia.org/T413349#11543338 (xcollazo) After fixes, the backfill is running well with: ` airflow dags backfill --reset-dagruns --start-date 2025-05-01 --end-date 2026-01-20 pingback_rep...
[20:54:19] Data-Engineering, MediaWiki-extensions-EventLogging, Test Kitchen, Essential-Work, Goal: [GOAL] Tidy up EventLogging - https://phabricator.wikimedia.org/T408059#11543345 (cjming) just to be crystal about what is being deprecated/removed from EventLogging, are we agreeing the following will be...
[20:56:26] Data-Engineering (Q3 FY25/26 January 1st - March 31th), MediaWiki-General: Update pingback MediaWiki versions to include new values - https://phabricator.wikimedia.org/T413349#11543351 (xcollazo) Open→In progress p:Triage→Medium a:xcollazo
[21:21:48] Data-Engineering (Q3 FY25/26 January 1st - March 31th), MediaWiki-General: Update pingback MediaWiki versions to include new values - https://phabricator.wikimedia.org/T413349#11543401 (cicalese) @xcollazo Thank you so much for your work on this! I appreciate it! I'm not a huge fan of the current SQL qu...
[21:25:18] FIRING: GobblinKafkaRecordsExtractedNotEqualRecordsExpected: Gobblin job webrequest_sampled ingested an unexpected number of records for a Kafka topic partition. ...
[21:25:18] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest_sampled&var-kafka_topic=webrequest_sampled&viewPanel=24 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected