[08:58:19] Data-Engineering (Sprint 9): We should provide DQ integration with Python - https://phabricator.wikimedia.org/T353940#9610127 (gmodena)
[10:15:12] puredata
[10:15:29] woops :)
[10:36:24] Data-Engineering (Sprint 9): We should provide DQ integration with Python - https://phabricator.wikimedia.org/T353940#9610458 (gmodena) We can integrate our DQ framework with Python by piggy backing on `pyspark` 's py4j gateway. Following is a rudimentary example that produces metrics `data_quality_metrics`...
[13:25:06] Data-Engineering, Data Pipelines, SRE, Traffic-Icebox: Mobile redirects drop provenance parameters - https://phabricator.wikimedia.org/T252227#9611197 (Isaac) Very excited to see this gaining some traction (thanks @mpopov and @dr0ptp4kt)! Commenting on the analytics side of things (I don't know e...
[13:26:42] I have prepared a patch to switch AQS services to use the Feb snapshot of mediawiki_history_reduced https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1009389
[13:26:49] joal: or anyone ^^
[13:33:41] +1ed btullis :)
[13:56:27] Thank you.
[13:59:13] joal: Despite my message earlier, I haven't had a chance to investigate that data loss warning yet. Bit swamped.
[14:01:08] !log deploying updated mediwiki_history_reduced snapshots to AQS 2.0
[14:01:10] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:03:20] (CR) Gmodena: Mediawiki History Data Quality Metrics (7 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1008934 (https://phabricator.wikimedia.org/T354692) (owner: Snwachukwu)
[15:57:35] (KafkaReplicationFactorTooLow) firing: (2) Kafka topic eqiad.change-prop.retry.change-prop.transcludes.resource_change replication factor is too low on main-eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration#Increase_a_topic's_replication_factor - https://alerts.wikimedia.org/?q=alertname%3DKafkaReplicationFactorTooLow
[15:57:50] (KafkaReplicationFactorTooLow) firing: (17) Kafka topic eventlogging_DesktopWebUIActionsTracking replication factor is too low on jumbo-eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration#Increase_a_topic's_replication_factor - https://alerts.wikimedia.org/?q=alertname%3DKafkaReplicationFactorTooLow
[16:02:39] (KafkaReplicationFactorTooLow) resolved: (378) Kafka topic _schemas replication factor is too low on jumbo-eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration#Increase_a_topic's_replication_factor - https://alerts.wikimedia.org/?q=alertname%3DKafkaReplicationFactorTooLow
[17:08:48] (PS3) Snwachukwu: Mediawiki History Data Quality Metrics [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1008934 (https://phabricator.wikimedia.org/T354692)
[18:14:28] Data-Engineering (Sprint 9): [Data Quality] Define concept for Alerting in coordination with SRE - https://phabricator.wikimedia.org/T351093#9612780 (Ahoelzl) SRE / Bryan is following up on that.
[19:04:53] (CR) Snwachukwu: [C: +2] Add DataPivoter job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/995271 (https://phabricator.wikimedia.org/T354552) (owner: Snwachukwu)
[19:14:53] (Merged) jenkins-bot: Add DataPivoter job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/995271 (https://phabricator.wikimedia.org/T354552) (owner: Snwachukwu)
[19:36:26] Data-Engineering (Sprint 9): We should provide DQ integration with Python - https://phabricator.wikimedia.org/T353940#9613196 (xcollazo) IIUC, the necessity for py4j is only tied to the fact that we developed helper code like the case of `HivePartition` and `DeequAnalyzersToDataQualityMetrics` that we'd like...
[19:52:32] Data-Engineering, Wikibase change dispatching scripts to jobs, Wikidata Change Dispatching & Watchlists, serviceops-radar: Better observability/visualization for MediaWiki jobs - https://phabricator.wikimedia.org/T291620#9613249 (Michael)