[00:14:45] Data-Engineering: Improve spider detection in the webrequest refinery pipeline - https://phabricator.wikimedia.org/T394794#10842283 (nshahquinn-wmf) @Ahoelzl this definitely sounds like important maintenance that we need to do regardless of what future work we plan for page views. It may cause a noticeable s...
[01:39:01] Data-Engineering, Data-Engineering-Icebox, Data-Engineering-Wikistats, Data Pipelines, and 4 others: Merge ks-Arab and ks-Deva to ks - https://phabricator.wikimedia.org/T314476#10842408 (srishakatux) p:High→Medium a:srishakatux→None
[07:34:59] Morning team! I’ll be joining in a bit, my Mac decided it was a good time to upgrade
[07:55:04] Quarry, cloud-services-team, Data-Services: Quarry WMCloud (ruwiki_p, section s6) experiencing sustained replication lag (~16 h) - https://phabricator.wikimedia.org/T394859#10842734 (taavi) This is due to a hardware issue with one of the hosts involved in the replication chain to the wiki replicas: {...
[08:02:14] (CR) Peter Fischer: "There's no need to re-release, the package has already been imported, see, for example, the parent: https://gitlab.wikimedia.org/repos/wmf" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1147771 (https://phabricator.wikimedia.org/T367405) (owner: Peter Fischer)
[08:49:19] Data-Engineering: Improve spider detection in the webrequest refinery pipeline - https://phabricator.wikimedia.org/T394794#10843013 (Joe) >>! In T394794#10840834, @JAllemandou wrote: > I think that for WE 5.4 the plan to improve `spider` accuracy is great and we should implement it, and I also think we need...
[09:30:11] Data-Engineering: Improve spider detection in the webrequest refinery pipeline - https://phabricator.wikimedia.org/T394794#10843178 (JAllemandou) >>! In T394794#10843013, @Joe wrote: > I might propose some changes to the heuristics based on our experience dealing with "concealed" bots over the last couple ye...
[09:54:12] Data-Engineering (Q4 2025 April 1st - June 30th), MediaWiki-DomainEvents, Epic, Event-Platform, and 2 others: Testing the domain event refactoring with production data - https://phabricator.wikimedia.org/T394899 (gmodena) NEW
[11:40:19] Data-Engineering, Data-Platform-SRE, Java-Scala-Standardization, Discovery-Search (2025.05.02 - 2025.05.23), Patch-For-Review: Migrate existing Java packages to deploying to Gitlab, including new version of parent pom, validation that all depen... - https://phabricator.wikimedia.org/T367405#10843727
[11:40:34] Data-Engineering, Wikidata, Wikidata Analytics (Kanban): Wikidata editor numbers for stats.wikimedia and Grafana are dramatically different - https://phabricator.wikimedia.org/T394770#10843730 (AndrewTavis_WMDE) Open→Resolved a:AndrewTavis_WMDE https://phabricator.wikimedia.org/T39359...
[13:30:00] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Add data quality metrics to mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T392494#10844228 (xcollazo) >>! In T392494#10835951, @xcollazo wrote: > Ok we now have DQ tests in production. > > Will wait until a succe...
[13:49:22] Data-Engineering, Data-Persistence, Data-Platform, Data-Platform-SRE, and 3 others: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10844267 (fnegri) I would suggest splitting the `an-redacteddb1001` upgrade to a separate task. It can be a sub-task of this one,...
[14:02:01] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Add data quality metrics to mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T392494#10844319 (xcollazo) Ok the issue from T392494#10844228 is a class mismatch that will likely have to wait. We currently use the jav...
[14:13:42] Data-Engineering (Q4 2025 April 1st - June 30th), Traffic, Patch-For-Review: Clean-up varnishkafka webrequest leftovers in Hadoop-world - https://phabricator.wikimedia.org/T394011#10844393 (JAllemandou)
[14:30:39] Data-Engineering, Data-Persistence, Data-Platform, Data-Platform-SRE, and 3 others: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10844464 (JAllemandou) >>! In T394372#10840706, @Ahoelzl wrote: > @JAllemandou any implications for the Data Platform / sqooping?...
[15:14:56] Data-Engineering, Data-Persistence, Data-Platform, Data-Platform-SRE, and 2 others: an-redacteddb1001: upgrade MariaDB to 10.11 - https://phabricator.wikimedia.org/T394930 (fnegri) NEW
[15:18:42] Data-Engineering, Data-Persistence, Data-Platform, Data-Platform-SRE, and 3 others: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10844756 (fnegri) > As you prefer. If better for you, you can start by the an-redacteddb1001 :) No preference really, if it's up...
[15:23:32] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[16:33:52] Data-Engineering, Data-Platform-SRE, Java-Scala-Standardization, Discovery-Search (2025.05.02 - 2025.05.23), Patch-For-Review: Migrate existing Java packages to deploying to Gitlab, including new version of parent pom, validation that all depen... - https://phabricator.wikimedia.org/T367405#10845129
[19:23:32] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[19:47:03] Quarry, cloud-services-team, Data-Services: Quarry WMCloud (ruwiki_p, section s6) experiencing sustained replication lag (~16 h) - https://phabricator.wikimedia.org/T394859#10845620 (Marostegui) The server has been fixed and it is now slowly catching up. @Voyagerim I would like to understand where th...
[21:02:25] @log Deploy Airflow artifact for T392494 and T394310.
[21:02:26] T392494: Add data quality metrics to mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T392494
[21:02:35] !log Deploy Airflow artifact for T392494 and T394310.
[21:02:38] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:03:45] Data-Engineering (Q4 2025 April 1st - June 30th): 2025-04-01 run of mediawiki_wikitext_history is stuck (20d running) - https://phabricator.wikimedia.org/T394954 (Ahoelzl) NEW
[21:13:50] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE: Provide tooling to instantiate ad-hoc temporary Airflow DEV environments - https://phabricator.wikimedia.org/T393521#10845879 (Ahoelzl)
[21:48:22] Quarry, cloud-services-team, Data-Services: Quarry WMCloud (ruwiki_p, section s6) experiencing sustained replication lag (~16 h) - https://phabricator.wikimedia.org/T394859#10845998 (Voyagerim) @Marostegui , expectations regarding replication latency thresholds - namely that web replicas should maint...
[22:10:20] Data-Engineering (Q4 2025 April 1st - June 30th): Spike on choosing a solution for DagProperties - https://phabricator.wikimedia.org/T394541#10846037 (mforns) ### DAG Params Adding here the conversation we had in the Airflow DevEx group sync about DAG Params. --- DAG Params is a feature that allows the dev...
[23:23:32] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem