[02:50:41] FIRING: MediawikiPageHtmlContentChangeEnrichHighKafkaConsumerLag: ...
[02:50:46] High Kafka consumer lag for mw_page_html_content_change_enrich in eqiad - https://wikitech.wikimedia.org/wiki/MediaWiki_Event_Enrichment/HTML_Enrichment#Alerting - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-page-html-content-change-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_page_html_content_change_enrich - ...
[02:50:47] https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageHtmlContentChangeEnrichHighKafkaConsumerLag
[02:55:41] RESOLVED: MediawikiPageHtmlContentChangeEnrichHighKafkaConsumerLag: ...
[02:55:41] High Kafka consumer lag for mw_page_html_content_change_enrich in eqiad - https://wikitech.wikimedia.org/wiki/MediaWiki_Event_Enrichment/HTML_Enrichment#Alerting - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-page-html-content-change-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_page_html_content_change_enrich - ...
[02:55:41] https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageHtmlContentChangeEnrichHighKafkaConsumerLag
[04:00:36] !log Test Kitchen experiment (poll 82361) - adds: logged-in-retention-srm-investigation-2026-06; removes: none; fields: none - TK tips at https://w.wiki/_cvdP
[04:00:37] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:26:46] Data-Engineering, FY2025-26 KR 5.1, MediaWiki-Platform-Team (Kanban Board), OKR-Work, Patch-For-Review: redioscope: periodically publish top clients to the data lake - https://phabricator.wikimedia.org/T424823#12063841 (daniel) >>! In T424823#11938856, @Ottomata wrote: > Q: are you sure you d...
[07:28:25] Data-Engineering, FY2025-26 KR 5.1, MediaWiki-Platform-Team (Kanban Board), OKR-Work, Patch-For-Review: redioscope: periodically publish top clients to the data lake - https://phabricator.wikimedia.org/T424823#12063849 (daniel) We have put this on hold for now, since there is no pressing need...
[08:31:00] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Product-Analytics: dbt-jobs backfill: PP3 API hourly and known clients aggregate jobs - https://phabricator.wikimedia.org/T429341#12064134 (amastilovic)
[09:02:15] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Product-Analytics: dbt-jobs backfill: PP3 API hourly and known clients aggregate jobs - https://phabricator.wikimedia.org/T429341#12064363 (amastilovic) Backfill completed.
[09:39:12] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History, Patch-For-Review: Right-size Spark resource config using History Server data - https://phabricator.wikimedia.org/T428966#12064465 (APizzata-WMF) During the weekend I have let the pipeline run in my `airflow-devenv...
[09:41:54] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Document editor counts table and APIs - https://phabricator.wikimedia.org/T429863#12064485 (GGoncalves-WMF) Looks excellent, thank you! I've just made a minor edit to fix a typo and add a link to an issue. > Should we omit the mention of data gateway API...
[10:28:53] (CR) Clément Goubert: [C:+1] Remove apiportal for pageviews, add isvwiki for sqoop [analytics/refinery] - https://gerrit.wikimedia.org/r/1305980 (https://phabricator.wikimedia.org/T430366) (owner: Dr0ptp4kt)
[12:05:52] Data-Engineering: Setup and populate initial version of user_agents_info table - https://phabricator.wikimedia.org/T430020#12065072 (KCVelaga_WMF)
[12:09:41] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Enable Airflow DAG trigger config dialog by default - https://phabricator.wikimedia.org/T428872#12065074 (atsuko) Deploying this change.
[12:11:35] Data-Engineering: Setup and populate initial version of user_agents_info table - https://phabricator.wikimedia.org/T430020#12065077 (KCVelaga_WMF) I have updated the proposed schema in the description based on the discussion and also additional stats columns.
[12:18:51] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History, Patch-For-Review: Quality verification for mediawiki_history_incremental_v1 using Iceberg time travel - https://phabricator.wikimedia.org/T425734#12065101 (APizzata-WMF) > a dashboard with just counts of how many...
[12:24:53] Data-Engineering (Q4 FS25/26 April 1st - June 30st): dbt-jobs backfill: all base models for moderator actions - https://phabricator.wikimedia.org/T429995#12065119 (amastilovic) Backfill complete, @CMyrick-WMF - please validate.
[12:52:59] (CR) A-pizzata: [C:+2] Update Inc-MWH splitting SQL into smaller files [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1305633 (https://phabricator.wikimedia.org/T425734) (owner: Joal)
[12:53:18] Data-Engineering, FY2025-26 KR 5.1, MediaWiki-Platform-Team (Kanban Board), OKR-Work, Patch-For-Review: redioscope: periodically publish top clients to the data lake - https://phabricator.wikimedia.org/T424823#12065277 (Ottomata) Okay! Let us know, happy to help! > It was suggested to send...
[12:55:44] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Document editor counts table and APIs - https://phabricator.wikimedia.org/T429863#12065298 (Ottomata) > Design Document Could we close the google doc and move this to wikitech too? Perhaps at ata_Platform/Data_Lake/Edits/Editor_counts_per_page/Design or so...
[13:09:10] (Merged) jenkins-bot: Update Inc-MWH splitting SQL into smaller files [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1305633 (https://phabricator.wikimedia.org/T425734) (owner: Joal)
[13:22:11] (CR) Joal: [V:+2 C:+2] "LGTM- merging for later deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/1305980 (https://phabricator.wikimedia.org/T430366) (owner: Dr0ptp4kt)
[14:06:52] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Machine-Learning-Team (Q4 FY2025-26): `mediawiki.page_change.v1`: negative namespace_id schema validation errors - https://phabricator.wikimedia.org/T421237#12065906 (Ottomata) From @gmodena in [[ https://wikimedia.slack.com/archive...
[14:07:20] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Machine-Learning-Team (Q4 FY2025-26): `mediawiki.page_change.v1`: negative namespace_id schema validation errors - https://phabricator.wikimedia.org/T421237#12065916 (Ottomata)
[14:26:25] Data-Engineering, Java-Scala-Standardization, Essential-Work: Ignore MacOS .DS_Store in parent pom - https://phabricator.wikimedia.org/T407514#12066065 (TJones) Thanks, @TheDJ!
[14:27:47] Data-Engineering, Test Kitchen, Event-Platform: [EventGate] Reject events from inactive instruments and experiments - https://phabricator.wikimedia.org/T430541 (mpopov) NEW
[14:28:19] Data-Engineering, Test Kitchen, Event-Platform: [EventGate] Record metrics for instruments and experiments - https://phabricator.wikimedia.org/T430322#12066092 (mpopov) > In order to keep the cardinality of the counters to a minimum, we should validate the instrument and experiment names prior to inc...
[14:30:09] !log Test Kitchen experiment (poll 86114) - adds: none; removes: we-1-10-articleguidance-v1; fields: none - TK tips at https://w.wiki/_cvdP
[14:30:10] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:31:23] Data-Engineering (Q4 FS25/26 April 1st - June 30st): MWH `page_is_deleted` and `revision_is_deleted_py_page_deletion` are not consistent - https://phabricator.wikimedia.org/T430543 (AKhatun_WMF) NEW
[15:05:22] !log Test Kitchen experiment (poll 86324) - adds: we-1-8-tempuser-post-edit; removes: none; fields: none - TK tips at https://w.wiki/_cvdP
[15:05:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:09:20] Data-Engineering, Data-Platform-SRE, Java-Scala-Standardization, Essential-Work, Patch-For-Review: Migrate existing Java packages to deploying to Gitlab, including new version of parent pom, validation that all dependencies are available, and v... - https://phabricator.wikimedia.org/T367405#12066428
[15:09:29] Data-Engineering, Data-Platform-SRE, Java-Scala-Standardization, Essential-Work, Patch-For-Review: Migrate existing Java packages to deploying to Gitlab, including new version of parent pom, validation that all dependencies are available, and v... - https://phabricator.wikimedia.org/T367405#12066431
[15:29:01] Data-Engineering: Setup and populate initial version of user_agents_info table - https://phabricator.wikimedia.org/T430020#12066596 (GGoncalves-WMF) Looks good to me! I'm just wondering whether `is_bot_like` should be a subcategory under `bot_category`: semantically, it's a fallback that says //"we think th...
[15:34:47] Data-Engineering (Q4 FS25/26 April 1st - June 30st): MWH `page_is_deleted` and `revision_is_deleted_py_page_deletion` are not consistent - https://phabricator.wikimedia.org/T430543#12066652 (AKhatun_WMF)
[15:35:51] Data-Engineering (Q4 FS25/26 April 1st - June 30st): MWH `page_is_deleted` and `revision_is_deleted_py_page_deletion` are not consistent - https://phabricator.wikimedia.org/T430543#12066680 (AKhatun_WMF)
[15:53:15] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History, Event-Platform, Patch-For-Review: Create mediawiki.user_change event stream - https://phabricator.wikimedia.org/T423952#12066781 (PatchDemoBot) Test wiki on [[ https://patchdemo.wmcloud.org | Patch demo ]] by...
[16:01:40] Data-Engineering: Setup and populate initial version of user_agents_info table - https://phabricator.wikimedia.org/T430020#12066875 (KCVelaga_WMF) >>! In T430020#12066596, @GGoncalves-WMF wrote: > Looks good to me! > > I'm just wondering whether `is_bot_like` should be a subcategory under `bot_category`: se...
[16:47:23] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-MediaWiki-Incremental-History, Event-Platform: mediawiki.page_change.v1 - add revision.editor.first_edit_dt field - https://phabricator.wikimedia.org/T425029#12067106 (Ottomata) Tested in beta, first_edit_dt works as expected.
[16:57:37] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Machine-Learning-Team (Q4 FY2025-26): `mediawiki.page_change.v1`: negative namespace_id schema validation errors - https://phabricator.wikimedia.org/T421237#12067154 (Ottomata) FYI, I merged [Add first_edit_dt to UserEntitySerialize...
[17:34:05] Data-Engineering: dbt-jobs backfill: active_moderators_monthly - https://phabricator.wikimedia.org/T430576 (CMyrick-WMF) NEW
[17:34:19] Data-Engineering: dbt-jobs backfill: active_moderators_monthly - https://phabricator.wikimedia.org/T430576#12067374 (CMyrick-WMF)
[17:54:21] Data-Engineering (Q1 FS26/27 July 1st - September 30th): Customizable dbt scheduled dependency sensors - https://phabricator.wikimedia.org/T430579 (amastilovic) NEW
[18:15:00] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Document editor counts table and APIs - https://phabricator.wikimedia.org/T429863#12067490 (AKhatun_WMF) @GGoncalves-WMF thanks for the suggestions! Updated accordingly. @Ottomata: I have made the google doc viewable for all. Is that enough? The doc is fo...
[18:20:15] FIRING: HdfsRpcQueueLatency: RPC queue latency on the analytics-hadoop cluster is too high. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Namenode_RPC_length_queue/latency - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=56&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsRpcQueueLatency
[18:20:48] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Document editor counts table and APIs - https://phabricator.wikimedia.org/T429863#12067504 (Ottomata) > The doc is for Attribution API as a whole and contributor counts is one part of it. Hm, that is a good point. Guilherme and I briefly discussed today...
[18:25:15] RESOLVED: HdfsRpcQueueLatency: RPC queue latency on the analytics-hadoop cluster is too high. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Namenode_RPC_length_queue/latency - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=56&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsRpcQueueLatency
[18:28:05] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform: mediawiki.page_change.v1 event - Add revision revert details - https://phabricator.wikimedia.org/T423583#12067517 (Ottomata) Discussed with @tchin today. We considered - A. Do nothing. `rev_reverted_ids` remains vestigial and null fo...
[18:39:25] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Patch-For-Review: WE5.3.3b: Contributor Count Per Page [Attribution API] - https://phabricator.wikimedia.org/T426316#12067563 (AKhatun_WMF)
[18:40:20] Data-Engineering (Q4 FS25/26 April 1st - June 30st): WE5.3.3b: Contributor Count Per Page [Attribution API] - https://phabricator.wikimedia.org/T426316#12067567 (AKhatun_WMF)
[18:56:47] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: mediawiki.page_change.v1 event - Add revision revert details - https://phabricator.wikimedia.org/T423583#12067632 (Ottomata) I ran a dry_run EvolveHiveTable command with mediawiki/page/change/latest from MR !67 whi...
[20:02:24] Data-Engineering, MW-Interfaces-Team, Event-Platform: mediawiki.page_change.v1 - adapt page change kind model to MediaWiki's PageUpdateCauses. - https://phabricator.wikimedia.org/T430588 (Ottomata) NEW
[21:40:10] Data-Engineering: Create Kerberos identity for Randall Scout - https://phabricator.wikimedia.org/T430598 (Rscout) NEW
[21:40:43] Data-Engineering: Create Kerberos identity for Randall Scout - https://phabricator.wikimedia.org/T430598#12068219 (Rscout)
[23:03:13] Data-Engineering, Data-Platform-SRE, Epic, Patch-For-Review: Upgrade Spark to a version with long term Iceberg support - https://phabricator.wikimedia.org/T338057#12068408 (BTullis) I'm taking this out of the current #data-platform-sre milestone and putting it back to the parent workboard, becau...
[23:03:22] Data-Engineering, Data-Platform-SRE, Epic, Patch-For-Review: Upgrade Spark to a version with long term Iceberg support - https://phabricator.wikimedia.org/T338057#12068410 (BTullis)
[23:04:15] Data-Engineering, Data-Platform-SRE, Epic, Patch-For-Review: Upgrade Spark to a version with long term Iceberg support - https://phabricator.wikimedia.org/T338057#12068412 (BTullis) a:BTullis→None
[23:08:40] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Dbt backfill: editor_month and base_account_registration - https://phabricator.wikimedia.org/T430602 (nshahquinn-wmf) NEW
[23:09:43] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Dbt backfill: editor_month and base_account_registration - https://phabricator.wikimedia.org/T430602#12068428 (nshahquinn-wmf)
[23:12:20] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Dbt backfill: editor_month and base_account_registration - https://phabricator.wikimedia.org/T430602#12068445 (nshahquinn-wmf) a:amastilovic