[02:28:17] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[06:28:17] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[10:28:17] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[11:31:54] Data-Engineering (Q2 2024 October 1st - December 31th), Epic, Patch-For-Review: [Maintenance] Safeguard VarnishKafka to HAProxy analytics transition - https://phabricator.wikimedia.org/T354694#10370886 (gmodena)
[11:46:25] Data-Engineering (Q2 2024 October 1st - December 31th), Patch-For-Review: [HAProxy transition] Deploy a staging airflow dag for webrequest refinement - https://phabricator.wikimedia.org/T378342#10370936 (gmodena) The `webrequest_frontend` dag is now deployed on the Airflow `analytics` instance, producing...
[11:54:07] Data-Engineering (Q2 2024 October 1st - December 31th), Patch-For-Review: Implement a data retention policy for webrequest_frontend datasets - https://phabricator.wikimedia.org/T379024#10370994 (gmodena) RFC for extending `DataRegistry` to support data retention policies for `HiveDataset`s.
[11:54:22] Data-Engineering (Q2 2024 October 1st - December 31th), Patch-For-Review: Implement a data retention policy for webrequest_frontend datasets - https://phabricator.wikimedia.org/T379024#10370996 (gmodena)
[12:05:09] Data-Engineering (Q2 2024 October 1st - December 31th), Patch-For-Review: Implement a data retention policy for webrequest_frontend datasets - https://phabricator.wikimedia.org/T379024#10371025 (gmodena) A lot of discussion about this task happened in slack / OTR. While we don't have a standard way to e...
[14:23:26] Data-Engineering: [Dumps 2] Time partitioning for mediawiki_content_history - https://phabricator.wikimedia.org/T380773#10371479 (Ottomata)
[14:28:17] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[15:15:23] Data-Engineering, Data Products (Data Products Sprint 23), Documentation, Event-Platform: Render human-readable schemas on schema.wikimedia.org - https://phabricator.wikimedia.org/T376841#10371763 (Milimetric)
[15:16:12] Data-Engineering (Q2 2024 October 1st - December 31th), Data Products, Dumps 2.0 (Kanban Board): Dashboard and alerting of data quality metrics for wmf_dumps.wikitext_raw - https://phabricator.wikimedia.org/T357684#10371794 (Ahoelzl)
[15:18:54] Data-Engineering, CirrusSearch, MediaWiki-extensions-EventLogging, Metrics Platform, and 2 others: Error: Call to a member function getPageAsLinkTarget() on null - https://phabricator.wikimedia.org/T368543#10371843 (Milimetric)
[15:24:47] Data-Engineering: Airflow skips canary-event tasks - https://phabricator.wikimedia.org/T380836#10371885 (Ottomata) This wouldn't solve the source problem of the job failing, but... Could we consider also producing canary events in a loop / in k8s alongside of eventgate? There is no harm in producing more c...
[15:28:33] Analytics-Canonical-Data, Movement-Insights: Update mobile domain derivation code to match new canonical version - https://phabricator.wikimedia.org/T353300#10371898 (nshahquinn-wmf)
[15:31:28] Analytics-Canonical-Data, Movement-Insights: Update mobile domain derivation code to match new canonical version - https://phabricator.wikimedia.org/T353300#10371908 (nshahquinn-wmf) a:Hghani The actual mobile domain logic [was just changed](https://gerrit.wikimedia.org/r/c/operations/mediawiki-confi...
[15:59:41] Data-Engineering: Warning of mismatch in declarations of Webrequest schema - https://phabricator.wikimedia.org/T380916#10372013 (Ottomata) @JAllemandou is there anything to do here?
[16:08:03] Data-Engineering, Data Products (Data Products Sprint 23), Documentation, Event-Platform: Render human-readable schemas on schema.wikimedia.org - https://phabricator.wikimedia.org/T376841#10372046 (Ottomata) > We have too many unanswered questions around how to materialize on-build to resolve and...
[16:12:54] Data-Engineering: Warning of mismatch in declarations of Webrequest schema - https://phabricator.wikimedia.org/T380916#10372094 (JAllemandou) Ah, interesting! it's a know fact that spark uses a different schema-handling mechanism than the usual hive one, even if it uses the Hive metastore (it uses a json sch...
[16:23:42] Data-Engineering, Data-Platform-SRE, Epic: Upgrade Hadoop to version 3.3.6 and Hive to version 4.0.1 - https://phabricator.wikimedia.org/T379385#10372149 (Ottomata) WOW, TIL [[ https://github.com/unitycatalog/unitycatalog/?tab=readme-ov-file | UnityCatalog ]]. Very interesting! Added to https://doc...
[16:25:01] Data-Engineering, Data-Platform-SRE, Epic: Upgrade Hadoop to version 3.3.6 and Hive to version 4.0.1 - https://phabricator.wikimedia.org/T379385#10372158 (Ottomata) > we could migrate the Hive metastore on the dse-k8s cluster, instead of on dedicated Hadoop co-ordinator servers (currently an-coord100...
[16:27:42] Data-Engineering (Q2 2024 October 1st - December 31th): [Refine DAG Improvement] Add Parameter to Reduce Spark Driver Logs in Skein Log Collection - https://phabricator.wikimedia.org/T381074#10372164 (Ottomata) +1 sounds great.
[16:44:47] Data-Engineering, CirrusSearch, MediaWiki-extensions-EventLogging, Metrics Platform, and 2 others: Error: Call to a member function getPageAsLinkTarget() on null - https://phabricator.wikimedia.org/T368543#10372247 (Milimetric)
[17:08:56] (CR) Aleksandar Mastilovic: [V:+2 C:+2] "LGTM!" [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/1098954 (owner: Joal)
[17:12:06] (Merged) jenkins-bot: Use WMF parent pom and fix accordingly [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/1098954 (owner: Joal)
[17:28:03] (CR) Joal: "Parent patch has been merged, this one is ready too" [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/1098955 (owner: Joal)
[17:28:10] (CR) Joal: [C:+2] Fix failure scenario when no timestamp available [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/1098955 (owner: Joal)
[17:29:37] (Merged) jenkins-bot: Fix failure scenario when no timestamp available [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/1098955 (owner: Joal)
[17:56:54] Data-Engineering, Data-Engineering-Wikistats, Data Products, PageViewInfo, and 2 others: Pageviews Analysis 3.0 (Vue + Codex) - https://phabricator.wikimedia.org/T378549#10372625 (Ottomata)
[17:56:56] Data-Engineering, Data-Engineering-Dashiki, Data Products (Epics Timeline), Epic: Public dashboard process - https://phabricator.wikimedia.org/T361214#10372635 (Ottomata)
[17:57:28] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE (2024.11.30 - 2024.12.20), Patch-For-Review: Upgrade Spark to a version with long term Iceberg support, and with fixes to support Dumps 2.0 - https://phabricator.wikimedia.org/T338057#10372653 (xcollazo) Re T338057#10356492, we...
[17:57:34] Data-Engineering, Data-Engineering-Dashiki, Data Products (Epics Timeline), Epic: Data Platform - Public dashboard support - https://phabricator.wikimedia.org/T361214#10372654 (Ottomata)
[18:14:52] Data-Engineering (Q2 2024 October 1st - December 31th), Movement-Insights: 2024-10-10 Data Loss Incident - webrequest Hive table - https://phabricator.wikimedia.org/T376882#10372726 (Aklapper) @JAllemandou / @Ahoelzl: Should this task remain open and at "Unbreak now" priority (["Something is broken and n...
[18:16:12] (PS2) Snwachukwu: Update Geoeditors Monthly to support Temp Accounts. [analytics/refinery] - https://gerrit.wikimedia.org/r/1092916 (https://phabricator.wikimedia.org/T379769)
[18:28:17] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[19:01:37] Data-Engineering, Dumps 2.0 (Kanban Board): Stop using spark.jars.packages - https://phabricator.wikimedia.org/T375298#10372963 (Ottomata)
[19:30:43] (PS3) Snwachukwu: Update Geoeditors Monthly to support Temp Accounts. [analytics/refinery] - https://gerrit.wikimedia.org/r/1092916 (https://phabricator.wikimedia.org/T379769)
[19:43:12] Data-Engineering (Q2 2024 October 1st - December 31th), Movement-Insights: 2024-10-10 Data Loss Incident - webrequest Hive table - https://phabricator.wikimedia.org/T376882#10373210 (Ottomata) This could be resolved. However our process is things in our quarterly Done column are not resolved until the e...
[19:51:42] PROBLEM - statsv Varnishkafka log producer on cp3069 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/statsv.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[19:51:50] PROBLEM - Webrequests Varnishkafka log producer on cp3069 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[19:52:12] PROBLEM - eventlogging Varnishkafka log producer on cp3069 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/eventlogging.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[19:58:44] RECOVERY - eventlogging Varnishkafka log producer on cp3069 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/eventlogging.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[20:01:44] RECOVERY - statsv Varnishkafka log producer on cp3069 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/statsv.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[20:02:44] RECOVERY - Webrequests Varnishkafka log producer on cp3069 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[20:41:07] (CR) Ottomata: [C:+1] Add an option to ignore missing input folders in Refine [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1088346 (https://phabricator.wikimedia.org/T369845) (owner: Aqu)
[21:09:45] (CR) Snwachukwu: Update Geoeditors Edits Monthly to support Temp Accounts. (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/1092890 (https://phabricator.wikimedia.org/T379768) (owner: Snwachukwu)
[22:28:17] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[23:01:26] (CR) Mforns: Update Geoeditors Edits Monthly to support Temp Accounts. (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/1092890 (https://phabricator.wikimedia.org/T379768) (owner: Snwachukwu)
[23:46:18] (CR) Milimetric: Update Geoeditors Edits Monthly to support Temp Accounts. (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/1092890 (https://phabricator.wikimedia.org/T379768) (owner: Snwachukwu)