[02:28:17] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[06:28:17] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[08:12:46] (PS1) Aqu: Improve Spark Logger Quietness [analytics/refinery] - https://gerrit.wikimedia.org/r/1100394 (https://phabricator.wikimedia.org/T381074)
[09:05:44] (CR) Aqu: [C:+1] build: introduce .sdkmanrc to document JDK version to use for this project [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/1062712 (https://phabricator.wikimedia.org/T346611) (owner: Gehel)
[09:05:59] (CR) Gehel: [C:+2] build: introduce .sdkmanrc to document JDK version to use for this project [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/1062712 (https://phabricator.wikimedia.org/T346611) (owner: Gehel)
[09:09:29] (CR) Aqu: [C:+1] "Cool. We may have to think about mac users not using open jdk." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1098908 (https://phabricator.wikimedia.org/T346611) (owner: Gehel)
[09:10:18] (CR) Aqu: [C:+2] build: add sdkman configuration [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1098908 (https://phabricator.wikimedia.org/T346611) (owner: Gehel)
[09:22:43] (Merged) jenkins-bot: build: add sdkman configuration [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1098908 (https://phabricator.wikimedia.org/T346611) (owner: Gehel)
[09:30:15] (PS10) Aqu: Extraction of RefineHelper [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1080706 (owner: Gehel)
[09:44:43] (PS11) Aqu: Extraction of RefineHelper [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1080706 (owner: Gehel)
[10:00:54] Data-Engineering (Q2 2024 October 1st - December 31th), Epic, Patch-For-Review: [Maintenance] Safeguard VarnishKafka to HAProxy analytics transition - https://phabricator.wikimedia.org/T354694#10378733 (gmodena)
[10:20:47] Data-Engineering, CirrusSearch, Structured Data Engineering, Structured-Data-Backlog, and 2 others: Migrate image recommendation to use page_weighted_tags_changed stream - https://phabricator.wikimedia.org/T372912#10378796 (Gehel)
[10:22:31] (PS12) Aqu: Extraction of RefineHelper [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1080706 (owner: Gehel)
[10:28:17] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[11:07:13] Data-Engineering, Data-Platform-SRE: Airflow UI sometimes shows no response for a DAG run task with many mapped tasks - https://phabricator.wikimedia.org/T381479 (BTullis) NEW
[11:07:50] Data-Engineering, Data-Platform-SRE (2024.11.30 - 2024.12.20): Airflow UI sometimes shows no response for a DAG run task with many mapped tasks - https://phabricator.wikimedia.org/T381479#10378973 (BTullis)
[11:20:06] Data-Engineering, CirrusSearch, Structured Data Engineering, Structured-Data-Backlog, and 2 others: Migrate image recommendation to use page_weighted_tags_changed stream - https://phabricator.wikimedia.org/T372912#10379023 (BTullis) >>! In T372912#10316874, @pfischer wrote: > @BTullis, I would ap...
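The HdfsCapacityRemainingPercent alert refires every four hours in this log (02:28, 06:28, 10:28, 14:28, 18:28). As a rough illustration of the quantity its name refers to, the sketch below computes free HDFS space as a percentage of total capacity and compares it with a threshold. This is a minimal sketch under stated assumptions: the function names and the 15% threshold are hypothetical, and the real alert rule, its metric source, and its threshold are not shown here (see the linked wikitech Alerts page).

```python
def remaining_percent(capacity_bytes: int, remaining_bytes: int) -> float:
    """Percentage of total capacity that is still free."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    if remaining_bytes < 0:
        raise ValueError("remaining_bytes must not be negative")
    return 100.0 * remaining_bytes / capacity_bytes


# Hypothetical threshold for illustration only; the real alert's value is not in this log.
ALERT_THRESHOLD_PERCENT = 15.0


def is_alarmingly_low(capacity_bytes: int, remaining_bytes: int,
                      threshold_percent: float = ALERT_THRESHOLD_PERCENT) -> bool:
    """True when free space falls below the (hypothetical) alert threshold."""
    return remaining_percent(capacity_bytes, remaining_bytes) < threshold_percent
```

For example, a cluster with 12 units free out of 100 gives `remaining_percent(100, 12) == 12.0`, which is below the hypothetical 15% threshold, so `is_alarmingly_low(100, 12)` is true.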
[11:20:28] Data-Engineering, Data-Platform, DBA, Schema-change-in-production: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742#10379024 (Ladsgroup)
[11:35:54] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Enable HA for the mw-dump-rev-content-reconcile-enrich flink application - https://phabricator.wikimedia.org/T375176#10379044 (gmodena) @tchin we'll need to update this patch to target {T381322}
[11:36:15] Data-Engineering, Data-Platform, DBA, Schema-change-in-production: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742#10379054 (Ladsgroup)
[11:36:20] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Enable HA for the mw-dump-rev-content-reconcile-enrich flink application - https://phabricator.wikimedia.org/T375176#10379055 (gmodena) @tchin actually - do we already have the s3 buckets?
[12:02:22] (Abandoned) Milimetric: GDI Equity Landscape Tables [analytics/refinery] - https://gerrit.wikimedia.org/r/941911 (owner: Nmaphophe)
[12:23:30] Data-Engineering, Data-Platform-SRE (2024.11.30 - 2024.12.20): Airflow UI sometimes shows no response for a DAG run task with many mapped tasks - https://phabricator.wikimedia.org/T381479#10379301 (BTullis) p:Triage→High
[12:47:44] Data-Engineering, Data-Platform-SRE (2024.11.30 - 2024.12.20): Airflow UI sometimes shows no response for a DAG run task with many mapped tasks - https://phabricator.wikimedia.org/T381479#10379451 (brouberol) These are the error messages I could see in the console: {F57778202} {F57778203} {F57778205}...
[12:52:07] Data-Engineering, Data-Platform-SRE (2024.11.30 - 2024.12.20): Airflow UI sometimes shows no response for a DAG run task with many mapped tasks - https://phabricator.wikimedia.org/T381479#10379470 (brouberol) https://github.com/apache/airflow/blob/a238d06b8a1e631dfee58e53e9349f7a0c0fa880/airflow/api_conn...
[13:00:54] Data-Engineering, Data-Platform-SRE (2024.11.30 - 2024.12.20): Airflow UI sometimes shows no response for a DAG run task with many mapped tasks - https://phabricator.wikimedia.org/T381479#10379505 (brouberol) ` airflow_analytics=# select map_index from task_instance where dag_id='refine_to_hive_hourly' a...
[13:12:40] FYI, a fleet-wide software update failed on an-test-worker1001 due to no disk space left
[13:12:57] there's 57G of *_resources files in /tmp
[13:19:12] Data-Engineering (Q2 2024 October 1st - December 31th), Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10379578 (Marostegui) ` cumin2024@db1167.eqiad.wmnet[wikidatawiki]> stop slave; ALTER TABLE /*_*/revision Query OK, 0...
[13:19:35] Data-Engineering (Q2 2024 October 1st - December 31th), Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10379594 (Marostegui)
[14:08:13] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Enable HA for the mw-dump-rev-content-reconcile-enrich flink application - https://phabricator.wikimedia.org/T375176#10379785 (tchin) @gmodena We don't, although I guess now is a good time to do it. Wh...
[14:13:36] (CR) Milimetric: "we just kind of left this up in the air, sorry about that. What is the status with this project? What should we do? I can see arguments" [analytics/refinery] - https://gerrit.wikimedia.org/r/958518 (https://phabricator.wikimedia.org/T345446) (owner: Fabian Kaelin)
[14:14:44] (CR) Milimetric: [C:+2] Update links to docs and repo [analytics/wikistats2] - https://gerrit.wikimedia.org/r/1049265 (https://phabricator.wikimedia.org/T357327) (owner: Triciaburmeister)
[14:16:01] (Merged) jenkins-bot: Update links to docs and repo [analytics/wikistats2] - https://gerrit.wikimedia.org/r/1049265 (https://phabricator.wikimedia.org/T357327) (owner: Triciaburmeister)
[14:28:17] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[15:03:48] (PS13) Gehel: Extraction of RefineHelper [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1080706
[15:09:27] (CR) Gehel: Extraction of RefineHelper (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1080706 (owner: Gehel)
[15:09:46] (PS14) Gehel: Extraction of RefineHelper [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1080706
[15:14:58] (CR) CI reject: [V:-1] Extraction of RefineHelper [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1080706 (owner: Gehel)
[15:16:32] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Enable HA for the mw-dump-rev-content-reconcile-enrich flink application - https://phabricator.wikimedia.org/T375176#10380070 (brouberol) I can create the buckets. I'll need to generate S3 users as wel...
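At 13:12 an-test-worker1001 was reported out of disk space, with 57G of `*_resources` files in /tmp. A small script like the one below could tally how much space such entries take before anyone decides what to clean up. It is a minimal sketch, not an existing tool: the function names `path_size_bytes` and `resources_usage` are hypothetical, each match is treated as either a regular file or a directory tree, symlinks are skipped, and nothing is deleted.

```python
import glob
import os
from typing import Dict


def path_size_bytes(path: str) -> int:
    """Size of a regular file, or total size of regular files under a directory.

    Symlinks are skipped so their targets are not counted.
    """
    if os.path.islink(path):
        return 0
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    # os.walk does not follow directory symlinks by default.
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # File removed (or unreadable) between listing and stat; ignore it.
                pass
    return total


def resources_usage(tmp_dir: str = "/tmp", pattern: str = "*_resources") -> Dict[str, int]:
    """Map each entry of tmp_dir whose name matches pattern to its size in bytes."""
    return {p: path_size_bytes(p) for p in glob.glob(os.path.join(tmp_dir, pattern))}
```

For example, `sum(resources_usage("/tmp").values()) / 1024**3` gives the total in GiB, and sorting the returned dict by value shows the largest entries first.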
[15:26:35] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Enable HA for the mw-dump-rev-content-reconcile-enrich flink application - https://phabricator.wikimedia.org/T375176#10380125 (brouberol) Scratch that, found it. I need to put any private values under...
[15:32:50] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Enable HA for the mw-dump-rev-content-reconcile-enrich flink application - https://phabricator.wikimedia.org/T375176#10380157 (brouberol) ` brouberol@cephosd1001:~$ sudo radosgw-admin user create --uid...
[15:38:40] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Enable HA for the mw-dump-rev-content-reconcile-enrich flink application - https://phabricator.wikimedia.org/T375176#10380183 (brouberol) I've done the same thing for `mw-content-history-reconcile-enri...
[15:40:12] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Enable HA for the mw-dump-rev-content-reconcile-enrich flink application - https://phabricator.wikimedia.org/T375176#10380196 (brouberol)
[16:47:55] Data-Engineering, Data-Platform-SRE: Do performance testing of a big Hadoop Table hosted by Ceph - https://phabricator.wikimedia.org/T381416#10380406 (JAllemandou) Let's try that. The network load is indeed a known impact of decoupling compute and storage in big-data world.
[17:20:38] (CR) Joal: [V:+2 C:+2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/1100067 (https://phabricator.wikimedia.org/T377257) (owner: Joal)
[17:26:48] (CR) Ottomata: [C:+1] Improve Spark Logger Quietness [analytics/refinery] - https://gerrit.wikimedia.org/r/1100394 (https://phabricator.wikimedia.org/T381074) (owner: Aqu)
[17:35:16] (PS1) Joal: Update load cassandra top-pageview monthly job [analytics/refinery] - https://gerrit.wikimedia.org/r/1100505
[17:44:37] (CR) Btullis: [C:+1] Update load cassandra top-pageview monthly job [analytics/refinery] - https://gerrit.wikimedia.org/r/1100505 (owner: Joal)
[17:46:54] (CR) Joal: [V:+2 C:+2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/1100505 (owner: Joal)
[17:50:23] !log Deploying refinery with scap
[17:50:25] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:56:19] !log Deploying refinery onto HDFS
[17:56:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:59:13] !log Rerun cassandra_load_pageview_top_articles_monthly after refinery patch deployed
[17:59:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:28:18] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[18:32:05] Data-Engineering, Data Products, DBA, GlobalBlocking, Schema-change-in-production: Update type of gbw_expiry on the global_block_whitelist table - https://phabricator.wikimedia.org/T381521 (Dreamy_Jazz) NEW
[18:35:03] Data-Engineering, Data Products, DBA, GlobalBlocking, Schema-change-in-production: Update type of gbw_expiry on the global_block_whitelist table - https://phabricator.wikimedia.org/T381521#10380922 (Dreamy_Jazz)
[19:06:21] !log Alter wmf_raw.mediawiki_user adding user_is_temp field for temp_account project
[19:06:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:07:08] !log Alter webrequest_actor 3 tables adding actor_signature_per_project_family field for automated-traffic detection heuristic change
[19:07:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:07:40] !log Recompute 1 day of webrequest_actor_metrics for rollup to work as expected
[19:07:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log