[00:07:53] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[00:08:23] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[00:18:23] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[03:14:36] Data-Engineering, MediaWiki-extensions-WikimediaEvents, Data Products (Data Products Sprint 11), Patch-For-Review, Web-Team-Backlog (FY2023-24 Q4 Sprint 1): Update mediawiki.web_ui_actions Stream Config - https://phabricator.wikimedia.org/T360955#9675416 (Mabualruz) ###The following questions...
[03:45:53] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[07:20:53] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[08:15:40] Quarry, ChangeProp, collaboration-services, GitLab, and 9 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9675600 (Aklapper) As API Gateway is nowadays owned by #ServiceOps, adding the #serviceops project tag to open API G...
[12:02:46] Data-Engineering (Q4 2024 April 1st - June 30th): [Maintenance] Migrate wmcs to Airflow - https://phabricator.wikimedia.org/T357938#9675886 (lbowmaker)
[12:03:05] Data-Engineering (Sprint 9): Delete reportupdater jobs data/puppet-code - https://phabricator.wikimedia.org/T358210#9675889 (lbowmaker)
[12:03:45] Data-Engineering: [Maintenance] Define Migration/Deprecation Plan for Hue - https://phabricator.wikimedia.org/T333011#9675890 (lbowmaker) Open→Resolved
[12:04:26] Data-Engineering (Q4 2024 April 1st - June 30th): Improve service runner to better support metrics and debugging - https://phabricator.wikimedia.org/T360924#9675893 (lbowmaker)
[12:04:27] Data-Engineering (Q4 2024 April 1st - June 30th): Delete reportupdater jobs data/puppet-code - https://phabricator.wikimedia.org/T358210#9675895 (lbowmaker)
[12:05:05] Data-Engineering (Q4 2024 April 1st - June 30th): [Data Quality] Update data_quality schemas to be compatible with Iceberg tables - https://phabricator.wikimedia.org/T356866#9675899 (lbowmaker)
[12:05:14] Data-Engineering (Sprint 9): eventstreams: change default num_workers to 0 - https://phabricator.wikimedia.org/T359051#9675902 (lbowmaker) Open→Resolved
[12:05:15] Data-Engineering (Sprint 9): [Refine Refactoring] Orchestrate Airflow execution of navigationtiming from config store - https://phabricator.wikimedia.org/T356360#9675905 (lbowmaker) Open→Resolved
[12:05:17] Data-Engineering (Sprint 9): [Data Quality] Define concept for Alerting in coordination with SRE - https://phabricator.wikimedia.org/T351093#9675906 (lbowmaker) Open→Resolved
[12:05:18] Data-Engineering (Sprint 9), Patch-For-Review: [Maintenance] Migrate cx ReportUpdater job - https://phabricator.wikimedia.org/T356424#9675903 (lbowmaker) Open→Resolved
[12:05:19] Data-Engineering (Sprint 9), Event-Platform: ProduceCanaryEvents job should be scheduled by Airflow and/or a k8s service - https://phabricator.wikimedia.org/T341229#9675907 (lbowmaker) Open→Resolved
[12:05:21] Data-Engineering (Sprint 9), Data-Platform, Movement-Insights: Add movement insights group/users to MWH denormalize job alerts - https://phabricator.wikimedia.org/T357472#9675908 (lbowmaker) Open→Resolved
[12:05:34] Data-Engineering (Q4 2024 April 1st - June 30th), Patch-For-Review: [Dataset Config Store] Deploy poc to dse-k8s - https://phabricator.wikimedia.org/T357434#9675897 (lbowmaker)
[12:05:42] Quarry: Deploy magnum cluster for quarry - https://phabricator.wikimedia.org/T349032#9675910 (rook)
[12:05:51] Data-Engineering (Q4 2024 April 1st - June 30th): [Maintenance] Safeguard VarnishKafka to HAProxy analytics transition - https://phabricator.wikimedia.org/T354694#9675915 (lbowmaker)
[12:05:58] Data-Engineering (Q4 2024 April 1st - June 30th): Airflow mapped tasks UI & metrics - https://phabricator.wikimedia.org/T357430#9675911 (lbowmaker)
[12:06:06] Data-Engineering (Q4 2024 April 1st - June 30th): [Refine Refactoring] [Spike] Define a concept and provide a PoC for dynamic DAG execution in Airflow - https://phabricator.wikimedia.org/T356362#9675913 (lbowmaker)
[12:06:07] Data-Engineering (Q4 2024 April 1st - June 30th): [Maintenance] Migrate pingback to Airflow - https://phabricator.wikimedia.org/T357372#9675917 (lbowmaker)
[12:06:47] Data-Engineering (Q4 2024 April 1st - June 30th): [Maintenance] Migrate ReportUpdater browser queries to Airflow - https://phabricator.wikimedia.org/T354552#9675919 (lbowmaker)
[12:07:15] Data-Engineering (Q4 2024 April 1st - June 30th), Data-Platform-SRE, SRE Observability: [Data Platform] Install a Prometheus connector for Presto, pointed at thanos-query - https://phabricator.wikimedia.org/T347430#9675925 (lbowmaker)
[12:07:28] Data-Engineering (Q4 2024 April 1st - June 30th), Data Products, Structured-Data-Backlog: [Maintenance] Set up deletion jobs for Structured Data's data pipelines - https://phabricator.wikimedia.org/T347561#9675923 (lbowmaker)
[12:07:41] Data-Engineering (Q4 2024 April 1st - June 30th), serviceops-radar: Rewrite all Airflow sensors that use datacenter prepartitions to depend on both datacenters - https://phabricator.wikimedia.org/T338796#9675921 (lbowmaker)
[12:08:09] Data-Engineering, EventStreams, Prod-Kubernetes, serviceops, and 2 others: eventstreams regularly uses more than 95% of its memory limit - https://phabricator.wikimedia.org/T357005#9675937 (lbowmaker)
[12:08:13] Data-Engineering, Data-Platform-SRE, Event-Platform: [Event Platform] Define Flink k8s operator SLO - https://phabricator.wikimedia.org/T345914#9675934 (lbowmaker)
[12:08:42] Data-Engineering (Q4 2024 April 1st - June 30th), Patch-For-Review: [Refine Refactoring] Refactor refinery code for compatibility with Airflow integration - https://phabricator.wikimedia.org/T356363#9675940 (lbowmaker)
[12:08:43] Data-Engineering, ChangeProp, observability, service-runner, Event-Platform: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics - https://phabricator.wikimedia.org/T350180#9675944 (lbowmaker)
[12:08:45] Data-Engineering (Q4 2024 April 1st - June 30th), Data-Platform-SRE, Patch-For-Review: [Data Platform] Test Alluxio as cache layer for Presto - https://phabricator.wikimedia.org/T266641#9675930 (lbowmaker)
[12:09:13] Data-Engineering (Q4 2024 April 1st - June 30th), Patch-For-Review: [Data Quality] Implement basic data quality metrics for MW history - https://phabricator.wikimedia.org/T354692#9675942 (lbowmaker)
[12:09:14] Data-Engineering (Q4 2024 April 1st - June 30th): We should provide DQ integration with Python - https://phabricator.wikimedia.org/T353940#9675947 (lbowmaker)
[12:09:16] Data-Engineering (Q4 2024 April 1st - June 30th): Improve service runner to better support metrics and debugging - https://phabricator.wikimedia.org/T360924#9675951 (lbowmaker)
[12:09:18] Data-Engineering, Machine-Learning-Team, Wikimedia Enterprise, Epic, Event-Platform: [Event Platform] Implement PoC Event-Driven Data Pipeline for Revert Risk Model Scores using Event Platform Capabilities - https://phabricator.wikimedia.org/T338792#9675927 (lbowmaker)
[12:12:21] Data-Engineering (Q4 2024 April 1st - June 30th): [Data Quality] Migrate MWHistoryChecker to DeeQu checks - https://phabricator.wikimedia.org/T361016#9675954 (lbowmaker)
[12:12:22] Data-Engineering (Q4 2024 April 1st - June 30th): [Data Quality] Migrate MWHistoryChecker to DeeQu checks - https://phabricator.wikimedia.org/T361016#9675956 (lbowmaker)
[12:13:21] Data-Engineering (Q4 2024 April 1st - June 30th): [Data Quality] Migrate the anomaly detection job to DeeQu checks - https://phabricator.wikimedia.org/T361014#9675957 (lbowmaker)
[12:13:40] Data-Engineering (Q4 2024 April 1st - June 30th), Spike: [SPIKE] Investigate OpenHouse as a data lake management tool - https://phabricator.wikimedia.org/T360969#9675959 (lbowmaker)
[12:14:20] Data-Engineering (Q4 2024 April 1st - June 30th), Spike: [Developer Experience] [SPIKE] Investigate process to automate deployment of folders and artifacts to HDFS - https://phabricator.wikimedia.org/T360968#9675961 (lbowmaker)
[12:14:47] Data-Engineering (Q4 2024 April 1st - June 30th): [Developer Experience] Implement CI hql Linting - https://phabricator.wikimedia.org/T360967#9675963 (lbowmaker)
[12:15:06] Data-Engineering (Q4 2024 April 1st - June 30th), Spike: [Status Store] [SPIKE] Investigate and document approach for Iceberg Sensors - https://phabricator.wikimedia.org/T360922#9675965 (lbowmaker)
[12:16:05] Data-Engineering (Q4 2024 April 1st - June 30th), Spike: [SPIKE] [Dataset Config Store] - Design how config store feeds DataHub - https://phabricator.wikimedia.org/T360896#9675968 (lbowmaker)
[12:16:18] Data-Engineering, Event-Platform: Implement stream of HTML content on mw.page_change event - https://phabricator.wikimedia.org/T360794#9675970 (lbowmaker)
[12:17:57] Data-Engineering (Q4 2024 April 1st - June 30th), Event-Platform: Implement stream of HTML content on mw.page_change event - https://phabricator.wikimedia.org/T360794#9675975 (lbowmaker)
[12:18:01] Data-Engineering, Metrics Platform Backlog, Event-Platform: Document instructions for deleting an event stream and its usages - https://phabricator.wikimedia.org/T360210#9675976 (lbowmaker)
[12:19:00] Data-Engineering (Q4 2024 April 1st - June 30th), Metrics Platform Backlog, Event-Platform: Document instructions for deleting an event stream and its usages - https://phabricator.wikimedia.org/T360210#9675978 (lbowmaker)
[12:19:58] Data-Engineering: [Airflow] SparkSqlOperator fails when executing via Skein with master=local - https://phabricator.wikimedia.org/T359435#9675980 (lbowmaker) Open→Invalid Based on @JAllemandou comment I will close. @mforns let us know if you want us to investigate further.
[12:20:57] Data-Engineering, Data Pipelines: [datahub] Implement automatic deletion of datasets with deleted data sources - https://phabricator.wikimedia.org/T335528#9675982 (lbowmaker)
[12:21:59] Data-Engineering (Q4 2024 April 1st - June 30th), Data Pipelines, Data-Catalog: Spike: Integrate Spark with DataHub - https://phabricator.wikimedia.org/T306896#9675984 (lbowmaker)
[12:23:00] Data-Engineering, Event-Platform, Spike: [SPIKE] Can we express Event Platform configs in config store? - https://phabricator.wikimedia.org/T361017#9675986 (lbowmaker)
[12:23:27] Data-Engineering (Q4 2024 April 1st - June 30th), Event-Platform, Spike: [SPIKE] Can we express Event Platform configs in config store? - https://phabricator.wikimedia.org/T361017#9675988 (lbowmaker)
[12:31:07] Data-Engineering (Q4 2024 April 1st - June 30th), Event-Platform, GitLab (Pipeline Services Migration🐤): Migrate Data Engineering Pipelinelib repos to GitLab - https://phabricator.wikimedia.org/T344730#9676001 (lbowmaker)
[12:31:20] Data-Engineering (Q4 2024 April 1st - June 30th): [Iceberg Migration] Migrate pageview tables to Iceberg - https://phabricator.wikimedia.org/T347690#9676003 (lbowmaker)
[12:43:07] Data-Engineering: [Iceberg Migration] Migrate pageview tables to Iceberg - https://phabricator.wikimedia.org/T347690#9676006 (lbowmaker)
[12:47:56] Quarry: Deploy magnum cluster for quarry - https://phabricator.wikimedia.org/T349032#9676034 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/31
[12:48:00] Quarry: Deploy magnum cluster for quarry - https://phabricator.wikimedia.org/T349032#9676035 (rook) Quarry is now on kubernetes.
[12:49:02] Quarry: Deploy magnum cluster for quarry - https://phabricator.wikimedia.org/T349032#9676038 (rook) Open→Resolved
[12:57:53] Quarry, ChangeProp, collaboration-services, GitLab, and 9 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9676049 (akosiaris) LWN has an article titled "The race to replace Redis". I am not going to link directly as it is...
[13:05:41] Quarry, ChangeProp, collaboration-services, GitLab, and 9 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9676057 (aborrero)
[13:25:54] Quarry: Shutdown quarry VMs - https://phabricator.wikimedia.org/T361470 (rook) NEW
[13:25:55] Quarry: Shutdown quarry VMs - https://phabricator.wikimedia.org/T361470#9676085 (rook)
[13:25:57] Quarry: Deploy magnum cluster for quarry - https://phabricator.wikimedia.org/T349032#9676084 (rook)
[13:34:44] Quarry: Quarry login fails due to redirect to plaintext HTTP URL - https://phabricator.wikimedia.org/T361471 (taavi) NEW
[13:39:27] Quarry: Quarry login fails due to redirect to plaintext HTTP URL - https://phabricator.wikimedia.org/T361471#9676116 (rook) I'm not able to reproduce this in firefox 124.0.1. Do I need to do anything additional to reproduce?
[13:48:14] Quarry: Quarry login fails due to redirect to plaintext HTTP URL - https://phabricator.wikimedia.org/T361471#9676118 (taavi) Can you try with Firefox "HTTPS-Only Mode" (in the "Privacy & Security" about:preferences tab) enabled?
[13:52:23] Quarry: Quarry login fails due to redirect to plaintext HTTP URL - https://phabricator.wikimedia.org/T361471#9676122 (rook) Strangely still seems to be letting log in with HTTPS-Only Mode enabled, restarted firefox just in case, still can log in
[13:53:44] Quarry: Quarry login fails due to redirect to plaintext HTTP URL - https://phabricator.wikimedia.org/T361471#9676129 (rook) Regardless, I'm unsure of where that location header is populated from. There is no reason for it to be http.
[14:17:04] Quarry: Quarry login fails due to redirect to plaintext HTTP URL - https://phabricator.wikimedia.org/T361471#9676169 (taavi) My guess is that some proxy in the middle is not relaying the `x-forwarded-proto` header to the backend correctly.
[14:25:36] Quarry: Quarry login fails due to redirect to plaintext HTTP URL - https://phabricator.wikimedia.org/T361471#9676202 (rook) I'm not sure how the backend saw it as https before. As the web proxy terminates tls and delivered to http://172.16.5.58:80. Though I suppose it must have delivered x-forwarded-proto al...
[17:40:39] Data-Engineering (Q4 2024 April 1st - June 30th): [Spike] List out all production Refine datasets that need to be migrated to the config store (Airflow and Iceberg) - https://phabricator.wikimedia.org/T361498 (Ahoelzl) NEW
[17:43:08] Data-Engineering (Q4 2024 April 1st - June 30th): [Spike] List out all production Refine datasets that need to be migrated to the config store (Airflow and Iceberg) - https://phabricator.wikimedia.org/T361498#9676957 (Ahoelzl)
[17:47:36] Data-Engineering (Q4 2024 April 1st - June 30th): Resolve long launch times for canary events on Airflow (30mins in total) - https://phabricator.wikimedia.org/T361499 (Ahoelzl) NEW
[17:47:48] Data-Engineering (Q4 2024 April 1st - June 30th): [Maintenance] Resolve long launch times for canary events on Airflow (30mins in total) - https://phabricator.wikimedia.org/T361499#9676984 (Ahoelzl)
[17:52:43] Data-Engineering (Q4 2024 April 1st - June 30th): [Refine Refactoring] Configure and deploy all Refine data sets for parallel production processing and testing - https://phabricator.wikimedia.org/T361501 (Ahoelzl) NEW
[17:53:03] Data-Engineering (Q4 2024 April 1st - June 30th): [Spike] [Refine Refactoring] List out all production Refine datasets that need to be migrated to the config store (Airflow and Iceberg) - https://phabricator.wikimedia.org/T361498#9677014 (Ahoelzl)
[17:55:48] Data-Engineering (Q4 2024 April 1st - June 30th): [Refine Refactoring] Define and implement a automated testing / comparison tool for config store configured datasets - https://phabricator.wikimedia.org/T361502 (Ahoelzl) NEW
[17:55:49] Data-Engineering (Q4 2024 April 1st - June 30th): [Refine Refactoring] Configure and deploy all Refine data sets for parallel production processing and testing - https://phabricator.wikimedia.org/T361501#9677033 (Ahoelzl)
[17:56:08] Data-Engineering (Q4 2024 April 1st - June 30th): [Refine Refactoring] Define and implement a automated testing / comparison tool for config store configured datasets - https://phabricator.wikimedia.org/T361502#9677034 (Ahoelzl)
[17:58:24] Data-Engineering (Q4 2024 April 1st - June 30th): [Spike] [Maintenance] Define late arrival event strategy and idem-potent backfilling concept. - https://phabricator.wikimedia.org/T361503 (Ahoelzl) NEW
[17:58:36] Data-Engineering (Q4 2024 April 1st - June 30th): [Spike] [Maintenance] Define late arrival event strategy and idem-potent backfilling concept. - https://phabricator.wikimedia.org/T361503#9677053 (Ahoelzl)
[18:12:43] Data-Engineering (Q4 2024 April 1st - June 30th): [Spike] List out SystemD timers migration targets - https://phabricator.wikimedia.org/T361507 (Ahoelzl) NEW
[18:12:48] Data-Engineering (Q4 2024 April 1st - June 30th): [Spike] List out SystemD timers migration targets - https://phabricator.wikimedia.org/T361507#9677128 (Ahoelzl)
[18:14:14] Quarry, ChangeProp, collaboration-services, GitLab, and 9 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9677129 (Tgr)
[18:22:30] Data-Engineering (Q4 2024 April 1st - June 30th): [Spike] Define technology roadmap around Airflow / k8s / ceph - https://phabricator.wikimedia.org/T361509 (Ahoelzl) NEW
[18:22:31] Data-Engineering (Q4 2024 April 1st - June 30th): [Spike] Define technology roadmap around Airflow / k8s / ceph - https://phabricator.wikimedia.org/T361509#9677164 (Ahoelzl)
[18:23:11] Data-Engineering (Q4 2024 April 1st - June 30th): [Spike] Define technology roadmap around Airflow / k8s / ceph - https://phabricator.wikimedia.org/T361509#9677167 (Ahoelzl) Regarding Ceph we have several efforts in flight: - CEPH (data-sre) - Swift (multimedia + ad hoc use cases) https://wikitech.wikimedia...
[18:30:53] Quarry, ChangeProp, collaboration-services, GitLab, and 9 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9677197 (Tgr)
[22:58:08] Data-Engineering (Q4 2024 April 1st - June 30th): Improve service runner to better support metrics and debugging - https://phabricator.wikimedia.org/T360924#9677992 (Ahoelzl) Update: the new version should undergo a security review.
[23:58:53] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage