[03:38:15] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[07:38:15] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[09:41:12] !log Replace pageview_actor data by backfilled correction
[09:41:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:11:08] Data-Engineering: Improve pageview automated traffic detection heuristics - https://phabricator.wikimedia.org/T280565#10405472 (MarcoSwart) In [[ https://stats.wikimedia.org/#/nl.wiktionary.org | November nl.wiktionary ]] had 8 million pageviews from Singapore, which were not considered "spiders" or "bots"....
[11:02:25] !log 2024_12-Backfill Launch jobs dependent on pageview_actor from Airflow (8 jobs)
[11:02:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:17:08] Data-Engineering, Phabricator: Add ahoezel, aotto and abaso to acl*phabricator project - https://phabricator.wikimedia.org/T382089#10405676 (Aklapper) Open→Resolved a:Aklapper Done: https://phabricator.wikimedia.org/project/members/1081/
[11:38:15] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[11:52:29] Data-Engineering, ConfirmEdit (CAPTCHA extension), Data Products, MediaWiki-extensions-EventLogging, and 2 others: Send captcha API response data to event logging - https://phabricator.wikimedia.org/T379179#10405791 (acooper) We need to do something like this schema: https://schema.wikimedia.org/...
[12:13:00] Data-Engineering (Q2 2024 October 1st - December 31th), Patch-For-Review: [HAProxy transition] Deploy a staging airflow dag for webrequest refinement - https://phabricator.wikimedia.org/T378342#10405837 (gmodena)
[13:59:41] Data-Engineering, Data Products, Data-Platform, Movement-Insights, and 4 others: Temporary Accounts Initiative (IP Masking) - Add user_is_temporary and user_is_permanent to data tables - https://phabricator.wikimedia.org/T356701#10406098 (AndrewTavis_WMDE)
[14:04:03] (CR) Mforns: [V:+2] Modify MediaWiki History queries to support Temp Accounts [analytics/refinery] - https://gerrit.wikimedia.org/r/1088342 (https://phabricator.wikimedia.org/T379230) (owner: Mforns)
[14:07:44] Data-Engineering, Phabricator: Add ahoezel, aotto and abaso to acl*phabricator project - https://phabricator.wikimedia.org/T382089#10406128 (Ottomata) Thank you!
[14:10:20] Data-Engineering, ConfirmEdit (CAPTCHA extension), Data Products, MediaWiki-extensions-EventLogging, and 2 others: Send captcha API response data to event logging - https://phabricator.wikimedia.org/T379179#10406142 (Ottomata) @acooper, what would be the producer of the captcha? Is it code we wr...
[14:56:48] Data-Engineering, Structured-Data-Backlog (Current Work): [L] Track commons deletion requests - https://phabricator.wikimedia.org/T370898#10406290 (Cparle)
[14:57:37] Data-Engineering, Structured-Data-Backlog (Current Work): [L] Track commons deletion requests - https://phabricator.wikimedia.org/T370898#10406296 (Cparle)
[15:00:56] Data-Engineering, Research, Data-Platform-SRE (2024.11.30 - 2024.12.20), Discovery-Search (Current work): Low available space on Hadoop / HDFS - https://phabricator.wikimedia.org/T381707#10406302 (BTullis) We're still using up the available space much too quickly to survive the holiday. Currently...
[15:17:32] (PS8) Peter Fischer: Rewrite MediawikiDumper partitioning implementation [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1101892
[15:22:33] Data-Engineering (Q2 2024 October 1st - December 31th), Patch-For-Review: Publish Data Engineering maintained NodeJS packages to GitLab and use them in depender code - https://phabricator.wikimedia.org/T366612#10406404 (tchin)
[15:27:00] Data-Engineering (Q2 2024 October 1st - December 31th): Implement automated deployment of refinery HQL files to HDFS (via blunderbuss) - https://phabricator.wikimedia.org/T365659#10406410 (Ottomata)
[15:32:30] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Figure why XML Dump code generates 1000's of files for simplewiki - https://phabricator.wikimedia.org/T381016#10406444 (Ahoelzl)
[15:32:47] Data-Engineering, MediaWiki-General, Event-Platform, MediaWiki-Platform-Team (Radar), Patch-For-Review: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate - https://phabricator.wikimedia.org/T353817#10406446 (Ottomata)
[15:38:15] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[17:20:24] Data-Engineering (Q2 2024 October 1st - December 31th), Data Pipelines, Data-Catalog: Integrate Spark with DataHub with lineage - https://phabricator.wikimedia.org/T306896#10406856 (Ottomata) Thomas and I just checked a few things to see how we should move forward. tl;dr - spark jobs that write to...
[17:36:44] (CR) Milimetric: "looks good, just following up on the user_is discussion" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1088232 (https://phabricator.wikimedia.org/T379230) (owner: Mforns)
[17:41:49] (CR) Milimetric: Modify MediaWiki History queries to support Temp Accounts (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/1088342 (https://phabricator.wikimedia.org/T379230) (owner: Mforns)
[17:48:13] Data-Engineering: Improve pageview automated traffic detection heuristics - https://phabricator.wikimedia.org/T280565#10406941 (Mayakp.wiki) hi @MarcoSwart , we identified this issue with Singapore traffic and are currently working on fixing it in the tasks under T373630.
[18:38:01] Data-Engineering, Data-Platform, Dumps-Generation: The cleanup_tmpdumps service fails when the file to delete doesn't exist - https://phabricator.wikimedia.org/T381026#10407147 (Ahoelzl) If the missing target file is not crucial indicator for a process failure, I recommend a more failsafe deletion at...
[18:41:30] Data-Engineering (Q2 2024 October 1st - December 31th), Research, Data-Platform-SRE (2024.11.30 - 2024.12.20), Discovery-Search (Current work): Low available space on Hadoop / HDFS - https://phabricator.wikimedia.org/T381707#10407159 (Ahoelzl)
[18:42:25] Data-Engineering (Q2 2024 October 1st - December 31th), MediaWiki-General, MediaWiki-Platform-Team (Radar), Patch-For-Review: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate - https://phabricator.wikimedia.org/T353817#10407160 (Ahoelzl)
[18:42:36] Data-Engineering (Q2 2024 October 1st - December 31th), MediaWiki-General, MediaWiki-Platform-Team (Radar), Patch-For-Review: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate - https://phabricator.wikimedia.org/T353817#10407162 (Ahoelzl)
[18:48:33] Data-Engineering: 20241201 wikidatawiki xml dump not progressing - https://phabricator.wikimedia.org/T382084#10407176 (Ahoelzl) Still in progress according to https://dumps.wikimedia.org/backup-index.html.
[18:51:33] Data-Engineering (Q2 2024 October 1st - December 31th): 20241201 wikidatawiki xml dump not progressing - https://phabricator.wikimedia.org/T382084#10407181 (Ahoelzl)
[19:31:40] !log Deploy analytics-airflow for pageview-hourly backfill
[19:31:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:38:16] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[23:36:40] Data-Engineering, Event-Platform: EventBus PageChangeHooks uses unconventional log channel name - https://phabricator.wikimedia.org/T382288 (tstarling) NEW
[23:38:16] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent