[02:28:15] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[06:28:15] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[08:47:28] Data-Engineering (Q2 2024 October 1st - December 31th), DBA, Schema-change-in-production: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781#10351639 (ABran-WMF) In progress→Resolved s4 is now done: ` Result: {"already done in all dbs": ["db1150:3...
[08:54:26] Data-Engineering, Data-Platform-SRE (2024.11.09 - 2024.11.29): Design a suitable DAG deployment method - https://phabricator.wikimedia.org/T368033#10351648 (brouberol) Open→Resolved
[10:28:15] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[10:37:32] Data-Engineering, Data-Engineering-Wikistats, Data Pipelines, LPL Technical Support, and 2 others: Merge ks-Arab and ks-Deva to ks - https://phabricator.wikimedia.org/T314476#10352113 (MaryMunyoki) a:Amire80→None
[12:24:03] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0, Data-Platform-SRE (2024.11.09 - 2024.11.29): Test if an existing conda environment with Spark 3.1.2 clients works fine with Spark 3.5.3 - https://phabricator.wikimedia.org/T380417#10352589 (BTullis) >>! In T380417#10342041, @xcollazo...
[12:29:06] !log rebooting cephosd cluster for T380731
[12:29:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:29:10] T380731: Reboots of Bookworm systems which use 6.1.115 - https://phabricator.wikimedia.org/T380731
[13:01:11] Data-Engineering: [opsweek] Airflow DAGs with Spark jobs should always include Spark tuning variables - https://phabricator.wikimedia.org/T343154#10352775 (Ottomata) Related: {T348963}
[13:17:54] Data-Engineering, CirrusSearch, Discovery-Search (Current work): Move oozie/util/swift/upload/ out of the refinery oozie folder - https://phabricator.wikimedia.org/T380343#10352835 (Gehel) p:Triage→Medium
[13:24:35] Data-Engineering, Wmfdata-Python: Convert existing Wmfdata docstrings to a standard format - https://phabricator.wikimedia.org/T380742 (nshahquinn-wmf) NEW p:Triage→Medium
[13:24:37] Data-Engineering, Wmfdata-Python: Convert existing Wmfdata docstrings to a standard format - https://phabricator.wikimedia.org/T380742#10352893 (nshahquinn-wmf)
[13:29:05] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Discovery-Search, Dumps 2.0, Patch-For-Review: Add relevant kafka clusters to defined airflow connections in puppet - https://phabricator.wikimedia.org/T379676#10352914 (Gehel) p:Triage→Medium
[13:33:23] Data-Engineering (Q2 2024 October 1st - December 31th), Discovery-Search, Dumps 2.0, Data-Platform-SRE (2024.11.09 - 2024.11.29), Patch-For-Review: Add relevant kafka clusters to defined airflow connections in puppet - https://phabricator.wikimedia.org/T379676#10352937 (Gehel)
[13:38:42] Data-Engineering, Movement-Insights, Wmfdata-Python, Documentation: Publish HTML docs for Wmfdata-Python on doc.wikimedia.org - https://phabricator.wikimedia.org/T298178#10353012 (nshahquinn-wmf) >>! In T298178#10317227, @apaskulin wrote: > Sphinx plus the Furo theme is a great option! If you're...
[13:45:09] Data-Engineering, Data-Platform-SRE (2024.11.09 - 2024.11.29): Log aggregation is failing for the analytics user due to too many files in /var/log/hadoop-yarn/apps/analytics/logs on HDFS - https://phabricator.wikimedia.org/T380674#10353034 (BTullis) Open→Resolved I have also failed back the...
[13:45:27] Data-Engineering, Wmfdata-Python: Convert existing Wmfdata docstrings to a standard format - https://phabricator.wikimedia.org/T380742#10353039 (nshahquinn-wmf)
[13:46:36] Data-Engineering, Movement-Insights, Wmfdata-Python, Documentation: Publish HTML docs for Wmfdata-Python on doc.wikimedia.org - https://phabricator.wikimedia.org/T298178#10353044 (nshahquinn-wmf)
[13:47:14] Data-Engineering, Movement-Insights, Wmfdata-Python, Documentation: Publish HTML docs for Wmfdata-Python on doc.wikimedia.org - https://phabricator.wikimedia.org/T298178#10353048 (nshahquinn-wmf)
[13:47:15] Data-Engineering, Wmfdata-Python: Convert existing Wmfdata docstrings to a standard format - https://phabricator.wikimedia.org/T380742#10353049 (nshahquinn-wmf)
[13:54:42] Data-Engineering, Data Products, Event-Platform: Add schema diffing support to jsonschema-tools and run diff in CI - https://phabricator.wikimedia.org/T321850#10353068 (Ottomata) Just a heads up. This is similar (schema UI improvements) to the work yall wanted to do in {T376841}, so I added you all....
[13:55:36] !log enabled the performance CPU governor across the Hadoop cluster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1072529 for T362922
[13:55:38] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:55:38] T362922: Audit/consider enabling CPU performance governor on DPE SRE-owned hosts - https://phabricator.wikimedia.org/T362922
[13:57:09] Data-Engineering, Product-Analytics, Event-Platform: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163#10353088 (Ottomata) a:Ottomata I hope to find time for this again in the new year
[14:20:19] (CR) Milimetric: Modify MediaWiki History queries to support Temp Accounts (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/1088342 (https://phabricator.wikimedia.org/T379230) (owner: Mforns)
[14:28:15] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[14:30:32] Data-Engineering, Data-Platform-SRE (2024.11.09 - 2024.11.29): Design a suitable DAG deployment method - https://phabricator.wikimedia.org/T368033#10353230 (Ottomata) > Data Engineers would like DAG deployments to be automated SRE would like DAG deployments to be "secure" (aka: no single person can m...
[14:31:23] Data-Engineering, Data-Platform-SRE (2024.11.09 - 2024.11.29): Design a suitable DAG deployment method - https://phabricator.wikimedia.org/T368033#10353238 (Ottomata) BTW, this is why I was asking this https://phabricator.wikimedia.org/T368033#10251511 Did you call get to talk about that? How will...
[14:34:35] Data-Engineering, CirrusSearch, Discovery-Search (Current work): Move oozie/util/swift/upload/ out of the refinery oozie folder - https://phabricator.wikimedia.org/T380343#10353252 (Ottomata) +1
[14:35:06] Data-Engineering (Q2 2024 October 1st - December 31th), Discovery-Search, Dumps 2.0, Data-Platform-SRE (2024.11.09 - 2024.11.29), Patch-For-Review: Add relevant kafka clusters to defined airflow connections in puppet - https://phabricator.wikimedia.org/T379676#10353259 (Ottomata) > Can we mov...
[14:48:18] (CR) Ottomata: [C:+1] Remove dead code, doc and artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/1092879 (owner: Joal)
[16:55:11] Data-Engineering: Time-based partitioning for wikitext history for Dumps2 - https://phabricator.wikimedia.org/T380773#10354295 (xcollazo)
[17:03:03] Data-Engineering, Data-Platform-SRE (2024.11.09 - 2024.11.29): Design a suitable DAG deployment method - https://phabricator.wikimedia.org/T368033#10354357 (brouberol) First off, one thing that wasn't clear for other folks is that you're still free to self-approve, so the check really makes sure some...
[17:35:52] (PS2) Joal: Remove dead code, doc and artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/1092879
[17:36:48] (CR) Joal: Remove dead code, doc and artifacts (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/1092879 (owner: Joal)
[18:00:01] (CR) DCausse: [C:+1] Remove dead code, doc and artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/1092879 (owner: Joal)
[18:03:25] Data-Engineering, CirrusSearch, Discovery-Search (Current work), Patch-For-Review: Move oozie/util/swift/upload/ out of the refinery oozie folder - https://phabricator.wikimedia.org/T380343#10354654 (dcausse) https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1092879 is moving the script to `bin`
[18:28:15] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[21:46:34] Data-Engineering, Data-Platform-SRE, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users group, sql_lab role, Kerberos Principal for Khantstop - https://phabricator.wikimedia.org/T379303#10355402 (mpopov) Resolved→Open a:MatthewVernon→None @Khantstop has rep...
[21:58:12] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform, Movement-Insights: Modify the automated traffic detection to be applied at the project family level - https://phabricator.wikimedia.org/T377257#10355456 (Hghani) We completed initial impact analysis on the proposal to apply automa...
[22:28:15] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent