[01:17:48] !log Deleted unused tables analytics_platform_eng.imagerec and analytics_platform_eng.imagerec_prod.
[01:17:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:33:05] Data-Engineering, Event-Platform Value Stream, ci-test-error: Time-dependent test in PageChangeEventSerializerTest - https://phabricator.wikimedia.org/T325715 (Ladsgroup)
[09:53:24] Data-Engineering, Event-Platform Value Stream (Sprint 05): We should provide utilities for local development and unit testing of Python streaming services - https://phabricator.wikimedia.org/T324951 (gmodena)
[14:36:02] PROBLEM - Disk space on an-launcher1002 is CRITICAL: DISK CRITICAL - free space: / 2552 MB (3% inode=58%): /tmp 2552 MB (3% inode=58%): /var/tmp 2552 MB (3% inode=58%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-launcher1002&var-datasource=eqiad+prometheus/ops
[14:42:17] !log `apt-get clean` on an-launcher1002 to free some space
[14:42:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:47:51] joal, ottomata o/ if you have some stuff that you can free on your home on an-launcher1002 please do :)
[14:48:25] the rest is in /tmp and /var afaics
[14:49:00] ack elukey, checking
[14:50:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[14:55:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage -
https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[14:56:38] RECOVERY - Disk space on an-launcher1002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-launcher1002&var-datasource=eqiad+prometheus/ops
[16:00:09] there are all these super old things in /tmp, on an-launcher1002, how come it's not getting deleted automatically after a month or so...
[16:00:17] (I'll look after this meeting
[16:00:18] )
[17:12:10] Data-Engineering, Event-Platform Value Stream, ci-test-error: Time-dependent test in PageChangeEventSerializerTest - https://phabricator.wikimedia.org/T325715 (Umherirrender)
[17:12:12] Data-Engineering, Event-Platform Value Stream, ci-test-error: Use a fake timer in EventBus unit test for PageChangeEventSerializerTest::testCreatePageChangeVisibilityEvent - https://phabricator.wikimedia.org/T325341 (Umherirrender)
[18:17:02] Data-Engineering-Planning, Data Pipelines: Document known data quality issues on Wikistats - https://phabricator.wikimedia.org/T325256 (odimitrijevic)
[20:16:32] PROBLEM - Disk space on an-launcher1002 is CRITICAL: DISK CRITICAL - free space: / 1514 MB (2% inode=57%): /tmp 1514 MB (2% inode=57%): /var/tmp 1514 MB (2% inode=57%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-launcher1002&var-datasource=eqiad+prometheus/ops
[20:18:48] joal: an-launcher is acting up again, I'm looking at it and thinking we can delete stuff in /tmp that's super old
[20:18:53] it's a few gigs
[20:19:20] but something's been filling it up... maybe our spark job inserting into the iceberg content table
[20:23:02] what do you think: find /tmp/* -mtime +100 -exec rm {} \;
[20:23:13] (waiting a few min and doing...)
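Editor's note on the `find /tmp/* -mtime +100 -exec rm {} \;` proposal above: the log does not show what was actually run during the pairing session, so the following is only a rough Python sketch of the same idea, with a dry-run step added so the candidate list can be reviewed before anything is deleted. The function names `find_stale_files` and `remove_files` are hypothetical, and the dry-run default is an assumption of this sketch, not something discussed in the channel. It mirrors `-mtime +100` by comparing the age in whole days, but unlike the shell glob `/tmp/*` it also descends into hidden top-level entries, and it only considers regular files (symlinks and directories are skipped).

```python
import os
import stat
import time

SECONDS_PER_DAY = 86400


def find_stale_files(root, min_age_days, now=None):
    """Return regular files under root whose age in whole days exceeds min_age_days.

    The whole-day comparison follows the semantics of `find -mtime +N`.
    """
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # skip symlinks, sockets, and other special files
            age_days = int((now - info.st_mtime) // SECONDS_PER_DAY)
            if age_days > min_age_days:
                stale.append(path)
    return sorted(stale)


def remove_files(paths, dry_run=True):
    """Delete the given files and return those removed; with dry_run, only print."""
    removed = []
    for path in paths:
        if dry_run:
            print(f"would remove {path}")
            continue
        try:
            os.remove(path)
            removed.append(path)
        except OSError as err:
            print(f"could not remove {path}: {err}")
    return removed
```

A cautious way to use it would be to call `remove_files(find_stale_files("/tmp", 100))` first, read the printed list, and only then repeat the call with `dry_run=False`.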
[20:28:24] milimetric: do you want to pair?
[20:28:35] mforns: yeah, that'd probably be safer
[20:28:42] ok!
[20:28:48] omw cave
[20:37:10] RECOVERY - Disk space on an-launcher1002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-launcher1002&var-datasource=eqiad+prometheus/ops
[21:33:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[21:38:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[21:38:57] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[21:48:42] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage