[01:16:16] Data-Engineering: Low disk space on stat1007 - https://phabricator.wikimedia.org/T335069 (xcollazo) I am actively using `/tmp/xcollazo_airflow_home`, and I nuke it every so often. That size (~1GB) is typical for a single airflow deployment with all the python packages. It does indeed weigh a ton though!
[02:49:17] (SystemdUnitFailed) firing: (9) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:59:17] (SystemdUnitFailed) firing: (10) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:17:48] (SystemdUnitFailed) firing: (10) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:01:24] Data-Engineering, DBA, Discovery-Search, Infrastructure-Foundations, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (Marostegui)
[07:01:57] Data-Engineering, DBA, Discovery-Search, Infrastructure-Foundations, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (Marostegui) @jcrespo kindly check what is needed for backup involved hosts, thanks!
[07:29:30] Data-Engineering, DBA, Discovery-Search, Infrastructure-Foundations, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (jcrespo)
[07:30:37] Data-Engineering, DBA, Discovery-Search, Infrastructure-Foundations, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (jcrespo) >>! In T335042#8795210, @Marostegui wrote: > @jcrespo kindly check what is needed for backup involved hosts, thanks! Done.
[08:42:48] (SystemdUnitFailed) firing: (10) jupyterhub-conda.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:47:48] (SystemdUnitFailed) firing: (10) jupyterhub-conda.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:49:17] (SystemdUnitFailed) firing: (9) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:53:41] hi btullis! just saw your message from yesterday, will do some cleaning!
[12:57:55] (CR) Joal: [C: +1] "LGTM!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/910094 (https://phabricator.wikimedia.org/T334096) (owner: Mforns)
[13:12:44] Data-Engineering-Planning, Event-Platform Value Stream, MW-1.40-notes (1.40.0-wmf.24; 2023-02-20), MW-1.41-notes (1.41.0-wmf.2; 2023-03-27), and 2 others: Remove StreamConfig::INTERNAL_SETTINGS logic from EventStreamConfig and do it in EventLogging client ... - https://phabricator.wikimedia.org/T286344
[13:17:47] btullis: I deleted most of my files (the heaviest would be the airflow envs) and most of analytics-privatedata folders older than 90 days (but they were mostly empty folders)
[13:39:45] (CR) Joal: "I commented on one file, the comments are valid for all :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/910092 (https://phabricator.wikimedia.org/T334096) (owner: Mforns)
[13:47:05] (CR) Joal: "Actually my previous comment might be invalid depending on how the hive2druid job takes input data - if it can read from files, then it sh" [analytics/refinery] - https://gerrit.wikimedia.org/r/910092 (https://phabricator.wikimedia.org/T334096) (owner: Mforns)
[13:50:34] Hey mforns - I reviewed some code for you, let's talk about my comments when you wish
[13:50:36] mforns: Many thanks indeed!
[13:50:51] cool!
[13:50:56] joal: thanks! will read
[13:51:45] Data-Engineering-Planning, Data Pipelines (Sprint 11): Support for moving data from HDFS to public http file server - https://phabricator.wikimedia.org/T317167 (JAllemandou) >>! In T317167#8794346, @Ottomata wrote: > Yeah but I guess it is the same problem for the stat box synced data too :/ Absolutely...
[13:52:18] Data-Engineering-Planning, Data Pipelines (Sprint 11): Support for moving data from HDFS to public http file server - https://phabricator.wikimedia.org/T317167 (JAllemandou) Folders on HDFS created: `hdfs://analytics-hadoop/wmf/data/published/datasets`
[13:58:05] Data-Engineering: Low disk space on stat1007 - https://phabricator.wikimedia.org/T335069 (BTullis) Open→Resolved Thanks @xcollazo and @mforns for checking out and handling the stuff in `/tmp` {F36957546,width=60%} I agree that a ~1 GB working directory shouldn't be a problem, so apologies of it cam...
[13:58:20] hey ottomata - would you have a minute?
[14:11:13] (DiskSpace) firing: Disk space an-test-worker1002:9100:/ 5.989% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[14:12:21] steve_munene: ^^ Is this related to something you're doing? No issue either way, just interested :)
[14:34:38] btullis: does not look related to anything I am working on
[14:34:51] joal: do you want to discuss hive_to_druid? :-)
[14:35:23] mforns: sure!
[14:35:41] joal: batcave?
[14:35:48] joining!
[14:39:30] Data-Engineering-Planning, Event-Platform Value Stream, MW-1.40-notes (1.40.0-wmf.24; 2023-02-20), MW-1.41-notes (1.41.0-wmf.2; 2023-03-27), and 2 others: Remove StreamConfig::INTERNAL_SETTINGS logic from EventStreamConfig and do it in EventLogging client ... - https://phabricator.wikimedia.org/T286344
[14:46:06] (CR) Joal: [C: +1] "LGTM :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/910092 (https://phabricator.wikimedia.org/T334096) (owner: Mforns)
[15:00:07] joal: thanks for the reviews! What do you think of EasyDAG?
[15:14:42] Data-Engineering, API Platform: [Needs grooming] Turnilo: include authentication status in request data cube - https://phabricator.wikimedia.org/T332864 (JArguello-WMF)
[15:19:00] mforns: I think it's great :) Will facilitate not writing some boilerplate code
[15:19:37] joal: and the fact that it's defined in dag_config.py, and not in a proper class within wmf_airflow_common?
[15:20:34] hm, I had not noticed
[15:20:42] Not sure - I assume ou
[15:20:53] you have a reason to have put it there
[15:21:19] I guess it's because some config parameters for it are in that file
[15:22:02] I wonder if we could provide the class as a global helper, and enhance it in our dag_config file
[15:22:05] Not sure
[15:22:09] well... I think it could be done in wmf_airflow_common
[15:22:52] but yes, it uses default_args, which is decorated inside dag_config.py
[15:23:50] right
[15:24:12] if it's not too complicated to put it in wmf_airflow_common, I think it's worth
[15:25:47] getting the kids - will be back at standup
[15:26:09] thanks joal! :-)
[15:28:21] Data-Engineering-Planning: Data Engineering Pairing system - https://phabricator.wikimedia.org/T327790 (JArguello-WMF)
[15:30:54] (PS1) Snwachukwu: Migrate pageview druid load hql queries to Airflow [analytics/refinery] - https://gerrit.wikimedia.org/r/910520 (https://phabricator.wikimedia.org/T334104)
[15:37:43] Data-Engineering-Planning: Data Engineering Pairing system - https://phabricator.wikimedia.org/T327790 (JArguello-WMF)
[15:44:16] !log deploying weekly deployment train for analytics refinery.
[15:44:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:07:59] (PS1) Milimetric: Add new wikis to sqoop list [analytics/refinery] - https://gerrit.wikimedia.org/r/910546 (https://phabricator.wikimedia.org/T332070)
[16:09:00] (CR) Milimetric: [V: +2 C: +2] Add new wikis to sqoop list [analytics/refinery] - https://gerrit.wikimedia.org/r/910546 (https://phabricator.wikimedia.org/T332070) (owner: Milimetric)
[16:09:04] Data-Engineering-Planning, Data Pipelines (Sprint 11): Support for moving data from HDFS to public http file server - https://phabricator.wikimedia.org/T317167 (JArguello-WMF)
[16:10:15] Data-Engineering-Planning, Product-Analytics, Data Pipelines (Sprint 11), Patch-For-Review: 2 additional new wikis - https://phabricator.wikimedia.org/T332070 (Milimetric) if we can deploy this before May 1st, it will pull in data for the new wikis in the next mw history snapshot.
[16:25:51] !log Deployed refinery using scap, then deployed onto hdfs as part of weekly deployment train.
[16:25:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:32:13] Data-Engineering-Planning, Data Pipelines (Sprint 11): Airflow ArchiveOperator should have a number of retries of 0 - https://phabricator.wikimedia.org/T332216 (JArguello-WMF) Open→Resolved
[16:32:27] Data-Engineering, Data Pipelines (Sprint 11): Update API with March Net New Content Data - https://phabricator.wikimedia.org/T334890 (JArguello-WMF) Open→Resolved a:JArguello-WMF
[16:43:45] (PS4) AikoChou: Add event schema for ML classification change on current page state [schemas/event/primary] - https://gerrit.wikimedia.org/r/905965 (https://phabricator.wikimedia.org/T331401)
[16:49:17] (SystemdUnitFailed) firing: (9) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:57:22] (CR) AikoChou: Add event schema for ML classification change on current page state (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/905965 (https://phabricator.wikimedia.org/T331401) (owner: AikoChou)
[17:52:12] Data-Engineering: Low disk space on stat1007 - https://phabricator.wikimedia.org/T335069 (xcollazo) >>! In T335069#8796315, @BTullis wrote: > I agree that a ~1 GB working directory shouldn't be a problem, so apologies of it came across as hassling you @xcollazo :-) No worries! A question though: Is it me,...
[17:54:24] Data-Engineering-Planning, Data Pipelines (Sprint 11): Support for moving data from HDFS to public http file server - https://phabricator.wikimedia.org/T317167 (xcollazo) >>! In T317167#8796278, @JAllemandou wrote: >>>! In T317167#8794346, @Ottomata wrote: >> Yeah but I guess it is the same problem for t...
[18:11:28] (DiskSpace) firing: Disk space an-test-worker1002:9100:/ 5.766% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[20:49:17] (SystemdUnitFailed) firing: (9) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:11:28] (DiskSpace) firing: Disk space an-test-worker1002:9100:/ 5.545% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace