[01:21:16] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:24:26] Data-Engineering, Data-Engineering-Kanban, Superset: Superset Timeout Logging - https://phabricator.wikimedia.org/T294772 (odimitrijevic)
[02:27:04] Data-Engineering, Data-Engineering-Kanban, Superset: Superset SQL Lab fails to stop query - https://phabricator.wikimedia.org/T293083 (odimitrijevic)
[02:30:15] Data-Engineering, Data-Engineering-Kanban, Superset: Superset Timeout Logging - https://phabricator.wikimedia.org/T294772 (odimitrijevic) a:razzi→None
[04:23:04] (CR) TsepoThoabala: "This LGTM." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/753548 (https://phabricator.wikimedia.org/T296415) (owner: AGueyte)
[11:47:44] Analytics, SRE, SRE-Access-Requests, Patch-For-Review, User-Ladsgroup: Add dhorn to analytics-privatedata-users - https://phabricator.wikimedia.org/T300579 (Ladsgroup) I talked to @jbond and it seems you can login with your CN in CAS, that's why "DannyH (WMF)" works but your ldap entry says y...
[11:50:01] Analytics, SRE, SRE-Access-Requests, Patch-For-Review, User-Ladsgroup: Add dhorn to analytics-privatedata-users - https://phabricator.wikimedia.org/T300579 (Ladsgroup)
[11:50:58] Analytics, SRE, SRE-Access-Requests, Patch-For-Review, User-Ladsgroup: Add dhorn to analytics-privatedata-users - https://phabricator.wikimedia.org/T300579 (Ladsgroup) Open→Resolved You should be able to access it in half an hour, reopen if that's not the case.
[11:51:42] Analytics, SRE, SRE-Access-Requests, Patch-For-Review, User-Ladsgroup: Add dhorn to analytics-privatedata-users - https://phabricator.wikimedia.org/T300579 (jbond) > I talked to @jbond and it seems you can login with your CN in CAS, that's why "DannyH (WMF)" works but your ldap entry says you...
[13:30:15] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[13:43:51] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[13:44:32] mforns: o/
[13:55:09] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:23:39] heya ottomata :]
[14:27:09] mforns: hello!
[14:27:32] mforns: lets talk about default_args, i got a little confused when trying to understand default_args and dag_config
[14:27:36] bc?
[14:27:45] oh also
[14:27:55] we can troubleshoot pyarrow stuff
[14:27:58] heya, I tried the change you proposed yesterday, and it has the same error
[14:28:07] I'm about to meet with Antoine in the bc
[14:32:27] okay coming i'll lurk til we got talkey time
[15:24:56] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review, Product-Analytics (Kanban): Test log file and error notification - https://phabricator.wikimedia.org/T295733 (BTullis) Right, this time I think I have fixed it. I have added the following: ` syslog_identifier => 'product-analytics-movem...
[16:56:10] (PS1) GoranSMilovanovic: wikidata_examples [analytics/wmde/WD/WikidataAdHocAnalytics] - https://gerrit.wikimedia.org/r/759276
[16:56:29] (CR) GoranSMilovanovic: [V: +2 C: +2] wikidata_examples [analytics/wmde/WD/WikidataAdHocAnalytics] - https://gerrit.wikimedia.org/r/759276 (owner: GoranSMilovanovic)
[17:04:13] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Epic: Technical evaluation of Amundsen - https://phabricator.wikimedia.org/T300756 (razzi)
[17:09:54] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Set SparkmaxPartitionBytes to 256MB - https://phabricator.wikimedia.org/T300299 (Ottomata) Let's merge and apply this on Monday.
[17:10:54] Analytics, Data-Engineering, Event-Platform, SRE: ~1 request/minute to intake-logging.wikimedia.org times out at the traffic/service interface - https://phabricator.wikimedia.org/T264021 (Ottomata) a:Ottomata→None
[17:15:16] Analytics, Data-Engineering, Event-Platform, Internet-Archive, and 3 others: page-links-change stream is assigning template propagation events to the wrong edits - https://phabricator.wikimedia.org/T216504 (Ottomata) a:Ottomata→None
[17:18:40] (CR) Joal: [V: +1] "Thanks Dan :)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/757495 (https://phabricator.wikimedia.org/T297934) (owner: Joal)
[17:18:50] (PS10) Joal: Integrate SparkSQLNoCLIDriver and HiveToCassandra [analytics/refinery/source] - https://gerrit.wikimedia.org/r/757495 (https://phabricator.wikimedia.org/T297934)
[17:20:59] folks are we doing the SRE sync?
If not I'd take the extra 30 mins to finish some stuff before the tech meeting
[17:22:22] we are doing, its always optional though so feel free to skip...although if you were there i'd ask you something about knative stuff but we can talk anytime
[17:22:25] elukey: ^
[17:23:05] ottomata: ack if you don't mind I'd skip today, but please ping me anytime for knative :)
[17:23:22] k!
[17:23:24] (CR) jerkins-bot: [V: -1] Integrate SparkSQLNoCLIDriver and HiveToCassandra [analytics/refinery/source] - https://gerrit.wikimedia.org/r/757495 (https://phabricator.wikimedia.org/T297934) (owner: Joal)
[17:26:13] ping mforns, aqu1 - would you have a minute?
[17:26:32] I have a question about concurrency on airflow (again!)
[17:26:36] joal: yes
[17:26:45] yeppp joal :]
[17:26:45] batcave?
[17:26:48] ok
[17:30:02] btullis: razzi batcave busy
[17:30:04] SRE sync here? meet.google.com/skg-xgmw-oty
[17:40:39] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Deploy an-test-coord1002 to facilitate failover testing of analytics coordinator role - https://phabricator.wikimedia.org/T287864 (BTullis) a:BTullis→razzi
[17:47:12] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Epic, User-razzi: Technical evaluation of Amundsen - https://phabricator.wikimedia.org/T300756 (razzi)
[17:54:25] Analytics-Radar, Fundraising-Backlog, Product-Analytics, Wikipedia-iOS-App-Backlog, and 2 others: Understand impact of Apple's Relay Service - https://phabricator.wikimedia.org/T289795 (nettrom_WMF) Open→Resolved We've complete the analyses in the subtasks and continued to monitor views f...
[18:47:12] mforns: keep going at 2?
[18:47:19] i have default_args qs
[18:49:00] sorry, in 10 mins?
[19:05:15] ottomata: still in meeting with Sandra
[19:05:37] I need to step out for 1 minutes as well, though, let's meet then?
[19:09:03] k
[19:22:50] hey ottomata I'm back wanna cave?
[19:28:50] y!~
[19:28:54] missed ping sorry
[19:28:55] coming to cave
[20:05:15] (PS11) Joal: Integrate SparkSQLNoCLIDriver and HiveToCassandra [analytics/refinery/source] - https://gerrit.wikimedia.org/r/757495 (https://phabricator.wikimedia.org/T297934)
[20:07:21] Gone for tonight - see you tomorrow folks
[20:36:02] mforns: back
[20:36:29] i'll poke around or a bit
[20:36:30] for
[20:36:49] are you in da cave?
[20:36:53] ottomata: ^
[20:36:56] ya but afk one min
[20:37:00] k
[20:40:04] k here
[21:32:15] Data-Engineering, Airflow: [Airflow] Add module to easily get dependency paths in HDFS - https://phabricator.wikimedia.org/T300795 (mforns)
[21:33:23] Data-Engineering, Data-Engineering-Kanban, Airflow: [Airflow] Add module to easily get dependency paths in HDFS - https://phabricator.wikimedia.org/T300795 (mforns)
[21:35:12] hm ottomata ugly... I added a call to hdfs_client (new API) from dag_config.py, and it fails...
[21:35:19] same problem
[21:36:34] this is going to affect refine DAGs... and any kind of dynamic dags...
[21:36:42] this is the same problem we saw in the past
[21:38:18] Airflow developers said this was fixed by using celery executor... I wonder if this would break in our current prod/test instances (in addition to with the development instance script)
[21:38:32] no bueno
[21:39:07] mforns: .... so we are using sequential executor here
[21:39:10] in dev env?
[21:39:12] right?
[21:39:16] what if we used LocalExecutor?
[21:39:40] mforns: so you are saying even using the new pyarrow API, if you have 2 calls to it, it hangs?
[21:40:42] yes
[21:40:54] but it's using the sequential executor, as you say
[21:41:15] one call must be at interpretation time, the other at execution
[21:42:00] lets try local, its an easy change
[21:42:03] just edit the config and try, right?
[21:42:15] or, maybe local doesn't work with sqlite?
[21:42:16] can't recall
[21:43:00] yea.. not sure
[21:43:31] in any case, will continue tomorrow!
loggin off, see you all!
[21:49:22] laters!
[21:57:35] do sensors run as operator tasks?
[21:57:42] or at dag parse time?
[21:57:46] hmm not at dag parse time, right?
[21:58:16] i see operator
[21:58:24] i think there is no reason to need to use pyarrow during dag parse, right?
[21:58:55] i think i have a fix, newer fsspec and pyarrow, and then we configure the artifact source to use e.g. arrow_hdfs://analytics-hadoop/
[21:58:59] that makes it use the new pyarrow api
[21:59:18] also need to set CLASSPATH as you have done in the right places
[22:07:52] although, i seem to have trouble writing to hdfs with the new API
[22:07:53] hm.
[22:08:02] ok i'm out for the day too!
[22:08:11] Analytics, SRE, SRE-Access-Requests, User-Ladsgroup: Add dhorn to analytics-privatedata-users - https://phabricator.wikimedia.org/T300579 (DannyH) @Ladsgroup Thank you, I appreciate it!
[23:16:31] Data-Engineering, Project-Admins: Archive Analytics tag - https://phabricator.wikimedia.org/T298671 (Aklapper) * Changed H126 by replacing #Analytics with #Data-Engineering and removing some archived project tags, so that rule is now: ** When _all_ of these conditions are met: *** Project tags include a...
[23:24:28] Data-Engineering, Project-Admins: Archive Analytics tag - https://phabricator.wikimedia.org/T298671 (Aklapper) For open tasks tagged with #Analytics but not with `#Data-Engineering*`, see https://phabricator.wikimedia.org/maniphest/query/jex5HoXYMhEN/#R . There are some open tasks which only have an #An...
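
Note on the 21:35-21:58 exchange: the point that there may be "no reason to need to use pyarrow during dag parse" can be illustrated with a minimal sketch in which the HDFS client is built only inside the function a task runs, never in module-level code. All names here (`make_hdfs_client`, `FakeHdfsClient`, `list_dependency_paths`, the `/tmp/example/` path) are hypothetical, and the fake class only stands in for a real client; nothing from pyarrow, fsspec, or Airflow is imported. The log does not establish whether this pattern avoids the hang that was observed.

```python
from functools import lru_cache

# Counts how often the (hypothetical) client factory runs, so the example can
# show that merely defining the module-level code does not create a client.
client_creations = 0


class FakeHdfsClient:
    """Stand-in for a real HDFS client; NOT a pyarrow or fsspec class."""

    def ls(self, path):
        # Pretend the directory holds a single file.
        return [f"{path.rstrip('/')}/part-00000"]


@lru_cache(maxsize=None)
def make_hdfs_client():
    """Hypothetical factory: builds the client once per process, on first use."""
    global client_creations
    client_creations += 1
    return FakeHdfsClient()


def list_dependency_paths(base_path):
    """Body of a task callable: the client is requested only at execution time."""
    return make_hdfs_client().ls(base_path)


# "Parse time": only definitions have run so far, so no client exists yet.
parse_time_creations = client_creations

# "Execution time": running the task callable triggers client creation.
paths = list_dependency_paths("/tmp/example/")
print(parse_time_creations, client_creations, paths)
```

The cache means repeated task calls in the same process reuse one client; a separate worker process would create its own on first use.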