[11:04:51] lunch
[14:06:55] \o
[15:22:51] o/ Hi ebernhardson: I tried rebasing my DAG migration branches on the main branch but have run into test failures ever since, for example: “airflow.exceptions.AirflowException: SparkSubmitHook env_vars is not supported in standalone-cluster mode.”
[15:27:55] pfischer: hmm, sounds like some environment isn't making it into the configuration
[15:29:36] pfischer: in particular, the wmf_airflow_common.config.dag_default_args only sets `master=yarn` as a default operator argument (which is needed for SparkSubmitOperator to not complain there) if a wmf prod environment is set
[15:30:14] i'm not really sure why they set it up that way, tbh the configuration seems a little over-complicated, but maybe they have use cases for running airflow outside of the stats cluster with hadoop
[15:31:08] the environment should be getting set from tests/search/conftest.py via the autouse'd environment_override_fixture. hmm
[15:32:44] as for a fix...still pondering :S Not sure why it would pass in the CI and on my local but not on your local
[15:34:41] pfischer: could you paste the output of running pytest somewhere? phab has a form (https://phabricator.wikimedia.org/paste/edit/form/14/) or wherever
[15:51:53] * ebernhardson can't figure out how his gitlab profile clearly says my local time zone is set to UTC 0, but the commit history says i authored patches at 2 pm (which would be past 10pm local, something i know i wasn't doing)
[15:52:18] ebernhardson: I do not have access to that phab paste
[15:53:37] pfischer: huh, that's odd. It should be a generic form to create a paste in phabricator. i guess use https://phabricator.wikimedia.org/paste/ and click create-paste at the top right. If that doesn't work maybe we missed something on your profile configuration in phabricator
[15:54:29] in that case we'd have to make a ticket for release engineering, i wasn't aware there were access controls around creating pastes
[16:23:05] workout, back in ~40
[16:44:33] pfischer: maybe this will work for you, i tried to clean up my docker env in such a way that i think it might also run on macos: https://gitlab.wikimedia.org/-/snippets/62
[16:45:44] the last mac i owned was an Apple IIgs though, so no clue if it actually works :P Just based on my working assumptions about how docker on macos works
[16:46:27] i'm not sure about arm images...it's based on the wikimedia-buster image
[16:47:33] probably an arm based debian buster image would work, but i'm not sure
[16:57:44] So far I didn’t use docker to build/run tests, just tox -e py39. Here’s the output: https://phabricator.wikimedia.org/P44903
[17:01:56] back
[17:03:59] pfischer: hmm, the most curious part is that the fixtures you are generating for other dags have `--master yarn`, but the fixtures generated for import_commons_ttl and import_wikidata_ttl are not passing that same check
[17:04:44] pfischer: it seems like the default_args you are passing to DAG might not be initialized in the same way the other dag files are
[17:05:16] pfischer: as for docker, i was thinking perhaps ensuring it's run in the same way from the same environment would resolve the issue, but since this is limited to the new dags I'm thinking that's less likely
[17:08:17] pfischer: is the branch on github updated? When i run it i get a different error: ImportError: cannot import name 'default_args' from 'search.config.dag_config' (/srv/app/search/config/dag_config.py)
[17:10:18] I think I found the issue, I messed up during rebase, now that I’m using your get_default_config() method, things look better.
[17:11:49] nice!
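(editor's note) The failure discussed above hinges on a default operator argument that is only added when a production environment is configured, with the test suite relying on an autouse'd fixture to set that environment. A minimal sketch of that pattern follows, under stated assumptions: ENV_VAR, PROD_ENVS, build_default_args and environment_override are hypothetical names chosen for illustration and are not the real wmf_airflow_common API, and the override is written as a stdlib context manager, whereas in a pytest suite it would typically live in an autouse fixture in conftest.py.

```python
import os
from unittest import mock

# Hypothetical names throughout: ENV_VAR, PROD_ENVS and build_default_args
# are illustrative only and do not mirror the real wmf_airflow_common code.
ENV_VAR = "EXAMPLE_AIRFLOW_ENVIRONMENT"
PROD_ENVS = {"prod_search"}


def build_default_args(extra=None):
    """Return operator defaults, adding master=yarn only in a prod environment."""
    args = {"owner": "example"}
    if os.environ.get(ENV_VAR) in PROD_ENVS:
        # Per the chat, SparkSubmitOperator complains when this default is absent.
        args["master"] = "yarn"
    args.update(extra or {})
    return args


def environment_override(name):
    """Set the environment variable for the duration of a with-block."""
    return mock.patch.dict(os.environ, {ENV_VAR: name})


# Defaults built while the override is active carry master=yarn.
with environment_override("prod_search"):
    with_env = build_default_args()

# Defaults built with the variable unset do not; patch.dict restores
# os.environ to its previous state when the block exits.
with mock.patch.dict(os.environ):
    os.environ.pop(ENV_VAR, None)
    without_env = build_default_args()
```

If a DAG file builds its defaults by some route that bypasses a helper like this, as seemed to happen after the rebase, the override has no effect on it, which would be consistent with only the new DAGs failing.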
[17:48:34] Alright, ebernhardson: I followed your suggestions in your comments and marked my PR ready: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/240
[17:57:23] pfischer: thanks! Did a pass over it comparing to the old repo, i think we are very close to merging this one
[18:36:03] hmm, somewhat surprised elasticsearch-spark-30_2.12 is marked as using apache license and not the custom elastic license. Good thing, because they didn't support spark 3 until 7.12 (and they changed licenses for 7.11 onwards)
[18:38:25] * ebernhardson wonders if should use the latest 8.x package, it claims to support elasticsearch all the way down to 1.x
[18:39:12] lunch, back in time for pairing
[18:39:48] so many random questions though...their repo says 8.4.0 is the latest stable release, but they have point releases up to 8.4.3, and the most recent release is 8.6.2
[18:44:15] their docs are a mess :P The repo README claims for requirements "Elasticsearch (1.x or higher (2.x highly recommended)) cluster accessible through REST. That's it!" But then the compatibility matrix says elastic 7.10.2 needs es-hadoop 6.8.x-7.17.x
[19:08:01] back
[19:11:26] huh, gitlab cuts off long commit messages in the UI without a "show more" button :S https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/commit/fba7afb52113c77663f0764c2067007ec427589c
[19:15:55] not good. I wonder if that's tweakable
[19:17:18] to be fair, it seems like half the patches i look at in gitlab don't even have a commit message, just a commit name like 'Apply review suggestions'
[19:18:12] perhaps the expectation is that information about commits goes somewhere other than the git repo...i dunno
[19:18:31] I wonder why. Is there something in gitlab pushing people away from making decent commit msgs?
[19:18:44] Anyway, I asked in -releng about the webUI
[19:19:16] perhaps i would lean more towards gerrit specifically encouraging good commit messages
[19:20:44] Could be...but I'm def on the side of descriptive commit msgs
[19:31:13] ryankemper: pairing session in https://meet.google.com/eki-rafx-cxi
[21:11:58] (from earlier) > perhaps the expectation is that information about commits goes somewhere other than the git repo...i dunno
[21:12:26] this is my eternal pet peeve...so many times all the context is left in git[hub/lab] PRs and not the commit itself
[21:12:59] end result being, apart from not working natively with terminal-based workflows, all that context gets lost if the git provider is ever changed
[21:34:37] ah, that's a good point. I never really thought about the difference between commit message and PR message
[22:38:52] hmm, my port of export queries to relforge claims to run to success but no index is found on relforge :S