If we want to use pycharm + docker + astro CLI for local development, is there a recommended way to do this?
Astro CLI (at least 0.7.5.2) uses compose under the hood but doesn't seem to write out the file for us.
This presents two problems when you try to use the interactive Python console.
- While the CLI provides a means for loading environment variables, we either have to copy and paste them into PyCharm or create our own compose file, neither of which is super desirable.
- If we just reference the docker image directly, we don’t get the out-of-the-box bind mounts. So again, we either have to go fiddling around with pycharm, or make our own compose file.
Both of these issues seem to undercut the potential for smoothness of dev experience with the astro cli. Are there some tricks I’m missing?
Thanks in advance
Update
I tried out this compose file for the PyCharm console:

    version: '3.4'
    services:
      python_console:
        image: <your image>
        environment:
          - AIRFLOW__CORE__EXECUTOR=LocalExecutor
          - AIRFLOW__CORE__FERNET_KEY=d6Vefz3G9U_ynXB3cr7y_Ak35tAHkEGAVxuz_B-jzWw=
          - AIRFLOW__CORE__LOAD_EXAMPLES=False
          - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql://postgres:postgres@postgres:5432
        env_file:
          - .env
        volumes:
          - ./include:/usr/local/airflow/include
        entrypoint: python

This seems to work. I grabbed the Fernet key from the environment of the running container after `astro airflow start`. It looks like this is hardcoded into the CLI, so we could commit this to the repo and it would still work on different developer machines etc.
The question still remains though: what's the best way to set up for local development using PyCharm, including the interactive console, while remaining fully consistent with Astronomer's standard workflow and project structure?
Did you ever figure this out?
Well, I figured out what works for us.
Here are some notes on what we do as relates to this topic.
- We don't use the astro CLI.
  - For deployments, we use a custom shell script and a service account to authenticate to astro.
  - For the local dev cluster, we use a custom compose file based on the one that is generated internally by the astro CLI. I created a wrapper for the compose command referencing that file and called it `dev`, so you can run `dev up` to launch a cluster, etc. This way it's just native docker and I don't have to remember how the astro CLI works.
  - For PyCharm, I use a custom compose file that doesn't spin up the webserver, scheduler, or postgres.
- We rarely use docker.
  - Most of the time when developing an operator or something, you don't need an actual local cluster running, and using docker is just slow and a major resource hog.
  - When I want to validate that a graph looks right, I'll just run `airflow webserver` briefly in a virtualenv, which is much faster.
  - When I want to run something ad hoc, I'll just call `my_task.execute()`.
  - We manage the virtualenv setup with a `pip-install.sh` (requirements.txt alone is not enough because you have to reference the Astronomer Certified pip index) and the docker setup with `build.sh`; both of these get you to the same place (notwithstanding system dependencies).
  - Windows people are more likely to use docker for interactive dev, but there is still WSL and it works.
  - Usually the only time I use docker is when updating dependencies or upgrading.
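
For anyone wondering what the ad hoc `execute()` call in the list above looks like, here is a rough sketch. `CountRowsOperator` is a hypothetical stand-in that deliberately does not import Airflow, so the snippet runs in any interpreter; a real operator would subclass Airflow's `BaseOperator`, whose `execute` method takes a `context` argument, so in practice you pass a dict (often empty or minimal) when calling it by hand.

```python
class CountRowsOperator:
    """Hypothetical stand-in for a custom operator.

    A real one would subclass airflow's BaseOperator; this one only
    mimics the execute(context) shape so the example has no dependencies.
    """

    def __init__(self, task_id, rows):
        self.task_id = task_id
        self.rows = rows

    def execute(self, context):
        # The scheduler normally builds the context; ad hoc we supply our own.
        return {
            "task_id": self.task_id,
            "row_count": len(self.rows),
            "ds": context.get("ds"),
        }


# Instantiate the task outside any DAG run and call execute() directly.
my_task = CountRowsOperator(task_id="count_rows", rows=[1, 2, 3])
result = my_task.execute(context={"ds": "2020-01-01"})
print(result)
```

No scheduler, webserver, or metadata database is involved here, which is why this is quick for checking operator logic. Operators that read connections or XComs would still need those available.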
One other small thing that was helpful to me was realizing that there is nothing special about the bind mounts that work out of the box with the astro CLI (namely include, plugins, and dags). You can structure your repo any way you want, you can use a custom compose file, and you can use whatever bind mounts you want. When deployed, nothing is bind mounted at all; everything is baked into the image, and as long as the env vars are set up right you should be ok.
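
The `dev` wrapper mentioned above is not shown in this thread, so the following is only a guess at its shape, written in Python (a shell one-liner would do just as well). The compose file path `docker/docker-compose.yml` and the function names are assumptions for illustration; the only real CLI syntax relied on is `docker-compose -f <file> <args>`.

```python
import subprocess
import sys

# Assumed location of the custom compose file; adjust to your repo layout.
COMPOSE_FILE = "docker/docker-compose.yml"


def build_command(args):
    """Return the docker-compose command line for the given arguments."""
    return ["docker-compose", "-f", COMPOSE_FILE, *args]


def main(argv=None):
    """Forward the arguments to docker-compose and return its exit code."""
    args = sys.argv[1:] if argv is None else list(argv)
    return subprocess.call(build_command(args))
```

Saved as an executable script named `dev` that ends with `sys.exit(main())`, this would make `dev up` equivalent to `docker-compose -f docker/docker-compose.yml up`.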
Wow, thank you!
Is it possible to share your scripts?
Seems like ease of development (and testing) is a major shortcoming with the cli.
Jumping in here to add some thoughts:
- The CLI is a great tool when you don’t want to deal with Python dependencies and are comfortable with the level of Docker that we expose out.
- Testing is definitely something that’s on our radar, but we haven’t really figured out the best way to expose that sort of functionality through the CLI (would love to hear suggestions!).
- You can also use a docker-compose override for the CLI.
- This guide can help you get a Docker + PyCharm environment set up.
Yeah nothing against CLI
Re scripts… here is a more recent version of the PyCharm docker compose file I use:

    version: '3.4'
    services:
      python_console:
        image: <your image>
        env_file:
          - ../.env
        environment:
          - AIRFLOW__CORE__SQL_ALCHEMY_POOL_ENABLED=False
          - AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION=True
          - AIRFLOW__CORE__LOAD_EXAMPLES=False
          - AIRFLOW__CORE__ENABLE_XCOM_PICKLING=False
          - AIRFLOW__SCHEDULER__CATCHUP_BY_DEFAULT=False
          - PYTHONPATH=/usr/local/airflow
        volumes:
          - ../dags:/usr/local/airflow/dags
          - ../plugins:/usr/local/airflow/plugins
          - ../include:/usr/local/airflow/include
        command: sleep infinity

The only real difference here is that the command is `sleep infinity`. Apparently the command doesn't really matter when using it with the PyCharm console. I'd have just updated the original post but I don't seem to be able to.
Anyway, this will work as a pycharm interpreter. For interactive development often you just want a REPL – you don’t need to be running webserver or scheduler or postgres. So this compose file works for that purpose.
The advantage of using compose rather than simply referencing the image is that you can use an env file and bind mounts without having to configure them in PyCharm (and an env file is a convenient way to reference secrets or any other configuration you don't want in source control).
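
One way to confirm from the console that the env file and `environment` block actually reached the interpreter is to list the Airflow overrides that are set. This is a small sketch, not part of the setup itself; the helper name `airflow_env_overrides` is made up, and it relies only on Airflow's `AIRFLOW__{SECTION}__{KEY}` naming convention for environment variables.

```python
import os


def airflow_env_overrides(environ=None):
    """Return {(section, key): value} for AIRFLOW__SECTION__KEY variables."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for name, value in environ.items():
        # Split into at most three parts: "AIRFLOW", section, key.
        parts = name.split("__", 2)
        if len(parts) == 3 and parts[0] == "AIRFLOW":
            overrides[(parts[1].lower(), parts[2].lower())] = value
    return overrides


# In the PyCharm console, call it with no arguments to inspect os.environ.
sample = {
    "AIRFLOW__CORE__LOAD_EXAMPLES": "False",
    "PYTHONPATH": "/usr/local/airflow",
}
print(airflow_env_overrides(sample))
```

If a variable you expect is missing from the result, the compose file PyCharm is pointed at is probably not the one you think it is, or the `.env` path is relative to a different directory.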
One note for PyCharm dev: when terminating a session, make sure you click the red square rather than only closing the console. If you just close the panel you'll be left with zombie containers that you'll need to kill manually (I think it's a PyCharm bug).