Running Astronomer in a monorepo

My company will only adopt Astronomer if it is compatible with a monorepo organization structure. Our monorepo will include an Astronomer project, our website, some AWS Lambdas, and a Python library of shared code that all services will install.

Here is a minimal version of the directory structure:
.
├── README.md
├── astronomer
│   ├── Dockerfile
│   ├── dags
│   ├── include
│   └── plugins
├── shared
│   ├── lib
│   └── setup.py
└── www
    ├── Dockerfile
    ├── setup.py
    └── website

We need to be able to copy the shared directory into astronomer during the image build process, which requires that the build context be the root of the monorepo instead of ./astronomer. The standard docker CLI lets you change the build context, which would work for making releases in CI/CD. The astro CLI, however, seems less flexible about that for local development.

Does anybody have experience managing shared code in a monorepo with Astronomer? If it’s a feature people want, I’d be willing to open a PR that adds some kind of build context flag to the astro CLI, so that it’s possible to copy in files from outside the root of the astronomer project.

Thanks!

@gusostow If you are able to copy the shared directory into the build context before docker building in your pipeline, you could possibly add a symlink in the astronomer/include directory that links back to the shared directory. From what I can tell, git will track it, docker build won’t follow it, but python will follow it. If that all works, you should be able to work locally (with a team) and deploy to Astronomer using the same import paths to your shared code. When you are running locally, the project directory is mounted into the container at runtime, so the airflow python process should follow the symlink.

For reference, here’s the part of the CLI where we mount the local project directories for astro airflow start - https://github.com/astronomer/astro-cli/blob/master/airflow/include/composeyml.go#L54-L57. If you could manage copying into and symlinking into the include directory, it might work.

I haven’t tried this, but it may be at least a starting point.

@schnie I’m having trouble getting the symlink to work in the mounted volume. Within the running container I can see that it targets the correct directory in my host filesystem, which doesn’t exist in the container filesystem.

I always thought that the consensus was that symlinks aren’t compatible with docker. I’d love to just be doing something wrong though.

Other options:

  1. Have a bash script that copies the shared code into the ./astronomer directory every time a developer wants to pick up changes, which is manual and error-prone.
  2. Move the astronomer service into the root of the monorepo, which is inelegant.
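
For what it's worth, option 1 could look roughly like the sketch below. It uses the directory names from the example layout; the function name `sync_shared` and the `astronomer/include/shared` destination are assumptions made for illustration, and the last few lines only exercise the function on a throwaway tree.

```shell
#!/bin/sh
set -eu

# Copy shared/ into the Astronomer project so that it sits inside the
# Docker build context. sync_shared and the include/shared destination
# are hypothetical names, not something the astro CLI provides.
sync_shared() {
  sync_root="$1"
  sync_src="$sync_root/shared"
  sync_dest="$sync_root/astronomer/include/shared"

  if [ ! -d "$sync_src" ]; then
    echo "shared directory not found: $sync_src" >&2
    return 1
  fi

  # Remove the previous copy first so files deleted from shared/ do not linger.
  rm -rf "$sync_dest"
  mkdir -p "$(dirname "$sync_dest")"
  cp -R "$sync_src" "$sync_dest"
}

# Demo on a temporary tree that mirrors the example layout.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/shared/lib" "$demo_root/astronomer/include"
echo "VALUE = 1" > "$demo_root/shared/lib/util.py"
sync_shared "$demo_root"
ls "$demo_root/astronomer/include/shared/lib"
```

In a real repo the call would be something like `sync_shared "$(git rev-parse --show-toplevel)"`, and the copied directory would probably belong in .gitignore so the duplicate is never committed.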

Agree on your other options there.

Yea, docker shouldn’t work with symlinks when adding files in the build context, but I was thinking since the locally running instance (astro airflow start) mounts volumes to the directories in that previous link that python would start up and load the paths just fine, but maybe not.

Can you share where your symlink is located and where it is pointing?

@schnie For the example directory structure above, my symlink command was ln -s $PWD/shared $PWD/astronomer/include/shared. That worked great on my host machine for bringing the shared files in, but it didn’t work within the docker container filesystem.

Inside the container you cannot access the shared folder:

bash: cd: shared: No such file or directory

That seems to be because the symlink still points to the path in the host filesystem:

bash-4.4$ pwd
/usr/local/airflow/include
bash-4.4$ ls -l
lrwxr-xr-x    1 astro  astro  48 Jun 28 13:41 shared -> /Users/augustusostow/dev/example-monorepo/shared
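
The behavior above can be reproduced without Docker. A symlink stores its target as a literal path string, so an absolute link only resolves where that exact path exists. The sketch below is a simulation, not the real bind mount: it copies include/ (keeping the link as a link) to a second temporary directory standing in for the container, then removes the original tree to stand in for the host path being absent. All directory names are temporary and hypothetical.

```shell
#!/bin/sh
set -eu

# "Host" side: an absolute symlink, created the same way as in the thread.
host_parent="$(mktemp -d)"
host="$host_parent/example-monorepo"
mkdir -p "$host/shared/lib" "$host/astronomer/include"
ln -s "$host/shared" "$host/astronomer/include/shared"

if [ -d "$host/astronomer/include/shared/lib" ]; then
  echo "host: link resolves"
fi

# readlink prints the stored target: the absolute host path, verbatim.
readlink "$host/astronomer/include/shared"

# "Container" side (simulated): only include/ is carried over, with the
# symlink preserved as a symlink (-P), and the host path no longer exists.
container="$(mktemp -d)"
cp -RP "$host/astronomer/include" "$container/include"
rm -rf "$host"

if [ -L "$container/include/shared" ] && [ ! -e "$container/include/shared" ]; then
  echo "container: link is present but dangling"
fi
```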

So that doesn’t seem to work. However, I think I did manage to hack together a solution, which Ry and Pete thought I should share here in the forum to get some feedback.

My hack

I pulled the local development docker functionality out of the CLI.

As I already mentioned, Docker has the ability to separate the build context from where the Dockerfile is located, which allows you to copy files from directories “behind” it.

There were two problems:

  1. The docker commands are tightly controlled by the astro CLI, which enforces that execution is in the root of the astronomer project directory. I solved this by pulling out a modified version of the compose file from your CLI source code, then starting local development through the standard docker-compose CLI.

  2. It turns out paths in a Dockerfile are relative to the build context, not to the Dockerfile itself. So if you change the build context, it breaks the path references in the onbuild Dockerfile. I solved this by ditching the onbuild image altogether, since that was where the broken paths lived.
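
As a rough illustration of the second point, the sketch below writes a Dockerfile whose COPY sources are relative to the monorepo root. It is not the actual Dockerfile from this setup: the base image is left as a `BASE_IMAGE` build argument because the thread does not name it, the `example-airflow` tag is made up, and the `pip install` line assumes pip is available in whatever base image is used. The /usr/local/airflow destination comes from the container session shown earlier.

```shell
#!/bin/sh
set -eu

# Throwaway tree mirroring the example layout.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/astronomer" "$demo_root/shared"

# Hypothetical astronomer/Dockerfile written for a build context at the
# monorepo root, so shared/ can be copied in alongside the Airflow files.
cat > "$demo_root/astronomer/Dockerfile" <<'EOF'
# Placeholder: pass the non-onbuild base image at build time.
ARG BASE_IMAGE
FROM ${BASE_IMAGE}

# Every source path is relative to the build context (the monorepo root),
# not to the directory that holds this Dockerfile.
COPY shared /usr/local/airflow/shared
COPY astronomer/dags /usr/local/airflow/dags
COPY astronomer/plugins /usr/local/airflow/plugins
COPY astronomer/include /usr/local/airflow/include

# Assumption: pip exists in the base image.
RUN pip install /usr/local/airflow/shared
EOF

# From the monorepo root, the build would then be run as (not executed here):
#   docker build -f astronomer/Dockerfile \
#     --build-arg BASE_IMAGE=<your base image> -t example-airflow .
cat "$demo_root/astronomer/Dockerfile"
```

With docker-compose, the equivalent is to set the service's build `context` to the monorepo root and its `dockerfile` to astronomer/Dockerfile.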

This strategy seems to work for me, as long as nobody sees any long-term problems. Also, if there’s interest, I’d like to contribute features to the astro CLI that support monorepo development and code sharing. Let me know.

Thanks for the time!