Access external resources from local containers

Hi, everyone,

I’m hoping someone else has experienced this and has some thoughts on how to resolve it.

We are using the Astronomer CLI to develop DAGs locally, but the containers it starts don’t appear to have access to external resources such as our network file systems or databases. I am running on Windows 10 with Docker Desktop.

I would like to write some DAGs on my local workstation that can read/write files from some of our network shares and would love to be able to access our external database servers.

Thanks!

Hi Larry,

Not 100% sure on this, but I don’t think the containers can interact with your local file system, nor should they. Best practice is to set up connections to cloud-based databases and storage systems.

We currently use BigQuery and Cloud Storage as our database and file system, which lets me easily interact with the data and files I need.

I suppose you could set up some sort of secure tunneling into your private network. Once you deploy to a production instance you’d have to set something like that up anyway, but it probably isn’t the easiest route. Things get a lot easier if you move your processes to the cloud.

Thanks, @kaseyatkellyklee. Sorry if I wasn’t clear. I’m not looking for them to interact with my local file system (i.e. the file system of the Windows machine running Docker Desktop). I’m hoping to find a way to let them see things like our network storage, or connect to an external SQL Server database that lives on a test server, since those are things a full deployment of Airflow will be doing.

To add a bit more information: we’re in the process of setting up a fully on-prem deployment of Astronomer running on OpenShift in our data center. For the time being we are fully on-prem and can’t run or store anything in the cloud.

Hey @larry.komenda - does your laptop have access to the resources you’re trying to read/write to? You should be able to access anything networked that your host can. Maybe it’s just a matter of setting up the necessary connections in your local Airflow instance?

https://airflow.apache.org/docs/1.10.1/howto/manage-connections.html
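
As a rough sketch of how that could look once a connection is defined (the conn_id, host, and credentials below are placeholders, and this assumes pymssql is available in the image), you can reference the connection from a hook:

```python
# Hypothetical sketch: the connection can be created in the Airflow UI
# (Admin -> Connections) or exposed as an environment variable in the
# project's Dockerfile / .env, e.g.:
#   AIRFLOW_CONN_MSSQL_TEST=mssql://user:password@sql-test-server:1433/testdb
from airflow.hooks.mssql_hook import MsSqlHook

def check_mssql():
    # conn_id "mssql_test" matches AIRFLOW_CONN_MSSQL_TEST above
    hook = MsSqlHook(mssql_conn_id="mssql_test")
    # trivial query just to prove the container can reach the server
    print(hook.get_first("SELECT 1"))
```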

Hi, @pete. Yes, the computer hosting the CLI instance has access to those resources I mentioned. I had tried setting up the connections, but I’ll give it another shot.

I know the containers are based on Alpine Linux. As a first step, any tips for confirming that the running container actually has access to those resources? I can get into the container with docker exec, but I ran into permission issues trying to ping.
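
In the meantime, something like this is what I have in mind as a basic check from inside the container (the hosts, ports, and script path below are placeholders), since a plain TCP connect shouldn’t need the raw-socket permissions that ping does:

```python
# Rough connectivity check, run inside the scheduler container, e.g.:
#   docker exec -it <container> python /usr/local/airflow/dags/net_check.py
# (path and hostnames are placeholders)
import socket

TARGETS = [
    ("sql-test-server", 1433),   # placeholder SQL Server host/port
    ("fileserver.local", 445),   # placeholder SMB share host/port
]

for host, port in TARGETS:
    try:
        with socket.create_connection((host, port), timeout=5):
            print(f"OK: reached {host}:{port}")
    except OSError as err:
        print(f"FAILED: {host}:{port} -> {err}")
```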

@pete I figured it out.

The container was not connected to the default bridge network, so I had to run “docker network connect bridge 1acbe962a5f7” to force it. Once I did that and ran “docker network inspect bridge”, I could see the scheduler container listed among the containers connected to “bridge”.
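
For anyone who prefers to script it, the same fix should also be doable through the Docker SDK for Python. This is just a sketch of the equivalent calls, not what I actually ran (I used the plain CLI), and the container ID is the one from my setup:

```python
# Hypothetical equivalent of "docker network connect bridge <container>" using
# the Docker SDK for Python (pip install docker); run from the host, not the
# container.
import docker

client = docker.from_env()
bridge = client.networks.get("bridge")
bridge.connect("1acbe962a5f7")   # same as: docker network connect bridge 1acbe962a5f7
bridge.reload()                  # refresh attrs, like: docker network inspect bridge
print([c["Name"] for c in bridge.attrs["Containers"].values()])
```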

I just ran a test with an mssql_operator and was able to insert records into a test database.
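
For anyone else hitting this, the test was roughly along these lines (the conn_id, database, and table names here are placeholders rather than my exact code, and it assumes a matching connection exists in Admin -> Connections):

```python
# Minimal sketch of the kind of connectivity test DAG I ran.
from datetime import datetime

from airflow import DAG
from airflow.operators.mssql_operator import MsSqlOperator

with DAG(
    dag_id="mssql_connectivity_test",
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    insert_row = MsSqlOperator(
        task_id="insert_test_row",
        mssql_conn_id="mssql_test",
        database="testdb",
        sql="INSERT INTO connectivity_test (note) VALUES ('hello from local Astronomer');",
    )
```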


Nice! Glad you figured it out. Thanks for posting here; hopefully it’ll help others who run into similar issues. :)
