Best Practice for interacting with AWS Services from a local Astro CLI install

I have a local install of Astronomer that I am using to test out a pipeline DAG. In a nutshell, the DAG kicks off a Glue Crawler >> on completion kicks off a Glue Job >> processes the loaded data on Aurora Postgres… etc.

All of this requires interacting with my AWS account and various IAM service roles. Is there a recommended best practice for setting up my AWS credentials and the various service role names, and for referencing them within my DAG (default_args?)?

For example:

run_cfl_crawler = AwsGlueCrawlerOperator(
    task_id="run_cfl_crawler",
    crawler_name="name of crawler",
    iam_role_name="GlueServiceRole",
    poll_interval=60,
    priority_weight=3,
)

The above fails when run locally. The Astronomer Registry documentation asks me for a config dictionary. Do I load my credentials/connection there? And would I do that for every crawler and job?

Also, on setting up the connection to AWS itself: is there a way to instantiate it once for the DAG? I was thinking the connection could be referenced in the default_args passed to the DAG definition, while the config would carry the job, crawler, and endpoint references, roughly like the sketch below.
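Something along these lines is what I have in mind (just a sketch to illustrate the question; "aws_default", the dag_id, and the config values are placeholders I made up):

from datetime import datetime

from airflow import DAG
# Import path/class name may differ depending on the amazon provider version
from airflow.providers.amazon.aws.operators.glue_crawler import AwsGlueCrawlerOperator

# Idea: declare the connection once in default_args so every AWS operator picks it up
default_args = {"aws_conn_id": "aws_default"}

with DAG(
    dag_id="cfl_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    default_args=default_args,
) as dag:
    run_cfl_crawler = AwsGlueCrawlerOperator(
        task_id="run_cfl_crawler",
        config={"Name": "name of crawler", "Role": "GlueServiceRole"},  # crawler/job references go here
        poll_interval=60,
    )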

Apologies for the Astro/Airflow noob questions if this has been answered elsewhere in this forum. If so, links to prior threads would be appreciated. Thanks in advance.

Sandeep

Use an AWS connection with the role name defined in the connection string instead.

When you run it locally, you can set your personal IAM user credentials in the Dockerfile and a .env file. Then grant your IAM user access to assume the role defined in the connection string.

Let me know if you need more specifics.

Here are some more specifics.

  1. Use the aws_conn_id (or similarly named) parameter in your operator.
  2. Create an AWS connection in Airflow and set its extra field to something like {"role_arn": "arn:aws:iam::123456789:role/..."} (see the sketch after this list).
  3. Add the following to your Dockerfile:
ENV AWS_ACCESS_KEY_ID $AWS_ACCESS_KEY_ID
ENV AWS_SECRET_ACCESS_KEY $AWS_SECRET_ACCESS_KEY
  4. Create a .env file in the same directory with your AWS credentials (make sure to add this file to your .gitignore so you don't commit it!):
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_DEFAULT_REGION=us-east-1
  5. Grant your IAM user access to assume the role you referenced in step 2.
  6. Grant the IAM role that your Astronomer worker nodes run under access to assume the same role.
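Putting steps 1 and 2 together might look roughly like this (a sketch only; the connection id, crawler config values, and role ARN are placeholders, and the operator import path can vary by amazon provider version):

from airflow.providers.amazon.aws.operators.glue_crawler import AwsGlueCrawlerOperator

# "aws_default" is an Airflow connection of type Amazon Web Services whose
# extra field contains {"role_arn": "arn:aws:iam::123456789:role/..."}.
# The hook assumes that role using whatever base credentials it finds:
# your IAM user locally, the worker node role in your deployment.
run_cfl_crawler = AwsGlueCrawlerOperator(
    task_id="run_cfl_crawler",
    aws_conn_id="aws_default",
    config={
        "Name": "name of crawler",           # crawler to create/update and run
        "Role": "GlueServiceRole",           # role the crawler itself runs as
        "DatabaseName": "my_glue_database",  # placeholder
        "Targets": {"S3Targets": [{"Path": "s3://my-bucket/raw/"}]},  # placeholder
    },
    poll_interval=60,
    priority_weight=3,
)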

Now when you run astro dev start, you should be able to run the DAG locally with the same level of access that Airflow will have in your test/prod environment.
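If you want to sanity-check the assume-role chain before running the DAG, a boto3 call along these lines (run with the same credentials you put in your .env; the role ARN is a placeholder) should succeed:

import boto3

# Verify that the local IAM user can assume the role from the connection extras
sts = boto3.client("sts")
response = sts.assume_role(
    RoleArn="arn:aws:iam::123456789:role/...",  # same role_arn as in the Airflow connection
    RoleSessionName="astro-local-test",
)
print(response["AssumedRoleUser"]["Arn"])  # prints the assumed-role ARN if it worked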


Thanks so much, Matt. Works like a charm. No luck with the JDBC operator, though. That one requires a JVM to be installed and a path to the JVM declared. The only issue is I'm not sure where I should be installing the JVM. On the webserver? And how do I reference the path to it?

Sandeep