I have a local install of Astronomer that I am using to test a pipeline DAG. In a nutshell, the DAG kicks off a Glue Crawler >> on completion kicks off a Glue Job >> processes the loaded data on Aurora Postgres, and so on.
All of this requires interaction with an AWS account and various IAM service roles. Is there a recommended best practice for setting up my AWS credentials and the various service role names, and for referencing them within my DAG (default args?)
For example:
run_cfl_crawler = AwsGlueCrawlerOperator(
    task_id="run_cfl_crawler",
    crawler_name="name of crawler",
    iam_role_name="GlueServiceRole",
    poll_interval=60,
    priority_weight=3,
)
The above fails when run locally. The Astronomer Registry documentation asks for a config dictionary. Do I load my credentials/connection there? And would I have to do that for every crawler and job?
Also, regarding setting up the connection to AWS itself: is there any way to instantiate it once for the whole DAG? I was thinking the connection could be referenced in the default args passed to the DAG definition, while the config would contain the job, crawler and endpoint references.
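To make the idea concrete, here is a rough sketch of the layout I have in mind. It is only an illustration, not something I know to be the recommended approach: the owner value and the `crawler_config` helper are hypothetical, and I am assuming the AWS credentials live in an Airflow connection (here the provider's default ID, `aws_default`) that is created separately rather than in the DAG file.

```python
# Hypothetical sketch: shared AWS settings defined once, reused by every task.
# All names and values are placeholders, not taken from a working setup.

# ID of the Airflow connection assumed to hold the AWS credentials.
# "aws_default" is the default connection ID used by the Amazon provider.
AWS_CONN_ID = "aws_default"

# Arguments applied to every task in the DAG via default_args.
default_args = {
    "owner": "sandeep",          # placeholder owner
    "aws_conn_id": AWS_CONN_ID,  # picked up by operators that accept aws_conn_id
    "priority_weight": 3,
}


def crawler_config(name, role="GlueServiceRole"):
    """Build a per-crawler config dictionary (keys follow Glue API naming)."""
    return {"Name": name, "Role": role}


cfl_crawler_config = crawler_config("name of crawler")

# Intended usage inside the DAG file (not executed here; it needs Airflow
# and the Amazon provider installed):
#
#   with DAG("pipeline", default_args=default_args, ...) as dag:
#       run_cfl_crawler = AwsGlueCrawlerOperator(
#           task_id="run_cfl_crawler",
#           config=cfl_crawler_config,
#           poll_interval=60,
#       )
```

The point of this split would be that credentials stay out of the DAG code entirely, and each crawler or job only carries its own config dictionary.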
Apologies for the Astro/Airflow noob questions if this has already been answered elsewhere in this forum. If so, links to prior threads would be appreciated. Thanks in advance.
Sandeep