Suppose one has multiple ETL DAGs that must execute one after another, each starting when the previous one completes. What is the best way to define a main-DAG/child-DAG pattern in which the main DAG triggers each child DAG and waits for it to complete?
- MAIN DAG
  - limits visual complexity in the graph view
  - triggers the child DAGs
  - handles branching logic
  - is scheduled

        start >> child_dag_0 >> child_dag_1

- CHILD DAGs
  - contain the ETL tasks
  - are not scheduled
Option 1: SubDAGs

SubDAGs have had many issues in the past with worker slots and deadlocks, so this may not be the best option.
Option 2: TaskGroups

Move each child DAG's tasks into a TaskGroup in the MAIN DAG. This is possible, but it makes ad hoc, one-off runs of a single group harder, since a TaskGroup cannot be triggered on its own.
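A minimal sketch of the TaskGroup approach, assuming hypothetical `extract`/`load`/`transform` placeholder tasks standing in for the real ETL work:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.utils.task_group import TaskGroup

with DAG(
    dag_id="main_etl",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    start = DummyOperator(task_id="start")

    # Each group holds the tasks that previously lived in a child DAG.
    with TaskGroup(group_id="child_dag_0") as child_dag_0:
        extract = DummyOperator(task_id="extract")
        load = DummyOperator(task_id="load")
        extract >> load

    with TaskGroup(group_id="child_dag_1") as child_dag_1:
        transform = DummyOperator(task_id="transform")

    # Same dependency chain as the main-DAG sketch above.
    start >> child_dag_0 >> child_dag_1
```

The groups collapse in the graph view, which keeps the visual complexity down, but everything still runs as one DAG run.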
Option 3: TriggerDagRunOperator

This lets the MAIN DAG trigger the other DAGs as needed and can wait for their completion status. However, the wait does not support `reschedule` mode and will occupy a worker slot for the entire wait, since the source code shows that the operator calls `time.sleep` in a loop (see the TriggerDagRunOperator source on GitHub).
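A sketch of the TriggerDagRunOperator approach. `wait_for_completion` and `poke_interval` are parameters of the operator in Airflow 2.x; as noted above, the wait blocks the worker slot via `time.sleep`:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

with DAG(
    dag_id="main_dag",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    trigger_child_0 = TriggerDagRunOperator(
        task_id="trigger_child_dag_0",
        trigger_dag_id="child_dag_0",
        wait_for_completion=True,  # blocks via time.sleep, holds a worker slot
        poke_interval=30,
    )
    trigger_child_1 = TriggerDagRunOperator(
        task_id="trigger_child_dag_1",
        trigger_dag_id="child_dag_1",
        wait_for_completion=True,
        poke_interval=30,
    )

    # child_dag_1 is only triggered once child_dag_0's run has finished.
    trigger_child_0 >> trigger_child_1
```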
Option 4: Stable REST API + HttpHook + HttpSensor

With the new stable REST API in Airflow 2.0, it is possible to call the `post_dag_run` endpoint to trigger a DAG run and then poll the `get_dag_run` endpoint with an `HttpSensor` to check its status.
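A sketch of the REST API approach, assuming a hypothetical HTTP connection named `airflow_api` that points at the webserver with suitable credentials. Unlike Option 3, the sensor can use `reschedule` mode, so no worker slot is held between pokes:

```python
import json
from datetime import datetime

from airflow import DAG
from airflow.providers.http.operators.http import SimpleHttpOperator
from airflow.providers.http.sensors.http import HttpSensor

with DAG(
    dag_id="main_dag_api",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # POST /api/v1/dags/{dag_id}/dagRuns triggers the child DAG;
    # the returned dag_run_id is pushed to XCom for the sensor below.
    trigger = SimpleHttpOperator(
        task_id="trigger_child_dag_0",
        http_conn_id="airflow_api",
        endpoint="api/v1/dags/child_dag_0/dagRuns",
        method="POST",
        headers={"Content-Type": "application/json"},
        data=json.dumps({"conf": {}}),
        response_filter=lambda response: response.json()["dag_run_id"],
        do_xcom_push=True,
    )

    # GET /api/v1/dags/{dag_id}/dagRuns/{dag_run_id} is polled until
    # the run reaches the "success" state.
    wait = HttpSensor(
        task_id="wait_child_dag_0",
        http_conn_id="airflow_api",
        endpoint=(
            "api/v1/dags/child_dag_0/dagRuns/"
            "{{ ti.xcom_pull(task_ids='trigger_child_dag_0') }}"
        ),
        response_check=lambda response: response.json()["state"] == "success",
        mode="reschedule",  # frees the worker slot between pokes
        poke_interval=60,
    )

    trigger >> wait
```

The trade-off is extra moving parts: an HTTP connection, API authentication, and a failed child run surfacing only as a sensor timeout unless `response_check` also inspects for the `failed` state.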
Are there any other patterns or suggestions for accomplishing this kind of scheduling?