What is the `include` directory that `astro airflow init` creates?

The include directory is really just a "catch-all" for anything else you need to bring into the project (e.g. jar files, SSL certs, etc.).

They’ll get bundled up with your project and you can reference them in your DAGs, much like a resources directory. There shouldn’t be any restrictions on sub-directories within the include directory.

SQL script files, for example, can go either in the include directory or directly in your dags folder. It only matters when you reference them, so how you organize those files comes down to personal preference more than anything else.
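
For instance, an SSL cert placed in include can be read from a task by its path inside the deployed image (a minimal sketch; the file name my_cert.pem and the /usr/local/airflow project root are assumptions based on the Astronomer image layout shown later in this thread):

import os

# In the deployed image the project root is /usr/local/airflow, so a
# file bundled at include/my_cert.pem (hypothetical) ends up here:
cert_path = os.path.join('/usr/local/airflow', 'include', 'my_cert.pem')

with open(cert_path) as f:
    cert = f.read()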

This is slightly misleading when the files are templated (rendered) by Jinja. The files in question are those whose extension matches one of the operator's template_ext entries and that are referenced in one of the operator's template_fields.
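
As a quick illustration, a custom operator might declare these attributes like so (a minimal sketch, not from the thread; the operator name is made up):

from airflow.models import BaseOperator

class SqlFileOperator(BaseOperator):
    # Jinja renders the values of these fields before execute() runs.
    template_fields = ('sql',)
    # If a templated field's value ends in one of these extensions,
    # Airflow treats it as a file path and renders the file's contents.
    template_ext = ('.sql',)

    def __init__(self, sql, **kwargs):
        super().__init__(**kwargs)
        self.sql = sql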

The Jinja environment, or more specifically its FileSystemLoader, is used to render the file. Depending on where the file lives, you need to include its absolute path in the DAG's template_searchpath parameter, since that is where Jinja will look.

import os
from datetime import datetime

from airflow.models import DAG
# On Airflow 2.x the provider package supplies this operator;
# on Airflow 1.10 use: from airflow.operators.postgres_operator import PostgresOperator
from airflow.providers.postgres.operators.postgres import PostgresOperator

# Resolve the sql/ directory relative to this DAG file so the search
# path works no matter where the dags folder itself lives.
base_dir = os.path.dirname(os.path.realpath(__file__))
sql_dir = os.path.join(base_dir, 'sql')

dag = DAG(
    dag_id='dag_with_templated_dir',
    start_date=datetime(2020, 1, 1),
    template_searchpath=[sql_dir, '/usr/local/airflow/include'],
)

a = PostgresOperator(
    task_id="sql_dir_script",
    sql="my_script.sql",  # resolved against template_searchpath, then rendered
    dag=dag,
)

As shown in the example above, the DAG can render template files located in both /usr/local/airflow/dags/my_dag/sql and /usr/local/airflow/include:

/usr/local/airflow
├── dags
│   └── my_dag
│       ├── dag.py
│       └── sql
│           └── my_script.sql
└── include
    └── include.sql

I would advise against hard-coding absolute paths in the template_searchpath parameter, because that is not always the directory structure of your Airflow installation. Your dags folder could live somewhere other than your Airflow home directory, for example.

However, by programmatically getting the path of the DAG file, you can safely determine the location to use for template_searchpath, provided the files you want to template live relative to that location. It also makes sense, to an extent, to keep the SQL scripts a DAG uses in the same root directory as the DAG file, as shown above.

This is one way to set things up that leverages how Jinja operates without imposing a particular structure on the Airflow installation. You could, of course, keep all your template files in one directory anywhere on the host machine, but in my opinion that only makes the project less portable.

Hey @Alan,

Do you have an example of how a DAG author might import python files located in the include dir into a DAG in the dags dir?

Given you have something set up like this.

/usr/local/airflow
├── dags
│   └── my_dag
│       ├── dag.py
│       └── sql
│           └── my_script.sql
└── include
    └── test.py

You would import with this statement.

from include.test import printer

test.py looks like this.

def printer():
    print("wow")
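
For completeness, here is a sketch of how the imported function might be used in dag.py (assuming the project root /usr/local/airflow is on the Python path, as it is in the Astronomer images; the dag_id and task_id are made up):

from datetime import datetime

from airflow.models import DAG
# On Airflow 1.10 use: from airflow.operators.python_operator import PythonOperator
from airflow.operators.python import PythonOperator

from include.test import printer

dag = DAG(dag_id='uses_include', start_date=datetime(2020, 1, 1))

task = PythonOperator(
    task_id='print_wow',
    python_callable=printer,  # defined in include/test.py
    dag=dag,
)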