Best practice: best place to define variables and functions

Hi,
I usually set/define my variables, connections and functions before I instantiate the tasks (outside "with DAG"). But in some DAGs I notice that it is sometimes the other way around. So what is the best practice between:

myvar = Variable.get("var1")
with DAG(...) as dag:
    task1

vs

with DAG(...) as dag:
    myvar = Variable.get("var1")
    task1

Hi @gto,

For using Variables in your DAGs it is best practice to either use Jinja templating or to confine the Variable.get() call to inside an Operator's execute() method. This is related to reducing, and ideally avoiding, top-level code in your DAG file (i.e. your DAG file should act like a config file and solely define the DAG, tasks, and task dependencies, not contain function calls). More on best practices when using Variables here.

Accessing Variables outside of an Operator's execute() method without Jinja templating creates a session to query the metadata database each time Airflow parses the DAG file. This can slow down DAG parsing and put unnecessary load on the metadata database, causing a number of performance issues.
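For illustration, here is a minimal sketch of the templated approach (the DAG name, start_date, and task_id are just placeholders): because bash_command is a templated field, the {{ var.value.var1 }} reference is only rendered when the task runs, so parsing the file never touches the metadata database.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG("example_templated_variable", start_date=datetime(2022, 1, 1), schedule_interval=None) as dag:
    # Rendered at task run time, not at DAG parse time
    print_var = BashOperator(
        task_id="print_var",
        bash_command="echo {{ var.value.var1 }}",
    )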

For the sake of example, let’s assume task1 is the Task that uses the var1 Variable and task1 is a PythonOperator that executes a function called foo(). Using the two options outlined above:

  1. Access var1 inside the task1 Operator's callable (a sketch of the corresponding DAG wiring follows after the examples)
from airflow.models import Variable


def foo():
    # Variable.get() runs only when the task executes foo(), not at parse time
    print(Variable.get("var1"))
  2. Use Jinja templating where op_kwargs is a templated field (see source)
# Function defined outside of the DAG file, in foo.py
def foo(bar):
    print(bar)

# In the DAG file
from foo import foo
...

with DAG(...) as dag:
    task1 = PythonOperator(
        task_id="task1",
        python_callable=foo,
        # op_kwargs is templated, so the Jinja expression is passed as a string
        op_kwargs=dict(bar="{{ var.value.var1 }}"),
    )
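
For completeness, here is a rough sketch of the DAG wiring for option 1, assuming foo() lives in foo.py just like in option 2 (the DAG name, start_date, and other arguments are placeholders):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from foo import foo  # foo() calls Variable.get("var1") inside the callable

with DAG("example_variable_in_callable", start_date=datetime(2022, 1, 1), schedule_interval=None) as dag:
    # The Variable is only read when task1 actually executes
    task1 = PythonOperator(
        task_id="task1",
        python_callable=foo,
    )

Either way, the metadata database is only queried when task1 runs. Note that in option 2 the Jinja expression must be passed as a string so the templating engine can render it before foo() receives the value.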

I hope that helps!


Thanks @josh-fell for taking the time to explain.
It's much more practical to use templating and leverage the task's execute() method.