Best practices on dag re-runs (even successful ones)

Hello,

I’m trying to figure out the best way to set up my DAGs for situations where we need to re-run them (even if the runs were successful, since some of our data sources can have unreliable, changing data). I read through some existing threads on backfilling data (Triggering past execution date through the Airflow UI - #11 by Alan, Backfilling guidelines, What's the best way to re-run a task in Airflow?), but I’m still unsure what the best approach is for my use case.

Are my options the only ones listed below?:

  1. Run the Airflow CLI’s backfill command in my local dev environment (does the Astro CLI support Airflow CLI commands? If so, how do I run Airflow CLI commands using the Astro CLI?)
  2. Delete the task runs & DAG runs in the Airflow UI for the days I want to re-run the DAG for, then run the DAG with a specified execution date for those days
  3. Clear the task runs in the Airflow UI
  4. Set up a separate DAG for backfilling/re-running

These options all feel a bit manual and effort-intensive, with lots of redundant steps when re-running tasks over a large time frame. So I’m hoping there are alternatives I haven’t thought of.

Thanks in advance! 🙂

  1. Run the Airflow CLI’s backfill command on my local dev environment (Does Astro CLI support Airflow CLI commands? If so, how do I run Airflow CLI commands using Astro CLI?)

Yes, the Astro CLI can run Airflow commands on your LOCAL deployment. You can use the run command.

astro dev run <airflow command>

For example, to clear all the tasks in your DAG:

➜ astro dev run tasks clear <dag_id> --yes

Please refer to Airflow’s CLI documentation for the full list of available commands.
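For your backfill use case specifically, the same pattern applies to Airflow's backfill command. A sketch with placeholder dates and DAG ID (both assumptions; adjust to your schedule):

astro dev run dags backfill --start-date 2022-01-01 --end-date 2022-01-07 <dag_id>

Depending on your Airflow version, backfill also accepts a --reset-dagruns flag so that dates which already have successful DAG runs get re-run rather than skipped, which matches the "re-run even successful runs" scenario you described.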

  2. Delete the task runs & DAG runs in the Airflow UI for the days I want to re-run the DAG for, then run the DAG with a specified execution date for those days

This is a valid way to rerun the DAG. The caveats are that catchup needs to be set to True, and all of the deleted DAG runs must fall after the latest existing DAG run.

  3. Clear the task runs in the Airflow UI

This option sounds the most reasonable to me. It is the easiest to execute, and you do not have to alter the DAG. You only need to clear the tasks you want to rerun, which also lets you be more precise about exactly which tasks get rerun.
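If clicking through the UI for a large time frame gets tedious, the same clearing can be done from the CLI over a date range. A sketch with placeholder dates and DAG ID (assumptions):

astro dev run tasks clear --start-date 2022-01-01 --end-date 2022-01-07 <dag_id> --yes

Clearing a task instance puts it back into a schedulable state, so the scheduler re-runs it automatically; --yes skips the interactive confirmation prompt.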

  4. Set up a separate DAG for backfilling/re-running

Probably the most excessive route. You do not need another DAG just for rerunning tasks.


I think options 1 and 3 are equally easy to perform; it comes down to individual preference.


Thanks for the detailed answer Alan!

Just a follow-up question on clearing the task runs: can I use the Astro CLI to clear the task runs on my Astronomer-hosted Airflow instance, or is the CLI just for my local environment?

I saw Paola’s answer back in 2019 saying that cloud users can run Airflow commands only in the local environment, but I want to check if that’s still the case: Can I use the Airflow CLI on Astronomer?

Thanks again!

Unfortunately, the Astro CLI is not able to run Airflow commands on Astronomer Cloud deployments.

You will either need to upgrade to Airflow 2.0 and leverage the Stable REST API, or install a plugin that lets you perform those functions from the webserver. A few customers have used this plugin made by AnkurChoraywal.
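For the Stable REST API route, Airflow 2 exposes a clearTaskInstances endpoint that covers the "clear a date range" workflow over HTTP. A sketch using curl; the deployment URL, token, dates, and DAG ID are all placeholders (assumptions):

curl -X POST "https://<your-deployment>/api/v1/dags/<dag_id>/clearTaskInstances" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"start_date": "2022-01-01T00:00:00Z", "end_date": "2022-01-07T00:00:00Z", "dry_run": true, "only_failed": false}'

With "dry_run": true the API only reports which task instances would be cleared; set it to false to actually clear them. "only_failed": false matters for your use case, since you want to re-run successful runs as well.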

Oh nice, we’re already on Airflow 2.0 so we can use the Stable API then. Thanks for your help Alan!