I am using airflow with celery executor. As an example, I have a dag with 3 tasks.
task1 >> task2 >> task3
With one worker node, the tasks execute fine and in the right sequence.
However, when I increase the worker nodes to 2, task1 is picked up by worker node 1 and task2 by worker node 2. In my use case, task2 must execute only after task1. I believed the Celery executor would handle this dependency, but my DAG fails because of it.
Can you please help me understand what's wrong and how to fix this issue?
If you want your tasks to be executed by the same Celery worker, you will need to set up queues and assign the tasks to them.
Take a look at the Airflow documentation for Celery Queues.
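As a minimal sketch of that approach: every operator accepts a `queue` argument, so you can pin all three tasks to one queue and start only one worker listening on it. The queue name `single_worker_queue` and the DAG/task details here are illustrative assumptions, not from your setup.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical queue name; only one worker should subscribe to it so that
# all three tasks land on the same host.
PINNED_QUEUE = "single_worker_queue"

with DAG(
    dag_id="pinned_queue_example",
    start_date=datetime(2023, 1, 1),
    schedule=None,  # use schedule_interval=None on Airflow < 2.4
    catchup=False,
) as dag:
    task1 = BashOperator(task_id="task1", bash_command="echo 1", queue=PINNED_QUEUE)
    task2 = BashOperator(task_id="task2", bash_command="echo 2", queue=PINNED_QUEUE)
    task3 = BashOperator(task_id="task3", bash_command="echo 3", queue=PINNED_QUEUE)

    task1 >> task2 >> task3
```

Then start exactly one worker subscribed to that queue, e.g. `airflow celery worker --queues single_worker_queue`, while other workers listen on the default queue.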
That said, I would also consider writing your tasks so they do not rely on running on the same host as the previous task. If the previous task saves a file on the worker and the next task needs to read it, my suggestion is to either:
- combine those tasks, or
- upload the file to shared storage at the end of task 1 and download it at the beginning of task 2.
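The second option can be sketched in plain Python. Here a local directory stands in for shared object storage (S3, GCS, NFS, etc.), and each task gets its own working directory to mimic separate worker hosts; in a real DAG these functions would be the callables of two `PythonOperator` tasks. All names here are illustrative.

```python
import shutil
import tempfile
from pathlib import Path

# Stand-in for a shared object store that every worker can reach.
SHARED_STORE = Path(tempfile.mkdtemp())


def task1(local_workdir: Path) -> None:
    # Produce a result on this worker's local disk...
    result = local_workdir / "result.txt"
    result.write_text("output of task1")
    # ...then upload it at the end so any worker can fetch it later.
    shutil.copy(result, SHARED_STORE / "result.txt")


def task2(local_workdir: Path) -> str:
    # Download at the start, regardless of which worker runs this task.
    local_copy = local_workdir / "result.txt"
    shutil.copy(SHARED_STORE / "result.txt", local_copy)
    return local_copy.read_text()


# Simulate the two tasks running on different hosts (separate workdirs).
with tempfile.TemporaryDirectory() as w1, tempfile.TemporaryDirectory() as w2:
    task1(Path(w1))
    print(task2(Path(w2)))  # prints: output of task1
```

With this pattern the scheduler still enforces `task1 >> task2`, and it no longer matters which Celery worker each task lands on.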