Hi all,
I am using Airflow, and the tasks in my DAG depend on data sources with different latencies.
I want to run the DAG over all the rows in one main dataset, with each task computing a value (from a different source dataset) and writing it to one column of the main dataset (say, task 4 writes column 4). How can I efficiently track which rows have been completely processed, and re-run the tasks for the rows whose source data was not yet available on the first pass?
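To make the setup concrete, here is a minimal sketch of the pattern I have in mind (all names are hypothetical, and I'm using pandas just for illustration): each task's output column starts as NaN, NaN doubles as the "not yet processed" marker, and each run a task only touches the rows still pending for its column, so late-arriving data is retried automatically on the next run.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: task N is responsible for filling column "col_N".
TASK_COLUMNS = ["col_1", "col_2", "col_3", "col_4"]

def make_main_dataset(n_rows):
    """Main dataset with one output column per task, initially unfilled."""
    df = pd.DataFrame({"row_id": range(n_rows)})
    for col in TASK_COLUMNS:
        df[col] = np.nan  # NaN marks "not yet computed"
    return df

def pending_rows(df, col):
    """Row ids a task still has to (re-)process for its column."""
    return df.loc[df[col].isna(), "row_id"].tolist()

def run_task(df, col, compute):
    """Fill a task's column for its pending rows only.

    `compute(row_id)` stands in for the per-task computation; it returns
    None when the upstream data for that row isn't available yet, so the
    row stays NaN and is picked up again on the next DAG run.
    """
    for rid in pending_rows(df, col):
        value = compute(rid)
        if value is not None:
            df.loc[df["row_id"] == rid, col] = value

def fully_processed(df):
    """Rows where every task has written its value."""
    return df[df[TASK_COLUMNS].notna().all(axis=1)]
```

With this shape, "which rows are done" is just a null-check across the task columns, and re-running a task is idempotent because it skips rows already filled.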