No matter what I do, after a few minutes nothing works anymore. At first everything starts as usual, but then no more tasks start.
I have already re-deployed via the CLI and changed the value of AIRFLOW__SCHEDULER__RUN_DURATION. Unfortunately, nothing helps.
All tasks are stuck.
Hey, unfortunately, it looks like the problem is back.
For a while, things were good in my Astronomer deployment. Now that the number of (dynamically created) DAGs is growing, I seem to be having the same problem again. The scheduler is stuck, and the logs repeatedly show the message "{scheduler_job.py:214} WARNING - Killing PID 5866". I have already increased the scheduler to 20 AU and raised AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT to 60, but it seems to be of no use.
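For anyone hitting the same wall: the "Killing PID" warning usually means a DAG file took longer to parse than AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT allows, so the scheduler killed the parsing process. A rough way to check whether a given DAG file is the culprit is to time its import outside Airflow. This is a minimal sketch (plain stdlib, no Airflow required); `time_dag_import` and `candidate_dag` are names I made up for illustration:

```python
import importlib.util
import time

def time_dag_import(path):
    """Import one DAG file and return how long it took, in seconds.

    This import time is roughly what AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT
    limits: if a file takes longer than the timeout to import, the
    scheduler kills the parse and logs the "Killing PID" warning.
    """
    spec = importlib.util.spec_from_file_location("candidate_dag", path)
    module = importlib.util.module_from_spec(spec)
    start = time.monotonic()
    spec.loader.exec_module(module)  # runs all top-level code in the file
    return time.monotonic() - start

if __name__ == "__main__":
    elapsed = time_dag_import("dags/my_dynamic_dags.py")  # hypothetical path
    print(f"parse took {elapsed:.1f}s")
```

If the measured time is anywhere near the configured timeout, the fix is to make the file's top-level code cheaper (or raise the timeout further).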
Thanks for your answer. I haven't filed an issue yet, but I hope I have already solved the problem.
My deployment was still running on 1.10.5, and I use some Google BigQuery operators. In that version those operators did logging inside their init function, which seems to have been the problem.
I have now updated to version 1.10.7 and everything works again.
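To illustrate why logging (or any other side effect) in an operator's `__init__` hurts: the scheduler re-imports DAG files continuously, and `__init__` runs on every parse for every task, while `execute()` only runs when a task actually executes. This is a toy sketch with made-up stand-in classes (`SlowInitOperator`, `LazyOperator`), not Airflow's real code, using a `sleep` to simulate the cost of the setup work:

```python
import time

class BaseOperator:
    """Toy stand-in for Airflow's BaseOperator -- illustrative only."""
    def __init__(self, task_id):
        self.task_id = task_id

class SlowInitOperator(BaseOperator):
    """Anti-pattern: side-effect work in __init__ runs on EVERY DAG parse."""
    def __init__(self, task_id):
        super().__init__(task_id)
        time.sleep(0.02)  # simulate e.g. logging/client setup (~20 ms)

class LazyOperator(BaseOperator):
    """Better: defer side effects to execute(), which runs only at task runtime."""
    def execute(self, context):
        time.sleep(0.02)  # cost paid once per task run, not once per parse

def parse_dag(operator_cls, n_tasks=50):
    """Simulate the scheduler parsing a DAG file that builds n_tasks tasks."""
    start = time.monotonic()
    tasks = [operator_cls(task_id=f"t{i}") for i in range(n_tasks)]
    return time.monotonic() - start

if __name__ == "__main__":
    print(f"slow __init__ parse: {parse_dag(SlowInitOperator):.2f}s")
    print(f"lazy __init__ parse: {parse_dag(LazyOperator):.2f}s")
```

With many dynamically generated DAGs, the per-parse cost multiplies across every file and every scheduler loop, which is why the import timeout got exceeded here.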
I had a similar issue with Airflow 1.10 on top of Kubernetes.
Restarting all the management and worker nodes solved the issue. They had been running for a year without a reboot. It seems all Kubernetes nodes need periodic maintenance reboots to prevent such issues.