Tasks taking the poison pills after few seconds of running

Hi,

We are encountering strange behaviour using Airflow.

Randomly, tasks “take poison pill” few second after starting.

Reading local file: ************************************************************************************************
[2019-11-27 03:48:16,430] [logging_mixin.py:95] WARNING - /usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:536: Warning: Data truncated for column ‘start_date’ at row 1
cursor.execute(statement, parameters)
[2019-11-27 03:48:16,431] [logging_mixin.py:95] WARNING - /usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:536: Warning: Data truncated for column ‘latest_heartbeat’ at row 1
cursor.execute(statement, parameters)
[2019-11-27 03:48:16,450] [models.py:1359] INFO - Dependencies all met for <TaskInstance: *********************** [queued]>
[2019-11-27 03:48:16,460] [models.py:1353] INFO - Dependencies not met for <TaskInstance: ********************************** [queued]>, dependency ‘Task Instance Slots Available’ FAILED: The maximum number of running tasks (16) for this task’s DAG ‘cron_flux_ftp’ has been reached.
[2019-11-27 03:48:16,460] [models.py:1554] WARNING -

FIXME: Rescheduling due to concurrency limits reached at task runtime. Attempt 1 of 2. State set to NONE.

[2019-11-27 03:48:16,460] [models.py:1557] INFO - Queuing into pool None
[2019-11-27 03:48:16,472] [logging_mixin.py:95] WARNING - /usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:536: Warning: Data truncated for column ‘queued_dttm’ at row 1
cursor.execute(statement, parameters)
[2019-11-27 03:48:16,475] [logging_mixin.py:95] INFO - [2019-11-27 03:48:16,475] [jobs.py:2514] INFO - Task is not able to be run
[2019-11-27 03:48:16,486] [logging_mixin.py:95] WARNING - /usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:536: Warning: Data truncated for column ‘end_date’ at row 1
cursor.execute(statement, parameters)
[2019-11-27 03:48:38,384] [logging_mixin.py:95] WARNING - /usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:536: Warning: Data truncated for column ‘start_date’ at row 1
cursor.execute(statement, parameters)
[2019-11-27 03:48:38,384] [logging_mixin.py:95] WARNING - /usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:536: Warning: Data truncated for column ‘latest_heartbeat’ at row 1
cursor.execute(statement, parameters)
[2019-11-27 03:48:38,401] [models.py:1359] INFO - Dependencies all met for <TaskInstance: **************************** [queued]>
[2019-11-27 03:48:38,411] [models.py:1359] INFO - Dependencies all met for <TaskInstance: ************************************************************** [queued]>
[2019-11-27 03:48:38,411] [models.py:1571] INFO -

Starting attempt 1 of 2

[2019-11-27 03:48:38,426] [models.py:1593] INFO - Executing <Task(BashOperator): > on 2019-11-26T02:30:00+00:00
[2019-11-27 03:48:38,426] [base_task_runner.py:118] INFO - Running: [‘bash’, ‘-c’, u’airflow run cron_flux_ftp ********************* 2019-11-26T02:30:00+00:00 --job_id 40947091 --raw -sd DAGS_
--cfg_path /tmp/tmpg9r3RQ’]
[2019-11-27 03:48:38,815] [base_task_runner.py:101] INFO - Job 40947091: Subtask ********************* [2019-11-27 03:48:38,815] [settings.py:174] INFO - settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800, pid=12120
[2019-11-27 03:48:39,036] [base_task_runner.py:101] INFO - Job 40947091: Subtask ********************* [2019-11-27 03:48:39,036] [init.py:51] INFO - Using executor CeleryExecutor
[2019-11-27 03:48:39,175] [base_task_runner.py:101] INFO - Job 40947091: Subtask ********************* /usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:536: Warning: Data truncated for column ‘dttm’ at row 1
[2019-11-27 03:48:39,175] [base_task_runner.py:101] INFO - Job 40947091: Subtask ********************* cursor.execute(statement, parameters)
[2019-11-27 03:48:39,175] [base_task_runner.py:101] INFO - Job 40947091: Subtask ********************* /usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:536: Warning: Data truncated for column ‘execution_date’ at row 1
[2019-11-27 03:48:39,175] [base_task_runner.py:101] INFO - Job 40947091: Subtask ********************* cursor.execute(statement, parameters)
[2019-11-27 03:48:39,182] [base_task_runner.py:101] INFO - Job 40947091: Subtask ********************* [2019-11-27 03:48:39,181] [models.py:273] INFO - Filling up the DagBag from /home/production/airflow/dags/flux_ftp.py
[2019-11-27 03:48:39,231] [base_task_runner.py:101] INFO - Job 40947091: Subtask ********************* /usr/local/lib/python2.7/dist-packages/airflow/utils/helpers.py:356: DeprecationWarning: Importing ‘PythonOperator’ directly from ‘airflow.operators’ has been deprecated. Please import from ‘airflow.operators.[operator_module]’ instead. Support for direct imports will be dropped entirely in Airflow 2.0.
[2019-11-27 03:48:39,231] [base_task_runner.py:101] INFO - Job 40947091: Subtask ********************* DeprecationWarning)
[2019-11-27 03:48:39,231] [base_task_runner.py:101] INFO - Job 40947091: Subtask ********************* /usr/local/lib/python2.7/dist-packages/airflow/utils/helpers.py:356: DeprecationWarning: Importing ‘DummyOperator’ directly from ‘airflow.operators’ has been deprecated. Please import from ‘airflow.operators.[operator_module]’ instead. Support for direct imports will be dropped entirely in Airflow 2.0.
[2019-11-27 03:48:39,231] [base_task_runner.py:101] INFO - Job 40947091: Subtask ********************* DeprecationWarning)
[2019-11-27 03:48:39,515] [base_task_runner.py:101] INFO - Job 40947091: Subtask ********************* /usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:536: Warning: Incorrect datetime value: ‘2019-11-26 02:30:00+00:00’ for column ‘execution_date’ at row 1
[2019-11-27 03:48:39,516] [base_task_runner.py:101] INFO - Job 40947091: Subtask ********************* cursor.execute(statement, parameters)
[2019-11-27 03:48:39,516] [base_task_runner.py:101] INFO - Job 40947091: Subtask ********************* /usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py:536: Warning: Truncated incorrect datetime value: ‘2019-11-26 02:30:00+00:00’
[2019-11-27 03:48:39,516] [base_task_runner.py:101] INFO - Job 40947091: Subtask ********************* cursor.execute(statement, parameters)
[2019-11-27 03:48:39,551] [base_task_runner.py:101] INFO - Job 40947091: Subtask ********************* [2019-11-27 03:48:39,551] [cli.py:520] INFO - Running <TaskInstance: cron_flux_ftp.
************* 2019-11-26T02:30:00+00:00 [up_for_retry]> on host ip-172-31-48-142.eu-west-1.compute.internal
[2019-11-27 03:48:39,584] [bash_operator.py:77] INFO - Tmp dir root location:
/tmp
[2019-11-27 03:48:39,607] [bash_operator.py:86] INFO - Exporting the following env vars:
AIRFLOW_CTX_TASK_ID=*********************
AIRFLOW_CTX_DAG_ID=cron_flux_ftp
AIRFLOW_CTX_EXECUTION_DATE=2019-11-26T02:30:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2019-11-26T02:30:00+00:00
[2019-11-27 03:48:39,608] [bash_operator.py:100] INFO - Temporary script location: /tmp/airflowtmpQ4h8TB/*********************k2YRcb
[2019-11-27 03:48:39,608] [bash_operator.py:110] INFO - Running command: temp_file=$(mktemp) && echo $temp_file && cd ***************** && php cron_script_flux_ftp.php *************************** > $temp_file && **************************** $temp_file && rm $temp_file || (php ****************************************** $temp_file && rm $temp_file && exit 1)
[2019-11-27 03:48:39,612] [bash_operator.py:119] INFO - Output:
[2019-11-27 03:48:39,642] [bash_operator.py:123] INFO - /tmp/tmp.H4Ipn1K3uU
[2019-11-27 03:48:43,432] [logging_mixin.py:95] INFO - [2019-11-27 03:48:43,431] [jobs.py:2595] WARNING - State of this instance has been externally set to up_for_retry. Taking the poison pill.
[2019-11-27 03:48:43,446] [helpers.py:250] INFO - Sending 15 to GPID 12120
[2019-11-27 03:48:43,446] [models.py:1634] ERROR - Received SIGTERM. Terminating subprocesses.
[2019-11-27 03:48:43,446] [bash_operator.py:137] INFO - Sending SIGTERM signal to bash process group
[2019-11-27 03:48:43,460] [helpers.py:232] INFO - Process psutil.Process(pid=12680, status=‘terminated’) (12680) terminated with exit code None
[2019-11-27 03:48:43,458] [models.py:1788] ERROR - Task received SIGTERM signal
Traceback (most recent call last):

Hello @genseb, thanks for submitting your question here. Can I ask what version of Airflow you are using?

It seems people have found a solution by increasing the killed_task_cleanup_time in airfow.cfg (or export AIRFLOW__CORE__KILLED_TASK_CLEANUP_TIME=<a larger number than 60>.

In Airflow 1.10.12, by default killed_task_cleanup_time = 60.

I have no idea why or how this solution works.