Docker 19.03.6 is capping CPU usage on Airflow containers

We are running Airflow as docker containers on AWS ECS (using m5.xlarge instance, 4 vCPUs which are 4096 CPU units).

We recently upgraded docker from 18.09.9 to 19.03.6. Then immediately we noticed that it’s capping the CPU usage on the containers. For instance, the scheduler was a heavy CPU user. So I allocated 1024 CPU units in its task definition. But under docker 19.03.6 the scheduler is using much less CPU than before. In fact, all containers are using less CPU. And we keep seeing these errors on Airflow UI

Broken DAG: [/usr/local/airflow/dags/<my_dags>.py] Timeout, PID: <xxxx>

Everything else seems fine though. No errors in the logs. Only on the UI.

When we roll back to docker 18.09.9, the errors are gone. I know it sounds like a docker issue. But of all containers that have been “capped”, Airflow is the only one that generates these error messages.

I do think there is something not quite right with my scheduler: it constantly causes high cpu usage in the container and i merely have max_threads = 4 in my airflow.cfg for the scheduler. In fact, DataDog metrics indicate my scheduler container uses up to 115 docker.cpu.user (The percent of time the CPU is under direct control of processes of this container, unnormalized)