When testing locally with astro dev start, some users see their tasks exit with a message like the following:
Task exited with return code Negsignal.SIGKILL
This is nearly always an indication that there weren’t enough resources (CPU and/or memory) available. Check your Docker settings and increase them where appropriate. On Mac, you can find the resource settings under Preferences → Resources (the location may differ on other systems). For reference, I have mine set to 6 CPUs and 6 GB of memory, which is a good starting point.
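If you want to confirm what Docker actually has available after changing those settings, here’s a quick sketch (not an Astronomer tool, and it assumes the docker CLI is on your PATH) that asks the Docker daemon how many CPUs and how much memory it has been allotted:

```python
# Sketch: query the Docker daemon for its allotted CPU and memory,
# to confirm that changes made under Preferences -> Resources took effect.
import json
import subprocess

# "docker info --format '{{json .}}'" prints the daemon info as JSON.
info = json.loads(
    subprocess.check_output(["docker", "info", "--format", "{{json .}}"])
)
print(f"CPUs: {info['NCPU']}")
print(f"Memory: {info['MemTotal'] / 1024**3:.1f} GiB")
```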
I was actually getting the same issue. It worked on my local Astronomer instance (with 4 CPUs and 6 GB of memory) but failed in Astronomer Cloud even though I have 12 GB of memory there. Is there a way to specify that a certain DAG can use more memory?
I managed to track you down and find your cloud deployment. Any reason you’re using the Local Executor? That means tasks run on the scheduler rather than in a Celery worker (Celery Executor) or a separate Kubernetes Pod (Kubernetes Executor).
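If you ever want to double-check which executor an environment is actually running, one option (just a sketch, assuming you have shell access to the deployment or a throwaway task to run it in) is to ask Airflow’s own configuration:

```python
# Sketch: read the configured executor straight from Airflow's config.
from airflow.configuration import conf

# Reads [core] executor, e.g. "LocalExecutor", "CeleryExecutor",
# or "KubernetesExecutor".
print(conf.get("core", "executor"))
```

On recent Airflow versions, running airflow config get-value core executor from the CLI should give the same answer.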
Thanks for your answer, Kris. I was not aware that I was using the Local Executor. I was under the impression that I am using the Kubernetes Executor, since those are the only values I change. How do I know which executor I am using? As you can see in the image below, in my production environment, Kubernetes is highlighted:
Ahhh, I was looking at your other staging deployment. Disregard.
To answer the original question, there’s no way to allocate memory to a specific DAG. The memory and CPU limits are set at the deployment level with your Extra Capacity slider.
Thank you for your response. So, one last question about the Extra Capacity slider: most of our DAGs run between midnight and 8 am. Would it make sense to have more units on Extra Capacity and lower the scheduler resources?
As for your confusion, you were right: I was using the Local Executor in staging. Somehow I am getting Negsignal.SIGKILL using the Kubernetes Executor but not when using the Local Executor for the same DAG.
In my opinion, yes. Based on historical resource usage, I believe it would be safe for you to reduce the scheduler resources to 15-20 AU. You could use that spend on the Extra Capacity slider if you’re seeing any issues with resource quota limitations.
As for the DAG that’s seeing the SIGKILL, feel free to open a support ticket with us if you want assistance troubleshooting that.
Hey! Did you figure out why you were getting Negsignal.SIGKILL using Kubernetes Executor but not when using Local Executor for the same DAG? I am having the exact same problem.
My DAG basically takes a large S3 file, downloads it in chunks, and re-uploads the chunks to S3.
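Roughly, it looks like the sketch below (not my exact code; the bucket names, keys, and chunk size here are just placeholders). Streaming the object in fixed-size chunks is what should keep the task’s memory bounded, which is why the SIGKILL surprises me:

```python
# Sketch of a chunked S3 transfer with boto3. Buckets, keys, and the
# chunk size are illustrative placeholders, not real values.
import boto3

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per chunk (assumed value)

s3 = boto3.client("s3")

def rechunk_s3_file(src_bucket, src_key, dst_bucket, dst_prefix):
    """Download a large object in chunks and re-upload each chunk as its own object."""
    body = s3.get_object(Bucket=src_bucket, Key=src_key)["Body"]
    for i, chunk in enumerate(body.iter_chunks(chunk_size=CHUNK_SIZE)):
        s3.put_object(
            Bucket=dst_bucket,
            Key=f"{dst_prefix}/part-{i:05d}",
            Body=chunk,
        )

if __name__ == "__main__":
    rechunk_s3_file("source-bucket", "exports/large-file.csv",
                    "dest-bucket", "exports/large-file-chunks")
```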