In a traditional Celery architecture, we can have one connection per worker, and that connection can be initialized when the worker is created/initialized.
Can the same functionality be achieved using the Celery executor, i.e. one connection per worker, initialized when the worker is created?
Ideally I would like to initialize a worker and give it a list of tasks to perform.
By connection, I mean a database connection. I wish to instantiate one DB connection per worker.
Is it possible to instantiate one connection per worker and use it for all the tasks that worker does?
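For reference, this is roughly what I mean by "one connection per worker" in plain Celery. A minimal sketch, assuming a PostgreSQL database reachable via psycopg2 (the broker URL, credentials, and task are illustrative): the connection is opened once per worker process via the worker_process_init signal and then reused by every task that process runs.

```python
# Minimal sketch of per-worker connection init in plain Celery.
# Broker URL, database credentials, and the task body are illustrative.
import psycopg2
from celery import Celery
from celery.signals import worker_process_init

app = Celery("tasks", broker="redis://localhost:6379/0")

db_conn = None  # one connection per worker process


@worker_process_init.connect
def init_db_connection(**kwargs):
    """Open the connection once, when the worker process starts."""
    global db_conn
    db_conn = psycopg2.connect(host="localhost", dbname="mydb", user="me")


@app.task
def run_query(sql):
    """Every task executed by this worker process reuses the same connection."""
    with db_conn.cursor() as cur:
        cur.execute(sql)
        return cur.fetchall()
```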
On the second point,
Do you mean you want to specify which worker your task instances are executed by?
@Alan Well, there are hooks for all the databases in Airflow… what you have provided is for metadata storage… and doesn’t cover all the databases we can connect to and use in our DAGs.
I’m sorry for my brief responses, but I can only answer based on what is given, and you have not provided much context around your infrastructure and what it is you want to accomplish.
If you are talking about hooks connecting to other services like a database, then the answer is more complicated. I think Airflow does NOT have any internal mechanism to limit the number of connections a hook creates to the service that the hook is for.
A custom hook could have some sort of connection-pooling mechanism built in, so that it acts as a pooling agent.
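A rough sketch of that idea (assuming Airflow 2.x; the class name, connection id, and pool sizes are made up and are not an existing Airflow API): a custom hook that keeps one shared SQLAlchemy engine per worker process, so every task running in that process draws from the same small connection pool instead of opening a fresh connection.

```python
# Rough sketch of a custom hook acting as a pooling agent.
# Class name, connection id, and pool sizes are illustrative only.
from airflow.hooks.base import BaseHook
from sqlalchemy import create_engine


class PooledPostgresHook(BaseHook):
    """Hands out connections from one engine shared within the worker process."""

    _engine = None  # shared across all hook instances in this process

    def __init__(self, conn_id="my_postgres"):
        super().__init__()
        self.conn_id = conn_id

    def get_conn(self):
        if PooledPostgresHook._engine is None:
            conn = self.get_connection(self.conn_id)  # Airflow Connection object
            uri = (
                f"postgresql://{conn.login}:{conn.password}"
                f"@{conn.host}:{conn.port}/{conn.schema}"
            )
            # pool_size caps how many DB connections this process will hold
            PooledPostgresHook._engine = create_engine(
                uri, pool_size=1, max_overflow=0
            )
        return PooledPostgresHook._engine.connect()
```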
Another potential solution could be to use Pools. If, for example, you only want one task at a time connecting to a database, you can require all operators that use that hook to have the pool parameter specified. When there are two task instances that use the hook, only one can run at a time. This scenario assumes that there is only one Celery worker.
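For example, something along these lines (the pool name db_pool is made up and would have to be created beforehand with a single slot, via the UI or e.g. `airflow pools set db_pool 1 "limit db connections"`). Because both tasks claim a slot from the same one-slot pool, the scheduler only lets one of them run at a time:

```python
# Sketch of limiting concurrent DB access with an Airflow Pool.
# Assumes a pool named "db_pool" with 1 slot already exists.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def query_db(**context):
    # imagine this uses a database hook
    pass


with DAG(
    dag_id="pool_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
) as dag:
    # Both tasks claim a slot from the same 1-slot pool,
    # so only one task instance can run at any given time.
    task_a = PythonOperator(task_id="query_a", python_callable=query_db, pool="db_pool")
    task_b = PythonOperator(task_id="query_b", python_callable=query_db, pool="db_pool")
```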
I don’t think this is possible to implement with multiple workers because there is nothing in the metadata database that tracks how many hook connections there are at all times.
I would encourage you to submit a feature request on Airflow’s GitHub Issues.