Connection per worker in CeleryExecutor

In a traditional Celery architecture, we can have one connection per worker, and this connection can be initialized when the worker is created.
Can the same functionality be achieved using the CeleryExecutor, i.e. one connection per worker, initialized when the worker is created?
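To illustrate the pattern I mean, here is a minimal sketch of a per-process connection that is created once and then reused by every task the worker runs (sqlite3 is just a stand-in for the real database; in plain Celery I would create the connection in a `worker_process_init` signal handler):

```python
import sqlite3

# One connection per worker process, created lazily on first use
# and reused by every task the worker runs afterwards.
_conn = None

def get_connection():
    global _conn
    if _conn is None:
        # In plain Celery, this initialization would run in a
        # worker_process_init signal handler when the worker starts.
        _conn = sqlite3.connect(":memory:")
    return _conn
```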

Ideally, I would like to initialize a worker and provide that worker a list of tasks to be done.

@paola: Please have a look at my query!

I’m not sure I understand your question.

What is this connection you are referring to? Do you mean Queues?

Ideally, I would like to initialize a worker and provide that worker a list of tasks to be done.

Task scheduling is done by the scheduler.

Do you mean you want to specify which worker your task instances are executed by?

@Alan

By connection, I mean a database connection. I wish to instantiate one DB connection per worker.
Is it possible to instantiate one connection per worker and use it for all the tasks that worker runs?

On the second point:

Do you mean you want to specify which worker your task instances are executed by?

Is this possible?

I don’t think it’s possible to limit connections per worker.

It is possible to limit the number of connections per Airflow deployment, though (see sql_alchemy_pool_size).
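For reference, a sketch of the relevant settings in airflow.cfg (the section is [core] in older Airflow versions and [database] in newer ones; the values here are only illustrative):

```ini
[core]
; Maximum number of pooled metadata-database connections per process
sql_alchemy_pool_size = 5
; Extra connections allowed beyond the pool size under load
sql_alchemy_max_overflow = 10
```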

You could look into pgbouncer to see if that fits your needs.

@Alan pgbouncer is for Postgres. What is its equivalent for, say, MongoDB, MySQL, or other similar databases?

I believe Airflow only supports MySQL or Postgres.

Please reference the Airflow documentation on database backends.

I do not have any recommendation for MySQL since Astronomer uses only Postgres databases.

@Alan Well, there are hooks for all the databases in Airflow. What you have provided is for metadata storage and doesn’t cover all the databases we can connect to and use in our DAGs.

I’m sorry for my brief responses, but I can only answer based on what is given, and you have not provided much context around your infrastructure and what it is you want to accomplish.

If you are talking about hooks connecting to other services like a database, then the answer is more complicated. I think Airflow does NOT have any internal mechanism to limit the number of connections created by a hook to the service that hook is for.

A custom hook could have some sort of mechanism built in to act as a pooling agent.
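A rough sketch of what such a pooling hook could look like (this is a hypothetical class, not an existing Airflow hook; sqlite3 stands in for the real database, and in real Airflow it would subclass BaseHook):

```python
import queue
import sqlite3

class PooledDbHook:
    """Hypothetical hook that caps live connections via a shared pool.

    In real Airflow this would subclass airflow.hooks.base.BaseHook;
    the class-level pool is shared by all tasks in the same worker
    process.
    """
    _pool = queue.Queue(maxsize=2)  # at most 2 live connections per process
    _created = 0                    # how many connections exist so far

    @classmethod
    def get_conn(cls):
        try:
            # Reuse a connection someone has already released.
            return cls._pool.get_nowait()
        except queue.Empty:
            if cls._created < cls._pool.maxsize:
                # Under the cap: open a fresh connection.
                cls._created += 1
                return sqlite3.connect(":memory:")
            # At the cap: block until another task releases one.
            return cls._pool.get(timeout=5)

    @classmethod
    def release_conn(cls, conn):
        # Return the connection to the pool for the next task.
        cls._pool.put(conn)
```

A task would call `get_conn()` at the start and `release_conn()` in a finally block, so the cap holds even when tasks fail.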

Another potential solution could be to use Pools. If you only want one task connecting to a database at a time, for example, you can require all operators that use that hook to specify the pool parameter. When there are two task instances that use the hook, only one can run. This scenario assumes there is only one Celery worker.
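For example, assuming a pool named "db_pool" has already been created in the Airflow UI (Admin → Pools) with a single slot, a DAG could look like this (a sketch; the dag_id, task_ids, and callable are placeholders, and the operator import path shown is the Airflow 1.x one):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def query_db():
    # Placeholder for work that uses the database hook.
    pass

with DAG(dag_id="db_pool_example",
         start_date=datetime(2020, 1, 1),
         schedule_interval=None) as dag:
    # Both tasks claim a slot in "db_pool"; since the pool was created
    # with one slot, only one of them can run at a time.
    t1 = PythonOperator(task_id="query_1", python_callable=query_db,
                        pool="db_pool")
    t2 = PythonOperator(task_id="query_2", python_callable=query_db,
                        pool="db_pool")
```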

I don’t think this is possible to implement with multiple workers because there is nothing in the metadata database that tracks how many hook connections there are at all times.

I would encourage you to submit a feature request on Airflow’s GitHub Issues.