I need to use `selenium` in a PythonOperator. Can I install a browser driver on Astronomer?

We need to use selenium in a PythonOperator; selenium requires at least one web browser driver (such as a chromedriver) in order to work.

So two questions:

  1. How can I add that chrome driver to my local astronomer deployment?
  2. Will it be possible for me to add that chrome driver to the cloud deployment as well?

Answer

Our Airflow containers run on an Alpine Linux-based image. You can add system-level packages to your image via the `packages.txt` file and Python packages via the `requirements.txt` file. Both files are generated automatically when you initialize an Airflow project on Astronomer via the CLI (by running `astro airflow init`).

In your case, you can:

  1. Add chromium-chromedriver to packages.txt
  2. Add selenium to requirements.txt

This applies to both your local deployment and your deployment on Astronomer Cloud. Cloud runs the exact same image you build locally, so anything that works locally will work in Cloud.

Note: We’re currently experimenting with a Debian-based image compatible with both Astronomer Cloud and Enterprise. If you’d like to try it out, reach out to support@astronomer.io.

@paola How do I reference the driver once I have included these packages? I understand that it should be something like:
driver = webdriver.Chrome('path/to/driver')
But I am not sure where the driver is located when I run locally or in the cloud when I have an astronomer initialized configuration.

@AstroGabe figured out that you also need to add chromium to your packages.txt and these options when you use the driver:

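from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')             # Chrome's sandbox usually cannot start inside a container
options.add_argument('--disable-dev-shm-usage')  # avoid relying on the small /dev/shm in containers
options.add_argument('--headless')               # no display is available in the container
# Recent Selenium releases take `options=`; the older `chrome_options=` keyword is deprecated.
driver = webdriver.Chrome(options=options)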
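
For anyone wondering where the driver ends up: a rough sketch, assuming the `chromium-chromedriver` package puts a `chromedriver` binary on the container's `PATH` (in which case `webdriver.Chrome()` should find it without an explicit path). The names `HEADLESS_CHROME_ARGS`, `find_chromedriver`, and `fetch_page_title` are made up for illustration and are not part of Selenium or Astronomer.

```python
import shutil

# The three flags from the post above, kept in one place.
HEADLESS_CHROME_ARGS = ["--no-sandbox", "--disable-dev-shm-usage", "--headless"]


def find_chromedriver():
    """Return the full path of `chromedriver` if it is on PATH, else None."""
    return shutil.which("chromedriver")


def fetch_page_title(url):
    """Open `url` in headless Chrome and return the page title.

    Hypothetical helper: selenium is imported lazily so this module can
    still be imported on machines where selenium is not installed.
    """
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in HEADLESS_CHROME_ARGS:
        options.add_argument(arg)

    # No driver path is passed here; this assumes chromedriver is on PATH.
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.title
    finally:
        # Always shut the browser down, even if the page load fails.
        driver.quit()
```

A function like `fetch_page_title` could then be passed as the `python_callable` of a PythonOperator. If the driver is not found, printing `find_chromedriver()` from inside a task is a quick way to check whether the binary is actually on `PATH` in the image.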

Hi @johnlim - may be a bit of a longshot since your comment is nearly 3 years old - but are you able to confirm what items you added to requirements.txt, packages.txt, and a sample DAG you were successfully implementing selenium with?