Issue installing the Databricks Operator


#1

I have Astronomer running, and some basic DAGs that use Python operators are also running successfully.

I am having issues running the Databricks operator.

In my DAG .py file, I use the following line to try to import it:

import airflow.contrib.operators.databricks_operator

When I open Airflow, however, I get this error:

The blog post about the Databricks operator says it is included in Airflow 1.9.0, which the Airflow GitHub repository lists as released on December 15, 2017.

Is Astronomer using a version earlier than 1.9.0?

Is there something else I need to do to ensure that Airflow on Astronomer can use the Databricks operator?


#2

Hi there!

By default, Astronomer runs the 1.9-stable branch, which does include the Databricks operator.

Can you paste your full code here? I was able to import from that module successfully:

from airflow.contrib.operators.databricks_operator import DatabricksSubmitRunOperator

Can you verify whether this works for you? I didn't instantiate it because I don't have a Spark cluster right now, but I can definitely dig deeper if needed!
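Because the question comes down to whether a specific class can be imported from `airflow.contrib.operators.databricks_operator`, a small check can tell a missing module apart from a missing class. This is a minimal sketch using only the standard library; `operator_available` is a hypothetical helper name, not part of Airflow.

```python
import importlib


def operator_available(module_path, class_name):
    """Return True if `class_name` can be imported from `module_path`.

    Hypothetical helper: returns False when the module itself cannot be
    imported (e.g. Airflow is not installed) and also when the module
    exists but does not define the class (e.g. an older Airflow release).
    """
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so this covers both
        return False
    return hasattr(module, class_name)
```

Run inside the Airflow environment, `operator_available("airflow.contrib.operators.databricks_operator", "DatabricksRunNowOperator")` should return False on an install that predates that operator, which would line up with the `cannot import name` error reported in this thread.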


#3

It appears the “Submit Run” operator imports just fine in my environment, as it does in yours.

The “Run Now” operator, however, does not import successfully. I still get this error:
Broken DAG: [/usr/local/airflow/dags/example-databricks-dag.py] cannot import name 'DatabricksRunNowOperator'

I just checked the history on GitHub, and it was not committed to Airflow until September 6, 2018, so perhaps that version of Airflow has not made it into Astronomer.
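Checking the installed Airflow version is a quicker way to answer this than digging through commit history. One pitfall: compared as plain strings, "1.9.0" sorts after "1.10.0". This is a minimal sketch that compares numeric components instead. The helper names are hypothetical, and the minimum version is left as a parameter because this thread only dates the commit (September 6, 2018) rather than naming the first release that shipped it.

```python
import re


def parse_version(version):
    """Return (major, minor, patch) as ints from strings like '1.10.0' or '1.10.0+local'.

    Anything after the numeric part (local tags, 'rc1', etc.) is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def at_least(version, minimum):
    """True if `version` is numerically greater than or equal to `minimum`."""
    return parse_version(version) >= parse_version(minimum)


# Plain string comparison gets this wrong: prints True
print("1.9.0" > "1.10.0")
# Numeric comparison gets it right: prints False
print(at_least("1.9.0", "1.10.0"))
```

In a running environment, the version string to pass in would come from `airflow.__version__` (or the `airflow version` command).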

If you want to see the DAG code that gave me the above error, here it is:

from airflow import DAG
from airflow.contrib.operators.databricks_operator import DatabricksRunNowOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2018, 1, 1),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('example_dag_databricks',
          max_active_runs=3,
          schedule_interval=timedelta(minutes=5),
          default_args=default_args)

# Every operator needs a task_id and a DAG; 'run_databricks_job' is a placeholder name
t1 = DatabricksRunNowOperator(task_id='run_databricks_job', job_id=139, dag=dag)
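For context on what `DatabricksRunNowOperator` does once it is available: as I understand it, it triggers an existing Databricks job (here job 139) by calling the Jobs API `run-now` endpoint, `POST /api/2.0/jobs/run-now`, with the job ID in a JSON body. This is a minimal standard-library sketch that builds, but does not send, that request. The workspace host and token are placeholders, and `build_run_now_request` is a hypothetical helper, not part of Airflow or any Databricks SDK.

```python
import json
import urllib.request


def build_run_now_request(host, token, job_id):
    """Build, without sending, a POST to the Databricks Jobs API run-now endpoint.

    `host` and `token` are placeholders for a workspace hostname and a
    personal access token; nothing here contacts a real server.
    """
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        f"https://{host}/api/2.0/jobs/run-now",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_now_request("example.cloud.databricks.com", "<token>", 139)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending it would mean passing the request to `urllib.request.urlopen`; the Airflow operator instead reads the host and token from an Airflow connection (`databricks_default` by default, I believe).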


#4

The 1.9-stable branch was indeed released before that PR was merged into Airflow:


#5

Hey @zdestefano, sorry about the spam issues. It looks like Discourse automatically flags comments as spam when there are multiple consecutive comments linking to the same host (in this case github.com). I just fixed our settings, so you shouldn't have issues with that going forward.

As for the issue itself, you can follow the response here to run Airflow 1.10 on Astronomer. That should include the commit you linked above.