Is there a good guide for getting the Spark operators to work?

I am an Astronomer Enterprise customer, and I am just starting to look at creating a DAG that will connect to Spark on an AWS EMR cluster and process some data. The DAG should support both Spark SQL and some basic PySpark code.

Is there an Astronomer guide for setting up this type of DAG? What are the basic steps I need to take to get this working? I also need to deploy some PySpark code from a Jupyter Notebook that my data scientist developed in SageMaker. Any tips? Or is this already supported out of the box?

Thank you,
–Chris
