Making use of the GCP DataflowOperators


I’ve been trying to use Astronomer to trigger some Dataflow jobs, but I’ve hit a few issues with the Python 2.7 requirement that Apache Beam imposes for sending the job information. I’ve based most of what I’ve done on the example provided, but without success. Installing Apache Beam fails because Astronomer runs on Python 3.6 and the two aren’t compatible. The DataflowOperators provided by Airflow also seem to need Python 2.7, so they fail as well.

I’ve started trying the VirtualEnv operator, but the code I need to call has to be imported first, and that means importing apache_beam, which can’t be installed.
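
To make the constraint concrete, here is a minimal sketch of the kind of workaround I have in mind: building a command that runs the pipeline file under a separate interpreter, so that apache_beam is never imported by the Python 3.6 Airflow process. The interpreter path, file name, project, and bucket are all hypothetical, and it assumes a Python 2.7 interpreter with apache-beam already installed is available on the worker, which may not be true on Astronomer.

```python
def build_dataflow_command(python2_bin, pipeline_file, options):
    """Build an argv list that runs a Beam pipeline file under another interpreter.

    Nothing here imports apache_beam, so this function itself runs on Python 3.
    """
    # DataflowRunner is the Beam runner that submits the job to Dataflow.
    argv = [python2_bin, pipeline_file, "--runner=DataflowRunner"]
    # Append the remaining pipeline options in a stable (sorted) order.
    for name in sorted(options):
        argv.append("--{}={}".format(name, options[name]))
    return argv


# Hypothetical values, for illustration only.
command = build_dataflow_command(
    "/usr/bin/python2.7",
    "my_pipeline.py",
    {"project": "my-gcp-project", "temp_location": "gs://my-bucket/tmp"},
)
print(command)
```

The resulting list could be passed to something like `subprocess.check_call(command)` from a task, but that only helps if the Python 2.7 environment actually exists where the task runs.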

Has this issue of triggering Dataflow, or more specifically Apache Beam, from within Astronomer been solved before?