I am using Airflow with GCP's Cloud Composer, and I am building a data warehouse in BigQuery.
I actually prefer to use the Python client for BigQuery and execute my SQL statements through it, rather than using the provided BigQuery operator. I run all of my Python scripts with the BashOperator. Everything works fine so far, and this way I also have more freedom to do what I want.
My question is: is this the best way to execute BigQuery statements? Are there possible drawbacks?
On the other hand, I am also using GCP's Data Fusion with Airflow; however, the operator provided for starting a pipeline does not work in my environment. Again, I am using the BashOperator and making API calls to the Data Fusion instance directly, and everything works fine.
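Roughly, the call I make looks like the sketch below (the instance URL, pipeline name, and token are placeholders; the endpoint shape follows the CDAP v3 REST API that Data Fusion exposes):

```python
import urllib.request

# Placeholder: copy the real API endpoint from your Data Fusion instance details page.
CDAP_ENDPOINT = "https://example-instance.datafusion.googleusercontent.com/api"

def start_pipeline_request(pipeline, namespace="default", token="ACCESS_TOKEN"):
    """Build the POST request that starts a deployed Data Fusion pipeline.

    The path follows the CDAP v3 REST API; `token` would come from
    e.g. `gcloud auth print-access-token` in practice.
    """
    url = (f"{CDAP_ENDPOINT}/v3/namespaces/{namespace}"
           f"/apps/{pipeline}/workflows/DataPipelineWorkflow/start")
    req = urllib.request.Request(url, method="POST")
    req.add_header("Authorization", f"Bearer {token}")
    return req  # urllib.request.urlopen(req) actually fires the call
```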
Again, my question: what are the possible drawbacks?
You can think of all the built-in operators as PythonOperators in a sense, since all a PythonOperator does is execute Python code, and all operators are written in Python. Your approach is similar, but involving a BashOperator adds some complexity. You could instead write a function that is essentially your script, and reference that function with a PythonOperator.
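A minimal sketch of that refactor (the SQL and function name are placeholders; the `client` parameter is only there so the function can be exercised without hitting BigQuery — by default it needs the `google-cloud-bigquery` package and credentials):

```python
def build_warehouse_table(client=None):
    """The body of what used to be a standalone script, now a plain function
    that a PythonOperator can call directly via python_callable."""
    if client is None:
        # Requires google-cloud-bigquery and ADC credentials at runtime.
        from google.cloud import bigquery
        client = bigquery.Client()
    sql = "SELECT 1"  # placeholder: your real DDL/DML goes here
    job = client.query(sql)
    job.result()  # block until the statement finishes; raises on error
    return job

# In the DAG file, reference the function instead of shelling out:
# PythonOperator(task_id="build_table", python_callable=build_warehouse_table)
```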
is this the best way to execute BigQuery statements?
Hmm, personally I think the best way to execute anything is to write a custom operator that leverages hooks and packages; that is essentially what the BigQueryOperator is doing. However, it sounds like for your use case you need to use the google-cloud package more intimately, since neither the BigQueryHook nor the BigQueryOperator supports your specific needs.
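A custom operator along those lines could look like the sketch below. It assumes a recent google provider package where `BigQueryHook.get_client()` returns a `google.cloud.bigquery.Client`, so you keep the full client API while still going through Airflow's connection machinery; the class name and SQL are hypothetical, and the import fallback just lets the sketch be read without Airflow installed:

```python
try:
    from airflow.models.baseoperator import BaseOperator
    from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook
except ImportError:  # fallback so the sketch is importable without Airflow
    BaseOperator = object
    BigQueryHook = None

class BigQuerySqlOperator(BaseOperator):
    """Hypothetical custom operator: runs one SQL statement through
    BigQueryHook, but with direct access to the underlying client."""

    def __init__(self, sql, gcp_conn_id="google_cloud_default", **kwargs):
        super().__init__(**kwargs)
        self.sql = sql
        self.gcp_conn_id = gcp_conn_id

    def execute(self, context):
        hook = BigQueryHook(gcp_conn_id=self.gcp_conn_id)
        client = hook.get_client()  # a google.cloud.bigquery.Client
        job = client.query(self.sql)
        job.result()  # wait for completion; raises on SQL errors
        return job.job_id
```

From there you can do anything the client supports inside `execute`, which is the "more intimate" usage you described, without shelling out through a BashOperator.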
There is more than one way to skin a cat, and I would say yours is a valid one. At this point it really comes down to the coder's style.