I’m pulling content from an API. Each record has a unique ID, “objectID”. I can’t pull more than one record at a time, so each request goes to https://api.myvendor.com/mydata/[objectID]
I have a Python script that iterates over the objectIDs, calls the API for each ID and pulls down the data.
Here’s the rub, though: there are 4000,000 objectIDs I need to pull in a one-time load, and I don’t want the script to run for some unknown length of time before I can see any output.
Because this job is not time sensitive, I thought I might use a scheduler like Airflow to run the script every hour or so on, say, ten records at a time and load them into the database, so I can check on the job’s progress from day to day.
I know how to do this in Python, but I’m wondering if there’s a recommended way to do it with Airflow as part of the pipeline.
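For reference, here is a minimal sketch of the batching I have in mind in plain Python. It is only an illustration under some assumptions: SQLite stands in for my real database, the `records` table and the function names (`fetch_record`, `next_batch`, `run_once`) are made up for this example, and I’m assuming the API returns the record as a text body. Each run loads the next ten objectIDs that are not in the database yet, so the database itself acts as the checkpoint.

```python
import sqlite3
import urllib.request
from typing import Callable, Iterable, List

BASE_URL = "https://api.myvendor.com/mydata/"  # endpoint from the post
BATCH_SIZE = 10  # "say, ten of the records" per run


def fetch_record(object_id: str) -> str:
    """Fetch one record; the API only serves one objectID per request."""
    with urllib.request.urlopen(f"{BASE_URL}{object_id}", timeout=30) as resp:
        return resp.read().decode("utf-8")


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) target table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "object_id TEXT PRIMARY KEY, payload TEXT)"
    )


def next_batch(
    conn: sqlite3.Connection, all_ids: Iterable[str], batch_size: int = BATCH_SIZE
) -> List[str]:
    """Return the first `batch_size` IDs that are not in the database yet."""
    loaded = {row[0] for row in conn.execute("SELECT object_id FROM records")}
    batch: List[str] = []
    for object_id in all_ids:
        if object_id not in loaded:
            batch.append(object_id)
            if len(batch) == batch_size:
                break
    return batch


def run_once(
    conn: sqlite3.Connection,
    all_ids: Iterable[str],
    fetch: Callable[[str], str] = fetch_record,
    batch_size: int = BATCH_SIZE,
) -> int:
    """Load one batch and return how many records were written."""
    init_db(conn)
    batch = next_batch(conn, all_ids, batch_size)
    for object_id in batch:
        conn.execute(
            "INSERT OR REPLACE INTO records (object_id, payload) VALUES (?, ?)",
            (object_id, fetch(object_id)),
        )
    conn.commit()
    return len(batch)
```

My guess is that `run_once` could be the callable of a `PythonOperator` in a DAG scheduled hourly, but I don’t know if that is the recommended pattern, which is really what I’m asking.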
Any advice is appreciated! Thanks!