CDD Vault to DataFrame: Python API and tutorial for querying and downloading data

OCE Communications Team 8/4/22
It is often necessary to export data out of CDD Vault into Python, or otherwise onto a local machine. This can be tedious, so we want to make the process as simple as possible and let you get straight to your analysis.
The easiest approach has two steps: first, create a Saved Search in CDD Vault that captures the exact query and export format you want; second, run that Saved Search through CDD's API.
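Under the hood, the second step is three REST calls against your Vault: one that runs the Saved Search and returns an export ID, one that reports the export's progress, and one that downloads the finished file. The sketch below only builds those URLs to make the sequence concrete. The paths match the ones used in the code further down, but the helper name `cdd_endpoints`, the base URL constant, and the IDs in the example are our own placeholders, so check them against CDD's API documentation for your Vault.

```python
# Hypothetical helper: builds the CDD Vault API URLs used by the Saved Search
# export workflow. API_ROOT is an assumption; adjust it if your Vault is
# served from a different host.
API_ROOT = "https://app.collaborativedrug.com/api/v1/vaults"

def cdd_endpoints(vault_id, search_id, export_id=None):
    """Return a dict mapping each workflow step to its URL."""
    vault_url = f"{API_ROOT}/{vault_id}"
    urls = {
        # Step 1: run the Saved Search; the JSON response carries an export "id"
        "run_search": f"{vault_url}/searches/{search_id}",
    }
    if export_id is not None:
        # Step 2: poll the export's "status" until it reads "finished"
        urls["progress"] = f"{vault_url}/export_progress/{export_id}"
        # Step 3: download the finished export file
        urls["download"] = f"{vault_url}/exports/{export_id}"
    return urls

# Placeholder IDs for illustration only
print(cdd_endpoints(1234, 5678, export_id=42)["progress"])
# → https://app.collaborativedrug.com/api/v1/vaults/1234/export_progress/42
```

The export ID only exists after step 1 has run, which is why the helper adds the last two URLs only when one is supplied.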
To use the following code you’ll need three pieces of information: your CDD Vault ID, your CDD API token with read access, and the Saved Search ID.
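Rather than pasting the Vault ID and API token directly into a script, you may prefer to read them from the environment so they never end up in version control. This is a minimal sketch of that idea; the variable names `CDD_VAULT_ID` and `CDD_API_TOKEN` and the function `load_cdd_credentials` are hypothetical choices of ours, not something CDD defines.

```python
import os

def load_cdd_credentials():
    """Read the CDD Vault ID and API token from environment variables.

    CDD_VAULT_ID and CDD_API_TOKEN are hypothetical names chosen for this
    sketch; CDD itself does not define them.
    """
    try:
        vault_id = os.environ["CDD_VAULT_ID"]
        token = os.environ["CDD_API_TOKEN"]
    except KeyError as missing:
        # KeyError.args[0] is the name of the variable that was not set
        raise RuntimeError(f"Set the {missing.args[0]} environment variable") from None
    return vault_id, token
```

With this in place, `vault_id, token = load_cdd_credentials()` can replace the two `None` assignments in the code below. The Saved Search ID is passed to the function directly, so it does not need to be stored anywhere.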
The code below defines a function, get_dataset_cdd_saved_search(search_id), which takes a Saved Search ID and returns the search results as a pandas DataFrame (it parses the export as CSV).
import time
from io import StringIO

import requests
import pandas as pd

# Written by OAM Communications Team, Oloren AI

# Fill these in before running
token = None     # your CDD API token (with read access)
vault_id = None  # your CDD Vault ID

# Base URL of the CDD Vault REST API (v1); adjust if your Vault uses a different host
API_ROOT = "https://app.collaborativedrug.com/api/v1/vaults"

def cdd_get(path):
    # Send an authenticated GET request to the Vault and fail loudly on HTTP errors
    url = f"{API_ROOT}/{vault_id}/{path}"
    response = requests.get(url, headers={"X-CDD-token": token})
    response.raise_for_status()
    return response

def run_saved_search(search_id):
    # Running a Saved Search starts an asynchronous export; return its export ID
    return cdd_get(f"searches/{search_id}").json()["id"]

def check_export_status(export_id):
    # Return the export's current status, e.g. "new" or "finished"
    return cdd_get(f"export_progress/{export_id}").json()["status"]

def get_export(export_id):
    # Download the finished export (CSV text) and parse it into a DataFrame
    response = cdd_get(f"exports/{export_id}")
    return pd.read_csv(StringIO(response.text))

def get_dataset_cdd_saved_search(search_id):
    export_id = run_saved_search(search_id)
    i = 0
    status = "new"
    # Poll with exponential backoff (capped at 60 seconds) until the export is ready
    while status != "finished":
        wait = min(2**i, 60)
        print(f"Export status is {status}, checking in {wait} seconds...")
        time.sleep(wait)
        status = check_export_status(export_id)
        i += 1
    print("Export ready!")
    return get_export(export_id)
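As written, the polling loop in get_dataset_cdd_saved_search has no upper bound on how long it waits. One way to guard against an export that never completes, offered here as our own suggestion rather than anything CDD prescribes, is to move the loop into a helper that caps the backoff and gives up after a timeout. The names `wait_for_export`, `max_delay`, and `timeout` are hypothetical. Passing in `check_status` and `sleep` as arguments lets you test the loop without touching the network.

```python
import time

def wait_for_export(check_status, export_id, sleep=time.sleep,
                    max_delay=60, timeout=600):
    """Poll check_status(export_id) with capped exponential backoff.

    Returns the final status once it is "finished"; raises TimeoutError if
    the next wait would push the total time spent sleeping past `timeout` seconds.
    """
    waited = 0
    attempt = 0
    status = check_status(export_id)
    while status != "finished":
        delay = min(2 ** attempt, max_delay)  # 1, 2, 4, ... seconds, capped
        if waited + delay > timeout:
            raise TimeoutError(f"Export {export_id} still '{status}' after {waited} s")
        sleep(delay)
        waited += delay
        attempt += 1
        status = check_status(export_id)
    return status

# Offline check with a stubbed status function and a sleep that just records delays
statuses = iter(["new", "new", "new", "finished"])
delays = []
wait_for_export(lambda _id: next(statuses), 7, sleep=delays.append)
print(delays)  # → [1, 2, 4]
```

In get_dataset_cdd_saved_search, the `while` loop could then be replaced by `wait_for_export(check_export_status, export_id)` before calling `get_export(export_id)`.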