How to fetch candidates from Greenhouse using Python

Greenhouse, a prominent applicant tracking system (ATS), lets you organize and manage candidate details, job postings, applications, interviews, hiring decisions, and more. 

Given all the information Greenhouse collects and stores, and the tasks it lets you perform, you likely have many reasons to connect your product to it via its APIs. That said, candidate data is probably at the top of your list.

We’ll help you collect candidates from the Greenhouse API by walking you through the steps of setting up a functioning script in Python.

1. Configure authentication 

Greenhouse uses Basic Authentication over HTTPS, which requires an API key. 

Include a header in your requests in the format <code class="blog_inline-code">Authorization: Basic {BASE64(API-KEY:)}</code>. The API key acts as the username and the password is left blank, so the value you Base64 encode is the key followed by a colon. This API key is unique to your instance and provides you with the necessary permissions to access and manipulate your data.

All API requests must be made over HTTPS. Requests made over plain HTTP will fail, as will requests without authentication. Your script will also need the following imports: <code class="blog_inline-code">import requests</code> and <code class="blog_inline-code">import base64</code>.

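2. Define the API key and encode it using Base64

You’ll need to use the following code: <code class="blog_inline-code">api_key = 'your_api_key'</code> followed by <code class="blog_inline-code">base64_api_key = base64.b64encode(f"{api_key}:".encode()).decode()</code>. The trailing colon matters: Basic Authentication encodes a username:password pair, and here the password is blank.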

3. Set up the headers for the request

Here’s how it should look: <code class="blog_inline-code">headers = { "Authorization": f"Basic {base64_api_key}", "Content-Type": "application/json" }</code>. The header value is the word Basic, a space, and the encoded credentials.
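
Steps 2 and 3 can be folded into one small helper. The sketch below is a minimal version, assuming the key is sent as the Basic Authentication username with a blank password (hence the colon appended before encoding). The name `build_auth_headers` is hypothetical and not part of any library.

```python
import base64


def build_auth_headers(api_key: str) -> dict:
    """Return request headers for HTTP Basic Auth with a blank password."""
    # Basic Auth encodes "username:password"; with an empty password the
    # credential string is just the key followed by a colon.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    }


headers = build_auth_headers("your_api_key")  # placeholder key
```

Decoding the token back with `base64.b64decode` should give your key followed by a colon, which is a quick way to sanity-check the header before sending it.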

4. Define the initial endpoint

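In this case, the endpoint is https://harvest.greenhouse.io/v1/candidates. Store it in a variable so the loop in step 6 can update it: <code class="blog_inline-code">endpoint = 'https://harvest.greenhouse.io/v1/candidates'</code>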
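
If you want fewer round trips, the starting URL can carry query parameters. The sketch below assumes the candidates endpoint accepts a `per_page` parameter, which this article does not cover, so check the Greenhouse documentation for the supported values. `build_endpoint` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://harvest.greenhouse.io/v1/candidates"


def build_endpoint(base_url: str = BASE_URL, **params) -> str:
    """Append query parameters to the base URL, if any were given."""
    if not params:
        return base_url
    return f"{base_url}?{urlencode(params)}"


endpoint = build_endpoint(per_page=100)  # per_page is an assumed parameter
```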

5. Initialize an empty list to store the candidates

This can be as simple as the following: <code class="blog_inline-code">candidates = []</code>

6. Loop until the endpoint is empty (i.e., we've reached the last page)

Here’s the code that facilitates this logic: 

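while endpoint:
    response = requests.get(endpoint, headers=headers)
    response.raise_for_status()  # Raise an exception if the request failed
    data = response.json()
    candidates.extend(data)
    # requests parses the Link response header into response.links;
    # with no "next" entry, endpoint becomes None and the loop ends
    endpoint = response.links.get('next', {}).get('url')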
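
The loop can also be wrapped in a function, which makes it easier to test without calling the API. This is a sketch rather than a definitive implementation: it assumes each response exposes the parsed Link header through a `links` attribute, as `requests.Response` does, and `fetch_all_candidates` and its `get` parameter are hypothetical names.

```python
from typing import Callable, Optional


def fetch_all_candidates(endpoint: str, headers: dict, get: Callable) -> list:
    """Follow rel="next" links until no further page is advertised."""
    candidates = []
    url: Optional[str] = endpoint
    while url:
        response = get(url, headers=headers)
        response.raise_for_status()  # Stop on 4xx/5xx responses
        candidates.extend(response.json())  # Each page is a JSON array
        # `links` maps each rel value to its parsed link; no "next" entry
        # means this was the last page, so url becomes None and the loop ends.
        url = response.links.get("next", {}).get("url")
    return candidates
```

Passing `requests.get` as the `get` argument, together with the endpoint and headers from the earlier steps, makes real calls; passing a stub lets you exercise the paging logic offline.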

An example of a response

At this point, candidates should contain all the data.

To recap, this Python script sends a GET request to the API endpoint defined above, fetches candidates' data, and stores it in the <code class="blog_inline-code">candidates</code> list. The script uses Basic Authentication with the API key provided and iterates through all pages of the data by following the next link that Greenhouse returns in the Link response header. Moreover, the <code class="blog_inline-code">response.raise_for_status()</code> line will raise an HTTP error if one occurred, which can be helpful for debugging.

After the script finishes, the <code class="blog_inline-code">candidates</code> variable contains the candidate data, which can be processed as needed. An individual item returned by this API endpoint can look as follows:


{
    "id": "1627289354398",
    "tags": [
        "developer",
        "newyork",
        "remote"
    ],
    "title": "Software Developer",
    "company": "Tech Solutions",
    "addresses": [
        {
            "type": "home",
            "value": "123 Main St, New York, NY"
        }
    ],
    "can_email": true,
    "last_name": "Smith",
    "recruiter": {
        "id": "1627289354399",
        "name": "John Doe",
        "last_name": "Doe",
        "first_name": "John"
    },
    "created_at": "2021-07-26T14:22:34Z",
    "educations": [
        {
            "id": "1627289354400",
            "degree": "Bachelor of Science in Computer Science",
            "discipline": "Computer Science",
            "school_name": "Harvard University"
        }
    ],
    "first_name": "John",
    "is_private": false,
    "updated_at": "2021-07-26T14:22:34Z",
    "coordinator": {
        "id": "1627289354401",
        "name": "Jane Doe",
        "last_name": "Doe",
        "first_name": "Jane"
    },
    "applications": [
        {
            "id": "1627289354402",
            "jobs": [
                {
                    "id": "1627289354403",
                    "name": "Software Developer"
                }
            ],
            "source": {
                "id": "1627289354404",
                "public_name": "LinkedIn"
            },
            "status": "Active",
            "prospect": false,
            "applied_at": "2021-07-26T14:22:34Z",
            "credited_to": {
                "id": "1627289354405",
                "name": "John Doe",
                "last_name": "Doe",
                "first_name": "John"
            },
            "candidate_id": "1627289354398",
            "current_stage": {
                "id": "1627289354406",
                "name": "Screening"
            },
            "last_activity_at": "2021-07-26T14:22:34Z"
        }
    ],
    "last_activity": "2021-07-26T14:22:34Z",
    "phone_numbers": [
        {
            "type": "home",
            "value": "1627289354407"
        }
    ],
    "email_addresses": [
        {
            "type": "home",
            "value": "johnsmith@example.com"
        }
    ],
    "website_addresses": [
        {
            "type": "LinkedIn",
            "value": "https://linkedin.com/in/johnsmith"
        }
    ],
    "social_media_addresses": [
        {
            "value": "https://twitter.com/johnsmith"
        }
    ]
}
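
Once you have the list, you will usually reduce each record to the fields you care about. The sketch below assumes records shaped like the sample above (`first_name`, `last_name`, `email_addresses`, and `applications` with a `current_stage`). `summarize_candidate` is a hypothetical helper, and it reads fields defensively because real records may omit some of them.

```python
def summarize_candidate(candidate: dict) -> dict:
    """Flatten one candidate record into a small summary dict."""
    first = candidate.get("first_name") or ""
    last = candidate.get("last_name") or ""
    # Keep only the address strings, skipping entries without a "value" key
    emails = [e["value"] for e in candidate.get("email_addresses", []) if "value" in e]
    # One stage name per application; None when an application has no stage
    stages = [
        (app.get("current_stage") or {}).get("name")
        for app in candidate.get("applications", [])
    ]
    return {
        "id": candidate.get("id"),
        "name": f"{first} {last}".strip(),
        "emails": emails,
        "stages": stages,
    }
```

Applied to the whole result, this could look like `summaries = [summarize_candidate(c) for c in candidates]`.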

Want to integrate with more ATS platforms?

For those striving to connect their product with multiple ATS applications, Merge's ATS Unified API offers an ideal solution. 

It enables access to a broad range of ATS platforms, such as Greenhouse, with just a single call. This eliminates the necessity for numerous integrations, thereby streamlining your system integration process. By doing so, you can dedicate more time and resources towards enhancing your primary product, all while broadening your potential market reach.

To learn more about Merge, you can request a demo.