Building API and Dashboard of HubSpot Data

Project Overview

The primary objective was to seamlessly integrate HubSpot data with AWS, enabling real-time data analysis and visualization in Amazon QuickSight. "We designed a scalable, automated analytics pipeline by extracting data from HubSpot, storing it in AWS S3 and Glue, and leveraging SPICE in QuickSight," Sean shared. The result was a streamlined approach that allows real-time visualization of CRM data directly in QuickSight.

Accessing HubSpot Data

Accessing HubSpot data required setting up a private app within HubSpot, which granted API access. "Using the API key, I referenced HubSpot's API documentation to determine the key endpoints needed for data extraction," he explained. His approach focused on retrieving data from the Contacts, Companies, and Deals endpoints through GET requests, ensuring all necessary information was available for analysis.

Data Storage

"Once the data was retrieved, I saved it in Parquet format, which is highly efficient for large-scale analytics," Sean noted. These Parquet files were written to AWS S3 and registered in the AWS Glue Data Catalog, making them easily queryable within AWS.

Data Refresh Strategy

The team implemented an incremental update strategy to avoid unnecessary full refreshes. "Instead of pulling all data every time, we only retrieve new or updated records," he said. This approach significantly reduced processing time and prevented data duplication.

Additionally, he emphasized the importance of handling date fields properly. "Date fields are a common issue in API data extraction, so I carefully inspected the data types in the extracted DataFrames before saving them to Parquet to ensure proper formatting," he added.

Pipeline Automation

To eliminate manual data extraction, Sean leveraged AWS Lambda for automation. "We set up a Lambda function that runs on a scheduled basis using Amazon EventBridge," Sean explained. This function automates API requests to HubSpot, retrieves the latest data, and saves it as Parquet files in S3.

The integration with AWS Glue ensured that data remained queryable in QuickSight. "By using serverless execution, the automation is cost-efficient and requires no manual intervention while maintaining fresh data availability," he shared.

Tools and Programming Languages

Python was the primary language used for this project due to its flexibility and extensive support for data manipulation. "We used several key libraries to streamline the process," he said, listing:

requests — for making API calls
pandas — for structuring and cleaning data
pyarrow — for converting data to Parquet format
boto3 — for uploading data to S3
awswrangler — for interacting with AWS Glue

"These libraries allowed us to handle incremental updates efficiently and ensure smooth data ingestion," he added.

HubSpot Base Tables

The primary HubSpot tables accessed included:

Contacts — containing customer and lead details
Companies — storing organizational information
Deals — tracking sales opportunities

Careful data type handling was essential. "We converted date fields to `datetime64` in Pandas to avoid format mismatches and ensured that ID and numeric fields were stored as `int64` or `float64` for accurate aggregations," Sean explained.

Data Transformations and Joins

"Absolutely," Sean said when asked about transformations. He detailed how he created a joined table combining Contacts and Companies data to enhance visualization and simplify analysis. "This table includes key details like contact names, email, company name, industry, and lifecycle stage, making filtering and segmentation in QuickSight much easier," he explained.

The join was performed using the company ID as the key, ensuring a structured one-to-many relationship. "By merging these datasets, we unlocked valuable insights, such as tracking engagement levels by company and understanding customer interactions on an organizational level," Sean added.

He also emphasized that all transformations, including field mappings, data cleaning, and joins, were handled at the DataFrame level before storage. "This approach gives us greater flexibility while optimizing performance," he noted.

Why Amazon QuickSight?

QuickSight was the natural choice due to its seamless integration with AWS services. "It offers several key benefits," Sean highlighted:

SPICE storage boosts performance and allows for rapid analysis.
Native support for Parquet files makes it easy to connect data from AWS Glue and S3.
Dynamic filtering and drill-down capabilities enhance analytics flexibility.

"Once we structured the Parquet files correctly, I connected them to QuickSight, built a new analysis, and designed interactive dashboards to visualize the data effectively," Sean shared.

Final Outcome

The end product was a fully automated, scalable, and efficient data pipeline. "This integration enables us to track key business metrics, analyze trends, and make data-driven decisions without any manual data handling," he said.

The combination of AWS Lambda automation, AWS Glue cataloging, and QuickSight visualization has transformed how the team interacts with their HubSpot data. "With real-time data availability, we can now respond quickly to business needs and uncover insights that drive strategic decisions," Sean concluded.