Project Overview
To streamline marketing operations in the healthcare space, I built and deployed a pair of custom-coded Python tools hosted on a Linux-based Google Cloud server. These tools were designed to:
- Enrich incomplete HubSpot contact data by integrating Apollo.io’s API.
- Scrape Reddit healthcare subreddits to extract negative patient sentiment about treatments, feeding our content and campaign strategy.
Problems
Our marketing team was struggling with two major challenges:
- Incomplete CRM contact records were limiting personalization and segmentation—vital for healthcare outreach.
- We lacked authentic patient insight for ad messaging. Traditional market research wasn’t giving us the unfiltered opinions we needed.
Solution Overview
I proposed and implemented two automated tools:
- Tool 1: CRM Contact Enrichment
- Tool 2: Reddit Sentiment Analysis Engine
Both tools ran on a Google Cloud Linux server, scheduled with cron jobs, and were built using Python with integrated API authentication, logging, and error handling.
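As a rough illustration of the logging and error-handling wrapper described above, a cron entry point could look something like the sketch below. The function names (`configure_logging`, `run_job`) and the log format are assumptions made for illustration, not the original implementation.

```python
import logging


def configure_logging(log_path=None):
    """Log to a file when log_path is given, otherwise to stderr."""
    logging.basicConfig(
        filename=log_path,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )


def run_job(job, name="job"):
    """Run job() and return a process exit code (0 = success, 1 = failure)."""
    log = logging.getLogger(name)
    log.info("starting")
    try:
        job()
    except Exception:
        # log.exception records the full traceback for later debugging.
        log.exception("failed")
        return 1
    log.info("finished")
    return 0


# Hypothetical usage in a script that cron invokes once a day:
#   configure_logging("enrich_contacts.log")
#   raise SystemExit(run_job(sync_contacts, name="enrich_contacts"))
```

Returning an exit code instead of letting the exception escape keeps the traceback in the log file while still letting cron (or any monitoring around it) see that the run failed.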
Tool 1: CRM Contact Enrichment via Apollo.io
Goal: Automatically fill in missing HubSpot contact fields (like title, company, location) using Apollo.io’s enriched data.
Tech Stack:
- Python
- HubSpot API
- Apollo.io API
- Google Cloud (Compute Engine)
- cron jobs
Challenge: Apollo.io’s API had a limit of 600 requests/day, so naively looping through every contact would exhaust the quota and fail partway through a run.
Solution: I implemented rate limiting and retry logic using Python’s time.sleep() and custom backoff strategies.
Snippet
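import os
import time
import requests

# Assumption: the API key comes from an environment variable; adjust the header to your auth scheme.
headers = {"X-Api-Key": os.environ.get("APOLLO_API_KEY", "")}

def call_apollo_api(email, max_retries=5):
    """Look up one contact by email, pausing and retrying when rate limited."""
    for _ in range(max_retries):
        try:
            response = requests.get(
                "https://api.apollo.io/v1/match",
                params={"email": email},  # requests URL-encodes the address
                headers=headers,
                timeout=30,
            )
            if response.status_code == 429:
                print("Rate limit hit. Sleeping...")
                time.sleep(60)
                continue  # retry in a loop rather than recursing without a limit
            return response.json()
        except Exception as e:
            print(f"Error fetching Apollo data: {e}")
            return None
    return None  # still rate limited after max_retries attempts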
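The snippet above sleeps for a fixed 60 seconds on every 429. A backoff strategy combined with a daily request budget could be sketched as follows. This is a minimal illustration under stated assumptions: `RateLimited`, `fetch_with_backoff`, and `enrich_batch` are hypothetical names, the `fetch` callable is expected to raise `RateLimited` on HTTP 429, and retries are not counted against the budget, which is a simplification.

```python
import random
import time

DAILY_LIMIT = 600  # request quota quoted above


class RateLimited(Exception):
    """Hypothetical signal raised by the fetch callable on HTTP 429."""


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Exponential backoff with jitter: a random delay up to base * 2**attempt, capped."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def fetch_with_backoff(fetch, email, max_retries=5, sleep=time.sleep):
    """Call fetch(email), waiting longer after each rate-limited attempt."""
    for attempt in range(max_retries):
        try:
            return fetch(email)
        except RateLimited:
            sleep(backoff_delay(attempt))
    return None  # give up after max_retries rate-limited attempts


def enrich_batch(emails, fetch, budget=DAILY_LIMIT, sleep=time.sleep):
    """Process at most `budget` contacts; return (results, leftover emails)."""
    todo, leftover = emails[:budget], emails[budget:]
    results = {e: fetch_with_backoff(fetch, e, sleep=sleep) for e in todo}
    return results, leftover
```

The leftover list can be persisted and picked up by the next daily run, so a large contact list is worked through over several days instead of failing midway.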
Problem: Matching Fields Correctly
Challenge: Data from Apollo didn’t always match HubSpot’s field format (e.g., job_title vs. title), and some fields were missing altogether.
Solution: I created a mapping dictionary and fallback logic to keep the two in sync.
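# Apollo field name -> HubSpot property name
apollo_to_hubspot = {
    "title": "job_title",
    "organization_name": "company",
    "location": "city",
}

def map_fields(apollo_data):
    # Fall back to an empty string when Apollo has no value for a field.
    return {hub_key: apollo_data.get(apollo_key, '')
            for apollo_key, hub_key in apollo_to_hubspot.items()}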
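One way to express the fallback logic is to update only those HubSpot properties that are currently empty and for which Apollo actually has a value, so existing data is never overwritten with blanks. The sketch below assumes both records are plain dictionaries of strings; `build_update` is a hypothetical helper name, and the mapping repeats the one above.

```python
# Apollo field name -> HubSpot property name (same mapping as above).
APOLLO_TO_HUBSPOT = {
    "title": "job_title",
    "organization_name": "company",
    "location": "city",
}


def build_update(hubspot_props, apollo_data):
    """Return only the HubSpot properties that are empty now and that Apollo can fill."""
    update = {}
    for apollo_key, hub_key in APOLLO_TO_HUBSPOT.items():
        # Treat None, missing keys, and whitespace-only strings as empty.
        current = (hubspot_props.get(hub_key) or "").strip()
        candidate = (apollo_data.get(apollo_key) or "").strip()
        if not current and candidate:
            update[hub_key] = candidate
    return update
```

An empty result means there is nothing to sync for that contact, so the HubSpot update call can be skipped entirely.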
Results
- 35% improvement in contact completeness
- Enabled more accurate segmentation for outreach campaigns
- Fully automated via cron job (daily sync)
Tool 2: Reddit Sentiment Scraper
Goal: Extract real-world complaints and concerns from Reddit posts to guide our campaign messaging.
Problem: Scraping Reddit alone gave us huge volumes of text, but much of it was:
- Neutral
- Off-topic
- Sarcastic or linguistically complex
Basic keyword filters weren’t enough, and simpler sentiment tools like TextBlob didn’t understand Reddit’s nuance or tone.
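For context, the collection step that produces this raw text might look roughly like the sketch below. It assumes a PRAW-style client (a `praw.Reddit` instance created elsewhere with your own credentials) is passed in; the write-up does not say which library was actually used, and `collect_posts` is a hypothetical name.

```python
def collect_posts(reddit, subreddits, limit=100):
    """Yield (subreddit, text) pairs from the newest submissions.

    `reddit` is assumed to be a praw.Reddit instance created elsewhere.
    """
    for name in subreddits:
        for submission in reddit.subreddit(name).new(limit=limit):
            # Combine title and body, since complaints often appear in either.
            text = f"{submission.title}\n{submission.selftext}".strip()
            if text:
                yield name, text
```

Keeping the client as a parameter makes the function easy to exercise with a stub object, without network access or credentials.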
Solution: VADER Sentiment + Custom Filters
I integrated VADER from NLTK for its accuracy with social media-style text and short, casual phrasing.
Snippet
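import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time download of the VADER lexicon
analyzer = SentimentIntensityAnalyzer()

def is_negative_vader(text):
    scores = analyzer.polarity_scores(text)
    # Require a strongly negative overall score and a meaningful share of negative words.
    return scores['compound'] <= -0.4 and scores['neg'] > 0.2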
Experimentation
I went through multiple rounds of tuning to improve the quality of scraped posts:
- Thresholds: Tested compound cutoffs from -0.2 to -0.6. Found that compound <= -0.4 with neg > 0.2 consistently surfaced truly negative posts.
- Sarcasm Handling: Excluded posts where pos > neg but compound was still negative. This helped filter out false positives caused by sarcasm, i.e., posts flagged as negative that were not genuine complaints.
- Keyword Layering: Combined sentiment filters with domain-specific keywords like "pain clinic", "treatment failed", "doctor won’t listen", etc.
def is_valid_post(text, keywords):
    sentiment = analyzer.polarity_scores(text)
    negative_enough = sentiment['compound'] <= -0.4 and sentiment['neg'] > 0.2
    # Keywords are expected in lowercase, since the text is lowercased before matching.
    keyword_hit = any(k in text.lower() for k in keywords)
    return negative_enough and keyword_hit
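The first two tuning steps (threshold sweeps and the pos > neg exclusion) can be sketched in a way that works on already-computed VADER score dictionaries, so cutoffs can be compared without re-scoring text. This is an illustrative sketch, not the original tuning code: `passes_filters` and `sweep_cutoffs` are hypothetical names, and it assumes a small hand-labelled sample of (scores, is_truly_negative) pairs is available.

```python
def passes_filters(scores, compound_cutoff=-0.4, neg_floor=0.2):
    """Apply the thresholds and the pos > neg exclusion to one VADER score dict."""
    if scores["pos"] > scores["neg"]:
        return False  # mixed signal: more positive than negative wording
    return scores["compound"] <= compound_cutoff and scores["neg"] > neg_floor


def sweep_cutoffs(labelled, cutoffs, neg_floor=0.2):
    """For each cutoff, report (posts kept, precision) on hand-labelled samples.

    `labelled` is a list of (scores, is_truly_negative) pairs.
    """
    report = {}
    for cutoff in cutoffs:
        kept = [truth for scores, truth in labelled
                if passes_filters(scores, cutoff, neg_floor)]
        # Precision is undefined when nothing passes the filter.
        precision = sum(kept) / len(kept) if kept else None
        report[cutoff] = (len(kept), precision)
    return report
```

Comparing the kept count against precision for each cutoff makes the trade-off explicit: a looser cutoff keeps more posts but lets in more that are not genuinely negative.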
Sample Output
Subreddit | Excerpt | VADER compound score |
r/ChronicPain | “I’ve tried 5 treatments. Not one made a difference.” | -0.65 |
r/AskDocs | “Doc said it’s in my head. Still in constant pain.” | -0.58 |
r/PainManagement | “This injection made everything worse.” | -0.72 |
Why it mattered
- These posts became a goldmine of messaging inspiration.
- Our team could speak directly to concerns patients were actually expressing, not generic personas.
- It led to better resonance in ad copy, email hooks, and landing page language.