Building a Podcast Network Analytics Stack from Scratch

The typical podcast network analytics situation looks something like this: Apple Podcasts Connect for one view of downloads, Spotify for Podcasters for another, a third-party prefix like OP3 or Chartable for "source of truth" downloads, a spreadsheet the sales team built to track CPM and sponsor delivery, and whatever each hosting platform (Buzzsprout, Libsyn, Megaphone, RSS.com) shows in its own dashboard. A head of content trying to answer "which shows are growing, which are declining, and why" has to open five tabs and reconcile numbers that don't agree with each other.

This is not primarily a tooling problem. It's a measurement definition problem. Apple Podcasts Connect reports plays and listeners as measured inside Apple's own app. IAB v2.1 compliant hosting platforms count a download only when enough bytes have been transferred (roughly one minute of audio), after duplicate requests are removed and bots are filtered. Spotify's play count reflects streams, not downloads. If you're comparing these numbers directly, you're not doing analytics; you're doing numerical fiction.

Starting with the Right Measurement Foundation

Before you build anything, you need one authoritative download count. That number should come from an IAB v2.1 certified hosting platform or a certified third-party measurement prefix. IAB certification means the platform has been audited against the Interactive Advertising Bureau's podcast measurement guidelines — bot filtering applied, duplicate downloads within 24-hour windows removed, minimum byte thresholds enforced. This is the number your sponsors will accept as contractual delivery. Everything else is directional.
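To make the three rules above concrete, here is a minimal sketch of that style of counting, written in Python. It is not the audited logic of any certified platform: the byte threshold, the bot markers, the `Request` fields, and the choice of IP plus user agent as the listener key are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed values for illustration only; certified platforms apply audited rules.
MIN_BYTES = 1_000_000                       # hypothetical "enough bytes" threshold
BOT_MARKERS = ("bot", "crawler", "spider")  # hypothetical user-agent substrings
WINDOW = timedelta(hours=24)                # deduplication window


@dataclass(frozen=True)
class Request:
    episode_id: str
    ip: str
    user_agent: str
    bytes_sent: int
    at: datetime


def count_downloads(requests):
    """Count downloads per episode after bot filtering, a byte threshold,
    and 24-hour deduplication per (episode, ip, user agent)."""
    last_counted = {}  # listener key -> time of the last counted download
    counts = {}
    for r in sorted(requests, key=lambda r: r.at):
        if any(marker in r.user_agent.lower() for marker in BOT_MARKERS):
            continue  # bot traffic
        if r.bytes_sent < MIN_BYTES:
            continue  # too little audio transferred
        key = (r.episode_id, r.ip, r.user_agent)
        prev = last_counted.get(key)
        if prev is not None and r.at - prev < WINDOW:
            continue  # duplicate inside the 24-hour window
        last_counted[key] = r.at
        counts[r.episode_id] = counts.get(r.episode_id, 0) + 1
    return counts
```

The point of the sketch is that every rule removes requests, which is why a raw request log will always show a larger number than a certified download count.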

The OP3 (Open Podcast Prefix Project) prefix is worth understanding for networks that want an open-source, privacy-respecting way to add a measurement layer on top of any hosting platform. OP3 logs individual download events and makes the anonymized data available via API. It doesn't replace an IAB-certified platform, but for networks that are locked into a hosting platform without strong analytics, it adds a consistent counting layer across all shows.

Once you have a consistent download source, you can start building the stack. The components most networks actually need, in order of priority:

  1. Episode-level download data, normalized to IAB v2.1, queryable by show and date range
  2. Retention data (completion rates, average consumption percentage) from Apple Podcasts Connect and Spotify for Podcasters — these are directional, not exact, but they're the only retention signal available at scale
  3. Subscriber/follower counts over time from Apple and Spotify, tracked weekly
  4. Geographic breakdown by DMA and country, at minimum monthly
  5. Social clip performance data if you're publishing social content

The Gap Between What Platforms Provide and What You Actually Need

Apple Podcasts Connect provides episode-level data, but its API access is limited and the data export format has changed multiple times. The platform's "listeners" metric (unique devices that played any part of an episode), "engaged listeners" metric (devices that played at least 20 minutes or 40% of an episode), and "plays" metric (any playback event) are useful for understanding episode engagement, but none of them are downloads and they shouldn't be treated as downloads.

Spotify for Podcasters provides streams, starts, and listeners — all of which measure Spotify-native behavior only. If your show has 20% Spotify listenership, Spotify data represents a fifth of your audience. Using Spotify's retention data as a proxy for overall audience behavior requires adjusting for the fact that Spotify listeners skew younger, are more likely to be listening on mobile, and are more likely to be reached through algorithmic discovery rather than RSS subscription. The retention patterns are real, but they're Spotify-specific.

Neither platform provides show-to-show cross-promotion attribution, audience overlap analysis, or cohort-level retention (how do listeners acquired through social clips compare to organic RSS subscribers in their long-term completion rates?). For a network managing 15+ shows, these are exactly the questions that drive programming and promotional decisions. They require building outside the native platform dashboards.

What a Minimal Viable Analytics Stack Looks Like

For a growing network building from scratch, the minimum viable stack has three layers:

Layer 1: Data collection. Your IAB v2.1 certified hosting platform is the source of truth for downloads. Add platform-level pulls for Apple Podcasts Connect and Spotify for Podcasters, using an API where one is available and scheduled exports where it is not. If you're running social clips, add pulls from each platform's content API (YouTube Analytics, Instagram Insights, TikTok Analytics). Schedule these pulls daily or weekly, depending on how often you actually make decisions with the data.

Layer 2: Storage and normalization. All of this data needs to land somewhere queryable. For networks with technical resources, a data warehouse (BigQuery, Snowflake, or even a Postgres instance) gives you the flexibility to define consistent metrics across shows, such as a "listener" defined the same way whether the data came from Apple or Spotify. For networks without dedicated data engineering, a spreadsheet-based approach using Google Sheets and Zapier/Make automations can deliver most of the value with a fraction of the complexity, provided you're disciplined about schema.
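One way to stay disciplined about schema, whether the rows land in a warehouse or a sheet, is to force every pull into a single long-format row shape. The sketch below is an assumption about what that could look like: `MetricRow`, `FIELD_MAP`, and the raw field names are hypothetical and do not reflect any platform's actual export format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetricRow:
    show_id: str
    episode_id: str
    day: date
    source: str   # "hosting", "apple", or "spotify"
    metric: str   # canonical metric name, e.g. "downloads"
    value: float


# Hypothetical raw field names per source, mapped to canonical metric names.
FIELD_MAP = {
    "hosting": {"downloads": "downloads"},
    "apple": {"listeners": "listeners", "plays": "plays"},
    "spotify": {"listeners": "listeners", "streams": "streams", "starts": "starts"},
}


def normalize(source, show_id, episode_id, day, raw):
    """Turn one raw platform record into zero or more MetricRow entries.
    Fields that are not in FIELD_MAP for this source are ignored."""
    rows = []
    for raw_field, metric in FIELD_MAP[source].items():
        if raw_field in raw:
            rows.append(
                MetricRow(show_id, episode_id, day, source, metric, float(raw[raw_field]))
            )
    return rows
```

Keeping `source` on every row is a deliberate choice here: it prevents Spotify listeners and Apple listeners from being summed by accident, while still letting a report compare them side by side.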

Layer 3: Reporting layer. This is where VP-level decisions get made. A weekly network performance report needs: downloads by show this week vs. last week vs. 90 days ago; subscriber trends; episode-level top and bottom performers by completion rate; and a cross-show comparison on whatever your current growth metric is. Looker Studio, Metabase, or even a well-structured Google Sheet with charts serves this function. The goal is one view, not five tabs.
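The downloads comparison in that report is simple arithmetic once daily counts are stored. A rough sketch follows, assuming daily downloads for one show are held in a dict keyed by date, and assuming "90 days ago" means the 7-day window ending 90 days before the current week's end. Both are illustrative choices, and the function names are hypothetical.

```python
from datetime import date, timedelta


def window_total(daily, end, days=7):
    """Sum downloads over the `days`-day window ending on `end` (inclusive)."""
    return sum(daily.get(end - timedelta(days=i), 0) for i in range(days))


def pct_change(current, baseline):
    """Percentage change versus baseline, or None when there is no baseline."""
    if baseline == 0:
        return None
    return round((current - baseline) / baseline * 100, 1)


def weekly_summary(daily, week_end):
    """Compare this week with last week and with the week ending 90 days earlier."""
    this_week = window_total(daily, week_end)
    last_week = window_total(daily, week_end - timedelta(days=7))
    ninety_ago = window_total(daily, week_end - timedelta(days=90))
    return {
        "this_week": this_week,
        "last_week": last_week,
        "week_90_days_ago": ninety_ago,
        "vs_last_week_pct": pct_change(this_week, last_week),
        "vs_90_days_ago_pct": pct_change(this_week, ninety_ago),
    }
```

Returning None instead of a percentage when the baseline is zero keeps newly launched shows from appearing as infinite growth in the report.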

Where Networks Consistently Get This Wrong

The most common failure mode isn't under-investment in tooling — it's conflating metrics that measure different things. Downloads measure reach. Completion rates measure engagement per episode. Subscriber growth measures long-term retention momentum. A show with declining downloads but improving completion rates is a different problem than a show with declining downloads and declining completion rates. Treating "downloads are down" as the only signal leads to the wrong interventions.
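A small sketch of keeping those signals separate: group shows by the direction of both downloads and completion rate, so that "down and up" never lands in the same bucket as "down and down". The tolerance values and the input shape are assumptions for illustration, not recommended thresholds.

```python
def trend(change, tolerance):
    """Label a change as 'up', 'down', or 'flat' relative to a tolerance."""
    if change > tolerance:
        return "up"
    if change < -tolerance:
        return "down"
    return "flat"


def group_by_signal(shows, dl_tol=2.0, comp_tol=1.0):
    """Group show ids by (downloads trend, completion trend).

    `shows` maps show id -> (downloads change in percent,
    completion-rate change in percentage points).
    """
    groups = {}
    for show_id, (dl_change_pct, completion_change_pts) in shows.items():
        key = (trend(dl_change_pct, dl_tol), trend(completion_change_pts, comp_tol))
        groups.setdefault(key, []).append(show_id)
    return groups
```

Each resulting group can then be reviewed on its own terms instead of being collapsed into a single "downloads are down" list.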

The second failure mode is measuring shows against each other when they shouldn't be. A true crime narrative show and a business interview show shouldn't have the same benchmarks. True crime narrative shows on growing networks typically see 40–65% completion rates and higher episode-to-episode retention because listeners follow story arcs. Business interview shows typically see 35–55% completion and lower carry-through rates because each episode is more standalone. Averaging these together into a single "network completion rate" tells you almost nothing useful.
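In code, this amounts to comparing each show against its own format's range rather than a network-wide average. The sketch below reuses the two ranges quoted above; the genre keys and function name are hypothetical, and a real network would substitute ranges derived from its own data.

```python
# Completion-rate ranges (percent) taken from the figures quoted in the text.
BENCHMARKS = {
    "true_crime_narrative": (40.0, 65.0),
    "business_interview": (35.0, 55.0),
}


def position_in_range(genre, completion_pct):
    """Return 'below', 'within', or 'above' relative to the genre's range."""
    low, high = BENCHMARKS[genre]
    if completion_pct < low:
        return "below"
    if completion_pct > high:
        return "above"
    return "within"
```

The same 56% completion rate reads as "above" for a business interview show and "within" for a true crime narrative show, which is exactly the distinction a single network average hides.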

We're not saying platform-native analytics are bad — they're genuinely useful for understanding behavior within each platform's ecosystem. We're saying they're insufficient as a single source of truth for network-level programming decisions, and treating them as such leads to decisions based on incomplete data.

What to Build vs. What to Buy

Networks frequently ask whether they should build their own analytics infrastructure or use a third-party solution. The honest answer depends on your technical resources and decision velocity. If your team can't run a SQL query, building a custom data warehouse is probably not the right first step — the overhead of maintaining it will quickly exceed the operational benefits.

What matters more than tooling is defining your measurement standards clearly before you pick a tool. Know what your "download" means, know which platform's retention data you're using and why, know what you're comparing when you compare two shows. The best analytics stack for a 12-show network is the one the team actually uses to make decisions weekly — not the most technically sophisticated one that gets abandoned because it's too hard to query.

Whatever stack you build, design it so that extending it gets easier over time. New shows should automatically inherit the same measurement infrastructure as existing shows. New platforms should have defined onboarding procedures for how their data maps to your existing schema. The goal is a system where adding a show is a configuration task, not a project.
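As a sketch of what "configuration task" could mean in practice, assume shows live in a small registry and the pull schedule is derived from it. `ShowConfig`, `DEFAULT_SOURCES`, and `pull_jobs` are hypothetical names, not part of any tool mentioned here.

```python
from dataclasses import dataclass

# Sources every show inherits unless its entry overrides them (assumed names).
DEFAULT_SOURCES = ("hosting", "apple", "spotify")


@dataclass(frozen=True)
class ShowConfig:
    show_id: str
    genre: str
    sources: tuple = DEFAULT_SOURCES


def pull_jobs(registry):
    """Derive the list of (show_id, source) pulls from the registry."""
    return [(show.show_id, source) for show in registry for source in show.sources]


# Adding a show is one new entry; the pulls follow from the registry.
REGISTRY = [
    ShowConfig("show-a", "true_crime_narrative"),
    ShowConfig("show-b", "business_interview", sources=("hosting", "spotify")),
]
```

With this shape, onboarding a new platform means adding one source name and its field mapping, and every show that uses the defaults picks it up.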