How to Build Spotify's "Discover Weekly": The Hybrid Filtering Playbook

Learn how to build personalized music recommendations like Spotify's Discover Weekly: blend collaborative filtering (ELSA) with AI content enrichment (genre/mood extraction), then combine both signals in ShapedQL for hybrid recommendations.

Quick Answer: Collaborative Signals + Content Understanding

Spotify’s Discover Weekly works because it combines two different types of intelligence:

  1. Collaborative filtering (ELSA): “Users who listened to these tracks also listened to…”
  2. Content understanding: Genre, mood, tempo, instrumentation extracted from track metadata

Neither works well alone:

  • Collaborative filtering alone can’t recommend new releases (no listening history yet)
  • Content-based filtering alone misses the subtle taste patterns that make recommendations feel magical

Hybrid filtering blends both signals. The result: personalized recommendations that work for new releases (using content features) and surface hidden gems based on taste patterns (using collaborative signals).

Key Takeaways:

  • Pure collaborative filtering fails for new items — no listening history = can’t recommend
  • Pure content-based fails for discovery — matching genres is too obvious, not surprising
  • Hybrid blending solves both — use ELSA for taste patterns, content features for cold-start
  • AI enrichment extracts content — LLMs pull genre/mood/vibe from track metadata automatically
  • Score ensembles combine signals — blend collaborative + content scores in ShapedQL

Time to read: 24 minutes | Includes: 8 code examples, 2 architectures, 1 comparison table

Table of Contents

  1. The Cold Start Problem
  2. Why Pure Collaborative Filtering Fails
  3. Why Pure Content-Based Fails
  4. Part 1: Traditional Hybrid Approach
  5. Part 2: The Shaped Way — ELSA + AI Enrichment
  6. Building the System
  7. Score Ensemble Strategies
  8. Comparison Table
  9. FAQ

The Cold Start Problem

You’re building a music recommendation system. A new track drops today. How do you recommend it?

The Dilemma

Collaborative filtering says: “I can’t recommend this track—no one has listened to it yet, so I don’t know who would like it.”

Content-based filtering says: “This track is tagged as ‘indie rock’. I’ll recommend it to users who listen to indie rock.”

But “indie rock” is too broad. Within that genre:

  • Arctic Monkeys sounds nothing like Tame Impala
  • The Strokes sounds nothing like Radiohead
  • Phoenix sounds nothing like The National

Genre tags alone don’t capture vibe, energy, instrumentation, or mood.

What Discover Weekly Does

Spotify blends:

  1. Collaborative signals — “Users who liked Tame Impala also liked Unknown Mortal Orchestra”
  2. Audio features — Tempo, energy, danceability, valence (mood), acousticness
  3. Text features — Genre tags, similar artists, playlist co-occurrences

When a new track drops, Spotify:

  1. Extracts audio features (tempo, energy, mood) from the audio file
  2. Finds tracks with similar audio features
  3. Uses collaborative patterns from those similar tracks to recommend the new release

Result: New tracks get recommended to the right users immediately, even with zero listening history.
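That fallback can be sketched in a few lines of numpy. The feature values and track names below are purely illustrative:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative audio features: [tempo / 200, energy, valence]
catalog = {
    "established_track_1": np.array([0.60, 0.80, 0.40]),
    "established_track_2": np.array([0.45, 0.30, 0.70]),
}
new_track = np.array([0.62, 0.75, 0.45])  # released today, zero listens

# Step 2 above: find the catalog track with the most similar audio profile,
# then reuse its collaborative signals to recommend the new release
nearest = max(catalog, key=lambda t: cosine(catalog[t], new_track))
print(nearest)  # → established_track_1
```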

Why Pure Collaborative Filtering Fails

Collaborative filtering learns user taste from interaction history: “Users who listened to X also listened to Y.”

How ELSA Works

ELSA (Scalable Linear Shallow Autoencoder) learns item-item relationships by reconstructing user listening vectors.

If many users listen to both Track A and Track B, ELSA learns they’re similar. If User 123 listened to Track A but not Track B, ELSA recommends Track B.

Training data:

User 1: [Track A, Track C, Track E]
User 2: [Track A, Track B, Track D]
User 3: [Track B, Track C, Track F]

ELSA learns:

  • Track A and Track B co-occur → similar
  • Track B and Track C co-occur → similar
  • Track A → Track B is a strong recommendation
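ELSA itself is a trained model, but the co-occurrence intuition can be reproduced with plain numpy on the toy listening data above (a simplification, not ELSA's actual algorithm):

```python
import numpy as np

# Toy user-item matrix from the training data above
# rows = Users 1-3, cols = Tracks A-F, 1 = listened
X = np.array([
    [1, 0, 1, 0, 1, 0],  # User 1: A, C, E
    [1, 1, 0, 1, 0, 0],  # User 2: A, B, D
    [0, 1, 1, 0, 0, 1],  # User 3: B, C, F
])

# Item-item co-occurrence: entry (i, j) counts users who played both tracks
cooc = X.T @ X
tracks = "ABCDEF"
print(cooc[tracks.index("A"), tracks.index("B")])  # → 1 (User 2 played both)
```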

The Cold Start Failure

New track drops: Track Z (released today)

Interaction history: Zero listens

ELSA says: “I have no data on Track Z. I can’t compute similarity to any other track. I can’t recommend it.”

Result: New releases don’t get recommended until they accumulate listening history. By the time ELSA can recommend them, they’re no longer new.

Other Collaborative Filtering Limitations

  1. Popularity bias: ELSA over-recommends popular tracks (they appear in many user histories)
  2. Echo chamber: ELSA reinforces existing taste, doesn’t help users discover new genres
  3. Sparse user problem: Users with few listens get poor recommendations (not enough history)
| Limitation | What Happens | Impact |
|---|---|---|
| Cold start | New releases have zero listening history | Not recommended until they go viral |
| Popularity bias | Popular tracks appear in more user histories | Top 1% of catalog gets 80% of recommendations |
| Echo chamber | Reinforces existing taste | Users never discover new genres |
| Sparse users | Few listens = not enough history | Poor recommendations for new users |

Why Pure Content-Based Fails

Content-based filtering recommends items similar to what the user already likes, based on features like genre, artist, tempo, or mood.

How It Works

  1. User listens to Track A (indie rock, 120 BPM, high energy)
  2. Find tracks with similar features (indie rock, 115-125 BPM, high energy)
  3. Recommend the most similar tracks
# content_similarity.py
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# User's recent listens
user_tracks = [
    {"title": "Do I Wanna Know?", "genre": "indie rock", "tempo": 85, "energy": 0.8},
    {"title": "R U Mine?", "genre": "indie rock", "tempo": 89, "energy": 0.9}
]

# Candidate track
candidate = {"title": "Feel It Still", "genre": "indie rock", "tempo": 79, "energy": 0.7}

# Build numeric feature vectors (genre would be one-hot encoded in practice)
user_avg_features = np.mean([[t["tempo"], t["energy"]] for t in user_tracks], axis=0)
candidate_features = np.array([candidate["tempo"], candidate["energy"]])

# Similarity score (cosine similarity on features)
similarity = cosine_similarity([user_avg_features], [candidate_features])[0][0]
# → High similarity, recommend it

Why This Fails for Discovery

Problem 1: Too obvious

If you only recommend tracks with similar features, you’re showing users what they already know.

User listens to Arctic Monkeys → Recommend The Strokes, The Libertines, Franz Ferdinand

These are obvious recommendations. Not surprising. Not delightful.

Problem 2: Feature mismatch

Genre and tempo don’t capture vibe.

“Indie rock, 120 BPM, high energy” could describe:

  • Arcade Fire - “Ready to Start” (anthemic, orchestral)
  • MGMT - “Electric Feel” (psychedelic, synth-heavy)
  • Yeah Yeah Yeahs - “Heads Will Roll” (post-punk, danceable)

All have the same metadata but completely different vibes.
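A quick sanity check makes the mismatch concrete. With only crude metadata features (values illustrative), three very different-sounding tracks are indistinguishable to cosine similarity:

```python
import numpy as np

# Crude metadata features: [genre_id, tempo, energy], identical for all three
ready_to_start  = np.array([1.0, 120.0, 0.9])  # Arcade Fire (anthemic, orchestral)
electric_feel   = np.array([1.0, 120.0, 0.9])  # MGMT (psychedelic, synth-heavy)
heads_will_roll = np.array([1.0, 120.0, 0.9])  # Yeah Yeah Yeahs (post-punk)

cos = float(np.dot(ready_to_start, electric_feel) /
            (np.linalg.norm(ready_to_start) * np.linalg.norm(electric_feel)))
print(round(cos, 3))  # → 1.0, yet the tracks sound nothing alike
```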

Problem 3: No taste learning

Content-based filtering doesn’t learn user taste. It only matches features.

If a user loves Radiohead but hates Muse (both tagged “alternative rock”), content-based can’t tell the difference. It recommends both.

Part 1: Traditional Hybrid Approach

The traditional approach combines collaborative and content-based models as separate systems, then blends their outputs with a weighted average.

Architecture

Collaborative stream:
  User listening history → ALS / Matrix Factorization → Top 100 candidates (collaborative score)

Content-based stream:
  Track metadata → TF-IDF + cosine similarity → Top 100 candidates (content score)

Weighted blend:
  final_score = 0.7 × collab_score + 0.3 × content_score

→ Rank by final_score → Top 20 recommendations

Implementation

Step 1: Train collaborative filtering model

# train_collaborative.py
from implicit.als import AlternatingLeastSquares
import scipy.sparse as sp

# Build user-item interaction matrix
# rows = users, cols = tracks, values = play count
interaction_matrix = sp.csr_matrix(...)

# Train ALS model
model = AlternatingLeastSquares(factors=64, iterations=15)
model.fit(interaction_matrix)

# Save model
model.save('als_model.pkl')

Step 2: Build content-based features

# build_content_features.py
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

# Load track metadata
tracks = pd.read_csv('tracks.csv')
# Columns: track_id, title, artist, genre, tempo, energy, valence

# Create text features from metadata
tracks['content_text'] = (
    tracks['genre'] + ' ' +
    tracks['artist'] + ' ' +
    'tempo_' + tracks['tempo'].astype(str) + ' ' +
    'energy_' + (tracks['energy'] * 10).astype(int).astype(str)
)

# TF-IDF vectorization
vectorizer = TfidfVectorizer(max_features=500)
content_vectors = vectorizer.fit_transform(tracks['content_text'])

Step 3: Generate recommendations by blending scores

# hybrid_recommender.py
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def get_hybrid_recommendations(user_id, n=20):
    # Get collaborative filtering candidates
    # (implicit >= 0.5 returns parallel arrays of track ids and scores)
    user_vector = interaction_matrix[user_id]
    ids, scores = als_model.recommend(user_id, user_vector, N=100, filter_already_liked_items=True)
    collab_dict = dict(zip(ids, scores))

    # Get content-based candidates (based on user's recent listens)
    user_recent_tracks = get_user_recent_tracks(user_id, n=10)
    user_content_vector = np.asarray(content_vectors[user_recent_tracks].mean(axis=0)).ravel()

    content_scores = {}
    for track_id in range(content_vectors.shape[0]):
        if track_id in user_recent_tracks:
            continue  # Skip already listened
        candidate = content_vectors[track_id].toarray().ravel()
        content_scores[track_id] = cosine_similarity([user_content_vector], [candidate])[0][0]

    # Keep the top 100 content candidates, then blend with collaborative scores
    content_dict = dict(sorted(content_scores.items(), key=lambda x: x[1], reverse=True)[:100])

    all_tracks = set(collab_dict) | set(content_dict)

    blended_scores = {}
    for track_id in all_tracks:
        # Weighted blend: 70% collaborative, 30% content
        blended_scores[track_id] = (0.7 * collab_dict.get(track_id, 0)
                                    + 0.3 * content_dict.get(track_id, 0))

    return sorted(blended_scores.items(), key=lambda x: x[1], reverse=True)[:n]

Problems with This Approach

| Problem | Why It Hurts |
|---|---|
| Two separate models to train and maintain | ALS for collaborative, TF-IDF for content — double the infrastructure |
| Manual feature engineering | You create "tempo_85 energy_8" text features by hand |
| Crude blending | Weighted average doesn't adapt to context (new releases vs. catalogue) |
| No unified scoring | Can't easily add a third signal (popularity, recency, user context) |
| Scaling issues | Content-based cosine similarity is O(n²) for n tracks |

Part 2: The Shaped Way — ELSA + AI Enrichment

Shaped unifies collaborative filtering (ELSA) and content understanding (AI enrichment) in a single engine, with flexible score blending in ShapedQL.

Architecture

Collaborative stream:
  User listening history → ELSA model training → ELSA embeddings

Content stream:
  Track metadata (title, artist, album) → AI Enrichment View (LLM extracts genre, mood, vibe, instruments) → Content embeddings (text-embedding-3-small)

ShapedQL query:
  Retrieve via ELSA similarity → Score ensemble: 0.7 × ELSA + 0.3 × content → Return top 20

Result: a personalized playlist that works for new releases and hidden gems

Key difference: Content features are extracted automatically by LLMs (not manual), and blending happens in the query (not in application code).

Implementation

Step 1: Create AI enrichment view

# views/track_enrichment.yaml
version: v2
name: track_enrichment
view_type: AI_ENRICHMENT
source_table: tracks
enrichment:
  prompt: |
    Given this music track:
    Title: {title}
    Artist: {artist}
    Album: {album}

    Extract the following as JSON:
    {
      "genre": "primary genre (e.g., indie rock, electronic, hip-hop)",
      "subgenre": "more specific subgenre",
      "mood": "overall mood (e.g., melancholic, energetic, chill)",
      "vibe": "vibe or feeling (e.g., dreamy, aggressive, uplifting)",
      "instruments": "primary instruments (e.g., guitar, synth, drums)",
      "era": "musical era or decade influence"
    }

  output_columns:
    - name: genre
      type: STRING
    - name: subgenre
      type: STRING
    - name: mood
      type: STRING
    - name: vibe
      type: STRING
    - name: instruments
      type: STRING
    - name: era
      type: STRING

This view runs an LLM over each track and extracts structured content features automatically.

Example enrichment output:

| track_id | title | artist | genre | mood | vibe | instruments |
|---|---|---|---|---|---|---|
| track_1 | Do I Wanna Know? | Arctic Monkeys | indie rock | brooding | dark, sultry | guitar, bass, drums |
| track_2 | Electric Feel | MGMT | psychedelic pop | euphoric | trippy, groovy | synth, bass, vocals |
| track_3 | Feel It Still | Portugal. The Man | indie pop | upbeat | retro, funky | guitar, bass, keys |

Step 2: Configure engine with ELSA + content embeddings

# engines/music_recommendations.yaml
version: v2
name: music_recommendations

data:
  item_table:
    name: tracks
  interaction_table:
    name: user_listens
  views:
    - name: track_enrichment  # AI-enriched content

index:
  # Collaborative filtering: ELSA embeddings
  - name: elsa_embedding
    encoder:
      name: elsa
      type: trained
    columns:
      - user_id
      - track_id
      - listen_count

  # Content-based: Text embeddings on enriched features
  - name: content_embedding
    encoder:
      name: text-embedding-3-small
      provider: openai
    columns:
      - track_enrichment.genre
      - track_enrichment.mood
      - track_enrichment.vibe
      - track_enrichment.instruments

training:
  models:
    - name: collaborative_ranker
      policy_type: elsa
      ranking_expression: |
        1.0 / (1.0 + rank(embedding="elsa_embedding"))

Shaped trains:

  1. ELSA on user listening history (collaborative filtering)
  2. Text embeddings on AI-enriched content features (genre, mood, vibe)

Both are stored in the same engine.

Step 3: Query with hybrid score ensemble

# app.py
import requests

SHAPED_API_KEY = "your-api-key"

def get_discover_weekly(user_id: str, limit: int = 20):
    """
    Hybrid recommendations: blend ELSA (collaborative) + content similarity
    """
    response = requests.post(
        "https://api.shaped.ai/v2/engines/music_recommendations/query",
        headers={"x-api-key": SHAPED_API_KEY},
        json={
            "query": """
                SELECT *
                FROM similarity(
                    embedding_ref='elsa_embedding',
                    encoder='interaction_pooling',
                    limit=500
                )
                WHERE track_id NOT IN $already_listened
                ORDER BY score(
                    expression='
                        0.7 / (1.0 + rank(embedding="elsa_embedding"))
                        + 0.3 / (1.0 + rank(embedding="content_embedding"))
                    '
                )
                LIMIT $limit
            """,
            "parameters": {
                "user_id": user_id,
                "already_listened": get_user_history(user_id),
                "limit": limit
            },
            "return_metadata": True
        }
    )

    return response.json()['results']

What’s happening:

  1. Retrieve: similarity(embedding_ref='elsa_embedding') retrieves 500 candidates based on collaborative filtering (ELSA)
  2. Filter: WHERE track_id NOT IN $already_listened removes tracks the user already heard
  3. Score: Blend ELSA rank (70%) + content similarity rank (30%)
  4. Return: Top 20 tracks
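The reciprocal-rank blend in that score expression is worth unpacking. A small Python sketch of the same arithmetic (assuming 0-based ranks):

```python
def blended_score(elsa_rank: int, content_rank: int,
                  w_elsa: float = 0.7, w_content: float = 0.3) -> float:
    """Reciprocal-rank blend: high ranks (small numbers) dominate the score."""
    return w_elsa / (1.0 + elsa_rank) + w_content / (1.0 + content_rank)

# A track ranked #1 by ELSA but #50 by content similarity...
print(round(blended_score(0, 49), 3))  # → 0.706
# ...comfortably beats a track ranked #10 by both signals
print(round(blended_score(9, 9), 3))   # → 0.1
```

Because each term decays as 1/(1+rank), a strong collaborative match survives a weak content match, which is exactly the behavior you want for surfacing hidden gems.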

Handling Cold Start

For new releases with zero listening history, adjust the blend:

# app.py
def get_new_release_recommendations(user_id: str, limit: int = 20):
    """
    For new releases: prioritize content similarity (ELSA has no data)
    """
    response = requests.post(
        "https://api.shaped.ai/v2/engines/music_recommendations/query",
        headers={"x-api-key": SHAPED_API_KEY},
        json={
            "query": """
                SELECT *
                FROM similarity(
                    embedding_ref='content_embedding',
                    encoder='interaction_pooling',
                    limit=500
                )
                WHERE days_since_release <= 7
                  AND track_id NOT IN $already_listened
                ORDER BY score(
                    expression='
                        0.2 / (1.0 + rank(embedding="elsa_embedding"))
                        + 0.8 / (1.0 + rank(embedding="content_embedding"))
                    '
                )
                LIMIT $limit
            """,
            "parameters": {
                "user_id": user_id,
                "already_listened": get_user_history(user_id),
                "limit": limit
            }
        }
    )

    return response.json()['results']

For new releases:

  • Retrieve via content_embedding (not ELSA, since new tracks have no collaborative data)
  • Blend: 20% ELSA + 80% content (rely more on content features)

Building the System

Full Workflow

1. Ingest data

# tables/tracks.yaml
version: v2
name: tracks
connector:
  type: postgres
  connection_string: $DATABASE_URL
  table: tracks
schema:
  - name: track_id
    type: STRING
  - name: title
    type: STRING
  - name: artist
    type: STRING
  - name: album
    type: STRING
  - name: release_date
    type: TIMESTAMP
# tables/user_listens.yaml
version: v2
name: user_listens
connector:
  type: postgres
  connection_string: $DATABASE_URL
  table: user_listens
schema:
  - name: user_id
    type: STRING
  - name: track_id
    type: STRING
  - name: listen_count
    type: INTEGER
  - name: last_listened_at
    type: TIMESTAMP

2. Create AI enrichment view

# terminal
shaped create-view --file views/track_enrichment.yaml

Shaped runs the LLM enrichment and materializes results.

3. Train engine

# terminal
shaped create-engine --file engines/music_recommendations.yaml

Shaped:

  • Trains ELSA on user_listens (collaborative filtering)
  • Generates text embeddings on enriched content from track_enrichment
  • Builds vector indexes for both

4. Query for recommendations

# app.py
recommendations = get_discover_weekly(user_id="user_12345", limit=20)

Score Ensemble Strategies

You’re not limited to 70/30 blends. ShapedQL supports flexible scoring expressions.

Strategy 1: Adaptive blending by track age

-- adaptive_blend.sql
ORDER BY score(
    expression='
        CASE
            WHEN days_since_release < 7 THEN
                -- New releases: prioritize content
                0.3 / (1.0 + rank(embedding="elsa_embedding"))
                + 0.7 / (1.0 + rank(embedding="content_embedding"))
            ELSE
                -- Older tracks: prioritize collaborative
                0.7 / (1.0 + rank(embedding="elsa_embedding"))
                + 0.3 / (1.0 + rank(embedding="content_embedding"))
        END
    '
)

Strategy 2: Boost by popularity for mainstream users

-- popularity_boost.sql
ORDER BY score(
    expression='
        0.6 / (1.0 + rank(embedding="elsa_embedding"))
        + 0.3 / (1.0 + rank(embedding="content_embedding"))
        + 0.1 / (1.0 + item._derived_popular_rank)
    '
)

Strategy 3: Penalize over-represented genres

-- genre_diversity.sql
ORDER BY score(
    expression='
        (0.7 / (1.0 + rank(embedding="elsa_embedding"))
        + 0.3 / (1.0 + rank(embedding="content_embedding")))
        * (1.0 - 0.2 * user_genre_saturation)
    '
)

Where user_genre_saturation is a computed feature: “what % of this user’s recent listens are in the same genre as this track?”
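One way to compute such a saturation feature offline (a sketch; the function name and the 10-listen window are assumptions, not Shaped functionality):

```python
from collections import Counter

def genre_saturation(recent_genres: list[str], candidate_genre: str) -> float:
    """Fraction of the user's recent listens that share the candidate's genre."""
    if not recent_genres:
        return 0.0
    return Counter(recent_genres)[candidate_genre] / len(recent_genres)

# 6 of the user's last 10 listens were indie rock, so an indie rock
# candidate gets its blended score multiplied by (1.0 - 0.2 * 0.6) = 0.88
saturation = genre_saturation(["indie rock"] * 6 + ["electronic"] * 4, "indie rock")
print(saturation)  # → 0.6
```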

Comparison: Traditional vs. Shaped

| Component | Traditional Hybrid | Shaped Hybrid |
|---|---|---|
| Collaborative model | ALS / Matrix Factorization (separate system) | ELSA (built-in, trained automatically) |
| Content features | Manual feature engineering (TF-IDF on "genre tempo energy") | AI enrichment (LLM extracts genre, mood, vibe) |
| Content model | Cosine similarity on TF-IDF vectors | Text embeddings (OpenAI, Cohere, etc.) |
| Blending logic | Hardcoded in application (0.7 * a + 0.3 * b) | ShapedQL score expressions (flexible, query-time) |
| Cold start handling | Separate code path for new items | Adjust blend weights in query (CASE WHEN days_since_release < 7) |
| Adaptive scoring | Requires application logic rewrite | Change score expression in query |
| Code to maintain | ~500 lines (model training + feature engineering + blending) | ~50 lines (YAML config + query) |
| Scalability | Manual sharding, caching, model serving | Automatic (Shaped handles indexing, serving, scaling) |

FAQ

Q: Why not just use collaborative filtering (ELSA)?

A: ELSA fails for new releases (no listening history) and doesn’t capture content similarity (can’t recommend based on mood/vibe). Pure collaborative also creates echo chambers—users only see more of what they already like.

Q: Why not just use content-based filtering?

A: Content features (genre, tempo) are too crude. Many tracks have the same genre but completely different vibes. Content-based also doesn’t learn user taste—it just matches features. You miss the collaborative signal: “users who liked X also liked Y.”

Q: How does AI enrichment help?

A: LLMs extract nuanced features (mood, vibe, instrumentation) that aren’t in your metadata. Instead of “indie rock”, you get “brooding, guitar-driven, dark, sultry.” These richer features power better content-based recommendations.

Q: What if I don’t have track metadata?

A: Use audio features (if available): tempo, energy, danceability, valence. Or use multimodal AI enrichment on album art. Or rely purely on ELSA (collaborative) and accept that new releases won’t recommend well until they have listening history.

Q: How do I tune the blend weights (70% ELSA, 30% content)?

A: Run A/B tests. Try 60/40, 70/30, 80/20 and measure offline metrics (NDCG@10) and online metrics (listen-through rate, skip rate). The optimal blend depends on your data and user behavior.
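For the offline side of that tuning loop, here is a minimal NDCG@10 helper with binary relevance (held-out listens count as relevant; a sketch, not tied to any library):

```python
import math

def ndcg_at_k(recommended: list[str], relevant: set[str], k: int = 10) -> float:
    """NDCG@k with binary relevance: 1 if the user actually listened, else 0."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(recommended[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Score one user's top-k from a candidate blend against held-out listens
print(round(ndcg_at_k(["t1", "t9", "t3"], {"t1", "t3"}), 3))  # → 0.92
```

Evaluate each blend (60/40, 70/30, 80/20) over all users and compare mean NDCG@10 before committing to an online A/B test.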

Q: Can I blend more than 2 signals?

A: Yes. ShapedQL supports arbitrary scoring expressions. You can blend ELSA + content + popularity + recency + user context:

-- multi_signal_blend.sql
ORDER BY score(
    expression='
        0.5 / (1.0 + rank(embedding="elsa_embedding"))
        + 0.2 / (1.0 + rank(embedding="content_embedding"))
        + 0.1 / (1.0 + item._derived_popular_rank)
        + 0.1 / (1.0 + days_since_release)
        + 0.1 * user_affinity_score
    '
)

Q: What about Spotify’s actual implementation?

A: Spotify uses a combination of collaborative filtering, audio feature analysis (via their own audio models), NLP on playlist names and descriptions, and user context (time of day, device, activity). They don’t publish exact details, but the principles are the same: hybrid filtering with multiple signals.

Conclusion

Discover Weekly works because it blends collaborative filtering (learning taste from listening history) with content understanding (genre, mood, vibe extracted from metadata).

Pure collaborative filtering fails for new releases—no listening history means no recommendations. Pure content-based filtering is too obvious—matching genres doesn’t capture the subtle taste patterns that make recommendations magical.

Hybrid filtering solves both: use ELSA for collaborative signals, AI enrichment for content features, and ShapedQL score ensembles to blend them adaptively.

The traditional approach requires two separate models (ALS + TF-IDF), manual feature engineering, and hardcoded blending logic. Shaped unifies everything: ELSA for collaborative filtering, AI enrichment for automatic content extraction, and flexible score expressions in ShapedQL for adaptive blending—all in one engine.

If you’re building personalized recommendations and can’t handle cold start, you need hybrid filtering.

Ready to build your own Discover Weekly? Sign up for Shaped and get $100 in free credits. Visit console.shaped.ai/register to get started.

Want us to walk you through it?

Book a 30-min session with an engineer who can apply this to your specific stack.

Book a demo →
