How to Build Spotify's "Discover Weekly": The Hybrid Filtering Playbook

Learn how to build personalized music recommendations like Spotify's Discover Weekly: blend collaborative filtering (ELSA) with AI content enrichment (genre/mood extraction), then combine both signals in ShapedQL for hybrid recommendations.

Quick Answer: Collaborative Signals + Content Understanding

Spotify’s Discover Weekly works because it combines two different types of intelligence:

  1. Collaborative filtering (ELSA): “Users who listened to these tracks also listened to…”
  2. Content understanding: Genre, mood, tempo, instrumentation extracted from track metadata

Neither works well alone:

  • Collaborative filtering alone can’t recommend new releases (no listening history yet)
  • Content-based filtering alone misses the subtle taste patterns that make recommendations feel magical

Hybrid filtering blends both signals. The result: personalized recommendations that work for new releases (using content features) and surface hidden gems based on taste patterns (using collaborative signals).

Key Takeaways:

  • Pure collaborative filtering fails for new items — no listening history = can’t recommend
  • Pure content-based fails for discovery — matching genres is too obvious, not surprising
  • Hybrid blending solves both — use ELSA for taste patterns, content features for cold-start
  • AI enrichment extracts content — LLMs pull genre/mood/vibe from track metadata automatically
  • Score ensembles combine signals — blend collaborative + content scores in ShapedQL

Time to read: 24 minutes | Includes: 8 code examples, 2 architectures, 1 comparison table

Table of Contents

  1. The Cold Start Problem
  2. Why Pure Collaborative Filtering Fails
  3. Why Pure Content-Based Fails
  4. Part 1: Traditional Hybrid Approach
  5. Part 2: The Shaped Way — ELSA + AI Enrichment
  6. Building the System
  7. Score Ensemble Strategies
  8. Comparison Table
  9. FAQ

The Cold Start Problem

You’re building a music recommendation system. A new track drops today. How do you recommend it?

The Dilemma

Collaborative filtering says: “I can’t recommend this track—no one has listened to it yet, so I don’t know who would like it.”

Content-based filtering says: “This track is tagged as ‘indie rock’. I’ll recommend it to users who listen to indie rock.”

But “indie rock” is too broad. Within that genre:

  • Arctic Monkeys sounds nothing like Tame Impala
  • The Strokes sounds nothing like Radiohead
  • Phoenix sounds nothing like The National

Genre tags alone don’t capture vibe, energy, instrumentation, or mood.

What Discover Weekly Does

Spotify blends:

  1. Collaborative signals — “Users who liked Tame Impala also liked Unknown Mortal Orchestra”
  2. Audio features — Tempo, energy, danceability, valence (mood), acousticness
  3. Text features — Genre tags, similar artists, playlist co-occurrences

When a new track drops, Spotify:

  1. Extracts audio features (tempo, energy, mood) from the audio file
  2. Finds tracks with similar audio features
  3. Uses collaborative patterns from those similar tracks to recommend the new release

Result: New tracks get recommended to the right users immediately, even with zero listening history.
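That fallback can be sketched in a few lines of numpy. The feature values and track names below are purely illustrative:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative audio features: [tempo / 200, energy, valence]
catalog = {
    "established_track_1": np.array([0.60, 0.80, 0.40]),
    "established_track_2": np.array([0.45, 0.30, 0.70]),
}
new_track = np.array([0.62, 0.75, 0.45])  # released today, zero listens

# Step 2 above: find the catalog track with the most similar audio profile,
# then reuse its collaborative signals to recommend the new release
nearest = max(catalog, key=lambda t: cosine(catalog[t], new_track))
print(nearest)  # → established_track_1
```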

Why Pure Collaborative Filtering Fails

Collaborative filtering learns user taste from interaction history: “Users who listened to X also listened to Y.”

How ELSA Works

ELSA (Scalable Linear Shallow Autoencoder) learns item-item relationships by reconstructing user listening vectors.

If many users listen to both Track A and Track B, ELSA learns they’re similar. If User 123 listened to Track A but not Track B, ELSA recommends Track B.

Training data:

User 1: [Track A, Track C, Track E]
User 2: [Track A, Track B, Track D]
User 3: [Track B, Track C, Track F]

ELSA learns:

  • Track A and Track B co-occur → similar
  • Track B and Track C co-occur → similar
  • Track A → Track B is a strong recommendation
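ELSA itself is a trained model, but the co-occurrence intuition can be reproduced with plain numpy on the toy listening data above (a simplification, not ELSA's actual algorithm):

```python
import numpy as np

# Toy user-item matrix from the training data above
# rows = Users 1-3, cols = Tracks A-F, 1 = listened
X = np.array([
    [1, 0, 1, 0, 1, 0],  # User 1: A, C, E
    [1, 1, 0, 1, 0, 0],  # User 2: A, B, D
    [0, 1, 1, 0, 0, 1],  # User 3: B, C, F
])

# Item-item co-occurrence: entry (i, j) counts users who played both tracks
cooc = X.T @ X
tracks = "ABCDEF"
print(cooc[tracks.index("A"), tracks.index("B")])  # → 1 (User 2 played both)
```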

The Cold Start Failure

New track drops: Track Z (released today)

Interaction history: Zero listens

ELSA says: “I have no data on Track Z. I can’t compute similarity to any other track. I can’t recommend it.”

Result: New releases don’t get recommended until they accumulate listening history. By the time ELSA can recommend them, they’re no longer new.

Other Collaborative Filtering Limitations

  1. Popularity bias: ELSA over-recommends popular tracks (they appear in many user histories)
  2. Echo chamber: ELSA reinforces existing taste, doesn’t help users discover new genres
  3. Sparse user problem: Users with few listens get poor recommendations (not enough history)
| Limitation | What Happens | Impact |
|---|---|---|
| Cold start | New releases have zero listening history | Not recommended until they go viral |
| Popularity bias | Popular tracks appear in more user histories | Top 1% of catalog gets 80% of recommendations |
| Echo chamber | Reinforces existing taste | Users never discover new genres |
| Sparse users | Few listens = not enough history | Poor recommendations for new users |

Why Pure Content-Based Fails

Content-based filtering recommends items similar to what the user already likes, based on features like genre, artist, tempo, or mood.

How It Works

  1. User listens to Track A (indie rock, 120 BPM, high energy)
  2. Find tracks with similar features (indie rock, 115-125 BPM, high energy)
  3. Recommend the most similar tracks
# content_similarity.py
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# User's recent listens
user_tracks = [
    {"title": "Do I Wanna Know?", "genre": "indie rock", "tempo": 85, "energy": 0.8},
    {"title": "R U Mine?", "genre": "indie rock", "tempo": 89, "energy": 0.9}
]

# Candidate track
candidate = {"title": "Feel It Still", "genre": "indie rock", "tempo": 79, "energy": 0.7}

# Build numeric feature vectors (genre would be one-hot encoded in practice)
user_avg_features = np.mean([[t["tempo"], t["energy"]] for t in user_tracks], axis=0)
candidate_features = np.array([candidate["tempo"], candidate["energy"]])

# Similarity score (cosine similarity on features)
similarity = cosine_similarity([user_avg_features], [candidate_features])[0][0]
# → High similarity, recommend it

Why This Fails for Discovery

Problem 1: Too obvious

If you only recommend tracks with similar features, you’re showing users what they already know.

User listens to Arctic Monkeys → Recommend The Strokes, The Libertines, Franz Ferdinand

These are obvious recommendations. Not surprising. Not delightful.

Problem 2: Feature mismatch

Genre and tempo don’t capture vibe.

“Indie rock, 120 BPM, high energy” could describe:

  • Arcade Fire - “Ready to Start” (anthemic, orchestral)
  • MGMT - “Electric Feel” (psychedelic, synth-heavy)
  • Yeah Yeah Yeahs - “Heads Will Roll” (post-punk, danceable)

All have the same metadata but completely different vibes.
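A quick sanity check makes the mismatch concrete. With only crude metadata features (values illustrative), three very different-sounding tracks are indistinguishable to cosine similarity:

```python
import numpy as np

# Crude metadata features: [genre_id, tempo, energy], identical for all three
ready_to_start  = np.array([1.0, 120.0, 0.9])  # Arcade Fire (anthemic, orchestral)
electric_feel   = np.array([1.0, 120.0, 0.9])  # MGMT (psychedelic, synth-heavy)
heads_will_roll = np.array([1.0, 120.0, 0.9])  # Yeah Yeah Yeahs (post-punk)

cos = float(np.dot(ready_to_start, electric_feel) /
            (np.linalg.norm(ready_to_start) * np.linalg.norm(electric_feel)))
print(round(cos, 3))  # → 1.0, yet the tracks sound nothing alike
```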

Problem 3: No taste learning

Content-based filtering doesn’t learn user taste. It only matches features.

If a user loves Radiohead but hates Muse (both tagged “alternative rock”), content-based can’t tell the difference. It recommends both.

Part 1: Traditional Hybrid Approach

The traditional approach combines collaborative and content-based models as separate systems, then blends their outputs with a weighted average.

Architecture

Collaborative stream:
  User listening history → ALS / Matrix Factorization → Top 100 candidates (collaborative score)

Content-based stream:
  Track metadata → TF-IDF + cosine similarity → Top 100 candidates (content score)

Weighted blend:
  final_score = 0.7 × collab_score + 0.3 × content_score

→ Rank by final_score → Top 20 recommendations

Implementation

Step 1: Train collaborative filtering model

# train_collaborative.py
from implicit.als import AlternatingLeastSquares
import scipy.sparse as sp

# Build user-item interaction matrix
# rows = users, cols = tracks, values = play count
interaction_matrix = sp.csr_matrix(...)

# Train ALS model
model = AlternatingLeastSquares(factors=64, iterations=15)
model.fit(interaction_matrix)

# Save model
model.save('als_model.pkl')

Step 2: Build content-based features

# build_content_features.py
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

# Load track metadata
tracks = pd.read_csv('tracks.csv')
# Columns: track_id, title, artist, genre, tempo, energy, valence

# Create text features from metadata
tracks['content_text'] = (
    tracks['genre'] + ' ' +
    tracks['artist'] + ' ' +
    'tempo_' + tracks['tempo'].astype(str) + ' ' +
    'energy_' + (tracks['energy'] * 10).astype(int).astype(str)
)

# TF-IDF vectorization
vectorizer = TfidfVectorizer(max_features=500)
content_vectors = vectorizer.fit_transform(tracks['content_text'])

Step 3: Generate recommendations by blending scores

# hybrid_recommender.py
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def get_hybrid_recommendations(user_id, n=20):
    # Get collaborative filtering candidates
    # (implicit >= 0.5 returns parallel arrays of track ids and scores)
    user_vector = interaction_matrix[user_id]
    ids, scores = als_model.recommend(user_id, user_vector, N=100, filter_already_liked_items=True)
    collab_dict = dict(zip(ids, scores))

    # Get content-based candidates (based on user's recent listens)
    user_recent_tracks = get_user_recent_tracks(user_id, n=10)
    user_content_vector = np.asarray(content_vectors[user_recent_tracks].mean(axis=0)).ravel()

    content_scores = {}
    for track_id in range(content_vectors.shape[0]):
        if track_id in user_recent_tracks:
            continue  # Skip already listened
        candidate = content_vectors[track_id].toarray().ravel()
        content_scores[track_id] = cosine_similarity([user_content_vector], [candidate])[0][0]

    # Keep the top 100 content candidates, then blend with collaborative scores
    content_dict = dict(sorted(content_scores.items(), key=lambda x: x[1], reverse=True)[:100])

    all_tracks = set(collab_dict) | set(content_dict)

    blended_scores = {}
    for track_id in all_tracks:
        # Weighted blend: 70% collaborative, 30% content
        blended_scores[track_id] = (0.7 * collab_dict.get(track_id, 0)
                                    + 0.3 * content_dict.get(track_id, 0))

    return sorted(blended_scores.items(), key=lambda x: x[1], reverse=True)[:n]

Problems with This Approach

| Problem | Why It Hurts |
|---|---|
| Two separate models to train and maintain | ALS for collaborative, TF-IDF for content — double the infrastructure |
| Manual feature engineering | You create "tempo_85 energy_8" text features by hand |
| Crude blending | Weighted average doesn't adapt to context (new releases vs. catalogue) |
| No unified scoring | Can't easily add a third signal (popularity, recency, user context) |
| Scaling issues | Content-based cosine similarity is O(n²) for n tracks |

Part 2: The Shaped Way — ELSA + AI Enrichment

Shaped unifies collaborative filtering (ELSA) and content understanding (AI enrichment) in a single engine, with flexible score blending in ShapedQL.

Architecture

Collaborative stream:
  User listening history → ELSA model training → ELSA embeddings

Content stream:
  Track metadata (title, artist, album) → AI Enrichment View (LLM extracts genre, mood, vibe, instruments) → Content embeddings (text-embedding-3-small)

ShapedQL query:
  Retrieve via ELSA similarity → Score ensemble: 0.7 × ELSA + 0.3 × content → Return top 20

Result: a personalized playlist that works for new releases and hidden gems

Key difference: Content features are extracted automatically by LLMs (not manual), and blending happens in the query (not in application code).

Implementation

Step 1: Create AI enrichment view

# views/track_enrichment.yaml
version: v2
name: track_enrichment
view_type: AI_ENRICHMENT
source_table: tracks
enrichment:
  prompt: |
    Given this music track:
    Title: {title}
    Artist: {artist}
    Album: {album}

    Extract the following as JSON:
    {
      "genre": "primary genre (e.g., indie rock, electronic, hip-hop)",
      "subgenre": "more specific subgenre",
      "mood": "overall mood (e.g., melancholic, energetic, chill)",
      "vibe": "vibe or feeling (e.g., dreamy, aggressive, uplifting)",
      "instruments": "primary instruments (e.g., guitar, synth, drums)",
      "era": "musical era or decade influence"
    }

  output_columns:
    - name: genre
      type: STRING
    - name: subgenre
      type: STRING
    - name: mood
      type: STRING
    - name: vibe
      type: STRING
    - name: instruments
      type: STRING
    - name: era
      type: STRING

This view runs an LLM over each track and extracts structured content features automatically.

Example enrichment output:

| track_id | title | artist | genre | mood | vibe | instruments |
|---|---|---|---|---|---|---|
| track_1 | Do I Wanna Know? | Arctic Monkeys | indie rock | brooding | dark, sultry | guitar, bass, drums |
| track_2 | Electric Feel | MGMT | psychedelic pop | euphoric | trippy, groovy | synth, bass, vocals |
| track_3 | Feel It Still | Portugal. The Man | indie pop | upbeat | retro, funky | guitar, bass, keys |

Step 2: Configure engine with ELSA + content embeddings

# engines/music_recommendations.yaml
version: v2
name: music_recommendations

data:
  item_table:
    name: tracks
  interaction_table:
    name: user_listens
  views:
    - name: track_enrichment  # AI-enriched content

index:
  # Collaborative filtering: ELSA embeddings
  - name: elsa_embedding
    encoder:
      name: elsa
      type: trained
    columns:
      - user_id
      - track_id
      - listen_count

  # Content-based: Text embeddings on enriched features
  - name: content_embedding
    encoder:
      name: text-embedding-3-small
      provider: openai
    columns:
      - track_enrichment.genre
      - track_enrichment.mood
      - track_enrichment.vibe
      - track_enrichment.instruments

training:
  models:
    - name: collaborative_ranker
      policy_type: elsa
      ranking_expression: |
        1.0 / (1.0 + rank(embedding="elsa_embedding"))

Shaped trains:

  1. ELSA on user listening history (collaborative filtering)
  2. Text embeddings on AI-enriched content features (genre, mood, vibe)

Both are stored in the same engine.

Step 3: Query with hybrid score ensemble

# app.py
import requests

SHAPED_API_KEY = "your-api-key"

def get_discover_weekly(user_id: str, limit: int = 20):
    """
    Hybrid recommendations: blend ELSA (collaborative) + content similarity
    """
    response = requests.post(
        "https://api.shaped.ai/v2/engines/music_recommendations/query",
        headers={"x-api-key": SHAPED_API_KEY},
        json={
            "query": """
                SELECT *
                FROM similarity(
                    embedding_ref='elsa_embedding',
                    encoder='interaction_pooling',
                    limit=500
                )
                WHERE track_id NOT IN $already_listened
                ORDER BY score(
                    expression='
                        0.7 / (1.0 + rank(embedding="elsa_embedding"))
                        + 0.3 / (1.0 + rank(embedding="content_embedding"))
                    '
                )
                LIMIT $limit
            """,
            "parameters": {
                "user_id": user_id,
                "already_listened": get_user_history(user_id),
                "limit": limit
            },
            "return_metadata": True
        }
    )

    return response.json()['results']

What’s happening:

  1. Retrieve: similarity(embedding_ref='elsa_embedding') retrieves 500 candidates based on collaborative filtering (ELSA)
  2. Filter: WHERE track_id NOT IN $already_listened removes tracks the user already heard
  3. Score: Blend ELSA rank (70%) + content similarity rank (30%)
  4. Return: Top 20 tracks
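The reciprocal-rank blend in that score expression is worth unpacking. A small Python sketch of the same arithmetic (assuming 0-based ranks):

```python
def blended_score(elsa_rank: int, content_rank: int,
                  w_elsa: float = 0.7, w_content: float = 0.3) -> float:
    """Reciprocal-rank blend: high ranks (small numbers) dominate the score."""
    return w_elsa / (1.0 + elsa_rank) + w_content / (1.0 + content_rank)

# A track ranked #1 by ELSA but #50 by content similarity...
print(round(blended_score(0, 49), 3))  # → 0.706
# ...comfortably beats a track ranked #10 by both signals
print(round(blended_score(9, 9), 3))   # → 0.1
```

Because each term decays as 1/(1+rank), a strong collaborative match survives a weak content match, which is exactly the behavior you want for surfacing hidden gems.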

Handling Cold Start

For new releases with zero listening history, adjust the blend:

# app.py
def get_new_release_recommendations(user_id: str, limit: int = 20):
    """
    For new releases: prioritize content similarity (ELSA has no data)
    """
    response = requests.post(
        "https://api.shaped.ai/v2/engines/music_recommendations/query",
        headers={"x-api-key": SHAPED_API_KEY},
        json={
            "query": """
                SELECT *
                FROM similarity(
                    embedding_ref='content_embedding',
                    encoder='interaction_pooling',
                    limit=500
                )
                WHERE days_since_release <= 7
                  AND track_id NOT IN $already_listened
                ORDER BY score(
                    expression='
                        0.2 / (1.0 + rank(embedding="elsa_embedding"))
                        + 0.8 / (1.0 + rank(embedding="content_embedding"))
                    '
                )
                LIMIT $limit
            """,
            "parameters": {
                "user_id": user_id,
                "already_listened": get_user_history(user_id),
                "limit": limit
            }
        }
    )

    return response.json()['results']

For new releases:

  • Retrieve via content_embedding (not ELSA, since new tracks have no collaborative data)
  • Blend: 20% ELSA + 80% content (rely more on content features)

Building the System

Full Workflow

1. Ingest data

# tables/tracks.yaml
version: v2
name: tracks
connector:
  type: postgres
  connection_string: $DATABASE_URL
  table: tracks
schema:
  - name: track_id
    type: STRING
  - name: title
    type: STRING
  - name: artist
    type: STRING
  - name: album
    type: STRING
  - name: release_date
    type: TIMESTAMP
# tables/user_listens.yaml
version: v2
name: user_listens
connector:
  type: postgres
  connection_string: $DATABASE_URL
  table: user_listens
schema:
  - name: user_id
    type: STRING
  - name: track_id
    type: STRING
  - name: listen_count
    type: INTEGER
  - name: last_listened_at
    type: TIMESTAMP

2. Create AI enrichment view

# terminal
shaped create-view --file views/track_enrichment.yaml

Shaped runs the LLM enrichment and materializes results.

3. Train engine

# terminal
shaped create-engine --file engines/music_recommendations.yaml

Shaped:

  • Trains ELSA on user_listens (collaborative filtering)
  • Generates text embeddings on enriched content from track_enrichment
  • Builds vector indexes for both

4. Query for recommendations

# app.py
recommendations = get_discover_weekly(user_id="user_12345", limit=20)

Score Ensemble Strategies

You’re not limited to 70/30 blends. ShapedQL supports flexible scoring expressions.

Strategy 1: Adaptive blending by track age

-- adaptive_blend.sql
ORDER BY score(
    expression='
        CASE
            WHEN days_since_release < 7 THEN
                -- New releases: prioritize content
                0.3 / (1.0 + rank(embedding="elsa_embedding"))
                + 0.7 / (1.0 + rank(embedding="content_embedding"))
            ELSE
                -- Older tracks: prioritize collaborative
                0.7 / (1.0 + rank(embedding="elsa_embedding"))
                + 0.3 / (1.0 + rank(embedding="content_embedding"))
        END
    '
)

Strategy 2: Boost by popularity for mainstream users

-- popularity_boost.sql
ORDER BY score(
    expression='
        0.6 / (1.0 + rank(embedding="elsa_embedding"))
        + 0.3 / (1.0 + rank(embedding="content_embedding"))
        + 0.1 / (1.0 + item._derived_popular_rank)
    '
)

Strategy 3: Penalize over-represented genres

-- genre_diversity.sql
ORDER BY score(
    expression='
        (0.7 / (1.0 + rank(embedding="elsa_embedding"))
        + 0.3 / (1.0 + rank(embedding="content_embedding")))
        * (1.0 - 0.2 * user_genre_saturation)
    '
)

Where user_genre_saturation is a computed feature: “what % of this user’s recent listens are in the same genre as this track?”
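One way to compute such a saturation feature offline (a sketch; the function name and the 10-listen window are assumptions, not Shaped functionality):

```python
from collections import Counter

def genre_saturation(recent_genres: list[str], candidate_genre: str) -> float:
    """Fraction of the user's recent listens that share the candidate's genre."""
    if not recent_genres:
        return 0.0
    return Counter(recent_genres)[candidate_genre] / len(recent_genres)

# 6 of the user's last 10 listens were indie rock, so an indie rock
# candidate gets its blended score multiplied by (1.0 - 0.2 * 0.6) = 0.88
saturation = genre_saturation(["indie rock"] * 6 + ["electronic"] * 4, "indie rock")
print(saturation)  # → 0.6
```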

Comparison: Traditional vs. Shaped

| Component | Traditional Hybrid | Shaped Hybrid |
|---|---|---|
| Collaborative model | ALS / Matrix Factorization (separate system) | ELSA (built-in, trained automatically) |
| Content features | Manual feature engineering (TF-IDF on "genre tempo energy") | AI enrichment (LLM extracts genre, mood, vibe) |
| Content model | Cosine similarity on TF-IDF vectors | Text embeddings (OpenAI, Cohere, etc.) |
| Blending logic | Hardcoded in application (0.7 * a + 0.3 * b) | ShapedQL score expressions (flexible, query-time) |
| Cold start handling | Separate code path for new items | Adjust blend weights in query (CASE WHEN days_since_release < 7) |
| Adaptive scoring | Requires application logic rewrite | Change score expression in query |
| Code to maintain | ~500 lines (model training + feature engineering + blending) | ~50 lines (YAML config + query) |
| Scalability | Manual sharding, caching, model serving | Automatic (Shaped handles indexing, serving, scaling) |

FAQ

Q: Why not just use collaborative filtering (ELSA)?

A: ELSA fails for new releases (no listening history) and doesn’t capture content similarity (can’t recommend based on mood/vibe). Pure collaborative also creates echo chambers—users only see more of what they already like.

Q: Why not just use content-based filtering?

A: Content features (genre, tempo) are too crude. Many tracks have the same genre but completely different vibes. Content-based also doesn’t learn user taste—it just matches features. You miss the collaborative signal: “users who liked X also liked Y.”

Q: How does AI enrichment help?

A: LLMs extract nuanced features (mood, vibe, instrumentation) that aren’t in your metadata. Instead of “indie rock”, you get “brooding, guitar-driven, dark, sultry.” These richer features power better content-based recommendations.

Q: What if I don’t have track metadata?

A: Use audio features (if available): tempo, energy, danceability, valence. Or use multimodal AI enrichment on album art. Or rely purely on ELSA (collaborative) and accept that new releases won’t recommend well until they have listening history.

Q: How do I tune the blend weights (70% ELSA, 30% content)?

A: Run A/B tests. Try 60/40, 70/30, 80/20 and measure offline metrics (NDCG@10) and online metrics (listen-through rate, skip rate). The optimal blend depends on your data and user behavior.
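For the offline side of that tuning loop, here is a minimal NDCG@10 helper with binary relevance (held-out listens count as relevant; a sketch, not tied to any library):

```python
import math

def ndcg_at_k(recommended: list[str], relevant: set[str], k: int = 10) -> float:
    """NDCG@k with binary relevance: 1 if the user actually listened, else 0."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(recommended[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Score one user's top-k from a candidate blend against held-out listens
print(round(ndcg_at_k(["t1", "t9", "t3"], {"t1", "t3"}), 3))  # → 0.92
```

Evaluate each blend (60/40, 70/30, 80/20) over all users and compare mean NDCG@10 before committing to an online A/B test.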

Q: Can I blend more than 2 signals?

A: Yes. ShapedQL supports arbitrary scoring expressions. You can blend ELSA + content + popularity + recency + user context:

-- multi_signal_blend.sql
ORDER BY score(
    expression='
        0.5 / (1.0 + rank(embedding="elsa_embedding"))
        + 0.2 / (1.0 + rank(embedding="content_embedding"))
        + 0.1 / (1.0 + item._derived_popular_rank)
        + 0.1 / (1.0 + days_since_release)
        + 0.1 * user_affinity_score
    '
)

Q: What about Spotify’s actual implementation?

A: Spotify uses a combination of collaborative filtering, audio feature analysis (via their own audio models), NLP on playlist names and descriptions, and user context (time of day, device, activity). They don’t publish exact details, but the principles are the same: hybrid filtering with multiple signals.

Conclusion

Discover Weekly works because it blends collaborative filtering (learning taste from listening history) with content understanding (genre, mood, vibe extracted from metadata).

Pure collaborative filtering fails for new releases—no listening history means no recommendations. Pure content-based filtering is too obvious—matching genres doesn’t capture the subtle taste patterns that make recommendations magical.

Hybrid filtering solves both: use ELSA for collaborative signals, AI enrichment for content features, and ShapedQL score ensembles to blend them adaptively.

The traditional approach requires two separate models (ALS + TF-IDF), manual feature engineering, and hardcoded blending logic. Shaped unifies everything: ELSA for collaborative filtering, AI enrichment for automatic content extraction, and flexible score expressions in ShapedQL for adaptive blending—all in one engine.

If you’re building personalized recommendations and can’t handle cold start, you need hybrid filtering.

Ready to build your own Discover Weekly? Sign up for Shaped and get $100 in free credits. Visit console.shaped.ai/register to get started.

Want us to walk you through it?

Book a 30-min session with an engineer who can apply this to your specific stack.

Book a demo →
