Quick Answer: Collaborative Signals + Content Understanding
Spotify’s Discover Weekly works because it combines two different types of intelligence:
- Collaborative filtering (ELSA): “Users who listened to these tracks also listened to…”
- Content understanding: Genre, mood, tempo, instrumentation extracted from track metadata
Neither works well alone:
- Collaborative filtering alone can’t recommend new releases (no listening history yet)
- Content-based filtering alone misses the subtle taste patterns that make recommendations feel magical
Hybrid filtering blends both signals. The result: personalized recommendations that work for new releases (using content features) and surface hidden gems based on taste patterns (using collaborative signals).
Key Takeaways:
- Pure collaborative filtering fails for new items — no listening history = can’t recommend
- Pure content-based fails for discovery — matching genres is too obvious, not surprising
- Hybrid blending solves both — use ELSA for taste patterns, content features for cold-start
- AI enrichment extracts content — LLMs pull genre/mood/vibe from track metadata automatically
- Score ensembles combine signals — blend collaborative + content scores in ShapedQL
Time to read: 24 minutes | Includes: 8 code examples, 2 architectures, 1 comparison table
Table of Contents
- The Cold Start Problem
- Why Pure Collaborative Filtering Fails
- Why Pure Content-Based Fails
- Part 1: Traditional Hybrid Approach
- Part 2: The Shaped Way — ELSA + AI Enrichment
- Building the System
- Score Ensemble Strategies
- Comparison Table
- FAQ
The Cold Start Problem
You’re building a music recommendation system. A new track drops today. How do you recommend it?
The Dilemma
Collaborative filtering says: “I can’t recommend this track—no one has listened to it yet, so I don’t know who would like it.”
Content-based filtering says: “This track is tagged as ‘indie rock’. I’ll recommend it to users who listen to indie rock.”
But “indie rock” is too broad. Within that genre:
- Arctic Monkeys sounds nothing like Tame Impala
- The Strokes sounds nothing like Radiohead
- Phoenix sounds nothing like The National
Genre tags alone don’t capture vibe, energy, instrumentation, or mood.
What Discover Weekly Does
Spotify blends:
- Collaborative signals — “Users who liked Tame Impala also liked Unknown Mortal Orchestra”
- Audio features — Tempo, energy, danceability, valence (mood), acousticness
- Text features — Genre tags, similar artists, playlist co-occurrences
When a new track drops, Spotify:
- Extracts audio features (tempo, energy, mood) from the audio file
- Finds tracks with similar audio features
- Uses collaborative patterns from those similar tracks to recommend the new release
Result: New tracks get recommended to the right users immediately, even with zero listening history.
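This cold-start bridge can be sketched in a few lines: represent the new track by its audio features, find its nearest catalog neighbors, and borrow their collaborative embeddings. All numbers below are made-up illustrative data, not Spotify's actual features.

```python
import numpy as np

# Illustrative catalog: audio features (tempo, energy, valence, normalized 0-1)
# plus a collaborative embedding learned from listening history.
audio_features = np.array([
    [0.85, 0.80, 0.40],  # track 0
    [0.80, 0.90, 0.50],  # track 1
    [0.20, 0.30, 0.90],  # track 2
])
collab_embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
])

def cold_start_embedding(new_audio, k=2):
    """Approximate a collaborative embedding for a brand-new track by
    averaging the embeddings of its k nearest audio-feature neighbors."""
    dists = np.linalg.norm(audio_features - new_audio, axis=1)
    nearest = np.argsort(dists)[:k]
    return collab_embeddings[nearest].mean(axis=0)

# A new release that sounds like tracks 0 and 1 inherits their taste profile
new_track = np.array([0.82, 0.85, 0.45])
print(cold_start_embedding(new_track))  # → [0.95 0.05]
```

Once the new track has a proxy embedding, the regular collaborative recommender can serve it immediately; as real listens arrive, the proxy gets replaced by a learned embedding.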
Why Pure Collaborative Filtering Fails
Collaborative filtering learns user taste from interaction history: “Users who listened to X also listened to Y.”
How ELSA Works
ELSA (Efficient Latent Sparse Autoencoder) learns item-item relationships by reconstructing user listening vectors.
If many users listen to both Track A and Track B, ELSA learns they’re similar. If User 123 listened to Track A but not Track B, ELSA recommends Track B.
Training data:
```
User 1: [Track A, Track C, Track E]
User 2: [Track A, Track B, Track D]
User 3: [Track B, Track C, Track F]
```
ELSA learns:
- Track A and Track B co-occur → similar
- Track B and Track C co-occur → similar
- Track A → Track B is a strong recommendation
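ELSA itself learns this through a trained autoencoder, but the raw signal it exploits is visible in a toy co-occurrence count over the three users above:

```python
import numpy as np

# User-item matrix for the listening histories above
# (rows = Users 1-3, columns = Tracks A-F)
X = np.array([
    [1, 0, 1, 0, 1, 0],  # User 1: A, C, E
    [1, 1, 0, 1, 0, 0],  # User 2: A, B, D
    [0, 1, 1, 0, 0, 1],  # User 3: B, C, F
])

# Item-item co-occurrence: how many users listened to both tracks
cooccurrence = X.T @ X
np.fill_diagonal(cooccurrence, 0)  # a track trivially co-occurs with itself

# Track A (column 0) co-occurs with B, C, D, and E, so each is a
# candidate recommendation for a user who listened to A
print(cooccurrence[0])  # → [0 1 1 1 1 0]
```

A real model factorizes this structure into dense, low-dimensional item embeddings so the same idea scales to millions of tracks.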
The Cold Start Failure
New track drops: Track Z (released today)
Interaction history: Zero listens
ELSA says: “I have no data on Track Z. I can’t compute similarity to any other track. I can’t recommend it.”
Result: New releases don’t get recommended until they accumulate listening history. By the time ELSA can recommend them, they’re no longer new.
Other Collaborative Filtering Limitations
- Popularity bias: ELSA over-recommends popular tracks (they appear in many user histories)
- Echo chamber: ELSA reinforces existing taste, doesn’t help users discover new genres
- Sparse user problem: Users with few listens get poor recommendations (not enough history)
| Limitation | What Happens | Impact |
|---|---|---|
| Cold start | New releases have zero listening history | Not recommended until they go viral |
| Popularity bias | Popular tracks appear in more user histories | Top 1% of catalog gets 80% of recommendations |
| Echo chamber | Reinforces existing taste | Users never discover new genres |
| Sparse users | Few listens = not enough history | Poor recommendations for new users |
Why Pure Content-Based Fails
Content-based filtering recommends items similar to what the user already likes, based on features like genre, artist, tempo, or mood.
How It Works
- User listens to Track A (indie rock, 120 BPM, high energy)
- Find tracks with similar features (indie rock, 115-125 BPM, high energy)
- Recommend the most similar tracks
```python
# content_similarity.py
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# User's recent listens
user_tracks = [
    {"title": "Do I Wanna Know?", "genre": "indie rock", "tempo": 85, "energy": 0.8},
    {"title": "R U Mine?", "genre": "indie rock", "tempo": 89, "energy": 0.9},
]
# Candidate track
candidate = {"title": "Feel It Still", "genre": "indie rock", "tempo": 79, "energy": 0.7}

# Numeric feature vectors (tempo normalized to a 0-1 range)
user_avg_features = np.mean([[t["tempo"] / 200, t["energy"]] for t in user_tracks], axis=0)
candidate_features = np.array([candidate["tempo"] / 200, candidate["energy"]])

# Similarity score (cosine similarity on features)
similarity = cosine_similarity([user_avg_features], [candidate_features])[0][0]
# → High similarity, recommend it
```
Why This Fails for Discovery
Problem 1: Too obvious
If you only recommend tracks with similar features, you’re showing users what they already know.
User listens to Arctic Monkeys → Recommend The Strokes, The Libertines, Franz Ferdinand
These are obvious recommendations. Not surprising. Not delightful.
Problem 2: Feature mismatch
Genre and tempo don’t capture vibe.
“Indie rock, 120 BPM, high energy” could describe:
- Arcade Fire - “Ready to Start” (anthemic, orchestral)
- MGMT - “Electric Feel” (psychedelic, synth-heavy)
- Yeah Yeah Yeahs - “Heads Will Roll” (post-punk, danceable)
All have the same metadata but completely different vibes.
Problem 3: No taste learning
Content-based filtering doesn’t learn user taste. It only matches features.
If a user loves Radiohead but hates Muse (both tagged “alternative rock”), content-based can’t tell the difference. It recommends both.
Part 1: Traditional Hybrid Approach
The traditional approach combines collaborative and content-based models as separate systems, then blends their outputs with a weighted average.
Architecture
```
User history ────► ALS model ────────► collaborative score ──┐
                                                             ├─► weighted average = final_score
Track metadata ──► TF-IDF features ──► content score ────────┘

→ Rank by final_score → Top 20 recommendations
```
Implementation
Step 1: Train collaborative filtering model
```python
# train_collaborative.py
from implicit.als import AlternatingLeastSquares
import scipy.sparse as sp

# Build user-item interaction matrix
# rows = users, cols = tracks, values = play count
interaction_matrix = sp.csr_matrix(...)

# Train ALS model
model = AlternatingLeastSquares(factors=64, iterations=15)
model.fit(interaction_matrix)

# Save model
model.save('als_model.pkl')
```
Step 2: Build content-based features
```python
# build_content_features.py
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

# Load track metadata
tracks = pd.read_csv('tracks.csv')
# Columns: track_id, title, artist, genre, tempo, energy, valence

# Create text features from metadata
tracks['content_text'] = (
    tracks['genre'] + ' ' +
    tracks['artist'] + ' ' +
    'tempo_' + tracks['tempo'].astype(str) + ' ' +
    'energy_' + (tracks['energy'] * 10).astype(int).astype(str)
)

# TF-IDF vectorization
vectorizer = TfidfVectorizer(max_features=500)
content_vectors = vectorizer.fit_transform(tracks['content_text'])
```
Step 3: Generate recommendations by blending scores
```python
# hybrid_recommender.py
# Assumes als_model, interaction_matrix, content_vectors, and
# get_user_recent_tracks are defined as in the previous steps.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def get_hybrid_recommendations(user_id, n=20):
    # Get collaborative filtering candidates (implicit >= 0.5 returns id/score arrays)
    ids, scores = als_model.recommend(
        user_id, interaction_matrix[user_id], N=100, filter_already_liked_items=True
    )
    collab_dict = dict(zip(ids, scores))

    # Get content-based candidates (based on user's recent listens)
    user_recent_tracks = get_user_recent_tracks(user_id, n=10)
    user_content_vector = np.asarray(content_vectors[user_recent_tracks].mean(axis=0))

    # Cosine similarity of every track against the user's content profile
    sims = cosine_similarity(user_content_vector, content_vectors)[0]
    content_scores = {
        track_id: sim
        for track_id, sim in enumerate(sims)
        if track_id not in user_recent_tracks  # Skip already listened
    }
    # Keep the top 100 content candidates
    content_dict = dict(sorted(content_scores.items(), key=lambda x: x[1], reverse=True)[:100])

    # Merge and blend scores
    blended_scores = {}
    for track_id in set(collab_dict) | set(content_dict):
        collab_score = collab_dict.get(track_id, 0)
        content_score = content_dict.get(track_id, 0)
        # Weighted blend: 70% collaborative, 30% content
        blended_scores[track_id] = 0.7 * collab_score + 0.3 * content_score

    return sorted(blended_scores.items(), key=lambda x: x[1], reverse=True)[:n]
```
Problems with This Approach
| Problem | Why It Hurts |
|---|---|
| Two separate models to train and maintain | ALS for collaborative, TF-IDF for content — double the infrastructure |
| Manual feature engineering | You create “tempo_85 energy_8” text features by hand |
| Crude blending | Weighted average doesn’t adapt to context (new releases vs. catalogue) |
| No unified scoring | Can’t easily add a third signal (popularity, recency, user context) |
| Scaling issues | Content-based cosine similarity is O(n²) for n tracks |
Part 2: The Shaped Way — ELSA + AI Enrichment
Shaped unifies collaborative filtering (ELSA) and content understanding (AI enrichment) in a single engine, with flexible score blending in ShapedQL.
Architecture
```
Track metadata ───► AI enrichment ──► content embedding index
                    (LLM extracts genre, mood, vibe, instruments;
                     embedded with text-embedding-3-small)
Listening history ► ELSA ───────────► collaborative embedding index
```
Key difference: Content features are extracted automatically by LLMs (not manual), and blending happens in the query (not in application code).
Implementation
Step 1: Create AI enrichment view
```yaml
# views/track_enrichment.yaml
version: v2
name: track_enrichment
view_type: AI_ENRICHMENT
source_table: tracks
enrichment:
  prompt: |
    Given this music track:
    Title: {title}
    Artist: {artist}
    Album: {album}

    Extract the following as JSON:
    {
      "genre": "primary genre (e.g., indie rock, electronic, hip-hop)",
      "subgenre": "more specific subgenre",
      "mood": "overall mood (e.g., melancholic, energetic, chill)",
      "vibe": "vibe or feeling (e.g., dreamy, aggressive, uplifting)",
      "instruments": "primary instruments (e.g., guitar, synth, drums)",
      "era": "musical era or decade influence"
    }
  output_columns:
    - name: genre
      type: STRING
    - name: subgenre
      type: STRING
    - name: mood
      type: STRING
    - name: vibe
      type: STRING
    - name: instruments
      type: STRING
    - name: era
      type: STRING
```
This view runs an LLM over each track and extracts structured content features automatically.
Example enrichment output:
| track_id | title | artist | genre | mood | vibe | instruments |
|---|---|---|---|---|---|---|
| track_1 | Do I Wanna Know? | Arctic Monkeys | indie rock | brooding | dark, sultry | guitar, bass, drums |
| track_2 | Electric Feel | MGMT | psychedelic pop | euphoric | trippy, groovy | synth, bass, vocals |
| track_3 | Feel It Still | Portugal. The Man | indie pop | upbeat | retro, funky | guitar, bass, keys |
Step 2: Configure engine with ELSA + content embeddings
```yaml
# engines/music_recommendations.yaml
version: v2
name: music_recommendations
data:
  item_table:
    name: tracks
  interaction_table:
    name: user_listens
  views:
    - name: track_enrichment  # AI-enriched content
index:
  # Collaborative filtering: ELSA embeddings
  - name: elsa_embedding
    encoder:
      name: elsa
      type: trained
    columns:
      - user_id
      - track_id
      - listen_count
  # Content-based: Text embeddings on enriched features
  - name: content_embedding
    encoder:
      name: text-embedding-3-small
      provider: openai
    columns:
      - track_enrichment.genre
      - track_enrichment.mood
      - track_enrichment.vibe
      - track_enrichment.instruments
training:
  models:
    - name: collaborative_ranker
      policy_type: elsa
      ranking_expression: |
        1.0 / (1.0 + rank(embedding="elsa_embedding"))
```
Shaped trains:
- ELSA on user listening history (collaborative filtering)
- Text embeddings on AI-enriched content features (genre, mood, vibe)
Both are stored in the same engine.
Step 3: Query with hybrid score ensemble
```python
# app.py
import requests

SHAPED_API_KEY = "your-api-key"

def get_discover_weekly(user_id: str, limit: int = 20):
    """
    Hybrid recommendations: blend ELSA (collaborative) + content similarity
    """
    response = requests.post(
        "https://api.shaped.ai/v2/engines/music_recommendations/query",
        headers={"x-api-key": SHAPED_API_KEY},
        json={
            "query": """
                SELECT *
                FROM similarity(
                    embedding_ref='elsa_embedding',
                    encoder='interaction_pooling',
                    limit=500
                )
                WHERE track_id NOT IN $already_listened
                ORDER BY score(
                    expression='
                        0.7 / (1.0 + rank(embedding="elsa_embedding"))
                        + 0.3 / (1.0 + rank(embedding="content_embedding"))
                    '
                )
                LIMIT $limit
            """,
            "parameters": {
                "user_id": user_id,
                "already_listened": get_user_history(user_id),
                "limit": limit
            },
            "return_metadata": True
        }
    )
    return response.json()['results']
```
What’s happening:
- Retrieve: `similarity(embedding_ref='elsa_embedding')` retrieves 500 candidates based on collaborative filtering (ELSA)
- Filter: `WHERE track_id NOT IN $already_listened` removes tracks the user already heard
- Score: blend ELSA rank (70%) + content similarity rank (30%)
- Return: top 20 tracks
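The score expression is a reciprocal-rank blend. A plain-Python equivalent (illustrative, not Shaped's implementation; ranks assumed 0-based, with 0 = best) shows why it rewards tracks that excel on either signal:

```python
def blended_score(elsa_rank, content_rank, w_collab=0.7, w_content=0.3):
    """Reciprocal-rank blend mirroring the ShapedQL expression:
    w_collab / (1 + collab_rank) + w_content / (1 + content_rank)."""
    return w_collab / (1.0 + elsa_rank) + w_content / (1.0 + content_rank)

# The top collaborative match with mediocre content similarity...
print(blended_score(elsa_rank=0, content_rank=49))  # → ~0.706
# ...comfortably beats a track that is merely okay on both signals
print(blended_score(elsa_rank=9, content_rank=9))   # → ~0.1
```

Rank-based blending also sidesteps scale mismatch: raw ELSA scores and cosine similarities live on different scales, but ranks are always comparable.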
Handling Cold Start
For new releases with zero listening history, adjust the blend:
```python
# app.py
def get_new_release_recommendations(user_id: str, limit: int = 20):
    """
    For new releases: prioritize content similarity (ELSA has no data)
    """
    response = requests.post(
        "https://api.shaped.ai/v2/engines/music_recommendations/query",
        headers={"x-api-key": SHAPED_API_KEY},
        json={
            "query": """
                SELECT *
                FROM similarity(
                    embedding_ref='content_embedding',
                    encoder='interaction_pooling',
                    limit=500
                )
                WHERE days_since_release <= 7
                  AND track_id NOT IN $already_listened
                ORDER BY score(
                    expression='
                        0.2 / (1.0 + rank(embedding="elsa_embedding"))
                        + 0.8 / (1.0 + rank(embedding="content_embedding"))
                    '
                )
                LIMIT $limit
            """,
            "parameters": {
                "user_id": user_id,
                "already_listened": get_user_history(user_id),
                "limit": limit
            }
        }
    )
    return response.json()['results']
```
For new releases:
- Retrieve via content_embedding (not ELSA, since new tracks have no collaborative data)
- Blend: 20% ELSA + 80% content (rely more on content features)
Building the System
Full Workflow
1. Ingest data
```yaml
# tables/tracks.yaml
version: v2
name: tracks
connector:
  type: postgres
  connection_string: $DATABASE_URL
  table: tracks
schema:
  - name: track_id
    type: STRING
  - name: title
    type: STRING
  - name: artist
    type: STRING
  - name: album
    type: STRING
  - name: release_date
    type: TIMESTAMP
```

```yaml
# tables/user_listens.yaml
version: v2
name: user_listens
connector:
  type: postgres
  connection_string: $DATABASE_URL
  table: user_listens
schema:
  - name: user_id
    type: STRING
  - name: track_id
    type: STRING
  - name: listen_count
    type: INTEGER
  - name: last_listened_at
    type: TIMESTAMP
```
2. Create AI enrichment view
```shell
# terminal
shaped create-view --file views/track_enrichment.yaml
```
Shaped runs the LLM enrichment and materializes results.
3. Train engine
```shell
# terminal
shaped create-engine --file engines/music_recommendations.yaml
```
Shaped:
- Trains ELSA on `user_listens` (collaborative filtering)
- Generates text embeddings on enriched content from `track_enrichment`
- Builds vector indexes for both
4. Query for recommendations
```python
# app.py
recommendations = get_discover_weekly(user_id="user_12345", limit=20)
```
Score Ensemble Strategies
You’re not limited to 70/30 blends. ShapedQL supports flexible scoring expressions.
Strategy 1: Adaptive blending by track age
```sql
-- adaptive_blend.sql
ORDER BY score(
    expression='
        CASE
            WHEN days_since_release < 7 THEN
                -- New releases: prioritize content
                0.3 / (1.0 + rank(embedding="elsa_embedding"))
                + 0.7 / (1.0 + rank(embedding="content_embedding"))
            ELSE
                -- Older tracks: prioritize collaborative
                0.7 / (1.0 + rank(embedding="elsa_embedding"))
                + 0.3 / (1.0 + rank(embedding="content_embedding"))
        END
    '
)
```
Strategy 2: Boost by popularity for mainstream users
```sql
-- popularity_boost.sql
ORDER BY score(
    expression='
        0.6 / (1.0 + rank(embedding="elsa_embedding"))
        + 0.3 / (1.0 + rank(embedding="content_embedding"))
        + 0.1 / (1.0 + item._derived_popular_rank)
    '
)
```
Strategy 3: Penalize over-represented genres
-- genre_diversity.sql
ORDER BY score(
expression='
(0.7 / (1.0 + rank(embedding="elsa_embedding"))
+ 0.3 / (1.0 + rank(embedding="content_embedding")))
* (1.0 - 0.2 * user_genre_saturation)
'
)
Where user_genre_saturation is a computed feature: “what % of this user’s recent listens are in the same genre as this track?”
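As a sketch of how such a saturation feature could be computed upstream (a hypothetical helper, not part of Shaped's API):

```python
from collections import Counter

def genre_saturation(recent_genres, candidate_genre):
    """Fraction of the user's recent listens that share the candidate
    track's genre. 0.0 = fresh genre, 1.0 = fully saturated."""
    if not recent_genres:
        return 0.0
    return Counter(recent_genres)[candidate_genre] / len(recent_genres)

recent = ["indie rock"] * 8 + ["electronic"] * 2
print(genre_saturation(recent, "indie rock"))  # → 0.8
print(genre_saturation(recent, "hip-hop"))     # → 0.0
```

With the 0.2 multiplier in the score expression, a fully saturated genre loses at most 20% of its blended score: enough to promote variety without burying strong matches.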
Comparison: Traditional vs. Shaped
| Component | Traditional Hybrid | Shaped Hybrid |
|---|---|---|
| Collaborative model | ALS / Matrix Factorization (separate system) | ELSA (built-in, trained automatically) |
| Content features | Manual feature engineering (TF-IDF on “genre tempo energy”) | AI enrichment (LLM extracts genre, mood, vibe) |
| Content model | Cosine similarity on TF-IDF vectors | Text embeddings (OpenAI, Cohere, etc.) |
| Blending logic | Hardcoded in application (`0.7 * a + 0.3 * b`) | ShapedQL score expressions (flexible, query-time) |
| Cold start handling | Separate code path for new items | Adjust blend weights in query (`CASE WHEN days_since_release < 7`) |
| Adaptive scoring | Requires application logic rewrite | Change score expression in query |
| Code to maintain | ~500 lines (model training + feature engineering + blending) | ~50 lines (YAML config + query) |
| Scalability | Manual sharding, caching, model serving | Automatic (Shaped handles indexing, serving, scaling) |
FAQ
Q: Why not just use collaborative filtering (ELSA)?
A: ELSA fails for new releases (no listening history) and doesn’t capture content similarity (can’t recommend based on mood/vibe). Pure collaborative also creates echo chambers—users only see more of what they already like.
Q: Why not just use content-based filtering?
A: Content features (genre, tempo) are too crude. Many tracks have the same genre but completely different vibes. Content-based also doesn’t learn user taste—it just matches features. You miss the collaborative signal: “users who liked X also liked Y.”
Q: How does AI enrichment help?
A: LLMs extract nuanced features (mood, vibe, instrumentation) that aren’t in your metadata. Instead of “indie rock”, you get “brooding, guitar-driven, dark, sultry.” These richer features power better content-based recommendations.
Q: What if I don’t have track metadata?
A: Use audio features (if available): tempo, energy, danceability, valence. Or use multimodal AI enrichment on album art. Or rely purely on ELSA (collaborative) and accept that new releases won’t be recommended well until they accumulate listening history.
Q: How do I tune the blend weights (70% ELSA, 30% content)?
A: Run A/B tests. Try 60/40, 70/30, 80/20 and measure offline metrics (NDCG@10) and online metrics (listen-through rate, skip rate). The optimal blend depends on your data and user behavior.
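For the offline half, a minimal NDCG@10 sweep might look like this (the `score_with_blend` ranking function and `holdout` data are hypothetical placeholders for your own evaluation harness):

```python
import numpy as np

def ndcg_at_k(ranked_relevance, k=10):
    """NDCG@k for one user, given binary relevance of the ranked list
    (1 = the user actually listened to this track in the holdout window)."""
    rel = np.asarray(ranked_relevance[:k], dtype=float)
    discounts = np.log2(np.arange(2, len(rel) + 2))
    dcg = (rel / discounts).sum()
    idcg = (np.sort(rel)[::-1] / discounts).sum()
    return dcg / idcg if idcg > 0 else 0.0

# Sweep blend weights offline, then A/B test the best candidates online:
# for w in (0.6, 0.7, 0.8):
#     ndcgs = [ndcg_at_k(score_with_blend(user, w_collab=w)) for user in holdout]
#     print(w, np.mean(ndcgs))

print(ndcg_at_k([1, 1, 0, 0]))  # → 1.0 (relevant tracks ranked first)
print(ndcg_at_k([0, 0, 1, 1]))  # → ~0.57 (relevant tracks ranked last)
```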
Q: Can I blend more than 2 signals?
A: Yes. ShapedQL supports arbitrary scoring expressions. You can blend ELSA + content + popularity + recency + user context:
```sql
-- multi_signal_blend.sql
ORDER BY score(
    expression='
        0.5 / (1.0 + rank(embedding="elsa_embedding"))
        + 0.2 / (1.0 + rank(embedding="content_embedding"))
        + 0.1 / (1.0 + item._derived_popular_rank)
        + 0.1 / (1.0 + days_since_release)
        + 0.1 * user_affinity_score
    '
)
```
Q: What about Spotify’s actual implementation?
A: Spotify uses a combination of collaborative filtering, audio feature analysis (via their own audio models), NLP on playlist names and descriptions, and user context (time of day, device, activity). They don’t publish exact details, but the principles are the same: hybrid filtering with multiple signals.
Conclusion
Discover Weekly works because it blends collaborative filtering (learning taste from listening history) with content understanding (genre, mood, vibe extracted from metadata).
Pure collaborative filtering fails for new releases—no listening history means no recommendations. Pure content-based filtering is too obvious—matching genres doesn’t capture the subtle taste patterns that make recommendations magical.
Hybrid filtering solves both: use ELSA for collaborative signals, AI enrichment for content features, and ShapedQL score ensembles to blend them adaptively.
The traditional approach requires two separate models (ALS + TF-IDF), manual feature engineering, and hardcoded blending logic. Shaped unifies everything: ELSA for collaborative filtering, AI enrichment for automatic content extraction, and flexible score expressions in ShapedQL for adaptive blending—all in one engine.
If you’re building personalized recommendations and need to solve cold start, you need hybrid filtering.
Ready to build your own Discover Weekly? Sign up for Shaped and get $100 in free credits. Visit console.shaped.ai/register to get started.
Want us to walk you through it?
Book a 30-min session with an engineer who can apply this to your specific stack.