How I built a movie suggestion app with zero ML experience

In the last week, I've been building an app to generate real-time movie recommendations based on a user's activity. I chose this simple use case to learn more about how recommender systems work. 

Shaped makes building recommender systems like this easy for developers because it packages a complex and performant recommender system into three simple layers: 

  • A data layer to store your data or connect to a real-time source
  • An ML layer that indexes your data and supports the latest recommender models and architectures
  • An API layer to interface with client applications and power real-time recommendations 

In this article, I will show you how I used Shaped to build a movie recommendation system. Click here to check out my final demo application.

Upload your datasets

Any machine learning system is only as good as the data it is trained on. 

For data, I started with MovieLens, a public dataset that is well known in the machine learning community. It contains 100,000 ratings of 9,000 movies, released between the early 1900s and 2018. 

My suggestion system is built on two data sources from MovieLens: 

- Movies: a catalog of 9000 movies

- Ratings: a list of user-generated ratings

The first step was to load this data into Shaped. This was relatively easy: MovieLens data is very clean, so the only step was to convert the data files to JSONL format. 
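
A conversion like this can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the exact script used for this project: it assumes the standard MovieLens movies.csv layout (movieId, title, and a pipe-separated genres column), and the function name csv_to_jsonl is hypothetical.

```python
import csv
import io
import json


def csv_to_jsonl(csv_file, jsonl_file, list_columns=()):
    """Write each CSV row as one JSON object per line and return the row count.

    Columns named in list_columns are split on "|" into lists
    (assumption: MovieLens stores genres as "Adventure|Animation").
    """
    count = 0
    for row in csv.DictReader(csv_file):
        for column in list_columns:
            row[column] = row[column].split("|") if row[column] else []
        jsonl_file.write(json.dumps(row) + "\n")
        count += 1
    return count


# Usage with in-memory files; for real data, pass
# open("movies.csv", newline="") and open("movies.jsonl", "w") instead.
source = io.StringIO("movieId,title,genres\n1,Toy Story (1995),Adventure|Animation\n")
target = io.StringIO()
rows_written = csv_to_jsonl(source, target, list_columns=("genres",))
```

Splitting genres into a list is a design choice that keeps the column usable as an array later on; all other values stay as the strings that the CSV reader produces.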

Shaped also supports automated import from systems like Postgres, MySQL, S3, Apache, and more. 

Enriching my dataset with semantic information

To give my model more to work with, I wrote a small Python script to get metadata from the IMDb API. This enrichment step is crucial to enable semantic search on my dataset. 

I added columns for description, cast, writers, etc., so my model can respond to searches like “Movies written by Paul Thomas Anderson”. 

script.py
import json

import requests

# Placeholders (not part of the original script): point these at your IMDb API provider
API_URL_TEMPLATE = "https://example.com/titles/{imdb_id}"
headers = {"Accept": "application/json"}


def join_names(people):
    """Join the fullName of each person into one comma-separated string."""
    return ','.join(person.get('fullName', '') for person in people or [])


# Load movies from JSONL file
movies = []
with open('movies.jsonl', 'r') as f:
    for line in f:
        movies.append(json.loads(line))

# Process each movie with API enrichment
for movie in movies:
    imdb_id = movie.get('imdbId')
    url = API_URL_TEMPLATE.format(imdb_id=imdb_id)

    try:
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        result = response.json()
    except (requests.RequestException, ValueError) as error:
        # Keep the original record if the lookup fails
        print(f"Skipping {imdb_id}: {error}")
        continue

    # Update movie with enriched data
    movie.update({
        "description": result.get("description"),
        "interests": result.get("interests"),
        "release_date": result.get("releaseDate"),
        "directors": join_names(result.get("directors")),
        "cast": join_names(result.get("cast")),
        "writers": join_names(result.get("writers")),
    })

# Save enriched movies to JSONL file
with open('enriched_movies.jsonl', 'w') as f:
    for movie in movies:
        f.write(json.dumps(movie) + '\n')

The full enrichment script is in /model/scripts/enrich-movies.py

Defining my model

After my data was loaded, it was time to configure my model. Shaped makes it very easy to set up your first model: just upload a YAML file. 

There are three config components to know: connectors, fetch, and model.

  • connectors: Defines which datasets are connected to my model. 
  • fetch: Defines the SQL that Shaped uses to get my training data. For this model, I configured an items table (movies) and an events table (user behaviour like ratings and clicks). 
  • model: Declares how the model actually scores and ranks items. It exposes two important fields:
    • policy_config: Defines the ranking algorithm and how the model learns
    • inference_config: Tweaks how the model serves results at runtime (inputs, retrieval methods, diversity, etc.)

I'll save the details for another blog post, but here's the full model config for your reference: model.yaml

Building the frontend

Since I'm creating this demo from scratch, I spent some time building a Next.js app to showcase the model. 

I built some generic components to start: 

  • Carousel to show a category of movies
  • Search bar
  • Detail card that opens when you click on a movie
  • Similar movies 

I also wrote some logic to track which items a user clicks on. These are sent back to Shaped as new events in the “events” dataset. 

Here's what the first version looked like, with dummy data: 

Wiring my model to the Shaped API

After building my frontend and training my model, it was time to wire my app to the Shaped API. 

The benefit of using Shaped is its single-model versatility. A single model can serve multiple use cases across my app. I don’t have to train a recommendation model, a separate semantic search one, and then a third similarity model. 

As you’ll see, the same model will be used to get personalized recommendations, run semantic search, and get trending movies, similar movies that other people liked, and recommendations in a specific category. 

This dramatically reduces complexity and ensures consistent ranking logic across my application.

Feature 1: Personalized “For you” carousel

The topmost carousel should show a personalized “For you” feed of movies that the user may like. To do this, we call the Shaped /rank endpoint, which returns a personalized list of rankings based on user IDs, interactions, a text query, and anything else you want to pass it. 

For this carousel, we want rankings that are: 

  1. Conditioned on the current user’s unique ID
  2. Conditioned on any recent interactions that the model may not have been trained on, without returning those items again
  3. Enriched with item metadata (title, genre, etc.) to save a trip to the server
  4. Mixed with some less-relevant items to prompt exploration

The final call to the /rank endpoint looks like this. Notice that we include interactions, user_id, and an exploration_factor to adjust the flavour of our result set: 

foryou.js
// fetch resolves to a Response object, so parse the JSON body before using it
const response = await fetch("/models/movie_recs/rank", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": token,
  },
  body: JSON.stringify({
    return_metadata: true, // include item metadata with each result
    limit: 20,
    user_id: userId,
    interactions: stringInteractions, // recent interactions for this user
    config: {
      filter_interaction_iids: true,
      exploration_factor: 0.2,
    },
  }),
});
const forYouMovies = await response.json();

return (
  <MovieList movies={forYouMovies} />
)
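
To make requirements 2 and 4 more concrete, here is a toy Python illustration of what "filter out items the user already interacted with" and "mix in some less-relevant items" can mean for a ranked list. It is only a sketch of the idea: Shaped's actual filtering and exploration logic is not described in this post, and the helper names drop_seen and mix_in_exploration are hypothetical.

```python
import random


def drop_seen(ranked_ids, seen_ids):
    """Remove items the user has already interacted with, keeping rank order."""
    seen = set(seen_ids)
    return [item for item in ranked_ids if item not in seen]


def mix_in_exploration(ranked_ids, limit, exploration_factor, rng=None):
    """Fill most slots with top-ranked items and the rest with random tail items.

    exploration_factor is treated here as the share of slots given to
    exploration (an assumption made for this illustration only).
    """
    rng = rng or random.Random()
    n_explore = round(limit * exploration_factor)
    n_top = limit - n_explore
    top = ranked_ids[:n_top]
    tail = ranked_ids[n_top:]
    explore = rng.sample(tail, min(n_explore, len(tail)))
    return top + explore


# Usage: 100 ranked item IDs, the user has already seen items 0 and 1.
ranked = list(range(100))
unseen = drop_seen(ranked, [0, 1])
feed = mix_in_exploration(unseen, limit=20, exploration_factor=0.2, rng=random.Random(0))
```

With these toy inputs, the first 16 slots hold the best unseen items and the last 4 are drawn at random from lower in the ranking.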

Feature 2: Semantic search using the same model

As mentioned before, we’ve trained this ranking model and get semantic search for free. In this case, we use the /retrieve endpoint with a text query. This returns a set of relevant results with no personalization. This is important because a search should be agnostic to a user’s preferences.

MovieList.js
const getMovies = async (searchQuery) => {
  try {
    const response = await fetch("/models/movie_recs/retrieve", {
      method: "POST",
      headers,
      body: JSON.stringify({
        return_metadata: true,
        explain: true,
        text_query: searchQuery, // no user_id, so results are not personalized
        config: {
          exploration_factor: 0,
          diversity_factor: 0,
          diversity_attributes: [],
          limit: 50,
        },
      }),
    });

    const searchResults = await response.json();
    setMovies(searchResults?.data?.metadata || []);
  } catch (error) {
    console.error("Search request failed:", error);
  }
};

// Run a new search every time the input changes
const handleInputChange = (event) => {
  const searchQuery = event.target.value;
  setQuery(searchQuery);
  getMovies(searchQuery);
};

return (
  <div>
    <Input
      value={query}
      onChange={handleInputChange}
      placeholder="Search for movies..."
    />
    <MovieList movies={movies} />
  </div>
);
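
The same search request can also be built from a backend script. The Python sketch below mirrors the request body from the snippet above using only the standard library. The base URL is a placeholder and the function name build_search_request is hypothetical; the endpoint path, body fields, and x-api-key header are taken from the JavaScript examples in this post.

```python
import json
import urllib.request

# Placeholder base URL (assumption): the snippets in this post use relative paths.
BASE_URL = "https://api.example.com"


def build_search_request(query, api_key, limit=50):
    """Build (but do not send) a POST request for the /retrieve endpoint."""
    body = {
        "return_metadata": True,
        "explain": True,
        "text_query": query,
        # No user_id on purpose: search should not depend on who is asking.
        "config": {
            "exploration_factor": 0,
            "diversity_factor": 0,
            "diversity_attributes": [],
            "limit": limit,
        },
    }
    return urllib.request.Request(
        f"{BASE_URL}/models/movie_recs/retrieve",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


request = build_search_request("Movies written by Paul Thomas Anderson", "test-key")
# To send it: urllib.request.urlopen(request, timeout=30)
```

Building the request separately from sending it makes the payload easy to inspect and test without a network call.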

Feature 3: Powering a “People also liked…” section

We can pass the model a movie ID and it will show us similar movies. To do this, we call the /similar_items endpoint with an item_id parameter. This returns the movies that are most similar to the selected one. 

SimilarMovies.js
const response = await fetch("/models/movie_recs/similar_items", {
  method: "POST",
  headers,
  body: JSON.stringify({ item_id }),
});
// Parse the JSON body before passing it to the component
const similarMovies = await response.json();

return (
  <MovieList movies={similarMovies} />
)
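
For intuition about what "most similar" can mean, a common approach in recommender systems is to compare item embedding vectors with cosine similarity. The Python sketch below shows that general technique on made-up toy vectors. It is not a description of how Shaped computes similarity, which this post does not cover, and the item IDs and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(item_id, embeddings, k=2):
    """Return the k item IDs whose vectors are closest to item_id's vector."""
    target = embeddings[item_id]
    scored = [
        (cosine_similarity(target, vector), other_id)
        for other_id, vector in embeddings.items()
        if other_id != item_id
    ]
    scored.sort(reverse=True)
    return [other_id for _, other_id in scored[:k]]


# Toy vectors invented for this example: two "action-like" and two "kids-like" items.
toy_embeddings = {
    "action_1": [0.9, 0.1, 0.0],
    "action_2": [0.8, 0.2, 0.1],
    "kids_1": [0.0, 0.1, 0.9],
    "kids_2": [0.1, 0.0, 0.8],
}
neighbours = most_similar("action_1", toy_embeddings, k=2)
```

Here the other action-like item ranks first because its vector points in nearly the same direction as the query item's vector.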

Feature 4: Adding genre filters

Again, we can support a new use case with the same model. I can add carousels for a specific genre, with personalized recommendations based on the user’s activity. The API call is similar to the first example, but filtered to a single genre. For this, I use the /rank endpoint with a filter_predicate attribute: 

GenreFilter.js
const response = await fetch("/models/movie_recs/rank", {
  method: "POST",
  headers,
  body: JSON.stringify({
    filter_predicate: `array_has_any(genres, ['Action'])`, // only rank Action movies
    user_id: userId,
    interactions: stringInteractions,
    limit: 20,
    return_metadata: true,
  }),
});
// Parse the JSON body before passing it to the component
const actionMovies = await response.json();

return (
  <MovieList movies={actionMovies} />
)
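
If you want one carousel per genre, the predicate string can be generated instead of hard-coded. The Python sketch below builds the same array_has_any(...) string used above and includes a local check of what the predicate is presumed to do. The helper names are hypothetical, and passing more than one genre in the list is an assumption that the snippet above does not confirm.

```python
def genre_predicate(genres):
    """Build an array_has_any(...) filter string for one or more genres."""
    if not genres:
        raise ValueError("at least one genre is required")
    for genre in genres:
        if "'" in genre:
            # Escaping rules for the filter language are not covered in this
            # post, so this sketch rejects quotes instead of guessing.
            raise ValueError(f"unsupported character in genre: {genre!r}")
    quoted = ", ".join(f"'{genre}'" for genre in genres)
    return f"array_has_any(genres, [{quoted}])"


def matches_any(item_genres, wanted):
    """Local equivalent of the presumed predicate: do the two lists share a genre?"""
    return bool(set(item_genres) & set(wanted))


action_filter = genre_predicate(["Action"])
```

For a single genre this produces exactly the string from the GenreFilter.js call, so it can be dropped into the filter_predicate field as-is.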

Feature 5: Adding real-time interactions

Finally, we can make our model better over time by inserting interactions back into our events table, using the /datasets/{name}/insert endpoint:

InteractionTracking.js
// payload (event_value, movieId, timestamp, userId) is assumed to be in scope
// The handler must be async because it awaits the fetch call
const trackClick = async () => {
  await fetch("/datasets/events_table/insert", {
    method: "POST",
    headers,
    body: JSON.stringify({
      data: [
        {
          event_value: payload.event_value,
          movieId: payload.movieId,
          timestamp: payload.timestamp,
          userId: payload.userId,
        },
      ],
    }),
  });
};

return (
  <button type="button" onClick={trackClick} className="text-left w-full">
    <MovieCard />
  </button>
)
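
The payload itself is simple enough to sketch in Python, for example if events are forwarded from a backend service. The field names come from the snippet above. Everything else is an assumption made for illustration: the helper names, a default event_value of 1 for a click, and a Unix timestamp in seconds.

```python
import json
import time


def build_click_event(user_id, movie_id, event_value=1, timestamp=None):
    """Build one event row with the same fields as the insert call above."""
    if timestamp is None:
        timestamp = int(time.time())  # assumption: Unix seconds
    return {
        "event_value": event_value,
        "movieId": movie_id,
        "timestamp": timestamp,
        "userId": user_id,
    }


def build_insert_body(events):
    """Wrap event rows in the {"data": [...]} envelope used by the insert call."""
    return json.dumps({"data": list(events)})


event = build_click_event("user_1", "movie_42", timestamp=1700000000)
body = build_insert_body([event])
```

Because the envelope takes a list, several clicks can be batched into one insert request instead of sending one request per click.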

Conclusion

I built a real-time, production-ready movie recommendation system without deep machine learning expertise. 

Shaped abstracts the complex training and deployment pipeline, allowing me to go from raw data to a fully functional application quickly. I powered personalized ranking, semantic search, and item similarity using a single model and without touching any infrastructure.

If you're curious to train your own models, sign up for a 14-day free trial and test it yourself.

The full code for this project (including data and model config) is available on GitHub.
