What is Feature Engineering in ML?

Feature engineering is the process of turning raw data into inputs a machine learning model can learn from. Here's what it means in practice, with examples.

May 27, 2026

Dipro Bhowmik

What is Feature Engineering?

Machine learning models only understand numbers. They can’t read a product name, a timestamp, or a user’s browsing history (at least not directly). Feature engineering is the process of converting that raw data into numeric inputs a model can learn from.
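As a rough illustration of that conversion, the sketch below turns a category name and a timestamp into numbers using only the Python standard library. The record layout, the field names, and the fixed category vocabulary are assumptions made up for this example, not part of any particular system.

```python
from datetime import datetime

# Hypothetical raw record; the field names are illustrative only.
raw = {"category": "shoes", "added_at": "2026-05-20T14:30:00"}

# Assumed fixed vocabulary of known categories.
CATEGORIES = ["books", "electronics", "shoes"]


def one_hot(value, vocabulary):
    """Return a 0/1 vector with a 1 at the position of `value`."""
    return [1 if value == v else 0 for v in vocabulary]


def timestamp_features(iso_string):
    """Split an ISO 8601 timestamp into simple numeric parts."""
    ts = datetime.fromisoformat(iso_string)
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),  # Monday is 0, Sunday is 6
        "is_weekend": int(ts.weekday() >= 5),
    }


# Concatenate both groups into one numeric feature vector.
time_parts = timestamp_features(raw["added_at"])
features = one_hot(raw["category"], CATEGORIES) + list(time_parts.values())
```

One-hot encoding is only one way to represent a category; it is used here because it is the simplest to read.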

It’s one of the most important parts of building any machine learning system. Well-engineered features paired with a simple model like LightGBM will often outperform a sophisticated architecture fed mediocre features.

What is a feature in machine learning and data science?

A feature is any measurable property you use as input to a model.

Features can be simple — a raw column value like price or age — or derived, like the ratio of views to purchases, or the cosine similarity between two embedding vectors.
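The two derived features mentioned above can be sketched in a few lines of plain Python. The function names and the sample numbers are hypothetical, and a production system would more likely compute these with a vectorized library.

```python
import math


def safe_ratio(numerator, denominator, default=0.0):
    """Divide two counts, falling back to `default` when the denominator is 0."""
    return numerator / denominator if denominator else default


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Illustrative values only.
views_per_purchase = safe_ratio(120, 8)
similarity = cosine_similarity([1.0, 0.0, 1.0], [1.0, 1.0, 0.0])
```

Returning a default for a zero denominator or a zero vector is a design choice; depending on the model, a missing-value marker may be more appropriate.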

For a recommendation system, features might include things like how many times a user has purchased from a category, how recently an item was added to the catalog, or how similar two items are based on their descriptions.
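A minimal sketch of those three recommendation features follows. The data layout and function names are assumptions for illustration. Token-level Jaccard overlap stands in for description similarity purely because it needs no dependencies; embeddings are the more common choice in practice.

```python
from collections import Counter
from datetime import date

# Hypothetical purchase log of (user_id, category) pairs.
purchases = [("u1", "shoes"), ("u1", "shoes"), ("u1", "books"), ("u2", "books")]


def category_purchase_counts(purchase_log, user_id):
    """Count how many times one user has purchased from each category."""
    return Counter(cat for uid, cat in purchase_log if uid == user_id)


def days_since_added(added_on, today):
    """Number of whole days since an item entered the catalog."""
    return (today - added_on).days


def jaccard_similarity(desc_a, desc_b):
    """Share of distinct lowercase tokens that two descriptions have in common."""
    tokens_a = set(desc_a.lower().split())
    tokens_b = set(desc_b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


u1_counts = category_purchase_counts(purchases, "u1")
item_age = days_since_added(date(2026, 5, 20), date(2026, 5, 27))
desc_sim = jaccard_similarity("red running shoes", "blue running shoes")
```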

What does feature engineering actually involve?

In practice, feature engineering covers a range of transformations, often composed as a pipeline:

Aggregations: windowed counts, sums, or averages grouped by a key (e.g. purchases per user in the last 30 days)
Mathematical transforms: log or square-root transforms to compress skewed distributions, and ratios to relate two quantities
Bucketing: converting a continuous value like age into discrete bands
Time-based features: days since last purchase, or whether an item was added in the last 7 days
Location-based features: distance between item and user, country
Embeddings: representing items or users as dense vectors, then computing similarity between them
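Several of the transformations listed above can be sketched with the standard library alone. Everything named here (the function names, the age bands, the 30-day and 7-day defaults) is a hypothetical choice for illustration. The distance uses the haversine formula with an assumed mean Earth radius of 6371 km.

```python
import math
from datetime import date, timedelta


def purchases_in_window(purchase_dates, today, days=30):
    """Aggregation: count purchases in the `days` days up to and including today."""
    cutoff = today - timedelta(days=days)
    return sum(1 for d in purchase_dates if cutoff < d <= today)


def log_transform(value):
    """Mathematical transform: log(1 + x), which is defined at x = 0."""
    return math.log1p(value)


# Assumed bands: (exclusive upper bound, label).
AGE_BANDS = [(18, "under_18"), (35, "18_34"), (55, "35_54")]


def age_bucket(age):
    """Bucketing: map a continuous age onto a discrete band label."""
    for upper, label in AGE_BANDS:
        if age < upper:
            return label
    return "55_plus"


def is_new_item(added_on, today, days=7):
    """Time-based feature: was the item added within the last `days` days?"""
    return 0 <= (today - added_on).days <= days


def haversine_km(lat1, lon1, lat2, lon2):
    """Location-based feature: great-circle distance in km between two points."""
    radius_km = 6371.0  # assumed mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


today = date(2026, 5, 27)
recent = purchases_in_window([date(2026, 5, 1), date(2026, 4, 20), date(2026, 5, 26)], today)
```

In a real pipeline each function would typically be applied per row or per key, and its output stored alongside the other features for that user or item.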

The right approach depends on what the feature represents and how a model is likely to use it.

Why does feature engineering matter?

No amount of model complexity can compensate for bad features. If your inputs don’t reflect the relationships that actually drive user behaviour, the model has nothing meaningful to learn from.

Conversely, a simple model with well-engineered features will often outperform a complex one with poorly defined inputs. This is especially true in recommendation systems, where the signal-to-noise ratio of your features directly affects the quality of what gets surfaced to users.

Feature engineering also has an operational cost, since every transformation is something to maintain and monitor in production.

Features vs. signals

In recommendation systems, it’s useful to distinguish between features and signals. A feature is any numeric input. A signal is a feature that describes a meaningful real-world relationship — trendiness, recency, similarity to past behaviour. Not all features are signals, and building a system around signals rather than arbitrary features tends to produce more interpretable, maintainable results.
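To make the distinction concrete, here is one hypothetical way to turn two raw view counts (features) into a trendiness score (a signal). The definition, the 7-day and 28-day windows, and the function name are all assumptions for this sketch, not a standard formula.

```python
def trendiness(views_last_7_days, views_last_28_days):
    """Recent daily view rate relative to the 28-day daily rate.

    Values above 1.0 suggest the item is gaining attention and values
    below 1.0 suggest it is cooling off. Hypothetical definition.
    """
    recent_rate = views_last_7_days / 7
    baseline_rate = views_last_28_days / 28
    if baseline_rate == 0:
        return 0.0  # no history to compare against
    return recent_rate / baseline_rate


rising = trendiness(70, 140)  # 10 views/day recently vs 5 views/day baseline
steady = trendiness(35, 140)  # 5 views/day in both windows
```

The raw counts are still valid features on their own; the ratio is easier to interpret because it answers a specific question about the item.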