2024 · Unilever × RSM master thesis

Forecasting SKU sales value share with machine learning.

Written inside Unilever, graded by RSM: can ensemble ML out-forecast classical time-series methods on real FMCG data?

What it is

Consumer goods manufacturers live and die by their ability to forecast SKU sales value share, because that forecast drives product portfolio and category management decisions. Most forecasting systems still rely on statistical time-series methods, which struggle to capture complex effects.

This thesis, written during my Data Analytics placement at Unilever, tests whether ensemble machine learning does better. Random Forest and Gradient Boosting Regressor go head to head with Prophet on a 12-week forecast horizon, with three twists: SKU descriptions transformed into text embeddings to capture latent features, a Targeted Random Forest using LASSO to select predictors, and SHAP values to open the black box.

The verdict: Gradient Boosting wins on predictive performance, the Targeted Random Forest gets close at a fraction of the computational cost, and SHAP makes the results explainable enough for a marketing planner to trust.

The work

12-week SKU sales value share forecasts on real Unilever point-of-sale data
Benchmarked Random Forest and Gradient Boosting Regressor against Prophet
Turned SKU text descriptions into embeddings to capture latent product features
Built a Targeted Random Forest with LASSO regularisation to cut computational complexity
Applied SHAP values for global and per-prediction explainability
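
To make the Targeted Random Forest idea concrete, here is a minimal sketch, not the thesis code. It assumes scikit-learn, uses synthetic data in place of the Unilever point-of-sale data, and the function name fit_targeted_random_forest and all hyperparameters are hypothetical choices for illustration. LASSO first picks the predictors with non-zero coefficients, then a Random Forest is trained on that reduced set only, which is where the computational saving comes from.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler


def fit_targeted_random_forest(X_train, y_train, random_state=0):
    """Select predictors with LASSO, then fit a Random Forest on them only.

    Hypothetical helper for illustration; not the implementation used in the thesis.
    """
    # LASSO is scale-sensitive, so standardise the predictors for the selection step.
    X_scaled = StandardScaler().fit_transform(X_train)
    lasso = LassoCV(cv=5).fit(X_scaled, y_train)

    # Keep only the predictors whose LASSO coefficient is non-zero.
    selected = np.flatnonzero(lasso.coef_ != 0)
    if selected.size == 0:
        # Fall back to all predictors if LASSO shrinks everything to zero.
        selected = np.arange(X_train.shape[1])

    # Trees do not need scaling, so the forest sees the raw selected columns.
    forest = RandomForestRegressor(n_estimators=200, random_state=random_state)
    forest.fit(X_train[:, selected], y_train)
    return selected, forest


# Synthetic stand-in data: 30 candidate predictors, only the first 3 matter.
rng = np.random.default_rng(0)
n_rows, n_features = 400, 30
X = rng.normal(size=(n_rows, n_features))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + X[:, 2] + rng.normal(scale=0.1, size=n_rows)

# Hold out the last 12 rows, mimicking a 12-week forecast horizon (an assumption).
horizon = 12
X_train, X_test = X[:-horizon], X[-horizon:]
y_train, y_test = y[:-horizon], y[-horizon:]

selected, forest = fit_targeted_random_forest(X_train, y_train)
predictions = forest.predict(X_test[:, selected])
mae = float(np.mean(np.abs(predictions - y_test)))
print(f"kept {selected.size} of {n_features} predictors, hold-out MAE = {mae:.3f}")
```

On real data the selected set would be whatever LASSO keeps among lagged sales, promotions, embedding dimensions and so on; the synthetic example only shows the two-stage mechanics.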

Concepts

Gradient Boosting · Random Forest · Prophet · Text embeddings · LASSO · SHAP · Time-series

What it taught me

The best model is not the most accurate one. It is the most accurate one whose reasoning a planner can inspect and challenge.

Links

Read the thesis (PDF)