A. ŠARAS
Unilever × RSM master thesis · 2024

Forecasting SKU sales value share with machine learning.

Written inside Unilever, graded by RSM: can ensemble ML out-forecast classical time-series methods on real FMCG data?

What it is

Consumer goods manufacturers live and die by their ability to forecast SKU sales value share: it drives product portfolios and category management. Most forecasting systems still rely on statistical time-series methods, which struggle with complex effects.

This thesis, written during my Data Analytics placement at Unilever, tests whether ensemble machine learning does better. Random Forest and Gradient Boosting Regressor go head to head with Prophet on a 12-week forecast horizon, with three twists: SKU descriptions transformed into text embeddings to capture latent features, a Targeted Random Forest using LASSO to select predictors, and SHAP values to open the black box.

The verdict: Gradient Boosting wins on predictive performance, the Targeted Random Forest gets close at a fraction of the computational cost, and SHAP makes the results explainable enough for a marketing planner to trust.
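To make the Targeted Random Forest idea concrete, here is a minimal sketch of the two-step pattern described above: LASSO picks the predictors, then a Random Forest is trained only on those. It is not the thesis code. It assumes scikit-learn, and the synthetic data, the variable names, the hold-out size and the hyperparameters are all hypothetical stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 300 time-ordered rows and 40 candidate
# predictors, of which only the first three actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=300)

# Keep the most recent rows as a hold-out set (no shuffling, as with time series).
X_train, X_test = X[:240], X[240:]
y_train, y_test = y[:240], y[240:]

# Step 1: LASSO on standardised predictors. Predictors whose coefficient
# is shrunk to exactly zero are dropped.
scaler = StandardScaler().fit(X_train)
lasso = LassoCV(cv=TimeSeriesSplit(n_splits=5)).fit(scaler.transform(X_train), y_train)
selected = np.flatnonzero(lasso.coef_)

# Step 2: Random Forest trained only on the LASSO-selected predictors.
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_train[:, selected], y_train)
pred = forest.predict(X_test[:, selected])

mae = float(np.mean(np.abs(pred - y_test)))
print(f"kept {selected.size} of {X.shape[1]} predictors, hold-out MAE = {mae:.3f}")
```

Fewer predictors mean fewer candidate splits per tree, which is one plausible source of the computational saving mentioned above. How much is saved on real data depends on how aggressively LASSO prunes.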

The work

Concepts

Gradient Boosting · Random Forest · Prophet · Text embeddings · LASSO · SHAP · Time-series

What it taught me

The best model is not the most accurate one. It is the most accurate one whose reasoning a planner can inspect and challenge.

Links

Read the thesis (PDF)