Forecasting SKU sales value share with machine learning.
Written inside Unilever, graded by RSM: can ensemble ML out-forecast classical time-series methods on real FMCG data?
What it is
Consumer goods manufacturers live and die by their ability to forecast SKU sales value share, because it drives product portfolio and category management decisions. Most forecasting systems still rely on statistical time-series methods, which struggle with complex effects.
This thesis, written during my Data Analytics placement at Unilever, tests whether ensemble machine learning does better. Random Forest and Gradient Boosting Regressor go head to head with Prophet on a 12-week forecast horizon, with three twists: SKU descriptions transformed into text embeddings to capture latent features, a Targeted Random Forest using LASSO to select predictors, and SHAP values to open the black box.
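To make the Targeted Random Forest idea concrete, here is a minimal sketch using scikit-learn: LASSO is fitted first, predictors with non-zero coefficients are kept, and the forest is trained on that reduced set. The synthetic data, the `alpha` value and the forest settings are all assumptions for illustration; they are not the thesis data or its hyperparameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 20 candidate predictors,
# only the first two actually drive the target.
rng = np.random.default_rng(0)
n_samples, n_features = 300, 20
X = rng.normal(size=(n_samples, n_features))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=n_samples)

# Step 1: LASSO on standardised predictors; the L1 penalty shrinks
# the coefficients of weak predictors to exactly zero.
X_scaled = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=0.1).fit(X_scaled, y)  # alpha is an illustrative choice
selected = np.flatnonzero(lasso.coef_)

# Step 2: fit the Random Forest only on the predictors LASSO kept.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X[:, selected], y)

print("predictors kept:", selected.tolist())
```

Fewer input columns means fewer candidate splits per tree node, which is where the saving in training cost comes from.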
The verdict: Gradient Boosting wins on predictive performance, the Targeted Random Forest gets close at a fraction of the computational cost, and SHAP makes the results explainable enough for a marketing planner to trust.
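A comparison like this rests on a holdout evaluation over the forecast horizon. The sketch below shows one way such a 12-week holdout could be scored. The helper names, the made-up weekly share series, the naive last-value forecaster and the choice of mean absolute error are all assumptions for illustration, not the thesis's actual setup or metric.

```python
def holdout_split(series, horizon=12):
    """Keep the last `horizon` weeks aside as the test window."""
    if len(series) <= horizon:
        raise ValueError("series must be longer than the forecast horizon")
    return series[:-horizon], series[-horizon:]


def mean_absolute_error(actual, forecast):
    """Average absolute gap between actual and forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def naive_forecast(train, horizon):
    """Placeholder model: repeat the last observed value."""
    return [train[-1]] * horizon


# Made-up weekly sales value share for one hypothetical SKU (52 weeks).
weekly_share = [0.10 + 0.001 * (week % 4) for week in range(52)]

train, test = holdout_split(weekly_share, horizon=12)
forecast = naive_forecast(train, horizon=12)
mae = mean_absolute_error(test, forecast)
print(f"12-week holdout MAE: {mae:.4f}")
```

Any candidate model can be swapped in for `naive_forecast` as long as it only sees `train`, which keeps the comparison between models fair.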
The work
- 12-week SKU sales value share forecasts on real Unilever point-of-sale data
- Benchmarked Random Forest and Gradient Boosting Regressor against Prophet
- Turned SKU text descriptions into embeddings to capture latent product features
- Built a Targeted Random Forest that uses LASSO to select predictors before fitting, cutting computational cost
- Applied SHAP values for global and per-prediction explainability
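SHAP values build on Shapley values from cooperative game theory. As a rough illustration of what a per-prediction explanation contains, the sketch below computes exact Shapley values for a tiny hypothetical model by enumerating every feature coalition, with absent features replaced by a baseline value. The toy model, its feature names and the baseline are assumptions; this is not the `shap` library and not the thesis code, and the brute-force approach only scales to a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to `baseline`."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their real value, the rest the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical share model with one interaction term."""
    price_index, promo_flag, distribution = z
    return 0.05 - 0.02 * price_index + 0.03 * promo_flag + 0.01 * promo_flag * distribution


x = [1.2, 1.0, 0.8]         # the prediction being explained
baseline = [1.0, 0.0, 0.5]  # reference point (assumption)
phi = shapley_values(toy_model, x, baseline)
print([round(p, 5) for p in phi])
```

The contributions add up to the gap between the prediction and the baseline prediction, which is the property that lets a planner read them as a breakdown of a single forecast.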
What it taught me
The best model is not the most accurate one. It is the most accurate one whose reasoning a planner can inspect and challenge.