SHM Machinery Pricing

/ The brief

Replace a feeling with a defensible number

SHM buys and sells used heavy machinery. Pricing lived in one expert's head, and he is leaving. The job is not just to predict a price - it is to earn the trust he had: explain each valuation, admit uncertainty, and flag a mispriced lot. Because SHM both buys and sells, a calibrated, explainable price beats a slightly sharper point estimate.

/ How it works

From a raw auction record to a priced, explained estimate

1. Raw auctions: 412k sales, 54 columns of machine attributes
2. Clean + engineer: age, sale-date parts, missingness as signal, parsed size class
3. Split on time: train on the past, test on the most recent sales
4. Model suite: six models, the boosters early-stopped, all scored by RMSLE
5. Calibrate + explain: conformal intervals, SHAP, similar past sales
6. Price + range + why: a number, a confidence band, and its drivers
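The Clean + engineer step can be sketched as a function over one auction record. This is a minimal sketch, not SHM's pipeline: the input field names `YearMade`, `saledate` and `fiProductClassDesc`, the rule that years before 1900 are placeholders for "unknown", and taking the first number in the description as the size class are all assumptions. Only the output names `age`, `age_unknown` and `product_class_value` come from the feature list on this page.

```python
import math
import re
from datetime import date


def engineer(record: dict) -> dict:
    """Turn one raw auction record into model features.

    Assumed (hypothetical) input fields: saledate (datetime.date),
    YearMade (int or None), fiProductClassDesc (str or None).
    """
    sale: date = record["saledate"]
    year_made = record.get("YearMade")

    # Missingness as signal: keep an explicit flag instead of silently imputing.
    # Assumption: years before 1900 are placeholders for "unknown".
    age_unknown = year_made is None or year_made < 1900
    age = math.nan if age_unknown else float(sale.year - year_made)

    # Parsed size class: first number in a description such as
    # "Wheel Loader - 110.0 to 120.0 Horsepower"; NaN when there is none.
    desc = record.get("fiProductClassDesc") or ""
    match = re.search(r"\d+(?:\.\d+)?", desc)
    product_class_value = float(match.group()) if match else math.nan

    return {
        "age": age,
        "age_unknown": int(age_unknown),
        # Sale-date parts, split out as separate numeric features.
        "sale_year": sale.year,
        "sale_month": sale.month,
        "sale_dayofweek": sale.weekday(),
        "product_class_value": product_class_value,
    }


row = {
    "saledate": date(2011, 5, 3),
    "YearMade": 2004,
    "fiProductClassDesc": "Wheel Loader - 110.0 to 120.0 Horsepower",
}
print(engineer(row))
```

Keeping `age_unknown` as its own column lets a tree model treat "no recorded year" as information rather than a gap to paper over.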

/ Honest validation

Test on the future, not a shuffle

The model prices machines sold tomorrow, so it is trained on the past and tested on the most recent sales. A random shuffle would let it peek at neighbouring-week prices and look far better than it is. We ran that shuffle too, to measure exactly how much it would have flattered us.
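A time-based split in its simplest form: sort by sale date and hold out the most recent slice, putting every sale on the cutoff date into the test side so no same-day prices leak into training. A minimal sketch; the 20% test fraction and the `saledate` key are assumptions, not the values used here.

```python
from datetime import date, timedelta


def time_split(rows: list[dict], date_key: str = "saledate", test_fraction: float = 0.2):
    """Train on the past, test on the most recent sales (never a shuffle)."""
    ordered = sorted(rows, key=lambda r: r[date_key])
    n_test = max(1, int(len(ordered) * test_fraction))
    cutoff = ordered[-n_test][date_key]
    # Everything on or after the cutoff date is held out, so training
    # never sees a sale from the same day as, or later than, a test sale.
    train = [r for r in ordered if r[date_key] < cutoff]
    test = [r for r in ordered if r[date_key] >= cutoff]
    return train, test


start = date(2009, 1, 1)
sales = [{"saledate": start + timedelta(days=7 * i), "price": 20_000 + 100 * i} for i in range(50)]
train, test = time_split(sales)
print(len(train), len(test))  # 40 10
```

A shuffled split would instead scatter neighbouring weeks across both sides, which is exactly the leak the random-split score rewards.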

Random split (optimistic): RMSLE 0.210. What a shuffle would have reported. It is a mirage.

Time split (honest): RMSLE 0.302. What the model actually achieves on unseen, later sales.

RMSLE: random split vs the honest time-based split
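RMSLE, the score on both sides of this comparison and the ranking metric in the model table, is the root-mean-square error of log(1 + price), so a 10% miss on a $10,000 loader costs roughly the same as a 10% miss on a $200,000 excavator. The plain-Python sketch below computes it alongside the table's other columns (MAE, MAPE, R2); computing R2 on dollar prices rather than log prices is an assumption.

```python
import math


def regression_report(y_true: list[float], y_pred: list[float]) -> dict:
    """RMSLE, MAE, MAPE (%) and R2 for paired true and predicted prices."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    # RMSLE: error on log(1 + price), so misses count in relative terms.
    rmsle = math.sqrt(sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in pairs) / n)
    # MAE: average miss in dollars.
    mae = sum(abs(p - t) for t, p in pairs) / n
    # MAPE: average miss as a percentage of the true price.
    mape = 100 * sum(abs(p - t) / t for t, p in pairs) / n
    # R2 on dollar prices (assumption): share of price variance explained.
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in pairs)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"RMSLE": rmsle, "MAE": mae, "MAPE": mape, "R2": r2}


print(regression_report([100.0, 200.0], [110.0, 180.0]))
```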

/ The models

Six models, one honest test set

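From a Ridge baseline to gradient boosting and an entity-embedding neural net, with the boosters early-stopped on a validation slice. The neural net edges ahead on RMSLE, but Random Forest is the production model: the interpretable workhorse, it trails by only 0.004 RMSLE, posts the lowest MAE and highest R2, pairs cleanly with exact explanations (tree ensembles admit exact SHAP values), and calibrates without bias.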

Model	RMSLE	MAE	MAPE	R2
Neural net (embeddings) [best]	0.298	$7,694	24.2%	0.760
Random Forest [production]	0.302	$7,567	22.0%	0.762
CatBoost	0.304	$7,793	21.4%	0.746
Ridge	0.317	$7,660	23.9%	0.748
LightGBM	0.334	$8,482	23.4%	0.713
XGBoost	0.335	$8,265	23.1%	0.722
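Early stopping, as applied to the boosters, keeps adding trees only while the validation score improves. Each of the three boosting libraries ships its own early-stopping option; the loop below is a library-free sketch of the same idea, where `step` is a hypothetical callable that trains one more round and returns the validation RMSLE, and `patience` is an assumed setting.

```python
from typing import Callable


def train_with_early_stopping(
    step: Callable[[], float], max_rounds: int, patience: int
) -> tuple[int, float]:
    """Run boosting rounds until validation loss stops improving.

    Returns the best round (0-based) and its validation loss; the caller
    would then keep the model truncated at that round.
    """
    best_round, best_loss = -1, float("inf")
    for rnd in range(max_rounds):
        loss = step()  # train one more round, score on the validation slice
        if loss < best_loss:
            best_round, best_loss = rnd, loss
        elif rnd - best_round >= patience:
            break  # no improvement for `patience` rounds: stop
    return best_round, best_loss


# Simulated validation curve: improves, then overfits.
curve = iter([0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.30])
print(train_with_early_stopping(lambda: next(curve), max_rounds=7, patience=3))
# (2, 0.35)
```

The late dip to 0.30 is never reached: a larger `patience` can find such dips, at the cost of more rounds trained and a greater risk of fitting the validation slice itself.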

/ What drives the price

The model reasons like a valuer

Age leads, then the machine's size class parsed out of its product description - the levers an appraiser pulls first. Every prediction breaks down per machine, so a buyer sees why a number came out the way it did.

[Figure: SHAP: how each feature pushes a price up or down. Features shown: age, product_class_value, Couple System, Secondary Description, age_unknown, Model Description]
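SHAP values are Shapley values: each feature's average marginal contribution over all orders in which features could be revealed, and they add up exactly to the gap between a prediction and a baseline. The brute-force sketch below makes that additivity concrete for a tiny model. It is not the SHAP library, which uses much faster algorithms for tree models; it also assumes "absent" features take a single baseline value, and the toy log-price model and its coefficients are invented for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values of f at x; features outside a coalition use baseline."""
    n = len(x)

    def value(coalition: set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                phi += weight * (value(coalition | {i}) - value(coalition))
        phis.append(phi)
    return phis


# Toy log-price model over (age, product_class_value); coefficients are made up.
def log_price(z: Sequence[float]) -> float:
    age, size = z
    return 10.5 - 0.06 * age + 0.004 * size - 0.0002 * age * size


machine = [12.0, 150.0]
typical = [8.0, 100.0]
phis = shapley_values(log_price, machine, typical)
print(dict(zip(["age", "product_class_value"], phis)))
# Additivity: baseline + contributions == prediction (in log space).
print(log_price(typical) + sum(phis), log_price(machine))
```

Because the model works in log space, exponentiating turns the additive contributions into multiplicative ones: a contribution of -0.1 scales the price by e^-0.1, about 0.90.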

/ Knowing when it is unsure

A price with a range, not false precision

Conformal prediction wraps each estimate in a calibrated band, and the band is what turns a guess into a buy or sell decision. The band has a fixed half-width q in log space, so in dollars it runs from roughly price × e^-q to price × e^q: a multiplicative range that is wider for expensive machines, which is the right shape.


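86% of true prices land inside the band, against a 90% target. Conformal coverage is only guaranteed when future sales are exchangeable with the calibration data, so the small shortfall is the market drifting: measured, not hidden.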
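Split conformal prediction, one standard way to build such a band, scores the model's errors on a held-out calibration set and uses a finite-sample-corrected quantile of those errors as the band's half-width. A minimal sketch: the nonconformity score (absolute error in log(1 + price)) and the toy calibration prices are assumptions, not confirmed details of this model.

```python
import math


def conformal_radius(cal_true: list[float], cal_pred: list[float], alpha: float = 0.1) -> float:
    """Split-conformal half-width in log space for 1 - alpha target coverage."""
    scores = sorted(abs(math.log1p(t) - math.log1p(p)) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few calibration points for this target
    return scores[k - 1]


def price_band(pred: float, radius: float) -> tuple[float, float]:
    """Flat band in log space, so the dollar band is wider for pricier machines."""
    centre = math.log1p(pred)
    return math.expm1(centre - radius), math.expm1(centre + radius)


def coverage(y_true: list[float], bands: list[tuple[float, float]]) -> float:
    """Fraction of true prices that land inside their band."""
    return sum(lo <= t <= hi for t, (lo, hi) in zip(y_true, bands)) / len(y_true)


cal_true = [21_000.0, 35_000.0, 18_500.0, 52_000.0, 9_800.0, 41_000.0, 27_500.0, 63_000.0, 15_000.0]
cal_pred = [23_000.0, 31_000.0, 19_000.0, 47_000.0, 11_000.0, 44_000.0, 26_000.0, 70_000.0, 14_200.0]
radius = conformal_radius(cal_true, cal_pred, alpha=0.1)
print(price_band(10_000.0, radius))
print(price_band(100_000.0, radius))
```

Running `coverage` on each new period of sales is how a drift-driven shortfall against the target becomes visible.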

/ Staying calibrated

A price model is a depreciating asset

Measured only on data it never trained on, the RMSLE climbs from about 0.25 to 0.34 across 2009-2012 as the market moves. That is the signal to retrain - tracked, not hidden.
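Scoring each year only on data the model never trained on is a walk-forward (rolling-origin) evaluation: for every year, fit on strictly earlier sales and score that year. The sketch assumes rows carry hypothetical `year` and `price` fields and takes `fit` and `predict` as hypothetical callables, so any model could be dropped in; the trivial median model in the usage lines is only for illustration.

```python
import math
import statistics
from typing import Any, Callable


def walk_forward_rmsle(
    rows: list[dict],
    years: list[int],
    fit: Callable[[list[dict]], Any],
    predict: Callable[[Any, dict], float],
) -> dict[int, float]:
    """RMSLE per year, each year scored by a model trained only on earlier years."""
    scores = {}
    for year in years:
        train = [r for r in rows if r["year"] < year]
        test = [r for r in rows if r["year"] == year]
        if not train or not test:
            continue  # nothing to learn from, or nothing to score
        model = fit(train)
        sq = sum((math.log1p(predict(model, r)) - math.log1p(r["price"])) ** 2 for r in test)
        scores[year] = math.sqrt(sq / len(test))
    return scores


def median_fit(train: list[dict]) -> float:
    # Illustrative "model": the median past price.
    return statistics.median([r["price"] for r in train])


def predict_constant(model: float, row: dict) -> float:
    return model


rows = [
    {"year": 2008, "price": 1000.0},
    {"year": 2009, "price": 1000.0},
    {"year": 2010, "price": 2000.0},
]
print(walk_forward_rmsle(rows, [2008, 2009, 2010], median_fit, predict_constant))
```

A rising series of yearly scores is the cue to retrain; where to set the alarm threshold is a business choice.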

/ Try it

Price a machine yourself

Enter a machine's specs and watch the model return a price, its confidence band, the factors behind it, and the most similar past sales.
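Finding the most similar past sales can be as simple as nearest neighbours on standardized features. A minimal sketch, not the demo's implementation: the feature names `age` and `product_class_value` are borrowed from the SHAP list, while scaling by standard deviation, Euclidean distance, the toy sales and `k=2` are assumptions.

```python
import math


def similar_sales(query: dict, past: list[dict], features: list[str], k: int = 3) -> list[dict]:
    """Return the k past sales closest to query after per-feature standardization."""
    # Scale each feature by its spread so years of age and units of size class
    # contribute comparably to the distance.
    spread = {}
    for f in features:
        vals = [s[f] for s in past]
        mean = sum(vals) / len(vals)
        spread[f] = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0

    def distance(sale: dict) -> float:
        return math.sqrt(sum(((sale[f] - query[f]) / spread[f]) ** 2 for f in features))

    return sorted(past, key=distance)[:k]


past = [
    {"id": 1, "age": 2.0, "product_class_value": 100.0, "price": 48_000.0},
    {"id": 2, "age": 10.0, "product_class_value": 100.0, "price": 21_000.0},
    {"id": 3, "age": 3.0, "product_class_value": 110.0, "price": 45_500.0},
    {"id": 4, "age": 20.0, "product_class_value": 300.0, "price": 30_000.0},
]
query = {"age": 2.0, "product_class_value": 105.0}
print([s["id"] for s in similar_sales(query, past, ["age", "product_class_value"], k=2)])
# [1, 3]
```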
