SHM Machinery Pricing

/ The brief

Replace a feeling with a defensible number

SHM buys and sells used heavy machinery. Pricing lived in one expert's head, and he is leaving. Predicting a price is only half the job. It also has to earn the trust he had: explain each valuation, admit uncertainty, and flag a mispriced lot. Because SHM both buys and sells, a calibrated, explainable price beats a slightly sharper point estimate.

/ How it works

From a raw auction record to a priced, explained estimate

Raw auctions

412k sales, 54 columns of machine attributes

Clean + engineer

age, sale-date parts, missingness as signal, parsed size class

Split on time

train on the past, test on the most recent sales

Model suite

six models, early-stopped, scored by RMSLE

Calibrate + explain

conformal intervals, SHAP, similar past sales

Price + range + why

a number, a confidence band, and its drivers
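The "Clean + engineer" step can be sketched as below. This is a minimal illustration, not the production pipeline: the raw field names `sale_date` and `year_made` are assumptions, and so is the rule that an implausible build year counts as unknown. Only `age` and `age_unknown` are feature names that appear on this page.

```python
from datetime import date

def engineer(record: dict) -> dict:
    """Turn one raw auction record into a few model features (illustrative only)."""
    sale = date.fromisoformat(record["sale_date"])   # hypothetical field name
    year_made = record.get("year_made")              # hypothetical field name
    # Assumption: a missing or implausible build year is treated as unknown.
    age_known = year_made is not None and 1900 < year_made <= sale.year
    return {
        "age": sale.year - year_made if age_known else None,
        "age_unknown": 0 if age_known else 1,        # missingness as signal
        "sale_year": sale.year,                      # sale-date parts
        "sale_month": sale.month,
        "sale_dayofweek": sale.weekday(),            # Monday = 0
    }

row = engineer({"sale_date": "2011-06-15", "year_made": 2004})
unknown = engineer({"sale_date": "2011-06-15", "year_made": None})
```

Keeping `age_unknown` as its own column lets the model learn from the fact that a value is missing, instead of hiding it behind an imputed number.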

/ Honest validation

Tested on the most recent sales

The model prices machines sold tomorrow, so it is trained on the past and tested on the most recent sales. A random shuffle would let it peek at neighbouring-week prices and look far better than it is. We ran that shuffle too, to measure exactly how much it would have flattered us.

Random split (optimistic)

0.210

What a shuffle would have reported by peeking at neighbouring-week prices.

Time split (honest)

0.302

What the model actually achieves on unseen, later sales.

RMSLE: random split vs the honest time-based split
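The two splits compared above can be sketched as follows. The 20% test share, the `sale_date` key and the toy rows are assumptions for illustration, not SHM's actual settings or data.

```python
import random

def time_split(rows, date_key="sale_date", test_fraction=0.2):
    """Train on the earliest sales, test on the most recent ones."""
    ordered = sorted(rows, key=lambda r: r[date_key])  # ISO dates sort as strings
    cut = int(len(ordered) * (1 - test_fraction))
    return ordered[:cut], ordered[cut:]

def random_split(rows, test_fraction=0.2, seed=0):
    """Shuffle first: test rows end up interleaved in time with training rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Made-up monthly sales from 2006 to 2011, one row per month.
rows = [
    {"sale_date": f"20{y:02d}-{m:02d}-01", "price": 10000 + 100 * m}
    for y in range(6, 12)
    for m in range(1, 13)
]

train, test = time_split(rows)
latest_train = max(r["sale_date"] for r in train)
earliest_test = min(r["sale_date"] for r in test)
```

With the time split, `latest_train` is always earlier than `earliest_test`, so no test sale has a training neighbour from the same or a later week. The random split gives no such guarantee, which is where its flattering score comes from.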

/ The models

Six models, one honest test set

The neural net and Random Forest finish in a statistical tie at the top, with CatBoost just behind. We deploy Random Forest: it is interpretable, pairs cleanly with exact explanations, and calibrates without bias. LightGBM and XGBoost trail slightly on untuned defaults (with early stopping). More trees and a different encoding were both tried and neither closed the gap, so a hyperparameter search is what they would need.

Model	RMSLE	MAE	MAPE	R²
Neural net (embeddings) [best]	0.298	$7,694	24.2%	0.760
Random Forest [production]	0.302	$7,567	22.0%	0.762
CatBoost	0.308	$7,889	21.6%	0.743
Ridge	0.317	$7,660	23.9%	0.748
XGBoost	0.334	$8,270	23.2%	0.722
LightGBM	0.335	$8,482	23.4%	0.713
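For reference, the first three columns can be computed as in the sketch below. It assumes the common log(1 + x) convention for RMSLE; the page does not say which variant was used, and the prices here are made up.

```python
import math

def rmsle(actual, predicted):
    """Root mean squared error of log(1 + price)."""
    sq = [(math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq) / len(sq))

def mae(actual, predicted):
    """Mean absolute error in dollars."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction of the true price."""
    return sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)

# Made-up prices: every prediction is exactly 10% too high.
actual = [10000.0, 40000.0, 90000.0]
predicted = [a * 1.10 for a in actual]

score = rmsle(actual, predicted)
```

Because RMSLE works on log prices, it penalises being 10% off on a $10,000 machine about as much as being 10% off on a $90,000 one, which suits a fleet with a wide price range. As a rough reading aid, a log error of 0.302 corresponds to a price ratio of about 1.35.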

/ What drives the price

The same drivers a valuer checks first

Age leads, followed by the machine's size class, parsed out of its product description. These are the levers an appraiser pulls first. Every prediction breaks down per machine, so a buyer sees why a number came out the way it did.

Top drivers, as charted from top to bottom: age, product_class_value, Couple System, Secondary Description, age_unknown, Model Description
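A per-machine breakdown of this kind rests on Shapley values. The sketch below computes them exactly by brute force against a single baseline machine. It is an illustration of the idea only: `toy_model`, its coefficients and the baseline are invented, and a real deployment would more likely rely on a tree-specific SHAP implementation than on enumerating every feature subset.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance against a single baseline row."""
    features = list(instance)
    n = len(features)

    def value(subset):
        # Features in `subset` take the instance's value, the rest the baseline's.
        mixed = {f: (instance[f] if f in subset else baseline[f]) for f in features}
        return predict(mixed)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

def toy_model(x):
    """Hypothetical stand-in for the trained model; returns a log price."""
    size_bonus = {"small": 0.0, "medium": 0.4, "large": 0.9}[x["size_class"]]
    young_and_large = 0.1 if x["age"] < 5 and x["size_class"] == "large" else 0.0
    return 10.0 - 0.05 * x["age"] + size_bonus + young_and_large

machine = {"age": 3, "size_class": "large"}
baseline = {"age": 10, "size_class": "medium"}
phi = shapley_values(toy_model, machine, baseline)
```

The contributions in `phi` add up exactly to the gap between this machine's prediction and the baseline's, which is what makes the breakdown safe to show a buyer: nothing is left unexplained.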

/ Knowing when it is unsure

A price with a calibrated range

Conformal prediction wraps each estimate in a calibrated band, and the band is what turns a guess into a buy or sell decision. The band has a constant width in log space, so it is wider in dollars for expensive machines, which is the right shape.

90% target

86% of true prices land inside the band, against a 90% target. The small shortfall comes from the market drifting away from the calibration window, and the gauge shows it.
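A minimal sketch of split conformal prediction in log space, assuming absolute log error as the conformity score and a held-out calibration set. The calibration numbers are made up, and the page does not state which conformal variant is in production.

```python
import math

def conformal_halfwidth(log_actual, log_pred, coverage=0.90):
    """Half-width q of the band, taken from held-out calibration errors."""
    scores = sorted(abs(a - p) for a, p in zip(log_actual, log_pred))
    n = len(scores)
    # Finite-sample rank used by split conformal: ceil((n + 1) * coverage).
    rank = min(n, math.ceil((n + 1) * coverage))
    return scores[rank - 1]

def price_band(log_pred, q):
    """Return (low, estimate, high) in dollars for one log-price prediction."""
    return math.exp(log_pred - q), math.exp(log_pred), math.exp(log_pred + q)

def empirical_coverage(log_actual, log_pred, q):
    """Share of true prices that land inside the band."""
    hits = sum(1 for a, p in zip(log_actual, log_pred) if abs(a - p) <= q)
    return hits / len(log_actual)

# Made-up calibration set: 19 log errors spread evenly from -0.45 to +0.45.
cal_pred = [10.0] * 19
cal_actual = [10.0 + (i - 9) * 0.05 for i in range(19)]
q = conformal_halfwidth(cal_actual, cal_pred)

cheap = price_band(math.log(10_000), q)
expensive = price_band(math.log(100_000), q)
```

Both bands span the same ratio of high to low, but the one around the $100,000 machine is ten times wider in dollars. Running `empirical_coverage` on later sales is the check behind the gauge: a value below the target means the calibration window no longer matches the market.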

/ Staying calibrated

Accuracy decays as the market moves

Measured only on data it never trained on, the error climbs from about 0.25 to 0.34 across 2009-2012 as conditions shift away from the training window. That is the signal to retrain, and the chart catches it while the error is still small.
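The drift check can be sketched as an out-of-sample RMSLE per year with a retrain flag. The records and the 0.25 threshold below are invented for illustration and are not SHM's figures or policy.

```python
import math
from collections import defaultdict

def rmsle_by_period(records):
    """records: iterable of (period, actual_price, predicted_price)."""
    sq_errors = defaultdict(list)
    for period, actual, predicted in records:
        sq_errors[period].append((math.log1p(predicted) - math.log1p(actual)) ** 2)
    return {p: math.sqrt(sum(v) / len(v)) for p, v in sorted(sq_errors.items())}

def periods_needing_retrain(errors, threshold):
    """Periods whose out-of-sample error exceeds the (hypothetical) threshold."""
    return [p for p, e in errors.items() if e > threshold]

# Made-up out-of-sample records: predictions drift further off each year.
records = [
    (2009, 20000, 22000), (2009, 50000, 46000),
    (2010, 20000, 24000), (2010, 50000, 42000),
    (2011, 20000, 27000), (2011, 50000, 37000),
]

errors = rmsle_by_period(records)
flagged = periods_needing_retrain(errors, threshold=0.25)
```

Only sales the model never trained on should feed `records`; otherwise the chart would understate the drift it is meant to catch.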

/ Try it

Price a machine yourself

Enter a machine's specs and the model returns a price, its confidence band, the factors behind it, and the most similar past sales.

Open the live demo