Bit // AI Developer Case
SHM secondhand machinery
Auction price prediction

Secondhand machinery,
priced and explained.

A retiring expert used to price every used machine by feel. This model replaces that instinct: trained on 412,698 historical auctions, it gives a price, says how sure it is, and shows why.

0.298
Best RMSLE (held-out)
$7,567
Typical error (MAE, Random Forest)
86%
Interval coverage (target 90%)
412k
Auctions learned from
/ The brief

Replace a feeling with a defensible number

SHM buys and sells used heavy machinery. Pricing lived in one expert's head, and he is leaving. The job is not just to predict a price - it is to earn the trust he had: explain each valuation, admit uncertainty, and flag a mispriced lot. Because SHM both buys and sells, a calibrated, explainable price beats a slightly sharper point estimate.

/ How it works

From a raw auction record to a priced, explained estimate

01 Raw auctions: 412k sales, 54 columns of machine attributes
02 Clean + engineer: age, sale-date parts, missingness as signal, parsed size class
03 Split on time: train on the past, test on the most recent sales
04 Model suite: six models, early-stopped, scored by RMSLE
05 Calibrate + explain: conformal intervals, SHAP, similar past sales
06 Price + range + why: a number, a confidence band, and its drivers
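
The "clean + engineer" step can be sketched in a few lines. This is a minimal illustration, not the project's code: the field names (sale_date, year_made, product_class_desc), the sample description, and the rule "take the first number in the description" are all assumptions made for the example. Only age, age_unknown and product_class_value are feature names that appear on this page.

```python
import re
from datetime import date
from typing import Optional


def parse_size_class(description: Optional[str]) -> Optional[float]:
    """Pull the first number out of a product description, if there is one.

    Assumption: the size class shows up as a number in the free text.
    """
    if not description:
        return None
    match = re.search(r"\d+(?:\.\d+)?", description)
    return float(match.group()) if match else None


def engineer(record: dict) -> dict:
    """Turn one raw auction record into model features (hypothetical field names)."""
    sale: date = record["sale_date"]
    year_made = record.get("year_made")
    age_unknown = year_made is None  # missingness kept as its own signal
    return {
        "age": None if age_unknown else sale.year - year_made,
        "age_unknown": int(age_unknown),
        "sale_year": sale.year,
        "sale_month": sale.month,
        "sale_dayofweek": sale.weekday(),
        "product_class_value": parse_size_class(record.get("product_class_desc")),
    }


# A made-up record, only to show the shape of the output.
example = engineer({
    "sale_date": date(2011, 5, 16),
    "year_made": 2004,
    "product_class_desc": "Wheel Loader - 110.0 to 120.0 Horsepower",
})
print(example)
```

The age is left as None when the build year is missing, so a later imputation step can fill it while age_unknown still tells the model the value was never recorded.
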
/ Honest validation

Test on the future, not a shuffle

The model prices machines sold tomorrow, so it is trained on the past and tested on the most recent sales. A random shuffle would let it peek at neighbouring-week prices and look far better than it is. We ran that shuffle too, to measure exactly how much it would have flattered us.

Random split (optimistic)
0.210
What a shuffle would have reported. It is a mirage.
Time split (honest)
0.302
What the model actually achieves on unseen, later sales.
RMSLE: random split vs the honest time-based split
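
A time-based split is simple to express: sort by sale date and hold out the most recent slice. The sketch below assumes records are dicts with a sale_date key and uses a 20% test fraction; both are illustrative choices, since the page does not state the actual cut-off.

```python
from datetime import date


def time_split(records, date_key="sale_date", test_fraction=0.2):
    """Train on the oldest sales, test on the most recent ones.

    test_fraction is an assumed value for illustration.
    """
    ordered = sorted(records, key=lambda r: r[date_key])
    n_test = max(1, round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Made-up sales, one per month of 2011, deliberately out of order.
sales = [{"sale_date": date(2011, m, 1), "price": 10_000 + 500 * m}
         for m in (7, 2, 10, 1, 5, 9, 3, 8, 4, 6)]
train, test = time_split(sales)
print(len(train), "training sales up to", train[-1]["sale_date"])
print(len(test), "test sales from", test[0]["sale_date"])
```

Every test sale is dated on or after every training sale, which is what stops the model from peeking at prices from neighbouring weeks.
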
/ The models

Six models, one honest test set

The suite runs from a Ridge baseline through gradient boosting to an entity-embedding neural net, with the boosters early-stopped on a validation slice. Random Forest is the production model: it trails the neural net by only 0.004 RMSLE (0.302 vs 0.298), and as the interpretable workhorse it pairs cleanly with exact explanations and calibrates without bias.

Model                             RMSLE   MAE      MAPE    R2
Neural net (embeddings) [best]    0.298   $7,694   24.2%   0.760
Random Forest [production]        0.302   $7,567   22.0%   0.762
CatBoost                          0.304   $7,793   21.4%   0.746
Ridge                             0.317   $7,660   23.9%   0.748
LightGBM                          0.334   $8,482   23.4%   0.713
XGBoost                           0.335   $8,265   23.1%   0.722
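
For readers unfamiliar with the headline metrics, here is a minimal sketch of RMSLE and MAE. It uses the common log(1 + x) form of RMSLE; whether the project used exactly that form is an assumption, and the prices in the example are invented.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared error between log(1 + actual) and log(1 + predicted)."""
    squared = [(math.log1p(p) - math.log1p(a)) ** 2
               for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error, in the same units as the prices (dollars)."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


# Invented prices, only to show the calls.
actual = [20_000, 55_000, 9_500]
predicted = [23_000, 48_000, 11_000]
print(round(rmsle(actual, predicted), 3), round(mae(actual, predicted)))
```

Because RMSLE works on log prices, it scores ratios rather than dollar gaps: being 20% off on a $10,000 machine counts the same as being 20% off on a $100,000 one. An RMSLE near 0.30 corresponds to a root-mean-square error factor of about exp(0.30), roughly 1.35.
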
/ What drives the price

The model reasons like a valuer

Age leads, then the machine's size class parsed out of its product description - the levers an appraiser pulls first. Every prediction breaks down per machine, so a buyer sees why a number came out the way it did.

age
product_class_value
Couple System
Secondary Description
age_unknown
Model Description
SHAP: how each feature pushes a price up or down
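
The idea behind a per-machine breakdown can be shown without any library. The sketch below computes exact Shapley values by brute force over every feature ordering, for a toy log-price function. It is not the SHAP library and not the production model: the toy formula, its coefficients and the baseline machine are invented for illustration, and brute force only scales to a handful of features.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    Each feature is switched from its baseline value to the instance's value,
    and the change in the prediction is credited to that feature.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            now = predict(current)
            totals[name] += now - previous
            previous = now
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_log_price(x):
    """Hypothetical model: older is cheaper, bigger is dearer, with an interaction."""
    return (11.0 - 0.05 * x["age"] + 0.004 * x["product_class_value"]
            - 0.0001 * x["age"] * x["product_class_value"])


machine = {"age": 10, "product_class_value": 150}
typical = {"age": 5, "product_class_value": 100}  # assumed reference machine
contributions = shapley_values(toy_log_price, machine, typical)
print(contributions)
```

The contributions always sum to the gap between this machine's prediction and the baseline prediction, which is what lets a buyer read the breakdown as a complete account of the number.
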
/ Knowing when it is unsure

A price with a range, not false precision

Conformal prediction wraps each estimate in a calibrated band - the band is what turns a guess into a buy or sell decision. A flat band in log space is wider in dollars for expensive machines, which is the right shape.

90% target
86% of true prices land inside the band, against a 90% target. The small shortfall is the market drifting - measured, not hidden.
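
A flat band in log space can be sketched with split conformal prediction: take the absolute log-price errors on a held-out calibration set, pick the rank that gives the target coverage, and apply that half-width to every new estimate. This is a minimal sketch under assumptions (a single symmetric half-width, plain log rather than log(1 + x), invented prices); the project's exact conformal variant is not stated on this page.

```python
import math


def conformal_halfwidth(cal_actual, cal_predicted, coverage=0.90):
    """Half-width in log space from calibration residuals (split conformal)."""
    scores = sorted(abs(math.log(a) - math.log(p))
                    for a, p in zip(cal_actual, cal_predicted))
    n = len(scores)
    rank = math.ceil((n + 1) * coverage)  # finite-sample corrected rank
    rank = min(rank, n)  # clamp for tiny sets; strictly the band would be unbounded
    return scores[rank - 1]


def price_band(predicted_price, halfwidth):
    """Turn a flat log-space band into a dollar range around a point estimate."""
    return (predicted_price * math.exp(-halfwidth),
            predicted_price * math.exp(halfwidth))


def coverage_rate(actual, predicted, halfwidth):
    """Share of true prices that land inside their band."""
    inside = 0
    for a, p in zip(actual, predicted):
        low, high = price_band(p, halfwidth)
        inside += low <= a <= high
    return inside / len(actual)


# Invented calibration data: every prediction is 100, actuals drift upward.
cal_pred = [100.0] * 19
cal_true = [100.0 * math.exp(i / 100) for i in range(1, 20)]
q = conformal_halfwidth(cal_true, cal_pred)
print(price_band(10_000, q), price_band(100_000, q))
```

The same half-width gives a band ten times wider in dollars for a machine ten times the price. Running coverage_rate on later sales is how a shortfall such as 86% against a 90% target would show up.
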
/ Staying calibrated

A price model is a depreciating asset

Measured only on data it never trained on, the error climbs from about 0.25 to 0.34 across 2009-2012 as the market moves. That is the signal to retrain - tracked, not hidden.

Out-of-sample RMSLE over time
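
Tracking drift comes down to scoring out-of-sample predictions per period and comparing against a retraining threshold. The sketch below groups by sale year; the 0.32 threshold and the sample rows are assumptions for illustration, not values used by the project.

```python
import math
from collections import defaultdict


def rmsle_by_year(rows):
    """rows: iterable of (sale_year, actual_price, predicted_price) tuples."""
    sums = defaultdict(lambda: [0.0, 0])  # year -> [sum of squared log errors, count]
    for year, actual, predicted in rows:
        err = math.log1p(predicted) - math.log1p(actual)
        sums[year][0] += err * err
        sums[year][1] += 1
    return {year: math.sqrt(total / count)
            for year, (total, count) in sorted(sums.items())}


def needs_retrain(yearly, threshold=0.32):
    """Years whose out-of-sample RMSLE exceeds the (assumed) threshold."""
    return [year for year, value in yearly.items() if value > threshold]


# Invented out-of-sample predictions.
rows = [
    (2009, 30_000, 33_000), (2009, 12_000, 11_000),
    (2012, 30_000, 45_000), (2012, 12_000, 7_500),
]
yearly = rmsle_by_year(rows)
print(yearly, needs_retrain(yearly))
```

Only predictions on sales the model never trained on should be fed in, otherwise the curve understates how far the market has moved.
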
/ Try it

Price a machine yourself

Enter a machine's specs and watch the model return a price, its confidence band, the factors behind it, and the most similar past sales.