
Predicting Defects with Warranty Claims

Business Case




Modern assembly lines contain machines that do repetitive tasks. Sometimes it is tightening a screw. Other times it can be lifting or welding. As these machines perform tasks, they generate data. This data can then be used to optimize activity for the firm.

Amazing HVAC Parts (AHP) is a multi-billion dollar company that makes parts for HVAC systems. The HVAC unit cooling your home this very minute probably contains parts made by AHP. Warranty claims are a massive problem for AHP because AHP must replace every part that fails under warranty, and warranty replacement costs come directly off AHP’s bottom line. Current warranty-related losses are tens of millions of dollars each year. Decreasing warranty claims by only 10% would add millions of dollars to corporate profit.

Your Customer Contact

Jeb works in accounting at AHP and has an idea: "What if AHP could use a data science model to predict which parts are faulty before they leave the manufacturing floor? That is, use information collected during the manufacturing of the HVAC part to predict if the part will ultimately result in a warranty claim. Knowing which parts are faulty before they leave the manufacturing facility could save AHP millions of dollars each year."


The following time-series data set contains sensor information collected during the manufacturing of HVAC parts. The data is summarized in five-minute increments. Appended to this information is an indicator of whether AHP manufactured a faulty part during that five-minute increment. The goal is to use a machine learning model to predict the conditions that result in defective parts. An accurate machine learning model will allow us to predict the probability of a warranty claim the second a part rolls off the assembly line and give insight into whether the HVAC part is worthy of customer use.
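
To make the five-minute structure concrete, here is a minimal sketch of how raw readings might be rolled up into five-minute increments with a fault indicator attached. The field names (bucket_start, mean_value, n_readings, faulty), the sample timestamps, and the use of a simple mean are assumptions for illustration only; the actual table layout is described in the metadata.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def floor_to_bucket(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its five-minute increment."""
    return ts - timedelta(
        minutes=ts.minute % 5, seconds=ts.second, microseconds=ts.microsecond
    )


def summarize(readings, faulty_buckets):
    """Group (timestamp, value) readings by five-minute increment.

    faulty_buckets is a set of increment start times in which a faulty part
    was made (hypothetical input; stands in for the warranty indicator).
    """
    grouped = defaultdict(list)
    for ts, value in readings:
        grouped[floor_to_bucket(ts)].append(value)
    return [
        {
            "bucket_start": start,
            "mean_value": mean(values),
            "n_readings": len(values),
            "faulty": int(start in faulty_buckets),
        }
        for start, values in sorted(grouped.items())
    ]


# Made-up readings from a single hypothetical sensor.
readings = [
    (datetime(2024, 1, 1, 8, 1, 30), 70.0),
    (datetime(2024, 1, 1, 8, 4, 59), 72.0),
    (datetime(2024, 1, 1, 8, 5, 0), 90.0),
]
faulty = {datetime(2024, 1, 1, 8, 5)}

rows = summarize(readings, faulty)
for row in rows:
    print(row)
```

Each resulting row is one candidate training example: the summarized sensor values are the features and the faulty flag is the label.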

Rows and Records

The data contains 542 warranty claims and 8,735,233 sensor readings collected from 218 machines. The data spans roughly five and a half years.
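
These counts imply that defects are rare relative to the volume of sensor data. The short calculation below puts rough numbers on that. It assumes, purely for illustration, that each sensor reading counts as one row and each warranty claim as one positive label; the real ratio depends on how the tables are joined.

```python
# Counts taken from the description above.
claims = 542
readings = 8_735_233
machines = 218

# Assumption: one reading = one row, one claim = one positive label.
claim_rate = claims / readings
readings_per_claim = readings / claims
readings_per_machine = readings / machines

print(f"Claims per reading:   {claim_rate:.6%}")
print(f"Readings per claim:   {readings_per_claim:,.0f}")
print(f"Readings per machine: {readings_per_machine:,.0f}")
```

Under that assumption there is about one claim for every 16,000 readings, so a model that always predicts "no defect" would be almost perfectly accurate and completely useless. Metrics such as precision and recall on the defect class are more informative than raw accuracy for data this imbalanced.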

The data is relational and includes 8 separate tables. For more information, please see the metadata.

Real or Fake: 

This data is 100% fake.  Any relationship to the real world is coincidental.
