Now it is time to look at the final model for DISTANCE after removing outliers. I’ll work through building the model, adding interactions, and assessing accuracy and stability.

My latest model predicts DISTANCE using ARRIVAL_DELAY, DEPARTURE_DELAY, AIR_TIME, and the WEATHER*DISTANCE interaction. I’ve already divided the data into training and test sets.
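For readers who want to see what the split-and-fit step might look like, here is a minimal sketch. It runs on synthetic data, not the real flight data: the `flights` data frame, the 70/30 ratio, and the name `dtrn` are assumptions made for illustration, and the weather interaction is left out to keep it short.

```r
# Synthetic stand-in for the flight data; every value here is made up.
set.seed(1)
n <- 1000
flights <- data.frame(
  AIR_TIME        = runif(n, 30, 300),
  ARRIVAL_DELAY   = rnorm(n, 5, 30),
  DEPARTURE_DELAY = rnorm(n, 8, 25)
)
flights$DISTANCE <- -150 + 8 * flights$AIR_TIME + rnorm(n, 0, 90)

# Hold out 30% of the rows as a test set (the ratio is an assumption).
trn_idx <- sample(seq_len(n), size = floor(0.7 * n))
dtrn <- flights[trn_idx, ]   # training rows
dtst <- flights[-trn_idx, ]  # test rows, never seen by the fit

# Fit the linear model on the training rows only.
fit <- lm(DISTANCE ~ AIR_TIME + ARRIVAL_DELAY + DEPARTURE_DELAY, data = dtrn)
coef(fit)
```

Fitting only on `dtrn` is what makes the later test-set check meaningful, since the test rows play the role of new flights.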

Written as an equation, the fitted model is:

DISTANCE = -157.99 + 8.28*air_time - 0.017*arrival_delay + 0.0082*departure_time

R-squared is about 94%, meaning the model explains roughly 94% of the variation in DISTANCE. That is a strong fit to the data we have, but it still leaves some variation unexplained, so there’s room for improvement.
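To make the R-squared figure concrete, here is a small sketch of how it can be computed by hand as one minus the ratio of residual to total sum of squares. The data is synthetic and the names `fit_demo`, `sse`, and `sst` are hypothetical, so the number it prints is not the 94% reported here.

```r
# Synthetic data for illustration only.
set.seed(2)
x <- runif(200, 30, 300)
y <- -150 + 8 * x + rnorm(200, 0, 90)
fit_demo <- lm(y ~ x)

sse <- sum(resid(fit_demo)^2)   # variation the model leaves unexplained
sst <- sum((y - mean(y))^2)     # total variation in the response
r2_manual <- 1 - sse / sst

# The hand computation should agree with what summary() reports.
r2_manual
summary(fit_demo)$r.squared
```

Note that this is a measure of fit on the data used to build the model, which is why the test-set check matters as well.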

But let’s also look at prediction accuracy and stability: how well would the model predict DISTANCE for a new flight not already in our dataset?

- tstpred=fit$coeff[1]+fit$coeff[2]*dtst$AIR_TIME+fit$coeff[3]*dtst$weather+fit$coeff[4]*dtst$ad
- tstresid=dtst$DISTANCE-tstpred

- mean(abs(fit$resid))
- 69.83145

- mean(abs(tstresid))
- 68.30
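As an alternative to multiplying coefficients out by hand, R’s `predict()` matches coefficients to columns by name, which avoids pairing a coefficient with the wrong variable. The sketch below uses synthetic data with hypothetical names (`d`, `dtrn_demo`, `dtst_demo`, `fit_demo`), so its output will not match the values above.

```r
# Synthetic data for illustration only.
set.seed(3)
n <- 500
d <- data.frame(AIR_TIME = runif(n, 30, 300), ARRIVAL_DELAY = rnorm(n, 5, 30))
d$DISTANCE <- -150 + 8 * d$AIR_TIME + rnorm(n, 0, 90)

idx <- sample(seq_len(n), 350)
dtrn_demo <- d[idx, ]
dtst_demo <- d[-idx, ]
fit_demo <- lm(DISTANCE ~ AIR_TIME + ARRIVAL_DELAY, data = dtrn_demo)

# predict() applies the fitted coefficients to the test rows by column name.
tstpred_demo <- predict(fit_demo, newdata = dtst_demo)
tstresid_demo <- dtst_demo$DISTANCE - tstpred_demo

# Mean absolute residual on the training rows versus the held-out rows.
mae_train <- mean(abs(resid(fit_demo)))
mae_test <- mean(abs(tstresid_demo))
c(train = mae_train, test = mae_test)
```

If the two values are close, the model behaves about the same on new data as on the data it was fit to.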

Using these means and the two histograms above, we can discuss stability and accuracy. The two histograms are very similar: both are centered at zero, with most residuals falling within about 150 to 200 of zero. The mean absolute residuals are also very close (69.83 on the training set versus 68.30 on the test set), which shows that the model is pretty stable. Accuracy is a different matter: on average, a prediction is off by about 70, which is a large amount.
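A comparison like this can be drawn with two side-by-side residual histograms. The sketch below is one way to do it, again on synthetic data with hypothetical names (`trn_res`, `tst_res`, `brk`); the bin width of 25 is an arbitrary choice.

```r
# Synthetic data for illustration only.
set.seed(4)
n <- 500
d <- data.frame(AIR_TIME = runif(n, 30, 300))
d$DISTANCE <- -150 + 8 * d$AIR_TIME + rnorm(n, 0, 90)

idx <- sample(seq_len(n), 350)
fit_demo <- lm(DISTANCE ~ AIR_TIME, data = d[idx, ])
trn_res <- resid(fit_demo)
tst_res <- d$DISTANCE[-idx] - predict(fit_demo, newdata = d[-idx, ])

# Use the same bins for both plots so the shapes are directly comparable.
rng <- range(c(trn_res, tst_res))
brk <- seq(floor(rng[1] / 25) * 25, ceiling(rng[2] / 25) * 25, by = 25)

op <- par(mfrow = c(1, 2))  # two panels in one row
hist(trn_res, breaks = brk, main = "Training residuals", xlab = "Residual")
hist(tst_res, breaks = brk, main = "Test residuals", xlab = "Residual")
par(op)                     # restore the previous plot layout
```

Sharing the breaks matters here: with different bins, two similar distributions can look different just because of how they were binned.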

If we were working with actual flights, we would need to consider other variables in order to make more accurate predictions.

Thank you for following along!!!