Logistic report template

Introduction

Statement of the problem from the customer’s perspective

History of the problem, previous results

Exploratory data analysis

  • Head of the data

    • Discuss the characteristics of each feature.
  • Bar chart of target (0 or 1) vs each feature, by percent (%)

    • Discussion of target vs each feature
  • Boxplots of the numeric data (insert plot here)

    • Discussion of boxplots of the numeric data
  • Histograms of each numeric column (insert plot here)

    • Discussion of histograms of each numeric column
  • Data summary (insert table here)

    • Discussion of the data summary
  • Outliers in the data (insert outliers data here)

    • Discussion of outliers in the data
  • Correlation of the data (table)

  • Correlation plot of the numeric data as circles and colors

  • Correlation of the ensemble

  • Variance Inflation Factor

  • The stories in the exploratory data analysis
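
The exploratory steps listed above can be sketched in base R. This is a minimal illustration, not the template's own code: the data frame df, its columns x1, x2, x3, and the four-bin split of x1 are all hypothetical, and the variance inflation factor is computed by hand as 1 / (1 - R^2) rather than through any particular package.

```r
# Hypothetical toy data: a binary target y and three numeric features.
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.5)  # deliberately correlated with x1
x3 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(0.5 * x1 - x3))
df <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)

head(df)     # head of the data
summary(df)  # data summary

# Percent of each target class within bins of one feature
# (the numbers behind a "target vs feature, by percent" bar chart).
x1_bin     <- cut(df$x1, breaks = 4)
target_pct <- 100 * prop.table(table(x1_bin, df$y), margin = 1)

# Correlation table of the numeric features.
features  <- df[, names(df) != "y"]
cor_table <- cor(features)

# Variance inflation factor: regress each feature on the others,
# then take 1 / (1 - R^2).
vif <- sapply(names(features), function(v) {
  others <- setdiff(names(features), v)
  fit    <- lm(reformulate(others, response = v), data = features)
  1 / (1 - summary(fit)$r.squared)
})
vif
```

Each row of target_pct sums to 100, so the bars for different bins are directly comparable. A feature with a large VIF (x1 and x2 in this toy data) is largely explained by the other features.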

24 logistic models (Individual models then ensembles, in alphabetical order)

One paragraph summary about statistical modeling here

  • Cubist

    cubist_train_fit <- Cubist::cubist(x = train[, names(train) != "y", drop = FALSE], y = train$y)

  • Flexible Discriminant Analysis

    fda_train_fit <- MachineShop::fit(as.factor(y) ~ ., data = train01, model = "FDAModel")

  • GAM (Generalized Additive Models) (uses smoothing splines)

    f2 <- stats::as.formula(paste0("y ~", paste0("gam::s(", names_df, ")", collapse = "+")))

    gam_train_fit <- gam::gam(f2, data = train1)

  • Generalized Linear Models

    glm_train_fit <- stats::glm(y ~ ., data = train, family = binomial)

  • Lasso (uses best model)

    best_lasso_lambda <- lasso_cv$lambda.min

    best_lasso_model <- glmnet::glmnet(x, y, alpha = 1, lambda = best_lasso_lambda)

  • Linear (tuned)

    linear_train_fit <- e1071::tune.rpart(formula = y ~ ., data = train)

  • Linear Discriminant Analysis

    lda_train_fit <- MASS::lda(as.factor(y) ~ ., data = train01)

  • Penalized Discriminant Analysis

    pda_train_fit <- MachineShop::fit(as.factor(y) ~ ., data = train01, model = "PDAModel")

  • Quadratic Discriminant Analysis

    qda_train_fit <- MASS::qda(as.factor(y) ~ ., data = train01)

  • Random Forest

    rf_train_fit <- randomForest::randomForest(x = train, y = as.factor(y_train))

  • Ridge

    best_ridge_lambda <- ridge_cv$lambda.min

    best_ridge_model <- glmnet::glmnet(x, y, alpha = 0, lambda = best_ridge_lambda)

  • RPart

    rpart_train_fit <- rpart::rpart(y ~ ., data = train)

  • SVM (Support Vector Machines) (tuned)

    svm_train_fit <- e1071::tune.svm(as.factor(y) ~ ., data = train)

  • Tree

    tree_train_fit <- tree::tree(y ~ ., data = train)

    Ensemble models start here

  • Ensemble Gradient Boosted

    ensemble_gb_train_fit <- gbm::gbm(y_ensemble ~ ., data = ensemble_train, distribution = "gaussian", n.trees = 100, shrinkage = 0.1, interaction.depth = 10)

  • Ensemble Lasso (uses best model)

    ensemble_best_lasso_lambda <- ensemble_lasso_cv$lambda.min

    ensemble_best_lasso_model <- glmnet::glmnet(ensemble_x, ensemble_y, alpha = 1, lambda = ensemble_best_lasso_lambda)

  • Ensemble Partial Least Squares

    ensemble_pls_train_fit <- MachineShop::fit(as.factor(y) ~ ., data = ensemble_train, model = "PLSModel")

  • Ensemble Penalized Discriminant Analysis

    ensemble_pda_train_fit <- MachineShop::fit(as.factor(y) ~ ., data = ensemble_train, model = "PDAModel")

  • Ensemble Ridge

    x <- model.matrix(y ~ ., data = ensemble_train)[, -1]

    y <- ensemble_train$y

    ensemble_ridge_train_fit <- glmnet::glmnet(x, y, alpha = 0)

  • Ensemble RPart

    ensemble_rpart_train_fit <- MachineShop::fit(as.factor(y) ~ ., data = ensemble_train, model = "RPartModel")

  • Ensemble Support Vector Machines (SVM)

    ensemble_svm_train_fit <- e1071::svm(as.factor(y) ~ ., data = ensemble_train, kernel = "radial", gamma = 1, cost = 1)

  • Ensemble Trees

    ensemble_tree_train_fit <- tree::tree(y ~ ., data = ensemble_train)

  • The stories in the models (fill in here)
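
The snippets in this section assume a training data frame whose target column is named y. As a rough sketch of how such a frame might be built and used, the following base R example splits hypothetical data 70/30, fits the generalized linear model from the list, and scores both halves. The toy data, the 70/30 split, and the 0.5 classification threshold are assumptions made for illustration only.

```r
# Hypothetical toy data with a binary target column named y.
set.seed(1)
n    <- 500
df   <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- rbinom(n, size = 1, prob = plogis(1.5 * df$x1 - df$x2))

# Assumed 70/30 train/holdout split.
idx   <- sample(seq_len(n), size = round(0.7 * n))
train <- df[idx, ]
test  <- df[-idx, ]

# Logistic regression on all features.
glm_train_fit <- stats::glm(y ~ ., data = train, family = binomial)

# Predicted probabilities, turned into classes at an assumed 0.5 threshold.
train_prob <- predict(glm_train_fit, newdata = train, type = "response")
test_prob  <- predict(glm_train_fit, newdata = test,  type = "response")
train_pred <- as.integer(train_prob > 0.5)
test_pred  <- as.integer(test_prob > 0.5)

train_accuracy <- mean(train_pred == train$y)
test_accuracy  <- mean(test_pred == test$y)

# Why the response should be written as y rather than train$y:
# with train$y on the left, "." still expands to every column, y included.
safe_terms  <- attr(terms(y ~ ., data = train), "term.labels")
leaky_terms <- attr(terms(train$y ~ ., data = train), "term.labels")
```

Here safe_terms contains only x1 and x2, while leaky_terms also contains y, so a model fitted with the second formula would use the target as one of its own predictors.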

Ensembles and individual model plots

  • Negative predictive value (fixed scales)

  • Negative predictive value (free scales)

  • Positive predictive value (fixed scales)

  • Positive predictive value (free scales)

  • F1 Score (fixed scales)

  • F1 Score (free scales)

  • False negative rate (fixed scales)

  • False negative rate (free scales)

  • False positive rate (fixed scales)

  • False positive rate (free scales)

  • True negative rate (fixed scales)

  • True negative rate (free scales)

  • True positive rate (fixed scales)

  • True positive rate (free scales)

  • ROC Curves for each of the 24 models

  • Overfitting or underfitting (closer to 1 is better) bar chart

  • Duration (mean) by model bar chart

  • Overfitting by model and resample, fixed scales

  • Overfitting by model and resample, free scales

  • Model accuracy bar chart

  • Accuracy by model and resample, including train and holdout by each resample, fixed scales

  • Accuracy by model and resample, including train and holdout by each resample, free scales

  • Summary report

    • Accuracy (mean)

    • Accuracy (standard deviation)

    • True positive rate (also known as sensitivity)

    • True negative rate (also known as specificity)

    • False positive rate (also known as Type I error)

    • False negative rate (also known as Type II error)

    • Positive predictive value

    • Negative predictive value

    • F1 score

    • Area under the curve (AUC)

    • Overfitting (mean)

    • Overfitting (standard deviation)

    • Duration (mean)

    • Duration (standard deviation)

  • Function call

  • Warnings or errors

  • The stories in the plots
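
The rates in the summary report all derive from the four cells of a confusion matrix. The sketch below computes them in base R from hypothetical labels and predicted probabilities. The 0.5 threshold is an assumption, AUC is computed with the rank-based (Mann-Whitney) formula rather than any specific package, and the overfitting ratio is assumed here to be holdout accuracy divided by train accuracy, which is only one reading of "closer to 1 is better".

```r
# Hypothetical holdout labels and predicted probabilities.
actual <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
prob   <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.3, 0.2, 0.2, 0.1)
pred   <- as.integer(prob > 0.5)  # assumed 0.5 threshold

# Confusion matrix cells.
tp <- sum(pred == 1 & actual == 1)
tn <- sum(pred == 0 & actual == 0)
fp <- sum(pred == 1 & actual == 0)
fn <- sum(pred == 0 & actual == 1)

accuracy <- (tp + tn) / length(actual)
tpr <- tp / (tp + fn)  # true positive rate (sensitivity)
tnr <- tn / (tn + fp)  # true negative rate (specificity)
fpr <- fp / (fp + tn)  # false positive rate (Type I error)
fnr <- fn / (fn + tp)  # false negative rate (Type II error)
ppv <- tp / (tp + fp)  # positive predictive value
npv <- tn / (tn + fn)  # negative predictive value
f1  <- 2 * ppv * tpr / (ppv + tpr)

# AUC via ranks: the share of (positive, negative) pairs in which the
# positive case gets the higher score, counting ties as one half.
n_pos <- sum(actual == 1)
n_neg <- sum(actual == 0)
auc   <- (sum(rank(prob)[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Assumed overfitting measure: holdout accuracy relative to a
# hypothetical train accuracy.
train_accuracy    <- 0.85
overfitting_ratio <- accuracy / train_accuracy
```

By construction tpr + fnr and tnr + fpr each equal 1, so the paired rate plots carry the same information seen from opposite sides.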

Strongest evidence-based results

  • Most accurate models with error ranges

  • Strongest predictor with error ranges

  • The stories of the strongest evidence-based data

Five strongest evidence-based recommendations

Conclusions

References