
dash-app

The GreatFull Plate Internship Projects. The dashboards include visualizations and written analysis, all used to create a final report about insurance data.


R Analysis

View or download the R PDF (Generated with R Markdown): R PDF
Download RMD file: .RMD file | View from Browser: .RMD file

Jupyter Notebook Regression: Download PythonRegression.ipynb

Jupyter Notebook PDF: PythonRegression.PDF

If you would like to run the reportr.rmd file, make sure that you have R, RStudio, and a LaTeX distribution installed on your computer, e.g. MiKTeX or TinyTeX.
Otherwise you could render an HTML document instead of a PDF.

How to run?

If you have RStudio, open the .rmd file and hit Knit; the PDF should pop up.

Differences between Python and R

I am not going to go into the specifics of the languages, only the differences when running linear regressions and random forests.

As you can see, R offers a more straightforward approach to building the models, but it comes with a performance cost.

To run a simple regression model in R:

Note: If you have categorical data, make sure you convert it to numeric values first.

###### Converting to Numeric Variables ######
sex <- ifelse(data$sex == "female", 0, 1)
smoker <- ifelse(data$smoker == "yes", 1, 0)
# as.factor() first, so this also works when region is stored as character
region <- as.numeric(as.factor(data$region))

##### Replacing columns in the Data ######
data$sex <- sex
data$smoker <- smoker
data$region <- region
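For comparison, the same conversion can be sketched in plain Python. This is only an illustration using the standard library, not the code from the notebook: the `rows` list and the `encode_rows` helper are hypothetical names, and the three rows are made-up values rather than records from the insurance data.

```python
# Toy rows for illustration only; these are not taken from the insurance data.
rows = [
    {"sex": "female", "smoker": "yes", "region": "southwest"},
    {"sex": "male", "smoker": "no", "region": "northeast"},
    {"sex": "male", "smoker": "no", "region": "southwest"},
]


def encode_rows(rows):
    """Return copies of the rows with sex, smoker, and region as numbers."""
    # Sorted unique labels get the codes 1, 2, ..., similar to the integer
    # codes that as.numeric() returns for an R factor.
    labels = sorted({r["region"] for r in rows})
    region_codes = {name: code for code, name in enumerate(labels, start=1)}

    encoded = []
    for r in rows:
        encoded.append({
            **r,
            "sex": 0 if r["sex"] == "female" else 1,
            "smoker": 1 if r["smoker"] == "yes" else 0,
            "region": region_codes[r["region"]],
        })
    return encoded


encoded = encode_rows(rows)
for row in encoded:
    print(row)
```

Note that the region codes are arbitrary labels: the numbers impose an ordering on the regions that has no real meaning, in either language.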

Using the lm() function:

Response variable: Y (charges)
Predictor variable: X (age)

# Note: R allows `.` inside variable names
example.model <- lm(charges ~ age, data = data) 

To get the summary, use the summary() function:

summary(example.model)
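To make the comparison with Python concrete, here is a minimal sketch of what a one-predictor least-squares fit such as `charges ~ age` computes: a slope and an intercept. It uses only the standard library, so it is not how the notebook does it. `fit_simple_regression` is a hypothetical helper, and the numbers are toy values, not the insurance data.

```python
def fit_simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Toy values for illustration only; not taken from the insurance data.
ages = [20, 30, 40, 50]
charges = [2100.0, 2900.0, 4100.0, 4900.0]

intercept, slope = fit_simple_regression(ages, charges)
print(f"charges = {intercept:.1f} + {slope:.1f} * age")
```

Unlike `summary()` in R, this sketch reports only the coefficients; standard errors, p-values, and R-squared would need extra code or a statistics library.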

Random Forest in R

To split the data into a training set and a test set:

Remember to always set a seed in order to replicate results.

###### Random Forest Model ######
set.seed(100)

#setting a train and test set 
train <- sample(nrow(data), 0.8*nrow(data), replace = FALSE) 
trainset <- data[train,]
testset <- data[-train,]
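The same seeded 80/20 split can be sketched in plain Python. This is an assumption-laden illustration, not the notebook's code: `train_test_split` here is a hypothetical helper written with the standard library, and the integers stand in for data rows. A seed of 100 in Python will not select the same rows as `set.seed(100)` in R; it only makes the Python split repeatable.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=100):
    """Split rows into a training list and a test list, reproducibly."""
    rng = random.Random(seed)  # private generator, so the seed stays local
    n_train = int(train_fraction * len(rows))
    # Sample row indices without replacement, like sample(..., replace = FALSE).
    train_idx = set(rng.sample(range(len(rows)), n_train))
    trainset = [row for i, row in enumerate(rows) if i in train_idx]
    testset = [row for i, row in enumerate(rows) if i not in train_idx]
    return trainset, testset


rows = list(range(10))  # stand-in for the data rows
trainset, testset = train_test_split(rows)
print(len(trainset), "training rows,", len(testset), "test rows")
```

Calling `train_test_split(rows)` again with the same seed returns the same split, which is the point of fixing the seed.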

Running and plotting the model

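library(randomForest)  # provides randomForest()

# 500 trees, 6 candidate predictors per split, variable importance stored
random.forest1 <- randomForest(charges ~ ., data = trainset, ntree = 500, mtry = 6,
                               importance = TRUE)
plot(random.forest1, main = "Random Forest Errors vs. Number of Trees")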