# Example Annotated R Code for Factor Analysis

# Companion to: Knekta, Runyon, and Eddy (in prep)

####################
##  Preliminaries ##
####################

# First, read the dataset into R. To do so, you will need to direct R to the folder that contains the data
# file (the working directory). This path will be unique to your computer; below is an example path.
# Notice the direction of the slashes (/): they run opposite to the backslashes (\) that Windows typically uses.
# Alternatively, one could use a double backslash (\\) instead of the forward slash (/).

# The following command directs R to where the .csv containing the data is located.

setwd("C:/Users/Christopher Runyon/Documents/EFA_CFA_Paper")

# Equivalent to
# setwd("C:\\Users\\Christopher Runyon\\Documents\\EFA_CFA_Paper")
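
# As a hedged aside (not part of the original script): base R's file.path() joins folder names with forward
# slashes, so the direction of the slashes never has to be typed by hand. The object name "project_dir" is
# hypothetical, and the folder names simply repeat the example path above.

project_dir <- file.path("C:", "Users", "Christopher Runyon", "Documents", "EFA_CFA_Paper")
project_dir

# setwd(project_dir) would then have the same effect as the setwd() call above.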

# Read the data from a comma-separated values (.csv) file into
# an R data frame called "data".

data <- read.csv(file="EFAsampledata.csv")
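
# Optional check, offered as a minimal sketch rather than as part of the original analysis: the helper below
# (the name "missing_share" is hypothetical) returns the proportion of missing responses in each column of
# a data frame, which is useful to know before choosing how to handle missing data.

missing_share <- function(df) {
  # is.na() marks missing cells; colMeans() turns the TRUE/FALSE marks into proportions per column
  colMeans(is.na(df))
}

# Example use with the data read in above:
# round(missing_share(data), 3)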

# Next, install the necessary packages. After the packages are installed on your computer, it is not necessary
# to re-install them for each use; you can simply load them (as we do after installing the packages). You
# may be prompted to choose a secure CRAN mirror from which to download the packages. It does not matter which
# one you choose, and you only need to specify the mirror the first time you are prompted.

# Installing packages
# Only necessary to do so once!

install.packages("lavaan")
install.packages("psych")
install.packages("nFactors")
install.packages("corrplot")
install.packages("GPArotation")
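
# A hedged alternative to re-running every install.packages() call: the helper below (hypothetical name
# "not_installed") reports which of the required packages are absent from the local library, so that only
# those need to be installed.

not_installed <- function(pkgs) {
  # installed.packages() returns a matrix whose row names are the names of the installed packages
  setdiff(pkgs, rownames(installed.packages()))
}

# Example use:
# to_install <- not_installed(c("lavaan", "psych", "nFactors", "corrplot", "GPArotation"))
# if (length(to_install) > 0) install.packages(to_install)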


# Loading the packages
# Necessary for each session

library(lavaan)
library(psych)
library(nFactors)
library(corrplot)
library(GPArotation)

# There is a bug in lavaan version 0.6 that causes the robust estimates of CFI and RMSEA to appear as NA.
# If that happens, it can be fixed by installing and loading the following package:

install.packages("githubinstall")
library(githubinstall)

# Then run the following code:
githubinstall("lavaan")

# You will be prompted with some Y/N questions; answer Y to all of them.
# After going through these steps, restart R and run library("lavaan") again.


##################################
## Confirmatory factor analysis ##
##################################

### Step 1: Specifying a Model (Section 6.4.2) ###

# We chose to initially try a two-factor model based on the theoretical underpinnings of the survey. We chose
# two broad factor names and assigned items to those factors. The syntax below is the model specification,
# telling which factors we want the items to load on.

# The "=~" indicates that the factor (left-hand side) is defined by the
# observed items (right-hand side).

CFA2 <- 'Self =~ go1 + go2 + go3 + go4 + go5 + go6 + go7 + go8 + go9 +
go10 + go11 + go12 + go13 + go14
Other =~ go15 + go16 + go17 + go18 + go19 + go20 + go21 + go22 + go23'
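
# The same specification can be assembled with paste0() and paste(), which avoids typing long item lists by
# hand. This is only a sketch: the object names "self_items", "other_items", and "CFA2_alt" are hypothetical,
# and the assignment of items to factors simply repeats the model above.

self_items  <- paste0("go", 1:14)
other_items <- paste0("go", 15:23)
CFA2_alt <- paste0("Self =~ ",  paste(self_items,  collapse = " + "), "\n",
                   "Other =~ ", paste(other_items, collapse = " + "))

# cat() prints the model string with its line break, so it can be compared with CFA2
cat(CFA2_alt, "\n")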

### Step 2: Estimating the CFA (Section 6.4.1) ###

# Here we specify which estimator should be used (estimator), how missing data should be handled (missing),
# and which data frame contains the data (data). See the main body of the paper for the different options.
# The estimator used here is robust maximum likelihood ('mlr'), which can handle data that violate the
# assumptions of normality and linearity. This is not the default estimator and so must be specified by the
# user. When the data are perfectly normal, the results of using 'mlr' and the default maximum likelihood ('ml')
# would be equal.

# To account for any missing data we use full information maximum likelihood ('fiml'), which is an alternative
# to imputing missing data that still uses the entire dataset.

C2f_fit <- cfa(CFA2, estimator = "mlr", missing = "fiml", data = data)

### Step 3: Examining Model Fit (Section 6.4.3 and 6.5) ###

# To get primary output

summary(C2f_fit, fit.measures=TRUE, standardized=TRUE, rsquare=TRUE)
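
# To make one of the reported fit measures concrete, the sketch below computes RMSEA from a chi-square
# statistic using one common textbook form of the formula. It is an illustration only: the function name
# "rmsea_basic" is hypothetical, the formula is the unscaled version, and it will not necessarily reproduce
# the robust RMSEA that lavaan reports for the 'mlr' estimator.

rmsea_basic <- function(chisq, df, n) {
  # RMSEA = sqrt(max(chisq - df, 0) / (df * (n - 1)))
  sqrt(max(chisq - df, 0) / (df * (n - 1)))
}

# Example with made-up numbers (chi-square = 100, df = 50, N = 201):
rmsea_basic(chisq = 100, df = 50, n = 201)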

# The syntax below can be used to examine the modification indices.
# Only modification indices with values above 2 are kept.

C2f_mods <- modificationIndices(C2f_fit, minimum.value = 2)

# Displays the modification indices

C2f_mods

# To see the largest modification indices first, one may sort the table and display it again:

C2f_mods <- C2f_mods[order(C2f_mods$mi.scaled, decreasing = TRUE), ]
C2f_mods

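# This call requests the correlation residuals. Depending on the lavaan version, the residual matrix may be
# stored in the element $cor or $cov; if the line below prints NULL, try $cov instead.
residuals(C2f_fit, type = "cor")$cor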
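
# A residual matrix for 23 items is hard to scan by eye. The helper below is a minimal sketch (the name
# "large_residuals" is hypothetical, and the cutoff of 0.10 is an assumed rule of thumb, not a value taken from
# the paper) that lists the item pairs whose absolute correlation residual exceeds the cutoff.

large_residuals <- function(res_mat, cutoff = 0.10) {
  # Look only at the upper triangle so that each item pair is reported once
  idx <- which(abs(res_mat) > cutoff & upper.tri(res_mat), arr.ind = TRUE)
  data.frame(item1 = rownames(res_mat)[idx[, "row"]],
             item2 = colnames(res_mat)[idx[, "col"]],
             residual = res_mat[idx],
             stringsAsFactors = FALSE)
}

# Example use with the residual matrix requested above:
# large_residuals(residuals(C2f_fit, type = "cor")$cor)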

#################################
## Exploratory Factor Analysis ##
#################################

### Step 1: Examining the Correlation Matrix (Supplemental Material Section 1) ###

# An EFA explores the relationships between variables, so it can be useful to examine a correlation matrix of
# all of the items to identify what relationships between the items may exist. The code below creates this
# correlation matrix. To account for missing data, we chose to build this matrix based on pairwise complete
# observations: each correlation uses all respondents who answered both items in the pair, regardless of
# whether they responded to the other items. We chose this option because of the small amount of
# missing data in the observed responses (< 5%) and because the analysis is exploratory in nature. Using a
# more complex missing data technique would not necessarily improve the quality of the data for the purposes
# of an exploratory analysis.

# Calculating correlation matrix based on pairwise complete observations

efa_cormat <- round(cor(data[,1:23], use="pairwise.complete.obs"),3)

# Display the correlation matrix

efa_cormat
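
# With pairwise complete observations, each correlation can be based on a different number of respondents.
# The sketch below (the helper name "pairwise_n" is hypothetical) counts, for every pair of columns, how many
# rows have a response to both, which shows how much data each correlation rests on.

pairwise_n <- function(df) {
  observed <- !is.na(as.matrix(df))    # TRUE where a response is present
  storage.mode(observed) <- "numeric"  # convert TRUE/FALSE to 1/0, keeping the column names
  crossprod(observed)                  # entry [i, j] counts rows with both column i and column j observed
}

# Example use with the items analyzed above:
# pairwise_n(data[, 1:23])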

# The 'corrplot' package re-orders the items and colors the strength of the correlations to make clusters of
# items more apparent.

corrplot(efa_cormat, order = "hclust", tl.col = "black", tl.cex = 0.75)
                     
### Step 2: Determining the Number of Factors to Look for in the EFA (Section 6.6.3) ###

# In an EFA, the user has to specify how many factors (dimensions) the program should try to model. There
# are multiple methods for doing this; they provide both visual representations and suggested numbers of
# factors based on multiple tests.

# Visual representation of dimensionality using scree plot and parallel analysis

fa.parallel(data, fm = 'pa')
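
# A scree plot displays eigenvalues of the correlation matrix. As a minimal sketch in base R (the helper name
# "cormat_eigenvalues" is hypothetical), the function below returns those eigenvalues in decreasing order so
# they can be read as numbers. They correspond to a principal components view of the matrix and may differ
# from the factor-based values that fa.parallel() also plots.

cormat_eigenvalues <- function(cormat) {
  # only.values = TRUE skips the eigenvectors, which are not needed here
  eigen(cormat, symmetric = TRUE, only.values = TRUE)$values
}

# Example use with the correlation matrix created in Step 1:
# cormat_eigenvalues(efa_cormat)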

### Step 3: Specifying the EFA Model (Section 6.6) ###

# The syntax below analyzes the correlation matrix that we created above. We extract 5 factors, as we wanted
# to examine the most complex solution first. We use an oblimin rotation ("oblimin"), which allows our latent
# factors to be correlated, and a principal axis solution ("pa") as our factoring method. The solution is
# saved to the R object "Efa5".

Efa5 <- fa(r = efa_cormat, nfactors = 5,
           rotate = "oblimin", fm = "pa",
           max.iter = 500)
Efa5
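
# When reading a pattern matrix, it can help to hide small loadings so that the item clusters stand out.
# The helper below is a sketch only: the name "salient_loadings" is hypothetical, and the cutoff of 0.30 is an
# assumed convention rather than a value taken from the paper.

salient_loadings <- function(loadings, cutoff = 0.30) {
  out <- unclass(loadings)       # turn a "loadings" object into a plain matrix
  out[abs(out) < cutoff] <- NA   # blank out loadings smaller than the cutoff
  round(out, 2)
}

# Example use with the solution above:
# salient_loadings(Efa5$loadings)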