# Example Annotated R code for Factor Analysis
# Companion to: Knekta, Runyon, and Eddy (in prep)

####################
## Preliminaries ##
####################

# First, read the dataset into R. To do so you will need to direct R to the location of the file that
# contains the data (the working directory). This path will be unique to your computer, but below is an
# example path. Notice the direction of the slashes (/). They are the opposite of the direction that
# Windows typically uses (\). Alternatively, one could use the double backslash (\\) instead of the
# forward slash (/).

# The following command directs R to where the .csv containing the data is located.
setwd("C:/Users/Christopher Runyon/Documents/EFA_CFA_Paper")

# Equivalent to
# setwd("C:\\Users\\Christopher Runyon\\Documents\\EFA_CFA_Paper")

# The data are read from a comma-separated-value file into
# an R data frame called "data"
data <- read.csv(file = "EFAsampledata.csv")

# Next, install the necessary packages. After the packages are installed on your computer, it is not
# necessary to re-install them for each use; you can simply load the packages (as we do after installing
# them). You may be prompted to choose a secure CRAN mirror to download the packages. It does not matter
# which one you choose, and you only need to specify this CRAN mirror the first time you are prompted.

# Installing packages
# Only necessary to do so once!
install.packages("lavaan")
install.packages("psych")
install.packages("nFactors")
install.packages("corrplot")
install.packages("GPArotation")

# Loading the packages
# Necessary for each session
library(lavaan)
library(psych)
library(nFactors)
library(corrplot)
library(GPArotation)

# There is a bug in lavaan version 0.6 that causes the robust estimates of CFI and RMSEA to appear as NA.
# If that happens, it can be fixed by installing and loading the following package:
install.packages("githubinstall")
library(githubinstall)

# Then run the following code:
githubinstall("lavaan")
# You will be prompted with some Y/N questions; answer Y to all of them.
# After going through these steps, restart R and run library(lavaan) again.

##################################
## Confirmatory factor analysis ##
##################################

### Step 1: Specifying a Model (Section 6.4.2) ###

# We chose to initially try a two-factor model based on the theoretical underpinnings of the survey. We
# chose two broad factor names and assigned items to those factors. The syntax below is the model
# specification, telling lavaan which factor each item should load on.

# The "=~" indicates that the factor (left-hand side) is defined by the
# observed items (right-hand side). Each factor must be defined on its own line.
CFA2 <- 'Self  =~ go1 + go2 + go3 + go4 + go5 + go6 + go7 + go8 + go9 + go10 + go11 + go12 + go13 + go14
         Other =~ go15 + go16 + go17 + go18 + go19 + go20 + go21 + go22 + go23'

### Step 2: Estimating the CFA (Section 6.4.1) ###

# Here we specify what estimator should be used (estimator), how missing data should be handled (missing),
# and which data frame contains the data (data). See main body of paper for different options.

# The estimator used here is robust maximum likelihood ('mlr'), which is robust to violations of the
# normality assumption. This is not the default estimator and so must be specified by the
# user.
# When the data are perfectly normal, the results of using 'mlr' and the default maximum likelihood ('ml')
# would be equal.

# To account for any missing data we use full information maximum likelihood ('fiml'), which is an
# alternative to imputing missing data that still utilizes the entire dataset.
C2f_fit <- cfa(CFA2, estimator = "mlr", missing = "fiml", data = data)

### Step 3: Examining Model Fit (Section 6.4.3 and 6.5) ###

# To get the primary output
summary(C2f_fit, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)

# The syntax below can be used to examine the modification indices.

# Modification indices, but only showing values above 2
C2f_mods <- modificationIndices(C2f_fit, minimum.value = 2)

# Displays the modification indices
C2f_mods

# In order to see the largest modification indices first, one may do the following:
C2f_mods <- C2f_mods[order(C2f_mods$mi.scaled, decreasing = TRUE), ]

# This call requests the correlation residuals.
# Note: in some lavaan versions the residual correlation matrix is returned in the $cov element
# instead; if the call below returns NULL, inspect residuals(C2f_fit, type = "cor") directly.
residuals(C2f_fit, type = "cor")$cor

#################################
## Exploratory Factor Analysis ##
#################################

### Step 1: Examining the Correlation Matrix (Supplemental Material Section 1) ###

# An EFA explores the relationships between variables, so it can be useful to examine a correlation
# matrix of all of the possible items to identify what relationships between the items may exist. The
# code below creates this correlation matrix. To account for missing data we chose to build this matrix
# based on pairwise complete observations. This approach uses all pairs of responses available for each
# correlation, regardless of whether or not there are responses to the other observed items. We chose
# this option because of the small amount of missing data in the observed responses (< 5%) and because
# the analysis is exploratory in nature. Using a more complex missing data technique would not
# necessarily improve the quality of the data for the purposes of an exploratory analysis.

# Calculating the correlation matrix based on pairwise complete observations
efa_cormat <- round(cor(data[, 1:23], use = "pairwise.complete.obs"), 3)

# Display the correlation matrix
efa_cormat

# The 'corrplot' package re-orders the items and colors the strength of the correlations to make
# clusters of items more apparent.
corrplot(efa_cormat, order = "hclust", tl.col = "black", tl.cex = .75)

### Step 2: Determining the number of factors to look for in the EFA (Section 6.6.3) ###

# In an EFA, the user has to specify how many factors (dimensions) the program should try to model.
# There are multiple methods to do this; they return both visual representations and a suggested
# number of factors based on multiple tests.

# Visual representation of dimensionality using scree plot and parallel analysis
fa.parallel(data, fm = "pa")

### Step 3: Specifying the EFA Model (Section 6.6) ###

# The syntax below analyzes the correlation matrix that we created above. We extract 5 factors, as we
# wanted to examine the most complex solution first. We use an oblimin rotation ("oblimin"), which
# allows our latent factors to be correlated, and we use principal axis factoring ("pa") as our
# factoring method. The solution is saved to the R object "Efa5".
Efa5 <- fa(r = efa_cormat, nfactors = 5, rotate = "oblimin", fm = "pa", max.iter = 500)
Efa5
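
# A possible way to read the solution (an illustrative sketch, not part of the original analysis):
# the loading matrix can be easier to interpret when small loadings are suppressed and items are
# sorted by the factor they load on most strongly. The 0.3 cutoff is an assumed, commonly used
# threshold rather than a value taken from the paper; adjust it to your own criteria.
print(Efa5$loadings, cutoff = 0.3, sort = TRUE)

# Because the oblimin rotation allows the factors to correlate, the estimated factor correlation
# matrix is stored with the solution and can be inspected directly.
Efa5$Phi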