Navigation

Entries in R (8)

Wednesday
Mar142012

Plotting forecast() objects in ggplot part 1: Extracting the Data

Lately I've been using Rob J Hyndman's excellent forecast package. The package comes with some built in plotting functions but I found I wanted to customize and make my own plots in ggplot. In order to do that, I need a generalizable function that will extract all the data I want (forecasts, fitted values, training data, actual observations in the forecast period, confidence intervals, et cetera) and place it into a data.frame with a properly formatted date field (ie, not  a ts() object).

The function below does all that and should work for any forecast object (though I've only tested it on Arima() outputs). The only arguments it takes are the original observations and the forecast object (whatever results from calling forecast()). In my next post I'll give some examples of plotting the results using ggplot and explain why I wanted more than the default plot.forecast() function.

 

#--Produces a data.frame with the Source Data+Training Data, Fitted Values+Forecast Values, forecast data Confidence Intervals
funggcast<-function(dn,fcast){ 
	require(zoo) #needed for the 'as.yearmon()' function
 
	en<-max(time(fcast$mean)) #extract the max date used in the forecast
 
	#Extract Source and Training Data
	ds<-as.data.frame(window(dn,end=en))
	names(ds)<-'observed'
	ds$date<-as.Date(time(window(dn,end=en)))
 
	#Extract the Fitted Values (need to figure out how to grab confidence intervals)
	dfit<-as.data.frame(fcast$fitted)
	dfit$date<-as.Date(time(fcast$fitted))
	names(dfit)[1]<-'fitted'
 
	ds<-merge(ds,dfit,all.x=T) #Merge fitted values with source and training data
 
	#Exract the Forecast values and confidence intervals
	dfcastn<-as.data.frame(fcast)
	dfcastn$date<-as.Date(as.yearmon(row.names(dfcastn)))
	names(dfcastn)<-c('forecast','lo80','hi80','lo95','hi95','date')
 
	pd<-merge(ds,dfcastn,all.x=T) #final data.frame for use in ggplot
	return(pd)
 
}

Created by Pretty R at inside-R.org

 

Friday
Mar092012

Recovering Marginal Effects and Standard Errors of Interactions Terms Pt. II: Implement and Visualize

In the last post I presented a function for recovering marginal effects of interaction terms. Here we implement the function with simulated data and plot the results using ggplot2.  

 

 

#---Simulate Data and Fit a linear model with an interaction term
y<-rnorm(100,5,1)
x<-rnorm(100,5,1)
d<-data.frame(y=y,x=x,fac=sample(letters[1:3],100,replace=T))
 
mod<-lm(y~x*fac,data=d)
 
#========================================================
 
#---Extract the Main Effects, including the baseline, into a data.frame
dusp<-funinteff(mod,'x') #returns a data.frame of the Estimate and Standard Error, row.names correspond to the variables
 
#----Now Set the data up to visualize in ggplot-----
library(ggplot2)
#------Quick ggplot (move into graph code later)
#quick convenience function to compute significance at .95
funsig<-function(d){
	tstat<-abs(d$b/d$se)
	sig<-ifelse(tstat>=1.96,'yes','no')
	return(sig)
}
 
 
names(dusp)[1:2]<-c('b','se') #change the names to to make typing easier
 
#Add confidence intervals and signficance test
dusp$hi<-dusp$b+1.96*dusp$se
dusp$lo<-dusp$b-1.96*dusp$se
dusp$sig95<-funsig(dusp)
 
dusp$var<-row.names(dusp)
 
 
pd<-dusp
 
p1<-ggplot(data=pd,aes(x=var,y=b,shape=sig95))
p1<-p1+geom_hline(yintercept=0,col='grey')+geom_line()
p1<-p1+geom_pointrange(aes(ymin=lo,ymax=hi)) #+coord_flip() #uncomment coord_flip to switch the axes
p1<-p1+scale_y_continuous(name='Marginal Effect of Interaction Terms')

Created by Pretty R at inside-R.org

 

Monday
Mar052012

Recovering Marginal Effects and Standard Errors from Interaction Terms in R

When I fit models with interactions, I often want to recover not only the interaction effect but also the marginal effect (the main effect + the interaction) and of course the standard errors. There are a couple of ways to do this in R but I ended writing my own function (essentially a wrapper around the deltaMethod() function) to fit my needs.

In this case, I have a model where a continuous variable has been interacted with a discrete variable. I want to create a data.frame that stores the main effect, the marginal effects of the interactions, and their standard errors.

I also want the standard errors for the marginal effects on the interaction to match the standard errors on the main effect which in this case were HAC (Heteroskedastic and Autocorrelation Consistent ie. Newey-West).

The function below is a little sloppy as I built it iteratively while I was working through a specific problem, but it is generalized to any lm() type object. The basic use case is that you have one or more models, each with one or more interactions and the discrete terms in the interactions have more than two levels (it will work with two levels, but in that case the function might be overkill). In my case I fit a series of region specific panel models where two of the main effects were interacted with a year term. I wanted have a generalizable function where I could just feed it the model object, variable name, and covariance estimator, and then return a data.frame with estimates and standard errors. In the next post I'll demonstrate the function with simulated data and show how to visualize the marginal in ggplot.

 Note: The code below assumes you have already loaded car (for deltaMethod()), sandwich (for vcovHAC), and lmtest (for coeftest()).

 

##Calls this function which is a wrapper for coeftest from the sandwhich package
sehac<-function(fit,vcov=vcovHAC){ #Convenience function for HAC standard erros
	coeftest(fit,vcov)
}
 
funinteff<-function(mod,var,vcov=vcovHAC){ #mod is an lm() object, var is the name of the main effect that was interacted, vcov is the type of variance covariance method you want to use 
	#Extract Coefficient names create 'beta names' to feed to deltaMethod()
	cnames<-coef(mod)
	pnams<-data.frame('b'=paste('b',0:(length(cnames)-1),sep=""),'est'=cnames) #assign parameter names so that deltaMethod does not throw an error
 
	#Extract the specific parameters of interest
	vars<-grep(var,names(cnames),value=T)
	var1<-vars[1]
	intvars<-vars[2:length(vars)]
	bi<-pnams[var1,'b']
 
	#--Create Data Frame to store Main Effect
	int<-sehac(mod,vcov=vcov)[var1,c('Estimate','Std. Error')]
	int<-as.data.frame(t(int))
	names(int)<-c('Estimate','SE')
	row.names(int)<-var1
 
	#Loop through and store the results in a data.frame
	for(i in 1:length(intvars)){
		bint<-pnams[intvars[i],'b']
		eq<-paste(bi,bint,sep="+")
		interac<-deltaMethod(mod,eq,parameterNames=pnams[,1],vcov=vcov)
		row.names(interac)<-intvars[i]
 
		int<-rbind(int,interac)
	}
	return(int)
}

Created by Pretty R at inside-R.org

 

Page 1 2