Using WRDS on a Linux terminal 2013-03-19

I'm currently a visiting graduate researcher at UCLA and one of the nice perks here is access to WRDS. As they write on their site, they are the leading data research platform in the world. And why do they lead? Well, in my opinion one reason is the very easy access to great data sets. For instance, as soon as I had my account I checked out their awesome documentation sites and noticed that I can easily run scripts on their servers (apparently, this only applies to faculty staff and PhD students, so not everyone can do this).

If you want to do that as well, just start a terminal session in Linux (don't ask me how this works on Windows or on a Mac, but you'll find great documentation on WRDS for that as well) and type:

$ ssh

(You also find a very good introduction here. Note that this and the following links only help if you have an account for WRDS. However, if you don't, this post isn't relevant for you anyways...)

After that, you are asked to enter your password and BOOM!, you are already on their server! From there, you can start typing in some commands.

To exit, just type either of the following:

$ exit


$ logout

Now, it is quite easy to run a SAS file on their server. First, you need to know how to write a file...basically, you can start an editor -- such as vi, Emacs, or pico -- in the terminal. I use pico here, because it is a rather simple editor and since I copy/paste the text anyways, this is more than enough.

Let's start with a simple example (taken from here) to see that you can actually run SAS on the WRDS server. Type the following into the terminal:

$ pico first_script

Vector autoregression (VAR) in R 2013-03-12

In this post, I want to show how to run a vector autoregression (VAR) in R. First, I'm gonna explain with the help of a finance example when this method comes in handy and then I'm gonna run one with the help of the vars package.

Some theory

So what exactly is a VAR? Without going into too much detail here, it's basically just a generalization of a univariate autoregression (AR) model. An AR model explains one variable linearly with its own previous values, while a VAR explains a vector of variables with the vector's previous values. The VAR model is a statistical tool in the sense that it just fits the coefficients that best describe the data at hand. You should still have some economic intuition on why you put the variables in your vector. For instance, you could easily estimate a VAR with a time series of the number of car sales in Germany and the temperature in Australia. However, it's hard to sell to someone why you are doing this, even if you found that one variable helps explain the other...

Let's look at an example of a VAR often applied in finance (starting with Campbell/Ammer, 1993). Concretely, I implement an approach to decompose unexpected returns into two parts: cash flow (CF) news and discount rate (DR) news. This is an important issue, as pointed out for instance by Chen/Zhao (2009): Return decomposition, whose notation I'm going to use here as well:

Naturally, financial economists place keen interest in the relative importance of CF news and DR news—the two fundamental components of asset valuation—in determining the time-series and cross-sectional variations of stock returns. Relatively speaking, CF news is more related to firm fundamentals because of its link to production; DR news can reflect time-varying risk aversion or investor sentiment. Their relative importance thus helps greatly to understand how the financial market works, and provides the empirical basis for theoretical modeling.

We start with the following decomposition of unexpected equity return $e_{t+1}$, based on the seminal work by Campbell/Shiller (1988):

$$ e_{t+1} = r_{t+1} - E_t r_{t+1} $$
$$ = (E_{t+1} - E_t) \sum_{j=0}^{\infty} \rho^j \Delta d_{t+1+j} - (E_{t+1} - E_t) \sum_{j=1}^{\infty} \rho^j r_{t+1+j} $$
$$ = e_{CF,t+1} - e_{DR, t+1} $$

I'm not going into details here about the notation because this is explained for instance in Chen/Zhao (2009) and tons of other papers. However, just a short motivation on what is done here. Basically, investors expect a return for the next period ($E_t r_{t+1}$). However, there is uncertainty in this world and hence, you normally don't get what you expect, but what actually happens, i.e. $r_{t+1}$. For example, investors at the beginning of 2008 most definitely expected a positive return on their stocks, otherwise they wouldn't have invested in them. But in the end, they ended up with a highly negative return because negative news came in. So the unexpected return $e_{t+1}$ is just the difference between the actual realization $r_{t+1}$ and the expected return $E_t r_{t+1}$.

However, financial economists are also interested in why returns didn't turn out to be the same as expected. Well obviously, some news must have arrived in period $t+1$ which led to a revision and adjustment of the stock price, which in turn leads to a different return. The Campbell/Shiller decomposition shows that there are only two relevant components: news about future expected cash flows and news about future expected returns. As the above quote already shows, separating these two is an important issue in financial research.

Now, let's introduce a VAR process. Concretely, we will assume that there is a vector of state variables $z_t$ that follows a first-order VAR. This means that every state variable in period $t+1$ can be explained by a linear combination of the state variables in $t$ and a constant. Suppressing the constant, we can write

$$ z_{t+1} = \Gamma z_t + u_{t+1} $$
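
To make the first-order VAR a bit more tangible, here is a minimal sketch in base R. It simulates a two-variable VAR(1) without a constant and then recovers the coefficient matrix by OLS, one regression per state variable. The matrix `Gamma`, the sample size, and the standard normal shocks are made-up values for illustration only; they have nothing to do with actual return data.

```r
# Simulate a bivariate VAR(1), z_{t+1} = Gamma z_t + u_{t+1}, and estimate Gamma by OLS.
set.seed(42)
Gamma <- matrix(c(0.5, 0.1,
                  0.2, 0.3), nrow = 2, byrow = TRUE)  # hypothetical coefficients
n <- 5000
z <- matrix(0, nrow = n, ncol = 2)
for (t in 1:(n - 1)) {
  z[t + 1, ] <- Gamma %*% z[t, ] + rnorm(2)           # u_{t+1} drawn as standard normal
}

# Regress z_{t+1} on z_t without a constant (the "- 1"); lm() fits both equations at once
fit <- lm(z[-1, ] ~ z[-n, ] - 1)

# coef(fit) has regressors in rows and equations in columns, so transpose to match Gamma
Gamma_hat <- t(coef(fit))
round(Gamma_hat, 2)
```

With this many observations, `Gamma_hat` should come out close to `Gamma`. In practice you would probably not code this by hand; the vars package offers a `VAR()` function for exactly this purpose, and the sketch only sticks to base R to show what happens under the hood.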

We further assume that the first element of the state variable vector $z_{t+1}$ is the equity return $r_{t+1}$. We can then write the discount rate news as follows:
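
A sketch of the standard result from this literature, under one notational assumption that is not defined above: $e1$ denotes a vector whose first element is one and whose other elements are zero, so that $e1' z_{t+1} = r_{t+1}$. Because the VAR(1) implies $(E_{t+1} - E_t) z_{t+1+j} = \Gamma^j u_{t+1}$, the discounted sum collapses into a geometric series (please check the details against Chen/Zhao (2009)):

$$ e_{DR,t+1} = (E_{t+1} - E_t) \sum_{j=1}^{\infty} \rho^j r_{t+1+j} = e1' \sum_{j=1}^{\infty} \rho^j \Gamma^j u_{t+1} = e1' \rho \Gamma (I - \rho \Gamma)^{-1} u_{t+1} $$

Since the unexpected return itself is $e_{t+1} = e1' u_{t+1}$ and $e_{t+1} = e_{CF,t+1} - e_{DR,t+1}$, the cash flow news can then be backed out as the residual:

$$ e_{CF,t+1} = e_{t+1} + e_{DR,t+1} = \left( e1' + e1' \rho \Gamma (I - \rho \Gamma)^{-1} \right) u_{t+1} $$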

How to set up a new blog with ruhoh on github 2013-02-10

Since it took me quite some effort to get this blog running, I give a short summary of the steps I went through. Note, however, that I am really a beginner at this, so I only point you to those links that helped me. Don't even bother to ask me when something doesn't work. Not that I wouldn't want to help...I just couldn't. Oh, and just to be clear: this guide assumes that you are using Linux.

Get ruhoh running

ruhoh is a static blogging platform. Why would you want to use that in the first place? Good question! The point is that I wanted to publish some of my replications of finance papers and for that I wanted to be able to write both formulas and code snippets. Turns out that this isn't so easy with normal blogging platforms. For instance, I signed up for Blogger from Google and couldn't get MathJax to run there.

So after some research, I found Jekyll/Bootstrap, which seemed to do what I wanted. However, on that webpage the maintainer wrote that he now focused his efforts on ruhoh instead, so I thought that I might as well just do that. Note that ruhoh is quite new though, so Jekyll is definitely better documented.

Anyways, to get ruhoh running, just follow the installer guide. It looks pretty straightforward, but I had to deal with the issue that there are a lot of dependencies. I can't remember them all now, but my workflow was something like this:

  1. Type the command from the installer guide into the terminal.
  2. Get a message on why this failed (xyz is missing).
  3. Google the message.
  4. Deal with it (mostly just installing xyz, which often also needed abc, etc.)

Once that is done, you should have a folder somewhere in your folder structure with all the subfolders copied from Jade's github page. To check if it worked out, fire up a terminal, go to the folder you copied everything into, and type:

$ bundle exec rackup -p 9292

This starts a web server that hosts your blog here: http://localhost:9292. So basically, you can check out your blog in the browser. Now you can edit all the posts and play around.

Install mathjax

You have to install a mathjax widget, which sounds more complicated than it is. Your folder structure should have one folder named widgets. In this folder, add another folder named mathjax with a subfolder named layouts, and copy this file into the layouts folder. Actually, if you check out that file online, you also see the folder structure it has to be in.

Finally, you have to put {{{ mathjax }}} in your default.html file in the /themes/.../layouts subfolder.

Now you should be able to write equations in LaTeX. In-text math should be surrounded with $ signs, equations on a separate line with double dollar signs. To check if it worked, copy this sample file into your pages folder and check it out in your localhost session.

Replicating Cochrane (2008) 2013-02-09

In this post, I want to replicate some results of Cochrane (2008), The Dog That Did Not Bark: A Defense of Return Predictability, Review of Financial Studies, 21 (4). You can find that paper on John Cochrane's website. I wrote some thoughts about return predictability already on my Goyal/Welch replication post, so please check this one out for some more background. Or just read the papers, they explain it better than I could anyway.

Replication of the forecasting regressions in Cochrane's Table 1

Let's first repeat the forecasting regressions Cochrane runs in Table 1 of his paper. He uses data in real terms, i.e. deflated by the CPI, and on an annual basis ranging from 1926 to 2004. I do not have access to CRSP, but fortunately, we find similar data on Robert Shiller's website. His data is saved in an Excel file and is formatted in such a way that you cannot just read it into R. So you have to delete the unnecessary rows manually and save the sheet Data as a .CSV file. Also, here is the naming convention I apply for the relevant columns:

  • RealR: Real One_Year Interest Rate (column H as of February 2013). Note that Cochrane uses the real return on 3-month Treasury bills, but I'm too lazy to find that somewhere else and match it.
  • RealP: RealP Stock Price (column P as of February 2013).
  • RealD: RealD S&P Dividend (column O as of February 2013).
  • Ret_SP: Return on S&P Composite (column P as of February 2013).
  • Year: First column with the years.
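
library(data.table) #The := syntax used below needs data.table
strPath <- "/home/christophj/Dropbox/R_Package_Development/vignettes_REM/Data/Robert_Shiller_Data_Formatted.csv"
#strPath <- "C:/Dropbox/R_Package_Development/vignettes_REM/Data/Robert_Shiller_Data_Formatted.csv"
#Assumption: the right-hand side is missing here; reading the CSV into a data.table is assumed
shiller_data <- data.table(read.csv(strPath))
strStart <- 1924; strEnd <- 2005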
shiller_data <- shiller_data[, Dgrowth := c(NA, exp(diff(log(RealD))))]
shiller_data <- shiller_data[, DP := RealD/RealP]
vec_Ret_SP <- c(NA, shiller_data[2:nrow(shiller_data), RealP + RealD]/shiller_data[1:(nrow(shiller_data)-1), RealP ])
shiller_data <- shiller_data[, Ret_SP := vec_Ret_SP]
shiller_data <- shiller_data[, Ex_Ret_SP := Ret_SP - RealR]
shiller_data <- shiller_data[Year >= strStart & Year <= strEnd, list(Ret_SP, Ex_Ret_SP, RealR, RealD, Dgrowth, DP)]
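
As a rough sketch of what such a forecasting regression looks like in R, the snippet below regresses next year's return on this year's dividend yield. It uses simulated numbers instead of Shiller's data, and all names and parameter values (`dp`, `ret`, the slope of 2, the persistence of 0.9, and so on) are made up for illustration; they are not Cochrane's estimates.

```r
# Forecasting regression r_{t+1} = a + b * DP_t + error, on simulated data.
set.seed(1)
n  <- 80                                                 # roughly 80 annual observations
# A persistent dividend yield around 4% (AR(1) with hypothetical parameters)
dp <- 0.04 + as.numeric(arima.sim(list(ar = 0.9), n = n, sd = 0.002))
# Next year's return depends on this year's dividend yield plus noise
ret_next <- 0.02 + 2 * dp[-n] + rnorm(n - 1, sd = 0.15)

# Regress the return in t+1 on the dividend yield in t
fit <- lm(ret_next ~ dp[-n])
summary(fit)$coefficients                                # estimate, std. error, t-value, p-value
```

The important detail is the timing: the regressor is lagged by one year relative to the return, which is why the last dividend yield observation is dropped on the right-hand side.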

How to produce nice tables in PDFs using knitr/Sweave and R 2013-02-03

In this post, I want to show you how to produce nice tables in PDFs, even if you use knitr or Sweave to produce your reports dynamically. Why should you use tools for reproducible research in the first place? Well, it guarantees that you always know how you did your analysis. I mean if someone came up to me today and asked me how I computed the mean on page 52 of my diploma thesis, it would probably take me hours to figure that out (or maybe I couldn't figure it out at all anymore). When someone asks me how I computed the mean in one of the papers written during my PhD, I have a look at my knitr document and can tell them in minutes. That's the beauty of it, so you should definitely check it out if you don't use such a tool yet.

However, one thing that bothered me for a while was that the tables produced didn't really look great. I mean they had all the necessary information in them, but I just like tables that look good, and with LaTeX (knitr and Sweave are built on top of LaTeX, so you still use that) it is normally quite easy to make tables look great, for instance by using the package booktabs. In my early knitr days, I just edited the .tex file produced by knitr, but this seemed like a quick and dirty hack that was prone to non-reproducible errors (for instance, you delete one row in the table). That's what you want to get rid of when using those tools, so I figured out how to edit the tables in the source .Rnw file. This is what I want to show you here with a small minimal example.

There are two key tricks that we have to use:

  1. The option add.to.row in the function print.xtable allows us to enter strings before or after certain rows in your table.
  2. The backslash is a special character in R. For instance, if you want a line break you type "\n", which does not print that string literally but inserts a line break. In tables, however, we actually need backslashes at the end of rows, because in LaTeX two backslashes end a table row. So how do we do that? We write four backslashes: the first backslash is an escape character telling R to treat the next character as a normal character, so the second backslash is simply printed. Since we need two backslashes in the output, we have to do that twice. I know, it sounds complicated, but it's quite similar to the percent sign in LaTeX. You can't just write % because this tells LaTeX that the rest of the line is a comment. To actually get the percent sign, you have to write \%.
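
The second trick is easy to check in an R session. This tiny sketch (the variable name `row_end` is just made up for illustration) shows that four backslashes in the source code end up as two characters in the actual string:

```r
# Four backslashes in the R source are stored as two backslash characters
row_end <- "\\\\"
nchar(row_end)                        # 2
# "\n" is a single character (a line break), not a backslash followed by an n
nchar("\n")                           # 1
# cat() prints the string as LaTeX will see it: a & b \\
cat("a & b", row_end, "\n")
```

Use `cat()` rather than `print()` for such checks, because `print()` shows the escaped version of the string again.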

OK, now we have the basics, so let's actually produce a nice table. In R, you need to load the xtable package and in LaTeX, you need to load the booktabs package. Also, I use the package caption; otherwise, the caption is too close to the table.

Now imagine we want to compare three different regression models (rows) and want to print in the columns the $\alpha$, the $\beta$, the t-value of the $\beta$ coefficient, and the adjusted $R^2$. With randomly drawn data, our minimal example looks like this.

Here is the source code of the minimal example. Save it as a .Rnw file, knit that file and you should get a nice PDF.


Here is our minimal example:

<<Code_chunk_Minimal_example, results='asis', echo=FALSE>>=
#Just some random data
x1 <- rnorm(1000); x2 <- rnorm(1000); x3 <- rnorm(1000)
y  <- 2 + 1 *x1 + rnorm(1000)
#Run regressions
reg1 <- summary(lm(y ~ x1))
reg2 <- summary(lm(y ~ x2))
reg3 <- summary(lm(y ~ x3))
#Create data.frame
df <- data.frame(Model = 1:3,
                 Alpha = c(reg1$coef[1,1], reg2$coef[1,1], reg3$coef[1,1]),
                 Beta  = c(reg1$coef[2,1], reg2$coef[2,1], reg3$coef[2,1]),
                 tV    = c(reg1$coef[2,3], reg2$coef[2,3], reg3$coef[2,3]), #Column 3 is the t-value (column 2 is the std. error)
                 AdjR  = c(reg1$adj.r.s,  reg2$adj.r.s,   reg3$adj.r.s))
strCaption <- paste0("\\textbf{Table Whatever} This table is just produced with some ",
                     "random data and does not mean anything. Just to show you how ",
                     "things work.")
print(xtable(df, digits=2, caption=strCaption, label="Test_table"), 
      size="footnotesize", #Change size; useful for bigger tables
      include.rownames=FALSE, #Don't print rownames
      include.colnames=FALSE, #We create them ourselves
      hline.after=NULL, #We don't need hline; we use booktabs
      add.to.row = list(pos = list(-1, 
                                   nrow(df)),
                        command = c(paste("\\toprule \n",
                                          "Model & $\\alpha$ & $\\beta$ & t-value & $R^2$ \\\\\n", 
                                          "\\midrule \n"),
                                    "\\bottomrule \n")