More generally, you can use ifelse to choose between two values depending on a condition. So if instead of a 0-1 dummy variable, for some reason you wanted to use, say, 4 and 7, you could use ifelse(year == 1957, 4, 7).
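As a minimal sketch of the point above (the `year` values here are made up for illustration), the same condition can produce either a 0/1 dummy or any other pair of codes:

```r
# Hypothetical example data; these year values are made up for illustration.
year <- c(1957, 1958, 1957, 1960)

# Standard 0/1 dummy: 1 where the condition holds, 0 elsewhere.
dummy_01 <- ifelse(year == 1957, 1, 0)

# Same condition, but coded with the arbitrary values 4 and 7.
dummy_47 <- ifelse(year == 1957, 4, 7)

print(dummy_01)
print(dummy_47)
```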
This will include an intercept column (all ones) and one column for each of the years in your data set except one, which will be the "default" or intercept value.
You can change how the "default" is chosen by messing with contrasts.arg in model.matrix.
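Also, if you want to omit the intercept, you can add +0 to the end of the formula, which gives one column for every year. Simply dropping the first column also removes the intercept, but the "default" year is then left without a column of its own.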
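A small sketch of those three variants, assuming a made-up data frame `df` with a factor column `yr` (both names are illustrative, not from the question):

```r
# Hypothetical data; `df` and `yr` are illustrative names.
df <- data.frame(yr = factor(c(1957, 1958, 1957, 1959)))

# Default: an intercept plus one column per level except the first (1957).
mm <- model.matrix(~ yr, data = df)

# "+ 0" drops the intercept, so every level gets its own column.
mm_full <- model.matrix(~ yr + 0, data = df)

# contrasts.arg changes the reference level; here the third level (1959)
# becomes the baseline instead of the first.
mm_base <- model.matrix(
  ~ yr, data = df,
  contrasts.arg = list(yr = contr.treatment(levels(df$yr), base = 3))
)

colnames(mm)
colnames(mm_full)
colnames(mm_base)
```

Releveling the factor beforehand (for example with `relevel`) is another way to pick the baseline, if you would rather not pass contrasts explicitly.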
#Generate example dataframe with character column
example <- as.data.frame(c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))
names(example) <- "strcol"
# For every unique value in the string column, create a new 1/0 column.
# This is what factors do "under the hood" automatically when passed to a function requiring numeric data.
for (level in unique(example$strcol)) {
  example[paste("dummy", level, sep = "_")] <- ifelse(example$strcol == level, 1, 0)
}
The other answers here offer direct routes to accomplish this task—one that many models (e.g. lm) will do for you internally anyway. Nonetheless, here are ways to make dummy variables with Max Kuhn's popular caret and recipes packages. While somewhat more verbose, they both scale easily to more complicated situations, and fit neatly into their respective frameworks.
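As a hedged sketch of the caret route only (recipes is not shown), assuming the caret package is installed and using a made-up data frame `df`:

```r
library(caret)

# Hypothetical data; `df` is an illustrative name.
df <- data.frame(id = 1:4, year = factor(c(1957, 1958, 1957, 1959)))

# dummyVars() builds an encoding specification from the formula and data;
# predict() applies it and returns a numeric matrix.
dmy <- dummyVars(~ ., data = df)
dummies <- predict(dmy, newdata = df)

# Numeric columns such as `id` pass through unchanged; the factor `year`
# expands to one 0/1 column per level.
head(dummies)
```

By default every level gets a column; `dummyVars` also has a `fullRank` argument if you want one level dropped, as `model.matrix` does.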
For the use cases presented in, for example, the answers of @zx8754 and @Sotos, there are still some other options which haven't been covered yet imo.
1) Make your own make_dummies-function
# example data
df2 <- data.frame(id = 1:5, year = c(1991:1994,1992))
# create a function
make_dummies <- function(v, prefix = '') {
  s <- sort(unique(v))
  d <- outer(v, s, function(v, s) 1L * (v == s))
  colnames(d) <- paste0(prefix, s)
  d
}
# bind the dummies to the original dataframe
cbind(df2, make_dummies(df2$year, prefix = 'y'))
However, this will not work when there are duplicate values in the column for which the dummies have to be created. In that case, a specific aggregation function is needed for dcast, and the result of dcast needs to be merged back to the original:
# dcast is not in base R; here it is taken from reshape2
library(reshape2)
# example data
df3 <- data.frame(var = c("B", "C", "A", "B", "C"))
# aggregation function to get dummy values
f <- function(x) as.integer(length(x) > 0)
# reshape to wide with the custom aggregation function and merge back to the original
merge(df3, dcast(df3, var ~ var, fun.aggregate = f), by = 'var', all.x = TRUE)
which gives (note that the result is ordered according to the by column):
  var A B C
1   A 1 0 0
2   B 0 1 0
3   B 0 1 0
4   C 0 0 1
5   C 0 0 1
3) use the spread-function from tidyr (with mutate from dplyr)
library(dplyr)
library(tidyr)
df2 %>%
  mutate(v = 1, yr = year) %>%
  spread(yr, v, fill = 0)
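Since `spread` has been superseded in tidyr 1.0.0 and later, here is a rough equivalent with `pivot_wider`. This is a sketch that assumes a reasonably recent tidyr (the scalar `values_fill = 0` may need to be written as a named list in older releases); `wide` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same example data as above.
df2 <- data.frame(id = 1:5, year = c(1991:1994, 1992))

wide <- df2 %>%
  # v holds the dummy value; yr is a copy of year that supplies the new column names
  mutate(v = 1, yr = year) %>%
  # one column per distinct yr; combinations that do not occur are filled with 0
  pivot_wider(names_from = yr, values_from = v, values_fill = 0)

wide
```

Because `year` itself is kept alongside `id`, each original row stays a separate row and gets a 1 only in the column for its own year.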