I am trying to remove some columns from a data frame. I want to know why it works for a single column but not for multiple columns. For example, this works:

album2[,5]<- NULL

This doesn't work:

album2[,c(5:7)]<- NULL
Error in `[<-.data.frame`(`*tmp*`, , 5:7, value = NULL) : 
replacement has 0 items, need 600

This also doesn't work:

for (i in 5: (length(album2)-1)){
 album2[,i]<- NULL
}
Error in `[<-.data.frame`(`*tmp*`, , i, value = NULL) : 
new columns would leave holes after existing columns
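
One plausible reading of the loop error is that each deletion shortens the data frame, so the remaining columns shift left and the loop counter eventually points past the last column. A minimal sketch, assuming the built-in mtcars data as a stand-in for album2 (the names df and drop_idx are illustrative), that avoids the shift by deleting the highest position first:

```r
# mtcars (11 columns) stands in for album2; this is an assumption for illustration
df <- mtcars
drop_idx <- 5:7                  # positions to delete

# Walk the positions from highest to lowest so that deleting one column
# never changes the position of a column that is still to be deleted
for (i in rev(drop_idx)) {
  df[, i] <- NULL
}

names(df)
```

Deleting all the columns in a single assignment, as suggested in the comments and answers, avoids the loop altogether.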

  • Try album2[,5:7] <- list(NULL) Commented Jan 5, 2016 at 17:37
  • It would be great if you could supply a minimal reproducible example to go along with your question: something we can work from and use to show you how it might be possible to answer it. That way others can also benefit from your question, and the accompanying answer, in the future. You can have a look at this SO post on how to make a great reproducible example in R. Commented Jan 5, 2016 at 17:42
  • @EricFail Especially as, as far as I can tell, the first example ("e.g. this works") doesn't actually work. Commented Jan 5, 2016 at 17:47
  • @doctorG Using list(NULL) made it work with multiple columns; using NULL worked with a single column. I will take care of reproducibility in the future. Commented Jan 5, 2016 at 17:57
  • See my question here. Commented Jan 6, 2016 at 2:29

8 Answers

64

Basic subsetting:

album2 <- album2[, -5] #delete column 5
album2 <- album2[, -c(5:7)] # delete columns 5 through 7
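
Since album2 is not available here, the following is a minimal sketch of the same subsetting on the built-in mtcars data (11 columns), used purely as a stand-in. The variable names are illustrative, and drop = FALSE is an addition that keeps the result a data frame even if only one column were left.

```r
# mtcars stands in for album2 (assumption: any data frame with at least 7 columns)
df <- mtcars

# Drop column 5 by position
one_dropped <- df[, -5, drop = FALSE]

# Drop columns 5 through 7 by position
three_dropped <- df[, -(5:7), drop = FALSE]

names(three_dropped)
```

Both calls return a new data frame and leave df untouched, which is why the answer assigns the result back to album2.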
4 Comments

Dropping columns by their position is not recommended, at least in my view.
Yes, and no. The OP was posed in context of specifying column positions. If you know the desired positions, then this is fine. For others to know if your comment is useful/relevant to them, could you add why you would not recommend it?
Well, what if someone adds a new column to the data and the column positions change? I agree that your answer is correct, but it's neither safe nor efficient.
It's implicit you're at the point of knowing what column numbers you want. Getting to that point is up to you. Considering whether you're doing it interactively or programmatically (and thus what conditions you need to cope with) is also up to you.
42

Adding answer as this was the top hit when searching for "drop multiple columns in r":

The general version of the single-column removal, e.g. df$column1 <- NULL, is to use list(NULL):

df[ ,c('column1', 'column2')] <- list(NULL)

This also works with position indices:

df[ ,c(1,2)] <- list(NULL)

This is a more general way to drop columns and, as some comments have mentioned, removing by index isn't recommended. In addition, the familiar negative subset (used in other answers) doesn't work for columns given as strings:

> iris[ ,-c("Species")]
Error in -"Species" : invalid argument to unary operator
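
A small sketch of the list(NULL) form on a copy of the built-in iris data, together with a name-based subset that sidesteps the failing negative-string form. The setdiff() variant is an addition rather than part of this answer, and drop_cols is an illustrative name. The error message in the question ("replacement has 0 items, need 600") suggests that a bare NULL is treated as an empty replacement for every selected cell, whereas list(NULL) appears to be recycled as one NULL per selected column.

```r
df <- iris                                   # work on a copy of a built-in data set
drop_cols <- c("Sepal.Width", "Species")     # illustrative column names

# Delete by name with list(NULL), as in the answer above
df[, drop_cols] <- list(NULL)
names(df)

# Alternative (an addition): keep every column whose name is not in drop_cols
kept <- iris[, setdiff(names(iris), drop_cols), drop = FALSE]
names(kept)
```

The first form modifies df in place, while the setdiff() form builds a new data frame and leaves the original unchanged.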

1 Comment

Can you please explain why list(NULL) and not just NULL?
12

This works for me.

x <-dplyr::select(dataset_df, -c('column1', 'column2'))

10

If you only want to remove columns 5 and 7 but not 6, try:

album2 <- album2[,-c(5,7)] #deletes columns 5 and 7

7

@Ahmed Elmahy The following approach should help when you have a vector of column names that you want to remove from your data frame:

test_df <- data.frame(col1 = c("a", "b", "c", "d", "e"), col2 = seq(1, 5), col3 = rep(3, 5))
rm_col <- c("col2")
test_df[, !(colnames(test_df) %in% rm_col), drop = FALSE]
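
To illustrate what drop = FALSE changes, here is a sketch that reuses test_df but removes two columns so that only one remains. Without drop = FALSE, base R simplifies the single remaining column and the result is no longer a data frame; with it, the result stays a one-column data frame. The names rm_two, keep, as_vector and as_frame are illustrative.

```r
test_df <- data.frame(col1 = c("a", "b", "c", "d", "e"), col2 = seq(1, 5), col3 = rep(3, 5))
rm_two <- c("col2", "col3")                    # illustrative: leaves only col1
keep <- !(colnames(test_df) %in% rm_two)

as_vector <- test_df[, keep]                   # simplified, no longer a data frame
as_frame  <- test_df[, keep, drop = FALSE]     # still a one-column data frame

is.data.frame(as_vector)   # FALSE
is.data.frame(as_frame)    # TRUE
```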

All the best, ExploreR

1 Comment

What is drop doing in this context?
2

Another solution, similar to @Dulakshi Soysa's, is to use column names and then specify a range.

For example, suppose our data frame df has columns named column_1, column_2, column_3, and so on up to column_15, and we want to delete the 5th through 10th columns.

We can specify a range using column names, e.g.:

library(dplyr)
x = select(df, -c('column_5':'column_10'))

Specifying a range can save time when you are deleting several adjacent columns. It can also be combined with non-adjacent columns. For example, to remove the 1st column in addition to the previously specified columns, update the code as follows:

library(dplyr)
x = select(df, -c('column_1', 'column_5':'column_10'))
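
For comparison, a base-R sketch of the same name-range idea without dplyr. It builds a toy data frame with the column names described above (the contents are placeholders, not real data) and uses match() to translate the range endpoints into positions. The names first and last are illustrative.

```r
# Toy data frame with columns column_1 ... column_15 (contents are placeholders)
df <- as.data.frame(matrix(0, nrow = 2, ncol = 15))
names(df) <- paste0("column_", 1:15)

# Translate the endpoints of the name range into positions
first <- match("column_5", names(df))
last  <- match("column_10", names(df))

# Drop column_1 plus the range column_5 ... column_10
x <- df[, -c(1, first:last), drop = FALSE]
names(x)
```

Because the positions are looked up by name at run time, this keeps working if columns are added before column_5, as long as the two endpoint names still exist.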

1

The following line returns the data frame 'data' without col_1 and col_2; assigning the result back to 'data' makes the removal stick:

data <- data[!(colnames(data) %in% c('col_1','col_2'))]

1

Here is an interesting solution I read the other day on @JoachimSchork's blog, Statistics Globe. It uses data.table to remove columns by name.

library(data.table)

mtcars2 <- mtcars

setDT(mtcars2)[, c("mpg", "cyl", "disp", "hp") := NULL]

> head(mtcars2)
   drat    wt  qsec vs am gear carb
1: 3.90 2.620 16.46  0  1    4    4
2: 3.90 2.875 17.02  0  1    4    4
3: 3.85 2.320 18.61  1  1    4    1
4: 3.08 3.215 19.44  1  0    3    1
5: 3.15 3.440 17.02  0  0    3    2
6: 2.76 3.460 20.22  1  0    3    1
