14

I'm getting started on a TensorFlow project and am in the middle of defining and creating my feature columns. However, the dataset is pretty extensive: even after preprocessing and scrubbing, I still have hundreds of columns.

The traditional way of creating a feature_column is defined in the Tensorflow tutorial and even this StackOverflow post. You essentially declare and initialize a Tensorflow object for each feature column:

gender = tf.feature_column.categorical_column_with_vocabulary_list(
    "gender", ["Female", "Male"])

This works all well and good if your dataset has only a few columns, but in my case, I surely don't want to have hundreds of lines of code initializing different feature_column objects.

What's the best way to resolve this issue? I notice that in the tutorial, all the columns are collected as a list:

base_columns = [
    gender, native_country, education, occupation, workclass, relationship,
    age_buckets,
]

Which is ultimately passed into your estimator:

m = tf.estimator.LinearClassifier(
    model_dir=model_dir, feature_columns=base_columns)

So would the ideal way of handling feature_column creation for hundreds of columns be to append them directly into a list? Something like this?

my_columns = []

for col in df.columns:
    if is_string_dtype(df[col]): #is_string_dtype is pandas function
        my_column.append(tf.feature_column.categorical_column_with_hash_bucket(col, 
            hash_bucket_size= len(df[col].unique())))

    elif is_numeric_dtype(df[col]): #is_numeric_dtype is pandas function
        my_column.append(tf.feature_column.numeric_column(col))

Is this the best way of creating these feature columns? Or am I missing some functionality to Tensorflow that allows me to work around this step?

3
  • 1
    What you have makes sense to me. :) Commented Oct 20, 2017 at 20:04
  • Can you promote that to an answer, @greeness ? Thanks! :) Commented Nov 18, 2017 at 15:30
  • alright, it does not add anything to op's question though. Commented Nov 20, 2017 at 4:13

3 Answers

8

What you have posted in the question makes sense. Small extension based on your own code:

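import pandas.api.types as ptypes
import tensorflow as tf

my_columns = []
for col in df.columns:
  if ptypes.is_string_dtype(df[col]):
    # String columns: hash each value into one of hash_bucket_size buckets.
    my_columns.append(tf.feature_column.categorical_column_with_hash_bucket(col,
        hash_bucket_size=len(df[col].unique())))

  elif ptypes.is_numeric_dtype(df[col]):
    my_columns.append(tf.feature_column.numeric_column(col))

  elif ptypes.is_categorical_dtype(df[col]):
    # tf.feature_column has no plain categorical_column(); a pandas categorical
    # already knows its categories, so a vocabulary list is used here instead
    # (assumes the categories are strings or integers).
    my_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(col,
        vocabulary_list=list(df[col].cat.categories)))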
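One caveat on the hash_bucket_size choice in the loop: hashing does not give each distinct value its own bucket, so a bucket count equal to the number of unique values still produces collisions. As a rough sketch, assuming the hash spreads values uniformly (an idealization, not a statement about TensorFlow's actual hash function), the expected number of occupied buckets for n distinct values in m buckets is m * (1 - (1 - 1/m)^n). The helper name below is made up for illustration.

```python
def expected_occupied_buckets(n_values: int, n_buckets: int) -> float:
    """Expected number of non-empty buckets under an idealized uniform hash."""
    if n_values < 0 or n_buckets < 1:
        raise ValueError("need n_values >= 0 and n_buckets >= 1")
    # Each bucket stays empty with probability (1 - 1/m)^n.
    return n_buckets * (1.0 - (1.0 - 1.0 / n_buckets) ** n_values)


if __name__ == "__main__":
    n = 1000
    # Compare "one bucket per unique value" with a 10x larger bucket count.
    for m in (n, 10 * n):
        occupied = expected_occupied_buckets(n, m)
        print(f"{n} values, {m} buckets: ~{occupied:.0f} buckets used")
```

Under this assumption, 1000 unique values hashed into 1000 buckets land in only about 632 distinct buckets, so many values share a bucket. A larger bucket count reduces the sharing at the cost of a wider feature.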

1

I used your own approach with one small edit (the for loop should append to my_columns, not my_column). Here is the version that worked for me:

import pandas.api.types as ptypes

my_columns = []

for col in df.columns:
  if ptypes.is_string_dtype(df[col]): #is_string_dtype is pandas function
    my_columns.append(tf.feature_column.categorical_column_with_hash_bucket(col, 
        hash_bucket_size= len(df[col].unique())))

  elif ptypes.is_numeric_dtype(df[col]): #is_numeric_dtype is pandas function
    my_columns.append(tf.feature_column.numeric_column(col))
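Note that the dtype loop silently skips any column that is neither string nor numeric (datetimes, for example). A minimal sketch of one way to make that visible is to split the work in two: first plan a column kind per dtype, then build the TensorFlow objects from the plan. Everything named here (plan_columns, build_columns, the dtype-name sets) is hypothetical. The dtype names are assumed to be the strings pandas gives for str(df[col].dtype), and the TensorFlow import is deferred so the planning step can be checked without it.

```python
from typing import Dict, List, Tuple

# Assumed dtype names, as produced by str(df[col].dtype) in pandas.
STRING_DTYPES = {"object", "string", "str"}
NUMERIC_PREFIXES = ("int", "uint", "float")


def plan_columns(
    dtypes: Dict[str, str], n_unique: Dict[str, int]
) -> Tuple[List[Tuple[str, str, int]], List[str]]:
    """Return (plan, skipped); plan entries are (kind, column, bucket_size)."""
    plan, skipped = [], []
    for col, dtype in dtypes.items():
        name = dtype.lower()  # also matches nullable dtypes such as "Int64"
        if name in STRING_DTYPES:
            # Same bucket-size choice as the loop above: one per unique value.
            plan.append(("hash", col, n_unique[col]))
        elif name.startswith(NUMERIC_PREFIXES):
            plan.append(("numeric", col, 0))  # bucket size unused for numerics
        else:
            skipped.append(col)  # surfaced instead of silently dropped
    return plan, skipped


def build_columns(plan):
    """Turn a plan into tf.feature_column objects (needs TensorFlow installed)."""
    import tensorflow as tf  # deferred: only needed when actually building

    columns = []
    for kind, col, size in plan:
        if kind == "hash":
            columns.append(
                tf.feature_column.categorical_column_with_hash_bucket(
                    col, hash_bucket_size=size))
        else:
            columns.append(tf.feature_column.numeric_column(col))
    return columns
```

With a DataFrame df, the inputs could be built as df.dtypes.astype(str).to_dict() and df.nunique().to_dict() (nunique ignores missing values by default), and anything returned in skipped can then be handled by hand.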


0

The two methods above work only if the data is provided as a pandas DataFrame, where every column has a name. If all of your columns are numeric and you don't want to name them individually, for example when reading several numeric columns from a NumPy array, you can use something like this:

feature_column = [tf.feature_column.numeric_column(key='image', shape=(784,))]

# numpy_input_fn requires shuffle to be set explicitly (TF 1.x API)
input_fn = tf.estimator.inputs.numpy_input_fn({'image': x_train}, shuffle=True)

where x_train is your NumPy array with 784 columns. You can check this post by Vikas Sangwan for more details.
