I'm getting started on a TensorFlow project and am in the middle of defining and creating my feature columns. However, I have hundreds and hundreds of features; it's a pretty extensive dataset. Even after preprocessing and scrubbing, I still have a lot of columns.
The traditional way of creating a feature_column is shown in the TensorFlow tutorial and even in this StackOverflow post. You essentially declare and initialize a separate TensorFlow object for each feature column:
gender = tf.feature_column.categorical_column_with_vocabulary_list(
    "gender", ["Female", "Male"])
This is all well and good if your dataset has only a few columns, but in my case I certainly don't want to write hundreds of lines of code initializing different feature_column objects.
What's the best way to resolve this issue? I notice that in the tutorial, all the columns are collected into a list:
base_columns = [
    gender, native_country, education, occupation, workclass, relationship,
    age_buckets,
]
Which is ultimately passed into your estimator:
m = tf.estimator.LinearClassifier(
    model_dir=model_dir, feature_columns=base_columns)
So would the ideal way of handling feature_column creation for hundreds of columns be to build them in a loop and append them to a list? Something like this?
from pandas.api.types import is_string_dtype, is_numeric_dtype  # pandas dtype checks

my_columns = []
for col in df.columns:
    if is_string_dtype(df[col]):
        # Hash string columns, with one bucket per unique value
        my_columns.append(tf.feature_column.categorical_column_with_hash_bucket(
            col, hash_bucket_size=len(df[col].unique())))
    elif is_numeric_dtype(df[col]):
        my_columns.append(tf.feature_column.numeric_column(col))
Is this the best way of creating these feature columns? Or am I missing some functionality in TensorFlow that lets me work around this step?
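For reference, here is the rough end-to-end version of what I'm considering, wired into the estimator. I'm assuming the preprocessed pandas DataFrame df from above, a label column that I'm calling "label" (just a placeholder name), and TF 1.x's tf.estimator.inputs.pandas_input_fn for the input pipeline:

import tensorflow as tf
from pandas.api.types import is_string_dtype, is_numeric_dtype

# df is the preprocessed DataFrame described above; "label" is a placeholder name
features = df.drop("label", axis=1)
model_dir = "/tmp/linear_model"  # placeholder path

# Build one feature column per DataFrame column, based on its dtype
my_columns = []
for col in features.columns:
    if is_string_dtype(features[col]):
        my_columns.append(tf.feature_column.categorical_column_with_hash_bucket(
            col, hash_bucket_size=len(features[col].unique())))
    elif is_numeric_dtype(features[col]):
        my_columns.append(tf.feature_column.numeric_column(col))

# Feed the DataFrame straight into the estimator
train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=features, y=df["label"], num_epochs=None, shuffle=True)

m = tf.estimator.LinearClassifier(
    model_dir=model_dir, feature_columns=my_columns)
m.train(input_fn=train_input_fn, steps=1000)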