From the course: Deep Learning with Python and Keras: Build a Model for Sentiment Analysis


Representing text using TF-IDF vectorization

- [Instructor] Now, so far we've trained a dense neural network by encoding our input text data using a count vectorizer. What I'm going to do now is train the same dense neural network, but we'll use a different encoding for our data. We'll use a TfidfVectorizer. TF-IDF stands for Term Frequency-Inverse Document Frequency, and in this vectorization we continue to use feature vectors the size of our vocabulary to represent text, except that every token in the input text is represented using a TF-IDF score. The term frequency component of the score upweights words that occur more frequently in the input text. More frequently occurring words are considered more important. The IDF term downweights words that occur frequently across the entire corpus of documents. The assumption is that words occurring in many documents are common words with less information content. Let's see how we can use the same text vectorizer to…
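The idea described above can be sketched with scikit-learn's TfidfVectorizer (the class named in the lesson). This is a minimal illustration, not the course's actual code; the mini-corpus here is made up for the example:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus for illustration only
corpus = [
    "the movie was great",
    "the movie was terrible",
    "great acting and a great story",
]

# Each document becomes a vector the size of the vocabulary,
# with each token position holding its TF-IDF score
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)

print(X.shape)                          # (3, vocabulary size)
print(sorted(vectorizer.vocabulary_))   # the learned vocabulary
```

Because "the" appears in most documents, its IDF weight is low, while a word unique to one document (like "terrible") scores relatively high in that document's vector — exactly the downweighting of common words the instructor describes.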
