
I am trying to create a classifier using Python and Sklearn. I currently have all my data imported successfully. I have been trying to follow a tutorial from here, changing it a bit as I go. Later in the project I realized that their training and testing data was much different than mine. If I understand it right, they had something like this:

X_train = ['Article or News article here', 'Another News Article or Article here', ...]
y_train = ['Article Type', 'Article Type', ...]
#Same for the X_test and y_test

While I had something like this:

X_train = [['Dylan went in the house. Robert left the house', 'Where is Dylan?'], ['Mary ate the apple. Tom ate the cake', 'Who ate the cake?'], ...]
y_train = ['In the house.', 'Tom ate the cake']
#Same for the X_test and y_test

When I tried to train the classifier with their pipeline:

text_clf = Pipeline([
    ('vect', CountVectorizer(stop_words='english')),
    ('tfidf', TfidfTransformer(use_idf=True)),
    ('clf', SGDClassifier(loss='hinge', penalty='l2', alpha=1e-3,
                          random_state=42, verbose=1)),
])

I get the error:

AttributeError: 'list' object has no attribute 'lower'

At this line:

text_clf.fit(X_train, y_train)

After doing research I now know that this is because I am passing an array for each of my X_train entries instead of a string. So my question is: how do I construct a pipeline that will accept arrays for my X_train data and strings for my y_train data? Is this even possible with a pipeline?
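
For reference, the same error can be reproduced outside the pipeline. My understanding (a minimal sketch, not taken from the tutorial) is that CountVectorizer expects each document to be a single string, because its default preprocessing calls .lower() on every document:

from sklearn.feature_extraction.text import CountVectorizer

vec = CountVectorizer(stop_words='english')
vec.fit(['Dylan went in the house.', 'Where is Dylan?'])    # works: each document is a string
vec.fit([['Dylan went in the house.', 'Where is Dylan?']])  # AttributeError: 'list' object has no attribute 'lower'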

1 Answer


You can use the tokenizer parameter to tell the CountVectorizer to treat each list as a single, already-tokenized document, and set the lowercase option to False, like this:

text_clf = Pipeline([
    ('vect', CountVectorizer(tokenizer=lambda single_doc: single_doc,
                             stop_words='english', lowercase=False)),
    ('tfidf', TfidfTransformer(use_idf=True)),
    ('clf', SGDClassifier(loss='hinge', penalty='l2', alpha=1e-3,
                          random_state=42, verbose=1)),
])
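
With that change, the nested-list data from the question can be passed straight to fit and predict. A minimal end-to-end sketch (the imports and the tiny sample dataset here are only assumptions to make it self-contained):

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier

# Sample data in the same nested-list shape as the question.
X_train = [['Dylan went in the house. Robert left the house', 'Where is Dylan?'],
           ['Mary ate the apple. Tom ate the cake', 'Who ate the cake?']]
y_train = ['In the house.', 'Tom ate the cake']

text_clf = Pipeline([
    # The identity tokenizer means each inner list is treated as an already
    # tokenized document, so .lower() is never called on a list.
    ('vect', CountVectorizer(tokenizer=lambda single_doc: single_doc,
                             stop_words='english', lowercase=False)),
    ('tfidf', TfidfTransformer(use_idf=True)),
    ('clf', SGDClassifier(loss='hinge', penalty='l2', alpha=1e-3, random_state=42)),
])

text_clf.fit(X_train, y_train)
print(text_clf.predict([['Mary ate the apple. Tom ate the cake', 'Who ate the cake?']]))

Note that with this setup each element of the inner list (a whole sentence or question) becomes a single vocabulary entry, so the tf-idf features are computed over full strings rather than individual words.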