I have a vector of 6 different values that I use as each sample, and the label is a single integer: 0, 1, or 3. The machine learning algorithms work when I pass an array as a sample, but I get the warning below. How do I pass feature vectors without getting this warning?

import numpy as np
from numpy import random

from sklearn import neighbors
from sklearn.model_selection import train_test_split
import pandas as pd

filepath = 'test.csv'

# example label values
index = [0,1,3,1,1,1,0,0]

# example sample arrays
data = []
for i in range(len(index)):
    d = []
    for i in range(6):
        d.append(random.randint(50,200))
    data.append(d)

feat1 = 'brightness'
feat2, feat3, feat4 = ['h', 's', 'v']
feat5 = 'median hue'
feat6 = 'median value'

features = [feat1, feat2, feat3, feat4, feat5, feat6]

df = pd.DataFrame(data, columns=features, index=index)
df.index.name = 'state'

with open(filepath, 'a') as f:
    df.to_csv(f, header=f.tell() == 0)

states = pd.read_csv(filepath, usecols=['state'])

df_partial = pd.read_csv(filepath, usecols=features)

states = states.astype(np.float32)
states = states.values
labels = states

samples = np.array([])
for i, row in df_partial.iterrows():
    r = row.values
    samples = np.vstack((samples, r)) if samples.size else r

n_neighbors = 5

test_size = .3
labels, test_labels, samples, test_samples = train_test_split(labels, samples, test_size=test_size)
clf1 = neighbors.KNeighborsClassifier(n_neighbors, weights='distance')
clf1 = clf1.fit(samples, labels)

score1 = clf1.score(test_samples, test_labels)

print("Here's how the models performed \nknn: %d %%" %(score1 * 100))

Warning:

"DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). clf1 = clf1.fit(samples, labels)"

sklearn documentation for fit(self, X, Y)

2 Answers

Try replacing

states = states.values with states = states.values.flatten()

or

clf1 = clf1.fit(samples, labels) with clf1 = clf1.fit(samples, labels.flatten()).

states = states.values holds the correct labels from your pandas DataFrame, but as a 2-D column vector of shape (n_samples, 1), with one label per row. Calling .flatten() returns a 1-D array of shape (n_samples,), which is the shape fit() expects for y. (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.ndarray.flatten.html)

The example in scikit-learn's KNeighborsClassifier documentation (https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html) passes the labels as a flat, 1-D list: y = [0, 0, 1, 1].
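
To make the shape difference concrete, here is a minimal sketch. The single-column DataFrame is a made-up stand-in for the one read from test.csv, and the label values are illustrative only:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv(filepath, usecols=['state'])
states = pd.DataFrame({'state': [0, 1, 3, 1, 1, 1, 0, 0]})

column_vector = states.values       # 2-D array: one label per row
labels = states.values.flatten()    # 1-D array: the shape fit() expects for y

print(column_vector.shape)  # (8, 1)
print(labels.shape)         # (8,)
```

Printing .shape on your own labels array right before calling fit() is a quick way to check which of the two forms you are passing.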

2 Comments

Also, I'm really confused by the documentation. What does it say about the X input? Is it a 2-D array of n_samples x n_samples, or something else? Or is it literally just a list [n_samples, n_samples] like it says? I put a screenshot of the sklearn documentation in my question above.
You are very welcome! Regarding the X input: it is an array/matrix that holds points with n_features values each. In your case, each point has 6 features (brightness, h, s, v, median hue, and median value), so n_features = 6. Your X therefore holds 28 points with 6 features each, so its shape [n_samples, n_features] is [28, 6]. Try adding print(samples) right before clf1 = clf1.fit(samples, labels) in your code; it will help you visualize it.
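
As a rough illustration of the [n_samples, n_features] layout described in this comment, the sketch below builds random arrays of the same sizes (28 samples, 6 features). The values are random placeholders, not the asker's data:

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_features = 28, 6  # sizes taken from the comment above

# Each row is one sample; each column is one feature
X = rng.integers(50, 200, size=(n_samples, n_features))
# One label per sample, stored as a 1-D array
y = rng.choice([0, 1, 3], size=n_samples)

print(X.shape)     # (28, 6)
print(y.shape)     # (28,)
print(X[0].shape)  # (6,), i.e. one sample is a vector of 6 feature values
```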

When you retrieve the values from the states DataFrame, they come back as a column vector (one value per row, shape (n_samples, 1)), whereas fit() expects a 1-D array of shape (n_samples,).

You can also use ravel(), which returns a contiguous flattened array.

numpy.ravel(a, order='C'): returns a contiguous flattened array (a 1-D array containing all the elements of the input, with the same type).

Try:

states = states.values.ravel() in place of states = states.values
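
One way to check that the warning is gone is sketched below, assuming scikit-learn is installed. The data is random and made up for illustration; the warning from the question is turned into an error, so the fit would fail loudly if y still had the wrong shape:

```python
import warnings

import numpy as np
from sklearn.exceptions import DataConversionWarning
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.integers(50, 200, size=(20, 6))         # made-up samples, 6 features each
y_column = rng.choice([0, 1, 3], size=(20, 1))  # column vector, shape (20, 1)

clf = KNeighborsClassifier(n_neighbors=5, weights='distance')

with warnings.catch_warnings():
    # Promote the warning from the question to an error so it cannot go unnoticed
    warnings.simplefilter('error', DataConversionWarning)
    clf.fit(X, y_column.ravel())                # 1-D y, shape (20,)

print(y_column.ravel().shape)  # (20,)
```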

3 Comments

So ravel() and flatten() are the same thing essentially?
I did a little research on this. Although ravel() and flatten() both convert an ndarray to a 1-D array, they differ. ravel() returns a view of the original array whenever possible, so changes to the result show up in the original; it copies only when it has to. flatten() always returns a copy, so changes to the result do not affect the original. Because ravel() usually avoids copying the data, it is generally faster than flatten().
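
The view-versus-copy difference can be seen with a small sketch (the array values are arbitrary):

```python
import numpy as np

column = np.array([[0], [1], [3]])  # column vector, shape (3, 1)

raveled = column.ravel()      # a view here, because `column` is contiguous
flattened = column.flatten()  # always a copy

print(np.shares_memory(column, raveled))    # True
print(np.shares_memory(column, flattened))  # False

raveled[0] = 9    # writes through to `column`
flattened[1] = 7  # does not touch `column`
print(column.tolist())  # [[9], [1], [3]]
```

For passing labels to fit(), either function works, since the classifier only needs y to be 1-D.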
