
So I have this in Matplotlib.

plt.scatter(X[: , 0:1][Y == 0], X[: , 2:3][Y==0])
plt.scatter(X[: , 0:1][Y == 1], X[: , 2:3][Y==1])
plt.scatter(X[: , 0:1][Y == 2], X[: , 2:3][Y==2])

I'd like to know if there's a better way to loop instead of this:

for i in range(3):
  plt.scatter(X[: , 0:1][Y == i], X[: , 2:3][Y==i])

MCVE:

import numpy as np
import matplotlib.pyplot as plt

# CSV: https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv
data = np.loadtxt('/content/drive/My Drive/Colab Notebooks/Machine Learning/iris.csv', skiprows=1, delimiter=',')

X = data[:, 0:4]
Y = data[:, 4:5]

# Scatter
for i in range(len(np.intersect1d(Y, Y))):
  plt.scatter(X[: , 0:1][Y == i], X[: , 3:4][Y==i])

# map(lambda i: plt.scatter(X[: , 0:1][Y == i], X[: , 2:3][Y==i]), range(3))

plt.title("Scatter Sepal Length / Petal Width ")
plt.legend(('Setosa', 'Versicolor', 'Virginica'))
plt.show()

  • This is already very compact. You could map a lambda function over the range(3) iterable to save a line, but that doesn't have any benefit. What are you trying to improve? I do not see an obvious error. Commented May 18, 2020 at 20:02
  • Our teacher told us we shouldn't use loops when we're using NumPy, so I assumed that Matplotlib might work like NumPy, and that there would magically be an argument to the method that could iterate over that increasing "y". How could I do that with map()? Commented May 18, 2020 at 20:39
  • Something like list(map(lambda i: plt.scatter(X[: , 0:1][Y == i], X[: , 2:3][Y==i]), range(3))) should work; the list(...) is needed in Python 3 because map is lazy and would otherwise never call plt.scatter. I have not tested this, though. (I like the for-loop more; it looks cleaner.) Commented May 18, 2020 at 21:07
  • Maybe you should ask specifically for a NumPy solution and add the numpy tag, if that's what you want :) Commented May 18, 2020 at 21:10
  • @Sharki, an MCVE means extracting a small piece of data that is representative of the actual problem. It does not mean copying and pasting the whole problem, data and all. It's an art form that many beginners struggle with, because it requires intuiting the minimum necessary to represent the actual problem, and beginners often have trouble identifying the problem in the first place. Commented May 20, 2020 at 1:15
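A quick illustration of the map() idea from the comments, assuming Python 3: map returns a lazy iterator, so nothing is called until the iterator is consumed. In this sketch, fake_scatter is a hypothetical stand-in for plt.scatter that only records its argument, so the example runs without plotting.

```python
calls = []

def fake_scatter(i):
    """Hypothetical stand-in for plt.scatter: just records which class it was called for."""
    calls.append(i)
    return i

lazy = map(lambda i: fake_scatter(i), range(3))
calls_before = len(calls)  # still 0: map has not called anything yet

results = list(lazy)       # consuming the iterator is what triggers the calls
calls_after = len(calls)

print(calls_before, calls_after)  # 0 3
```

Because plt.scatter is called only for its side effect, a plain for loop expresses that intent more directly than list(map(...)).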

2 Answers


Probably the simplest way to display your data is with a single plot containing multiple colors.

The key is to label the data more efficiently. You have the right idea with np.intersect1d(Y, Y), but though clever, it is not the best way to find the unique values. Instead, I recommend using np.unique. Not only does that remove the need to hard-code the argument to plt.legend, but its return_inverse argument also gives you an integer class code for every point, which you can pass directly as the color values.
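To illustrate the point about np.intersect1d(Y, Y): intersecting an array with itself just yields its sorted unique values, which np.unique computes directly. The toy Y below is made up, shaped like the question's N x 1 label column.

```python
import numpy as np

# Made-up numeric class labels, shaped like the question's Y (an N x 1 column)
Y = np.array([[0.0], [2.0], [1.0], [0.0], [2.0]])

# Intersecting an array with itself returns its sorted unique values...
via_intersect = np.intersect1d(Y, Y)
# ...which is exactly what np.unique computes (both flatten the input first)
via_unique = np.unique(Y)

print(via_intersect)  # [0. 1. 2.]
print(via_unique)     # [0. 1. 2.]
```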

A minor point is that you can index single columns with a single index, rather than a slice.
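A small sketch of the shape difference, on a made-up 4 x 3 array: a slice keeps the column as a 2-D array, while a single integer index drops that axis and returns a plain 1-D array.

```python
import numpy as np

X = np.arange(12.0).reshape(4, 3)  # made-up data: 4 samples, 3 features

col_slice = X[:, 0:1]  # slice: keeps a 2-D column, shape (4, 1)
col_index = X[:, 0]    # single index: drops the axis, shape (4,)

print(col_slice.shape, col_index.shape)  # (4, 1) (4,)
```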

For example,

import numpy as np
import matplotlib.pyplot as plt

X = np.loadtxt('iris.csv', skiprows=1, delimiter=',', usecols=[0, 1, 2, 3])
Y = np.loadtxt('iris.csv', skiprows=1, delimiter=',', usecols=[4], dtype=str)

labels, indices = np.unique(Y, return_inverse=True)
# c= (not color=) maps the integer class codes through the colormap
scatter = plt.scatter(X[:, 0], X[:, 2], c=indices)

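The array indices maps each element of Y to its position among the three unique values in labels, so labels[indices] gives the original array back. You can therefore pass indices as the color value for each point, and the colormap will assign one color per class.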
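A toy example of what return_inverse gives back, using a made-up string array in place of the iris species column: labels comes out sorted, and indexing it with indices rebuilds the original array.

```python
import numpy as np

Y = np.array(["Virginica", "Setosa", "Versicolor", "Setosa"])  # made-up labels
labels, indices = np.unique(Y, return_inverse=True)

print(labels)   # ['Setosa' 'Versicolor' 'Virginica']
print(indices)  # [2 0 1 0]

rebuilt = labels[indices]  # the inverse indices reconstruct Y element by element
```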

Constructing a legend for such a labeled dataset is something that Matplotlib fully supports out of the box, as I learned from the question "matplotlib add legend with multiple entries for a single scatter plot", which was inspired by this solution. The gist of it is that the PathCollection object that plt.scatter returns has a legend_elements method (available since Matplotlib 3.1) which does all the work for you:

plt.legend(scatter.legend_elements()[0], labels)

legend_elements returns a tuple with two items. The first is a list of legend handles, one for each distinct value in the data, which can be used as the first argument to legend. The second is a list of default text labels generated from the numeric values you supplied. We discard these in favor of our actual text labels.
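A self-contained sketch of the whole np.unique plus legend_elements approach, on five made-up points instead of the iris file. It assumes Matplotlib 3.1 or later and selects the non-interactive Agg backend so it runs without a display; in a notebook you would drop that line and call plt.show() at the end.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-ins for the species column and two feature columns
Y = np.array(["Setosa", "Versicolor", "Virginica", "Setosa", "Virginica"])
x = np.array([5.1, 6.4, 6.3, 4.9, 5.8])
y = np.array([1.4, 4.5, 6.0, 1.4, 5.1])

labels, indices = np.unique(Y, return_inverse=True)
scatter = plt.scatter(x, y, c=indices)  # one call, colored by integer class code

handles, default_texts = scatter.legend_elements()
# Keep the handles, but replace the numeric default texts with the real names
plt.legend(handles, labels)
```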


You can do a much better job with the indexing by splitting the data properly.

The indexing expression X[:, 0:1][Y == n] extracts a view of the first column of X, then applies the boolean mask Y == n to that view. Both steps can be done more concisely as a single step: X[Y == n, 0]. This is still a bit inefficient, though: each mask scans the whole of Y, and you build a new mask for every unique value in Y.
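A small check, on made-up data, that the two-step and one-step forms select the same values. The 2-D mask Y[:, None] mimics the question's N x 1 label column; the one-step form uses a 1-D Y.

```python
import numpy as np

X = np.arange(10.0).reshape(5, 2)  # made-up features: rows [0, 1], [2, 3], ...
Y = np.array([0, 1, 0, 2, 1])      # made-up 1-D class labels

n = 0
two_step = X[:, 0:1][Y[:, None] == n]  # question's style: 2-D column, 2-D mask
one_step = X[Y == n, 0]                # boolean row mask plus integer column

print(two_step, one_step)  # [0. 4.] [0. 4.]
```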

My other solution called for np.unique to group the labels. But np.unique works by sorting the array. We can do that ourselves:

import numpy as np
import matplotlib.pyplot as plt

X = np.loadtxt('iris.csv', skiprows=1, delimiter=',', usecols=[0, 1, 2, 3])
Y = np.loadtxt('iris.csv', skiprows=1, delimiter=',', usecols=[4], dtype=str)

ind = np.argsort(Y)
X = X[ind, :]
Y = Y[ind]

To find where Y changes, you can apply an operation like np.diff, but tailored to strings:

diffs = Y[:-1] != Y[1:]

The mask can be converted to split indices with np.flatnonzero:

inds = np.flatnonzero(diffs) + 1

And finally, you can split the data:

data = np.split(X, inds, axis=0)

For good measure, you can even convert the split data into a dictionary instead of a list:

labels = np.concatenate(([Y[0]], Y[inds]))
data = dict(zip(labels, data))
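The sort-and-split steps can be wrapped into one function, sketched here on made-up data. group_by_label is a hypothetical helper name, and kind="stable" is an addition not in the steps above: it keeps rows of the same class in their original order. The sketch assumes Y is non-empty.

```python
import numpy as np

def group_by_label(X, Y):
    """Hypothetical helper: sort rows by label, then split X where the label changes."""
    ind = np.argsort(Y, kind="stable")          # stable: same-class rows keep their order
    X, Y = X[ind, :], Y[ind]
    inds = np.flatnonzero(Y[:-1] != Y[1:]) + 1  # index of the first row of each new label
    labels = np.concatenate(([Y[0]], Y[inds]))  # one label per group
    return dict(zip(labels, np.split(X, inds, axis=0)))

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]])
Y = np.array(["b", "a", "b", "c", "a"])

groups = group_by_label(X, Y)
for label, group in groups.items():
    print(label, group[:, 0])
```

Each value in the returned dict is a 2-D block of rows from X, so group[:, 0] and group[:, 2] can be passed straight to plt.scatter.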

You can still plot with a loop, but it is much more efficient now: each iteration uses a precomputed group instead of building a new mask.

for label, group in data.items():
    plt.scatter(group[:, 0], group[:, 2], label=label)
plt.legend()  # picks up the label= passed to each scatter call
plt.show()
