I'm trying to code this algorithm but I'm struggling with it, and step 6 is confusing me. My code so far is at the bottom.

  1. Set a positive value for K.
  2. Select K different rows from the data matrix at random.
  3. For each of the selected rows:
     a. Copy its values to a new list; let us call it c. Each element of c is a number.
     (At the end of step 3, you should have the lists c_1, c_2, …, c_K. Each of these should have the same number of columns as the data matrix.)
  4. For each row i in the data matrix:
     a. Calculate the Manhattan distance between data row D′_i and each of the lists c_1, c_2, …, c_K.
     b. Assign the row D′_i to the cluster of the nearest c. For instance, if the nearest c is c_3, then assign row i to cluster 3 (i.e. you should have a list whose ith entry is equal to 3; let's call this list S).
  5. If the previous step does not change S, stop.
  6. For each k = 1, 2, …, K:
     a. Update c_k. Each element j of c_k should be equal to the median of the column D′_j, but only taking into consideration those rows that have been assigned to cluster k.
  7. Go to Step 4.

Note that in the above, K is not the same thing as k: K is the number of clusters, while k is the index that runs from 1 to K.


# This is what I have so far:
def clustering(matrix,k):
    for i in k: 

I'm stuck on how to choose the rows randomly, and I also don't understand what steps 5 and 6 mean. Could someone explain?

  • Steps 5 and 6 are off-topic IMO. Have you done any research? All you need is to select some data randomly, no? Commented Dec 18, 2019 at 18:17
  • Also, this is probably a duplicate: stackoverflow.com/q/14262654/11301900 Commented Dec 18, 2019 at 18:18
  • @dee see my answer Commented Dec 18, 2019 at 19:25

1 Answer

You need np.random.choice.

Use this:

import numpy as np

# some data with 10 rows and 5 columns
X = np.random.rand(10, 5)

def clustering(X, k):
    # create k random row indices; replace=False makes them unique
    random_selector = np.random.choice(X.shape[0], size=k, replace=False)

    # select the k randomly chosen rows
    sampled_X = X[random_selector]  # sampled_X.shape == (k, 5) for the data above

    # ... (the remaining steps of the algorithm go here)

    return  # SOMETHING

Now you can continue working on your ?homework?
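
As for steps 5 and 6: step 5 just means "stop once the assignment list S is the same as in the previous pass", and step 6 means "move each c_k to the column-wise median of the rows currently assigned to cluster k". Below is a minimal sketch of how steps 2 to 7 could fit together. The function name `k_medians`, the `max_iter` safety cap, the `seed` parameter, and the rule of keeping the old centre when a cluster ends up empty are my own assumptions and are not part of the steps in the question. Cluster labels are 0-based here, so "cluster 3" in the text corresponds to index 2.

```python
import numpy as np

def k_medians(X, K, max_iter=100, seed=None):
    """Sketch of steps 2-7. max_iter and seed are assumptions, not in the original steps."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    # Steps 2-3: copy K distinct, randomly chosen rows as the initial c_1..c_K
    centres = X[rng.choice(X.shape[0], size=K, replace=False)].copy()

    S = None  # S[i] will hold the cluster index of row i
    for _ in range(max_iter):
        # Step 4a: Manhattan distance from every row to every centre, shape (n_rows, K)
        dist = np.abs(X[:, None, :] - centres[None, :, :]).sum(axis=2)
        # Step 4b: assign each row to its nearest centre
        new_S = dist.argmin(axis=1)

        # Step 5: stop if the assignments did not change
        if S is not None and np.array_equal(new_S, S):
            break
        S = new_S

        # Step 6: each centre becomes the column-wise median of its own rows
        for k in range(K):
            members = X[S == k]
            if len(members) > 0:  # assumption: an empty cluster keeps its old centre
                centres[k] = np.median(members, axis=0)
        # Step 7: loop back to step 4

    return S, centres
```

It could be called as `S, centres = k_medians(X, 3)`; `S` then has one cluster index per row of `X`, and `centres` has one row per cluster.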

9 Comments

Why not use random.randint()? (stackoverflow.com/q/14262654/11301900)
It's the same thing; I prefer choice.
You do realize that this way is just more complicated for no reason, right?
In this particular case, yes. It leads to the same desired result.
Sure, but just because two methods give the same result doesn’t mean that they are equally good. I see no benefit to using choice; it’s reinventing the wheel.
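
One note on the randint suggestion above: as far as I can tell the two are not quite interchangeable for step 2 of the question, which asks for K different rows. `np.random.randint` draws indices independently, so the same row can come up twice, while `np.random.choice(..., replace=False)` cannot repeat an index. A small sketch to check this; the sizes, the fixed seed, and the helper `has_duplicates` are arbitrary choices of mine for illustration:

```python
import numpy as np

np.random.seed(0)  # fixed seed only so the demo is repeatable
n_rows, k, trials = 10, 5, 1000

def has_duplicates(idx):
    # True if any index appears more than once
    return len(np.unique(idx)) < len(idx)

# count how many of the trials produced a repeated row index
randint_dupes = sum(
    has_duplicates(np.random.randint(0, n_rows, size=k)) for _ in range(trials)
)
choice_dupes = sum(
    has_duplicates(np.random.choice(n_rows, size=k, replace=False)) for _ in range(trials)
)

print("randint draws with a repeated row:", randint_dupes)
print("choice draws with a repeated row:", choice_dupes)
```

With randint you would have to redraw or deduplicate yourself to be sure of K different rows.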