clustering algorithm in python [duplicate]

Question

I'm trying to code this algorithm, but I'm struggling with it, and step 6 is confusing me. My code so far is at the bottom.

1. Set a positive value for K.
2. Select K different rows from the data matrix at random.
3. For each of the selected rows:
   a. Copy its values to a new list; let us call it c. Each element of c is a number.
   (At the end of step 3, you should have the lists c_1, c_2, …, c_K. Each of these should have the same number of columns as the data matrix.)
4. For each row i in the data matrix:
   a. Calculate the Manhattan distance between data row D'_i and each of the lists c_1, c_2, …, c_K.
   b. Assign the row D'_i to the cluster of the nearest c. For instance, if the nearest c is c_3, then assign row i to cluster 3 (i.e., you should have a list whose i-th entry is equal to 3; let's call this list S).
5. If the previous step does not change S, stop.
6. For each k = 1, 2, …, K:
   a. Update c_k. Each element j of c_k should be equal to the median of column D'_j, taking into consideration only those rows that have been assigned to cluster k.
7. Go to step 4.

Notice that in the above, K is not the same thing as k.
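The steps above amount to k-medians clustering: Manhattan distance for the assignment (step 4) and a column-wise median for the update (step 6). A minimal NumPy sketch of the whole loop follows, assuming the data matrix is a 2-D numeric array. The names `k_medians`, `max_iter` and `seed` are hypothetical, cluster labels are 0-based, and keeping the old center when a cluster ends up empty is an assumption, since the steps don't say what to do in that case.

```python
import numpy as np


def k_medians(D, K, max_iter=100, seed=None):
    """Sketch of steps 1-7: returns (S, centers).

    S[i] is the 0-based cluster of row i; centers[k] is the list c_k.
    max_iter is a hypothetical safety cap, not part of the original steps.
    """
    D = np.asarray(D, dtype=float)
    if not 1 <= K <= D.shape[0]:
        raise ValueError("K must be between 1 and the number of rows")
    rng = np.random.default_rng(seed)

    # Steps 2-3: pick K *different* rows at random and copy them into c_1..c_K
    idx = rng.choice(D.shape[0], size=K, replace=False)
    centers = D[idx].copy()

    S = None
    for _ in range(max_iter):
        # Step 4a: Manhattan distance from every row to every center, shape (n_rows, K)
        dist = np.abs(D[:, None, :] - centers[None, :, :]).sum(axis=2)
        # Step 4b: assign each row to its nearest center (ties go to the lower index)
        new_S = dist.argmin(axis=1)

        # Step 5: stop when the assignment did not change
        if S is not None and np.array_equal(new_S, S):
            break
        S = new_S

        # Step 6: c_k becomes the column-wise median of the rows in cluster k
        for k in range(K):
            members = D[S == k]
            if len(members) > 0:  # assumption: an empty cluster keeps its old center
                centers[k] = np.median(members, axis=0)
        # Step 7: go back to step 4 (next loop iteration)
    return S, centers
```

Add 1 to `S` if you want the question's 1-based cluster numbers. The distance matrix is computed with broadcasting instead of a Python loop over rows; a plain double loop over rows and centers would give the same result, just more slowly.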

# This is what I have so far:
def clustering(matrix, k):
    for i in range(k):
        pass  # loop body not written yet

I'm stuck on how to choose the rows randomly. I also don't understand what steps 5 and 6 mean; could someone explain?
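Steps 5 and 6 can be illustrated on a small, made-up example. Step 5 only compares the new S with the one from the previous pass; step 6 recomputes each c_k as the median of every column, using only the rows currently assigned to cluster k. The matrix `D` and the assignments `S_old`/`S_new` below are invented for illustration, with clusters numbered 1..K as in the question.

```python
import numpy as np

# Made-up data matrix D (5 rows, 3 columns)
D = np.array([[1, 10, 3],
              [2, 20, 4],
              [9, 90, 5],
              [3, 30, 6],
              [8, 80, 7]], dtype=float)
S_old = np.array([1, 1, 2, 2, 2])  # S from the previous pass of step 4
S_new = np.array([1, 1, 2, 1, 2])  # S from the current pass of step 4

# Step 5: converged only if no row changed cluster
converged = np.array_equal(S_old, S_new)  # False: the row at index 3 moved from 2 to 1

# Step 6: c_k[j] = median of column j over the rows in cluster k
K = 2
centers = {}
for k in range(1, K + 1):
    rows_in_k = D[S_new == k]  # boolean mask keeps only the rows of cluster k
    centers[k] = np.median(rows_in_k, axis=0)

print(converged)             # False
print(centers[1].tolist())   # [2.0, 20.0, 4.0]
print(centers[2].tolist())   # [8.5, 85.0, 6.0]
```

Since S changed, step 5 does not stop the algorithm; step 6 updates the centers, and step 7 sends you back to step 4 with them.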

Steps 5 and 6 are off-topic IMO. Have you done any research? All you need is to select some data randomly, no? – AMC, Commented Dec 18, 2019 at 18:17
Also, this is probably a duplicate: stackoverflow.com/q/14262654/11301900 – AMC, Commented Dec 18, 2019 at 18:18

seralouk · Accepted Answer · 2019-12-18 19:15:55Z


You need np.random.choice.

Use this:

import numpy as np

# some data with 10 rows and 5 columns
X=np.random.rand(10,5)

def clustering(X,k):
    # create the random indices (selector)
    random_selector = np.random.choice(range(X.shape[0]), size=k, replace=False) # replace=False to get unique samples

    # select the k randomly chosen rows
    sampled_X = X[random_selector]  # sampled_X.shape == (k, 5)

    # ... remaining steps of the algorithm go here

    return  # SOMETHING

Now you can continue working on your "homework".

edited Dec 18, 2019 at 19:15

answered Dec 18, 2019 at 15:35




AMC Over a year ago

Why not use random.randint()? (stackoverflow.com/q/14262654/11301900)

seralouk Over a year ago

it's the same thing -- I prefer choice

AMC Over a year ago

You do realize that this way is just more complicated for no reason, right?

seralouk Over a year ago

In this particular case, yes. It leads to the same desired result.

AMC Over a year ago

Sure, but just because two methods give the same result doesn’t mean that they are equally good. I see no benefit to using choice; it’s reinventing the wheel.
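One practical difference matters for step 2, which asks for K *different* rows: `random.randint` draws each index independently, so the same row can be picked twice, while `np.random.choice(..., replace=False)` (and the standard library's `random.sample`) always returns distinct indices. A small sketch, with `n_rows` and `K` chosen arbitrarily for illustration:

```python
import math
import random

import numpy as np

n_rows, K = 10, 5

# Independent draws: duplicates are possible
with_repeats = [random.randint(0, n_rows - 1) for _ in range(K)]

# Drawing without replacement: always K distinct row indices
distinct_np = np.random.choice(n_rows, size=K, replace=False)
distinct_std = random.sample(range(n_rows), K)

# Probability that K independent randint draws contain at least one repeat
p_repeat = 1 - math.perm(n_rows, K) / n_rows ** K  # about 0.70 here

print(len(set(distinct_np)), len(set(distinct_std)))  # 5 5
```

So the randint approach needs an extra check (or a retry loop) to satisfy step 2, which is what `replace=False` or `random.sample` gives you for free. `math.perm` requires Python 3.8 or later.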
