NumPy CSV file groupby

Question

I'm reading a CSV file with two columns; the second column holds a label. I'd like to count how many times each label occurs in the file.

My solution involves a simple for loop and a dictionary object:

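import numpy as np

# input_file: path to the CSV file; read both columns as strings, skipping the header row
dataset = np.genfromtxt(input_file, invalid_raise=False, missing_values='N/A',
                        delimiter=",", dtype=str, skip_header=1)

X = dataset[:, 0]  # first column: the data
y = dataset[:, 1]  # second column: the label

# Count how often each label occurs
classes = dict()
for label in y:
    if label in classes:
        classes[label] += 1
    else:
        classes[label] = 1

print(classes)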

Example output:

{'Error Processing Payment': 1, 'General Question': 1, 'Display': 5, 'Software': 2}
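For comparison, the same "increment, or start at 1" loop can be written with the standard library's `collections.Counter`. This is a minimal sketch, not part of the original question: the `y` list below is hypothetical sample data chosen to reproduce the counts in the example output.

```python
from collections import Counter

# Hypothetical labels standing in for y = dataset[:, 1]
y = ['Display', 'Software', 'Display', 'General Question', 'Display',
     'Error Processing Payment', 'Software', 'Display', 'Display']

# Counter does the same per-label counting as the dict-based loop
classes = Counter(y)
print(dict(classes))
```

`Counter` accepts any iterable, so passing the NumPy array `y` directly should work as well.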

Is there a NumPy function, similar to a groupby, that provides the same functionality?
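One NumPy-native way to get these counts, not used in the answer below, is `np.unique` with `return_counts=True` (available since NumPy 1.9). The sketch below assumes the labels are already in a string array `y`; the sample values are hypothetical and chosen to match the example output.

```python
import numpy as np

# Hypothetical stand-in for y = dataset[:, 1]
y = np.array(['Display', 'Software', 'Display', 'General Question', 'Display',
              'Error Processing Payment', 'Software', 'Display', 'Display'])

# Sorted distinct labels plus the number of occurrences of each
labels, counts = np.unique(y, return_counts=True)

# Convert to plain Python str/int and pair them into the same dict shape
classes = dict(zip(labels.tolist(), counts.tolist()))
print(classes)
```

Because `np.unique` sorts its output, the keys come out in alphabetical order rather than in the order they first appear in the file.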

mommermi · Accepted Answer · 2016-07-01 23:22:22Z


You could use NumPy's boolean (fancy) indexing by reading your dataset into a structured array:

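dataset = np.genfromtxt(input_file, invalid_raise=False, missing_values='N/A', delimiter=",",
                        dtype=[('data', 'U50'), ('label', 'U50')], skip_header=1)

The fields use 'U50' (Unicode strings) rather than 'S50': on Python 3, 'S50' fields are read as bytes, so comparing them with a str literal such as 'Error Processing Payment' does not work as expected.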

Then the frequency of 'Error Processing Payment' is simply:

len(dataset[dataset['label'] == 'Error Processing Payment'])
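To show what the comparison produces, here is a small sketch with a hypothetical in-memory structured array in place of the CSV file. The comparison builds a boolean mask; indexing with it selects the matching rows, and `np.count_nonzero` (an alternative, not from the answer) counts them without building the subset.

```python
import numpy as np

# Hypothetical rows standing in for the parsed CSV
dataset = np.array(
    [('row1', 'Display'), ('row2', 'Software'), ('row3', 'Display'),
     ('row4', 'Error Processing Payment')],
    dtype=[('data', 'U50'), ('label', 'U50')],
)

# Boolean mask: True wherever the 'label' field matches
mask = dataset['label'] == 'Display'

n_by_len = len(dataset[mask])       # the answer's approach: length of the selected rows
n_by_count = np.count_nonzero(mask) # counts True entries directly
print(n_by_len, n_by_count)
```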

You can also get all available labels with:

set(dataset['label'])
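Combining the two snippets gives the full label-to-count dictionary from the question. This is a minimal sketch on a hypothetical in-memory structured array; note that it makes one pass over the labels per distinct label, so it costs roughly O(n·k) for n rows and k labels.

```python
import numpy as np

# Hypothetical rows standing in for the parsed CSV
dataset = np.array(
    [('row1', 'Display'), ('row2', 'Software'), ('row3', 'Display'),
     ('row4', 'Error Processing Payment')],
    dtype=[('data', 'U50'), ('label', 'U50')],
)

# For each distinct label, count the rows whose 'label' field matches it
classes = {str(label): int(np.count_nonzero(dataset['label'] == label))
           for label in set(dataset['label'])}
print(classes)
```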

