4

I have a CSV file with thousands of entries that need to be broken up into groups. In the example below, I need the rows grouped by River Name so that I can later reformat the information for each group.

River Name, Branch, Length
Catnip, 1, 2145.30
Peterson, 2, 24.5
Catnip, 3, 15.4
Fergerson, 1, 5.2
Catnip, 1, 88.56
Peterson, 2, 6.45

The only way I can think of grouping the information would be to:

  1. Use Python to read the CSV and create a list of just the unique river names.
  2. Create a new CSV file for each unique river name, e.g. Peterson.csv, Catnip.csv.
  3. Use Python to read the original CSV and, depending on the river name in the row being read, write that row to the corresponding .csv file, e.g. the row Catnip, 1, 2145.30 would be written to Catnip.csv.

I don't think this is an efficient way to go about this, as it gives me about 1500 CSV files that will need to be opened and written to, but I am at the limit of my Python knowledge. If anyone could provide a better methodology, it would be greatly appreciated.

5 Answers

6

You can also simply use the csv module and save the results to a dictionary. I enumerate the reader to skip the first row (I'm sure there must be an easier way...). Each row is then unpacked into river, branch and length. If the river is not yet in the dictionary, it is initialized with an empty list, and the (branch, length) tuple is then appended to that list.

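import csv

rivers = {}
# newline='' is what the csv module expects; the old 'rU' mode no longer exists in Python 3.11+.
with open('rivers.csv', newline='') as f:
    # skipinitialspace drops the blank that follows each comma in the sample data.
    reader = csv.reader(f, delimiter=',', skipinitialspace=True)
    for n, row in enumerate(reader):
        if not n:
            # Skip header row (n = 0).
            continue
        river, branch, length = row
        if river not in rivers:
            rivers[river] = []
        rivers[river].append((branch, length))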

>>> rivers
{'Catnip': [('1', '2145.30'), ('3', '15.4'), ('1', '88.56')],
 'Fergerson': [('1', '5.2')],
 'Peterson': [('2', '24.5'), ('2', '6.45')]}
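
On the "easier way" to skip the header: one option is to call next() on the reader once before the loop, and dict.setdefault can replace the explicit membership test. The following is a minimal sketch rather than the answer's original code; the SAMPLE string and the group_rivers name are made up for illustration, and skipinitialspace=True is assumed to be wanted because the sample data has a space after each comma.

```python
import csv
import io

# Hypothetical in-memory copy of the question's sample data.
SAMPLE = """River Name, Branch, Length
Catnip, 1, 2145.30
Peterson, 2, 24.5
Catnip, 3, 15.4
Fergerson, 1, 5.2
Catnip, 1, 88.56
Peterson, 2, 6.45
"""


def group_rivers(lines):
    """Group (branch, length) string pairs by river name."""
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # Consume the header row; None avoids an error on empty input.
    rivers = {}
    for river, branch, length in reader:
        # setdefault creates the list the first time a river is seen.
        rivers.setdefault(river, []).append((branch, length))
    return rivers


rivers = group_rivers(io.StringIO(SAMPLE))
```

With a real file, the same function could presumably be called on a file object opened with open('rivers.csv', newline='') inside a with block.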

1 Comment

Why do I get ValueError: too many values to unpack (expected 2) on the line river, branch, length = row?
3

You can use the pandas library. Read your CSV file with a comma delimiter:

import pandas as pd
df = pd.read_csv('yourfile.csv', sep=',')

df is a pandas DataFrame, which is used to manipulate the imported CSV data.

pandas automatically splits your CSV file into rows and columns. You can simply use df['River Name'] to access the River Name column.
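
To go from the DataFrame to per-river groups, groupby is one option. This is a minimal sketch, assuming the question's sample data (held here in a made-up SAMPLE string instead of a file) and assuming skipinitialspace=True is wanted to drop the space after each comma:

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the question's sample data.
SAMPLE = """River Name, Branch, Length
Catnip, 1, 2145.30
Peterson, 2, 24.5
Catnip, 3, 15.4
Fergerson, 1, 5.2
Catnip, 1, 88.56
Peterson, 2, 6.45
"""

df = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True)

# One sub-DataFrame per river, keyed by river name.
groups = {name: group for name, group in df.groupby("River Name")}

# Per-river aggregate, e.g. the total length of each river's branches.
total_length = df.groupby("River Name")["Length"].sum()
```

Here groups["Catnip"] holds the three Catnip rows, so no intermediate per-river files are needed before reformatting.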

1

The Python pandas library can handle CSV datasets. I haven't used it for this myself, but it would be a good idea to check pandas first.

http://pandas.pydata.org/pandas-docs/stable/

1

A collections.defaultdict will do the trick:

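from collections import defaultdict, namedtuple
import csv

# A set drops exact duplicate rows and does not keep file order;
# use defaultdict(list) with .append() if either matters.
branches = defaultdict(set)
Branch = namedtuple('Branch', 'branch length'.split())

with open('rivers.csv', newline='') as fin:
    # skipinitialspace strips the blank after each comma, so the
    # header keys are 'Branch' and 'Length' rather than ' Branch'.
    reader = csv.DictReader(fin, skipinitialspace=True)
    for row in reader:
        branch = Branch(row['Branch'], row['Length'])
        branches[row['River Name']].add(branch)

# Write one CSV file per river.
for river in branches:
    with open(river + '.csv', 'w', newline='') as fout:
        writer = csv.DictWriter(fout, ['Branch', 'Length'])
        writer.writeheader()
        for branch in branches[river]:
            writer.writerow({'Branch': branch.branch,
                             'Length': branch.length})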
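
If the per-river files are not needed, the same defaultdict idea works purely in memory. The sketch below is an illustration rather than part of the answer: it uses defaultdict(list) so repeated rows and file order are kept, converts the fields to int and float (an assumption about how the data will be used), and reads from a made-up SAMPLE string instead of rivers.csv.

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory copy of the question's sample data.
SAMPLE = """River Name, Branch, Length
Catnip, 1, 2145.30
Peterson, 2, 24.5
Catnip, 3, 15.4
Fergerson, 1, 5.2
Catnip, 1, 88.56
Peterson, 2, 6.45
"""

branches = defaultdict(list)
reader = csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True)
for row in reader:
    # Missing keys get an empty list automatically, so no membership test is needed.
    branches[row["River Name"]].append((int(row["Branch"]), float(row["Length"])))

# Example of reformatting by group: total length per river.
totals = {river: sum(length for _, length in rows)
          for river, rows in branches.items()}
```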

1

Use a generator:

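FILENAME = "river.csv"

with open(FILENAME) as fd:
    next(fd)  # Skip the header row so no "River Name.csv" file is created.
    # Generator expressions read and split one line at a time (blank lines are skipped).
    lines = (l for l in fd if l.strip())
    detail = (d.split(',') for d in lines)
    for river_name, branch, length in detail:
        river_name, branch, length = map(str.strip, [river_name, branch, length])
        # Append mode adds to any existing file, so delete old output before re-running.
        with open(river_name.title() + ".csv", "a") as rd:
            rd.write("{0}, {1}\n".format(branch, length))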
