1

I have a CSV file with 4 columns and would like to create a Python list of arrays, with each CSV row being an array.

I am able to get each row as an array, but the problem is that the contents of each array begin and end with quotes, so the whole row comes back as a single string.

CSV data format:

User Link,Reputation,DisplayName,Location   
353410,"47245","John Doe","Uruguay" 
927034,"46782","Jane Doe","Bahia Blanca, Argentina" 

This is one of the snippets I tried:

import csv

with open('Query_SO_Arg.csv', 'rb') as csvfile:
    so = csv.reader(csvfile, delimiter=',', quotechar='"')
    so_data = []
    so.next()
    for row in so:
        so_data.append(row)
    print so_data

This is the output I am getting:

[['353410,"47245","John Doe","Uruguay";'], ['927034,"46782","Jane Doe","Bahia Blanca, Argentina";'], ['62024,"41775","Jim Doe","Buenos Aires, Argentina";'], 

How can I build this structure without the outer quotes, so I can work with the data?

Thanks!

EDIT:

This is the data of a brand new csv file (with the same structure as the original one):

User Link,Reputation,DisplayName,Location
60000,"40000","Diego K","Buenos Aires, Argentina"
240000,"37000","Claudio R","Buenos Aires, Argentina"

This is the output I am getting (with the same old quote problem):

[['60000,"40000","Diego K", "Buenos Aires, Argentina"'], ['240000,"37000","Claudio R","Buenos Aires, Argentina"']]

EDIT 2: If I use the following code:

so = csv.reader(csvfile, delimiter=',', quotechar='"')
for row in so:
    print ', '.join(row)

I get:

User Link, Reputation, DisplayName, Location
60000,"40000","Diego K","Buenos Aires, Argentina"
240000,"37000","Claudio R","Buenos Aires, Argentina"

The data seems to be OK, except that the rows are not lists. Does this give any clue as to why I cannot build the lists properly?

EDIT 3: Per @MartijnPieters' kind request, I am posting the following code:

print repr(open('So_fake_data_test.csv', 'rb').read())

which outputs:

'User Link,Reputation,DisplayName,Location\r\n"60000,""40000"",""Diego K"",""Buenos Aires, Argentina"""\r\n"240000,""37000"",""Claudio R"",""Buenos Aires, Argentina"""\r\n'

Thanks, @MartijnPieters.
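
The repr output above suggests why every row comes back as a one-element list: each data line is wrapped in one pair of outer quotes with the inner quotes doubled, so the reader sees a single quoted field. A minimal Python 3 sketch, assuming the file contents are exactly the string shown above (held in memory here with `io.StringIO` instead of being read from disk):

```python
import csv
import io

# Contents copied from the repr output in EDIT 3 (assumed to match the file).
raw = (
    'User Link,Reputation,DisplayName,Location\r\n'
    '"60000,""40000"",""Diego K"",""Buenos Aires, Argentina"""\r\n'
    '"240000,""37000"",""Claudio R"",""Buenos Aires, Argentina"""\r\n'
)

# newline='' keeps the \r\n endings untouched, as the csv docs recommend.
rows = list(csv.reader(io.StringIO(raw, newline='')))

for row in rows:
    # The header splits into 4 fields; each data row is a single field.
    print(len(row), row)
```

The header line has no outer quotes, so it splits normally, which matches the behaviour described in EDIT 2.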

EDIT 4:

CSV screenshot

I hope this helps. Thanks again.

  • That code shouldn't give you that output from that input. Commented Mar 11, 2015 at 15:32
  • I cannot reproduce your issue. I do note that your output includes semicolons, which your input doesn't have. Commented Mar 11, 2015 at 15:32
  • All you need to do is use list(csv.reader(csvfile)) to get your list of lists; the default dialect configuration is enough. Commented Mar 11, 2015 at 15:33
  • Thank you for the answers. I have the above-mentioned CSV file and applied the above-mentioned code. Is there anything that I could be missing? Commented Mar 11, 2015 at 15:37
  • @Diego: your input says John Doe is in Uruguay, but the output says Argentina. Your input doesn't have any semicolons, your output does. You may think that's the file you're running that code on, but I assure you you're not. Commented Mar 11, 2015 at 15:41

3 Answers

2

Finally I found a solution.

The mysterious problem is not related to the code or the data itself, but to the way Excel saves the original downloaded data.

This is what I was doing: I downloaded the CSV file with the original data, opened it in Excel, and saved it under a recognizable name.

This is the solution I found: download the CSV file, go to Windows Explorer, and rename the file there.

With this basic operation and the following code everything works fine:

import csv
with open('Query_SO_Arg.csv', 'rb') as csvfile:
    so = csv.reader(csvfile, delimiter=',', quotechar='"')
    so = list(so)

Thanks for all your input, especially to @MartijnPieters!
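
If only the Excel-saved copy is still available, one possible workaround (not part of the fix described above) is to parse each one-field row a second time. This is a Python 3 sketch under the assumption that the damaged rows look like the repr output in the question's EDIT 3; `unwrap_rows` is a hypothetical helper name.

```python
import csv
import io

def unwrap_rows(rows):
    """Re-parse rows that arrived as a single field holding a whole CSV line."""
    fixed = []
    for row in rows:
        if len(row) == 1:
            # The lone field is itself a CSV line, so parse it again.
            row = next(csv.reader([row[0]]))
        fixed.append(row)
    return fixed

# Assumed file contents, copied from the repr output in the question.
raw = (
    'User Link,Reputation,DisplayName,Location\r\n'
    '"60000,""40000"",""Diego K"",""Buenos Aires, Argentina"""\r\n'
    '"240000,""37000"",""Claudio R"",""Buenos Aires, Argentina"""\r\n'
)

rows = list(csv.reader(io.StringIO(raw, newline='')))
so_data = unwrap_rows(rows)[1:]  # drop the header row
print(so_data)
```

Re-downloading the file and renaming it without opening it in Excel, as described above, avoids the problem at the source and is likely the simpler route.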


1

This works for me (Python 3.4):

import csv
with open('Query_SO_Arg.csv', 'r', newline='') as csvfile:
    so = csv.reader(csvfile, delimiter=',', quotechar='"')
    so_data = []
    for row in so:
        so_data.append(row)

    print(so_data[1:])

The output is:

[['353410', '47245', 'John Doe', 'Uruguay '], ['927034', '46782', 'Jane Doe', 'Bahia Blanca, Argentina ']]
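
The trailing spaces in values such as 'Uruguay ' appear to come from the spaces that follow the closing quotes in the sample data. If they are unwanted, one option is to strip each field after parsing. A small Python 3 sketch, assuming the sample rows from the question and using an in-memory string in place of the file:

```python
import csv
import io

# Sample rows from the question, including the trailing spaces.
sample = (
    'User Link,Reputation,DisplayName,Location   \n'
    '353410,"47245","John Doe","Uruguay" \n'
    '927034,"46782","Jane Doe","Bahia Blanca, Argentina" \n'
)

reader = csv.reader(io.StringIO(sample, newline=''))
next(reader)  # skip the header row

# Strip leading and trailing whitespace from every field.
cleaned = [[field.strip() for field in row] for row in reader]
print(cleaned)
```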

3 Comments

Thanks @dm295. I still get the same output as before (with the quotes). I am using Python 2.7, although I do not think that is the issue here.
@diego - our outputs are not the same: your lists have only one item each, while my output lists have 4 items.
Thanks, what I meant is that when I run your code I get a different output from yours. I still get my unsatisfactory output (with the quotes). This happens with every solution proposed so far. I am not sure what the problem is.

0

Tested in Python 3.11.1

import numpy as np
    
# row = f.readline()  # e.g. read one line from an open file object f
row = "1.1,2.2,3.3,4.4,5.5\n"
row_arr = np.asarray([float(i) for i in row.replace('\n','').split(',')])
print(row_arr)
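
Note that splitting on commas works for purely numeric rows like the one above, but it would likely misbehave on the quoted data in the question, where a field such as "Bahia Blanca, Argentina" contains a comma. A short Python 3 comparison using one row from the question:

```python
import csv

line = '927034,"46782","Jane Doe","Bahia Blanca, Argentina"'

# Naive split: keeps the quote characters and breaks the location in two.
naive = line.split(',')

# csv.reader accepts any iterable of lines and honours the quoting.
parsed = next(csv.reader([line]))

print(len(naive), naive)
print(len(parsed), parsed)
```

The naive split yields 5 pieces with the quotes still attached, while `csv.reader` returns the 4 intended fields.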
