I'm trying to follow this example:

Data in the following format is stored in a csv file:

  Date       Open    High    Low   Close   Adj. close  Volume 
23/01/2018    1.00    3.00    2.00   2.10     2.15       1000

This data is read using the following code:

self.symbol_data[s] = pd.io.parsers.read_csv(
    os.path.join(self.csv_dir, '%s.csv' % s),
    header=0, index_col=0, parse_dates=True,
    names=['datetime', 'open', 'high', 'low', 'close', 'volume', 'adj_close']).sort()

Just to check: the .sort() sorts the frame by the values in the first column, correct?

My problem is that I'm using a different version of Python (3.6 vs. his 2.x) and a different version of pandas (0.22.0 vs. an older one; I'm not sure which). I'm also trying to read data from a different source with a different format: there are some extra columns, and the column names are slightly different.

timestamp     open    high    low   close adjusted_close  Volume div_amt split
23/01/2018    1.00    3.00    2.00   2.10     2.15         1000     0      1 

self.symbol_data[s] = pd.read_csv(
    os.path.join(self.csv_dir, '%s.csv' % s),
    usecols=[0, 1, 2, 3, 4, 5, 6],
    header=0, index_col=0, parse_dates=True,
    names=['timestamp', 'open', 'high', 'low', 'close', 'adjusted_close',
           'volume']).sort_values(by=['timestamp'])

Will the pd.read_csv call above achieve what I want?

Is it possible to select the columns to be read by name?

Also, can I check that names=[] refers to the columns in the pandas DataFrame?
I think the pandas help is not clear on this: "names : List of column names to use."
Does it mean the columns of the csv file or of the pandas DataFrame, and to use for what?

Anyway, currently I'm having problems with the sorting part. Is sort_values(by='timestamp') equivalent to the sort() above?

Also I'm getting this error:

KeyError: 'timestamp'

Any suggestions on how to resolve this?

1 Answer

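You set the first column as the index with index_col=0, so timestamp is no longer a regular column and you need to change sort_values to sort_index.

That matches how .sort() was used in some very old pandas versions (below 0.17.0), before it was deprecated in favour of sort_values and sort_index; check the docs.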
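
As a minimal sketch of that fix, the snippet below applies sort_index to the call from the question. The three data rows are made up for illustration, and dayfirst=True is an assumption based on the dd/mm/yyyy dates shown in the question; neither comes from the original post.

```python
from io import StringIO

import pandas as pd

# Made-up rows in the question's second layout, deliberately out of date order.
csv_text = """timestamp,open,high,low,close,adjusted_close,volume,div_amt,split
23/01/2018,1.00,3.00,2.00,2.10,2.15,1000,0,1
22/01/2018,0.90,2.90,1.90,2.00,2.05,1200,0,1
24/01/2018,1.10,3.10,2.10,2.20,2.25,900,0,1
"""

df = pd.read_csv(
    StringIO(csv_text),             # replace with the path to the real csv file
    usecols=[0, 1, 2, 3, 4, 5, 6],  # skip the trailing div_amt and split columns
    header=0,                       # the first row is a header; names replaces it
    index_col=0,                    # timestamp becomes the index, not a column
    parse_dates=True,
    dayfirst=True,                  # assumption: dates are written dd/mm/yyyy
    names=["timestamp", "open", "high", "low", "close", "adjusted_close", "volume"],
).sort_index()                      # sort by the index instead of by a column

print(df)
```

Because timestamp is the index here, it does not appear in df.columns, which is why the original sort_values(by=['timestamp']) could not find it as a column.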

EDIT:

The csv has the header col1 and col2. If you want to replace the column names, use the names parameter together with header=0:

from io import StringIO
import pandas as pd

temp = u"""col1,col2
1,2
4,8"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), names=['a', 'b'], header=0)
print(df)
   a  b
0  1  2
1  4  8

If you omit header=0, the new column names are added on top and the original header row is kept as the first data row:

from io import StringIO
import pandas as pd

temp = u"""col1,col2
1,2
4,8"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), names=['a', 'b'])
print(df)
      a     b
0  col1  col2
1     1     2
2     4     8

But if the csv has no header and you use header=0, the first data row (1 and 2) is lost:

from io import StringIO
import pandas as pd

temp = u"""
1,2
4,8"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), names=['a', 'b'], header=0)
print(df)
   a  b
0  4  8

In that case the correct call uses only the names parameter:

from io import StringIO
import pandas as pd

temp = u"""
1,2
4,8"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), names=['a', 'b'])
print(df)
   a  b
0  1  2
1  4  8
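
On the question of selecting columns by name: usecols also accepts a list of column labels, so the file's own header names can be used instead of positions. The sketch below assumes the csv header matches the names shown in the question; the data rows are made up, and dayfirst=True is again an assumption about the date format.

```python
from io import StringIO

import pandas as pd

# Made-up rows in the question's second layout.
csv_text = """timestamp,open,high,low,close,adjusted_close,volume,div_amt,split
23/01/2018,1.00,3.00,2.00,2.10,2.15,1000,0,1
22/01/2018,0.90,2.90,1.90,2.00,2.05,1200,0,1
24/01/2018,1.10,3.10,2.10,2.20,2.25,900,0,1
"""

df = pd.read_csv(
    StringIO(csv_text),          # replace with the path to the real csv file
    usecols=["timestamp", "open", "close", "volume"],  # pick columns by header name
    index_col="timestamp",       # the index column can also be given by name
    parse_dates=True,
    dayfirst=True,               # assumption: dates are written dd/mm/yyyy
).sort_index()

print(df)
```

The selected columns keep the order they have in the file, regardless of the order in which they are listed in usecols.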

3 Comments

That answers the bit about the bug, for which I thank you, but the other question is: is it possible to choose the columns to be read by name?
And does names=[] set the column names of the pandas DataFrame that has been read in?
Do you mean the names parameter? Yes, but it is meant for a csv without a header. There are 2 possible ways: if you use header=0, it replaces the original column names (the first row in the csv), and if you omit it, it creates new column names without replacing anything.
