I'm trying to follow this example:
Data in the following format is stored in a csv file:
Date Open High Low Close Adj. close Volume
23/01/2018 1.00 3.00 2.00 2.10 2.15 1000
This data is read using the following code:
self.symbol_data[s] = pd.io.parsers.read_csv(
    os.path.join(self.csv_dir, '%s.csv' % s),
    header=0, index_col=0, parse_dates=True,
    names=['datetime', 'open', 'high', 'low', 'close', 'volume', 'adj_close']
).sort()
Just to check: .sort() sorts the frame by the values in the first column, correct?
My problem is that I'm using a different version of Python (3.6 vs. his 2.x) and a different version of pandas (0.22.0 vs. an older one, I'm not sure which). I'm also trying to read data from a different source with a different format: there are some extra columns, and the column names are slightly different.
timestamp open high low close adjusted_close Volume div_amt split
23/01/2018 1.00 3.00 2.00 2.10 2.15 1000 0 1
self.symbol_data[s] = pd.read_csv(
    os.path.join(self.csv_dir, '%s.csv' % s),
    usecols=[0, 1, 2, 3, 4, 5, 6],
    header=0, index_col=0, parse_dates=True,
    names=['timestamp', 'open', 'high', 'low', 'close', 'adjusted_close',
           'volume']
).sort_values(by=['timestamp'])
Will the pd.read_csv call above achieve what I want?
Is it possible to select the columns to be read by name?
Also, can I check that names=[] refers to the columns in the pandas DataFrame?
I don't think the help text is clear on this:
names : List of column names to use.
Which columns does this mean: those of the CSV file or those of the pandas DataFrame? And what are the names used for?
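My current understanding, which may be incomplete, is that names= supplies the labels for the columns of the resulting DataFrame, and that combining it with header=0 replaces whatever labels are in the file's first line. A small sketch with a made-up three-column file (the data and the labels 'datetime', 'o', 'h' are only for illustration):

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for a CSV file on disk.
csv_text = (
    "timestamp,open,high\n"
    "23/01/2018,1.00,3.00\n"
)

# header=0 says the first line of the file is a header row;
# passing names= as well discards those labels and uses ours instead.
df = pd.read_csv(io.StringIO(csv_text), header=0, names=["datetime", "o", "h"])

print(list(df.columns))  # ['datetime', 'o', 'h']
```

So the names in the file itself do not have to match the list passed to names=.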
Anyway, I'm currently having problems with the sorting part. Is sort_values(by='timestamp') equivalent to the sort() call above?
Also I'm getting this error:
KeyError: 'timestamp'
Any suggestions on how to resolve this?
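For reference, here is a self-contained sketch of what I'm trying to end up with. It rests on several assumptions: the file is comma-separated, the dates are day-first, and the second data row is invented so that there is something to sort. It selects columns by header name through usecols, and it sorts with sort_index() because index_col turns 'timestamp' into the index rather than an ordinary column, which I suspect is where the KeyError comes from:

```python
import io

import pandas as pd

# Hypothetical stand-in for '<symbol>.csv'; the first data row is made up.
csv_text = (
    "timestamp,open,high,low,close,adjusted_close,Volume,div_amt,split\n"
    "24/01/2018,2.10,3.10,2.00,2.50,2.55,1200,0,1\n"
    "23/01/2018,1.00,3.00,2.00,2.10,2.15,1000,0,1\n"
)

# Columns to keep, given by the header names found in the file.
wanted = ["timestamp", "open", "high", "low", "close", "adjusted_close", "Volume"]

df = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    usecols=wanted,          # select columns by name instead of position
    index_col="timestamp",   # 'timestamp' becomes the index, not a column
    parse_dates=True,        # parse the index as dates
    dayfirst=True,           # assumption: dates are DD/MM/YYYY
).rename(columns={"Volume": "volume"})

print(df.index.name)              # timestamp
print("timestamp" in df.columns)  # False

# Sort by the date index rather than by a column called 'timestamp'.
df_sorted = df.sort_index()
print(df_sorted)
```

With this layout 'timestamp' is no longer among df.columns, so sorting by the index seems to be the natural replacement for the old sort() call, though I would appreciate confirmation.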