I am trying to load a dataset of several thousand rows and 4 columns, where each column is separated by a tab, and turn every item of every row into an int.
When I create the DataFrame like this:
my_data = pd.read_csv('filename', sep='\t')
I get an output where each row looks like this:
col1\tcol2\tcol3\tcol4
I then need to transform this into a NumPy array, so I do this:
arr_data = np.array(my_data)
This is my output now:
array([['col1\tcol2\tcol3\tcol4'],
['col1\tcol2\tcol3\tcol4'],
['col1\tcol2\tcol3\tcol4'],
.....
.....
So basically each row is now a single string.
What I'd like to do is turn everything into an int instead of a string, but when I try to do this:
arr_data = np.array(my_data, dtype=int)
I get a ValueError.
Do I need to write a nested for loop that goes through every row, and then every column in every row, to turn each item into an int?
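(As an aside, once the frame really has four columns, no loop should be needed; pandas can cast a whole frame in one go with astype. A toy frame just to illustrate the idea, not the real file:)

import pandas as pd

toy = pd.DataFrame({'a': ['1', '2'], 'b': ['5', '3']})  # stand-in string data
arr = toy.astype(int).to_numpy()  # vectorized cast, no per-element loop
print(arr.dtype)  # int64 on most platforms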
Edit:
I've also just noticed that when I create the DataFrame, the data has shape (rows, 1) instead of (rows, 4), which I guess means the delimiter didn't work? (See the quick check after the sample below.)
Here are the first few rows:
1 1 5 874965758
1 2 3 876893171
1 3 4 878542960
1 4 3 876893119
1 5 3 889751712
1 7 4 875071561
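(One quick way to see what the delimiter in the file really is, with 'filename' as the same placeholder as above; a real tab prints as \t, while runs of spaces mean sep='\t' has nothing to split on:)

with open('filename') as f:
    print(repr(f.readline()))  # inspect the raw first line and its separators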
Thanks.
Update: using sep='\s+' instead of sep='\t' helps.
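For anyone hitting the same thing, here is the whole fix as a short sketch; 'filename' is still a placeholder, and header=None is an assumption based on the sample above (the file has no header row):

import pandas as pd

# split on any run of whitespace instead of a literal tab
my_data = pd.read_csv('filename', sep=r'\s+', header=None)

# every column is numeric now, so one vectorized cast is enough
arr_data = my_data.to_numpy(dtype=int)
print(arr_data.shape)  # (rows, 4)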