How to Create a table with data from array output in Python

Question

I printed out a composed array and saved it to a text file. It looks like this:

({
    ngram_a67e6f3205f0-n: 1,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8580469779197205)
({
    ngram_a67e6f3205f0-n: 2,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8880895806519427)
({
    ngram_a67e6f3205f0-n: 3,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8656452460818544)

I would like to extract the data into a pandas DataFrame that looks like this:

1, 10000, 0.8580469779197205
2, 10000, 0.8880895806519427

Yes, the content of the file is the result of cross-validation. I printed it out, then copied it to the file.
– Ivan Lee, Commented Oct 4, 2019 at 0:59

Massifox · Accepted Answer · 2019-10-04 04:20:39Z

My advice is to change the input format of your file, if possible. It would greatly simplify your life.
If this is not possible, the following code solves your problem:

import re
import pandas as pd

# Everything between "(" and the next ")": one match per tuple.
pattern_tuples = r'(?<=\()[^\)]*'
# A number (integer, decimal or scientific notation) preceded by a space or a comma.
pattern_numbers = r'[ ,](?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?'
col_name = ['ngram', 'logreg', 'vocabSize', 'score']

with open('test.txt', 'r') as f:
    matches = re.findall(pattern_tuples, f.read())

# Drop the leading comma from each match and convert it to float.
arr_data = [[float(val.replace(',', '')) for val in re.findall(pattern_numbers, match)]
            for match in matches]
df = pd.DataFrame(arr_data, columns=col_name).astype({'ngram': 'int', 'vocabSize': 'int'})

and gives:

   ngram  logreg  vocabSize     score
0      1    0.01      10000  0.858047
1      2    0.01      10000  0.888090
2      3    0.01      10000  0.865645

Brief explanation

Read the file.
Use re.findall with the regex pattern_tuples to find all the tuples in the file.
For each tuple, use the regex pattern_numbers to find the 4 numerical values of interest. This gives a list of lists containing your data.
Load the results into a pandas DataFrame.
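
The steps above can be traced on an in-memory sample. This is only a minimal sketch using the standard library: the `sample` string is copied from the question, and the variable names `tuples`, `raw_numbers` and `rows` are illustrative rather than part of the answer's code.

```python
import re

# Two tuples copied from the question, held in a string instead of a file.
sample = """({
    ngram_a67e6f3205f0-n: 1,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8580469779197205)
({
    ngram_a67e6f3205f0-n: 2,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8880895806519427)
"""

pattern_tuples = r'(?<=\()[^\)]*'
pattern_numbers = r'[ ,](?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?'

# One string per tuple: everything between "(" and the next ")".
tuples = re.findall(pattern_tuples, sample)

# Inside each tuple, keep the numbers that follow a space or a comma.
# The digits inside the keys are skipped because a letter or "_" precedes them.
raw_numbers = [re.findall(pattern_numbers, t) for t in tuples]

# Each match still carries its leading space or comma, so strip it first.
rows = [[float(n.strip(' ,')) for n in nums] for nums in raw_numbers]

print(raw_numbers[0])
print(rows)
```

Printing `raw_numbers[0]` makes it visible that each match keeps its leading separator, which is why the answer removes the comma before calling `float`.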

Extra

Here's how you could save your CV results in JSON format, so you can manage them more easily:

Create a cv_results list to hold the CV results.
Each CV loop gives you a tuple t with the results, which you transform into a dictionary and append to cv_results.
At the end of the CV loops, save the results in JSON format.


import json

cv_results = []
range_cv = range(3)  # placeholder: replace with your own CV loop

for _ in range_cv:  # CV loop
    # ... Calculate results of CV in t
    t = ({'ngram_a67e6f3205f0-n': 1,
          'logreg_c120232d9faa-regParam': 0.01,
          'cntVec_9c0e7831261d-vocabSize': 10000},
         0.8580469779197205)  # FAKE DATA for this example

    # Append the results as a dict
    cv_results.append({'res': t[0], 'score': t[1]})

# Store the results in JSON format
with open('cv_results.json', 'w') as outfile:
    json.dump(cv_results, outfile, indent=4)

Now you can read the json file and you can access all the fields like a normal python dictionary:

with open('cv_results.json') as json_file:
    data = json.load(json_file)

data[0]['score']
# output: 0.8580469779197205
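
To get from these records to the table the question asks for, the nested `res` dictionary of each record can be flattened into a single row. A minimal sketch, assuming the record layout used above (`res` and `score` keys); the helper name `flatten` and the rule for shortening column names (keep the text after the last `-`) are illustrative choices, not part of the original answer.

```python
# Records in the same shape as the cv_results list saved above.
data = [
    {'res': {'ngram_a67e6f3205f0-n': 1,
             'logreg_c120232d9faa-regParam': 0.01,
             'cntVec_9c0e7831261d-vocabSize': 10000},
     'score': 0.8580469779197205},
    {'res': {'ngram_a67e6f3205f0-n': 2,
             'logreg_c120232d9faa-regParam': 0.01,
             'cntVec_9c0e7831261d-vocabSize': 10000},
     'score': 0.8880895806519427},
]


def flatten(record):
    """Merge the parameters and the score of one record into a flat dict."""
    # Shorten 'ngram_a67e6f3205f0-n' to 'n' by keeping the text after the last '-'.
    row = {key.rsplit('-', 1)[-1]: value for key, value in record['res'].items()}
    row['score'] = record['score']
    return row


rows = [flatten(record) for record in data]
print(rows[0])
```

A list of flat dictionaries like `rows` can be passed to `pd.DataFrame(rows)`, which uses the dictionary keys as column names.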

If you want to switch to JSON, I have updated the answer with some advice. Good luck :) @IvanLee

Bugbeeb · Answer · 2019-10-04 02:16:34Z


Why not do:

import pandas as pd
with open('file.txt') as file:
    df = pd.DataFrame([i for i in eval(file.readline())])

eval takes a string and evaluates it as a Python expression, which is pretty nifty. That would convert each parenthesized group into a tuple, whose items are then stored in a list. The pd.DataFrame class can take a list of dictionaries with identical keys and create a DataFrame.
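
On the text shown in the question, `eval` cannot be applied directly: each tuple spans several lines and the dictionary keys are unquoted, so a single line is not a valid Python expression. One possible workaround, sketched here under the assumption that every key has the form `name-param` followed by a colon, is to quote the keys with a regular expression and then parse each tuple with `ast.literal_eval`, which accepts only literals and does not run arbitrary code. The names `key_pattern` and `parse_results` are hypothetical and not part of the original answer.

```python
import ast
import re

# Text in the format shown in the question (two tuples for brevity).
text = """({
    ngram_a67e6f3205f0-n: 1,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8580469779197205)
({
    ngram_a67e6f3205f0-n: 2,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8880895806519427)
"""

# Assumed key shape: identifier, "-", identifier, directly followed by ":".
key_pattern = re.compile(r'([A-Za-z_]\w*-\w+)(?=\s*:)')


def parse_results(raw):
    """Return a list of (params_dict, score) tuples parsed from the raw text."""
    # Wrap every key in quotes so that each tuple becomes a valid literal.
    quoted = key_pattern.sub(r"'\1'", raw)
    # Each tuple starts at "(" and ends at the first ")" that follows it.
    chunks = re.findall(r'\(.*?\)', quoted, flags=re.DOTALL)
    return [ast.literal_eval(chunk) for chunk in chunks]


results = parse_results(text)
# One flat dict per tuple: the parameters plus the score.
rows = [{**params, 'score': score} for params, score in results]
print(rows[0])
```

As with any list of dictionaries sharing the same keys, `rows` can then be handed to `pd.DataFrame(rows)`.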

edited Oct 4, 2019 at 2:16

answered Oct 4, 2019 at 1:07


1 Comment

razdi Over a year ago

Have you tried this on the given text? I don't think it works
