How to read a pretty-printed dataframe into a Pandas dataframe?

Question

# necessary imports
from tabulate import tabulate
import pandas as pd

I have a dataframe:

df = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                   'B': ['B0', 'B1', 'B2', 'B3'],
                   'C': ['C0', 'C1', 'C2', 'C3'],
                   'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])

Using this, I pretty print it:

prettyprint=tabulate(df, headers='keys', tablefmt='psql')
print(prettyprint)

Result:

+----+-----+-----+-----+-----+
|    | A   | B   | C   | D   |
|----+-----+-----+-----+-----|
|  0 | A0  | B0  | C0  | D0  |
|  1 | A1  | B1  | C1  | D1  |
|  2 | A2  | B2  | C2  | D2  |
|  3 | A3  | B3  | C3  | D3  |
+----+-----+-----+-----+-----+

Saving it to a text file:

with open("PrettyPrintOutput.txt","w") as text_file:
    text_file.write(prettyprint)

How can I read PrettyPrintOutput.txt back into a dataframe without doing a lot of text processing manually?

Maybe you can look into pickling it instead of writing to a text file?
– user32882, Commented Aug 23, 2020 at 13:31
Yeah, that's also good for general use. One of the main reasons I'm looking for a solution along these lines is that I often see posts on SO with dataframes given in a similar manner and find it hard to reproduce them.
– zabop, Commented Aug 23, 2020 at 13:33
IMO, the pretty-printed versions of DataFrames are a nuisance (better to just plain print without the decorators, or use the to_string() method so someone can reproduce with StringIO). For pretty print, I wind up copying them, removing the border lines, and then finding and replacing '|' with ''. Otherwise you end up with all kinds of whitespace issues on string columns and column headers. Sure, you can strip it, but it winds up being more code.
– ALollz, Commented Aug 23, 2020 at 16:04

above_c_level · Accepted Answer · 2020-08-23 15:25:44Z

One solution is to use clever keyword arguments in pd.read_csv / pd.read_clipboard:

    df = pd.read_csv(r'PrettyPrintOutput.txt', sep='|', comment='+', skiprows=[2], index_col=1)
    df = df[[col for col in df.columns if 'Unnamed' not in col]]

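I define all lines beginning with '+' as comments (comment='+'), so the top and bottom border lines don't get imported. This does not catch the third line of the file (the |----+-----| separator), because it begins with '|', so it has to be excluded using skiprows=[2].

The second line is needed because, with '|' as the separator, the leading and trailing '|' on each row produce additional empty columns, which pandas names 'Unnamed: ...'. If you know the column names in advance, use the keyword usecols to be explicit.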

Output:

       A      B      C      D   
                                
0      A0     B0     C0     D0  
1      A1     B1     C1     D1  
2      A2     B2     C2     D2  
3      A3     B3     C3     D3

It also works with pd.read_clipboard, because the functions accept the same keyword arguments.
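
Note that the cell padding survives the import: the parsed headers and string values still carry the spaces from the table (for example ' A   ' rather than 'A'). A minimal sketch of one way to strip them afterwards, assuming the psql layout from the question; the helper name read_psql_table is made up for illustration, and the table is inlined through StringIO only so the example runs without the text file:

```python
from io import StringIO

import pandas as pd

# The psql-style table from the question, inlined so the sketch is self-contained.
TABLE_TEXT = """\
+----+-----+-----+-----+-----+
|    | A   | B   | C   | D   |
|----+-----+-----+-----+-----|
|  0 | A0  | B0  | C0  | D0  |
|  1 | A1  | B1  | C1  | D1  |
|  2 | A2  | B2  | C2  | D2  |
|  3 | A3  | B3  | C3  | D3  |
+----+-----+-----+-----+-----+
"""


def read_psql_table(source):
    """Hypothetical helper: parse a psql-style table and strip the padding."""
    # Same call as in the answer: '+' border lines are treated as comments,
    # and file line 2 (the |----+----| separator) is skipped explicitly.
    df = pd.read_csv(source, sep='|', comment='+', skiprows=[2], index_col=1)
    # Drop the empty columns created by the leading and trailing '|'.
    df = df[[col for col in df.columns if 'Unnamed' not in col]]
    # Strip the padding from the column headers.
    df.columns = df.columns.str.strip()
    # The index name is the blank header cell; reduce it to None if it is empty.
    if isinstance(df.index.name, str):
        df.index.name = df.index.name.strip() or None
    # Strip the padding from every string cell, leaving other values untouched.
    return df.apply(
        lambda col: col.map(lambda v: v.strip() if isinstance(v, str) else v)
    )


clean = read_psql_table(StringIO(TABLE_TEXT))
print(clean)
```

Passing the file name ('PrettyPrintOutput.txt') instead of the StringIO object should work the same way, since both are handed straight to pd.read_csv.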

1 Comment

ALollz Over a year ago

object columns and column headers are problematic with this approach. You'll need to strip all of them.
