# necessary imports
from tabulate import tabulate
import pandas as pd

I have a dataframe:

df = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                   'B': ['B0', 'B1', 'B2', 'B3'],
                   'C': ['C0', 'C1', 'C2', 'C3'],
                   'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])

Using this, I pretty print it:

prettyprint=tabulate(df, headers='keys', tablefmt='psql')
print(prettyprint)

Result:

+----+-----+-----+-----+-----+
|    | A   | B   | C   | D   |
|----+-----+-----+-----+-----|
|  0 | A0  | B0  | C0  | D0  |
|  1 | A1  | B1  | C1  | D1  |
|  2 | A2  | B2  | C2  | D2  |
|  3 | A3  | B3  | C3  | D3  |
+----+-----+-----+-----+-----+

Saving it to a text file:

with open("PrettyPrintOutput.txt", "w") as text_file:
    text_file.write(prettyprint)

How can I read PrettyPrintOutput.txt back into a dataframe without doing a lot of text processing manually?

  • Maybe you can look into pickling it instead of writing to a text file? Commented Aug 23, 2020 at 13:31
  • Yeah, that's also good for general use. One of the main reasons I'm looking for a solution like the above is that I often see posts on SO with DataFrames given in a similar manner and find them hard to reproduce. Commented Aug 23, 2020 at 13:33
  • IMO, pretty-printed versions of DataFrames are a nuisance. (Better to just print them plainly without the decorators, or use the to_string() method so someone can reproduce them with StringIO.) For pretty-printed ones, I wind up copying them, removing the border lines and then find-and-replacing '|' with ''. Otherwise you end up with all kinds of whitespace issues on string columns/column headers. Sure, you can strip it, but it winds up being more code. Commented Aug 23, 2020 at 16:04
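As the last comment suggests, printing with DataFrame.to_string() instead of tabulate keeps the text trivially re-readable through io.StringIO. A minimal sketch, assuming cell values and column names contain no spaces (a whitespace separator would split them); the variable names text and restored are illustrative:

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                   'B': ['B0', 'B1', 'B2', 'B3'],
                   'C': ['C0', 'C1', 'C2', 'C3'],
                   'D': ['D0', 'D1', 'D2', 'D3']},
                  index=[0, 1, 2, 3])

# Plain, undecorated text: a header row, then one row per index label.
text = df.to_string()

# Split on runs of whitespace. The header row has one field fewer than the
# data rows, so pandas uses the first column as the index automatically.
restored = pd.read_csv(StringIO(text), sep=r'\s+')
```

This only works while no value contains whitespace; for string columns with spaces, the pipe-separated approach with stripping is more robust.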

1 Answer


One solution is to use clever keyword arguments in pd.read_csv / pd.read_clipboard:

    df = pd.read_csv(r'PrettyPrintOutput.txt', sep='|', comment='+', skiprows=[2], index_col=1)
    df = df[[col for col in df.columns if 'Unnamed' not in col]]

I just define all lines beginning with '+' as comments, so they don't get imported. That does not handle the third row (the |----+-----| separator, which starts with '|' and so is not a full comment line), which has to be excluded using skiprows.

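The second line is needed because using '|' as the separator also produces empty columns for the (empty) text before the first '|' and after the last one; pandas gives them 'Unnamed' column names, which the list comprehension filters out. If you know the column names in advance, use the usecols keyword to be explicit.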

Output:

       A      B      C      D   
                                
0      A0     B0     C0     D0  
1      A1     B1     C1     D1  
2      A2     B2     C2     D2  
3      A3     B3     C3     D3 

It also works with pd.read_clipboard, because the functions accept the same keyword arguments.


1 Comment

Object (string) columns and column headers are problematic with this approach: they keep tabulate's padding whitespace (e.g. ' A   ' instead of 'A'), so you'll need to strip all of them.
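A self-contained sketch of the answer's read_csv call followed by the stripping this comment asks for. It reads the question's table from an in-memory string via io.StringIO instead of PrettyPrintOutput.txt so it runs standalone; the name TABLE and the strip step are illustrative choices, not code from the answer or the commenter:

```python
from io import StringIO

import pandas as pd

# The psql-style table from the question, as tabulate wrote it to the file.
TABLE = """\
+----+-----+-----+-----+-----+
|    | A   | B   | C   | D   |
|----+-----+-----+-----+-----|
|  0 | A0  | B0  | C0  | D0  |
|  1 | A1  | B1  | C1  | D1  |
|  2 | A2  | B2  | C2  | D2  |
|  3 | A3  | B3  | C3  | D3  |
+----+-----+-----+-----+-----+
"""

# The answer's call: '+' lines are comments, raw line 2 is the |----+----| rule.
df = pd.read_csv(StringIO(TABLE), sep='|', comment='+', skiprows=[2], index_col=1)
df = df[[col for col in df.columns if 'Unnamed' not in col]]

# String cells keep tabulate's padding, so strip every string column.
df = df.apply(lambda s: s.str.strip() if pd.api.types.is_string_dtype(s) else s)

# Headers are padded too (e.g. ' A   '); the index header cell was blank padding.
df.columns = df.columns.str.strip()
df.index.name = None
```

Using apply returns a new DataFrame, which avoids assigning column-by-column into the filtered selection.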
