Pandas Regex: Read specific columns only from csv with regex patterns

Question

Given a large CSV file (large enough to exceed RAM), I want to read only the columns whose names follow certain patterns. The columns can be any of the following: S_0, S_1, ..., D_1, D_2, etc. For example, a chunk from the data frame looks like this:

And the regex pattern would be, for example, any column that starts with S: S_\d.*.

Now, how do I apply this with pd.read_csv(/path/, __) to read the specific columns as mentioned?

Ynjxsjmh · Accepted Answer · 2022-05-31 15:14:02Z

2

You can first read a few rows and use DataFrame.filter to get the matching columns:

import pandas as pd

cols = pd.read_csv('path', nrows=10).filter(regex=r'S_\d*').columns
df = pd.read_csv('path', usecols=cols)
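A small self-contained sketch of the same idea, using an in-memory CSV in place of the real file (the sample data and the name csv_text are made up for illustration). It reads only the header with nrows=0 instead of ten rows, and it anchors the pattern with ^ and $ because DataFrame.filter applies the regex with re.search, so an unanchored S_\d* would also keep a column such as XS_2.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the large CSV file.
csv_text = "S_0,S_1,D_1,XS_2\n1,2,3,4\n5,6,7,8\n"

# nrows=0 parses only the header line, so no data rows are loaded.
header = pd.read_csv(io.StringIO(csv_text), nrows=0)

# filter(regex=...) uses re.search, so anchor the pattern explicitly.
cols = list(header.filter(regex=r"^S_\d+$").columns)

# Second pass: load only the selected columns.
df = pd.read_csv(io.StringIO(csv_text), usecols=cols)
print(list(df.columns))
```

Whether to anchor the pattern depends on how the columns are actually named; if no other column contains S_ followed by digits, the original pattern selects the same set.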

edited May 31, 2022 at 15:14

answered May 31, 2022 at 15:07


George Pipis · Accepted Answer · 2023-06-20 15:43:17Z

1

You can work with the usecols parameter as follows:

import re
import pandas as pd

pattern = r'S_\d+'


df = pd.read_csv('path/your_file.csv', usecols=lambda col: bool(re.match(pattern, col)))
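Two details may matter here. re.match only anchors the pattern at the start of the name, so S_\d+ would also accept a column like S_1_extra; re.fullmatch is stricter. And since the file in the question does not fit in RAM, the callable can be combined with chunksize so that only a few rows are held at a time. The following is a sketch under those assumptions; the sample data, the helper name wanted and the chunk size are hypothetical.

```python
import io
import re
import pandas as pd

# Hypothetical sample data standing in for 'path/your_file.csv'.
csv_text = "S_0,S_1_extra,D_1,S_2\n1,2,3,4\n5,6,7,8\n9,10,11,12\n"

pattern = re.compile(r"S_\d+")

def wanted(col):
    # fullmatch requires the whole column name to match the pattern.
    return pattern.fullmatch(col) is not None

# chunksize makes read_csv return an iterator of DataFrames,
# so at most `chunksize` rows are parsed into memory at a time.
total_rows = 0
seen_columns = None
for chunk in pd.read_csv(io.StringIO(csv_text), usecols=wanted, chunksize=2):
    seen_columns = list(chunk.columns)
    total_rows += len(chunk)

print(seen_columns, total_rows)
```

Each chunk carries only the matching columns, so per-chunk processing (aggregating, writing out, and so on) never touches the unwanted ones.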

answered Jun 20, 2023 at 15:43


Mrutyunjay Biswal · Accepted Answer · 2022-05-31 16:16:51Z

0

Took the same approach (as of now) as mentioned in the comments. Here is the detailed piece I used:

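import re

def extract_col_names(all_cols, pattern):
    """Return the names in all_cols that match pattern."""
    result = []

    for col in all_cols:
        # re.match anchors the pattern at the start of the name
        if re.match(pattern, col):
            result.append(col)

    return result

# cols: the full list of column names, obtained beforehand
extract_col_names(cols, pattern=r"S_\d+")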

And it works! But setting this workaround aside, suppose even loading the column names is too heavy in itself. Is there any way to apply a regex pattern at the time of reading the CSV? This still remains an open question.
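On the cost of getting the column names: only the first line of the file has to be read for that, whatever the file size. A minimal sketch using the standard csv module (the function name matching_columns and the sample data are made up for illustration):

```python
import csv
import io
import re

def matching_columns(file_obj, pattern):
    """Return header names that match `pattern`, reading one row only."""
    header = next(csv.reader(file_obj))
    return [col for col in header if re.match(pattern, col)]

# Hypothetical in-memory file standing in for the real CSV.
sample = io.StringIO("S_0,S_1,D_1,D_2\n1,2,3,4\n")
cols = matching_columns(sample, r"S_\d+")
print(cols)
```

The resulting list can then be passed to pd.read_csv through usecols.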

Thanks for the response :)

answered May 31, 2022 at 16:16

