Python pandas checking if row contains a string

Question

I am trying to write a program that matches found password hashes against a CSV file containing hashes and emails. I want to get the "Email" from ex.csv and the "Pass" from found.txt wherever the hash values coincide, but I am getting an error: ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

My code:

import pandas as pd
import numpy as np

ex = pd.read_csv("ex.csv",delimiter=",")
found = pd.read_csv("found.txt",delimiter=":")

temp = ex[["Hash","Email"]]
te = found[["Hash","Pass"]]

for index,row in te.iterrows(): #Looping through file
    if temp.loc[temp['Hash'] == row['Hash'][index]]: # If pandas can't locate Hash string inside a first file, list is empty. And I am comparing that here
        print(temp['Email'][index]) # If successful, print out the
        print(te['Pass'][index])    # found values in the console

Samples from ex.csv:

                                          Hash                    Email
0     210ac64b3c5a570e177b26bb8d1e3e93f72081fd  [email protected]
1     707a1b7b7d9a12112738bcef3acc22aa09e8c915  [email protected]
2     24529d87ea25b05daba92c2b7d219a470c3ff3a0  [email protected]

Samples from found.txt:

                                         Hash         Pass
0    f8fa3b3da3fc71e1eaf6c18e4afef626e1fc7fc1     pass1
1    ecdc5a7c21b2eb84dfe498657039a4296cbad3f4     pass2
2    f61946739c01cff69974093452057c90c3e0ba14     pass3

Or maybe there are better ways to iterate through rows and check if the row contains string from another file row? ;)

SahilDesai · Accepted Answer · 2020-05-21 16:08:54Z

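import pandas as pd

ex = pd.read_csv("c.csv", delimiter=",")
found = pd.read_csv("d.csv", delimiter=",")

print(ex)
print(found)

temp = ex[['Hash', 'Email']]
te = found[['Hash', 'Pass']]

# Walk both frames in lockstep. This only compares rows that sit at the
# same position and stops at the end of the shorter frame.
for (_, row_te), (_, row_temp) in zip(te.iterrows(), temp.iterrows()):
    # Compare the whole hash strings. Indexing the string with the row
    # label picks a single character and can raise IndexError.
    if row_temp['Hash'] == row_te['Hash']:
        print(row_temp['Email'])
        print(row_te['Pass'])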

I have stored the values like this:

1) c.csv

Hash,Email
210ac64b3c5a570e177b26bb8d1e3e93f72081fd,[email protected]
707a1b7b7d9a12112738bcef3acc22aa09e8c915,[email protected]
24529d87ea25b05daba92c2b7d219a470c3ff3a0,[email protected]

2) d.csv

Hash,Pass
f8fa3b3da3fc71e1eaf6c18e4afef626e1fc7fc1,pass1
ecdc5a7c21b2eb84dfe498657039a4296cbad3f4,pass2
f61946739c01cff69974093452057c90c3e0ba14,pass3
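
Note that the loop above pairs rows by position, so it assumes both files have the same length and the same row order. A minimal sketch of what that means, using made-up hashes and example.com addresses (none of this data is from the question):

```python
import pandas as pd

# Hypothetical data: the shared hash "h1" sits at different positions.
temp = pd.DataFrame({
    "Hash": ["h1", "h2", "h3"],
    "Email": ["one@example.com", "two@example.com", "three@example.com"],
})
te = pd.DataFrame({"Hash": ["h9", "h1"], "Pass": ["pass1", "pass2"]})

# zip() stops at the shorter frame and only pairs rows at equal positions.
pairs = list(zip(te["Hash"], temp["Hash"]))
positional_hits = [a for a, b in pairs if a == b]

# A membership test that ignores row position.
anywhere_hits = te.loc[te["Hash"].isin(temp["Hash"]), "Hash"].tolist()

print(pairs)
print(positional_hits)
print(anywhere_hits)
```

Here the positional comparison finds nothing, while the membership test still finds "h1". If the files differ in size or order, a lookup of this kind is probably what you want.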

edited May 21, 2020 at 16:08

answered May 21, 2020 at 15:24

SahilDesai


4 Comments

Mike2233 Over a year ago

Thank you for your answer. But with your code I am getting error - Traceback (most recent call last): File "a.py", line 14, in <module> if temp['Hash'].str == row['Hash'][index]: IndexError: string index out of range

SahilDesai Over a year ago

@Mike2233 I am considering size of both dataframe as same.

Mike2233 Over a year ago

Ohh. That's why. ex.csv file is bigger. But I am not getting any prints at all. It just jumps to index out of range

SahilDesai Over a year ago

@Mike2233 I have edited the code, but it looks like the answer by Valdi_Bo is better and does the job more easily.

Valdi_Bo · Accepted Answer · 2020-05-21 15:59:22Z

To print matches, use the following code:

for _, row in te.iterrows():
    rowHash = row.Hash
    matches = temp.Hash == rowHash  # boolean mask
    if matches.any():
        mails = temp[matches].Email.tolist()
        print(f'Found:  {rowHash} / {row.Pass} / {", ".join(mails)}')

Compare my code carefully with yours; that should help you locate what was wrong in your code.

You didn't say exactly where, but I suppose your error occurred in the if statement: temp.loc[...] returns a DataFrame, which has no single truth value, so my version tests a boolean mask with any() instead.
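
To see the error in isolation, here is a small sketch. The frame is a made-up miniature of temp (the hashes and example.com addresses are assumptions, not the asker's data):

```python
import pandas as pd

# Hypothetical miniature of the 'temp' frame from the question.
temp = pd.DataFrame({
    "Hash": ["aaa", "bbb", "ccc"],
    "Email": ["a@example.com", "b@example.com", "c@example.com"],
})

selected = temp.loc[temp["Hash"] == "bbb"]   # a DataFrame, not a bool

try:
    if selected:            # pandas refuses to guess a truth value here
        pass
    raised = False
except ValueError:
    raised = True

has_match = not selected.empty               # explicit emptiness check
mask_hit = (temp["Hash"] == "zzz").any()     # no row has this hash

print(raised, has_match, mask_hit)
```

Either `not selected.empty` or `.any()` on the boolean mask gives the `if` a plain True/False to work with.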

Edit

You can also try a different approach. Because it looks rows up by index, it should run considerably faster than the loop above.

# Set 'Hash' column as the index in both DataFrames
temp2 = temp.set_index('Hash')
te2 = te.set_index('Hash')
# Loop over rows in 'te2', index (Hash) in 'teHash'
for teHash, row in te2.iterrows():
    try:
        res = temp2.loc[teHash]  # Attempt to find corresponding row(s) in 'temp2'
        if isinstance(res, pd.Series):  # Single match found
            mails = res.Email
        else:                           # Multiple matches found
            mails = ', '.join(res.Email)
        print(f'Found: {teHash} / {row.Pass} / {mails}')
    except KeyError:
        pass      # Not found
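
If you don't need an explicit loop at all, a join on the Hash column is another option. This is a different technique from the two loops above, sketched with made-up stand-ins for temp and te (the hashes and example.com addresses are assumptions):

```python
import pandas as pd

# Hypothetical stand-ins for 'temp' (Hash, Email) and 'te' (Hash, Pass).
temp = pd.DataFrame({
    "Hash": ["h1", "h2", "h3"],
    "Email": ["one@example.com", "two@example.com", "three@example.com"],
})
te = pd.DataFrame({
    "Hash": ["h3", "h9", "h1"],
    "Pass": ["pass1", "pass2", "pass3"],
})

# An inner join keeps only the hashes present in both frames.
matched = te.merge(temp, on="Hash", how="inner")

for row in matched.itertuples(index=False):
    print(f"Found: {row.Hash} / {row.Pass} / {row.Email}")
```

The result is a regular DataFrame, so it can also be saved with to_csv instead of printed. A hash that occurs several times in temp yields one output row per match.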
