What's the easiest way to get from an array of strings like this:

arr = ['abc def ghi', 'def jkl xyz', 'abc xyz', 'jkl xyz']

to a DataFrame where each column is a single word and each row contains 0 or 1 depending on whether that word appears in the corresponding string? Something like this:

   abc def ghi jkl xyz
0    1   1   1   0   0
1    0   1   0   1   1
2    1   0   0   0   1
3    0   0   0   1   1

EDIT: Here is my approach, which seemed to me like a lot of Python looping that doesn't use the built-in pandas functions:

import numpy as np
import pandas as pd

# sorted, de-duplicated list of every word that occurs in arr
labels = ' '.join(arr).split()
labels = sorted(set(labels))

df = pd.DataFrame(np.zeros((len(arr), len(labels)), dtype=int), columns=labels)

for i in range(len(arr)):
    words = arr[i].split()
    for col in labels:
        if col in words:       # compare whole words, not substrings
            df.at[i, col] = 1  # df.set_value was removed in pandas 1.0
  • I'm sorry, but this site is not meant to solve your tasks, but to help you with problems you run into on the way to solving them. So, what does your code look like so far? Commented Apr 23, 2017 at 11:15
  • I included my own code in the question. It works, but it seemed like a lot of manual Python loops. I thought there might be an easier way to do it with pandas. Commented Apr 23, 2017 at 11:43

1 Answer

EDITED - reduced to 3 essential lines:

import pandas as pd

arr = ['abc def ghi', 'def jkl xyz', 'abc xyz', 'jkl xyz']

words = sorted( set( ' '.join( arr ).split() ) )                       # sorted for a stable column order
rows  = [ { w : int( w in e.split() ) for w in words } for e in arr ]  # match whole words, not substrings
df    = pd.DataFrame( rows )

print( df )

Result:

   abc  def  ghi  jkl  xyz
0    1    1    1    0    0
1    0    1    0    1    1
2    1    0    0    0    1
3    0    0    0    1    1
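
Since the question asks about built-in pandas functions, here is a minimal sketch of an alternative that was not part of the original answer: Series.str.get_dummies splits each string on a separator and returns one 0/1 indicator column per distinct token. It assumes the words are separated by single spaces, as in the example data.

```python
import pandas as pd

arr = ['abc def ghi', 'def jkl xyz', 'abc xyz', 'jkl xyz']

# Split every string on spaces and build one 0/1 indicator column
# per distinct word; the columns come back in sorted order.
df = pd.Series(arr).str.get_dummies(sep=' ')

print(df)
```

This should give the same table as above. If the separators were irregular (tabs, punctuation), the strings would probably need normalizing first.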
3 Comments

Thanks, and sorry for not including my code from the beginning. Yours runs a little faster than mine.
Sure. You can reduce it to 2 lines by creating the DataFrame directly on the result of the rows list comprehension, but this is a bit more readable :)
@pietz if this solution helped, please accept it (the tick mark at the side) to confirm resolution.
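
For reference, the two-line variant mentioned in the comments could look roughly like the sketch below, with the list comprehension passed straight to the DataFrame constructor. Sorting the words and splitting each string before the membership test are assumptions added here, to keep the column order stable and to match whole words rather than substrings.

```python
import pandas as pd

arr = ['abc def ghi', 'def jkl xyz', 'abc xyz', 'jkl xyz']

# Line 1: sorted vocabulary of all distinct words (sorting is an added assumption).
words = sorted(set(' '.join(arr).split()))

# Line 2: one dict per input string, built directly inside the constructor.
df = pd.DataFrame([{w: int(w in e.split()) for w in words} for e in arr])

print(df)
```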
