My aim is to read Excel data and then store each row's first name, second name and domain in first-name, second-name and domain variables respectively.
You can iterate over the rows with pandas, build the new data, and then save it to Excel with pandas again:
import pandas as pd
# Read the sheet; assumes columns named 'First', 'Second' and 'Domain'
df = pd.read_excel('input.xlsx', index_col=None)
# One list per e-mail pattern; each list becomes a column of the output sheet
output = {'0': [], '1': [], '2': [], '3': [], '4': []}
for _, row in df.iterrows():
    output['0'].append(f"{row['First']}@{row['Domain']}")
    output['1'].append(f"{row['Second']}@{row['Domain']}")
    output['2'].append(f"{row['First']}{row['Second']}@{row['Domain']}")
    output['3'].append(f"{row['First']}.{row['Second']}@{row['Domain']}")
    output['4'].append(f"{row['First'][0]}{row['Second']}@{row['Domain']}")
df = pd.DataFrame(output, columns=list(output.keys()))
df.to_excel('output.xlsx')
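To see what the loop produces without an Excel file at hand, here is a self-contained sketch that runs the same loop on a small in-memory DataFrame instead of input.xlsx. The names and domains are made-up placeholders; the column names First, Second and Domain are the ones the loop above assumes.

```python
import pandas as pd

# Made-up sample data standing in for input.xlsx (hypothetical values);
# the column names match the ones the loop above expects.
df = pd.DataFrame({
    "First": ["john", "jane"],
    "Second": ["smith", "doe"],
    "Domain": ["example.com", "example.org"],
})

# One list per e-mail pattern, filled row by row
output = {"0": [], "1": [], "2": [], "3": [], "4": []}
for _, row in df.iterrows():
    output["0"].append(f"{row['First']}@{row['Domain']}")
    output["1"].append(f"{row['Second']}@{row['Domain']}")
    output["2"].append(f"{row['First']}{row['Second']}@{row['Domain']}")
    output["3"].append(f"{row['First']}.{row['Second']}@{row['Domain']}")
    output["4"].append(f"{row['First'][0]}{row['Second']}@{row['Domain']}")

# Columns come out in the dict's insertion order: "0" .. "4"
result = pd.DataFrame(output)
print(result)
```

Each input row yields one output row, and each pattern becomes one column, e.g. "john.smith@example.com" in column "3" of the first row.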
confusedcoder
Thanks! But isn't this going to be very inefficient if there are 10,000+ rows? Wouldn't I have to initialize 10k arrays? Is there a faster way to do this?
confusedcoder
Sorry, forgot to tag you.
Alderven
Sorry, I have no idea about a faster way of doing it. Probably use C++.
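For what it's worth, the loop in the first answer does not allocate 10,000 arrays: it appends to five lists, one per e-mail pattern. It does, however, build a pandas Series for every row via iterrows, which adds overhead on large sheets. A common alternative, without leaving pandas, is to concatenate whole columns at once (vectorized string operations) instead of looping in Python. The sketch below assumes First, Second and Domain are string columns with no blank cells; the sample data is made up.

```python
import pandas as pd

# Made-up sample data; in practice this would come from pd.read_excel("input.xlsx").
# Assumes every cell in these three columns is a non-empty string.
df = pd.DataFrame({
    "First": ["john", "jane"],
    "Second": ["smith", "doe"],
    "Domain": ["example.com", "example.org"],
})

first, second = df["First"], df["Second"]
at_domain = "@" + df["Domain"]  # "@domain" for every row at once

# Each expression works on whole columns, so there is no per-row Python loop
result = pd.DataFrame({
    "0": first + at_domain,
    "1": second + at_domain,
    "2": first + second + at_domain,
    "3": first + "." + second + at_domain,
    "4": first.str[0] + second + at_domain,  # .str[0] takes the first character
})
print(result)
# result.to_excel("output.xlsx")  # writing .xlsx needs an engine such as openpyxl
```

On frames with many rows this is usually faster than iterrows, though the gain is worth measuring on your own data.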
If I understand correctly, you want something like this:
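import pandas
df = pandas.read_excel("input.xlsx")
def generate(data):
    # data is one row; assumes exactly three columns, in the order first, last, domain
    first, last, domain = data
    return [fl + '@' + domain for fl in
            [first, last, first + last, first + '.' + last, first[0] + last]]
# axis='columns' passes each row to generate; result_type='expand' splits each returned list into columns
df.apply(generate, axis='columns', result_type='expand').to_excel("output.xlsx")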
The right function for this is DataFrame.apply. With axis='columns', each call to generate receives one row as a sequence (a pandas Series), which is why it can be unpacked into first, last and domain.
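Unpacking the row positionally, as generate does above, only works when the sheet has exactly three columns in first/last/domain order; a fourth column makes the unpacking raise a ValueError. A small variant, sketched here on a made-up in-memory DataFrame and assuming the column headers First, Second and Domain used in the first answer, looks the values up by name so extra columns do no harm:

```python
import pandas as pd

# Made-up sample; the column names 'First', 'Second' and 'Domain' are assumed
# (they match the first answer). 'Notes' is an extra column that positional
# unpacking would fail on.
df = pd.DataFrame({
    "First": ["john", "jane"],
    "Second": ["smith", "doe"],
    "Domain": ["example.com", "example.org"],
    "Notes": ["", "vip"],
})

def generate(row):
    # row is a pandas Series for one row, so values can be looked up by column name
    first, last, domain = row["First"], row["Second"], row["Domain"]
    return [fl + "@" + domain
            for fl in [first, last, first + last, first + "." + last, first[0] + last]]

# axis="columns" calls generate once per row; result_type="expand" turns each
# returned list into a row of a new DataFrame with integer columns 0..4
result = df.apply(generate, axis="columns", result_type="expand")
print(result)
```

The output can then be written with result.to_excel("output.xlsx") as in the original answer.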
