I have an Excel file with the following row/column structure:

ID   English   Spanish   French
 1   Hello     Hilo      Halu
 2   Hi        Hye       Ghi
 3   Bus       Buzz      Bas

I would like to read the Excel file, extract the row and column values, and create 3 new files based on the columns English, Spanish, and French.

So I would have something like:

English File:

"1" = "Hello"
"2" = "Hi"
"3" = "Bus"

I've been using xlrd. I can open, read, and print the contents of the file. However, this is what I get using this command (with the Excel file already open):

for index in xrange(0,2):
    theWord = '\n' + str(sh.col_values(index, start_rowx=index, end_rowx=1)) + '=' + str(sh.col_values(index+1, start_rowx=index, end_rowx = 1))
    print theWord

OUTPUT:

[u'Parameter/Variable/Key/String']=[u'ENGLISH'] <-- Is this a list? Shouldn't str() have stripped that out?

What's the u doing there? How can I remove the square brackets?

  • So none of the answers below addresses your issue? Commented Feb 20, 2013 at 11:26

3 Answers

The u means the value is a unicode string. It shows up because calling str() on a list prints the repr of each element; if you write the string itself out to a file, it won't be there. You are getting a list because col_values always returns a list, and with end_rowx=1 that list has only one element.
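As a small illustration of that point, here is a sketch that uses a hand-made list in place of a real xlrd call, so the data is an assumption. It shows that the brackets come from converting the whole list to text, while indexing gives the bare string. It is written so it also runs on Python 3, where the u prefix no longer appears in the list's text form.

```python
# Stand-in for what sh.col_values(1, start_rowx=0, end_rowx=1) returns:
# a list holding a single unicode string.
values = [u"ENGLISH"]

# Converting the whole list to text keeps the brackets and quotes,
# because a list renders each element with repr().
as_list_text = str(values)

# Indexing pulls out the element itself: no brackets, no quotes.
first_value = values[0]

print(as_list_text)
print(first_value)
```

So in the question's loop, taking element [0] of each col_values result (or using sh.cell_value) would avoid the brackets.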

Try getting the column value lists:

ids = sh.col_values(0, start_rowx=1)
english = sh.col_values(1, start_rowx=1)
spanish = sh.col_values(2, start_rowx=1)
french = sh.col_values(3, start_rowx=1)

and then you can zip them into tuple lists:

english_with_IDS = zip(ids, english)
spanish_with_IDS = zip(ids, spanish)
french_with_IDS = zip(ids, french)

Which are in the form:

("1", "Hello"), ("2", "Hi"), ("3", "Bus")

If you want to print the pairs:

for row_id, word in english_with_IDS:
    # %s formatting also works if xlrd returns numeric IDs as floats
    print "%s=%s" % (row_id, word)

col_values returns a list of column values. If you want a single value, you can call sh.cell_value(rowx, colx).
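To get from the zipped pairs to the file layout asked for in the question, something along these lines could work. This is only a sketch: format_id and write_pairs are hypothetical helper names, the lists are stand-ins for the col_values results, and it assumes Python 3 (on Python 2, io.open accepts the same encoding argument). It also assumes numeric IDs arrive as floats such as 1.0, which is how xlrd reports number cells.

```python
import os
import tempfile


def format_id(value):
    """Render a whole-number float such as 1.0 as '1'; leave other values as text."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


def write_pairs(path, ids, words):
    """Write one line per (id, word) pair in the form "id" = "word"."""
    with open(path, "w", encoding="utf-8") as handle:
        for row_id, word in zip(ids, words):
            handle.write('"{}" = "{}"\n'.format(format_id(row_id), word))


# Stand-in data shaped like sh.col_values(0, start_rowx=1)
# and sh.col_values(1, start_rowx=1).
ids = [1.0, 2.0, 3.0]
english = ["Hello", "Hi", "Bus"]

# Written to a temporary directory here; use any path you like.
out_path = os.path.join(tempfile.mkdtemp(), "english.txt")
write_pairs(out_path, ids, english)
```

Calling write_pairs once per language column, with the same ids list, would produce the three files.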

1 Comment

This became the basis for my solution, and you also answered the question about the 'u' in the output.

import xlrd

sh = xlrd.open_workbook('input.xls').sheet_by_index(0)
english = open("english.txt", 'w')
spanish = open("spanish.txt", 'w')
french = open("french.txt", 'w')
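try:
    # Row 0 is the header; column 0 holds the ID, columns 1-3 the translations.
    for rownum in range(1, sh.nrows):
        # xlrd returns numeric cells as floats, so 1.0 becomes "1" (assumes numeric IDs)
        row_id = str(int(sh.cell(rownum, 0).value))
        english.write(row_id + " = " + str(sh.cell(rownum, 1).value) + "\n")
        spanish.write(row_id + " = " + str(sh.cell(rownum, 2).value) + "\n")
        french.write(row_id + " = " + str(sh.cell(rownum, 3).value) + "\n")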
finally:
    english.close()
    spanish.close()
    french.close()

Use pandas:

In [1]: import pandas as pd

In [2]: df = pd.ExcelFile('test.xls').parse('Sheet1', index_col=0) # reads file

In [3]: df.index = df.index.map(int)

In [4]: for col in df.columns:
   ...:     column = df[col]
   ...:     column.to_csv(column.name, sep='=', header=False)  # writes each column to a file
   ...:                                                        # with filename == column name

In [5]: !cat English  # English file content
1=Hello
2=Hi
3=Bus

1 Comment

I haven't tried this, since xlrd/xlwt can do almost everything I need, but I think it's worth mentioning.
