I am writing a function that iterates through a list of text items, parses each item, and appends the parsed result to a list. The code is below:

clean_list = []

def to_words( list ):
    i = 0
    while i <= len(list):
        doc = list[i]
        # 1. Remove HTML
        doc_text = BeautifulSoup(doc).get_text() 
        # 2. Remove non-letters (not sure if this is advisable for all documents)       
        letters_only = re.sub("[^a-zA-Z]", " ", doc_text) 
        # 3. Convert to lower case, split into individual words
        words = letters_only.lower().split()                                               
        # 4. Remove stop words
        stops = set(stopwords.words("english"))
        meaningful_words = [w for w in words if not w in stops]   
        # 5. Join the words back into one string separated by space, and return the result.
        clean_doc = ( " ".join( meaningful_words ))   
        i = i+1
        clean_list.append(clean_doc)

But when I pass the list into this function, to_words(list), I get this error: IndexError: list index out of range

I tried running the same steps without defining the to_words function, i.e. without the loop, by manually setting i to 0, 1, 2, etc. and following through the steps of the function; this works fine.

Why am I facing this error when I use the function (and loop)?

  • Can you give the full traceback here? Commented Feb 23, 2017 at 10:48
  • A list of length 5 has indices 0, 1, 2, 3, 4. Your while i <= len(list) gives i the values 0, 1, 2, 3, 4, 5. Change it to while i < len(list). Commented Feb 23, 2017 at 10:49
  • Also, don't use the variable name list, as this causes confusion with the built-in list type. Commented Feb 23, 2017 at 10:50
  • You could use for doc in list rather than incrementing i. Commented Feb 23, 2017 at 10:51

1 Answer

Change while i <= len(list) to while i < len(list).

List indexing starts from 0, so the last valid index is len(list) - 1. With i <= len(list), the loop runs one more time with i equal to len(list), and list[i] then raises the IndexError.
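As a small illustration of the off-by-one, the sketch below replays the original loop condition on a hypothetical five-item list (items and its contents are made up for the example) and records which indices the loop would visit.

```python
items = ["a", "b", "c", "d", "e"]  # hypothetical sample data

# Valid indices run from 0 to len(items) - 1.
valid_indices = list(range(len(items)))

# Replay the original condition and record every index it produces.
visited = []
i = 0
while i <= len(items):
    visited.append(i)
    i += 1

# The last index produced equals len(items), which is out of range.
try:
    items[visited[-1]]
    raised = False
except IndexError:
    raised = True

print(valid_indices)  # [0, 1, 2, 3, 4]
print(visited)        # [0, 1, 2, 3, 4, 5]
print(raised)         # True
```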

1. Prefer a for loop to a while loop here; lists support direct iteration. For example:

for elem in list_:
    # Do your operation here

2. Don't use list as a variable name, since it shadows the built-in list type.
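Putting both suggestions together, the function from the question could look roughly like the sketch below. It is only a sketch under a few assumptions: the parameter name docs, the returned clean_docs list, and the small STOPS set are illustrative choices (the question uses stopwords.words("english") from NLTK), and the HTML-removal step is left as a comment so the example runs with the standard library alone.

```python
import re

# Placeholder stop words; the question uses stopwords.words("english") from NLTK.
STOPS = {"the", "is", "a", "and", "of"}

def to_words(docs, stops=STOPS):
    """Return one cleaned string per input document."""
    clean_docs = []
    for doc in docs:  # iterate directly, so there is no index to run past the end
        # The question's HTML step, BeautifulSoup(doc).get_text(), would go here.
        # Replace non-letters with spaces.
        letters_only = re.sub("[^a-zA-Z]", " ", doc)
        # Lower-case and split into individual words.
        words = letters_only.lower().split()
        # Drop stop words.
        meaningful_words = [w for w in words if w not in stops]
        # Join the remaining words back into one space-separated string.
        clean_docs.append(" ".join(meaningful_words))
    return clean_docs

result = to_words(["The cat is here.", "A dog and a bone!"])
print(result)  # ['cat here', 'dog bone']
```

Returning the list instead of appending to a module-level clean_list keeps the function free of side effects, and building the stop-word set once outside the loop avoids recomputing it for every document.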
