
I'm running into an issue with some Java code that I don't know how to fix. Could someone help me figure out why I keep getting

java.lang.StringIndexOutOfBoundsException: String index out of range: 1

Here's the code snippet where the problem is popping up (it's part of a larger package for an assignment):

public class MyMapper extends Mapper {

    // method takes the document name and its contents as strings
    @Override
    public void map(String documentID, String document) {

        // this string array holds all the delimiters for our split
        //String[] separators = {",", ".", "!", "?", ";", ":", "-", "' ", " "};

        // splits the string 'document' according to the delimiters
        String[] words = document.split(",|\\.|\\!|\\?|\\;|\\:|\\-|\\' |\\ |\\'.");

        // for each word in words, check that the word is legitimate
        for (String word : words) {
            if (isAlpha(word)) {
                //System.out.println(word);
                emit(word.substring(0, 1).toUpperCase(), "1");
            }
        }
    }

    // private helper method to check that each word is legitimate (alphas-only)
    private boolean isAlpha(String name) {
        char[] chars = name.toCharArray();

        for (char c : chars) {
            if (!Character.isLetter(c)) {
                return false;
            }
        }

        return true;
    }

}

What I'm trying to do is take in a document (read into a String through a BufferedReader), grab the first letter of each word in it, and capitalize that letter.

***** Updated Code*****

I decided to go with the suggested check for the empty "word" in my private helper method. Everything works now.

Here is the updated code for documentation purposes:

// private helper method to check that each word is legitimate (alphas-only)
private boolean isAlpha(String name) {

    // an empty "word" has no first letter, so reject it
    if (name.isEmpty()) {
        return false;
    }

    char[] chars = name.toCharArray();

    for (char c : chars) {
        if (!Character.isLetter(c)) {
            return false;
        }
    }

    return true;
}
  • It seems like some of the "words" are empty. Commented Apr 16, 2015 at 20:01
  • This question needs an example of input on which it fails. Commented Apr 16, 2015 at 20:03
  • If words contains empty strings, i.e. word.length() is 0, you will get index-out-of-bounds errors. Commented Apr 16, 2015 at 20:04
  • I am feeding in the following string: "a about above absolutely acceptable add adjacent' af-ter alg0rithm all." Currently it seems to fail on "a", works for "about" and "above", and fails for everything else. Commented Apr 16, 2015 at 20:12

4 Answers


Looks like sometimes your word is empty. Make a check first to see that you've got something to work with:

if (isAlpha(word)) {
    if (!word.isEmpty()) { // you could also use 'if (word.length() > 0)'
        emit(word.substring(0, 1).toUpperCase(), "1");
    }
}

Alternatively, make that check in your isAlpha() method.
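For reference, a minimal, self-contained sketch of the failure and the guard. The class EmptyWordDemo and its firstLetter helper are hypothetical names for illustration, not part of the assignment's Mapper code; firstLetter returns null when there is no first letter to take.

```java
public class EmptyWordDemo {

    // Hypothetical helper: upper-cased first letter, or null for an empty word.
    static String firstLetter(String word) {
        if (word.isEmpty()) {
            return null; // substring(0, 1) would throw here
        }
        return word.substring(0, 1).toUpperCase();
    }

    public static void main(String[] args) {
        try {
            "".substring(0, 1); // end index 1 is past the length 0
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unguarded substring threw: " + e.getMessage());
        }
        System.out.println(firstLetter("about")); // A
        System.out.println(firstLetter(""));      // null
    }
}
```

The exact exception message differs between JDK versions, so the sketch prints whatever message the runtime produces rather than assuming one.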


3 Comments

Thanks for the suggestion. I tried this and it seems that I had 3 "words" that were empty. However, the error is still occurring.
Then please update the code in your post (so others can see) and include the stack trace of the error. Maybe then we can see what's happening. It would also help if you included an example word array.
The string I was testing with at the time was "a about above absolutely acceptable add adjacent' af-ter alg0rithm all." Following your advice as well as Iqbal's resolved my issues.

If your word is empty, just return false from your isAlpha(), like this:

private boolean isAlpha(String name) {

    // reject empty words before checking characters
    if (name.isEmpty()) {
        return false;
    }

    char[] chars = name.toCharArray();

    for (char c : chars) {
        if (!Character.isLetter(c)) {
            return false;
        }
    }

    return true;
}
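Putting the empty check into isAlpha() can be sketched outside the assignment framework as below. Since Mapper and emit() belong to the assignment and aren't shown, this assumes a hypothetical FirstLetterSketch class whose firstLetters method simply collects the keys that map() would pass to emit(); the delimiter regex is copied from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class FirstLetterSketch {

    // Delimiter regex copied from the question.
    static final String DELIMITERS = ",|\\.|\\!|\\?|\\;|\\:|\\-|\\' |\\ |\\'.";

    // Letters-only check; the empty string is rejected explicitly because
    // the loop alone would return true for it.
    static boolean isAlpha(String name) {
        if (name.isEmpty()) {
            return false;
        }
        for (char c : name.toCharArray()) {
            if (!Character.isLetter(c)) {
                return false;
            }
        }
        return true;
    }

    // Stand-in for map(): returns the keys that would be emitted with value "1".
    static List<String> firstLetters(String document) {
        List<String> keys = new ArrayList<>();
        for (String word : document.split(DELIMITERS)) {
            if (isAlpha(word)) {
                keys.add(word.substring(0, 1).toUpperCase());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(firstLetters("Some words, with comma.")); // [S, W, W, C]
    }
}
```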


For some strings, your split regex can produce empty strings. A very common case is a comma followed by a space: ", " is matched as two separate delimiters with nothing between them, so the string document = "Some words, with comma."; is split into [Some, words, , with, comma].
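A small sketch reproducing this with the delimiter regex from the question (the class and method names are hypothetical):

```java
import java.util.Arrays;

public class SplitEmptyDemo {

    // Delimiter regex copied from the question.
    static final String REGEX = ",|\\.|\\!|\\?|\\;|\\:|\\-|\\' |\\ |\\'.";

    static String[] tokens(String document) {
        return document.split(REGEX);
    }

    public static void main(String[] args) {
        String[] words = tokens("Some words, with comma.");
        // "," and " " match separately, leaving an empty token between them;
        // the trailing empty token after "." is dropped by split().
        System.out.println(Arrays.toString(words)); // [Some, words, , with, comma]
        System.out.println(words[2].isEmpty());     // true
    }
}
```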

Instead of enumerating every non-word character you can think of, I suggest using the \W character class (any character other than a letter, digit, or underscore, i.e. [^a-zA-Z0-9_]) and allowing runs of them, i.e. words = document.split("\\W+");. This gives you [Some, words, with, comma].
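A quick sketch of the \W+ split (names are hypothetical). One caveat worth knowing: a delimiter at the very start of the input still yields a single leading empty string, so keeping the empty check in isAlpha() is still a sensible safeguard.

```java
import java.util.Arrays;

public class WordSplitDemo {

    // \W matches any character outside [a-zA-Z0-9_]; "+" merges runs like ", " into one delimiter.
    static String[] words(String document) {
        return document.split("\\W+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("Some words, with comma."))); // [Some, words, with, comma]
        // A leading delimiter still produces one empty first element.
        System.out.println(Arrays.toString(words(", leading comma")));        // [, leading, comma]
    }
}
```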

If you need more control over the characters to split on and don't want to use a predefined character class, you can still put the characters into [...]+ to shorten the regex and to split on runs of them as well: words = document.split("[.!?,;:' -]+"). (Inside [...], none of these characters need escaping, as long as the - comes last so it can't be read as a range.)
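The same idea with the question's own separator list (comma, period, !, ?, ;, :, apostrophe, space, hyphen) packed into one character class might look like this; it is a sketch, and the class and method names are hypothetical.

```java
import java.util.Arrays;

public class CharClassSplitDemo {

    // The question's separators in one class; '-' is last so it is a literal hyphen.
    static final String SEPARATORS = "[.!?,;:' -]+";

    static String[] words(String document) {
        return document.split(SEPARATORS);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("Some words, with comma; and af-ter!")));
        // [Some, words, with, comma, and, af, ter]
    }
}
```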

2 Comments

The reason I was enumerating specific non-word characters is that the assignment requires it. Thank you for the solution; I suspect it will be very helpful in the near future.
@Parable See my update on how to combine my approach with specific characters to split by.

Would something like this do?

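    String text = "es saß ein wiesel, auf einem kiesel.";

    // split on runs of whitespace
    String[] parts = text.split("\\s+");

    StringBuilder resultingString = new StringBuilder();
    for (String part : parts) {
        if (part.isEmpty()) {
            continue; // leading whitespace produces an empty first element
        }
        // upper-case the first character, keep the rest unchanged
        part = Character.toUpperCase(part.charAt(0)) + part.substring(1);
        resultingString.append(part).append(' ');
    }

    // drop the trailing space after the last word (safe even if nothing was appended)
    text = resultingString.toString().trim();

    System.out.println(text);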
