5

I have an array, let's say:

   var myArray = ["ibira", "garmin", "hide", "park", "parque", "corrida", "trote", "personal", "sports", "esportes", "health", "saúde", "academia"];
   var myString = "I went to the park with my garmin watch";

What is the fast way to check if my String has any of the words in myArray?

Below is my code, but I'm not sure it is the best way to go:

   function score(arKeywords, frase) {
      if (frase == undefined) {
        return 0;
      } else {
          var indice = 0;
          var indArray = arKeywords.length;
          var sentencaMin = frase.toLowerCase();
          for (var i = 0; i < indArray; i++) {
              // search() returns -1 when nothing is found; 0 is a valid match position
              if (sentencaMin.search(arKeywords[i]) > -1) { indice++; }
          }
          return indice;
      }
  }

Please help, anyone. This function will run on A LOT of strings!

Thank you all :)

3
  • 1
    Do you really want to search the string, or do you want to match exact words? Commented Jun 5, 2016 at 22:18
  • 1
    myString.split(/\s+/).filter(Set.prototype.has, new Set(myArray)) (or use some instead of filter to determine "whether" not "which"). Commented Jun 5, 2016 at 22:19
  • I want to search the string for the words... thnx :) Commented Jun 6, 2016 at 14:06

7 Answers 7

4

Based on this sentence, from the question:

What is [a] way to check if my String has any of the words in myArray?

(Emphasis mine.)

I'd suggest the following, which will test if "some" of the words in the supplied string are present in the supplied array. This – theoretically – stops comparing once there is a match of any of the words from the string present in the array:

var myArray = ["ibira", "garmin", "hide", "park", "parque", "corrida", "trote", "personal", "sports", "esportes", "health", "saúde", "academia"],
  myString = "I went to the park with my garmin watch";

function anyInArray(needles, haystack) {

  // we split the supplied string ("needles") into words by splitting
  // the string at the occurrence of a word-boundary ('\b') followed by
  // one or more ('+') occurrences of white-space ('\s') followed by
  // another word-boundary:
  return needles.split(/\b\s+\b/)
    // we then use Array.prototype.some() to work on the array of
    // words, to assess whether any/some of the words ('needle') 
    // - using an Arrow function - are present in the supplied
    // array ('haystack'), in which case Array.prototype.indexOf()
    // would return the index of the found-word, or -1 if that word
    // is not found:
    .some(needle => haystack.indexOf(needle) > -1);
    // at which point we return the Boolean, true if some of the
    // words were found, false if none of the words were found.
}

console.log(anyInArray(myString, myArray));

var myArray = ["ibira", "garmin", "hide", "park", "parque", "corrida", "trote", "personal", "sports", "esportes", "health", "saúde", "academia"],
  myString = "I went to the park with my garmin watch";

function anyInArray(needles, haystack) {
  return needles.split(/\b\s+\b/).some(needle => haystack.indexOf(needle) > -1);
}

console.log(anyInArray(myString, myArray));

JS Fiddle demo.


4 Comments

That's certainly an elegant way to do it, but it looks like the OP is after both the number of matches and speed over many input strings. That said, profiling with real code and data should be the judge of whether this answer meets the latter criterion (OP please note), and my answer, along with others, may be guilty of premature optimisation.
@FizzyTea: that may certainly be true, but I composed my answer based on the quoted (slightly paraphrased) sentence. If the OP needs exact numbers of matches, or some kind of 'score' then the question needs to be edited to include that information (and, ideally, those having answered would be notified by @-user-name pings to prompt us to address the newly-added requirements).
@FizzyTea I created a performance comparison jsfiddle, feel free to update.
Thank you so much for taking your time and sharing your expertise! Still learning the best way to do things in JavaScript!!! Thank you very much!
4

What is the fast way to check if my String has any of the words in myArray?

Compile your myArray into a regex and test myString against it; please see FizzyTea's answer.

If you don't want to use regex for whatever reason, the second fastest alternative is to use String.includes() and Array.some():

 var myArray = ["ibira", "garmin", "hide", "park", "parque", "corrida", "trote", "personal", "sports", "esportes", "health", "saúde", "academia"];
 var myString = "I went to the park with my garmin watch";

 console.log(myArray.some(e => myString.includes(e)));
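If a count is needed rather than a yes/no answer, the same idea seems to extend by swapping some for filter. This is only a minimal sketch: countKeywords is a hypothetical helper name, and it assumes substring matching (not whole-word matching) is acceptable.

```javascript
// Hypothetical helper: counts how many distinct keywords occur in the text.
// Note: this is substring matching, so "park" also matches inside "spark".
function countKeywords(keywords, text) {
  if (text == null) return 0;               // covers both null and undefined
  var lower = text.toLowerCase();
  return keywords.filter(function (k) { return lower.includes(k); }).length;
}

var sampleKeywords = ["garmin", "park", "parque"];
console.log(countKeywords(sampleKeywords, "I went to the park with my garmin watch")); // 2
```

Unlike some, filter cannot stop at the first hit, so this is likely slower when only a true/false result is needed.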

For a performance comparison of different methods, see https://jsfiddle.net/usq9zs61/5/

Results over 100000 iterations in Chrome 48 / Firefox 46, Ubuntu:

  • compiledregextest (FizzyTea): 16.046ms / 21.84ms
  • someincludes (this answer): 76.707ms / 62.55ms
  • compiledregexmatch (FizzyTea): 104.682ms / 170.58ms
  • someset (Comment by Bergi): 488.474ms / 749.46ms
  • splitregexsome (David Thomas): 529.529ms / 677.20ms
  • filterset (Comment by Bergi): 742.857ms / 875.86ms
  • ahocorasick (ordi): 1790.654ms / 1642.19ms

The Aho-Corasick algorithm proposed by orid has the best run-time complexity, but the alternative methods execute faster on current JavaScript engines unless your myArray of search strings is much bigger.

2 Comments

Nice job. My regexp version counts the number of matches. However, it may be worth noting that the OP's code counts the number of unique matches, not the total. It might also be worth testing the compiled regexp version using re.test(string) -- I expect that would speed things up.
I will add re.test(string) to the fiddle, should indeed be faster!
1

For speed, try a precompiled RegExp:

// the flags must be passed as a string
var re = RegExp('\\b' + myArray.join('\\b|\\b') + '\\b', 'gi');
var i, matches;
for(i=0; i<lotsOfStrings.length; i+=1){
    // note that this retrieves the total number
    // of matches, not unique matches, which may
    // not be what you want
    matches = lotsOfStrings[i].match(re);
    // do something with matches (null when nothing matched)
}

Note that the RegExp is constructed outside the loop.

Alternatively, to simply test for a match:

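// no 'g' flag here: a global regex keeps lastIndex between test() calls,
// which would skip matches when the same regex is reused on many strings
var re = RegExp('\\b' + myArray.join('\\b|\\b') + '\\b', 'i');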
var i, matched;
for(i=0; i<lotsOfStrings.length; i+=1){
    matched = re.test(lotsOfStrings[i]);
    // do something with matched
}
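
If the number of unique matches is what matters (as in the OP's code), the result of match() can be de-duplicated. The following is a sketch under assumptions, not part of the original answer: countUnique is a hypothetical name, and the keywords are assumed to contain no regex metacharacters.

```javascript
// Assumption: keywords contain no regex metacharacters.
var sampleKeywords = ["garmin", "park", "sports"];
// Built once, outside any loop over the input strings.
var uniqueRe = new RegExp('\\b(?:' + sampleKeywords.join('|') + ')\\b', 'gi');

// Hypothetical helper: number of distinct keywords found in text.
function countUnique(text) {
  var found = text.match(uniqueRe) || [];   // match() returns null when nothing matches
  var lowered = found.map(function (w) { return w.toLowerCase(); });
  return new Set(lowered).size;             // the Set drops repeated words
}

console.log(countUnique("Park, park and more PARK with my garmin")); // 2
```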

1 Comment

Do you really need the word boundary (\\b) designations on these regexes? If you're using the pipe separator that should be enough right?
0

If you just want to know if there are any matches, you can convert the array into a regular expression.

My regexp also uses \b to match word boundaries, so park won't match if the string contains spark.

var myArray = ["ibira", "garmin", "hide", "park", "parque", "corrida", "trote", "personal", "sports", "esportes", "health", "saúde", "academia"];
var myString = "I went to the park with my garmin watch";


function score(arKeywords, frase) {
  if (frase == undefined) {
    return 0;
  } else {
    var re = new RegExp('\\b(' + arKeywords.join('|') + ')\\b', 'i');
    return !!frase.match(re);
  }
}

console.log(score(myArray, myString));

0

You could join the array with | and construct a regex; probably not the fastest, but quite pretty:

function score(myArray, text) {
  var regex = new RegExp('\\b(' + myArray.join('|') + ')\\b', 'gi');
  var matches = text.match(regex);
  return matches ? matches.length : 0;
}

And usage:

var myArray = ["ibira", "garmin", "hide", "park", "parque", "corrida", "trote", "personal", "sports", "esportes", "health", "saúde", "academia"];
var myString = "I went to the park with my garmin watch";

score(myArray, myString); // 2
score(myArray, 'Ibira is doing sports in the Park'); // 3

This assumes myArray doesn't contain any special characters.
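
If the keywords might contain special characters, they could be escaped before being joined. A minimal sketch, assuming the commonly used escaping pattern; escapeRegExp and scoreEscaped are hypothetical names, not part of the answer above:

```javascript
// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical variant of score() that escapes each keyword first.
function scoreEscaped(keywords, text) {
  var pattern = keywords.map(escapeRegExp).join('|');
  var regex = new RegExp('\\b(?:' + pattern + ')\\b', 'gi');
  var matches = text.match(regex);
  return matches ? matches.length : 0;
}

console.log(scoreEscaped(["node.js", "park"], "I use node.js in the park")); // 2
console.log(scoreEscaped(["node.js"], "nodexjs")); // 0
```

Note that \b only behaves as expected when a keyword starts and ends with a word character; a keyword such as "c++" would need a different boundary check.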

0

Here is one way to do it: https://jsbin.com/fiqegu/1/edit?js,console

var result = myString.split(' ').filter(function(word) {
  return myArray.indexOf(word) > -1;
});

This returns the words from the string that are also in the array.

Obviously, you can get the count by adding .length to the end of the above code:

var result = myString.split(' ').filter(function(word) {
  return myArray.indexOf(word) > -1;
}).length;
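
Since indexOf scans the array for every word, a Set lookup may scale better for long keyword lists. This is a hedged sketch only: keywordSet and countWords are hypothetical names, and lower-casing plus splitting on any whitespace are assumptions beyond the code above.

```javascript
// Built once; Set.prototype.has avoids scanning the whole keyword list per word.
var keywordSet = new Set(["garmin", "park", "parque"]);

// Hypothetical helper: counts the words of the text that are keywords.
// Repeated words are counted each time they occur.
function countWords(text) {
  return text.toLowerCase().split(/\s+/).filter(function (word) {
    return keywordSet.has(word);
  }).length;
}

console.log(countWords("I went to the Park with my garmin watch")); // 2
```

Words with attached punctuation (for example "park,") are not matched by this approach.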

1 Comment

It looks like he's scoring based on the number of matches, so that would be .length at the end.
0

The most efficient solution for this problem would probably be the Aho-Corasick algorithm, which searches in O(size of string being searched, plus the number of matches reported) after having built the initial automaton (a trie with failure links) from the list of strings in O(sum of sizes of strings in the list).
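
A compact sketch of the idea, assuming the usual formulation (a trie of the keywords plus failure links computed breadth-first). buildAutomaton and findKeywords are hypothetical names, and this is an illustration rather than a tuned implementation:

```javascript
// Builds the automaton: a trie of the words plus failure links.
function buildAutomaton(words) {
  var nodes = [{ next: {}, fail: 0, out: [] }];   // node 0 is the root
  words.forEach(function (word) {
    var state = 0;
    for (var ch of word) {
      if (!(ch in nodes[state].next)) {
        nodes.push({ next: {}, fail: 0, out: [] });
        nodes[state].next[ch] = nodes.length - 1;
      }
      state = nodes[state].next[ch];
    }
    nodes[state].out.push(word);                  // a word ends at this node
  });

  // Breadth-first pass; depth-1 nodes keep fail = 0 (the root).
  var queue = Object.values(nodes[0].next);
  while (queue.length > 0) {
    var current = queue.shift();
    Object.keys(nodes[current].next).forEach(function (ch) {
      var child = nodes[current].next[ch];
      var f = nodes[current].fail;
      // Follow failure links until a node with an edge for ch (or the root).
      while (f !== 0 && !(ch in nodes[f].next)) f = nodes[f].fail;
      nodes[child].fail = (ch in nodes[f].next) ? nodes[f].next[ch] : 0;
      // Inherit the words that end at the failure target.
      nodes[child].out = nodes[child].out.concat(nodes[nodes[child].fail].out);
      queue.push(child);
    });
  }
  return nodes;
}

// Scans the text once and returns the set of words that occur in it.
function findKeywords(nodes, text) {
  var found = new Set();
  var state = 0;
  for (var ch of text) {
    while (state !== 0 && !(ch in nodes[state].next)) state = nodes[state].fail;
    if (ch in nodes[state].next) state = nodes[state].next[ch];
    nodes[state].out.forEach(function (w) { found.add(w); });
  }
  return found;
}

var automaton = buildAutomaton(["garmin", "park", "parque", "ark"]);
var hits = findKeywords(automaton, "i went to the park with my garmin watch");
console.log(Array.from(hits).sort().join(", ")); // ark, garmin, park
```

This matches substrings, not whole words, so a word-boundary check would have to be added separately if needed.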
