Check if a substring of a string is in a list of strings in python

Question

I have a dictionary of foods:

foods={
  "chicken masala" : "curry",
  "chicken burger" : "burger",
  "beef burger" : "burger",
  "chicken soup" : "appetizer",
  "vegetable" : "curry"
}

Now I have a list of strings:

queries = ["best burger", "something else"]

I have to find out whether any string in queries has an entry in our food dictionary. In the above example, it should return True for best burger. Currently, I am calculating the cosine similarity between each string in the list and every entry in foods.keys(). It works, but it is very slow. The food dictionary has almost 1000 entries. Is there a more efficient way to do this?

Edit:

Here best burger should be returned because it contains burger, and burger is also present in chicken burger in foods.keys(). I am basically trying to find out whether any query is a food type.

This is how I am calculating it:

import re, math
from collections import Counter

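WORD = re.compile(r'\w+')

def text_to_vector(text):
     # Not shown in the original post; assumed to be a word-count vector,
     # as implied by the otherwise unused WORD and Counter.
     return Counter(WORD.findall(text))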

def get_cosine(text1, text2):
     vec1 = text_to_vector(text1.lower())
     vec2 = text_to_vector(text2.lower())
     intersection = set(vec1.keys()) & set(vec2.keys())
     numerator = sum([vec1[x] * vec2[x] for x in intersection])

     sum1 = sum([vec1[x]**2 for x in vec1.keys()])
     sum2 = sum([vec2[x]**2 for x in vec2.keys()])
     denominator = math.sqrt(sum1) * math.sqrt(sum2)

     if not denominator:
        return 0.0
     else:
        return (float(numerator) / denominator) * 100

foods={
  "chicken masala" : "curry",
  "chicken burger" : "burger",
  "beef burger" : "burger",
  "chicken soup" : "appetizer",
  "vegetable" : "curry"
}
queries = ["best burger", "something else"]
flag = False
food = []
for phrase in queries:
   for k in foods.keys():
      cosine = get_cosine(phrase, k)
      if int(cosine) > 40:
         flag = True
         food.append(phrase)
         break

print('Foods:', food)

OUTPUT:

Foods: ['best burger']

Solution: @Black Thunder's solution works for the example I have provided, but it doesn't work for queries like best burgers, which is a major concern for me. The SequenceMatcher solution does work in that case. Thanks @Andrej Kesely. This was the reason I went for cosine similarity in my solution, but I think SequenceMatcher works better here.

I am calculating cosine similarity between each entry in queries and each entry in foods.keys(). @BlackThunder
– Anurag, Commented Aug 1, 2019 at 7:41
Why don't you show the code you have tried, which works but is inefficient? That will make it much easier for people to suggest performance improvements.
– Chris Doyle, Commented Aug 1, 2019 at 7:42

Andrej Kesely · Accepted Answer · 2019-08-01 08:52:39Z

You can use difflib from the standard library to find similar strings (the 0.5 threshold will probably need some tweaking):

foods={
  "chicken masala" : "curry",
  "chicken burger" : "burger",
  "beef burger" : "burger",
  "chicken soup" : "appetizer",
  "vegetable" : "curry"
}

queries = ["best burger", "order"]

from difflib import SequenceMatcher

out = []
for q in queries:
    for k in foods:
        r = SequenceMatcher(None, k, q).ratio()
        print('q={: <20} k={: <20} ratio={}'.format(q, k, r))
        if r > 0.5:
            out.append(k)

print(out)

Prints:

q=best burger          k=chicken masala       ratio=0.16
q=best burger          k=chicken burger       ratio=0.64
q=best burger          k=beef burger          ratio=0.8181818181818182
q=best burger          k=chicken soup         ratio=0.2608695652173913
q=best burger          k=vegetable            ratio=0.3
q=order                k=chicken masala       ratio=0.10526315789473684
q=order                k=chicken burger       ratio=0.3157894736842105
q=order                k=beef burger          ratio=0.375
q=order                k=chicken soup         ratio=0.11764705882352941
q=order                k=vegetable            ratio=0.14285714285714285
['chicken burger', 'beef burger']
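Since the question asks for a True/False result per query, and for something that still matches best burgers, a word-level variant may be worth trying. The following is only a sketch, not part of the answer above: it swaps in difflib.get_close_matches and compares each query word against the distinct words of the food names. The helper name mentions_food and the 0.8 cutoff are assumptions chosen for illustration and would need tuning on real data.

```python
from difflib import get_close_matches

foods = {
    "chicken masala": "curry",
    "chicken burger": "burger",
    "beef burger": "burger",
    "chicken soup": "appetizer",
    "vegetable": "curry",
}

# Distinct words that occur in the food names (built once, reused per query).
vocabulary = sorted({word for name in foods for word in name.lower().split()})

def mentions_food(query, cutoff=0.8):
    """Return True if any word in the query closely matches a food word.

    The name and the cutoff are illustrative assumptions, not from the post.
    """
    return any(
        get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        for word in query.lower().split()
    )

queries = ["best burger", "best burgers", "something else"]
print([q for q in queries if mentions_food(q)])
```

Comparing against the distinct words rather than every full key keeps the number of comparisons small when many keys share words, and any() stops at the first matching word.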

Nouman · Answer · 2019-08-01 08:06:22Z

1

Try this code:

queries = ["best burger", "order"]
foods={
  "chicken masala" : "curry",
  "chicken burger" : "burger",
  "beef burger" : "burger",
  "chicken soup" : "appetizer",
  "vegetable" : "curry"
}
output = []
for y in queries:                 # loop through the queries
    for x in y.split(" "):        # split each query into words
        for z in foods:           # iterate over the keys (same as foods.keys())
            if x in z:            # check whether the word occurs in the key
                output.append(z)  # if it does, record the matching key
print(output)

Output:

['chicken burger', 'beef burger']

edited Aug 1, 2019 at 8:06

answered Aug 1, 2019 at 8:00


1 Comment

Anurag Over a year ago

Thank you @Black Thunder. I was just calculating the complexity. Your code will have O(n*m) complexity. But I think it will work.
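If exact word matches are enough, the O(n*m) scan mentioned in the comment can be avoided by indexing the words of the food names in a set once, so each query costs only a few set lookups. This is a sketch of a different technique from the answer above (whole-word lookup rather than substring matching), and the helper name has_food_word is made up for illustration. It will not match variants such as best burgers.

```python
foods = {
    "chicken masala": "curry",
    "chicken burger": "burger",
    "beef burger": "burger",
    "chicken soup": "appetizer",
    "vegetable": "curry",
}

# Build the word index once; membership tests on a set are O(1) on average.
food_words = {word for name in foods for word in name.lower().split()}

def has_food_word(query):
    """Return True if the query shares at least one whole word with a food name."""
    return not food_words.isdisjoint(query.lower().split())

queries = ["best burger", "something else"]
print([has_food_word(q) for q in queries])  # [True, False]
```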

Kashyap KN · Answer · 2019-08-01 07:59:48Z

0

You can do something simple like this

First get all the keys

data = foods.keys()

Now join the list of query strings into a single comma-separated string. This makes it much easier to check for substring matches:

queries = ','.join(queries)

Now check for substring matches:

for food in data:
    # Check each word of the food name against the joined query string.
    for item in food.split():
        if item in queries:
            print(True)

answered Aug 1, 2019 at 7:59


Clepsyd · Answer · 2019-08-01 07:47:52Z

-1

If what you want is a list of matches between queries and foods keys, you could use a list comprehension:

matches = [food for food in queries if food in foods]

answered Aug 1, 2019 at 7:47


2 Comments

Nouman Over a year ago

The output list must not be empty

Clepsyd Over a year ago

Oh my bad. We need a list of booleans then?
