I have a dictionary of foods:
foods = {
    "chicken masala": "curry",
    "chicken burger": "burger",
    "beef burger": "burger",
    "chicken soup": "appetizer",
    "vegetable": "curry"
}
Now I have a list of strings:
queries = ["best burger", "something else"]
I have to find out whether any string in queries has an entry in the foods dictionary.
In the example above, it should return True for "best burger".
Currently, I am calculating the cosine similarity between each string in the list and every key in foods.keys().
It works, but it is very slow. The foods dictionary has almost 1000 entries. Is there a more efficient way to do this?
Edit:
Here "best burger" should be returned because it contains the word "burger", and "burger" also appears in the key "chicken burger" in foods.keys(). I am basically trying to find out whether any query refers to a type of food.
This is how I am calculating it:
import re, math
from collections import Counter

WORD = re.compile(r'\w+')

def text_to_vector(text):
    # Bag-of-words vector: maps each word to its number of occurrences.
    return Counter(WORD.findall(text))

def get_cosine(text1, text2):
    # Cosine similarity of the two word-count vectors, scaled to 0-100.
    vec1 = text_to_vector(text1.lower())
    vec2 = text_to_vector(text2.lower())
    intersection = set(vec1.keys()) & set(vec2.keys())
    numerator = sum(vec1[x] * vec2[x] for x in intersection)
    sum1 = sum(vec1[x]**2 for x in vec1.keys())
    sum2 = sum(vec2[x]**2 for x in vec2.keys())
    denominator = math.sqrt(sum1) * math.sqrt(sum2)
    if not denominator:
        return 0.0
    return (float(numerator) / denominator) * 100

foods = {
    "chicken masala": "curry",
    "chicken burger": "burger",
    "beef burger": "burger",
    "chicken soup": "appetizer",
    "vegetable": "curry"
}
queries = ["best burger", "something else"]

flag = False
food = []
for phrase in queries:
    for k in foods.keys():
        cosine = get_cosine(phrase, k)
        if int(cosine) > 40:
            flag = True
            food.append(phrase)
            break
print('Foods:', food)
OUTPUT:
Foods: ['best burger']
Solution:
Although @Black Thunder's solution works for the example I provided, it doesn't work for queries like "best burgers", which is a major concern for me. This solution does work in that case; thanks, @Andrej Kesely. Handling such variants was the reason I went for cosine similarity in my own attempt, but I think SequenceMatcher works better here.
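For anyone reading later, here is a minimal sketch of how a SequenceMatcher-based check could look. It is not necessarily the code from the answer mentioned above. It assumes that comparing individual words is good enough, and the names food_words and is_food_query and the 0.8 threshold are my own illustrative choices, not anything from the original answers. The distinct words of the food names are collected once, so each query word is compared against that vocabulary rather than against every full key.

```python
from difflib import SequenceMatcher

foods = {
    "chicken masala": "curry",
    "chicken burger": "burger",
    "beef burger": "burger",
    "chicken soup": "appetizer",
    "vegetable": "curry",
}

# Build the vocabulary of distinct words in the food names once.
# (food_words is a hypothetical helper name.)
food_words = {word for name in foods for word in name.lower().split()}

def is_food_query(query, threshold=0.8):
    """Return True if any word of the query closely resembles a food word.

    threshold is an assumed cut-off on SequenceMatcher.ratio() (0.0 to 1.0);
    it would need tuning on real data.
    """
    for q_word in query.lower().split():
        for f_word in food_words:
            if SequenceMatcher(None, q_word, f_word).ratio() >= threshold:
                return True
    return False

queries = ["best burger", "best burgers", "something else"]
matches = [q for q in queries if is_food_query(q)]
print('Foods:', matches)
```

With this threshold, "burgers" is still close enough to "burger" to count as a match, while short accidental overlaps such as "best" versus "beef" stay below it. A lower threshold would catch more spelling variants at the cost of more false positives.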