
I have a dictionary of foods:

foods={
  "chicken masala" : "curry",
  "chicken burger" : "burger",
  "beef burger" : "burger",
  "chicken soup" : "appetizer",
  "vegetable" : "curry"
}

Now I have a list of strings:

queries = ["best burger", "something else"]

I need to find out whether any string in queries has an entry in our food dictionary. In the example above, it should return True for best burger. Currently, I calculate the cosine similarity between each string in the list and every entry in foods.keys(). It works, but it is very slow. The food dictionary has almost 1000 entries. Is there a more efficient way to do this?

Edit:

Here best burger should be returned because it contains burger, and burger also appears in chicken burger in foods.keys(). I am basically trying to find out whether any query is a food type.

This is how I am calculating it:

import re, math
from collections import Counter

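WORD = re.compile(r'\w+')

def text_to_vector(text):
     # Helper missing from the original snippet: count occurrences of each word
     return Counter(WORD.findall(text))
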
def get_cosine(text1, text2):
     vec1 = text_to_vector(text1.lower())
     vec2 = text_to_vector(text2.lower())
     intersection = set(vec1.keys()) & set(vec2.keys())
     numerator = sum([vec1[x] * vec2[x] for x in intersection])

     sum1 = sum([vec1[x]**2 for x in vec1.keys()])
     sum2 = sum([vec2[x]**2 for x in vec2.keys()])
     denominator = math.sqrt(sum1) * math.sqrt(sum2)

     if not denominator:
        return 0.0
     else:
        return (float(numerator) / denominator) * 100

foods={
  "chicken masala" : "curry",
  "chicken burger" : "burger",
  "beef burger" : "burger",
  "chicken soup" : "appetizer",
  "vegetable" : "curry"
}
queries = ["best burger", "something else"]
flag = False
food = []
for phrase in queries:
   for k in foods.keys():
      cosine = get_cosine(phrase, k)
      if int(cosine) > 40:
         flag = True
         food.append(phrase)
         break

print('Foods:', food)

OUTPUT:

Foods: ['best burger']

Solution: Although @Black Thunder's solution works for the example I provided, it doesn't work for queries like best burgers, which is a major concern for me. This solution does work in that case. Thanks @Andrej Kesely. That was the reason I went for cosine similarity in my solution, but I think SequenceMatcher works better here.

7
  • 5
    What have you tried so far? Commented Aug 1, 2019 at 7:39
  • 1
    And it's a bit unclear too Commented Aug 1, 2019 at 7:40
  • I am calculating cosine similarity between each entry in queries and each entry in food.keys() @BlackThunder Commented Aug 1, 2019 at 7:41
  • 2
    1000 entries is not much Commented Aug 1, 2019 at 7:41
  • 2
    Why don't you show the code you have tried that works but is inefficient? That will make it much easier for people to suggest performance improvements. Commented Aug 1, 2019 at 7:42

4 Answers 4

1

You can use difflib (doc) to find similarities (the ratio threshold will probably need some tuning):

foods={
  "chicken masala" : "curry",
  "chicken burger" : "burger",
  "beef burger" : "burger",
  "chicken soup" : "appetizer",
  "vegetable" : "curry"
}

queries = ["best burger", "order"]

from difflib import SequenceMatcher

out = []
for q in queries:
    for k in foods:
        r = SequenceMatcher(None, k, q).ratio()
        print('q={: <20} k={: <20} ratio={}'.format(q, k, r))
        if r > 0.5:
            out.append(k)

print(out)

Prints:

q=best burger          k=chicken masala       ratio=0.16
q=best burger          k=chicken burger       ratio=0.64
q=best burger          k=beef burger          ratio=0.8181818181818182
q=best burger          k=chicken soup         ratio=0.2608695652173913
q=best burger          k=vegetable            ratio=0.3
q=order                k=chicken masala       ratio=0.10526315789473684
q=order                k=chicken burger       ratio=0.3157894736842105
q=order                k=beef burger          ratio=0.375
q=order                k=chicken soup         ratio=0.11764705882352941
q=order                k=vegetable            ratio=0.14285714285714285
['chicken burger', 'beef burger']
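
As a possible refinement, assuming word-level matching is acceptable, difflib.get_close_matches can compare each query word against the words that occur in the keys, which also catches plurals such as best burgers. The helper name is_food_query and the cutoff of 0.8 are illustrative choices, not part of the answer above, and the cutoff would likely need tuning on real data:

```python
from difflib import get_close_matches

foods = {
    "chicken masala": "curry",
    "chicken burger": "burger",
    "beef burger": "burger",
    "chicken soup": "appetizer",
    "vegetable": "curry",
}

# Vocabulary of single words taken from the dictionary keys (built once)
food_words = sorted({word for key in foods for word in key.lower().split()})

def is_food_query(query, cutoff=0.8):
    """Return True if any word of the query closely matches a food word.

    Hypothetical helper; cutoff=0.8 is an assumed threshold.
    """
    return any(
        get_close_matches(word, food_words, n=1, cutoff=cutoff)
        for word in query.lower().split()
    )

queries = ["best burgers", "something else"]
food = [q for q in queries if is_food_query(q)]
print("Foods:", food)
```

Because the vocabulary is much smaller than the number of keys when words repeat across keys, each query word is compared against fewer candidates than in the key-by-key loop.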

1

Try this code:

queries = ["best burger", "order"]
foods={
  "chicken masala" : "curry",
  "chicken burger" : "burger",
  "beef burger" : "burger",
  "chicken soup" : "appetizer",
  "vegetable" : "curry"
}
output = []
for y in queries:                 # loop through the queries
    for x in y.split(" "):        # split each query into words
        for z in foods:           # iterate over the keys (same as foods.keys())
            if x in z:            # check whether the word is a substring of the key
                output.append(z)  # if it matches, record the key
print(output)

Output:

['chicken burger', 'beef burger']

1 Comment

Thank you @Black Thunder. I was just calculating the complexity. Your code will have O(n*m) complexity. But I think it will work.
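
If exact word matches are enough, the nested loops could be avoided by building a set of key words once and testing each query against it. This is only a sketch under that assumption: unlike the substring check above, it will not match a query word such as burgers against burger. The names food_words, matched, and found are illustrative.

```python
foods = {
    "chicken masala": "curry",
    "chicken burger": "burger",
    "beef burger": "burger",
    "chicken soup": "appetizer",
    "vegetable": "curry",
}
queries = ["best burger", "something else"]

# Build the word set once; cost is proportional to the total number of words in the keys
food_words = {word for key in foods for word in key.lower().split()}

# A query matches if it shares at least one whole word with the key vocabulary
matched = [q for q in queries if not food_words.isdisjoint(q.lower().split())]
found = bool(matched)

print(found, matched)  # True ['best burger']
```

Each query then costs time proportional to its own word count rather than to the number of dictionary entries.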
0

You can do something simple like this.

First, get all the keys:

data = foods.keys()

Now convert the list of strings into a single comma-separated string. This makes it much easier to check for substring matches:

queries = ','.join(queries)

Now check for substring matches:

for food in data:
    for item in food.split():   # check each word of the key
        if item in queries:     # substring test against the joined query string
            print(True)


-1

If what you want is a list of matches between queries and foods keys, you could use a list comprehension:

matches = [food for food in queries if food in foods]

2 Comments

The output list must not be empty
Oh my bad. We need a list of booleans then?
