I am parsing data from multiple sources and want to assign a unique string id to each entry. Each entry contains a title (string), a url (string) and a body (string). The same title can come from multiple sources, but those entries will have different urls, and in that case I would like to store both items. I am thinking of hashing the title and url together and using the hash as the id. That way, if I get the same title and url from different sources, the id will be the same and I can tell that the entry is a duplicate.
import hashlib
title, url = "Example title", "https://example.com/post"  # placeholder values
entry_id = hashlib.sha256(f"{title} {url}".encode("utf-8")).hexdigest()
But I think there can be a case where two different title/url combinations generate the same hash, and I am not sure how to handle such a clash. Can someone suggest a way of generating a unique identifier from strings? I don't want to use a timestamp, because I might get the same row from different sources at different times.
if hash not in hashes: title + url + body instead of single title for generating hashes
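A minimal sketch of the hash-as-id approach, assuming each entry is a dict with "title" and "url" keys. The names make_id and deduplicate and the dict layout are hypothetical, chosen only for illustration. Each field is length-prefixed before hashing so that different splits of the same characters across title and url cannot produce the same input.

```python
import hashlib

def make_id(title: str, url: str) -> str:
    """Return a deterministic SHA-256 hex digest for a (title, url) pair."""
    h = hashlib.sha256()
    for field in (title, url):
        data = field.encode("utf-8")
        # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()

def deduplicate(entries):
    """Keep the first entry seen for each id; later duplicates are skipped."""
    seen = {}
    for entry in entries:
        entry_id = make_id(entry["title"], entry["url"])
        if entry_id not in seen:
            seen[entry_id] = entry
    return seen
```

Because the id depends only on the title and url, the same row arriving from different sources at different times maps to the same id. With a 256-bit digest, an accidental collision between two different inputs is so unlikely that it is usually ignored in practice. If that is still a concern, the stored title and url can be compared whenever an id already exists.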