Set-based hash (digest) algorithm?

Question

Is there a message digest algorithm whose digests support set operations, so that the result of applying a set function to a digest still makes sense? In other words, is there a hash function that does NOT break the concept of "set" before and after hashing?

I'm looking for a hash function that:

hashes a set of data into a fixed-length (or bounded-length) string
produces an identical hash whenever the input data set is the same
commutes with taking subsets: hashing a subset of the raw data gives the same result as applying the subset operation to the hash of the original data set, i.e. you get the same subset hash both ways.

As an example, in the following picture set A has several data points (red diamonds) and B is a subset of A. Is there a hash function such that:

data in A ---- hash function ----> _hashA ---- set operation ----> _hashB

data in B ---- hash function ----> _hashB

[Image: set A containing several data points (red diamonds), with B a subset of A]

mcdowella · Accepted Answer · 2013-07-20 04:41:46Z

This looks a bit like http://en.wikipedia.org/wiki/Homomorphic_encryption and a bit like database privacy schemes like http://en.wikipedia.org/wiki/Differential_privacy - at least to me.

In both cases developers have run into problems: once users were allowed a few operations, they found clever ways to use those operations as building blocks for anything they wanted, so the system ended up with no security at all.

In your case I think you want AndHash(hash(a), hash(b)) = hash(a and b), where "a and b" is the intersection of the two sets. This means that if hash({x}) != hash(null set) for a single element x, then I can find out whether x is a member of any set S from the hash value of S alone: x is in S exactly when AndHash(hash(S), hash({x})) != hash(null set). If this works for many elements, I can work out many of the members of a hashed set given its hash value, which means that the hash value must be pretty much as big as the set, as it contains all the information in it.
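The argument above can be sketched with a toy example. This is only an illustration under stated assumptions: the universe `UNIVERSE`, the bitmask "hash" `set_hash`, and the helper names are all hypothetical, and the bitmask is deliberately not a real digest. It is chosen because it satisfies AndHash(hash(a), hash(b)) = hash(a and b) exactly, which makes the membership probe easy to see.

```python
# Hypothetical toy universe; a real data set would not be this small.
UNIVERSE = ["apple", "banana", "cherry", "date", "elderberry"]
INDEX = {item: i for i, item in enumerate(UNIVERSE)}


def set_hash(items):
    """Toy 'hash': one bit per universe element (not a real digest)."""
    h = 0
    for item in items:
        h |= 1 << INDEX[item]
    return h


def and_hash(h1, h2):
    """Intersection on digests: and_hash(hash(a), hash(b)) == hash(a & b)."""
    return h1 & h2


EMPTY = set_hash(set())


def recover_members(digest):
    """Probe the digest with every singleton to rebuild the hidden set."""
    return {x for x in UNIVERSE if and_hash(digest, set_hash({x})) != EMPTY}


secret = {"banana", "date"}
digest = set_hash(secret)
recovered = recover_members(digest)  # rebuilt from the digest alone
```

Note that this digest needs one bit per possible element, which is the point of the argument: a digest that supports exact intersection carries enough information to reconstruct the set.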

Depending on what you want this for, it might be worth looking at http://en.wikipedia.org/wiki/Minhash.
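As a rough idea of what MinHash offers, here is a minimal sketch. The seeded hash built from truncated SHA-256, the choice of 128 hash functions, and all function names are assumptions made for illustration, not part of any particular library. The signature has a fixed length, is identical for identical sets, and supports one set operation exactly (union, via element-wise minimum); intersection and subset are only available as similarity estimates. The sketch assumes non-empty input sets.

```python
import hashlib


def _h(seed, item):
    """Seeded 64-bit hash of an item (SHA-256 truncated; an arbitrary choice)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=128):
    """Fixed-length signature: the minimum hash value under each seed."""
    return [min(_h(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching positions estimates |A & B| / |A | B|."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def union_signature(sig_a, sig_b):
    """Signature of A | B is the element-wise minimum of both signatures."""
    return [min(a, b) for a, b in zip(sig_a, sig_b)]


A = set(range(0, 100))
B = set(range(50, 150))
sig_a = minhash_signature(A)
sig_b = minhash_signature(B)
estimate = estimate_jaccard(sig_a, sig_b)  # true Jaccard similarity is 50/150
```

More hash functions give a tighter estimate at the cost of a longer signature.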

SeaBass · Accepted Answer · 2013-07-20 01:06:38Z

AFAIK, no. Hash functions generally (and I've seen many) operate on a single chunk of data without any regard whatsoever for what that data may actually represent, the primary concern being to reduce the probability of collisions. That said, it's certainly possible to come up with something like what you're wanting to do, but I imagine it would be exceedingly difficult, and the result would most likely be suboptimal in terms of collision avoidance.
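To make this concrete, here is a small sketch of an ordinary digest applied to a set. The helper `set_digest` and the sample element names are hypothetical. Sorting the elements before hashing gives a fixed-length digest that is identical for identical sets, but the digest of a subset has no usable relationship to the digest of the full set.

```python
import hashlib


def set_digest(items):
    """SHA-256 over the sorted elements, so ordering does not matter."""
    h = hashlib.sha256()
    for item in sorted(items):
        encoded = item.encode("utf-8")
        # Length prefix keeps {"ab", "c"} distinct from {"a", "bc"}.
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
    return h.hexdigest()


a = {"x1", "x2", "x3", "x4"}
b = {"x1", "x3"}  # a subset of a
hash_a = set_digest(a)
hash_b = set_digest(b)
# Nothing short of rehashing the raw subset turns hash_a into hash_b.
```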

Community · Accepted Answer · 2017-05-23 10:32:10Z

The short answer is no, there isn't such an algorithm. What you might try instead is encrypting your data, decrypting it when you need to apply your set function, and then encrypting the result again. Hashing algorithms, however, are by their very nature one-way and involve the loss of data. There's a good explanation of the difference between hash and encryption algorithms in the question "Fundamental difference between Hashing and Encryption algorithms".

answered Jul 20, 2013 at 1:07 by seanmk; edited May 23, 2017 at 10:32 by Community
