I have a list of elements which I want to hash together to get the root hash (similar to a merkle tree). The requirement is that I need to verify whether an element is present in the root hash. Now the restriction is that, I'll have access only to the root hash and the element which we need to verify at any point of time, so we wont be able to use the merkle tree approach. Have tried bloom filters and similar algorithms but couldn't find a way to handle the false positives. Is there any data structure or algorithm I can follow to get a solution?

  • If you just have the hash there is always the possibility of collisions. Commented Jun 7, 2021 at 10:27
  • Yes there is. Fully aware of that, just looking for the possibility of a solution for this overlooking the collisions Commented Jun 7, 2021 at 10:29
  • How is a merkle tree with just one layer violating your constraints? Commented Jun 7, 2021 at 10:50
  • Could you expand on that please? I could have more than 2 elements in the set. Commented Jun 7, 2021 at 10:59
  • Are you allowed to cache values? Commented Jun 7, 2021 at 11:25

1 Answer

Ultimately your question comes down to Kolmogorov complexity: your condensed problem is the maximum compression of your specific set.

To rephrase the answer: without adding more entropy (see Shannon/Gibbs) to your problem, you have to give up one or the other: accuracy or space. If you have a very specific problem there may be better solutions, because the entropy is different (e.g. a maximum number of distinct elements, size limits, recurring patterns, common prefixes, ...).
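To make the space side of that trade-off concrete: losslessly representing an arbitrary n-element subset of a universe of U possible values needs at least log2(C(U, n)) bits, regardless of encoding. A minimal sketch (the universe size and set size below are made-up example numbers, not from the question):

```python
import math

def min_bits_for_set(universe_size: int, n_elements: int) -> float:
    """Information-theoretic lower bound (in bits) for losslessly
    representing an arbitrary n-element subset of the universe."""
    return math.log2(math.comb(universe_size, n_elements))

# Example: a set of 1,000 distinct 32-bit values
bits = min_bits_for_set(2**32, 1000)
print(f"{bits:.0f} bits ≈ {bits / 8 / 1024:.1f} KiB")
```

Any representation smaller than this bound must lose information about the set, and that lost information is exactly where false positives come from.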

The algorithm you named (the Bloom filter) was designed to optimize one side at the cost of the other (in plain words: who cares about a hash collision if its probability is lower than being struck by lightning, the CPU itself failing, or Earth being hit by an asteroid?). You can precalculate the expected collision probability using a collision calculator.

So a 64 KiB Bloom filter will be able to store up to 20,637 elements before the false-positive probability reaches the struck-by-lightning range. You can play with the parameters: https://hur.st/bloomfilter/?n=&p=0.5E-5&m=64KiB&k=
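The same calculation the linked calculator performs follows from the standard Bloom filter estimate p ≈ (1 − e^(−kn/m))^k, with the optimal number of hash functions at roughly k = (m/n)·ln 2. A sketch plugging in the 64 KiB / 20,637-element example:

```python
import math

def bloom_false_positive_rate(m_bits: int, n_elements: int, k_hashes: int) -> float:
    """Expected false-positive probability of a Bloom filter with
    m bits, n inserted elements and k hash functions."""
    return (1 - math.exp(-k_hashes * n_elements / m_bits)) ** k_hashes

m = 64 * 1024 * 8                 # 64 KiB expressed in bits
n = 20637                         # elements, as in the example above
k = round(m / n * math.log(2))    # optimal number of hash functions
print(k, bloom_false_positive_rate(m, n, k))
```

For these parameters k comes out at 18 hash functions and the false-positive rate lands around 5e-6, matching the ~0.5E-5 target in the calculator link.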

Starting with a hash is the wrong way if you cannot tolerate any collision: once information is lost, it stays lost. You should instead look into compressing the set differently, and to optimize that operation, knowledge about the data is needed.

If you are not willing (or able) to let go of either, you are talking about a set with lossless compression. Shannon's source coding theorem tells you how small a lossless compression can theoretically get and how close your scheme is to that optimum. If you are dealing with integers, some algorithms for integer-sequence compression are discussed in papers linked here on Stack Overflow.
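As one common illustration of such integer-set compression (the delta-plus-varint idea, not taken from any specific linked paper): sort the set, store the differences between neighbours, and encode each difference in as few bytes as it needs:

```python
def compress(sorted_ints):
    """Delta-encode a sorted list of non-negative ints,
    then varint-encode each delta (7 payload bits per byte)."""
    out = bytearray()
    prev = 0
    for x in sorted_ints:
        delta = x - prev
        prev = x
        while delta >= 0x80:
            out.append(0x80 | (delta & 0x7F))  # high bit = "more bytes follow"
            delta >>= 7
        out.append(delta)
    return bytes(out)

def decompress(data):
    """Inverse of compress(): rebuild the original sorted list."""
    result, value, shift, prev = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value
            result.append(prev)
            value, shift = 0, 0
    return result

nums = [3, 17, 18, 1000, 1000000]
packed = compress(nums)
assert decompress(packed) == nums
```

Small deltas dominate in dense sets, so most entries shrink to a single byte; this only works because you keep the compressed set itself, unlike a hash, so membership tests are exact.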

