I have learnt some of the hash-join algorithms, and I know they usually build hash tables whose keys are computed by a hash function. I am wondering whether the hash function can be omitted, using the value itself as the key instead.

For example, take the table user_table

[{"name": "tom", "id": 1}, {"name": "jerry", "id": 2}]

joined with score_table

[{"score": 5, "id": 1}, {"score": 7, "id": 2}]

on id.

Can I just use the id column as the hash table key, and so save the cost of computing a hash function?

Or is it that there are many kinds of hash function, and

def hash(id):
    return id

is one of them?

Are there any other situations in which I need to apply a hash function?

UPDATE

From the discussion with @OmG, I now know that at least in a multiple-key join, a hash function is needed to compute the key.

  • What did your research reveal? How to Ask Commented May 21, 2019 at 8:34
  • Possible duplicate of Hashing Algorithm, its uses? Commented May 21, 2019 at 8:36
  • Hi @philipxy, I think I know what a hash function is; my question is what it is used for in hash-join algorithms. Now I understand: at least in a multiple-key join, there must be a hash function to compute the key. Hope it helps, thanks! Commented May 21, 2019 at 8:40
  • That is not clear from your question, and if that's your question, please edit your post to be clear. But then you should have researched hash join algorithms, and as I already asked, what that research revealed should be in your post also. However, your question right now asks can you not use a hash function, but hashing needs a hash function, so you seem to be asking about hashing & you don't show you have researched hashing. Commented May 21, 2019 at 8:42
  • Please don't insert EDITs or UPDATEs, edit your post to be the best presentation you can. Commented May 21, 2019 at 8:49

1 Answer

It comes back to the definition of the hash-join algorithm. The point of using a hash function in a hash join is to handle the join attributes more efficiently. In your specific case, since the join uses no attribute other than the unique id, you do not need a separate hash function; in other words, your hash function can be the identity function.

However, note that this solution is domain-specific and cannot be generalized to more complex cases without a more complex hash function.
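
To make this concrete, here is a minimal sketch of an in-memory hash join in Python, using the two tables from the question. The names `hash_join` and `key_of`, and the `orders`/`prices` tables with their `region` column, are made up for illustration. Note also that Python's `dict` still hashes whatever key it is given internally, so this only shows the choice of key (a bare id versus a combination of columns), not a way of avoiding hashing altogether.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, key_of):
    """Join two lists of dicts on whatever key_of(row) returns."""
    # Build phase: index one input by its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[key_of(row)].append(row)
    # Probe phase: look up each row of the other input and merge matches.
    joined = []
    for row in probe_rows:
        for match in table.get(key_of(row), []):
            joined.append({**match, **row})
    return joined

user_table = [{"name": "tom", "id": 1}, {"name": "jerry", "id": 2}]
score_table = [{"score": 5, "id": 1}, {"score": 7, "id": 2}]

# Single-column join: the id itself is the key (the "identity" case).
by_id = hash_join(user_table, score_table, key_of=lambda r: r["id"])

# Multi-column join (hypothetical tables): the key must combine both
# columns, here as a tuple, which is then hashed as a whole.
orders = [{"region": "eu", "id": 1, "item": "pen"},
          {"region": "us", "id": 1, "item": "ink"}]
prices = [{"region": "eu", "id": 1, "price": 3},
          {"region": "us", "id": 1, "price": 4}]
by_region_and_id = hash_join(orders, prices,
                             key_of=lambda r: (r["region"], r["id"]))
```

In the second call, using `id` alone as the key would wrongly pair the "eu" order with the "us" price, which is why a multiple-key join needs some function that maps the combination of columns to a single key.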

3 Comments

Thanks @Omg, maybe a multiple-key join is an example of the "complex cases"?
@Jerry yes. exactly.
@Jerry & Omg Using a hash table requires a hash function--so that initial values are evenly distributed to buckets. See the duplicate link or any presentation of hash table use, like the hash join algorithm.
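
The point about even distribution can be illustrated with a small sketch. Everything here is an assumption chosen for the demonstration: a table with 8 buckets, ids that all happen to be multiples of 8, and a simple multiplicative mixing step (using the constant commonly attributed to Knuth) standing in for "a real hash function". It is not how any particular database implements its hash join.

```python
NUM_BUCKETS = 8  # hypothetical table size (2**3)

def bucket_identity(key):
    # Identity "hash": the key itself, reduced modulo the bucket count.
    return key % NUM_BUCKETS

def bucket_mixed(key):
    # Multiplicative hashing: multiply, keep 32 bits, take the top 3 bits.
    return ((key * 2654435761) % 2**32) >> 29

# Hypothetical ids that all share a factor with the bucket count.
ids = [8 * i for i in range(100)]

identity_buckets = {bucket_identity(k) for k in ids}
mixed_buckets = {bucket_mixed(k) for k in ids}

print("buckets used with identity:", len(identity_buckets))
print("buckets used with mixing:  ", len(mixed_buckets))
```

With these ids the identity function sends every key to the same bucket, so each lookup degrades to a linear scan, while the mixing step spreads the keys over several buckets. For dense, well-spread ids such as 1, 2, 3, ... the identity function distributes just as well, which is why it works in the question's example.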
