
I need to select one and only one row of data for each ID in my data. I thought I had solved this (for details, see my original question and my solution here: PostgreSQL - Select only 1 row for each ID).

However, I still get multiple values in some cases. If there is only "N/A" and one other value, there is no problem. But if there are several values, for example "N/A", "value1" and "value2", my CASE statement is not sufficient and I get both "value1" and "value2" back. This is the CASE statement in question:

CASE
    WHEN "PQ"."Value" = 'N/A' THEN 1
    ELSE 0
END
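To make the failure concrete, here is a minimal Python sketch that reproduces it with the standard-library sqlite3 module (window functions need SQLite 3.25 or later) instead of PostgreSQL. The table pq, its columns id and val, and the rows are hypothetical stand-ins, and the real query is assumed to keep rows where RANK() is 1. Because every non-"N/A" value gets the same sort key, RANK() gives all of them rank 1:

```python
import sqlite3

# Hypothetical rows: ID 1 has only 'N/A', ID 2 has 'N/A' plus one value,
# ID 3 has 'N/A' plus two values (the failing case).
ROWS = [
    (1, "N/A"),
    (2, "N/A"), (2, "value1"),
    (3, "N/A"), (3, "value1"), (3, "value2"),
]


def rank_one_rows(conn: sqlite3.Connection) -> list:
    """Return every row whose RANK() is 1 within its ID."""
    sql = """
        SELECT id, val
        FROM (
            SELECT id, val,
                   RANK() OVER (
                       PARTITION BY id
                       ORDER BY CASE WHEN val = 'N/A' THEN 1 ELSE 0 END
                   ) AS rnk
            FROM pq
        ) AS ranked
        WHERE rnk = 1
        ORDER BY id, val
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pq (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO pq VALUES (?, ?)", ROWS)
print(rank_one_rows(conn))
# [(1, 'N/A'), (2, 'value1'), (3, 'value1'), (3, 'value2')]
```

ID 3 comes back twice because "value1" and "value2" tie on the CASE key.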

I need to give each string value a unique integer; then the problem would be solved. The question is: how do I do this? My first thought is to convert the characters to their ASCII codes and sum them, but I am not sure how to do that, and I am also worried about performance. Is there a simple way to assign a value to each string so that I can choose only one? I don't care which one it is, just that there is only one.

EDIT

I am now trying to create a function that adds up the ASCII values of each character, so that I can change my CASE statement to something like this:

CASE
    WHEN "PQ"."Value" = 'N/A' THEN 9999999
    ELSE SumASCII("PQ"."Value")
END

I'm having a small problem with it, though; I have posted that as a separate question here: PostgreSQL - ERROR: query has no destination for result data
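Summing character codes will not give a unique integer per string, though: any two strings made of the same characters, in any order, add up to the same total, and so can unrelated strings. The sketch below uses a hypothetical Python function sum_ascii in place of the SumASCII() function mentioned above, whose actual definition is not shown here:

```python
def sum_ascii(text: str) -> int:
    """Hypothetical stand-in for the SumASCII() idea: add up the
    character codes of every character in the string."""
    return sum(ord(ch) for ch in text)


# Strings with the same characters in a different order always collide...
print(sum_ascii("ab"), sum_ascii("ba"))  # 195 195
# ...and so do unrelated strings whose codes happen to add up to the same total.
print(sum_ascii("ad"), sum_ascii("bc"))  # 197 197
```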

EDIT 2

Thanks to @Bohemian, I now have a working solution, which is as follows:

CASE
    WHEN "PQ"."Value" = 'N/A' THEN NULL  -- NULLS LAST puts 'N/A' after every hash
    ELSE ('x'||LPAD(MD5("PQ"."Value"),16,'0'))::bit(64)::bigint  -- signed: may be negative
END DESC NULLS LAST
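The following Python sketch mirrors the hash expression above so its behavior can be inspected outside the database. It assumes PostgreSQL hashes the UTF-8 bytes of the value; pg_md5_bigint and first_by_desc are hypothetical helpers. Because the bit(64)-to-bigint cast is signed, roughly half of all values hash to a negative number, so a fixed sentinel such as -1 for 'N/A' can sort ahead of them under DESC, whereas a NULL key with NULLS LAST always sorts last:

```python
import hashlib

INT64 = 1 << 64


def pg_md5_bigint(value: str) -> int:
    """Mirror ('x'||LPAD(MD5(value),16,'0'))::bit(64)::bigint in Python.

    md5() yields 32 hex digits, so LPAD(..., 16, '0') keeps the first 16
    (64 bits); the bit(64) -> bigint cast reads them as a signed integer.
    Assumes the database hashes the UTF-8 bytes of the value.
    """
    unsigned = int(hashlib.md5(value.encode("utf-8")).hexdigest()[:16], 16)
    return unsigned - INT64 if unsigned >= INT64 // 2 else unsigned


def first_by_desc(values: list, na_key):
    """Emulate ORDER BY CASE WHEN v = 'N/A' THEN na_key ELSE hash END
    DESC NULLS LAST and return the value that sorts first.
    na_key=None stands for SQL NULL."""
    def sort_key(v):
        k = na_key if v == "N/A" else pg_md5_bigint(v)
        # NULL keys sort after everything else; negation gives DESC order.
        return (k is None, 0 if k is None else -k)
    return sorted(values, key=sort_key)[0]


print(pg_md5_bigint("a") > 0)  # True: md5('a') starts with '0c'
print(pg_md5_bigint("") < 0)   # True: md5('') starts with 'd4'

# Pick some value whose hash is below -1 (roughly half of all values qualify).
sample = next(f"value{i}" for i in range(1000) if pg_md5_bigint(f"value{i}") < -1)
print(first_by_desc(["N/A", sample], -1))    # N/A (the -1 sentinel wins)
print(first_by_desc(["N/A", sample], None))  # the sample value
```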

2 Answers


This will produce a "unique" number for each value:

('x'||substr(md5("PQ"."Value"),1,8))::bit(64)::bigint

Strictly speaking, there is a chance of a collision, but it's very remote.

If the result is "too big", you could take a modulus:

<above-calculation> % 10000

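Collisions then become much more likely (at best a 1-in-10,000 chance for any given pair of values, and the odds of some collision grow quickly as the number of distinct values increases), so you should test this formula against all known values to make sure there are no collisions.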
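A minimal way to run that check is to compute the code for every known value and group them. The Python sketch below mirrors the 8-character expression above followed by % 10000; pg_substr8_bigint, pg_mod, find_collisions and the sample list are hypothetical, and it assumes that casting the 32-bit string to bit(64) zero-pads it on the right and that values are hashed as UTF-8:

```python
import hashlib
from collections import defaultdict

INT64 = 1 << 64


def pg_substr8_bigint(value: str) -> int:
    """Mirror ('x'||substr(md5(value),1,8))::bit(64)::bigint in Python.

    Assumes the 32-bit string is zero-padded on the right to 64 bits before
    the signed bigint conversion, and that the value is hashed as UTF-8.
    """
    top32 = int(hashlib.md5(value.encode("utf-8")).hexdigest()[:8], 16)
    unsigned = top32 << 32
    return unsigned - INT64 if unsigned >= INT64 // 2 else unsigned


def pg_mod(a: int, b: int) -> int:
    """PostgreSQL-style %: the remainder takes the sign of the dividend."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def find_collisions(values, key) -> dict:
    """Group values by key(value); return only keys shared by 2+ values."""
    groups = defaultdict(list)
    for v in values:
        groups[key(v)].append(v)
    return {k: vs for k, vs in groups.items() if len(vs) > 1}


known_values = ["N/A", "popup", "value1", "value2"]  # hypothetical list
clashes = find_collisions(known_values, lambda v: pg_mod(pg_substr8_bigint(v), 10000))
print(clashes or "no collisions")
```

Under that padding assumption, the low 32 bits of the expression are always zero, so every remainder modulo 10000 is a multiple of 16 (the largest power of two dividing 10000). That leaves only about 1,250 possible codes, which makes collisions more likely than the 1-in-10,000 figure suggests.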


3 Comments

This doesn't work. md5() returns a hexadecimal string, which can't be used directly as an integer. The error I get when trying to do as you suggested is: operator does not exist: text % integer. This was from a simple test: SELECT md5('email') % 10000
Thanks. Two things: 1. You're missing an opening parenthesis '('. 2. I was also looking at a similar solution: ('x'||LPAD(MD5("PQ"."Value"),16,'0'))::bit(64)::bigint. Yours and this one produce slightly different values. Any idea what the difference is?
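@Matt md5() always returns 32 hex characters, so LPAD(MD5(...),16,'0') never actually pads: it truncates the hash to its first 16 hex characters (64 bits). My version takes only the first 8 hex characters (32 bits), and casting those to bit(64) zero-pads them on the right. Both results therefore share the same upper 32 bits and differ only in the lower 32 bits, which are always zero in my version.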
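A short Python sketch makes the difference concrete for the value 'a'. pg_lpad and hex_to_bigint are hypothetical helpers that mirror PostgreSQL's lpad() and the bit(64)-to-bigint cast, under the assumption that a shorter bit string is zero-padded on the right:

```python
import hashlib

INT64 = 1 << 64


def pg_lpad(s: str, length: int, fill: str) -> str:
    """PostgreSQL lpad() for a one-character fill: a string that is already
    longer than `length` is truncated on the right, not padded."""
    return s[:length] if len(s) >= length else s.rjust(length, fill)


def hex_to_bigint(hex_digits: str) -> int:
    """Mirror ('x'||hex_digits)::bit(64)::bigint, assuming a shorter bit
    string is zero-padded on the right to 64 bits, then read as signed."""
    unsigned = int(hex_digits, 16) << (64 - 4 * len(hex_digits))
    return unsigned - INT64 if unsigned >= INT64 // 2 else unsigned


digest = hashlib.md5(b"a").hexdigest()          # always 32 hex characters
short = hex_to_bigint(digest[:8])               # substr(md5(v),1,8)
full = hex_to_bigint(pg_lpad(digest, 16, "0"))  # LPAD(MD5(v),16,'0')
print(hex(short))  # 0xcc175b900000000
print(hex(full))   # 0xcc175b9c0f1b6a8
```

Both results share the upper 32 bits (0x0cc175b9); only the 16-character version keeps the next 32 bits of the hash.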

If you don't care which value gets picked, change RANK() to ROW_NUMBER(). If you do care, do it anyway, but also add another term after the CASE statement in the ORDER BY, separated by a comma, with the logic you want. For example, if you want the first value alphabetically, do this:

...
ORDER BY CASE...END, "PQ"."Value")
...
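Here is a minimal Python sketch of that suggestion, again using sqlite3 (3.25 or later) as a stand-in for PostgreSQL; the table pq, its columns id and val, and the rows are hypothetical. Note that 'N/A' is a single-quoted string literal, since in PostgreSQL double quotes denote an identifier. With ROW_NUMBER() and the extra sort term, each ID yields exactly one row, and 'N/A' is returned only when it is the only value:

```python
import sqlite3

# Hypothetical rows: ID 2 mirrors the 'N/A' + 'popup' case from the comments.
ROWS = [
    (1, "N/A"),
    (2, "N/A"), (2, "popup"),
    (3, "N/A"), (3, "value2"), (3, "value1"),
]


def one_row_per_id(conn: sqlite3.Connection) -> list:
    """Keep exactly one row per ID: non-'N/A' values first, then alphabetical."""
    sql = """
        SELECT id, val
        FROM (
            SELECT id, val,
                   ROW_NUMBER() OVER (
                       PARTITION BY id
                       ORDER BY CASE WHEN val = 'N/A' THEN 1 ELSE 0 END, val
                   ) AS rn
            FROM pq
        ) AS numbered
        WHERE rn = 1
        ORDER BY id
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pq (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO pq VALUES (?, ?)", ROWS)
print(one_row_per_id(conn))  # [(1, 'N/A'), (2, 'popup'), (3, 'value1')]
```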

5 Comments

Thanks. The thing is, I have no idea what the values will be. Basically, I want this logic: if there's only 'N/A', return that; if there are other values as well, return any one value that is not 'N/A'. If I do an ORDER BY as you suggest, I might sometimes get 'N/A' when I don't want it. That's why I used the CASE statement to give 'N/A' a value of 1 and every other value 0, but then of course multiple values get 0 and I get multiple rows back instead of only one. Very frustrating.
I am now trying to create a function that adds up the ASCII values of each character, so I can change my CASE statement to something like this: CASE WHEN "PQ"."Value" = 'N/A' THEN 9999999 ELSE SumASCII("PQ"."Value") END. I'm having a small problem with it, though, which you can see here if you want: stackoverflow.com/questions/42149695/…
Please try it with just the change to ROW_NUMBER(); I really think that's going to work for you.
Thanks, but I just tried it: when I have both 'N/A' and 'popup' and use ROW_NUMBER() instead of RANK(), I only get 'N/A', which is not correct.
If you are using the full expression ROW_NUMBER() OVER (PARTITION BY B.ID ORDER BY CASE WHEN "PQ"."Value" = 'N/A' THEN 1 ELSE 0 END, "PQ"."Value"), you really should not be getting 'N/A' unless the data is not as you initially stated.
