5

I have a table with these values: "user_id, text, text_lang, user_lang".

The field text_lang contains a locale shortcut ('en','de') representing the language of the text. I now want to set the (unknown/empty) 'user_lang' by selecting the text_lang that has the maximum occurrence for each user_id.

I have a working select statement that returns the text_lang with the most occurrences for each user_id, but I cannot figure out how to update the user_lang of each row based on the ID.

SELECT user_id, text_lang
FROM (
    SELECT  user_id,
        text_lang,
        max(text_lang_count) OVER (PARTITION BY user_id) max_count,
        text_lang_count
    FROM (
        SELECT  user_id,
                text_lang,
                COUNT(text_lang) AS text_lang_count
        FROM test
        GROUP BY user_id, text_lang
    ) AS xx
) AS xy
WHERE tweet_lang_count = max_count

I assume (but am not certain) that MySQL/SQL: Update with correlated subquery from the updated table itself contains something closely related to my question, but I have been unable to adapt that solution to this specific problem.

2 Answers 2

7

SQL Fiddle

with lang as (
    select distinct on (user_id)
        user_id, text_lang, text_lang_count
    from (
        select
            user_id,
            text_lang,
            count(text_lang) as text_lang_count
        from t
        group by user_id, text_lang
    ) s
    order by 1, 3 desc, text_lang != 'en' -- in a tie english wins
)
update t
set user_lang = lang.text_lang
from lang
where t.user_id = lang.user_id

Calculate the predominant language in the CTE (with) and use it in the from clause

Sign up to request clarification or add additional context in comments.

1 Comment

Works like a charm. Thank you for the full query solution and bonus cookies for even considering the "tie" case!First time using "with as" was useful for me. Can't upvote your answer yet, but I accepted it.
2

The correlated update in PostgreSQL has a different syntax than MySQL. It can be written like this:

UPDATE tablename T
 SET user_lang=subquery.text_lang
 FROM ( SELECT user_id,text_lang FROM... etc... ) subquery
 WHERE T.user_id=subquery.user_id

where the subquery is exactly your SELECT query from the question.

The above is non-standard, though. The form that complies to the SQL standard would be like:

UPDATE tablename T
 SET user_lang=(SELECT text_lang FROM ... WHERE... AND user_id=T.user_id)

but with this form you would have to reshape your initial query to compute the desired text_lang for just one specific user_id.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.