I have a column that holds UTM tags pulled in from Google Ads. Each value contains a campaign ID (the part before the "___") followed by the ad group ID. In some cases we only have a campaign ID, and it can be a string rather than a number, which is why I am casting with ::TEXT.

This is what the UTM tags look like when pulled in.

835783587___42385125483
eu
968720083___47551372269
en_usa_search_brand
648594695___38174608372
886097479___45386492795
en_trust_control
competitors
es
en_esp_search_route
1072851000___55370810634

I'm trying to split the two IDs apart, remove the underscores, and then push the results to another table.

umc.campaign is the column that contains the UTM tag.

I'm creating this temp table to then push to the final table below.

CREATE TABLE reports.tmp_sem_attribution AS (
    SELECT DISTINCT ON (umc.user_id)
        umc.user_id,
        umc.source,
        umc.campaign::TEXT,
        (SPLIT_PART(REPLACE(campaign, '__', '_'), '__', 1))::TEXT AS campaign_id,
        (SPLIT_PART(REPLACE(campaign, '__', '_'), '__', 2))::TEXT AS adgroup_id,

When I use the query below to check the results, I can see that some of the ad group IDs are empty or have a space in them.

reports.sem_attribution_v2 is the table where I am pushing the IDs into two separate columns.

SELECT * FROM reports.sem_attribution_v2 WHERE adgroup_id =''

**RESULT**
campaign_id                     adgroup_id
eu
1560591282
en_usa_search_brand
1560608121
en_trust_control
1560591282
en_fra_search_generic_manual
990427417
eu

If you could shed some light on how I could approach this differently, or on whether this query is incorrect, that would be much appreciated.

Thanks.

  • Can you provide source text for campaign ids: 1560608121, 1560591282 and 990427417? Commented Dec 17, 2018 at 13:38
  • @Adam For those 3 ID's the source text is exactly the same as the above, it only contains the campaign ID with no brackets or Adgroup ID's Commented Dec 17, 2018 at 14:13

2 Answers

You may use REGEXP_REPLACE:

SELECT   REGEXP_REPLACE(campaign,'(\d+)___\d+','\1') as campaign_id,
         REGEXP_REPLACE(campaign,'\d+___(\d+)','\1') as adgroup_id
                 FROM t;
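
To see how this behaves on both kinds of input, here is a minimal sketch that runs the same expressions against a few sample values from the question. The CTE named t is only a stand-in for the real table, and the sketch assumes PostgreSQL with standard_conforming_strings left at its default (on), so the backslashes need no doubling.

```sql
-- Sample values copied from the question; "t" is a stand-in for the real table.
WITH t(campaign) AS (
    VALUES ('835783587___42385125483'),
           ('en_usa_search_brand'),
           ('eu')
)
SELECT campaign,
       REGEXP_REPLACE(campaign, '(\d+)___\d+', '\1') AS campaign_id,
       REGEXP_REPLACE(campaign, '\d+___(\d+)', '\1') AS adgroup_id
FROM t;
```

For the first row the two columns come back as 835783587 and 42385125483. For the rows without "___" the pattern does not match, so REGEXP_REPLACE returns the input unchanged and both columns hold the full campaign string. If an empty or NULL ad group ID is wanted for those rows instead, an extra condition is needed.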

Or use SUBSTRING with a CASE expression:

SELECT CASE 
         WHEN campaign ~ '(\d+)___(\d+)' THEN 
         substring(campaign FROM '(\d+)___')  --extracts the digits before "___"
         ELSE campaign                        --same string when pattern not found
       END AS campaign_id, 
       CASE 
         WHEN campaign ~ '(\d+)___(\d+)' THEN 
         substring(campaign FROM '___(\d+)')  --extracts the digits after "___"
         ELSE campaign                        --same string when pattern not found
       END AS adgroup_id 
FROM   t; 


1 Comment

Thanks Kaushik, the REGEXP_REPLACE statement has done the job.

The SPLIT_PART function returns an empty string when the split produces fewer fields than the one requested. For example, when there is only one field and you ask for the second, you get an empty string. That is the correct behaviour, and it is fine for your approach.

You can simplify your query, because the REPLACE part is not necessary:

(SPLIT_PART(campaign, '___', 1))::TEXT AS campaign_id,
(SPLIT_PART(campaign, '___', 2))::TEXT AS adgroup_id
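
As a quick check, the following sketch applies the simplified expressions to sample values from the question. The CTE named t is a hypothetical stand-in for the real source table.

```sql
-- Sample values copied from the question; "t" is a stand-in for the real table.
WITH t(campaign) AS (
    VALUES ('835783587___42385125483'),
           ('en_usa_search_brand'),
           ('eu')
)
SELECT campaign,
       SPLIT_PART(campaign, '___', 1) AS campaign_id,  -- whole string when "___" is absent
       SPLIT_PART(campaign, '___', 2) AS adgroup_id    -- empty string when "___" is absent
FROM t;
```

The first row splits into 835783587 and 42385125483. The other two rows keep their full value as campaign_id and get an empty adgroup_id, which matches the rows returned by the check query in the question.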

Another improvement could be to replace empty strings with NULL values. You can do it while inserting data into reports.sem_attribution_v2 table:

CASE WHEN adgroup_id = '' THEN NULL ELSE adgroup_id END
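
The same conversion can also be written with NULLIF, which returns NULL when its two arguments are equal. This is an alternative spelling of the CASE expression above rather than something taken from the original query, and the CTE named t is again a hypothetical stand-in for the source table.

```sql
-- NULLIF(x, '') yields NULL when x is the empty string, otherwise x.
WITH t(campaign) AS (
    VALUES ('835783587___42385125483'),
           ('en_usa_search_brand')
)
SELECT SPLIT_PART(campaign, '___', 1)             AS campaign_id,
       NULLIF(SPLIT_PART(campaign, '___', 2), '') AS adgroup_id
FROM t;
```

With this form the NULL is produced at the point where the value is split, so no separate step is needed when inserting into reports.sem_attribution_v2.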

1 Comment

Thanks for the help Adam
