I have several SQL queries to run via pandas (pandas.io.sql / .read_sql) that share a very similar structure, so I am trying to parameterize them.

I am wondering whether there is a way to pass in column names and expressions using .format, the way it already works for string values.

My query (truncated to simplify this post):

sql= '''
SELECT DISTINCT 
    CAST(report_suite AS STRING) AS report_suite, post_pagename,
    COUNT(DISTINCT(CONCAT(post_visid_high,post_visid_low))) AS unique_visitors 
FROM 
    FOO.db
WHERE 
    date_time BETWEEN '{0}' AND '{1}'
    AND report_suite = '{2}'
GROUP BY 
    report_suite, post_pagename
ORDER BY 
    unique_visitors DESC
'''.format(*parameters)

What I would like to do is parameterize the COUNT(DISTINCT(CONCAT(post_visid_high, post_visid_low))) AS unique_visitors expression

like this somehow:

COUNT(DISTINCT({3})) AS {4}

The problem I can't get around is that this seems to require storing the column names as something other than strings, so that they do not end up quoted. Are there any good ways around this?

  • For starters: please do not pass values to SQL queries using string formatting. It is a bad habit that is both insecure and overly complicated (quoting etc.). Use suitable placeholders and pass your params to read_sql() separately using the params keyword argument. Commented Dec 7, 2017 at 15:54
  • Thanks for the tip. Do you have any recommendations on where I can find documentation on this method? Commented Dec 7, 2017 at 15:58
  • Second, if you need dynamic SQL, you might as well learn to use SQLAlchemy Core, which is well suited for query building. Commented Dec 7, 2017 at 15:59
  • Regarding placeholders, params, and documentation: start by reading pandas' docs on read_sql(), especially the params kwarg. From there, move on to reading your DB-API driver's docs on the matter. And do read the SQLA Core docs as well. Commented Dec 7, 2017 at 16:06
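
To illustrate what the comments above suggest, here is a minimal sketch of passing values through the params keyword instead of string formatting. It assumes an in-memory SQLite database, a made-up table called hits, and SQLite's :name placeholders; none of these come from the question, and your driver may expect a different placeholder style.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory table standing in for the real data source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (report_suite TEXT, post_pagename TEXT, date_time TEXT)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [
        ("111", "home", "2017-03-01"),
        ("111", "cart", "2017-06-15"),
        ("222", "home", "2017-06-15"),
        ("111", "home", "2018-01-01"),
    ],
)

# Placeholders are NOT wrapped in quotes; the driver binds the values itself.
sql = """
SELECT post_pagename, COUNT(*) AS n_hits
FROM hits
WHERE date_time BETWEEN :date_from AND :date_to
  AND report_suite = :report_suite
GROUP BY post_pagename
ORDER BY post_pagename
"""

params = {
    "date_from": "2017-01-01",
    "date_to": "2017-12-01",
    "report_suite": "111",
}

# Values travel separately from the SQL text.
df = pd.read_sql(sql, conn, params=params)
print(df)
```

The query text stays constant across runs; only the params dict changes.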

1 Answer

Consider the following approach:

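import pandas as pd

# SQL fragments (expressions, aliases) cannot be bound as parameters, so they
# are inserted with str.format(). Only use trusted, hard-coded values here.
sql_dynamic_parms = dict(
  func1='CONCAT(post_visid_high,post_visid_low)',
  name1='unique_visitors'
)

sql= '''
SELECT DISTINCT
    CAST(report_suite AS STRING) AS report_suite, post_pagename,
    COUNT(DISTINCT({func1})) AS {name1}
FROM
    FOO.db
WHERE
    date_time BETWEEN %(date_from)s AND %(date_to)s
    AND report_suite = %(report_suite)s
GROUP BY
    report_suite, post_pagename
ORDER BY
    {name1} DESC
'''.format(**sql_dynamic_parms)

# Values are passed separately and bound by the driver,
# so the placeholders above must not be wrapped in quotes.
params = dict(
  date_from=pd.to_datetime('2017-01-01'),
  date_to=pd.to_datetime('2017-12-01'),
  report_suite=111
)

# conn is assumed to be an already-open database connection
df = pd.read_sql(sql, conn, params=params)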
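
Because the expression and alias are formatted straight into the SQL text, they are only safe when they come from your own code. If they could ever come from user input, one option is to check them against an allowlist first. The following is a rough sketch of that idea; the dictionary keys, the extra "page" entry, and the helper name build_count_distinct are all hypothetical.

```python
# Hypothetical allowlists: the only expressions and aliases the query may use.
ALLOWED_EXPRESSIONS = {
    "visitor_id": "CONCAT(post_visid_high, post_visid_low)",
    "page": "post_pagename",
}
ALLOWED_ALIASES = {"unique_visitors", "unique_pages"}


def build_count_distinct(expr_key, alias):
    """Return a COUNT(DISTINCT ...) fragment built only from allowlisted parts."""
    if expr_key not in ALLOWED_EXPRESSIONS:
        raise ValueError("unknown expression key: {!r}".format(expr_key))
    if alias not in ALLOWED_ALIASES:
        raise ValueError("unknown alias: {!r}".format(alias))
    return "COUNT(DISTINCT({})) AS {}".format(ALLOWED_EXPRESSIONS[expr_key], alias)


fragment = build_count_distinct("visitor_id", "unique_visitors")
print(fragment)
```

The returned fragment can then be dropped into the query template with .format, while the date and report suite values still go through params.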

PS: you may want to read PEP 249 to see which parameter placeholder styles exist. The style your query must use (for example %(name)s or :name) depends on your database driver.
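
As a quick reference, PEP 249 defines five paramstyle values, and each DB-API module reports the one it uses in its paramstyle attribute. The sketch below uses sqlite3 only because it ships with Python; the placeholder helper is a hypothetical illustration, not part of any library.

```python
import sqlite3


def placeholder(paramstyle, name, position):
    """Return the placeholder text for one parameter in a PEP 249 paramstyle."""
    styles = {
        "qmark": "?",                         # WHERE x = ?
        "numeric": ":{}".format(position),    # WHERE x = :1
        "named": ":{}".format(name),          # WHERE x = :name
        "format": "%s",                       # WHERE x = %s
        "pyformat": "%({})s".format(name),    # WHERE x = %(name)s
    }
    return styles[paramstyle]


# Every DB-API module exposes the style it expects as a module attribute.
print(sqlite3.paramstyle)

# The %(date_from)s placeholders in the answer are the "pyformat" style.
print(placeholder("pyformat", "date_from", 1))
```

Checking your own driver's paramstyle attribute tells you which of these forms to write in the query.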


5 Comments

Thanks! This is exactly what I was looking to do.
@Jasonc200, glad I could help :)
@IljaEverilä, do you mean that using :name instead of %(name)s is a preferred way? (PEP-249)
Nah, I mean remove the leftover quotes from around it. They'll cause trouble: for example, a resulting query might contain ''2017-01-01'', which is clearly wrong.
@IljaEverilä, duh! Thank you! :-)
