
Due to my limited knowledge of PySpark/Python, I decided to write my query with spark.sql. I have written the query in two formats. The first spreads the statement over several lines using end-of-line backslashes; however, in that format I get an error, see below:

results5 = spark.sql("SELECT\
  appl_stock.Open\
 ,appl_stock.Close\
FROM appl_stock\
WHERE appl_stock.Close < 500")

The above format produces the following error:

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
~/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw)
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:

~/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    318                     "An error occurred while calling {0}{1}{2}.\n".
--> 319                     format(target_id, ".", name), value)
    320             else:

Py4JJavaError: An error occurred while calling o19.sql.
: org.apache.spark.sql.catalyst.parser.ParseException: 
mismatched input '.' expecting {<EOF>, ',', 'FROM', 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 35)

== SQL ==
SELECT  appl_stock.Open  appl_stock.CloseFROM appl_stockWHERE appl_stock.Close < 500
-----------------------------------^^^

    at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
    at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:45)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)


During handling of the above exception, another exception occurred:

ParseException                            Traceback (most recent call last)
<ipython-input-38-4920f7e68c0e> in <module>()
----> 1 results5 = spark.sql("SELECT  appl_stock.Open  appl_stock.CloseFROM appl_stockWHERE appl_stock.Close < 500")

~/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/session.py in sql(self, sqlQuery)
    539         [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'), Row(f1=3, f2=u'row3')]
    540         """
--> 541         return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
    542 
    543     @since(2.0)

~/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1131         answer = self.gateway_client.send_command(command)
   1132         return_value = get_return_value(
-> 1133             answer, self.gateway_client, self.target_id, self.name)
   1134 
   1135         for temp_arg in temp_args:

~/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw)
     71                 raise AnalysisException(s.split(': ', 1)[1], stackTrace)
     72             if s.startswith('org.apache.spark.sql.catalyst.parser.ParseException: '):
---> 73                 raise ParseException(s.split(': ', 1)[1], stackTrace)
     74             if s.startswith('org.apache.spark.sql.streaming.StreamingQueryException: '):
     75                 raise StreamingQueryException(s.split(': ', 1)[1], stackTrace)

ParseException: "\nmismatched input '.' expecting {<EOF>, ',', 'FROM', 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 35)\n\n== SQL ==\nSELECT  appl_stock.Open  appl_stock.CloseFROM appl_stockWHERE appl_stock.Close < 500\n-----------------------------------^^^\n"

Whereas the following code produces a successful result:

results6 = spark.sql("SELECT appl_stock.Open ,appl_stock.Close FROM appl_stock WHERE appl_stock.Close < 500")

Can someone let me know why the first code doesn't work, but the second does?

Cheers

6 Answers


Because you are using \ in the first one: the trailing backslash splices the next line onto the current one with no newline and no space, so tokens like Close and FROM run together in the string Spark receives. If you want to write multi-line SQL statements, use triple quotes:

results5 = spark.sql("""SELECT
  appl_stock.Open
 ,appl_stock.Close
FROM appl_stock
WHERE appl_stock.Close < 500""")
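To see why this matters, here is a plain-Python sketch (no Spark needed, and the expected contents are my own illustration) of what each style of string literal actually holds:

```python
# Trailing backslashes join physical lines with no newline and no space,
# so "appl_stock.Close" and "FROM" fuse into one token.
query_backslash = "SELECT\
  appl_stock.Open\
 ,appl_stock.Close\
FROM appl_stock\
WHERE appl_stock.Close < 500"

# Triple quotes preserve the newlines, so Spark's parser sees the
# statement laid out exactly as written.
query_triple = """SELECT
  appl_stock.Open
 ,appl_stock.Close
FROM appl_stock
WHERE appl_stock.Close < 500"""

print(repr(query_backslash))  # note the fused 'appl_stock.CloseFROM appl_stock'
print("\n" in query_triple)   # True: the line breaks survive
```

This reproduces the mangled one-line SQL shown in the traceback (`...appl_stock.CloseFROM appl_stockWHERE...`).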



In my case, I was getting the error mismatched input '*' expecting <EOF>. Then I realized I was using spark.table( instead of spark.sql(. I thought this might be useful for someone...



Bizarrely, this error was thrown for me when I terminated my SQL statement with ;. Simply removing the semicolon fixed it.

i.e.

spark.sql("""
SELECT * FROM foo WHERE bar = 3;
""")

breaks.

spark.sql("""
SELECT * FROM foo WHERE bar = 3
""")

does not.
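If you can't control the incoming statement (say it is copy-pasted from a SQL editor that requires semicolons), one defensive option is to strip the terminator before calling spark.sql. The helper below is my own sketch, not part of any Spark API:

```python
def strip_trailing_semicolon(sql: str) -> str:
    """Drop surrounding whitespace and any trailing semicolons so the
    statement is safe to hand to spark.sql (hypothetical helper)."""
    return sql.strip().rstrip(";").rstrip()

print(strip_trailing_semicolon("SELECT * FROM foo WHERE bar = 3;\n"))
# -> SELECT * FROM foo WHERE bar = 3
```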



The error message explains exactly what's happening: because each backslash immediately follows the last word on its line, there is no space before the FROM and WHERE keywords once the lines are joined.

For example, if you had the following DataFrame:

df = spark.createDataFrame([(490, 495), (499, 505), (510, 499)], ["Open", "Close"])
df.show()
#+----+-----+
#|Open|Close|
#+----+-----+
#| 490|  495|
#| 499|  505|
#| 510|  499|
#+----+-----+

df.createOrReplaceTempView("appl_stock")

If you changed your code to add a space before each \:

results5 = spark.sql("SELECT \
  appl_stock.Open \
 ,appl_stock.Close \
FROM appl_stock \
WHERE appl_stock.Close < 500")

results5.show()
#+----+-----+
#|Open|Close|
#+----+-----+
#| 490|  495|
#| 510|  499|
#+----+-----+



A similar error message would be thrown if extraneous characters such as | are present in the SQL query string.



Another surprising case: I was getting an error of the same flavour, namely mismatched input 'from' expecting <EOF>. My query contained a parameter and I forgot to resolve it (note that the initial f is missing):

# failing query
my_qry = """
create table my_tab as 
select columns 
from source_tab
where somecolumn < {parameter}
"""
spark.sql(my_qry).collect()

So the solution was to resolve the parameters properly (mind the initial f):

# correct query
my_qry = f"""
create table my_tab as 
select columns 
from source_tab
where somecolumn < {parameter}
"""
spark.sql(my_qry).collect()

To summarise, in my case the error had nothing to do with the missing EOF. I hope someone finds this helpful.
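A quick plain-Python illustration of the difference (variable names are my own): without the f prefix, {parameter} passes through literally, and Spark's parser then trips over the braces.

```python
parameter = 500

# No f prefix: the placeholder survives as literal text.
plain = "select * from source_tab where somecolumn < {parameter}"

# With the f prefix, Python substitutes the value before Spark ever sees it.
formatted = f"select * from source_tab where somecolumn < {parameter}"

print(plain)      # braces are still in the string
print(formatted)  # -> select * from source_tab where somecolumn < 500
```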

1 Comment

The f-string way is more readable!
