I got this error when I tried to write a Spark DataFrame to a Postgres database. I am using a local cluster and the code is as follows:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
import os

os.environ["SPARK_CLASSPATH"] = '/usr/share/java/postgresql-jdbc4.jar'

conf = SparkConf() \
    .setMaster('local[2]') \
    .setAppName("test")

sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

df = sc.parallelize([("a", "b", "c", "d")]).toDF()

url_connect = "jdbc:postgresql://localhost:5432"
table = "table_test"
mode = "overwrite"
properties = {"user":"postgres", "password":"12345678"}
df.write.option('driver', 'org.postgresql.Driver').jdbc(
     url_connect, table, mode, properties)

The error log is as follows:

Py4JJavaError: An error occurred while calling o119.jdbc.
: java.lang.NullPointerException
at  org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:308)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)

I have tried searching for an answer on the web but could not find one. Thank you in advance!

  • Did you check out this post? stackoverflow.com/questions/30983982/… Commented Aug 9, 2016 at 13:35
  • Maybe this one will help? stackoverflow.com/questions/33574807/… Commented Aug 9, 2016 at 14:43
  • Thanks both. But I still cannot figure out what caused the NullPointerException. Commented Aug 9, 2016 at 23:50
  • Did you find a solution to this problem? Commented Apr 28, 2019 at 9:46

2 Answers


Have you tried specifying the database in your table_test variable? I have a similar implementation that looks like this:

mysqlUrl = "jdbc:mysql://mysql:3306"
properties = {'user': 'root',
              'password': 'password',
              'driver': 'com.mysql.cj.jdbc.Driver'}
table = 'db_name.table_name'

# assumes an existing SparkSession named `spark`
try:
    schemaDF = spark.read.jdbc(mysqlUrl, table, properties=properties)
    print('schema DF loaded')
except Exception as e:
    print('schema DF does not exist!')
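
Applied to the question's PostgreSQL setup, that means putting the database name in the JDBC URL. A minimal sketch, assuming the database is named testdb (substitute your actual database name):

url_connect = "jdbc:postgresql://localhost:5432/testdb"  # database name appended
table = "table_test"
properties = {"user": "postgres",
              "password": "12345678",
              "driver": "org.postgresql.Driver"}

# "overwrite" drops and recreates table_test if it already exists
df.write.jdbc(url_connect, table, mode="overwrite", properties=properties)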

I also had the same problem, using MySQL.

The way to solve it is to find the right JDBC driver JAR.
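
For example, instead of setting the deprecated SPARK_CLASSPATH environment variable as the question does, the driver JAR can be handed to Spark through its configuration. A minimal sketch, reusing the JAR path from the question (adjust it to wherever your driver JAR actually lives):

from pyspark import SparkConf, SparkContext

# spark.jars distributes the listed JARs to the driver and executors
conf = SparkConf() \
    .setMaster('local[2]') \
    .setAppName('test') \
    .set('spark.jars', '/usr/share/java/postgresql-jdbc4.jar')

sc = SparkContext(conf=conf)

With the JAR on the classpath this way, the org.postgresql.Driver (or com.mysql.cj.jdbc.Driver) class can be resolved when the JDBC write runs.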

1 Comment

Hi, do you mean the Java JAR?
