
I'm trying to read a table from a Postgres DB with PySpark. I have set up the following code and verified that the SparkContext exists:

import os

# Make the Postgres JDBC driver available to the driver and executor classpaths
os.environ['PYSPARK_SUBMIT_ARGS'] = '--driver-class-path /tmp/jars/postgresql-42.0.0.jar --jars /tmp/jars/postgresql-42.0.0.jar pyspark-shell'


from pyspark import SparkContext, SparkConf

conf = SparkConf()
conf.setMaster("local[*]")
conf.setAppName('pyspark')

sc = SparkContext(conf=conf)


from pyspark.sql import SQLContext

properties = {
    "driver": "org.postgresql.Driver"
}
url = 'jdbc:postgresql://tom:@localhost/gqp'

# Read the table over JDBC into a DataFrame
sqlContext = SQLContext(sc)
df = sqlContext.read \
    .format("jdbc") \
    .option("url", url) \
    .option("driver", properties["driver"]) \
    .option("dbtable", "specimen") \
    .load()

I get the following error:

Py4JJavaError: An error occurred while calling o812.load. : java.lang.NullPointerException

My database is named gqp, the table is specimen, and I have verified that Postgres is running on localhost using the Postgres.app macOS app.

1 Answer


The URL was the problem!

Originally it was: url = 'jdbc:postgresql://tom:@localhost/gqp'

I removed the tom:@ part, and it worked. A JDBC URL must follow the pattern jdbc:postgresql://host:port/db_name. Mine was copied directly from a Flask project, where SQLAlchemy-style URLs embed credentials as user:password@host, a syntax the Postgres JDBC driver does not accept.
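For reference, a minimal sketch of the corrected read. The port 5432 is the Postgres default, and the user option is only needed if your server requires authentication (Postgres.app trusts local connections by default, which may be why dropping the credentials worked); user and password are standard options of Spark's JDBC source, and tom here is just an example name:

url = 'jdbc:postgresql://localhost:5432/gqp'  # host:port/db_name, no credentials in the URL

df = sqlContext.read \
    .format("jdbc") \
    .option("url", url) \
    .option("driver", "org.postgresql.Driver") \
    .option("dbtable", "specimen") \
    .option("user", "tom") \
    .load()

df.show()  # sanity check that rows come back from the specimen table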

If you're reading this, hope you didn't make this same mistake :)

