I want to read data from PostgreSQL over JDBC and store it in a PySpark DataFrame. When I try to preview the data with methods like df.show() or df.take(), they fail with an error saying Caused by: java.lang.ClassNotFoundException: org.postgresql.Driver, but df.printSchema() returns the DB table's schema just fine. Here is my code:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.master("spark://spark-master:7077")
    .appName("read-postgres-jdbc")
    .config("spark.driver.extraClassPath", "/opt/workspace/postgresql-42.2.18.jar")
    .config("spark.executor.memory", "1g")
    .getOrCreate()
)
sc = spark.sparkContext

df = (
    spark.read.format("jdbc")
    .option("driver", "org.postgresql.Driver")
    .option("url", "jdbc:postgresql://postgres/postgres")
    .option("table", 'public."ASSET_DATA"')
    .option("dbtable", _select_sql)
    .option("user", "airflow")
    .option("password", "airflow")
    .load()
)

df.show(1)

Error log:

Py4JJavaError: An error occurred while calling o44.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 172.21.0.6, executor 1): java.lang.ClassNotFoundException: org.postgresql.Driver

Caused by: java.lang.ClassNotFoundException: org.postgresql.Driver

Edited 7/24/2021: The script was executed in JupyterLab, in a separate Docker container from the standalone Spark cluster.

5 Answers


You are not using the proper option. If you read the doc, you will see this:

Extra classpath entries to prepend to the classpath of the driver. Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-class-path command line option or in your default properties file.

This option is for the driver only. That is why retrieving the schema works: it is an action performed on the driver side. But when you run a Spark command, that command is executed by the workers (or executors), and they also need the .jar to access Postgres.

If your Postgres driver ("/opt/workspace/postgresql-42.2.18.jar") does not need any dependencies, then you can add it to the workers using spark.jars. I know MySQL does not require dependencies, for example, but I never tried Postgres. If it does need dependencies, then it is better to pull the package directly from Maven using the spark.jars.packages option (see the doc for help).
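
For example, here is a minimal sketch of the spark.jars.packages approach, reusing the master URL and app name from the question; the Maven coordinates org.postgresql:postgresql:42.2.18 are assumed to match the jar version mentioned above:

from pyspark.sql import SparkSession

# Sketch: pull the Postgres JDBC driver from Maven so that both the driver
# and the executors get it on their classpath.
spark = (
    SparkSession.builder.master("spark://spark-master:7077")
    .appName("read-postgres-jdbc")
    .config("spark.jars.packages", "org.postgresql:postgresql:42.2.18")
    .getOrCreate()
)

# Alternatively, if the local jar has no extra dependencies,
# ship it to the workers directly:
# .config("spark.jars", "/opt/workspace/postgresql-42.2.18.jar")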


I attempted various methods, but unfortunately none of them helped; I kept encountering the same error. I then tried the solution below. The Postgres package is downloaded automatically into the environment and made accessible.

import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[1]") \
    .appName('learn.com') \
    .config("spark.jars.packages", "org.postgresql:postgresql:42.6.0") \
    .getOrCreate()

To read the database table from Postgres, use the command below:

     jdbcDF = spark.read.format("jdbc") \
         .options(url='jdbc:postgresql://localhost:5432/postgres',  # jdbc:postgresql://<host>:<port>/<database>
                  dbtable='company',
                  user='postgres',
                  password='admin',
                  driver='org.postgresql.Driver') \
         .load()


You can also try adding:

.config("spark.executor.extraClassPath", "/opt/workspace/postgresql-42.2.18.jar"

So that the jar is included for your executors as well.
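
As a sketch under the question's setup, the builder would then look like this. Note that extraClassPath does not distribute the file, so the jar is assumed to already exist at the same path on every worker node:

from pyspark.sql import SparkSession

# Sketch: both the driver and the executors point at the same local jar path.
# The file is NOT shipped automatically; it must be present at this path on
# each worker node as well.
spark = (
    SparkSession.builder.master("spark://spark-master:7077")
    .appName("read-postgres-jdbc")
    .config("spark.driver.extraClassPath", "/opt/workspace/postgresql-42.2.18.jar")
    .config("spark.executor.extraClassPath", "/opt/workspace/postgresql-42.2.18.jar")
    .getOrCreate()
)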

1 Comment

This worked for me! I think it's important to say that the value of this argument has to be the exact path to the .jar. I was getting that error because I just copied the path from Jupyter without checking the location with the pwd command.

Scenario: connecting JupyterLab to a local/server host (with PostgreSQL installed) and writing JSON to a PostgreSQL DB.

  1. No need to add .config("spark.jars", "postgresql-42.7.0.jar") while creating the SparkSession, and we also can't set spark.conf.set("spark.jars", "postgresql-42.7.0.jar") at runtime.
  2. Just use the command below (and at the end of df.write do not forget to add .save(), otherwise nothing will be saved to the DB).
  3. Do not use a regular write statement like df.write.jdbc(url="url", table="table_name", mode="append", properties="properties"); you will get a URL parsing error.
  4. Download the jar (latest/required version) from https://jdbc.postgresql.org/download/postgresql-42.7.0.jar
  5. Add the downloaded jar file to Spark\spark-3.2.0-bin-hadoop3.2\jars
  6. Restart Jupyter/the server and it should work.

df.write.format("jdbc").mode("append") \
        .option("driver","org.postgresql.Driver") \
        .option("url","jdbc:postgresql://localhost:5432/postgres") \
        .option("dbtable","TABLENAME") \
        .option("user","postgres") \
        .option("password","PASSWORD") \
        .save()

It worked for me.


If you are using the spark-submit command to run your Spark job, do not forget to add the two parameters --driver-class-path and --jars.

Example: spark-submit --driver-class-path /path/toPostgresJar/postgresql-42.6.1.jar --jars postgresql-42.6.1.jar --master spark://localhost:7077 yourSparkJob.py
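
If you prefer not to manage the jar file yourself, spark-submit can also resolve the driver from Maven with --packages. A sketch, assuming the same 42.6.1 version and script name as the example above:

spark-submit --packages org.postgresql:postgresql:42.6.1 --master spark://localhost:7077 yourSparkJob.py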
