
I'm trying to load streaming data from Kafka into SQL Server Big Data Clusters Data Pools. I'm using Spark 2.4.5 (Bitnami 2.4.5 spark image).

If I want to load data into regular tables, I use this statement and it works well:

logs_df.write.format('jdbc').mode('append') \
    .option('driver', 'com.microsoft.sqlserver.jdbc.SQLServerDriver') \
    .option('url', 'jdbc:sqlserver://XXX.XXX.XXX.XXXX:31433;databaseName=sales;') \
    .option('user', user) \
    .option('password', password) \
    .option('dbtable', 'SYSLOG_TEST_TABLE') \
    .save()

But the same statement used to load data into a SQL Data Pool gives me this error:

py4j.protocol.Py4JJavaError: An error occurred while calling o93.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 3, localhost, executor driver): java.sql.BatchUpdateException: External Data Pool Table DML statement cannot be used inside a user transaction.

I found that the way to load data into a SQL Data Pool is to use the 'com.microsoft.sqlserver.jdbc.spark' format, like this:

logs_df.write.format('com.microsoft.sqlserver.jdbc.spark').mode('append') \
    .option('url', url) \
    .option('dbtable', datapool_table) \
    .option('user', user) \
    .option('password', password) \
    .option('dataPoolDataSource', datasource_name) \
    .save()

But it's giving me this error:

py4j.protocol.Py4JJavaError: An error occurred while calling o93.save.
: java.lang.ClassNotFoundException: Failed to find data source: com.microsoft.sqlserver.jdbc.spark. Please find packages at http://spark.apache.org/third-party-projects.html

I'm running the script with spark-submit like this:

docker exec spark245_spark_1 /opt/bitnami/spark/bin/spark-submit --driver-class-path /opt/bitnami/spark/jars/mssql-jdbc-8.2.2.jre8.jar --jars /opt/bitnami/spark/jars/mssql-jdbc-8.2.2.jre8.jar --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5 /storage/scripts/some_script.py

Is there any other package I should include or some special import I'm missing?

Thanks in advance

Edit: I've tried in Scala with the same results.

3 Answers


You need to build the repository into a jar file first using SBT, then include it in your Spark cluster.

I know a lot of people will have trouble building this jar file (myself included, a few hours ago), so I will guide you through building it, step by step:

  1. Go to https://www.scala-sbt.org/download.html to download SBT, then install it.

  2. Go to https://github.com/microsoft/sql-spark-connector and download the zip file.

  3. Open the folder of the repository you just downloaded, right-click in the blank space and click "Open PowerShell window here". https://i.sstatic.net/Fq7NX.png

  4. In the shell window, type "sbt" and press Enter. It may require you to download the Java Development Kit. If so, go to https://www.oracle.com/java/technologies/javase-downloads.html to download and install it. You may need to close and reopen the shell window after installing.

If things go right, you should see this screen: https://i.sstatic.net/fMxVr.png

  5. After the above step finishes, type "package". The shell will show you something like this, and it may take a long time to finish. https://i.sstatic.net/hr2hw.png

  6. After the build is done, go to the "target" folder, then the "scala-2.11" folder, to get the jar file. https://i.sstatic.net/Aziqy.png

  7. After you have the jar file, include it in the Spark cluster.
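Once you have the jar, a minimal sketch of attaching it when building the session (the jar path below is hypothetical; substitute wherever you copied the file in your cluster):

```python
# Hypothetical path to the jar produced by `sbt package` above; adjust
# to wherever you copied it.
CONNECTOR_JAR = "/opt/bitnami/spark/jars/spark-mssql-connector_2.11-1.0.2.jar"

def build_session(jar_path=CONNECTOR_JAR):
    # Lazy import so the module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession
    # spark.jars ships the connector to the driver and executors, so the
    # 'com.microsoft.sqlserver.jdbc.spark' data source can be resolved.
    return (SparkSession.builder
            .appName("datapool-write")
            .config("spark.jars", jar_path)
            .getOrCreate())
```

Passing the same jar with `--jars` on spark-submit works as well.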

OR, if you don't want to do the troublesome procedures above....

UPDATE MAY 26, 2021: The connector is now available in Maven, so you can just go there and do the rest.

https://mvnrepository.com/artifact/com.microsoft.azure/spark-mssql-connector
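With the connector on Maven you can pull it by coordinate instead of building locally. A sketch, assuming Spark 2.4 / Scala 2.11 (check the Maven page for the version matching your Spark; the `spark-mssql-connector_2.12` artifacts target Spark 3.x):

```python
# Maven coordinate of the connector; this unsuffixed artifact is the
# Spark 2.4 / Scala 2.11 line.
CONNECTOR_COORD = "com.microsoft.azure:spark-mssql-connector:1.0.2"

def build_session(coord=CONNECTOR_COORD):
    from pyspark.sql import SparkSession  # lazy import
    # spark.jars.packages downloads the artifact (and its dependencies)
    # from Maven when the session starts.
    return (SparkSession.builder
            .appName("datapool-write")
            .config("spark.jars.packages", coord)
            .getOrCreate())
```

The command-line equivalent is adding `--packages com.microsoft.azure:spark-mssql-connector:1.0.2` to spark-submit.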

If you need more information, just comment. I will try my best to help.


1 Comment

I found version 1.2 in the Maven list; it installed fine in Databricks and solved the problem. Thanks.

According to the documentation: "To include the connector in your projects, download this repository and build the jar using SBT."

So you need to build the connector JAR file using the build.sbt in the repository, then put the JAR file into Spark's jars folder: your_path\spark\jars

To do this, download SBT here: https://www.scala-sbt.org/download.html. Open SBT in the directory where you saved the build.sbt, then run sbt package. A target folder will be created in the same directory, and the JAR file will be in target\scala-2.11.
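With the jar in place, the data pool write from the question should resolve. A sketch (the parameter names here just mirror the question's variables):

```python
def write_to_datapool(df, url, datapool_table, user, password, datasource_name):
    # With the connector jar on the classpath, this format string resolves
    # instead of raising ClassNotFoundException.
    (df.write
       .format("com.microsoft.sqlserver.jdbc.spark")
       .mode("append")
       .option("url", url)
       .option("dbtable", datapool_table)
       .option("user", user)
       .option("password", password)
       .option("dataPoolDataSource", datasource_name)
       .save())
```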



I was facing the same issue writing to SQL Server from Spark, so I tried the approach in this thread:

https://sqlrelease.com/read-and-write-data-to-sql-server-from-spark-using-pyspark

Steps mentioned in the above thread:

  1. Download the driver file.

  2. Unzip it and get the "sqljdbc42.jar" file from the "sqljdbc_6.0\enu\jre8" location (if you are using Java 8).

  3. Copy it to Spark's jars folder. In our case it is C:\Spark\spark-2.4.3-bin-hadoop2.7\jars.

  4. Start a new SparkSession if required.

Note: I am using Spark 3.5, compared to your 2.4.5.

Also, make sure you stop the Spark session, start a new one, and then try again.
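Restarting the session can be sketched like this (the app name is just an example):

```python
def restart_session(spark, app_name="kafka-streaming-app"):
    from pyspark.sql import SparkSession  # lazy import
    # Stop the current session so newly added jars/packages are picked up,
    # then build a fresh one.
    spark.stop()
    return SparkSession.builder.appName(app_name).getOrCreate()
```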

Code I implemented:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-streaming-app") \
        .config("spark.streaming.stopGracefullyOnShutdown", True) \
        .config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0") \
        .config("spark.sql.shuffle.partitions", 4) \
        .master("local[2]").getOrCreate()

kafka_df = spark.readStream.format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9093") \
    .option("subscribe", "device-data") \
    .option("startingOffsets", "earliest") \
    .load()

url = 'jdbc:sqlserver://localhost:1433;database=mydb;'

# A streaming DataFrame cannot use .write directly, so write each
# micro-batch to SQL Server through JDBC inside foreachBatch.
def write_batch(batch_df, batch_id):
    batch_df.write.mode("append") \
        .format("jdbc") \
        .option("url", url) \
        .option("dbtable", "event") \
        .option("user", "demo") \
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
        .option("password", "PassTest123@") \
        .save()

kafka_df.writeStream.foreachBatch(write_batch).start().awaitTermination()
