
I use PySpark to read an HBase table as a DataFrame, but something went wrong:

from pyspark import SparkContext
from pyspark.sql import SparkSession

sc = SparkContext(master="local[*]", appName="test")
spark = SparkSession.builder.getOrCreate()

df = spark.read.format('org.apache.hadoop.hbase.spark') \
    .option('hbase.table', 'h_table') \
    .option('hbase.columns.mapping',
            'life_id STRING :key, score STRING info:total_score') \
    .option('hbase.use.hbase.context', False) \
    .option('hbase.config.resources', 'file:///home/softs/hbase-2.0.5/conf/hbase-site.xml') \
    .option('hbase-push.down.column.filter', False) \
    .load()

df.show()

It shows: java.lang.ClassNotFoundException: Failed to find data source: org.apache.hadoop.hbase.spark. Please find packages at http://spark.apache.org/third-party-projects.html

I followed the demo.

1 Answer


The dependency is not packaged with your JAR. If you don't wish to package the dependency in your project, use the --packages flag of spark-submit to pass the coordinates of the connector you are using.

add the following lines to your spark-submit command:

--packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 --repositories http://repo.hortonworks.com/content/groups/public/

and it should work.
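Putting the flags above into a full command might look like the following sketch. The script name my_job.py is a placeholder for your own PySpark script; the package and repository coordinates are the ones from the answer above:

```shell
# Submit the PySpark job, letting Spark fetch the HBase connector
# (the SHC connector from the Hortonworks repo) at launch time.
# "my_job.py" is a hypothetical script name -- substitute your own.
spark-submit \
  --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 \
  --repositories http://repo.hortonworks.com/content/groups/public/ \
  my_job.py
```

Spark downloads the listed artifact and its transitive dependencies into its own cache and adds them to both the driver and executor classpaths, so no pom.xml or build.sbt is needed on the PySpark side.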


3 Comments

@littlely could you share your pom.xml or build.sbt file you are using for the code
This is PySpark; I have no pom.xml.
@littlely try this link. The person has found the solution for PySpark
