
I am loading a CSV file into PySpark as follows (within the pyspark shell):

>>> from pyspark.sql import SQLContext
>>> sqlContext = SQLContext(sc)
>>> df = sqlContext.read.format('com.databricks.spark.csv').options(header='true').load('data.csv')

but I am getting this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'SQLContext' object has no attribute 'read'

I am using Spark 1.3.1 and I am trying to use spark-csv.

1 Answer

You are trying to use Spark 1.4+ syntax.

For Spark 1.3, use the `load`/`save` API instead:

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

df = sqlContext.load(source="com.databricks.spark.csv", header="true", path="cars.csv")
df.select("year", "model").save("newcars.csv", "com.databricks.spark.csv")
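Note that spark-csv is an external package, so it must be on the classpath regardless of which syntax you use. A typical way to launch the shell with it (the exact version coordinate is an assumption — check the spark-csv README for the release matching your Scala/Spark version):

```shell
# Launch pyspark with the spark-csv package pulled from Maven.
# com.databricks:spark-csv_2.10:1.2.0 is an assumed coordinate; adjust as needed.
pyspark --packages com.databricks:spark-csv_2.10:1.2.0
```

The `--packages` flag (available since Spark 1.3.0) downloads the artifact and its dependencies and adds them to both the driver and executor classpaths.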

Comments

Actually, I am using the Python API example from the spark-csv module (github.com/databricks/spark-csv#python-api), which makes use of `read`, as I am doing.
@MedAli you are trying to use Spark 1.4+ syntax.
