Sorry for the noob question, but I've been stuck on this problem for hours.

If I type:

df['avg_wind_speed_9am'].head()

It returns:

TypeError                                 Traceback (most recent call last)
<ipython-input-42-c01967246c17> in <module>()
----> 1 df['avg_wind_speed_9am'].head()
TypeError: 'Column' object is not callable

And if I type:

df[['avg_wind_speed_9am']].head()

It returns:

Row(avg_wind_speed_9am=2.080354199999768)

I don't understand; I expected it to print a column.

Here is how I loaded the DataFrame:

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.read.load('file:///home/cloudera/Downloads/big-data-4/daily_weather.csv', format='com.databricks.spark.csv', header='true', inferSchema='true')

Here is what my dataset looks like:

number,air_pressure_9am,air_temp_9am,avg_wind_direction_9am,avg_wind_speed_9am,max_wind_direction_9am,max_wind_speed_9am,rain_accumulation_9am,rain_duration_9am,relative_humidity_9am,relative_humidity_3pm
0,918.0600000000087,74.82200000000041,271.1,2.080354199999768,295.39999999999986,2.863283199999908,0.0,0.0,42.42000000000046,36.160000000000494
1,917.3476881177097,71.40384263106537,101.93517935618371,2.4430092157340217,140.47154847112498,3.5333236016106238,0.0,0.0,24.328697291802207,19.4265967985621
  • Can you please share your dataframe in the text form? Commented Nov 8, 2020 at 18:22
  • please post the dataframe in the form of text Commented Nov 8, 2020 at 18:23
  • Your error messages and output look like pyspark, not pandas. Commented Nov 8, 2020 at 18:27
  • Damn, I didn't know pyspark and pandas were different about that. Yes, I'm on pyspark. Commented Nov 8, 2020 at 18:29
  • Subscribing to what @Michael Szczesny said - I would try: df.select('avg_wind_speed_9am').head() to keep it more conventional Commented Nov 8, 2020 at 18:31

1 Answer

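Try one of the following:

df.select('avg_wind_speed_9am').head()   # returns the first Row

df.select('avg_wind_speed_9am').show()   # prints the first 20 rows as a table

n = 10
df.select('avg_wind_speed_9am').take(n)  # returns a list of the first n Rows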

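In PySpark you generally query the DataFrame rather than individual columns. Unlike in pandas, df['avg_wind_speed_9am'] is a Column expression, not a container of values, which is why calling .head() on it fails. To query a single column, use:

df.select(<list_of_cols>), where <list_of_cols> is a single column in your case.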
