
I'm working on a pandas DataFrame problem and am having trouble converting some of the data into a usable format for a scatter plot.

Here is the code. Please let me know what I am doing wrong and how I can correct it going forward. Honest criticism is welcome, as I am a beginner.

# Import Data
df = pd.read_csv(filepath + 'BaltimoreData.csv')

df = df.dropna()
print(df.head(20))
# These are two categories within the data
df.plot(df['Bachelors degree'], df['Median Income'])

# Plotting the Data
df.plot(kind = 'scatter', x = 'Bachelor degree', y = 'Median Income')
df.plot(kind = 'density')
  • Forget the code, where's your data? Please print(df.head(20)) and post its output here. Commented Oct 22, 2017 at 23:11
  • I added the heading so you can see the first 20 lines of data. Commented Oct 23, 2017 at 22:58
  • Unfortunately, I don't have access to your computer, so I cannot load your data from your filepath. While it seems your issue was resolved this time, please look at how to provide a minimal reproducible example in the future which helps us give you better answers. Commented Oct 23, 2017 at 22:59

2 Answers

2

Simply plot y against x as below, where df is your dataframe and x and y are your independent and dependent variables, respectively:

import matplotlib.pyplot as plt
import pandas

plt.scatter(x=df['Bachelors degree'], y=df['Median Income'])
plt.show()

2 Comments

When I run that I get the following error message: could not convert string to float: '$37,678 '
Well, you've got Median Income formatted as a string: read_csv cannot parse values like '$37,678 ' as numbers because of the dollar sign and the comma, so it loads the column as strings (i.e. text). You could simply change it to be formatted as a plain number in your CSV.
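
If editing the CSV is not convenient, the column can also be cleaned after loading. The following is only a sketch under assumptions: the inline sample data is hypothetical (the real BaltimoreData.csv was never posted), and it assumes every value looks like '$37,678 ', that is, a dollar sign, thousands separators, and possibly trailing whitespace.

```python
import io
import pandas as pd

# Hypothetical stand-in for BaltimoreData.csv; only the column names and
# the '$37,678 ' value format are taken from the thread.
csv_text = io.StringIO(
    'Bachelors degree,Median Income\n'
    '25.1,"$37,678 "\n'
    '40.3,"$52,100 "\n'
)
df = pd.read_csv(csv_text)

# Strip dollar signs, commas and whitespace, then convert to a numeric dtype.
df['Median Income'] = pd.to_numeric(
    df['Median Income'].str.replace(r'[$,\s]', '', regex=True)
)

print(df.dtypes)
```

Once the column is numeric, the plt.scatter call from the answer above should no longer fail with the string-to-float error.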
0

You can use the scatter plot method built into pandas.

import pandas
import matplotlib.pyplot as plt
plt.style.use('ggplot')
df.plot.scatter(x='Bachelors degree', y='Median Income');
plt.show()

1 Comment

So I made some adjustments, and the code now looks like this: df.dropna(axis = 0, how = 'any'); plt.style.use('ggplot'); df.plot.scatter(x = df['Bachelors degree'], y = df['Median Income']); plt.show(). However, it is still throwing an error saying that it cannot index with vector containing NA/NaN values.
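
Two things in that snippet likely contribute to the error. First, df.dropna(...) returns a new DataFrame and leaves df unchanged unless the result is assigned back. Second, df.plot.scatter expects column names for x and y, not the columns themselves. A minimal sketch of both fixes, using hypothetical numeric data in place of the real file:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical data with one missing value; not the asker's real data.
df = pd.DataFrame({
    'Bachelors degree': [25.1, 40.3, np.nan, 31.0],
    'Median Income': [37678, 52100, 45000, 41250],
})

# dropna returns a new DataFrame, so assign the result back to df.
df = df.dropna(axis=0, how='any')

# Pass the column names as strings, not df['...'] Series.
ax = df.plot.scatter(x='Bachelors degree', y='Median Income')
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() (after importing matplotlib.pyplot as plt) to display the figure.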
