I have a scatter plot of Y against X, where some X values have several Y values. I want to collect the result into an array.

Specifically, I want to compute the mean of all Y values for each X value (the mean of each "column" of points) and plot those means. Here's the data for the X and Y vectors that I have: Scatter data

Here's my code:

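import pandas as pd
import matplotlib.pyplot as plt

# Load the CSV and keep only its second and third columns
dataset = pd.read_csv('csv_file.csv')
dataset = dataset.iloc[:, 1:3]
# Sort the rows by the second of the two remaining columns
dataset = dataset.sort_values(by=dataset.columns[1])
# Turn each column into an (n, 1) array
X = dataset.iloc[:, 1].values
X = X.reshape(len(X), 1)
y = dataset.iloc[:, 0].values
y = y.reshape(len(y), 1)
# Note: y is passed as the horizontal axis here, although that axis is labelled 'X'
plt.scatter(y, X, color='pink', label='data')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Data plotting')
plt.legend()
plt.show()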

Comment: Please share the relevant output or the CSV file. How are we supposed to interpret the columns, shape, etc. of the CSV file? You said you want to find the mean of all Y values for a given X value, but you have used a scatter plot, which means you only have one corresponding value for each x. Please state the objective clearly, with relevant output and the effort you have made to find the mean. Commented Nov 17, 2020 at 16:45

1 Answer

Since you mentioned that what you are actually after is getting the means, why not just do that directly using .groupby(colname).mean()? I don't have access to your data so I just used random integers.

import pandas as pd
from random import randint
if __name__ == '__main__':
    df = pd.DataFrame(data=[(randint(0,100), randint(0,1000)) for _ in range(1000)], columns=['x','y'])
    means = df.groupby('x').mean()
    print(df)
    print(means)

Output:

       x    y
0     58  761
1    100  488
2     70  213
3     11  299
4     63  166
..   ...  ...
995   54  323
996   72  160
997   77  234
998   59  523
999   52  730

[1000 rows x 2 columns]
              y
x              
0    547.100000
1    741.833333
2    408.000000
3    791.000000
4    396.166667
..          ...
96   384.000000
97   485.800000
98   629.375000
99   618.750000
100  632.500000

[101 rows x 1 columns]
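
To tie this back to the original goal of plotting the mean of y for each x, the grouped means could be drawn on top of the scatter. The following is only a minimal sketch: the column names 'x' and 'y' follow the random example above rather than the asker's CSV (whose layout is not shown), the small data set is made up, and the helper names mean_per_x and plot_means are hypothetical.

```python
import pandas as pd


def mean_per_x(df, x_col="x", y_col="y"):
    """Return one row per distinct x value with the mean of its y values."""
    # as_index=False keeps x as a regular column instead of the index
    return df.groupby(x_col, as_index=False)[y_col].mean()


def plot_means(df, means, x_col="x", y_col="y"):
    """Draw the raw points and overlay the per-x means as a line."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    plt.scatter(df[x_col], df[y_col], color="pink", label="data")
    plt.plot(means[x_col], means[y_col], color="black", marker="o", label="mean of y per x")
    plt.xlabel(x_col)
    plt.ylabel(y_col)
    plt.legend()
    plt.show()


# Made-up data: x=1 and x=3 each have several y values
df = pd.DataFrame({"x": [1, 1, 2, 3, 3, 3], "y": [10, 20, 5, 1, 2, 3]})
means = mean_per_x(df)
print(means)
```

Calling plot_means(df, means) afterwards would display the figure. Because groupby sorts the keys by default, the means come out ordered by x, so the line is drawn left to right without an extra sort.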

1 Comment

How do I store the values in the 'x' column after getting means in another variable?
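
One possible way to do this, assuming means is the result of df.groupby('x').mean() as in the answer: the x values become the index of the result, so they can be read from means.index, or turned back into an ordinary column with reset_index(). The data below is made up for illustration, and the variable names x_values, y_means, and flat are hypothetical.

```python
import pandas as pd

# Made-up data; x=1 and x=3 each occur twice
df = pd.DataFrame({"x": [3, 1, 3, 1, 2], "y": [4, 10, 6, 20, 7]})
means = df.groupby("x").mean()

# After groupby, the distinct x values are the index of the result
x_values = means.index.to_numpy()
# The matching means, in the same order as x_values
y_means = means["y"].to_numpy()

# Alternative: move x back into a regular column next to the means
flat = means.reset_index()

print(x_values)
print(y_means)
print(flat)
```

x_values and y_means are plain NumPy arrays of equal length, so they can be passed directly to plt.plot or plt.scatter.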
