When I try to create a DataFrame from two columns, i.e. pids and SalePrice, I get the error "Exception: Data must be 1-dimensional". I think the error occurs because the two data series have different shapes, as shown below. How can I give them the same shape?

ksubmission = pd.DataFrame({'Id':pids,'SalePrice':predictions_kaggle})

Exception: Data must be 1-dimensional

pids.shape

(1459,)

predictions_kaggle.shape

(1459, 1)

predictions_kaggle is in the format below:

array([[115901.20520943],
       [144313.70246636],
       [165320.94012928],
       ...,
       [155759.14767572],
       [111175.64223766],
       [249104.99042467]])

while pids is in the format below:

0       1461
1       1462
2       1463
3       1464
4       1465
        ... 
1454    2915
1455    2916
1456    2917
1457    2918
1458    2919
Name: Id, Length: 1459, dtype: int64
  • Have you tried using pids.values instead of pids? Commented Dec 17, 2019 at 1:10
  • Or have you tried predictions_kaggle.flatten() to make it 1-dimensional? Commented Dec 17, 2019 at 1:14
  • I still get the same error with pids.values. Commented Dec 17, 2019 at 1:18
  • predictions_kaggle.flatten() worked, thanks! Commented Dec 17, 2019 at 1:19

2 Answers

I think you need to do this if the lengths are the same:

import pandas as pd
import numpy as np
pd.DataFrame(predictions_kaggle, index=pids).reset_index().rename(columns={'index': 'Id', 0:'SalePrice'}) 

or

pd.DataFrame({'Id':pids,'SalePrice':np.ndarray.flatten(predictions_kaggle)}) 
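As a quick check that both variants produce the same two-column frame, here is a minimal sketch on made-up stand-in data. The three Ids and prices are hypothetical and only mimic the shapes from the question, (n,) and (n, 1):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's data: a Series named "Id"
# and a 2-D prediction array of shape (n, 1).
pids = pd.Series([1461, 1462, 1463], name="Id")
predictions_kaggle = np.array([[115901.2], [144313.7], [165320.9]])

# Variant 1: use pids as the index, then move it back into a column.
by_index = (
    pd.DataFrame(predictions_kaggle, index=pids)
    .reset_index()
    .rename(columns={"index": "Id", 0: "SalePrice"})
)

# Variant 2: flatten the (n, 1) array to shape (n,) before building the frame.
by_flatten = pd.DataFrame({"Id": pids, "SalePrice": predictions_kaggle.flatten()})

print(list(by_index.columns))
print(list(by_flatten.columns))
```

In both cases Id ends up as an ordinary column rather than the index.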

3 Comments

But pids shouldn't be the index; it should be a column.
Does it work, though? We can turn it into a column by resetting the index. Also, let me know how you want it formatted.
@user2774120 I changed it. It might not be the final solution, but it might work. Otherwise I believe you have to flatten the array with np.ndarray.flatten.

The problem here is that your predictions_kaggle array is not 1-D but 2-D. The shape shows it: a 1-D array has a shape of the form (n,), but yours is (n, 1), which means each row of the array is itself a one-element array. pandas expects a 1-D sequence for each column, so it rejects the 2-D input. A quick fix is to flatten the array, which turns it into a 1-D array:

ksubmission = pd.DataFrame({'Id':pids,'SalePrice':predictions_kaggle.flatten()})
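To see the shape change concretely, here is a small sketch on stand-in data. The values are hypothetical and only reproduce the (n,) versus (n, 1) shapes from the question. It also shows two other common ways to get a 1-D array, ravel() and column indexing, as alternatives I am adding rather than something the question used:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with the same shapes as in the question.
pids = pd.Series([1461, 1462, 1463], name="Id")
predictions_kaggle = np.array([[115901.2], [144313.7], [165320.9]])

print(predictions_kaggle.shape)  # (3, 1)

# Three ways to turn the (n, 1) array into a 1-D array of shape (n,):
flat = predictions_kaggle.flatten()  # always returns a copy
rav = predictions_kaggle.ravel()     # returns a view when possible
col = predictions_kaggle[:, 0]       # select the single column

print(flat.shape)  # (3,)

# With a 1-D array, each dict value is a valid column.
ksubmission = pd.DataFrame({"Id": pids, "SalePrice": flat})
print(ksubmission)
```

All three results hold the same values; flatten() is the safest choice if you plan to modify the result without touching the original array.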

Hope this helps.
