
I am trying to sort the data in a CSV file using the sort function in Pandas with the following code. The original file has 229 rows, but the sorted output has 245 rows: part of the data in some fields was printed on the next row, and some rows have no values at all.

import pandas as pd

sample = pd.read_csv("sample.csv", encoding='latin-1', skipinitialspace=True)
sample_sorted = sample.sort_values(by=['rating'])
sample_sorted.to_csv("sample_sorted.csv")

I think this problem happens because the data in some cells was entered with line breaks. For example, this is the content of one cell in the original file. When I sort the file, the second line is printed in a new row, and 3 empty rows are left between the first and second lines.

"Side effects are way to extreme. 



E-mail me if you have experianced the same things."

Any suggestions? Thanks!

  • Can you post the output of print(sample.shape)? Commented Sep 5, 2016 at 21:08
  • @MaxU, output of print (sample.shape) is (229, 10) Commented Sep 5, 2016 at 21:25
  • @Merlin, I thought it may be some other character inside the file, such as Arabic characters. Yes, the file has a header. Commented Sep 5, 2016 at 21:26
  • @Mary, that's interesting. Can you upload your CSV file somewhere, so we can reproduce your issue? Commented Sep 5, 2016 at 21:27
  • @Mary I understand the nature of the content is sensitive. However, if this is an important problem to solve, it may be worthwhile to create a fake file that produces the same issue you are observing. This way, you can share it with us. The issue can probably be recreated with only a few rows. This would help you get your answer quicker. Commented Sep 5, 2016 at 22:15
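
A small fake file of the kind suggested in the comments might be built as follows. This is only a sketch: the column names `review` and `rating` and the three rows are made up for illustration and are not taken from the original data. It suggests that a quoted cell containing newlines is still a single row to pandas, while the written output spans more physical lines than there are rows, which may be what a text editor or line counter is reporting.

```python
import io
import pandas as pd

# Hypothetical data: the first quoted cell contains embedded newlines.
raw = (
    "review,rating\n"
    '"Side effects are way to extreme.\n\n\n\nE-mail me.",1\n'
    '"Worked fine",5\n'
    '"No change",3\n'
)

sample = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
sample_sorted = sample.sort_values(by=["rating"])

# Write to an in-memory buffer instead of a file on disk.
out = io.StringIO()
sample_sorted.to_csv(out, index=False)
written = out.getvalue()

n_rows = sample_sorted.shape[0]           # logical records seen by pandas
n_lines = len(written.splitlines()) - 1   # physical lines, minus the header
print(n_rows, n_lines)
```

If `n_lines` comes out larger than `n_rows`, the extra "rows" are the embedded line breaks inside quoted cells rather than new records.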

1 Answer


You could try to remove the newlines in your problem column.

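import pandas as pd

sample = pd.read_csv("sample.csv", encoding='latin-1', skipinitialspace=True)
# Collapse every run of whitespace (including embedded newlines) into one space;
# non-string cells such as NaN are left unchanged.
sample["problem_column"] = sample["problem_column"].apply(
    lambda x: " ".join(x.split()) if isinstance(x, str) else x
)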

and see if that helps at all. It's difficult to see why that's happening without a reproducible sample.
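
Without the real file, the idea can at least be tried on an in-memory CSV. The following is a sketch under stated assumptions: the columns `review` and `rating` and their contents are hypothetical, and it uses the `.str.split()` and `.str.join()` accessors as one possible way to collapse whitespace while leaving missing values as NaN. It then sorts, writes the result, and counts the written data lines.

```python
import io
import pandas as pd

# Hypothetical data: one multi-line quoted cell and one empty cell.
raw = (
    "review,rating\n"
    '"Line one.\n\n\nLine two.",2\n'
    '"Fine",1\n'
    ",3\n"
)
sample = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

# Split on any whitespace (spaces, tabs, newlines), then re-join with single spaces.
sample["review"] = sample["review"].str.split().str.join(" ")

sample_sorted = sample.sort_values(by=["rating"])

# Write to a buffer and drop the header line to count data lines.
out = io.StringIO()
sample_sorted.to_csv(out, index=False)
data_lines = out.getvalue().splitlines()[1:]
print(len(sample_sorted), len(data_lines))
```

With the line breaks removed, each record should occupy exactly one physical line, so the two counts are expected to match.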
