How to sort Array[Row] by given column index in Scala?

I'm using RDD[Row].collect(), which gives me an Array[Row], but I want to sort it by a given column index.

I have already implemented quicksort logic and it works, but it involves too many for loops.

I would like to use a Scala built-in API which can do this task with the minimum amount of code.

1 Answer

It would be much more efficient to sort the DataFrame before collecting it: once you collect, you lose the distributed (and parallel) computation. You can use the DataFrame's sort method, for example, to sort in ascending order by column "col1":

import org.apache.spark.sql.functions.asc
val sorted = dataframe.sort(asc("col1"))
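
Since the question asks about a column index rather than a column name, one possible sketch is to look the name up by position through the DataFrame's columns array before sorting. The helper name sortByIndex is hypothetical, and the sketch assumes Spark SQL is on the classpath and that the column name contains no dots or backticks (which col would interpret specially):

```scala
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions.col

// Hypothetical helper: sorts ascending by the column at position `idx`
// while the data is still distributed, then collects the sorted rows.
// Assumes `idx` is a valid index into df.columns.
def sortByIndex(df: DataFrame, idx: Int): Array[Row] =
  df.sort(col(df.columns(idx)).asc).collect()
```

Calling sortByIndex(dataframe, 0) would then sort by the first column; swapping .asc for .desc reverses the order.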

1 Comment

"It would be much more efficient to sort the Dataframe before collecting it": if the data fits on one node (i.e. it can be collected), I'd bet it's more efficient to sort after collecting :) In general, though, it won't be possible to collect an RDD, or one wouldn't be using Spark...
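
For the case where the data has already been collected, a minimal sketch of the local sort could use the standard library's sortBy on the array. The helper name sortRowsByInt is hypothetical, and it assumes the column at the given index holds non-null Int values; other types would need the matching getter (getString, getDouble, and so on):

```scala
import org.apache.spark.sql.Row

// Hypothetical helper: sorts an already-collected Array[Row] locally,
// ascending by the Int value stored at position `idx`.
// Assumes that column is a non-null Int in every row.
def sortRowsByInt(rows: Array[Row], idx: Int): Array[Row] =
  rows.sortBy(_.getInt(idx))

// Small usage example with hand-built rows.
val rows = Array(Row(3, "c"), Row(1, "a"), Row(2, "b"))
val sortedRows = sortRowsByInt(rows, 0) // first-column values in order: 1, 2, 3
```

sortBy returns a new array and leaves the original untouched, which replaces the hand-written quicksort with a single call.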
