From the course: Complete Guide to Databricks for Data Engineering


Use sortBy and orderBy in PySpark


- We have cleaned the data by removing the nulls and by filtering it. Now it's time to sort this data, or put it in some specific order. You can sort or order your data using the orderBy or the sort function. Let's see how.

So we have our data frame already. Now imagine that I want to sort my test data frame based on the age of my customers. I can say df.orderBy, put in the column name, like age, and then display this data frame to see if that actually works.

Here you can see that there are multiple records for which the age is coming as minus one. Those would be the junk records, which we probably could have removed. After those you can see 18, 19, 20, 21, which shows that the data frame I got back is ordered by age. That's not the only case. Sometimes we want to order our data frame based not on one column but on multiple columns…
