I want to filter the rows of a Spark DataFrame whose Email column looks like a real email address. Here's what I tried:
df.filter($"Email" match {case ".*@.*".r => true case _ => false})
But this doesn't work. What is the right way to do it?
To expand on @TomTom101's comment, the code you're looking for is:
df.filter($"Email" rlike ".*@.*")
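The main reason the match doesn't work is that DataFrame has two filter functions, one taking a String and one taking a Column. $"Email" is a Column expression, not the string value of each row, so a Scala match on it never sees the row data. This is unlike RDD, whose single filter takes a function from T to Boolean.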
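As a rough sketch of the difference, the following puts the two DataFrame filter forms next to an RDD-style filter. The sample data, the object name EmailFilterSketch, and the use of a local SparkSession (Spark 2.x or later, rather than the SQLContext setup from the comments) are assumptions made for illustration only:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name; not part of the original question or answer.
object EmailFilterSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession stands in for the asker's environment.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("email-filter-sketch")
      .getOrCreate()
    import spark.implicits._ // provides the $"..." syntax and toDF

    // Made-up sample data with a single Email column.
    val df = Seq("alice@example.com", "not-an-email", "bob@example.org").toDF("Email")

    // filter(Column): the predicate is a Column expression evaluated by Spark.
    val byColumn = df.filter($"Email".rlike(".*@.*"))

    // filter(String): the same predicate written as a SQL expression.
    val byString = df.filter("Email RLIKE '.*@.*'")

    // RDD filter: a plain Scala function from Row to Boolean,
    // which is where ordinary regex matching on the value is possible.
    val pattern = ".*@.*".r.pattern
    val byFunction = df.rdd.filter(row => pattern.matcher(row.getAs[String]("Email")).matches())

    byColumn.show()
    byString.show()
    byFunction.collect().foreach(println)

    spark.stop()
  }
}
```

The first two forms let Spark evaluate the predicate inside its own query plan, so they are usually the better choice on a DataFrame. The RDD form runs an arbitrary Scala function per row.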
For this to compile you need val sqlContext = new org.apache.spark.sql.SQLContext(sc) and import sqlContext.implicits._ in scope.
rlike, as described here: stackoverflow.com/questions/27249685/…