I'm new to Scala and just spent 3 hours trying to figure out how to parse a simple JSON string into an array of strings inside a DataFrame.

Here's my code:

import spark.implicits._
import org.apache.spark.sql.functions._
...
emailsDf.select(from_json($"emails", Array[String])).show()

The emailsDf DataFrame has one column called "emails", and in each row it holds a JSON string encoding an array of strings: ["email1", "email2", ...]

Here's the error message I got:

missing argument list for method apply in object Array. Unapplied methods are only converted to functions when a function type is expected. You can make this conversion explicit by writing apply _ or apply(_)(_) instead of apply.
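For context on the error: in argument position, Array[String] is not a type but the expression Array.apply[String] with its argument list missing, which is exactly what the compiler reports. from_json expects a schema value of type DataType instead. A minimal sketch, assuming Spark 2.4 or later (as far as I know, that is when from_json started accepting an array of primitive types as the schema; older versions do not support it, as the comments below point out). The object name FromJsonArrayExample and the test data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{ArrayType, DataType, StringType}

object FromJsonArrayExample {
  // The schema is a runtime value of type DataType, not a Scala type such as
  // Array[String]. ArrayType(StringType) describes a JSON array of strings.
  val emailsSchema: DataType = ArrayType(StringType)

  def parse(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Hypothetical test data standing in for the question's emailsDf
    val emailsDf = Seq("""["email1", "email2"]""", """["email3"]""").toDF("emails")
    // Assumes Spark 2.4+; older versions reject arrays of primitives here
    emailsDf.select(from_json($"emails", emailsSchema).as("emails"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("from-json-array").getOrCreate()
    val parsed = parse(spark)
    parsed.printSchema()
    parsed.show(false)
    spark.stop()
  }
}
```

The only change from the question's code is that the second argument is a DataType value built from Spark's type objects rather than a Scala type.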

  • I need to parse the string into an array: from "['email1', 'email2']" into ['email1', 'email2']. Commented Dec 4, 2017 at 2:07
  • The former is just a string, but the latter is an array. Commented Dec 4, 2017 at 2:08
  • emailsDf is a DataFrame, and in each row the "emails" column holds a string like "['email1', 'email2', 'email3', ...]". I need to convert all of them into arrays of strings. Commented Dec 4, 2017 at 2:12
  • The problem is that this is not how from_json works: you need to pass the schema as a StructType or a DataType. However, for your case there is a limitation in this function that should be fixed in a future version: issues.apache.org/jira/browse/SPARK-22228. I suggest you use UDFs. Commented Dec 4, 2017 at 2:16
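On versions where from_json only accepts a struct schema for this case, a common workaround is to wrap each JSON array in an object so the top-level schema becomes a StructType, then pull the array field back out. A sketch under that assumption; the wrapper field name "emails", the object name FromJsonStructWorkaround and the test data are all hypothetical. It only uses functions (concat, lit, from_json with a StructType, getField) that also exist in newer Spark versions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{concat, from_json, lit}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

object FromJsonStructWorkaround {
  // Schema for the wrapper object {"emails": [...]}; the field name is arbitrary
  val wrapperSchema: StructType =
    StructType(Seq(StructField("emails", ArrayType(StringType))))

  def parse(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Hypothetical test data standing in for the question's emailsDf
    val emailsDf = Seq("""["email1", "email2"]""", """["email3"]""").toDF("emails")
    // Turn ["a", "b"] into {"emails":["a", "b"]} so a StructType schema applies
    val wrapped = concat(lit("{\"emails\":"), $"emails", lit("}"))
    // Parse the wrapper object, then extract the array field again
    emailsDf.select(from_json(wrapped, wrapperSchema).getField("emails").as("emails"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("from-json-wrapper").getOrCreate()
    parse(spark).show(false)
    spark.stop()
  }
}
```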

1 Answer

You could use a UDF to convert the string into an array. Here is a small example with some test data:

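import org.apache.spark.sql.functions.udf
import spark.implicits._ // assumes an existing SparkSession named spark, e.g. in spark-shell

val df = Seq("[email1, email2, email3]", "[email4, email5]").toDF("emails")

// Drop the surrounding brackets, split on commas and trim each element.
// Note: quotes around elements (as in real JSON) are kept as part of the strings.
val split_string_array = udf((emails: String) => {
  emails.substring(1, emails.length - 1).split(",").map(_.trim)
})

val df2 = df.withColumn("emails", split_string_array($"emails"))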
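The string-splitting logic does not depend on Spark, so it can be tested as a plain function before being wrapped in udf. The sketch below uses a hypothetical helper, parseEmailList, which, unlike the lambda in the answer, also strips double or single quotes around each element (covering both the question's ["email1", ...] form and the comments' ['email1', ...] form), returns an empty array for "[]" and passes null through. It assumes no element contains a comma.

```scala
object EmailListParser {
  // Hypothetical helper: parses ["a", "b"], ['a', 'b'] or [a, b] into Array(a, b).
  // Assumes elements never contain commas; null input is returned as null.
  def parseEmailList(s: String): Array[String] =
    if (s == null) null
    else {
      val inner = s.trim.stripPrefix("[").stripSuffix("]").trim
      if (inner.isEmpty) Array.empty[String]
      else
        inner
          .split(",")
          .map(_.trim.stripPrefix("\"").stripSuffix("\"").stripPrefix("'").stripSuffix("'"))
    }

  def main(args: Array[String]): Unit = {
    println(parseEmailList("""["email1", "email2"]""").mkString("|"))
    println(parseEmailList("['email3', 'email4']").mkString("|"))
  }
}
```

In Spark it could then be wrapped with udf(EmailListParser.parseEmailList _) and used the same way as split_string_array.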

df2 now contains a single "emails" column holding an array of strings, as wanted:

root
 |-- emails: array (nullable = true)
 |    |-- element: string (containsNull = true)
