I have a PySpark DataFrame with a single string column, as follows:
df = spark.createDataFrame(["This is AD185E000834", "U1JG97297 And ODNO926902 etc.","DIHK2975290;HI22K2390279; DSM928HK08", "there is nothing here."], "string").toDF("col1")
I would like to extract the codes in col1 into separate columns, one code per column in order of appearance, with None where a row has fewer codes, like this:
df.col2 = ["AD185E000834", "U1JG97297", "DIHK2975290", None]
df.col3 = [None, "ODNO926902", "HI22K2390279", None]
df.col4 = [None, None, "DSM928HK08", None]
Does anyone know how to do this? Thank you very much.
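One possible approach, sketched under assumptions: the question does not define what a "code" is, so the pattern below guesses that a code is a run of at least 6 uppercase letters and digits that contains at least one digit. That guess matches the four sample rows but may need adjusting for real data. The helper names `extract_codes` and `add_code_columns` are made up for illustration. The regex logic is plain Python; the PySpark part wraps it in a UDF and splits the resulting array into col2, col3, col4.

```python
import re

# Assumption: a "code" is 6+ uppercase letters/digits with at least one digit.
CODE_RE = re.compile(r"\b(?=[A-Z0-9]*\d)[A-Z0-9]{6,}\b")


def extract_codes(text, n=3):
    """Return the first n codes found in text, padded with None to length n."""
    found = CODE_RE.findall(text or "")[:n]
    return found + [None] * (n - len(found))


def add_code_columns(df, n=3):
    """Hypothetical helper: add col2..col{n+1} holding the codes from col1."""
    # Imported here so the regex helper above works without PySpark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StringType

    codes_udf = F.udf(lambda s: extract_codes(s, n), ArrayType(StringType()))
    codes = codes_udf(F.col("col1"))
    # codes[i] picks the i-th array element; it is null when no code was found.
    return df.select(
        "col1", *[codes[i].alias(f"col{i + 2}") for i in range(n)]
    )
```

With the DataFrame from the question, `add_code_columns(df).show(truncate=False)` should display the three new columns. A Python UDF is used here only because it keeps the pattern in one place; if performance matters, a built-in such as `regexp_extract_all` (available in newer Spark versions) may be worth checking instead.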