I want to extract the leading category from each string in the title column and store it in a new column named hazard_extract, as in the example below.
import pandas as pd
test = {'title': ['Other', 'Microbiological - Listeria', 'Extraneous Material', 'Chemical', 'Chemical - Histamine', 'Labelling, Other'], 'hazard_extract': ['Other', 'Microbiological', 'Extraneous Material', 'Chemical', 'Chemical', 'Labelling']}
example = pd.DataFrame(test)
example
                        title       hazard_extract
0                       Other                Other
1  Microbiological - Listeria      Microbiological
2         Extraneous Material  Extraneous Material
3                    Chemical             Chemical
4        Chemical - Histamine             Chemical
5            Labelling, Other            Labelling
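However, the code below only works when the title contains a - or a , separator. A single-word title such as Chemical or Other comes back as NaN, and a two-word title such as Extraneous Material is cut down to its first word. How can I extract the whole title in those cases, while still keeping only the part before the separator in the others?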
example['hazard_extract'] = example['title'].str.extract(r'^(.*?),? ')
                        title   hazard_extract
0                       Other              NaN
1  Microbiological - Listeria  Microbiological
2         Extraneous Material       Extraneous
3                    Chemical              NaN
4        Chemical - Histamine         Chemical
5            Labelling, Other        Labelling
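The output above seems to follow from the pattern itself: `^(.*?),? ` always requires a literal space after the captured group, so a title with no space cannot match, and the lazy `.*?` stops at the first space it finds. A minimal sketch with the standard `re` module (the three sample titles are taken from the example, the loop is only for illustration):

```python
import re

# Same pattern as in the str.extract call: lazy capture, optional comma, then a required space.
pattern = re.compile(r'^(.*?),? ')

for title in ['Other', 'Extraneous Material', 'Labelling, Other']:
    match = pattern.search(title)
    # No space in the title means no match at all; otherwise the capture ends at the first space.
    print(repr(title), '->', match.group(1) if match else None)
```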
Thank you so much for all the help!
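One possible direction, sketched under the assumption that the category is always followed by either ` - ` (a hyphen with a space on each side), a comma, or the end of the string: let the lazy group stop at whichever of those comes first. The pattern below is an illustration, not a confirmed solution, and it would need adjusting if other separators occur in the real data.

```python
import pandas as pd

example = pd.DataFrame({'title': ['Other', 'Microbiological - Listeria',
                                  'Extraneous Material', 'Chemical',
                                  'Chemical - Histamine', 'Labelling, Other']})

# Capture lazily up to the first ' - ', the first ',', or the end of the string.
# Assumption: these are the only separators that appear in the title column.
example['hazard_extract'] = example['title'].str.extract(r'^(.*?)(?: - |,|$)', expand=False)

print(example)
```

Because the hyphen alternative includes the surrounding spaces, a hyphen inside a word would not be treated as a separator.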