
I have a CSV file with the following headers and (sample) data:

StopName,RouteName,Travel_Direction,Latitude,Longitude
StreetA @ StreetB,1 NameA,DirectionA,Lat,Long
StreetC @ StreetD,1 NameA,DirectionA,Lat,Long
...
StreetE @ StreetF,1 NameA,DirectionB,Lat,Long
StreetG @ StreetH,1 NameA,DirectionB,Lat,Long
...
StreetI @ StreetJ,2 NameB,DirectionC,Lat,Long
StreetK @ StreetL,2 NameB,DirectionC,Lat,Long
...
StreetM @ StreetN,2 NameB,DirectionD,Lat,Long
StreetO @ StreetP,2 NameB,DirectionD,Lat,Long
.
.
.

I want to use a regex (currently in Notepad++) to get the following results:

1 NameA - DirectionA=[[StreetA @ StreetB,[Lat,Long]], [StreetC @ StreetD,[Lat,Long]], ...]
1 NameA - DirectionB=[[StreetE @ StreetF,[Lat,Long]], [StreetG @ StreetH,[Lat,Long]], ...]
2 NameB - DirectionC=[[StreetI @ StreetJ,[Lat,Long]], [StreetK @ StreetL,[Lat,Long]], ...]
2 NameB - DirectionD=[[StreetM @ StreetN,[Lat,Long]], [StreetO @ StreetP,[Lat,Long]], ...]
.
.
.

With the following regex and substitution,

RgX: ^([^,]*),([^,]*),([^,]*),(.*)
Sub: $2 - $3=[$1,[\4]]

Demo: https://regex101.com/r/gS9hD6/1

I have gotten this far:

1 NameA - DirectionA=[StreetA @ StreetB,[Lat,Long]]
1 NameA - DirectionA=[StreetC @ StreetD,[Lat,Long]]
1 NameA - DirectionB=[StreetE @ StreetF,[Lat,Long]]
1 NameA - DirectionB=[StreetG @ StreetH,[Lat,Long]]
2 NameB - DirectionC=[StreetI @ StreetJ,[Lat,Long]]
2 NameB - DirectionC=[StreetK @ StreetL,[Lat,Long]]
2 NameB - DirectionD=[StreetM @ StreetN,[Lat,Long]]
2 NameB - DirectionD=[StreetO @ StreetP,[Lat,Long]]
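(As a sanity check, the first replacement can be reproduced outside Notepad++ with Python's re module; this is just a sketch using two of the sample rows above.)

```python
import re

# Same pattern as the Notepad++ regex; $1..$4 become \1..\4 in Python.
pattern = r"^([^,]*),([^,]*),([^,]*),(.*)"
rows = [
    "StreetA @ StreetB,1 NameA,DirectionA,Lat,Long",
    "StreetC @ StreetD,1 NameA,DirectionA,Lat,Long",
]
results = [re.sub(pattern, r"\2 - \3=[\1,[\4]]", row) for row in rows]
for line in results:
    print(line)  # e.g. 1 NameA - DirectionA=[StreetA @ StreetB,[Lat,Long]]
```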

In a new regex, I tried splitting the above result on "=", but didn't know where to go from there.

I think one way to get the desired results would be to keep the first unique instance of whatever comes before "=", replace each newline with ",", and enclose the whole list in "[..]" to put it in array form.

Edit: There are about 10k stops (total), but only about 100 unique routes.

Edit 2: (maybe I am asking for too many changes now)

For first regex:

  • What if I want to use "\n" instead of "="?

At the beginning of the 2nd regex replacement:

  • What if I have only RouteName and StopName columns, like this: 1 NameA - DirectionA=[StreetA @ StreetB, ...]?
  • Similarly, what if I only have RouteName and Coordinates, like this: 1 NameA - DirectionA=[[Lat,Long]]?
  • You can't do the rest with regex alone; use a programming language and it will make your task very easy. Commented Oct 3, 2015 at 5:09
  • @karthikmanchala It can be done with 1 regex, applied multiple times :-) Commented Oct 3, 2015 at 6:00
  • @Mariano No it can't, if there is a dynamic number of unique keys. In your answer you assumed there are only 2 keys, but there could be 3 or more... :) check this Commented Oct 3, 2015 at 6:05
  • @karthikmanchala That's where you have to apply multiple times (same regex), described as step 3 in my answer. check this Commented Oct 3, 2015 at 6:06
  • @Mariano exactly.. what if I have 100 occurrences? or 500? 1000? That's where a programming language and its ease of use come into play.. :) Commented Oct 3, 2015 at 6:09
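(A sketch of what the comments suggest: grouping the rows by a "Route - Direction" key in Python takes only a few lines and handles any number of keys. The sample strings here are hypothetical, mirroring the question's layout.)

```python
import csv
import io
from collections import OrderedDict

# Hypothetical sample mirroring the question's CSV columns.
sample = """\
StopName,RouteName,Travel_Direction,Latitude,Longitude
StreetA @ StreetB,1 NameA,DirectionA,Lat1,Long1
StreetC @ StreetD,1 NameA,DirectionA,Lat2,Long2
StreetE @ StreetF,1 NameA,DirectionB,Lat3,Long3
"""

# Group stops by "RouteName - Travel_Direction", preserving file order.
groups = OrderedDict()
for row in csv.DictReader(io.StringIO(sample)):
    key = "{} - {}".format(row["RouteName"], row["Travel_Direction"])
    stop = "[{},[{},{}]]".format(row["StopName"], row["Latitude"], row["Longitude"])
    groups.setdefault(key, []).append(stop)

for key, stops in groups.items():
    print("{}=[{}]".format(key, ", ".join(stops)))
```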

2 Answers


Steps

1. First replacement:

  • Find what: ^([^,]*),([^,]*),([^,]*),(.*)
  • Replace with: \2 - \3=[[\1,[\4]]]
  • Replace All

2. Second replacement:

  • Find what: ^[\S\s]*?^([^][]*=)\[\[.*\]\]\K\]\R\1\[(.*)\]$
  • Replace with: , \2]
  • Replace All

3. Repeat step 2 until there are no more occurrences.

  • This means that if there are 100 instances (Stops) for the same key (Route - Direction pair), you will have to click Replace All 7 times, since each pass merges pairs of lines (ceil(log2(100)) = 7).

Description

I modified your regex in step 1 to add an extra pair of brackets that will enclose the whole set.

For step 2, it finds a pair of lines for the same Direction, appending the last to the previous one.

^[\S\s]*?^([^][]*=)     #Group 1: captures "1 NameA - DirA="
\[\[.*\]\]              #matches the set of Stops - "[[StA @ StB,[Lat,Long]], ..."
\K                      #keeps the text matched so far out of the match
\]\R                    #closing "]" and newline
\1                      #match next line (if the same route)
\[(.*)\]$               #and capture the Stop (Group 2)

regex101 Demo for step 1
regex101 Demo for step 2
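The repeated Replace All of step 2 can also be emulated outside Notepad++. Here is a sketch with Python's re module; re has no \K, so the pattern is restated with a named backreference for the key and the replacement re-emits the first line instead:

```python
import re

# Output of step 1 (hypothetical sample): one "key=[[stop,[lat,long]]]" per line.
text = """\
1 NameA - DirectionA=[[StreetA @ StreetB,[Lat,Long]]]
1 NameA - DirectionA=[[StreetC @ StreetD,[Lat,Long]]]
1 NameA - DirectionB=[[StreetE @ StreetF,[Lat,Long]]]
"""

# Merge a pair of adjacent lines sharing the same "key=" prefix;
# (?P=key) plays the role of \1 in the Notepad++ regex.
pair = re.compile(
    r"^(?P<key>[^\[\]\n]*=)\[(?P<set>\[.*\])\]\n(?P=key)\[(?P<stop>\[.*\])\]$",
    re.MULTILINE,
)

passes = 0
while True:
    merged = pair.sub(r"\g<key>[\g<set>, \g<stop>]", text)
    if merged == text:
        break  # no pair left to merge
    text = merged
    passes += 1  # each pass halves the number of lines per key

print(text)
```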


6 Comments

+1 Unless there are a lot of instances to replace.. this is a very nice answer!
Good catch! I fixed the error (it was capturing more than it should). And I also improved the regex. A route with 100 stops now requires 7 clicks in Replace All. regex101 demo
Awesome! Can you add another regex with a minor change? If I want to put "\n" instead of "=" in the first regex, how will it change the 2nd one? Also, the regex should take care of any Route names containing "-" itself, correct?
@Mariano Sorry to keep adding on, but what if I had only 2 columns at beginning of 2nd regex: 1 NameA - DirectionA=[[StreetA @ StreetB]], how can I apply the same logic to this data to join all the stops? (Let me know if I should put the last two requests in the OP as well)
You can't use \n as the key delimiter with this regex; it'll overcomplicate things. And I don't think 2 columns matter. Test it and edit anything you need. Please consider accepting this answer if it worked for you.

Try this one; I checked it in a mobile notepad app with no errors.

Find what:

(\S.+@\s\w+),(\d{1,} \w+),(\w+),(.+)

Replace with:

\2 - \3=[[\1,[\4],...]]

2 Comments

Thanks for contributing. Please edit to include an explanation; SO discourages code-only posts, and regex solutions are particularly unhelpful without comments. Quality answers receive upvotes over time as future visitors learn something to apply to their own coding issues. Also, please look into Markdown formatting: you can post code by indenting 4 spaces or by wrapping code blocks in lines of 3 backticks.
Thank you. If possible, could you point me to a video showing how to insert code?
