
I have a CSV file with the following headers and (sample) data:

StopName,RouteName,Travel_Direction,Latitude,Longitude
StreetA @ StreetB,1 NameA,DirectionA,Lat,Long
StreetC @ StreetD,1 NameA,DirectionA,Lat,Long
...
StreetE @ StreetF,1 NameA,DirectionB,Lat,Long
StreetG @ StreetH,1 NameA,DirectionB,Lat,Long
...
StreetI @ StreetJ,2 NameB,DirectionC,Lat,Long
StreetK @ StreetL,2 NameB,DirectionC,Lat,Long
...
StreetM @ StreetN,2 NameB,DirectionD,Lat,Long
StreetO @ StreetP,2 NameB,DirectionD,Lat,Long
.
.
.

I want to use a regex (currently in Notepad++) to get the following results:

1 NameA - DirectionA=[[StreetA @ StreetB,[Lat,Long]], [StreetC @ StreetD,[Lat,Long]], ...]
1 NameA - DirectionB=[[StreetE @ StreetF,[Lat,Long]], [StreetG @ StreetH,[Lat,Long]], ...]
2 NameB - DirectionC=[[StreetI @ StreetJ,[Lat,Long]], [StreetK @ StreetL,[Lat,Long]], ...]
2 NameB - DirectionD=[[StreetM @ StreetN,[Lat,Long]], [StreetO @ StreetP,[Lat,Long]], ...]
.
.
.

With the following regex and substitution,

RgX: ^([^,]*),([^,]*),([^,]*),(.*)
Sub: $2 - $3=[$1,[\4]]

Demo: https://regex101.com/r/gS9hD6/1

I have gotten this far:

1 NameA - DirectionA=[StreetA @ StreetB,[Lat,Long]]
1 NameA - DirectionA=[StreetC @ StreetD,[Lat,Long]]
1 NameA - DirectionB=[StreetE @ StreetF,[Lat,Long]]
1 NameA - DirectionB=[StreetG @ StreetH,[Lat,Long]]
2 NameB - DirectionC=[StreetI @ StreetJ,[Lat,Long]]
2 NameB - DirectionC=[StreetK @ StreetL,[Lat,Long]]
2 NameB - DirectionD=[StreetM @ StreetN,[Lat,Long]]
2 NameB - DirectionD=[StreetO @ StreetP,[Lat,Long]]
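(As a sanity check, the first replacement can be reproduced outside Notepad++ with Python's re module; this is just a sketch using two of the sample rows above.)

```python
import re

# Same pattern as the Notepad++ regex; $1..$4 become \1..\4 in Python.
pattern = r"^([^,]*),([^,]*),([^,]*),(.*)"
rows = [
    "StreetA @ StreetB,1 NameA,DirectionA,Lat,Long",
    "StreetC @ StreetD,1 NameA,DirectionA,Lat,Long",
]
results = [re.sub(pattern, r"\2 - \3=[\1,[\4]]", row) for row in rows]
for line in results:
    print(line)  # e.g. 1 NameA - DirectionA=[StreetA @ StreetB,[Lat,Long]]
```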

In a new regex, I tried splitting the above result on "=", but didn't know where to go from there.

I think one way to get the desired results would be to keep the first unique instance of whatever comes before "=", replace each newline with ",", and enclose the whole list in "[..]" to put it in array form.

Edit: There are about 10k stops (total), but only about 100 unique routes.

Edit 2: (maybe I am asking for too many changes now)

For first regex:

  • What if I want to use "\n" instead of "="?

At the beginning of the 2nd regex replacement:

  • What if I have only RouteName and StopName columns, like this: 1 NameA - DirectionA=[StreetA @ StreetB, ...]?
  • Similarly, what if I only have RouteName and Coordinates, like this: 1 NameA - DirectionA=[[Lat,Long]]?
  • You can't do the rest with regex alone; use a programming language and it will make your task very easy. Commented Oct 3, 2015 at 5:09
  • @karthikmanchala It can be done with 1 regex, applied multiple times :-) Commented Oct 3, 2015 at 6:00
  • @Mariano No it can't, if there is a dynamic number of unique keys. In your answer you assumed there are only 2 keys, but there could be 3 or more... :) check this Commented Oct 3, 2015 at 6:05
  • @karthikmanchala That's where you have to apply multiple times (same regex), described as step 3 in my answer. check this Commented Oct 3, 2015 at 6:06
  • @Mariano exactly.. what if I have 100 occurrences? or 500? 1000? That's where a programming language and its ease of use come into play.. :) Commented Oct 3, 2015 at 6:09
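(A sketch of what the comments suggest: grouping the rows by a "Route - Direction" key in Python takes only a few lines and handles any number of keys. The sample strings here are hypothetical, mirroring the question's layout.)

```python
import csv
import io
from collections import OrderedDict

# Hypothetical sample mirroring the question's CSV columns.
sample = """\
StopName,RouteName,Travel_Direction,Latitude,Longitude
StreetA @ StreetB,1 NameA,DirectionA,Lat1,Long1
StreetC @ StreetD,1 NameA,DirectionA,Lat2,Long2
StreetE @ StreetF,1 NameA,DirectionB,Lat3,Long3
"""

# Group stops by "RouteName - Travel_Direction", preserving file order.
groups = OrderedDict()
for row in csv.DictReader(io.StringIO(sample)):
    key = "{} - {}".format(row["RouteName"], row["Travel_Direction"])
    stop = "[{},[{},{}]]".format(row["StopName"], row["Latitude"], row["Longitude"])
    groups.setdefault(key, []).append(stop)

for key, stops in groups.items():
    print("{}=[{}]".format(key, ", ".join(stops)))
```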

2 Answers


Steps

1. First replacement:

  • Find what: ^([^,]*),([^,]*),([^,]*),(.*)
  • Replace with: \2 - \3=[[\1,[\4]]]
  • Replace All

2. Second replacement:

  • Find what: ^[\S\s]*?^([^][]*=)\[\[.*\]\]\K\]\R\1\[(.*)\]$
  • Replace with: , \2]
  • Replace All

3. Repeat step 2 until there are no more occurrences.

  • This means that if there are 100 instances (Stops) for the same key (Route - Direction pair), you will have to click Replace All 7 times, since each pass merges pairs of lines (ceil(log2(100)) = 7).

Description

I modified your regex in step 1 to add an extra pair of brackets that will enclose the whole set.

For step 2, it finds a pair of lines for the same Direction, appending the last to the previous one.

^[\S\s]*?^([^][]*=)     #Group 1: captures "1 NameA - DirA="
\[\[.*\]\]              #matches the set of Stops - "[[StA @ StB,[Lat,Long]], ..."
\K                      #keeps the text matched so far out of the match
\]\R                    #closing "]" and newline
\1                      #match next line (if the same route)
\[(.*)\]$               #and capture the Stop (Group 2)

regex101 Demo for step 1
regex101 Demo for step 2
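The repeated Replace All of step 2 can also be emulated outside Notepad++. Here is a sketch with Python's re module; re has no \K, so the pattern is restated with a named backreference for the key and the replacement re-emits the first line instead:

```python
import re

# Output of step 1 (hypothetical sample): one "key=[[stop,[lat,long]]]" per line.
text = """\
1 NameA - DirectionA=[[StreetA @ StreetB,[Lat,Long]]]
1 NameA - DirectionA=[[StreetC @ StreetD,[Lat,Long]]]
1 NameA - DirectionB=[[StreetE @ StreetF,[Lat,Long]]]
"""

# Merge a pair of adjacent lines sharing the same "key=" prefix;
# (?P=key) plays the role of \1 in the Notepad++ regex.
pair = re.compile(
    r"^(?P<key>[^\[\]\n]*=)\[(?P<set>\[.*\])\]\n(?P=key)\[(?P<stop>\[.*\])\]$",
    re.MULTILINE,
)

passes = 0
while True:
    merged = pair.sub(r"\g<key>[\g<set>, \g<stop>]", text)
    if merged == text:
        break  # no pair left to merge
    text = merged
    passes += 1  # each pass halves the number of lines per key

print(text)
```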


6 Comments

+1 Unless there are a lot of instances to replace.. this is a very nice answer!
Good catch! I fixed the error (it was capturing more than it should). And I also improved the regex. A route with 100 stops now requires 7 clicks in Replace All. regex101 demo
Awesome! Can you add another regex with a minor change? If I want to put "\n" instead of "=" in the first regex, how will it change the 2nd one? Also, the regex should take care of any Route names containing "-" itself, correct?
@Mariano Sorry to keep adding on, but what if I had only 2 columns at beginning of 2nd regex: 1 NameA - DirectionA=[[StreetA @ StreetB]], how can I apply the same logic to this data to join all the stops? (Let me know if I should put the last two requests in the OP as well)
You can't use \n as the key delimiter with this regex; it'll overcomplicate things. And I don't think 2 columns matter. Test it and edit anything you need. Please consider accepting this answer if it worked for you.

Try this one; I checked it in a mobile notepad app with no errors.

Find what:

(\S.+@\s\w+),(\d{1,} \w+),(\w+),(.+)

Replace with:

\2 - \3=[[\1,[\4],...]]

2 Comments

Thanks for contributing. Please edit to include an explanation; SO discourages code-only posts, and regex solutions are particularly unhelpful without comments. Quality answers receive upvotes over time as future visitors learn something to apply to their own coding issues. Also, please look into Markdown formatting: you can post code by indenting 4 spaces or by wrapping code blocks in lines of 3 backticks.
Thank you. If possible, could you point me to a video showing how to insert code?
