I'm writing a script to list the files on my video drive that aren't already in .mkv format, along with any multi-episode files, so that I can eventually convert and split them properly.

Examples of files the regex should match (single-episode .mkv files, which are then excluded from the list):

Path/to/FilE332.1/Series Title/Season 01/Series - S01E03 - Episode Name Bluray-2160p.mkv
/Series - S01E103 - Episode Name WEBDL-1080p.mkv

Examples of files the regex shouldn't match (multi-episode files, which should be listed):

Path/to/FilE332.1/Series Title/Season 01/Series - S01E04E05 - Episode Name SDTV.mkv
/Series - S01E04E05 - Episode Name SDTV.mkv

Here's the command I came up with:

find /path/to/files -type f ! -regex ".*- S\d{2}E(?:\d{3}|\d{2}) -.*\.mkv"

This regex seems to be working properly when tested on regex101's website, so I'm pretty confident that the regex string is correct: https://regex101.com/r/iyUbh6/1

I've tried adding the -regextype flag to no avail:

find /path/to/files -type f ! -regextype posix-egrep -regex ".*- S\d{2}E(?:\d{3}|\d{2}) -.*\.mkv"
find /path/to/files -type f ! -regextype posix-basic -regex ".*- S\d{2}E(?:\d{3}|\d{2}) -.*\.mkv"
find /path/to/files -type f ! -regextype egrep -regex ".*- S\d{2}E(?:\d{3}|\d{2}) -.*\.mkv"

I also read some stuff about \d not working properly, so I tried changing it to [[:digit:]]. That didn't work either.

find /path/to/files -type f ! -regextype posix-basic -regex ".*- S[[:digit:]]{2}E(?:[[:digit:]]{3}|[[:digit:]]{2}) -.*\.mkv"
find /path/to/files -type f ! -regextype posix-extended -regex ".*- S[[:digit:]]{2}E([[:digit:]]{3}|[[:digit:]]{2}) -.*\.mkv"

I don't really know where to go from here, so hopefully someone with more experience has some insight on this issue.

2 Answers

Note: The following assumes you're using GNU find, which, since you mention Linux, is a safe bet.

The default regular expression syntax does not understand \d; use [0-9] or [[:digit:]] instead. Alternation is \|. I don't think it supports repetition ranges; they're not documented. POSIX Basic Regular Expression syntax doesn't understand \d either, has no standard alternation (though some GNU implementations accept \| as an extension), and requires groups and repetition ranges to be escaped, as \(...\) and \{n,m\}. None of the supported flavors supports non-capturing groups ((?:...)).

Since your alternating group tests for either two or three digits, it can be turned into a single range when using one of the RE flavors that supports them.

So, something like:

find /path/to/files -regextype posix-extended -type f ! -regex ".*- S[0-9]{2}E[0-9]{2,3} -.*\.mkv"

is probably the cleanest approach.
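
To check the command above without touching a real library, here is a minimal sketch that runs it against a throwaway directory. The directory layout and file names are hypothetical, modelled on the examples in the question, and it assumes GNU find (for -regextype) and mktemp are available.

```shell
# Build a scratch tree with hypothetical file names from the question.
tmp=$(mktemp -d)
dir="$tmp/Series Title/Season 01"
mkdir -p "$dir"
touch "$dir/Series - S01E03 - Episode Name Bluray-2160p.mkv" \
      "$dir/Series - S01E103 - Episode Name WEBDL-1080p.mkv" \
      "$dir/Series - S01E04E05 - Episode Name SDTV.mkv" \
      "$dir/Series - S01E06 - Episode Name SDTV.avi"

# List every file that is NOT a single-episode .mkv.
# -regextype comes before the "!" so that only -regex is negated.
listed=$(find "$tmp" -regextype posix-extended -type f \
    ! -regex '.*- S[0-9]{2}E[0-9]{2,3} -.*\.mkv' | sort)
printf '%s\n' "$listed"
```

With these four files, only the S01E04E05 multi-episode file and the .avi file should be printed; the two single-episode .mkv files match the regex and are filtered out.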

1 Comment

Your very last one looks like it should work, btw. Maybe it's the placement of the -regextype option after the ! that's breaking it?
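
A quick way to test that guess is to run both orderings side by side. This is a sketch assuming GNU find, whose manual describes -regextype as a positional option that always returns true, so a "!" placed directly before it would negate that "true" rather than the -regex test. The file names and variable names are hypothetical.

```shell
# Scratch directory with one single-episode and one multi-episode file.
tmp=$(mktemp -d)
touch "$tmp/Series - S01E03 - Name.mkv" "$tmp/Series - S01E04E05 - Name.mkv"
pattern='.*- S[0-9]{2}E[0-9]{2,3} -.*\.mkv'

# "!" applies to -regextype here, so the expression is false for every file.
after_bang=$(find "$tmp" -type f ! -regextype posix-extended -regex "$pattern")

# "!" applies to -regex here, which is the intended negation.
before_bang=$(find "$tmp" -regextype posix-extended -type f ! -regex "$pattern")

printf 'after "!":  [%s]\n' "$after_bang"
printf 'before "!": [%s]\n' "$before_bang"
```

If the first result is empty and the second names only the S01E04E05 file, the placement of -regextype was the problem.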

I just pipe find to grep -v to do the filtering out:

find path -type f | grep -v '\.mkv$'
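
The same pipe can also keep the multi-episode files in the list by giving grep the full episode pattern. The following is only a sketch: the paths are hypothetical stand-ins for find's output, and like any line-based pipeline it assumes no file name contains a newline.

```shell
# Hypothetical find output, one path per line.
paths='/v/Series - S01E03 - Episode Name Bluray-2160p.mkv
/v/Series - S01E04E05 - Episode Name SDTV.mkv
/v/Series - S01E06 - Episode Name SDTV.avi'

# -E selects extended regexes and -v inverts the match.
# "--" ends the options because the pattern starts with "-";
# the quotes keep the backslash from the shell, and "$" anchors .mkv to the end.
kept=$(printf '%s\n' "$paths" | grep -Ev -- '- S[0-9]{2}E[0-9]{2,3} -.*\.mkv$')
printf '%s\n' "$kept"
```

Here the S01E03 line is dropped, while the S01E04E05 and .avi lines are kept.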
