1

I am having a bit of trouble putting this logic on paper:

The string I would like to parse: "Jan - 2012 Presentation v1.3.ppt.pdf - 500KB". This string can vary, but the structure is always "NAME+EXT+FILESIZE".

I want to return the extension. However, for obvious reasons I cannot just split("."), so I came up with something else:

stringy = "Jan - 2012 Presentation v1.3.ppt.pdf - 500KB"
ext = [".pdf",".jpg",".ppt",".txt",".doc"]

for i in ext:
    indx = stringy.find(i)
    ...

I got stuck where I need to figure out how to tell Python to take the extension starting with the biggest index yielded. Should be something like whatiwant = stringy[indx:4], but I can't figure out how to tell it to only take the largest index... The largest index will obviously mean the last extension in the string, which is the one I want to get. In this particular example, I don't care about "ppt", but rather the "pdf".

Can this perhaps be done in a more pythonic way? Or at least more efficiently?
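For reference, the idea described above (look up each known extension and keep the one with the largest index) could be sketched roughly as follows. The function name last_known_extension is made up for illustration, and the sketch assumes the list of candidate extensions is complete; note that a plain substring search would also match ".doc" inside ".docx".

```python
KNOWN_EXTENSIONS = [".pdf", ".jpg", ".ppt", ".txt", ".doc"]

def last_known_extension(text, extensions=KNOWN_EXTENSIONS):
    """Return the known extension that occurs furthest to the right, or None."""
    # str.rfind returns the highest index of the substring, or -1 if absent.
    # Comparing (index, extension) tuples picks the largest index.
    best_index, best_ext = max((text.rfind(ext), ext) for ext in extensions)
    return best_ext if best_index != -1 else None

stringy = "Jan - 2012 Presentation v1.3.ppt.pdf - 500KB"
print(last_known_extension(stringy))
```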

4
  • Is the dash("-") in all occurrences of the string? Commented Nov 2, 2012 at 11:44
  • For this specific problem, there is also string's rfind. Commented Nov 2, 2012 at 11:45
  • Yes, the dash is always at the end, separating the filesize part of the string. Commented Nov 2, 2012 at 11:45
  • Why downvote? Isn't this a valid question? Commented Nov 2, 2012 at 13:14

3 Answers

2
In [44]: stringy[stringy.rfind('.'):stringy.rfind('.')+4]
Out[44]: '.pdf'

7 Comments

looks very promising. I will run this through a few variations of strings. So rfind will find anything starting from the right side of the string and takes the usual [x:y] stuff to denote what to cut?
@Capt.Morgan: if it is so "straightforward and uncomplicated" can't you yourself tell what will break this method?
It was a question for the poster, not the arsey kid who has nothing better to do in SO, other than leaving unhelpful and unproductive comments. If you have nothing of use to contribute, please do not post, full stop. Edit it's a little immature to be so butthurt because I did not pick your answer, don't you think?
@Capt.Morgan: Immature is to post pathetic question asking people to do work for you. Immature is to not understand one's own limitation. Immature is to be rude to people who help you.
@Capt.Morgan I think SilentGhost's answer was the best solution for this problem. This solution can easily break for an extension with four or more characters after the dot, say .torrent or .docx.
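A possible length-independent variant of the rfind approach, as a sketch only: it assumes the file size is always separated from the name by a trailing " - ", as stated in the comments on the question. The name extension_before_size is hypothetical.

```python
def extension_before_size(text):
    """Return the extension that sits right before the trailing ' - SIZE' part."""
    # Assumption: the last " - " in the string introduces the file size.
    end = text.rfind(" - ")
    if end == -1:
        end = len(text)  # no size suffix: treat the whole string as the name
    # Search for the last dot only within the name part, i.e. text[0:end].
    start = text.rfind(".", 0, end)
    if start == -1:
        return ""  # no dot in the name part
    return text[start:end].strip()

print(extension_before_size("Jan - 2012 Presentation v1.3.ppt.pdf - 500KB"))
print(extension_before_size("ubuntu.iso.torrent - 30KB"))
```

Because the slice ends at the separator rather than at a fixed offset, extensions such as .docx or .torrent are returned whole.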
1

using regex:

>>> import re
>>> strs="Jan - 2012 Presentation v1.3.ppt.pdf - 500KB"

>>> re.findall(r"(\.\w+)",strs)[-1]
'.pdf'

or:

>>> re.findall(r".*(\.\w+)",strs)
['.pdf']
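One caveat: both patterns return the last ".something" in the whole string, so a size such as "1.5MB" would yield '.5MB'. A sketch of a pattern anchored to the trailing " - SIZE" part, assuming that separator is always present (the names EXT_BEFORE_SIZE and regex_extension are made up here):

```python
import re

# A dot plus word characters, followed by "-" and a whitespace-free size token
# at the very end of the string. Assumes the size itself contains no spaces.
EXT_BEFORE_SIZE = re.compile(r"(\.\w+)\s*-\s*\S+$")

def regex_extension(text):
    """Return the extension preceding the trailing size, or None if no match."""
    match = EXT_BEFORE_SIZE.search(text)
    return match.group(1) if match else None

print(regex_extension("Jan - 2012 Presentation v1.3.ppt.pdf - 500KB"))
print(regex_extension("notes.docx - 1.5MB"))
```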


0

Try this:

>>> stringy = "Jan - 2012 Presentation v1.3.ppt.pdf - 500KB"
>>> extension = stringy.split(".")[-1].split("-")[0].strip()
>>> extension
'pdf'
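An alternative sketch that first cuts off the size and then lets the standard library split the name, assuming the size always follows the last " - ". The helper name splitext_extension is hypothetical; unlike the snippet above, it returns the extension with its leading dot.

```python
import os.path

def splitext_extension(text):
    """Strip the trailing ' - SIZE' part, then return the name's extension."""
    # rsplit with maxsplit=1 splits only at the last " - ", so dashes
    # inside the name itself are left alone.
    name = text.rsplit(" - ", 1)[0]
    # os.path.splitext returns (root, ext); ext keeps the leading dot
    # and is empty when the name has no extension.
    return os.path.splitext(name)[1]

print(splitext_extension("Jan - 2012 Presentation v1.3.ppt.pdf - 500KB"))
```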

1 Comment

This will get the file size as well
