I receive an input string whose values can be expressed in two possible formats, e.g.:

#short format
data = '"interval":19'

>>> "interval":19


#extended format
data = '"interval":{"t0":19,"tf":19}'

>>> "interval":{"t0":19,"tf":19}

I would like to check whether the short format is used and, if so, convert it to the extended format.

Since the string can contain multiple values, e.g.

data = '"interval":19,"interval2":{"t0":10,"tf":15}'

>>> "interval":19,"interval2":{"t0":10,"tf":15}

I cannot just say:

if ":{" not in data:
    #then short format is used

I would like to code something like:

if ":$(a general int/float/double number)" in data:
    #extract the number
    #replace ":{number}" with the extended format

I know how to code the replacement part. What I need help with is the if condition: I think of it as a substring with a variable part, where the variable part is the number and the fixed part is the value name followed by ":".

  "some_value":19
       ^       ^
 rigid format  variable part

EDIT - WHY NOT PARSE IT?

I know the string is "JSON-friendly" and that I can convert it into a dictionary and then easily access the values.

Indeed, I already have this solution in my code. But I don't like it, because the input string can be multilevel and I need to iterate over the leaf values of the resulting dictionary regardless of how deeply they are nested. That is not a simple thing to do.

So I was wondering whether there is a way to operate directly on the string.

  • This looks like partial JSON. Are you really getting this, and not perchance a fully valid JSON string? Commented May 20, 2019 at 8:20
  • No, indeed: it starts out as JSON but is then modified. Commented May 20, 2019 at 8:25
  • Then unless there are specific restrictions on the possible contents, you should write a parser that parses this custom format into a mutable data structure. There's hardly another sane way to properly process this, if the strings in this format are allowed to contain arbitrary text. Commented May 20, 2019 at 8:27
  • Not related to the question: data = "\"interval\":19,\"interval2\":{\"t0\":10,\"tf\":15}" looks overly complicated with all those backslashes. You could use data = '"interval":19,"interval2":{"t0":10,"tf":15}' instead. Commented May 20, 2019 at 8:34
  • Thank you, I didn't know about that. I am going to update the question. Commented May 20, 2019 at 8:35

2 Answers

If you replace every key other than t0 and tf that is followed by a number, it should work.
Here is an example on a multilevel string; it can probably be polished further:

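import re

s = '"interval": 19,"t0interval2":{"t0":10,"tf":15},{"deeper": {"other_interval":23}}'

# Match any quoted key other than "t0"/"tf" that is followed by an integer.
# Raw strings keep \w, \s and \d from being read as string escapes.
gex = r'("(?!(t0|tf)")\w+":)\s*(\d+)'
# Group 1 is the key with its colon; group 3 is the number
# (group 2 sits inside the negative lookahead and is never used).
new_s = re.sub(gex, r'\1 {"t0": \3, "tf": \3}', s)
print(new_s)
# -> "interval": {"t0": 19, "tf": 19},"t0interval2":{"t0":10,"tf":15},{"deeper": {"other_interval": {"t0": 23, "tf": 23}}}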
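Following up on the remarks about floats and multi-word keys, here is a minimal sketch of how the pattern might be loosened. It assumes numbers are written as an optional minus sign, digits, and an optional fractional part (no exponents), and that keys contain only word characters and spaces. The names NUMBER, KEY, PATTERN and expand_short are illustrative and not part of the original answer.

```python
import re

# Assumption: numbers look like 19, -3 or 25.4 (no exponents, no leading dot).
NUMBER = r'-?\d+(?:\.\d+)?'
# Assumption: keys hold word characters and spaces; "t0" and "tf" are skipped.
KEY = r'"(?!(?:t0|tf)")[\w ]+":'

PATTERN = re.compile(r'(' + KEY + r')\s*(' + NUMBER + r')')


def expand_short(text):
    """Rewrite every short-format value as {"t0": n, "tf": n}."""
    return PATTERN.sub(r'\1 {"t0": \2, "tf": \2}', text)


s = '"interval": 19.5,"t0interval2":{"t0":10,"tf":15.25},{"deeper": {"other interval":-3}}'
print(expand_short(s))
# -> "interval": {"t0": 19.5, "tf": 19.5},"t0interval2":{"t0":10,"tf":15.25},{"deeper": {"other interval": {"t0": -3, "tf": -3}}}
```

Because the lookahead is non-capturing here, the number ends up in group 2 rather than group 3.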

4 Comments

That's a nice one!
Whoa, that's simply an amazing solution
Thank you, guys. I think this needs a bit of refinement so that it also handles float values and keys made of more than one word, like other interval.
Yes, indeed it is a problem when a float is passed (which is the common case, unfortunately). Thanks to @Matthias, I would replace your (\d+) with ([.\d]+). It worked!

You could use a regular expression. ("interval":)(\d+) will look for the string '"interval":' followed by one or more digits.

Let's test this:

import re

data = '"interval":19,"interval2":{"t0":10,"tf":15},"interval":25'
result = re.sub(r'("interval":)(\d+)', r'xxx', data)
print(result)
# -> xxx,"interval2":{"t0":10,"tf":15},xxx

We see that we found the correct places. Now we're going to create your target format. Here the matched groups come in handy: in the regular expression, ("interval":) is group 1 and (\d+) is group 2.

Now we use the content of those groups to create your wanted result.

data = '"interval":19,"interval2":{"t0":10,"tf":15},"interval":25'
result = re.sub(r'("interval":)(\d+)', r'\1{"t0":\2,"tf":\2}', data)
print(result)
# -> "interval":{"t0":19,"tf":19},"interval2":{"t0":10,"tf":15},"interval":{"t0":25,"tf":25}

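If there are floating point values involved, you'll have to change (\d+) to ([.\d]+). Note that this character class is loose: it also accepts strings such as 1.2.3 or a lone dot, which is fine as long as the input only contains well-formed numbers.

If you want to match any key made of Unicode word characters, not only interval, you can use the special sequence \w; because a key can contain multiple characters, the expression becomes \w+.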

data = '"interval":19,"interval2":{"t0":10,"tf":15},"Monty":25.4'
result = re.sub(r'("\w+":)([.\d]+)', r'\1{"t0":\2,"tf":\2}', data)
print(result)
# -> "interval":{"t0":19,"tf":19},"interval2":{"t0":{"t0":10,"tf":10},"tf":{"t0":15,"tf":15}},"Monty":{"t0":25.4,"tf":25.4}

Dang! Yes, we found "Monty", but now the values inside the second part are matched too. We'll have to fix this somehow. Let's see. We don't want ("\w+":) if it's preceded by {, so we're going to use a negative lookbehind assertion: (?<!{)("\w+":). And after the number part ([.\d]+) we don't want a } or another digit (otherwise the engine could backtrack and match only the leading digits of a number), so we're using negative lookahead assertions here: ([.\d]+)(?!})(?!\d).

data = '"interval":19,"interval2":{"t0":10,"tf":15},"Monty":25.4'
result = re.sub(r'(?<!{)("\w+":)([.\d]+)(?!})(?!\d)', r'\1{"t0":\2,"tf":\2}', data)
print(result)
# -> "interval":{"t0":19,"tf":19},"interval2":{"t0":10,"tf":15},"Monty":{"t0":25.4,"tf":25.4}

Hooray, it works!
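One edge case may be worth checking, sketched below: with [.\d]+, the (?!\d) guard does not stop the engine from backtracking to just before the dot of a float, so a nested float value that is not directly preceded by { can still be rewritten. Widening the lookahead to (?![.\d}]) appears to close that gap. The names ORIGINAL, TIGHTER and REPLACEMENT are only for illustration.

```python
import re

REPLACEMENT = r'\1{"t0":\2,"tf":\2}'

# Pattern from the answer above.
ORIGINAL = r'(?<!{)("\w+":)([.\d]+)(?!})(?!\d)'
# Variant: the number may not be followed by '.', a digit or '}',
# so backtracking cannot stop in the middle of a float.
TIGHTER = r'(?<!{)("\w+":)([.\d]+)(?![.\d}])'

data = '"interval":19,"interval2":{"t0":10,"tf":1.5}'

print(re.sub(ORIGINAL, REPLACEMENT, data))
# -> "interval":{"t0":19,"tf":19},"interval2":{"t0":10,"tf":{"t0":1,"tf":1}.5}

print(re.sub(TIGHTER, REPLACEMENT, data))
# -> "interval":{"t0":19,"tf":19},"interval2":{"t0":10,"tf":1.5}
```

Both variants still rely on the position of the keys (the first key after { and the last value before }), so they are a sketch for the two-key t0/tf layout shown here rather than a general solution.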

Regular expressions are powerful and fun, but if you start to add more constraints this might become unmanageable.
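For comparison, here is a minimal sketch of the parsing route mentioned in the question, with a recursive walk over the leaf values. It assumes that wrapping the string in braces yields valid JSON, which may not hold for the modified format described in the comments, and that a dictionary with exactly the keys t0 and tf is already in extended form. The function names expand_leaves and expand_string are hypothetical.

```python
import json


def expand_leaves(node):
    """Recursively wrap bare numbers as {"t0": n, "tf": n}."""
    if isinstance(node, dict):
        # Assumption: a dict with exactly these two keys is already extended.
        if set(node) == {"t0", "tf"}:
            return node
        return {key: expand_leaves(value) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_leaves(item) for item in node]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(node, (int, float)) and not isinstance(node, bool):
        return {"t0": node, "tf": node}
    return node


def expand_string(data):
    """Parse the brace-less string, expand its leaves, and serialize it back."""
    parsed = json.loads("{" + data + "}")  # assumption: this is valid JSON
    expanded = expand_leaves(parsed)
    # Compact separators match the input style; [1:-1] drops the added braces.
    return json.dumps(expanded, separators=(",", ":"))[1:-1]


data = '"interval":19,"interval2":{"t0":10,"tf":15},"Monty":25.4'
print(expand_string(data))
# -> "interval":{"t0":19,"tf":19},"interval2":{"t0":10,"tf":15},"Monty":{"t0":25.4,"tf":25.4}
```

One limitation of this route: json.loads keeps only the last value for a repeated key, so a string that repeats "interval" at the same level would lose entries.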

4 Comments

Nice shot, Matthias! But this code seems tied to the "interval" key. If other keys are provided, it could fail. Note that the fixed part in my question is "some_value", not only "interval".
I added a possible solution for variable key strings.
Hi Matthias, yours is a very well explained solution. And I have to thank you twice: not only did you solve my problem, you also taught me a lot. I really appreciate it.
That's what it was for. But I agree that the better solution in this case is from @Lante Dellarovere.
