I'm new to Python, and for a project I have to validate a string that must have one of these formats:

    aaa...a
    aaa...a(bbb...b)
    aaa...a(bbb...b)ccc...c
    aaa...a(bbb...b)ccc...c(ddd...d)

where aaa...a, bbb...b, ccc...c and ddd...d are integers (runs of the digits 0-9).
The string can have arbitrary length.
It contains no spaces.
Only single (non-nested) brackets are allowed.
I've approached the problem as a finite-state machine (FSM) with two states.
I'd like to know whether there is a better approach to this task, and I'd welcome your impressions and any hints.
As a side note, I did some tests with regular expressions, but this looks to me like a recursive pattern-validation problem, and I'm not sure it can easily be done in Python. I'm not an expert in regular expressions, though; I suppose that, if it is possible, it could be done in a single line of code.
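Because the brackets are never nested, the set of valid strings is a regular language: no recursion or bracket counting is needed, and one regular expression can describe it. Below is a minimal sketch assuming the behaviour of the posted `parse_string`, which rejects both a trailing number after a bracket group (`10(20)30`) and two groups in a row (`10(200)(300)`). The names `VALID`, `VALID_ALLOW_TRAILING` and `is_valid` are hypothetical; the second pattern is a variant in case `aaa...a(bbb...b)ccc...c` from the format list should be accepted after all.

```python
import re

# Hypothetical names; rules follow the posted parse_string():
# either a bare number, or one or more "number(number)" pairs back to back,
# e.g. 10, 10(20), 10(20)30(40). [0-9] is used instead of \d because \d
# also matches non-ASCII digits in str patterns.
VALID = re.compile(r"[0-9]+|(?:[0-9]+\([0-9]+\))+")

# Variant that also accepts a trailing number after a group, e.g. 10(20)30.
# Two groups in a row, such as 10(200)(300), are still rejected.
VALID_ALLOW_TRAILING = re.compile(r"[0-9]+(?:\([0-9]+\)[0-9]+)*(?:\([0-9]+\))?")


def is_valid(s, pattern=VALID):
    """Return True if the whole string s matches pattern."""
    # fullmatch anchors the pattern at both ends, including every alternative
    return pattern.fullmatch(s) is not None


if __name__ == "__main__":
    samples = ["10", "10(20)", "10(20)30(40)", "10(20)30",
               "10(200)(300)", "()", "(10)", "10()", "10((20))", ""]
    for s in samples:
        print(f"{s!r:16} strict={is_valid(s)} "
              f"trailing={is_valid(s, VALID_ALLOW_TRAILING)}")
```

The trade-off is the one raised in the question: `fullmatch` only answers yes or no; it does not say where the input went wrong.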
The main advantage I see in the FSM approach is that it can tell the user where an error is located in the input string, which makes checking and correcting the input easier from the user's perspective.
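To keep that error-location advantage while making the rules easier to read, the state machine can be written table-driven: each (state, character class) pair maps to the next state, and a missing entry is a syntax error at the current index. This is a minimal sketch assuming the same rules as the posted `parse_string`; the state names, `TRANSITIONS`, `find_error` and `report` are hypothetical. It uses a few more states than the original two, so the `old_state` and `initial` flags are no longer needed.

```python
from typing import Optional

# Hypothetical state names. The original state 1 (outside brackets) is split
# into FIRST_NUM / CLOSED / NEXT_NUM and state 2 (inside brackets) into
# OPEN / INNER, so no old_state or initial flags are needed.
START, FIRST_NUM, OPEN, INNER, CLOSED, NEXT_NUM = range(6)

# (state, character class) -> next state; a missing pair is a syntax error.
TRANSITIONS = {
    (START, "digit"): FIRST_NUM,
    (FIRST_NUM, "digit"): FIRST_NUM,
    (FIRST_NUM, "("): OPEN,
    (OPEN, "digit"): INNER,         # "()" is rejected: OPEN has no ")" entry
    (INNER, "digit"): INNER,
    (INNER, ")"): CLOSED,
    (CLOSED, "digit"): NEXT_NUM,    # "(...)(...)" is rejected: CLOSED has no "(" entry
    (NEXT_NUM, "digit"): NEXT_NUM,
    (NEXT_NUM, "("): OPEN,          # a number after ")" must open a new group
}

# A bare number, or input ending right after ")", is complete.
ACCEPTING = {FIRST_NUM, CLOSED}


def char_class(c: str) -> str:
    """Map a character to the column used in TRANSITIONS."""
    if c in "0123456789":
        return "digit"
    if c in "()":
        return c
    return "other"


def find_error(s: str) -> Optional[int]:
    """Return None if s is valid, otherwise the index of the first offending
    character; len(s) means the input ended too early."""
    state = START
    for index, c in enumerate(s):
        next_state = TRANSITIONS.get((state, char_class(c)))
        if next_state is None:
            return index
        state = next_state
    return None if state in ACCEPTING else len(s)


def report(s: str) -> str:
    """Build a message that points a caret at the error, if any."""
    pos = find_error(s)
    if pos is None:
        return f"{s}: valid"
    what = "end of input" if pos == len(s) else repr(s[pos])
    return f"{s}\n{' ' * pos}^ unexpected {what}"


if __name__ == "__main__":
    for sample in ["10(20)30(40)", "10(200)(300)", "10()", "10(20)30"]:
        print(report(sample))
```

With this layout, changing a rule means editing one entry in `TRANSITIONS` rather than a branch in nested `if`/`else` blocks; the cost is a few more named states.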
[EDIT] I found a detection bug and have corrected the code: two consecutive bracket groups, e.g. 10(200)(300), are not allowed. I've also restructured the code as a function.
```python
"""
Parser for strings in one of the following formats:
    aaa...a
    aaa...a(bbb...b)
    aaa...a(bbb...b)ccc...c(ddd...d)
where aaa...a, bbb...b, ccc...c, ddd...d are unsigned integers.

Some invalid examples:
    ()
    (aaa...a)
    aaa...a()
    aaa...a(bbb...b)ccc...c
    aaa...a((bbb...b))
    aaa...a(bbb...b)(ccc...c)
"""
import sys
import re


def parse_string(buffer):
    """Validate buffer with a two-state FSM.

    State 1: outside brackets, state 2: inside brackets.
    Returns (success, state, index), where index is the position of the
    last character examined (the error position on failure).
    """
    state = 1
    old_state = 1
    next_state = 1
    strlen = len(buffer)
    initial = True      # still inside the first block of digits
    success = False
    index = 0           # defined even for an empty buffer (the loop never runs)
    is_a_number = re.compile("[0-9]")
    for index, car in enumerate(buffer):
        if state == 1:
            if is_a_number.match(car):
                if index != strlen - 1:
                    # A digit that is not the last char: wait for "(" or another digit
                    next_state = 1
                elif initial:
                    # Last digit of the initial block: the whole string is a number
                    success = True
                    break
                else:
                    # The last char is a number that follows a bracket group -> error
                    success = False
                    break
            elif car == "(":
                if old_state == 2:
                    # Two bracket groups (...)(...) in a row are not allowed
                    success = False
                    break
                if index == 0 or index == strlen - 1:
                    # "(" can be neither the first nor the last char
                    success = False
                    break
                # Step to the "inside brackets" state
                next_state = 2
                initial = False
            else:
                # Unexpected char
                success = False
                break
        elif state == 2:
            if is_a_number.match(car):
                if index != strlen - 1:
                    # A digit inside the brackets that is not the last char
                    next_state = 2
                else:
                    # A digit as the last char means a missing ")"
                    success = False
                    break
            elif car == ")":
                if old_state == 1:
                    # The sequence "()" is not allowed
                    success = False
                    break
                elif old_state == 2 and index != strlen - 1:
                    # The previous char was a digit: back to "outside brackets"
                    next_state = 1
                else:
                    # ")" is the last char of the string: valid
                    success = True
                    break
            else:
                # Unexpected char
                success = False
                break
        print("current state: " + str(state) + " next_state: " + str(next_state))
        # Update the old and the new state
        old_state = state
        state = next_state
    return success, state, index


if __name__ == "__main__":
    # sys.argv[0] is the script name; the supplied arguments start at index 1
    number_cmd = len(sys.argv) - 1
    if number_cmd != 1:
        print("Error: exactly one string is required as input!")
        sys.exit(1)
    buffer = sys.argv[1].strip()
    print("================================")
    print("Parsing: " + buffer)
    print("Checking with fsm")
    print("--------------------------------")
    success, state, index = parse_string(buffer)
    if success:
        print("String validated!")
    else:
        print("Syntax error detected in state: " + str(state) + "\n"
              + "position: " + buffer[:index + 1])
    print("================================")
    # Exit status 0 on success, 1 on a syntax error
    sys.exit(0 if success else 1)
```