This is quite a complex question, and resolving the stoichiometric formulae is only the first step. I hope the following function helps. It parses a formula written in your syntax and extracts all of its atoms. (By the way, you should write the H in Ca(OH)_2 in upper case; otherwise the function treats Oh as a single element.)
Using this function, you get a space-separated string of all atoms in a given product or educt (reactant):
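    def expand(stoch):
        # Put a space before every element symbol and every "(" to separate tokens.
        f = ''
        for c in stoch:
            if c.isupper() or c == "(":
                f += ' ' + c
            else:
                f += c
        # Resolve the multipliers one at a time, starting with the rightmost "_".
        while '_' in f:
            i = f.rfind("_")
            # The multiplier is the run of digits directly after the "_".
            k = i + 1
            while k < len(f) and f[k].isdigit():
                k += 1
            num = int(f[i+1:k])
            if f[i-1] == ")":
                # The multiplier applies to a group: walk left to the matching "(".
                depth = 1
                start = i - 2
                while depth > 0:
                    if f[start] == "(":
                        depth -= 1
                    elif f[start] == ")":
                        depth += 1
                    start -= 1
                # Expand the group's content recursively, then repeat it.
                subform = expand(f[start+2:i-1])
                f = f[:start+1] + (subform + ' ') * num + f[k:]
            else:
                # The multiplier applies to one element: an upper-case letter
                # followed by any lower-case letters.
                nc = 1
                subform = f[i-nc]
                while subform.islower():
                    nc += 1
                    subform = f[i-nc:i]
                f = f[:i-nc] + (subform + ' ') * num + f[k:]
        # Collapse the runs of spaces left over from the substitutions.
        while '  ' in f:
            f = f.replace('  ', ' ')
        return f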
The function takes a stoichiometric formula in your syntax, decomposes it, and simplifies it by repeating each element as many times as it occurs.
The result would be:
    print(expand("Ca(OH)_2"))
    print(expand("C_6H_12(OH)_2"))
    ## Ca O H O H
    ## C C C C C C H H H H H H H H H H H H O H O H
As it is recursive, it will be able to resolve nested parentheses:
    print(expand("Ca_3(C_3H_5(OH)_3)_2"))
    ## Ca Ca Ca C C C H H H H H O H O H O H C C C H H H H H O H O H O H
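Since the result is a flat, space-separated string, getting per-element counts takes only one more step. Here is a minimal sketch, assuming the expanded string shown above as input; `count_atoms` is a hypothetical helper name of my own, not part of the function above:

```python
from collections import Counter

def count_atoms(expanded):
    """Count how often each element occurs in a space-separated atom string."""
    # str.split() without arguments also drops leading, trailing and repeated spaces.
    return Counter(expanded.split())

# Input copied from the output of expand("Ca(OH)_2") above.
counts = count_atoms(" Ca O H O H ")
print(dict(counts))  # {'Ca': 1, 'O': 2, 'H': 2}
```

Counts are usually easier to compare between the two sides of a reaction than raw atom lists.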
If you apply it to your problem, I would suggest building a dictionary that separates educts (reactants) from products and lists each compound together with its atoms, so that you can loop over it in an iterative program later:
    starters = ['Ca(OH)_2', 'HNO_3']
    products = ['Ca(NO_3)_2', 'H_2O']
    formula = {'Educts': [], 'Products': []}
    for e in starters:
        # split() without arguments already drops the empty strings
        atoms = expand(e).split()
        formula['Educts'].append({'Formula': e, 'Atoms': sorted(atoms)})
    for p in products:
        atoms = expand(p).split()
        formula['Products'].append({'Formula': p, 'Atoms': sorted(atoms)})
    # Print each side with its compounds and their atoms
    for k, v in formula.items():
        print(k)
        for e in v:
            for k2, v2 in e.items():
                print(' - ' + k2 + ': ' + str(v2))
            print('')
    ## Output:
    ##
    ## Educts
    ## - Formula: Ca(OH)_2
    ## - Atoms: ['Ca', 'H', 'H', 'O', 'O']
    ##
    ## - Formula: HNO_3
    ## - Atoms: ['H', 'N', 'O', 'O', 'O']
    ##
    ## Products
    ## - Formula: Ca(NO_3)_2
    ## - Atoms: ['Ca', 'N', 'N', 'O', 'O', 'O', 'O', 'O', 'O']
    ##
    ## - Formula: H_2O
    ## - Atoms: ['H', 'H', 'O']
Or just the dict itself, with the atoms sorted as in the code above:

    {'Educts': [{'Formula': 'Ca(OH)_2', 'Atoms': ['Ca', 'H', 'H', 'O', 'O']},
                {'Formula': 'HNO_3', 'Atoms': ['H', 'N', 'O', 'O', 'O']}],
     'Products': [{'Formula': 'Ca(NO_3)_2', 'Atoms': ['Ca', 'N', 'N', 'O', 'O', 'O', 'O', 'O', 'O']},
                  {'Formula': 'H_2O', 'Atoms': ['H', 'H', 'O']}]}
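As a rough idea of what the iterative program could look like, the sketch below tries small integer coefficients until the atom counts on both sides match. It is only one possible approach, not the only way to balance an equation. The names `side_total` and `balance` and the `max_coeff` limit are my own assumptions, and the dict is written out literally so that the snippet runs on its own:

```python
import itertools
from collections import Counter

def side_total(compounds, coeffs):
    """Sum the atom counts of one side, each compound weighted by its coefficient."""
    total = Counter()
    for compound, n in zip(compounds, coeffs):
        for atom, count in Counter(compound['Atoms']).items():
            total[atom] += n * count
    return total

def balance(formula, max_coeff=6):
    """Return the first coefficient tuple that balances both sides, or None."""
    educts, products = formula['Educts'], formula['Products']
    n = len(educts)
    # Brute force: try every combination of coefficients from 1 to max_coeff.
    for coeffs in itertools.product(range(1, max_coeff + 1),
                                    repeat=n + len(products)):
        if side_total(educts, coeffs[:n]) == side_total(products, coeffs[n:]):
            return coeffs
    return None

# Same structure as the dictionary built above, written out literally.
formula = {
    'Educts': [
        {'Formula': 'Ca(OH)_2', 'Atoms': ['Ca', 'H', 'H', 'O', 'O']},
        {'Formula': 'HNO_3', 'Atoms': ['H', 'N', 'O', 'O', 'O']},
    ],
    'Products': [
        {'Formula': 'Ca(NO_3)_2',
         'Atoms': ['Ca', 'N', 'N', 'O', 'O', 'O', 'O', 'O', 'O']},
        {'Formula': 'H_2O', 'Atoms': ['H', 'H', 'O']},
    ],
}

print(balance(formula))  # (1, 2, 1, 2)
```

The result reads as Ca(OH)_2 + 2 HNO_3 -> Ca(NO_3)_2 + 2 H_2O. Brute force is only practical for a few compounds with small coefficients; for larger reactions, solving the corresponding system of linear equations would probably be the better route.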