While iter() is all very well, I needed a way to walk an XML hierarchy while tracking the nesting level, and iter() doesn't help with that at all. I wanted something like iterparse(), which emits start and end events at each level of the hierarchy, but I already had the ElementTree, so I didn't want the unnecessary overhead of converting it to a string and re-parsing it, which using iterparse() would require.
I was surprised I couldn't find this, so I wrote it myself:
def iterwalk(root, events=None, tags=None):
    """Incrementally walks an XML structure (like iterparse, but for an existing ElementTree structure).
    Returns a generator function; calling it gives an iterator providing (event, elem) pairs.
    Events are start and end.
    events is a list of events to emit - defaults to ["start","end"]
    tags is a single tag or a list of tags to emit events for - if empty/None, events are generated for all tags
    """
    # each stack entry is a two-item list: the xml element, and a second item that is initially None
    # if the second item is None, a start is emitted and all children of the current element are put into it
    # if the second item is a non-empty list, its first child is popped and a new stack entry is created for that child
    # once the second item is an empty list, an end is emitted and then the stack is popped
    stack = [[root, None]]
    tags = [] if tags is None else tags if isinstance(tags, list) else [tags]
    events = events or ["start", "end"]

    def iterator():
        while stack:
            elnow, children = stack[-1]
            if children is None:
                # this is the start of elnow, so emit a start
                if (not tags or elnow.tag in tags) and "start" in events:
                    yield ("start", elnow)
                # put the children into the top stack entry
                stack[-1][1] = list(elnow)
            elif len(children) > 0:
                # take the next child and remove it from the pending list
                thischild = children.pop(0)
                # and now create a new stack entry for this child
                stack.append([thischild, None])
            else:
                # finished these children - emit the end
                if (not tags or elnow.tag in tags) and "end" in events:
                    yield ("end", elnow)
                stack.pop()

    return iterator
# myxml is my parsed XML, which has nested Binding tags; I want to count the depth of nesting
# Now explore the structure
it = iterwalk(myxml, tags='Binding')
level = 0
for event, el in it():
    if event == "start":
        level += 1
        print(f"{level} {el.tag=}")
    if event == "end":
        level -= 1
The stack is used so that you can emit the start events as you go down the hierarchy and then backtrack correctly. The last entry in the stack is initially [el, None], so the start event for el is emitted and the entry is updated to [el, children]. Each child is removed from children as it is entered until, after the last child has been done, the entry is [el, []], at which point the end event for el is emitted and the top entry is removed from the stack.
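To make the stack mechanics concrete, here is a minimal self-contained sketch. It uses a condensed stack-based walker (the name `walk_events`, the helper `event_tags` and the sample document are made up for illustration) and checks that it produces the same sequence of events as `iterparse` does on the same XML. It leaves out the events/tags filtering to keep the stack logic visible.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = "<a><b><c/></b><d/></a>"  # hypothetical sample document


def walk_events(root):
    """Condensed stack-based walk: yields ("start" | "end", elem) pairs."""
    stack = [[root, None]]
    while stack:
        elem, children = stack[-1]
        if children is None:
            # first visit: emit start, then record the children still to visit
            yield ("start", elem)
            stack[-1][1] = list(elem)
        elif children:
            # descend into the next unvisited child
            stack.append([children.pop(0), None])
        else:
            # all children done: emit end and backtrack
            yield ("end", elem)
            stack.pop()


def event_tags(pairs):
    """Reduce (event, elem) pairs to (event, tag) so two walks can be compared."""
    return [(event, elem.tag) for event, elem in pairs]


walked = event_tags(walk_events(ET.fromstring(SAMPLE)))
parsed = event_tags(
    ET.iterparse(io.BytesIO(SAMPLE.encode()), events=("start", "end"))
)
print(walked)
print(walked == parsed)
```

The comparison is only on tags and event order, which is all the depth-tracking use case relies on.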
I did it this way, with the stack, because I'm not fond of debugging recursive code, and anyway I wasn't sure how to write a recursive iterator function.
Here's a recursive version, which is easier to understand but would be difficult to debug if it weren't so simple and something went wrong - and I learned about yield from along the way :-)
def iterwalk1(root, events=None, tags=None):
    """Recursive version - incrementally walks an XML structure (like iterparse, but for an existing ElementTree structure).
    Returns a generator function; calling it gives an iterator providing (event, elem) pairs.
    Events are start and end.
    events is a list of events to emit - defaults to ["start","end"]
    tags is a single tag or a list of tags to emit events for - if None or an empty list, events are generated for all tags
    """
    tags = [] if tags is None else tags if isinstance(tags, list) else [tags]
    events = events or ["start", "end"]

    def recursiveiterator(el, suppressyield=False):
        if not suppressyield and (not tags or el.tag in tags) and "start" in events:
            yield ("start", el)
        for child in list(el):
            yield from recursiveiterator(child)
        if not suppressyield and (not tags or el.tag in tags) and "end" in events:
            yield ("end", el)

    def iterator():
        # note: suppressyield=True means no events are emitted for root itself,
        # unlike iterwalk above, which does emit root's start and end
        yield from recursiveiterator(root, suppressyield=True)

    return iterator
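As a usage sketch for the recursive approach, the following self-contained example tracks nesting depth with a small `yield from` generator. The sample XML, the `name` attributes and the function names `walk` and `binding_depths` are all assumptions made up for illustration; it is a simplified stand-in, not the functions above, and it filters on a single tag only.

```python
import xml.etree.ElementTree as ET

# hypothetical sample: Binding elements nested up to three deep
SAMPLE = """
<root>
  <Binding name="outer">
    <Binding name="middle">
      <Binding name="inner"/>
    </Binding>
    <Other/>
  </Binding>
  <Binding name="sibling"/>
</root>
"""


def walk(elem, tag=None):
    """Recursively yield ("start" | "end", elem); filter on tag if given."""
    wanted = tag is None or elem.tag == tag
    if wanted:
        yield ("start", elem)
    for child in elem:
        # children are still visited even when elem itself is filtered out
        yield from walk(child, tag)
    if wanted:
        yield ("end", elem)


def binding_depths(root):
    """Return (depth, name) for each Binding, with depth counted from 1."""
    depths = []
    level = 0
    for event, elem in walk(root, tag="Binding"):
        if event == "start":
            level += 1
            depths.append((level, elem.get("name")))
        else:
            level -= 1
    return depths


depths = binding_depths(ET.fromstring(SAMPLE.strip()))
print(depths)
print(max(level for level, _ in depths))
```

Because only Binding elements produce events, the counter reflects the depth of Binding inside Binding rather than the absolute depth in the tree.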