
I am having trouble converting an XML file to a CSV laid out in this manner:

[image: screenshot of the desired CSV layout]

For the first column, I would like to take each group's general name tag (RecordingSystem, Ports, etc.) and concatenate it with the sub-names found in the name attributes inside the row tag (closedFileCount, processedFileCount, etc.).

The tag that holds the sub-name keeps changing: it could be "usage", "lwGauge", "hwGauge" and so on. I also need to collect those tag names and put them in the column beside it.

Please see the sample XML below:

<?xml version="1.0" encoding="UTF-8"?>

<omGroups xmlns="urn:nortel:namespaces:mcp:oms" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:nortel:namespaces:mcp:oms OMSchema.xsd" >

        <group>
                <name>RecordingSystem</name>
                <row>
                        <package>com.nortelnetworks.mcp.ne.base.recsystem.fw.system</package>
                        <class>RecSysFileOMRow</class>
                        <usage name="closedFileCount" hasThresholds="true">
                                <measures>
                                        closed file count
                                </measures>
                                <description>
                                        This register counts the number
                                        of closed files in the spool directory of a
                                        particular stream and a particular system.
                                        Files in the spool directory store the raw
                                        OAM records where they are sent to the
                                        Element Manager for formatting.
                                </description>
                                <notes>
                                        Minor and major alarms
                                        when the value of closedFileCount
                                        exceeds certain thresholds. Configure
                                        the threshold values for minor and major
                                        alarms for this OM through engineering
                                        parameters for minorBackLogCount and
                                        majorBackLogCount, respectively. These
                                        engineering parameters are grouped under
                                        the parameter group of Log, OM, and
                                        Accounting for the logs’ corresponding
                                        system.
                                </notes>
                        </usage>
                        <usage name="processedFileCount" hasThresholds="true">
                                <measures>
                                        Processed file count
                                </measures>
                                <description>
                                        The register counts the number
                                        of processed files in the spool directory of
                                        a particular stream and a particular system.
                                        Files in the spool directory store the raw
                                        OAM records and then send the records to
                                        the Element Manager for formatting.
                                </description>
                        </usage>
                </row>
                <documentation>
                        <description>
                                Rows of this OM group provide a count of the number of files contained
                                within the directory (which is the OM row key value).
                        </description>
                        <rowKey>
                                The full name of the directory containing the files counted by this row.
                        </rowKey>
                </documentation>
                <generatedOn>
                        <all/>
                </generatedOn>
        </group>
        <group traffic="true">
                <name>Ports</name>
                <row>
                        <package>com.nortelnetworks.ims.cap.mediaportal.host</package>
                        <class>PortsOMRow</class>
                        <usage name="rtpMpPortUsage">
                                <measures>
                                        BCP port usage
                                </measures>
                                <description>
                                        Meter showing number of ports in use.
                                </description>
                        </usage>
                        <lwGauge name="connMapEntriesLWM">
                                <measures>
                                        Lowest simultaneous port usage
                                </measures>
                                <description>
                                        Lowest number of
                                        simultaneous ports detected to be in
                                        use during the collection interval
                                </description>
                        </lwGauge>
                        <hwGauge name="connMapEntriesHWM">
                                <measures>
                                        Highest simultaneous port usage
                                </measures>
                                <description>
                                        Highest number of
                                        simultaneous ports detected to be in
                                        use during the collection interval.
                                </description>
                        </hwGauge>
                        <waterMark name="connMapEntries">
                                <measures>
                                        Connections map entries
                                </measures>
                                <description>
                                        Meter showing the number of connections in the host
                                        CPU connection map.
                                </description>
                                <bwg lwref="connMapEntriesLWM" hwref="connMapEntriesHWM"/>
                        </waterMark>
                        <counter name="portUsageSampleCnt">
                                <measures>
                                    Usage sample count
                                </measures>
                                <description>
                                    The number of 100-second samples taken during the
                                    collection interval contributing to the average report.
                                </description>
                        </counter>
                        <counter name="sampledRtpMpPortUsage">
                                <measures>
                                    In-use ports usage
                                </measures>
                                <description>
                                    Provides the sum of the in-use ports every 100 seconds.
                                </description>
                        </counter>
                        <precollector>
                                <package>com.nortelnetworks.ims.cap.mediaportal.host</package>
                                <class>PortsOMCenturyPrecollector</class>
                                <collector>centurySecond</collector>
                        </precollector>
                </row>
                <documentation>
                        <description>
                        </description>
                        <rowKey>
                        </rowKey>
                </documentation>
                <generatedOn>
                        <list>
                            <ne>sessmgr</ne>
                            <ne>rtpportal</ne>
                        </list>
                </generatedOn>
        </group>
       
</omGroups>

The code below is supposed to get the general name and write it to the CSV file the correct number of times, but I cannot get it to display anything.

from xml.etree import ElementTree
import csv
from copy import copy

import lxml.etree



tree = ElementTree.parse('OM.xml')

sitescope_data = open('OMFileConverted.csv', 'w', newline='', encoding='utf-8')
csvwriter = csv.writer(sitescope_data)



#Create all needed columns here in order and writes them to excel file
col_names = ['name', 'OMRegister']
csvwriter.writerow(col_names)

def recurse(root, props):
    for child in root:
        if child.tag == '{urn:nortel:namespaces:mcp:oms}group':
            p2 = copy(props)
            for event in root.findall('{urn:nortel:namespaces:mcp:oms}group'):
                event_id = event.find('{urn:nortel:namespaces:mcp:oms}name')
                if event_id != None:
                    p2['name'] = event_id.text
                    recurse(child, p2)
                else:
                    recurse(child, props)


    for event in root.findall('{urn:nortel:namespaces:mcp:oms}group'):

        event_data = [props.get('name')]



        csvwriter.writerow(event_data)



root = tree.getroot()
recurse(root,{}) #root + empty dictionary
sitescope_data.close()
  • Can you use beautifulsoup? Commented Oct 5, 2020 at 19:38
  • If beautifulsoup could parse the xml like the csv screenshot then that would be great Commented Oct 5, 2020 at 19:44

1 Answer

If xml_string is your XML snippet from the question, then this script:

import csv
from bs4 import BeautifulSoup

soup = BeautifulSoup(xml_string, 'html.parser')

with open('data.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out)
    writer.writerow(['General name:SpecificName', 'RegisterType'])
    for item in soup.select('row [name]'):
        writer.writerow([item.find_previous('name').text + ':' + item['name'], item.name])

Produces data.csv (screenshot from LibreOffice):

[image: screenshot of data.csv opened in LibreOffice]


Edit: To get the measures tag into a column, you can do:

import csv
from bs4 import BeautifulSoup

soup = BeautifulSoup(xml_string, 'html.parser')

with open('data.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out)
    writer.writerow(['General name:SpecificName', 'RegisterType', 'Measures'])
    for item in soup.select('row [name]'):
        writer.writerow([item.find_previous('name').text + ':' + item['name'], item.name, item.find('measures').get_text(strip=True)])

Produces:

[image: screenshot of data.csv with the added Measures column]
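
One caveat with 'html.parser': BeautifulSoup's HTML parsers convert tag names to lowercase, so item.name yields lwgauge rather than lwGauge. If the original case matters, the same table can probably be built with the standard library alone. The following is only a sketch under some assumptions: the namespace URI is copied from the question's XML, the SAMPLE string is a shortened stand-in for the real file, and the helper names extract_rows and rows_to_csv are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace URI taken from the question's XML; the "om" prefix is arbitrary.
NS = {"om": "urn:nortel:namespaces:mcp:oms"}

# Shortened stand-in for OM.xml (an assumption, not the full file).
SAMPLE = """\
<omGroups xmlns="urn:nortel:namespaces:mcp:oms">
  <group>
    <name>Ports</name>
    <row>
      <class>PortsOMRow</class>
      <usage name="rtpMpPortUsage"><measures>
          BCP port usage
      </measures></usage>
      <lwGauge name="connMapEntriesLWM"><measures>Lowest simultaneous port usage</measures></lwGauge>
    </row>
  </group>
</omGroups>
"""


def extract_rows(xml_text):
    """Return [group:register, register type, measures] for each named register."""
    root = ET.fromstring(xml_text)
    rows = []
    for group in root.findall("om:group", NS):
        group_name = group.findtext("om:name", default="", namespaces=NS).strip()
        for row in group.findall("om:row", NS):
            for register in row:
                # Only children carrying a name attribute are registers;
                # package, class, precollector and the like are skipped.
                if "name" not in register.attrib:
                    continue
                # Drop the "{namespace}" prefix ElementTree puts on tags.
                tag = register.tag.split("}", 1)[-1]
                measures = register.findtext("om:measures", default="", namespaces=NS)
                rows.append([
                    f"{group_name}:{register.get('name')}",
                    tag,
                    " ".join(measures.split()),  # collapse newlines and indentation
                ])
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text with the header used in the answer."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["General name:SpecificName", "RegisterType", "Measures"])
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv(extract_rows(SAMPLE)))
```

To run it against the real file, the root could be obtained with ET.parse('OM.xml').getroot() instead of ET.fromstring, and the CSV text written to a file opened with newline=''.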


6 Comments

I have another question if you don't mind. If I would also like to make a column for the measures tag that is nested, how would I do that?
@marcorivera8 You can use item.find('measures').text and then write it to csv.
I did writer.writerow([item.find_previous('name').text + ':' + item['name'], item.name, item.find('measures').text]) in the same for loop, but it didn't work.
Hi @Andrej Kesely, I updated the post with another question if you could take a look at it whenever you have time. Thanks
@marcorivera8 Can you post the update as a new question here on SO (so as not to clutter the comment section and the answered question)? You can notify me afterwards and I'll look at it.