
I have an XML file that is used as the index for a dynamic panel, with this structure:

<?xml version="1.0"?>
<Addons>
  <line>
    <text>My collection 1</text>
    <level>3</level>
    <comment/>
    <file>Collection1.zip</file>
    <size>120</size>
    <parent>Collection</parent>
    <directory>picture</directory>
    <type>jpeg</type>
    <filedate>22/05/2014</filedate>
  </line>
  <line>
    <text>My collection 2</text>
    <level>3</level>
    <comment/>
    <file>Collection2.zip</file>
    <size>880</size>
    <parent>Collection</parent>
    <directory>picture</directory>
    <type>jpeg</type>
    <filedate>01/04/2013</filedate>
  </line>
</Addons>

My panel uses this file as its build index. When I update the files on the server, I have to update the filedate element by hand, but the file has 80 lines, which makes this tedious.

Is there a way to do this with a script? The sequence would be:

  • parse the file line by line
  • read the file name from the <file> element
  • get the last-modification date of the file
  • update the <filedate> element if present
  • go to next line

Shell scripts and Python are available on the server.

Thanks!!

    The proper tool would be an XML processor. Do you have xsltproc or similar? Commented May 28, 2014 at 20:55

3 Answers


Using the xmlstarlet command line tool:

xmlstarlet sel -t -v '//file' -n file.xml |
while IFS= read -r filename; do
    filedate=$(date -d "@$(stat -c %Y "$filename")" +%d/%m/%Y)
    xmlstarlet ed --inplace -u "//filedate[../file = '$filename']" -v "$filedate" file.xml
done

2 Comments

+1; delightfully concise; however, you probably meant file.xml instead of xml.xml in the 2nd xmlstarlet invocation; also, to get the xmlstarlet sel command (1st invocation) to terminate its output with \n, I had to insert option -n after '//file' - without it, read wouldn't pick up the last filename. Finally, a heads-up: implies use of GNU date and stat (probably a fair assumption on a server). xmlstarlet can be found at xmlstar.sourceforge.net
@user3685289, I would highly recommend using the date format %Y-%m-%d -- anything else is ambiguous.
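
To illustrate the ambiguity mentioned in the comment above, here is a minimal sketch. It takes one of the dates from the question's sample and parses it under two assumed conventions (day first and month first); both succeed and yield different dates, whereas the %Y-%m-%d form leaves no room for that.

```python
from datetime import datetime

# A date taken from the sample XML in the question.
raw = "01/04/2013"

# Both interpretations parse without error, but they disagree.
as_day_first = datetime.strptime(raw, "%d/%m/%Y")    # 1 April 2013
as_month_first = datetime.strptime(raw, "%m/%d/%Y")  # 4 January 2013

# The ISO-style form is unambiguous and also sorts correctly as plain text.
iso = as_day_first.strftime("%Y-%m-%d")
print(as_day_first.date(), as_month_first.date(), iso)
```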

Here is a little Python script that does that for you:

#!/usr/bin/python

import re
from datetime import date

# Note: this stamps today's date, not each file's modification time.
today = date.today().strftime("%d/%m/%Y")
replacement = '<filedate>' + today + '</filedate>'

# Copy the input line by line, replacing any <filedate> element
# that sits entirely on a single line.
with open("input.xml", "r") as input_file, open("output.xml", "w") as output_file:
    for line in input_file:
        updated_line = re.sub(r'<filedate>.*?</filedate>', replacement, line)
        output_file.write(updated_line)

3 Comments

This works for the sample input file as formatted, but what if the filedate element is spread across multiple lines? Use of an XML processor avoids such issues.
Sure, this is not a general solution for XML parsing. But I think it does the job for the author. And why would you split such short information like a date into multiple lines?
SO is as much about the general as it is about the specific. If you're parsing a format that is not line-oriented line by line, it makes your solution fragile (less general). If you decide to go with this approach (for a simpler, quicker solution), its limitations are worth noting - which is what I did in my comment, but doing so in your answer is preferable. Also note that your solution is not retrieving the last-modified time stamp from the files referenced (another thing worth mentioning, if you decide not to implement that part).
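
The fragility discussed in these comments can be shown with a small sketch. The input below is hypothetical: it is the question's structure with the filedate element wrapped across lines. The line-by-line regex from the answer above finds nothing to replace, while an XML parser reads the value regardless of the line breaks.

```python
import re
from xml.etree import ElementTree

# Hypothetical input: a <filedate> element spread over several lines.
xml_text = """<Addons>
  <line>
    <file>Collection1.zip</file>
    <filedate>
      22/05/2014
    </filedate>
  </line>
</Addons>"""

pattern = re.compile(r'<filedate>.*?</filedate>')

# Line-by-line regex: no single line holds both tags, so nothing matches.
regex_hits = sum(1 for line in xml_text.splitlines() if pattern.search(line))

# XML parser: line breaks and indentation inside the element do not matter.
root = ElementTree.fromstring(xml_text)
parsed_dates = [elem.text.strip() for elem in root.iter('filedate')]

print(regex_hits, parsed_dates)
```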

A Python solution that uses an XML parser (xml.etree) to robustly process the input:

#!/usr/bin/env python

import os
import datetime
from xml.etree import ElementTree

inFile = "in.xml"
outFile = "out.xml"

# Parse the entire document.
tree = ElementTree.parse(inFile)

# Loop over the <line> elements
for elem in tree.iter('line'):

  # Get the child element [values] of interest.
  fname = elem.find('file').text.strip()
  elemFDate = elem.find('filedate')

  # Determine the new modification date.
  # Note: As @glenn jackman notes, %Y-%m-%d would be a better date format to use.
  # %Y (four-digit year) matches the existing values such as 22/05/2014.
  newDate = datetime.datetime.fromtimestamp(os.path.getmtime(fname)) \
    .strftime("%d/%m/%Y")

  # Update the date element.
  elemFDate.text = newDate


# Write the output file.
tree.write(outFile)

Note:

  • The entire document is read and written at once, which is only appropriate for smaller XML files.
  • There is no error handling - all relevant elements and the files referenced are assumed to exist.
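
If the missing error handling is a concern, a variant along the following lines might serve as a starting point. It is only a sketch: the function name update_filedates and the base_dir parameter are made up for illustration, and it assumes the file names in the index are relative to a single directory. Entries without a usable file or filedate element are left alone, and files that cannot be found keep their old date and are reported back.

```python
import datetime
import os
from xml.etree import ElementTree


def update_filedates(in_path, out_path, base_dir="."):
    """Refresh <filedate> in each <line>; return the file names that were skipped.

    Hypothetical helper: base_dir is assumed to contain the referenced files.
    """
    tree = ElementTree.parse(in_path)
    skipped = []
    for line in tree.iter("line"):
        file_elem = line.find("file")
        date_elem = line.find("filedate")
        # Leave entries alone if either element is missing or <file> is empty.
        if file_elem is None or date_elem is None:
            continue
        name = (file_elem.text or "").strip()
        if not name:
            continue
        try:
            mtime = os.path.getmtime(os.path.join(base_dir, name))
        except OSError:
            # The referenced file is missing or unreadable: keep the old date.
            skipped.append(name)
            continue
        stamp = datetime.datetime.fromtimestamp(mtime)
        date_elem.text = stamp.strftime("%d/%m/%Y")
    # Write the XML declaration explicitly so the output keeps its header.
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
    return skipped
```

Calling update_filedates("in.xml", "out.xml") would then return a list of the names whose files were not found, which can be printed or logged.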
