$ ./a.py b.xml

This works. a.py reads the files and prints something.

a.py reads its arguments like this:

import sys

# Each argument is a file
args = sys.argv[1:]

# Loop on files
for filename in args:

    # Open the file
    file = open(filename)

I want to pipe the output to other scripts.

$ ./a.py b.xml | grep '1)'

This gives a Python error.


This also fails:

$ x=$(./a.py b.xml); echo $x...

How do I tell Python not to interpret shell syntax such as |, $() or ``?


The error is

Traceback (most recent call last):
  File "./flattenXml.py", line 135, in <module>
    process(file, prefix)
  File "./flattenXml.py", line 116, in process
    linearize(root, prefix + "//" + removeNS(root.tag))
  File "./flattenXml.py", line 104, in linearize
    linearize(childEl, path + '/' + numberedTag)
  File "./flattenXml.py", line 104, in linearize
    linearize(childEl, path + '/' + numberedTag)
  File "./flattenXml.py", line 104, in linearize
    linearize(childEl, path + '/' + numberedTag)
  File "./flattenXml.py", line 104, in linearize
    linearize(childEl, path + '/' + numberedTag)
  File "./flattenXml.py", line 104, in linearize
    linearize(childEl, path + '/' + numberedTag)
  File "./flattenXml.py", line 104, in linearize
    linearize(childEl, path + '/' + numberedTag)
  File "./flattenXml.py", line 83, in linearize
    print path + "/@" + removeNS(name) + "=" + val
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 106: ordinal not in range(128)

The Python script is from Python recipes.

  • Please post the exact error you see. Commented Oct 3, 2013 at 19:53
  • Python never sees the shell syntax, the shell processes it transparently to the program. What error are you getting? Commented Oct 3, 2013 at 19:53
  • We cannot fix an approximation by guesses, we need the exact error message. The script itself seems fine to me. Commented Oct 3, 2013 at 19:57
  • You're still not saying what "fail" means. The above works, except that you are not using "$x" in the echo. Commented Oct 3, 2013 at 19:57
  • Please, copy the error you see in the console and paste it here. What does "fails" mean, exactly? We need the error message that gets printed in your console, if any. Commented Oct 3, 2013 at 19:58

1 Answer


The problem is that your document has non-ASCII characters that can't be printed to an ASCII output stream.

Internally, Python can handle any Unicode character, but when that character is serialized, Python needs to know which representation to use (UTF-8, UTF-16 or any of a zillion international character encodings) so that it can write the correct bits.

When run in a console, Python can get the terminal's encoding (mine happens to be en_US.UTF-8) and set up an encoder for sys.stdout properly. When piping stdout to another program or redirecting stdout to a file, Python doesn't know what to do and falls back to the ASCII encoder for sys.stdout.

So in a console, the encoder usually knows how to convert the character to the right bits for your terminal and you get a nice display. When piped, the ASCII encoder can't handle the character (u'\xa0', a no-break space, in your traceback) and throws an error.
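To make the failure concrete, here is a minimal sketch that reproduces the same encoding error without any pipe involved. The sample string and the helper name try_encode are made up for illustration; only the character u'\xa0' is taken from the traceback in the question.

```python
# Hypothetical sample text containing U+00A0 (no-break space),
# the character named in the question's traceback.
text = u"price:\xa0100"


def try_encode(value, encoding):
    """Return the encoded bytes, or None if the codec cannot represent value."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError:
        # This is the same exception type the piped script raises.
        return None


ascii_result = try_encode(text, "ascii")   # the ASCII codec has no mapping for U+00A0
utf8_result = try_encode(text, "utf-8")    # UTF-8 can represent any Unicode character
```

The ASCII attempt fails for the same reason the piped script does, while the UTF-8 attempt succeeds and produces a two-byte sequence for the no-break space.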

One solution is to encode everything to utf-8 before writing to stdout.

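import sys
# sys.stdout.encoding is None when stdout is a pipe, so fall back to utf-8
encoding = sys.stdout.encoding or 'utf-8'

...
# Encode explicitly so print receives a byte string (Python 2 print statement)
print (path + "/@" + removeNS(name) + "=" + val).encode(encoding)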

Here, the text is already encoded to UTF-8 bytes by the time print sees it, so it passes through sys.stdout untouched and makes it to the other side. It's an open question whether the program on the other side can handle UTF-8.
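If there are many print statements, encoding each one by hand gets tedious. One alternative, sketched here under assumptions, is to wrap the output stream once with the standard library's codecs.getwriter. The helper name utf8_writer is hypothetical, and an in-memory io.BytesIO buffer stands in for the pipe so the sketch can be run anywhere.

```python
import codecs
import io


def utf8_writer(byte_stream):
    """Wrap a byte stream so that Unicode text written to it is encoded as UTF-8."""
    return codecs.getwriter("utf-8")(byte_stream)


# An in-memory byte buffer stands in for a piped stdout in this sketch.
pipe = io.BytesIO()
out = utf8_writer(pipe)

# Writing text with a non-ASCII character no longer raises UnicodeEncodeError.
out.write(u"a\xa0b\n")
written = pipe.getvalue()
```

In a Python 2 script like the one in the question, the wrapper would be installed near the top with sys.stdout = utf8_writer(sys.stdout); on Python 3 the underlying byte stream is sys.stdout.buffer. Setting the PYTHONIOENCODING=utf-8 environment variable before running the script is another way to get a similar effect without touching the code.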
