When I run a UTF-8 encoded Windows batch file from Python 2.7 on Windows 7, the first command in the batch file is not recognized (see the example below).

Most likely, the BOM is being interpreted as characters. How can I make the underlying shell run the batch file properly?

The batch file I actually need to call comes from a third party. Here is a simple Python script that reproduces the problem:

import codecs
import subprocess

content = "@echo off"

# Written with a UTF-8 BOM at the start of the file
with codecs.open('test_utf8.bat', 'w', 'utf-8-sig') as f:
    f.write(content)

# Written in the default (ANSI) encoding, without a BOM
with open('test_ansi.bat', 'w') as f:
    f.write(content)

print "Calling test_ansi.bat"
subprocess.call('test_ansi.bat', shell=True)

print "Calling test_utf8.bat"
subprocess.call('test_utf8.bat', shell=True)

print "Done"

Running the script gives the following output:

t:\tmp\test>python test.py
Calling test_ansi.bat
Calling test_utf8.bat

t:\tmp\test>´╗┐@echo off
'´╗┐@echo' is not recognized as an internal or external command,
operable program or batch file.
Done

t:\tmp\test>

As a note, the shell parameter doesn't seem to have any effect.

1 Answer

OK. I'll leave aside your reasons for using Python to create batch files and run them externally instead of doing the work in Python, and also your reasons for wanting those batch files in UTF-8 rather than the native encoding of Windows or of the DOS console (it is not uncommon for the two to differ).

Here is the fix: just encode to "utf-8", not "utf-8-sig". The latter is not a separate official encoding but a variant that prepends marker bytes (a BOM), which helps the file open correctly in Windows Notepad. From the Python documentation (http://docs.python.org/2/library/codecs.html):

""" To increase the reliability with which a UTF-8 encoding can be detected, Microsoft invented a variant of UTF-8 (that Python 2.5 calls "utf-8-sig") for its Notepad program: Before any of the Unicode characters is written to the file, a UTF-8 encoded BOM (which looks like this as a byte sequence: 0xef, 0xbb, 0xbf) is written. As it's rather improbable that any charmap encoded file starts with these byte values [...] """

To various other applications (including, as you can see, Microsoft's own cmd), those extra bytes are garbage.
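To make the difference concrete, here is a minimal sketch that compares the two codecs at the byte level. The cp850 decoding step is an assumption about the console code page used in the question; other code pages would render the same three bytes as different characters.

import codecs

content = u"@echo off"

plain = content.encode("utf-8")       # no BOM
signed = content.encode("utf-8-sig")  # BOM followed by the same bytes

# The only difference is the three-byte BOM at the start.
bom_only = signed[:len(codecs.BOM_UTF8)]
same_payload = signed[len(codecs.BOM_UTF8):] == plain

# Assumption: the console uses code page 850. Decoding the BOM bytes with
# that code page yields the three stray characters cmd shows before "@echo".
stray = codecs.BOM_UTF8.decode("cp850")

Because cmd reads the file byte by byte in the console code page, it treats those three characters as part of the first command name, which is why '@echo' is not recognized.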

In short: encode to "utf-8". If you want to edit the generated files on Windows, use a proper editor rather than Notepad, which has remained mostly unchanged since the Windows 3.0 days. (I wonder whether it can open files larger than 64 kB nowadays.)
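If the batch file comes from a third party and is already saved with a BOM, one possible workaround is to strip the BOM before calling it. The sketch below is only an illustration: strip_utf8_bom and the demo file name are hypothetical, and it assumes you are allowed to rewrite the file in place.

import codecs
import io
import os
import tempfile


def strip_utf8_bom(path):
    """Rewrite the file without a leading UTF-8 BOM; return True if one was removed."""
    with io.open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # nothing to do, leave the file untouched
    with io.open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True


# Demo on a throwaway file (hypothetical name) in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_utf8.bat")
with io.open(demo_path, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"@echo off\r\n")

removed = strip_utf8_bom(demo_path)
with io.open(demo_path, "rb") as f:
    cleaned = f.read()

After this step the file can be passed to subprocess.call as in the question. Working on the raw bytes avoids any guess about how the rest of the file is encoded.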

1 Comment

OK, thanks. That solves my test case. It turns out the batch file produced the same error when run directly from the command line, as did my other external script.
