How to decode base64 file into binary in Python?

Question

I'm building a system which handles pdf file data (for which I use the PyPDF2 lib). I now obtain a base64 encoded PDF which I can decode and store correctly using the following:

import base64
# base64FileData  <= the base64 file data
fileData = base64.urlsafe_b64decode(base64FileData.encode('UTF-8'))
with open('thefilename.pdf', 'w') as theFile:
    theFile.write(fileData)

I now want to use this fileData as a binary file to split it up, but when I do type(fileData), the fileData turns out to be a <type 'str'>. How can I convert this fileData to be a binary (or at least not a string)?

All tips are welcome!

[EDIT]

if I do open(fileData, 'rb') I get an error, saying

TypeError: file() argument 1 must be encoded string without NULL bytes, not str

To remove the null bytes I tried, fileData.rstrip(' \t\r\n\0') and fileData.rstrip('\0') and fileData.partition(b'\0')[0], but nothing seems to work. Any ideas?

[EDIT2]

The thing is that I pass this string to the PyPDF2 PdfFileReader class, which on lines 909 to 912 does the following (in which stream is the fileData I provide):

if type(stream) in (string_type, str):
    fileobj = open(stream, 'rb')
    stream = BytesIO(b_(fileobj.read()))
    fileobj.close()

So because it's a string, it assumes it is a filename, after which it tries to open the file. This then fails with a TypeError. So before feeding the fileData to the PdfFileReader I need to somehow convert it to something other than str so that it doesn't try to open it, but just considers fileData a file in itself. Any ideas?

Concerning your edit: The first parameter of open has to be the filename, not the content of your file. I guess you are using Python 2; str is just an alias for bytes in this version.
– halex, Commented Nov 19, 2014 at 12:53
@halex - I added EDIT2 to my question. I'm starting to get more and more what the problem is. The main problem is that the test if type(fileData) == str succeeds, through which the system thinks it's a filename instead of a file. Any ideas how I could convert the fileData so that it passes the if type(fileData) == str test?
– kramer65, Commented Nov 19, 2014 at 13:10

Cake · Accepted Answer · 2014-11-19 12:42:59Z

3

Use open's binary mode: you have to pass 'wb', otherwise the data basically gets saved as "text".

import base64
# base64FileData  <= the base64 file data
fileData = base64.urlsafe_b64decode(base64FileData.encode('UTF-8'))
with open('thefilename.pdf', 'wb') as theFile:
    theFile.write(fileData)
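
To check that binary mode preserves the data, here is a minimal round-trip sketch, assuming Python 3. The payload and the temporary directory are made up for illustration and stand in for the real PDF data. On Python 3, writing bytes to a file opened with 'w' raises a TypeError; on Python 2 under Windows, text mode can translate newline bytes.

```python
import base64
import os
import tempfile

# Hypothetical payload standing in for the real PDF content. It contains
# newline and NUL bytes, which are the ones text-mode handling can affect.
payload = b"%PDF-1.4\r\n\x00\x01\x02 binary body\n%%EOF\n"
base64_file_data = base64.urlsafe_b64encode(payload).decode("ascii")

# Same decoding call as in the answer above.
file_data = base64.urlsafe_b64decode(base64_file_data.encode("UTF-8"))

with tempfile.TemporaryDirectory() as tmp_dir:
    pdf_path = os.path.join(tmp_dir, "thefilename.pdf")

    # 'wb' writes the bytes exactly as they are.
    with open(pdf_path, "wb") as the_file:
        the_file.write(file_data)

    # 'rb' reads them back without any decoding or newline translation.
    with open(pdf_path, "rb") as the_file:
        read_back = the_file.read()

matches = read_back == payload
print(matches)
```

If the comparison is true, the file on disk is byte-for-byte the decoded data.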

1 Comment

kramer65 Over a year ago

I guess my question was not clear; I need to have fileData (the variable) as binary, not thefilename.pdf. I added some more information to my question what I tried. Would you have any idea?

Muhammad Subair · Accepted Answer · 2020-12-06 21:07:28Z

2

For example, suppose your input data came from this:

with open(local_image_path, "rb") as imageFile:
    str_image_data = base64.b64encode(imageFile.read())

Then, to get the binary data in a variable, you can try:

import io
import base64

binary_image_data = io.BytesIO(base64.decodebytes(str_image_data))
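
As a rough illustration of why this helps with the problem in the question, the sketch below (assuming Python 3, with a made-up payload) shows that the BytesIO wrapper behaves like an open binary file and is not a str. The snippet quoted in EDIT2 only opens the argument as a filename when it is a string, so a BytesIO object should skip that branch. That is an inference from the quoted code, not something tested against PyPDF2 here.

```python
import base64
import io

# Hypothetical stand-in for the real file content.
original_bytes = b"%PDF-1.4\n\x00\x01 example payload with a NUL byte\n%%EOF\n"
encoded = base64.b64encode(original_bytes)

# Decode, then wrap the bytes in an in-memory binary stream.
decoded = base64.b64decode(encoded)
stream = io.BytesIO(decoded)

# The stream supports the usual file methods: read, seek, tell.
header = stream.read(8)
stream.seek(0)
round_tripped = stream.read()

# It is a file-like object, not a string, so a type(...) == str check fails.
is_plain_str = isinstance(stream, str)
print(header, is_plain_str)
```

Remember to call seek(0) before handing the stream to another reader if you have already read from it.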
