
I'm building a system that handles PDF file data (for which I use the PyPDF2 library). I now receive a base64-encoded PDF, which I can decode and store correctly using the following:

import base64
# base64FileData  <= the base64 file data
fileData = base64.urlsafe_b64decode(base64FileData.encode('UTF-8'))
with open('thefilename.pdf', 'w') as theFile:
    theFile.write(fileData)

I now want to use this fileData as binary data to split it up, but when I check type(fileData), it turns out to be a <type 'str'>. How can I convert fileData to binary data (or at least to something other than a string)?
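For context: in Python 2, `str` is the byte-string type, so a `<type 'str'>` result already holds the raw binary PDF data. In Python 3 the same decode call returns `bytes`. A minimal sketch, using made-up placeholder bytes instead of a real PDF:

```python
import base64

# Placeholder bytes standing in for a real PDF (hypothetical sample data).
original = b"%PDF-1.4\n\x00\x01\x02 binary payload"
encoded = base64.urlsafe_b64encode(original).decode("ascii")

# Decoding gives back the raw bytes; on Python 3 this is `bytes`,
# while on Python 2 the same object reports <type 'str'>.
file_data = base64.urlsafe_b64decode(encoded.encode("utf-8"))
print(type(file_data))
```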

All tips are welcome!

[EDIT]

If I do open(fileData, 'rb'), I get an error saying:

TypeError: file() argument 1 must be encoded string without NULL bytes, not str

To remove the null bytes, I tried fileData.rstrip(' \t\r\n\0'), fileData.rstrip('\0') and fileData.partition(b'\0')[0], but nothing seems to work. Any ideas?
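The null bytes are a normal part of binary PDF data rather than noise to clean off. The actual problem is that the data is passed where a filename is expected. A small sketch with hypothetical sample bytes shows why the stripping attempts cannot help:

```python
# Hypothetical sample: NUL bytes appear inside the data, not only at the end.
file_data = b"%PDF-1.4\n\x00\x01stream\x00data\x00\x00"

stripped = file_data.rstrip(b"\0")          # removes only the trailing NULs
first_part = file_data.partition(b"\0")[0]  # keeps only what precedes the first NUL

print(b"\0" in stripped)  # interior NULs survive rstrip
print(first_part)         # everything after the first NUL is discarded
```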

[EDIT2]

The thing is that I pass this string to the PyPDF2 PdfFileReader class, which on lines 909 to 912 does the following (where stream is the fileData I provide):

if type(stream) in (string_type, str):
    fileobj = open(stream, 'rb')
    stream = BytesIO(b_(fileobj.read()))
    fileobj.close()

Because it's a string, PdfFileReader assumes it is a filename and tries to open the file, which then fails with a TypeError. So before feeding fileData to PdfFileReader, I need to convert it to something other than str, so that it doesn't try to open it but treats fileData as the file contents itself. Any ideas?
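A common way out is to wrap the decoded data in an in-memory file object, `io.BytesIO`, which is not a `str` and is therefore used as a stream instead of being opened as a path. The sketch below does not call PyPDF2: `resolve_stream` is a hypothetical function that copies only the type check quoted above, and `string_type = str` is an assumed stand-in for PyPDF2's compatibility alias.

```python
import io

string_type = str  # assumed stand-in for PyPDF2's compat alias


def resolve_stream(stream):
    """Hypothetical stand-in mirroring the check quoted from PdfFileReader."""
    if type(stream) in (string_type, str):
        # A str is treated as a path and read from disk.
        with open(stream, "rb") as fileobj:
            return io.BytesIO(fileobj.read())
    # Anything else is assumed to already be a readable binary stream.
    return stream


file_data = b"%PDF-1.4\n\x00 placeholder bytes"  # hypothetical decoded PDF data
stream = resolve_stream(io.BytesIO(file_data))
print(stream.read() == file_data)
```

With PyPDF2 itself, the corresponding call would presumably be `PdfFileReader(io.BytesIO(fileData))`.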

  • Concerning your edit: the first parameter of open has to be the filename, not the content of your file. I guess you are using Python 2, where str is just an alias for bytes. Commented Nov 19, 2014 at 12:53
  • @halex - I added EDIT2 to my question. I'm starting to understand better what the problem is. The main problem is that the test if type(fileData) == str succeeds, so the system thinks it's a filename instead of a file. Any ideas how I could convert fileData so that it no longer passes the type(fileData) == str test? Commented Nov 19, 2014 at 13:10

2 Answers


Because you are writing binary data, you have to open the file in binary mode with 'wb'; otherwise it is written in text mode.

import base64
# base64FileData  <= the base64 file data
fileData = base64.urlsafe_b64decode(base64FileData.encode('UTF-8'))
with open('thefilename.pdf', 'wb') as theFile:
    theFile.write(fileData)
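A quick way to check the fix is to write the decoded bytes with 'wb' and read them back with 'rb'. On Python 3, opening with 'w' raises a TypeError when given bytes; on Python 2 under Windows, text mode translates '\n' to '\r\n' and corrupts the PDF. The sample payload and temporary path below are placeholders:

```python
import base64
import os
import tempfile

# Hypothetical base64 payload; in the question it comes from elsewhere.
base64FileData = base64.urlsafe_b64encode(b"%PDF-1.4\n\x00\r\n\x1a bytes").decode("ascii")
fileData = base64.urlsafe_b64decode(base64FileData.encode("utf-8"))

path = os.path.join(tempfile.mkdtemp(), "thefilename.pdf")
# 'wb' writes the bytes unchanged: no newline translation, no text encoding.
with open(path, "wb") as theFile:
    theFile.write(fileData)

with open(path, "rb") as theFile:
    print(theFile.read() == fileData)
```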

1 Comment

I guess my question was not clear; I need fileData (the variable) itself as binary, not thefilename.pdf. I added some more information to my question about what I tried. Would you have any idea?

For example, if your input data came from this:

import base64
with open(local_image_path, "rb") as imageFile:
    str_image_data = base64.b64encode(imageFile.read())

Then, to get the binary data into a variable, you can try:

import io
import base64

# Decode back to raw bytes and wrap them in an in-memory binary file object
binary_image_data = io.BytesIO(base64.decodebytes(str_image_data))
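One caveat when combining this answer with the question: `base64.decodebytes` expects the standard base64 alphabet, while the question decodes with `urlsafe_b64decode`, whose alphabet uses '-' and '_' in place of '+' and '/'. A minimal sketch, where `b64_to_stream` is a hypothetical helper name, picks the matching decoder before wrapping the result in `io.BytesIO`:

```python
import base64
import io


def b64_to_stream(b64_data, urlsafe=True):
    """Decode base64 text into an in-memory binary file (hypothetical helper)."""
    if isinstance(b64_data, str):
        b64_data = b64_data.encode("ascii")
    # The question's data uses the URL-safe alphabet ('-' and '_'),
    # so it needs urlsafe_b64decode rather than the standard decoder.
    decode = base64.urlsafe_b64decode if urlsafe else base64.b64decode
    return io.BytesIO(decode(b64_data))


sample = bytes(range(256))  # placeholder binary data covering every byte value
encoded = base64.urlsafe_b64encode(sample)
print(b64_to_stream(encoded).read() == sample)
```

The returned object is a file-like stream rather than a `str`, so it can be handed to code that reads from a stream.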
