1

I need to use both compression and encryption in a project. There are two programs in the project.

In the first program, an ASCII text file is first compressed and then encrypted. Further operations follow on this encrypted version of the file. A second program in the project follows the reverse process, i.e. it first decrypts and then decompresses to get the original ASCII text file.

I've implemented the encryption module (AES via OpenSSL) and it works fine. But when I looked for compression options on Linux, I found that gzip, zlib, etc. produce their own versions of the file, i.e. filename.gz or some other extension, whose contents are not purely ASCII. (For instance, I see diamond-shaped symbols when I view the output in the terminal.) Because of this, I'm unable to read the compressed file completely in my C program.

So in short, I require a compressed file which contains only ASCII characters. Is this possible by any means?

7
  • 1
    Your encryption algorithm doesn't emit ASCII-only chars (thank god). Why such a limitation on your compressor? And since when can you not read a binary file in a C program? Commented Oct 23, 2012 at 14:38
  • 2
    You could postprocess the compressed output with base64 or a comparable encoding, but that costs you 25% efficiency (it's a 6-bits-in-8 encoding). It might be better to solve the problem that prevents your program from reading binary-encoded files. Commented Oct 23, 2012 at 14:38
  • 1
    By using an 'ASCII-only' representation of a compressed file, you'd undo most of the benefits of the compression. The ASCII-only representation would occupy more space than the binary representation. To read the file, use 'binary mode' (a no-op on Unix-like platforms, crucial on Windows). And don't use string manipulation functions on the binary data; there will be null bytes in the data (compressed or encrypted or both). Commented Oct 23, 2012 at 14:46
  • @fvu: 33%? Base-64 grows by a third (requires 4 bytes out for each 3 in). Commented Oct 23, 2012 at 14:47
  • @JonathanLeffler Same thing, different perspective: 4/3 = 133% for 33% overhead, but 3/4 = 75% and 1 - 75% = 25% efficiency loss... Commented Oct 23, 2012 at 14:53
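
The 25% and 33% figures in the comments above describe the same expansion measured against different baselines. A minimal sketch of the arithmetic, assuming padded Base64 without line breaks (the function names are hypothetical, chosen only for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Number of Base64 characters for n input bytes, assuming '=' padding and
   no line breaks: every 3-byte group (the last one padded) becomes 4 chars. */
size_t base64_encoded_len(size_t n)
{
    return ((n + 2) / 3) * 4;
}

/* Growth relative to the input size, in percent: (out - in) / in. */
double base64_growth_percent(size_t n)
{
    assert(n > 0);
    return 100.0 * (double)(base64_encoded_len(n) - n) / (double)n;
}

/* Share of the encoded output that carries no payload, in percent:
   (out - in) / out. */
double base64_overhead_share_percent(size_t n)
{
    assert(n > 0);
    return 100.0 * (double)(base64_encoded_len(n) - n)
                 / (double)base64_encoded_len(n);
}
```

For a 300-byte input the encoded length is 400 characters, so the file grows by about 33% while 25% of the encoded output is overhead.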

2 Answers

2

Finally resolved the issue. The program is handling everything correctly.

On the sending side:

compression: gzip -9 -c secret.txt > compressed.txt.gz
encryption: openssl enc -aes-256-cbc -a -salt -in compressed.txt.gz -out encrypted.txt

The compressed output (.gz) is given as input to the encryption step. Because of the -a option, openssl Base64-encodes the ciphertext, so the resulting file is purely ASCII.

On the receiving side:

decryption: openssl enc -d -aes-256-cbc -a -in decryptme.txt -out decrypted.txt.gz
decompression: gunzip -c decrypted.txt.gz > message.txt
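
To illustrate why the -a option yields ASCII-only output, here is a minimal sketch of standard Base64 encoding (RFC 4648 alphabet) in C. It is not OpenSSL's implementation and it does not insert the line breaks that openssl adds; the function name base64_encode is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Standard Base64 alphabet (RFC 4648). */
static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode n bytes from src into dst as a NUL-terminated Base64 string.
   dst must have room for 4 * ((n + 2) / 3) + 1 chars.
   Returns the number of characters written, excluding the terminator. */
size_t base64_encode(const unsigned char *src, size_t n, char *dst)
{
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL);

    while (i + 2 < n) {                 /* full 3-byte groups */
        unsigned long v = ((unsigned long)src[i] << 16)
                        | ((unsigned long)src[i + 1] << 8)
                        | (unsigned long)src[i + 2];
        dst[o++] = B64[(v >> 18) & 63];
        dst[o++] = B64[(v >> 12) & 63];
        dst[o++] = B64[(v >> 6) & 63];
        dst[o++] = B64[v & 63];
        i += 3;
    }
    if (i < n) {                        /* 1 or 2 leftover bytes, padded with '=' */
        unsigned long v = (unsigned long)src[i] << 16;
        if (i + 1 < n)
            v |= (unsigned long)src[i + 1] << 8;
        dst[o++] = B64[(v >> 18) & 63];
        dst[o++] = B64[(v >> 12) & 63];
        dst[o++] = (i + 1 < n) ? B64[(v >> 6) & 63] : '=';
        dst[o++] = '=';
    }
    dst[o] = '\0';
    return o;
}
```

Encoding the four bytes 0x1f 0x8b 0x08 0x00 (a typical start of a gzip file) gives "H4sIAA==", which contains only printable ASCII even though the input includes a null byte.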

0

You can add a uuencode / uudecode filter between compression and encryption, or you might want to loosen the restriction that the compressed data be in ASCII form. Options:

  • read binary data in your C program (e.g. char buffer[256]; c = fread(buffer, 1, 256, stdin); )
  • convert the data to hexadecimal format
    static char encrypted_file[]={ 0x01,0x6e, ... };
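
Both options can be sketched in C as follows. This is only an illustration: the helper names read_binary and bytes_to_hex_list are hypothetical, and error handling is kept minimal. The key points are opening the file in binary mode and tracking the byte count returned by fread instead of relying on string functions, since the data may contain null bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Read up to cap bytes from path in binary mode ("rb").
   Returns the number of bytes read, or 0 if the file cannot be opened. */
size_t read_binary(const char *path, unsigned char *buf, size_t cap)
{
    FILE *fp;
    size_t n;

    assert(path != NULL && buf != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    n = fread(buf, 1, cap, fp);   /* byte count, not a string length */
    fclose(fp);
    return n;
}

/* Format n bytes as "0x01,0x6e,..." into out (NUL-terminated).
   out must hold at least 5 * n + 1 chars. Returns the string length. */
size_t bytes_to_hex_list(const unsigned char *buf, size_t n, char *out)
{
    size_t i, o = 0;

    assert(buf != NULL && out != NULL);
    for (i = 0; i < n; i++)
        o += (size_t)sprintf(out + o, i ? ",0x%02x" : "0x%02x", buf[i]);
    out[o] = '\0';
    return o;
}
```

The text produced by bytes_to_hex_list can be pasted between the braces of an array initializer such as the encrypted_file example above.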

6 Comments

Reading binary data even from stdin is an option
uuencode is a ghastly format because it can include spaces. Use Base-64 encoding instead; it learns from uuencode and avoids the mistakes that uuencode made.
You are right. I just used it as an example of a reversible process that strips the MSB. Also, for security purposes it's a bad candidate because it has lots of known plaintext.
@AkiSuihkonen Neither uuencode nor base64 encoding are intended to have any security implication whatsoever. They're both just simple encodings to use a reduced code set for transmission purposes (e.g. 8-bit unclean channels). base64 has as much "known plain-text" as uuencode, as does Huffman coding, arithmetic coding or any other algorithm whose sole purpose is to re-encode one set of code words into a different alphabet or structure (with apologies to the definitions that I just stretched way out of shape there).
uuencode has a structure: it typically begins with 'begin 644' and ends with a backquote; it also contains periodic sequences of CR + ASCII 'M'.
