ASCII output of string compression in C

Question

I need to use both compression and encryption in a project. There are two programs in the project.

In the first program, an ASCII text file is first compressed and then encrypted. Further operations follow on this encrypted version of the file. A second program in the project performs the reverse process, i.e. it first decrypts and then decompresses to recover the original ASCII text file.

I've implemented the encryption module (AES via OpenSSL) and it works fine. But when I looked for compression options on Linux, I found that gzip, zlib etc. produce their own version of the file, i.e. filename.gz or some other extension, whose contents are not purely ASCII. (For instance, I see diamond-shaped symbols when I view the output in the terminal.) Because of this, I'm unable to read the compressed file completely in my C program.

So in short, I require a compressed file which contains only ASCII characters. Is this possible by any means?

Your encryption algorithm doesn't emit ASCII-only characters (thank god). Why put such a limitation on your compressor? And since when can you not read a binary file in a C program?
– WhozCraig, Commented Oct 23, 2012 at 14:38
You could postprocess the compressed output with Base64 or a comparable encoding, but that costs you 25% efficiency (it's a 6-bits-in-8 encoding). It might be better to solve the problem that prevents your program from reading binary files.
– fvu, Commented Oct 23, 2012 at 14:38
By using an 'ASCII-only' representation of a compressed file, you'd undo most of the benefits of the compression. The ASCII-only representation would occupy more space than the binary representation. To read the file, use 'binary mode' (a no-op on Unix-like platforms, crucial on Windows). And don't use string manipulation functions on the binary data; there will be null bytes in the data (compressed or encrypted or both).
– Jonathan Leffler, Commented Oct 23, 2012 at 14:46
@fvu: 33%? Base-64 grows by a third (requires 4 bytes out for each 3 in).
– Jonathan Leffler, Commented Oct 23, 2012 at 14:47
@JonathanLeffler Same thing, different perspective: 4/3 = 133% for 33% overhead, but 3/4 = 75% and 1 - 75% = 25% expansion...
– twalberg, Commented Oct 23, 2012 at 14:53
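
The binary-mode read suggested in the comments above might look like the following minimal sketch. It assumes the whole file fits in memory; `read_all` is a hypothetical helper name, not a library function. The length is tracked explicitly because compressed or encrypted data can contain null bytes, so `strlen` and friends cannot be used on it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file as raw bytes.
 * Returns a malloc'd buffer (the caller frees it) and stores the byte
 * count in *len_out; returns NULL on failure.
 * read_all is a hypothetical helper name used for illustration. */
unsigned char *read_all(const char *path, size_t *len_out)
{
    /* "rb": the 'b' is a no-op on Unix-like systems, crucial on Windows. */
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    size_t cap = 4096, len = 0, n;
    unsigned char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* fread returns the number of bytes read; 0 means EOF or error. */
    while ((n = fread(buf + len, 1, cap - len, fp)) > 0) {
        len += n;
        if (len == cap) {                 /* buffer full: double it */
            unsigned char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    if (ferror(fp)) {                     /* distinguish error from EOF */
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    *len_out = len;
    return buf;
}
```

A caller would pass the returned pointer and length together to the decryption or decompression routine and never treat the buffer as a C string.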

user720694 · Accepted Answer · 2012-10-24 13:21:06Z

2

Finally resolved the issue. The program is handling everything correctly.

On the sending side:

compression: gzip -9 -c secret.txt > compressed.txt.gz
encryption: openssl enc -aes-256-cbc -a -salt -in compressed.txt.gz -out encrypted.txt

The gzip output is given as the input to the encryption step. Because of the -a option, openssl Base64-encodes the ciphertext, so the resulting file (encrypted.txt) is purely ASCII.

On the receiving side:

decryption: openssl enc -d -aes-256-cbc -a -in decryptme.txt -out decrypted.txt.gz
decompression: gunzip -c decrypted.txt.gz > message.txt
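
The `-a` option in the commands above is what turns the binary ciphertext into ASCII: it applies Base64, which maps every 3 input bytes to 4 printable characters. The following is a minimal sketch of that encoding in C, for illustration only. `base64_encode` is a hypothetical name, the sketch does not insert the line breaks that openssl adds to its output, and real code would more likely call an existing library.

```c
#include <stddef.h>

static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode n bytes from src as Base64 with '=' padding and no line breaks.
 * dst must have room for 4 * ((n + 2) / 3) + 1 chars.
 * Returns the number of characters written, excluding the final NUL.
 * base64_encode is a hypothetical helper name used for illustration. */
size_t base64_encode(const unsigned char *src, size_t n, char *dst)
{
    size_t i = 0, o = 0;

    /* Full 3-byte groups become 4 characters of 6 bits each. */
    while (i + 3 <= n) {
        unsigned v = ((unsigned)src[i] << 16) |
                     ((unsigned)src[i + 1] << 8) |
                     (unsigned)src[i + 2];
        dst[o++] = B64_ALPHABET[(v >> 18) & 63];
        dst[o++] = B64_ALPHABET[(v >> 12) & 63];
        dst[o++] = B64_ALPHABET[(v >> 6) & 63];
        dst[o++] = B64_ALPHABET[v & 63];
        i += 3;
    }

    /* One or two leftover bytes are padded with '='. */
    if (n - i == 1) {
        unsigned v = (unsigned)src[i] << 16;
        dst[o++] = B64_ALPHABET[(v >> 18) & 63];
        dst[o++] = B64_ALPHABET[(v >> 12) & 63];
        dst[o++] = '=';
        dst[o++] = '=';
    } else if (n - i == 2) {
        unsigned v = ((unsigned)src[i] << 16) | ((unsigned)src[i + 1] << 8);
        dst[o++] = B64_ALPHABET[(v >> 18) & 63];
        dst[o++] = B64_ALPHABET[(v >> 12) & 63];
        dst[o++] = B64_ALPHABET[(v >> 6) & 63];
        dst[o++] = '=';
    }

    dst[o] = '\0';
    return o;
}
```

The 4-for-3 mapping is the size cost discussed in the question comments: the encoded file is about one third larger than the binary ciphertext.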


Aki Suihkonen · Accepted Answer · 2012-10-23 14:40:01Z

0

You can add a uuencode / uudecode filter between compression and encryption. Alternatively, you might loosen the restriction that the compressed data must be in ASCII form. Options:

read binary data in your C program (e.g. unsigned char buffer[256]; size_t c = fread(buffer, 1, sizeof buffer, stdin);)
convert the data to hexadecimal format
static char encrypted_file[]={ 0x01,0x6e, ... };
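
The hexadecimal option listed above could be sketched as follows. `hex_encode` is a hypothetical helper name, and the lowercase two-digits-per-byte layout is an assumption, not something the answer specifies.

```c
#include <stddef.h>

/* Write len bytes from src into dst as lowercase hex, two digits per byte.
 * dst must have room for 2 * len + 1 chars (the output is NUL-terminated).
 * hex_encode is a hypothetical helper name used for illustration. */
void hex_encode(const unsigned char *src, size_t len, char *dst)
{
    static const char digits[] = "0123456789abcdef";

    for (size_t i = 0; i < len; i++) {
        dst[2 * i]     = digits[src[i] >> 4];     /* high nibble */
        dst[2 * i + 1] = digits[src[i] & 0x0f];   /* low nibble */
    }
    dst[2 * len] = '\0';
}
```

Hex output is pure ASCII but doubles the size of the data, so it costs more space than Base64.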


6 Comments

Aki Suihkonen Over a year ago

Reading binary data even from stdin is an option

Jonathan Leffler Over a year ago

uuencode is a ghastly format because it can include spaces. Use Base-64 encoding instead; it learns from uuencode and avoids the mistakes that uuencode made.

Aki Suihkonen Over a year ago

You are right. I just used it as an example of a reversible process that strips the MSB. It is also a bad candidate for security purposes because it contains lots of known plaintext.

twalberg Over a year ago

@AkiSuihkonen Neither uuencode nor base64 encoding are intended to have any security implication whatsoever. They're both just simple encodings to use a reduced code set for transmission purposes (e.g. 8-bit unclean channels). base64 has as much "known plain-text" as uuencode, as does Huffman coding, arithmetic coding or any other algorithm whose sole purpose is to re-encode one set of code words into a different alphabet or structure (with apologies to the definitions that I just stretched way out of shape there).

Aki Suihkonen Over a year ago

uuencode has a structure: it typically begins with 'begin 644' and ends with a backquote; it also contains periodic sequences of CR + ASCII 'M'.
