
I'm looking for an efficient encoding for storing simulated coefficients.

The data consists of thousands of curves, each made of 512 contiguous single-precision numbers. The data may be stored as fixed point, as long as it preserves about 23-bit precision (relative to unity level).

The curves could look like these:

[image: explanatory curves]

My best approach was to convert the numbers to 24-bit fixed point. I then repeatedly took adjacent differences as long as the sum of squares decreased. Compressing the resulting data with LZMA (xz, lzip) gives about 7.5× compression (compared to float32).
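A minimal sketch of that pipeline in Python (my reconstruction of the described approach, not the actual code; the 24-bit scale and the exact stopping rule are assumptions):

```python
import lzma

import numpy as np

def encode(curve, bits=24):
    """Quantize to fixed point, then difference while the sum of squares shrinks."""
    q = np.round(curve * (1 << (bits - 1))).astype(np.int64)
    order = 0
    while True:
        # length-preserving delta: keep the first element so cumsum inverts it
        d = np.concatenate(([q[0]], np.diff(q)))
        if np.sum(d.astype(float) ** 2) >= np.sum(q.astype(float) ** 2):
            break
        q, order = d, order + 1
    return order, lzma.compress(q.astype('<i4').tobytes())

def decode(order, blob, bits=24):
    q = np.frombuffer(lzma.decompress(blob), dtype='<i4').astype(np.int64)
    for _ in range(order):
        q = np.cumsum(q)  # cumsum exactly undoes one delta pass
    return q / float(1 << (bits - 1))
```

Keeping the first element in each delta pass makes the transform length-preserving and exactly invertible, so decoding reproduces the quantized values bit-for-bit.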

Adjacent differences work well at first, but with each pass they amplify the quantization noise at every least-significant-bit step.

I've also tried the cosine transform after subtracting the slope/curvature at the boundaries; the resulting compression was much weaker.

I also tried AEC, but LZMA compressed much more strongly. The highest compression came from bzip3 (after adjacent differences).

I found no function to fit the data with high precision and a limited parameter count.

Is there a way to reduce the penalty of quantization noise when using adjacent differences?

Are there encodings which are better suited for this type of data?

5 Comments
  • What exactly do you mean by "contiguous"? Where is the quantization noise coming from? Do you mean just the noise in the least significant bit of the 23? When you say "simulated", are they generated by an algorithm as opposed to measured? If they are generated by an algorithm, then why not use the algorithm and parameters as the compressed representation? Commented Feb 6, 2023 at 17:59
  • Data model: The data is generated by a model (based on maybe 500 MB of measured data). Calculating the model is expensive, so it should be precalculated for various parameters and distributed. I'd like to compress the resulting data of several tens of GB. The quantization noise I'm talking about comes from the digital representation: after one adjacent difference, a least-significant step becomes a spike. With the next adjacent difference you get a positive spike at the leading edge and a negative spike at the falling edge. Commented Feb 6, 2023 at 18:19
  • By contiguous I mean that the data follows a strict curve. I used this word to emphasize that the data should be highly predictable. Commented Feb 6, 2023 at 18:31
  • Then you mean continuous, not contiguous. Commented Feb 6, 2023 at 19:54
  • Can you provide a link to a few examples, each with 512 samples? Commented Feb 6, 2023 at 21:54

1 Answer


You could try a higher-order predictor. Your "adjacent difference" is a zeroth-order predictor, where the next sample is predicted to be equal to the last sample. You take the differences between the actuals and the predictions, and then compress those differences.

You can try first, second, etc. order predictors. A first-order predictor would look at the last two samples, draw a line between those, and predict that the next sample will fall on the line. A second-order predictor would look at the last three samples, fit those to a parabola, and predict that the next sample will fall on the parabola. And so on.

Assuming that your samples are equally spaced on your x-axis, then the predictors for x[0] up through cubics are:

  1. x[-1] (what you're using now)
  2. 2*x[-1] - x[-2]
  3. 3*x[-1] - 3*x[-2] + x[-3]
  4. 4*x[-1] - 6*x[-2] + 4*x[-3] - x[-4]

(Note that the coefficients are alternating-sign binomial coefficients.)

I doubt that the cubic polynomial predictor will be useful for you, but experiment with all of them to see if any help.
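Since the residual of the order-k predictor above is just the (k+1)-th finite difference, all four orders can be compared with a couple of lines of NumPy (a sketch; the quantized sine curve is a made-up stand-in for real data):

```python
import numpy as np

def predictor_residuals(x, order):
    # Residuals of the order-`order` binomial predictor, e.g. for order 1:
    # x[n] - (2*x[n-1] - x[n-2]), which is the (order+1)-th finite difference.
    return np.diff(x, n=order + 1)

# pick the order whose residuals are smallest in the sum-of-squares sense
x = np.round(np.sin(np.linspace(0, 3, 512)) * (1 << 23)).astype(np.int64)
best = min(range(4),
           key=lambda k: float(np.sum(predictor_residuals(x, k).astype(float) ** 2)))
```

Past the best order, each extra differencing pass amplifies the quantization noise faster than it shrinks the signal, so the sum of squares starts growing again.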

Assuming that the differences are small, you should use a variable-length integer to represent them. The idea would be to use one byte for each difference most of the time. For example, you could code seven bits of difference, say -64 to 63, in one byte with the high bit clear. If the difference doesn't fit in that, then make the high bit set, and have a second byte with another seven bits for a total of 14 with that second high bit clear. And so on for larger differences.
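One concrete way to realize this scheme (a sketch, with one deviation from the description above: a zigzag map folds negatives into small unsigned codes instead of reserving a sign bit in the first byte):

```python
def encode_varint(values):
    """LEB128-style variable-length bytes with zigzag for the sign.
    Works for values in the signed 64-bit range."""
    out = bytearray()
    for v in values:
        u = (v << 1) ^ (v >> 63)           # zigzag: 0,-1,1,-2,2 -> 0,1,2,3,4
        while u >= 0x80:
            out.append((u & 0x7F) | 0x80)  # high bit set: more bytes follow
            u >>= 7
        out.append(u)                      # high bit clear: last byte of value
    return bytes(out)

def decode_varint(data):
    values, u, shift = [], 0, 0
    for b in data:
        u |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            values.append((u >> 1) ^ -(u & 1))  # undo zigzag
            u, shift = 0, 0
    return values
```

With this layout any residual in [-64, 63] costs exactly one byte, matching the range given in the answer.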


1 Comment

Thanks for the hint. I'm currently quite occupied, so I haven't been able to test this yet. I'll look at it as soon as possible.
