The Blog

3D Model Format (Compression)


My goal with all proprietary formats associated with L. Spiro Engine is to be as small as possible as long as the loading times are reasonable.  To this end I developed a compression algorithm based on LZW—with modifications for improved security—which decompresses very quickly.  However that alone is not enough to achieve the best compression ratios; compression is most effective when geared towards certain special cases, and I will explain what I have done to achieve super-small 3D files here.



Normals are meant to be unit-length, so it is not necessary to store the entirety of one of the components.  A missing component’s value can be found by determining what value would make the vector have a length of one.  So if you have an X and a Y, the Z would be Sqrt( 1 – (X * X + Y * Y) ).  However, what if you have X = 0 and Y = 0?  Z must be 1, right?  Wrong, it could be either 1 or -1.  So a 96-bit floating-point normal can be stored in the file using only 65 bits per component, with one bit to indicate the sign of the missing component.

The compression routine in L. Spiro Engine is most effective on data that repeats.  Floating-point numbers usually have similar bytes between them, but if I store them linearly as 65-bits each, their bit patterns will not align for proper repeating data.  Instead, I store the X and Y for each normal together in one array, and the signs of the Z’s for each normal in a second array (one bit per normal).  Finally, both arrays are then LZW compressed.  The average resulting size is ~30% of the original data.



Index data references the various pools (vertex, normal, UV, etc.) and are used to describe how faces/polygons are put together.  Indices usually do not vary too much from the previous index.  For example, with indices 0 1 2 1 2 3 2 3 4, each index is only +1 or -1 of the previous index.  Smaller numbers need fewer bits to hold them.  If we use 3 bits for the first few numbers, then 4 bits for the next few, then 5 bits for the next few, we definitely do save some space.  But if we treat each index as a relative offset from the index before it (and use the first index raw), we have a lot more small numbers for a lot longer.  The catch is that now we have a sign value, so we must add a sign bit to each stored value.  Luckily this is trivial.

Start with a routine that returns the index of the highest-set bit in a number (HighBitIndex()).  So the number 0×3 returns 1.  The number of bits needed to store that number fully is one more than the return of that function.  So to store the number 0×3 we need 2 bits.  Add one more bit for the sign bit.  In binary, 0×3 is written as the 3 bits 011.

When negative numbers are encountered, inverse the bits with ~ when passing it to HighBitIndex().  Again add 1 for storage space and 1 for the sign.  Then store the original signed number using that many bits.  -3 is 0xFFFFFFFD in 32-bit form.  ~0xFFFFFFFD = 0×00000002.  HighBitIndex() will return 1 here.  2 bits are needed to store the number and 1 bit added for the sign, so we will use 3 bits for this number.  Storing -3 using 3 bits results in 101 (binary).

The full routine saves the first index raw using 24 bits, starts at 3 bits implicitly (we don’t need to explicitly store the bit count), and determines how many signed bits are needed to store the relative offsets of each index to the index before it.  Let’s say it gets 524 values in a row that can be stored using 3 bits.  It writes 524 using 3 bytes, then stores the next 524 index offsets using 3 bits each.  After this it counts how many values can be stored using 4 bits and continues until every index has been stored.  Finally, the newly created buffer is LZW compressed.   The average resulting size is ~2% of the original data.

To read back the signed bits, simply store X bits into a signed 32-bit integer, and perform two shifts as follows: VAL = ((VAL << (32 – X)) >> (32 – X)).

I originally implemented the easy method of just storing the indices using 3 bits, then 4 bits, then 5, etc.  The relative-offset method requires one extra bit per component, but in practice this will save space in the long run.  When I changed to this method, my BMW X6 model file was reduced from 10 megabytes to 8.2.  The model has ~250,000 triangles.  The amount of index data within the file is enormous, and the difference between 3-bit, 4-bit, 5-bit, etc. compression—which is already very small compared to using 32 bits per index—and the relative-offset variation amounted to 1.8 megabytes of savings.  Completely uncompressed, the index data is approximately 10 megabytes, which is in itself larger than the final model file.


Face Data

Faces are basically polygons.  They can have any number of edges/points and are typically (but not strictly) assumed to be entirely flat.  Putting faces together results in the final 3D object, but for submitting to graphics API’s such as Direct3D they will be converted to triangles, not left as polygons.  But before all that, in the file, each point on each face is a 32-bit unsigned value referencing into the pools of vertices, normals, etc.

A naive way to store these would be to simply loop over each face, store the number of points in it, then loop over those points and store the indices that reference the pools.  Because our method for compressing index data is extremely effective, we would rather try to group related sets of indices (all the vertex indices together, all the normal indices together, etc.) so that linearly they are within the same general range.  Take two faces, each with 3 points, and each with a vertex and a normal reference:

Face 1: 0 56, 1 57, 2 58

Face 2: 1 57, 2 58, 3 59

We can get decent results by grouping all of the vertices and normals for each face as such:

Face 1: [0 1 2] [56 57 58]

Face 2: [1 2 3] [57 58 59]

…and then compressing those groups via our index-compression routine.

But we can get better results by grouping all of the faces together:

Face 1 and 2: [0 1 2 1 2 3] [56 57 58 57 58 59]

…and compressing those groups each with our index-compression routine.

The average resulting size is ~1.5% of the original data.



Smaller files means more data in your games, and this is a serious issue with Android and iPhone games.  The loading times are not heavily damaged by these compression routines because LZW is very fast to unpack.  The models in L. Spiro Engine are less than 10% the sizes of the unpacked data, and because the files are so small they are safe to load fully into memory for the unpacking process, and they require less disk-reading to do so (which is slow for disk-based systems such as PlayStation 3 and Nintendo Wii).  This entirely mitigates the decompression time on disk-based machines.

The extra effort is far worth it.



L. Spiro

About L. Spiro

L. Spiro is a professional actor, programmer, and artist, with a bit of dabbling in music. || [Senior Core Tech Engineer]/[Motion Capture] at Deep Silver Dambuster Studios on: * Homefront: The Revolution * UNANNOUNCED || [Senior Graphics Programmer]/[Motion Capture] at Square Enix on: * Luminous Studio engine * Final Fantasy XV || [R&D Programmer] at tri-Ace on: * Phantasy Star Nova * Star Ocean: Integrity and Faithlessness * Silent Scope: Bone Eater * Danball Senki W || [Programmer] on: * Leisure Suit Larry: Beach Volley * Ghost Recon 2 Online * HOT PXL * 187 Ride or Die * Ready Steady Cook * Tennis Elbow || L. Spiro is currently a GPU performance engineer at Apple Inc. || Hyper-realism (pencil & paper): || Music (live-played classical piano, remixes, and original compositions):

2 Awesome Comments So Far

Don't be a stranger, join the discussion by leaving your own comment
  1. Joe
    August 11, 2017 at 9:42 AM #

    Any advice for optimizing joints, weights and animation transforms?



    • L. Spiro
      August 11, 2017 at 9:58 AM #

      Nothing specific to the binary representations of those, but when exporting animations you should always perform redundant key-frame elimination.
      If you have a key-frame A at 0%, B at 47%, and C at 100%, linearly interpolate between A and C by 47%. If the result is the same (within epsilon) of B then you can remove B.
      This is vital because a performance optimization is to only do linear interpolations at run-time, which means in order to handle any other kind of interpolation (spline-based, spherical, etc.) you have to add enough key-frames that you can accurately reproduce the animation with only linear interpolation.
      This means aggressively adding key-frames every few milliseconds through an animation and then using the above method to remove as many of them as you can while still keeping the original animation.

      L. Spiro

Leave a Comment

Remember to play nicely folks, nobody likes a troll.