A journey to the center of ArchiveLib
So, it’s been a long 3 days at LCA2019 and I’m thinking about doing a lightning talk on my latest project: Machine Embroidery. Specifically, the sizable amount of software archaeology that’s been involved in implementing parsing (which still isn’t done) for 2 specific embroidery formats: `VIP` and `HUS`.

This is a rabbit hole, but one I am slowly backing out of. So here we go.
The formats
Both formats follow the same structure: a header, colour information, and three sections of compressed stitch data. Let’s take an example file and run through it. The file we’re using stitches this:

So … the header says we have a `HUS` file with 22972 (`0x0000_59BC`) stitches and 6 (`0x0000_0006`) colours (it’s complicated, but each stripe is a different ‘colour’ even if it was stitched with the same colour thread #creativeLicense). There’s some other useful stuff for the machine, like bounds, but we can skip that. Finally, the three offsets: `0x0000_0036`, `0x0000_00bc`, and `0x0000_09fe`.
```
00000000  5b af c8 00 bc 59 00 00  06 00 00 00 52 02 5a 02  |[....Y......R.Z.|
```
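To make that layout concrete, here’s a minimal sketch of pulling those fields out in Rust. The first three fields match the dump above; the positions of the three block offsets further into the header are my reconstruction, so treat those as an assumption:

```rust
/// Read a little-endian u32 from the header at a byte offset.
fn read_u32(data: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([data[at], data[at + 1], data[at + 2], data[at + 3]])
}

fn parse_header(data: &[u8]) {
    let magic = read_u32(data, 0x00);        // 5b af c8 00 in the dump above
    let stitch_count = read_u32(data, 0x04); // 0x0000_59bc == 22972
    let colour_count = read_u32(data, 0x08); // 0x0000_0006 == 6
    // Bounds and the other machine-oriented fields follow; we skip them.
    // The three block offsets (0x0000_0036, 0x0000_00bc, 0x0000_09fe in
    // this file) live further into the header.
    println!("{:#010x}: {} stitches, {} colours", magic, stitch_count, colour_count);
}
```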
For brevity, I’m going to skip the colour parsing because it is just convoluted. Seriously … just write out RGB values. Please.
Now onto the three compressed blocks. Here is the first one, which contains all the information about the stitch attributes (is it a jump stitch, a regular stitch, a colour change, or the end of the file?).
```
00000036  00 8c 53 4c a0 4a d7 fe  9b 1e 93 88 69 6f ff 56  |..SL.J......io.V|
```
This is mostly garbage to me, so on to the compression algorithm.
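Before that, it’s worth pinning down what we’re hoping to decode out of this block: one attribute per stitch. A conceptual sketch in Rust; the variant set comes from the description above, but I’m deliberately not assigning encoded values, since working those out is the whole puzzle:

```rust
/// What each decoded attribute tells us about its stitch.
/// No encoded values here: mapping bytes to these variants is
/// exactly what the decompression has to give us first.
enum StitchAttribute {
    Regular,      // an ordinary stitch
    Jump,         // move the needle without stitching
    ColourChange, // switch to the next thread colour
    End,          // end of the design
}
```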
The compression
As described on the reference page, this 138-byte block is compressed using a proprietary archive format called Greenleaf ArchiveLib. Not to be confused with `libarchive`, which is a completely different kettle of fish.
This is a compression algorithm that a company could buy the source code for and compile into their applications with any C++ compiler. Judging by this FAQ page archived by the Wayback Machine, it had a number of benefits over some other algorithms of the age.
But enough about the product; let’s get to some code. Obviously the code exists on the internet, probably by accident. It’s an old release (version 1.0) from around 1994/1995, but it has the compression algorithm in all its glory:
```cpp
/* ... snip ... */
```
Did I say glory? I meant gory. This code is minified, obfuscated C++. Cool. It’s source code. We can make this work.
Oh, side note: I want this embroidery library to be pure Rust, so whilst I could (and did, for a time) link against this code, that is not ‘good enough’.
Step 1: Unminifying
So, the first step is to just get some semblance of indentation; layout; something; anything; please.
```cpp
/* ... snip ... */
```
Wow, that is so much not any better. The formatting means I can see that I’m looking at some `#define`s and a function; but besides that, I don’t know what `_531` is or what it’s doing (spoiler: the `int` is a lie). Also, because this is all on a C++ class, there is no easy differentiator between `_531`, which is an attribute on `this`; `_137`, which is a `#define`d constant; and `_269`, which is a parameter.
Step 2: Culling
There is a lot of associated code, quite a few `#ifdef`s, and some weird DOS-specific flags that were getting in the way; so, now that I could see what I was looking at, I started purging all of them. Huge swathes of code got deleted. I was aiming for the compression algorithm and the smallest amount of auxiliary code needed to support it.
At the same time I needed to standardize types. In C/C++ land, `int` isn’t a specific size, but in Rust everything is sized exactly; so, to save myself some time in the future, I changed all the types over: `char` became `uint8_t`, `short` became `int16_t`, and so on.
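For reference, those fixed-width C types line up one-to-one with Rust’s primitives, which is why this step pays off later. A quick sketch of the mapping (the struct is a made-up illustration, not the real data):

```rust
// Fixed-width C types map directly onto Rust primitives:
//   char  -> uint8_t  -> u8
//   short -> int16_t  -> i16
//   int   -> int16_t or int32_t depending on the compiler
//            (DOS-era ints were often 16 bits), hence the cleanup

// A made-up example of what fields look like once sized exactly:
struct Sized {
    flags: u8,   // was `char`, became `uint8_t`, will become `u8`
    delta: i16,  // was `short`, became `int16_t`, will become `i16`
    count: u32,  // was `unsigned int`, became `uint32_t`, will become `u32`
}
```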
Step 3: Renaming
Finally: a small enough code base that still compiles. Let’s make it easier to identify the source of each of these variables: `#define`d variables become `CONST_xxx`; locals, arguments, functions, and the items on the class all get their own prefix:
```cpp
/* ... snip ... */
```
Step 4: A struct for the data
With my eyes firmly on the Rust prize, and a recollection of earlier attempts, I knew that a C++ class wasn’t going to translate well … so let’s move all that data into a `struct` and just re-work some of the allocation. A simple find-and-replace later, and we have a data struct and a basic class to hold onto it.
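To give a flavour of the shape this is heading towards on the Rust side, here’s an illustrative sketch; the field names and sizes are mine, not the real layout, which at this stage still had names like `_531`:

```rust
/// Illustrative only: the kind of state the extracted data struct
/// carries. The real fields were still named things like `_531`.
struct CompressState {
    /// Sliding window of recently seen bytes, used for back-references.
    history: Vec<u8>,
    /// Current position within the history window.
    history_pos: usize,
    /// The lookup tables the algorithm builds and consults.
    tables: Vec<u16>,
    /// Bit-level input state: buffered bits and how many are valid.
    bit_buffer: u32,
    bit_count: u8,
}
```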
Step 5: Splitting the problem
At this point much of the auxiliary code was gone and I was left with 8 C++ files. Mostly manageable, but the compress file was still over 500 lines long, and I have an aversion to long files (I tend to get lost easily in them).
Woo. Now I have lots of files with code I don’t understand instead of one big one … wait … how is that better?!
Step 6: Refactoring
Not much to say … I spent a lot of time trying to come up with names for variables that would fit in well with all their uses, and it was mostly unsuccessful; but I managed to identify several ‘structures’ and gain enough of an understanding of the algorithm to describe it.
Step 7: Finally, Rust
Overall this was less painful than I had anticipated at the start (and from previous attempts). The work I’d done, especially in the refactoring, meant I’d gained enough of an understanding of the algorithm to un-`unsafe` it (is that safeify?).
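To illustrate what un-`unsafe`ing looks like in the small (my own example, not code from the project): a direct transliteration of the C++ leans on raw pointers, while the safe version gets bounds checks from slices:

```rust
/// A C-style transliteration: raw pointer arithmetic, no bounds checks.
unsafe fn copy_back_raw(buf: *mut u8, pos: usize, distance: usize, len: usize) {
    for i in 0..len {
        *buf.add(pos + i) = *buf.add(pos + i - distance);
    }
}

/// The same copy with safe indexing; out-of-bounds access panics
/// instead of corrupting memory. Byte-at-a-time so the copy may
/// overlap its own output, as back-references often do.
fn copy_back(buf: &mut [u8], pos: usize, distance: usize, len: usize) {
    for i in 0..len {
        buf[pos + i] = buf[pos + i - distance];
    }
}
```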
Conclusion
This algorithm is now written purely in Rust, with tests that use the old unmodified C++ version to verify correctness. I have also taken the opportunity to rewrite the expansion algorithm in more idiomatic Rust; the compression algorithm, though, still eludes my understanding.
You can find the code at archive-rs and it will be on Cargo once I clear up some licensing questions.
So how does the compression algorithm compress?
ArchiveLib generates lookup tables, and then looks up either the byte itself, or the number of bytes to look back in the output to copy from.
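In other words, it’s a dictionary-style scheme: each decoded symbol is either a literal or a back-reference. A minimal sketch of that expansion step, with the table construction and bit-level input elided, since those are the genuinely hairy parts:

```rust
/// A decoded symbol: either a literal byte, or an instruction to
/// re-copy bytes we've already emitted. Illustrative shape only.
enum Symbol {
    Literal(u8),
    BackReference { distance: usize, length: usize },
}

/// Apply one decoded symbol to the output stream.
fn expand_step(output: &mut Vec<u8>, symbol: Symbol) {
    match symbol {
        // The lookup resolved this code to the byte itself: emit it.
        Symbol::Literal(byte) => output.push(byte),
        // The lookup resolved this code to "go back `distance` bytes
        // and copy `length` of them". Copy one byte at a time, since
        // the region being copied may overlap the bytes being written.
        Symbol::BackReference { distance, length } => {
            for _ in 0..length {
                let byte = output[output.len() - distance];
                output.push(byte);
            }
        }
    }
}
```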
And how do I feel about Rust?
Never going to give you up