This is a basic implementation of the algorithm, so it makes no claim to efficiency. It uses a sliding window of 4096 bytes and, as a primitive optimization, implicitly prepends a block consisting of 255 chars to the actual data while encoding and decoding. It appears to be highly inefficient when applied to small amounts of data with relatively high entropy.
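To illustrate what a sliding-window search looks like, here is a minimal brute-force sketch of finding the longest match inside a 4096-byte window. It is not taken from this project's source: the names `lz_match` and `find_longest_match` are hypothetical, the offset is assumed to be a distance back from the current position, and the implicitly prepended block is left out because its contents are not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW_SIZE 4096 /* window size stated in the description */

/* Hypothetical helper type: result of one window search. */
typedef struct {
    uint16_t offset; /* assumed: distance back from pos to the match start, 0 if none */
    uint16_t length; /* number of matching bytes, 0 if none */
} lz_match;

/* Brute-force search for the longest match of data[pos..] that starts
 * within the preceding WINDOW_SIZE bytes. One input byte is always left
 * unmatched so it can be emitted as the literal that follows the match. */
lz_match find_longest_match(const unsigned char *data, size_t len, size_t pos)
{
    lz_match best = {0, 0};
    assert(data != NULL);
    if (pos >= len)
        return best;

    size_t start = pos > WINDOW_SIZE ? pos - WINDOW_SIZE : 0;
    size_t max_len = len - pos - 1;      /* reserve one byte for the literal */
    if (max_len > UINT16_MAX)
        max_len = UINT16_MAX;            /* length must fit in 16 bits */

    for (size_t cand = start; cand < pos; cand++) {
        size_t n = 0;
        /* The match may run past pos (overlap), which is valid here. */
        while (n < max_len && data[cand + n] == data[pos + n])
            n++;
        if (n > best.length) {
            best.offset = (uint16_t)(pos - cand);
            best.length = (uint16_t)n;
        }
    }
    return best;
}
```

On the input "abcabcabcx" at position 3, this search reports a match 3 bytes back of length 6, leaving 'x' as the literal. On high-entropy input most searches find nothing, so nearly every byte costs a full output record, which is consistent with the poor results noted above.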
Output format
struct { uint16_t offset; uint16_t length; char next_char; }
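As a rough sketch of how a stream of such records could be expanded, the decoder below copies `length` bytes starting `offset` bytes back in the output and then appends `next_char`. Several things are assumptions rather than facts about this project: the names `lz_token` and `lz_decode`, the reading of `offset` as a backward distance, and the omission of the implicitly prepended block. The byte order and padding of the stored records are not specified here either; with typical alignment the compiler pads this struct to 6 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as the output format above; the type name is hypothetical. */
typedef struct {
    uint16_t offset;
    uint16_t length;
    char next_char;
} lz_token;

/* Expands `count` tokens into `out`. Returns the number of bytes written,
 * or 0 if a token is invalid or `out` is too small. */
size_t lz_decode(const lz_token *tokens, size_t count,
                 unsigned char *out, size_t out_cap)
{
    size_t pos = 0;
    assert(tokens != NULL && out != NULL);

    for (size_t i = 0; i < count; i++) {
        size_t off = tokens[i].offset;
        size_t n = tokens[i].length;

        /* Assumed semantics: offset counts back from the current position. */
        if (off > pos || (n > 0 && off == 0))
            return 0;
        if (n + 1 > out_cap - pos)
            return 0;

        /* Byte-by-byte copy so that overlapping matches repeat correctly. */
        for (size_t k = 0; k < n; k++) {
            out[pos] = out[pos - off];
            pos++;
        }
        out[pos++] = (unsigned char)tokens[i].next_char;
    }
    return pos;
}
```

For example, the tokens (0,0,'a'), (0,0,'b'), (0,0,'c'), (3,5,'x') expand to the 9 bytes "abcabcabx". Note that three of those four records carry a single literal each, so on short inputs the records can easily be larger than the data they encode.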