This is a basic implementation of the algorithm, so it is not guaranteed to be efficient. It uses a sliding window of 4096 bytes and, as a primitive optimization, implicitly prepends a block consisting of 255 chars to the actual data when encoding and decoding. It seems to be highly inefficient when applied to small amounts of data with relatively high entropy.
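The poor behaviour on high-entropy input can be estimated with a little arithmetic. The sketch below assumes that each output record is serialized as 5 bytes (two 16-bit fields plus one char, with no padding) and that no match is ever found, so every input byte produces its own record. Both are assumptions rather than confirmed properties of this implementation, and `TOKEN_BYTES` and `worst_case_output_size` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Assumption: one record = 2 (offset) + 2 (length) + 1 (next_char) bytes,
 * written without padding. */
#define TOKEN_BYTES 5u

/* Worst case under the assumptions above: no match is found, so each
 * input byte is emitted as a separate record carrying only next_char. */
static size_t worst_case_output_size(size_t input_len)
{
    return input_len * TOKEN_BYTES;
}
```

Under these assumptions, 100 bytes of incompressible input would grow to 500 bytes of output. Matches against the implicitly prepended block could reduce that figure, which is presumably the purpose of the optimization.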
Output format
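struct {
    uint16_t offset;    /* offset of the match in the sliding window */
    uint16_t length;    /* length of the match */
    char     next_char; /* character that follows the match */
};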
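
As an illustration of how records of this shape might be consumed, here is a minimal decoder sketch. It is not the implementation from this project: it assumes that `offset` is a distance counted back from the current output position, that `length` bytes are copied from there, and that `next_char` is appended afterwards. The implicitly prepended block is left out for simplicity, and `lz_token` and `lz_decode` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical named version of the output record. */
typedef struct {
    uint16_t offset;    /* assumed: distance back from the current output position */
    uint16_t length;    /* number of bytes to copy from that position */
    char     next_char; /* literal appended after the copied bytes */
} lz_token;

/* Decodes n tokens into out (capacity cap bytes).
 * Returns the number of bytes written, or (size_t)-1 if a token points
 * before the start of the output or the buffer is too small. */
static size_t lz_decode(const lz_token *tok, size_t n, char *out, size_t cap)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        size_t off = tok[i].offset;
        size_t len = tok[i].length;

        /* A non-empty match must refer to data that was already decoded. */
        if (len > 0 && (off == 0 || off > pos))
            return (size_t)-1;
        /* Need room for len copied bytes plus one literal. */
        if (len > cap - pos || cap - pos - len < 1)
            return (size_t)-1;

        /* Copy byte by byte so that overlapping matches (len > off) work. */
        for (size_t k = 0; k < len; k++) {
            out[pos] = out[pos - off];
            pos++;
        }
        out[pos++] = tok[i].next_char;
    }
    return pos;
}
```

With these assumptions, the four tokens `{0,0,'a'}`, `{0,0,'b'}`, `{0,0,'c'}`, `{3,5,'d'}` decode to the 9 bytes `abcabcabd`. Note that on typical platforms `sizeof(lz_token)` is 6 rather than 5 because of alignment padding, so writing the struct to a file directly may not match a 5-byte-per-record layout.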