Crate den

source ·
Expand description

A difference library similar to rsync.

Performance

den is performant enough for large files and files, exceeding 10MB (when using rolling hashes). Please compile with the release preset for 10X the performance.

Allocating data and keeping it in memory is very fast compared to hashing. In the future, Den will support reading data bit by bit, greatly reducing the memory usage.

How-to & examples

Keep in mind this isn’t guaranteed to give the exact same data. Please check the data with a secure hashing algorithm (e.g. SHA-3) to ensure consistency.

Sending the data is possible due to serde providing serialization and deserialization. This requires the cargo feature serde to be enabled. You serialize all the structs in this library to any format.

These examples should cover what rsync does.

Get a remote’s data

To get someone else’s data, we construct a Signature and send it. The remote calculates a Difference using Signature::diff. The remote sends back the Difference which we Difference::apply.

Push my data to remote

Request the remote’s Signature. They calculate it and respond. We calculate a Signature::diff and send it to them. They Difference::apply it. Their data should now be equal to mine.

Get the difference of a local file

Gets a small diff to send to others, almost like how git works.

base_data is considered prior knowledge. target_data is the modified data.

The data segments can be any size. Performance should still be good.

let base_data = b"This is a document everyone has. It's about some new difference library.";
let target_data = b"This is a document only I have. It's about some new difference library.";

let mut signature = Signature::new(128);
signature.write(base_data);
let signature = signature.finish();

let diff = signature.diff(target_data);

// This is the small diff you could serialize with Serde and send.
let minified = diff.minify(8, base_data)
    .expect("This won't panic, as the data hasn't changed from calling the other functions.");

Future improvements

  • Rolling hash
  • Multi-threaded Signature::diff. There is no feasible way to implement this, as we look ahead and change which window we’re looking at after each iteration. Now with rolling hash, the performance is great.
  • Support read/write
    • Support to diff a reader
    • Support to apply to a writer
    • Fetch API for apply to get data on demand.
      • This could slow things down dramatically.
    • Implement Write for HashBuilder.
  • Use SHA(1|256?) to verify integrity of data. Bundled with the Signature. The implementer should provide this.

Modules

Structs

A delta between the local data and the data the Signature represents.
A generic rolling hash implementation.
One or more successive blocks found in the common data.
A segment with unknown contents. This will transmit the data.
A identifier of a file, much smaller than the file itself.
Builder of a Signature. Created using constructors on Signature (e.g. Signature::with_algorithm);

Enums

The algorithms which can be used for hashing the data.
A segment of data corresponding to a multiple of Difference::block_size.

Traits

A trait to fill extend a Vec with more data.
Used to make RollingHash generic.

Type Definitions

RollingHash using the Adler32 algorithm.
RollingHash using the 32-bit cyclic poly 23 algorithm.
RollingHash using the 32-bit cyclic poly 23 algorithm.