Crate content_inspector
A simple library for fast inspection of binary buffers to guess the type of content.
This is mainly intended to quickly determine whether a given buffer contains “binary” or “text” data. Programs like grep or git diff use similar mechanisms to decide whether to treat some files as “binary data” or not.
The analysis is based on a very simple heuristic: searching for NULL bytes (indicating “binary” content) and detecting special byte order marks (indicating a particular kind of textual encoding). Note that this analysis can fail. For example, although it is unlikely, UTF-8-encoded text can legally contain NULL bytes. Conversely, some binary formats (like binary PGM) may not contain NULL bytes. Also, for performance reasons, only the first 1024 bytes are checked for NULL bytes (if no BOM was detected).
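The heuristic described above can be sketched in a few lines of standalone Rust. This is only an illustration under stated assumptions, not the crate's actual implementation: the Guess enum, the guess_content function, and the MAX_SCAN constant are hypothetical names introduced here, and the BOM byte sequences are the standard Unicode ones.

```rust
/// Hypothetical result type for this sketch (not the crate's `ContentType`).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Guess {
    Binary,
    Utf8,
    Utf8Bom,
    Utf16Le,
    Utf16Be,
    Utf32Le,
    Utf32Be,
}

/// Assumed scan limit, taken from the 1024 bytes mentioned in the text.
const MAX_SCAN: usize = 1024;

/// Guess the content type of `buf`: BOM check first, then a bounded NULL-byte scan.
fn guess_content(buf: &[u8]) -> Guess {
    // Standard Unicode byte order marks. The UTF-32LE mark begins with the
    // UTF-16LE mark, so the UTF-32 entries are tested before the UTF-16 ones.
    const BOMS: [(&[u8], Guess); 5] = [
        (&[0xEF, 0xBB, 0xBF], Guess::Utf8Bom),
        (&[0xFF, 0xFE, 0x00, 0x00], Guess::Utf32Le),
        (&[0x00, 0x00, 0xFE, 0xFF], Guess::Utf32Be),
        (&[0xFF, 0xFE], Guess::Utf16Le),
        (&[0xFE, 0xFF], Guess::Utf16Be),
    ];
    for &(bom, guess) in BOMS.iter() {
        if buf.starts_with(bom) {
            return guess;
        }
    }

    // No BOM found: look for a NULL byte, but only in the first MAX_SCAN bytes.
    let head = &buf[..buf.len().min(MAX_SCAN)];
    if head.contains(&0) {
        Guess::Binary
    } else {
        // Without a BOM or NULL byte, the buffer is assumed to be (UTF-8) text.
        Guess::Utf8
    }
}
```

Under these assumptions, guess_content(b"Hello") evaluates to Guess::Utf8, and a buffer whose only NULL byte lies beyond the first 1024 bytes is also classified as Guess::Utf8, which is one of the failure modes noted above.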
If this library reports a certain type of encoding (say UTF_16LE), there is no guarantee that the binary buffer can actually be decoded as UTF-16LE.
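To illustrate why a detected byte order mark does not guarantee decodability, here is a minimal sketch using only the Rust standard library. The helper decode_utf16le is a hypothetical name introduced for this example and is not part of the crate. It assumes that a purely BOM-based check would classify any buffer starting with the bytes FF FE (and not followed by 00 00) as UTF-16LE.

```rust
/// Hypothetical helper: strictly decode a buffer that starts with the
/// UTF-16LE byte order mark (FF FE). Returns `None` if the BOM is missing
/// or the remaining bytes are not valid UTF-16LE.
fn decode_utf16le(buf: &[u8]) -> Option<String> {
    if !buf.starts_with(&[0xFF, 0xFE]) {
        return None;
    }
    let body = &buf[2..];

    // UTF-16 code units are two bytes wide, so an odd length cannot be valid.
    if body.len() % 2 != 0 {
        return None;
    }

    // Assemble little-endian 16-bit code units from byte pairs.
    let units: Vec<u16> = body
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();

    // `String::from_utf16` rejects unpaired surrogates.
    String::from_utf16(&units).ok()
}
```

With this helper, the buffer b"\xFF\xFEH\x00i\x00" decodes to "Hi", whereas b"\xFF\xFE\x00\xD8" (a BOM followed by a lone high surrogate) and b"\xFF\xFEH" (an odd number of payload bytes) both yield None, even though all three begin with the same byte order mark.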
Example
use content_inspector::{ContentType, inspect};
assert_eq!(ContentType::UTF_8, inspect(b"Hello"));
assert_eq!(ContentType::BINARY, inspect(b"\xFF\xE0\x00\x10\x4A\x46\x49\x46\x00"));
assert!(inspect(b"Hello").is_text());
Enums
- ContentType: The type of encoding that was detected (for “text” data) or BINARY for “binary” data.
Functions
- inspect: Try to determine the type of content in the given buffer. See the crate documentation for a usage example and for more details on how this analysis is performed.