The default read implementation on slices was not generating efficient code. This custom implementation generates much smaller assembly with fewer function calls.
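
As a rough illustration of the idea (not the code from this commit), a slice-backed reader can override both `read` and `read_exact` so that each call reduces to a bounds check, a `split_at`, and a copy. The `SliceReader` type and its `remaining` helper are hypothetical names chosen for this sketch.

```rust
use std::io::{self, Read};

/// Hypothetical slice-backed reader used only for this sketch.
pub struct SliceReader<'a> {
    slice: &'a [u8],
}

impl<'a> SliceReader<'a> {
    pub fn new(slice: &'a [u8]) -> Self {
        SliceReader { slice }
    }

    /// Number of bytes that have not been consumed yet.
    pub fn remaining(&self) -> usize {
        self.slice.len()
    }
}

impl<'a> Read for SliceReader<'a> {
    #[inline(always)]
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        // Copy as many bytes as both buffers allow, then advance.
        let n = out.len().min(self.slice.len());
        let (head, tail) = self.slice.split_at(n);
        out[..n].copy_from_slice(head);
        self.slice = tail;
        Ok(n)
    }

    #[inline(always)]
    fn read_exact(&mut self, out: &mut [u8]) -> io::Result<()> {
        // One length check up front instead of a retry loop around `read`.
        if out.len() > self.slice.len() {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "slice too short",
            ));
        }
        let (head, tail) = self.slice.split_at(out.len());
        out.copy_from_slice(head);
        self.slice = tail;
        Ok(())
    }
}
```

Whether this produces smaller assembly than the default path depends on the compiler version and the call site, so it is worth inspecting the generated code rather than assuming it.
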
* add option and config traits
* thread options everywhere
* add WithOtherLimit, WithOtherEndian, and update internal to take advantage of it
* wip
* add rest of the public API and fix tests
* dtolnay feedback
* remove serialized_size_bounded and replace it with a use of config
* remove inline from trait method
* finish documentation and add custom reader support
* minor config_map refactor
* doc changes
* add with_(de)serializer functions and their associated modules
If we don't do this, we end up using the generic read_exact method,
which is not necessarily optimal. This matters especially when
using a specialized Read implementation for speed.
See https://github.com/TyOverby/bincode/issues/206
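
A minimal sketch of the forwarding described above, assuming a wrapper type around an arbitrary `Read`. `ForwardingReader` and `CountingReader` are hypothetical names introduced here; they are not types from bincode. Without the explicit `read_exact` override, the trait's default implementation would loop over `read` and the inner reader's specialized `read_exact` would never run.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper that forwards both methods to the inner reader.
pub struct ForwardingReader<R> {
    pub inner: R,
}

impl<R: Read> ForwardingReader<R> {
    pub fn new(inner: R) -> Self {
        ForwardingReader { inner }
    }
}

impl<R: Read> Read for ForwardingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.inner.read(buf)
    }

    // Forward explicitly so the inner reader's own `read_exact` is used
    // instead of the generic default, which loops over `read`.
    fn read_exact(&mut self, buf: &mut [u8]) -> io::Result<()> {
        self.inner.read_exact(buf)
    }
}

/// Hypothetical inner reader that records which method was called.
pub struct CountingReader<'a> {
    data: &'a [u8],
    pub read_calls: usize,
    pub read_exact_calls: usize,
}

impl<'a> CountingReader<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        CountingReader { data, read_calls: 0, read_exact_calls: 0 }
    }
}

impl<'a> Read for CountingReader<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.read_calls += 1;
        self.data.read(buf)
    }

    fn read_exact(&mut self, buf: &mut [u8]) -> io::Result<()> {
        self.read_exact_calls += 1;
        self.data.read_exact(buf)
    }
}
```

Calling `read_exact` on the wrapper then reaches the inner reader's `read_exact` directly, which the counters in `CountingReader` make observable.
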