Skip to content

feat: Added useRawBinaryStrings option to Decoder to allow override of default UTF-8 behaviour #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 26, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 15 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -115,13 +115,16 @@ console.log(buffer);
| extensionCodec | ExtensionCodec | `ExtensionCodec.defaultCodec` |
| context | user-defined | - |
| forceBigIntToInt64 | boolean | false |
| useRawBinaryStrings | boolean | false |
| maxDepth | number | `100` |
| initialBufferSize | number | `2048` |
| sortKeys | boolean | false |
| forceFloat32 | boolean | false |
| forceIntegerToFloat | boolean | false |
| ignoreUndefined | boolean | false |

To skip UTF-8 decoding of strings, `useRawBinaryStrings` can be set to `true`. In this case, strings are decoded into `Uint8Array`.

### `decode(buffer: ArrayLike<number> | BufferSource, options?: DecoderOptions): unknown`

It decodes `buffer` that includes a MessagePack-encoded object, and returns the decoded object typed `unknown`.
Expand Down Expand Up @@ -522,18 +525,19 @@ The mapping of integers varies on the setting of `intMode`.
| number (53-bit int) | int family | number or bigint (\*2) |
| number (64-bit float) | float family | number (64-bit float) |
| bigint | int family | number or bigint (\*2) |
| string | str family | string |
| ArrayBufferView | bin family | Uint8Array (\*3) |
| string | str family | string (\*3) |
| ArrayBufferView | bin family | Uint8Array (\*4) |
| Array | array family | Array |
| Object | map family | Object (\*4) |
| Date | timestamp ext family | Date (\*5) |
| Object | map family | Object (\*5) |
| Date | timestamp ext family | Date (\*6) |
| bigint | int family | bigint |

- \*1 Both `null` and `undefined` are mapped to `nil` (`0xC0`) type, and are decoded into `null`
- \*2 MessagePack ints are decoded as either numbers or bigints depending on the [IntMode](#intmode) used during decoding.
- \*3 Any `ArrayBufferView`s including NodeJS's `Buffer` are mapped to `bin` family, and are decoded into `Uint8Array`
- \*4 In handling `Object`, it is regarded as `Record<string, unknown>` in terms of TypeScript
- \*5 MessagePack timestamps may have nanoseconds, which will lost when it is decoded into JavaScript `Date`. This behavior can be overridden by registering `-1` for the extension codec.
* \*1 Both `null` and `undefined` are mapped to `nil` (`0xC0`) type, and are decoded into `null`
* \*2 MessagePack ints are decoded as either numbers or bigints depending on the [IntMode](#intmode) used during decoding.
* \*3 If you'd like to skip UTF-8 decoding of strings, set `useRawBinaryStrings: true`. In this case, strings are decoded into `Uint8Array`.
* \*4 Any `ArrayBufferView`s including NodeJS's `Buffer` are mapped to `bin` family, and are decoded into `Uint8Array`
* \*5 In handling `Object`, it is regarded as `Record<string, unknown>` in terms of TypeScript
* \*6 MessagePack timestamps may have nanoseconds, which will lost when it is decoded into JavaScript `Date`. This behavior can be overridden by registering `-1` for the extension codec.

If you set `useBigInt64: true`, the following mapping is used:

Expand All @@ -550,8 +554,9 @@ If you set `useBigInt64: true`, the following mapping is used:
| Object | map family | Object |
| Date | timestamp ext family | Date |

- \*5 If the bigint is larger than the max value of uint64 or smaller than the min value of int64, then the behavior is undefined.
* \*6 If the bigint is larger than the max value of uint64 or smaller than the min value of int64, then the behavior is undefined.

* \*7 If the bigint is larger than the max value of uint64 or smaller than the min value of int64, then the behavior is undefined.
## Prerequisites

This is a universal JavaScript library that supports major browsers and NodeJS.
Expand Down
28 changes: 24 additions & 4 deletions src/Decoder.ts
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,17 @@ export type DecoderOptions<ContextType = undefined> = Readonly<
*/
intMode?: IntMode;

/**
* By default, string values will be decoded as UTF-8 strings. However, if this option is true,
* string values will be returned as Uint8Arrays without additional decoding.
*
* This is useful if the strings may contain invalid UTF-8 sequences.
*
* Note that this option only applies to string values, not map keys. Additionally, when
* enabled, raw string length is limited by the maxBinLength option.
*/
useRawBinaryStrings: boolean;

/**
* Maximum string length.
*
Expand Down Expand Up @@ -202,6 +213,7 @@ export class Decoder<ContextType = undefined> {
private readonly extensionCodec: ExtensionCodecType<ContextType>;
private readonly context: ContextType;
private readonly intMode: IntMode;
private readonly useRawBinaryStrings: boolean;
private readonly maxStrLength: number;
private readonly maxBinLength: number;
private readonly maxArrayLength: number;
Expand All @@ -222,6 +234,7 @@ export class Decoder<ContextType = undefined> {
this.context = (options as { context: ContextType } | undefined)?.context as ContextType; // needs a type assertion because EncoderOptions has no context property when ContextType is undefined

this.intMode = options?.intMode ?? (options?.useBigInt64 ? IntMode.AS_ENCODED : IntMode.UNSAFE_NUMBER);
this.useRawBinaryStrings = options?.useRawBinaryStrings ?? false;
this.maxStrLength = options?.maxStrLength ?? UINT32_MAX;
this.maxBinLength = options?.maxBinLength ?? UINT32_MAX;
this.maxArrayLength = options?.maxArrayLength ?? UINT32_MAX;
Expand Down Expand Up @@ -406,7 +419,7 @@ export class Decoder<ContextType = undefined> {
} else {
// fixstr (101x xxxx) 0xa0 - 0xbf
const byteLength = headByte - 0xa0;
object = this.decodeUtf8String(byteLength, 0);
object = this.decodeString(byteLength, 0);
}
} else if (headByte === 0xc0) {
// nil
Expand Down Expand Up @@ -450,15 +463,15 @@ export class Decoder<ContextType = undefined> {
} else if (headByte === 0xd9) {
// str 8
const byteLength = this.lookU8();
object = this.decodeUtf8String(byteLength, 1);
object = this.decodeString(byteLength, 1);
} else if (headByte === 0xda) {
// str 16
const byteLength = this.lookU16();
object = this.decodeUtf8String(byteLength, 2);
object = this.decodeString(byteLength, 2);
} else if (headByte === 0xdb) {
// str 32
const byteLength = this.lookU32();
object = this.decodeUtf8String(byteLength, 4);
object = this.decodeString(byteLength, 4);
} else if (headByte === 0xdc) {
// array 16
const size = this.readU16();
Expand Down Expand Up @@ -636,6 +649,13 @@ export class Decoder<ContextType = undefined> {
this.stack.pushArrayState(size);
}

private decodeString(byteLength: number, headerOffset: number): string | Uint8Array {
if (!this.useRawBinaryStrings || this.stateIsMapKey()) {
return this.decodeUtf8String(byteLength, headerOffset);
}
return this.decodeBinary(byteLength, headerOffset);
}

private decodeUtf8String(byteLength: number, headerOffset: number): string {
if (byteLength > this.maxStrLength) {
throw new DecodeError(
Expand Down
49 changes: 49 additions & 0 deletions test/decode-raw-strings.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
import assert from "assert";
import { encode, decode } from "../src";
import type { DecoderOptions } from "../src";

describe("decode with useRawBinaryStrings specified", () => {
const options = { useRawBinaryStrings: true } satisfies DecoderOptions;

it("decodes string as binary", () => {
const actual = decode(encode("foo"), options);
const expected = Uint8Array.from([0x66, 0x6f, 0x6f]);
assert.deepStrictEqual(actual, expected);
});

it("decodes invalid UTF-8 string as binary", () => {
const invalidUtf8String = Uint8Array.from([
61, 180, 118, 220, 39, 166, 43, 68, 219, 116, 105, 84, 121, 46, 122, 136, 233, 221, 15, 174, 247, 19, 50, 176,
184, 221, 66, 188, 171, 36, 135, 121,
]);
const encoded = Uint8Array.from([
196, 32, 61, 180, 118, 220, 39, 166, 43, 68, 219, 116, 105, 84, 121, 46, 122, 136, 233, 221, 15, 174, 247, 19, 50,
176, 184, 221, 66, 188, 171, 36, 135, 121,
]);

const actual = decode(encoded, options);
assert.deepStrictEqual(actual, invalidUtf8String);
});

it("decodes object keys as strings", () => {
const actual = decode(encode({ key: "foo" }), options);
const expected = { key: Uint8Array.from([0x66, 0x6f, 0x6f]) };
assert.deepStrictEqual(actual, expected);
});

it("ignores maxStrLength", () => {
const lengthLimitedOptions = { ...options, maxStrLength: 1 } satisfies DecoderOptions;

const actual = decode(encode("foo"), lengthLimitedOptions);
const expected = Uint8Array.from([0x66, 0x6f, 0x6f]);
assert.deepStrictEqual(actual, expected);
});

it("respects maxBinLength", () => {
const lengthLimitedOptions = { ...options, maxBinLength: 1 } satisfies DecoderOptions;

assert.throws(() => {
decode(encode("foo"), lengthLimitedOptions);
}, /max length exceeded/i);
});
});