Add support for compression dictionary transport #1854
The RFC is pending publication so this will have to wait until that happens (should be any day now). RFC number has been assigned and final edits are complete: https://www.rfc-editor.org/auth48/rfc9842 |
|
RFC has published now so this should be ready to go (just rebased it). |
|
From the TPAC 2025 discussions, I added support for opaque responses to use compression dictionary when the |
|
Sorry for dragging this out. I think I addressed all of the questions/issues. The main question outstanding right now is if the no-cors path using CORP is "approved" or if there are more steps to getting that integrated before this can land. I filed it as a question with the TAG but I'm happy to do a more formal review somewhere if necessary. |
|
I switched back to only allowing dictionary use for non-opaque requests. It's cleaner from the browser's perspective and harder to get wrong for sites. This is what the browsers have implemented anyway and what the current WPTs test. The spec change should be ready to go now and match Chrome's and Mozilla's implementations. If we want to bring dictionary support to third-party embeds in some way (the main pain point for
|
|
||
| <li><p>Let <var>pattern</var> be the result of | ||
| <a for=/>creating a URL pattern</a> from <var>dictionaryValue</var>["<code>match</code>"] | ||
| and <var>request</var>'s <a for=request>current URL</a>. |
There was a problem hiding this comment.
It seems the text for callers of this algorithm is written that way: "creating a URL pattern, given dictionaryValue["match"], request's current URL and an empty map".
Also, request's current URL is a URL, not a string, so we should perform a serialization similar to https://urlpattern.spec.whatwg.org/#other-specs-http (IIUC we can't use that one, because dictionary["match"] is a dictionary value, not an HTTP structured field value).
There was a problem hiding this comment.
Thanks. This should be fixed now. I changed it to extract the "item" from match and to serialize the URL. Let me know if you think a different serialization is needed than the one I used.
https://bugs.webkit.org/show_bug.cgi?id=295249 - Add build/runtime flag for compression dictionary transport. - Add "compression-dictionary" destination type. whatwg/fetch#1854 - Add Link rel "compression-dictionary". whatwg/html#11619
| the <a lt="URL serializer">serialization</a> of <var>request</var>'s <a for=request>current URL</a>, | ||
| and an empty map. | ||
|
|
||
| <li><p>If <var>pattern</var> is failure or <var>pattern</var> <a for=/>has regexp groups</a>, |
There was a problem hiding this comment.
"creating a URL pattern" uses algorithms from https://url.spec.whatwg.org/ that can return failure, but when that happens, doesn't it actually throw an exception rather than return failure?
I wanted to ask about these potential exceptions the other day, but couldn't find an obvious input here that would make the algo throw... Anyway, I find this spec indeed deals with the case when an exception is thrown so I guess we probably want the same here: https://wicg.github.io/connection-allowlists/#abstract-opdef-parse-a-connection-allowlist-header
There was a problem hiding this comment.
It also seems we lack tests for this. We should have somewhat exhaustive tests for error conditions.
There was a problem hiding this comment.
Thanks. I added similar language that it should return response if the URL Pattern creation throws an exception. I'll add WPT tests for that and the rest of the edge cases that aren't currently covered now (will take a few days to work their way through).
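As a rough illustration of the behavior described above (not the spec text itself), registration could treat a thrown exception and a pattern with regexp groups the same way, by skipping the dictionary. This sketch assumes the JavaScript URLPattern API as a stand-in for "creating a URL pattern"; tryCreateMatchPattern is a hypothetical helper name, and it returns null where the spec steps would return response.

```javascript
// Hypothetical helper: returns a URLPattern for the dictionary's "match"
// value, or null when the dictionary should not be registered.
function tryCreateMatchPattern(match, requestURL) {
  try {
    // The base URL is passed as a serialized string, not as a URL object.
    const pattern = new URLPattern(match, requestURL.href);
    // Patterns that use regexp groups are rejected.
    if (pattern.hasRegExpGroups) {
      return null;
    }
    return pattern;
  } catch (e) {
    // Construction throws (for example on a malformed pattern) instead of
    // returning a failure value. A runtime without URLPattern also lands here.
    return null;
  }
}

const base = new URL("https://example.com/app/main.js");
// "/app/(" has an unterminated group, so no pattern is produced.
console.log(tryCreateMatchPattern("/app/(", base));
```

Folding both failure modes into a single null result keeps the caller's logic to one check, which matches the "return response" outcome in either case.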
There was a problem hiding this comment.
I just landed a (hopefully comprehensive) set of WPT tests to cover all of the edge cases I could think of: web-platform-tests/wpt#60164
- Invalid and out-of-scope match properties (syntax failures and cross-origin patterns).
- Invalid/unsupported dictionary type tokens.
- Comprehensive match-dest parsing (unknown destinations, matching/non-matching destinations, and explicit wildcards).
- Dictionary id maximum length validation (exactly 1024 characters vs. exceeding the limit at 1025 characters).
- Robustness against unknown additional dictionary parameters.
- Rejection of entirely malformed structured headers (Use-As-Dictionary: ?0) and non-cacheable responses (max-age=0).
- Opaque response tainting resulting from cross-origin redirects under no-cors mode correctly ignoring registration.
- Evaluation of Available-Dictionary on matching redirect targets across redirect chains.
- Precedence evaluation when matching overlapping pattern scopes.
- Decoding failures resulting from dictionary hash mismatches throwing clean network errors for both dcb and dcz.
There was a problem hiding this comment.
Thank you so much for these tests! In general they look good but I have some suggestions/comments/questions:
Dictionary with unknown match-dest is not used for fetch() API
So this is aligned with your PR because we return the response if matchDestList is empty (so similar to "Dictionary registration with empty match-dest list acts as wildcard'"). But can you please also add a similar test that does not hit the empty case:
compression_dictionary_promise_test(async (t) => {
// Structured field inner lists separate members with spaces, not commas.
const match_dest = encodeURIComponent('("asdf" "")');
...
As I read your PR, "asdf" would be removed but the dictionary registration would still succeed, right? (similar to "Dictionary registration with matching fetch destination")
Dictionary registration with 1024 character dictionary ID
For completeness, can we also have a test that checks the character range? That would also verify that parsing/serialization of the double quotes and backslash is done properly as per RFC 9651:
+compression_dictionary_promise_test(async (t) => {
+ // https://www.rfc-editor.org/info/rfc9651/#name-parsing-a-string
+ // double quotes and backslash are escaped in headers per RFC9651.
+ let dictionaryIDsuffix = " !#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~";
+ let dictionaryID = `\\"\\\\${dictionaryIDsuffix}`;
+ const dict = await (await fetch(`${kRegisterDictionaryPath}?id=${encodeURIComponent(dictionaryID)}`)).text();
+ // Wait until `available-dictionary` header is available.
+ assert_equals(
+ await waitUntilAvailableDictionaryHeader(t, {}),
+ kDefaultDictionaryHashBase64);
+ assert_equals(await checkHeader('dictionary-id', {}), `"${dictionaryID}"`);
+}, `Dictionary registration with dictionary ID (valid characters and backslash escaping)`);
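For reference, the escaping that the test above exercises can be sketched as follows. This is a minimal illustration of String serialization as I read RFC 9651 (printable ASCII only, with double quote and backslash escaped by a backslash); serializeSfString is a hypothetical name and is not part of the WPT helpers.

```javascript
// Serialize a JavaScript string as an RFC 9651 Structured Field String.
// Throws if the value contains characters outside 0x20-0x7E.
function serializeSfString(value) {
  let out = '"';
  for (const ch of value) {
    const code = ch.codePointAt(0);
    if (code < 0x20 || code > 0x7e) {
      throw new RangeError("character not allowed in a Structured Field String");
    }
    // Only the double quote and the backslash need escaping.
    if (ch === '"' || ch === "\\") {
      out += "\\";
    }
    out += ch;
  }
  return out + '"';
}

console.log(serializeSfString('dict "v1" \\ stable'));
```

A value such as '€' falls outside the allowed range, which is the case where the whole field would be expected to fail parsing.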
Dictionary with 1025 character dictionary ID is not registered
This test does not seem aligned with your PR? The ID would be ignored, so that should just behave like the "Simple dictionary registration and unregistration" case?
Also we can have a test for invalid character range (like '€') but in that case https://httpwg.org/specs/rfc9651.html#error-handling says the entire field should be ignored, so we would indeed really check that there is no available-dictionary header here (as opposed to a successful registration without ID).
Note that per https://httpwg.org/specs/rfc9651.html#error-handling the spec author can decide whether they want to ignore the entire field or define other handling (here, removing the id from the dictionary).
Overlapping match patterns prioritize the more specific dictionary
Would be nice to test a bit more the priority from https://www.rfc-editor.org/info/rfc9842/#section-2.2.3
At least I think
- A Use-As-Dictionary with "match-dest" has priority over a Use-As-Dictionary without "match-dest" (with a longer "match" length even).
- Two Use-As-Dictionaries with the same match's length and matchDest's emptiness but different fetch time (the most recent wins).
Would be nice to test for the "best match" algorithms that involve more than 2 dictionaries too.
There was a problem hiding this comment.
Thanks, I'll update the tests (may take a week to make it through the plumbing).
As far as priority goes, path specificity is the only factor in prioritization (with freshness being a tie-breaker). match-dest is just a filter for what candidates can be considered.
For the unknown types, I'll see if something is broken with the current spec language and update it but the intent is:
- Default to match-everything (by defaulting to an empty list)
- If a match-dest is specified and it is not an empty list, ONLY match if the current dest is in the list.
For unsupported dests, this likely means we don't want to register them at all since there is nothing they could match and we don't want them to be treated like a wildcard. This might need some additional logic in the registration flow to remember the initial state of the match-dest list (empty or not) before filtering for known dests and then drop the registration if the list is empty but it wasn't initially empty.
We want to avoid the case of a new dest being introduced and dictionaries that are targeting that dest accidentally behaving like a wildcard (and maybe taking priority over all other dictionaries for browsers that don't support the new dest).
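The intent described above could be sketched roughly as follows. This is an illustration only, under the assumption that match-dest has already been parsed into an array of strings; filterMatchDest and destinationMatches are hypothetical names, and the destination list is copied from the RequestDestination enum in this PR.

```javascript
// Destinations known to this client (from the RequestDestination enum).
const KNOWN_DESTINATIONS = new Set([
  "", "audio", "audioworklet", "compression-dictionary", "document", "embed",
  "font", "frame", "iframe", "image", "json", "manifest", "object",
  "paintworklet", "report", "script", "sharedworker", "style", "text",
  "track", "video", "worker", "xslt",
]);

// Returns the filtered list to store, or null to drop the registration.
function filterMatchDest(matchDest) {
  // Missing or empty list: default to match-everything.
  if (matchDest === undefined || matchDest.length === 0) {
    return [];
  }
  const known = matchDest.filter((dest) => KNOWN_DESTINATIONS.has(dest));
  // A non-empty list that only named unknown destinations must not turn
  // into a wildcard, so the registration is dropped instead.
  return known.length === 0 ? null : known;
}

// An empty stored list acts as a wildcard; otherwise the dest must be listed.
function destinationMatches(storedList, requestDestination) {
  return storedList.length === 0 || storedList.includes(requestDestination);
}

console.log(filterMatchDest(["asdf", ""]));
console.log(filterMatchDest(["asdf"]));
```

Checking for emptiness before filtering avoids having to remember the initial state of the list separately.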
There was a problem hiding this comment.
As far as priority goes, path specificity is the only factor in prioritization (with freshness being a tie-breaker). match-dest is just a filter for what candidates can be considered.
But https://www.rfc-editor.org/info/rfc9842/#name-multiple-matching-dictionar actually does use match-dest non-emptiness as the 1st factor for prioritization, with match length being second and freshness being third:
- For clients that support request destinations, a dictionary that specifies and matches a "match-dest" takes precedence over a match that does not use a destination
- Given equivalent destination precedence [...]
- Given equivalent destination and match length precedence [...]
There was a problem hiding this comment.
Yikes - thanks for pointing that out - Chrome's implementation probably needs to be fixed then (sorry, been working on it for so many years at this point that I'm forgetting parts). I'll fix the tests to make sure they test the behavior (and fix Chrome's implementation).
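The three-level precedence quoted above can be sketched as a comparison function. This is an illustration under assumptions, not any browser's implementation: the candidates are assumed to have already been filtered down to dictionaries that match the request, and the field names (match, matchDest, fetchedAt) are hypothetical.

```javascript
// Positive result means `a` takes precedence over `b`.
function compareDictionaries(a, b) {
  // 1. A dictionary that specifies (and matched) a match-dest wins.
  const aHasDest = a.matchDest.length > 0 ? 1 : 0;
  const bHasDest = b.matchDest.length > 0 ? 1 : 0;
  if (aHasDest !== bHasDest) {
    return aHasDest - bHasDest;
  }
  // 2. Then the longest "match" string wins.
  if (a.match.length !== b.match.length) {
    return a.match.length - b.match.length;
  }
  // 3. Then the most recently fetched dictionary wins.
  return a.fetchedAt - b.fetchedAt;
}

// Returns the best candidate, or null when there are none.
function pickBestDictionary(candidates) {
  let best = null;
  for (const candidate of candidates) {
    if (best === null || compareDictionaries(candidate, best) > 0) {
      best = candidate;
    }
  }
  return best;
}

const candidates = [
  { id: "long", match: "/app/js/*", matchDest: [], fetchedAt: 300 },
  { id: "dest", match: "/app/*", matchDest: ["script"], fetchedAt: 100 },
  { id: "old", match: "/app/*", matchDest: ["script"], fetchedAt: 50 },
];
console.log(pickBestDictionary(candidates).id);
```

In this example the dictionary with a match-dest is chosen even though another candidate has a longer match, which is the case the reviewer asked to have covered by tests.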
There was a problem hiding this comment.
FYI, I updated the PR to fail registration when id is invalid to match the test. Generally we want to ignore unknown fields (or unknown values in fields where we expect for future extensibility of values) but known fields with invalid values we should fail the registration.
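A minimal sketch of that rule, assuming the Use-As-Dictionary members have already been parsed into a plain object: unknown keys are ignored, while known keys with invalid values fail the registration. The function name and return shape are hypothetical; the defaults ("raw" for type, the empty string for id) reflect my reading of RFC 9842 and should be checked against it.

```javascript
const MAX_ID_LENGTH = 1024;
const SUPPORTED_TYPES = new Set(["raw"]);

// Returns the normalized registration, or null when registration should fail.
function validateRegistration(params) {
  // "match" is required.
  if (typeof params.match !== "string") {
    return null;
  }
  // A known key with an unsupported value fails the registration.
  const type = params.type === undefined ? "raw" : params.type;
  if (!SUPPORTED_TYPES.has(type)) {
    return null;
  }
  const id = params.id === undefined ? "" : params.id;
  if (typeof id !== "string" || id.length > MAX_ID_LENGTH) {
    return null;
  }
  // Unknown keys are not copied over, which is how they get ignored.
  return { match: params.match, type, id };
}

console.log(validateRegistration({ match: "/app/*", future: 1 }));
console.log(validateRegistration({ match: "/app/*", id: "x".repeat(1025) }));
```

match-dest handling is left out here because it has its own filtering rules.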
There was a problem hiding this comment.
FYI, I updated the PR to fail registration when id is invalid to match the test.
For unsupported dests, this likely means we don't want to register them at all since there is nothing they could match and we don't want them to be treated like a wildcard.
OK I think these address my remaining concerns regarding mismatch between spec and tests. For match-dest, I believe the match-dest change is still pending, but probably you can just do an early check for the emptiness of match-dest rather than remembering the initial state (see my other comment above).
| "<code>dictionary</code>", and <var>response</var>'s <a for=response>header list</a>. | ||
|
|
||
| <li><p>If <var>dictionaryValue</var> is null or <var>dictionaryValue</var>["<code>match</code>"] | ||
| does not <a for=map>exist</a>, then return <var>response</var>. |
There was a problem hiding this comment.
What about other parameters?
For example https://www.rfc-editor.org/rfc/rfc9842#name-type says a client should not deal with a type value it does not understand, so I guess we should bail out if that happens.
For id what happens if it exceeds the 1024 characters limit?
For match-dest, I guess clients would just ignore unknown destinations.
There was a problem hiding this comment.
Added details on how to handle invalid values for the other parameters. I don't explicitly call out that unknown keys should be ignored but I can add an explicit step to delete unknown keys if you think it would help.
There was a problem hiding this comment.
Yeah, I was not sure what is the common way to handle this, however https://httpwg.org/specs/rfc9651.html#preserving-extensibility mentions this:
Specifications that use Dictionaries can also allow for forward compatibility by requiring that the presence of -- as well as value and type associated with -- unknown keys be ignored. Subsequent specifications can then add additional keys, specifying constraints on them as appropriate.
So maybe the spec should be explicit about ignoring unknown keys.
Incidentally, https://httpwg.org/specs/rfc9651.html#error-handling says ignoring the entire field is the default behavior when field-specific constraints are violated so it's indeed good the spec is now explicit for id, type, etc that we only ignore the corresponding key.
(note: probably rfc9651 should also have similar wording)
|
|
||
| <li><p>If <var>key</var> is null, then return null. | ||
|
|
||
| <li><p>Return the unique compression-dictionary cache associated with <var>key</var>. [[!RFC9842]] |
There was a problem hiding this comment.
I think we should make compression-dictionary cache a link.
There was a problem hiding this comment.
This mirrors the section above on HTTP cache partitions where it does "Return the unique HTTP cache associated with...". The RFC itself doesn't go into detail about the cache itself and how it operates. Is that something that needs to be fully defined somewhere?
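For readers less familiar with the partitioning steps, the two quoted steps amount to a keyed lookup. The sketch below is only an analogy: the key is assumed to be any JSON-serializable value standing in for the network partition key, and the per-partition cache is modeled as a plain Map, neither of which is defined by the spec or the RFC.

```javascript
// One dictionary cache per partition key.
const dictionaryCaches = new Map();

function determineDictionaryCachePartition(key) {
  // If key is null, then return null.
  if (key === null) {
    return null;
  }
  // Return the unique cache associated with key, creating it on first use.
  const serializedKey = JSON.stringify(key);
  if (!dictionaryCaches.has(serializedKey)) {
    dictionaryCaches.set(serializedKey, new Map());
  }
  return dictionaryCaches.get(serializedKey);
}

const first = determineDictionaryCachePartition(["https://example.com", null]);
const second = determineDictionaryCachePartition(["https://example.com", null]);
console.log(first === second);
```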
| <li><p>If <var>compressionDictionaryCache</var> is null, then return <var>response</var>. | ||
|
|
||
| <li><p>Let <var>pattern</var> be the result of | ||
| <a for=/>creating a URL pattern</a> given the bare item of <var>dictionaryValue</var>["<code>match</code>"], |
There was a problem hiding this comment.
Added a link to the structured field parsing reference of bare item.
|
|
||
| <li><p>If <var>availableDictionaryItem</var> is null, then return a <a>network error</a>. | ||
|
|
||
| <li><p>Let <var>availableDictionaryHash</var> be the bare item of <var>availableDictionaryItem</var>. |
There was a problem hiding this comment.
This could maybe be inlined, though again we need more clarity on bare item.
|
I vaguely remember we discussed the token name incorrectly containing a hyphen, but I can't find it. Do you remember where that is? I suspect it's not worth correcting at this stage, but we should call it out in the specification as some kind of error we regret in the name of web compatibility so that we don't make it again for a new value. |
@annevk the link relation discussion was here which is where the token came from AFAIK. |
| with an algorithm that verifies that the dictionary hash in the stream matches | ||
| <var>availableDictionaryHash</var> and decodes the rest of the stream with the applicable | ||
| algorithm as defined in [[!RFC9842]]. If verification or decoding fails, | ||
| error the transformed stream. |
There was a problem hiding this comment.
I'm not sure what's implied by this "error the transformed stream". Shouldn't we return a network error if decoding/verification fails? If not what do we expect for newBody in that case?
Also, about the hash verification, I don't see a lot from RFC 9842... Looking for "hash" or "available-dictionary", I find these paragraphs:
- https://www.rfc-editor.org/info/rfc9842/#name-available-dictionary
- https://www.rfc-editor.org/info/rfc9842/#section-2.3
- https://www.rfc-editor.org/info/rfc9842/#name-changing-content
- https://www.rfc-editor.org/info/rfc9842/#name-negotiating-the-content-enc
and in particular "The dictionary is validated using an SHA-256 hash of the content to make sure that the client and server are both using the same dictionary." from which one can infer the server is expected to send back the same hash if it has found the same on its side.
So can we actually just add a previous step that checks that availableDictionaryHash matches the Available-Dictionary in the request's headers at step 9 and returns a network error otherwise? Or am I missing something?
There was a problem hiding this comment.
availableDictionaryHash and Available-Dictionary are both generated by the client and attributes of the request. The issue we need to protect against is a server responding with a dcb or dcz stream that was compressed with a dictionary other than the one requested (has happened to more than a few people when deploying because of incorrectly-configured Vary response headers).
I'm happy to change it to be a network error of some kind if there's a sensible way to plumb that. Where it gets a bit complicated is that it's a problem with the stream, not necessarily the HTTP-level response container and is closer to being a corrupt payload (like sending a brotli payload with Content-Encoding: zstd). It also won't show up until the body starts being processed/read.
That feels like it's a problem at the stream level rather than the network level but I'm happy to plumb it however it best fits into the spec.
There was a problem hiding this comment.
Thank you, I see! So this is referring to the remaining instances of "hash" (that I disregarded yesterday because I thought they were irrelevant):
https://www.rfc-editor.org/info/rfc9842/#name-dictionary-compressed-brotl
https://www.rfc-editor.org/info/rfc9842/#name-dictionary-compressed-zstan
The header consists of a fixed 4-byte sequence and a 32-byte hash of the external dictionary that was used.
A "Dictionary-Compressed Zstandard" stream is a binary stream that starts with a 40-byte fixed header
So what do you think of these minor clarifications:
that first verifies that the dictionary hash in the stream's header matches availableDictionaryHash and decodes the rest of the stream with the applicable algorithm as defined in §4. Dictionary-Compressed Brotli and §5. Dictionary-Compressed Zstandard of [[!RFC9842]].
So now I agree this is more an error at the stream level. Still I don't understand the implication of "error the transformed stream" (probably because I'm not familiar with the spec terminologies). What will be newBody after such an error?
There was a problem hiding this comment.
I'll see if I can find a better way to plumb the error that isn't hand-wavy (FWIW, I'm not all that well versed in writing specs so thanks for pushing on these unclear cases).
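To make the stream-level check discussed in this thread concrete, here is a hedged sketch of the header verification that would run before any decompression. It is not the spec algorithm: stripDictionaryHeader is a hypothetical name, and the magic byte values are my recollection of RFC 9842 sections 4 and 5 and should be verified against the RFC. The thread itself only establishes a 4-byte sequence plus a 32-byte hash for dcb and a 40-byte fixed header for dcz.

```javascript
// Assumed magic prefixes (verify against RFC 9842 before relying on them).
const DCB_MAGIC = Uint8Array.of(0xff, 0x44, 0x43, 0x42);
const DCZ_MAGIC = Uint8Array.of(0x5e, 0x2a, 0x4d, 0x18, 0x20, 0x00, 0x00, 0x00);
const HASH_LENGTH = 32; // SHA-256

function bytesMatchAt(bytes, expected, offset) {
  for (let i = 0; i < expected.length; i++) {
    if (bytes[offset + i] !== expected[i]) {
      return false;
    }
  }
  return true;
}

// Returns the compressed payload that follows the header, or throws when the
// header does not match the dictionary the client advertised.
function stripDictionaryHeader(encoding, bytes, availableDictionaryHash) {
  const magic = encoding === "dcb" ? DCB_MAGIC : encoding === "dcz" ? DCZ_MAGIC : null;
  if (magic === null) {
    throw new TypeError("unsupported content encoding: " + encoding);
  }
  if (availableDictionaryHash.length !== HASH_LENGTH) {
    throw new TypeError("expected a 32-byte SHA-256 hash");
  }
  const headerLength = magic.length + HASH_LENGTH;
  if (bytes.length < headerLength || !bytesMatchAt(bytes, magic, 0)) {
    throw new Error("malformed " + encoding + " header");
  }
  if (!bytesMatchAt(bytes, availableDictionaryHash, magic.length)) {
    throw new Error("stream was compressed with a different dictionary");
  }
  return bytes.subarray(headerLength);
}
```

A caller would typically run this inside the body's transform and, on a thrown error, error the resulting stream, so the failure only becomes visible once the body is read.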
| <p>If <var>dictionaryValue</var>["<code>match-dest</code>"] <a for=map>exists</a>: | ||
|
|
||
| <ol> | ||
| <li><p>Let <var>matchDestList</var> be <var>dictionaryValue</var>["<code>match-dest</code>"][0]. |
There was a problem hiding this comment.
I'll see if I can find a better way to word it but it is coming from the tuple definition from the structured field parsing of an item where the individual items of the parsed dictionary are tuples.
| }; | ||
|
|
||
| enum RequestDestination { "", "audio", "audioworklet", "document", "embed", "font", "frame", "iframe", "image", "json", "manifest", "object", "paintworklet", "report", "script", "sharedworker", "style", "text", "track", "video", "worker", "xslt" }; | ||
| enum RequestDestination { "", "audio", "audioworklet", "compression-dictionary", "document", "embed", "font", "frame", "iframe", "image", "json", "manifest", "object", "paintworklet", "report", "script", "sharedworker", "style", "text", "track", "video", "worker", "xslt" }; |
There was a problem hiding this comment.
If you are adding a new destination, that might affect the internal priority calculated by step 15 of https://fetch.spec.whatwg.org/#fetching ; currently that's completely implementation-defined so don't really need to say anything in the spec and that's probably off topic for this PR... But anyway I'm curious for WebKit and Firefox to know how Chromium is calculating fetch priority for the "compression-dictionary" destination? (naively, I would suspect it is low priority).
There was a problem hiding this comment.
Chrome defaults it to an idle-level priority request (and I believe it tries to wait until after the document onload has fired but I need to check to verify).
| <ol> | ||
| <li><p>Let <var>matchDestList</var> be <var>dictionaryValue</var>["<code>match-dest</code>"][0]. | ||
|
|
||
| <li><p>For each <var>dest</var> of <var>matchDestList</var>: if <var>dest</var>'s <a>bare item</a> |
There was a problem hiding this comment.
So per what you said earlier, condition this and the next item on whether matchDestList is nonempty?
Add processing steps for handling HTTP Compression Dictionary Transport content encoding and dictionary negotiation (RFC pending publication).
This adds a processing layer between the HTTP cache and network fetch that handles most of the dictionary-based content encoding (including matching dictionaries to outgoing requests).
Additionally, it adds processing above the HTTP cache for storing the dictionaries for future use and defines the "compression-dictionary" initiator and destination (the matching HTML spec update is in-process).
Support for clearing the caches through clear-site-data is in this PR.
Fix #1739, #1839