Interop issue between msquic and openssl-3.5-dev #4905
Comments
A little more debugging here: it appears that the client is discarding the inbound frames due to a mismatch between the initial Source Connection ID and the packet's Source Connection ID (based on section 7.2 of RFC 9000). Given that msquic sent an SCID of 25c6f641.... in frame 6 of with_retry.cap, the openssl client expects subsequent packets from the server to contain that SCID for this connection, but the handshake data in frames 7, 8 and onward contains a different SCID. This may be related to #3762, though I can't draw any connection between this behavior and the fact that it works if the client hello isn't split over multiple datagrams.
Looking further at the with_retry.cap file, I think this is related to #3762. Comparing this to a trace I just took in which I only have the client advertise an x25519 keyshare, I can see in the test.cap file (included in test.zip):
Frame 1) Client sends an initial packet with DCID 69252cb875da67e9
And everything works (though Wireshark seems a bit confused about decoding the encrypted data with the second change): the client and server transfer data as expected, as seen from the subsequent stream messages.
Comparing this to the previous with_retry.cap file, in which larger keyshares are offered:
From RFC 9000, section 7.2:
Normally the second SCID change would be honored by all parties, and everything is ok (as it is in test.cap above). But when the initial packet with the client hello spans multiple datagrams, we get an early initial packet from the server (the ACK in frame 6 in with_retry.cap), which indicates to the client that no further CID updates are expected, and so the change of CID in the server handshake message violates the requirements in section 7.2 of the RFC, causing the subsequent drops. I think the right fix here is that, once the server sends an initial packet after a retry, altering the CID in the connection can no longer be allowed.
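The client-side rule described above can be sketched as a small simulation. This is an illustration only, not OpenSSL's or msquic's code: the class and function names are hypothetical, the connection IDs `aaaa` and `bbbb` are placeholders rather than values from the captures, and packet decryption and validation are ignored entirely. It assumes the client pins the SCID of the first server Initial it accepts and drops anything with a different SCID afterwards, as RFC 9000 section 7.2 requires.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServerPacket:
    kind: str  # "retry", "initial" or "handshake"
    scid: str  # Source Connection ID in the packet header (placeholder values)


class ClientScidCheck:
    """Hypothetical model of the RFC 9000 section 7.2 check on the client."""

    def __init__(self) -> None:
        self.pinned_scid: Optional[str] = None

    def accept(self, pkt: ServerPacket) -> bool:
        # Once a server Initial has been accepted, the SCID is fixed and
        # packets carrying any other SCID must be discarded.
        if self.pinned_scid is not None:
            return pkt.scid == self.pinned_scid
        # A Retry may change the connection ID without pinning it;
        # the first server Initial pins it.
        if pkt.kind == "initial":
            self.pinned_scid = pkt.scid
        return True


def replay(packets: List[ServerPacket]) -> List[bool]:
    client = ClientScidCheck()
    return [client.accept(p) for p in packets]


# Client hello fits in one datagram: Retry, then the server hello Initial
# with a new SCID, then Handshake packets with that same SCID.
single_datagram_hello = [
    ServerPacket("retry", "aaaa"),
    ServerPacket("initial", "bbbb"),
    ServerPacket("handshake", "bbbb"),
]

# Client hello split across datagrams: the server ACKs early in an Initial
# that still uses the Retry SCID, then switches SCID for the server hello.
split_hello = [
    ServerPacket("retry", "aaaa"),
    ServerPacket("initial", "aaaa"),  # early ACK, pins "aaaa"
    ServerPacket("initial", "bbbb"),  # server hello with a new SCID
    ServerPacket("handshake", "bbbb"),
]

print(replay(single_datagram_hello))  # [True, True, True]
print(replay(split_hello))            # [True, True, False, False]
```

In the second sequence the early ACK is what pins the SCID, so everything the server sends after changing it is dropped, which matches the hang seen with retry enabled.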
With the addition of larger ml-kem keys in our tls handshake, we've uncovered an interop failure, as described here: microsoft/msquic#4905

In short, when we send a client hello that spans multiple datagrams, the server sends an ACK frame in a datagram prior to sending its server hello. msquic, however, always recomputes a new SCID when sending its server hello, which is nominally fine, but because in this test the server sends a retry packet to update the SCID, followed by an ACK using that SCID (which is an initial packet), msquic violates section 7.2 of the RFC, which states:

Once a client has received a valid Initial packet from the server, it MUST discard any subsequent packet it receives on that connection with a different Source Connection ID

Because msquic sent an initial packet with that ACK frame, we are required to discard subsequent packets on the connection containing a different SCID. Until msquic fixes that in their implementation we are going to fail the retry interop test, so for now, let's exclude the test.

Also, while we're at it, re-add chrome into the client list for our server tests, as that seems to have been lost during the merge.

Fixes openssl/project#1132
Reviewed-by: Saša Nedvědický Reviewed-by: Matt Caswell (Merged from #27014) (cherry picked from commit 2fb4cfe)
Thanks for the bug report! We've been pretty busy lately. We'll try to take a look soon.
Describe the bug
When doing some interop testing with openssl, I noticed a problem on the quic-interop-runner retry test:
https://github.com/openssl/openssl/actions/runs/13733428533/job/38414155590
In attempting to track it down I gathered the following data in the attached zip file:
retrydata.zip
without-retry.cap - tcpdump of connection between openssl client and msquic with retry disabled on quicinteropserver
with-retry.cap - tcpdump of connection between openssl client and msquic with retry enabled on quicinteropserver
without_retry.log - log file generated from msquic quicinteropserver with retry disabled
with_retry.log - log file generated from msquic quicinteropserver with retry enabled
without_retry_keys.log - keylog file for tcpdump with retry disabled on msquic server. Unfortunately no keylog file is generated when retry is enabled, as the handshake never progresses that far.
As the logs show, everything works normally when retry is disabled, but the handshake never completes with retry enabled.
I'm having a hard time making out the server side logs, but it appears that when retry is enabled on the server, the full client hello never gets re-assembled.
This may well be relevant here: OpenSSL recently enabled the use of ML-KEM keyshares, which dramatically increases the size of the client hello record (spanning 3 datagrams in the tcpdump). I think, looking at the with_retry.log file, on line 277, that msquic eventually decodes the client hello, but then discards the record on line 291.
It's worth noting that if I set the openssl client up such that it only advertises older keyshares (like X25519), everything also works fine, even with retry enabled. It's only when the client hello spans multiple datagrams with retry enabled on the server that the problem manifests.
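To illustrate why the keyshare choice decides whether the client hello fits in one datagram, here is a rough back-of-the-envelope sketch. The 32-byte X25519 share and the 1184-byte ML-KEM-768 encapsulation key are the standard sizes, and 1200 bytes is the minimum Initial datagram size from RFC 9000 section 14.1. Everything else is an assumption made up for the example: the per-datagram overhead, the size of the rest of the client hello, and the function name. It does not try to reproduce the exact three-datagram split seen in the capture.

```python
import math

DATAGRAM_SIZE = 1200          # minimum size of a client Initial datagram (RFC 9000, 14.1)
ASSUMED_OVERHEAD = 60         # hypothetical: QUIC headers, CRYPTO frame header, AEAD tag
ASSUMED_HELLO_BASE = 300      # hypothetical: client hello size without any key shares

X25519_SHARE = 32             # bytes in an X25519 public key
MLKEM768_SHARE = 1184         # bytes in an ML-KEM-768 encapsulation key


def datagrams_needed(client_hello_bytes: int,
                     datagram_size: int = DATAGRAM_SIZE,
                     overhead: int = ASSUMED_OVERHEAD) -> int:
    """Estimate how many Initial datagrams carry a client hello of this size."""
    usable = datagram_size - overhead
    return max(1, math.ceil(client_hello_bytes / usable))


classic_hello = ASSUMED_HELLO_BASE + X25519_SHARE
hybrid_hello = ASSUMED_HELLO_BASE + X25519_SHARE + MLKEM768_SHARE

print("X25519 only:", datagrams_needed(classic_hello), "datagram(s)")
print("X25519 + ML-KEM-768:", datagrams_needed(hybrid_hello), "datagram(s)")
```

Under these assumptions a single ML-KEM-768 share alone nearly fills a 1200-byte datagram, so any client hello that carries one spills into at least a second datagram, which is the condition under which the failure shows up.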
Affected OS
Additional OS information
ubuntu linux 22.04
MsQuic version
main
Steps taken to reproduce bug
./quicinteropserver -retry:1 -listen:127.0.0.1 -port:4433 -root:~/www -file:/home/nhorman/git/openssl/test/certs/servercert.pem -key:/home/nhorman/git/openssl/test/certs/serverkey.pem
LD_LIBRARY_PATH=/home/nhorman/git/openssl SSLKEYLOGFILE=./keys.log SSL_CERT_FILE=/certs/ca.pem SSL_CERT_DIR=/certs ./quic-hq-interop 127.0.0.1 4433 ./reqfile.txt
Expected behavior
The files listed in reqfile.txt should be transferred from the server to the client
Actual outcome
The client hangs waiting for the handshake to complete. The server sends some encrypted data, but its contents are unknown because the keylog file is never produced. The connection is never established.
Additional details
No response