12-21-2024 08:00 AM
My apologies for the density of the following, but I have a networking problem that I could use some help understanding.
When I access one particular website ( https://www.nemlog-in.dk ) from Chrome on a laptop connected to a Google WiFi network, Chrome reports ERR_CONNECTION_RESET. This is fully reproducible, and I get the same error from Chrome on Windows, macOS and Linux, and from Edge on Windows too. Accessing the same website from Firefox works fine, so the problem seems to affect only Chromium-derived browsers, independent of operating system.
The wired Google WiFi AP is plugged into another router that's on a 1 gigabit fiber connection. This setup can't be changed, as other wired devices are connected to the outer router as well, and it doesn't seem to cause problems in general. Except for the above-mentioned problem, everything works fine when connected to the wireless network - streaming (Netflix etc.), gaming, videoconferencing, VPN connections, mail/web/SSH etc. Google Home reports about 900 Mbps bandwidth and full connectivity.
If, instead of using Google WiFi, I connect through wired ethernet to the outer router, everything works fine from Chrome as well, so the problem only shows when communicating through Google WiFi.
I have used Wireshark to dump the network traffic in the various scenarios. In all cases the browser sends a packet like this when connecting:
client->server TLS1.2 Client Hello (SNI=www.nemlog-in.dk)
On Firefox and on Chrome through a wired network the next packet is always
server->client TCP ACK, no data
but on Chrome through Google WiFi I get
client->server TCP Retransmission PSH, ACK
server->client TCP ACK
server->client TCP RST, ACK
so it seems like it's the PSH retransmission that causes the server to drop the connection.
In a case like this I would usually expect the browser or the server to be misbehaving. But both Chrome and this website are widely used, so any kind of incompatibility between them would be a major issue and thus quickly fixed. My best guess is therefore that Google WiFi is somehow (at least in part) to blame.
So - how do I investigate further? Is there any way to make the WiFi log the communications that it routes?
12-22-2024 03:32 AM
A couple of additional data points:
Given that it's the combination of Chromium and Google WiFi that malfunctions, it could just as well be Chrome that has a problem, but further experiments suggest that Google WiFi is in fact the culprit.
I have installed another (non-mesh) WiFi router in the exact same way - that is, behind the outer router - and through that (with double NAT) Chrome has no problems accessing the website. A Wireshark dump shows that the retransmitted packet is no longer present, just as on a wired connection.
In addition, Google WiFi makes the Safari browser on macOS misbehave, only in a different way. The initial load of any website is extremely slow, and it's not the DNS lookup that takes time - the host command in a terminal window responds instantly with the IP address. HTTP-only websites respond much faster than HTTPS ones, but still slower than in Firefox on macOS, so apparently the delay is to some extent caused by TLS.
Pure Chromium behaves exactly like Chrome and Edge, so whatever it is that makes the browser send different TCP packets on Google WiFi than on other routers is present in Chromium.
It doesn't matter whether you're connected to the wired AP or to one of the non-wired APs; Chrome malfunctions on all of them.
I haven't found any Google WiFi setting that makes a difference when changed.
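One way to check where the time goes, rather than inferring it from the host command, is to time the DNS lookup, the TCP connect and the TLS handshake separately. The following is only a rough sketch: the helper names time_phase and measure are made up for illustration, it uses Python's ssl module rather than a browser, and the ClientHello it sends depends on the local OpenSSL build, so it will not necessarily match what Safari or Chrome sends.

```python
import socket
import ssl
import time


def time_phase(fn):
    """Run fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def measure(host, port=443, timeout=10.0):
    """Return the DNS, TCP connect and TLS handshake durations in seconds."""
    # Phase 1: name resolution only.
    infos, dns_s = time_phase(
        lambda: socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    )
    family, socktype, proto, _canonname, sockaddr = infos[0]

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        # Phase 2: TCP three-way handshake.
        _, tcp_s = time_phase(lambda: sock.connect(sockaddr))
        # Phase 3: TLS handshake (wrap_socket handshakes on a connected socket).
        context = ssl.create_default_context()
        tls_sock, tls_s = time_phase(
            lambda: context.wrap_socket(sock, server_hostname=host)
        )
        sock = tls_sock
    finally:
        sock.close()
    return {"dns": dns_s, "tcp": tcp_s, "tls": tls_s}
```

Calling measure("www.nemlog-in.dk") once through Google WiFi and once through the other router would show which of the three phases differs, if any.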
12-27-2024 06:05 AM
I had the exact same problem with my in-laws' Google WiFi a few weeks ago, and today with my own. Factory resetting my in-laws' Google WiFi seemed to do the trick, so that's what I just did at home as well, and now it's working here again too.
12-27-2024 12:32 PM
Hi - I'm guessing it's a fellow Dane 🙂
Many thanks for the suggestion. I'll try doing a factory reset once I've done a bit more testing. I'd really like to find out what it is that goes wrong; currently it seems that the way Chrome and the server at www.nemlog-in.dk ( 152.73.246.50 ) use TCP/IP triggers a bug in the Google WiFi router.
12-29-2024 07:54 AM
Well, whaddaya know...
I have found out that setting the "TLS 1.3 post-quantum key agreement" feature to Disabled in chrome://flags makes the website work again in Chrome and Chromium. This also applies to MitID, the brokenness of which was the original reason I started investigating.
So, to summarize: With Google WiFi possibly in some unknown bad state, using Chrome/Chromium with "TLS 1.3 post-quantum key agreement" enabled causes the entire NemLog-in website, including MitID, to break. Eliminating Google WiFi, disabling the flag or using Firefox makes the problem go away.
12-30-2024 03:41 AM
The bug in Google WiFi can also be reproduced without Chrome. Using the test script at https://github.com/dadrian/tldr.fail/blob/main/tldr_fail_test.py (from https://tldr.fail ) through Google WiFi, I get:
About to send a large TLS ClientHello (1479 bytes) to www.nemlog-in.dk:443.
The server should respond with a TLS ServerHello, which will be some
byte string beginning with b'\x16\x03\x03'. If it closes the
connection or sends something else, the server is misbehaving.
Sending the ClientHello in a single write:
[Errno 104] Connection reset by peer
but through another WiFi access point I get:
About to send a large TLS ClientHello (1479 bytes) to www.nemlog-in.dk:443.
The server should respond with a TLS ServerHello, which will be some
byte string beginning with b'\x16\x03\x03'. If it closes the
connection or sends something else, the server is misbehaving.
Sending the ClientHello in a single write:
b'\x16\x03\x03\x16 [...]
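The two outcomes above can be told apart mechanically. The sketch below is not part of the tldr.fail script; classify_outcome is a hypothetical helper that only encodes the rule the script prints: a reply starting with b'\x16\x03\x03' is a TLS handshake record (content type 0x16, legacy record version 0x0303), and anything else, including a reset, counts as a failure.

```python
from typing import Union

TLS_HANDSHAKE = 0x16             # TLS record content type "handshake"
TLS_LEGACY_VERSION = b"\x03\x03"  # record-layer version used by TLS 1.2/1.3


def classify_outcome(outcome: Union[bytes, BaseException]) -> str:
    """Classify what came back after sending the ClientHello.

    `outcome` is either the first bytes received or the exception raised.
    """
    if isinstance(outcome, ConnectionResetError):
        return "reset"                  # e.g. [Errno 104] Connection reset by peer
    if isinstance(outcome, BaseException):
        return "error"                  # timeout or some other socket error
    if len(outcome) == 0:
        return "closed"                 # peer closed without sending data
    if outcome[0] == TLS_HANDSHAKE and outcome[1:3] == TLS_LEGACY_VERSION:
        return "tls-handshake-record"   # most likely the ServerHello
    return "unexpected"


print(classify_outcome(ConnectionResetError(104, "Connection reset by peer")))
print(classify_outcome(b"\x16\x03\x03\x16"))
```

With the two results quoted above, the Google WiFi run falls in the "reset" bucket and the run through the other access point in "tls-handshake-record".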
12-31-2024 04:25 AM
Final update:
I have done a factory reset, and as predicted by JesperC it has made the problem go away. Judging by a couple of Wireshark captures, the difference is that before the reset there were a large number of retransmits and/or duplicated packets, and those have completely disappeared now.
Specifically, it seems that what broke www.nemlog-in.dk was that the second packet of the TLS client hello was retransmitted. The 1479-byte payload was split into packets of 1330 and 149 bytes, and the last packet was then transmitted twice, the only difference being the IPv4 identification, which had been incremented by one (with the corresponding checksum change).
The retransmit is strange, as it has no obvious reason - it happened about 20 ms after the packet was first sent, and it was completely reproducible in that the second packet of the TLS client hello was always retransmitted. Perhaps the processing of the first packet caused the WiFi router to acknowledge receipt of the second packet too slowly, making the client retransmit?
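To illustrate why a changed identification field necessarily changes the header checksum, here is a small sketch of the standard IPv4 header checksum (one's-complement sum of the 16-bit header words). Only the server address 152.73.246.50 and the 149-byte payload come from this thread; the client address, the identification value, the TTL and the 20-byte TCP header are assumptions made up for the example, not values from the capture.

```python
import struct

SRC = bytes([192, 168, 86, 20])   # hypothetical client address
DST = bytes([152, 73, 246, 50])   # www.nemlog-in.dk, as quoted in the thread


def ipv4_header_checksum(header: bytes) -> int:
    """One's-complement checksum over the 16-bit words of an IPv4 header."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_header(identification: int, total_length: int, checksum: int = 0) -> bytes:
    """Minimal 20-byte IPv4 header: no options, DF set, TTL 64, protocol TCP."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, total_length, identification & 0xFFFF,
        0x4000, 64, 6, checksum, SRC, DST,
    )


second_segment = 149                          # payload of the retransmitted packet
total_length = 20 + 20 + second_segment       # assumes a TCP header without options
ident = 0x1234                                # hypothetical identification value

original = ipv4_header_checksum(build_header(ident, total_length))
resent = ipv4_header_checksum(build_header(ident + 1, total_length))
print(f"original {original:#06x}, retransmission {resent:#06x}")
```

Because the checksum is the complement of the sum, incrementing one header word by one lowers the checksum by one (modulo one's-complement wraparound), which matches two packets that are otherwise byte-for-byte identical.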