I was able to find these requests and piece together a timeline from what the validation authority on our side logged.
Thanks a lot for the effort.
As you noticed the staging env performs domain validation from multiple network perspectives.
Ah… it rings a bell now you say it, but looking at my logs alone I thought it was one thing retrying.
That explains why my heap is going down, then: the TLS connections are expensive at the mbedTLS level and the attempts are coming in parallel, so the first ones really are still ongoing when the later ones arrive. I was wondering why the heap was not recovering sequentially.
That makes it sound like something simply broke on my end and we are timing out each parallel attempt in turn (except the last one, which doesn’t really get very far due to OOM). I confirmed this works on a PC using mbedtls, but that was a few days ago and several changes were needed to adapt it for ESP32. I will check we’re still working on a PC and study the ESP32 behaviour more closely today.
Lws already has a concept of restricting the number of simultaneous TLS connections, by deferring the accept of additional connections until we are below the TLS limit again, but I didn’t think to enable it on the temporary sni vhost.
The 1.6s delay in the requests seen on my side reflects the single-threaded event loop in lws being blocked for about that time by the mbedTLS actions. So as soon as we finished the work in mbedTLS for the first connection, we returned to the event loop and saw the next connection, which had been waiting. Even if we have multiple TLS connections in play, this serialization effect will still be there on our side, since we can only do the mbedTLS work one at a time. Your logs show two of the four starting together and the remaining two at +300ms and +2s… if the attempts can be cleared in 1.7s each and we restrict it to one at a time on our side, ignoring RTTs it should look like this (times in seconds):
source  start  connect  done  (delay)
1       0      0        1.7   1.7
2       0      1.7      3.4   3.4
3       0.3    3.4      5.1   4.8
4       2.0    5.1      6.8   4.8
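
For what it's worth, the arithmetic behind that table can be reproduced with a few lines of Python. This is only a rough model, not anything from lws: the function name serialized_schedule is made up for illustration, and it assumes a fixed 1.7s of mbedTLS work per attempt, strictly one attempt at a time, and zero RTT.

```python
def serialized_schedule(starts, service_time):
    """Model one-at-a-time handling: return (start, connect, done, delay) rows.

    starts       -- times (s) at which each validation attempt begins
    service_time -- assumed fixed time (s) the TLS work takes per attempt
    """
    rows = []
    free_at = 0.0  # time at which the single event loop is next idle
    for start in sorted(starts):
        # An attempt is only serviced once the previous one has finished.
        connect = max(start, free_at)
        done = connect + service_time
        rows.append((start, connect, done, done - start))
        free_at = done
    return rows


# Start times taken from the logs discussed above: 0, 0, +300ms, +2s.
rows = serialized_schedule([0.0, 0.0, 0.3, 2.0], 1.7)
for i, (start, connect, done, delay) in enumerate(rows, 1):
    print(f"{i}  {start:.1f}  {connect:.1f}  {done:.1f}  {delay:.1f}")
```

Under those assumptions the worst per-attempt delay comes out at 4.8s, so real round-trip times or less staggered starts would eat the remaining margin against 5s quickly.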
Considering round-trip times on top, and that there’s no guarantee the test starts will be staggered, it’s maybe too close for comfort to 5s. Anyway, before worrying about that I will try to find out why we are timing out on the first two at least, which is going to be something on my side, and reply again later. Thanks again for the insight.
so little heap space
We’re inheriting the advantages from a lot of other projects underneath that were also modest with space, like lwip and mbedtls; ESP32 has half the wireless-related pieces and libc in ROM as well. But even so, it’s pretty wild that the entire image I am working on is 920KB in flash including the OS, wlan stack, all pictures, JS, http/2 + websocket server etc, and once it gets to the browser, the TLS setup delay is the only sign you’re not talking to a normal server…