ESP32 time bootstrap problem

How to get the time after a cold start?

Tamás Sallai

For the past couple of weeks I've been working with an ESP32 chip. At this point I'm only experimenting: my goal is to find out whether these chips are good enough now. Many years ago I started with ESP8266 chips and they clearly were not: they were so resource-limited that they could not do TLS (and by extension, HTTPS). Any interesting use-case requires a server reachable over the internet, so the lack of secure communication disqualified that chip.

But the ESP32 (especially the C6 variant, which is RISC-V, meaning the compilers work with it out of the box) seems like a good chip. It is powerful enough for post-quantum ML-KEM cryptography, although that is not supported by the ESP-IDF (the base for any ESP-related development) at this moment. I started looking into post-quantum cryptography after reading that the need for it is more immediate than previously expected.

This is the chip. I added a TFT display because I was experimenting with whether it can show graphics (it can!). The chip itself is on the left, roughly 2.5 cm long.

While I was testing post-quantum TLS (you can find the code here, but don't judge) I noticed that it was failing at the certificate validation step. This made me look into a related problem: how to get the time after cold-booting the chip? TLS needs to verify that the certificate it gets from the server is still valid, and expiration checking is getting more important as it is the primary mechanism behind revocation. In most cases this part is hidden when making an HTTPS request because the time is known, but it matters for microcontrollers that don't have a clock that keeps ticking while turned off, such as my ESP32.

It turns out to be a rather tricky challenge called the time bootstrap problem. This is because secure communication requires the knowledge of time, but learning the time from the outside world requires secure communication.

I don't have a solution, but this is the kind of problem I find fascinating: seems easy at first, then it reveals a lot of depth. It's also a fundamental problem; it's not about support of a specific SDK or library but something that affects all devices that lack a reliable clock.

This article is a summary of the research I did into this problem and how I'll approach it in my future projects.

Secure communication with TLS

When you want secure communication you want to use TLS. It provides authentication in the form of a certificate that the server presents and that needs to chain back to a root certificate the client has hardcoded. This is why every device needs a root store, but it's usually included under the hood: Android has its store, browsers usually ship with certificates, and the ESP-IDF also contains a shortened list.

These root certificates are usually long-lived, so you include them once and update them once every 10-15 years.

If this were the full story then it would be enough: since the server can prove its identity via the certificate, the client can trust the messages.

Certificate revocation

But there is a complication: since there is no good way to revoke an exposed certificate (see Revocation is broken), we use short-lived certificates. This started with Let's Encrypt, and over the years the lifespan of the certificates that servers present to clients has been getting shorter and shorter. This is the primary defense against private key theft.

This means the client needs to check the validity period of the certificate it receives. If an attacker steals the private key for a certificate that is no longer valid, they should not be able to use it.
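To make the check concrete, here is a minimal sketch of what "checking the validity" boils down to. The function name and the dates are made up for illustration; a real TLS library does this as part of the handshake.

```python
from datetime import datetime, timezone

def is_within_validity(now, not_before, not_after):
    """Return True if `now` lies inside the certificate's validity period (inclusive)."""
    return not_before <= now <= not_after

# Hypothetical certificate that is valid for 90 days.
not_before = datetime(2026, 1, 1, tzinfo=timezone.utc)
not_after = datetime(2026, 4, 1, tzinfo=timezone.utc)

# A client that knows the time accepts the certificate while it is valid ...
ok_during = is_within_validity(datetime(2026, 2, 15, tzinfo=timezone.utc), not_before, not_after)
# ... and rejects it after expiry, even if the signature chain still verifies.
ok_after = is_within_validity(datetime(2026, 6, 1, tzinfo=timezone.utc), not_before, not_after)
```

The comparison itself is trivial. Everything depends on where `now` comes from.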

The time bootstrap problem

But that requires knowledge of the time, which some devices, such as my ESP32, do not have. It is possible to store arbitrary data in NVS (non-volatile storage), but a stored timestamp only provides a lower bound. If the device was turned off for a while, it has no way of knowing how much time has passed.

This is markedly different from the root certificates. The certificates are valid long-term, but the time has to be learned again after every restart. This means that unless the device has some means of keeping time while it's powered off (usually a battery), it needs to communicate with an external service. And that needs the current time.

This is the bootstrapping problem.

Attacker capabilities

When I think about the security of a device, I assume that an attacker can read and change the traffic: maybe they can trick the device into connecting to their Wi-Fi, or there is some other point in the communication chain where they can insert themselves. I'm not considering physical access: if someone can access the device then all bets are off, as they can reflash the firmware if it's not locked, or replace the whole device with an identical one.

NTP

The Network Time Protocol (NTP) is the primary way to synchronize clocks over the network. It needs a server and it tries to eliminate the latency introduced by the network call. There are a ton of servers out there you can choose from maintained by different people and organizations.
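The way NTP cancels out the network latency is a small piece of arithmetic over four timestamps. The sketch below uses the standard offset and delay formulas; the function name and the example numbers are mine, and it assumes the latency is the same in both directions.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """
    t0: client clock when the request was sent
    t1: server clock when the request arrived
    t2: server clock when the response was sent
    t3: client clock when the response arrived
    Returns (offset, delay): how much to add to the client clock,
    and the round-trip time spent on the network.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay

# Made-up example: the client clock is 100 s behind, each direction takes
# 50 ms, and the server needs 10 ms to answer.
offset, delay = ntp_offset_and_delay(t0=1000.00, t1=1100.05, t2=1100.06, t3=1000.11)
```

With these numbers the offset comes out at about 100 seconds and the delay at about 0.1 seconds. If the two directions are not symmetric, the error shows up in the offset.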

These servers are organized into different strata. Stratum 0 is the clock itself (an atomic clock or GPS satellite for example), stratum 1 is a server that synchronizes from it directly, stratum 2 synchronizes from stratum 1 servers, and so on. Since errors are additive you usually want to use a lower-stratum server.

The problem with NTP is that security is completely missing from it. An attacker can inject whatever timestamp they want if they can intercept and change the communication coming from the device.

On its own this is not an immediate problem, but it weakens the security of TLS certificate checking. Imagine that an attacker steals the private key for an expired certificate of your server. If they can also set the clock of the device, they can mount a man-in-the-middle attack: lie that the time is within the validity period of the compromised certificate, and the device will be happy to talk to an attacker-controlled endpoint.

NTS

Network Time Security for the Network Time Protocol (NTS, RFC 8915) aims to help with this problem a bit. It still uses NTP as the mechanism to synchronize time but it adds a key establishment step before that. And this key establishment uses TLS.

But wait a minute, isn't it the same problem, just with more components? If time synchronization needs key establishment that needs TLS that needs time, we're back to the same circular problem. The effect is that when validating the certificate of the key establishment server the device still can't do expiration checking.

This is also acknowledged in the RFC:

However, the expectation that the client does not yet have a correctly-set system clock at the time of certificate verification presents difficulties with verifying that the certificate is within its validity period, i.e., that the current time lies between the times specified in the certificate's notBefore and notAfter fields.

...

While there is no perfect solution to this problem, there are several mitigations the client can implement to make it more difficult for an adversary to successfully present an expired certificate

One thing that helps here is that the client can check that the timestamp falls within the certificate's validity period. While it's up to the client to check, I think it's a best practice to do it. This restriction brings some security guarantees: if an attacker gets hold of an expired certificate of an NTS server, then the false timestamp is constrained to the validity period of that certificate.

This is also mentioned in the RFC:

NTP time replies are expected to be consistent with the NTS-KE TLS certificate validity period, i.e. time replies received immediately after an NTS-KE handshake are expected to lie within the certificate validity period. Implementations are recommended to check that this is the case.

Roughtime

A good way to learn what is available for base internet infrastructure is to look at what Cloudflare offers. From their Time Services page I learned about the Roughtime protocol, which aims to solve this time bootstrapping problem. It originates from Google but apparently there is a more diverse set of people behind it now. It's also actively developed: at the time of writing the latest draft is 19 and it was published in March 2026.

As an aside, I've been reading a ton of AI-generated text where I need to always evaluate whether what is written makes sense. It was a breath of fresh air to read the RFC where the limiting factor was my understanding instead.

So, what is Roughtime and how is it better than NTS?

Its basic functionality is the same as for the others: there is a server that tells the client the time. It also handles the authentication problem by having the server publish a long-term public key that the clients should store and check.

But it also adds a couple of extra things.

One is the accountability of the servers. A client can chain responses and then prove to a third party that a server was lying, though it cannot pinpoint which one. How it's implemented is clever: by including the response of the previous server in the request to the next one in the chain, a client can prove that the second response was created strictly later than the first. So if the second server's time response is earlier, that is proof that one of them is lying.
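Here is a rough sketch of the chaining idea as I understand it from the draft. It is not wire-compatible with Roughtime: the function names are mine, signatures are left out entirely, and I'm assuming that each response boils down to a midpoint and an uncertainty radius in seconds.

```python
import hashlib

def chained_nonce(previous_response: bytes, blind: bytes) -> bytes:
    """Derive the next request's nonce from the previous server's response.
    Because the nonce depends on that response, the next server's signed reply
    can only have been created after it. (Simplified: truncated SHA-512.)"""
    return hashlib.sha512(previous_response + blind).digest()[:32]

def find_inconsistency(responses):
    """responses: list of (midpoint, radius) tuples in the order they were received.
    Returns the first pair of indexes (i, j), i before j, where the earlier
    response claims a time later than the later response allows, else None."""
    for j in range(len(responses)):
        for i in range(j):
            mid_i, rad_i = responses[i]
            mid_j, rad_j = responses[j]
            if mid_i - rad_i > mid_j + rad_j:
                return (i, j)
    return None

nonce = chained_nonce(b"response from the first server", b"\x00" * 32)
honest = find_inconsistency([(1000, 1), (1001, 1), (1002, 1)])
lying = find_inconsistency([(1000, 1), (900, 1), (1002, 1)])
```

In the second list, responses 0 and 1 contradict each other, but nothing in the data says which of the two servers is the one that lied.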

Second, it defines a Merkle tree for the signature. One problem with signatures is that their cost is asymmetric: the client sends a small request and the server needs to produce a response and also sign it. When there are not many clients it's not a problem, but as load increases signing can become a bottleneck. So in Roughtime the server can batch responses together, form a tree, sign only the root, and then provide enough proof to each client so that they can each check the response they got. It's a performance improvement, and a clever one indeed.

I found it interesting how the spec defends against amplification attacks: the response cannot be bigger than the request, and the spec provides the ZZZZ tag so that clients can pad their requests to accommodate that.

One problem with the current spec is that it explicitly specifies Ed25519 as the signing scheme, which is not quantum resistant. This is also acknowledged in the draft:
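A minimal sketch of the batching trick, to show why one signature is enough for many clients. This is a generic Merkle tree rather than the exact Roughtime encoding: the hash truncation, the prefix bytes, and all the function names are my assumptions, the batch size has to be a power of two to keep the code short, and the actual signing of the root is left out.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha512(data).digest()[:32]

def leaf_hash(nonce: bytes) -> bytes:
    return _h(b"\x00" + nonce)          # prefix separates leaves from inner nodes

def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)

def build_tree(nonces):
    """Return all levels of the tree: levels[0] are the leaves, levels[-1] is [root]."""
    n = len(nonces)
    assert n > 0 and n & (n - 1) == 0, "sketch only handles power-of-two batches"
    level = [leaf_hash(nonce) for nonce in nonces]
    levels = [level]
    while len(level) > 1:
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def proof_for(levels, index):
    """Sibling hashes from the leaf up to (but not including) the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])   # the neighbour in the same pair
        index //= 2
    return path

def verify(nonce, index, path, root):
    """Client side: recompute the root from the own nonce and the proof."""
    current = leaf_hash(nonce)
    for sibling in path:
        current = node_hash(current, sibling) if index % 2 == 0 else node_hash(sibling, current)
        index //= 2
    return current == root

# Four clients in one batch; the server would sign `root` once.
nonces = [bytes([i]) * 32 for i in range(4)]
levels = build_tree(nonces)
root = levels[-1][0]
path = proof_for(levels, 2)
```

Each client gets the same signed root plus its own short path, so the proof grows with the logarithm of the batch size while the server signs only once.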

Since the only supported signature scheme, Ed25519, is not quantum resistant, the Roughtime version described in this document will not survive the advent of quantum computers. A later version will have to be devised and implemented before then.

While this is a problem at this moment, I expect it is going to be solved when the RFC is finalized.

Why not Roughtime?

At first it seemed that Roughtime solves exactly the problem I have, but the more I think about it the less I see how it improves on NTS. First, it's a draft, and while there are some servers supporting it, it's not widespread enough. But, of course, this is a temporary problem: when the RFC is finalized and people start adopting it, there could be a healthy ecosystem around it.

Then there is the problem with the long-term public key. In each request there is a delegated certificate that was signed by the private part of that stored public key. This sounds very similar to TLS: the client hardcodes the root certificate and the server's cert is then signed by that. This has downsides and upsides: a negative is that both the long-term key and the delegated keys are managed by the same party, so a compromise can expose both. On the other hand, separating it from the usual trust paths of TLS eliminates some attack vectors, such as when an attacker can write to the /.well-known/acme-challenge directory and then get a valid certificate from Let's Encrypt.

The accountability of the servers is a nice addition, but it's not that useful in a single deployment. My device can already detect that some servers are acting fishy: ask several of them, and if they don't agree on the time then some of them are lying. Having a proof for a third party would help regulate the ecosystem of time servers, but in my own deployment I can do the same detection without any strong cryptographic guarantees.
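The "ask several of them" approach can be as simple as comparing every answer to the median. This is only a sketch: the server names, the timestamps, and the 10-second tolerance are made up, and a real implementation would have to account for the queries not happening at the same instant.

```python
from statistics import median

def split_by_agreement(timestamps, tolerance_seconds):
    """timestamps: dict of server name -> reported Unix time.
    Returns (median, agreeing servers, outliers)."""
    mid = median(timestamps.values())
    agreeing = {name for name, t in timestamps.items() if abs(t - mid) <= tolerance_seconds}
    outliers = set(timestamps) - agreeing
    return mid, agreeing, outliers

# Hypothetical answers from three servers; "c" is about a year behind the others.
reported = {"a": 1767225600, "b": 1767225602, "c": 1735689600}
mid, agreeing, outliers = split_by_agreement(reported, tolerance_seconds=10)
```

This only works while the honest servers are in the majority, and it gives me nothing I could show to anyone else as proof.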

DoH/DoT

Let's talk about a seemingly unrelated technology: DNS. For a long time I did not pay much attention to how DNS resolution works, until I learned that there are ways to move it under TLS as well to make it more private. DoT (DNS over TLS) and DoH (DNS over HTTPS) bring DNS resolution to TLS and HTTPS, transports that are far more private by design than the original protocol.

Get the address for a domain name using Cloudflare's DNS:

curl -H 'accept: application/dns-json' 'https://cloudflare-dns.com/dns-query?name=advancedweb.hu&type=AAAA'

And the response:

{..., "Answer":[{"name":"advancedweb.hu","type":28,"TTL":86400,"data":"2606:50c0:8003::153"}, ...], ...}
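On the device side the interesting part is pulling the addresses out of that JSON. A small sketch of the parsing, using a trimmed-down copy of the response above as the sample (the function name is mine, and the HTTPS request itself is left out):

```python
import json

AAAA = 28  # record type number for IPv6 addresses, as seen in the "type" field

def extract_addresses(body: str, record_type: int = AAAA):
    """Return the "data" field of every answer with the given record type."""
    answers = json.loads(body).get("Answer", [])
    return [answer["data"] for answer in answers if answer.get("type") == record_type]

# Trimmed-down sample: only the fields visible in the response above.
sample = '{"Answer": [{"name": "advancedweb.hu", "type": 28, "TTL": 86400, "data": "2606:50c0:8003::153"}]}'
addresses = extract_addresses(sample)
```

A response without an `Answer` field yields an empty list instead of an error, which the caller has to treat as a failed lookup.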

How is it relevant to the time bootstrapping?

Since DNS is usually needed before the device can talk to the backend, it adds another certificate to check against. If an attacker can inject a fake time, it has to fall in the validity time of both the DoH certificate and the backend certificate. This raises the bar significantly: unless an old certificate of the DoH server is also exposed, the returned timestamp has to fall within its current certificate's validity time.
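Put differently, every certificate the device sees narrows the window a fake timestamp has to fit into. A sketch of that intersection, using the validity dates of the Cloudflare certificate shown in this article and a made-up 90-day backend certificate (the function name is hypothetical):

```python
from datetime import datetime, timezone

def plausible_window(validity_periods):
    """validity_periods: list of (not_before, not_after) tuples.
    Returns the (start, end) window in which all certificates are valid,
    or None if no such window exists."""
    start = max(not_before for not_before, _ in validity_periods)
    end = min(not_after for _, not_after in validity_periods)
    return (start, end) if start <= end else None

utc = timezone.utc
doh_cert = (datetime(2025, 12, 31, 19, 20, 1, tzinfo=utc), datetime(2026, 12, 21, 19, 20, 1, tzinfo=utc))
backend_cert = (datetime(2026, 3, 1, tzinfo=utc), datetime(2026, 5, 30, tzinfo=utc))  # made up

window = plausible_window([doh_cert, backend_cert])

def time_is_plausible(now):
    return window is not None and window[0] <= now <= window[1]

spoofed_ok = time_is_plausible(datetime(2025, 6, 1, tzinfo=utc))
honest_ok = time_is_plausible(datetime(2026, 4, 1, tzinfo=utc))
```

The window can never be wider than the shortest-lived certificate, which is why shorter validity periods help here.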

How limiting this is depends on how long the DoH server's certificate is valid for. In the case of Cloudflare, it is roughly 1 year:

* Server certificate:
*   subject: C=US; ST=California; L=San Francisco; O="Cloudflare, Inc."; CN=cloudflare-dns.com
*   start date: Dec 31 19:20:01 2025 GMT
*   expire date: Dec 21 19:20:01 2026 GMT

Fortunately, certificate validity times are getting shorter:

Subscriber Certificates issued before 2026-03-15 SHOULD NOT have a Validity Period greater than 397 days and MUST NOT have a Validity Period greater than 398 days.

Subscriber Certificates issued on or after 2026-03-15 and before 2027-03-15 SHOULD NOT have a Validity Period greater than 199 days and MUST NOT have a Validity Period greater than 200 days.

Subscriber Certificates issued on or after 2027-03-15 and before 2029-03-15 SHOULD NOT have a Validity Period greater than 99 days and MUST NOT have a Validity Period greater than 100 days.

Subscriber Certificates issued on or after 2029-03-15 SHOULD NOT have a Validity Period greater than 46 days and MUST NOT have a Validity Period greater than 47 days.

47 days is not bad.
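The schedule quoted above is easy to turn into a lookup, which helps to see how much room an attacker has depending on when a certificate was issued. This sketch encodes only the MUST NOT limits from the quote; the names are mine.

```python
from datetime import date

# (issued before this date, maximum validity in days), taken from the quoted schedule
_LIMITS = [
    (date(2026, 3, 15), 398),
    (date(2027, 3, 15), 200),
    (date(2029, 3, 15), 100),
]

def max_validity_days(issued: date) -> int:
    """Maximum allowed validity period for a certificate issued on `issued`."""
    for cutoff, days in _LIMITS:
        if issued < cutoff:
            return days
    return 47

# The Cloudflare certificate shown earlier: issued 2025-12-31, expires 2026-12-21.
cloudflare_days = (date(2026, 12, 21) - date(2025, 12, 31)).days
cloudflare_limit = max_validity_days(date(2025, 12, 31))
```

The Cloudflare certificate above is valid for 355 days, which is under the 398-day limit that applied when it was issued.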

An interesting downside is that a non-security mechanism (resolving domain names) becomes crucial for trusting the initial timestamp.

For the ESP32

My first line of thinking was to send the initial TLS requests without expiration checking but store the validity periods for them. Then when there is a timestamp, retroactively do the checks. For example, resolve the NTS domain name via DoH, then fetch the time, and after the response check both of the certificates.

One complication is the programming model of the ESP32. Certificate validity checking is turned off by default. It is a compile-time setting and, AFAIK, it can't be turned on or off per request. This makes it tricky to rely only on TLS-based traffic: even the first request needs to have an idea of the time.

It seems like having a first unencrypted NTP request just to get an idea of the time is necessary for the ESP32 chip. Then do the whole time bootstrap process and reject if the NTS timestamp comes out as very different. This way the compile-time TLS date check can be turned on and the only extra step is to check the NTS response against the initial NTP timestamp and then update the clock to the now-trusted NTS response. But even that has to wait as NTS is not yet supported by ESP-IDF.
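The comparison step could look something like the sketch below. The function name and the 5-minute threshold are my assumptions; on the device the accepted value would then be written to the system clock.

```python
def reconcile(ntp_time, trusted_time, max_skew_seconds=300):
    """Compare the unauthenticated NTP time with the authenticated one.
    Returns the time to set the clock to, or raises if they disagree too much."""
    skew = abs(trusted_time - ntp_time)
    if skew > max_skew_seconds:
        raise ValueError(f"time sources disagree by {skew} seconds")
    return trusted_time

# Hypothetical values: the two sources are 2 seconds apart, so the trusted one wins.
clock = reconcile(ntp_time=1767225600, trusted_time=1767225602)
```

A large disagreement means that either the NTP response was tampered with or the certificate validated during the authenticated exchange was only accepted because of a wrong clock. In both cases rejecting is the safe choice.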

After getting a known-good time, it is possible to store it in non-volatile storage (NVS) and use it to validate later timestamps. This provides some assurance that the clock cannot be moved backwards between reboots, but it also introduces a new failure mode: what if a time server returns a timestamp that is in the future for some reason? If the device persists that, it will be unable to communicate until that time arrives, or it will need manual reflashing.
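A sketch of the stored lower bound, including the failure mode. A plain dict stands in for the NVS here, and the class and key names are hypothetical.

```python
class TimeFloor:
    KEY = "time_floor"  # hypothetical NVS key

    def __init__(self, storage):
        self.storage = storage  # a dict standing in for non-volatile storage

    def is_plausible(self, candidate):
        """A timestamp earlier than the stored floor cannot be right."""
        return candidate >= self.storage.get(self.KEY, 0)

    def commit(self, known_good):
        """Persist a known-good time. The floor only ever moves forward."""
        if known_good > self.storage.get(self.KEY, 0):
            self.storage[self.KEY] = known_good

nvs = {}
floor = TimeFloor(nvs)

floor.commit(1767225600)
rollback_rejected = not floor.is_plausible(1735689600)  # an earlier time is refused

# The failure mode: a bogus far-future value gets committed once ...
floor.commit(4102444800)
# ... and from then on the real time looks like a rollback.
real_time_rejected = not floor.is_plausible(1767229200)
```

So the floor should only be updated with a timestamp the device has good reasons to trust, because a bad value is permanent.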

So the process I would implement:

turn on certificate validity checking
get the time using NTP
use DoH to resolve the backend's hostname
use HTTPS to communicate with the backend

I find it a bit ironic that all the above research ended up with this conclusion: just use NTP and rely on the DoH and the backend certificate to detect malfeasance. This is definitely not something that solves the time bootstrap problem. But it seems like an acceptable compromise.

And later I would improve it to:

turn on certificate validity checking
use DoH for all DNS resolution
get the time using Roughtime
use NTS with multiple servers to keep the time up-to-date
use HTTPS to communicate with the backend