Advanced Web Machinery 2024-03-15T00:00:00+00:00 https://advancedweb.hu
Hard-to-debug unhandled rejection cases 2024-03-05T00:00:00+00:00 https://advancedweb.hu/hard-to-debug-unhandled-rejection-cases
<a href="https://advancedweb.hu/hard-to-debug-unhandled-rejection-cases/">(Read this article on the blog)</a><h2 id="unhandled-rejections" tabindex="-1">Unhandled rejections</h2>
<p>In one of my projects I'm building a <a href="https://github.com/sashee/with-file-cache">cache that works across workers</a>. With that library, I can turn any function
into a cached one: the cache key is calculated from the arguments, and if the result for a given key is already available, the stored result is returned.</p>
<p>One core synchronization requirement for this type of caching is that the function should only be called once for a given set of arguments. That means if the function is
already running, the caller just waits until the execution is complete.</p>
<p>It is based on an object that keeps track of in-progress calculations. A simplified version:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> inProgress = {};
<span class="hljs-keyword">const</span> <span class="hljs-title function_">postTask</span> = (<span class="hljs-params">key, fn</span>) => {
<span class="hljs-keyword">if</span> (inProgress[key] === <span class="hljs-literal">undefined</span>) {
<span class="hljs-comment">// start task</span>
<span class="hljs-keyword">return</span> inProgress[key] = <span class="hljs-title function_">fn</span>();
}<span class="hljs-keyword">else</span> {
<span class="hljs-comment">// task already running</span>
<span class="hljs-keyword">return</span> inProgress[key];
}
}
</code></pre>
<p class="plantuml"><img srcset="/assets/d124915f34a3a3a17aaca775f847040772e89ae30a2df93011b41953bbd00e12.png 1.25x"/></p>
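<p>A minimal runnable sketch of the same deduplication, to see it in action: the <code>postTask</code> logic is unchanged, while the <code>calls</code> counter and the <code>slowTask</code> function are illustrative additions. Two concurrent calls with the same key share one Promise, so the task body runs only once.</p>

```javascript
// Shares one in-flight Promise per key (same logic as the simplified version)
const inProgress = {};

const postTask = (key, fn) => {
  if (inProgress[key] === undefined) {
    // first caller: start the task and remember its Promise
    return inProgress[key] = fn();
  } else {
    // later callers: reuse the Promise of the running task
    return inProgress[key];
  }
};

// Illustration only: count how many times the task body runs
let calls = 0;
const slowTask = async () => {
  calls++;
  await new Promise((res) => setTimeout(res, 10));
  return "result";
};

const [a, b] = await Promise.all([
  postTask("key1", slowTask),
  postTask("key1", slowTask),
]);
console.log(a, b, calls); // result result 1
```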
<p>Then I wanted to support worker threads as well. This means that if any thread is currently calling the function, then all parallel calls wait for the
result, even if a worker thread is doing the calculation. This mechanism builds on
<a href="https://developer.mozilla.org/en-US/docs/Web/API/BroadcastChannel">BroadcastChannel</a> as threads don't share memory. A BroadcastChannel is a cross-context
messaging port that enables global messaging.</p>
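<p>As a minimal sketch of the messaging primitive itself: every <code>BroadcastChannel</code> with the same name receives the messages posted by the others, whether they live in the main thread or in a worker. For brevity, the example keeps both ends in one thread; the channel name <code>"cache"</code> and the message shape are illustrative assumptions, not the library's actual protocol.</p>

```javascript
import {BroadcastChannel} from "node:worker_threads";

// Two handles on the same named channel; in the real setup these would
// live in different threads (e.g. the main thread and a worker)
const sender = new BroadcastChannel("cache");
const receiver = new BroadcastChannel("cache");

// Resolve with the first message that reaches the receiver
const received = new Promise((resolve) => {
  receiver.addEventListener("message", ({data}) => resolve(data), {once: true});
});

sender.postMessage({type: "start", key: "key1"});
const message = await received;
console.log(message); // { type: 'start', key: 'key1' }

// Open channels keep the event loop alive, so close them when done
sender.close();
receiver.close();
```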
<p>But this requires a rather complex messaging protocol between the workers and the main thread. For that, I implemented a coordinator that runs on the main
thread and handles the workers' requests to start tasks.</p>
<p>When a worker wants to call the function, it first checks with the coordinator whether a call with the same arguments is already in progress. If it is, then the worker
needs to wait for the <code>finished</code> signal; if not, then the coordinator creates an entry in the <code>inProgress</code> object and waits for the worker to report that
it has finished the function.</p>
<p>A simplified code for this:</p>
<pre class="highlight"><code>channel.<span class="hljs-title function_">addEventListener</span>(<span class="hljs-string">"message"</span>, <span class="hljs-function">(<span class="hljs-params">{data: {type, key}}</span>) =></span> {
<span class="hljs-keyword">if</span> (type === <span class="hljs-string">"start"</span>) {
<span class="hljs-keyword">if</span> (inProgress[key] === <span class="hljs-literal">undefined</span>) {
<span class="hljs-comment">// no active calls to the function</span>
inProgress[key] = <span class="hljs-keyword">new</span> <span class="hljs-title class_">Promise</span>(<span class="hljs-function">(<span class="hljs-params">res, rej</span>) =></span> {
<span class="hljs-comment">// listen for finish and finish_error messages from the worker</span>
<span class="hljs-keyword">const</span> <span class="hljs-title function_">handler</span> = (<span class="hljs-params">{data: msg}</span>) => {
<span class="hljs-keyword">if</span> (msg.<span class="hljs-property">key</span> === key && [<span class="hljs-string">"finish"</span>, <span class="hljs-string">"finish_error"</span>].<span class="hljs-title function_">includes</span>(msg.<span class="hljs-property">type</span>)) {
channel.<span class="hljs-title function_">removeEventListener</span>(<span class="hljs-string">"message"</span>, handler);
<span class="hljs-keyword">delete</span> inProgress[key];
<span class="hljs-keyword">if</span> (msg.<span class="hljs-property">type</span> === <span class="hljs-string">"finish_error"</span>) {
<span class="hljs-title function_">rej</span>(msg.<span class="hljs-property">reason</span>);
}<span class="hljs-keyword">else</span> {
<span class="hljs-title function_">res</span>();
}
}
}
channel.<span class="hljs-title function_">addEventListener</span>(<span class="hljs-string">"message"</span>, handler);
channel.<span class="hljs-title function_">postMessage</span>({<span class="hljs-attr">type</span>: <span class="hljs-string">"startack"</span>, key});
});
}<span class="hljs-keyword">else</span> {
<span class="hljs-comment">// tell the worker that it's already in progress</span>
channel.<span class="hljs-title function_">postMessage</span>({<span class="hljs-attr">type</span>: <span class="hljs-string">"inprogress"</span>, key});
inProgress[key].<span class="hljs-title function_">finally</span>(<span class="hljs-function">() =></span> {
channel.<span class="hljs-title function_">postMessage</span>({<span class="hljs-attr">type</span>: <span class="hljs-string">"finished"</span>, key});
});
}
}
});
</code></pre>
<p class="plantuml"><img srcset="/assets/f09bd5e56e8b9349351f36c873480d7134bfcb07102fbc9d328b568df05e5251.png 1.25x"/></p>
<p>This implementation works: for example, starting the task in a worker and then calling the function locally won't call it twice:</p>
<pre class="highlight"><code><span class="hljs-comment">// send start and wait for startack</span>
<span class="hljs-keyword">await</span> <span class="hljs-title function_">postToChannel</span>({<span class="hljs-attr">type</span>: <span class="hljs-string">"start"</span>, key}, <span class="hljs-string">"startack"</span>);
<span class="hljs-comment">// post local task</span>
<span class="hljs-title function_">postTask</span>(key, <span class="hljs-function">() =></span> {<span class="hljs-variable language_">console</span>.<span class="hljs-title function_">log</span>(<span class="hljs-string">"called"</span>)});
<span class="hljs-comment">// finish in the worker</span>
<span class="hljs-keyword">await</span> <span class="hljs-title function_">postToChannel</span>({<span class="hljs-attr">type</span>: <span class="hljs-string">"finish"</span>, key});
<span class="hljs-comment">// no calls to the function</span>
</code></pre>
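<p>The test snippets use a <code>postToChannel</code> helper whose implementation is not shown. One possible shape, purely as an assumption: post a message and, if a reply type is given, resolve once a message of that type arrives for the same key. The sketch wires it to a hypothetical stand-in coordinator that only answers <code>start</code> with <code>startack</code>; the channel name <code>"coordinator"</code> is also made up.</p>

```javascript
import {BroadcastChannel} from "node:worker_threads";

// Channel used by the test code (hypothetical name)
const testChannel = new BroadcastChannel("coordinator");

// Hypothetical helper: post a message and, if replyType is given,
// wait for the reply of that type for the same key
const postToChannel = (message, replyType) => {
  if (replyType === undefined) {
    testChannel.postMessage(message);
    return Promise.resolve();
  }
  return new Promise((resolve) => {
    const handler = ({data}) => {
      if (data.key === message.key && data.type === replyType) {
        testChannel.removeEventListener("message", handler);
        resolve(data);
      }
    };
    // listen before posting so the reply cannot be missed
    testChannel.addEventListener("message", handler);
    testChannel.postMessage(message);
  });
};

// Stand-in coordinator: acknowledges every start message
const coordinatorChannel = new BroadcastChannel("coordinator");
coordinatorChannel.addEventListener("message", ({data: {type, key}}) => {
  if (type === "start") {
    coordinatorChannel.postMessage({type: "startack", key});
  }
});

const reply = await postToChannel({type: "start", key: "key1"}, "startack");
console.log(reply); // { type: 'startack', key: 'key1' }

// Open channels keep the process alive
testChannel.close();
coordinatorChannel.close();
```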
<p>I was happy with the implementation, up until I started writing test cases for rejections. What happens if the function rejects? In that case, the worker will
send a <code>finish_error</code> event, the coordinator rejects the <code>inProgress</code> Promise, and all the calls will be rejected as well, just as expected.</p>
<p>What I did not expect to see were unhandled rejections. And as I subsequently found out, tracking down these rejections can be quite challenging and often surprising.</p>
<p>This article describes the three causes of unhandled rejections I encountered while working on this project. Each had a different root cause and posed different
challenges.</p>
<h2 id="case-1-no-reject-handler" tabindex="-1">Case #1: No reject handler</h2>
<p>Let's start with the one that comes with the fewest surprises! If nothing handles a rejection, then it becomes an unhandled rejection. While it seems trivial,
it still bit me.</p>
<p>Usually, rejections behave similarly to exceptions: they propagate up the chain of async functions. This is why I rarely encounter this problem: apart from the occasional
forgotten <code>await</code>, it never happens.</p>
<p>For example, deleting a file but forgetting the <code>await</code> produces an unhandled rejection:</p>
<pre class="highlight"><code>fs.<span class="hljs-title function_">rm</span>(file);
</code></pre>
<p>But adding an <code>await</code> everywhere usually solves this problem:</p>
<pre class="highlight"><code><span class="hljs-keyword">await</span> fs.<span class="hljs-title function_">rm</span>(file);
</code></pre>
<p>In this case, the Promise returned by <code>fs.rm</code> is awaited, so if it rejects, the enclosing async function rejects as well.</p>
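<p>When hunting for these, it helps to make unhandled rejections visible. A minimal sketch using Node's <code>process.on("unhandledRejection")</code> event: the listener receives the rejection reason, and registering it also replaces Node's default behavior of crashing the process. <code>failingOperation</code> is a hypothetical stand-in for a call like <code>fs.rm(file)</code> on a missing file.</p>

```javascript
import process from "node:process";

// Collect unhandled rejections instead of letting Node crash the process
const unhandled = [];
process.on("unhandledRejection", (reason) => {
  unhandled.push(reason);
});

// Hypothetical stand-in for an operation that fails, e.g. fs.rm on a missing file
const failingOperation = async () => {
  throw new Error("ENOENT");
};

// Forgotten await: the returned Promise rejects and nothing handles it
failingOperation();

// Awaited inside try/catch: the rejection is handled
try {
  await failingOperation();
} catch (e) {
  // handled here, so it is not reported
}

// Let Node process pending rejections before inspecting the result
await new Promise((res) => setTimeout(res, 10));
console.log(unhandled.map((e) => e.message)); // [ 'ENOENT' ]
```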
<p>So, what went wrong in my use-case?</p>
<p>When a worker calls the function with some arguments, the coordinator creates a Promise. This makes it easy for local calls to wait for the result: simply return
this Promise:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> <span class="hljs-title function_">postTask</span> = (<span class="hljs-params">key, fn</span>) => {
<span class="hljs-keyword">if</span> (inProgress[key] === <span class="hljs-literal">undefined</span>) {
<span class="hljs-keyword">return</span> inProgress[key] = (<span class="hljs-keyword">async</span> () => {
<span class="hljs-comment">// ...</span>
})();
}<span class="hljs-keyword">else</span> {
<span class="hljs-comment">// return the Promise</span>
<span class="hljs-keyword">return</span> inProgress[key];
}
}
</code></pre>
<p>That means that, depending on how many <code>postTask</code> calls a <code>key</code> gets, the Promise is used zero or more times. The problematic case is zero. What if
only the worker is running the function? In that case, the <code>inProgress[key]</code> Promise is rejected with nothing attached to handle it.</p>
<pre class="highlight"><code><span class="hljs-comment">// send start, wait for startack</span>
<span class="hljs-keyword">await</span> <span class="hljs-title function_">postToChannel</span>({<span class="hljs-attr">type</span>: <span class="hljs-string">"start"</span>, key}, <span class="hljs-string">"startack"</span>);
<span class="hljs-comment">// send finish_error</span>
<span class="hljs-keyword">await</span> <span class="hljs-title function_">postToChannel</span>({<span class="hljs-attr">type</span>: <span class="hljs-string">"finish_error"</span>, key, <span class="hljs-attr">reason</span>: <span class="hljs-string">"failed #3"</span>});
<span class="hljs-comment">// unhandled rejection</span>
</code></pre>
<p class="plantuml"><img srcset="/assets/36e61c4546dacf911da4dde07740f5d16017b0a2e51dca3f4b75eae4556b0c34.png 1.25x"/></p>
<p>The solution is rather simple after figuring out the cause: make sure that at least one rejection handler is always attached:</p>
<pre class="highlight"><code>channel.<span class="hljs-title function_">addEventListener</span>(<span class="hljs-string">"message"</span>, <span class="hljs-function">(<span class="hljs-params">{data: {type, key}}</span>) =></span> {
<span class="hljs-keyword">if</span> (type === <span class="hljs-string">"start"</span>) {
<span class="hljs-keyword">if</span> (inProgress[key] === <span class="hljs-literal">undefined</span>) {
<span class="hljs-comment">// no active calls to the function</span>
inProgress[key] = <span class="hljs-keyword">new</span> <span class="hljs-title class_">Promise</span>(<span class="hljs-function">(<span class="hljs-params">res, rej</span>) =></span> {
<span class="hljs-comment">// ...</span>
});
inProgress[key].<span class="hljs-title function_">catch</span>(<span class="hljs-function">() =></span> {}); <span class="hljs-comment">// <== attach a rejection handler</span>
}<span class="hljs-keyword">else</span> {
<span class="hljs-comment">// ...</span>
}
}
});
</code></pre>
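<p>The effect of the no-op handler can be checked in isolation. In this sketch, <code>makeShared</code> is a hypothetical stand-in for the coordinator's Promise, and rejections are collected through Node's <code>unhandledRejection</code> event. Rejecting while nobody waits is reported, unless a <code>catch(() => {})</code> is attached first; the handler does not swallow the error for real consumers, as a later consumer still sees the rejection.</p>

```javascript
import process from "node:process";

const unhandled = [];
process.on("unhandledRejection", (reason) => unhandled.push(reason));

// Hypothetical stand-in for the coordinator's Promise: exposes its reject function
const makeShared = () => {
  let reject;
  const promise = new Promise((_res, rej) => { reject = rej; });
  return {promise, reject};
};

// Without a handler: rejecting while no caller waits is reported as unhandled
const withoutHandler = makeShared();
withoutHandler.reject("failed #3");

// With a no-op handler attached up front: not reported
const withHandler = makeShared();
withHandler.promise.catch(() => {});
withHandler.reject("failed #3b");

// A later consumer still observes the rejection
const seenByConsumer = await withHandler.promise.then(() => "resolved", (reason) => reason);

await new Promise((res) => setTimeout(res, 10));
console.log(unhandled, seenByConsumer); // [ 'failed #3' ] failed #3b
```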
<h2 id="case-2-promise-finally" tabindex="-1">Case #2: Promise.finally</h2>
<p>When a worker wants to start working on a task, it needs to inform the coordinator about that with a <code>start</code> message. Then it receives either a
<code>startack</code>, meaning it can call the function, or an <code>inprogress</code>, meaning that another thread is already calling the function. After an <code>inprogress</code>, the
worker needs to wait for a <code>finished</code> message telling it that the result is ready.</p>
<p class="plantuml"><img srcset="/assets/da82c0f12308f5dd42c5f5d2c5fe50372531279ba87f244811a29f808b6cc1db.png 1.25x"/></p>
<p>This is sent by the coordinator:</p>
<pre class="highlight"><code>channel.<span class="hljs-title function_">addEventListener</span>(<span class="hljs-string">"message"</span>, <span class="hljs-function">(<span class="hljs-params">{data: {type, key}}</span>) =></span> {
<span class="hljs-keyword">if</span> (type === <span class="hljs-string">"start"</span>) {
<span class="hljs-keyword">if</span> (inProgress[key] === <span class="hljs-literal">undefined</span>) {
<span class="hljs-comment">// ...</span>
}<span class="hljs-keyword">else</span> {
channel.<span class="hljs-title function_">postMessage</span>({<span class="hljs-attr">type</span>: <span class="hljs-string">"inprogress"</span>, key});
inProgress[key].<span class="hljs-title function_">finally</span>(<span class="hljs-function">() =></span> {
channel.<span class="hljs-title function_">postMessage</span>({<span class="hljs-attr">type</span>: <span class="hljs-string">"finished"</span>, key});
});
}
}
});
</code></pre>
<p>The above implementation is wrong. If the function call rejects there will be an unhandled rejection:</p>
<pre class="highlight"><code><span class="hljs-comment">// worker1 starts and waits for the startack</span>
<span class="hljs-keyword">await</span> <span class="hljs-title function_">postToChannel</span>({<span class="hljs-attr">type</span>: <span class="hljs-string">"start"</span>, key}, <span class="hljs-string">"startack"</span>);
<span class="hljs-comment">// worker2 starts and gets an inprogress</span>
<span class="hljs-keyword">await</span> <span class="hljs-title function_">postToChannel</span>({<span class="hljs-attr">type</span>: <span class="hljs-string">"start"</span>, key}, <span class="hljs-string">"inprogress"</span>);
<span class="hljs-comment">// worker1 finishes with error</span>
<span class="hljs-keyword">await</span> <span class="hljs-title function_">postToChannel</span>({<span class="hljs-attr">type</span>: <span class="hljs-string">"finish_error"</span>, key, <span class="hljs-attr">reason</span>: <span class="hljs-string">"failed #4"</span>});
<span class="hljs-comment">// unhandled rejection</span>
</code></pre>
<p>Why is an unhandled rejection raised there? It turns out that if the Promise is rejected, then the one returned by <code>finally</code> is rejected as well. And since
nothing handles it, it becomes an unhandled rejection.</p>
<p>The solution? Make sure that it can't reject:</p>
<pre class="highlight"><code>inProgress[key].<span class="hljs-title function_">catch</span>(<span class="hljs-function">() =></span> {}).<span class="hljs-title function_">then</span>(<span class="hljs-function">() =></span> {
channel.<span class="hljs-title function_">postMessage</span>({<span class="hljs-attr">type</span>: <span class="hljs-string">"finished"</span>, key});
});
</code></pre>
<p>This is usually not a problem, as the Promise returned by <code>finally</code> is normally returned or awaited.</p>
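<p>The difference between the two variants can be reproduced without any channels. In this sketch, <code>notify</code> is a hypothetical stand-in for posting the <code>finished</code> message: the Promise returned by <code>finally</code> adopts the original rejection and ends up unhandled, while <code>catch(() => {}).then(...)</code> always fulfills. Both variants still run the notification.</p>

```javascript
import process from "node:process";

const unhandled = [];
process.on("unhandledRejection", (reason) => unhandled.push(reason));

const notified = [];
// Hypothetical stand-in for channel.postMessage({type: "finished", key})
const notify = (label) => notified.push(label);

// Variant 1: finally() returns a Promise that rejects with the same reason
const task1 = Promise.reject("failed #4");
task1.catch(() => {}); // the shared Promise itself is handled (see case #1)
task1.finally(() => notify("finally"));

// Variant 2: catch first, so the chained Promise always fulfills
const task2 = Promise.reject("failed #5");
task2.catch(() => {});
task2.catch(() => {}).then(() => notify("catch-then"));

await new Promise((res) => setTimeout(res, 10));
console.log(notified, unhandled); // [ 'finally', 'catch-then' ] [ 'failed #4' ]
```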
<h2 id="case-3-late-return" tabindex="-1">Case #3: Late return</h2>
<p>I encountered the third one while writing test code for the coordinator.</p>
<p>In the library, when a task is posted the code first reads the filesystem to see if the result is already saved there. If not, then it proceeds with calling the
function.</p>
<p>A simplified version:</p>
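<pre class="highlight"><code><span class="hljs-keyword">import</span> {<span class="hljs-built_in">setTimeout</span>} <span class="hljs-keyword">from</span> <span class="hljs-string">"node:timers/promises"</span>;
<span class="hljs-keyword">const</span> <span class="hljs-title function_">postTask</span> = (<span class="hljs-params">key, fn</span>) => {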
<span class="hljs-keyword">if</span> (inProgress[key] === <span class="hljs-literal">undefined</span>) {
<span class="hljs-keyword">return</span> inProgress[key] = (<span class="hljs-keyword">async</span> () => {
<span class="hljs-comment">// do something async...</span>
<span class="hljs-keyword">await</span> <span class="hljs-built_in">setTimeout</span>(<span class="hljs-number">1</span>);
<span class="hljs-keyword">return</span> <span class="hljs-title function_">fn</span>();
})();
}<span class="hljs-keyword">else</span> {
<span class="hljs-keyword">return</span> inProgress[key];
}
}
</code></pre>
<p>In the tests I wanted to control the order of events. For that, I usually use a Promise with its resolve/reject functions extracted, for which the
upcoming <code>Promise.withResolvers</code> provides syntactic sugar.</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> {promise, resolve} = <span class="hljs-title function_">withResolvers</span>();
<span class="hljs-comment">// task returns the promise</span>
<span class="hljs-keyword">const</span> result = <span class="hljs-title function_">postTask</span>(key, <span class="hljs-function">() =></span> promise);
<span class="hljs-comment">// ... other steps</span>
<span class="hljs-comment">// finish the task</span>
<span class="hljs-title function_">resolve</span>();
<span class="hljs-keyword">await</span> result;
</code></pre>
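<p>The test snippets call a <code>withResolvers()</code> helper that is not shown. A minimal sketch, assuming it simply mirrors <code>Promise.withResolvers</code>, which newer runtimes (for example, Node.js 22 and later) provide natively:</p>

```javascript
// Hypothetical helper mirroring Promise.withResolvers()
const withResolvers = () => {
  let resolve;
  let reject;
  // the executor runs synchronously, so both functions are assigned here
  const promise = new Promise((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return {promise, resolve, reject};
};

// Usage: settle the Promise from outside, e.g. at a chosen step in a test
const {promise, resolve} = withResolvers();
setTimeout(() => resolve("done"), 10);
console.log(await promise); // done
```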
<p>This works fine when the Promise is resolved. But when it rejects, it raises an unhandled rejection:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> {promise, reject} = <span class="hljs-title function_">withResolvers</span>();
<span class="hljs-comment">// post task</span>
<span class="hljs-keyword">const</span> result = <span class="hljs-title function_">postTask</span>(key, <span class="hljs-function">() =></span> promise);
<span class="hljs-comment">// ... do other steps</span>
result.<span class="hljs-title function_">catch</span>(<span class="hljs-function">() =></span> {});
<span class="hljs-comment">// reject</span>
<span class="hljs-title function_">reject</span>(<span class="hljs-string">"failed #6"</span>);
</code></pre>
<p>The interesting part is that <code>result</code> is properly rejected, and a <code>catch</code> handler is attached to it before the rejection. So, where does the unhandled
rejection come from?</p>
<p>The problem is the order of operations here. Since <code>postTask</code> does not call the function immediately, <code>reject()</code> runs first. In a sizeable
codebase this was not easy to find, but putting the two parts next to each other makes it more visible:</p>
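<pre class="highlight"><code><span class="hljs-keyword">import</span> {<span class="hljs-built_in">setTimeout</span>} <span class="hljs-keyword">from</span> <span class="hljs-string">"node:timers/promises"</span>;
<span class="hljs-keyword">const</span> <span class="hljs-title function_">postTask</span> = (<span class="hljs-params">key, fn</span>) => {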
<span class="hljs-keyword">if</span> (inProgress[key] === <span class="hljs-literal">undefined</span>) {
<span class="hljs-keyword">return</span> inProgress[key] = (<span class="hljs-keyword">async</span> () => {
<span class="hljs-comment">// do something async...</span>
<span class="hljs-keyword">await</span> <span class="hljs-built_in">setTimeout</span>(<span class="hljs-number">1</span>);
<span class="hljs-keyword">return</span> <span class="hljs-title function_">fn</span>();
})();
}<span class="hljs-keyword">else</span> {
<span class="hljs-keyword">return</span> inProgress[key];
}
}
<span class="hljs-keyword">const</span> result = <span class="hljs-title function_">postTask</span>(key, <span class="hljs-function">() =></span> promise);
<span class="hljs-title function_">reject</span>(<span class="hljs-string">"failed #6"</span>);
</code></pre>
<p>In the example, the <code>setTimeout(1)</code> delays calling <code>fn()</code>, so <code>reject()</code> runs before that. At that point nothing is attached to <code>promise</code>, so its rejection is reported as an
unhandled rejection, even though <code>result</code> itself has a handler.</p>
<p class="plantuml"><img srcset="/assets/9c44c738131880cbca9b6f249c1c14db9352aeb5f2ad17f677ab7ad2135b1539.png 1.25x"/></p>
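<p>The ordering problem can be reproduced in isolation. This sketch combines the delayed <code>postTask</code> (with <code>setTimeout</code> from <code>node:timers/promises</code>), a hypothetical <code>withResolvers</code> helper, and an <code>unhandledRejection</code> listener. Because <code>reject()</code> runs before <code>fn()</code> is called, the test's <code>promise</code> is rejected while nothing is attached to it, even though <code>result</code> has a handler:</p>

```javascript
import {setTimeout} from "node:timers/promises";
import process from "node:process";

const unhandled = [];
process.on("unhandledRejection", (reason) => unhandled.push(reason));

// Hypothetical helper mirroring Promise.withResolvers()
const withResolvers = () => {
  let resolve, reject;
  const promise = new Promise((res, rej) => { resolve = res; reject = rej; });
  return {promise, resolve, reject};
};

const inProgress = {};
const postTask = (key, fn) => {
  if (inProgress[key] === undefined) {
    return inProgress[key] = (async () => {
      await setTimeout(1); // fn() is only called after this timer fires
      return fn();
    })();
  } else {
    return inProgress[key];
  }
};

const {promise, reject} = withResolvers();
const result = postTask("key", () => promise);
result.catch(() => {});
reject("failed #6"); // runs before fn() is called, so `promise` has no handler yet

await setTimeout(10);
console.log(unhandled); // [ 'failed #6' ]
```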
<p>To solve it, I needed to make sure that the function had already been called before rejecting:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> {promise, reject} = <span class="hljs-title function_">withResolvers</span>();
<span class="hljs-keyword">const</span> {<span class="hljs-attr">promise</span>: calledPromise, <span class="hljs-attr">resolve</span>: calledResolve} = <span class="hljs-title function_">withResolvers</span>();
<span class="hljs-keyword">const</span> result = <span class="hljs-title function_">postTask</span>(key, <span class="hljs-function">() =></span> {
<span class="hljs-comment">// resolve calledPromise</span>
<span class="hljs-title function_">calledResolve</span>();
<span class="hljs-keyword">return</span> promise;
});
<span class="hljs-comment">// wait until the task function is called</span>
<span class="hljs-keyword">await</span> calledPromise;
<span class="hljs-comment">// then reject</span>
<span class="hljs-title function_">reject</span>(<span class="hljs-string">"failed #7"</span>);
</code></pre>
How to transfer binary data efficiently across worker threads in NodeJs 2024-02-20T00:00:00+00:00 https://advancedweb.hu/how-to-transfer-binary-data-efficiently-across-worker-threads-in-nodejs
<a href="https://advancedweb.hu/how-to-transfer-binary-data-efficiently-across-worker-threads-in-nodejs/">(Read this article on the blog)</a><h2 id="postmessage-calls" tabindex="-1">postMessage calls</h2>
<p>Worker communication uses <code>postMessage</code> calls, and values passed to it are cloned using the <a href="https://developer.mozilla.org/en-US/docs/Web/API/structuredClone">structured clone
algorithm</a>. This makes it easy to send objects, strings, arrays, and numbers, and it even supports
circular references and Dates.</p>
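<p>The same algorithm is also exposed as the global <code>structuredClone</code> function, which makes it easy to check what survives a <code>postMessage</code> without starting a worker. A small sketch (the object is illustrative):</p>

```javascript
// An object with a Date and a circular reference
const original = {name: "report", created: new Date("2024-02-20T00:00:00Z")};
original.self = original;

const copy = structuredClone(original);

console.log(copy !== original);            // true: it's a deep copy
console.log(copy.created instanceof Date); // true: Dates are preserved
console.log(copy.self === copy);           // true: the cycle points to the copy

// Functions are not cloneable, so cloning them throws a DataCloneError
try {
  structuredClone({fn: () => {}});
} catch (e) {
  console.log(e.name); // DataCloneError
}
```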
<p>But what if the data is binary data in a Buffer? For example, an image that the worker generated, a PDF file, or a ZIP.</p>
<p>So I got curious: what are the ways to send binary data via <code>postMessage</code>? And which one is the best?</p>
<h2 id="sending-a-buffer" tabindex="-1">Sending a Buffer</h2>
<p>Let's start with the baseline: what happens if the Buffer is sent directly?</p>
<pre class="highlight"><code>parentPort.<span class="hljs-title function_">postMessage</span>(buffer);
</code></pre>
<p>Where <code>buffer</code> is a Buffer with some bytes:</p>
<pre class="highlight"><code><Buffer 72 61 77 20 62 75 66 66 65 72>
</code></pre>
<p>In this case, the other side gets a Uint8Array:</p>
<pre class="highlight"><code>Uint8Array(10) [
114, 97, 119, 32,
98, 117, 102, 102,
101, 114
]
</code></pre>
<p>The two objects contain the same bytes, just the <code><Buffer ...></code> is in hexadecimal and the <code>Uint8Array() [...]</code> is in decimal (7 * 16 + 2 = 114).</p>
<p>On the receiving end, it's easy to convert back to a Buffer: <code>Buffer.from(uint8array)</code>.</p>
<p>While the bytes are copied from one end to the other, this solution is easy: just post the Buffer and the data makes it to the other side. The downsides are
that it changes the type of the value (Buffer to Uint8Array) and that the copying has a performance impact for larger data.</p>
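<p>To observe the type change without starting a worker, a <code>MessageChannel</code> from <code>node:worker_threads</code> can stand in for <code>parentPort</code>, as its ports use the same <code>postMessage</code> mechanism; <code>receiveMessageOnPort</code> reads the queued message synchronously. A small sketch:</p>

```javascript
import {MessageChannel, receiveMessageOnPort} from "node:worker_threads";

// The ports of a MessageChannel clone values the same way postMessage does for workers
const {port1, port2} = new MessageChannel();

const buffer = Buffer.from("raw buffer");
port1.postMessage(buffer);

// Synchronously read the queued message from the other end
const {message} = receiveMessageOnPort(port2);

console.log(Buffer.isBuffer(message));      // false
console.log(message instanceof Uint8Array); // true

// Convert back; Buffer.from(uint8array) copies the bytes
const restored = Buffer.from(message);
console.log(restored.equals(buffer)); // true

port1.close();
```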
<h2 id="arraybuffer" tabindex="-1">ArrayBuffer</h2>
<p>While a Buffer is not transferable, <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Transferable_objects#supported_objects">an ArrayBuffer is</a>:</p>
<blockquote>
<p>The items that various specifications indicate can be transferred are:</p>
<ul>
<li>ArrayBuffer</li>
</ul>
<p>...</p>
</blockquote>
<p>So, what is an ArrayBuffer and how does it relate to Buffer and Uint8Array?</p>
<p>A Buffer and the typed arrays (for example, a Uint8Array) are views over an ArrayBuffer. This means the ArrayBuffer holds the bytes underlying the other
objects. For example, <code>buffer.buffer</code> returns the ArrayBuffer that backs the Buffer.</p>
<p>ArrayBuffer is transferable, which means it can be sent via <code>postMessage</code> without copying. The catch is that transferring the object makes it unusable on the
sending end: no reading or writing it after that point.</p>
<p>While it's tempting to go and transfer the ArrayBuffer of a Buffer, it is not correct in general. Since the Buffer is a view over the ArrayBuffer, the latter can contain
more data. This is especially apparent for small Buffers, which Node allocates from a shared pool:</p>
<pre class="highlight"><code><span class="hljs-title class_">Buffer</span>.<span class="hljs-title function_">from</span>(<span class="hljs-string">"abc"</span>, <span class="hljs-string">"utf8"</span>).<span class="hljs-property">buffer</span>
<span class="hljs-title class_">ArrayBuffer</span> {
[<span class="hljs-title class_">Uint8Contents</span>]: <span class="language-xml"><2f 00 00 00 00 00 00 00 61 62 63 00 00 00 00 00 61 62 63 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... 8092 more bytes>,
byteLength: 8192
}
</span></code></pre>
<p>The Buffer has 2 properties that define which part of the underlying ArrayBuffer it uses: <code>byteOffset</code> and <code>length</code>:</p>
<pre class="highlight"><code>> buffer.byteOffset
128
> buffer.length
3
</code></pre>
<p>Part of the ArrayBuffer can be copied to a new ArrayBuffer using the <code>slice</code> method. With that, it's possible to construct one that contains exactly the
bytes needed to transfer.</p>
<p>As an extra performance benefit, if the binary data is large enough, the Buffer gets its own ArrayBuffer with no extra bytes:</p>
<pre class="highlight"><code>> Buffer.from("a".padStart(10000, " ")).buffer
ArrayBuffer {
[Uint8Contents]: <20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 ... 9900 more bytes>,
byteLength: 10000
}
</code></pre>
<p>In that case, there is no need to copy it. And since this happens for larger Buffers, copying is avoided exactly where its impact would be the
biggest.</p>
<p>To get an ArrayBuffer from a Buffer that contains only the necessary bytes:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> arrayBuf = (<span class="hljs-function">() =></span> {
<span class="hljs-keyword">if</span> (buffer.<span class="hljs-property">byteOffset</span> === <span class="hljs-number">0</span> && buffer.<span class="hljs-property">byteLength</span> === buffer.<span class="hljs-property">buffer</span>.<span class="hljs-property">byteLength</span>) {
<span class="hljs-comment">// no extra bytes, return the ArrayBuffer</span>
<span class="hljs-keyword">return</span> buffer.<span class="hljs-property">buffer</span>;
}<span class="hljs-keyword">else</span> {
<span class="hljs-comment">// copy the relevant part to a new ArrayBuffer</span>
<span class="hljs-keyword">return</span> buffer.<span class="hljs-property">buffer</span>.<span class="hljs-title function_">slice</span>(buffer.<span class="hljs-property">byteOffset</span>, buffer.<span class="hljs-property">byteOffset</span> + buffer.<span class="hljs-property">byteLength</span>);
}
})();
</code></pre>
<p>Finally, transfer it:</p>
<pre class="highlight"><code>parentPort.<span class="hljs-title function_">postMessage</span>(arrayBuf, [arrayBuf]);
</code></pre>
<p>The receiving end gets an ArrayBuffer that can be converted back to a Buffer:</p>
<pre class="highlight"><code><span class="hljs-title class_">Buffer</span>.<span class="hljs-title function_">from</span>(arrayBuf);
</code></pre>
<p>This implements an efficient transfer between contexts that is zero-copy for larger Buffers.</p>
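<p>Wrapped into a function, the same logic can be exercised end-to-end. In this sketch, <code>toTransferableArrayBuffer</code> is a hypothetical name, and a <code>MessageChannel</code> stands in for the worker's <code>parentPort</code>. It shows that the transferred ArrayBuffer is detached on the sending side, and that a large, non-pooled Buffer (here from <code>Buffer.alloc</code>) is passed through without a copy:</p>

```javascript
import {MessageChannel, receiveMessageOnPort} from "node:worker_threads";

// Returns an ArrayBuffer holding exactly the Buffer's bytes
const toTransferableArrayBuffer = (buffer) => {
  if (buffer.byteOffset === 0 && buffer.byteLength === buffer.buffer.byteLength) {
    // no extra bytes: reuse the backing ArrayBuffer (zero-copy)
    return buffer.buffer;
  } else {
    // copy only the relevant range, e.g. for small pooled Buffers
    return buffer.buffer.slice(buffer.byteOffset, buffer.byteOffset + buffer.byteLength);
  }
};

const small = Buffer.from("abc");
const arrayBuf = toTransferableArrayBuffer(small);

const {port1, port2} = new MessageChannel();
port1.postMessage(arrayBuf, [arrayBuf]);
const {message} = receiveMessageOnPort(port2);
port1.close();

// The sender's ArrayBuffer is detached after the transfer
console.log(arrayBuf.byteLength); // 0
// Buffer.from(arrayBuffer) creates a view over it without copying
console.log(Buffer.from(message).toString()); // abc

// A large Buffer with its own ArrayBuffer is returned as-is, so no copy is made
const large = Buffer.alloc(10000);
console.log(toTransferableArrayBuffer(large) === large.buffer); // true
```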
<h2 id="readablestream" tabindex="-1">ReadableStream</h2>
<p>A surprising transferable object is a ReadableStream. In this case, the receiving end can read bytes on demand which can provide an even more efficient way of
transfer because:</p>
<ul>
<li>the result can be larger than the available memory</li>
<li>the receiving end might not want to read the whole stream</li>
</ul>
<p>In NodeJS, streams are a bit of a mess at the moment. There is <a href="https://nodejs.org/api/stream.html#class-streamreadable">Readable</a> that implements
readable streams in a Node-specific way, and then there is <a href="https://nodejs.org/api/webstreams.html#class-readablestream">ReadableStream</a> (a.k.a. web
streams) that is more cross-platform. The reasoning, from the documentation:</p>
<blockquote>
<p>It is similar to the Node.js Streams API but emerged later and has become the "standard" API for streaming data across many JavaScript environments.</p>
</blockquote>
<p>The ReadableStream is transferable, and the mechanism is similar to how the ArrayBuffer works: once transferred, it cannot be used on the sending side anymore.</p>
<p>To construct a ReadableStream from a Buffer:</p>
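<pre class="highlight"><code><span class="hljs-keyword">import</span> {<span class="hljs-title class_">Readable</span>} <span class="hljs-keyword">from</span> <span class="hljs-string">"node:stream"</span>;
<span class="hljs-keyword">import</span> {parentPort} <span class="hljs-keyword">from</span> <span class="hljs-string">"node:worker_threads"</span>;
<span class="hljs-keyword">const</span> readableStream = <span class="hljs-title class_">Readable</span>.<span class="hljs-title function_">toWeb</span>(<span class="hljs-title class_">Readable</span>.<span class="hljs-title function_">from</span>(buffer));
<span class="hljs-comment">// list the stream in the transfer list so it is moved, not cloned</span>
parentPort.<span class="hljs-title function_">postMessage</span>(readableStream, [readableStream]);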
</code></pre>
<p>And to read the contents:</p>
<pre class="highlight"><code><span class="hljs-keyword">import</span> {buffer} <span class="hljs-keyword">from</span> <span class="hljs-string">"node:stream/consumers"</span>;
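<span class="hljs-comment">// msg: the transferred ReadableStream received in the message handler</span>
<span class="hljs-keyword">await</span> <span class="hljs-title function_">buffer</span>(msg);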
</code></pre>
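<p>As a sketch of the full round trip in a single process, a <code>MessageChannel</code> can stand in for the worker's <code>parentPort</code>: the receiving side gets a ReadableStream and reads it with <code>buffer</code> from <code>node:stream/consumers</code>. This assumes a recent Node.js version that supports transferring web streams; the payload is illustrative.</p>

```javascript
import {MessageChannel} from "node:worker_threads";
import {Readable} from "node:stream";
import {buffer} from "node:stream/consumers";

const {port1, port2} = new MessageChannel();

// Receiving side: resolve with the first message
const received = new Promise((resolve) => port2.on("message", resolve));

// Sending side: wrap the Buffer in a web ReadableStream and transfer it
const readableStream = Readable.toWeb(Readable.from(Buffer.from("streamed bytes")));
port1.postMessage(readableStream, [readableStream]);

// Read the transferred stream into a Buffer on the receiving side
const msg = await received;
const data = await buffer(msg);
console.log(data.toString()); // streamed bytes

// Closing one end closes the channel so the process can exit
port1.close();
```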
<p>Of all the possible ways to pass binary data, this is the most resource-friendly. On the other hand, reading a transferred stream requires the worker to keep running.
If it is terminated or otherwise stopped, the stream becomes unreadable.</p>
<p>This is especially problematic when the worker is used with a thread pool, as a pool usually defines a clear "end state" for each task, after which the pool manager
is free to reuse or terminate the worker.</p>
<p>Nevertheless, I found it interesting that streams can be transferred as well.</p>
Using worker pools in NodeJs 2024-02-06T00:00:00+00:00 https://advancedweb.hu/using-worker-pools-in-nodejs
<a href="https://advancedweb.hu/using-worker-pools-in-nodejs/">(Read this article on the blog)</a><h2 id="why-workers" tabindex="-1">Why workers</h2>
<p>JavaScript has a single-threaded model, which means that whatever code you write will be run by only one CPU core. It is nicely encapsulated in <a href="http://debuggable.com/posts/understanding-node-js:4bd98440-45e4-4a9a-8ef7-0f7ecbdd56cb">this
quote</a>:</p>
<blockquote>
<p>everything runs in parallel, except your code</p>
</blockquote>
<p>This makes programming easier as you don't need to worry about things like "what happens between these two lines?" because it is guaranteed that no other
code will run in between. A <code>click</code> handler writes an object? Because of the single-thread model it cannot interfere with other code, avoiding a massive headache
that is present in most other languages.</p>
<blockquote>
<p>"So I don't have to worry about code accessing the same data structures at the same time?"</p>
<p>You got it! That's the entire beauty of JavaScripts single-threaded / event loop design!</p>
</blockquote>
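<p>A tiny sketch of this guarantee: a timer scheduled with a zero delay cannot run between two synchronous lines, even if it becomes due while the code in between is still busy, because its callback only runs once the current code has finished:</p>

```javascript
const log = [];

// Scheduled to run "as soon as possible"
setTimeout(() => log.push("timer callback"), 0);

log.push("line 1");
// Busy-wait for ~50 ms: the timer becomes due, but it cannot interrupt this code
const start = Date.now();
while (Date.now() - start < 50) {}
log.push("line 2");

// Wait for the timer to fire, then inspect the order
await new Promise((res) => setTimeout(res, 10));
console.log(log); // [ 'line 1', 'line 2', 'timer callback' ]
```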
<h3 id="impact-on-performance" tabindex="-1">Impact on performance</h3>
<p>In practice, most NodeJS code is limited not by the CPU but by waiting for other things: a webserver needs to send a network request to the database to fetch
some data, or to read a file from the filesystem. In the code, these are short-running operations that kick off work happening in parallel outside the JavaScript thread. Even though the JavaScript
code is single-threaded, it can drive many cores.</p>
<p>That's the case for most backend applications: a request comes in, it needs some validation, a few calls to the database, then some transformation, and finally,
the response is sent back. Running the JavaScript code takes only a small fraction of the total time needed to serve the request, which makes this setup able to serve many
requests in parallel.</p>
<p>But in other cases the single thread is a limitation. When the JavaScript code does a lot of CPU-intensive work, such as parsing documents, it's
common to see ~20% overall CPU utilization even though the code is computing as fast as it can. Here, the limiting factor is the single CPU core that the JavaScript code
can use.</p>
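<p>The single-thread limitation can be seen directly in a few lines. The sketch below (the helper names <code>busyWait</code> and <code>measureTimerDelay</code> are made up for this illustration) schedules a timer with a 0 ms delay and then keeps the thread busy with a synchronous loop. The timer is already due, but its callback cannot run until the loop returns control to the event loop:</p>
<pre class="highlight"><code>// event-loop-blocking.mjs: CPU-bound code delays every other callback
// Keeps the only JavaScript thread busy for roughly `ms` milliseconds
const busyWait = (ms) => {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spinning: nothing else can run on this thread in the meantime
  }
};

// Resolves with how long a 0 ms timer actually had to wait
const measureTimerDelay = (busyMs) => new Promise((resolve) => {
  const scheduledAt = Date.now();
  setTimeout(() => resolve(Date.now() - scheduledAt), 0);
  // the timer is due immediately, but it can only fire after this returns
  busyWait(busyMs);
});

measureTimerDelay(200).then((delay) => {
  console.log(`0 ms timer fired after ${delay} ms`); // at least 200 ms
});
</code></pre>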
<h3 id="worker-threads" tabindex="-1">Worker threads</h3>
<p>Adding concurrency to the language is not simple. While it seems straightforward to add the ability to start a new thread, similar to other languages, that
would go against the simplified programming model of NodeJS. To keep the single-threaded model but also add the ability for parallel processing, NodeJS provides
<em>worker threads</em>.</p>
<p>Each worker has its own context and behaves as normal JS code: it has a single thread running independently from other workers. This solves the CPU utilization
problem, but communication between threads becomes tricky. Sending data between threads cannot generally be based on shared memory, as that would nullify the simplified
programming model. Instead, workers rely on messages to send data to each other.</p>
<p>A worker communicates with other workers or the main thread via <code>postMessage</code> calls. The other end adds a listener that gets called
asynchronously when a message arrives:</p>
<pre class="highlight"><code><span class="hljs-comment">// index.mjs</span>
<span class="hljs-keyword">import</span> {<span class="hljs-title class_">Worker</span>} <span class="hljs-keyword">from</span> <span class="hljs-string">"node:worker_threads"</span>;
<span class="hljs-keyword">const</span> worker = <span class="hljs-keyword">new</span> <span class="hljs-title class_">Worker</span>(<span class="hljs-string">"./worker.mjs"</span>);
worker.<span class="hljs-title function_">addListener</span>(<span class="hljs-string">"message"</span>, <span class="hljs-function">(<span class="hljs-params">msg</span>) =></span> {
<span class="hljs-comment">// handle message</span>
});
</code></pre>
<pre class="highlight"><code><span class="hljs-comment">// worker.mjs</span>
<span class="hljs-keyword">import</span> {parentPort} <span class="hljs-keyword">from</span> <span class="hljs-string">"node:worker_threads"</span>;
parentPort.<span class="hljs-title function_">postMessage</span>(<span class="hljs-string">"Hello world!"</span>);
</code></pre>
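<p>The two files above only send a message in one direction. As a sketch of a full request-response round trip, the following single ES module starts a worker from an inline string using the <code>eval: true</code> option of the <code>Worker</code> constructor, sends it a number, and waits for the reply. The doubling task and the <code>doubleInWorker</code> name are only illustrative:</p>
<pre class="highlight"><code>// worker-roundtrip.mjs
import { Worker } from "node:worker_threads";

// The worker's code is passed inline with { eval: true } to keep the example
// in one file. Eval'd worker code is a CommonJS script, so it uses require().
const workerSource = `
  const { parentPort } = require("node:worker_threads");
  // answer the first message by doubling the received number
  parentPort.once("message", (n) => parentPort.postMessage(n * 2));
`;

// Starts a worker, sends it one value, and resolves with its reply
const doubleInWorker = (n) => new Promise((resolve, reject) => {
  const worker = new Worker(workerSource, { eval: true });
  worker.once("message", (result) => {
    resolve(result);
    // stop the thread so it does not keep the process alive
    worker.terminate();
  });
  worker.once("error", reject);
  worker.postMessage(n);
});

doubleInWorker(21).then((result) => console.log(result)); // 42
</code></pre>
<p>Starting a new worker for every call is expensive, which is why the rest of this article moves to a pool of long-lived workers.</p>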
<div class="internal_link_box my-5 p-3 pt-3 position-relative">
<div class="box-title text-monospace position-absolute py-1 px-3">
Related
</div>
<p><div class="row">
<div class="col-md-5 box-image mb-3">
<a href="/how-to-use-async-await-with-postmessage/">
<img class="w-100 h-100 img-bordered" srcset="/assets/e6d223c56e4ee0d27384c340f03af909eda9ef4033f79ced457f9b074aa6dc91.jpg 1x"/>
</a>
</div>
<div class="col-md-7">
<div class="h4 box-element-title mt-0 mb-0"><a href="/how-to-use-async-await-with-postmessage/">How to use async/await with postMessage</a></div>
<div class="box-element-excerpt mt-2">
Use MessageChannel and error propagation to use async/await with cross-context communication
</div>
</div>
</div></p>
</div>
<h2 id="using-a-worker-pool" tabindex="-1">Using a worker pool</h2>
<p>While it's possible to run worker threads that each do their own thing and sometimes send messages to each other, this is not how they are usually used.
Instead, workers are used to offload CPU-intensive tasks, making their lifecycle tied to the operation. For example, if the main thread needs to parse an HTML
document and find various elements in it, it can create a new worker, hand off the processing to it, and then terminate the worker when it's done.</p>
<p>For this use-case it's better to have a pool of workers waiting for tasks and have a request-response-style communication between the main thread and the
worker. This way, a pool manager can schedule tasks to available workers, keep track of a queue, and also start and stop threads when needed.</p>
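<p>To make the scheduling idea concrete, below is a deliberately minimal, hypothetical pool built directly on <code>worker_threads</code>. The names <code>createPool</code>, <code>run</code>, and <code>destroy</code> are made up for this sketch; it is not Piscina's API or implementation. It keeps a queue of pending tasks and a list of idle workers, and every reply from a worker both resolves the caller's promise and frees that worker for the next queued task. Error handling and restarting crashed workers are left out:</p>
<pre class="highlight"><code>// tiny-pool.mjs: a hypothetical minimal pool, not Piscina's implementation
import { Worker } from "node:worker_threads";

// Stand-in CPU task: every worker squares the numbers it receives
const workerSource = `
  const { parentPort } = require("node:worker_threads");
  parentPort.on("message", (n) => parentPort.postMessage(n * n));
`;

const createPool = (size) => {
  const workers = [];
  const idle = [];  // workers waiting for a task
  const queue = []; // tasks waiting for a worker: { input, resolve }

  // Gives the next queued task to the worker, or parks the worker as idle
  const schedule = (worker) => {
    const task = queue.shift();
    if (task === undefined) {
      idle.push(worker);
      return;
    }
    worker.once("message", (result) => {
      task.resolve(result);
      schedule(worker); // the worker is free again, look for more work
    });
    worker.postMessage(task.input);
  };

  for (let i = 0; i < size; i++) {
    const worker = new Worker(workerSource, { eval: true });
    workers.push(worker);
    idle.push(worker);
  }

  return {
    // Queues a task and starts it right away if a worker is idle
    run: (input) => new Promise((resolve) => {
      queue.push({ input, resolve });
      const worker = idle.pop();
      if (worker !== undefined) schedule(worker);
    }),
    // Stops all threads so the process can exit
    destroy: () => Promise.all(workers.map((worker) => worker.terminate())),
  };
};

const pool = createPool(2);
Promise.all([1, 2, 3, 4].map((n) => pool.run(n))).then((results) => {
  console.log(results); // [ 1, 4, 9, 16 ]
  return pool.destroy();
});
</code></pre>
<p>With two workers and four tasks, the last two tasks wait in the queue until a worker replies. Details such as worker errors, timeouts, and resizing are what make a library worth using instead of code like this.</p>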
<p>While worker threads are built into Node, managing a pool requires either custom code or a library. I searched for existing projects that provide this
functionality and found a few:</p>
<ul>
<li><a href="https://www.npmjs.com/package/piscina">Piscina</a></li>
<li><a href="https://www.npmjs.com/package/workerpool">workerpool</a></li>
<li>and a couple of others of varying popularity and maintenance activity</li>
</ul>
<p>I decided to go with Piscina, mainly because it supports only NodeJS worker threads, so there is probably less mental overhead in using it.</p>
<p>A practical way to implement workers is to split the relevant code into two parts:</p>
<ul>
<li>the runner code which is used by the main thread</li>
<li>the worker code that is the glue code between the worker and the rest of the codebase</li>
</ul>
<h3 id="worker-runner" tabindex="-1">Worker runner</h3>
<p>First, create a pool of workers:</p>
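<pre class="highlight"><code><span class="hljs-keyword">import</span> path <span class="hljs-keyword">from</span> <span class="hljs-string">"node:path"</span>;
<span class="hljs-keyword">import</span> {fileURLToPath} <span class="hljs-keyword">from</span> <span class="hljs-string">"node:url"</span>;
<span class="hljs-keyword">import</span> <span class="hljs-title class_">Piscina</span> <span class="hljs-keyword">from</span> <span class="hljs-string">"piscina"</span>;
<span class="hljs-comment">// __dirname is not defined in ES modules, so derive it from the module's URL</span>
<span class="hljs-keyword">const</span> __dirname = path.<span class="hljs-title function_">dirname</span>(<span class="hljs-title function_">fileURLToPath</span>(<span class="hljs-keyword">import</span>.<span class="hljs-property">meta</span>.<span class="hljs-property">url</span>));
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> pool = <span class="hljs-title class_">Piscina</span>.<span class="hljs-property">isWorkerThread</span> ? <span class="hljs-literal">undefined</span> : <span class="hljs-keyword">new</span> <span class="hljs-title class_">Piscina</span>({
<span class="hljs-attr">filename</span>: path.<span class="hljs-title function_">resolve</span>(__dirname, <span class="hljs-string">"worker.js"</span>),
});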
</code></pre>
<p>To make sure that a worker won't spawn more workers, effectively fork-bombing the process, the pool is only created when the module is imported by the main thread. As a result,
<code>pool</code> can be undefined, so it needs a check whenever it's used.</p>
<h3 id="worker-code" tabindex="-1">Worker code</h3>
<p>The <code>worker.ts</code> then contains the functions that are the API of the worker. For example, I had a <code>validateFile</code> function that I wanted to run in the
worker:</p>
<pre class="highlight"><code><span class="hljs-keyword">import</span> {validateFile <span class="hljs-keyword">as</span> validateFileOrig} <span class="hljs-keyword">from</span> <span class="hljs-string">"./validate-file.js"</span>;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> <span class="hljs-title function_">validateFile</span> = <span class="hljs-keyword">async</span> (<span class="hljs-params">{baseUrl, url, res, roles}: {...}</span>) => {
<span class="hljs-keyword">return</span> <span class="hljs-title function_">validateFileOrig</span>(baseUrl, url, res, roles);
};
</code></pre>
<p>Notice that the worker's function has only one argument: an object with the <code>baseUrl</code>, the <code>url</code>, <code>res</code>, and <code>roles</code>, while the original function
takes these as separate arguments. This is because Piscina passes only a single argument, the task, to the worker function.</p>
<h2 id="calling-the-worker" tabindex="-1">Calling the worker</h2>
<p>To call the function in the worker:</p>
<pre class="highlight"><code><span class="hljs-keyword">import</span> {pool} <span class="hljs-keyword">from</span> <span class="hljs-string">"./worker-runner.js"</span>;
<span class="hljs-keyword">import</span> {validateFile} <span class="hljs-keyword">from</span> <span class="hljs-string">"./worker.js"</span>;
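<span class="hljs-keyword">if</span> (pool === <span class="hljs-literal">undefined</span>) {
<span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-title class_">Error</span>(<span class="hljs-string">"The worker pool is only available in the main thread"</span>);
}
<span class="hljs-keyword">const</span> allDocumentErrors = <span class="hljs-keyword">await</span> pool.<span class="hljs-title function_">run</span>(
{baseUrl, url, res, roles} <span class="hljs-keyword">satisfies</span> <span class="hljs-title class_">Parameters</span><<span class="hljs-keyword">typeof</span> validateFile>[<span class="hljs-number">0</span>],
{<span class="hljs-attr">name</span>: validateFile.<span class="hljs-property">name</span>}
) <span class="hljs-keyword">as</span> <span class="hljs-title class_">Awaited</span><<span class="hljs-title class_">ReturnType</span><<span class="hljs-keyword">typeof</span> validateFile>>;
</code></pre>
<p>The nice thing is that type hints make this call type-safe. The <code>satisfies Parameters<typeof validateFile>[0]</code> check makes the compiler report an error if the
argument does not match what the worker function expects (a plain <code>as</code> cast would still accept missing properties). The result value is then cast to the correct type with
<code>Awaited<ReturnType<typeof validateFile>></code>: <code>ReturnType</code> alone is a <code>Promise</code> type, but the <code>await</code> has already unwrapped it. With these, whenever
the function in <code>worker.ts</code> changes, every usage that no longer matches results in a compile-time error.</p>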
<h2 id="transferable-objects" tabindex="-1">Transferable objects</h2>
<p><code>postMessage</code> uses the <a href="https://developer.mozilla.org/en-US/docs/Web/API/structuredClone">structured clone algorithm</a>, which is a more capable version of
<code>JSON.parse(JSON.stringify(...))</code>. It supports circular references, Dates, typed arrays, Sets, Maps, and a few other types.</p>
<p>What it does not support is functions: they cannot be cloned, and class instances arrive as plain objects without their methods. So while it's better than plain JSON, it's still limited to data.</p>
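<p>Since Node 17, the same algorithm is available as the global <code>structuredClone()</code> function, so its behavior can be tried without starting a worker. A small sketch with made-up data, comparing it to a JSON round trip:</p>
<pre class="highlight"><code>// structured-clone-demo.mjs: assumes Node 17+ where structuredClone is global
const original = {
  createdAt: new Date("2024-02-06T00:00:00Z"),
  tags: new Set(["worker", "pool"]),
  sizes: new Map([["small", 1], ["large", 10]]),
};
original.self = original; // a circular reference

// structuredClone uses the structured clone algorithm, like postMessage
const cloned = structuredClone(original);
console.log(cloned.createdAt instanceof Date); // true
console.log(cloned.tags.has("pool"));          // true
console.log(cloned.sizes.get("large"));        // 10
console.log(cloned.self === cloned);           // true, the cycle is kept

// JSON.stringify throws on the cycle, so drop it first; after the round trip
// Dates become strings, while Sets and Maps become empty objects
const { self: droppedCycle, ...withoutCycle } = original;
const viaJson = JSON.parse(JSON.stringify(withoutCycle));
console.log(typeof viaJson.createdAt); // string
console.log(viaJson.sizes);            // {}

// Functions cannot be cloned at all
try {
  structuredClone({ callback: () => 42 });
} catch (error) {
  console.log(error.name); // DataCloneError
}
</code></pre>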
<div class="internal_link_box my-5 p-3 pt-3 position-relative">
<div class="box-title text-monospace position-absolute py-1 px-3">
Related
</div>
<p><div class="row">
<div class="col-md-5 box-image mb-3">
<a href="/how-to-transfer-binary-data-efficiently-across-worker-threads-in-nodejs/">
<img class="w-100 h-100 img-bordered" srcset="/assets/19ca0ee03dac4cc601292e32ba441d7e53214bb9aeb8df964a0c17e940bae6e8.jpg 1x"/>
</a>
</div>
<div class="col-md-7">
<div class="h4 box-element-title mt-0 mb-0"><a href="/how-to-transfer-binary-data-efficiently-across-worker-threads-in-nodejs/">How to transfer binary data efficiently across worker threads in NodeJs</a></div>
<div class="box-element-excerpt mt-2">
How to return a Buffer from a worker
</div>
</div>
</div></p>
</div>
Modern JavaScript library starter2024-01-23T00:00:00+00:00https://advancedweb.hu/modern-javascript-library-starter<a href="https://advancedweb.hu/modern-javascript-library-starter/">(Read this article on the blog)</a><h2 id="publishing-a-library" tabindex="-1">Publishing a library</h2>
<p>Back when I wanted to write and publish a JavaScript library, all I had to do was create a new GitHub project, write a package.json with some basic
details, add an <code>index.js</code>, and publish to NPM via the CLI. But this simple setup misses a lot of things that are now considered essential: no types, no
CI/CD, no tests, to name a few.</p>
<p>So the last time I needed to start a new JavaScript library I spent some time setting up the basics and then realized that these steps are mostly generic and
can be reused across different projects. This article documents the different aspects needed to develop and publish a modern library.</p>
<p>More specifically, I wanted these features:</p>
<ul>
<li>the library is written in TypeScript with types published in the package</li>
<li>there are tests, also written in TypeScript</li>
<li>a CI pipeline builds the library and runs the tests for every commit</li>
<li>a CD pipeline publishes every new version to the NPM registry</li>
</ul>
<h2 id="starting-code" tabindex="-1">Starting code</h2>
<p>The important files are some configuration, the package source, and the tests:</p>
<pre class="highlight"><code>src/index.ts
src/index.test.ts
package.json
tsconfig.json
</code></pre>
<p>Since there is a compile step, the sources and the compiled files are in different directories. While the <code>.ts</code> files are in <code>src/</code>, the
compiled output goes to <code>dist/</code>.</p>
<p>The <code>package.json</code>:</p>
<pre class="highlight"><code><span class="hljs-punctuation">{</span>
<span class="hljs-comment">// name, version, description, other data</span>
<span class="hljs-attr">"main"</span><span class="hljs-punctuation">:</span> <span class="hljs-string">"dist/index.js"</span><span class="hljs-punctuation">,</span>
<span class="hljs-attr">"type"</span><span class="hljs-punctuation">:</span> <span class="hljs-string">"module"</span><span class="hljs-punctuation">,</span>
<span class="hljs-attr">"files"</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">[</span>
<span class="hljs-string">"dist"</span>
<span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-attr">"devDependencies"</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
<span class="hljs-attr">"ts-node"</span><span class="hljs-punctuation">:</span> <span class="hljs-string">"^10.9.2"</span><span class="hljs-punctuation">,</span>
<span class="hljs-attr">"typescript"</span><span class="hljs-punctuation">:</span> <span class="hljs-string">"^5.3.3"</span>
<span class="hljs-punctuation">}</span>
<span class="hljs-punctuation">}</span>
</code></pre>
<p>The <code>files</code> field lists only <code>dist</code>, so only the compiled files are packaged and pushed to the NPM registry. Then the <code>main: "dist/index.js"</code> defines
the entry point.</p>
<p>The <code>tsconfig.json</code> configures the TypeScript compiler:</p>
<pre class="highlight"><code><span class="hljs-punctuation">{</span>
<span class="hljs-attr">"compilerOptions"</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
<span class="hljs-attr">"noEmitOnError"</span><span class="hljs-punctuation">:</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span>
<span class="hljs-attr">"strict"</span><span class="hljs-punctuation">:</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span>
<span class="hljs-attr">"sourceMap"</span><span class="hljs-punctuation">:</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span>
<span class="hljs-attr">"target"</span><span class="hljs-punctuation">:</span> <span class="hljs-string">"es6"</span><span class="hljs-punctuation">,</span>
<span class="hljs-attr">"module"</span><span class="hljs-punctuation">:</span> <span class="hljs-string">"nodenext"</span><span class="hljs-punctuation">,</span>
<span class="hljs-attr">"moduleResolution"</span><span class="hljs-punctuation">:</span> <span class="hljs-string">"nodenext"</span><span class="hljs-punctuation">,</span>
<span class="hljs-attr">"declaration"</span><span class="hljs-punctuation">:</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span>
<span class="hljs-attr">"outDir"</span><span class="hljs-punctuation">:</span> <span class="hljs-string">"dist"</span>
<span class="hljs-punctuation">}</span><span class="hljs-punctuation">,</span>
<span class="hljs-attr">"include"</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">[</span>
<span class="hljs-string">"src/**/*.*"</span>
<span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-attr">"exclude"</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">[</span>
<span class="hljs-string">"**/*.test.ts"</span>
<span class="hljs-punctuation">]</span>
<span class="hljs-punctuation">}</span>
</code></pre>
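<p>Depending on the project a lot of different configurations are possible, but the important parts are that the files in the <code>src/</code> folder are included but not
the tests, the <code>outDir</code> is <code>dist</code>, and <code>declaration: true</code> generates the <code>.d.ts</code> type declarations that are published with the package.</p>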
<p>Then the <code>index.ts</code> and the <code>index.test.ts</code> files are simple, just to demonstrate that the library works:</p>
<pre class="highlight"><code><span class="hljs-comment">// src/index.ts</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> <span class="hljs-title function_">test</span> = (<span class="hljs-params">value: <span class="hljs-built_in">string</span></span>) => {
<span class="hljs-keyword">return</span> <span class="hljs-string">"Hello "</span> + value;
}
</code></pre>
<pre class="highlight"><code><span class="hljs-comment">// src/index.test.ts</span>
<span class="hljs-keyword">import</span> test <span class="hljs-keyword">from</span> <span class="hljs-string">"node:test"</span>;
<span class="hljs-keyword">import</span> { strict <span class="hljs-keyword">as</span> assert } <span class="hljs-keyword">from</span> <span class="hljs-string">"node:assert"</span>;
<span class="hljs-keyword">import</span> {test <span class="hljs-keyword">as</span> lib} <span class="hljs-keyword">from</span> <span class="hljs-string">"./index.js"</span>;
<span class="hljs-title function_">test</span>(<span class="hljs-string">'synchronous passing test'</span>, <span class="hljs-function">(<span class="hljs-params">t</span>) =></span> {
<span class="hljs-keyword">const</span> result = <span class="hljs-title function_">lib</span>(<span class="hljs-string">"World"</span>);
assert.<span class="hljs-title function_">strictEqual</span>(result, <span class="hljs-string">"Hello World"</span>);
});
</code></pre>
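<p>Notice the <code>import ... from "./index.js"</code> line. While the file has a <code>.ts</code> extension, it is imported using <code>.js</code>: with <code>"moduleResolution": "nodenext"</code>, import paths refer to the compiled output, and TypeScript maps <code>./index.js</code> back to <code>./index.ts</code> during type checking.</p>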
<h3 id="npm-scripts" tabindex="-1">NPM scripts</h3>
<p>Next, configure the <code>scripts</code> in the <code>package.json</code>.</p>
<p>First are the <code>build</code> and <code>clean</code>:</p>
<pre class="highlight"><code><span class="hljs-punctuation">{</span>
<span class="hljs-attr">"scripts"</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
<span class="hljs-attr">"build"</span><span class="hljs-punctuation">:</span> <span class="hljs-string">"tsc --build"</span><span class="hljs-punctuation">,</span>
<span class="hljs-attr">"clean"</span><span class="hljs-punctuation">:</span> <span class="hljs-string">"tsc --build --clean"</span>
<span class="hljs-punctuation">}</span>
<span class="hljs-punctuation">}</span>
</code></pre>
<p>These call <code>tsc</code> to compile TypeScript to JavaScript:</p>
<pre class="highlight"><code>$ npm run build
> website-validator@0.0.8 build
> tsc --build
$ <span class="hljs-built_in">ls</span> dist
index.d.ts index.js index.js.map
</code></pre>
<p>Next, the <code>prepare</code> script runs the build when the package is being published. This is a special script name: <code>npm</code> calls it automatically at <a href="https://docs.npmjs.com/cli/v10/using-npm/scripts#prepare-and-prepublish">different parts of the
lifecycle</a>, including before the package is packed for <code>npm publish</code>:</p>
<pre class="highlight"><code><span class="hljs-punctuation">{</span>
<span class="hljs-attr">"scripts"</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
<span class="hljs-attr">"prepare"</span><span class="hljs-punctuation">:</span> <span class="hljs-string">"npm run clean && npm run build"</span>
<span class="hljs-punctuation">}</span>
<span class="hljs-punctuation">}</span>
</code></pre>
<h3 id="tests" tabindex="-1">Tests</h3>
<p>Next, configure automated tests. For this, I found that it's easier not to compile the test code but to use a library that compiles TS files on the fly when needed. This
is where the <code>ts-node</code> dependency comes into play.</p>
<p>Because of this, the <code>test</code> script does not need to run the <code>build</code>:</p>
<pre class="highlight"><code><span class="hljs-punctuation">{</span>
<span class="hljs-attr">"scripts"</span><span class="hljs-punctuation">:</span> <span class="hljs-punctuation">{</span>
<span class="hljs-attr">"test"</span><span class="hljs-punctuation">:</span> <span class="hljs-string">"node --test --loader ts-node/esm src/**/*.test.ts"</span>
<span class="hljs-punctuation">}</span>
<span class="hljs-punctuation">}</span>
</code></pre>
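<p>The <code>--loader ts-node/esm</code> flag hooks <code>ts-node</code> into Node's ES module loading, so <code>.ts</code> files are compiled whenever they are imported.
This makes the testing setup super easy: no compilation, just running. Recent NodeJS versions print an <code>ExperimentalWarning</code> for <code>--loader</code> that suggests <code>--import</code> with <code>register()</code> instead, as seen in the output.</p>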
<pre class="highlight"><code>$ npm <span class="hljs-built_in">test</span>
> website-validator@0.0.8 <span class="hljs-built_in">test</span>
> node --<span class="hljs-built_in">test</span> --loader ts-node/esm src/**/*.test.ts
(node:245543) ExperimentalWarning: `--experimental-loader` may be removed <span class="hljs-keyword">in</span> the future; instead use `register()`:
--import <span class="hljs-string">'data:text/javascript,import { register } from "node:module"; import { pathToFileURL } from "node:url"; register("ts-node/esm", pathToFileURL("./"));'</span>
(Use `node --trace-warnings ...` to show <span class="hljs-built_in">where</span> the warning was created)
✔ synchronous passing <span class="hljs-built_in">test</span> (1.01411ms)
ℹ tests 1
ℹ suites 0
ℹ pass 1
ℹ fail 0
ℹ cancelled 0
ℹ skipped 0
ℹ todo 0
ℹ duration_ms 2650.590767
</code></pre>
<h2 id="continuous-integration" tabindex="-1">Continuous integration</h2>
<p>Now that we have all the scripts in place for the library, it's time to set up GitHub Actions to run the build and the tests for every push.</p>
<p>Actions are configured in the <code>.github/workflows</code> directory, where each YAML file describes a workflow.</p>
<pre class="highlight"><code><span class="hljs-comment"># .github/workflows/node.js.yml</span>
<span class="hljs-attr">name:</span> <span class="hljs-string">Node.js</span> <span class="hljs-string">CI</span>
<span class="hljs-attr">on:</span>
<span class="hljs-attr">push:</span>
<span class="hljs-attr">pull_request:</span>
<span class="hljs-attr">jobs:</span>
<span class="hljs-attr">build:</span>
<span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
<span class="hljs-attr">strategy:</span>
<span class="hljs-attr">matrix:</span>
<span class="hljs-attr">node-version:</span> [<span class="hljs-number">21.</span><span class="hljs-string">x</span>]
<span class="hljs-attr">steps:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v4</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Use</span> <span class="hljs-string">Node.js</span> <span class="hljs-string">${{</span> <span class="hljs-string">matrix.node-version</span> <span class="hljs-string">}}</span>
<span class="hljs-attr">uses:</span> <span class="hljs-string">actions/setup-node@v3</span>
<span class="hljs-attr">with:</span>
<span class="hljs-attr">node-version:</span> <span class="hljs-string">${{</span> <span class="hljs-string">matrix.node-version</span> <span class="hljs-string">}}</span>
<span class="hljs-attr">cache:</span> <span class="hljs-string">'npm'</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">run:</span> <span class="hljs-string">npm</span> <span class="hljs-string">ci</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">run:</span> <span class="hljs-string">npm</span> <span class="hljs-string">run</span> <span class="hljs-string">build</span> <span class="hljs-string">--if-present</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">run:</span> <span class="hljs-string">npm</span> <span class="hljs-string">test</span>
</code></pre>
<p>Let's break down the interesting parts in this workflow!</p>
<p>The <code>on: push, pull_request</code> section defines that the job runs on every push and pull request. You can define filters here, such as running the tests only for
certain branches, but they are not needed for now.</p>
<p>The <code>build</code> job uses <code>ubuntu-latest</code> which is a good all-around base for running scripts as it has a <a href="https://github.com/actions/runner-images/blob/main/images/ubuntu/Ubuntu2204-Readme.md">lot of preinstalled
software</a>.</p>
<p>The <code>strategy/matrix</code> defines which <code>node-version</code> values to run the build with. This works like templating: the <code>${{ matrix.node-version }}</code> placeholder is
filled with each value in this array, and the build runs once for each configuration.</p>
<p>The <code>steps</code> are simple: <code>checkout</code> gets the current code, the <code>setup-node</code> installs the specific NodeJS version, then it runs <code>npm ci</code>, <code>npm run build</code>, and <code>npm test</code>.</p>
<h3 id="in-action" tabindex="-1">In action</h3>
<p>The GitHub Actions page shows that the workflow runs for every push:</p>
<img class="d-block mx-auto img-fluid img-bordered" srcset="/assets/5a21afba894929eb3fa401ff670271ef6e949750737b783dd952f5ff8076b491.png 1.25x" alt="GitHub Actions status page"/>
<p>And each change shows the steps with the logs:</p>
<img class="d-block mx-auto img-fluid img-bordered" srcset="/assets/3c238e1dce9205492266392c229fc7532da892dda0f23fa1fc6e44b541c67424.png 1.25x" alt="Steps for an Action"/>
<p>Moreover, a green checkmark shows that the actions were run successfully for a given commit:</p>
<img class="d-block mx-auto img-fluid img-bordered" srcset="/assets/ac0937f10fe71506afb54293a8884b9e06111010050c5f1fecee5d016e55a010.png 1.25x" alt="A green checkmark shows that the actions were successful for a commit"/>
<p>This makes it very easy to see if tests are failing.</p>
<h2 id="auto-deploy-to-npm" tabindex="-1">Auto-deploy to NPM</h2>
<p>Let's then implement the other half of CI/CD: automatic deployment!</p>
<p>For this, we'll configure a separate workflow:</p>
<pre class="highlight"><code><span class="hljs-comment"># .github/workflows/npm-publish.yml</span>
<span class="hljs-attr">name:</span> <span class="hljs-string">Node.js</span> <span class="hljs-string">Package</span>
<span class="hljs-attr">on:</span>
<span class="hljs-attr">push:</span>
<span class="hljs-attr">tags:</span>
<span class="hljs-bullet">-</span> <span class="hljs-string">"*"</span>
<span class="hljs-attr">permissions:</span>
<span class="hljs-attr">id-token:</span> <span class="hljs-string">write</span>
<span class="hljs-attr">jobs:</span>
<span class="hljs-attr">build:</span>
<span class="hljs-comment"># same as the other build</span>
<span class="hljs-attr">publish-npm:</span>
<span class="hljs-attr">needs:</span> <span class="hljs-string">build</span>
<span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
<span class="hljs-attr">steps:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v4</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/setup-node@v3</span>
<span class="hljs-attr">with:</span>
<span class="hljs-attr">node-version:</span> <span class="hljs-number">20</span>
<span class="hljs-attr">registry-url:</span> <span class="hljs-string">https://registry.npmjs.org/</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">run:</span> <span class="hljs-string">npm</span> <span class="hljs-string">ci</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">run:</span> <span class="hljs-string">npm</span> <span class="hljs-string">publish</span> <span class="hljs-string">--provenance</span>
<span class="hljs-attr">env:</span>
<span class="hljs-attr">NODE_AUTH_TOKEN:</span> <span class="hljs-string">${{secrets.npm_token}}</span>
</code></pre>
<p>The <code>on/push/tags: ["*"]</code> defines that the workflow runs for all top-level tags, such as <code>1.0.0</code> or <code>v5.3.2</code>, but not for <code>feature/ticket</code>
or <code>fix/bug-45</code>. This is a good default config: it does not force any versioning strategy but still allows hierarchical tag names that do not trigger a release.</p>
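<p>The reason is that in GitHub's filter patterns <code>*</code> matches zero or more characters except <code>/</code>, while <code>**</code> matches any characters. The following is a rough, hypothetical re-implementation of just these two wildcards to illustrate the effect; it is not GitHub's matcher, which also supports <code>?</code>, <code>+</code>, character ranges, and negation:</p>
<pre class="highlight"><code>// tag-filter.mjs: a simplified sketch of the "*" and "**" wildcards only
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// "**" matches anything, "*" matches anything except "/"
const patternToRegExp = (pattern) => {
  const source = pattern
    .split("**")
    .map((part) => part.split("*").map(escapeRegExp).join("[^/]*"))
    .join(".*");
  return new RegExp(`^${source}$`);
};

const matchesTag = (pattern, tag) => patternToRegExp(pattern).test(tag);

for (const tag of ["1.0.0", "v5.3.2", "feature/ticket", "fix/bug-45"]) {
  console.log(tag, matchesTag("*", tag));
}
// 1.0.0 true
// v5.3.2 true
// feature/ticket false
// fix/bug-45 false
</code></pre>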
<p>The <code>build</code> job is the same as in the other workflow, to make sure that the library can be built with all the supported NodeJS versions and the tests are
passing before anything is published.</p>
<p>The <code>publish-npm</code> job is the more interesting part: because of <code>needs: build</code> it runs only after the build succeeds, then it checks out the code, sets up the correct NodeJS version, runs <code>npm ci</code>, then publishes the package.
The <code>--provenance</code> flag adds extra metadata to the package, and that is the reason for the <code>permissions/id-token: write</code> config.</p>
<h3 id="provenance" tabindex="-1">Provenance</h3>
<p>Provenance is a modern feature of the NPM registry and its purpose is to <a href="https://github.blog/2023-04-19-introducing-npm-package-provenance/">provide a verifiable
link</a> from the published package to the source code that produced it.</p>
<p>Without it, nothing guarantees that the code you see on GitHub is the same code the maintainer had when they built and published the package. That means that even
if you go the extra mile to audit the source code of the package, the published version might still have been built from different code.</p>
<p>Provenance solves this problem: the package is published from GitHub Actions with signed metadata that points to the source commit and the workflow that built it. With it, a
maintainer can no longer quietly publish code that differs from the linked source.</p>
<p>When a version is published with provenance, it is shown on the package's page:</p>
<img class="d-block mx-auto img-fluid img-bordered" srcset="/assets/e100a31ef54897922a9608e40260c2607f588b0bc3a32762587fcd42f0f8b735.png 1.25x" alt="Provenance badge on npm"/>
<p>And also there is a green checkmark next to the version:</p>
<img class="d-block mx-auto img-fluid img-bordered" srcset="/assets/9b29d01bbac2e9ef8e5a9af7da2a98fa56c3e9ccb6b713f26c09a843d3568284.png 1.25x" alt="Green checkmark next to the version"/>
<h3 id="secrets" tabindex="-1">Secrets</h3>
<p>An important link is still missing: how does NPM know that a package can be published from that GitHub Action? This is where the access tokens come into play.</p>
<p>NPM allows creating M2M (Machine-to-Machine) tokens that grant access to publish new versions. So to give a workflow publish access, create a
granular access token:</p>
<img class="d-block mx-auto img-fluid img-bordered" srcset="/assets/061ac269e19b3f5144d81fa701e8d036f4db8c470c7eb0b853fd017821b00bbd.png 1.25x" alt="NPM access tokens"/>
<p>When adding a token, you can define which packages it has access to:</p>
<img class="d-block mx-auto img-fluid img-bordered" srcset="/assets/10616b10345ade4b2d2b428ad72a328bd2c89886ae2eba051c228d66623690cb.png 1.25x" alt="Token scopes"/>
<p>On the other end, add a repository secret to the GitHub repo:</p>
<img class="d-block mx-auto img-fluid img-bordered" srcset="/assets/16c5fefa514a5dbcf24b80b14d8056cb906407fdaf9e5115dbb91db6694c810d.png 1.25x" alt="Repository secret"/>
<p>Then the workflow can use this secret:</p>
<pre class="highlight"><code><span class="hljs-comment"># .github/workflows/npm-publish.yml</span>
<span class="hljs-attr">jobs:</span>
<span class="hljs-attr">publish-npm:</span>
<span class="hljs-attr">steps:</span>
<span class="hljs-comment"># ...</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">run:</span> <span class="hljs-string">npm</span> <span class="hljs-string">publish</span> <span class="hljs-string">--provenance</span>
<span class="hljs-attr">env:</span>
<span class="hljs-attr">NODE_AUTH_TOKEN:</span> <span class="hljs-string">${{secrets.npm_token}}</span>
</code></pre>
<h3 id="publishing-a-new-version" tabindex="-1">Publishing a new version</h3>
<p>When everything is configured, publishing a new version is simple:</p>
<pre class="highlight"><code>$ npm version patch
v0.0.9
</code></pre>
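<p><code>npm version</code> updates the version in <code>package.json</code>, commits the change, and creates a matching git tag (here <code>v0.0.9</code>). Then push the code and the new tag:</p>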
<pre class="highlight"><code>$ git push
$ git push --tags
</code></pre>
<p>This triggers the workflows:</p>
<img class="d-block mx-auto img-fluid img-bordered" srcset="/assets/1fb0741d00a218ca5f7c155bc2ea78a1a1a614d5ea36baf8f10865e602eb9060.png 1.25x" alt="Version publish runs"/>
<p>And the new version is pushed to the NPM registry:</p>
<img class="d-block mx-auto img-fluid img-bordered" srcset="/assets/2b01ae02fbf979fde51401b8c1d01495a768915f579635313d2f7c1bdfc4038e.png 1.25x" alt="NPM version history with the new version"/>
How to use CloudFront signed cookies to serve MPD or HLS videos2024-01-09T00:00:00+00:00https://advancedweb.hu/how-to-use-cloudfront-signed-cookies-to-serve-mpd-or-hls-videos<a href="https://advancedweb.hu/how-to-use-cloudfront-signed-cookies-to-serve-mpd-or-hls-videos/">(Read this article on the blog)</a><h2 id="signed-urls" tabindex="-1">Signed URLs</h2>
<p>Signed URLs are a mechanism for securely giving access to protected content. The backend generates a signature that clients can then use directly
with S3 or CloudFront to get the content. It's the primary way to offer downloads and uploads in serverless applications.</p>
<p>For example, if an image is stored at <code>images/abc.jpg</code> then a signed URL for it would be <code>images/abc.jpg?x-id=GetObject&...</code>.</p>
<p>Notice that the URL of the file changes. This is usually not a problem: when the user clicks the "download" button, there is no expectation about where the
file is downloaded from, so the backend is free to return a signed URL.</p>
<h2 id="segmented-video-formats" tabindex="-1">Segmented video formats</h2>
<p>But in some cases, the client expects the file to be available at a specific URL. One of the most common examples of this is segmented video files, such as HLS or MPD.
Here, the video stream is broken up into segments and a manifest file defines where the individual files can be found.</p>
<p>For example, an MPD manifest looks like this:</p>
<pre class="highlight"><code><Representation
id="0"
mimeType="video/mp4"
codecs="avc1.4d401f"
bandwidth="800000"
width="1280"
height="720"
sar="1:1"
>
<SegmentTemplate
timescale="15360"
initialization="init-stream$RepresentationID$.m4s"
media="chunk-stream$RepresentationID$-$Number%05d$.m4s"
startNumber="1"
>
<SegmentTimeline>
<S t="0" d="122880" />
<S d="30720" />
</SegmentTimeline>
</SegmentTemplate>
</Representation>
</code></pre>
<p>When the client reads this and plays the video, it knows how to download the segments:</p>
<ul>
<li><code>chunk-stream0-00001.m4s</code></li>
<li><code>chunk-stream0-00002.m4s</code></li>
<li><code>chunk-stream0-00003.m4s</code></li>
<li>...</li>
</ul>
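<p>The way a player derives these names can be sketched as follows. This is a minimal, hypothetical helper that handles only the <code>$RepresentationID$</code> and <code>$Number$</code> identifiers (with an optional <code>%0Nd</code> width) and takes the segment count as a parameter instead of deriving it from the <code>SegmentTimeline</code>.</p>
<pre class="highlight"><code>// Hypothetical helpers: expand a DASH SegmentTemplate pattern into file names.
// Only $RepresentationID$ and $Number$ (optionally with a %0Nd width) are
// supported; the segment count is passed in instead of read from SegmentTimeline.
function expandTemplate(template, representationId, number) {
  return template
    .replace(/\$RepresentationID\$/g, () => String(representationId))
    .replace(/\$Number(?:%0(\d+)d)?\$/g, (match, width) =>
      width === undefined
        ? String(number)
        : String(number).padStart(Number(width), "0")
    );
}

function segmentNames(template, representationId, startNumber, count) {
  // $Number$ starts at startNumber and increases by one per segment
  return Array.from({length: count}, (_, i) =>
    expandTemplate(template, representationId, startNumber + i)
  );
}
</code></pre>
<p>For the template above, <code>segmentNames("chunk-stream$RepresentationID$-$Number%05d$.m4s", "0", 1, 3)</code> returns the three file names in the list.</p>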
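<p>But this scheme does not work with signed URLs: the client derives each segment URL from the template, so every file would need its own signature, and the client cannot calculate one because that requires a secret only the backend knows.</p>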
<p>This is where signed cookies are useful.</p>
<h2 id="cloudfront-signed-cookies" tabindex="-1">CloudFront signed cookies</h2>
<p>Signed cookies are another mechanism to give controlled access to protected files. Instead of modifying a URL, the backend returns a set of cookies to the
client. Browsers attach cookies to matching requests automatically, so no change is needed on the client.</p>
<p>An example set of signed cookies:</p>
<pre class="highlight"><code><span class="hljs-punctuation">{</span>
<span class="hljs-attr">"CloudFront-Key-Pair-Id"</span><span class="hljs-punctuation">:</span> <span class="hljs-string">"KJX6ADYM9FBCS"</span><span class="hljs-punctuation">,</span>
<span class="hljs-attr">"CloudFront-Signature"</span><span class="hljs-punctuation">:</span> <span class="hljs-string">"kAqF32fiDKmOUpDPUNQ..."</span><span class="hljs-punctuation">,</span>
<span class="hljs-attr">"CloudFront-Policy"</span><span class="hljs-punctuation">:</span> <span class="hljs-string">"eyJTdGF0ZW1lbnQiOlt7IlJlc29..."</span>
<span class="hljs-punctuation">}</span>
</code></pre>
<p>With signed cookies, the policy can contain wildcards, which in practice means it's possible to sign for all files under a directory. This gives an
easy-to-use structure: each video is stored in its own folder, and the signature grants access to the specific one the client wants to play.</p>
<p>For example, the S3 bucket can contain two videos:</p>
<pre class="highlight"><code>bunny/bunny.mpd
bunny/chunk-stream0-00001.m4s
bunny/chunk-stream0-00002.m4s
bunny/chunk-stream1-00001.m4s
bunny/chunk-stream1-00002.m4s
bunny/init-stream0.m4s
bunny/init-stream1.m4s
sintel/sintel.mpd
sintel/chunk-stream0-00001.m4s
sintel/chunk-stream0-00002.m4s
sintel/chunk-stream1-00001.m4s
sintel/chunk-stream1-00002.m4s
sintel/init-stream0.m4s
sintel/init-stream1.m4s
</code></pre>
<p>This S3 bucket is then used as an origin for the CloudFront distribution and mapped to a path, let's say <code>/videos/*</code>.</p>
<p>During signing, the backend generates the cookies for a specific folder under this path:</p>
<pre class="highlight"><code><span class="hljs-keyword">import</span> {getSignedCookies} <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/cloudfront-signer"</span>;
<span class="hljs-keyword">return</span> <span class="hljs-title function_">getSignedCookies</span>({
<span class="hljs-attr">keyPairId</span>: process.<span class="hljs-property">env</span>.<span class="hljs-property">KEYPAIR_ID</span>,
<span class="hljs-attr">privateKey</span>: (<span class="hljs-keyword">await</span> <span class="hljs-title function_">getCfPrivateKey</span>()).<span class="hljs-property">Parameter</span>.<span class="hljs-property">Value</span>,
<span class="hljs-attr">policy</span>: <span class="hljs-title class_">JSON</span>.<span class="hljs-title function_">stringify</span>({
<span class="hljs-title class_">Statement</span>: [
{
<span class="hljs-title class_">Resource</span>:
<span class="hljs-string">`https://<span class="hljs-subst">${process.env.DISTRIBUTION_DOMAIN}</span>/videos/<span class="hljs-subst">${video}</span>/*`</span>,
<span class="hljs-title class_">Condition</span>: {
<span class="hljs-title class_">DateLessThan</span>: {
<span class="hljs-string">"AWS:EpochTime"</span>:
<span class="hljs-title class_">Math</span>.<span class="hljs-title function_">round</span>(
<span class="hljs-keyword">new</span> <span class="hljs-title class_">Date</span>(
<span class="hljs-keyword">new</span> <span class="hljs-title class_">Date</span>().<span class="hljs-title function_">getTime</span>() + <span class="hljs-number">60</span> * <span class="hljs-number">60</span> * <span class="hljs-number">1000</span>
).<span class="hljs-title function_">getTime</span>() / <span class="hljs-number">1000</span>
),
}
}
}
]
})
});
</code></pre>
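<p>The <code>Resource</code> in the policy above ends with <code>*</code>, which is what lets one set of cookies cover every file of a video. CloudFront documents <code>*</code> as matching zero or more characters and <code>?</code> as matching exactly one. The following hypothetical matcher only illustrates that rule; it is a sketch, not CloudFront's implementation, and <code>example.cloudfront.net</code> is a placeholder domain.</p>
<pre class="highlight"><code>// Hypothetical sketch of wildcard matching for a policy Resource:
// "*" matches zero or more characters, "?" matches exactly one character.
// This only illustrates the rule; it is not CloudFront's implementation.
function resourceMatches(resource, url) {
  const pattern = Array.from(resource, (ch) => {
    if (ch === "*") return ".*";
    if (ch === "?") return ".";
    // escape regex metacharacters so that characters like "." match literally
    return ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  }).join("");
  return new RegExp(`^${pattern}$`).test(url);
}
</code></pre>
<p>With a Resource of <code>https://example.cloudfront.net/videos/bunny/*</code>, every file under <code>bunny/</code> matches while <code>sintel/sintel.mpd</code> does not, so cookies signed for one video can't be used to download another.</p>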
<p>The only thing left is to set the cookies in the HTTP response:</p>
<pre class="highlight"><code><span class="hljs-keyword">return</span> {
<span class="hljs-attr">statusCode</span>: <span class="hljs-number">307</span>,
<span class="hljs-attr">cookies</span>: <span class="hljs-title class_">Object</span>.<span class="hljs-title function_">entries</span>(signedCookies).<span class="hljs-title function_">map</span>(<span class="hljs-function">(<span class="hljs-params">[name, value]</span>) =></span> {
<span class="hljs-keyword">return</span> [
<span class="hljs-string">`<span class="hljs-subst">${name}</span>=<span class="hljs-subst">${value}</span>`</span>,
<span class="hljs-string">"HttpOnly"</span>,
<span class="hljs-string">`Path=/videos/<span class="hljs-subst">${videoName}</span>`</span>,
<span class="hljs-string">"SameSite=Strict"</span>,
<span class="hljs-string">"Secure"</span>,
].<span class="hljs-title function_">join</span>(<span class="hljs-string">"; "</span>);
}),
<span class="hljs-attr">headers</span>: {
<span class="hljs-title class_">Location</span>: <span class="hljs-string">`/videos/<span class="hljs-subst">${videoName}</span>/<span class="hljs-subst">${videoName}</span>.mpd`</span>,
},
};
</code></pre>
<p>Notice the <code>Path</code> here: since the cookie names are fixed, setting them on <code>/</code> would overwrite the cookies set for an earlier video. That could cause problems when
the client plays multiple videos in parallel, for example in several tabs. The best practice is to use the most specific path for the cookies.</p>
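<p>Why the path matters follows from the cookie rules in RFC 6265 that browsers implement: a cookie is identified by its name, domain, and path, and a request only carries cookies whose path matches the request path. The following is a minimal sketch under those rules; the <code>CookieJar</code> class is hypothetical and ignores domains, expiry, and every attribute other than <code>Path</code>.</p>
<pre class="highlight"><code>// Hypothetical minimal cookie jar following two RFC 6265 rules:
// 1. a cookie is identified by name and path (domain is ignored here), so
//    setting the same name on the same path replaces the earlier value;
// 2. a request only carries cookies whose path "path-matches" the request path.
function pathMatches(cookiePath, requestPath) {
  if (cookiePath === requestPath) return true;
  if (!requestPath.startsWith(cookiePath)) return false;
  return cookiePath.endsWith("/") || requestPath[cookiePath.length] === "/";
}

class CookieJar {
  constructor() {
    this.cookies = new Map(); // key: "name;path"
  }

  set(name, value, path) {
    this.cookies.set(`${name};${path}`, {name, value, path});
  }

  // values of the cookie `name` that would be sent with a request to requestPath
  valuesFor(name, requestPath) {
    return [...this.cookies.values()]
      .filter((c) => c.name === name && pathMatches(c.path, requestPath))
      .map((c) => c.value);
  }
}
</code></pre>
<p>If the cookies for both videos were set on <code>/</code> instead, the second <code>set</code> call would replace the first, and requests for the bunny segments would carry the policy signed for sintel.</p>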
How to remove a resource before creating it with the CDK2023-12-26T00:00:00+00:00https://advancedweb.hu/how-to-remove-a-resource-before-creating-it-with-the-cdk<a href="https://advancedweb.hu/how-to-remove-a-resource-before-creating-it-with-the-cdk/">(Read this article on the blog)</a><h2 id="resource-creation-failures" tabindex="-1">Resource creation failures</h2>
<p>In the <a href="/the-problems-with-implicit-aws-resources/">previous article</a> we looked into a case where a resource was created implicitly, preventing the CDK from
creating it. In that case, that was a Log Group that AppSync created, but the same problem pops up in several places: an IoT Core domain configuration can not
be created when one already exists for the domain, for example. Whenever a resource has to be unique, CloudFormation's behavior, the service that the CDK uses
under the hood, will lead to undeployable stacks.</p>
<p>In this article, we'll look into the Log Group example we introduced in the previous article: the resource is already created by AppSync and adding it to the
CDK stack fails the deployment.</p>
<h2 id="example-baseline" tabindex="-1">Example baseline</h2>
<p>The stack defines an AppSync API:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> api = <span class="hljs-keyword">new</span> aws_appsync.<span class="hljs-title class_">GraphqlApi</span>(<span class="hljs-variable language_">this</span>, <span class="hljs-string">"Api"</span>, {
<span class="hljs-attr">name</span>: <span class="hljs-string">"test-api"</span>,
<span class="hljs-attr">definition</span>: {
<span class="hljs-attr">schema</span>: aws_appsync.<span class="hljs-property">SchemaFile</span>.<span class="hljs-title function_">fromAsset</span>(path.<span class="hljs-title function_">join</span>(__dirname, <span class="hljs-string">"schema.graphql"</span>)),
},
<span class="hljs-attr">authorizationConfig</span>: {
<span class="hljs-attr">defaultAuthorization</span>: {
<span class="hljs-attr">authorizationType</span>: aws_appsync.<span class="hljs-property">AuthorizationType</span>.<span class="hljs-property">IAM</span>,
}
},
<span class="hljs-attr">logConfig</span>: {
<span class="hljs-attr">fieldLogLevel</span>: <span class="hljs-string">"ALL"</span>,
},
});
</code></pre>
<p>This automatically configures an IAM role that allows AppSync to create a Log Group and then put its messages there. As a result, the first request to the API
creates the Log Group in the AWS account.</p>
<p>The name of the Log Group is fixed for an AppSync API: it is always <code>/aws/appsync/apis/<api id></code>. Because of this, you can think of it as a resource that
has to be unique: the CDK can't create a new Log Group for this API without first deleting the existing one.</p>
<p>This is demonstrated by an error when the Log Group resource is added to the stack:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> logs = <span class="hljs-keyword">new</span> aws_logs.<span class="hljs-title class_">LogGroup</span>(<span class="hljs-variable language_">this</span>, <span class="hljs-string">"AppSyncLogGroup"</span>, {
<span class="hljs-attr">logGroupName</span>: <span class="hljs-string">`/aws/appsync/apis/<span class="hljs-subst">${api.apiId}</span>`</span>,
<span class="hljs-attr">retention</span>: aws_logs.<span class="hljs-property">RetentionDays</span>.<span class="hljs-property">TWO_WEEKS</span>,
<span class="hljs-attr">removalPolicy</span>: <span class="hljs-title class_">RemovalPolicy</span>.<span class="hljs-property">DESTROY</span>,
});
</code></pre>
<p>Deployment fails:</p>
<pre class="highlight"><code>DeleterCustomResourceStack: deploying... [1/1]
DeleterCustomResourceStack: creating CloudFormation changeset...
[··························································] (0/4)
10:29:21 AM | CREATE_FAILED | AWS::Logs::LogGroup | AppSyncLogGroup
Resource handler returned message: "Resource of type 'AWS::Logs::LogGroup' with identifier '{"/properties/LogGroupName":"/aws/appsync/apis/veope6joxnfodacgbgusg7ti7y"}' alr
eady exists." (RequestToken: 0a680121-5b94-d3e4-71b3-08e2f04ff836, HandlerErrorCode: AlreadyExists)
10:29:21 AM | UPDATE_ROLLBACK_IN_P | AWS::CloudFormation::Stack | DeleterCustomResourceStack
The following resource(s) failed to create: [AppSyncLogGroup25FD6293].
10:29:21 AM | UPDATE_ROLLBACK_IN_P | AWS::CloudFormation::Stack | DeleterCustomResourceStack
The following resource(s) failed to create: [AppSyncLogGroup25FD6293].
</code></pre>
<p>An obvious solution here is to go to the AWS Console and delete the resource before deploying the stack. After all, missing some logs is usually not a big
problem. But this approach has two drawbacks.</p>
<p>First, in the case of logging, AppSync will recreate the Log Group whenever a new message arrives. This can be overcome by removing its
<code>logs:CreateLogGroup</code> permission.</p>
<p>Second, it does not work in other situations. For example, if a custom domain was configured outside the CDK, removing it would make the API unavailable.
Because of this, the removal can't be done in advance; it has to happen as part of the deployment. And a manual step defeats the point of IaC.</p>
<h2 id="deleter-custom-resource" tabindex="-1">Deleter custom resource</h2>
<p>CloudFormation supports a mechanism to fill in the gaps with custom code: custom resources. A custom resource is backed by a Lambda function that already exists or is
deployed by the same stack, and CloudFormation calls it during the deployment. This offers enormous flexibility: since the Lambda code can be anything, a custom
resource can potentially touch any part of the AWS account and even systems outside it.</p>
<p>The idea is simple: deploy a custom resource that deletes the Log Group when the custom resource itself is created. By making the Log Group resource depend on it,
the CDK deployment automatically clears the existing group before creating it again.</p>
<p>While the two steps happen in the scope of a single deployment, there is a race condition here: the AppSync API can recreate the Log Group on its own between them.
Because of this, it's useful to configure AppSync without the <code>logs:CreateLogGroup</code> permission:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> logsRole = <span class="hljs-keyword">new</span> aws_iam.<span class="hljs-title class_">Role</span>(<span class="hljs-variable language_">this</span>, <span class="hljs-string">"LogsRole"</span>, {
<span class="hljs-attr">assumedBy</span>: <span class="hljs-keyword">new</span> aws_iam.<span class="hljs-title class_">ServicePrincipal</span>(<span class="hljs-string">"appsync.amazonaws.com"</span>),
});
logsRole.<span class="hljs-title function_">addToPolicy</span>(<span class="hljs-keyword">new</span> aws_iam.<span class="hljs-title class_">PolicyStatement</span>({
<span class="hljs-attr">effect</span>: aws_iam.<span class="hljs-property">Effect</span>.<span class="hljs-property">ALLOW</span>,
<span class="hljs-attr">resources</span>: [<span class="hljs-string">"arn:aws:logs:*:*:*"</span>],
<span class="hljs-attr">actions</span>: [
<span class="hljs-comment">// no "logs:CreateLogGroup"</span>
<span class="hljs-string">"logs:CreateLogStream"</span>,
<span class="hljs-string">"logs:PutLogEvents"</span>,
],
}));
<span class="hljs-keyword">const</span> api = <span class="hljs-keyword">new</span> aws_appsync.<span class="hljs-title class_">GraphqlApi</span>(<span class="hljs-variable language_">this</span>, <span class="hljs-string">"Api"</span>, {
<span class="hljs-comment">// ...</span>
<span class="hljs-attr">logConfig</span>: {
<span class="hljs-attr">role</span>: logsRole,
<span class="hljs-attr">fieldLogLevel</span>: <span class="hljs-string">"ALL"</span>,
},
});
</code></pre>
<p>Then the custom resource is a simple call to the AWS SDK:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> logGroupRemover = <span class="hljs-keyword">new</span> custom_resources.<span class="hljs-title class_">AwsCustomResource</span>(<span class="hljs-variable language_">this</span>, <span class="hljs-string">'AssociateVPCWithHostedZone'</span>, {
<span class="hljs-attr">onCreate</span>: {
<span class="hljs-attr">service</span>: <span class="hljs-string">"@aws-sdk/client-cloudwatch-logs"</span>,
<span class="hljs-attr">action</span>: <span class="hljs-string">"DeleteLogGroupCommand"</span>,
<span class="hljs-attr">parameters</span>: {
<span class="hljs-attr">logGroupName</span>: <span class="hljs-string">`/aws/appsync/apis/<span class="hljs-subst">${api.apiId}</span>`</span>,
},
<span class="hljs-attr">physicalResourceId</span>: custom_resources.<span class="hljs-property">PhysicalResourceId</span>.<span class="hljs-title function_">of</span>(<span class="hljs-string">`/aws/appsync/apis/<span class="hljs-subst">${api.apiId}</span>`</span>),
<span class="hljs-attr">ignoreErrorCodesMatching</span>: <span class="hljs-string">"ResourceNotFoundException"</span>,
},
<span class="hljs-attr">policy</span>: custom_resources.<span class="hljs-property">AwsCustomResourcePolicy</span>.<span class="hljs-title function_">fromSdkCalls</span>({
<span class="hljs-attr">resources</span>: custom_resources.<span class="hljs-property">AwsCustomResourcePolicy</span>.<span class="hljs-property">ANY_RESOURCE</span>,
}),
});
</code></pre>
<p>The above code takes advantage of the CDK's wrapper construct that makes it simpler to write custom resources without all the boilerplate. But nothing prevents
you from writing a full Lambda function that performs a more complex cleanup procedure. For example, to clean up an AppSync custom domain, you'll need a series of
operations: delete the domain configuration, delete the ACM certificate, then remove the CNAME record from the domain.</p>
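<p>A minimal sketch of such a Lambda function, assuming the CDK <code>custom_resources.Provider</code> framework, where the <code>onEvent</code> handler receives the CloudFormation event and returns an object with a <code>PhysicalResourceId</code>. The cleanup steps are passed in as hypothetical async functions standing in for the real SDK calls, and a not-found error is treated as success, similar to <code>ignoreErrorCodesMatching</code> in the construct above.</p>
<pre class="highlight"><code>// Hypothetical onEvent handler for a custom resource that runs cleanup steps
// in order when the resource is created. The steps are injected async
// functions (e.g. delete the domain configuration, delete the certificate,
// remove the DNS record). Returning {PhysicalResourceId} assumes the CDK
// custom_resources.Provider framework.
const isNotFound = (err) => /NotFound/.test(err?.name ?? "");

function makeOnEvent(steps, physicalResourceId) {
  return async (event) => {
    if (event.RequestType === "Create") {
      for (const step of steps) {
        try {
          await step();
        } catch (err) {
          // a resource that is already gone is fine: the goal is that it does not exist
          if (!isNotFound(err)) throw err;
        }
      }
    }
    // Update and Delete: nothing to clean up in this sketch; keep the existing id
    return {PhysicalResourceId: event.PhysicalResourceId ?? physicalResourceId};
  };
}
</code></pre>
<p>Running the steps one after the other keeps the order the cleanup needs, for example removing the domain configuration before deleting the certificate it uses.</p>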
<p>To make sure that the custom resource is deployed before the Log Group, add a dependency:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> logs = <span class="hljs-keyword">new</span> aws_logs.<span class="hljs-title class_">LogGroup</span>(<span class="hljs-variable language_">this</span>, <span class="hljs-string">"AppSyncLogGroup"</span>, {
<span class="hljs-attr">logGroupName</span>: <span class="hljs-string">`/aws/appsync/apis/<span class="hljs-subst">${api.apiId}</span>`</span>,
<span class="hljs-attr">retention</span>: aws_logs.<span class="hljs-property">RetentionDays</span>.<span class="hljs-property">TWO_WEEKS</span>,
<span class="hljs-attr">removalPolicy</span>: <span class="hljs-title class_">RemovalPolicy</span>.<span class="hljs-property">DESTROY</span>,
});
logs.<span class="hljs-property">node</span>.<span class="hljs-title function_">addDependency</span>(logGroupRemover);
</code></pre>
<p>With this, the Log Group can be moved to be managed by the CDK.</p>
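<p>To see why the dependency matters: CloudFormation creates a resource only after the resources it depends on, and it creates independent resources in parallel. The following hypothetical sketch (not CloudFormation's implementation) computes such an order with Kahn's topological sort, grouping resources into rounds that could run in parallel. The resource names are illustrative: the remover depends on the API because it references <code>api.apiId</code>, and the Log Group depends on both.</p>
<pre class="highlight"><code>// Hypothetical illustration, not CloudFormation's code: group resources into
// rounds so that every resource comes after all of its dependencies
// (Kahn's topological sort). Resources in the same round have no ordering
// between them and could be created in parallel.
function creationRounds(dependsOn) {
  const remaining = new Map(
    Object.entries(dependsOn).map(([name, deps]) => [name, new Set(deps)])
  );
  const rounds = [];
  while (remaining.size !== 0) {
    const ready = [...remaining.keys()].filter((name) => remaining.get(name).size === 0);
    if (ready.length === 0) throw new Error("circular or unknown dependency");
    for (const name of ready) {
      remaining.delete(name);
      for (const deps of remaining.values()) deps.delete(name);
    }
    rounds.push(ready);
  }
  return rounds;
}
</code></pre>
<p>With the dependency, <code>creationRounds({Api: [], Remover: ["Api"], LogGroup: ["Api", "Remover"]})</code> puts each resource in its own round. Without the <code>addDependency</code> call, the remover and the Log Group end up in the same round, so nothing guarantees that the old group is deleted before the new one is created.</p>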
The problems with implicit AWS resources2023-12-12T00:00:00+00:00https://advancedweb.hu/the-problems-with-implicit-aws-resources<a href="https://advancedweb.hu/the-problems-with-implicit-aws-resources/">(Read this article on the blog)</a><h2 id="logging-for-services" tabindex="-1">Logging for services</h2>
<p>Some resources in AWS are helpfully created when needed. The prime example of this is CloudWatch Log Groups: when a service, such as Lambda or AppSync,
first wants to send a log message, it creates the Log Group. This is possible when the service's execution role has the <code>logs:CreateLogGroup</code>
permission, which the AWS-managed policies contain for
<a href="https://docs.aws.amazon.com/appsync/latest/devguide/security_iam_policy_list.html#security-iam-awsmanpol-AWSAppSyncPushToCloudWatchLogs">both</a>
<a href="https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AWSLambdaBasicExecutionRole.html">services</a>.</p>
<p>This is convenient to start with: add the permission and logging works automatically. Moreover, clicking the button on the Console brings you to the correct
group where you can see all the log messages for the function or API.</p>
<h2 id="unmanaged-resources" tabindex="-1">Unmanaged resources</h2>
<p>One downside of this behavior is the defaults: the Log Group is created with no message expiration, meaning all log messages are kept forever. This is a safe
starting point, as you'll have the logs when you eventually need to investigate something. But usually not all log messages are needed forever, and every byte
stored incurs a cost. Moreover, these Log Groups are not removed when the function is, so messages are stored even after the resource that produced them
no longer exists.</p>
<div class="internal_link_box my-5 p-3 pt-3 position-relative">
<div class="box-title text-monospace position-absolute py-1 px-3">
Related
</div>
<p><div class="row">
<div class="col-md-5 box-image mb-3">
<a href="/how-to-clean-up-lambda-logs/">
<img class="w-100 h-100 img-bordered" srcset="/assets/f3f650daa020fe8e30e3b60be73e1a324832c803f4214c7416a63edf4d7b9ff6.jpg 1x"/>
</a>
</div>
<div class="col-md-7">
<div class="h4 box-element-title mt-0 mb-0"><a href="/how-to-clean-up-lambda-logs/">How to clean up Lambda logs</a></div>
<div class="box-element-excerpt mt-2">
Lambda keeps its logs forever by default. Learn how to reduce the clutter
</div>
</div>
</div></p>
</div>
<p>While the lack of an expiration time only affects costs, this behavior causes functional problems as well. Since the Log Group is created when the service first
puts a log message, there is a period when it does not exist. Adding a metric filter to it, for example, fails during that time.</p>
<p>And this can prevent a stack from deploying in some cases. In this article, we'll look into a case where a simple change in a CDK-managed AppSync API breaks new
deployments.</p>
<p>Fortunately, Lambda recently got a feature that lets you optionally configure the Log Group the function logs to (see the
<a href="https://aws.amazon.com/blogs/compute/introducing-advanced-logging-controls-for-aws-lambda-functions/">blog</a> and the
<a href="https://docs.aws.amazon.com/lambda/latest/dg/monitoring-cloudwatchlogs.html#monitoring-cloudwatchlogs-loggroups">docs</a>). For Lambda, you can therefore manage the Log
Group with the CDK or Terraform, which makes sure that it is created or updated with the correct configuration.</p>
<h2 id="managing-an-appsync-api" tabindex="-1">Managing an AppSync API</h2>
<p>A simple API resource managed by the CDK:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> api = <span class="hljs-keyword">new</span> aws_appsync.<span class="hljs-title class_">GraphqlApi</span>(<span class="hljs-variable language_">this</span>, <span class="hljs-string">"Api"</span>, {
<span class="hljs-attr">name</span>: <span class="hljs-string">"test-api"</span>,
<span class="hljs-attr">definition</span>: {
<span class="hljs-attr">schema</span>: aws_appsync.<span class="hljs-property">SchemaFile</span>.<span class="hljs-title function_">fromAsset</span>(path.<span class="hljs-title function_">join</span>(__dirname, <span class="hljs-string">"schema.graphql"</span>)),
},
<span class="hljs-attr">authorizationConfig</span>: {
<span class="hljs-attr">defaultAuthorization</span>: {
<span class="hljs-attr">authorizationType</span>: aws_appsync.<span class="hljs-property">AuthorizationType</span>.<span class="hljs-property">IAM</span>,
}
},
<span class="hljs-attr">logConfig</span>: {
<span class="hljs-attr">fieldLogLevel</span>: <span class="hljs-string">"ALL"</span>,
},
});
</code></pre>
<p>When deployed, it creates the API:</p>
<pre class="highlight"><code>$ aws appsync list-graphql-apis | more
{
"graphqlApis": [
{
"name": "test-api",
"apiId": "mk4geyw2kna4pkhph7zcwlqvv4",
"authenticationType": "AWS_IAM",
"logConfig": {
"fieldLogLevel": "ALL",
"cloudWatchLogsRoleArn": "arn:aws:iam::278868411450:role/DeleterCustomResourceStack-ApiApiLogsRole90293F72-UYoU4ARI8uF1",
"excludeVerboseContent": false
},
...
}
]
}
</code></pre>
<p>After the first request, a Log Group is created:</p>
<pre class="highlight"><code>$ aws logs describe-log-groups
{
"logGroups": [
{
"logGroupName": "/aws/appsync/apis/mk4geyw2kna4pkhph7zcwlqvv4",
"creationTime": 1701196713911,
"metricFilterCount": 0,
"arn": "...",
"storedBytes": 0
}
]
}
</code></pre>
<h3 id="adding-a-dependent-resource" tabindex="-1">Adding a dependent resource</h3>
<p>All good, let's add a metric filter to that:</p>
<pre class="highlight"><code>api.<span class="hljs-property">logGroup</span>.<span class="hljs-title function_">addMetricFilter</span>(<span class="hljs-string">"metric1"</span>, {
<span class="hljs-attr">filterPattern</span>: {
<span class="hljs-attr">logPatternString</span>: <span class="hljs-string">"ERROR"</span>,
},
<span class="hljs-attr">metricName</span>: <span class="hljs-string">"test"</span>,
<span class="hljs-attr">metricNamespace</span>: <span class="hljs-string">"test"</span>,
})
</code></pre>
<p>Deployment is successful, as expected:</p>
<pre class="highlight"><code>$ aws logs describe-log-groups
{
"logGroups": [
{
"logGroupName": "/aws/appsync/apis/mk4geyw2kna4pkhph7zcwlqvv4",
"creationTime": 1701196713911,
"metricFilterCount": 1,
"arn": "...",
"storedBytes": 0
}
]
}
</code></pre>
<p>But this simple change broke all new deployments. Imagine a new developer is joining the team and you want to set up a lab environment. In the new account, you
try to deploy the stack but get an error:</p>
<pre class="highlight"><code>$ npm run cdk deploy
...
DeleterCustomResourceStack: deploying... [1/1]
DeleterCustomResourceStack: creating CloudFormation changeset...
[█████████████████████████████·····························] (3/6)
7:44:15 PM | CREATE_FAILED | AWS::Logs::MetricFilter | Api/LogGroup/metric1
Resource handler returned message: "The specified log group does not exist. (Service: CloudWatchLogs, Status Code: 400, Request ID: 98ef7913-7388-4ab2-8fab-3b10211861ce)" (
RequestToken: 6bdcba86-1d96-983c-95a3-3e3aabe973f4, HandlerErrorCode: NotFound)
7:44:16 PM | ROLLBACK_IN_PROGRESS | AWS::CloudFormation::Stack | DeleterCustomResourceStack
The following resource(s) failed to create: [ApiLogGroupmetric1913D2056, ApiSchema510EECD7]. Rollback requested by user.
7:44:16 PM | ROLLBACK_IN_PROGRESS | AWS::CloudFormation::Stack | DeleterCustomResourceStack
The following resource(s) failed to create: [ApiLogGroupmetric1913D2056, ApiSchema510EECD7]. Rollback requested by user.
</code></pre>
<p>This is not unexpected: the Log Group does not exist because no requests were sent to the API. This is not a problem with an existing API because the
Log Group was already created. But when the API is created in the same deployment, the Log Group is still missing and the metric filter fails.</p>
<h3 id="managed-log-group" tabindex="-1">Managed Log Group</h3>
<p>The solution is to manage the Log Group explicitly with the CDK instead of letting AppSync create it implicitly. That means adding a resource for it:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> logs = <span class="hljs-keyword">new</span> aws_logs.<span class="hljs-title class_">LogGroup</span>(<span class="hljs-variable language_">this</span>, <span class="hljs-string">"AppSyncLogGroup"</span>, {
<span class="hljs-attr">logGroupName</span>: <span class="hljs-string">`/aws/appsync/apis/<span class="hljs-subst">${api.apiId}</span>`</span>,
<span class="hljs-attr">retention</span>: aws_logs.<span class="hljs-property">RetentionDays</span>.<span class="hljs-property">TWO_WEEKS</span>,
<span class="hljs-attr">removalPolicy</span>: <span class="hljs-title class_">RemovalPolicy</span>.<span class="hljs-property">DESTROY</span>,
});
</code></pre>
<p>To be extra safe, you can also remove the <code>logs:CreateLogGroup</code> permission from AppSync, so that even if a request reaches the API during the
deployment, it won't create the Log Group:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> logsRole = <span class="hljs-keyword">new</span> aws_iam.<span class="hljs-title class_">Role</span>(<span class="hljs-variable language_">this</span>, <span class="hljs-string">"LogsRole"</span>, {
<span class="hljs-attr">assumedBy</span>: <span class="hljs-keyword">new</span> aws_iam.<span class="hljs-title class_">ServicePrincipal</span>(<span class="hljs-string">"appsync.amazonaws.com"</span>),
});
logsRole.<span class="hljs-title function_">addToPolicy</span>(<span class="hljs-keyword">new</span> aws_iam.<span class="hljs-title class_">PolicyStatement</span>({
<span class="hljs-attr">effect</span>: aws_iam.<span class="hljs-property">Effect</span>.<span class="hljs-property">ALLOW</span>,
<span class="hljs-attr">resources</span>: [<span class="hljs-string">"arn:aws:logs:*:*:*"</span>],
<span class="hljs-attr">actions</span>: [
<span class="hljs-string">"logs:CreateLogStream"</span>,
<span class="hljs-string">"logs:PutLogEvents"</span>,
],
}));
<span class="hljs-keyword">const</span> api = <span class="hljs-keyword">new</span> aws_appsync.<span class="hljs-title class_">GraphqlApi</span>(<span class="hljs-variable language_">this</span>, <span class="hljs-string">"Api"</span>, {
<span class="hljs-comment">// ...</span>
<span class="hljs-attr">logConfig</span>: {
<span class="hljs-attr">role</span>: logsRole,
<span class="hljs-attr">fieldLogLevel</span>: <span class="hljs-string">"ALL"</span>,
},
});
</code></pre>
<p>This can then be deployed and updated as well.</p>
Stable S3 signed URLs2023-11-28T00:00:00+00:00https://advancedweb.hu/stable-s3-signed-urls<a href="https://advancedweb.hu/stable-s3-signed-urls/">(Read this article on the blog)</a><h2 id="content-distribution" tabindex="-1">Content distribution</h2>
<p>URL signing is a way to provide controlled access to protected content. The backend runs custom code that decides whether a user can download a file. If
the decision is positive, it signs a URL using a secret that only it knows and returns the URL to the user. The backend is then no longer involved in the
download: S3 checks the signature and transfers the file to the client.</p>
<p>For example, an ecommerce site that sells digital products wants to distribute files only to users who bought those products. In that case, the backend is
responsible for deciding whether to allow the download, but then the file itself is served by S3.</p>
<p class="plantuml"><img srcset="/assets/92e20404bb02350cc835e65e3882a4495ffc7f94308dfb78a4fcf19990c2f06b.png 1.25x"/></p>
<p>Signed URLs are a cornerstone of how serverless apps handle files. Serverless backends follow a "quick and small" response model, so returning a file of arbitrary
size does not work: a Lambda function, for example, has a limit on its response size, which would also limit the size of the files it can return. Using
signed URLs allows a serverless app to handle files of any size.</p>
<h2 id="signed-urls" tabindex="-1">Signed URLs</h2>
<p>A signed URL includes several query parameters:</p>
<pre class="highlight"><code>https://terraform-20230713081606344000000003.s3.eu-central-1.amazonaws.com/test.jpg
?X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD
&X-Amz-Credential=AKIAUB3O2IQ5EZ3CFTGG%2F20230713%2Feu-central-1%2Fs3%2Faws4_request
&X-Amz-Date=20230713T094500Z
&X-Amz-Expires=900
&X-Amz-Signature=ee4c48e11d1250c8e02b1615f11f217c6454c7a2701c4b82abb620ef271163a3
&X-Amz-SignedHeaders=host
&x-id=GetObject
</code></pre>
<p>The most important one is <code>X-Amz-Signature</code>. Calculating it requires a secret that only the backend knows, and S3 checks it. Because of this, a
signed URL cannot be forged.</p>
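<p>The validity window is also encoded in the query string: <code>X-Amz-Date</code> is the signing time in UTC (ISO 8601 basic format) and <code>X-Amz-Expires</code> is the number of seconds the URL stays valid. A minimal sketch of reading the expiration from a signed URL, using only the built-in <code>URL</code> class; <code>presignedUrlExpiry</code> is a hypothetical helper, and it does not verify the signature.</p>
<pre class="highlight"><code>// Hypothetical helper: reads the signing time and validity window from a
// SigV4 presigned URL and returns the expiration as a Date.
// It only parses the query string; it does not check the signature.
function presignedUrlExpiry(signedUrl) {
  const params = new URL(signedUrl).searchParams;
  const amzDate = params.get("X-Amz-Date"); // e.g. 20230713T094500Z
  const expiresIn = Number(params.get("X-Amz-Expires")); // seconds
  const signedAt = Date.UTC(
    Number(amzDate.slice(0, 4)),
    Number(amzDate.slice(4, 6)) - 1, // months are 0-based in Date.UTC
    Number(amzDate.slice(6, 8)),
    Number(amzDate.slice(9, 11)),
    Number(amzDate.slice(11, 13)),
    Number(amzDate.slice(13, 15)),
  );
  return new Date(signedAt + expiresIn * 1000);
}
</code></pre>
<p>For the URL above, <code>X-Amz-Date=20230713T094500Z</code> and <code>X-Amz-Expires=900</code> give an expiration of <code>2023-07-13T10:00:00Z</code>.</p>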
<p>How to sign a URL is <a href="https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html">well documented</a>, but in practice it's mostly calling
a function in the SDK:</p>
<pre class="highlight"><code><span class="hljs-keyword">import</span> {getSignedUrl} <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/s3-request-presigner"</span>;
<span class="hljs-keyword">import</span> {S3Client, <span class="hljs-title class_">GetObjectCommand</span>} <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/client-s3"</span>;
<span class="hljs-keyword">return</span> <span class="hljs-title function_">getSignedUrl</span>(<span class="hljs-keyword">new</span> <span class="hljs-title function_">S3Client</span>(), <span class="hljs-keyword">new</span> <span class="hljs-title class_">GetObjectCommand</span>({
<span class="hljs-title class_">Bucket</span>,
<span class="hljs-title class_">Key</span>,
}));
</code></pre>
<h2 id="caching" tabindex="-1">Caching</h2>
<p>By default, signed URLs are practically unique: if the backend signs two URLs for the same file, they will still be different.</p>
<p>Compare these two signed URLs:</p>
<pre class="highlight"><code>https://terraform-20230713081606344000000003.s3.eu-central-1.amazonaws.com/test.jpg
?X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD
&X-Amz-Credential=ASIAUB3O2IQ5NILQOP5T%2F20230713%2Feu-central-1%2Fs3%2Faws4_request
&X-Amz-Date=20230713T094506Z
&X-Amz-Expires=900
&X-Amz-Security-Token=IQoJb3JpZ2...
&X-Amz-Signature=0c127dbcf05090bdb74f78e2e8bbdf3e7edba7cdd8fac4f3d7c125a406ca4df8
&X-Amz-SignedHeaders=host
&x-id=GetObject
</code></pre>
<pre class="highlight"><code>https://terraform-20230713081606344000000003.s3.eu-central-1.amazonaws.com/test.jpg
?X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD
&X-Amz-Credential=ASIAUB3O2IQ5DYKXZC7R%2F20230713%2Feu-central-1%2Fs3%2Faws4_request
&X-Amz-Date=20230713T094507Z
&X-Amz-Expires=900
&X-Amz-Security-Token=IQoJb3JpZ...
&X-Amz-Signature=2b9e8b177b4388ba5200b820057f5c8750bfa0177c852ccc3bd54ddfb7de8efc
&X-Amz-SignedHeaders=host
&x-id=GetObject
</code></pre>
<p>Notice that their signatures are different. This is usually not a problem as downloading a particular file is a one-time operation. In the ecommerce example, a
user might want to download the file after buying the product, and maybe a few times after that, but the assumption is that all these downloads get the full
contents.</p>
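<p>The signature is not the only parameter that changes. A quick way to see everything that differs between two signed URLs is to compare their query parameters. A minimal sketch using the standard <code>URL</code> API; <code>diffQueryParams</code> and <code>signedUrl</code> are hypothetical helpers, and the security tokens are placeholders because the real values are elided above:</p>

```javascript
// Names of the query parameters whose values differ between two URLs
const diffQueryParams = (urlA, urlB) => {
  const a = new URL(urlA).searchParams;
  const b = new URL(urlB).searchParams;
  const names = new Set([...a.keys(), ...b.keys()]);
  return [...names].filter((name) => a.get(name) !== b.get(name)).sort();
};

// Rebuilds a signed URL like the ones above; the security tokens used below
// are placeholders, since the real ones are elided
const signedUrl = ({credential, date, token, signature}) =>
  "https://terraform-20230713081606344000000003.s3.eu-central-1.amazonaws.com/test.jpg"
  + "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD"
  + `&X-Amz-Credential=${credential}&X-Amz-Date=${date}&X-Amz-Expires=900`
  + `&X-Amz-Security-Token=${token}&X-Amz-Signature=${signature}`
  + "&X-Amz-SignedHeaders=host&x-id=GetObject";

const first = signedUrl({
  credential: "ASIAUB3O2IQ5NILQOP5T%2F20230713%2Feu-central-1%2Fs3%2Faws4_request",
  date: "20230713T094506Z",
  token: "TOKEN_A",
  signature: "0c127dbcf05090bdb74f78e2e8bbdf3e7edba7cdd8fac4f3d7c125a406ca4df8",
});
const second = signedUrl({
  credential: "ASIAUB3O2IQ5DYKXZC7R%2F20230713%2Feu-central-1%2Fs3%2Faws4_request",
  date: "20230713T094507Z",
  token: "TOKEN_B",
  signature: "2b9e8b177b4388ba5200b820057f5c8750bfa0177c852ccc3bd54ddfb7de8efc",
});

console.log(diffQueryParams(first, second));
```

<p>For these two URLs, the helper reports <code>X-Amz-Credential</code>, <code>X-Amz-Date</code>, <code>X-Amz-Security-Token</code>, and <code>X-Amz-Signature</code>. The signature differs because the values it covers differ, so making the URL stable means fixing those inputs.</p>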
<p>But unique signed URLs defeat caching, both in the browser and on the edge. If the file is downloaded many times, all of those requests go to S3 and transfer
all the bytes again, which can be a problem.</p>
<p>For example, if the user's avatar image is shown on every page in a webapp and is served via a signed URL then the image will be downloaded over and over again
as the user navigates the app. Another example is a photo-sharing app that shows the private photos of the user using signed URLs. As those images can be rather
big, downloading them many times wastes a lot of bandwidth.</p>
<img class="d-block mx-auto img-fluid" srcset="/assets/9a7bf397aad981a9079c14655a7631bf141525791b0941f8e765dd921d8b1423.png 1x"/>
<h2 id="stable-signed-urls" tabindex="-1">Stable signed URLs</h2>
<p>To make signed URLs cacheable, we need to make them stable. On the other hand, we can't make them <em>too stable</em> as that would defeat the access control mechanism.
S3 only checks the signature and the expiration time, so the longer a signed URL stays valid, the longer the content remains available after it has been made private.
Moreover, a single signed URL can be valid for at most 7 days.</p>
<p>For caching, it's enough to make the generated URLs stable for several minutes or hours. That way, if the user navigates the app then the files won't be
downloaded over and over again.</p>
<p>Let's see the variable parts of the URL and how to stabilize them!</p>
<h3 id="stable-date" tabindex="-1">Stable date</h3>
<p>The most dynamic part of the URL is the <code>X-Amz-Date</code> as that changes every second. This is the time of the signature and the validity period starts from
this point in time.</p>
<p>To make it more stable, we can round the signature time using the <code>signingDate</code> parameter:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> roundTo = <span class="hljs-number">5</span> * <span class="hljs-number">60</span> * <span class="hljs-number">1000</span>; <span class="hljs-comment">// 5 minutes</span>
<span class="hljs-keyword">return</span> <span class="hljs-title function_">getSignedUrl</span>(<span class="hljs-keyword">new</span> <span class="hljs-title function_">S3Client</span>(), <span class="hljs-keyword">new</span> <span class="hljs-title class_">GetObjectCommand</span>({
<span class="hljs-title class_">Bucket</span>,
<span class="hljs-title class_">Key</span>,
}), {<span class="hljs-attr">signingDate</span>: <span class="hljs-keyword">new</span> <span class="hljs-title class_">Date</span>(<span class="hljs-title class_">Math</span>.<span class="hljs-title function_">floor</span>(<span class="hljs-keyword">new</span> <span class="hljs-title class_">Date</span>().<span class="hljs-title function_">getTime</span>() / roundTo) * roundTo)});
</code></pre>
<p>The above code rounds the date down to the previous 5-minute mark, so the <code>X-Amz-Date</code> changes only every 5 minutes instead of every second. By choosing the
rounding interval, we can control exactly how often new URLs are generated.</p>
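<p>To check the behavior outside of Lambda, the rounding expression can be pulled into a small helper. A sketch; <code>roundDown</code> and <code>toAmzDate</code> are hypothetical helper names, and the timestamps are taken from the example URLs:</p>

```javascript
// Round a date down to the start of its window, e.g. the previous 5-minute mark
const roundDown = (date, windowMs) =>
  new Date(Math.floor(date.getTime() / windowMs) * windowMs);

// Format a date the way X-Amz-Date shows it: YYYYMMDDTHHMMSSZ
const toAmzDate = (date) =>
  date.toISOString().replace(/[-:]/g, "").replace(/\.\d{3}/, "");

const fiveMinutes = 5 * 60 * 1000;

// Two requests a few minutes apart fall into the same window...
console.log(toAmzDate(roundDown(new Date("2023-07-13T09:45:06Z"), fiveMinutes))); // 20230713T094500Z
console.log(toAmzDate(roundDown(new Date("2023-07-13T09:49:59Z"), fiveMinutes))); // 20230713T094500Z
// ...while the next one starts a new window
console.log(toAmzDate(roundDown(new Date("2023-07-13T09:50:00Z"), fiveMinutes))); // 20230713T095000Z
```

<p>In the signing code, <code>{signingDate: roundDown(new Date(), fiveMinutes)}</code> would replace the inline expression. Since epoch milliseconds start at midnight UTC, windows that evenly divide a day line up with wall-clock marks such as :00, :05, and :10.</p>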
<p>This rounding has implications for the expiration time though. As it effectively backdates the signature, the expiration time also moves closer to the actual
signing time. In the above case, with the default expiration of 15 minutes, the effective validity is between 10 and 15 minutes.</p>
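<p>The arithmetic can be sketched as follows, assuming the default 900-second expiry seen in the example URLs. <code>remainingValidity</code> and <code>expiresInFor</code> are hypothetical helpers: the first computes how long a backdated URL stays usable, the second adds the window length to the expiry so that even a URL generated at the end of a window keeps a minimum validity, while still respecting the 7-day limit.</p>

```javascript
// Maximum validity of a SigV4 presigned URL: 7 days, in seconds
const MAX_EXPIRES_IN = 7 * 24 * 60 * 60;

// Seconds a URL stays usable when it is generated at `now` but signed
// with the backdated `signingDate` and an X-Amz-Expires of `expiresIn`
const remainingValidity = (signingDate, now, expiresIn) =>
  (signingDate.getTime() + expiresIn * 1000 - now.getTime()) / 1000;

// expiresIn that guarantees at least `minValidity` seconds for every URL
// generated within a rounding window of `windowMs` milliseconds
const expiresInFor = (minValidity, windowMs) => {
  const expiresIn = minValidity + Math.ceil(windowMs / 1000);
  if (expiresIn > MAX_EXPIRES_IN) {
    throw new RangeError(`expiresIn ${expiresIn}s exceeds the 7-day limit`);
  }
  return expiresIn;
};

const signingDate = new Date("2023-07-13T09:45:00Z");
// Generated right at the start of the window: the full 15 minutes
console.log(remainingValidity(signingDate, new Date("2023-07-13T09:45:00Z"), 900)); // 900
// Generated just before the window ends: only a bit over 10 minutes left
console.log(remainingValidity(signingDate, new Date("2023-07-13T09:49:59Z"), 900)); // 601
// To always keep at least 15 minutes, sign with a longer expiry
console.log(expiresInFor(15 * 60, 5 * 60 * 1000)); // 1200
```

<p>With the SDK, the result would be passed as the <code>expiresIn</code> option of <code>getSignedUrl</code>, next to <code>signingDate</code>. Keep in mind that a longer expiry also keeps a URL usable for longer after access is revoked.</p>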
<p class="plantuml"><img srcset="/assets/5dd5479eef2ff86d7339deb6137181515885abab130eb4b2883155107726a3f0.png 1.25x"/></p>
<h3 id="stable-credentials" tabindex="-1">Stable credentials</h3>
<p>With the <code>X-Amz-Date</code> fixed, the other changing part is the <code>X-Amz-Credential</code>. Compare the values in the two URLs:</p>
<pre class="highlight"><code>...
&X-Amz-Credential=ASIAUB3O2IQ5NILQOP5T%2F20230713%2Feu-central-1%2Fs3%2Faws4_request
...
&X-Amz-Credential=ASIAUB3O2IQ5DYKXZC7R%2F20230713%2Feu-central-1%2Fs3%2Faws4_request
</code></pre>
<p>They are clearly different, but it's not easy to see why. After all, the same backend signed both URLs.</p>
<p>In this case, the backend is a Lambda function and it uses its execution role to sign URLs. This is the usual serverless setup, and the Lambda runtime can run
multiple instances of the same function to quickly respond to changes in load.</p>
<p>These instances use the same role, but they don't share a session. The Lambda runtime assumes the role multiple times, resulting in different
credentials for the different Lambda instances. Depending on which instance signs the URL, the credential will be different.</p>
<p>To stabilize this, we need to opt out of IAM roles for signing and instead generate an access key for an IAM user and use that.</p>
<p>Note that this goes against the general best practice of not using permanent credentials, and the Secret Access Key needs a great deal of special care. It is
possible to handle it securely though, as we've covered in <a href="/how-to-securely-generate-and-store-iam-secret-access-keys-with-terraform/">this article</a>. As we've discussed
there, the safest way is to store it in an SSM parameter.</p>
<p>With the Secret Access Key stored in SSM, the backend needs permissions to read it:</p>
<pre class="highlight"><code><span class="hljs-keyword">data</span> <span class="hljs-string">"aws_iam_policy_document"</span> <span class="hljs-string">"backend"</span> {
<span class="hljs-comment"># ...</span>
statement {
actions = [
<span class="hljs-string">"ssm:GetParameter"</span>,
]
resources = [
<span class="hljs-keyword">module</span>.access_key.parameter_arn
]
}
}
</code></pre>
<p>Then the best practice is to implement caching so that the parameter is not fetched for every request:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> <span class="hljs-title function_">cacheOperation</span> = (<span class="hljs-params">fn, cacheTime</span>) => {
<span class="hljs-keyword">let</span> lastRefreshed = <span class="hljs-literal">undefined</span>;
<span class="hljs-keyword">let</span> lastResult = <span class="hljs-literal">undefined</span>;
<span class="hljs-keyword">let</span> queue = <span class="hljs-title class_">Promise</span>.<span class="hljs-title function_">resolve</span>();
<span class="hljs-keyword">return</span> <span class="hljs-function">() =></span> {
<span class="hljs-keyword">const</span> res = queue.<span class="hljs-title function_">then</span>(<span class="hljs-keyword">async</span> () => {
<span class="hljs-keyword">const</span> currentTime = <span class="hljs-keyword">new</span> <span class="hljs-title class_">Date</span>().<span class="hljs-title function_">getTime</span>();
<span class="hljs-keyword">if</span> (lastResult === <span class="hljs-literal">undefined</span> || lastRefreshed + cacheTime < currentTime) {
lastResult = <span class="hljs-keyword">await</span> <span class="hljs-title function_">fn</span>();
lastRefreshed = currentTime;
}
<span class="hljs-keyword">return</span> lastResult;
});
queue = res.<span class="hljs-title function_">catch</span>(<span class="hljs-function">() =></span> {});
<span class="hljs-keyword">return</span> res;
};
};
<span class="hljs-keyword">const</span> getSecretAccessKey = <span class="hljs-title function_">cacheOperation</span>(<span class="hljs-function">() =></span> <span class="hljs-keyword">new</span> <span class="hljs-title class_">SSMClient</span>().<span class="hljs-title function_">send</span>(<span class="hljs-keyword">new</span> <span class="hljs-title class_">GetParameterCommand</span>({<span class="hljs-title class_">Name</span>: process.<span class="hljs-property">env</span>.<span class="hljs-property">SECRET_ACCESS_KEY_PARAMETER</span>, <span class="hljs-title class_">WithDecryption</span>: <span class="hljs-literal">true</span>})), <span class="hljs-number">15</span> * <span class="hljs-number">1000</span>);
</code></pre>
<div class="internal_link_box my-5 p-3 pt-3 position-relative">
<div class="box-title text-monospace position-absolute py-1 px-3">
Related
</div>
<p><div class="row">
<div class="col-md-5 box-image mb-3">
<a href="/how-to-cache-ssm-getparameter-calls-in-a-lambda-function/">
<img class="w-100 h-100 img-bordered" srcset="/assets/1671a8530a4b646d473395f0f46a1c4e316fca477c8222629f126bbacf71e464.jpg 1x"/>
</a>
</div>
<div class="col-md-7">
<div class="h4 box-element-title mt-0 mb-0"><a href="/how-to-cache-ssm-getparameter-calls-in-a-lambda-function/">How to cache ssm.getParameter calls in a Lambda function</a></div>
<div class="box-element-excerpt mt-2">
Lower costs by staying in the standard SSM Parameter Store throughput limits
</div>
</div>
</div></p>
</div>
<p>Finally, use the credentials of the IAM user to sign the URL:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> accessKeyId = process.<span class="hljs-property">env</span>.<span class="hljs-property">ACCESS_KEY_ID</span>;
<span class="hljs-keyword">const</span> secretAccessKey = (<span class="hljs-keyword">await</span> <span class="hljs-title function_">getSecretAccessKey</span>()).<span class="hljs-property">Parameter</span>.<span class="hljs-property">Value</span>;
<span class="hljs-keyword">return</span> <span class="hljs-title function_">getSignedUrl</span>(<span class="hljs-keyword">new</span> <span class="hljs-title function_">S3Client</span>({
<span class="hljs-attr">credentials</span>: {
accessKeyId,
secretAccessKey,
},
}), <span class="hljs-keyword">new</span> <span class="hljs-title class_">GetObjectCommand</span>({
<span class="hljs-title class_">Bucket</span>,
<span class="hljs-title class_">Key</span>,
}), {<span class="hljs-attr">signingDate</span>: <span class="hljs-keyword">new</span> <span class="hljs-title class_">Date</span>(<span class="hljs-title class_">Math</span>.<span class="hljs-title function_">floor</span>(<span class="hljs-keyword">new</span> <span class="hljs-title class_">Date</span>().<span class="hljs-title function_">getTime</span>() / roundTo) * roundTo)});
</code></pre>
<p>The resulting URL is stable and only changes when the <code>signingDate</code> changes:</p>
<pre class="highlight"><code>https://terraform-20230713081606344000000003.s3.eu-central-1.amazonaws.com/test.jpg
?X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD
&X-Amz-Credential=AKIAUB3O2IQ5EZ3CFTGG%2F20230713%2Feu-central-1%2Fs3%2Faws4_request
&X-Amz-Date=20230713T094500Z
&X-Amz-Expires=900
&X-Amz-Signature=ee4c48e11d1250c8e02b1615f11f217c6454c7a2701c4b82abb620ef271163a3
&X-Amz-SignedHeaders=host
&x-id=GetObject
</code></pre>
<img class="d-block mx-auto img-fluid" srcset="/assets/ae808c329ee9448ff9a71686a5ab715a3a9d89a4573252f50febb799d0aa1a55.png 1x"/>
The effects of not maintaining consistency with DynamoDB2023-11-14T00:00:00+00:00https://advancedweb.hu/the-effects-of-not-maintaining-consistency-with-dynamodb<a href="https://advancedweb.hu/the-effects-of-not-maintaining-consistency-with-dynamodb/">(Read this article on the blog)</a><p>When working with DynamoDB I found that one of the main challenges is how to maintain consistency when multiple processes are accessing the database. I wrote
about <a href="/how-to-maintain-database-consistency-in-dynamodb/">different scenarios</a> such as keeping accurate counts, implementing foreign keys, and enforcing state in
referenced items. All of these take advantage of DynamoDB's transaction feature with condition checks.</p>
<p>But I got curious: what is the effect of <em>not</em> implementing a condition check properly? Is it easy to exploit code that fails to check a field when another
process might change the data in parallel?</p>
<p>My initial intuition said that it should be fairly hard to do. DynamoDB boasts single-digit millisecond performance, so what's the chance that I can make two
processes run at the same time?</p>
<p>It turns out, it's quite easy to trigger an inconsistent condition.</p>
<h2 id="test-setup" tabindex="-1">Test setup</h2>
<p>I chose the textbook example of race conditions: applying a coupon multiple times. While it sounds blatant, a similar problem <a href="https://web.archive.org/web/20140305135801/http://flexcoin.com/">took down a Bitcoin
exchange</a> and is actually a problem that <a href="https://vladmihalcea.com/race-condition/">affects SQL databases as
well</a>.</p>
<p>Here, I have a table with a coupon code and whether it has been used or not:</p>
<p class="plantuml"><img srcset="/assets/9c7bc9f03c2addb974b02b69f0f10b4e8aae3bb81da74391ebf9817e50aa03ff.png 1.25x"/></p>
<p>The API has an endpoint to apply a coupon: <code>/apply/<coupon></code>. This gives back a 200 response if the coupon was applied, 400 if not.</p>
<p>For repeated testing, there is also an endpoint to create new coupons: <code>/create</code>. This creates a new item in the table and returns the generated value.</p>
<h3 id="applying-the-coupon" tabindex="-1">Applying the coupon</h3>
<p>The apply code is straightforward: it first gets the current value, checks if the coupon is still valid, then sets its <code>used</code> status:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> item = <span class="hljs-keyword">await</span> dynamodb.<span class="hljs-title function_">send</span>(<span class="hljs-keyword">new</span> <span class="hljs-title class_">GetItemCommand</span>({
<span class="hljs-title class_">TableName</span>: process.<span class="hljs-property">env</span>.<span class="hljs-property">TABLE</span>,
<span class="hljs-title class_">Key</span>: {
<span class="hljs-attr">coupon</span>: {
<span class="hljs-attr">S</span>: coupon,
},
},
}));
<span class="hljs-keyword">if</span> (item.<span class="hljs-property">Item</span>?.<span class="hljs-property">used</span>?.<span class="hljs-property">BOOL</span> === <span class="hljs-literal">false</span>) {
<span class="hljs-keyword">await</span> dynamodb.<span class="hljs-title function_">send</span>(<span class="hljs-keyword">new</span> <span class="hljs-title class_">UpdateItemCommand</span>({
<span class="hljs-title class_">TableName</span>: process.<span class="hljs-property">env</span>.<span class="hljs-property">TABLE</span>,
<span class="hljs-title class_">Key</span>: {
<span class="hljs-attr">coupon</span>: {<span class="hljs-attr">S</span>: coupon}
},
<span class="hljs-title class_">UpdateExpression</span>: <span class="hljs-string">"SET #used = :used"</span>,
<span class="hljs-title class_">ExpressionAttributeNames</span>: {
<span class="hljs-string">"#used"</span>: <span class="hljs-string">"used"</span>,
},
<span class="hljs-title class_">ExpressionAttributeValues</span>: {
<span class="hljs-string">":used"</span>: {<span class="hljs-attr">BOOL</span>: <span class="hljs-literal">true</span>},
},
}));
<span class="hljs-keyword">return</span> {
<span class="hljs-attr">statusCode</span>: <span class="hljs-number">200</span>,
};
}
<span class="hljs-keyword">return</span> {
<span class="hljs-attr">statusCode</span>: <span class="hljs-number">400</span>,
};
</code></pre>
<h3 id="tester" tabindex="-1">Tester</h3>
<p>The tester code creates a coupon using the <code>/create</code> endpoint:</p>
<pre class="highlight"><code><span class="hljs-keyword">const</span> createRes = <span class="hljs-keyword">await</span> <span class="hljs-title function_">fetch</span>(url + <span class="hljs-string">"create/"</span>);
<span class="hljs-keyword">if</span> (!createRes.<span class="hljs-property">ok</span>) {
<span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-title class_">Error</span>(createRes);
}
<span class="hljs-keyword">const</span> couponValue = (<span class="hljs-keyword">await</span> createRes.<span class="hljs-title function_">json</span>()).<span class="hljs-property">coupon</span>;
</code></pre>
<p>Then it sends 3 concurrent requests that try to apply the same coupon:</p>
<pre class="highlight"><code><span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> <span class="hljs-title class_">Promise</span>.<span class="hljs-title function_">all</span>([<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">2</span>].<span class="hljs-title function_">map</span>(<span class="hljs-keyword">async</span> (idx) => {
<span class="hljs-keyword">const</span> applyRes = <span class="hljs-keyword">await</span> <span class="hljs-title function_">fetch</span>(url + <span class="hljs-string">"apply/"</span> + couponValue);
<span class="hljs-keyword">if</span> (!applyRes.<span class="hljs-property">ok</span>) {
<span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>
}
<span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;
}));
</code></pre>
<p>The number of <code>true</code> values shows how many times the coupon was applied: anything more than one means the same coupon was used multiple times.</p>
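<p>The same race can be reproduced without AWS. The sketch below replaces DynamoDB with an in-memory <code>Map</code> and simulates request latency with timers; <code>getItem</code>, <code>setUsed</code>, and <code>applyNaive</code> are hypothetical stand-ins that keep the read-check-write shape of the handler above. It assumes an ES module, since it uses top-level <code>await</code>.</p>

```javascript
// In-memory stand-in for the table: coupon -> item
const table = new Map([["coupon-1", {used: false}]]);

// Simulated network latency for each database call
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const getItem = async (coupon) => {
  await delay(5);
  return table.get(coupon);
};

const setUsed = async (coupon) => {
  await delay(5);
  table.set(coupon, {used: true});
};

// Read, check in application code, then write: nothing stops another
// request from reading the item between this request's read and write
const applyNaive = async (coupon) => {
  const item = await getItem(coupon);
  if (item?.used === false) {
    await setUsed(coupon);
    return true;
  }
  return false;
};

const results = await Promise.all([0, 1, 2].map(() => applyNaive("coupon-1")));
console.log(results);
```

<p>In this simulation, every read completes before the first write lands, so more than one request reports success. With real network calls the timing varies from run to run, which is why the problem shows up only in some of the runs.</p>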
<h2 id="results" tabindex="-1">Results</h2>
<p>Even with this very simple setup, I could reproduce multiple uses in 4 out of 10 runs:</p>
<pre class="highlight"><code><span class="hljs-punctuation">[</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span> <span class="hljs-punctuation">]</span>
<span class="hljs-punctuation">]</span>
</code></pre>
<h2 id="fix" tabindex="-1">Fix</h2>
<p>The fix is rather simple once the root cause is identified: add a condition that only sets <code>used</code> to <code>true</code> if its current value is <code>false</code>.
DynamoDB performs the check and the update atomically, so no coupon can be applied twice.</p>
<p>The fixed <code>UpdateItem</code> call:</p>
<pre class="highlight"><code><span class="hljs-keyword">await</span> dynamodb.<span class="hljs-title function_">send</span>(<span class="hljs-keyword">new</span> <span class="hljs-title class_">UpdateItemCommand</span>({
<span class="hljs-title class_">TableName</span>: process.<span class="hljs-property">env</span>.<span class="hljs-property">TABLE</span>,
<span class="hljs-title class_">Key</span>: {
<span class="hljs-attr">coupon</span>: {<span class="hljs-attr">S</span>: coupon}
},
<span class="hljs-title class_">UpdateExpression</span>: <span class="hljs-string">"SET #used = :used"</span>,
<span class="hljs-title class_">ExpressionAttributeNames</span>: {
<span class="hljs-string">"#used"</span>: <span class="hljs-string">"used"</span>,
},
<span class="hljs-title class_">ExpressionAttributeValues</span>: {
<span class="hljs-string">":used"</span>: {<span class="hljs-attr">BOOL</span>: <span class="hljs-literal">true</span>},
<span class="hljs-string">":false"</span>: {<span class="hljs-attr">BOOL</span>: <span class="hljs-literal">false</span>},
},
<span class="hljs-title class_">ConditionExpression</span>: <span class="hljs-string">"#used = :false"</span>,
}));
</code></pre>
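<p>When the condition fails, the AWS SDK for JavaScript v3 rejects the <code>send</code> call with an error named <code>ConditionalCheckFailedException</code>. Catching it lets the handler return the 400 response the API promises instead of failing with a generic error. A sketch with DynamoDB replaced by an in-memory <code>Map</code>, where the hypothetical <code>conditionalSetUsed</code> performs the check and the write in one synchronous step to mimic the atomic behavior of <code>ConditionExpression</code>. It assumes an ES module for the top-level <code>await</code>.</p>

```javascript
// In-memory stand-in for the table: coupon -> item
const table = new Map([["coupon-1", {used: false}]]);

// Simulated network latency for each database call
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Mimics UpdateItem with ConditionExpression "#used = :false": the check and
// the write run in one synchronous step, so no other request can interleave
const conditionalSetUsed = async (coupon) => {
  await delay(5);
  if (table.get(coupon)?.used !== false) {
    const error = new Error("The conditional request failed");
    error.name = "ConditionalCheckFailedException";
    throw error;
  }
  table.set(coupon, {used: true});
};

// Handler logic: a failed condition means the coupon was already used
const apply = async (coupon) => {
  try {
    await conditionalSetUsed(coupon);
    return {statusCode: 200};
  } catch (error) {
    if (error.name === "ConditionalCheckFailedException") {
      return {statusCode: 400};
    }
    throw error; // anything else is a real failure
  }
};

const statusCodes = (await Promise.all([0, 1, 2].map(() => apply("coupon-1"))))
  .map(({statusCode}) => statusCode);
console.log(statusCodes);
```

<p>Exactly one request gets a 200 and the others get a 400. In a real handler, the same check could also be written as an <code>instanceof</code> test against the <code>ConditionalCheckFailedException</code> class exported by <code>@aws-sdk/client-dynamodb</code>.</p>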
<p>With this change, every run applies the coupon exactly once:</p>
<pre class="highlight"><code><span class="hljs-punctuation">[</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span> <span class="hljs-punctuation">]</span><span class="hljs-punctuation">,</span>
<span class="hljs-punctuation">[</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">true</span></span><span class="hljs-punctuation">,</span> <span class="hljs-literal"><span class="hljs-keyword">false</span></span> <span class="hljs-punctuation">]</span>
<span class="hljs-punctuation">]</span>
</code></pre>
How to securely generate and store IAM Secret Access Keys with Terraform2023-10-31T00:00:00+00:00https://advancedweb.hu/how-to-securely-generate-and-store-iam-secret-access-keys-with-terraform<a href="https://advancedweb.hu/how-to-securely-generate-and-store-iam-secret-access-keys-with-terraform/">(Read this article on the blog)</a><h2 id="iam-identities" tabindex="-1">IAM identities</h2>
<p>Generally, it's a bad idea to use IAM users in cases when roles are also an option. This is because roles provide a secret-less way for systems to gain
permissions to AWS resources, while a user's Secret Access Key is sensitive information that must be protected.</p>
<p>AWS provides built-in support for roles for most services. For example, a Lambda function has an execution role that allows attaching permissions, and the runtime
automatically makes the credentials available to the function. Also, EC2 instances can get permissions via their instance profiles. Using these approaches
provides a secure solution: all the keys are short-lived and there is no secret to lose.</p>
<div class="internal_link_box my-5 p-3 pt-3 position-relative">
<div class="box-title text-monospace position-absolute py-1 px-3">
Related
</div>
<p><div class="row">
<div class="col-md-5 box-image mb-3">
<a href="/why-aws-access-and-secret-keys-should-not-be-in-the-codebase/">
<img class="w-100 h-100 img-bordered" srcset="/assets/c490296f86bfb8a462de9ad7f5d19a27289db61d19f705065cec53a61173922a.jpg 1x"/>
</a>
</div>
<div class="col-md-7">
<div class="h4 box-element-title mt-0 mb-0"><a href="/why-aws-access-and-secret-keys-should-not-be-in-the-codebase/">Why AWS access and secret keys should not be in the codebase</a></div>
<div class="box-element-excerpt mt-2">
Setting AWS.config.credentials is a bad practice. And it is also unnecessary.
</div>
</div>
</div></p>
</div>
<p>That's the general rule, but there are exceptions. I needed to use an IAM user instead of a role when I wanted the credential part of an S3 signed URL
<a href="/stable-s3-signed-urls/">to always be the same</a>. In that case, roles are not a good solution as their Access Key ID changes every time the
role is assumed.</p>
<p>In this article, we'll look into how to create a Secret Access Key in a secure way. Note though that a solution without secrets is always more secure than one
with them, so opt for IAM roles whenever possible.</p>
<h2 id="permissions" tabindex="-1">Permissions</h2>
<p>First, let's create a user and attach permissions to it! After all, an access key for a user is only as useful as the policies attached to it.</p>
<p>Generate the user:</p>
<pre class="highlight"><code><span class="hljs-keyword">resource</span> <span class="hljs-string">"aws_iam_user"</span> <span class="hljs-string">"signer"</span> {
name = <span class="hljs-string">"signer-<span class="hljs-variable">${random_id.id.hex}</span>"</span>
}
</code></pre>
<p>Then attach a policy that allows reading the objects in the images bucket:</p>
<pre class="highlight"><code><span class="hljs-keyword">resource</span> <span class="hljs-string">"aws_iam_user_policy"</span> <span class="hljs-string">"signer"</span> {
user = aws_iam_user.signer.name
policy = jsonencode({
Version = <span class="hljs-string">"2012-10-17"</span>
Statement = [
{
Action = [
<span class="hljs-string">"s3:GetObject"</span>,
]
Effect = <span class="hljs-string">"Allow"</span>
Resource = <span class="hljs-string">"<span class="hljs-variable">${aws_s3_bucket.images.arn}</span>/*"</span>
},
]
})
}
</code></pre>
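<p>The <code>/*</code> at the end of the resource is what grants access to the objects in the bucket rather than to the bucket itself. The simplified matcher below illustrates this. It is a hypothetical helper that only handles the <code>*</code> and <code>?</code> wildcards and ignores everything else real IAM policy evaluation does (explicit denies, conditions, per-segment ARN rules); the bucket name is a placeholder.</p>

```javascript
// Simplified, hypothetical sketch of IAM Resource wildcard matching.
// "*" matches any run of characters (including "/"), "?" matches exactly one.
// It treats the whole ARN as one string and ignores Deny, conditions, etc.
const resourceMatches = (pattern, arn) => {
  const regexSource = Array.from(pattern, (ch) => {
    if (ch === "*") return ".*";
    if (ch === "?") return ".";
    // Escape characters that are special in regular expressions
    return ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  }).join("");
  return new RegExp(`^${regexSource}$`).test(arn);
};

const bucketArn = "arn:aws:s3:::my-images"; // placeholder for aws_s3_bucket.images.arn
const policyResource = `${bucketArn}/*`;

console.log(resourceMatches(policyResource, `${bucketArn}/cat.jpg`));        // true
console.log(resourceMatches(policyResource, `${bucketArn}/photos/cat.jpg`)); // true
console.log(resourceMatches(policyResource, bucketArn));                     // false: the bucket itself
```

<p>Keys with slashes, such as <code>photos/cat.jpg</code>, are covered as well, because the <code>*</code> wildcard also matches slashes.</p>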
<img class="d-block mx-auto img-fluid img-bordered" srcset="/assets/e5db3b19c10bb8ffed75694df91fe04c713b18bfae5b81fe25e000b5ee9e03c7.png 1x" alt="IAM user"/>
<h2 id="generating-credentials" tabindex="-1">Generating credentials</h2>
<p>While it's tempting to generate the access key and pass the resulting Access Key ID and Secret Access Key as environment variables, don't do this.
Environment variables are not a safe place for secrets: for example, the <code>ReadOnlyAccess</code> managed policy allows reading them. Instead, store the Secret Access
Key in SSM Parameter Store and pass only a reference to it via environment variables.</p>
<p>Then the next question is: who is generating the user credentials?</p>
<p>The safest way is to deploy a Lambda function with the necessary permissions to generate and store the credentials and call it during deployment. This way even
the person (or process) who is doing the deployment does not have access to the secret value.</p>
<p>There is a Terraform module that handles the boilerplate of configuring and calling the Lambda function as well as managing the SSM parameter:
<a href="https://registry.terraform.io/modules/sashee/ssm-generated-value/aws/latest">ssm-generated-value</a>.</p>
<div class="internal_link_box my-5 p-3 pt-3 position-relative">
<div class="box-title text-monospace position-absolute py-1 px-3">
Related
</div>
<p><div class="row">
<div class="col-md-5 box-image mb-3">
<a href="/terraform-module-to-generate-secret-values-and-store-in-ssm-parameter-store/">
<img class="w-100 h-100 img-bordered" srcset="/assets/54d56b96dcebf659bdd9617697a3f0588ffd63908ea5b13ba3b717771a472109.jpg 1x"/>
</a>
</div>
<div class="col-md-7">
<div class="h4 box-element-title mt-0 mb-0"><a href="/terraform-module-to-generate-secret-values-and-store-in-ssm-parameter-store/">Terraform module to generate secret values and store in SSM Parameter Store</a></div>
<div class="box-element-excerpt mt-2">
Easy-to-use secrets module
</div>
</div>
</div></p>
</div>
<p>To use it, implement the custom part of generating and deleting access keys:</p>
<pre class="highlight"><code><span class="hljs-keyword">import</span> {<span class="hljs-title class_">IAMClient</span>, <span class="hljs-title class_">CreateAccessKeyCommand</span>, <span class="hljs-title class_">ListAccessKeysCommand</span>, <span class="hljs-title class_">DeleteAccessKeyCommand</span>} <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/client-iam"</span>;
<span class="hljs-keyword">const</span> client = <span class="hljs-keyword">new</span> <span class="hljs-title class_">IAMClient</span>();
<span class="hljs-keyword">const</span> <span class="hljs-title class_">UserName</span> = <span class="hljs-string">"${aws_iam_user.signer.name}"</span>;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> <span class="hljs-title function_">generate</span> = <span class="hljs-keyword">async</span> (<span class="hljs-params"></span>) => {
<span class="hljs-keyword">const</span> result = <span class="hljs-keyword">await</span> client.<span class="hljs-title function_">send</span>(<span class="hljs-keyword">new</span> <span class="hljs-title class_">CreateAccessKeyCommand</span>({
<span class="hljs-title class_">UserName</span>,
}));
<span class="hljs-keyword">return</span> {
<span class="hljs-attr">value</span>: result.<span class="hljs-property">AccessKey</span>.<span class="hljs-property">SecretAccessKey</span>,
<span class="hljs-attr">outputs</span>: {
<span class="hljs-title class_">AccessKeyId</span>: result.<span class="hljs-property">AccessKey</span>.<span class="hljs-property">AccessKeyId</span>,
}
};
}
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> <span class="hljs-title function_">cleanup</span> = <span class="hljs-keyword">async</span> (<span class="hljs-params"></span>) => {
<span class="hljs-keyword">const</span> list = <span class="hljs-keyword">await</span> client.<span class="hljs-title function_">send</span>(<span class="hljs-keyword">new</span> <span class="hljs-title class_">ListAccessKeysCommand</span>({
<span class="hljs-title class_">UserName</span>,
}));
<span class="hljs-keyword">await</span> <span class="hljs-title class_">Promise</span>.<span class="hljs-title function_">all</span>(list.<span class="hljs-property">AccessKeyMetadata</span>.<span class="hljs-title function_">map</span>(<span class="hljs-keyword">async</span> ({<span class="hljs-title class_">AccessKeyId</span>}) => {
<span class="hljs-keyword">await</span> client.<span class="hljs-title function_">send</span>(<span class="hljs-keyword">new</span> <span class="hljs-title class_">DeleteAccessKeyCommand</span>({
<span class="hljs-title class_">UserName</span>,
<span class="hljs-title class_">AccessKeyId</span>,
}));
}));
}
</code></pre>
<p>Then define what extra permissions the Lambda function needs:</p>
<pre class="highlight"><code>extra_statements = [
{
<span class="hljs-string">"Action"</span>: [
<span class="hljs-string">"iam:CreateAccessKey"</span>,
<span class="hljs-string">"iam:ListAccessKeys"</span>,
<span class="hljs-string">"iam:DeleteAccessKey"</span>
],
<span class="hljs-string">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
<span class="hljs-string">"Resource"</span>: aws_iam_user.signer.arn
}
]
</code></pre>
<p>The module adds these extra permissions to the ones needed to manage the SSM parameter:</p>
<img class="d-block mx-auto img-fluid img-bordered" srcset="/assets/4a23b085b63561c2bf11e84e8524e5dbb20fcdee5ba54f4f423a3901a55cae39.png 1x" alt="Permissions of the Lambda"/>
<p>When it's deployed, it creates an SSM parameter with the Secret Access Key:</p>
<img class="d-block mx-auto img-fluid img-bordered w-md-50" srcset="/assets/dd6f05af57399169015caf3f557eacad6f748fbe21875ffdcffbeb3f60fc41c1.png 1x" alt="The Secret Access Key stored in an SSM parameter"/>
<p>It also outputs the Access Key ID (which is not a secret) and the SSM parameter's name and ARN. Any component that needs access can then use its IAM permissions to
fetch the secret and send requests as the IAM user:</p>
<pre class="highlight"><code><span class="hljs-keyword">resource</span> <span class="hljs-string">"aws_lambda_function"</span> <span class="hljs-string">"backend"</span> {
<span class="hljs-comment"># ...</span>
environment {
variables = {
SECRET_ACCESS_KEY_PARAMETER = <span class="hljs-keyword">module</span>.access_key.parameter_name
ACCESS_KEY_ID = jsondecode(<span class="hljs-keyword">module</span>.access_key.outputs).AccessKeyId
}
}
}
</code></pre>
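<p>On the backend side, a minimal sketch of turning these two environment variables into credentials could look like this. The secret is fetched once per Lambda instance and reused across warm invocations, while a failed fetch is not cached, so the next call retries. <code>makeCredentialsLoader</code> and the injected <code>fetchParameter</code> function are hypothetical names; the comment shows one way to implement <code>fetchParameter</code> with <code>GetParameterCommand</code> from <code>@aws-sdk/client-ssm</code>, which requires the <code>ssm:GetParameter</code> permission on the parameter.</p>

```javascript
// Hypothetical sketch: build the IAM user's credentials from the environment.
// The environment holds only the SSM parameter name and the (non-secret)
// Access Key ID; the Secret Access Key is fetched at runtime.
//
// fetchParameter is injected. With the AWS SDK v3 it could be, for example:
//   import {SSMClient, GetParameterCommand} from "@aws-sdk/client-ssm";
//   const ssm = new SSMClient();
//   const fetchParameter = async (Name) =>
//     (await ssm.send(new GetParameterCommand({Name, WithDecryption: true}))).Parameter.Value;
const makeCredentialsLoader = (env, fetchParameter) => {
  let pending; // shared by all calls in the same Lambda instance
  return () => {
    if (pending === undefined) {
      pending = Promise.resolve()
        .then(() => fetchParameter(env.SECRET_ACCESS_KEY_PARAMETER))
        .then((secretAccessKey) => ({
          accessKeyId: env.ACCESS_KEY_ID,
          secretAccessKey,
        }))
        .catch((error) => {
          pending = undefined; // don't cache failures: the next call retries
          throw error;
        });
    }
    return pending;
  };
};

// Usage in the handler (process.env provides the variables set in Terraform):
// const loadCredentials = makeCredentialsLoader(process.env, fetchParameter);
// const {accessKeyId, secretAccessKey} = await loadCredentials();
```

<p>The loader resolves to an object with <code>accessKeyId</code> and <code>secretAccessKey</code>, which is the credentials shape the AWS SDK v3 uses, so it should also work as the <code>credentials</code> option of a client such as <code>S3Client</code>. Caching the promise rather than the value also means that concurrent calls during a cold start share a single SSM request.</p>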