My first two months using AI
I started using AI more seriously in early November, so I'm now around the two-month mark. When I talk to others, everyone's experience feels very different. So to add one more data point, here is mine.
I resisted using AI for a long time. My reasoning was that prompting is easy to catch up with, so it does not matter if I join the crowd a year later. This turned out to be true. Also, I had tried the free version of ChatGPT a few times, which was a mixed bag, so I wasn't convinced it was any good.
What prompted me to give it a proper try was talking with people. Some of them told me that for Python scripts it is correct almost 100% of the time. Others used it daily for programming tasks and were impressed by recent advancements. Moreover, it is apparently quite good at adapting cooking recipes to X number of people and producing a shopping list, which was particularly useful when I found myself in a rural community. With a change of scenery and time for experimentation, I felt this was a good time to start learning.
Starting
Ok, where should I start with AI? It felt a bit overwhelming at first.
I'd heard about Copilot, so I tried that first. I started the CLI, logged in, and then it asked for read and write access to all my public and private repositories on GitHub. So Copilot was out of the game.
Then I tried OpenRouter and OpenCode, and this is the setup I ended up using. With OpenRouter I can switch between models easily, so I can try what's new and see if it's better than what I'm currently using. And OpenCode is an independent open-source TUI where I can control access at the filesystem level, and it only needs an API key from OpenRouter. It turned out that with this combo I have access to quite a few free models, some via OpenRouter, some via OpenCode Zen. I regularly try these out on my open-source projects, as I don't mind my data being used to train their models.
OpenRouter uses per-token pricing, and I had to learn that Opus in particular is very expensive this way. After I burned 20 dollars in a morning, I started looking at context length more closely and using cheaper models for easier tasks. The big upside of free models is that I can put everything in the context and have one less thing to worry about. Cost anxiety (similar to range anxiety for electric cars) is a real thing. Overall, I've spent something like $170 on different models in the past two months, although most of it in the first half.
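To make the context-length point concrete, here is a rough back-of-the-envelope calculation. The prices below are illustrative assumptions, not actual OpenRouter rates (and this ignores prompt caching), but the shape of the math is what matters: the context that gets resent on every turn dominates the bill.

```ts
// Rough cost estimate for one agentic coding session.
// All numbers are assumptions for illustration, not real OpenRouter prices.
const inputUsdPerMTok = 5;     // assumed price per million input tokens
const outputUsdPerMTok = 25;   // assumed price per million output tokens

const contextTokens = 100_000; // files + conversation history resent each turn
const outputTokens = 1_000;    // a typical reply
const turns = 30;              // back-and-forth exchanges in one session

const cost = turns * (
  (contextTokens / 1_000_000) * inputUsdPerMTok +
  (outputTokens / 1_000_000) * outputUsdPerMTok
);

console.log(`~$${cost.toFixed(2)} for one session`); // ~$15.75
```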
In early January I realized that it's possible to use Anthropic's fixed-price subscription with OpenCode as well, so I bought the Pro plan and mainly switched to that. With OpenCode I could still switch between models easily, so it was an overall price reduction.
Then Anthropic broke this integration just before the first weekly reset, so I cancelled my Pro subscription and went back to per-token pricing. I'll keep an eye on subscriptions in the future.
For mobile, I installed oxproxion and I'm happy with it. It supports OpenRouter and handles images for both input and output. A huge win with OpenRouter is that I can set per-day limits for each API key, so I won't accidentally deplete my account.
For coding
Models are quite good at small coding tasks but usually need a fair amount of hand-holding. I still have no idea how people do vibe coding, where everything the model produces is good as-is. My experience is that I need to chunk the work into small tasks, then check and refactor after each subtask.
As for models, Opus 4.5 feels like the best, and Sonnet is also very good. I tried Grok Code Fast, GLM 4.7, and also Devstral 2. It feels like the free models make more mistakes, but it's hard to compare objectively.
I noticed that I have "prompt anxiety": the dread of the model making mistakes and having to reprompt over and over again to fix them. When it one-shots an implementation it's a magical feeling, but when there is a seemingly never-ending series of "fix this", "use X, not Y", "here as well" prompts, it gets frustrating very quickly.
There is also "review anxiety". I write a prompt, it makes some changes, I review them, then prompt again for more changes. Sometimes a small request rewrites a lot of things, so I need to review everything one more time.
Models like to add a lot of console.log and they especially like emojis.
AI is the most useful when I already have the mental model and I use it to write things down. In one case, I changed the input structure of a function and asked it to adapt the tests. It's a rather manual process, looking at each occurrence and making a one- or two-line change, just complex enough that I can't use a Vim macro. With AI it was a one-shot: it made all the necessary changes perfectly.
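To give a flavor of the change (the real code is not something I can show here, so this is a made-up illustration): imagine the function used to take positional arguments and now takes a single options object, so every test call site needs a slightly different one- or two-line edit.

```ts
// Hypothetical illustration of the kind of signature change involved.
type NotifyOptions = { userId: string; template: string; retries: number };

// New shape: one options object instead of positional arguments.
function sendNotification(opts: NotifyOptions): boolean {
  return opts.retries >= 0; // real implementation elided
}

// Before, a test call looked like:
//   expect(sendNotification("u1", "welcome", 3)).toBe(true);
// After, each call site needs its own small rewrite:
//   expect(sendNotification({ userId: "u1", template: "welcome", retries: 3 })).toBe(true);
```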
On another occasion, I wanted to get rid of aws-iot-device-sdk and replace it with MQTT.js. The model made the change perfectly on the first try. Having a lot of tests helps with confidence, and this was a particularly well-tested part of the codebase.
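I can't paste the actual diff here, but below is a minimal sketch of what the MQTT.js side of such a connection looks like; the endpoint, topics, and certificate paths are placeholders, and the real change also had to cover the reconnect and message handling the SDK used to do.

```ts
import { readFileSync } from "node:fs";
import mqtt from "mqtt";

// Minimal MQTT.js connection to AWS IoT Core over mutual TLS.
// The endpoint, client ID, topics, and certificate paths are placeholders.
const client = mqtt.connect("mqtts://<your-ats-endpoint>.iot.<region>.amazonaws.com:8883", {
  key: readFileSync("certs/private.key"),
  cert: readFileSync("certs/certificate.pem"),
  ca: readFileSync("certs/AmazonRootCA1.pem"),
  clientId: "my-device",
  reconnectPeriod: 5000, // retry every 5 seconds if the connection drops
});

client.on("connect", () => {
  client.subscribe("devices/my-device/commands");
  client.publish("devices/my-device/status", JSON.stringify({ online: true }));
});

client.on("message", (topic, payload) => {
  console.log(topic, payload.toString());
});
```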
On the other hand, my dotfiles contain a good amount of Nix code. Making changes here is usually a mixed bag: simple things usually work, but I found many models struggle with escaping or some other minor roadblock and end up in an infinite loop. Granted, it combines multiple layers: modifying a Fish script that configures Bubblewrap, which builds up the environment for a program to run, is not simple even for a human to reason about. But it sometimes fails even at configuring my NeoVim setup, which is a Nix+Lua file.
AI is very good at checking comments: the prompt "is the comment accurate for the code?" will usually reveal when something is off. I'm not sure exactly how, but it feels like an enabler that was not possible in the past. It also correctly pointed out in one case that a "milliseconds in a month" calculation was incorrect, a hard-to-spot problem.
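To show why this kind of bug is easy to miss, here is a made-up example (not the actual code from my project) of the class of mistake such a check can surface: both expressions look plausible at a glance, but one is missing a factor of 60.

```ts
// Hypothetical illustration, not the actual code.
// The name promises "milliseconds in a month", but a factor of 60 is missing,
// so the value is really the number of milliseconds in 12 hours.
const MS_IN_MONTH = 30 * 24 * 60 * 1000;          // 43_200_000     (wrong)
const msInThirtyDays = 30 * 24 * 60 * 60 * 1000;  // 2_592_000_000  (intended)

console.log(MS_IN_MONTH, msInThirtyDays);
```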
It is also quite good at reviewing changes: "here's the patch, tell me what's wrong with it". It finds a lot of false positives, but it also helped me find genuine problems. It needs a lot of back-and-forth, though.
Cost-wise, coding is expensive. The model reads a couple of files, and that inflates the context size a lot, especially since it does so automatically. Coding tasks are my most expensive usage for now.
For writing
I mostly used Gemini 3.0 (Flash and Pro) for writing and planning, sometimes Opus. My experience is that it's very good at pointing out weak arguments in my writing. It also forces me to write down my thoughts, which is an immensely useful exercise in itself, although that benefit is independent of AI.
I had to open a support ticket with AWS recently. I wrote a draft, then asked Gemini how they would dismiss it. After a couple of iterations the end result was significantly better. On the other hand, letting Gemini do the writing is generally not useful: it jumps into speculation eagerly, which would weaken the overall message. But some of the sentence-level suggestions it gave were quite good.
Another example is my article on serverless. I wrote a draft, asked Sonnet (if I remember correctly) for feedback, and iterated on it. It forced me to be less hand-wavy and provide clearer examples in several places. But I had to decide where to stop and just publish.
I found that when asking for editing feedback it's better to restart the session than to keep replying. After a few rounds it focuses only on the initial set of recommendations and misses other things. When starting fresh, it will provide a different set that sometimes contains other useful insights. And in OpenCode this is very easy: I can use /undo and press enter to resend the previous prompt.
For brainstorming
I also used AI to plan systems, and it's probably the best use case for it so far.
I planned how to implement multi-tenancy in Postgres using row-level security. Gemini told me that session variables are a good fit for this, but I didn't like that they can be set multiple times. Then it came up with a custom function that can only be called once. Then I didn't like that the value can be read back. Then we iterated to a kind of variable that is access-controlled, can only be set once per transaction, and works with RLS. I'll need to look into this in more detail, but having an outline quickly is a huge win.
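For reference, here is a minimal sketch of the session-variable baseline the conversation started from, written with node-postgres (my own choice, just to make it concrete); the table, column, and setting names are made up, and the access-controlled, set-once variant we ended up discussing needs more machinery than this.

```ts
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the usual PG* env vars

// Run once, e.g. as a migration. Assumes documents.tenant_id is a uuid and the
// connecting role is not the table owner (otherwise RLS is bypassed unless forced).
export async function setup(): Promise<void> {
  await pool.query(`ALTER TABLE documents ENABLE ROW LEVEL SECURITY`);
  await pool.query(`
    CREATE POLICY tenant_isolation ON documents
      USING (tenant_id = current_setting('app.tenant_id')::uuid)
  `);
}

// Per request: set the tenant inside the transaction, then query.
// set_config(..., true) scopes the value to the current transaction, but nothing
// prevents later code from calling it again; that was my objection, and what the
// set-once, access-controlled variant is supposed to fix.
export async function listDocuments(tenantId: string) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const { rows } = await client.query("SELECT * FROM documents");
    await client.query("COMMIT");
    return rows;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}
```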
Instead of breaking my mental process by reading the documentation, it was a "flow" experience. I could ask questions like "how does X work?" and "why doesn't Y work for this?", it would provide guidance and code examples, and in the end I felt I had a quite good understanding of the tradeoffs.
Of course, this is just the exploration part; when I actually implement this I'll need to deepen my understanding and read the documentation. But for exploration and finding the direction I want to go in, it was very useful.
Interestingly, it told me many times that "this is the standard pattern of X", and I suspect it's making up these names most of the time.
General
For searching, it is way better than traditional engines. Being able to add context and then ask follow-up questions makes it incredibly useful. I could say "I know Puppeteer, how does it compare to Playwright?" and get a rather complete picture in two or three follow-ups. This experience is immeasurably better than reading a couple of "getting started with Playwright" and "Playwright vs Puppeteer" posts. And when I'm ready to dive in, I can read the documentation.
I'm casually learning Spanish, and I found AI to be a huge help with grammar. I have a folder with some instructions so that each session becomes a lesson focusing on a specific point, and my task is to translate sentences. Then I get corrections and an explanation of the errors I made. It also writes a summary of each lesson describing my weak points so that I can focus on them more later.
That said, it is annoyingly nondeterministic. Sometimes it switches to Spanish for the explanations, sometimes it says "this is the end of the lesson" even though the instructions explicitly state that I'll track the time, and sometimes it echoes back my sentences incorrectly, so I need to scroll up to see what I actually wrote. And these problems show up with Gemini 3.0 Pro as well as Opus 4.5.
AI is a huge help with things I generally don't know much about. Taking a picture of the wine shelf and asking "which one should I bring to an Italian-style lasagna dinner?" and getting options #1, #2, #3 is immensely useful, even though I'm sure a true connoisseur would disagree. It helped me fix a zipper on one of my jackets and identify a big black box on a pole (it was a fiber patch box), among other things. It raises my baseline for low-impact things that I don't want to invest too much time in.
I also tried generating text from my audio notes, with mixed success. Sometimes it captured my main point, sometimes it did not, and I found it a bit too eager to fill in the blanks. Granted, it's a rather hard task, as I meander in my thoughts when I record these memos.
The scene is changing fast
What I've seen in the past two months is that new models come out pretty much weekly and some of them are quite usable for certain tasks. Extrapolating further, it seems like the next few months will be similarly eventful. I'm curious what additional use cases will become available.