Rok's Blog

AI-calls-Editor

Rok Strniša — Thu, 20 Nov 2025 14:06:31 GMT

AI coding assistants are getting good at generating code. However, there are some refactoring operations, such as renaming a symbol across a codebase or moving a definition and updating its references/imports, which should be instant (and free), but instead often take a substantial amount of both time and tokens.

The problem is that most coding assistants approach refactoring with the usual:

1. Search for occurrences. This is often unreliable, especially in large projects, due to language features like scoped namespaces.
2. Read file sections.
3. Generate text patches.

Both (2) and (3) are expensive in terms of tokens.

A good AI assistant will often perform the above approach correctly through a multi-shot loop that involves linting, type-checking, and running tests (assuming it received good instructions), all of which add to token usage and time.

There’s a much better approach: use the IDE’s built-in refactoring engine. It’s correct, fast, and uses almost no tokens. Here’s the outline of the “AI-calls-Editor” approach for Claude Code and Visual Studio Code.

Create the MCP Extension. Open your project in Visual Studio Code with an extension installed that runs a small local MCP server. The MCP server can receive a file path, a line number, a column number, and the new name, and then rename the symbol with:

vscode.commands.executeCommand("vscode.executeDocumentRenameProvider", path, position, name)

Register the MCP Server. Claude Code is informed about this capability with:

claude mcp add --transport http ai-calls-editor http://localhost:7272/mcp

Using the Capability. When Claude needs to rename a symbol, it only needs to figure out the parameters and make the call. It works.
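For illustration, here is a minimal Python sketch of the kind of request that reaches the local MCP server. MCP is built on JSON-RPC 2.0, and a tool is invoked with the tools/call method. The tool name rename_symbol and the argument names below are assumptions made for this sketch, so check the prototype for the real ones. Claude Code builds and sends this request itself; the sketch only shows how small the call is compared with reading file sections and generating patches.

```python
import json


def build_rename_request(path, line, column, new_name, request_id=1):
    """Build a JSON-RPC 2.0 'tools/call' request for a rename tool.

    The tool name and argument names are hypothetical; the prototype
    may expose different ones.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "rename_symbol",  # hypothetical tool name
            "arguments": {
                "path": path,
                "line": line,
                "column": column,
                "newName": new_name,
            },
        },
    }


if __name__ == "__main__":
    # Example: rename the symbol at line 12, column 8 of a hypothetical file.
    request = build_rename_request("src/app.ts", 12, 8, "fetchUser")
    print(json.dumps(request, indent=2))
```

The whole request is a few dozen tokens, regardless of how many files the rename touches.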

Prototype: https://github.com/rokstrnisa/ai-calls-editor — contributions welcome!

Let’s save our tokens and our time.

Investing for Anyone

Rok Strniša — Sat, 24 Aug 2024 12:07:37 GMT

Disclaimer: This post is for informational purposes only and not financial advice. Consult a financial advisor before making investment decisions. Investing involves risks.

How should I invest my savings?

In this blog post, I outline a popular approach to achieve an expected yearly growth of ~7% (before inflation) over the long term.

Why should I listen to you?

After completing my PhD in computer science at the University of Cambridge, I worked at various software companies, as well as a London hedge fund. This article is based on my research and personal experience with investments. That said, I don’t think you should trust anyone with your investment strategy: do your own research.

I now believe that most people only need to read one book on the topic: Smarter Investing by Tim Hale. (Note that I didn’t share an affiliate link - I have no financial gain from recommending this book and no connection to the author.)

In the following sections, I try to effectively summarize the main ideas in the book, and show how I put these ideas into practice.

Main ideas

Through analyses of historical data, the book shows that:

It’s generally impossible to beat the market in the long term (20+ years), where “beating the market” means outperforming the stock market as a whole.
Human psychology tends to work against us: we tend to buy when the prices are high (fear of missing out), and sell when they are low (fear of losing more).

Based on this, the book shows that it’s best to:

Match the market, rather than try to beat it.
Create a system that prevents our emotions from hurting our investments.

Investment plan

(For definitions of various terms used below, see Terminology.)

To “match the market”, the book recommends investing in ETFs that:

Together roughly represent the world’s economy. Good candidates are ETFs that track the MSCI World Index, or the FTSE Developed and FTSE Emerging indices.
Have a low management fee, e.g. below 0.25% / year.
Are from a reputable fund manager (e.g. Vanguard, SPDR, iShares/BlackRock).
Have various other suitable properties: domicile (for your taxes), size ($100M+), age (5+ years), physical (rather than synthetic) tracking, trading in your local currency, and accumulating rather than distributing (depending on the tax laws in your country).

In addition, the book recommends investing in reliable bonds to increase your portfolio’s stability by sacrificing some long-term growth:

This part normally represents between 10% and 50% of your portfolio, where lower values are recommended if you’re younger or have other stable assets.
The recommended bonds are short-term government bonds and global inflation-linked government bonds.
The bonds, too, should normally be bought through ETFs rather than directly.

Using the above, you can create an investment portfolio where each ETF represents some target percentage of the total portfolio value.

Such a portfolio, where the stocks represent around 75%, is statistically expected to return around 7% per year over the long run (20+ years). Therefore, the portfolio is expected to roughly double every 10 years (not accounting for inflation).
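The "doubles every 10 years" figure follows from compound growth. Here is a small Python sketch of the arithmetic, assuming a constant 7% annual return (real returns vary from year to year, so this is only the long-run expectation):

```python
import math


def growth_factor(annual_return, years):
    """Factor by which a portfolio grows at a constant annual return."""
    return (1 + annual_return) ** years


def doubling_time(annual_return):
    """Number of years needed to double at a constant annual return."""
    return math.log(2) / math.log(1 + annual_return)


if __name__ == "__main__":
    print(f"Growth after 10 years at 7%: x{growth_factor(0.07, 10):.2f}")  # x1.97
    print(f"Years to double at 7%: {doubling_time(0.07):.1f}")  # 10.2
```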

Dealing with human psychology

The book recommends the following system:

Do not follow any financial news. If you hear something good/bad, doing nothing to your portfolio is statistically the best thing to do.
Invest your monthly savings. Always buy the ETFs that are currently below their target percentage in your portfolio.
Rebalance every 6-12 months. Sell ETFs that are above their target percentage, and buy those that are below, to get them to roughly the target percentages.

With the above approach, you will often buy an ETF when its price is low and sell it when its price is high.
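The two mechanical rules above (invest monthly savings into what is below target, rebalance periodically) can be sketched in a few lines of Python. This is a simplified illustration, not something taken from the book: it puts the whole monthly contribution into the single most underweight ETF, ignores fees and taxes, and the ETF names and amounts are made up.

```python
def weights(holdings):
    """Current share of each ETF in the total portfolio value."""
    total = sum(holdings.values())
    return {name: value / total for name, value in holdings.items()}


def most_underweight(holdings, targets):
    """ETF whose current weight is furthest below its target weight."""
    current = weights(holdings)
    return min(targets, key=lambda name: current[name] - targets[name])


def rebalance_trades(holdings, targets):
    """Amount to buy (positive) or sell (negative) to hit the targets exactly."""
    total = sum(holdings.values())
    return {name: targets[name] * total - holdings[name] for name in targets}


if __name__ == "__main__":
    holdings = {"stocks": 8000.0, "bonds": 2000.0}  # made-up values
    targets = {"stocks": 0.75, "bonds": 0.25}
    print(most_underweight(holdings, targets))  # bonds
    print(rebalance_trades(holdings, targets))  # {'stocks': -500.0, 'bonds': 500.0}
```

In the example, stocks have grown to 80% of the portfolio, so the monthly savings go into bonds, and a full rebalance would sell 500 of stocks and buy 500 of bonds.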

How I put this into practice

I live in the European Union, so I bought Euro-based ETFs domiciled in Ireland.

The stocks represent 75% of my portfolio, and include the following ETFs:

Vanguard FTSE Developed World UCITS ETF. 75% of the stocks.
Vanguard FTSE Emerging Markets UCITS ETF. 15% of the stocks.
SPDR Dow Jones Global Real Estate UCITS ETF. 10% of the stocks.

The bonds represent 25% of my portfolio, and include the following ETFs:

JPM BetaBuilders EUR Govt Bond 1-3 yr UCITS ETF. 25% of the bonds.
iShares $ Treasury Bond 1-3yr UCITS ETF. 25% of the bonds.
iShares Global Inflation Linked Govt Bond UCITS ETF. 50% of the bonds.
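Since the percentages above are nested (a share of the stocks or of the bonds, which are in turn a share of the portfolio), it can help to multiply them out into target weights for the whole portfolio. A short Python sketch of that arithmetic, using the splits listed above:

```python
STOCK_SHARE = 0.75  # stocks as a share of the whole portfolio
BOND_SHARE = 0.25   # bonds as a share of the whole portfolio

STOCKS = {
    "Vanguard FTSE Developed World": 0.75,
    "Vanguard FTSE Emerging Markets": 0.15,
    "SPDR Dow Jones Global Real Estate": 0.10,
}
BONDS = {
    "JPM BetaBuilders EUR Govt Bond 1-3 yr": 0.25,
    "iShares $ Treasury Bond 1-3yr": 0.25,
    "iShares Global Inflation Linked Govt Bond": 0.50,
}


def overall_weights():
    """Target weight of each ETF as a share of the whole portfolio."""
    result = {name: STOCK_SHARE * share for name, share in STOCKS.items()}
    result.update({name: BOND_SHARE * share for name, share in BONDS.items()})
    return result


if __name__ == "__main__":
    for name, weight in overall_weights().items():
        print(f"{weight:7.2%}  {name}")
```

This gives 56.25%, 11.25% and 7.5% for the three stock ETFs, and 6.25%, 6.25% and 12.5% for the three bond ETFs.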

To manage my portfolio, I use an investment platform called Interactive Brokers, since it allows me to buy almost any ETF, is reliable, and has low transaction fees. There are many other investment platforms - please do your own research.

Interactive Brokers has a referral program, which matches your initial investment up to $1000. You can use my referral link, but please don’t feel like you need to.

FAQ

Should I buy crypto? Crypto is generally not linked to an underlying asset, and its markets are relatively unregulated, which invites a substantial amount of fraud. In addition, the crypto market is too new for there to be a proven long-term passive investment strategy.
Should I buy gold? In the long term, the price of gold is fairly stable after adjusting for inflation, which means that holding gold is, in the long term, worse than investing as described in this blog post, assuming the average inflation is lower than 7%.
Should I buy individual stocks/smaller ETFs? By doing this, you’re trying to beat the market as a whole, so you will, statistically speaking, likely do worse in the long term than following the described investment strategy.

Conclusion

I hope this post has given you a better understanding of how you can invest your savings, and helps you find an effective and stress-free way to grow your portfolio.

If you found this post helpful, feel free to share it with your friends and subscribe to my blog below. You can also follow me on X or LinkedIn. If you have any thoughts or questions, leave a comment.

Appendix: Terminology

Share/Stock. A share in the ownership of a company.
Dividends. Distribution of a company’s profits to its shareholders.
Bond. A loan to the issuing company/country for some period that results in interest at the end of the period (maturity date).
Stock Exchange. A market where stocks are bought and sold, e.g. New York Stock Exchange (NYSE).
Publicly Traded Company. A company whose stocks can be bought and sold by the general public.
[Market] Index. A weighted list of companies whose stocks are publicly traded at stock exchanges. The list is normally updated according to a predefined formula, e.g. the S&P 500 is an index that tracks 500 of the largest US publicly traded companies.
ETF (Exchange Traded Fund). A fund that normally tracks an index and is traded at a stock exchange. There can be many ETFs for a single index.
Bond ETF. An ETF composed of bonds where bonds of a certain type are continually bought (early in their lifetime) and sold (late in their lifetime) to achieve a result equivalent to doing so manually yourself.
Fund Manager. A company that creates and manages funds, including ETFs.
Management Fee. A percentage fee that the fund manager charges annually to manage a fund.
Investment Platform. A company that holds cash, stocks, bonds, etc. on your behalf, and normally allows you to quickly and easily trade.

The Birth of AI Operating Systems

Rok Strniša — Wed, 05 Apr 2023 18:28:00 GMT

A thinking humanoid robot. (Generated with Midjourney.)

macOS, Ubuntu and Windows are popular operating systems (OS) for desktop users. There are also OSes for mobile phones, cloud servers, etc. Let’s refer to these as “standard OSes”.

With the advent of large language models, such as GPT-4 used by ChatGPT, “AI operating systems” (AI OSes) become possible. In this post, I explain what I mean by this, what an architecture of an AI OS could look like, and where I think the next steps are.

Background

GPT (Generative Pre-trained Transformer) is a large language model (LLM) that is very good at doing one specific task: “finding relevant word sequences (tokens) that follow a particular piece of text.” It is based on the work by Vaswani et al., e.g. Attention Is All You Need. The models are trained on vast amounts of data from many sources and are then fine-tuned to work even better for certain types of tasks, e.g. summarization, translation and answering questions.

While the models are massive (tens of GB), their “context window”, i.e. short-term memory, is comparatively very small. For example, the standard GPT-4 model has a context window of 8K “tokens”, which corresponds to ~32K characters or ~6K words in English. This may seem like a lot, but it quickly becomes the limiting factor when doing more complex tasks, e.g. summarizing/generating long text.

LLMs can also be "fine-tuned" (i.e., trained further on custom data) to increase their knowledge base. Fine-tuning is a handy way to reduce the context-window burden mentioned above, because the LLM is specialised for specific tasks in advance instead of being given everything in the prompt. Unfortunately, at the time of writing, OpenAI does not support fine-tuning GPT-4.

It’s also worth noting that the text generation of these models is still quite slow. For example, a request to the GPT-4 API currently takes on the order of seconds or tens of seconds (depending on the length of its response). This is a very long time compared to the milliseconds or even microseconds it often takes to complete a task on a local machine.

Giving It Wings

By itself, an LLM “only” generates text. However, we can use this ability to:

Give it a goal to perform some high-level tasks.
Tell it that it can use some predefined set of actions to achieve these tasks.
Repeat until the goal is complete:
1. Ask it to create a plan to achieve the tasks and create a list of actions to run.
2. Execute the actions on its behalf.
3. Tell it the results of the actions.
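The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code of any of the tools mentioned in this post: the planner is a plain function standing in for the LLM call, a plan is a list of (action name, argument) pairs, and an empty plan means the goal is complete.

```python
def run_agent(goal, planner, actions, max_rounds=10):
    """Plan, execute and report back until the planner returns no more actions."""
    history = []  # (action name, argument, result) triples shown to the planner
    for _ in range(max_rounds):
        plan = planner(goal, history)  # 1. ask for a list of actions to run
        if not plan:
            break  # an empty plan signals that the goal is complete
        for name, argument in plan:
            try:
                result = actions[name](argument)  # 2. execute on its behalf
            except Exception as error:
                result = f"error: {error}"  # failures are results too
            history.append((name, argument, result))  # 3. report the result
    return history


def scripted_planner(goal, history):
    """Stand-in for the LLM: asks for one action, then declares the goal done."""
    return [("shout", goal)] if not history else []


if __name__ == "__main__":
    print(run_agent("hello", scripted_planner, {"shout": str.upper}))
    # [('shout', 'hello', 'HELLO')]
```

The max_rounds limit is there because a real planner may never decide that it is done.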

I wrote a proof of concept of the above approach: How I Got ChatGPT to Write Complete Programs. Around the same time, similar proofs of concept were created (e.g. AutoGPT, BabyAGI and my own RoboGPT). These are still experimental and not very reliable attempts at creating (semi-)Autonomous AI Agents (A3), but they show promise in the general approach.

I view these experiments as precursors of well-designed, robust, and extensible AI operating systems that can reliably perform a wide variety of complex tasks.

The AI Operating System

The architecture of the systems built so far is in some ways quite similar to the architecture of the standard OSes.

There are still many unknowns, but the main components of an AI Operating System appear to be:

1. Kernel. Manages short-term and long-term storage, AI and non-AI processes, and inter-process communication.
2. Plug-in System. A system that allows the user to add/remove the capabilities of the system as a whole. For example, one could have a “file system” plug-in, which allows the system to perform actions such as “list a directory”, “read a file” and “write a file”.
3. [Task] Planner(s). This is an AI process that is responsible for evaluating the current goal(s), task(s) and the intermediate results, generating new tasks (if any), and prioritizing the remaining tasks. The key here is to divide and conquer, e.g. a single user-provided goal is continually split into smaller and smaller tasks until all of them can be performed by the available actions.
4. [Action] Runner(s). Once a task has been reduced to specific actions that the system knows how to execute, the runner runs these actions and returns their result (which could be a failure).
5. User Interface (UI). A way for the user to enter the initial goal, to see the intermediate and final result, and (optionally) to interact with the system as it is doing the work; this could include updating the goal, suggesting alternative tasks, confirming proposed actions before they are executed, etc.

A few things to note about the above items:

The plug-in system mentioned in (2) is used to add capabilities to the AI Operating System directly, allowing it to execute different types of actions. The plug-in ecosystem lives outside of any LLM. In comparison, the ChatGPT plug-in system lives within the OpenAI ecosystem and is used to augment the capabilities of ChatGPT itself.
Each plug-in defines the name of each action, what syntax it has (these are instructions for the LLM for how to write the text for the action to run), the parser (this is a small program that converts text written by the LLM to a logical action to run), and the runner spec (this defines what to do when running the action). There is probably a way to use a standard parser generator, in which case the syntax and the parser are replaced with a grammar definition.
The action can itself run an AI process, e.g. to summarize some text. However, this is irrelevant to the overall architecture, since these AI processes should be completely isolated from the Planner(s).
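To make the plug-in description more concrete, here is a minimal Python sketch of an action made of a name, syntax instructions, a parser and a runner, plus a registry that lets capabilities be added and removed. All names here (Action, PluginRegistry, READ_FILE, the in-memory file store) are hypothetical and chosen for this sketch; they are not taken from RoboGPT or any other tool.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    name: str
    syntax: str  # instructions for the LLM on how to write this action
    parser: Callable[[str], Optional[dict]]  # LLM text -> logical action, or None
    runner: Callable[[dict], str]  # what to do when running the action


class PluginRegistry:
    """Holds the actions that the system as a whole can currently perform."""

    def __init__(self):
        self._actions = {}

    def register(self, action):
        self._actions[action.name] = action

    def unregister(self, name):
        self._actions.pop(name, None)

    def syntax_help(self):
        """Text to include in the prompt so the LLM knows what it can do."""
        return "\n".join(action.syntax for action in self._actions.values())

    def run(self, text):
        """Run the first action whose parser accepts the text."""
        for action in self._actions.values():
            parsed = action.parser(text)
            if parsed is not None:
                return action.runner(parsed)
        return "error: no action matches"


# A toy "file system" plug-in backed by an in-memory dictionary.
FILES = {"notes.txt": "hello"}


def parse_read_file(text):
    match = re.fullmatch(r"READ_FILE (\S+)", text.strip())
    return {"path": match.group(1)} if match else None


def run_read_file(parsed):
    return FILES.get(parsed["path"], "error: file not found")


if __name__ == "__main__":
    registry = PluginRegistry()
    registry.register(
        Action("read_file", "READ_FILE <path>", parse_read_file, run_read_file)
    )
    print(registry.syntax_help())  # READ_FILE <path>
    print(registry.run("READ_FILE notes.txt"))  # hello
```

Because the registry lives outside of any LLM, swapping the model does not affect which actions are available.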

Here is what the high-level AI Operating System architecture could look like:

A high-level diagram of the AI Operating System architecture.

Key Challenges

There are still many challenges and uncertainties in developing AI Operating Systems.

For example, for the Planner(s) to be able to have sufficient context to make effective decisions, the context window of the LLM(s) used must contain the relevant goal, tasks, available actions and the results so far. This can be particularly challenging when dealing with complex goals.
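One simple (and lossy) way to stay within the context window is to always keep the goal and the list of available actions, and then add only as many of the most recent results as fit. The Python sketch below illustrates this under rough assumptions: tokens are estimated at about 4 characters each, in line with the 8K tokens to ~32K characters ratio mentioned earlier, and the function names are made up for this example. A real system would use the model's tokenizer and would likely summarize old results rather than drop them.

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb for English text


def estimate_tokens(text):
    """Approximate token count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def build_context(goal, actions, results, budget_tokens):
    """Keep the goal and actions, then the newest results that fit the budget."""
    fixed = [goal, actions]
    used = sum(estimate_tokens(part) for part in fixed)
    kept = []
    for result in reversed(results):  # walk from newest to oldest
        cost = estimate_tokens(result)
        if used + cost > budget_tokens:
            break
        kept.append(result)
        used += cost
    kept.reverse()  # restore chronological order
    return fixed + kept


if __name__ == "__main__":
    results = ["x" * 40, "y" * 40, "z" * 40]  # 10 estimated tokens each
    context = build_context("g" * 40, "a" * 40, results, budget_tokens=40)
    print(len(context))  # 4: goal, actions, and the two newest results
```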

Another challenge for the Planner(s) is to reliably parse the text of the suggested actions. While LLM(s) have become very good at following specific syntax guidelines, mistakes can still occur. For example, GPT-4 sometimes still has difficulty escaping characters correctly, which is why using JSON to convey code is not ideal.
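The escaping problem is easy to see with a small Python example. Wrapping code in JSON forces every quote, line break and backslash in the code to be escaped, and the LLM has to get each of those right. A delimiter-based block lets the code through unchanged. The block format below (a CODE header and an END_OF_CODE terminator) is a made-up format for this sketch, not the syntax used by any particular tool.

```python
import json

# Code the LLM might want to send: it contains quotes, a real line break,
# and a literal backslash sequence.
SNIPPET = r'''name = "world"
print(f"Hello, {name}!\n")'''


def as_json(code):
    """Encode the code inside JSON; quotes, newlines and backslashes get escaped."""
    return json.dumps({"action": "CreateFile", "contents": code})


def wrap_block(code, delimiter="END_OF_CODE"):
    """Wrap the code in a delimiter-based block, leaving it byte-for-byte intact."""
    if delimiter in code:
        raise ValueError("delimiter must not appear in the code")
    return f"CODE\n{code}\n{delimiter}"


def unwrap_block(text, delimiter="END_OF_CODE"):
    """Recover the code from a block produced by wrap_block."""
    header, _, rest = text.partition("\n")
    body, found, _ = rest.rpartition(f"\n{delimiter}")
    if header != "CODE" or not found:
        raise ValueError("malformed block")
    return body


if __name__ == "__main__":
    print(as_json(SNIPPET))     # the code is full of \" and \\ escapes
    print(wrap_block(SNIPPET))  # the code appears exactly as written
```

The trade-off is that the delimiter must never occur in the payload, which is why wrap_block checks for it.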

Most of the actions run by the Runner(s) should be pretty simple. However, some may require expressing both the request and the result in an efficient manner that still conveys sufficient information back to the Planner(s). This may be difficult in some cases.

Final Thoughts

The field of (semi-)Autonomous AI Agents (A3) is evolving fast. In this post, I proposed a general architecture for AI Operating Systems that could be used to run these agents. A future post may cover the plug-in system in more detail.

I am currently working on improving RoboGPT based on the architecture proposed in this blog post.

If you have any suggestions on what to try next, or would simply like to stay up to date with my work, follow/DM me on X (Twitter) or LinkedIn.

Thank you for your support and interest in my work!

How I Got ChatGPT to Write Complete Programs

Rok Strniša — Sun, 02 Apr 2023 21:48:00 GMT

A humanoid robot writing with a pen and paper. (Generated with Midjourney.)

Large language models (LLMs) have recently taken the world by storm. You are probably most familiar with ChatGPT, an artificial-intelligence (AI) chatbot developed by OpenAI, which uses the language models also developed by OpenAI. These models are called GPT and the latest version is GPT-4.

With every new version, the language models are able to accomplish more complex tasks. For example, they can summarize large texts, create new ones based on complex requirements, write code, give suggestions about anything and now even do your taxes.

It’s worth noting that these language models can and do make mistakes. They are trained on imperfect data, the model itself may have remembered some details incorrectly (similar to JPEG encoding artifacts), and the instructions given by the user may be ambiguous or misleading. Each of these can lead to incorrect output; however, based on my experience, if the user specifies the task well, mistakes are already quite rare.

The Idea: Empowering ChatGPT

For the recent release of GPT-4, the official launch video included a demo of writing code, and suggesting changes to the code or the development environment in order to make something work. This is where I got an idea.

“What if ChatGPT was given the power to make these changes automatically?”

If ChatGPT had indirect access to the developer’s computer, it would be able to perform these tasks by itself, without the user having to execute the tasks suggested by ChatGPT. But how can it do so, if it’s “just a large language model”?

We could write a small program that makes requests to ChatGPT, reads what ChatGPT says, and does that. The same program can tell ChatGPT what happened as a result.

The Experiment: Creating a Proof Of Concept

I wrote detailed instructions to GPT-4 about the syntax for the following actions:

Terminal - run a specific command in the command-line prompt (e.g. Terminal). This action is very powerful, e.g. it can be used to install dependencies.
CreateFile - create a new file with specific contents. This action could be simulated through the Terminal action, but doing so requires further escaping, which makes the task unnecessarily difficult for GPT.
ReplaceFile - update the contents of an existing file.
ReplaceLine - replace the contents of a specific line in an existing file.
EndOfActions - signify that all actions are complete.

Then, I implemented a command-line tool that can parse these actions according to the specified syntax, and is able to execute them on the developer’s machine.
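As a rough idea of what the executing half of such a tool involves, here is a minimal Python sketch. It is not the actual tool: the syntax I gave to GPT-4 is not reproduced in this post, so the sketch starts from already-parsed actions represented as dictionaries, and the field names (action, path, contents, line, command) and the 1-based line numbering are assumptions.

```python
import subprocess
import tempfile
from pathlib import Path


def execute(root, action):
    """Execute one parsed action inside the project directory 'root'."""
    kind = action["action"]
    if kind == "Terminal":
        done = subprocess.run(action["command"], shell=True, cwd=root,
                              capture_output=True, text=True, timeout=60)
        return done.stdout + done.stderr
    if kind in ("CreateFile", "ReplaceFile"):
        path = Path(root) / action["path"]
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(action["contents"])
        return f"wrote {action['path']}"
    if kind == "ReplaceLine":
        path = Path(root) / action["path"]
        lines = path.read_text().splitlines()
        number = action["line"]  # assumed to be 1-based
        if not 1 <= number <= len(lines):
            return "error: line number out of range"
        lines[number - 1] = action["contents"]
        path.write_text("\n".join(lines) + "\n")
        return f"replaced line {number} of {action['path']}"
    raise ValueError(f"unknown action: {kind}")


def run_actions(root, actions):
    """Run actions in order and collect results, stopping at EndOfActions."""
    results = []
    for action in actions:
        if action["action"] == "EndOfActions":
            break
        results.append(execute(root, action))
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(run_actions(root, [
            {"action": "CreateFile", "path": "hello.py", "contents": "print('hi')\n"},
            {"action": "ReplaceLine", "path": "hello.py", "line": 1,
             "contents": "print('hello')"},
            {"action": "EndOfActions"},
        ]))
```

The collected results are what gets reported back to the model, so that it can decide what to do next. Running arbitrary commands suggested by a model is risky, so a real tool should at least run inside a sandbox or ask for confirmation.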

The Demo: A HackerNews Story Summarizer

Here is a quick demonstration:

The above video shows how I describe a relatively simple program to the tool, giving the following description:

Write a command line program called HackerNews Summarizer, which fetches the contents of the top 3 stories on HackerNews, and summarizes the contents. When the program gets the URL of a story through HackerNews API, it should fetch the HTML from the URL, convert it to text only, and then summarize the first 5000 characters of this text into a single paragraph using OpenAI's text completion with model "text-davinci-003". Quote the text that you pass to OpenAI using the triple backticks (```) syntax. For each story, the program prints its headline, its URL and its summary. Assume that the OpenAI API key is already available through the environment variable OPENAI_API_KEY.

This tool then sets up the environment, installs appropriate dependencies, writes files containing source code for the target program, runs the program and checks that the output looks reasonable. Here is the source code it wrote:

import os
import requests
from bs4 import BeautifulSoup
import openai

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
HN_API_BASE = "https://hacker-news.firebaseio.com/v0"

def get_top_stories():
    response = requests.get(f"{HN_API_BASE}/topstories.json")
    return response.json()

def get_story_details(story_id):
    response = requests.get(f"{HN_API_BASE}/item/{story_id}.json")
    return response.json()

def fetch_content(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    return soup.get_text()

def summarize_content(content):
    openai.api_key = OPENAI_API_KEY
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=f"Summarize the following content in a single paragraph:\n```{content[:5000]}```\n",
        max_tokens=100,
        n=1,
        temperature=0.5,
    )
    return response.choices[0].text.strip()

def main():
    story_ids = get_top_stories()[:3]

    for story_id in story_ids:
        story = get_story_details(story_id)
        if "url" in story:
            content = fetch_content(story["url"])
            summary = summarize_content(content)
            print(f"Title: {story['title']}\nURL: {story['url']}\nSummary: {summary}\n")

if __name__ == "__main__":
    main()

It does all of this in roughly 1m30s (parts of the video where the facilitator is waiting for ChatGPT are cut out).

Conclusion

Currently, when tools like GitHub Copilot (which also uses GPT) suggest a few lines of code, the developer checks if those lines look reasonable. This can already greatly increase developer productivity.

With the approach described in this post, the idea is for a large language model to iterate on a larger task itself; the developer ideally only needs to check the final result. The proof of concept shown above is still quite limited, but it shows potential.

If you have any suggestions on what to try next, or would simply like to stay up to date with my work, follow/DM me on X (Twitter) or LinkedIn.