
Ukoni: Architecting an agent-driven interface

I built the CRUD powering Ukoni (https://ukoni.app) and then tried to use it for the first time to plan our shopping this weekend. It was a pretty painful experience. There were usability issues, such as typing in a search term and then having to type it again to create the item if it doesn't exist; data model issues, such as not being able to add a product (rather than a product variant) to the shopping list; some mobile responsiveness problems; and some missing features, such as creating meal plans and then generating shopping lists from them. After writing out this laundry list, it might seem like the app is complete garbage, but Rome wasn't built in a day; they were laying bricks every hour.

I think the biggest takeaway though (and it's something I alluded to in my introductory note for this project) is that a traditional UX sucks for this kind of thing.

As I put it in that note: "Another very important factor is the interface. All that I've listed out here was covered by the Notion database template I mentioned above. However, it fell into disuse because keeping it up to date was incredibly tedious. I'm keen to try to build a different, more natural language interface for this system, which should mean I don't abandon using it after two shopping trips."

I had to create each product and each variant as we went through the things to put on the list. Part of that is a cold-start problem and shouldn't need to be done to the same extent again, but it's still not a great way to interact with data. It's true that forms and form inputs are how we've entered and edited most data for the longest time, but that has always been a process we've adapted to rather than the ideal interaction.

It reminds me of something my Human-Computer Interaction (HCI) lecturer at university said: cars were originally designed with manual transmission as the way to drive them, but automatic transmission was a much better interaction, and ultimately driverless cars would be an even better interaction pattern. The constraints the builders of those eras worked under meant they couldn't go straight from inventing the combustion engine to driverless cars.

Now that we can use natural language even for structured input, it opens up a much better user experience and better affordances for users. I've already started seeing this in the wild, even before the terminology of agents took off. In 2023, for example, I completed referencing checks for a tenancy application via a chatbot on Goodlord. Best referencing check I've had to do.

So, after successfully building my first LLM-enhanced app the other day with puzzle generation on 8Words, I'm going to attempt to build a natural-language-driven input for Ukoni.

Current Flow

The current flow is a typical CRUD application flow. Let's walk through creating a shopping list as an example.

sequenceDiagram
    actor User
    participant UI as Web / Mobile UI
    participant API
    participant DB as Database
    User->>UI: Log in
    UI->>API: Authenticate user
    API-->>UI: Auth success
    User->>UI: Navigate to Shopping Lists
    UI->>API: Fetch shopping lists
    API->>DB: Get lists for user
    DB-->>API: Lists
    API-->>UI: Lists
    User->>UI: Create new shopping list
    UI->>API: Create list (name)
    API->>DB: Create shopping list
    DB-->>API: List created
    API-->>UI: List details
    loop For each item
        User->>UI: Add item
        UI->>API: Search product
        API->>DB: Search products
        DB-->>API: Search results
        API-->>UI: Results
        alt Product found
            User->>UI: Select product
            alt Variant known and found
                User->>UI: Select variant
            else Variant known but not found
                User->>UI: Create variant
                UI->>API: Create variant (name, properties)
                API->>DB: Create variant
                DB-->>API: Variant created
                API-->>UI: Variant
            else Variant not specified
                UI->>UI: Use generic/default variant
            end
        else Product not found
            User->>UI: Create product
            UI->>API: Create product
            API->>DB: Create product
            DB-->>API: Product created
            API-->>UI: Product
            alt Variant known
                User->>UI: Create variant
                UI->>API: Create variant
                API->>DB: Create variant
                DB-->>API: Variant created
                API-->>UI: Variant
            else Variant not specified
                UI->>UI: Use generic/default variant
            end
        end
        User->>UI: Add item properties (outlet, notes)
        UI->>API: Add item to list
        API->>DB: Create list item
        DB-->>API: Item added
        API-->>UI: Updated list
    end
    User->>UI: Shopping list complete

Somewhat complex, but also a fairly vanilla data-driven system.

Natural Language Flow

Ideally, we could remove most of the back and forth with the user and replace it with natural language, so that it's a much simpler flow.

sequenceDiagram
    actor User
    participant UI as Web / Mobile UI
    participant API
    participant Agent as NL Agent
    participant DB as Database
    User->>UI: Log in
    UI->>API: Authenticate user
    API-->>UI: Auth success
    User->>UI: Enter shopping list in natural language
    UI->>API: Submit NL prompt
    API->>Agent: Extract list name + items
    Agent-->>API: Structured intent (list, items, variants)
    API->>DB: Check if list exists (by name)
    alt List exists
        DB-->>API: Existing list id
    else List does not exist
        API->>DB: Create shopping list
        DB-->>API: New list id
    end
    loop For each extracted item
        API->>DB: Check product existence
        alt Product exists
            DB-->>API: Product (+ variants)
        else Product does not exist
            API->>DB: Create product
            DB-->>API: Product created
        end
        alt Variant specified
            API->>DB: Check / create variant
            DB-->>API: Variant
        else No variant specified
            API->>API: Use canonical / default variant
        end
        API->>API: Stage list item (not committed)
    end
    API-->>UI: Proposed list changes (preview)
    User->>UI: Review & edit items
    UI->>API: Confirm changes
    API->>DB: Commit staged items
    DB-->>API: Items committed
    API-->>UI: Updated shopping list

With this, while the underlying data model complexity is exactly the same, the rigour of adding all that structured data is mostly hidden from the user.
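
To make that concrete, here is roughly the kind of structured intent the agent would need to extract before any CRUD calls happen. This is a sketch: the field names are my own illustration, not Ukoni's actual schema.

// Illustrative sketch only: the field names are assumptions, not Ukoni's actual schema.
// A prompt like "Weekend shop: 2 litres of semi-skimmed milk, sourdough bread and
// dishwasher tablets from Tesco" might be extracted into something like this.
type ExtractedIntent = {
  listName: string;
  items: {
    product: string;
    variant?: string;   // omitted when the user didn't specify one
    quantity?: number;
    outlet?: string;
    notes?: string;
  }[];
};

const example: ExtractedIntent = {
  listName: "Weekend shop",
  items: [
    { product: "Milk", variant: "Semi-skimmed 2L", quantity: 1, outlet: "Tesco" },
    { product: "Bread", variant: "Sourdough" },
    { product: "Dishwasher tablets" }, // no variant: fall back to a generic/default variant
  ],
};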

Existing component diagram

I'm historically bad at diagrams in general, so forgive these, but hopefully they convey the idea. In the classic flow, the client hits the server with CRUD actions the user has set up through the user interface, and the server then performs those actions against a database.

flowchart LR
    U[User]
    WC[Web Client]
    API[API Server]
    DB[(Postgres DB)]
    U -->|CRUD actions| WC
    WC -->|CRUD actions| API
    API -->|CRUD actions| DB
    DB -->|Read results| API
    API -->|Responses| WC
    WC -->|UI updates| U

Proposed component diagram

In this proposed flow, the client hits a so-called "agent server" with a natural language payload in which the user has described the actions they want taken. The agent server communicates with an LLM over an API (or does local inference) to extract structure from the user's prompt, and then calls CRUD actions against the API server, which eventually persists the changes to the database.

flowchart LR
    U[User]
    WC[Web Client]
    AS[Agent Server]
    API[API Server]
    DB[(Postgres DB)]
    U -->|Natural language| WC
    WC -->|Enhanced prompt| AS
    AS -->|CRUD actions| API
    API -->|CRUD actions| DB
    DB -->|Read results| API
    API -->|Responses| AS
    AS -->|Proposed changes / summaries| WC
    WC -->|UI updates| U
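
As a rough sketch of that boundary, the agent server could be little more than a single endpoint that accepts the natural language payload, runs the agent loop (sketched further down) and returns the proposed changes for the client to preview. The route, payload shape and runAgentLoop helper here are all hypothetical.

// Hypothetical sketch of the agent server's HTTP boundary, using Express.
// The route, payload shape and runAgentLoop helper are illustrative, not Ukoni's real code.
import express from "express";
import { runAgentLoop } from "./agent"; // the agent loop, sketched later in this post

const app = express();
app.use(express.json());

app.post("/agent/shopping-lists", async (req, res) => {
  const { prompt, userToken } = req.body;

  // The loop calls the LLM with tool definitions and executes any tool calls
  // against the existing API server, then returns a set of proposed changes.
  const proposal = await runAgentLoop(prompt, userToken);

  // Nothing is committed yet; the web client shows this preview for confirmation.
  res.json({ proposal });
});

app.listen(3001);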

The Agent Server

"So called" because that's how I described it above. It's a harness for the agent in the same way that applications like Claude Code or OpenCode are: they orchestrate calls to the model, enhance prompts and manage context. Our harness will be far simpler than the illustrious examples I've cited, but I think the idea is similar.

I've learned a lot about this from reading GC Nwogu's Anatomy of an AI Agent (which inspired me to explore this) as well as Thorsten Ball's How to build an AI agent (which I read long ago and never acted on — until now). So a lot of the ideas for how the agent will work are inspired by them.

As Thorsten writes in that article, an agent is:

an LLM with access to tools, giving it the ability to modify something outside the context window.

In July 2025, I attended Georges Haidar's talk "Bridging the gap between your API and LLMs" at the Stripe London Developer Meetup. That was the moment the concept of tools clicked for me: a tool is just an API plus a description the LLM can use. So we'll essentially be converting our entire REST API surface into callable tools for the agent. The harness effectively becomes another client of the API, which keeps the mental model simple and doesn't require much change on the API side.
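
Concretely, a single endpoint could map to one tool: a name, a description the model can reason about, a JSON Schema for the arguments, and a function that just calls the API like any other client would. The endpoint, URL and schema below are made up for illustration.

// Hypothetical example of one REST endpoint exposed as a tool.
// The URL and schema are made up for illustration, not Ukoni's real API.
const createShoppingListTool = {
  name: "create_shopping_list",
  description: "Create a new shopping list for the authenticated user.",
  parameters: {
    type: "object",
    properties: {
      name: { type: "string", description: "Name of the shopping list" },
    },
    required: ["name"],
  },
  // When the model calls this tool, the harness is just another API client.
  execute: async (args: { name: string }, token: string) =>
    fetch("https://api.ukoni.app/shopping-lists", {
      method: "POST",
      headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
      body: JSON.stringify(args),
    }).then((r) => r.json()),
};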

We'll use the OpenAPI spec to programmatically generate these tools, so that we don’t have to worry about drift between our API changes and the model calls. In the future, this could evolve into an MCP server for the app, but for now we’ll do something simpler with function calling from within our agent server.
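
A first pass at that generation could just walk the spec's paths and turn each operation into a tool definition. The sketch below assumes each operation has an operationId, a summary and a JSON request body; a real version would also need to handle path and query parameters, auth and $ref resolution.

// Rough sketch of deriving tool definitions from a parsed OpenAPI document.
// Assumes operationId, summary and a JSON request body exist on each operation;
// path/query parameters, auth and $ref resolution are left out.
type ToolDefinition = {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for the arguments
};

function toolsFromOpenApi(spec: any): ToolDefinition[] {
  const tools: ToolDefinition[] = [];
  for (const [path, operations] of Object.entries<any>(spec.paths ?? {})) {
    for (const [method, op] of Object.entries<any>(operations)) {
      tools.push({
        name: op.operationId ?? `${method}_${path.replace(/\W+/g, "_")}`,
        description: op.summary ?? `${method.toUpperCase()} ${path}`,
        parameters:
          op.requestBody?.content?.["application/json"]?.schema ??
          { type: "object", properties: {} },
      });
    }
  }
  return tools;
}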

Function calling works by specifying a set of functions, their descriptions and their arguments as tools, and passing those to the model along with a prompt. The model can then respond plainly or with a tool call that names one of your defined functions along with arguments. You parse this and run the function, then pass the result back to the model, and the process continues until the task is complete.

sequenceDiagram
    participant Agent as Agent Server
    participant Model as LLM
    participant Tool as Function / Tool
    Agent->>Model: Prompt + tool definitions
    alt Plain response
        Model-->>Agent: Natural language response
    else Tool call
        Model-->>Agent: Tool call (function + arguments)
        Agent->>Tool: Execute function
        Tool-->>Agent: Function result
        Agent->>Model: Result as new context
        loop Until task complete
            Model-->>Agent: Next response or tool call
        end
    end
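
A minimal version of that loop, assuming the OpenAI Node SDK, the toolsFromOpenApi sketch above and a hypothetical executeTool helper that dispatches each call to the REST API, might look something like this. It's a sketch, not the finished harness: error handling, streaming and context management are all omitted.

// Minimal sketch of the loop above, using the OpenAI Node SDK.
// toolsFromOpenApi is sketched earlier; executeTool and the parsed spec are assumed
// to exist elsewhere in the agent server.
import OpenAI from "openai";
import { toolsFromOpenApi } from "./tools"; // the OpenAPI-derived tools, sketched above

declare const spec: unknown; // the parsed OpenAPI document, loaded at startup
declare function executeTool(name: string, args: unknown, token: string): Promise<unknown>;

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

export async function runAgentLoop(prompt: string, token: string) {
  const messages: OpenAI.ChatCompletionMessageParam[] = [{ role: "user", content: prompt }];
  const tools = toolsFromOpenApi(spec).map((t) => ({
    type: "function" as const,
    function: { name: t.name, description: t.description, parameters: t.parameters },
  }));

  // Cap the number of turns so a confused model can't loop forever.
  for (let turn = 0; turn < 10; turn++) {
    const response = await client.chat.completions.create({
      model: "gpt-4o-mini",
      messages,
      tools,
    });

    const message = response.choices[0].message;
    messages.push(message);

    // Plain response: the model is done, so return its answer.
    if (!message.tool_calls?.length) return message.content;

    // Tool calls: execute each one against the API and feed the result back as context.
    for (const call of message.tool_calls) {
      const result = await executeTool(
        call.function.name,
        JSON.parse(call.function.arguments),
        token,
      );
      messages.push({ role: "tool", tool_call_id: call.id, content: JSON.stringify(result) });
    }
  }

  throw new Error("Agent did not finish within the turn limit");
}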

I intend to support multiple providers with a “bring your own API key” policy. The good news is that the Google Gemini API and the OpenAI API (the two I'm starting with) both support function calling with fairly similar structures.
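
The wrapping differs slightly per provider, but the core of each tool (name, description, JSON-Schema-like parameters) stays the same, so the harness can keep one internal tool shape and adapt it at the edge. The fragments below follow the providers' documented request formats as I understand them; treat them as approximate.

// The same tool definition wrapped for each provider's function calling format.
// Shapes follow the providers' documented request formats; treat as approximate.
const tool = {
  name: "create_shopping_list",
  description: "Create a new shopping list for the authenticated user.",
  parameters: {
    type: "object",
    properties: { name: { type: "string" } },
    required: ["name"],
  },
};

// OpenAI Chat Completions: tools is a list of { type, function } entries
const openAiTools = [{ type: "function", function: tool }];

// Google Gemini: tools is a list of { functionDeclarations } entries
const geminiTools = [{ functionDeclarations: [tool] }];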

Conclusion

I've got tired of writing, so I'm wrapping up here. Good plan. Now for the execution.

This is only the second model-driven application I'm making. Pretty excited to see how it turns out.