The last few years have seen an explosion of innovation within the technology sector, driven by the advent of widely accessible, commodity-priced Large Language Models (LLMs) such as OpenAI’s GPT family or Anthropic’s Claude models.
As I write this, we are a few days out from OpenAI’s announcement of its latest frontier model, o3, which has vaulted far into the lead on a number of serious reasoning and problem-solving benchmarks. While o3 is certainly not commodity-priced today, it shows that LLM capability is still advancing, with fertile ground for innovation beyond merely increasing learning capacity - the primary innovation of the Transformer architecture that spurred this wave of progress.
At the same time, many smart people are working on developing agentic architectures - frameworks for enabling semi-autonomous AI-powered agents to operate in tandem to accomplish tasks previously only possible for humans or groups of humans.
This article is not an in-depth exploration of how LLMs work, though I will perhaps cover that in future. Instead, I want to explore how LLMs can be used to accelerate the prototyping and testing phases of the product development lifecycle, and consider the implications for the wider industry.
Code babblers
LLMs can produce and complete large volumes of text that cohere with the textual data supplied as input. One great and immediate application of this is completing and automatically writing code in a variety of programming languages.
Software engineering is much more than just writing code. When developing software, I spend more time reading and thinking than I do writing, and perhaps more writing diagrams and documentation than actual code. Shaping and resolving ambiguity is the primary goal of most development, and implementing code is just one step in that cycle.
Nonetheless, LLMs are a terrific accelerant for delivery of working software, when used in the right environment and guided by the right hands. Early product prototyping needs software that works on the happy path - i.e. without error cases and edge cases covered - because the goal is to learn whether the ideal state is valuable enough to justify developing a complete solution. As such, the ratio of coding to thinking tips towards coding, and LLMs are therefore great at helping turn around scrappy early iterations very quickly.
At this point, I expect every strong developer to have some experience of how best to use an LLM to accelerate parts of their work. Though, as always, I implore engineers to ensure they understand every single line of code that they deploy to production - regardless of whether it was written by them or their AI assistant.
Fuzzy problem solvers
As LLMs can be invoked over an API, they are also easily used directly as part of the software stack. When employing a philosophy of modular software design (structuring complex systems as networks of smaller, coherent components that are ignorant of the wider context in which they are used), LLMs are a great choice for rapidly developing the first version of a component that can afford to be non-deterministic in its output. Some compromises must be made.
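As a sketch of what that modular setup might look like (all names here are illustrative, and the LLM call is stubbed with a trivial keyword heuristic so the example runs without any provider SDK), the key point is that the rest of the system depends only on an interface, so the LLM-backed component can be swapped out later:

```python
from typing import Protocol


class DocumentCategoriser(Protocol):
    """The interface the rest of the system depends on."""

    def categorise(self, text: str) -> str: ...


class LLMCategoriser:
    """First, non-deterministic version of the component, backed by an LLM."""

    def categorise(self, text: str) -> str:
        prompt = (
            "Categorise this document as 'invoice' or 'other'. "
            "Reply with one word.\n\n" + text
        )
        return self._call_llm(prompt)

    def _call_llm(self, prompt: str) -> str:
        # Stub standing in for a real API call to an LLM provider;
        # it inspects only the document portion of the prompt.
        doc = prompt.split("\n\n", 1)[1]
        return "invoice" if "invoice" in doc.lower() else "other"


def file_document(categoriser: DocumentCategoriser, text: str) -> str:
    # The caller knows nothing about LLMs - a custom deterministic
    # categoriser could be dropped in here without any other changes.
    return f"filed under: {categoriser.categorise(text)}"
```

Because `file_document` is written against the `DocumentCategoriser` protocol rather than the LLM implementation, the compromise on determinism stays contained inside one replaceable component.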
For example, imagine a system for categorising uploaded user documents correctly to allow for filing of financial information. In a traditional application, this might require extremely high accuracy in the detection and labelling of categories - say 100% accuracy needed for identifying invoices, with any errors risking fines for the customer during an audit process. Engineering a bulletproof solution to this would be extremely difficult with traditional software, requiring hundreds of edge cases for different types of invoices (layouts, languages, standards) and other unexpected issues (handling invoices that have small errors etc). This could literally be the work of hundreds of engineers for many years, and may prove ultimately insurmountable.
In the right modular setup, this is actually a great use for an LLM. Just ask the LLM to categorise every document, fine-tuning the prompt, the examples or the model weights to get the best accuracy you can, and build a strong, repeatable system for evaluating its performance. Say you get to 95% - that is certainly not good enough for a purely automated solution, but if you got there in a few days, you now have only the remaining 5% to deal with. Maybe your business makes more sense if you just employ human labour to handle these edge cases? As a bonus, over time you’ll build up more data that can help you refine the LLM and perhaps push that 95% higher. And if not - hey, your system is modular, so you can always swap the LLM out for a custom component further down the road.
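That loop - evaluate against a labelled set, automate the confident cases, queue the rest for humans - can be sketched in a few lines. Everything here is an assumption for illustration: the LLM call is stubbed, and the 0.9 confidence threshold is a made-up number you would tune against your own evaluation data.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    label: str
    confidence: float


def llm_classify(text: str) -> Prediction:
    # Stub standing in for a real LLM call that returns a label plus
    # some confidence signal (e.g. derived from token log-probs).
    if "invoice" in text.lower():
        return Prediction("invoice", 0.97)
    return Prediction("other", 0.60)


def evaluate(examples: list[tuple[str, str]]) -> float:
    """Repeatable accuracy check against a labelled evaluation set."""
    correct = sum(
        1 for text, label in examples if llm_classify(text).label == label
    )
    return correct / len(examples)


def route(text: str, threshold: float = 0.9) -> str:
    """Automate confident predictions; send the rest to a human queue."""
    pred = llm_classify(text)
    return pred.label if pred.confidence >= threshold else "human_review"
```

The documents that land in `human_review` are exactly the 5% you pay people to handle - and, usefully, each one they label becomes new evaluation data for pushing the automated share higher.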
The ability of LLMs to get 90% of the way through complex tasks very quickly changes the shape of development for complex systems with many components. In reality, LLMs are enabling a middle path between “build” and “buy”, in which the computation can be quickly outsourced to a commoditised “second party” agent. This changes the rules of the game, and businesses able to adapt to this paradigm are poised to reap the benefits.
Inverse mechanical turks
Finally, and perhaps most obviously - foundation LLMs accept input and produce output that is by definition designed to appear human. Written and verbal language is the primary interface between humans, particularly online, and LLMs are well poised to supplant the human on the other side of a computer interface in many, many cases - from customer support, to writing sales emails, to summarising documents, to comprehending requests or complaints and beyond.
This pattern is a neat inversion of the idea of a mechanical turk - a seeming automaton which in fact conceals a real human within, who uses their own intellect to solve problems while presenting a purely mechanical interface to the outside world. LLMs enable the direct inverse - a human interface concealing something within that appears to possess a human intellect, but is in fact merely a neural network automatically completing statistically likely sentences. While this used to sound like science fiction, it is now the very basis for the fastest-growing product of all time, and commonplace.
The true power of this pattern, in my opinion, comes from our ability to engineer systems that operate like the box in Schrödinger’s famous thought experiment - except that instead of a living or dead cat within, there is either a living human or the dead, reanimated flesh of a babbling LLM pretending to be one. Deepfakes as a service, if you like. The magic here is that the user need never collapse the wave function by opening the box.
The future
In my view, LLMs are unlikely to be the final state of AI. Wittgenstein once famously postulated that even if a lion were able to speak your language, you would be unable to hold a conversation with it - your frames of reference being so different that each party would be incomprehensible to the other. Whether thought and language are the same is a question I’ll leave to the philosophers, but I certainly don’t think that language is the only type of intelligence possible, and my belief is that artificial intelligence will likely end up substantially different in structure to human intelligence.
In 2024, it is indubitable that LLMs are a game changer for software development. With the current state of the art, they make it possible to move much faster and cheaper in a number of ways, as we have explored here. Even more exciting are the ways in which LLMs and other forms of AI will enable us to build systems that are currently beyond the realm of possibility, deploying superhuman reasoning and intelligence within automated software.
I’ve said it before and I’ll say it again - it is going to get weird. But for now, LLMs are a tremendous boon to those with a scrappy mindset seeking to build impactful software quickly.