My take on LLMs replacing programmers
Update on 18th January 2025: After witnessing another year of extraordinary growth in GenAI, the rise of agentic AI, and significant developments across the field, I am providing my updated views along with a few new ones.
Large language models (LLMs) are sophisticated probability functions. Given a prompt, they produce a probability distribution over a known vocabulary (tokens, roughly words, in the case of LLMs). Having seen enough examples of a pattern, an LLM knows which words are most likely to come next. It has no sense of right or wrong; it does not know whether the generated response is correct. It simply gives you the most probable output given its training data, so it is biased toward the majority. Is the majority always right? Should we trust the majority blindly? Sure, it can generate the second or third most popular opinion, but that requires an appropriate prompt from the user. So, the question is: can LLMs replace programmers?
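To make the "probability function" framing concrete, here is a minimal, purely illustrative sketch in Python. The candidate words and their scores are made up; a real LLM scores tens of thousands of tokens, but the mechanics of turning scores into a distribution and picking the most probable continuation are the same.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores a model might assign to candidate next words
# after the prompt "The cat sat on the".
candidates = ["mat", "roof", "keyboard", "moon"]
logits = [4.0, 2.5, 1.5, 0.1]

probs = softmax(logits)
for word, p in sorted(zip(candidates, probs), key=lambda x: -x[1]):
    print(f"{word}: {p:.2f}")

# Greedy decoding: pick the single most probable word,
# the "majority" choice described above.
best = max(zip(candidates, probs), key=lambda x: x[1])[0]
print("next word:", best)
```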
-
Problem 1 - Proprietary data: The most obvious problem is the data. Deep learning models are not sentient, as I noted before. They develop their predictive ability by learning patterns from tons of data. LLMs can take over repetitive jobs, not just programming, as long as a huge amount of training data is available. However, much real-world software is specialized, and most well-established companies build proprietary products. For instance, at TigerIT Bangladesh, we interfaced different biometric hardware devices with software, but the device specifications varied wildly across vendors. We even had to undergo onsite training in Hong Kong for a product developed by a Hong Kong-based company. This is just one example. Data is the business. Who would expose their proprietary data and business secrets for LLMs to learn from, potentially jeopardizing their own competitive advantage? So, the data is a real bottleneck here.
Update on 18th January 2025: The rise of Retrieval-Augmented Generation (RAG) and agentic AI is rapidly closing this gap. With RAG, an LLM can combine its pre-trained knowledge with proprietary documents retrieved at query time and supplied in context (in-context learning, ICL), filling knowledge gaps without retraining.
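To illustrate the idea, here is a minimal, self-contained RAG sketch. The word-overlap retriever and the in-house document snippets are toy stand-ins; a real system would use embedding search over an actual document store and send the assembled prompt to an LLM.

```python
# Minimal RAG sketch: retrieve relevant internal notes, then place them
# in the prompt so the LLM can use them via in-context learning.
# The documents and the scoring method are illustrative placeholders.

DOCS = [
    "Vendor A fingerprint scanner: init via openDevice(port), returns handle.",
    "Vendor B iris camera: requires calibrate() before every capture session.",
    "Internal style guide: all device wrappers must expose capture() and close().",
]

def score(query: str, doc: str) -> int:
    """Crude relevance score: count shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def build_prompt(query: str, k: int = 2) -> str:
    """Pick the top-k documents and prepend them as context."""
    top = sorted(DOCS, key=lambda d: score(query, d), reverse=True)[:k]
    context = "\n".join(f"- {d}" for d in top)
    return f"Context:\n{context}\n\nTask: {query}\nAnswer using only the context above."

print(build_prompt("Write a wrapper to capture from the Vendor B iris camera"))
# The resulting string would be sent to an LLM; the retrieved snippets give it
# proprietary knowledge it was never trained on.
```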
-
Problem 2 - Under-developed areas: Again, this is a data-related problem. A data-driven model can successfully learn a repetitive, widely documented domain. Understandably, LLMs struggle in areas that are seldom explored. Humans are still needed to develop and advance such areas until they become ubiquitous.
Update on 18th January 2025: The problem still largely exists. Although RAG can mitigate it to some extent, very novel domains remain beyond the reach of LLMs.
-
Problem 3 - Thought gaps: What LLMs see are the end products of human thought, because only end products are documented. When a novel system is developed, we don't just arrive at the end product; we follow a step-by-step reasoning and validation process. Such thought gaps are lost on the machine, at least in its current state. Furthermore, as I said before, LLMs have no sense of right or wrong, so they cannot validate intermediate steps. Would LLMs be able to draw an analogy or connect two seemingly unrelated concepts, an important trait of the human thought process? So, when a new system is being developed, the undocumented human thought process is missing, and LLMs are limited in this context.
Update on 18th January 2025: This issue remains as relevant as before, and it represents a true test for AI models in achieving AGI. Chain-of-thought (CoT) prompting and tuning give the impression of improved reasoning capabilities; however, true reasoning is still lacking. Test-time training (TTT) is another emerging approach that can pass more challenging reasoning tests, though it requires more computation. Still, none of these methods has yet given LLMs genuine reasoning ability. For most practical tasks, deep reasoning may not be necessary, and as the cost of TTT-like methods declines, they will likely find their way into mainstream applications. This problem may become less limiting over time.
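As a concrete illustration of CoT prompting, here is a minimal sketch comparing a direct prompt with one that asks the model to lay out intermediate steps. The question and wording are made up; only the prompt construction is shown, since the actual model call depends on whichever API or local model you use.

```python
# Minimal sketch of chain-of-thought (CoT) prompting: the same question
# asked directly versus with an explicit request for intermediate steps.

question = "A batch job processes 1,200 records per hour. How long will 15,000 records take?"

direct_prompt = question + "\nGive only the final answer."

cot_prompt = (
    question + "\n"
    "Think step by step: state the processing rate, divide the total number "
    "of records by that rate, then convert the result to hours and minutes. "
    "Show each step before giving the final answer."
)

# Sent to a model, the CoT variant tends to produce visible intermediate steps
# that a human can check, which is the part plain end-product data never records.
print(cot_prompt)
```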
The following are newly added:
-
Problem 4 - Trust Issue: LLMs essentially synthesize code from various sources on the internet to solve your problem. If your problem can be broken down into subproblems that have been solved frequently before, you are likely to receive complete code. However, that code still needs to be verified manually for end-to-end functionality; you can hardly trust AI-generated code unless you blindly trust everything available on the internet. This means that although more coding tasks will be automated, there will also be an increase in human validation work for AI-generated content.
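One practical way to apply that human validation is to wrap any AI-generated function in checks you wrote yourself before accepting it. The function and the test cases below are hypothetical, made up purely for illustration.

```python
# Hypothetical AI-generated function, pasted in for review.
def parse_duration(text: str) -> int:
    """Convert strings like '2h30m' into total minutes."""
    hours, _, rest = text.partition("h")
    minutes = rest.rstrip("m") or "0"
    return int(hours) * 60 + int(minutes)

# Human-written checks: small, and independent of how the code was produced.
assert parse_duration("2h30m") == 150
assert parse_duration("0h45m") == 45
assert parse_duration("3h0m") == 180
print("all checks passed")
```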
-
Problem 5 - Context Window: Transformers, the architecture behind modern LLMs, are still limited by their context window, which restricts their ability to resolve long-range dependencies. Real-world codebases are very large, and you often need to see things holistically, or at least from a wider perspective, to integrate components or solve tasks. This issue needs to be addressed before software engineering jobs can be fully automated. However, it is likely to improve over time.
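A common workaround today is to chop a large codebase into chunks that fit the model's context budget and feed in only the most relevant ones. Here is a minimal sketch; the four-characters-per-token estimate and the budget value are rough assumptions, not properties of any particular model.

```python
# Split source text into chunks small enough for a fixed context budget.
# Token counts are approximated as len(text) / 4, a rough rule of thumb.

CONTEXT_BUDGET_TOKENS = 8_000   # assumed budget; varies by model

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def chunk_file(source: str, max_tokens: int = 1_000) -> list[str]:
    """Greedily pack whole lines into chunks under max_tokens each."""
    chunks, current, used = [], [], 0
    for line in source.splitlines(keepends=True):
        cost = approx_tokens(line)
        if current and used + cost > max_tokens:
            chunks.append("".join(current))
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append("".join(current))
    return chunks

# Anything that does not fit in the budget simply never reaches the model,
# which is why long-range, cross-file reasoning remains hard.
```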
-
Problem 6 - Prompting Matters: Prompting matters a lot. A detailed breakdown of the steps needed to solve a problem can significantly improve an LLM's code generation. It is somewhat like programming the LLM, but in natural language. However, a person with knowledge of algorithms and data structures is still needed to write such a breakdown; the difference is that the barrier to entry is lower.
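For illustration, here is the kind of difference this makes. Both prompts are made-up examples, and only the prompt construction is shown.

```python
# A vague prompt versus a step-by-step one for the same task.
# The wording is illustrative; the point is the explicit decomposition.

vague = "Write code to find duplicate customers in our data."

decomposed = """Write a Python function find_duplicates(records) where records
is a list of dicts with keys 'name' and 'email'.
Steps:
1. Normalize emails to lowercase and strip whitespace.
2. Group records by normalized email using a dictionary.
3. Return only the groups that contain more than one record.
Include type hints and a short docstring."""

# The second prompt encodes the algorithm (normalize, group, filter),
# which is exactly the algorithmic knowledge mentioned above.
print(decomposed)
```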
Verdict: With the current state of AI, software engineering jobs remain largely human-centric, and complete automation is not happening anytime soon. However, AI will certainly affect coding jobs: it increases productivity manifold, which may lead to job cuts in the short term. That said, there is a silver lining. With the rise of agentic AI in the near future, there will actually be a surge in demand for AI engineers to build customized AI agents for businesses.