Matt Westcott

Give your LLM a terminal

Command-line access is the most powerful tool for LLMs

28 May 2025


Claude and ChatGPT are known for their conversational abilities, but their real power emerges when you give them tools. We've seen this evolution play out over the past two years: first with basic web search, then with code execution, and now with "computer use" — the ability to interact with graphical interfaces.

But there's an overlooked toolset that might be more powerful than all of these: a terminal with a persistent filesystem. Giving an LLM access to the same command-line environment that developers use every day unlocks capabilities the other tools struggle to match.

This idea has become mainstream for coding agents like Claude Code and OpenAI's Codex, but this post argues that its benefits extend to knowledge work in general.

The "agentic" paradigm

At its core, an agent is just an LLM equipped with a set of tools, running in a loop[1]. The LLM decides which tool to use, executes it, observes the result, and decides what to do next[2]. This simple pattern transforms a chatbot into something that can actually do things.
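
To make the loop concrete, here's a minimal sketch in shell, where the model's only tool is the terminal itself. The llm-complete command is hypothetical, standing in for a call to a real model API; the rest is ordinary shell.

```bash
# Minimal agent loop: the model's only tool is running shell commands.
# "llm-complete" is a hypothetical stand-in for a real model API call.
transcript="You have shell access. Task: summarise the files in ./contracts.
Reply with a single shell command, or DONE when you are finished."

while true; do
  action=$(llm-complete "$transcript")    # model proposes the next command
  [ "$action" = "DONE" ] && break         # model decides it is finished
  result=$(bash -c "$action" 2>&1)        # execute the command, capture the output
  transcript="$transcript
\$ $action
$result"                                  # feed the observation back to the model
done
```

Real agent frameworks add structured tool schemas, error handling, and context management, but the decide-execute-observe cycle is exactly this.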

The promise is compelling: given the right tools, in theory you could automate any knowledge work. If you need to research a topic, the agent can search the web and synthesise its findings. If you need help managing a project, it can update task lists and send emails. If you need to analyse a dataset, it can write and execute SQL queries for you.

This is a powerful paradigm, but it relies on devising the right tools, and it turns out that not all tools are created equal.

A spectrum of tools

Some tools are fundamentally more powerful than others. On one end of the spectrum, you have simple retrieval tools like search APIs or weather lookups, and basic action tools for things like sending emails or modifying databases. These tools are useful but limited to tasks the developer has explicitly defined upfront.

At the other end of the spectrum, you have tools that give full computer access. Anthropic's "computer use" framework, which lets Claude interact with any application through screenshots and mouse/keyboard control, represents maximum flexibility. In theory, it could do anything a human can do on a computer.

But pixel-based computer use is currently slow, error-prone, and struggles with tasks that require reading lots of text or precise interactions. It doesn't play to the strengths of LLMs, which excel at understanding and generating text and code.

The power of the terminal

What if, instead of interacting with graphical interfaces, we gave LLMs access to a shell with a persistent filesystem? This combines several powerful capabilities:

1. File storage and manipulation. The LLM can create files to store memories, to-do lists, research notes, drafts, etc. It can read these files later, update them, and organise them into directories. This gives it a form of long-term memory and a workspace for managing complex tasks. It can also iterate on its work, including in response to feedback from humans.

2. Text processing. Unix tools like grep, sed, awk, and jq were designed for exactly the kind of text manipulation tasks at which LLMs excel. Say you need to extract all the email addresses from a document, search for a pattern across thousands of files, or transform data into a different format. Each of these is often a shell one-liner (see the sketch after this list), and none of them requires a developer to define special tools for the model in advance[3].

3. Programming environment. The model can write and run Python scripts, SQL queries, or any other code to manipulate data and files it's working with. But unlike isolated code execution environments, with a filesystem it can save these scripts, build up libraries of useful functions for its current project, and create complex data processing pipelines. Plus, it can leverage the existing ecosystem of CLI tools and libraries.
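
To illustrate the second and third points, here are the kinds of one-liners and small scripts the model can produce for itself. The filenames and patterns are hypothetical; everything uses standard Unix tools.

```bash
# Pull every email address out of a set of notes
$ grep -Eoh '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}' notes/*.txt | sort -u

# Find every contract that mentions termination, case-insensitively
$ grep -ril "termination" contracts/

# Reshape a JSON export into a CSV of name/email pairs
$ jq -r '.users[] | [.name, .email] | @csv' export.json > users.csv

# Save a useful pipeline as a script to reuse later in the project
$ cat > extract_emails.sh << 'EOF'
#!/bin/sh
grep -Eoh '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}' "$@" | sort -u
EOF
$ chmod +x extract_emails.sh
```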

An example

Consider a concrete case: legal document analysis. Here's a simplified transcript showing how an LLM might work through it in the terminal:

```bash
# First, let's see what documents we're working with
$ ls contracts/
acquisition_2024.pdf  nda_template.docx  service_agreement.pdf

# Convert to text for analysis
$ for file in contracts/*.pdf; do
    pdftotext "$file" "${file%.pdf}.txt"
  done

# Create a structured to-do list
$ cat > analysis_plan.md << 'EOF'
## Document Analysis Plan

1. Extract key terms and parties
2. Identify unusual clauses
3. Flag potential risks
4. Create summary report

## Progress
- [ ] acquisition_2024.pdf
- [ ] service_agreement.pdf
EOF

# Start reading documents
$ sed -n '1,100p' contracts/acquisition_2024.txt
```

... and so on, perhaps then writing its conclusions to another text file, updating the to-do list, etc. The model can work iteratively, refining its analysis, cross-referencing documents, and building up a comprehensive review.

The model is working much as a human lawyer would, just through a command line rather than a PDF reader and a word processor. The tools are different, but the underlying workflow is the same. Lawyers work through documents iteratively, adopting different "retrieval strategies": sometimes reading top-to-bottom, sometimes searching for specific terms or clauses, and often keeping notes or drafting documents as they go. All of this maps naturally onto an LLM working in a terminal.

Memory management

A major benefit of this approach is that it solves the main bottleneck on LLM autonomy: memory. LLMs have limited context windows, which means their working memory is quickly exhausted when dealing with large amounts of information. By saving intermediate results to files, and then reading back just the parts it needs later, the model can effectively manage its own memory and continue working even as the oldest information falls out of the context window.

This is how Claude Code managed to work "independently for 7 hours with sustained performance": it keeps to-do lists, summarises the message history when the context window fills up, and carries on editing files. Likewise, Claude Plays Pokemon sustains coherent gameplay over hours or days by maintaining memory files that record its goals and progress towards them, failed attempts, knowledge gained, and so on. With terminal access, the model can gain long-term coherence without needing any special memory management system[4].
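
A minimal sketch of the pattern, continuing the contracts example from above. The specific finding is invented for illustration, and the in-place sed edit assumes GNU sed.

```bash
# Persist an intermediate finding instead of holding it in context
$ echo "- acquisition_2024: change-of-control clause in section 7.2 looks unusual" >> findings.md

# Tick off the item on the to-do list (GNU sed in-place edit)
$ sed -i 's/\[ \] acquisition_2024.pdf/[x] acquisition_2024.pdf/' analysis_plan.md

# Much later, after the original reading has fallen out of the context window,
# recover only the notes that matter for the current step
$ grep "acquisition_2024" findings.md
$ cat analysis_plan.md
```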

Beyond coding to general knowledge work

I've argued this framework is particularly suited to coding agents and legal analysis, but it's not limited to any particular domain. Consider these examples:

  • Academic research: Download papers, extract key findings, maintain a bibliography, write literature reviews
  • Data analysis: Import datasets, run SQL queries, create visualisations, generate reports
  • Project management: Maintain to-do lists, track progress, generate status updates
  • Content creation: Draft documents, manage revisions, organise research materials

The pattern is always the same: turn tasks into structured plans or workflows, use files to maintain state, leverage CLI tools to do some of the heavy lifting, and iterate on the results until the task is complete[5].
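
For instance, the data-analysis case might reduce to a handful of commands. The dataset and column names here are hypothetical, and the .import --csv option assumes a reasonably recent sqlite3.

```bash
# Load a CSV into a scratch database and explore it with SQL
$ sqlite3 analysis.db ".import --csv sales_2024.csv sales"
$ sqlite3 analysis.db "SELECT region, SUM(amount) FROM sales GROUP BY region;" > by_region.txt

# Keep the query for later runs and fold the result into the report
$ echo "SELECT region, SUM(amount) FROM sales GROUP BY region;" > by_region.sql
$ cat by_region.txt >> report.md
```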

Embracing tools built for humans

A good principle for LLM automation has been to replicate how a human would do the task. The state-of-the-art coding tools follow it: they mirror how software engineers work in a terminal, reading and writing code in a feedback loop with tests. It's a safe bet because:

  • we know humans can do the task a certain way
  • humans have likely already developed effective workflows
  • there are deep analogies between how humans and LLMs process information

I think this principle has a corollary: we should give LLMs access to the tools that humans find effective for their work, and those certainly include a filesystem, a shell, and command-line tools.

The terminal isn't just a tool for programmers; it's a universal workspace. When we give LLMs access to this environment, we're not just enabling them to code better; we're giving them the same fundamental capabilities that have made humans productive at computers: the ability to create, organise, search, transform, and iterate on information.

It's an ideal interface for LLMs: flexible enough to enable long-running workflows, structured enough for models to use reliably, well matched to their strengths in coding and text, and built on decades of human effort to create effective command-line tools.

Vision-oriented computer use will be needed to interact with graphical interfaces, but I think there will always be a place for a text-based interface that allows LLMs to interact with a filesystem and run commands.

You can give this idea a try today in your local environment by asking Claude Code to perform a non-coding task (with a custom system prompt) or by installing a filesystem MCP server.


  1. For a more detailed discussion, see Anthropic's engineering blog.

  2. See the ReAct paper for an academic treatment of this agentic paradigm.

  3. In practice, consider using predefined tools like OpenAI's "apply patch" or Anthropic's "text editor" to maximise accuracy for targeted text edits, but they're not strictly necessary.

  4. Compared to MemGPT, which describes a hierarchical memory management system for LLMs inspired by operating systems, the terminal approach described here is much simpler.

  5. If you dismissed Manus when the hype came and went, you might want to revisit it, since it's built on this foundation.