The first thing to get straight: the model never runs your code.
This surprises people who have been using AI agents for months. They see a response that says "I've checked the calendar" or "I've sent the message" and assume the model did something. What actually happened is more mechanical, and worth understanding precisely.
LLMs cannot actually call a tool themselves; instead, they express the intent to call a specific tool in their response. The model outputs structured text - a JSON object naming a function and its arguments - then stops. Your host program picks that up, runs the actual function, and feeds the result back into the next turn. The model sees the output and continues generating. LLMs do not call functions. Regardless of what you've been told, they simply don't. If your LLM API or SDK calls functions for you, there is a layer of software wrapped around it taking care of this and invoking the function.
That intermediary layer is the entire game.
What the exchange actually looks like
Say you ask an agent: "What's the weather in London and New York right now?"
The model receives a list of available tools, each described by a JSON schema - name, description, parameter types. The calling client sends the LLM a list of available functions, a description of each, and a description of the parameters each takes. The LLM then returns a response stating which function to call and which arguments to pass, based on the prompt it received.
In this case, the model doesn't generate prose. It outputs something like: