
NodeLLM 1.7.0: Standardizing 'Extended Thinking' Across Providers

The "Reasoning Model" era has arrived, and with it, a new set of fragmented APIs. OpenAI has reasoning_effort, Anthropic has thinking budgets, Gemini has thought parts, and DeepSeek has reasoning_content.

If you're building production infrastructure, you don't want your domain logic littered with provider-specific "thinking" hacks.

NodeLLM 1.7.0 standardizes 'Extended Thinking' into a single, predictable interface.


One API, Every Reasoner

Instead of toggling internal flags or guessing where the 'thought' text lives in a stream, NodeLLM 1.7.0 introduces the .withThinking() and .withEffort() fluent methods.

// Anthropic Claude 3.7 (Thinking Budget)
const claudeResponse = await chat
  .withThinking({ budget: 4000 })
  .ask("Solve this complex architectural problem...");

// OpenAI o1/o3 (Reasoning Effort)
const openaiResponse = await chat
  .withEffort('high')
  .ask("Design a secure multi-tenant system...");

The core difference? Unified Output. Regardless of the provider, you get a consistent ThinkingResult containing:

  • text: The raw chain-of-thought.
  • signature: The verification signature (for Anthropic/OpenAI).
  • tokens: The token count consumed by the reasoning process.
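As a sketch, here is what consuming that unified result could look like in TypeScript. The field names (`text`, `signature`, `tokens`) come from the list above; the exact interface shape and the helper function are illustrative assumptions, not the library's published definitions:

```typescript
// Hypothetical shape of the unified ThinkingResult. Field names match the
// release notes; the precise type definition is an assumption for illustration.
interface ThinkingResult {
  text: string;        // raw chain-of-thought
  signature?: string;  // verification signature (Anthropic/OpenAI)
  tokens: number;      // tokens consumed by the reasoning process
}

// Provider-agnostic consumer: the same code works for any reasoner.
function summarizeThinking(result: ThinkingResult): string {
  return `${result.tokens} reasoning tokens, ${result.text.length} chars of thought`;
}

const example: ThinkingResult = {
  text: "First, decompose the problem...",
  signature: "sig_abc123",
  tokens: 412,
};

console.log(summarizeThinking(example));
```

Because the result shape never changes, switching a workload from Claude to o3 doesn't require touching any of the downstream code that reads the chain-of-thought.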

Streaming the Thought Process

One of the biggest UX challenges with reasoning models is the "black box" period where the model is thinking but not yielding output.

NodeLLM 1.7.0 brings full Streaming support for Thinking chunks. Your UI can now render the model's internal processing in real-time, providing immediate feedback to the user while the final response is being formed.

import chalk from 'chalk'; // dim styling for the thought stream

for await (const chunk of chat.withThinking({ budget: 2000 }).stream(prompt)) {
  if (chunk.thinking) {
    // Render the model's internal reasoning as dimmed text
    process.stdout.write(chalk.dim(chunk.thinking));
  }
  if (chunk.content) {
    // Render the final answer as it arrives
    process.stdout.write(chunk.content);
  }
}

Fully Persistent (ORM 0.2.0)

Alongside Core 1.7.0, we've released @node-llm/orm 0.2.0. This update ensures that your model's "thinking" isn't lost to the ether.

The ORM now automatically captures:

  • Full chain-of-thought text in LlmMessage.thinkingText.
  • Reasoning token usage for precise billing and audit trails.
  • Thinking signatures for cryptographically verifying model outputs.

Whether you're using ask() or askStream(), the library handles the heavy lifting of aggregating chunks and persisting them to your database.
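Conceptually, that aggregation is a simple fold over the stream. Here's a minimal sketch of what collecting thinking chunks into a persistable record might look like. The chunk shape mirrors the streaming example above; the `aggregate` helper and `AggregatedMessage` record are hypothetical, not the ORM's actual internals (though `thinkingText` matches the `LlmMessage.thinkingText` column mentioned above):

```typescript
// Hypothetical chunk shape, mirroring the streaming example above.
interface StreamChunk {
  thinking?: string;
  content?: string;
}

// Hypothetical persisted record; the field name thinkingText matches the
// LlmMessage.thinkingText column described in the release notes.
interface AggregatedMessage {
  thinkingText: string;
  content: string;
}

// Fold a stream of chunks into a single record ready for persistence.
async function aggregate(stream: AsyncIterable<StreamChunk>): Promise<AggregatedMessage> {
  let thinkingText = "";
  let content = "";
  for await (const chunk of stream) {
    if (chunk.thinking) thinkingText += chunk.thinking;
    if (chunk.content) content += chunk.content;
  }
  return { thinkingText, content };
}
```

The point of the library is that you never write this yourself: the same aggregated record lands in your database whether the response arrived in one piece or as a stream.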


Professional Migrations (Prisma Migrate)

We've moved from "quick and dirty" database pushes to professional Prisma Migrate workflows.

Versioned, reproducible migrations are now the library standard. NodeLLM now includes a formalized migration history in prisma/migrations, ensuring that your production deployments are safe, predictable, and scalable.
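A typical Prisma Migrate workflow under this setup might look like the following. These are standard Prisma CLI commands; the migration name is purely illustrative:

```shell
# Create and apply a new versioned migration in development
npx prisma migrate dev --name add_thinking_fields

# Apply pending migrations in production, reproducibly and without drift
npx prisma migrate deploy

# Inspect which migrations have been applied
npx prisma migrate status
```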


Infrastructure, Not Just a Wrapper

NodeLLM's goal remains the same: to turn LLM integration into boring, reliable infrastructure.

With 1.7.0, we've taken the most complex new feature in LLMs—Extended Thinking—and made it as simple as a method call.

You can upgrade today:

npm install @node-llm/core@latest @node-llm/orm@latest

Check out the updated Reasoning Documentation for full implementation details.


Building with NodeLLM? Join the conversation on GitHub.
