NodeLLM 1.10: Production-Grade Middleware System
Shaiju Edakulangara (@eshaiju) • 4 min read
Every production LLM system eventually faces the same problem: you need to log requests, track costs, redact PII, enforce budgets, and audit interactions—but you don't want this cross-cutting logic scattered throughout your codebase.
NodeLLM 1.10 introduces a first-class middleware system that solves this at the infrastructure level.
The Problem with Raw LLM Calls
When you call an LLM directly, you're on your own for:
- Observability: How do you trace a request through your system?
- Cost Control: How do you prevent a runaway agent from burning your budget?
- Security: How do you ensure PII never reaches the provider?
- Auditing: How do you maintain a permanent record of AI decisions?
Most teams end up with wrapper functions, scattered try-catch blocks, and inconsistent logging. The result? Fragile code that's impossible to maintain.
Middleware: The Infrastructure Layer
NodeLLM's middleware system gives you interception points at every stage of the LLM lifecycle:
interface Middleware {
  name: string;

  // Request/Response Lifecycle
  onRequest?: (context: MiddlewareContext) => Promise<void> | void;
  onResponse?: (context: MiddlewareContext, result: ChatResponseString) => Promise<void> | void;
  onError?: (context: MiddlewareContext, error: Error) => Promise<void> | void;

  // Tool Execution Lifecycle
  onToolCallStart?: (context: MiddlewareContext, tool: ToolCall) => Promise<void> | void;
  onToolCallEnd?: (context: MiddlewareContext, tool: ToolCall, result: unknown) => Promise<void> | void;
  onToolCallError?: (context: MiddlewareContext, tool: ToolCall, error: Error) => Promise<ToolErrorDirective>;
}
This isn't just logging—it's full lifecycle control.
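Implementing a hook only requires an object with a name and the callbacks you care about. Here's a minimal sketch of a latency-timing middleware; it assumes the Middleware interface above is exported from @node-llm/core, and the startedAt key on ctx.state is just an illustrative name:

import type { Middleware } from "@node-llm/core";

// Minimal custom middleware: measures end-to-end latency per request (sketch).
const timingMiddleware: Middleware = {
  name: "Timing",

  onRequest: (ctx) => {
    // ctx.state is per-request scratch space shared across hooks
    ctx.state.startedAt = Date.now();
  },

  onResponse: (ctx) => {
    const elapsed = Date.now() - (ctx.state.startedAt as number);
    console.log(`[${ctx.requestId}] completed in ${elapsed}ms`);
  },

  onError: (ctx, error) => {
    console.error(`[${ctx.requestId}] failed: ${error.message}`);
  }
};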
Real-World Example: Observability + Cost Guard
Here's how a production system might stack middlewares:
import {
  NodeLLM,
  PIIMaskMiddleware,
  CostGuardMiddleware,
  UsageLoggerMiddleware
} from "@node-llm/core";

// Stack middlewares for defense-in-depth
const chat = NodeLLM.chat("gpt-4o", {
  middlewares: [
    new PIIMaskMiddleware({ mask: "[REDACTED]" }),  // Scrub PII before provider
    new CostGuardMiddleware({ maxCost: 0.10 }),     // Budget enforcement
    new UsageLoggerMiddleware({ prefix: "HR-BOT" }) // Structured logging
  ]
});

await chat.ask("Process this employee query...");
Each middleware does one thing well, and the stack composes them into a robust pipeline.
The Onion Model
Middleware execution follows the onion model—outer middlewares wrap inner ones:
┌─────────────────────────────────────────┐
│  Logger.onRequest                       │
│  ┌─────────────────────────────────┐    │
│  │  Security.onRequest             │    │
│  │                                 │    │
│  │     [ LLM Provider Call ]       │    │
│  │                                 │    │
│  │  Security.onResponse            │    │
│  └─────────────────────────────────┘    │
│  Logger.onResponse                      │
└─────────────────────────────────────────┘
This means:
- onRequest: Executed in registration order (Logger → Security)
- onResponse: Executed in reverse registration order (Security → Logger)
- onToolCallEnd/onError: Also in reverse registration order
The logger sees the final state after all transformations. This is critical for accurate auditing.
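A quick way to see this ordering is to stack two no-op middlewares that only log their own hook names (a sketch; the output comments simply restate the ordering described above):

import { NodeLLM } from "@node-llm/core";

const logger = {
  name: "Logger",
  onRequest: () => console.log("Logger.onRequest"),
  onResponse: () => console.log("Logger.onResponse")
};

const security = {
  name: "Security",
  onRequest: () => console.log("Security.onRequest"),
  onResponse: () => console.log("Security.onResponse")
};

// Logger is registered first, so it forms the outermost layer of the onion
const chat = NodeLLM.chat("gpt-4o", { middlewares: [logger, security] });
await chat.ask("Hello");

// Console output:
// Logger.onRequest
// Security.onRequest
// Security.onResponse
// Logger.onResponse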
Tool Execution Hooks
LLM agents with tool calling need special attention. A single ask() call might trigger multiple tool executions, each of which could fail or time out.
NodeLLM 1.10 gives you hooks for the entire tool lifecycle:
const toolMonitor = {
  name: "ToolMonitor",

  onToolCallStart: async (ctx, tool) => {
    console.log(`[${ctx.requestId}] Calling tool: ${tool.function.name}`);
    ctx.state.toolStart = Date.now();
  },

  onToolCallEnd: async (ctx, tool, result) => {
    const duration = Date.now() - ctx.state.toolStart;
    await metrics.track("tool_execution", {
      tool: tool.function.name,
      duration,
      success: true
    });
  },

  onToolCallError: async (ctx, tool, error) => {
    await alerting.notify(`Tool ${tool.function.name} failed: ${error.message}`);
    return { action: "retry", maxRetries: 2 }; // Directive back to the engine
  }
};
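The metrics and alerting objects above stand in for whatever telemetry and paging stack you already run; they aren't part of NodeLLM. Wiring the monitor in works like any other middleware (a sketch, with tool definitions omitted for brevity):

// toolMonitor plugs into the stack like any other middleware
const chat = NodeLLM.chat("gpt-4o", { middlewares: [toolMonitor] });
await chat.ask("What's the weather in Berlin?");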
Global Middlewares
For organization-wide policies, register middlewares at the LLM instance level:
import { createLLM, PIIMaskMiddleware } from "@node-llm/core";

const llm = createLLM({
  provider: "openai",
  middlewares: [
    new PIIMaskMiddleware() // Applied to ALL chats, embeddings, etc.
  ]
});

// Every operation inherits the global middleware
const chat1 = llm.chat("gpt-4o");
const chat2 = llm.chat("gpt-4o-mini");
const embedding = llm.embed("Some text");
You can still add per-chat middlewares that extend (not replace) the global stack.
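For example, a single chat could layer a budget guard on top of the globally registered PII masking. This sketch assumes llm.chat() accepts the same middlewares option shown earlier for NodeLLM.chat():

import { CostGuardMiddleware } from "@node-llm/core";

// The global PIIMaskMiddleware still applies; CostGuard is added only for this chat
const hrChat = llm.chat("gpt-4o", {
  middlewares: [new CostGuardMiddleware({ maxCost: 0.10 })]
});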
Built-in Middleware Library
NodeLLM ships with production-ready middlewares:
PIIMaskMiddleware
Redacts emails, phone numbers, credit cards, and SSNs before they reach the provider:
new PIIMaskMiddleware({ mask: "[REDACTED]" })
CostGuardMiddleware
Prevents runaway costs during agentic loops:
new CostGuardMiddleware({
  maxCost: 0.05,
  onLimitExceeded: (ctx, cost) => alerting.warn(`Budget exceeded: $${cost}`)
})
UsageLoggerMiddleware
Structured logging for token usage and costs:
new UsageLoggerMiddleware({ prefix: "MY-SERVICE" })
// Output: [MY-SERVICE] req_abc123 | gpt-4o | 1,234 tokens | $0.0045
Why This Matters
The middleware system transforms NodeLLM from a "wrapper library" into production infrastructure.
You can now:
- Centralize cross-cutting concerns without touching business logic
- Compose behaviors by stacking middlewares
- Enforce organization-wide policies through global registration
- Trace and debug agentic systems with tool execution hooks
- Build compliance with automatic PII scrubbing and audit trails
This is how you scale LLM systems from prototype to production.
Upgrade Today
npm install @node-llm/core@1.10.0
Check out the full Middleware Documentation.
For production observability with cost tracking, dashboards, and database persistence, see our dedicated post: NodeLLM Monitor: Production Observability.
Building production LLM systems with NodeLLM? Join the conversation on GitHub.