NodeLLM 1.10: Production-Grade Middleware System
Shaiju Edakulangara (@eshaiju) • 4 min read
Every production LLM system eventually faces the same problem: you need to log requests, track costs, redact PII, enforce budgets, and audit interactions—but you don't want this cross-cutting logic scattered throughout your codebase.
NodeLLM 1.10 introduces a first-class middleware system that solves this at the infrastructure level.
The Problem with Raw LLM Calls
When you call an LLM directly, you're on your own for:
- Observability: How do you trace a request through your system?
- Cost Control: How do you prevent a runaway agent from burning your budget?
- Security: How do you ensure PII never reaches the provider?
- Auditing: How do you maintain a permanent record of AI decisions?
Most teams end up with wrapper functions, scattered try-catch blocks, and inconsistent logging. The result? Fragile code that's impossible to maintain.
Middleware: The Infrastructure Layer
NodeLLM's middleware system gives you interception points at every stage of the LLM lifecycle:
interface Middleware {
  name: string;

  // Request/Response Lifecycle
  onRequest?: (context: MiddlewareContext) => Promise<void> | void;
  onResponse?: (context: MiddlewareContext, result: ChatResponseString) => Promise<void> | void;
  onError?: (context: MiddlewareContext, error: Error) => Promise<void> | void;

  // Tool Execution Lifecycle
  onToolCallStart?: (context: MiddlewareContext, tool: ToolCall) => Promise<void> | void;
  onToolCallEnd?: (context: MiddlewareContext, tool: ToolCall, result: unknown) => Promise<void> | void;
  onToolCallError?: (context: MiddlewareContext, tool: ToolCall, error: Error) => Promise<ToolErrorDirective>;
}
This isn't just logging—it's full lifecycle control.
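Implementing a hook only requires an object with a name and the callbacks you care about. Here's a minimal sketch of a latency-timing middleware; it assumes the Middleware interface above is exported from @node-llm/core, and the startedAt key on ctx.state is just an illustrative name:

import type { Middleware } from "@node-llm/core";

// Minimal custom middleware: measures end-to-end latency per request (sketch).
const timingMiddleware: Middleware = {
  name: "Timing",

  onRequest: (ctx) => {
    // ctx.state is per-request scratch space shared across hooks
    ctx.state.startedAt = Date.now();
  },

  onResponse: (ctx) => {
    const elapsed = Date.now() - (ctx.state.startedAt as number);
    console.log(`[${ctx.requestId}] completed in ${elapsed}ms`);
  },

  onError: (ctx, error) => {
    console.error(`[${ctx.requestId}] failed: ${error.message}`);
  }
};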
Real-World Example: Observability + Cost Guard
Here's how a production system might stack middlewares:
import {
  NodeLLM,
  PIIMaskMiddleware,
  CostGuardMiddleware,
  UsageLoggerMiddleware
} from "@node-llm/core";

// Stack middlewares for defense-in-depth
const chat = NodeLLM.chat("gpt-4o", {
  middlewares: [
    new PIIMaskMiddleware({ mask: "[REDACTED]" }),  // Scrub PII before provider
    new CostGuardMiddleware({ maxCost: 0.10 }),     // Budget enforcement
    new UsageLoggerMiddleware({ prefix: "HR-BOT" }) // Structured logging
  ]
});

await chat.ask("Process this employee query...");
Each middleware does one thing well, and the stack composes them into a robust pipeline.
The Onion Model
Middleware execution follows the onion model—outer middlewares wrap inner ones:
┌─────────────────────────────────────────┐
│  Logger.onRequest                       │
│  ┌─────────────────────────────────┐    │
│  │  Security.onRequest             │    │
│  │                                 │    │
│  │     [ LLM Provider Call ]       │    │
│  │                                 │    │
│  │  Security.onResponse            │    │
│  └─────────────────────────────────┘    │
│  Logger.onResponse                      │
└─────────────────────────────────────────┘
This means:
- onRequest: Executed in registration order (Logger → Security)
- onResponse: Executed in reverse registration order (Security → Logger)
- onToolCallEnd/onError: Also in reverse registration order
The logger sees the final state after all transformations. This is critical for accurate auditing.
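A quick way to see this ordering is to stack two no-op middlewares that only log their own hook names (a sketch; the output comments simply restate the ordering described above):

import { NodeLLM } from "@node-llm/core";

const logger = {
  name: "Logger",
  onRequest: () => console.log("Logger.onRequest"),
  onResponse: () => console.log("Logger.onResponse")
};

const security = {
  name: "Security",
  onRequest: () => console.log("Security.onRequest"),
  onResponse: () => console.log("Security.onResponse")
};

// Logger is registered first, so it forms the outermost layer of the onion
const chat = NodeLLM.chat("gpt-4o", { middlewares: [logger, security] });
await chat.ask("Hello");

// Console output:
// Logger.onRequest
// Security.onRequest
// Security.onResponse
// Logger.onResponse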
Tool Execution Hooks
LLM agents with tool calling need special attention. A single ask() call might trigger multiple tool executions, each of which could fail or time out.
NodeLLM 1.10 gives you hooks for the entire tool lifecycle:
const toolMonitor = {
  name: "ToolMonitor",

  onToolCallStart: async (ctx, tool) => {
    console.log(`[${ctx.requestId}] Calling tool: ${tool.function.name}`);
    ctx.state.toolStart = Date.now();
  },

  onToolCallEnd: async (ctx, tool, result) => {
    const duration = Date.now() - ctx.state.toolStart;
    await metrics.track("tool_execution", {
      tool: tool.function.name,
      duration,
      success: true
    });
  },

  onToolCallError: async (ctx, tool, error) => {
    await alerting.notify(`Tool ${tool.function.name} failed: ${error.message}`);
    return { action: "retry", maxRetries: 2 }; // Directive back to the engine
  }
};
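The metrics and alerting objects above stand in for whatever telemetry and paging stack you already run; they aren't part of NodeLLM. Wiring the monitor in works like any other middleware (a sketch, with tool definitions omitted for brevity):

// toolMonitor plugs into the stack like any other middleware
const chat = NodeLLM.chat("gpt-4o", { middlewares: [toolMonitor] });
await chat.ask("What's the weather in Berlin?");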
Global Middlewares
For organization-wide policies, register middlewares at the LLM instance level:
import { createLLM, PIIMaskMiddleware } from "@node-llm/core";

const llm = createLLM({
  provider: "openai",
  middlewares: [
    new PIIMaskMiddleware() // Applied to ALL chats, embeddings, etc.
  ]
});

// Every operation inherits the global middleware
const chat1 = llm.chat("gpt-4o");
const chat2 = llm.chat("gpt-4o-mini");
const embedding = llm.embed("Some text");
You can still add per-chat middlewares that extend (not replace) the global stack.
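For example, a single chat could layer a budget guard on top of the globally registered PII masking. This sketch assumes llm.chat() accepts the same middlewares option shown earlier for NodeLLM.chat():

import { CostGuardMiddleware } from "@node-llm/core";

// The global PIIMaskMiddleware still applies; CostGuard is added only for this chat
const hrChat = llm.chat("gpt-4o", {
  middlewares: [new CostGuardMiddleware({ maxCost: 0.10 })]
});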
Built-in Middleware Library
NodeLLM ships with production-ready middlewares:
PIIMaskMiddleware
Redacts emails, phone numbers, credit cards, and SSNs before they reach the provider:
new PIIMaskMiddleware({ mask: "[REDACTED]" })
CostGuardMiddleware
Prevents runaway costs during agentic loops:
new CostGuardMiddleware({
  maxCost: 0.05,
  onLimitExceeded: (ctx, cost) => alerting.warn(`Budget exceeded: $${cost}`)
})
UsageLoggerMiddleware
Structured logging for token usage and costs:
new UsageLoggerMiddleware({ prefix: "MY-SERVICE" })
// Output: [MY-SERVICE] req_abc123 | gpt-4o | 1,234 tokens | $0.0045
Why This Matters
The middleware system transforms NodeLLM from a "wrapper library" into production infrastructure.
You can now:
- Centralize cross-cutting concerns without touching business logic
- Compose behaviors by stacking middlewares
- Enforce organization-wide policies through global registration
- Trace and debug agentic systems with tool execution hooks
- Build compliance with automatic PII scrubbing and audit trails
This is how you scale LLM systems from prototype to production.
Upgrade Today
npm install @node-llm/core@1.10.0
Check out the full Middleware Documentation.
For production observability with cost tracking, dashboards, and database persistence, see our dedicated post: NodeLLM Monitor: Production Observability.
Building production LLM systems with NodeLLM? Join the conversation on GitHub.