How to Build an OpenAI Compatible API in TypeScript (And Why You Should)

Avoid vendor lock-in by replicating the chat completions spec. Learn how to build an OpenAI compatible API using TypeScript, Hono, and Zod.

When Google copied roughly 11,500 lines of Java SE API declarations for Android, it wasn't just saving time; it was securing its platform's survival. Oracle sued, asserting copyright over the structure, sequence, and organization of those APIs. In Google v. Oracle (2021), the Supreme Court held that Google's copying was fair use. The Court did not decide whether APIs are copyrightable at all, but the ruling is widely read as protecting developers' ability to reimplement an API for the sake of interoperability.

Today, we are living through a historical rerun. OpenAI’s /v1/chat/completions endpoint has become the de facto standard interface for large language models. If your application code is littered with imports from the vendor's openai SDK, all hard-wired to OpenAI's servers, you have voluntarily locked yourself into a single vendor's ecosystem.

To maintain architectural sovereignty, you need to decouple your client-side code from your model provider. This guide will show you how to build an OpenAI compatible API using TypeScript and Hono, allowing you to seamlessly swap between OpenAI, Anthropic, local Llama instances, or proprietary internal models without changing a single line of frontend code.

Why API Compatibility is the Ultimate Developer Leverage

Hardcoding vendor-specific SDKs into your codebase is technical debt by design. If you want to route a high-volume, low-complexity request to a cheap local model like Mistral-7B, but send complex reasoning tasks to Claude 3.5 Sonnet, a tightly coupled codebase will force you into writing endless conditional wrappers.

By learning how to build an OpenAI compatible API, you gain three immediate advantages:

  1. Zero-Downtime Model Swapping: You can point your frontend, or even third-party tools that expect an OpenAI endpoint (like Cursor, AutoGen, or LibreChat), to your custom gateway.
  2. Enterprise-Grade Auditing: You can intercept, sanitize, rate-limit, and log prompts and completions before they hit external servers.
  3. Cost Optimization: You can dynamically route requests that don't need a frontier model to cheaper self-hosted models running on vLLM or Ollama, and fail over to them when OpenAI's latency spikes.

Instead of adopting heavy, opinionated frameworks like LangChain, building a lightweight, standards-compliant proxy layer gives you complete control over your AI agent architectures.
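The routing idea in this section can be sketched as a small lookup from the requested model name to a backend. This is a minimal sketch rather than a full router: the Backend type, the ROUTES table, the model-name prefixes, and the fallback are all illustrative assumptions, and the local URLs are simply the usual defaults for Ollama and vLLM.

```typescript
// Hypothetical routing table: every name and URL here is an assumption for illustration.
type Backend = {
  name: 'anthropic' | 'ollama' | 'vllm';
  baseUrl: string;
};

const ANTHROPIC: Backend = { name: 'anthropic', baseUrl: 'https://api.anthropic.com' };
const OLLAMA: Backend = { name: 'ollama', baseUrl: 'http://localhost:11434/v1' };
const VLLM: Backend = { name: 'vllm', baseUrl: 'http://localhost:8000/v1' };

// Model-name prefixes mapped to the backend that should serve them.
const ROUTES: Array<{ prefix: string; backend: Backend }> = [
  { prefix: 'claude-', backend: ANTHROPIC },
  { prefix: 'mistral', backend: OLLAMA },
  { prefix: 'llama', backend: VLLM },
];

// Pick a backend from the `model` field of an incoming chat completion request.
// Unknown models fall back to the hosted backend (an arbitrary choice for this sketch).
function pickBackend(model: string): Backend {
  const normalized = model.toLowerCase();
  const match = ROUTES.find(route => normalized.startsWith(route.prefix));
  return match ? match.backend : ANTHROPIC;
}
```

Because clients only ever send a model string, changing where a model is served becomes a one-line edit to the table instead of a change in every caller.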


Designing the Architecture: How to Build an OpenAI Compatible API

To build a fully drop-in replacement, we must strictly adhere to OpenAI's request and response payloads. The core endpoint we need to replicate is:

POST /v1/chat/completions

Our server must support both standard JSON responses and Server-Sent Events (SSE) for real-time streaming.

┌─────────────────┐      POST /v1/chat/completions      ┌──────────────────────┐
│  Client App /   │ ──────────────────────────────────> │  Our Custom Gateway  │
│   OpenAI SDK    │ <────────────────────────────────── │   (TypeScript/Hono)  │
└─────────────────┘         JSON or SSE Stream          └──────────────────────┘
                                                                   │
                                           ┌───────────────────────┴───────────────────────┐
                                           ▼                                               ▼
                               ┌──────────────────────┐                        ┌──────────────────────┐
                               │   Anthropic API      │                        │ Local Ollama / vLLM  │
                               └──────────────────────┘                        └──────────────────────┘

For this implementation, we will use Hono running on Bun (or Node.js). Hono is small and fast, is built on the Web Standard Request, Response, and stream APIs, and has zero external dependencies, which makes it a good fit for high-throughput proxying.


Step-by-Step: How to Build an OpenAI Compatible API with Hono

Let's write a working TypeScript gateway that you can harden for production. We will implement the /v1/chat/completions endpoint, validate the incoming payload using zod, and translate the request to the Anthropic Claude API under the hood as our target backend. This shows how easily we can translate between incompatible API formats while presenting a unified OpenAI interface to our clients.
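The translation at the heart of this gateway runs in two directions: the OpenAI-style request becomes Anthropic Messages API parameters, and Anthropic's reply becomes a chat.completion object. The sketch below writes both as pure functions so they can be exercised without network access. The function names, the DEFAULT_MAX_TOKENS value, the temperature clamp, the handling of tool messages, and the trimmed-down AnthropicReply type are assumptions for illustration; the field names (system, max_tokens, stop_reason, usage.input_tokens, usage.output_tokens) follow Anthropic's Messages API.

```typescript
type OpenAIMessage = { role: 'system' | 'user' | 'assistant' | 'tool'; content: string };
type OpenAIRequest = {
  model: string;
  messages: OpenAIMessage[];
  temperature?: number;
  max_tokens?: number;
};

// Trimmed-down view of an Anthropic reply: only the fields this sketch reads.
type AnthropicReply = {
  id: string;
  content: Array<{ type: string; text?: string }>;
  stop_reason: string | null;
  usage: { input_tokens: number; output_tokens: number };
};

// Anthropic requires max_tokens; this fallback value is an arbitrary assumption.
const DEFAULT_MAX_TOKENS = 1024;

function toAnthropicParams(req: OpenAIRequest, anthropicModel: string) {
  // Anthropic takes the system prompt as a top-level field, not as a message.
  const system = req.messages
    .filter(m => m.role === 'system')
    .map(m => m.content)
    .join('\n');
  return {
    model: anthropicModel,
    max_tokens: req.max_tokens ?? DEFAULT_MAX_TOKENS,
    // OpenAI accepts 0..2 but Anthropic accepts 0..1, so clamp the value.
    temperature: Math.min(req.temperature ?? 1, 1),
    ...(system ? { system } : {}),
    // Only 'user' and 'assistant' are valid roles; 'tool' is sent as 'user' here (a simplification).
    messages: req.messages
      .filter(m => m.role !== 'system')
      .map(m => ({ role: m.role === 'assistant' ? 'assistant' : 'user', content: m.content })),
  };
}

function toOpenAICompletion(reply: AnthropicReply, requestedModel: string) {
  // Concatenate all text blocks and ignore any other block types.
  const text = reply.content
    .filter(block => block.type === 'text')
    .map(block => block.text ?? '')
    .join('');
  return {
    id: `chatcmpl-${reply.id}`,
    object: 'chat.completion',
    created: Math.floor(Date.now() / 1000),
    model: requestedModel, // echo the name the client asked for
    choices: [
      {
        index: 0,
        message: { role: 'assistant', content: text },
        // 'max_tokens' maps to 'length'; every other stop reason is reported as 'stop'.
        finish_reason: reply.stop_reason === 'max_tokens' ? 'length' : 'stop',
      },
    ],
    usage: {
      prompt_tokens: reply.usage.input_tokens,
      completion_tokens: reply.usage.output_tokens,
      total_tokens: reply.usage.input_tokens + reply.usage.output_tokens,
    },
  };
}
```

Keeping the two mappings free of I/O means the HTTP handler only has to call the upstream client between them, and each direction can be unit-tested with plain objects.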

1. Project Initialization and Schema Definitions

First, initialize your project and install the necessary dependencies:

bash
mkdir openai-compat-gateway
cd openai-compat-gateway
bun init -y
bun add hono zod @anthropic-ai/sdk

Next, create a schemas.ts file. We need to rigorously type the incoming OpenAI payload. While the official spec has dozens of parameters, we will focus on the core requirements: model, messages, stream, temperature, and max_tokens.

typescript
// schemas.ts
import { z } from 'zod';
 
export const ChatMessageSchema = z.object({
  role: z.enum(['system', 'user', 'assistant', 'tool']),
  content: z.string(),
  name: z.string().optional(),
});
 
export const OpenAICompletionRequestSchema = z.object({
  model: z.string(),
  messages: z.array(ChatMessageSchema),
  stream: z.boolean().optional().default(false),
  temperature: z.number().min(0).max(2).optional().default(1),
  max_tokens: z.number().integer().positive().optional(),
});
 
export type OpenAICompletionRequest = z.infer<typeof OpenAICompletionRequestSchema>;
export type ChatMessage = z.infer<typeof ChatMessageSchema>;
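For reference, a request that satisfies the schema above might be built like this. It is a minimal sketch: GATEWAY_URL and its port, the placeholder key, the model string, and the buildChatRequest helper are assumptions for illustration, not part of the gateway itself. The point is that a client aimed at the gateway differs from one aimed at OpenAI only in its base URL.

```typescript
// Mirrors the five fields the schema accepts; the optional ones may be omitted.
type ChatRequestBody = {
  model: string;
  messages: Array<{
    role: 'system' | 'user' | 'assistant' | 'tool';
    content: string;
    name?: string;
  }>;
  stream?: boolean;
  temperature?: number;
  max_tokens?: number;
};

// Hypothetical local address for the gateway.
const GATEWAY_URL = 'http://localhost:3000';

// Build the URL and fetch options for a chat completion call.
function buildChatRequest(baseUrl: string, apiKey: string, body: ChatRequestBody) {
  return {
    // Strip trailing slashes so the path is joined cleanly.
    url: `${baseUrl.replace(/\/+$/, '')}/v1/chat/completions`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    },
  };
}

const chatRequest = buildChatRequest(GATEWAY_URL, 'sk-placeholder', {
  model: 'claude-3-5-sonnet', // the gateway decides what this string maps to
  messages: [
    { role: 'system', content: 'You are concise.' },
    { role: 'user', content: 'Say hello.' },
  ],
  max_tokens: 64,
});

// To send it: const res = await fetch(chatRequest.url, chatRequest.init);
```

Omitting stream and temperature is valid because the schema fills in its defaults (false and 1) during parsing.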

2. Implementing the Gateway Server

Now, let's write the core server in index.ts. We will initialize Hono, parse and validate the request, and route it. To make our gateway truly drop-in, we must also handle authentication headers exactly like OpenAI does (Authorization: Bearer <KEY>).

typescript
// index.ts
import { Hono } from 'hono';
import { streamSSE } from 'hono/streaming';
import Anthropic from '@anthropic-ai/sdk';
import { OpenAICompletionRequestSchema, type OpenAICompletionRequest } from './schemas';
 
const app = new Hono();
 
// Initialize the Anthropic client using our environment variable
const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY || '',
});
 
app.post('/v1/chat/completions', async (c) => {
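  // 1. Require a Bearer token. This only checks the header's shape;
  //    compare the token against your own key store before trusting it.
  const authHeader = c.req.header('Authorization');
  if (!authHeader || !authHeader.startsWith('Bearer ')) {
    return c.json({ error: { message: 'Invalid API Key', type: 'invalid_request_error' } }, 401);
  }

  // 2. Validate the incoming payload. Malformed JSON resolves to null, which
  //    fails schema validation and yields a 400 instead of an unhandled 500.
  const body = await c.req.json().catch(() => null);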
  const parseResult = OpenAICompletionRequestSchema.safeParse(body);
  
  if (!parseResult.success) {
    return c.json({ error: { message: parseResult.error.message, type: 'invalid_request_error' } }, 400);
  }
 
  const payload = parseResult.data;
 
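  // 3. Translate OpenAI parameters to Anthropic format.
  //    Anthropic takes the system prompt as a separate top-level field and accepts
  //    only 'user' and 'assistant' roles, so 'tool' messages are sent as 'user' here.
  const systemMessage = payload.messages.find(m => m.role === 'system')?.content;
  const filteredMessages = payload.messages
    .filter(m => m.role !== 'system')
    .map(m => ({
      role: m.role === 'assistant' ? ('assistant' as const) : ('user' as const),
      content: m.content,
    }));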
 
  // 4. Route based on streaming preference
  if