Hailey API

An OpenAI-compatible AI endpoint. Point any OpenAI-style client or SDK at the Hailey gateway, authenticate with your service key, and get streaming or complete responses — with optional durable per-project context.

Base URL   https://ask.hailey.io/v1
Auth       Authorization: Bearer hly_…   (create keys at /settings/api-keys)

Quickstart
Authentication
Models & engines
POST /v1/chat/completions
Streaming
Projects — durable context
The hailey extension & headers
Parameters
Errors

Quickstart

cURL:

curl https://ask.hailey.io/v1/chat/completions \
  -H "Authorization: Bearer $HAILEY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-8",
    "messages": [{"role": "user", "content": "Hello, Hailey."}]
  }'

Python (official OpenAI SDK — just change the base URL):

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://ask.hailey.io/v1",
    api_key=os.environ["HAILEY_API_KEY"],  # same variable as the cURL example
)

resp = client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[{"role": "user", "content": "Hello, Hailey."}],
)
print(resp.choices[0].message.content)

JavaScript / TypeScript:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://ask.hailey.io/v1",
  apiKey: process.env.HAILEY_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "claude-opus-4-8",
  messages: [{ role: "user", content: "Hello, Hailey." }],
});
console.log(resp.choices[0].message.content);

Authentication

Every request needs a service key in the Authorization header, in the form Authorization: Bearer hly_… (your key follows Bearer). Keys are created per integration on the API keys page, are shown once at creation, and can be revoked at any time. Each key can carry a default project, which is the context it runs in when a request doesn't name one (see Projects).
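
For clients that build requests by hand, a minimal sketch of assembling the headers from an environment variable, so the key stays out of source control. The HAILEY_API_KEY name follows the cURL quickstart; the auth_headers helper and its hly_ prefix check are illustrative assumptions, not part of the API.

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for the gateway (hypothetical helper)."""
    key = api_key if api_key is not None else os.environ.get("HAILEY_API_KEY", "")
    # Assumption: service keys start with "hly_", as in this document's examples.
    if not key.startswith("hly_"):
        raise ValueError("expected a service key starting with 'hly_'")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Failing early on a malformed key gives a clearer message than waiting for a 401 from the gateway.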

Models & engines

GET /v1/models lists the models your key can use. A model id pins the engine (the AI backend adapter), the model family, and the version in one string — pick the id, and the gateway routes to the right backend. Currently enabled:

Model id	Engine	Context window	Max output
`claude-opus-4-8`	anthropic	200,000	32,000
`claude-opus-4-7`	anthropic	200,000	32,000
`claude-sonnet-4-6`	anthropic	200,000	64,000
`claude-haiku-4-5`	anthropic	200,000	8,192
`claude-fable-5`	anthropic	200,000	64,000

Additional engines (e.g. OpenAI-family models) appear in this list as adapters are enabled. Your integration code doesn't change; you just select a different model id. Calling a registered model whose engine isn't enabled yet returns 501 engine_not_available. If a request omits model, the project's default model is used.
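
Because the enabled set can change, an integration may want to choose its model id from the live list rather than hard-coding one. The sketch below assumes GET /v1/models returns the OpenAI list shape ({"object": "list", "data": [{"id": …}]}), which this page does not spell out; pick_model is a hypothetical helper.

```python
from typing import Any, Dict, Sequence


def pick_model(models_response: Dict[str, Any], preferred: Sequence[str]) -> str:
    """Return the first preferred model id that appears in the models list."""
    # Assumption: the response follows the OpenAI list shape, {"data": [{"id": ...}]}.
    available = {entry["id"] for entry in models_response.get("data", [])}
    for model_id in preferred:
        if model_id in available:
            return model_id
    raise LookupError(f"none of {list(preferred)} is enabled for this key")


# Hand-written sample body for illustration (not real gateway output).
sample = {"object": "list", "data": [{"id": "claude-sonnet-4-6"}, {"id": "claude-haiku-4-5"}]}
print(pick_model(sample, ["claude-opus-4-8", "claude-sonnet-4-6"]))  # claude-sonnet-4-6
```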

POST /v1/chat/completions

OpenAI chat-completions shape. Minimal request:

{
  "model": "claude-opus-4-8",
  "messages": [
    {"role": "system", "content": "Answer in one sentence."},
    {"role": "user",   "content": "What is Hailey?"}
  ]
}

Response:

{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "claude-opus-4-8",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "…" },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 123, "completion_tokens": 45, "total_tokens": 168 }
}

Send the full messages history on each call, exactly as with OpenAI: prior user/assistant turns are treated as conversation context. Any system messages in the request are appended to the project's own system prompt, so project instructions always apply first.
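
The history contract can be sketched as a small client-side helper that keeps the running message list and rebuilds the request body each turn. The History class and its method names are hypothetical; it works on plain JSON dictionaries in the request and response shapes shown above.

```python
from typing import Any, Dict, List, Optional


class History:
    """Keeps the running message list that must be re-sent on every call."""

    def __init__(self, system: Optional[str] = None) -> None:
        self.messages: List[Dict[str, str]] = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def payload(self, model: str, user_text: str) -> Dict[str, Any]:
        """Append the new user turn and return the full request body."""
        self.messages.append({"role": "user", "content": user_text})
        # Copy the list so later turns do not mutate a body already handed out.
        return {"model": model, "messages": list(self.messages)}

    def record_reply(self, response: Dict[str, Any]) -> str:
        """Store the assistant turn from a chat.completion response body."""
        content = response["choices"][0]["message"]["content"]
        self.messages.append({"role": "assistant", "content": content})
        return content
```

Each call to payload returns every earlier turn plus the new one, which is what the gateway expects when no server-side conversation is in use.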

Streaming

Set "stream": true to receive server-sent events of chat.completion.chunk objects, terminated by data: [DONE] — the standard OpenAI streaming contract, so SDK streaming helpers work unchanged. The final chunk carries usage.

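messages = [{"role": "user", "content": "Hello, Hailey."}]
stream = client.chat.completions.create(model="claude-opus-4-8", messages=messages, stream=True)
for chunk in stream:
    if chunk.choices:  # skip any chunk without choices, e.g. a usage-only final chunk
        print(chunk.choices[0].delta.content or "", end="")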
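
For clients that read the event stream without an SDK, a rough sketch of consuming the data: lines by hand. It assumes each event arrives as a single data: line of JSON, and it tolerates a final usage chunk whether or not that chunk also carries choices (this page does not say which). collect_stream is a hypothetical helper.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Join streamed delta text and return it with the usage block, if any."""
    parts: List[str] = []
    usage: Optional[Dict[str, Any]] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # terminator: nothing after this belongs to the response
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            parts.append(delta.get("content") or "")
    return "".join(parts), usage
```

Any iterable of decoded lines works as input, for example the line iterator of an HTTP client's streaming response.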

Projects — durable context

A plain gateway call is stateless: it runs in a fresh, empty sandbox. A project gives your calls durable context — the gateway equivalent of running inside your project directory:

What a project carries	Effect on every call
System prompt	Always-on instructions, applied before any request `system` message.
`CLAUDE.md`	A project brief placed at the workspace root and auto-loaded by the engine.
Context files	Seeded into the call's workspace — the model can read them while answering.
Defaults	Model (engine + version), parameters, tool allowlist, workspace mode.

Create and manage projects at /settings/projects. Bind one to a key as its default, or select per request with hailey.project. Keys without a project use an auto-provisioned personal default project. A key can only reach projects owned by the same account.

Workspace modes

Mode	Behavior	Concurrency
`conversation` (default)	Each call gets a fresh workspace seeded from the project's context files. File mutations are discarded after the call.	Fully parallel.
`shared`	Calls run inside the project's persistent workspace — files written by one call are visible to the next.	Serialized per project; a concurrent call gets `409 project_busy`.
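
Since shared-mode calls are serialized per project, a client that issues them from several threads can queue them locally instead of running into 409 project_busy. The sketch below uses one lock per project; it only coordinates callers inside a single process, so other processes or machines can still trigger a 409. run_shared is a hypothetical helper.

```python
import threading
from typing import Callable, Dict, TypeVar

T = TypeVar("T")

_locks: Dict[str, threading.Lock] = {}
_registry_guard = threading.Lock()


def run_shared(project: str, call: Callable[[], T]) -> T:
    """Run call() while holding this process's lock for the given project."""
    with _registry_guard:  # protect the registry while looking up or creating a lock
        lock = _locks.setdefault(project, threading.Lock())
    with lock:  # at most one shared-mode call per project at a time
        return call()
```

Calls for different projects still run in parallel, matching the per-project scope of the gateway's serialization.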

Conversations (shared mode)

In shared mode you may pass a conversation id of your choosing. The gateway maps it to a persistent engine session and resumes it on every call with the same id — server-side memory without re-sending history:

"hailey": { "workspace": "shared", "conversation": "onboarding-7c2" }

The `hailey` extension & headers

Hailey-specific options ride in an optional hailey object in the request body (use extra_body in the OpenAI SDKs). Clients that can't add body fields can send the equivalent headers.

{
  "model": "claude-opus-4-8",
  "messages": [ … ],
  "hailey": {
    "project":      "birthday-gold",   // project slug or id (default: the key's project)
    "workspace":    "shared",          // 'conversation' (default) | 'shared'
    "conversation": "onboard-7c2",     // shared mode only: resume this session
    "params":       { "max_turns": 8 } // per-call engine parameter overrides
  }
}

# Header equivalents
X-Hailey-Project: birthday-gold
X-Hailey-Workspace: shared
X-Hailey-Conversation: onboard-7c2

Python SDK example:

client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[{"role": "user", "content": "Summarize the seed data."}],
    extra_body={"hailey": {"project": "birthday-gold"}},
)
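
A sketch of building the hailey options once and emitting them either as the body object or as the equivalent headers. The client-side checks mirror the invalid_workspace_mode and conversation_requires_shared codes listed under Errors; the helper names are hypothetical, and requiring an explicit workspace="shared" whenever a conversation id is passed is a deliberately strict choice of this sketch, not a documented rule.

```python
from typing import Dict, Optional

WORKSPACE_MODES = ("conversation", "shared")
HEADER_NAMES = {
    "project": "X-Hailey-Project",
    "workspace": "X-Hailey-Workspace",
    "conversation": "X-Hailey-Conversation",
}


def hailey_options(
    project: Optional[str] = None,
    workspace: Optional[str] = None,
    conversation: Optional[str] = None,
) -> Dict[str, str]:
    """Build the `hailey` body object, omitting fields that were not set."""
    if workspace is not None and workspace not in WORKSPACE_MODES:
        raise ValueError(f"invalid workspace mode: {workspace!r}")
    # Sketch-level rule: a conversation id must come with an explicit shared mode.
    if conversation is not None and workspace != "shared":
        raise ValueError("a conversation id requires workspace='shared'")
    fields = {"project": project, "workspace": workspace, "conversation": conversation}
    return {name: value for name, value in fields.items() if value is not None}


def hailey_headers(options: Dict[str, str]) -> Dict[str, str]:
    """Translate body options into the X-Hailey-* header equivalents."""
    return {HEADER_NAMES[name]: value for name, value in options.items()}
```

The options dictionary can be passed as extra_body={"hailey": options} in the OpenAI SDKs, or converted with hailey_headers for clients that cannot add body fields. Only the three fields with documented header equivalents are covered; params has no header form listed here.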

Parameters

Parameters resolve in three layers: project defaults → per-request hailey.params → standard OpenAI top-level fields. Support depends on the engine; unsupported parameters are accepted and ignored, never fatal:

Parameter	Where	anthropic engine
`model`	top level	Honored — selects engine, family, and version.
`stream`	top level	Honored.
`messages` (system)	top level	Honored — appended to the project system prompt.
`max_turns`	`hailey.params` / project	Honored — caps agentic tool-use turns (1–50, default 20).
tool allowlist	project setting	Honored — restricts which tools the engine may use.
`temperature`, `top_p`, `max_tokens`	top level	Accepted, ignored (engine manages sampling and output budget).
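
The layering can be modelled client-side as successive dictionary merges. This sketch assumes each later layer overrides the one before it, which is one reading of the arrow order and is not confirmed on this page; it also treats the max_turns default of 20 as the base under all layers and rejects values outside 1 to 50. resolve_params is a hypothetical helper.

```python
from typing import Any, Dict


def resolve_params(
    project_defaults: Dict[str, Any],
    request_params: Dict[str, Any],
    top_level: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge the three layers in order; later layers win (assumed precedence)."""
    resolved: Dict[str, Any] = {"max_turns": 20}  # documented default
    for layer in (project_defaults, request_params, top_level):
        resolved.update({k: v for k, v in layer.items() if v is not None})
    if not 1 <= resolved["max_turns"] <= 50:
        raise ValueError("max_turns must be between 1 and 50")
    return resolved


print(resolve_params({"max_turns": 12}, {"max_turns": 8}, {"temperature": 0.2}))
# {'max_turns': 8, 'temperature': 0.2}
```

On the anthropic engine the merged temperature would still be ignored, as the table notes; the merge only shows which value would reach the engine.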

Errors

Errors use the OpenAI error envelope:

{ "error": { "type": "authentication_error", "message": "Invalid API key.", "code": null } }
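
A sketch of turning the envelope into a typed exception, so calling code can branch on status and code rather than on message text. HaileyError and error_from_response are hypothetical names; the parser falls back to empty fields if the body is not the expected envelope.

```python
import json
from typing import Optional


class HaileyError(Exception):
    """Carries the HTTP status plus the fields of the error envelope."""

    def __init__(self, status: int, type_: Optional[str], message: str, code: Optional[str]) -> None:
        super().__init__(f"{status} {code or type_ or 'error'}: {message}")
        self.status = status
        self.type = type_
        self.message = message
        self.code = code


def error_from_response(status: int, body: str) -> HaileyError:
    """Parse an error response body into a HaileyError."""
    try:
        envelope = json.loads(body).get("error")
    except (ValueError, AttributeError):
        envelope = None  # not JSON, or not a JSON object
    if not isinstance(envelope, dict):
        envelope = {}
    return HaileyError(status, envelope.get("type"), envelope.get("message") or "", envelope.get("code"))
```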

Status	Code	Meaning
401	—	Missing or invalid `Authorization: Bearer hly_…`.
400	`invalid_workspace_mode`, `conversation_requires_shared`, …	Malformed request.
404	`project_not_found`	Project unknown to this account (or archived).
404	`model_not_found`	Unknown or disabled model id.
409	`project_busy`	Another `shared`-mode call holds the project. Retry with backoff.
501	`engine_not_available`	The model's engine has no adapter enabled yet.
502	`upstream_error`	The engine failed mid-call. Safe to retry.
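
The two retryable rows in the table (409 project_busy and 502 upstream_error) can be handled with a small backoff loop. This is a sketch under stated assumptions: send is any caller-supplied function returning (status, parsed_body), and the attempt count, delays, and jitter are illustrative values, not gateway recommendations.

```python
import random
import time
from typing import Any, Callable, Dict, Tuple

RETRYABLE = {(409, "project_busy"), (502, "upstream_error")}

Response = Tuple[int, Dict[str, Any]]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-retryable result or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        error = body.get("error") if isinstance(body, dict) else None
        code = error.get("code") if isinstance(error, dict) else None
        last_attempt = attempt == max_attempts - 1
        if (status, code) not in RETRYABLE or last_attempt:
            return status, body
        # Exponential backoff with jitter so concurrent clients do not retry in lockstep.
        sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise AssertionError("unreachable")  # the loop always returns
```

All other statuses, including 400, 401, 404, and 501, are returned immediately, since repeating the same request would not change the outcome.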

Need a key? Request access, then create one under Settings → API keys.