Hailey API
An OpenAI-compatible AI endpoint. Point any OpenAI-style client or SDK at the Hailey gateway, authenticate with your service key, and get streaming or complete responses — with optional durable per-project context.
Base URL https://ask.hailey.io/v1
Auth Authorization: Bearer hly_… (create keys at /settings/api-keys)
- Quickstart
- Authentication
- Models & engines
- POST /v1/chat/completions
- Streaming
- Projects — durable context
- The hailey extension & headers
- Parameters
- Errors
Quickstart
cURL:
curl https://ask.hailey.io/v1/chat/completions \
  -H "Authorization: Bearer $HAILEY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-8",
    "messages": [{"role": "user", "content": "Hello, Hailey."}]
  }'
Python (official OpenAI SDK — just change the base URL):
from openai import OpenAI

client = OpenAI(
    base_url="https://ask.hailey.io/v1",
    api_key="hly_…",
)
resp = client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[{"role": "user", "content": "Hello, Hailey."}],
)
print(resp.choices[0].message.content)
JavaScript / TypeScript:
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://ask.hailey.io/v1",
  apiKey: process.env.HAILEY_API_KEY,
});
const resp = await client.chat.completions.create({
  model: "claude-opus-4-8",
  messages: [{ role: "user", content: "Hello, Hailey." }],
});
console.log(resp.choices[0].message.content);
Authentication
Every request needs a service key in the Authorization header:
Authorization: Bearer hly_…. Keys are created per integration on the
API keys page, are shown once at creation, and can
be revoked at any time. Each key can carry a default project — the context
it runs in when a request doesn't name one (see Projects).
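As a rough illustration of the header described above, the sketch below builds (but does not send) an authenticated request using only the Python standard library. The helper name `build_chat_request` is hypothetical, and the `hly_` prefix check is a client-side convenience inferred from the key format shown on this page, not a documented rule.

```python
import json
import urllib.request

BASE_URL = "https://ask.hailey.io/v1"  # base URL from this page


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    if not api_key.startswith("hly_"):
        # Assumption: keys look like "hly_...". The gateway is the real authority.
        raise ValueError("Hailey service keys are expected to start with 'hly_'")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; in practice the key would usually come from an environment variable such as `HAILEY_API_KEY` rather than from source code.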
Models & engines
GET /v1/models lists the models your key can
use. A model id pins the engine (the AI backend adapter), the
model family, and the version in one string — pick
the id, and the gateway routes to the right backend. Currently enabled:
| Model id | Engine | Context window | Max output |
|---|---|---|---|
| claude-opus-4-8 | anthropic | 200,000 | 32,000 |
| claude-opus-4-7 | anthropic | 200,000 | 32,000 |
| claude-sonnet-4-6 | anthropic | 200,000 | 64,000 |
| claude-haiku-4-5 | anthropic | 200,000 | 8,192 |
| claude-fable-5 | anthropic | 200,000 | 64,000 |
Additional engines (e.g. OpenAI-family models) appear in this list as adapters are
enabled — your integration code doesn't change, you just select a different model id.
Calling a registered model whose engine isn't enabled yet returns
501 engine_not_available. If a request omits model, the
project's default model is used.
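One way to use the model list is to check it before choosing an id. The sketch below assumes that GET /v1/models returns the OpenAI list shape (`{"object": "list", "data": [{"id": …}]}`), which this page does not spell out; the helper names `model_ids` and `pick_model` are hypothetical.

```python
import json
from typing import List, Optional


def model_ids(models_response: str) -> List[str]:
    """Return the model ids from a /v1/models response body.

    Assumes the OpenAI list shape: {"object": "list", "data": [{"id": ...}]}.
    """
    payload = json.loads(models_response)
    return [entry["id"] for entry in payload.get("data", [])]


def pick_model(available: List[str], preferred: List[str]) -> Optional[str]:
    """Return the first preferred id that the key can use, else None."""
    for model_id in preferred:
        if model_id in available:
            return model_id
    return None
```

Returning `None` leaves the fallback decision to the caller, for example omitting `model` so that the project's default applies.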
POST /v1/chat/completions
OpenAI chat-completions shape. Minimal request:
{
  "model": "claude-opus-4-8",
  "messages": [
    {"role": "system", "content": "Answer in one sentence."},
    {"role": "user", "content": "What is Hailey?"}
  ]
}
Response:
{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "claude-opus-4-8",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "…" },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 123, "completion_tokens": 45, "total_tokens": 168 }
}
Send the full messages history each call, exactly as with OpenAI —
prior user/assistant turns are treated as conversation
context. system messages are appended to the project's own system
prompt (project instructions always apply first).
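Because the full history is re-sent on every call, the client has to track it. The following is a minimal sketch of that bookkeeping, operating on plain dicts in the request and response shapes shown above; the helper names are hypothetical and no network call is made.

```python
from typing import Dict, List

Message = Dict[str, str]


def extract_reply(response: dict) -> str:
    """Pull the assistant text out of a chat.completion response."""
    return response["choices"][0]["message"]["content"]


def next_request(model: str, history: List[Message], user_text: str) -> dict:
    """Build the next request body, re-sending the full history."""
    return {
        "model": model,
        "messages": history + [{"role": "user", "content": user_text}],
    }


def record_turn(request: dict, response: dict) -> List[Message]:
    """Return the history extended with the assistant's latest reply."""
    return request["messages"] + [
        {"role": "assistant", "content": extract_reply(response)}
    ]
```

Each round trip then becomes `request = next_request(model, history, text)`, send it, and `history = record_turn(request, response)`.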
Streaming
Set "stream": true to receive server-sent events of
chat.completion.chunk objects, terminated by data: [DONE] —
the standard OpenAI streaming contract, so SDK streaming helpers work unchanged.
The final chunk carries usage.
stream = client.chat.completions.create(model="…", messages=[…], stream=True)
for chunk in stream:
    # Skip chunks without choices (a usage-only final chunk may have none).
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
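For clients that read the event stream without an SDK, the contract above can be sketched as a small parser. It takes already-decoded text lines, so it is independent of any HTTP library; the function name `iter_stream_text` is hypothetical, and the handling of a chunk with an empty `choices` list is an assumption based on the OpenAI convention for usage-only chunks.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # Assumption: a usage-only final chunk may have an empty choices list.
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Joining the yielded pieces reconstructs the full assistant message.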
Projects — durable context
A plain gateway call is stateless: it runs in a fresh, empty sandbox. A project gives your calls durable context — the gateway equivalent of running inside your project directory:
| What a project carries | Effect on every call |
|---|---|
| System prompt | Always-on instructions, applied before any request system message. |
| CLAUDE.md | A project brief placed at the workspace root and auto-loaded by the engine. |
| Context files | Seeded into the call's workspace — the model can read them while answering. |
| Defaults | Model (engine + version), parameters, tool allowlist, workspace mode. |
Create and manage projects at /settings/projects.
Bind one to a key as its default, or select per request with
hailey.project. Keys without a project use an auto-provisioned personal
default project. A key can only reach projects owned by the same account.
Workspace modes
| Mode | Behavior | Concurrency |
|---|---|---|
| conversation (default) | Each call gets a fresh workspace seeded from the project's context files. File mutations are discarded after the call. | Fully parallel. |
| shared | Calls run inside the project's persistent workspace, so files written by one call are visible to the next. | Serialized per project; a concurrent call gets 409 project_busy. |
Conversations (shared mode)
In shared mode you may pass a conversation id of your
choosing. The gateway maps it to a persistent engine session and resumes it on every
call with the same id — server-side memory without re-sending history:
"hailey": { "workspace": "shared", "conversation": "onboarding-7c2" }
The hailey extension & headers
Hailey-specific options ride in an optional hailey object in the request
body (use extra_body in the OpenAI SDKs). Clients that can't add body
fields can send the equivalent headers.
{
  "model": "claude-opus-4-8",
  "messages": [ … ],
  "hailey": {
    "project": "birthday-gold",     // project slug or id (default: the key's project)
    "workspace": "shared",          // 'conversation' (default) | 'shared'
    "conversation": "onboard-7c2",  // shared mode only: resume this session
    "params": { "max_turns": 8 }    // per-call engine parameter overrides
  }
}
# Header equivalents
X-Hailey-Project: birthday-gold
X-Hailey-Workspace: shared
X-Hailey-Conversation: onboard-7c2
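The body-to-header mapping above can be sketched as a small translation helper. The function name `hailey_headers` is hypothetical. Since no header form is shown for `params`, the sketch refuses that field rather than guessing one.

```python
HEADER_NAMES = {
    "project": "X-Hailey-Project",
    "workspace": "X-Hailey-Workspace",
    "conversation": "X-Hailey-Conversation",
}


def hailey_headers(options: dict) -> dict:
    """Translate a `hailey` body object into its header equivalents."""
    unsupported = set(options) - set(HEADER_NAMES)
    if unsupported:
        # e.g. "params": no header form is documented, so refuse to guess.
        raise ValueError(f"no documented header for: {sorted(unsupported)}")
    return {HEADER_NAMES[key]: str(value) for key, value in options.items()}
```

With the OpenAI Python SDK, the resulting dict could be passed through the `extra_headers` argument of `client.chat.completions.create`.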
Python SDK example:
client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[{"role": "user", "content": "Summarize the seed data."}],
    extra_body={"hailey": {"project": "birthday-gold"}},
)
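When the `hailey` object is assembled by hand, it is easy to pair a conversation id with the wrong workspace mode. The sketch below builds the object and rejects such combinations before the request is sent. The helper name `hailey_options` is hypothetical, and the link between each check and the `invalid_workspace_mode` and `conversation_requires_shared` error codes is inferred from the code names, not confirmed by this page.

```python
from typing import Optional

WORKSPACE_MODES = ("conversation", "shared")


def hailey_options(
    project: Optional[str] = None,
    workspace: str = "conversation",
    conversation: Optional[str] = None,
) -> dict:
    """Build the `hailey` request object, validating it client-side."""
    if workspace not in WORKSPACE_MODES:
        # Assumed to mirror the gateway's 400 invalid_workspace_mode.
        raise ValueError(f"invalid_workspace_mode: {workspace!r}")
    if conversation is not None and workspace != "shared":
        # Assumed to mirror the gateway's 400 conversation_requires_shared.
        raise ValueError("conversation_requires_shared: set workspace='shared'")
    options = {"workspace": workspace}
    if project is not None:
        options["project"] = project
    if conversation is not None:
        options["conversation"] = conversation
    return options
```

The result can be passed as `extra_body={"hailey": hailey_options(...)}`.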
Parameters
Parameters resolve in three layers: project defaults → per-request
hailey.params → standard OpenAI top-level fields. Support depends on the
engine; unsupported parameters are accepted and ignored, never fatal:
| Parameter | Where | anthropic engine |
|---|---|---|
| model | top level | Honored: selects engine, family, and version. |
| stream | top level | Honored. |
| messages (system) | top level | Honored: appended to the project system prompt. |
| max_turns | hailey.params / project | Honored: caps agentic tool-use turns (1–50, default 20). |
| tool allowlist | project setting | Honored: restricts which tools the engine may use. |
| temperature, top_p, max_tokens | top level | Accepted, ignored (engine manages sampling and output budget). |
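To illustrate the layering for `max_turns`, the sketch below resolves an effective value from a project default and a per-request override, using the documented range (1–50) and default (20). The function name is hypothetical. Raising on out-of-range values is a client-side choice made for this sketch; how the gateway itself treats them is not stated here.

```python
from typing import Optional

MAX_TURNS_DEFAULT = 20
MAX_TURNS_RANGE = range(1, 51)  # 1 to 50 inclusive


def resolve_max_turns(
    project_default: Optional[int] = None,
    per_request: Optional[int] = None,
) -> int:
    """Pick the effective max_turns: per-request, then project, then 20."""
    for value in (per_request, project_default):
        if value is not None:
            if value not in MAX_TURNS_RANGE:
                # Client-side choice: fail early instead of sending a bad value.
                raise ValueError(f"max_turns must be between 1 and 50, got {value}")
            return value
    return MAX_TURNS_DEFAULT
```

For example, a project default of 12 with a per-request value of 8 resolves to 8.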
Errors
Errors use the OpenAI error envelope:
{ "error": { "type": "authentication_error", "message": "Invalid API key.", "code": null } }
| Status | Code | Meaning |
|---|---|---|
| 401 | — | Missing or invalid Authorization: Bearer hly_…. |
| 400 | invalid_workspace_mode, conversation_requires_shared, … | Malformed request. |
| 404 | project_not_found | Project unknown to this account (or archived). |
| 404 | model_not_found | Unknown or disabled model id. |
| 409 | project_busy | Another shared-mode call holds the project. Retry with backoff. |
| 501 | engine_not_available | The model's engine has no adapter enabled yet. |
| 502 | upstream_error | The engine failed mid-call. Safe to retry. |
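The retry guidance in the table can be sketched as a small policy: retry only 409 project_busy and 502 upstream_error, with exponentially growing delays. The helper names, the 0.5 s base, and the 8 s cap are illustrative assumptions, as is the premise that the table's codes appear in the envelope's `code` field.

```python
import json
from typing import Iterator, Optional, Tuple

# Status/code pairs that the table above marks as retryable.
RETRYABLE = {(409, "project_busy"), (502, "upstream_error")}


def parse_error(body: str) -> Tuple[Optional[str], str]:
    """Return (code, message) from an OpenAI-style error envelope."""
    error = json.loads(body)["error"]
    return error.get("code"), error.get("message", "")


def should_retry(status: int, code: Optional[str]) -> bool:
    """True only for the errors documented as worth retrying."""
    return (status, code) in RETRYABLE


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> Iterator[float]:
    """Yield exponentially growing delays in seconds, capped at `cap`."""
    for attempt in range(attempts):
        yield min(cap, base * 2 ** attempt)
```

A caller would sleep for each yielded delay between attempts and stop as soon as `should_retry` returns false. Adding random jitter to the delays is a common refinement when many clients share one project.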
Need a key? Request access, then create one under Settings → API keys.