Hailey API

An OpenAI-compatible AI endpoint. Point any OpenAI-style client or SDK at the Hailey gateway, authenticate with your service key, and get streaming or complete responses — with optional durable per-project context.

Base URL   https://ask.hailey.io/v1
Auth       Authorization: Bearer hly_…   (create keys at /settings/api-keys)

Quickstart

cURL:

curl https://ask.hailey.io/v1/chat/completions \
  -H "Authorization: Bearer $HAILEY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-8",
    "messages": [{"role": "user", "content": "Hello, Hailey."}]
  }'

Python (official OpenAI SDK — just change the base URL):

from openai import OpenAI

client = OpenAI(
    base_url="https://ask.hailey.io/v1",
    api_key="hly_…",
)

resp = client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[{"role": "user", "content": "Hello, Hailey."}],
)
print(resp.choices[0].message.content)

JavaScript / TypeScript:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://ask.hailey.io/v1",
  apiKey: process.env.HAILEY_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "claude-opus-4-8",
  messages: [{ role: "user", content: "Hello, Hailey." }],
});
console.log(resp.choices[0].message.content);

Authentication

Every request needs a service key in the Authorization header (Authorization: Bearer hly_…). Keys are created per integration on the API keys page, are shown only once at creation, and can be revoked at any time. Each key can carry a default project, which is the context a request runs in when it doesn't name one (see Projects).
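
A minimal sketch of loading the key from the environment instead of hardcoding it. The variable name HAILEY_API_KEY follows the Quickstart; the helper name auth_headers and the prefix check are illustrative assumptions, not part of the API.

```python
import os


def auth_headers(env=os.environ):
    """Build request headers from the HAILEY_API_KEY environment variable."""
    key = env.get("HAILEY_API_KEY", "")
    # Assumption: service keys start with "hly_", as in the examples here.
    if not key.startswith("hly_"):
        raise ValueError("HAILEY_API_KEY is missing or does not look like a service key")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Failing early on a missing key gives a clearer message than waiting for a 401 from the gateway.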

Models & engines

GET /v1/models lists the models your key can use. A model id pins the engine (the AI backend adapter), the model family, and the version in one string — pick the id, and the gateway routes to the right backend. Currently enabled:

Model id            Engine      Context window   Max output
claude-opus-4-8     anthropic   200,000          32,000
claude-opus-4-7     anthropic   200,000          32,000
claude-sonnet-4-6   anthropic   200,000          64,000
claude-haiku-4-5    anthropic   200,000          8,192
claude-fable-5      anthropic   200,000          64,000

Additional engines (e.g. OpenAI-family models) appear in this list as adapters are enabled — your integration code doesn't change, you just select a different model id. Calling a registered model whose engine isn't enabled yet returns 501 engine_not_available. If a request omits model, the project's default model is used.
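
Because the enabled list changes over time, an integration may want to pick its model at startup rather than hardcode one. The sketch below assumes GET /v1/models returns the OpenAI list shape ({"data": [{"id": …}]}), which this page does not spell out; fetch_models, model_ids, and pick_model are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://ask.hailey.io/v1"


def fetch_models(api_key):
    """GET /v1/models and return the decoded JSON payload."""
    req = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def model_ids(payload):
    """Extract model ids, assuming the OpenAI list shape {"data": [{"id": ...}]}."""
    return [entry["id"] for entry in payload.get("data", [])]


def pick_model(available, preferred):
    """Return the first preferred id that the key can use, else None."""
    for model_id in preferred:
        if model_id in available:
            return model_id
    return None
```

For example, pick_model(model_ids(fetch_models(key)), ["claude-opus-4-8", "claude-sonnet-4-6"]) would fall back to the second id if the first is not enabled for the key.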

POST /v1/chat/completions

OpenAI chat-completions shape. Minimal request:

{
  "model": "claude-opus-4-8",
  "messages": [
    {"role": "system", "content": "Answer in one sentence."},
    {"role": "user",   "content": "What is Hailey?"}
  ]
}

Response:

{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "claude-opus-4-8",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "…" },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 123, "completion_tokens": 45, "total_tokens": 168 }
}

Send the full messages history on each call, exactly as with OpenAI: prior user and assistant turns are treated as conversation context. Any system messages in the request are appended to the project's own system prompt, so project instructions always apply first.
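
Since the history is re-sent on every call, the client has to keep the transcript. A minimal sketch of that bookkeeping follows; the Transcript class and its method names are illustrative, and the response is handled as a plain dict in the shape shown above.

```python
class Transcript:
    """Client-side message history, re-sent in full on every call."""

    def __init__(self, model, system=None):
        self.model = model
        self.messages = []
        if system:
            # The gateway appends this to the project's system prompt.
            self.messages.append({"role": "system", "content": system})

    def request_body(self, user_text):
        """Record a user turn and return the body for POST /v1/chat/completions."""
        self.messages.append({"role": "user", "content": user_text})
        # Copy the list so later turns do not mutate a body already built.
        return {"model": self.model, "messages": list(self.messages)}

    def record_reply(self, response):
        """Store the assistant turn from a chat.completion response dict."""
        message = response["choices"][0]["message"]
        self.messages.append({"role": message["role"], "content": message["content"]})
        return message["content"]
```

Each request_body call returns the whole history so far, so the second user turn goes out with the system message, the first question, and the first answer attached.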

Streaming

Set "stream": true to receive server-sent events of chat.completion.chunk objects, terminated by data: [DONE] — the standard OpenAI streaming contract, so SDK streaming helpers work unchanged. The final chunk carries usage.

stream = client.chat.completions.create(model="…", messages=[…], stream=True)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
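
For clients without an SDK streaming helper, the event stream can be parsed by hand. This is a sketch under two assumptions: each event arrives as a single data: line, and the usage-bearing final chunk may have an empty choices list (as it does with OpenAI). The name read_stream is illustrative.

```python
import json


def read_stream(lines):
    """Collect text and usage from an iterable of SSE lines.

    Returns (text, usage); usage is None if no chunk carried it.
    """
    parts, usage = [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # delta.content can be absent or null on role/finish chunks.
            parts.append(choice.get("delta", {}).get("content") or "")
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage
```

The function takes any iterable of decoded text lines, so it can be fed from an HTTP response body or from a recorded fixture in tests.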

Projects — durable context

A plain gateway call is stateless: it runs in a fresh, empty sandbox. A project gives your calls durable context — the gateway equivalent of running inside your project directory:

What a project carries   Effect on every call
System prompt            Always-on instructions, applied before any request system message.
CLAUDE.md                A project brief placed at the workspace root and auto-loaded by the engine.
Context files            Seeded into the call's workspace; the model can read them while answering.
Defaults                 Model (engine + version), parameters, tool allowlist, workspace mode.

Create and manage projects at /settings/projects. Bind one to a key as its default, or select per request with hailey.project. Keys without a project use an auto-provisioned personal default project. A key can only reach projects owned by the same account.

Workspace modes

Mode          conversation (default)
Behavior      Each call gets a fresh workspace seeded from the project's context files. File mutations are discarded after the call.
Concurrency   Fully parallel.

Mode          shared
Behavior      Calls run inside the project's persistent workspace; files written by one call are visible to the next.
Concurrency   Serialized per project; a concurrent call gets 409 project_busy.

Conversations (shared mode)

In shared mode you may pass a conversation id of your choosing. The gateway maps it to a persistent engine session and resumes it on every call with the same id — server-side memory without re-sending history:

"hailey": { "workspace": "shared", "conversation": "onboarding-7c2" }
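
With a conversation id, each request only needs the new turn. A minimal sketch of building such a body is below; the helper name shared_turn is hypothetical, and the claim that a single new user message is enough rests on the "without re-sending history" description above.

```python
def shared_turn(model, conversation_id, user_text, project=None):
    """Body for one turn of a server-side conversation in shared mode.

    Only the new user message is sent; the gateway resumes the session.
    """
    if not conversation_id:
        raise ValueError("conversation_id must be a non-empty string")
    hailey = {"workspace": "shared", "conversation": conversation_id}
    if project is not None:
        hailey["project"] = project
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "hailey": hailey,
    }
```

Because shared mode is serialized per project, callers of a helper like this should expect an occasional 409 project_busy under concurrency.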

The hailey extension & headers

Hailey-specific options ride in an optional hailey object in the request body (use extra_body in the OpenAI SDKs). Clients that can't add body fields can send the equivalent headers.

{
  "model": "claude-opus-4-8",
  "messages": [ … ],
  "hailey": {
    "project":      "birthday-gold",   // project slug or id (default: the key's project)
    "workspace":    "shared",          // 'conversation' (default) | 'shared'
    "conversation": "onboard-7c2",     // shared mode only: resume this session
    "params":       { "max_turns": 8 } // per-call engine parameter overrides
  }
}

# Header equivalents
X-Hailey-Project: birthday-gold
X-Hailey-Workspace: shared
X-Hailey-Conversation: onboard-7c2

Python SDK example:

client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[{"role": "user", "content": "Summarize the seed data."}],
    extra_body={"hailey": {"project": "birthday-gold"}},
)
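
For clients that can only set headers, the same options can be translated mechanically. The sketch below uses the three header names listed above; the helper name hailey_headers is illustrative, and treating params as having no header form is an assumption based on its absence from that list.

```python
HEADER_NAMES = {
    "project": "X-Hailey-Project",
    "workspace": "X-Hailey-Workspace",
    "conversation": "X-Hailey-Conversation",
}


def hailey_headers(options):
    """Translate a hailey options dict into the equivalent request headers."""
    if options.get("conversation") and options.get("workspace") != "shared":
        # Mirrors the documented rule that conversation is shared mode only.
        raise ValueError("conversation requires workspace='shared'")
    unsupported = set(options) - set(HEADER_NAMES)
    if unsupported:
        # Assumption: only these three options have header forms.
        raise ValueError(f"no header equivalent for: {sorted(unsupported)}")
    return {HEADER_NAMES[key]: str(value) for key, value in options.items()}
```

With the OpenAI Python SDK, the resulting dict can be passed as extra_headers on a single call or as default_headers when constructing the client.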

Parameters

Parameters resolve in three layers: project defaults → per-request hailey.params → standard OpenAI top-level fields. Support depends on the engine; unsupported parameters are accepted and ignored, never fatal:

Parameter                        Where                     anthropic engine
model                            top level                 Honored: selects engine, family, and version.
stream                           top level                 Honored.
messages (system)                top level                 Honored: appended to the project system prompt.
max_turns                        hailey.params / project   Honored: caps agentic tool-use turns (1–50, default 20).
tool allowlist                   project setting           Honored: restricts which tools the engine may use.
temperature, top_p, max_tokens   top level                 Accepted, ignored (engine manages sampling and output budget).
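
The layering can be pictured as a dictionary merge. This sketch assumes that later layers override earlier ones, which the arrow notation suggests but this page does not state outright; resolve_params and honored_by_anthropic are hypothetical names, and the ignored set is taken from the table above.

```python
# Top-level fields the anthropic engine accepts but ignores (from the table).
IGNORED_BY_ANTHROPIC = {"temperature", "top_p", "max_tokens"}


def resolve_params(project_defaults, hailey_params=None, top_level=None):
    """Merge the three layers; later layers win (an assumption)."""
    resolved = dict(project_defaults)
    resolved.update(hailey_params or {})
    resolved.update(top_level or {})
    return resolved


def honored_by_anthropic(resolved):
    """Drop parameters that the anthropic engine accepts but ignores."""
    return {k: v for k, v in resolved.items() if k not in IGNORED_BY_ANTHROPIC}
```

Under that assumption, a project default of max_turns 20 combined with a request carrying hailey.params {"max_turns": 8} resolves to 8, and a top-level temperature is accepted but has no effect on the anthropic engine.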

Errors

Errors use the OpenAI error envelope:

{ "error": { "type": "authentication_error", "message": "Invalid API key.", "code": null } }

Status   Code                                                      Meaning
401      -                                                         Missing or invalid Authorization: Bearer hly_….
400      invalid_workspace_mode, conversation_requires_shared, …   Malformed request.
404      project_not_found                                         Project unknown to this account (or archived).
404      model_not_found                                           Unknown or disabled model id.
409      project_busy                                              Another shared-mode call holds the project. Retry with backoff.
501      engine_not_available                                      The model's engine has no adapter enabled yet.
502      upstream_error                                            The engine failed mid-call. Safe to retry.
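
The two retryable cases (409 project_busy and 502 upstream_error) can share one backoff loop. This is a sketch only: GatewayError is a hypothetical client-side exception that your HTTP layer would raise after decoding the error envelope, and the attempt count and delays are arbitrary choices, not values recommended by the gateway.

```python
import random
import time

# (status, code) pairs this page describes as worth retrying.
RETRYABLE = {(409, "project_busy"), (502, "upstream_error")}


class GatewayError(Exception):
    """Hypothetical client-side wrapper for an error envelope."""

    def __init__(self, status, code, message=""):
        super().__init__(message or code or str(status))
        self.status = status
        self.code = code


def call_with_retry(send, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() and retry the retryable errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return send()
        except GatewayError as err:
            last_try = attempt == attempts - 1
            if (err.status, err.code) not in RETRYABLE or last_try:
                raise
            # Delay doubles each attempt, plus up to 25% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 4))
```

The sleep parameter is injectable so the loop can be exercised in tests without waiting; all other errors, such as 401 or 404, are re-raised immediately.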

Need a key? Request access, then create one under Settings → API keys.