Chat Completions (Streaming)

Receive AI output in real time over Server-Sent Events (SSE), streamed back token by token.

POST https://allmodel.top/v1/chat/completions

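The only difference from a non-streaming request is that the request body includes "stream": true. The result is then returned as a sequence of SSE data chunks instead of a single JSON object.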

SSE data format

Each chunk has the form data: {"choices":[{"delta":{"content":"字"}}]}, and the stream ends with data: [DONE].
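As an illustration of the format described above, here is a minimal sketch that pulls the text fragments out of raw SSE lines using only the standard library. The helper name iter_content and the sample lines are assumptions made for this example and are not part of the API. It assumes each data: line carries one complete JSON object.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by a sequence of raw SSE lines."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The stream is terminated by the literal [DONE] marker.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hypothetical sample lines in the documented shape.
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_content(sample)))
```

Anything after the [DONE] marker is ignored, so the generator can be wrapped around a line iterator from any HTTP client.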

Python SDK example

Python streaming response
from openai import OpenAI

client = OpenAI(
    api_key="am-your-api-key",
    base_url="https://allmodel.top/v1"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Introduce artificial intelligence in about 100 words"}],
    stream=True
)

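# Print each text fragment as soon as it arrives.
for chunk in stream:
    # Skip chunks that carry no choices or no text (for example, the final chunk).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()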
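
If the full reply is needed as a string rather than printed incrementally, the same loop can be wrapped in a small helper. The sketch below is one possible approach. collect_text and fake_chunk are hypothetical names introduced here, and the SimpleNamespace objects only imitate the chunk.choices[0].delta.content attribute path so the example runs without a network call.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def collect_text(stream: Iterable[Any]) -> str:
    """Join the delta.content fragments of a chunk stream into one string."""
    parts = []
    for chunk in stream:
        # Guard against chunks that carry no choices at all.
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(content: Optional[str]) -> SimpleNamespace:
    """Build a stand-in object with the same attribute path as an SDK chunk."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Stand-in data: the None fragment mimics a final chunk with an empty delta.
demo_stream = [fake_chunk("AI "), fake_chunk("is "), fake_chunk(None), fake_chunk("here.")]
print(collect_text(demo_stream))
```

With the real client, the stream object returned by client.chat.completions.create(..., stream=True) would presumably be passed in place of demo_stream.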

cURL SSE example

cURL streaming
curl -N https://allmodel.top/v1/chat/completions \
  -H "Authorization: Bearer am-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

SSE response format

SSE response stream
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1713000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"你好"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1713000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1713000000,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
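
To make the structure of the sample stream above concrete, the following sketch reassembles it into a single summary: the concatenated text plus the id, model, and finish_reason fields. summarize_stream is a hypothetical helper written for this example, and it assumes a single choice (index 0) as in the sample. Streams with several choices would need per-index buffers.

```python
import json

# The sample stream shown above, copied verbatim.
SAMPLE_STREAM = """\
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1713000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"你好"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1713000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1713000000,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
"""


def summarize_stream(raw: str) -> dict:
    """Fold a raw SSE stream into its id, model, full text, and finish reason."""
    summary = {"id": None, "model": None, "text": "", "finish_reason": None}
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue  # blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        summary["id"] = chunk.get("id", summary["id"])
        summary["model"] = chunk.get("model", summary["model"])
        for choice in chunk.get("choices", []):
            # The last chunk has an empty delta, so content may be missing.
            summary["text"] += choice.get("delta", {}).get("content") or ""
            if choice.get("finish_reason") is not None:
                summary["finish_reason"] = choice["finish_reason"]
    return summary


print(summarize_stream(SAMPLE_STREAM))
```

In the sample, finish_reason stays null until the last content-free chunk, which is why the helper only overwrites it when a non-null value appears.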

Node.js example

Node.js streaming
import OpenAI from 'openai';

const client = new OpenAI({
    apiKey: 'am-your-api-key',
    baseURL: 'https://allmodel.top/v1'
});

const stream = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true
});

for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
console.log();