
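Building with LLMs: Practical Differences
Integrating a large language model into an application is not like calling a typical REST API. Responses are non-deterministic, can be very long, arrive as a stream over time, and fail in unfamiliar ways.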

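Setup and Authentication
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY, // read the key from the environment; never hard-code it
  timeout: 30000, // request timeout in milliseconds
  maxRetries: 3,  // automatic retries performed by the SDK
});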

Chat Completions
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a helpful coding assistant.' },
    { role: 'user', content: 'Explain closures in JavaScript.' },
  ],
  temperature: 0.7, // higher values give more varied output
  max_tokens: 1000, // upper bound on tokens generated in the reply
});

const answer = response.choices[0].message.content; // may be null, e.g. when the model returns tool calls
console.log(response.usage); // { prompt_tokens, completion_tokens, total_tokens }
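
Two details of the response are easy to miss: `message.content` can be `null`, and a `finish_reason` of `'length'` means the reply was cut off at `max_tokens`. The sketch below shows one way to guard against both. The `CompletionLike` shapes and the `extractAnswer` helper are hypothetical names introduced for illustration; they mirror only the response fields used above and are not SDK types.

```typescript
// Minimal local shapes mirroring only the response fields used here (assumed, not SDK types).
interface ChoiceLike {
  finish_reason: string | null;
  message: { content: string | null };
}
interface CompletionLike {
  choices: ChoiceLike[];
}

interface ExtractedAnswer {
  text: string;
  truncated: boolean; // true when generation stopped because max_tokens was reached
}

// Hypothetical helper: read the first choice's text and flag truncated replies.
function extractAnswer(response: CompletionLike): ExtractedAnswer {
  const choice = response.choices[0];
  if (!choice) {
    throw new Error('Completion returned no choices');
  }
  return {
    text: choice.message.content ?? '', // null content becomes an empty string
    truncated: choice.finish_reason === 'length',
  };
}
```

A caller might retry with a larger `max_tokens`, or ask the model to continue, when `truncated` is true.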

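Multi-Turn Conversations
class Conversation {
  // Full history, resent on every request because the API is stateless
  private messages: OpenAI.Chat.ChatCompletionMessageParam[] = [];

  constructor(systemPrompt: string) {
    this.messages.push({ role: 'system', content: systemPrompt });
  }

  async send(userMessage: string): Promise<string | null> {
    this.messages.push({ role: 'user', content: userMessage });
    const response = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: this.messages,
    });
    const reply = response.choices[0].message;
    this.messages.push(reply); // keep the assistant turn so later requests have context
    return reply.content;
  }
}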
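
Because the `Conversation` class above resends the whole history on every call, a long chat eventually exceeds the model's context window and costs more per request. One simple mitigation, sketched here under the assumption that a message count is an acceptable stand-in for a token budget, is to keep the system prompt plus only the most recent turns. `ChatMessage` and `trimHistory` are hypothetical names, not part of the SDK.

```typescript
// Simplified message shape for illustration (assumed, not the SDK type).
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string | null;
}

// Hypothetical helper: keep every system message plus the last `maxRecent` other messages.
function trimHistory(messages: ChatMessage[], maxRecent: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  // slice(-0) would return the whole array, so handle non-positive limits explicitly
  const recent = maxRecent > 0 ? rest.slice(-maxRecent) : [];
  return [...system, ...recent];
}
```

A production version would likely count tokens rather than messages and avoid separating a tool call from its result.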
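
Streaming Responses
// SSE endpoint; assumes an Express app with JSON body parsing
import express from 'express';

const app = express();
app.use(express.json());

app.post('/api/chat/stream', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  // With stream: true, create() resolves to an async iterable of chunks
  const stream = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: req.body.messages,
    stream: true,
  });
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content;
    // Each SSE event is a "data:" line terminated by a blank line
    if (delta) res.write(`data: ${JSON.stringify({ content: delta })}\n\n`);
  }
  res.write('data: [DONE]\n\n');
  res.end();
});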
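
On the receiving side, network chunks do not line up with SSE events, so a client has to buffer text and split it on blank lines. The sketch below parses the frame format the endpoint above is meant to emit (`data: {"content": ...}` events followed by a `data: [DONE]` sentinel). `parseSseBuffer` and `ParsedSse` are hypothetical names, and the parser handles only this simple single-line `data:` format, not the full SSE specification.

```typescript
interface ParsedSse {
  contents: string[]; // text deltas decoded from complete events
  done: boolean;      // true once the [DONE] sentinel has been seen
  rest: string;       // trailing partial event to prepend to the next network chunk
}

// Hypothetical helper: split buffered text into complete events and decode their payloads.
function parseSseBuffer(buffer: string): ParsedSse {
  const frames = buffer.split('\n\n');
  // The last piece is an incomplete event unless the buffer ended with a blank line
  const rest = frames.pop() ?? '';
  const contents: string[] = [];
  let done = false;
  for (const frame of frames) {
    if (!frame.startsWith('data: ')) continue; // skip comments and other fields
    const payload = frame.slice('data: '.length);
    if (payload === '[DONE]') {
      done = true;
      break;
    }
    const parsed = JSON.parse(payload) as { content?: string };
    if (typeof parsed.content === 'string') contents.push(parsed.content);
  }
  return { contents, done, rest };
}
```

A caller would append each decoded network chunk to `rest` from the previous call and stop reading once `done` is true.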

Function Calling (Tool Use)
const tools = [{
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get weather for a location',
    parameters: {
      // JSON Schema describing the arguments the model may supply
      type: 'object',
      properties: {
        location: { type: 'string', description: 'City name' },
      },
      required: ['location'],
    },
  },
}];

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Weather in Tokyo?' }],
  tools,
  tool_choice: 'auto', // let the model decide whether to call a tool
});

// If the model wants to call a function:
if (response.choices[0].message.tool_calls) {
  const toolCall = response.choices[0].message.tool_calls[0];
  // Arguments arrive as a JSON string; parsing can throw if the model emits invalid JSON
  const args = JSON.parse(toolCall.function.arguments);
  const weatherData = await getWeatherAPI(args.location); // getWeatherAPI is application-defined
  // Send result back to model for final response
}
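
The last step, sending the result back, means appending two things to the conversation: the assistant message that contains the `tool_calls`, and one `role: 'tool'` message per call whose `tool_call_id` matches the call's `id`. The sketch below assembles that message list. `buildToolFollowUp` and the local types are hypothetical names that mirror only the fields involved; they are not SDK types.

```typescript
// Simplified local shapes for illustration (assumed, not SDK types).
interface ToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}
interface AssistantToolMessage {
  role: 'assistant';
  content: string | null;
  tool_calls: ToolCall[];
}
type FollowUpMessage =
  | { role: 'user'; content: string }
  | AssistantToolMessage
  | { role: 'tool'; tool_call_id: string; content: string };

// Hypothetical helper: build the message list for the second request.
function buildToolFollowUp(
  history: FollowUpMessage[],
  assistantMessage: AssistantToolMessage,
  resultsById: Record<string, unknown>, // tool call id -> result of running that tool
): FollowUpMessage[] {
  const toolMessages: FollowUpMessage[] = assistantMessage.tool_calls.map((call) => {
    if (!(call.id in resultsById)) {
      throw new Error(`Missing result for tool call ${call.id}`);
    }
    return {
      role: 'tool' as const,
      tool_call_id: call.id, // must match the id from the assistant's tool call
      content: JSON.stringify(resultsById[call.id]),
    };
  });
  // Order matters: prior history, then the assistant's tool request, then the results
  return [...history, assistantMessage, ...toolMessages];
}
```

Passing the returned array as `messages` to a second `chat.completions.create` call lets the model write its final answer from the tool output.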

Embeddings for Semantic Search
// Generate embeddings
const embedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'How do I reset my password?',
});

// 1536-dimensional vector; similar texts produce similar vectors
const vector = embedding.data[0].embedding;

// Cosine similarity for semantic search (both vectors must have the same length)
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dot / (magA * magB);
}
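
To turn the similarity function into a search, score every stored vector against the query vector and keep the best matches. The sketch below does this with a brute-force scan, which is reasonable for small collections; `IndexedDoc` and `topK` are hypothetical names, and `cosineSimilarity` is repeated with type annotations only so the block is self-contained.

```typescript
interface IndexedDoc {
  id: string;
  vector: number[]; // embedding computed ahead of time for this document
}

function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dot / (magA * magB);
}

// Hypothetical helper: return the k documents most similar to the query, best first.
function topK(query: number[], docs: IndexedDoc[], k: number): { id: string; score: number }[] {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(query, doc.vector) }))
    .sort((x, y) => y.score - x.score) // descending by similarity
    .slice(0, k);
}
```

For larger collections a vector index or database is the usual replacement for the linear scan.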

Production Best Practices
// 1. Cost tracking
class OpenAIClient {
  private totalTokens = 0;

  async complete(messages: OpenAI.Chat.ChatCompletionMessageParam[]) {
    const response = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages,
    });
    // usage can be absent, so fall back to 0
    this.totalTokens += response.usage?.total_tokens ?? 0;
    return response;
  }
}

// 2. Retry with exponential backoff on rate limits (HTTP 429)
async function withRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (err: any) {
      if (err.status === 429 && i < maxRetries - 1) {
        await new Promise((r) => setTimeout(r, 2 ** i * 1000)); // wait 1 s, 2 s, 4 s, ...
        continue;
      }
      throw err; // not a rate limit, or out of attempts
    }
  }
  throw new Error('withRetry: maxRetries must be at least 1');
}

// 3. Response caching (most useful at temperature 0, where output is closest to deterministic)
const cache = new Map<string, OpenAI.Chat.ChatCompletion>();

async function cachedComplete(prompt: string, temperature = 0) {
  const key = `${prompt}:${temperature}`;
  const cached = cache.get(key);
  if (cached) return cached;
  const result = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: prompt }],
    temperature,
  });
  cache.set(key, result);
  return result;
}
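
The retry helper above waits a fixed 1 s, 2 s, 4 s, so many clients that hit a rate limit at the same moment will also retry at the same moment. A common refinement is to randomize the delay. The sketch below uses the "full jitter" variant (a uniform random delay between 0 and the capped exponential value), which is a substitute technique and not something the original helper does. `backoffDelayMs` and its default values are hypothetical.

```typescript
// Hypothetical helper: delay before retry number `attempt` (0-based), using full jitter.
function backoffDelayMs(
  attempt: number,
  baseMs = 1000,   // assumed base delay
  capMs = 30000,   // assumed upper bound on any single wait
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  // Exponential growth, capped so late attempts do not wait unboundedly
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // Pick uniformly in [0, ceiling) so concurrent clients spread out
  return Math.floor(random() * ceiling);
}
```

Inside a retry loop, `setTimeout(r, backoffDelayMs(i))` would take the place of the fixed `2 ** i * 1000` delay.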
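
Model Selection Guide
| Model | Speed | Cost | Best for |
|---|---|---|---|
| gpt-4o | Medium | $ | Complex reasoning, vision |
| gpt-4o-mini | Fast | $ | Most tasks, cost-sensitive workloads |
| text-embedding-3-small | Fast | $0.00002/1K tokens | Semantic search |
| text-embedding-3-large | Fast | $0.00013/1K tokens | High-accuracy search |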