DeepSeek V3.1 Terminus
Description
DeepSeek-V3.1 Terminus is an update to [DeepSeek V3.1](/deepseek-ai/deepseek-v3.1) that keeps the original model's capabilities while addressing user-reported issues with language consistency and agent behavior. It also further improves performance in code-agent and search-agent tasks. It is a large hybrid reasoning model (671B parameters, 37B active). It extends the DeepSeek-V3 base with a two-phase long-context training process that reaches up to 128K tokens, and it uses FP8 microscaling for efficient inference. The model improves tool use, code generation, and reasoning efficiency, with performance comparable to DeepSeek-R1 on difficult benchmarks and faster responses. It supports structured tool calling, code agents, and search agents, which makes it suitable for research, coding, and agentic workflows.
At a Glance
Key pricing and model details available for this model.
Input price
$0.23
per 1M tokens
Output price
$0.64
per 1M tokens
Context window
164K
tokens
Hallucination rate
2%
Token Pricing
Token pricing normalized to per-million-token rates.
Input / 1M tokens
$0.23
Output / 1M tokens
$0.64
Cache Read / 1M tokens
Free
Token Pricing Details
Rates are shown per 1M tokens for easier comparison.
| Rate | Price | Unit |
| --- | --- | --- |
| Input | $0.23 | 1M tokens |
| Output | $0.64 | 1M tokens |
| Cache Read | Free | 1M tokens |
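The rates above translate into a per-request cost with simple arithmetic. As a minimal sketch, the hypothetical helper `estimateCostUSD` below multiplies token counts by the listed per-million rates. It assumes that cached prompt tokens are billed at the cache-read rate (listed as free) instead of the input rate, which this page does not state explicitly.

```javascript
// Rates copied from the pricing table, in USD per 1M tokens.
const RATES_PER_MILLION = { input: 0.23, output: 0.64, cacheRead: 0 };

// Hypothetical helper (not part of any SDK): estimate the cost of one request.
// Assumption: cachedTokens are prompt tokens served from cache and billed
// at the cache-read rate rather than the input rate.
function estimateCostUSD({ inputTokens = 0, outputTokens = 0, cachedTokens = 0 }) {
  const billableInput = Math.max(inputTokens - cachedTokens, 0);
  return (
    (billableInput * RATES_PER_MILLION.input +
      outputTokens * RATES_PER_MILLION.output +
      cachedTokens * RATES_PER_MILLION.cacheRead) / 1_000_000
  );
}

// Example: a 12,000-token prompt (2,000 of them cached) with a 1,500-token reply.
const cost = estimateCostUSD({ inputTokens: 12_000, outputTokens: 1_500, cachedTokens: 2_000 });
console.log(`Estimated cost: $${cost.toFixed(6)}`);
```

Actual billing may differ from this estimate, for example if the provider rounds per request or counts cached tokens differently.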
Feature Availability
Capabilities explicitly listed in the current payload.
LLM
Available
Vision
Not listed
Function calling
Available
Reasoning
Not listed
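Because function calling is listed as available, a request can declare tools in the OpenAI-compatible `tools` format. The sketch below only builds a request body and parses a response message, with no network call. The `get_weather` tool, its schema, and both helper names are hypothetical, and whether this endpoint accepts every field shown is an assumption to verify against the provider's documentation.

```javascript
// Hypothetical tool definition in the OpenAI-compatible "tools" format.
const weatherTool = {
  type: "function",
  function: {
    name: "get_weather",
    description: "Look up the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
};

// Build the body for a chat completion request; nothing is sent here.
function buildToolRequest(userPrompt) {
  return {
    model: "deepseek-v3.1-terminus",
    messages: [{ role: "user", content: userPrompt }],
    tools: [weatherTool],
  };
}

// Extract tool calls from an assistant message. In the OpenAI-compatible
// format, arguments arrive as a JSON string and may be malformed, so
// parsing failures are reported instead of thrown.
function parseToolCalls(message) {
  return (message.tool_calls ?? []).map((call) => {
    try {
      return { name: call.function.name, args: JSON.parse(call.function.arguments) };
    } catch {
      return { name: call.function.name, args: null, error: "invalid JSON arguments" };
    }
  });
}

// Mocked assistant message, shaped like an OpenAI-compatible response.
const mockMessage = {
  role: "assistant",
  content: null,
  tool_calls: [
    { id: "call_1", type: "function", function: { name: "get_weather", arguments: '{"city":"Paris"}' } },
  ],
};
console.log(parseToolCalls(mockMessage));
```

The object returned by `buildToolRequest` can be passed to `chat.completions.create` on an OpenAI SDK client, and `parseToolCalls` can be applied to `choices[0].message` of the response.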
Artificial Analysis
Index scores currently reported for this model.
Intelligence Index
28.5
Coding Index
31.9
Math Index
53.7
Category Radar
Aggregated from the benchmark values present for reasoning, code, math, and accuracy.
Benchmark Breakdown
Detailed benchmark results drawn from the current payload.
| Benchmark | What it measures | Reported score |
| --- | --- | --- |
| Intelligence Index | Overall 'how smart' score for an AI, combining reasoning, math, coding, and knowledge. | 28.5 |
| Coding Index | How well the model handles real programming tasks. | 31.9 |
| Math Index | Composite score measuring mathematical reasoning and problem-solving. | 53.7 |
| MMLU-Pro | A broad and difficult knowledge-and-reasoning benchmark across many subjects. | 83.6% |
| GPQA | Graduate-level science questions designed to be difficult to shortcut. | 75.1% |
| HLE | A very hard expert-level exam across a wide range of subjects. | 8.4% |
| LiveCodeBench | Fresh programming tasks meant to test current coding ability. | 52.9% |
| SciCode | Coding tasks drawn from real scientific workflows. | 32.1% |
| AIME 2025 | The 2025 AIME benchmark used to reduce data leakage concerns. | 53.7% |
| IFBench | Measures how precisely the model follows detailed instructions. | 41.2% |
| LCR | Tests long-context reasoning over large documents and conversations. | 43.3% |
| TerminalBench Hard | A harder coding-agent benchmark for complex multi-step terminal tasks. | 31.8% |
| Tau2 | Evaluates realistic agent behavior in tool-using support workflows. | 37.1% |
Code Samples
Quick start with the Routeway API
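import OpenAI from 'openai';

// Point the OpenAI SDK at the Routeway endpoint.
const openai = new OpenAI({
  baseURL: "https://api.routeway.ai/v1",
  apiKey: "<YOUR_API_KEY>",
});

async function main() {
  // Send a single-turn chat request to the model.
  const completion = await openai.chat.completions.create({
    model: "deepseek-v3.1-terminus",
    messages: [
      {
        role: "user",
        content: "Explain quantum computing in simple terms"
      }
    ]
  });

  // Print the assistant's reply message (role and content).
  console.log(completion.choices[0].message);
}

// Surface request failures instead of leaving an unhandled rejection.
main().catch(console.error);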