Step 3.5 Flash
Description
StepFun's most capable open-source reasoning model, with visible reasoning traces. Built on a sparse Mixture-of-Experts architecture with 196B total parameters, of which only 11B are active per token, it achieves frontier-level performance in math, logic, and agentic coding.
At a Glance
Key pricing and model details available for this model.
Input price: $0.10 per 1M tokens
Output price: $0.30 per 1M tokens
Context window: 256K tokens
Hallucination rate: 0%
Token Pricing
Token pricing normalized to per-million-token rates.
Input / 1M tokens: $0.10
Output / 1M tokens: $0.30
Cache Read / 1M tokens: $0.02
Token Pricing Details
Rates are shown per 1M tokens for easier comparison.
| Rate | Price | Unit |
| --- | --- | --- |
| Input | $0.10 | 1M tokens |
| Output | $0.30 | 1M tokens |
| Cache Read | $0.02 | 1M tokens |
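To make the per-million rates concrete, here is a minimal sketch of a cost estimate for a single request. The helper name `estimateCostUSD` and its parameter names are hypothetical, and the sketch assumes cached input tokens are billed at the Cache Read rate in place of the Input rate; check the provider's billing documentation before relying on that.

```javascript
// Rates in USD per 1M tokens, copied from the pricing listed above.
const RATES_PER_MILLION = { input: 0.10, output: 0.30, cacheRead: 0.02 };

// Hypothetical helper: estimate the USD cost of one request.
// inputTokens       - input tokens NOT served from cache
// cachedInputTokens - input tokens served from cache (assumed to be billed
//                     at the cache-read rate instead of the input rate)
// outputTokens      - tokens generated by the model
function estimateCostUSD({ inputTokens = 0, cachedInputTokens = 0, outputTokens = 0 } = {}) {
  const totalPerMillionUnits =
    inputTokens * RATES_PER_MILLION.input +
    cachedInputTokens * RATES_PER_MILLION.cacheRead +
    outputTokens * RATES_PER_MILLION.output;
  // Rates are quoted per 1M tokens, so scale back down to a per-token cost.
  return totalPerMillionUnits / 1_000_000;
}

// Example: 200K uncached input tokens, 50K cached input tokens, 20K output tokens.
const cost = estimateCostUSD({
  inputTokens: 200_000,
  cachedInputTokens: 50_000,
  outputTokens: 20_000,
});
console.log(`Estimated cost: $${cost.toFixed(4)}`);
```

Because output tokens cost three times as much as input tokens at these rates, long generations (including reasoning traces, if they are billed as output) are likely to dominate the total.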
Feature Availability
Capabilities explicitly listed in the current payload.
LLM: Available
Vision: Not listed
Function calling: Not listed
Reasoning: Not listed
Artificial Analysis
Index scores currently reported for this model.
Intelligence Index
37.8
Coding Index
31.6
Category Radar
Aggregated from the benchmark values present for reasoning, code, math, and accuracy.
Benchmark Breakdown
Detailed benchmark results drawn from the current payload.
| Benchmark | What it measures | Reported score |
| --- | --- | --- |
| Intelligence Index | Overall 'how smart' score for an AI, combining reasoning, math, coding, and knowledge. | 37.8 |
| Coding Index | How well the model handles real programming tasks. | 31.6 |
| MMLU-Pro | A broad and difficult knowledge-and-reasoning benchmark across many subjects. | 33.8% |
| GPQA | Graduate-level science questions designed to be difficult to shortcut. | 83.1% |
| HLE | A very hard expert-level exam across a wide range of subjects. | 19.1% |
| LiveCodeBench | Fresh programming tasks meant to test current coding ability. | 4.8% |
| SciCode | Coding tasks drawn from real scientific workflows. | 40.4% |
| MATH-500 | A set of difficult competition-style math problems. | 16.4% |
| AIME | Advanced math competition questions. | 0.7% |
| IFBench | Measures how precisely the model follows detailed instructions. | 64.6% |
| LCR | Tests long-context reasoning over large documents and conversations. | 43% |
| TerminalBench Hard | A harder coding-agent benchmark for complex multi-step terminal tasks. | 27.3% |
| Tau2 | Evaluates realistic agent behavior in tool-using support workflows. | 94.4% |
Code Samples
Quick start with the Routeway API
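// Requires the `openai` npm package; the SDK is pointed at Routeway via baseURL.
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: "https://api.routeway.ai/v1",
  apiKey: "<YOUR_API_KEY>", // replace with your own API key
});

async function main() {
  // Send a single-turn chat request to Step 3.5 Flash.
  const completion = await openai.chat.completions.create({
    model: "step-3.5-flash",
    messages: [
      {
        role: "user",
        content: "Explain quantum computing in simple terms"
      }
    ]
  });

  // Print the assistant's reply message object.
  console.log(completion.choices[0].message);
}

// Log network or API errors instead of leaving the promise rejection unhandled.
main().catch(console.error);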