Llama 3.1 8B Instruct
Description
Llama 3.1 is Meta's latest model family, released in a variety of sizes and variants. This 8B instruction-tuned version is fast and efficient, and in human evaluations it has performed strongly against leading closed-source models.
At a Glance
Key pricing and model details available for this model.
Input price: $0.09 per 1M tokens
Output price: $0.09 per 1M tokens
Context window: 16K tokens
Hallucination rate: 48.4%
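The context window listed above caps the combined size of the prompt and the generated output. A minimal sketch of a pre-flight check follows. It assumes "16K" means 16,000 tokens (the exact limit may differ) and uses a rough 4-characters-per-token heuristic rather than the model's real tokenizer; `estimateTokens` and `fitsContext` are hypothetical helper names, not part of any SDK.

```javascript
// Assumption: "16K" is read as 16,000 tokens; the provider's exact limit may differ.
const CONTEXT_WINDOW_TOKENS = 16_000;

// Rough heuristic (an assumption, not the model's tokenizer):
// about 4 characters of English text per token.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Returns true when the estimated prompt size plus the reserved
// output budget fits inside the context window.
function fitsContext(promptText, maxOutputTokens, windowTokens = CONTEXT_WINDOW_TOKENS) {
  return estimateTokens(promptText) + maxOutputTokens <= windowTokens;
}

console.log(fitsContext("Explain quantum computing in simple terms", 512)); // true
```

For production use, a real tokenizer count would be more reliable than the character heuristic, which can be off by a wide margin for code or non-English text.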
Token Pricing
Token pricing normalized to per-million-token rates.
Input / 1M tokens: $0.09
Output / 1M tokens: $0.09
Cache Read / 1M tokens: Free
Token Pricing Details
Rates are shown per 1M tokens for easier comparison.
| Field | Value |
| --- | --- |
| Input / 1M tokens | $0.09 |
| Input unit | 1M tokens |
| Output / 1M tokens | $0.09 |
| Output unit | 1M tokens |
| Cache Read / 1M tokens | Free |
| Cache Read unit | 1M tokens |
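Because the rates are quoted per 1M tokens, the cost of a single request is the token count divided by one million, multiplied by the rate. The sketch below works through that arithmetic using the input and output rates listed above. The function name `estimateCostUSD` and the example token counts are illustrative assumptions, and cached reads are ignored since they are listed as free.

```javascript
// Rates copied from the pricing listed above (USD per 1M tokens).
const INPUT_USD_PER_MILLION = 0.09;
const OUTPUT_USD_PER_MILLION = 0.09;

// Estimate the cost of one request from its token counts.
function estimateCostUSD(inputTokens, outputTokens) {
  const inputCost = (inputTokens / 1_000_000) * INPUT_USD_PER_MILLION;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_USD_PER_MILLION;
  return inputCost + outputCost;
}

// Hypothetical request: 2,000 prompt tokens and 500 completion tokens.
console.log(estimateCostUSD(2_000, 500).toFixed(6)); // 0.000225
```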
Feature Availability
Capabilities explicitly listed in the current payload.
LLM: Available
Vision: Not listed
Function calling: Not listed
Reasoning: Not listed
Artificial Analysis
Index scores currently reported for this model.
Intelligence Index: 11.8
Coding Index: 4.9
Math Index: 4.3
Category Radar
Aggregated from the benchmark values present for reasoning, code, math, and accuracy.
Benchmark Breakdown
Detailed benchmark results drawn from the current payload.
| Benchmark | Description | Reported score |
| --- | --- | --- |
| Intelligence Index | Overall 'how smart' score for an AI, combining reasoning, math, coding, and knowledge. | 11.8 |
| Coding Index | How well the model handles real programming tasks. | 4.9 |
| Math Index | Composite score measuring mathematical reasoning and problem-solving. | 4.3 |
| MMLU-Pro | A broad and difficult knowledge-and-reasoning benchmark across many subjects. | 47.6% |
| GPQA | Graduate-level science questions designed to be difficult to shortcut. | 25.9% |
| HLE | A very hard expert-level exam across a wide range of subjects. | 5.1% |
| LiveCodeBench | Fresh programming tasks meant to test current coding ability. | 11.6% |
| SciCode | Coding tasks drawn from real scientific workflows. | 13.2% |
| MATH-500 | A set of difficult competition-style math problems. | 51.9% |
| AIME | Advanced math competition questions. | 7.7% |
| AIME 2025 | The 2025 AIME benchmark used to reduce data leakage concerns. | 4.3% |
| IFBench | Measures how precisely the model follows detailed instructions. | 28.6% |
| LCR | Tests long-context reasoning over large documents and conversations. | 15.7% |
| TerminalBench Hard | A harder coding-agent benchmark for complex multi-step terminal tasks. | 0.8% |
| Tau2 | Evaluates realistic agent behavior in tool-using support workflows. | 16.4% |
Code Samples
Quick start with the Routeway API
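import OpenAI from 'openai';

// Point the OpenAI SDK client at the Routeway endpoint.
const openai = new OpenAI({
  baseURL: "https://api.routeway.ai/v1",
  apiKey: "<YOUR_API_KEY>",
});

async function main() {
  // Send a single-turn chat request to the model.
  const completion = await openai.chat.completions.create({
    model: "llama-3.1-8b-instruct",
    messages: [
      {
        role: "user",
        content: "Explain quantum computing in simple terms"
      }
    ]
  });

  // Print the assistant message returned as the first choice.
  console.log(completion.choices[0].message);
}

// Report request failures instead of leaving the rejection unhandled.
main().catch(console.error);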
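The quick start logs the whole message object. A small helper can pull out just the reply text and token counts, as in the hedged sketch below. It assumes the response follows the OpenAI chat-completions shape (`choices[0].message.content` plus an optional `usage` object with `prompt_tokens` and `completion_tokens`); whether every response includes `usage` is not confirmed here. `summarizeCompletion` is a hypothetical name, and the mock object stands in for a real response so the snippet runs without a network call.

```javascript
// Assumption: the response follows the OpenAI chat-completions shape,
// with `choices[0].message.content` and an optional `usage` object.
function summarizeCompletion(completion) {
  const text = completion?.choices?.[0]?.message?.content ?? "";
  const usage = completion?.usage ?? {};
  return {
    text,
    promptTokens: usage.prompt_tokens ?? 0,
    completionTokens: usage.completion_tokens ?? 0,
  };
}

// Mock response used only for illustration; no network call is made.
const mockCompletion = {
  choices: [{ message: { role: "assistant", content: "Hello from the model." } }],
  usage: { prompt_tokens: 14, completion_tokens: 6 },
};

console.log(summarizeCompletion(mockCompletion));
```

In the quick start, the same helper could be called on `completion` in place of logging `completion.choices[0].message` directly; the optional chaining keeps it from throwing if a field is missing.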