Pydantic vs. Dataclasses speed comparison

Serhii Hrekov
software engineer, creator, artist, programmer, projects founder

While both Pydantic models and Python dataclasses serve to structure data, their performance characteristics are significantly different. The key distinction lies in when and how validation occurs. Dataclasses rely on simple Python object initialization, while Pydantic executes a comprehensive validation and coercion pipeline on every instantiation.

The clear winner in terms of raw execution speed is the Python Dataclass.

1. The Performance Test Setup

To quantify the difference, we will benchmark the time required to instantiate both a simple dataclass and an equivalent Pydantic model 100,000 times. We will test two scenarios: Creation from Keywords (correctly typed Python values) and Creation from Strings (forcing Pydantic's coercion).
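A single `timeit.timeit` call is sensitive to background noise on the machine. The sketch below is one way to make the comparison more robust, assuming best-of-N timing is acceptable here: it repeats each measurement with `timeit.repeat` and keeps the fastest trial. `best_per_call` and `noop` are hypothetical names and are not part of the original benchmark.

```python
import timeit
from typing import Callable


def best_per_call(func: Callable[[], object], number: int = 100_000, repeat: int = 5) -> float:
    """Hypothetical helper: fastest per-call time in seconds across `repeat` trials.

    Each trial calls `func` `number` times. Taking the minimum trial filters out
    slowdowns caused by other processes rather than by the code under test.
    """
    trials = timeit.repeat(func, number=number, repeat=repeat)
    return min(trials) / number


def noop() -> None:
    pass


per_call = best_per_call(noop, number=10_000, repeat=3)
print(f"noop: {per_call * 1e9:.1f} ns per call")
```

Passing `create_dataclass_typed` and `create_pydantic_typed` to the same helper gives per-call times that are easier to compare across runs than single totals.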

The Models

import timeit
from dataclasses import dataclass
from pydantic import BaseModel

# --- 1. Dataclass (Minimal Python Overhead) ---
# Note: frozen=True routes each assignment in the generated __init__ through
# object.__setattr__, which is slightly slower than a non-frozen dataclass.
@dataclass(frozen=True)
class UserDataClass:
    id: int
    name: str
    is_active: bool

# --- 2. Pydantic Model (Validation & Coercion Overhead) ---
class UserPydantic(BaseModel):
    id: int
    name: str
    is_active: bool

2. Scenario A: Creation with Correctly Typed Keywords

In this scenario, we pass correctly typed Python objects (int, str, bool) to both models. This tests the core initialization overhead.

# Function 1: Benchmark Dataclass creation
def create_dataclass_typed():
    _ = UserDataClass(id=101, name="Alex", is_active=True)

# Function 2: Benchmark Pydantic creation (validation still runs)
def create_pydantic_typed():
    _ = UserPydantic(id=101, name="Alex", is_active=True)

N = 100000
dataclass_time = timeit.timeit(create_dataclass_typed, number=N)
pydantic_time = timeit.timeit(create_pydantic_typed, number=N)

print("--- Scenario A: Correctly Typed Input ---")
print(f"Dataclass Time ({N} calls): {dataclass_time:.4f}s")
print(f"Pydantic Time ({N} calls): {pydantic_time:.4f}s")
print(f"Pydantic is approximately {pydantic_time/dataclass_time:.1f}x slower.")

# Typical result (varies with hardware, Python and Pydantic versions):
# Dataclass is 5x to 15x faster than Pydantic

Analysis: Why Dataclasses are Faster

  • Dataclasses: The generated __init__ method is nearly identical to a hand-written Python initializer. It performs no type checks at all, only attribute assignment (routed through object.__setattr__ when frozen=True).
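  • Pydantic: Even when the input is already correctly typed, every instantiation runs the validation pipeline:
    1. The keyword arguments are handed to the model's validator (in V2, a schema compiled once by pydantic-core when the class is defined).
    2. Each value is checked against its field's expected type.
    3. The built-in per-field validation runs even if no custom validators are defined, and the model records which fields were set.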
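To make the dataclass side concrete, here is a hypothetical hand-written class that approximates what the frozen dataclass's generated `__init__` does. `UserHandWritten` is an illustrative name, not something the `dataclasses` module produces; the point is that construction is just attribute assignment with no type checks.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class UserDataClass:
    id: int
    name: str
    is_active: bool


class UserHandWritten:
    """Hypothetical hand-written class approximating the frozen dataclass above."""

    def __init__(self, id: int, name: str, is_active: bool) -> None:
        # No isinstance() checks: the arguments are stored exactly as passed.
        # frozen=True blocks normal assignment, so the generated __init__
        # writes through object.__setattr__; this class does the same.
        object.__setattr__(self, "id", id)
        object.__setattr__(self, "name", name)
        object.__setattr__(self, "is_active", is_active)

    def __setattr__(self, key: str, value: object) -> None:
        # Mimic frozen=True: reject assignment after construction.
        raise AttributeError(f"cannot assign to field {key!r}")


dc = UserDataClass(id=101, name="Alex", is_active=True)
hw = UserHandWritten(id=101, name="Alex", is_active=True)
print(vars(dc) == vars(hw))  # True

try:
    dc.name = "Sam"
except FrozenInstanceError:
    print("frozen dataclass rejected the assignment")
```

Three attribute writes are essentially all the per-instance work; Pydantic's extra steps listed above are what the benchmark measures on top of that.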

3. Scenario B: Creation from Untyped Strings (Coercion Cost)

This scenario tests the cost of one of Pydantic's key features: type coercion. We pass strings to fields that expect integers and booleans.

# Function 3: Benchmark Dataclass creation (ignores type hints)
def create_dataclass_untyped():
    # Dataclass accepts this, but stores strings in 'id' and 'is_active'
    _ = UserDataClass(id="101", name="Alex", is_active="True")

# Function 4: Benchmark Pydantic creation (coercion runs)
def create_pydantic_untyped():
    # Pydantic successfully coerces the strings to the correct types
    _ = UserPydantic(id="101", name="Alex", is_active="True")

N = 100000
dataclass_time_untyped = timeit.timeit(create_dataclass_untyped, number=N)
pydantic_time_untyped = timeit.timeit(create_pydantic_untyped, number=N)

print("\n--- Scenario B: Untyped Input (Coercion) ---")
print(f"Dataclass Time ({N} calls): {dataclass_time_untyped:.4f}s (No coercion performed)")
print(f"Pydantic Time ({N} calls): {pydantic_time_untyped:.4f}s (Coercion performed)")
print(f"Pydantic is approximately {pydantic_time_untyped/dataclass_time_untyped:.1f}x slower.")

# Typical result (varies with hardware, Python and Pydantic versions):
# the gap widens slightly, to roughly 10x to 25x slower.
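The dataclass "wins" Scenario B partly because it does nothing with the type hints. The sketch below shows what actually gets stored, plus a hypothetical helper, `type_mismatches`, that flags fields whose value does not match the annotation. It assumes plain class annotations such as `int`, `str`, and `bool`; it does not handle generics like `list[int]`.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass(frozen=True)
class UserDataClass:
    id: int
    name: str
    is_active: bool


def type_mismatches(obj: object) -> list[str]:
    """Hypothetical helper: names of fields whose value is not an instance of its annotation.

    Only works for plain classes (int, str, bool, ...). Note that isinstance(True, int)
    is True, so a bool would pass a check against int.
    """
    hints = get_type_hints(type(obj))
    return [f.name for f in fields(obj) if not isinstance(getattr(obj, f.name), hints[f.name])]


user = UserDataClass(id="101", name="Alex", is_active="True")
print(type(user.id).__name__, type(user.is_active).__name__)  # str str
print(type_mismatches(user))  # ['id', 'is_active']
```

The dataclass timing in Scenario B therefore measures storing wrong-typed values, not producing correctly typed ones.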

Analysis: The Coercion Cost

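When Pydantic coerces a value (e.g., the string "101" into the integer 101, or "True" into True), it must parse the input before storing it. That parsing adds a small but measurable overhead on top of validation, whereas the dataclass simply stores the strings unchanged.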
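To illustrate the kind of per-field work coercion adds, here is a hand-rolled version of it on a dataclass using `__post_init__`. This is a hypothetical sketch, not how Pydantic implements coercion: `CoercingUser` and `parse_bool` are illustrative names, and the accepted boolean strings are an assumption loosely modeled on the strings Pydantic documents for `bool`.

```python
from dataclasses import dataclass

# Assumed lax boolean spellings (compared case-insensitively).
_TRUE = {"1", "on", "t", "true", "y", "yes"}
_FALSE = {"0", "off", "f", "false", "n", "no"}


def parse_bool(value: object) -> bool:
    """Hypothetical lax bool parser: accepts real bools and a few string spellings."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.lower()
        if lowered in _TRUE:
            return True
        if lowered in _FALSE:
            return False
    raise ValueError(f"cannot interpret {value!r} as bool")


@dataclass(frozen=True)
class CoercingUser:
    id: int
    name: str
    is_active: bool

    def __post_init__(self) -> None:
        # frozen=True forbids normal assignment, so write through object.__setattr__.
        object.__setattr__(self, "id", int(self.id))
        object.__setattr__(self, "name", str(self.name))
        object.__setattr__(self, "is_active", parse_bool(self.is_active))


user = CoercingUser(id="101", name="Alex", is_active="True")
print(user)  # CoercingUser(id=101, name='Alex', is_active=True)
```

Each `int()` call, set lookup, and extra `object.__setattr__` runs on every instantiation, which is why parsing costs more than plain assignment.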

Summary of Performance

| Model Type | Primary Purpose | Cost in Scenario A (Typed) | Cost in Scenario B (Untyped) |
|---|---|---|---|
| Dataclass | Data structuring | Low (standard Python initialization) | Low (no validation or coercion; strings stored as-is) |
| Pydantic | Validation/parsing | High (validation pipeline runs) | Highest (validation + coercion run) |

When Speed Overrides Validation

While Pydantic V2 (whose validation core, pydantic-core, is written in Rust) has dramatically improved performance, the rule remains: if you are in a hot loop and the data has already been validated, use a dataclass or a simple typed class for maximum speed.

Pydantic's performance cost is acceptable and worthwhile when:

  1. You are handling external input (API requests, files, messages).
  2. The time spent on validation is small compared with the time spent on I/O (e.g., a database query).
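The two rules combine into a common pattern: validate external data once at the boundary, then work with lightweight dataclasses in the hot path. A minimal sketch, assuming Pydantic V2 (`model_validate_json` and `model_dump` are V2 method names); `load_users` is a hypothetical helper and the JSON payload is made-up sample data.

```python
from dataclasses import dataclass

from pydantic import BaseModel


class UserPydantic(BaseModel):
    id: int
    name: str
    is_active: bool


@dataclass(frozen=True)
class UserDataClass:
    id: int
    name: str
    is_active: bool


def load_users(raw_json_lines: list[str]) -> list[UserDataClass]:
    """Hypothetical helper: validate each external record once, keep a cheap dataclass."""
    users = []
    for line in raw_json_lines:
        model = UserPydantic.model_validate_json(line)  # validation at the boundary
        users.append(UserDataClass(**model.model_dump()))  # plain object for the hot path
    return users


payload = [
    '{"id": 101, "name": "Alex", "is_active": true}',
    '{"id": 102, "name": "Sam", "is_active": false}',
]
users = load_users(payload)
active_ids = [u.id for u in users if u.is_active]  # hot loop touches only dataclasses
print(active_ids)  # [101]
```

The validation cost is paid once per record at load time, while any repeated processing afterwards pays only the dataclass's cost.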
