Skip to main content

SQLAlchemy Relationships Without Database Foreign Keys

· 7 min read
Serhii Hrekov
software engineer, creator, artist, programmer, projects founder

This pattern is often used for:

  1. Legacy Databases: Working with existing schemas that lack proper constraints.
  2. Performance: Avoiding the overhead of transactional foreign key checks.
  3. Data Warehousing: Dealing with schemas where relationships are semantic, not structural.

The key to achieving this is the relationship() function combined with the primaryjoin argument. This allows SQLAlchemy to define the join condition required for the relationship, enabling essential features like eager loading (joinedload, selectinload).

Core Components: relationship() and primaryjoin

The standard relationship() function automatically infers the join condition if a foreign key exists. When no foreign key exists, you must explicitly provide the primaryjoin argument.

1. The Target Database State

Imagine two tables (Parent and Child) that share a common column (code), but no foreign key constraint exists in the database.

TableColumn NameTypeKey Status
parentidIntegerPrimary Key
parentcodeString(No Key)
childidIntegerPrimary Key
childparent_codeString(No Key)

2. Defining the Models (The Magic of primaryjoin)

We use primaryjoin to tell SQLAlchemy: "To link Child to Parent, look where the Child.parent_code column equals the Parent.code column."

from sqlalchemy import create_engine, Column, Integer, String, Text
from sqlalchemy.orm import declarative_base, relationship, sessionmaker
from sqlalchemy.sql import func
from sqlalchemy.orm import joinedload # Import the key feature

Base = declarative_base()

class Parent(Base):
__tablename__ = 'parent'

id = Column(Integer, primary_key=True)
# The key used for the join, but NOT defined as a Foreign Key
code = Column(String(50), unique=True, nullable=False)
name = Column(String)

# Define the relationship to Child
children = relationship(
"Child",
# Explicitly define the join condition: Child.parent_code == Parent.code
primaryjoin="Parent.code == Child.parent_code",
# Prevents SQLAlchemy from creating an external FK constraint
foreign_keys="[Child.parent_code]",
# Optional: Add backref for easy lookup from Child to Parent
backref="parent_ref"
)

class Child(Base):
__tablename__ = 'child'

id = Column(Integer, primary_key=True)
title = Column(String)
# The column used in the join condition
parent_code = Column(String(50), nullable=False)

# No ParentId (FK) column is required here, only the shared code column

# Annotation: The foreign_keys argument points to the local column
# that acts as the link, satisfying SQLAlchemy's requirement for
# internal relationship tracking without touching the database schema.

Executing Eager Loading (joinedload)

The entire purpose of defining this relationship in code is to enable performance features like eager loading. Since the relationship is defined with primaryjoin, SQLAlchemy can correctly construct the necessary SQL LEFT OUTER JOIN (for joinedload) or SELECT statement (for selectinload).

Example: Running a joinedload Query

from sqlalchemy.orm import sessionmaker, joinedload

# --- Mock Database Setup (In a real scenario, this connects to your DB) ---
# engine = create_engine("sqlite:///memory:")
# Base.metadata.create_all(engine)
# Session = sessionmaker(bind=engine)
# session = Session()

# # Insert Data (Parent 'A1' and Child linked by 'A1')
# session.add(Parent(id=1, code='A1', name='Group A'))
# session.add(Child(id=101, parent_code='A1', title='Item 101'))
# session.commit()
# -------------------------------------------------------------------------

# The Key Feature: joinedload
# Assume we have a functional session object (session)

# query = session.query(Parent).options(
# # SQLAlchemy constructs the JOIN based on the primaryjoin defined above
# joinedload(Parent.children)
# )

# results = query.all()

# for parent in results:
# print(f"Parent: {parent.name} ({len(parent.children)} children)")
# for child in parent.children:
# # No additional SQL queries are executed here (no N+1 problem)
# print(f" -> Child: {child.title}")

#

# Annotation: When SQLAlchemy processes joinedload(Parent.children),
# it translates the Python relationship (Parent.code == Child.parent_code)
# into the correct SQL JOIN clause: `ON parent.code = child.parent_code`.

If two models can be joined on multiple fields, you must clearly specify the join condition to avoid ambiguity errors. This is another area where primaryjoin is indispensable.

class Department(Base):
__tablename__ = 'department'
id = Column(Integer, primary_key=True)
manager_id = Column(Integer)

class Employee(Base):
__tablename__ = 'employee'
id = Column(Integer, primary_key=True)
dept_id = Column(Integer)
# This column links to Department.manager_id
reports_to_id = Column(Integer)

# Relationship 1: The Employee's primary department (Joined by dept_id)
primary_dept = relationship(
"Department",
# Join condition based on dept_id
primaryjoin="Employee.dept_id == Department.id",
foreign_keys="[Employee.dept_id]"
)

# Relationship 2: The Employee's direct manager's department (Joined by reports_to_id)
manager_dept = relationship(
"Department",
# Join condition based on reports_to_id
primaryjoin="Employee.reports_to_id == Department.manager_id",
foreign_keys="[Employee.reports_to_id]" # Note: Must still specify foreign_keys locally
)

Annotation: Even without database foreign keys, you must provide foreign_keys to the relationship() function. When using primaryjoin, this argument tells SQLAlchemy which column on the local model is the source of the link, resolving potential relationship graph conflicts internally.


Sources and Further Reading

  1. SQLAlchemy Documentation - Defining Relationships
  2. SQLAlchemy Documentation - Controlling the Join Condition (primaryjoin)
  3. SQLAlchemy Documentation - Eager Loading (joinedload)
  4. SQLAlchemy Documentation - Advanced Relationship API