Distributed Tracing: Linking Logs Across Python Microservices
In modern microservices architectures, a single user request might flow through an API Gateway, an authentication service, a business logic service, and several data services. While structured logging makes each service's log output clean, it doesn't automatically connect the dots.
Distributed Tracing is the operational practice of adding unique identifiers to every log and header related to a single request, allowing you to reconstruct the entire request path across all services.
1. The Distributed Logging Problem
When a user reports an error, you search the logs for that user ID. Without tracing, you find a log entry in Service A, but there is no easy way to find the related entries in Service B and Service C that were triggered downstream.
| Service | Log Message (Plain Text) | Log Message (Structured) |
|---|---|---|
| A | INFO: Auth successful for user 456. | {"user_id": 456, "message": "Auth successful"} |
| B | ERROR: Failed to update order. | {"user_id": 456, "message": "Failed to update order"} |
The logs are related, but there is no technical key linking them other than the user ID (which may not always be available).
2. The Solution: Trace and Span IDs
Distributed tracing introduces two primary identifiers, often propagated via HTTP headers:
- Trace ID: A unique ID generated at the entry point of the entire request (e.g., the API Gateway). This ID remains constant for the life cycle of the request across all services.
- Span ID: A unique ID for a specific unit of work within the trace (e.g., an individual function call or a trip to a database). A trace is composed of multiple spans.
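As a concrete illustration, both identifiers can be generated with nothing but the standard library. The sketch below follows the W3C Trace Context convention (16 random bytes for a trace ID, 8 for a span ID, hex-encoded in a `traceparent` header); the article's own manual scheme later uses a plain UUID instead, so treat this as an optional, standards-aligned variant:

```python
import secrets

# W3C-style identifiers: 32 hex digits for the trace, 16 for the span
trace_id = secrets.token_hex(16)  # constant for the whole request
span_id = secrets.token_hex(8)    # new for each unit of work

# traceparent layout: version-trace_id-span_id-flags
traceparent = f"00-{trace_id}-{span_id}-01"
print(traceparent)
```

Any downstream service that receives this header can split it back apart and reuse the trace ID while minting its own span IDs.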
3. Implementation in Python: Context Propagation
To implement tracing, you need a mechanism to:
- Generate a Trace ID upon initial entry.
- Inject the Trace ID into the logger's context (the extra dictionary).
- Propagate the Trace ID in the outgoing headers when calling a downstream service.
- Extract the Trace ID from incoming headers in downstream services.
Code Example: Using a logging.Filter and requests
This example uses a logging.Filter to automatically inject the Trace ID into every log record within a request's thread.
```python
import logging
import uuid

import requests

# 1. Define the TraceContextFilter
class TraceContextFilter(logging.Filter):
    # A real implementation would use contextvars or threading.local
    # so each request thread sees its own trace ID; a plain class-level
    # dict keeps this example short.
    thread_local_context = {}

    def filter(self, record):
        # Inject the current Trace ID if it exists in the context
        record.trace_id = self.thread_local_context.get('trace_id', 'N/A')
        return True

# 2. Configure logging so records carry the trace_id field,
#    then add the custom filter to all handlers
logging.basicConfig(
    level=logging.INFO,
    format='%(levelname)s %(name)s trace_id=%(trace_id)s %(message)s',
)
for handler in logging.getLogger().handlers:
    handler.addFilter(TraceContextFilter())

# 3. Request Flow Logic
def start_trace(request_headers):
    # Extract an existing ID or generate a new one
    trace_id = request_headers.get('X-Trace-ID') or str(uuid.uuid4())
    # Store the ID in the shared context
    TraceContextFilter.thread_local_context['trace_id'] = trace_id
    return trace_id

def call_downstream_service(url, trace_id):
    # Propagate the ID in the header of the outgoing request
    headers = {'X-Trace-ID': trace_id}
    requests.get(url, headers=headers)

# --- EXECUTION ---
# In Service A (Entry Point):
req_id = start_trace({'Host': 'ClientA'})  # Generates a new ID
logger = logging.getLogger("ServiceA")

# Log messages now automatically contain the 'trace_id' key
logger.info("Received request for processing.")
call_downstream_service('http://service-b', req_id)
```
Output in Structured Logs (Service A):

```json
{
  "timestamp": "...",
  "levelname": "INFO",
  "name": "ServiceA",
  "trace_id": "9a3b6f...",
  "message": "Received request for processing."
}
```
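The downstream side (step 4 of the list above) is the mirror image of propagation: Service B reads the incoming header instead of generating a fresh ID. The helper below is a hypothetical, framework-agnostic sketch that takes a plain headers dict and a context dict (standing in for a real Flask/Django request object and the filter's storage):

```python
import uuid

# Hypothetical helper for a downstream service (Service B):
# reuse the caller's trace ID when present, and only mint a
# new one if the request arrived without tracing headers.
def extract_trace_id(headers, context):
    trace_id = headers.get('X-Trace-ID')
    if trace_id is None:
        # No upstream ID: this service becomes the entry point
        trace_id = str(uuid.uuid4())
    context['trace_id'] = trace_id  # e.g. the filter's context dict
    return trace_id

# Service B reuses Service A's ID, so logs from both services share it
ctx = {}
incoming = {'X-Trace-ID': 'abc-123'}
print(extract_trace_id(incoming, ctx))  # → abc-123
```

Because the same ID now lands in both services' log records, a single search for that trace ID reconstructs the whole request path.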
4. Open Standards: OpenTelemetry
While manually implementing context propagation (as above) works, the industry standard for distributed tracing is OpenTelemetry (OTel).
OTel provides vendor-neutral APIs, SDKs, and auto-instrumentation libraries that handle the messy work of generating, propagating, and exporting traces to backend analysis systems (like Jaeger or Zipkin).
The OTel approach is superior because:
- Automatic Propagation: It automatically wraps common libraries (like requests, Flask, and Django) to read and write trace headers.
- Automatic Context: It integrates seamlessly with the logging module to inject the current trace and span IDs into your structured logs without manual extra dictionaries.
- Vendor Neutrality: Your tracing data can be sent to any OTel-compatible system.
Expert Advice: For any new microservices project, skip the manual implementation and jump straight to integrating the OpenTelemetry SDK for Python.
