Skip to main content

A Guide to Preserving YAML Formatting with PyYAML

· 6 min read
Serhii Hrekov
software engineer, creator, artist, programmer, projects founder

Python's PyYAML library is a powerful tool for working with YAML, but it has one common limitation: when you load a YAML file and dump it back, it doesn't preserve the original formatting. This can be a problem if you have multi-line strings formatted as literal blocks (|) and want to keep them that way for readability.

Fortunately, there's a straightforward and effective way to solve this by creating a custom representer for PyYAML. This guide will walk you through the process, using the code you provided, to ensure your multi-line strings are always dumped as literal blocks.

The Core Problem: Why PyYAML Doesn't Preserve Formatting

When PyYAML loads a YAML file, its parser's primary job is to convert the YAML data into native Python objects, like dictionaries, lists, and strings. In this process, the formatting information—such as whether a string was a literal block (|), a folded block (>), or a plain string—is discarded.

The Python object in memory just sees a multi-line string with newline characters (\n). When you dump this string back to YAML, PyYAML's dumper makes its own decision on how to format it, often defaulting to a folded style, which can be less readable than the original literal block.

The Solution: A Custom Representer

A representer is a function that tells PyYAML how to "represent" a specific Python type as a YAML construct. By adding a custom representer for the str type, you can override the default behavior and force it to use the | style whenever a string contains a newline.

Here's the code that accomplishes this, broken down step by step:

import yaml

# A custom representer function for strings
def str_presenter(dumper, data):
# Check if the string contains a newline character
if "\n" in data:
# If so, force a literal block style (|)
return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
# Otherwise, use the default representation
return dumper.represent_scalar("tag:yaml.org,2002:str", data)

# Register the custom representer for the 'str' type
yaml.add_representer(str, str_presenter)

How it works:

  • The str_presenter function is called every time PyYAML needs to dump a string.
  • The dumper argument is the object responsible for writing the YAML. The data argument is the actual Python string.
  • The simple check if "\n" in data acts as a heuristic. If a string has a newline, it's considered a multi-line string.
  • dumper.represent_scalar() is the method that actually writes the string to the YAML file. We pass it data and, crucially, a style="|" argument to force the literal block style.

Putting It All Together in a Practical Script

Now, let's embed this logic into a complete Python script that can be used to load, modify, and dump a YAML file while preserving the formatting for multi-line strings. This is a common pattern for configuration file management.

import sys
import yaml

# --- The custom representer code goes here ---
def str_presenter(dumper, data):
if "\n" in data:
return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
return dumper.represent_scalar("tag:yaml.org,2002:str", data)

yaml.add_representer(str, str_presenter)

def main():
if len(sys.argv) < 2:
print("Usage: python script_name.py <filename>")
sys.exit(1)

filename = sys.argv[1]

# Load the YAML file
with open(filename, "r") as f:
data = yaml.safe_load(f)

# You can now modify the 'data' dictionary as needed.
# For this example, we will just dump it back.

# Dump the data back to the same file, forcing literal block styles
with open(filename, "w") as f:
yaml.dump(data, f, sort_keys=False)

if __name__ == "__main__":
main()

To use this script, save it as yaml_formatter.py and run it from your terminal:

python yaml_formatter.py my_config.yaml

This will load my_config.yaml, and regardless of how the multi-line strings were originally formatted, the script will dump them back using the literal block style, ensuring consistent and readable output.

This method provides a simple, robust, and elegant solution for a common problem, allowing you to manage your YAML files programmatically without sacrificing human readability.