Removing or Replacing Emoji in Text with Python

Serhii Hrekov
software engineer, creator, artist, programmer, projects founder

Once you can detect emojis in text, the two most useful follow-up actions are removing them entirely or converting them into descriptive text shortcodes. The examples below use the emoji package (pip install emoji).

1. Replacing Emojis with Descriptive Text (Demojize)

This approach is common in NLP pipelines because it preserves the sentiment or meaning of the emoji in a machine-readable form instead of discarding it. Use the emoji.demojize() function.

| Function | Action | Example Output |
| --- | --- | --- |
| emoji.demojize() | Converts a Unicode emoji to its official shortcode (e.g., :thumbs_up:). | Python is fun :thumbs_up: |

Python Example: emoji.demojize()

import emoji

# Sample text with composite and standard emojis
text_data = "I love this library! 👍🏽 The astronaut 👩‍🚀 is cool. ❤️"

# 1. Convert emojis to their default shortcode (e.g., :thumbs_up:)
shortcode_text = emoji.demojize(text_data)
print("--- Converted to Shortcode ---")
print(shortcode_text)
# Output: I love this library! :thumbs_up_medium_skin_tone: The astronaut :woman_astronaut: is cool. :red_heart:

# 2. Customize the output by changing delimiters
custom_text = emoji.demojize(text_data, delimiters=("", ""))
print("\n--- Converted without Delimiters ---")
print(custom_text)
# Output: I love this library! thumbs_up_medium_skin_tone The astronaut woman_astronaut is cool. red_heart

2. Removing Emojis Entirely

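A. Using replace_emoji()

If you need to strip emojis from the text altogether, use the emoji.replace_emoji() function. This is the most direct method for removal, because it works on the emoji characters themselves and leaves the rest of the text untouched.

| Function | Action | Example Output |
| --- | --- | --- |
| emoji.replace_emoji() | Replaces each emoji with a specified string (default is empty). | Python is fun |

Python Example: emoji.replace_emoji()
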
import emoji

text_data = "Python is great! 🐍💻🔥"

# Remove all emojis by replacing them with an empty string
text_removed = emoji.replace_emoji(text_data, replace='')
print("--- Emojis Removed ---")
print(text_removed)
# Output: Python is great!  (the space that preceded the emojis is kept as a trailing space)

# You can also replace them with a placeholder token
text_placeholder = emoji.replace_emoji(text_data, replace='[EMOJI_TOKEN]')
print("\n--- Emojis Replaced with Token ---")
print(text_placeholder)
# Output: Python is great! [EMOJI_TOKEN][EMOJI_TOKEN][EMOJI_TOKEN]

B. Using the demojize() Trick

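You can also remove emojis by converting them to shortcodes first and then stripping the shortcodes with a regular expression. Treat this as a fallback: the pattern :\w+: also deletes ordinary text that happens to sit between colons (the ":30:" in "10:30:15", for example) and misses shortcodes that contain characters outside \w, so replace_emoji() is usually the safer choice.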

import emoji
import re

text_data = "This is fast! 🚀"

# 1. Demojize to shortcode: "This is fast! :rocket:"
shortcode_text = emoji.demojize(text_data)

# 2. Use a regex to find and remove all shortcodes (words enclosed in colons)
text_removed_regex = re.sub(r':\w+:', '', shortcode_text).strip()
print(text_removed_regex)
# Output: This is fast!

3. Alternative: The clean-text Library

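For broader text normalization in one call (removing emails, digits, punctuation, and emojis), the clean-text library is an option. Note that clean() lowercases the text and transliterates it to ASCII by default.

Install it with pip:

pip install clean-text

Python Example: cleantext.clean()
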
from cleantext import clean

mixed_data = "Check out my new project! 🚀 Contact me at user@example.com."

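# no_punct=True strips punctuation; replace_with_email="" drops the default "<EMAIL>" placeholder
cleaned_text = clean(mixed_data, no_emoji=True, no_emails=True, no_punct=True, replace_with_email="")
print(cleaned_text)
# Output (clean() also lowercases by default): check out my new project contact me at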