Removing or Replacing Emoji in Text with Python

Serhii Hrekov
software engineer, creator, artist, programmer, projects founder

Since you know how to detect emojis, here are the most effective actions you can take once an emoji is found: removing it entirely, or converting it into a descriptive text shortcode.

1. Replacing Emojis with Descriptive Text (Demojize)

This is the most common and informative approach for NLP, as it preserves the sentiment or meaning of the emoji in a machine-readable format. Use the emoji.demojize() function.

| Function | Action | Example Output |
| --- | --- | --- |
| emoji.demojize() | Converts a Unicode emoji to its official shortcode (e.g., :thumbs_up:). | Python is fun :thumbs_up: |

Python Example: emoji.demojize()

import emoji

# Sample text with composite and standard emojis
text_data = "I love this library! 👍🏽 The astronaut 👩‍🚀 is cool. ❤️"

# 1. Convert emojis to their default shortcode (e.g., :thumbs_up:)
shortcode_text = emoji.demojize(text_data)
print("--- Converted to Shortcode ---")
print(shortcode_text)
# Output: I love this library! :thumbs_up_medium_skin_tone: The astronaut :woman_astronaut: is cool. :red_heart:

# 2. Customize the output by changing delimiters
custom_text = emoji.demojize(text_data, delimiters=("", ""))
print("\n--- Converted without Delimiters ---")
print(custom_text)
# Output: I love this library! thumbs_up_medium_skin_tone The astronaut woman_astronaut is cool. red_heart
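For NLP pipelines that tokenize on whitespace, it can help to go one step further and turn each shortcode into plain words. The sketch below works on an already demojized string using only the standard library. The helper name shortcodes_to_words and the rule that a shortcode must start with a letter are assumptions made for this illustration, not part of the emoji library.

```python
import re

# Assumption: shortcodes use the default ":" delimiters and start with a letter,
# so timestamps such as 10:30:45 are not mistaken for shortcodes.
SHORTCODE = re.compile(r":([A-Za-z][\w\-]*):")


def shortcodes_to_words(text: str) -> str:
    """Turn ':thumbs_up:' into 'thumbs up' (hypothetical helper)."""
    return SHORTCODE.sub(lambda m: m.group(1).replace("_", " "), text)


# Hand-written sample standing in for the output of emoji.demojize()
demojized = "I love this library! :thumbs_up_medium_skin_tone: So good :red_heart:"
print(shortcodes_to_words(demojized))
# Output: I love this library! thumbs up medium skin tone So good red heart
```

Shortcodes that contain other characters (parentheses, for example) are left untouched by this pattern, so treat it as a starting point rather than a complete solution.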

2. Removing Emojis Entirely

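A. Using replace_emoji()

If you need to strip emojis from the text altogether, use the emoji.replace_emoji() function. This is the cleanest and most direct method for removal, because it operates on the emoji characters themselves rather than on an intermediate representation.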

| Function | Action | Example Output |
| --- | --- | --- |
| emoji.replace_emoji() | Replaces the emoji with a specified string (default is empty). | Python is fun |

import emoji

text_data = "Python is great! 🐍💻🔥"

# Remove all emojis by replacing them with an empty string
text_removed = emoji.replace_emoji(text_data, replace='')
print("--- Emojis Removed ---")
print(text_removed)
# Output: Python is great!  (a trailing space remains where the emojis were)

# You can also replace them with a placeholder token
text_placeholder = emoji.replace_emoji(text_data, replace='[EMOJI_TOKEN]')
print("\n--- Emojis Replaced with Token ---")
print(text_placeholder)
# Output: Python is great! [EMOJI_TOKEN][EMOJI_TOKEN][EMOJI_TOKEN]
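Removing emojis can leave stray whitespace behind: a trailing space when the emoji sat at the end of the string, or a double space when it sat between two words. A minimal clean-up sketch using only the standard library might look like this. The helper name tidy_whitespace is hypothetical, and the input strings are hand-written stand-ins for replace_emoji() output.

```python
import re


def tidy_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs and trim the ends (hypothetical helper)."""
    collapsed = re.sub(r"[ \t]{2,}", " ", text)
    return collapsed.strip()


print(tidy_whitespace("Python is great! "))
# Output: Python is great!
print(tidy_whitespace("The astronaut  is cool."))
# Output: The astronaut is cool.
```

Line breaks are deliberately left alone here, so multi-line text keeps its structure.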

B. Using the demojize() Trick

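You can also remove emojis by converting them to shortcodes first and then stripping the shortcodes with a regular expression. This approach is more fragile than replace_emoji(): the pattern :\w+: also matches ordinary text that happens to sit between two colons (the middle of a timestamp, for instance), and it misses shortcodes that contain characters outside \w, such as hyphens or parentheses. Treat it as a fallback rather than the default.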

import emoji
import re

text_data = "This is fast! 🚀"

# 1. Demojize to shortcode: "This is fast! :rocket:"
shortcode_text = emoji.demojize(text_data)

# 2. Use a regex to find and remove all shortcodes (words enclosed in colons)
text_removed_regex = re.sub(r':\w+:', '', shortcode_text).strip()
print(text_removed_regex)
# Output: This is fast!
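The sketch below illustrates why the colon-based pattern needs care, and one possible way around it: have demojize() emit distinctive delimiters and strip only those. The delimiters "<<emoji:" and ">>" and the helper strip_delimited are assumptions chosen for this example. The second sample string is hand-written to stand in for what emoji.demojize(text, delimiters=("<<emoji:", ">>")) would produce.

```python
import re

# Pitfall: ":\w+:" also removes ordinary text between colons.
naive = re.sub(r":\w+:", "", "Launch at 10:30:45 :rocket:")
print(repr(naive))
# Output: 'Launch at 1045 '


def strip_delimited(text: str, open_delim: str, close_delim: str) -> str:
    """Remove every open_delim...close_delim span (hypothetical helper)."""
    pattern = re.escape(open_delim) + r"\S*?" + re.escape(close_delim)
    return re.sub(pattern, "", text)


# Assumed to be the result of demojizing with custom delimiters
demojized = "Launch at 10:30:45 <<emoji:rocket>>"
print(strip_delimited(demojized, "<<emoji:", ">>").strip())
# Output: Launch at 10:30:45
```

Any delimiter pair works as long as it is unlikely to occur in the input text.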

3. Alternative: The clean-text Library

For complete text normalization (removing emails, digits, and emojis), the clean-text library is a powerful option.

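Install it first:

pip install clean-text

Then call clean() with the options you need:

from cleantext import clean

mixed_data = "Check out my new project! 🚀 Contact me at user@example.com."

# clean() lowercases by default. It also keeps punctuation and swaps emails for an
# "<EMAIL>" placeholder unless told otherwise, hence the two extra arguments.
cleaned_text = clean(mixed_data, no_emoji=True, no_emails=True, replace_with_email="", no_punct=True)
print(cleaned_text)
# Output: check out my new project contact me at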