Removing or Replacing Emoji in Text with Python
Since you know how to detect emojis, here are the most effective actions you can take once an emoji is found: removing it entirely, or converting it into a descriptive text shortcode.
1. Replacing Emojis with Descriptive Text (Demojize)
This is usually the most informative approach for NLP, because it keeps the sentiment or meaning of each emoji as plain text that downstream tools can tokenize. Use the emoji.demojize() function.
| Function | Action | Example Output |
|---|---|---|
| emoji.demojize() | Converts each Unicode emoji to its text shortcode (e.g., :thumbs_up:). | Python is fun :thumbs_up: |
Python Example: emoji.demojize()
import emoji
# Sample text with composite and standard emojis
text_data = "I love this library! 👍🏽 The astronaut 👩‍🚀 is cool. ❤️"
# 1. Convert emojis to their default shortcode (e.g., :thumbs_up:)
shortcode_text = emoji.demojize(text_data)
print("--- Converted to Shortcode ---")
print(shortcode_text)
# Output: I love this library! :thumbs_up_medium_skin_tone: The astronaut :woman_astronaut: is cool. :red_heart:
# 2. Customize the output by changing delimiters
custom_text = emoji.demojize(text_data, delimiters=("", ""))
print("\n--- Converted without Delimiters ---")
print(custom_text)
# Output: I love this library! thumbs_up_medium_skin_tone The astronaut woman_astronaut is cool. red_heart
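For tokenizers that split on whitespace, it can help to turn each shortcode into ordinary words. The following is a minimal sketch using only the standard library; shortcodes_to_words is a hypothetical helper (not part of the emoji package), and its pattern assumes shortcodes made of lowercase letters, digits, and underscores, which does not cover every name the library can emit.

```python
import re

# Assumed input: text that has already been passed through emoji.demojize()
shortcode_text = (
    "I love this library! :thumbs_up_medium_skin_tone: "
    "The astronaut :woman_astronaut: is cool. :red_heart:"
)

# Simplified pattern: a colon-delimited run of lowercase letters, digits, underscores
SHORTCODE = re.compile(r":([a-z0-9_]+):")


def shortcodes_to_words(text: str) -> str:
    """Hypothetical helper: rewrite ':some_name:' as 'some name'."""
    return SHORTCODE.sub(lambda m: m.group(1).replace("_", " "), text)


words_text = shortcodes_to_words(shortcode_text)
print(words_text)
# I love this library! thumbs up medium skin tone The astronaut woman astronaut is cool. red heart
```

Whether separate words or a single underscore-joined token works better depends on the model and vocabulary you are feeding.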
2. Removing Emojis Entirely
If you need to strip emojis from the text altogether, use the emoji.replace_emoji() function.
A. Using emoji.replace_emoji() (Recommended)
This is the cleanest and most direct method for removal.
| Function | Action | Example Output |
|---|---|---|
| emoji.replace_emoji() | Replaces each emoji with a specified string (default is empty). | Python is fun |
import emoji
text_data = "Python is great! 🐍💻🔥"
# Remove all emojis by replacing them with an empty string
text_removed = emoji.replace_emoji(text_data, replace='')
print("--- Emojis Removed ---")
print(text_removed)
# Output: Python is great!
# You can also replace them with a placeholder token
text_placeholder = emoji.replace_emoji(text_data, replace='[EMOJI_TOKEN]')
print("\n--- Emojis Replaced with Token ---")
print(text_placeholder)
# Output: Python is great! [EMOJI_TOKEN][EMOJI_TOKEN][EMOJI_TOKEN]
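Removing an emoji leaves the surrounding spaces in place, so the result can contain doubled or trailing whitespace. A small standard-library sketch of one way to tidy that up follows; tidy_whitespace is a hypothetical helper, and the input string is an assumed result of replace_emoji(..., replace='') rather than a live call.

```python
import re

# Assumed input: what emoji.replace_emoji("Great 🔥 work, team! 🐍", replace='')
# would leave behind (note the double space and the trailing space)
text_removed = "Great  work, team! "


def tidy_whitespace(text: str) -> str:
    """Hypothetical helper: collapse whitespace runs and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


tidy = tidy_whitespace(text_removed)
print(repr(tidy))
# 'Great work, team!'
```

Note that \s+ also collapses newlines and tabs; use a narrower pattern such as r" {2,}" if line breaks need to survive.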
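B. Using the demojize() Trick
You can also remove emojis by converting them to shortcodes first and then using a regular expression to strip the shortcodes. Be aware that the simple pattern used here also removes any colon-delimited word already present in the text (for example, the ":30:" in a timestamp such as 10:30:45) and misses shortcodes that contain characters other than letters, digits, and underscores, so prefer replace_emoji() when you can.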
import emoji
import re
text_data = "This is fast! 🚀"
# 1. Demojize to shortcode: "This is fast! :rocket:"
shortcode_text = emoji.demojize(text_data)
# 2. Use a regex to find and remove all shortcodes (words enclosed in colons)
text_removed_regex = re.sub(r':\w+:', '', shortcode_text).strip()
print(text_removed_regex)
# Output: This is fast!
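The colon pattern can collide with ordinary text. The sketch below, standard library only, illustrates the problem and one possible workaround: asking demojize() for rarer delimiters and matching those instead. The input strings are assumed demojize() results, and the "<<" / ">>" delimiters are an arbitrary choice for illustration.

```python
import re

# Assumed inputs: what emoji.demojize() would return for "Launch at 10:30:45 🚀"
default_delims = "Launch at 10:30:45 :rocket:"   # delimiters=(":", ":")
custom_delims = "Launch at 10:30:45 <<rocket>>"  # delimiters=("<<", ">>")

# The colon pattern also matches ":30:" inside the timestamp
naive = re.sub(r":\w+:", "", default_delims).strip()

# Matching the rarer delimiters leaves the timestamp untouched
safer = re.sub(r"<<.+?>>", "", custom_delims).strip()

print(naive)  # Launch at 1045
print(safer)  # Launch at 10:30:45
```

This only reduces the risk: any text that already contains the chosen delimiters would still be affected.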
3. Alternative: The clean-text Library
For complete text normalization (removing emails, digits, and emojis), the clean-text library is a powerful option.
pip install clean-text
from cleantext import clean
mixed_data = "Check out my new project! 🚀 Contact me at user@example.com."
cleaned_text = clean(mixed_data, no_emoji=True, no_emails=True)
print(cleaned_text)
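# Output (approximate, with the library's defaults): check out my new project! contact me at <email>.
# By default the text is lowercased and the address becomes a placeholder;
# pass replace_with_email="" and no_punct=True to drop the address and punctuation as well.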
