Text Preprocessing: Handle Emoji & Emoticon

Text preprocessing is a crucial stage in any Natural Language Processing (NLP) project. The text-cleaning stage involves many steps, such as handling stop words, special characters, emojis, emoticons, punctuation, spelling errors, and URLs.

On social media, people often put emojis and emoticons in a sentence to express a feeling or describe an object instead of writing a word.

Both emojis and emoticons convey emotion in a text message. The main difference is that an emoji is a small pictograph, encoded as a Unicode character, that is used to express an emotion or idea, whereas an emoticon is a representation of a facial expression built from keyboard characters and punctuation. For example, 😀 is an emoji, and ‘:)’ is an emoticon that represents a happy face.
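
The distinction can be checked directly in Python. The snippet below is a minimal sketch using only the standard library; the variable names are illustrative, and the pizza emoji is written as an escape sequence so the code does not depend on the file encoding.

```python
import unicodedata

emoji = "\U0001F355"   # the pizza emoji, a single Unicode code point
emoticon = ":)"        # two ordinary ASCII characters

# An emoji is a dedicated Unicode character with its own official name.
print(unicodedata.name(emoji))                 # SLICE OF PIZZA

# An emoticon is built entirely from plain keyboard (ASCII) characters.
print(all(ord(ch) < 128 for ch in emoticon))   # True
print(any(ord(ch) < 128 for ch in emoji))      # False
```

This is why the two are handled differently later on: emojis can be matched by Unicode ranges, while emoticons have to be matched as specific character sequences.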

For text analysis, we need to decide how to handle them. This tutorial shows, with examples, how to handle emojis and emoticons in text. One option is to remove them. However, emojis and emoticons sometimes carry strong information about a text, such as the feeling being expressed.

For example, suppose we are doing sentiment analysis on product reviews and must label each review as positive or negative. Reviewers may use an emoji to express how satisfied they are with the product. If we remove the emojis, we may lose valuable information. In this case, a better approach is to convert each emoji to a word so that its information is preserved.

However, not all emojis carry a useful message. Those that add no helpful information for the task at hand can be removed.

The sections below describe how to remove emojis and emoticons and how to convert emojis to words. Note that removal is not the preferred solution in general; decide whether to remove them based on your use case.

Handle Emoji

Remove Emoji

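import re

def remove_emoji(string):
    # Character class listing the Unicode ranges that hold common emojis.
    emoji_pattern = re.compile("["
                           u"\U0001F600-\U0001F64F" # emoticons
                           u"\U0001F300-\U0001F5FF" # symbols & pictographs
                           u"\U0001F680-\U0001F6FF" # transport & map symbols
                           u"\U0001F1E0-\U0001F1FF" # flags (iOS)
                           u"\U00002702-\U000027B0" # dingbats
                           u"\U000024C2"            # circled letter M
                           u"\U0001F200-\U0001F251" # enclosed ideographic symbols
                           "]+", flags=re.UNICODE)
    # A single 24C2-1F251 range would also match CJK and other non-emoji text,
    # so the two ends are listed separately.
    return emoji_pattern.sub(r'', string)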

remove_emoji("I like to eat 🍕")

This produces the following output:

'I like to eat '
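
The ranges in the pattern above do not include every emoji block. For instance, 🥇 (U+1F947) lies in the Supplemental Symbols and Pictographs block (U+1F900 to U+1F9FF), which is not listed. The following is a minimal sketch of how the pattern could be extended, assuming you also want to strip that block; the name `remove_emoji_extended` is hypothetical and the list of ranges is still not exhaustive.

```python
import re

# Hypothetical variant of remove_emoji with one additional Unicode block.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # regional indicators (flags)
    "\U0001F900-\U0001F9FF"  # supplemental symbols & pictographs (assumed addition)
    "\U00002702-\U000027B0"  # dingbats
    "]+"
)

def remove_emoji_extended(text):
    # Replace every run of matched characters with an empty string.
    return EMOJI_PATTERN.sub("", text)

print(remove_emoji_extended("I won \U0001F947 in the final"))
```

In Python 3, string patterns are Unicode-aware by default, so the `re.UNICODE` flag is omitted here.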

Convert Emojis to Words
A better way to deal with emojis is to convert them to words, which preserves the information they carry. Please download the emoji dictionary using this link and save it to your current working directory.

try:
    import cPickle as pickle
except ImportError: 
    import pickle
import re

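# Only unpickle files from a source you trust: pickle can execute code on load.
with open('Emoji_Dict.p', 'rb') as fp:
    Emoji_Dict = pickle.load(fp)
# Invert the mapping so that each emoji is a key and its description is the value.
Emoji_Dict = {v: k for k, v in Emoji_Dict.items()}

def convert_emojis_to_word(text):
    for emot in Emoji_Dict:
        # Strip commas and colons from the description and join its words with underscores.
        word = "_".join(Emoji_Dict[emot].replace(",","").replace(":","").split())
        # re.escape stops emojis that contain regex metacharacters
        # (such as the keycap asterisk) from breaking the pattern.
        text = re.sub(re.escape(emot), word, text)
    return text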

text = "I won 🥇 in 🏏"
convert_emojis_to_word(text)

This produces the following result:

'I won 1st_place_medal in cricket'

convert_emojis_to_word("I like to eat 🍕")

Output

'I like to eat pizza'
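
The loop above runs one regex substitution per dictionary entry, which can be slow for a large dictionary and long texts. The following is a minimal sketch of a single-pass alternative, assuming the mapping is emoji to description. `SAMPLE_EMOJI_DICT` is a hypothetical three-entry stand-in for the downloaded dictionary, with descriptions chosen to match the outputs shown above, and `convert_emojis_single_pass` is an illustrative name.

```python
import re

# Hypothetical stand-in for the downloaded dictionary (emoji -> description).
SAMPLE_EMOJI_DICT = {
    "\U0001F355": ":pizza:",
    "\U0001F947": ":1st_place_medal:",
    "\U0001F3CF": ":cricket:",
}

# Escape every key and try longer sequences first, so multi-code-point
# emojis are matched before any shorter emoji they contain.
_keys = sorted(SAMPLE_EMOJI_DICT, key=len, reverse=True)
_pattern = re.compile("|".join(re.escape(k) for k in _keys))

def convert_emojis_single_pass(text):
    def to_word(match):
        # Same clean-up as above: drop commas and colons, join words with "_".
        name = SAMPLE_EMOJI_DICT[match.group(0)]
        return "_".join(name.replace(",", "").replace(":", "").split())
    # One scan over the text; each match is looked up in the dictionary.
    return _pattern.sub(to_word, text)

print(convert_emojis_single_pass("I like to eat \U0001F355"))  # I like to eat pizza
```

The pattern is compiled once at import time, so repeated calls only pay for a single scan of each text.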

Handle Emoticon

Remove Emoticon

Please download the emoticon dictionary using this link and save it to your current working directory.

import re
try:
    import cPickle as pickle
except ImportError:
    import pickle  

with open('Emoticon_Dict.p', 'rb') as fp:
    Emoticon_Dict = pickle.load(fp)

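def remove_emoticons(text):
    # Build one alternation from all dictionary keys. The keys are used as regex
    # patterns as-is, so they are assumed to be escaped already; if they are raw
    # strings such as ':-)', wrap each key in re.escape().
    emoticon_pattern = re.compile(u'(' + u'|'.join(k for k in Emoticon_Dict) + u')')
    return emoticon_pattern.sub(r'', text)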

remove_emoticons("Good Morning :-)")

# Output:
'Good Morning '
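
As with emojis, converting emoticons to words keeps the sentiment they carry instead of discarding it. The following is a minimal sketch under stated assumptions: `SAMPLE_EMOTICONS` is a hypothetical four-entry mapping of raw emoticon strings to descriptions (the downloaded dictionary may be structured differently), and `convert_emoticons_to_word` is an illustrative name.

```python
import re

# Hypothetical sample mapping (raw emoticon -> description).
SAMPLE_EMOTICONS = {
    ":-)": "happy face",
    ":)": "happy face",
    ":-(": "sad face",
    ":(": "sad face",
}

# The keys are raw strings, so each one is escaped before being joined
# into a single alternation; longer emoticons are tried first.
_emoticon_pattern = re.compile(
    "|".join(re.escape(k) for k in sorted(SAMPLE_EMOTICONS, key=len, reverse=True))
)

def convert_emoticons_to_word(text):
    # Replace each emoticon with its description, words joined by "_".
    return _emoticon_pattern.sub(
        lambda m: SAMPLE_EMOTICONS[m.group(0)].replace(" ", "_"), text
    )

print(convert_emoticons_to_word("Good Morning :-)"))  # Good Morning happy_face
```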

In this tutorial, you have learned how to deal with emojis and emoticons in text. Please leave a comment below if you have any questions about this tutorial.
