Text pre-processing is a crucial stage when you work with Natural Language Processing (NLP). The text cleaning stage involves many pre-processing tasks, such as handling stop words, special characters, emojis, emoticons, punctuation, spelling errors, and URLs.
In everyday life, people on social media platforms often use emojis and emoticons in a sentence to express a feeling or describe an object instead of writing a word.
Both emojis and emoticons convey emotional expression in a text message. The main difference is that an emoji is a small image used to express an emotion or idea in a text message, whereas an emoticon is a representation of a facial expression built from keyboard characters and punctuation. For example, 😀 is an emoji and ':)' is an emoticon that represents a happy face.
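The difference is also visible at the character level. The following minimal sketch uses only Python's standard library to show that a simple emoji is a single Unicode character with its own official name, while an emoticon is a sequence of ordinary ASCII punctuation characters. The helper name `describe` is hypothetical and used only for illustration.

```python
import unicodedata

def describe(text):
    """Return (character, code point, Unicode name) for each character in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]

# An emoji: one character outside the ASCII range.
print(describe("😀"))

# An emoticon: two plain ASCII punctuation characters.
print(describe(":)"))
```

Note that some emojis (flags, skin-tone variants) are made of several code points, so this one-character view is a simplification.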
For text analysis, we usually need to handle them. In this tutorial, you will learn how to handle emojis and emoticons in text, with examples. You can remove the emojis from the text before analysis. Sometimes, however, emojis and emoticons carry strong information about a text, such as the feeling the writer wants to express.
For example, suppose we are working with product reviews for sentiment analysis, where we have to identify whether each review is positive or negative. People may use an emoji in a review to express how satisfied they are with the product. In that case, we may lose valuable information if we remove the emojis. A better approach is to convert each emoji to a word so that its information is preserved.
However, not every emoji conveys a useful message. Emojis that provide no helpful information for the task at hand can be removed.
The sections below describe methods to remove emojis and emoticons and to convert emojis to words. Please note that removing emojis and emoticons is not the preferred solution; however, you can decide to remove them based on the use case.
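One way to combine both ideas is to convert the emojis that matter for the task and drop the rest. The sketch below is only an illustration under stated assumptions: the `SENTIMENT_EMOJI` mapping is a hypothetical, hand-picked dictionary, the Unicode ranges in `OTHER_EMOJI` cover only a few common emoji blocks, and the function name `handle_emoji` is made up for this example.

```python
import re

# Hypothetical, hand-picked emojis that carry sentiment for this task.
SENTIMENT_EMOJI = {
    "😀": "happy",
    "😡": "angry",
    "👍": "thumbs_up",
}

# Assumption: a few common emoji blocks, not an exhaustive list.
OTHER_EMOJI = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols & pictographs
    "]+"
)

def handle_emoji(text):
    # Step 1: replace informative emojis with a word.
    for emoji_char, word in SENTIMENT_EMOJI.items():
        text = text.replace(emoji_char, f" {word} ")
    # Step 2: drop any emoji that is left.
    text = OTHER_EMOJI.sub("", text)
    # Step 3: collapse the extra spaces introduced above.
    return " ".join(text.split())

print(handle_emoji("Great phone 👍 delivered by 🚚"))
# 'Great phone thumbs_up delivered by'
```

Which emojis count as informative depends on the data set, so the mapping would need to be adapted for each use case.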
Handle Emoji
Remove Emoji
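import re

def remove_emoji(text):
    # Character class covering common emoji code-point blocks.
    emoji_pattern = re.compile(
        "["
        "\U0001F600-\U0001F64F"  # emoticons
        "\U0001F300-\U0001F5FF"  # symbols & pictographs
        "\U0001F680-\U0001F6FF"  # transport & map symbols
        "\U0001F1E0-\U0001F1FF"  # flags (iOS)
        "\U00002702-\U000027B0"  # dingbats
        # A single range from U+24C2 to U+1F251 would also match CJK and
        # many other scripts, so only its two endpoints' blocks are kept.
        "\U000024C2"             # circled letter M
        "\U0001F200-\U0001F251"  # enclosed ideographic supplement
        "]+",
        flags=re.UNICODE,
    )
    return emoji_pattern.sub('', text)

remove_emoji("I like to eat 🍕")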
This produces the following output:
'I like to eat '
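Removing an emoji usually leaves stray whitespace behind, as the trailing space in the output shows. A small follow-up step can tidy this up. The sketch below is self-contained and uses a deliberately short pattern for illustration; `EMOJI_PATTERN` and `remove_emoji_and_tidy` are hypothetical names, and the ranges are an assumption rather than a complete emoji list.

```python
import re

# Assumption: a short pattern for illustration, not a complete emoji list.
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001F5FF\U0001F600-\U0001F64F\U0001F680-\U0001F6FF]+"
)

def remove_emoji_and_tidy(text):
    # Replace each run of emojis with a space so neighbouring words stay apart.
    text = EMOJI_PATTERN.sub(" ", text)
    # Collapse repeated whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()

print(remove_emoji_and_tidy("I like to eat 🍕"))
# 'I like to eat'
```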
Conversion of Emoji to Word
A better approach to dealing with emojis is to convert each emoji to a word, which helps preserve its information. Please download the dictionary of emoji by using this link and save it to your current working directory.
import pickle
import re

with open('Emoji_Dict.p', 'rb') as fp:
    Emoji_Dict = pickle.load(fp)

# Invert the dictionary so that each emoji character becomes a key.
Emoji_Dict = {v: k for k, v in Emoji_Dict.items()}

def convert_emojis_to_word(text):
    for emot in Emoji_Dict:
        # Strip ',' and ':' from the emoji name and join its parts with '_'.
        word = "_".join(Emoji_Dict[emot].replace(",", "").replace(":", "").split())
        # re.escape guards against emojis that contain regex
        # metacharacters, such as the keycap '*'.
        text = re.sub(re.escape(emot), word, text)
    return text

text = "I won 🥇 in 🏏"
convert_emojis_to_word(text)
This produces the following result:
I won 1st_place_medal in cricket
convert_emojis_to_word("I like to eat 🍕")
Output
'I like to eat pizza'
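If the dictionary file is not at hand, a rough fallback is to use the official Unicode character names that ship with Python's standard `unicodedata` module. This is a swapped-in technique, not the dictionary method described above, and the names it produces differ from the dictionary's (for example `slice_of_pizza` rather than `pizza`). The function name `emoji_to_word_fallback` is hypothetical, and the sketch only handles single-character symbols.

```python
import unicodedata

def emoji_to_word_fallback(text):
    """Replace non-ASCII symbol characters with their Unicode names."""
    out = []
    for ch in text:
        # Category "So" (Symbol, other) covers most pictographic emojis.
        if ord(ch) > 127 and unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            # Keep the character unchanged if it has no known name.
            out.append(name.lower().replace(" ", "_") if name else ch)
        else:
            out.append(ch)
    return "".join(out)

print(emoji_to_word_fallback("I like to eat 🍕"))
# 'I like to eat slice_of_pizza'
```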
Handle Emoticon
Remove Emoticon
Please download the emoticon dictionary by using this link and save it to your current working directory.
import pickle
import re

with open('Emoticon_Dict.p', 'rb') as fp:
    Emoticon_Dict = pickle.load(fp)

def remove_emoticons(text):
    # The dictionary keys are joined as-is into one alternation, so they
    # are assumed to be valid regular-expression patterns already.
    emoticon_pattern = re.compile('(' + '|'.join(Emoticon_Dict) + ')')
    return emoticon_pattern.sub('', text)

remove_emoticons("Good Morning :-)")
# Output: 'Good Morning '
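As with emojis, an emoticon can be converted to a word instead of being removed, so that the feeling it expresses is kept. The sketch below does not use the downloaded dictionary; it assumes a tiny hypothetical mapping, `EMOTICON_WORDS`, with plain (unescaped) emoticon strings as keys, and the function name `convert_emoticons_to_word` is made up for this example.

```python
import re

# Hypothetical mini-dictionary: plain emoticon strings mapped to words.
EMOTICON_WORDS = {
    ":-)": "happy_face",
    ":)": "happy_face",
    ":-(": "sad_face",
    ":(": "sad_face",
    ";)": "wink",
}

# Escape each emoticon, since characters such as ')' and '(' are regex
# metacharacters, and try longer emoticons first.
EMOTICON_PATTERN = re.compile(
    "|".join(
        re.escape(e) for e in sorted(EMOTICON_WORDS, key=len, reverse=True)
    )
)

def convert_emoticons_to_word(text):
    # Look up the word for whichever emoticon was matched.
    return EMOTICON_PATTERN.sub(lambda m: EMOTICON_WORDS[m.group(0)], text)

print(convert_emoticons_to_word("Good Morning :-)"))
# 'Good Morning happy_face'
```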
In this tutorial, you have learned how to deal with emojis and emoticons in text. Please leave a comment in the section below if you have any questions about this tutorial.