Installation of NLTK

NLTK (Natural Language Toolkit) is one of the leading Python packages for Natural Language Processing. It provides easy-to-use interfaces to over 50 corpora and lexical resources, such as WordNet and stopword lists. It also includes text-processing libraries for tokenization, stemming, lemmatization, tagging, parsing and text classification.

NLTK is free, open-source, and available for Linux, Windows and macOS. This tutorial explains how to install NLTK and its data packages.

Installing NLTK

Open a command prompt and run the command that matches your Python version:

Installation with Python 2.x:

pip install nltk

Installation with Python 3.x:

pip3 install nltk
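A quick check can confirm that pip installed the package for the interpreter you plan to use. This is a minimal sketch. It relies only on the standard-library importlib.util.find_spec function and NLTK's __version__ attribute. The helper name nltk_installed is hypothetical and is not part of NLTK.

```python
import importlib.util


def nltk_installed() -> bool:
    """Return True if the nltk package can be found by the running interpreter."""
    # find_spec returns None when the top-level package is not importable
    return importlib.util.find_spec("nltk") is not None


if nltk_installed():
    import nltk
    print("NLTK version:", nltk.__version__)
else:
    print("NLTK not found; install it with: pip install nltk (or pip3 install nltk)")
```

Run the check with the same python or python3 executable that matches the pip or pip3 you used. If the check fails right after a successful install, the package may have been installed for a different interpreter.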

Installing NLTK Data

NLTK comes with many corpora, trained models and other data, but these are downloaded separately from the package itself. After installing NLTK, download the data as follows.

First, open the Python interpreter by running python2 or python3 at the command prompt.

Then enter the following commands at the interpreter prompt:

>>> import nltk
>>> nltk.download()

The NLTK Downloader window should open. Select an individual package or a collection, then click the Download button to start downloading the selected items.
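You may be working on a machine without a display, or you may want a script to fetch only the data it needs. In either case, nltk.download also accepts a package identifier and a quiet flag, so no window is opened. The sketch below first uses nltk.data.find, which raises LookupError when a resource is missing, and downloads only what is absent. The helper name ensure_nltk_data and the (identifier, lookup path) pairs are illustrative choices, not part of NLTK.

```python
import importlib.util


def ensure_nltk_data(resources, quiet=True):
    """Make sure each NLTK resource is available, downloading only what is missing.

    resources: iterable of (resource_id, lookup_path) pairs,
               e.g. ("stopwords", "corpora/stopwords").
    Returns a dict mapping resource_id -> True if the resource is available afterwards.
    """
    resources = list(resources)
    if importlib.util.find_spec("nltk") is None:
        # NLTK itself is missing, so nothing can be found or downloaded
        return {resource_id: False for resource_id, _ in resources}

    import nltk

    status = {}
    for resource_id, lookup_path in resources:
        try:
            nltk.data.find(lookup_path)  # raises LookupError if the resource is not installed
            status[resource_id] = True
        except LookupError:
            # quiet=True suppresses progress output; download() returns True on success
            status[resource_id] = bool(nltk.download(resource_id, quiet=quiet))
    return status
```

For example, ensure_nltk_data([("punkt", "tokenizers/punkt"), ("stopwords", "corpora/stopwords")]) checks for the Punkt tokenizer models and the stopwords corpus and downloads whichever is missing. Recent NLTK releases may look for the Punkt data under punkt_tab (lookup path tokenizers/punkt_tab) instead. Adjust the identifiers to match your installed version.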


After the NLTK data has finished downloading, check that NLTK works by running the following code. It needs the Punkt tokenizer models that word_tokenize relies on.

>>> from nltk.tokenize import sent_tokenize, word_tokenize
>>> data = "All work and no play makes jack a dull boy, all work and no play"
>>> print(word_tokenize(data))
['All', 'work', 'and', 'no', 'play', 'makes', 'jack', 'a', 'dull', 'boy', ',', 'all', 'work', 'and', 'no', 'play']
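The session above imports sent_tokenize but only calls word_tokenize. The sketch below exercises both. If the data is missing, it prints a readable message instead of a traceback. The helper name try_tokenize is hypothetical. The sketch assumes, as NLTK does, that a missing tokenizer model surfaces as LookupError.

```python
import importlib.util


def try_tokenize(text):
    """Split text into sentences and words with NLTK.

    Returns a (sentences, words) tuple, or None if NLTK or its Punkt data is missing.
    """
    if importlib.util.find_spec("nltk") is None:
        return None
    from nltk.tokenize import sent_tokenize, word_tokenize

    try:
        return sent_tokenize(text), word_tokenize(text)
    except LookupError:
        # Raised when the Punkt tokenizer models have not been downloaded
        return None


result = try_tokenize("All work and no play. Jack is a dull boy.")
if result is None:
    print("NLTK or its Punkt data is missing; see the download steps above.")
else:
    sentences, words = result
    print(sentences)
    print(words)
```

With the Punkt models installed, sent_tokenize should split this text into two sentences. word_tokenize should return the individual words and punctuation marks.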
