Natural Language Processing With spaCy in Python

February 01, 2025 February 01, 2025

Natural Language Processing With spaCy in Python
by:
blow post content copied from Real Python
click here to view original post

spaCy is a robust open-source library for Python, ideal for natural language processing (NLP) tasks. It offers built-in capabilities for tokenization, dependency parsing, and named-entity recognition, making it a popular choice for processing and analyzing text. With spaCy, you can efficiently represent unstructured text in a computer-readable format, enabling automation of text analysis and extraction of meaningful insights.

By the end of this tutorial, you’ll understand that:

You can use spaCy for natural language processing tasks such as part-of-speech tagging, and named-entity recognition.
spaCy is often preferred over NLTK for production environments due to its performance and modern design.
spaCy provides integration with transformer models, such as BERT.
You handle tokenization in spaCy by breaking text into tokens using its efficient built-in tokenizer.
Dependency parsing in spaCy helps you understand grammatical structures by identifying relationships between headwords and dependents.

Unstructured text is produced by companies, governments, and the general population at an incredible scale. It’s often important to automate the processing and analysis of text that would be impossible for humans to process. To automate the processing and analysis of text, you need to represent the text in a format that can be understood by computers. spaCy can help you do that.

If you’re new to NLP, don’t worry! Before you start using spaCy, you’ll first learn about the foundational terms and concepts in NLP. You should be familiar with the basics in Python, though. The code in this tutorial contains dictionaries, lists, tuples, for loops, comprehensions, object oriented programming, and lambda functions, among other fundamental Python concepts.

Free Source Code: Click here to download the free source code that you’ll use for natural language processing (NLP) in spaCy.

Introduction to NLP and spaCy

NLP is a subfield of artificial intelligence, and it’s all about allowing computers to comprehend human language. NLP involves analyzing, quantifying, understanding, and deriving meaning from natural languages.

Note: Currently, the most powerful NLP models are transformer based. BERT from Google and the GPT family from OpenAI are examples of such models.

Since the release of version 3.0, spaCy supports transformer based models. The examples in this tutorial are done with a smaller, CPU-optimized model. However, you can run the examples with a transformer model instead. All Hugging Face transformer models can be used with spaCy.

NLP helps you extract insights from unstructured text and has many use cases, such as:

spaCy is a free, open-source library for NLP in Python written in Cython. It’s a modern, production-focused NLP library that emphasizes speed, streamlined workflows, and robust pretrained models. spaCy is designed to make it easy to build systems for information extraction or general-purpose natural language processing.

Another popular Python NLP library that you may have heard of is Python’s Natural Language Toolkit (NLTK). NLTK is a classic toolkit that’s widely used in research and education. It offers an extensive range of algorithms and corpora but with less emphasis on high-throughput performance than spaCy.

In this tutorial, you’ll learn how to work with spaCy, which is a great choice for building production-ready NLP applications.

Installation of spaCy

In this section, you’ll install spaCy into a virtual environment and then download data and models for the English language.

You can install spaCy using pip, a Python package manager. It’s a good idea to use a virtual environment to avoid depending on system-wide packages. To learn more about virtual environments and pip, check out Using Python’s pip to Manage Your Projects’ Dependencies and Python Virtual Environments: A Primer.

First, you’ll create a new virtual environment, activate it, and install spaCy. Select your operating system below to learn how:

Windows PowerShell


PS> python -m venv venv
PS> ./venv/Scripts/activate
(venv) PS> python -m pip install spacy

Shell


$ python -m venv venv
$ source ./venv/bin/activate
(venv) $ python -m pip install spacy

With spaCy installed in your virtual environment, you’re almost ready to get started with NLP. But there’s one more thing you’ll have to install:

Shell

(venv) $ python -m spacy download en_core_web_sm

There are various spaCy models for different languages. The default model for the English language is designated as en_core_web_sm. Since the models are quite large, it’s best to install them separately—including all languages in one package would make the download too massive.

Once the en_core_web_sm model has finished downloading, open up a Python REPL and verify that the installation has been successful:

Python


>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")

Read the full article at https://realpython.com/natural-language-processing-spacy-python/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

February 01, 2025 at 07:30PM
Click here for more details...

=============================
The original post is available in Real Python by
this post has been published as it is through automation. Automation script brings all the top bloggers post under a single umbrella.
The purpose of this blog, Follow the top Salesforce bloggers and collect all blogs in a single place through automation.
============================

Python

Python Reader

Natural Language Processing With spaCy in Python

Introduction to NLP and spaCy

Installation of spaCy

Read the full article at https://realpython.com/natural-language-processing-spacy-python/ »

Post a Comment