
Text Summarisation in Natural Language Processing: Algorithms, Techniques & Challenges

Updated on 23 September, 2022


Creating a summary from a given piece of content is a very abstract process that everyone participates in. Automating such a process can help parse through a lot of data and help humans better use their time to make crucial decisions. With the sheer volume of media out there, one can be very efficient by reducing the fluff around the most critical information. We have already started seeing text summaries across the web that are automatically generated.

If you frequent Reddit, you might have seen the ‘Autotldr bot’, which routinely helps Redditors by summarizing linked articles in a given post. It was created in 2011 and has already saved thousands of person-hours. There is a market for reliable text summaries, as shown by applications that do precisely that, such as Inshorts (summarizing news in 60 words or less) and Blinkist (summarizing books).

Automatic text summarization, thus, is an exciting yet challenging frontier in Natural Language Processing (NLP) and Machine Learning (ML). Current developments in automatic text summarization build on research dating back to the 1950s, when Hans Peter Luhn’s paper titled “The Automatic Creation of Literature Abstracts” was published.

This paper outlined the use of features such as word frequency and phrase frequency to extract essential sentences from a document. It was followed by further influential research by Harold P. Edmundson in the late 1960s, which used cue words, title words appearing in the text, and the location of sentences to extract significant sentences from a document.

Now that the world has made strides in machine learning and newer studies in the field keep being published, automatic text summarization is on the verge of becoming a ubiquitous tool for interacting with information in the digital age.


There are two main approaches to summarizing text in NLP.

Text Summarization in NLP

1. Extraction-based summarization

As the name suggests, this technique relies on merely extracting or pulling out key phrases from a document. It is then followed by combining these key phrases to form a coherent summary.  

2. Abstraction-based summarization

This technique, unlike extraction, relies on paraphrasing and shortening parts of a document. When such abstraction is done well, typically with deep learning models, the resulting summary can read more fluently than one stitched together from extracted phrases. This added layer of complexity, however, makes abstraction harder to develop than extraction.

There is another way to come up with higher-quality summaries. This approach is called aided summarization, which entails a combined human and software effort. It comes in two flavors:

  1. Machine-aided human summarization: extractive techniques highlight candidate passages to be included, and the human may add or remove text.
  2. Human-aided machine summarization: the human simply edits the output of the software.

Apart from the main approaches to summarize text, there are other bases on which text summarizers are classified. The following are those category heads:

3. Single vs. Multi-document summarization

Single-document summarization relies on the cohesiveness of the text and the infrequent repetition of facts to generate summaries. Multi-document summarization, on the other hand, must cope with a higher chance of redundant and recurring information.

4. Indicative vs. informative

The taxonomy of summaries also depends on the user’s end goal. For instance, from an indicative summary one would expect the high-level points of an article, whereas from an informative summary one may expect more topic filtering that lets the reader drill down into the content.

5. Document length and type

The length of the input text heavily influences the sort of summarization approach.

The largest summarization datasets, like Newsroom by Cornell, have focused on news articles, which are about 300-1000 words on average. Extractive summarizers deal with such lengths relatively well. A multipage document or a book chapter can only be summarized adequately with more advanced approaches like hierarchical clustering or discourse analysis.

Additionally, the genre of the text influences the summarizer too. The methods that would summarize a technical white-paper would be radically different from the techniques that may be better equipped to summarize a financial statement.

In the rest of this article, we will focus on the details of the extraction-based summarization technique.

PageRank Algorithm

This algorithm helps search engines like Google rank web pages. Let’s understand the algorithm with an example. Assume you have four web pages with different levels of connectivity between them. One may have no links to the other three, one may link to two of the others, one may link to just one, and so on.

We can then model the probabilities of navigating from one page to another by using a matrix with n rows and columns, where n is the number of web pages. Each element within the matrix will represent the probability of transitioning from one webpage to another. By assigning the right probabilities, one can iteratively update such a matrix to come to a web page ranking.
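As a rough illustration of the matrix idea above, the following sketch builds a transition matrix for four hypothetical pages and updates the ranks iteratively. The link structure, the function names, and the damping factor of 0.85 (a conventional choice, not something specified in this article) are all assumptions made for the example. Plain Python lists are used to keep it dependency-free; NumPy would be the more usual choice for larger matrices.

```python
def transition_matrix(links, n):
    """Row i holds the probability of moving from page i to each page j."""
    matrix = []
    for page in range(n):
        outgoing = links.get(page, [])
        if outgoing:
            # Split the probability evenly across the pages this page links to.
            row = [1.0 / len(outgoing) if j in outgoing else 0.0 for j in range(n)]
        else:
            # A page with no links is treated as jumping to any page at random.
            row = [1.0 / n] * n
        matrix.append(row)
    return matrix


def pagerank(matrix, damping=0.85, iterations=100):
    """Iteratively update the rank vector using the transition matrix."""
    n = len(matrix)
    ranks = [1.0 / n] * n
    for _ in range(iterations):
        ranks = [
            (1.0 - damping) / n
            + damping * sum(ranks[i] * matrix[i][j] for i in range(n))
            for j in range(n)
        ]
    return ranks


# Hypothetical web: page 0 links to 1 and 2, page 1 to 2, page 2 to 0, page 3 to nothing.
links = {0: [1, 2], 1: [2], 2: [0], 3: []}
matrix = transition_matrix(links, 4)
ranks = pagerank(matrix)
print(ranks)
```

In this toy web, page 2 receives links from two pages and ends up ranked highest, while page 3, which nobody links to, ends up lowest.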


TextRank Algorithm

The reason we explored the PageRank algorithm is to show how the same algorithm can be used to rank text instead of web pages. The change of perspective is to replace links between pages with similarity between sentences and to fill the PageRank-style matrix with similarity scores.

Implementing the TextRank algorithm

Required Libraries

  • NumPy
  • Pandas
  • NLTK
  • re

The following is an explanation of the code behind the extraction summarization technique:

Step 1

Concatenate all the text you have in the source document as one solid block of text. The reason to do that is to provide conditions so that we can execute step 2 more easily.

Step 2

We define what ends a sentence by looking for punctuation marks such as the period (.), question mark (?), and exclamation mark (!). With this definition, we simply split the text into sentences.
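Steps 1 and 2 can be sketched with the `re` module alone. The function name `split_sentences` is hypothetical, and the rule is deliberately naive: it splits after any period, question mark, or exclamation mark that is followed by whitespace, so abbreviations such as "Dr." would be split incorrectly. NLTK's `sent_tokenize` is a more robust alternative.

```python
import re


def split_sentences(text):
    # Step 1: collapse line breaks and repeated spaces into one block of text.
    block = " ".join(text.split())
    # Step 2: split wherever ., ? or ! is followed by whitespace.
    parts = re.split(r"(?<=[.?!])\s+", block)
    return [part for part in parts if part]


sentences = split_sentences("Hello world.  How are you?\nFine!")
print(sentences)
```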

Step 3

Now that we have access to separate sentences, we find vector representations (word embeddings) for each of those sentences. First, we must understand what vector representations are. Word embeddings are a type of word representation in which words with similar meanings receive similar mathematical descriptions. In practice, this is an entire class of techniques that represent words as real-valued vectors in a predefined vector space.

Each word is represented by a real-valued vector that has many dimensions (over 100 at times). The distributed representation is learned from the usage of words and, thus, allows words used in similar ways to have similar representations. This lets us capture the meanings of words naturally through their proximity to other words, which are themselves represented as vectors.

For this guide, we will use Global Vectors for Word Representation (GloVe). GloVe is an open-source distributed word representation algorithm developed by Pennington, Socher, and Manning at Stanford. It combines the features of two model families, namely global matrix factorization and local context window methods.
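Pre-trained GloVe vectors are distributed as plain text files in which each line holds a word followed by its vector components, separated by spaces. A minimal sketch of a parser for that format is shown below; the function name `load_glove` and the three-dimensional sample lines are made up for illustration, and real vectors have far more dimensions.

```python
def load_glove(lines):
    """Parse GloVe-style lines: a word followed by its vector components."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(value) for value in parts[1:]]
    return embeddings


# Made-up sample lines in the same format as a GloVe text file.
sample = ["cat 0.1 0.2 0.3", "dog 0.2 0.1 0.4"]
embeddings = load_glove(sample)
print(embeddings["cat"])

# With a downloaded file (assumed here to be glove.6B.100d.txt), usage would be:
# with open("glove.6B.100d.txt", encoding="utf-8") as handle:
#     embeddings = load_glove(handle)
```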

Step 4

Once we have the vector representations for our words, we have to extend the process to represent entire sentences as vectors. To do so, we may fetch the vector representations of the words that constitute a sentence and then take the mean of those vectors to arrive at a consolidated vector for the sentence.
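The averaging step might look like the sketch below. The tiny two-dimensional `toy` embedding table and the function name `sentence_vector` are assumptions for illustration. Words missing from the table are skipped, and a sentence with no known words falls back to a zero vector, which is one possible convention among several.

```python
import re


def sentence_vector(sentence, embeddings, dim):
    """Average the embeddings of the known words in a sentence."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    vectors = [embeddings[word] for word in words if word in embeddings]
    if not vectors:
        return [0.0] * dim  # no known words: fall back to a zero vector
    # zip(*vectors) groups the values of each dimension together.
    return [sum(component) / len(vectors) for component in zip(*vectors)]


# Made-up two-dimensional embeddings, for illustration only.
toy = {"cats": [1.0, 0.0], "purr": [0.0, 1.0]}
vec = sentence_vector("Cats purr loudly.", toy, dim=2)
print(vec)
```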

Step 5

At this point, we have a vector representation for each individual sentence. It is now helpful to quantify similarities between the sentences using the cosine similarity approach. We can then populate an empty matrix with the cosine similarities of the sentences.
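Cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below fills a matrix with these values for every pair of sentence vectors. The three sample vectors are invented, and leaving the diagonal at zero (so a sentence is not compared with itself) is a design choice assumed here rather than something the steps above prescribe.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    """Start from an empty (zero) matrix and fill in pairwise similarities."""
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # keep the diagonal at zero: no self-similarity
                sim[i][j] = cosine(vectors[i], vectors[j])
    return sim


# Invented sentence vectors, for illustration only.
vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
sim = similarity_matrix(vectors)
print(sim)
```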

Step 6

Now that we have a matrix populated with the cosine similarities between the sentences, we can convert it into a graph in which the nodes represent the sentences and the edges represent the similarity between them. It is on this graph that we will use the handy PageRank algorithm to arrive at the sentence ranking.

Step 7

We have now ranked all sentences in the article in order of importance. We can extract the top N (say 10) sentences to create a summary.

Many projects on GitHub implement this method in full; this article aims instead to develop an understanding of how it works.
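Steps 6 and 7 can be sketched together: treat the similarity matrix as a weighted graph, run a PageRank-style iteration over it, and keep the top N sentences. The function names, the hand-written similarity values, and the damping factor of 0.85 are assumptions for illustration. The sketch also assumes the similarities are non-negative, and it returns the selected sentences in their original document order, which is a presentation choice rather than part of the ranking itself.

```python
def rank_sentences(sim, damping=0.85, iterations=100):
    """PageRank-style scores on a graph whose edge weights are similarities."""
    n = len(sim)
    scores = [1.0 / n] * n
    row_sums = [sum(row) for row in sim]
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            # Each sentence i passes on its score in proportion to sim[i][j].
            incoming = sum(
                scores[i] * sim[i][j] / row_sums[i]
                for i in range(n)
                if row_sums[i] > 0
            )
            new_scores.append((1.0 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(sentences, sim, top_n=10):
    """Pick the top_n highest-scoring sentences, kept in document order."""
    scores = rank_sentences(sim)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(best[:top_n])]


# Hand-written example: the first two sentences are similar, the third is not.
sentences = ["Cats purr.", "Cats purr when content.", "Stocks fell."]
sim = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.2],
    [0.1, 0.2, 0.0],
]
summary = summarize(sentences, sim, top_n=2)
print(summary)
```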


Evaluation techniques

An important factor in fine-tuning such models is to have a reliable method to judge the quality of the summaries produced. This necessitates good evaluation techniques, which can be broadly classified into the following:

  • Intrinsic and extrinsic evaluation:

Intrinsic: such evaluation tests the summarization system in and of itself. It mainly assesses the coherence and informativeness of the summary.

Extrinsic: such evaluation tests the summarization based on how it affects some other task. It may test the impact of the summarization on tasks like relevance assessment, reading comprehension, etc. 

  • Inter-textual and Intra-textual:

Inter-textual: Such evaluations focus on a contrastive analysis of several summarization systems.

Intra-textual: such evaluations assess the output of a specific summarization system.

  • Domain-specific and domain-independent:

Domain-independent: These techniques apply sets of general features to identify information-rich text segments.

Domain-specific: These techniques utilize the available knowledge specific to the domain of a text. For example, text summarization of medical literature requires the use of sources of medical knowledge and ontologies.

  • Evaluating summaries qualitatively:

The major drawback of other evaluation techniques is that they require reference summaries against which the output of the automatic summarizer can be compared. This makes the task of evaluation hard and expensive. There is work being done to build a corpus of articles/documents and their corresponding summaries to solve this problem.
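To make the idea of comparing against a reference summary concrete, here is a minimal sketch of one such comparison: the fraction of the reference summary's words that also appear in the generated summary. This is in the spirit of recall-oriented overlap metrics such as ROUGE-1, which this article does not otherwise cover; the function name and the example sentences are invented.

```python
from collections import Counter


def unigram_recall(candidate, reference):
    """Share of reference words (with repeats) also found in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    # Count each reference word at most as often as the candidate contains it.
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())


score = unigram_recall("the cat sat", "the cat sat on the mat")
print(score)
```

A score like this is cheap to compute once a reference exists, but it only measures word overlap, which is why writing the reference summaries remains the expensive part.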

Challenges to Text Summarization

Despite highly developed tools to generate and evaluate summaries, challenges remain to find a reliable way for text summarizers to understand what is important and relevant. 

As discussed, vector representation and similarity matrices attempt to find word associations, but they still do not have a reliable method to identify the most important sentences.

Another challenge in text summarization is the complexity of human language and the way people express themselves, especially in written text. Language is composed not only of long sentences with adjectives and adverbs that describe something but also of relative clauses, appositions, etc. Although such constructions may add valuable information, they do not help establish the crux of the information to be included in the summary.

The “anaphora problem” is another barrier in text summarization. In language, we often replace the subject of the conversation with synonyms or pronouns. Working out which pronoun or synonym substitutes for which earlier term is the “anaphora problem.”

The “cataphora problem” is the reverse of the anaphora problem: an ambiguous word or expression is used in the text before the term it refers to has been introduced.


Conclusion

The field of text summarization is experiencing rapid growth, and specialized tools are being developed to tackle more focused summarization tasks. With open-source software and word embedding packages becoming widely available, users are stretching the use case of this technology.

Automatic text summarization is a tool that enables a quantum leap in human productivity by simplifying the sheer volume of information that humans interact with daily. This not only allows people to cut down on the reading necessary but also frees up time to read and understand otherwise overlooked written works. It is only a matter of time before such summarizers get integrated so well that they create summaries indistinguishable from those written by humans.

Frequently Asked Questions (FAQs)

1. What are the uses of NLP?

NLP, or Natural Language Processing, one of the most sophisticated and interesting modern technologies, is used in diverse ways. Its top applications include automatic word correction, auto prediction, chatbots and voice assistants, speech recognition in virtual assistants, sentiment analysis of human speech, email and spam filtering, translation, social media analytics, targeted advertising, text summarization, and resume scanning for recruitment, among others. Further advancements in NLP, giving rise to concepts like Natural Language Understanding (NLU), are helping achieve higher accuracy and far superior outcomes on complex tasks.

2. Do I have to study mathematics to learn NLP?

With the abundance of resources available both offline and online, it is now easier to access study material designed for learning NLP. These study resources tend to cover specific concepts of this vast field rather than the bigger picture. If you wonder whether mathematics is part of any NLP concept, the answer is that maths is an essential part of NLP. Mathematics, especially probability theory, statistics, linear algebra, and calculus, is the foundation of the algorithms that drive NLP. Having a basic understanding of statistics is helpful so that you can build upon it as required. Still, there is no way to learn Natural Language Processing without getting into mathematics.

3. What are some NLP techniques used to extract information?

In this digital age, there has been a massive surge in the generation of unstructured data, mainly in the form of audio, images, videos, and texts from various channels like social media platforms, customer complaints, and surveys. NLP helps extract useful information from volumes of unstructured data, which can help businesses. Five common NLP techniques are used to extract insightful data: named entity recognition, text summarization, sentiment analysis, aspect mining, and topic modeling. There are many other data extraction methods in NLP, but these are the most popularly used.