
Tokenization in Natural Language Processing

Updated on 24 November, 2022

When dealing with textual data, the most basic step is to tokenize the text. A "token" can be an individual word, a sentence, or any other minimal unit. Tokenization is simply the process of breaking text into these separate units.

By the end of this tutorial, you will have the knowledge of the following:

  • What is Tokenization
  • Different types of Tokenizations
  • Different ways to Tokenize

Tokenization is the most fundamental step in an NLP pipeline.

But why is that?

These words or tokens are later converted into numeric values so that the computer can work with them. The tokens are cleaned, pre-processed, and then converted into numeric vectors by methods collectively called "vectorization". These vectors can then be fed to machine learning algorithms and neural networks.
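As a minimal sketch of the token-to-number step described above, the snippet below assigns each distinct token an integer id and then counts occurrences (a simple bag-of-words vector). The helper names build_vocab and vectorize are hypothetical, chosen for illustration only; real pipelines usually rely on a library vectorizer.

```python
from collections import Counter

def build_vocab(tokens):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def vectorize(tokens, vocab):
    """Return a bag-of-words count vector aligned with the vocabulary ids."""
    counts = Counter(tokens)
    vector = [0] * len(vocab)
    for tok, idx in vocab.items():
        vector[idx] = counts[tok]
    return vector

tokens = ["tokenization", "is", "essential", "tokenization", "is", "fun"]
vocab = build_vocab(tokens)
vector = vectorize(tokens, vocab)
print(vocab)   # {'tokenization': 0, 'is': 1, 'essential': 2, 'fun': 3}
print(vector)  # [2, 2, 1, 1]
```

Each position in the vector corresponds to one vocabulary entry, so texts of different lengths map to vectors of the same size.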

Tokenization can not only be word level, but also sentence level. That is, text can be either tokenized with words as tokens or sentences as tokens. Let’s discuss a few ways to perform tokenization.

Python split()

Python's split() string method returns a list of tokens obtained by splitting the string at the given separator. By default, it splits on whitespace.

Word Tokenization

Mystr = "This is a tokenization tutorial. We are learning different tokenization methods, and ways? Tokenization is essential in NLP tasks."
Tokens = Mystr.split()
print(Tokens)
# Output:
# ['This', 'is', 'a', 'tokenization', 'tutorial.', 'We', 'are', 'learning', 'different', 'tokenization', 'methods,', 'and', 'ways?', 'Tokenization', 'is', 'essential', 'in', 'NLP', 'tasks.']

Sentence Tokenization

The same text can be split into sentences by passing "." as the separator.

Mystr = "This is a tokenization tutorial. We are learning different tokenization methods, and ways? Tokenization is essential in NLP tasks."

Tokens = Mystr.split(".")
print(Tokens)
# Output:
# ['This is a tokenization tutorial', ' We are learning different tokenization methods, and ways? Tokenization is essential in NLP tasks', '']

Though this seems straightforward, it has several flaws. Notice that it also splits after the last ".", which leaves an empty string at the end of the list. It also fails to treat "?" as a sentence boundary, because split() accepts only one separator, which here is ".".
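A partial workaround for word-level tokens, sketched here under the assumption that punctuation only appears at the edges of a token, is to strip punctuation from each piece after splitting. It uses only the standard library, and the sample text is made up for illustration.

```python
import string

text = "We are learning different tokenization methods, and ways? Done."

# Split on whitespace first, then trim punctuation from both ends of each piece.
raw_tokens = text.split()
clean_tokens = [tok.strip(string.punctuation) for tok in raw_tokens]

# Drop pieces that were nothing but punctuation.
clean_tokens = [tok for tok in clean_tokens if tok]

print(clean_tokens)
# ['We', 'are', 'learning', 'different', 'tokenization', 'methods', 'and', 'ways', 'Done']
```

This does nothing for sentence boundaries, and it leaves punctuation inside a token untouched, so it is only a stopgap.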

Text data in real life scenarios is very dirty and not nicely put in words and sentences. A lot of garbage text might be present which will make it very difficult for you to tokenize this way. Therefore, let’s move ahead to better and more optimized ways of tokenization.

Regular Expression

A regular expression (RegEx) is a sequence of characters that defines a search pattern. We use RegEx to find certain patterns, words, or characters in order to replace them or perform some other operation on them. Python's re module provides RegEx support. Let's see how we can tokenize text using re.

Word Tokenization

import re

Mystr = "This is a tokenization tutorial. We are learning different tokenization methods, and ways? Tokenization is essential in NLP's tasks."

Tokens = re.findall(r"\w+", Mystr)
print(Tokens)
# Output:
# ['This', 'is', 'a', 'tokenization', 'tutorial', 'We', 'are', 'learning', 'different', 'tokenization', 'methods', 'and', 'ways', 'Tokenization', 'is', 'essential', 'in', 'NLP', 's', 'tasks']

So, what happened here?

The re.findall() function returns every non-overlapping match of the pattern as a list. In the pattern "\w+", "\w" matches any word character (a letter, a digit, or an underscore) and "+" means one or more consecutive occurrences. Each run of word characters therefore becomes one token, and a token ends at whitespace or at any special character other than an underscore.

Notice that "NLP's" is a single word, but the pattern broke it into "NLP" and "s" because the apostrophe is not a word character.
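If the apostrophe should stay inside the word, the pattern can be extended. The sketch below is one possible pattern, not the only one: it matches a run of word characters optionally followed by an apostrophe and more word characters. It assumes straight apostrophes ('); typographic ones (’) would need to be added to the pattern.

```python
import re

text = "Tokenization is essential in NLP's tasks."

# \w+         : a run of word characters
# (?:'\w+)*   : optionally followed by an apostrophe plus more word characters
pattern = re.compile(r"\w+(?:'\w+)*")

tokens = pattern.findall(text)
print(tokens)
# ['Tokenization', 'is', 'essential', 'in', "NLP's", 'tasks']
```

Because the apostrophe must be followed by word characters, a trailing quote mark at the end of a word is not absorbed into the token.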

Sentence Tokenization

import re

Mystr = "This is a tokenization tutorial. We are learning different tokenization methods, and ways? Tokenization is essential in NLP's tasks."

Tokens = re.compile(r"[.!?] ").split(Mystr)
print(Tokens)
# Output:
# ['This is a tokenization tutorial', 'We are learning different tokenization methods, and ways', "Tokenization is essential in NLP's tasks."]

Here we combined multiple splitting characters into one character class and called the compiled pattern's split() method. Whenever one of these three characters is followed by a space, the text is split at that point. The matched punctuation is consumed by the split, while the final sentence keeps its "." because no space follows it. This is an advantage of RegEx over Python's split() method, which accepts only a single separator string.
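To keep the closing punctuation attached to each sentence, one option is to split on the whitespace that follows a sentence-ending character instead of on the character itself. The sketch below uses a lookbehind assertion for that; it is a simple heuristic and will still break on abbreviations such as "Dr." or "e.g.".

```python
import re

text = ("This is a tokenization tutorial. We are learning different "
        "tokenization methods, and ways? Tokenization is essential in NLP's tasks.")

# (?<=[.!?]) : the split point must be preceded by '.', '!' or '?'
# \s+        : the whitespace itself is what gets consumed by the split
sentences = re.split(r"(?<=[.!?])\s+", text)

for sentence in sentences:
    print(sentence)
```

Since the lookbehind matches a position and not a character, the punctuation stays with the sentence it ends.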

NLTK Tokenizers

Natural Language Toolkit (NLTK) is a Python library built specifically for NLP tasks. It ships with built-in functions and modules for specific stages of the NLP pipeline. Let's look at how NLTK handles tokenization.

Word Tokenization

NLTK has a separate module, nltk.tokenize, for tokenization tasks. For word tokenization, one of the functions it provides is word_tokenize.

# Requires the Punkt tokenizer data, e.g. nltk.download("punkt")
# (the resource name may differ between NLTK versions).
from nltk.tokenize import word_tokenize

Mystr = "This is a tokenization tutorial. We are learning different tokenization methods, and ways? Tokenization is essential in NLP's tasks."
print(word_tokenize(Mystr))
# Output:
# ['This', 'is', 'a', 'tokenization', 'tutorial', '.', 'We', 'are', 'learning', 'different', 'tokenization', 'methods', ',', 'and', 'ways', '?', 'Tokenization', 'is', 'essential', 'in', 'NLP', "'s", 'tasks', '.']

Notice that word_tokenize treats punctuation marks as separate tokens. If you do not want them, remove punctuation and special characters before this step, or filter out the punctuation tokens afterwards.
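Filtering after tokenization can be sketched with the standard library alone. The token list below is hard-coded to stand in for the kind of output word_tokenize produces, and drop_punctuation is a hypothetical helper name used only for this illustration.

```python
import string

def drop_punctuation(tokens):
    """Keep only tokens that contain at least one non-punctuation character."""
    return [
        tok for tok in tokens
        if not all(ch in string.punctuation for ch in tok)
    ]

# Stand-in for tokenizer output, with punctuation as separate tokens.
tokens = ["This", "is", "a", "tutorial", ".", "We", "are", "learning", ",", "and", "ways", "?"]

words = drop_punctuation(tokens)
print(words)
# ['This', 'is', 'a', 'tutorial', 'We', 'are', 'learning', 'and', 'ways']
```

A token such as "'s" survives this filter because it contains a letter; whether to keep it depends on the task.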

Sentence Tokenization

from nltk.tokenize import sent_tokenize

Mystr = "This is a tokenization tutorial. We are learning different tokenization methods, and ways? Tokenization is essential in NLP's tasks."
print(sent_tokenize(Mystr))
# Output:
# ['This is a tokenization tutorial.', 'We are learning different tokenization methods, and ways?', "Tokenization is essential in NLP's tasks."]

SpaCy Tokenizers

SpaCy is one of the most advanced libraries for NLP tasks and supports dozens of languages. For tokenization alone we do not need a full trained model: importing the English class from spacy.lang.en and instantiating it creates a blank English pipeline that contains only the tokenizer. Components such as the tagger, parser, NER, and word vectors come with the trained pipelines, which are downloaded separately.

Word Tokenization 

from spacy.lang.en import English

nlp = English()
Mystr = "This is a tokenization tutorial. We are learning different tokenization methods, and ways? Tokenization is essential in NLP's tasks."
my_doc = nlp(Mystr)

Tokens = []
for token in my_doc:
    Tokens.append(token.text)
print(Tokens)
# Output:
# ['This', 'is', 'a', 'tokenization', 'tutorial', '.', 'We', 'are', 'learning', 'different', 'tokenization', 'methods', ',', 'and', 'ways', '?', 'Tokenization', 'is', 'essential', 'in', 'NLP', "'s", 'tasks', '.']

Here, when we call nlp with Mystr, it returns a Doc object that holds the word tokens. We then iterate over the tokens and store their text in a separate list.

Sentence Tokenization

from spacy.lang.en import English

nlp = English()
# spaCy v3 API: add the built-in component by name.
# (In spaCy v2 this was nlp.add_pipe(nlp.create_pipe("sentencizer")).)
nlp.add_pipe("sentencizer")

Mystr = "This is a tokenization tutorial. We are learning different tokenization methods, and ways? Tokenization is essential in NLP's tasks."

my_doc = nlp(Mystr)

Sents = []
for sent in my_doc.sents:
    Sents.append(sent.text)
print(Sents)
# Output:
# ['This is a tokenization tutorial.', 'We are learning different tokenization methods, and ways?', "Tokenization is essential in NLP's tasks."]

For sentence tokenization, we add the sentencizer component to the pipeline with add_pipe (in spaCy v2, the component was first created with create_pipe and then added). When we pass the text string to the nlp object, the resulting Doc now exposes sentence boundaries through my_doc.sents. The sentences can be collected into a list in the same way as in the previous example.

Keras Tokenization

Keras is one of the most widely used deep learning frameworks. It also offers a dedicated module for text processing, keras.preprocessing.text, which provides the text_to_word_sequence function for creating word-level tokens from text. Note that this function is marked as deprecated in recent documentation, so the import below may require an older Keras or TensorFlow release. Let's have a quick look.

from keras.preprocessing.text import text_to_word_sequence

Mystr = "This is a tokenization tutorial. We are learning different tokenization methods, and ways? Tokenization is essential in NLP's tasks."
Tokens = text_to_word_sequence(Mystr)
print(Tokens)
# Output:
# ['this', 'is', 'a', 'tokenization', 'tutorial', 'we', 'are', 'learning', 'different', 'tokenization', 'methods', 'and', 'ways', 'tokenization', 'is', 'essential', 'in', "nlp's", 'tasks']

Notice that it treated "NLP's" as a single token. This Keras tokenizer also lowercased all the tokens and removed the punctuation, which is an added bonus.
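The behaviour seen above (lowercasing, dropping punctuation, keeping the apostrophe) can be approximated with the standard library. The sketch below is not Keras's implementation; simple_word_sequence is a hypothetical helper, and the FILTERS string is an assumption meant to mirror the default filter characters, which notably do not include the apostrophe.

```python
# Assumed filter characters: common punctuation plus tab and newline,
# but NOT the apostrophe, so "nlp's" stays in one piece.
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def simple_word_sequence(text, filters=FILTERS, lower=True):
    """Lowercase the text, turn filter characters into spaces, split on whitespace."""
    if lower:
        text = text.lower()
    table = str.maketrans({ch: " " for ch in filters})
    return text.translate(table).split()

tokens = simple_word_sequence("Tokenization is essential in NLP's tasks.")
print(tokens)
# ['tokenization', 'is', 'essential', 'in', "nlp's", 'tasks']
```

Replacing filter characters with spaces, instead of deleting them, keeps words such as "methods,and" from being glued together.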

Gensim Tokenizer

Gensim is another popular library for NLP tasks and topic modelling. The gensim.utils module offers a tokenize function, which we can use for tokenization.

Word Tokenization

from gensim.utils import tokenize
Mystr = "This is a tokenization tutorial. We are learning different tokenization methods, and ways? Tokenization is essential in NLP's tasks."

print(list(tokenize(Mystr)))
# Output:
# ['This', 'is', 'a', 'tokenization', 'tutorial', 'We', 'are', 'learning', 'different', 'tokenization', 'methods', 'and', 'ways', 'Tokenization', 'is', 'essential', 'in', 'NLP', 's', 'tasks']

Sentence Tokenization

For sentence tokenization, we use the split_sentences function from the gensim.summarization.textcleaner module. Note that the gensim.summarization package was removed in Gensim 4.0, so this example requires an older Gensim release (3.x).

from gensim.summarization.textcleaner import split_sentences

Mystr = "This is a tokenization tutorial. We are learning different tokenization methods, and ways? Tokenization is essential in NLP's tasks."

Tokens = split_sentences(Mystr)
print(Tokens)
# Output:
# ['This is a tokenization tutorial.', 'We are learning different tokenization methods, and ways?', "Tokenization is essential in NLP's tasks."]

Before You Go

In this tutorial we discussed several ways to tokenize text data, and the right choice depends on the application. Tokenization is an essential step of the NLP pipeline, but the data should be cleaned before you proceed to it.

If you’re interested to learn more about machine learning & AI, check out IIIT-B & upGrad’s PG Diploma in Machine Learning & AI which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B Alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.

Get Free Consultation

+91
Phone number

By clicking "Submit" you Agree toupGrad's Terms & Conditions



SUGGESTED BLOGS

Technology will surely kill some jobs, but not all of them

898.89K+

Technology will surely kill some jobs, but not all of them

“Remember that dystopian view of the future in which technology displaces millions of people from their jobs? It’s happening” Jeff Weiner, CEO LinkedIn, wrote when Microsoft announced it was acquiring LinkedIn. Some of the top companies in the world such as handset maker Foxconn, US-based retail company Walmart and McDonald’s are now turning to robots and automation. It’s true that some jobs may become defunct as this shift becomes more pronounced. At the same time, these technologies doubtless offer lots of opportunities for many other types of jobs such as digital curation and preservation, data mining and big data analytics. Top Machine Learning and AI Courses Online Master of Science in Machine Learning & AI from LJMU Executive Post Graduate Programme in Machine Learning & AI from IIITB Advanced Certificate Programme in Machine Learning & NLP from IIITB Advanced Certificate Programme in Machine Learning & Deep Learning from IIITB Executive Post Graduate Program in Data Science & Machine Learning from University of Maryland To Explore all our certification courses on AI & ML, kindly visit our page below. Machine Learning Certification The shift of skills in jobs Most industries in India and around the world are undergoing a digital transformation, and skills to utilise emerging technologies like mobility, cloud computing, business intelligence, artificial intelligence, machine learning, robotics and nanotechnology among others are gaining popularity. In fact, the World Economic Forum estimates that (pdf) 65% of children entering school today will ultimately end up working in jobs that don’t yet exist. For example, demand for data analysts — a relatively new occupation — increased by almost 90% by the end of 2014 within a year. Many big e-commerce players, credit firms, airlines, hospitality, BFSI and retail industries already use analytics in a major way. 
In India, the analytics and business intelligence industry together is sized around 10 billion and is expected to grow by 22% to 26.9 billion by 2017. Skill deprivation: Education alone won’t guarantee a job! Human cognition will be in demand in the automation age When we speak of manual work being supplanted by technology, we must keep in mind that routine jobs are most susceptible to being replaced by automation. And while non-cognitive and routine work is decreasing, knowledge-oriented work is increasing. The demand for labour adept at managing such technology is on the rise – a trend that is likely to intensify as our processes become more technologically complex and disruptive. Humans are discovering newer ways of enhancing their productivity and efficiency. Most of the pattern-driven work is slowly getting automated as technology presents new ways to speed it up. But this doesn’t mean humans will be useless. They will be the ones who will need to identify problems and ask the right questions. Trending Machine Learning Skills AI Courses Tableau Certification Natural Language Processing Deep Learning AI Enrol for the Machine Learning Course from the World’s top Universities. Earn Masters, Executive PGP, or Advanced Certificate Programs to fast-track your career. Demand for newer jobs will remain History shows us that jobs have consistently been rendered obsolete with the advent of technology and machines. When the washing machine was invented, those who professionally hand-washed clothes faced large-scale unemployment and redundancy. People had to learn a more complex skill in a similar area or enter a new profession altogether. Similarly, drivers may be out of jobs if driverless cars become a norm in the future but other jobs that require manufacturing, programming and sale of such cars will have high demand. This is the way old jobs metamorphose into new ones and the economy learns to keep up. There’ll Be A Billion-Plus Job-Seekers By 2050! 
India ripe for tech driven roles The world is set for a technology boom with information technology jobs expected to grow by 22% through 2020 — and India is one of the leaders of the troupe. To capitalise, young job-seekers have to train themselves and take charge of technology-driven roles such as product managers, application developers, data analysts and digital marketers among others. And the rising number of startups in India, especially in the online space, provides a fertile ground. In fact, software startups in India are going to create 80,000 jobs by the following year itself. So jobs that seem to be at risk, may be like molecules – splitting further and creating more jobs – just of a different kind. Instead of worrying about unemployment, those entering the workforce need to keep one finger on the pulse of evolving technology, and invest in training themselves to acquire new skill sets. Popular AI and ML Blogs & Free Courses IoT: History, Present & Future Machine Learning Tutorial: Learn ML What is Algorithm? Simple & Easy Robotics Engineer Salary in India : All Roles A Day in the Life of a Machine Learning Engineer: What do they do? What is IoT (Internet of Things) Permutation vs Combination: Difference between Permutation and Combination Top 7 Trends in Artificial Intelligence & Machine Learning Machine Learning with R: Everything You Need to Know AI & ML Free Courses Introduction to NLP Fundamentals of Deep Learning of Neural Networks Linear Regression: Step by Step Guide Artificial Intelligence in the Real World Introduction to Tableau Case Study using Python, SQL and Tableau
Read More

by Mayank Kumar

07 Jul'16
Keep an Eye Out for the Next Big Thing: Machine Learning

5.2K+

Keep an Eye Out for the Next Big Thing: Machine Learning

Artificial Intelligence (AI) and Machine Learning (ML) are buzzwords that are increasingly being used to discuss upcoming trends in Data Science and other technologies. However, are these two concepts really peas in the same pod? Artificial Intelligence is a broader concept of smart machines carrying out various tasks on their own. While Machine Learning is an application of Artificial Intelligence where machines learn from data provided to them using various types of algorithms. Therefore, Machine Learning is a method of data analysis that automates analytical model building, allowing computers to find hidden insights without being explicitly programmed to do so. Sounds like the pitch-perfect solution to all our technological woes, doesn’t it? Top Machine Learning and AI Courses Online Master of Science in Machine Learning & AI from LJMU Executive Post Graduate Programme in Machine Learning & AI from IIITB Advanced Certificate Programme in Machine Learning & NLP from IIITB Advanced Certificate Programme in Machine Learning & Deep Learning from IIITB Executive Post Graduate Program in Data Science & Machine Learning from University of Maryland To Explore all our certification courses on AI & ML, kindly visit our page below. Machine Learning Certification Evolution of Machine Learning Arthur Samuel, an American pioneer in the field of computer gaming and artificial intelligence, coined the term ‘Machine Learning’ in 1959 while at IBM. During its early days, Machine Learning was born from pattern recognition with the theory that computers can learn from patterns in data without being programmed to perform specific tasks. Researchers interested in Artificial Intelligence later developed algorithms with which computers or machines could learn from data. 
As a result of this, whenever the machines were exposed to new data, they were able to independently adapt as well Trending Machine Learning Skills AI Courses Tableau Certification Natural Language Processing Deep Learning AI Enrol for the Machine Learning Course from the World’s top Universities. Earn Masters, Executive PGP, or Advanced Certificate Programs to fast-track your career. It’s a science that’s not new, but one that’s gaining fresh momentum, thanks mainly to new computing technologies that have evolved over the last few decades. Many Machine Learning algorithms have been around for a long time. But, the ability to automatically apply complex mathematical calculations to large data sets is a fresh development being witnessed. Here are a few examples of Machine Learning applications you might be familiar with: Online recommendations from Amazon and Netflix. YouTube detecting and removing terror content on the platform. Knowing what customers are saying about you on Twitter The Rise of Machine Learning The emergence of the internet, as well as the massive increase in digital information being generated, stored, and made available for analysis, are seen to be the two important factors that have led to the emergence of Machine Learning. With the magnitude of quality data from the internet, economical data storage options and improved data processing capabilities, Machine Learning algorithms are seen as a vehicle propelling the development of Artificial Intelligence at a scorching pace in recent times. Neural Networks A neural network works on a system of probability by being able to make statements, decisions, or predictions based on data fed to it. Moreover, a feedback loop enables further “learning” by sensing; it also modifies the learning process based on whether its decisions are right or wrong. An artificial neural network is a computer system with node networks inspired from the neurons in the animal brain. 
Such networks can be taught to recognise and classify patterns by being shown examples, rather than by telling the algorithm exactly how to recognise and classify them. Machine Learning applications built on neural networks can read a piece of text and recognise its nature, for instance whether it is a complaint or a congratulatory note. They can also listen to a piece of music, decide whether it is likely to make someone happy or sad, and find other pieces of similar music. What’s more, they can even compose music expressing the same mood or theme.

In the near future, with the help of Machine Learning and Artificial Intelligence, it should be possible for a person to communicate and interact with electronic devices and digital information in natural language, thanks to another emerging field of AI called Natural Language Processing (NLP). NLP has become a source of cutting-edge innovation in the past few years, and one which is heavily reliant on Machine Learning. NLP applications attempt to understand human communication, both written and spoken, and to communicate in various languages. In this context, Machine Learning helps machines understand the nuances of human language and respond in a way that a particular audience is likely to comprehend.

So, who is actually using it?

Most industries working with large amounts of data have recognised the value of Machine Learning. Large companies glean vital, real-time, actionable insights from stored data and are hence able to increase efficiency or gain an advantage over their competitors.

Financial services

Banks and other businesses use Machine Learning to identify important insights in the data they generate and thereby prevent fraud. These insights can identify investment opportunities or help investors know when to trade. Data mining can also identify clients with high-risk profiles, or use cyber surveillance to warn customers about fraud and thereby minimise identity theft.
Marketing and sales

E-commerce websites use Machine Learning to analyse your buying history, recommend items that you may like and promote other items. The retail industry relies on the ability of websites to capture data, analyse it, and use it to personalise the shopping experience or run marketing campaigns.

Summing up, Artificial Intelligence and, in particular, Machine Learning certainly have a lot to offer today. With their promise of automating mundane tasks as well as offering creative insights, industries in every sector from banking to healthcare and manufacturing are reaping the benefits.

Eventually, scientists hope to develop human-like Artificial Intelligence that is capable of increasing the speed of various automated functions, especially with the advent of chatbots in the internet realm. Much of the exciting progress that we have seen in recent years is due to progressive changes in Artificial Intelligence, which have been brought about by Machine Learning. This is clearly why Machine Learning is poised to become the next big thing in the data sciences sphere. So go ahead, UpGrad yourself to stay ahead of the curve.

by Varun Dattaraj

17 Oct'17
The Difference between Data Science, Machine Learning and Big Data!


Many professionals and ‘Data’ enthusiasts often ask, “What’s the difference between Data Science, Machine Learning and Big Data?” Here’s what differentiates the three from each other.

Data Science

Data Science follows an interdisciplinary approach. It lies at the intersection of Maths, Statistics, Artificial Intelligence, Software Engineering and Design Thinking. Data Science deals with data collection, cleaning, analysis, visualisation, model creation, model validation, prediction, designing experiments, hypothesis testing and much more. The aim of all these steps is simply to derive insights from data.

Digitisation is progressing at an exponential rate. Internet accessibility is improving at breakneck speed. More and more people are getting absorbed into the digital ecosystem. All these activities are generating a humongous amount of data. Companies are currently sitting on a data goldmine. But data, by itself, is not of much use. This is where Data Science comes into the picture. It helps in mining this data and deriving insights from it, so that meaningful action can be taken. Various Data Science tools can help us in the process of insight generation. If you are a beginner and interested in learning more about data science, check out our data scientist courses from top universities.

Frameworks exist to help derive insights from data. A framework is nothing but a supportive structure.
It’s a lifecycle used to structure the development of Data Science projects. A lifecycle outlines the steps, from start to finish, that projects usually follow. In other words, it breaks down complex challenges into simple steps. This ensures that no significant phase leading to the generation of actionable insights from data is missed. One such framework is the ‘Cross Industry Standard Process for Data Mining’, abbreviated as CRISP-DM. Another is the ‘Team Data Science Process’ (TDSP) from Microsoft.

Let’s understand this with the help of an example. Consider a bank named ‘X’, which has been in business for the past ten years. It receives a loan application from one of its customers, and it wants to predict whether this customer will default on the loan. How can the bank go about achieving this task?

Like every other bank, X must have captured data regarding various aspects of its customers, such as demographic data, customer-related data, etc. In the past ten years, many customers would have succeeded in repaying their loans, but some would have defaulted. How can the bank leverage this data to improve its profitability? To put it simply, how can it avoid providing loans to a customer who is very likely to default? And how can it ensure that it does not lose out on good customers who are more likely to repay their debts?

Data Science can help us resolve this challenge.

Raw Data -> Data Science -> Actionable Insights

Let’s understand how various branches of Data Science will help the bank overcome its challenge. Statistics will assist in designing experiments, finding correlations between variables, hypothesis testing, exploratory data analysis, etc. In this case, the loan purpose or the educational qualifications of the customer could influence the likelihood of default. After data cleaning and exploratory study, the data becomes ready for modelling.
Statistics and artificial intelligence provide algorithms for model creation. Model creation is where machine learning comes into the picture. Machine learning is a branch of artificial intelligence that is utilised by data science to achieve its objectives. Before proceeding with the banking example, let’s understand what machine learning is.

Machine Learning

“Machine learning is a form of artificial intelligence. It gives machines the ability to learn, without being explicitly programmed.”

How can machines learn without being explicitly programmed, you might ask? Aren’t computers just devices made to follow instructions? Not anymore. Machine learning consists of a suite of intelligent algorithms that enable machines to learn without being explicitly programmed for the task.

Machine learning helps you learn the objective function (often called the target function), which maps the inputs to the target variable, or the independent variables to the dependent variable. In our banking example, the independent attributes or inputs are the demographic, customer and behavioural variables of a customer. The dependent variable is whether the customer defaults or not. The objective function is an equation which maps these inputs to the output. It tells us which independent variables influence the dependent variable, i.e. the tendency to default. This process of deriving an objective function which maps inputs to outputs is known as modelling. Initially, this objective function will not be able to predict precisely whether a customer will default or not.
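To make the idea of learning a mapping from inputs to a default / no-default output more concrete, here is a minimal sketch using logistic regression trained by gradient descent. Logistic regression is just one possible choice of model, and everything specific in the example is an assumption: the two input variables (debt-to-income ratio and number of past late payments), the tiny hand-made data set and the training settings are invented for illustration and do not come from any real bank.

```python
import math

# Hypothetical history: (debt_to_income_ratio, past_late_payments) -> defaulted (1) or repaid (0).
TRAINING_ROWS = [
    ((0.10, 0), 0), ((0.20, 1), 0), ((0.30, 0), 0), ((0.25, 1), 0),
    ((0.70, 3), 1), ((0.80, 4), 1), ((0.60, 5), 1), ((0.90, 2), 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(params, features):
    """The learned mapping: inputs -> estimated probability of default."""
    bias, weights = params
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def log_loss(params, rows):
    """Average penalty for the probabilities assigned to the known outcomes."""
    total = 0.0
    for features, label in rows:
        p = predict_probability(params, features)
        total += -math.log(p) if label == 1 else -math.log(1.0 - p)
    return total / len(rows)


def train(rows, epochs=2000, learning_rate=0.1):
    """Batch gradient descent: repeatedly adjust the parameters to reduce the loss."""
    bias, weights = 0.0, [0.0, 0.0]
    for _ in range(epochs):
        grad_bias, grad_weights = 0.0, [0.0, 0.0]
        for features, label in rows:
            error = predict_probability((bias, weights), features) - label
            grad_bias += error
            grad_weights = [g + error * x for g, x in zip(grad_weights, features)]
        bias -= learning_rate * grad_bias / len(rows)
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_weights)]
    return bias, weights


initial_loss = log_loss((0.0, [0.0, 0.0]), TRAINING_ROWS)  # untrained model: pure guesswork
model = train(TRAINING_ROWS)
final_loss = log_loss(model, TRAINING_ROWS)

# Two hypothetical applicants the model has never seen.
safe_probability = predict_probability(model, (0.20, 0))
risky_probability = predict_probability(model, (0.65, 4))

print(f"loss before training: {initial_loss:.3f}, after: {final_loss:.3f}")
print(f"estimated default probability, low-risk applicant:  {safe_probability:.2f}")
print(f"estimated default probability, high-risk applicant: {risky_probability:.2f}")
```

The untrained model assigns every applicant a probability of 0.5. Training lowers the loss on the historical rows, and the two applicants at the end show how the fitted function would be applied to customers who were not part of that history.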
As the model encounters new instances, it learns and evolves. It improves as more and more examples become available. Ultimately, the model reaches a stage where it can tell, with a certain degree of precision, things like which customer is going to default and whom the bank can rely on to improve its profitability.

Machine learning aims to achieve ‘generalisability’. This means that the objective function, which maps the inputs to the output, should also apply to data the model has not yet encountered. In the banking example, our model learns patterns from the data provided to it. The model determines which variables influence the tendency to default. When a new customer applies for a loan, the model has not yet seen that customer’s variables. The model should be relevant to this customer as well, and it should predict reliably whether this customer will default or not. If the model is unable to do this, then it does not generalise to unseen data. Modelling is an iterative process: we need to create many models to see which work and which don’t.

Data science and analysis utilise machine learning for this kind of model creation and validation. It is important to note that not all the algorithms for model creation come from machine learning; they can come from various other fields. The model needs to be kept relevant at all times. If the conditions change, then the model we created earlier may become irrelevant. Its predictive power needs to be checked at regular intervals, and the model needs to be modified if that power declines.

For the bank employee to take an instant decision the moment a customer applies for a loan, the model needs to be integrated with the bank’s IT systems. The bank’s servers should host the model. As a customer applies for a loan, their variables must be captured from a website and utilised by the model running on the server.
Then, this model should convey the decision, whether the credit can be granted or not, to the bank employee instantly. This process comes under the domain of information technology, which is also utilised by data science.

In the end, it is all about communicating the results of the analysis. Here, presentation and storytelling skills are required to convey the findings of the study effectively. Design thinking helps in visualising the results and telling the story of the analysis effectively.

Big Data

The final piece of our puzzle is ‘Big Data’. How is it different from data science and machine learning?

According to IBM, we create 2.5 quintillion (2.5 × 10^18) bytes of data every day! The amount of data which companies gather is so vast that it creates a large set of challenges regarding data acquisition, storage, analysis and visualisation. The problem is not only the quantity of data that is available, but also its variety, veracity and velocity. All these challenges necessitated a new set of methods and techniques.

Big data involves the four ‘V’s (Volume, Variety, Veracity, and Velocity), which differentiate it from conventional data.

Volume: The amount of data involved is so humongous that it requires specialised infrastructure to acquire, store and analyse it. Distributed and parallel computing methods are employed to handle this volume of data.

Variety: Data comes in various formats, structured or unstructured. Structured means neatly arranged rows and columns. Unstructured means that it comes in the form of paragraphs, videos, images, etc. This kind of data also contains a lot of information. Unstructured data requires different database systems from a traditional RDBMS. Cassandra is one such NoSQL database.

Veracity: The presence of huge volumes of data will not, by itself, lead to actionable insights. The data needs to be correct for it to be meaningful.
Extreme care needs to be taken to make sure that the data captured is accurate, and that its integrity is maintained as it increases in volume and variety.

Velocity: This refers to the speed at which data is generated. 90% of the data in today’s world was created in the last two years alone. However, this velocity brings its own set of challenges. For some businesses, real-time analysis is crucial, and any delay reduces the value of the data and of its analysis. Spark is one platform which helps analyse streaming data.

As time progresses, new ‘V’s get added to the definition of big data. But volume, variety, veracity, and velocity are the four essential constituents which differentiate big data from conventional data. The algorithms which deal with big data, including machine learning algorithms, are optimised to leverage the different hardware infrastructure that is used to handle it.

To summarise, Data Science is an interdisciplinary field with the aim of deriving actionable insights from data. Machine learning is a branch of artificial intelligence which is utilised by data science to give machines the ability to learn without being explicitly programmed.
Volume, variety, veracity, and velocity are the four important constituents which differentiate big data from conventional data.
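To put the daily volume quoted above into perspective, the short calculation below converts 2.5 quintillion bytes per day into exabytes per day and terabytes per second. It takes the quoted figure at face value and assumes decimal units (1 TB = 10^12 bytes, 1 EB = 10^18 bytes).

```python
# Convert the quoted daily data volume into more familiar units (decimal prefixes assumed).
BYTES_PER_DAY = 25 * 10**17        # 2.5 quintillion bytes
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 seconds

exabytes_per_day = BYTES_PER_DAY / 10**18
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 10**12

print(f"{exabytes_per_day} EB per day")               # 2.5 EB per day
print(f"{terabytes_per_second:.1f} TB per second")    # 28.9 TB per second
```

At roughly 29 terabytes every second, it is easy to see why a single machine cannot keep up and why distributed storage and parallel processing are the norm for big data.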
Natural Language Generation: Top Things You Need to Know


From a linguistic point of view, language was created for the survival of human beings. Effective communication helped primitive man to hunt, gather and survive in groups. Language is therefore necessary not only for the activities needed for survival but also for a meaningful human existence. As humans evolved, so did their literary skills. From pictorial scripts to well-developed universal ones, we have made impressive progress. In fact, the progress is so remarkable that a machine developed by humans can now read data and write text, not in a binary machine language but in a real, conversational one. Natural Language Generation has made this possible.

What is Natural Language Generation?

Natural Language Generation (NLG) is an offshoot of Artificial Intelligence. It is a tool that automatically analyses data, interprets it, identifies the important information and narrows it down to simple text, to make decision making in business easier, faster and, of course, cheaper. It crunches numbers and drafts a narrative for you.

What are the different variations of Natural Language Generation?

Basic Natural Language Generation: The basic form of NLG converts data into text through Excel-like functions.
For example, a mail merge that restates numbers in words.

Templated Natural Language Generation: In this type of NLG tool, the user is responsible for designing content templates and interpreting the output. Templated systems are restricted in their capability to scan multiple data sources and perform advanced analytics.

Advanced Natural Language Generation: This is the ‘smartest’ way of analysing data. It processes the data right from the beginning, separates it based on its significance for a particular audience, and then writes the narrative with the relevant information in a conversational tone. For example, if a data analyst wants to know how a particular product is doing in a market, an advanced NLG tool would write a report using only the data for that product.

Do we really need natural language generation?

A huge number of devices are connected to the internet, creating a vast Internet of Things. All these devices are creating data at lightning speed, leading to Big Data. It is almost humanly impossible to analyse and interpret this enormous volume of data and draw rational inferences from it. Along with data analysis and accurate interpretation, optimum use of resources, cost cutting and time management are essential for a modern business to survive, grow and flourish. Natural Language Generation helps us achieve all these goals effectively in one go. Additionally, when a machine can do these routine tasks accurately, valuable human resources can devote themselves to activities that require innovation, creativity and problem-solving.

Will Natural Language Generation kill jobs?

First of all, not all kinds of narratives can be written by Natural Language Generation tools. They are only for creating text based on data. Creative writing and engaging content are developed not only through analytical skills but also with the help of major emotional involvement.
The passion of an individual, their skills, and their ability to express complex terms in simpler formats can’t be replaced. Additionally, human intervention is critical to rationalise the text created by Natural Language Generation tools. Natural Language Generation only augments the job and enriches the life of employees by freeing them from menial tasks. Alain Kaeser, founder of Yseop, has rightly acknowledged that “The next industrial revolution will be the artificial intelligence revolution and the automation of knowledge work and repetitive tasks to enhance human capacity”.

Why should you get a hang of Natural Language Generation?

Research commissioned by Forrester Research anticipated a 300% increase in investment in artificial intelligence in 2017 compared to 2016. The Artificial Intelligence market will grow from $8 billion in 2016 to more than $47 billion in 2020. Based on this report, Forbes magazine has come up with a list of the ‘hottest ten Artificial Intelligence technologies’ that will rule the market in the near future. Natural Language Generation is one of them, and it is set to see a huge boost.

Examples and Applications of Natural Language Generation

Natural Language Generation techniques are put to use across various industries as per their requirements. Healthcare and pharma, banking services, digital marketing… it’s everywhere! From fund reporting in finance and campaign analytics reporting in marketing to personalised client alerts and dashboards in sales and customer service, it is used to generate effective results for all departments in an organisation. Let’s have a quick look at how NLG is applied in various departments:

Marketing: Two main responsibilities of a marketing department are designing market strategy and conducting market research. Both of these activities depend heavily on data analysis, and in today’s world of big data, this is becoming increasingly complex.
Natural Language Generation tools can help you scan big data, analyse it and write reports for you within a few hours.

Sales: A sales analysis report indicates the trends in a company’s sales volume over a period of time. It throws light on the factors that affect sales, like season, competitors’ strategy, advertising efforts, etc. Managers use sales analysis reports to recognise market opportunities and areas where they could increase volume. These reports are based purely on humongous amounts of data. Natural Language Generation programs save you the time and effort of manually scanning data, finding trends and writing reports. Once you feed in the inputs, they take care of all of these activities.

Banking and finance: Whether it is the finance department of an organisation or an investment bank, financial reports stating the financial health of a company need to be written and sent out to shareholders, investors, rating agencies, government agencies, etc. The general financial statements, like balance sheets, statements of cash flows and income statements, are loaded with numbers, and a reader wants a quick understanding of them. Natural Language Generation software scans through these statements and presents the information in simple text rather than in a complicated accounting format.

Healthcare and medicine: Natural Language Generation tools have recently been used to summarise electronic medical records. Additional research in this area is opening doors to prudent medical decision-making for medical professionals. NLG is also being used in communicating with patients, as a part of patient awareness programs in India, as per the NCBI report. The data collected through medical research, such as which lifestyle diseases are most dreadful or which habits are healthy, can be summarised in simple language for patients, which is extremely useful for doctors making a case for their advice.

And this is just the tip of the iceberg.
The applications of NLG tools are already widespread and are ready to take off to greater heights in the future.

Techniques of natural language generation – How to get started

A refined Natural Language Generation system needs to include some planning and merging of information so that the generated text appears natural and interesting. The general stages of natural language generation, as proposed by Dale and Reiter in their book ‘Building Natural Language Generation Systems’, are:

Content determination: In this stage, a data analyst must decide what kind of information to present, using their discretion with respect to relevance. For example, deciding what kind of information a share trader would want to know versus what kind of information a dealer in the commodity market would want to know.

Document structuring: In this stage, a user has to decide the sequence and format of the content and the desired template. For example, deciding the order of large cap, mid cap and small cap shares while writing a narrative about equity movement in the stock market.

Aggregation: Avoiding repetition is the basic rule of any report writing. Merging sentences and omitting repetitive words and phrases to keep the text simple and readable falls under this stage. For example, if NLG software is writing a report on sales and there is no substantial change in the volume of sales for a few months, the software might write repetitive paragraphs that carry no substantial information. You then have to condense the text so that it does not become long and boring.

Lexical choice: Deciding exactly which words to use to describe particular concepts. For example, deciding whether to use the word ‘medium’ or ‘moderate’ while describing a change.

Best software products available for natural language generation

There are a variety of software products available to help you get started with Natural Language Generation.
Quill, Synthesys, Arria, Amazon Polly and Yseop are popular ones. You can make a decision based on the industry you are operating in, the department in which you will be deploying the tool, the exact nature of the reports to be created, etc. Let us see what kind of aid these programs offer to businesses.

Yseop: Yseop Compose’s Natural Language Generation software enables data-driven decision making by explaining insights in plain language. Yseop Compose is multilingual, which makes it suited to global use.

Amazon Polly: It is software that turns text into lifelike speech, allowing you to create applications that talk and to build entirely new categories of speech-enabled products.

Arria: The Arria NLG Platform integrates cutting-edge techniques in data analytics, artificial intelligence and computational linguistics. It analyses large and diverse data sets and automatically writes tailored, actionable reports on what’s happening within that data, with no human intervention, at vast scale and speed.

Quill: It is an advanced NLG platform which comprehends user intent and performs relevant data analysis to deliver Intelligent Narratives: automated stories full of audience-relevant, insightful information.

Synthesys: It is one of the popular NLG software products. It scans through all the data and highlights the important people, places, organisations, events and facts being discussed, resolves the highlighted points and determines what’s important, then connects the dots and figures out what the final picture means by comparing it with the opportunities, risks and anomalies users are looking for.

Natural Language Generation tools automate analysis and increase the efficacy of Business Intelligence tools. Rather than generating charts and tables, NLG tools interpret the data and draft the analysis in a written form that communicates precisely what’s important to know.
These tools perform regular analysis of predefined data sets, and eliminate both the manual effort required to draft reports and the skilled labour required to analyse and interpret the results.

What are the best resources to learn Natural Language Generation?

Gartner, a leading research and advisory company, forecasts that most companies will have to employ a Chief Data Officer by 2019. With the gigantic amount of data available, it is important to decide which information can add business value, drive efficiency and improve risk management. This will be the responsibility of Data Officers. With increasing global demand for the profession, there can be no better time to learn about Natural Language Generation, which is a critical part of Data Science and Artificial Intelligence.

Though Natural Language Generation has a huge scope, there are very few comprehensive academic programs designed to train candidates to be future ready. However, with a great vision, UpGrad offers a PG Diploma in Machine Learning and AI, in partnership with IIIT-Bangalore, which aims to build highly skilled professionals in India to cater to the increasing global demand.
It gives you a chance to learn from a comprehensive collection of case-studies, hand-picked by industry experts, to give you an in-depth understanding of how Machine Learning & Artificial Intelligence impact industries like Telecom, Automobile, Finance & more. What are you waiting for? Don’t let go of this wonderful opportunity, start exploring today!
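As a rough illustration of the four stages described earlier (content determination, document structuring, aggregation and the choice of words), here is a minimal sketch of a rule-based report generator. It is a toy example and not how any of the commercial products named above work. The monthly sales figures, the 5% and 20% thresholds, the wording and all function names are assumptions invented for the example.

```python
# Hypothetical monthly sales volumes, in units sold.
MONTHLY_SALES = [("January", 100), ("February", 101), ("March", 100), ("April", 130)]


def determine_content(sales, threshold=0.05):
    """Content determination: compute month-on-month changes and flag the notable ones."""
    facts = []
    for (_, previous), (month, current) in zip(sales, sales[1:]):
        change = (current - previous) / previous
        facts.append({"month": month, "change": change, "notable": abs(change) >= threshold})
    return facts


def choose_words(change):
    """Word choice: pick a verb and an adverb that match the size of the change."""
    verb = "rose" if change > 0 else "fell"
    adverb = "sharply" if abs(change) >= 0.20 else "moderately"
    return f"{verb} {adverb}"


def generate_report(sales):
    facts = determine_content(sales)

    # Document structuring: lead with the notable changes, then the quiet months.
    notable = [fact for fact in facts if fact["notable"]]
    flat = [fact for fact in facts if not fact["notable"]]

    sentences = [
        f"Sales {choose_words(fact['change'])} in {fact['month']} ({fact['change']:+.0%})."
        for fact in notable
    ]

    # Aggregation: one sentence for all the quiet months instead of one sentence each.
    if flat:
        months = " and ".join(fact["month"] for fact in flat)
        sentences.append(f"Sales were broadly flat in {months}.")

    return " ".join(sentences)


report = generate_report(MONTHLY_SALES)
print(report)
# Sales rose sharply in April (+30%). Sales were broadly flat in February and March.
```

Without the aggregation step, the report would spend two near-identical sentences on February and March, which is exactly the kind of repetition that stage is meant to remove.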

by Maithili Pradhan

30 Jan'18
A Beginner’s Guide To Natural Language Understanding


“A computer would deserve to be called intelligent if it could deceive a human into believing that it was human.” – Alan Turing

The entire gamut of artificial intelligence is based on machines being able to ‘understand’ and ‘respond’ to human beings. This is impossible unless machines can interact with humans in their natural language, as other human beings do. Moreover, understanding involves not merely the exchange of information and data but also an exchange of emotions, feelings, ideas and intent. Can machines ever do that? Well, the answer is affirmative, and it is not even that surprising anymore. What is this miraculous technology that smoothly facilitates the interaction between humans and machines? It is Natural Language Understanding.

What is Natural Language Understanding?

Natural Language Understanding is a part of Natural Language Processing. It undertakes the analysis of content and text-based metadata and generates summarised content in natural, human language. It is the opposite of Natural Language Generation. NLG takes input in the form of data and generates output in the form of plain text, while Natural Language Understanding tools process text or voice in natural language and generate appropriate responses by summarising, editing or creating vocal responses.
Natural Language Understanding Vs Natural Language Processing

Natural Language Processing is a wide term which includes both Natural Language Understanding and Natural Language Generation, along with many other techniques that revolve around machines translating and analysing natural language to perform certain commands.

Examples of Natural Language Processing

Natural Language Processing is everywhere, and we use it in our daily lives without even realising it. Do you know how spam messages are separated from your emails? Or how autocorrect and predictive typing, which save so much of our time, work? Well, it is all part of Natural Language Processing. Here are some examples of widely used Natural Language Processing technologies:

Intelligent personal assistants: We are all familiar with Siri and Cortana. These mobile software products perform tasks and offer services using a combination of user input, location awareness and the ability to access information from a variety of online sources. They are undoubtedly among the biggest achievements of natural language processing.

Machine translation: To read the description of a beautiful picture on Instagram or to read updates on Facebook, we have all used the ‘see translation’ command at least once. And Google’s translation services help in urgent situations, or sometimes just to learn a few new words. These are all examples of machine translation, where machines provide us with translations from one natural language to another.

Speech recognition: Converting spoken words into data is an example of natural language processing. It is used for multiple purposes like dictating to Microsoft Word, voice biometrics, voice user interfaces, etc.
Affective computing – This is essentially emotional intelligence training for machines. They learn to understand your emotions, feelings and ideas so that they can interact with you in more humane ways.
Natural language generation – Natural language generation tools scan structured data, analyse it and generate information as text in natural language.
Natural language understanding – As explained above, it scans content written in natural languages and generates short, comprehensible summaries of the text.

Best tools for Natural Language Understanding available today

Natural Language Processing deals with human language in its most natural form and in real time, as it appears in social media content, emails, web pages, tweets, product descriptions, newspaper articles and scientific research papers, in a variety of languages. Businesses need to keep tabs on all this content constantly. Here are a few popular natural language understanding software products that help them with this daunting task.
Wolfram – Wolfram Alpha is an answer engine developed by Wolfram Alpha LLC (a subsidiary of Wolfram Research). It is an online service that answers factual questions by computing the answer from externally sourced, “curated data”.
Natural Language Toolkit – The Natural Language Toolkit, also known as NLTK, is a suite of programs used for symbolic and statistical natural language processing (NLP) for the English language. It is written in the Python programming language and was developed by Steven Bird and Edward Loper at the University of Pennsylvania.
Stanford CoreNLP – Stanford CoreNLP is an annotation-based NLP pipeline that offers core natural language analysis. The basic distribution provides model files for the analysis of English, but the engine is compatible with models for other languages.
GATE (General Architecture for Text Engineering) – It supports a wide range of natural language processing tasks. It is mature software that has been used across industries for more than 15 years.
Apache OpenNLP – Apache OpenNLP is a machine learning based toolkit for processing natural language text. It is written in Java and produced by the Apache Software Foundation. It offers services such as tokenization, chunking, parsing, part-of-speech tagging and sentence segmentation.

Applications of Natural Language Understanding

As we have already seen, natural language understanding is essentially smart machine reading comprehension. Now let’s take a close look at how it is used to improve efficiency and accuracy while saving the time and effort of human resources, which can then be put to better use.
Collecting data and data analysis – To serve its customers well, a business must know what is expected of it. Customer feedback is not numeric data like sales figures or financial statements. It is open-ended and text-heavy. Identifying patterns and trends in this data, and acting on the gaps or insights found, is crucial for a company’s survival and growth. More and more companies are realizing that a natural language understanding solution brings strong benefits when analysing data like customer feedback and product reviews. In such cases natural language understanding proves more effective and accurate than traditional methods like hand-coding. It helps the customer’s voice reach you more clearly and quickly, which leads to effective strategizing and productive implementation.
Reputation monitoring – Customer feedback is just the tip of the iceberg compared to customers’ real feelings about a brand. As customers, we rarely take part in feedback surveys. Most real customer sentiment is therefore trapped in unstructured data.
News, blog posts, chats, and social media updates contain huge amounts of such data, which is more natural and can be used to learn the ‘real’ feelings of customers about a product or service. Natural language understanding software helps businesses scan such scattered data and draw practical inferences.
Customer service – Natural Language Understanding can communicate with untrained individuals and understand their intent. NLU is capable of understanding meaning in spite of human errors like mispronunciations or transposed letters or words. It also uses algorithms that break human speech down into a structured ontology and fish out the meaning, intent, sentiment and crux of what was said. One of the most important goals of NLU is to create chatbots, or human-interacting bots, that can communicate effectively with humans without any human supervision. Software products such as Nuance are already used in customer interaction.
Automated trading – Capital market trading automation is no longer a new phenomenon.
Multiple software products and platforms are now available that analyse market movements, industry profiles and the financial strength of a company and, based on technical analysis, design trading patterns. Advanced Natural Language Understanding tools that scan sources such as financial statements, reports and market news are the basis of automated trading systems.
Market Intelligence – “What are competitors doing?” is one of the most critical pieces of information businesses need in real time. Information influences markets. Information exchange between stakeholders shapes and reshapes market dynamics all the time. Keeping a close watch on the status of an industry is essential to developing a powerful strategy, but today’s channels of content distribution (RSS feeds, social media, emails) generate so much information that it has become increasingly difficult to keep tabs on such unstructured, multi-sourced content. Financial markets have started using natural language understanding tools extensively to track information exchange in the market and get to it immediately.

Because natural language understanding programs carry out such varied functions, their importance in trade, business, commerce and industry is ever increasing. Learning natural language understanding is a smart move towards a successful career.

What is the best way to learn Natural Language Understanding?

The best way to prepare yourself for a brighter future in technology is to understand the algorithms of artificial intelligence. The Post Graduate Diploma in Machine Learning and AI by UpGrad offers a chance to master concepts like Neural Networks, Natural Language Processing, Graphical Models and Reinforcement Learning. The most distinctive aspects of this course are the career support and the industry mentorship, which will help you prepare for intense competition in the industry and in your actual job.
So, let’s learn to use the software products mentioned earlier that are widely used in industry, like NLTK. This program aims to produce well-rounded data scientists and AI professionals with a thorough knowledge of mathematics, expertise in relevant tools and languages, and an understanding of cutting-edge algorithms and applications. Start preparing today for a better tomorrow!

by Maithili Pradhan

30 Jan'18
Neural Networks for Dummies: A Comprehensive Guide

Our brain is an incredible pattern-recognizing machine. It processes ‘inputs’ from the outside world, categorizes them (that’s a dog; that’s a slice of pizza; ooh, that’s a bus coming towards me!), and then generates an ‘output’ (petting the dog; the yummy taste of that pizza; getting out of the way of the bus!).

All of this happens with little conscious effort, almost impulsively. It’s the very same system that senses if someone is mad at us, or involuntarily notices the stop signal as we speed past it. Psychologists call this mode of thinking ‘System 1’, and it includes innate skills, like perception and fear, that we share with other animals. (There’s also a ‘System 2’; to know more about it, check out the extremely informative Thinking, Fast and Slow by Daniel Kahneman.)

How is all of this related to Neural Networks, you ask? Wait, we’ll get there in a second.

Look at the image above: just your regular digits, distorted to help us explain how Neural Networks learn. Even at a cursory glance, your mind will prompt you with the number “192”. You surely didn’t go “Ah, that seems like a straight line, I think it’s a 1”. You didn’t compute it; it happened instantly.

Fascinating, right?
There is a very simple reason for this: you’ve come across these digits so many times in your life that, by trial and error, your brain automatically recognizes a digit when presented with something even remotely close to it.

Let’s cut to the chase.

What exactly is a Neural Network? How does it work?

By definition, a neural network is a system of hardware or software patterned after the working of neurons in the human brain. Basically, it helps computers think and learn like humans. An example will make this clearer:

As a child, if we ever touched a hot coffee mug and it burnt us, we made sure not to touch a hot mug ever again. But did we have any such concept of hurt in our consciousness BEFORE we touched it? Not really.

This adjustment of our knowledge and understanding of the world around us is based on recognizing patterns. And, like us, computers learn through the same type of pattern recognition. This learning forms the whole basis of the working of neural networks.

Traditional computer programs work on logic trees: if A happens, then B happens. All the potential outcomes for each system can be preprogrammed. However, this leaves no room for flexibility. There’s no learning there.

And that’s where Neural Networks come into the picture! A neural network is built without any specific logic. Essentially, it is a system that is trained to look for, and adapt to, patterns within data. It is modeled on how our own brain works. Each neuron (idea) is connected to others via synapses. Each synapse has a value that represents the probability or likelihood of the connection between two neurons occurring. Take a look at the image below:

What exactly are neurons, you ask? Simply put, a neuron is just a singular concept. A mug, the colour white, tea, the burning sensation of touching a hot mug: basically anything. All of these are possible neurons.
All of them can be connected, and the strength of their connection is decided by the value of their synapse. The higher the value, the stronger the connection. Let’s look at one basic neural network connection to understand this better:

Each neuron is a node, and the lines connecting them are synapses. The synapse value represents the likelihood that one neuron will be found alongside the other. So, it’s pretty clear that the diagram in the above image describes a mug containing coffee, which is white in colour and extremely hot.

Not all mugs have the properties of the one in question. We can connect many other neurons to the mug. Tea, for example, is likely more common than coffee. The likelihood of two neurons being connected is determined by the strength of the synapse connecting them. The greater the number of hot mugs, the stronger the synapse. However, in a world where mugs are not used to hold hot beverages, the number of hot mugs would decrease drastically. This decrease would, in turn, lower the strength of the synapses connecting mugs to heat, so the diagram changes to reflect the weaker link between mugs and heat.

This small and seemingly unimportant description of a mug represents the core construction of neural networks. We touch a mug kept on a table and find that it’s hot. It makes us think all mugs are hot. Then we touch another mug, this time one kept on the shelf, and it’s not hot at all. We conclude that mugs on the shelf aren’t hot. As we grow, we evolve. Our brain has been taking in data all this time. This data lets it estimate an accurate probability that the mug we’re about to touch will be hot. Neural Networks learn in much the same way.

Now, let’s talk a bit about the first and most basic model of a neural network: the Perceptron!

What is a Perceptron?

A perceptron is the most basic model of a neural network. It takes multiple binary inputs, x1, x2, …, and produces a single binary output.
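To make the definition concrete, here is a minimal Python sketch of a perceptron. It assumes the standard formulation, in which each binary input is multiplied by a weight and the output is 1 only when the weighted sum exceeds a threshold. The function name `perceptron` and the specific weights and threshold are hypothetical values picked by hand for illustration; they are not learned from data.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the binary inputs exceeds the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0


# Hypothetical, hand-picked values (assumptions for illustration only).
weights = [2, 6]   # the second input matters three times as much as the first
threshold = 5

# Try every combination of the two binary inputs x1 and x2.
for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", perceptron([x1, x2], weights, threshold))
```

With these values the second input alone is enough to push the sum past the threshold, while the first input alone is not, so the output simply follows x2. A trained perceptron would arrive at its weights from examples rather than having them written in by hand.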
Let’s understand the perceptron better with the help of an analogy. Say you walk to work. Your decision to go to work is based mainly on two factors: the weather, and whether or not it is a weekday. The weather factor is still manageable, but working on weekends is a big no! Since we have to work with binary inputs, let’s pose the conditions as yes or no questions. Is the weather fine? 1 for yes, 0 for no. Is it a weekday? 1 for yes, 0 for no.

Remember, we cannot explicitly tell the neural network these conditions; it’ll have to learn them for itself. How will it decide the priority of these factors while making a decision? By using something known as “weights”. Weights are just a numerical representation of preferences. A higher weight makes the neural network give that input a higher priority than the others. This is represented by w1, w2, … in the flowchart above.

“Okay, this is all pretty fascinating, but where do Neural Networks find work in a practical scenario?”

Real-life applications of Neural Networks

If you haven’t yet figured it out, here it is: a neural network can do pretty much anything, as long as you have enough data and a machine efficient enough to find the right parameters. Anything that even remotely requires machine learning turns to neural networks for help. Deep learning is another domain that makes extensive use of neural networks. A neural network is one of many machine learning algorithms that enable a computer to perform a plethora of tasks such as classification, clustering, or prediction.

With the help of neural networks, we can solve problems for which a traditional algorithmic method is expensive or does not exist. Neural networks can learn by example, so we do not need to program them to a large extent. Once trained, neural networks can be both accurate and significantly faster than conventional approaches.
Because of the reasons mentioned above and more, Deep Learning, by making use of Neural Networks, finds extensive use in the following areas:
Speech recognition: Take the example of the Amazon Echo Dot, a magic speaker that lets you order food, get news and weather updates, or simply buy something online just by talking it out.
Handwriting recognition: Neural networks can be trained to understand the patterns in somebody’s handwriting. Have a look at Google’s Handwriting Input application, which uses handwriting recognition to seamlessly convert your scribbles into meaningful text.
Face recognition: From improving the security on your phone (Face ID) to the super-cool Snapchat filters, face recognition is everywhere. If you’ve ever uploaded a photo on Facebook and were asked to tag the people in it, you know what face recognition is!
Providing artificial intelligence in games: If you’ve ever played chess against a computer, you already know how artificial intelligence powers games and game development. It has reached the point where players use AI to improve their tactics and try out their strategies first-hand.

In Conclusion…

Neural networks form the backbone of many of the big technologies and inventions you see today.
It’s only fair to say that imagining deep/machine learning without neural networks is next to impossible. Depending on the way you implement a network and the kind of learning you put to use, you can achieve a lot more with a neural network than with a traditional computer system.

by Reetesh Chandra

06 Feb'18