HISTORY OF NLP
1948 – First NLP application: dictionary look-up system at Birkbeck College
1957 – Chomsky's Syntactic Structures: a revolution in linguistics (its formal grammars relate to BNF and regular expressions)
1966 – ALPAC Report: a critical assessment of machine translation that shifted support toward basic research in grammar and semantics
1970s – NLP influenced by AI: LUNAR, Prolog
1980s – Statistical NLP emerges: Hidden Markov Models
1982 – Project Jabberwacky: early chatbot
HISTORY OF NLP (CONT…)
1988 – FrameNet Project: shallow semantic parsing
2001 – Word embeddings introduced: neural networks for language modelling
2003 – Latent Dirichlet Allocation (LDA): topic modelling in ML
2013 – NLP influenced by AI: improved word embeddings (CNN)
Mar 2016 – Microsoft's Tay chatbot: highlighted ethical challenges in AI
Sep 2016 – Google Neural Machine Translation (GNMT): reduced translation errors (deep LSTM)
HOW NLP?
• Natural languages are the languages humans use to communicate.
• Computers have their own programming languages and were not designed to understand
natural languages.
• Why not speak to the computer and let it respond in natural language? This is one of the
aims of Natural Language Processing (NLP), alongside others such as machine translation.
• NLP is rooted in the theory of linguistics.
• NLP is the computer analysis of input provided in a human language (natural
language) and the conversion of this input into a useful form of representation.
WHAT IS NLP?
• Natural Language Processing is a subfield of Artificial Intelligence that
is used to narrow the communication gap between computers and humans.
• Techniques from machine learning and deep neural networks have also been
successfully applied to NLP problems.
• While many practical applications of NLP already exist, NLP has many
unsolved problems.
• The field of NLP is primarily concerned with getting computers to perform
useful and interesting tasks with human languages.
• The field of NLP is secondarily concerned with helping us come to a better
understanding of human language.
• Example: customer support service for a product.
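WHY IS IT IMPORTANT?
• Faster response
• Consistent handling of inputs (though models can still inherit bias from their training data)
• Manage larger volumes of data
• To learn more
GOALS
• Scientific – getting computers to understand language
• Practical – making use of available data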
WHY DO COMPUTERS HAVE DIFFICULTY WITH NLP?
• Computers are designed to deal with structured data (organized, indexed
and referenced).
• In NLP, we often deal with unstructured data.
• E.g.: social media posts, news articles, emails, and product
reviews are examples of text-based unstructured data.
• To process such text, NLP has to learn the structure and
grammar of the natural language.
• An often-cited industry estimate is that about 80% of enterprise data is unstructured.
HOW IT WORKS: GENERAL SKETCH
User → Input (audio/text) → Machine converts input to text → Process (ML) → Response → Output (audio/text)
Natural Language Processing is divided into two sub-areas, Natural Language Generation (NLG) and Natural Language Understanding (NLU), which, as the names suggest, are associated with the generation and understanding of text.
TERMS TO UNDERSTAND
SYNTAX
• Definition: The set of rules that governs the structure and order of words in sentences.
• Importance: Helps in understanding sentence structure and grammar.
• Example:
• Correct Syntax: "The cat sits on the mat."
• Incorrect Syntax: "Cat the mat on sits."
• Applications in NLP:
• Part-of-Speech (POS) tagging.
• Syntactic parsing to analyze grammatical structure.
MORPHOLOGY
• Definition: The study of the structure and formation of words, including roots, prefixes, suffixes,
and inflections.
• Types:
• Inflectional Morphology: Changes in word form to express tense, number, gender, etc. (e.g., "walk"
→ "walked").
• Derivational Morphology: Formation of new words by adding affixes (e.g., "happy" → "happiness").
• Importance: Helps in breaking words into meaningful components.
• Applications in NLP:
• Lemmatization and stemming.
• Spell correction and word segmentation.
SEMANTICS
• Definition: The study of meaning in language, including word meanings and sentence
meanings.
• Importance: Focuses on understanding the meaning of text.
• Example:
• "The bank is by the river": "bank" can mean a financial institution or a riverbank, and context is needed to choose.
• Applications in NLP:
• Word sense disambiguation.
• Named entity recognition (NER).
• Sentiment analysis.
PRAGMATICS
• Definition: The study of how context influences the interpretation of meaning in language.
• Importance: Goes beyond literal meanings to understand implied meanings and situational
context.
• Example:
• Literal: "Can you pass the salt?" is a question about ability (answer: "Yes, I can.").
• Pragmatic: it is a request, so the listener actually passes the salt.
• Applications in NLP:
• Chatbots and conversational agents.
• Sarcasm and irony detection.
• Context-aware language generation.
PHONOLOGY
• Definition: The study of the sound systems of a language and how sounds
are organized and used.
• Importance: Relevant in speech-related NLP tasks.
• Applications in NLP:
• Speech recognition (converting speech to text).
• Text-to-speech systems (TTS).
• Pronunciation correction tools.
NLU and NLG
• Natural Language Understanding (NLU):
• Involves converting speech or text into useful representations on which analysis can be performed.
• Goal: to resolve ambiguities, obtain context and understand the meaning of what's being said.
• NLU is about semantic relationships and meaning.
• NLU tackles the complexities of language beyond basic sentence structure.
• Natural Language Generation (NLG):
• Starts from an internal representation of the information to be expressed.
• Involves selecting the right words and forming phrases and sentences.
• Sentences need to be ordered so that information is conveyed correctly.
NLU is the analysis side of NLP; NLG is the synthesis side.
MAIN APPROACHES ADOPTED BY NLP
• Symbolic approach (from the 1950s)
• rooted in linguistics.
• Given the rules of syntax and grammar - obtain the
structure of text.
• Using logic, we could obtain the meaning.
• But rules had to be hand-crafted and were often
numerous.
• They didn't handle colloquial text well.
• Rules worked well for specific use cases but couldn't be
generalized.
• Statistical approach (from the 1980s)
• Rules were learned and they had associated probabilities.
• ML models came in with support vector machines and
logistic regression.
• More recently, Deep Learning (DL) models that employ a
neural network of many layers have brought better
accuracy.
• This success is partly due to the more efficient
representations given by word embeddings.
NLP involves different levels (or scopes) of analysis.
LEXICAL AND MORPHOLOGICAL ANALYSIS
• The lexical phase involves scanning text and breaking it down into smaller units (tokens).
• Tokenization is essential for understanding and processing text at the word level.
• In addition to tokenization, various data cleaning and feature extraction techniques are applied, including
lemmatization, stopword removal and correction of misspelled words.
• Morphological analysis focuses on identifying morphemes, the smallest meaningful units of a word. Understanding morphemes is vital
for grasping the structure of words and their relationships.
• Types of morphemes: free morphemes (can stand alone, e.g., "book") and bound morphemes (must attach to another morpheme, e.g., plural "-s").
• Importance of morphological analysis: understanding word structure, predicting word forms,
improving accuracy.
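To make free and bound morphemes concrete, here is a minimal Python sketch of suffix-based segmentation. The suffix table and the `segment` function are hypothetical illustrations, not a real morphological analyser: it strips at most one suffix and does not repair spelling changes, so "happiness" yields the root "happi" rather than "happy", which is exactly the kind of problem lemmatization addresses.

```python
# Hypothetical sample of bound suffix morphemes with their type.
SUFFIXES = {
    "ness": "derivational (adjective -> noun)",
    "ing": "inflectional (progressive)",
    "ed": "inflectional (past tense)",
    "s": "inflectional (plural / 3rd person)",
}


def segment(word: str) -> list[tuple[str, str]]:
    """Split a word into a free root plus at most one known bound suffix."""
    morphemes = []
    # Try longer suffixes first so that "ness" wins over "s".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        # Require a root of at least 3 letters to avoid over-stripping short words.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            morphemes.append((suffix, "bound: " + SUFFIXES[suffix]))
            word = word[: -len(suffix)]
            break
    morphemes.append((word, "free (root)"))
    return list(reversed(morphemes))  # root first, then the suffix


print(segment("walked"))
# [('walk', 'free (root)'), ('ed', 'bound: inflectional (past tense)')]
```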
SYNTACTIC ANALYSIS (PARSING)
• Essential for understanding the structure of a sentence and assessing its
grammatical correctness.
• It involves analyzing the relationships between words and ensuring
their logical consistency by comparing their arrangement against
standard grammatical rules.
• Role: examines the grammatical structure and relationships within a
given text and assigns Part-of-Speech (POS) tags to each word.
• This tagging is crucial for understanding how words relate to each other
syntactically and helps resolve ambiguity.
Sentence: "John eats an apple."
POS Tags:
• John: Proper Noun (NNP)
• eats: Verb (VBZ)
• an: Determiner (DT)
• apple: Noun (NN)
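The tagging above can be reproduced with a toy lexicon-based tagger. This is a sketch under stated assumptions: the lexicon and the fallback rules (unknown capitalised word tagged NNP, other unknown word tagged NN) are hypothetical, whereas practical taggers such as those in NLTK or spaCy learn tag probabilities from annotated corpora.

```python
import re

# Hypothetical lexicon mapping known words to Penn Treebank tags.
LEXICON = {"a": "DT", "an": "DT", "the": "DT", "eats": "VBZ", "apple": "NN"}


def tag(sentence: str) -> list[tuple[str, str]]:
    """Assign a POS tag to every token using the lexicon plus fallback rules."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)  # words and punctuation marks
    tagged = []
    for tok in tokens:
        if tok.lower() in LEXICON:
            tagged.append((tok, LEXICON[tok.lower()]))
        elif not tok.isalnum():
            tagged.append((tok, "."))    # all punctuation tagged "." for simplicity
        elif tok[0].isupper():
            tagged.append((tok, "NNP"))  # unknown capitalised word: guess proper noun
        else:
            tagged.append((tok, "NN"))   # unknown lowercase word: guess common noun
    return tagged


print(tag("John eats an apple."))
# [('John', 'NNP'), ('eats', 'VBZ'), ('an', 'DT'), ('apple', 'NN'), ('.', '.')]
```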
SEMANTIC ANALYSIS
• focusing on extracting the meaning from text.
• concerned with the literal and contextual meaning of words, phrases, and sentences.
• determines whether the arrangement of words in a sentence makes logical sense
• helps in finding context and logic by ensuring the semantic coherence of sentences.
• Key Tasks:
• Named Entity Recognition (NER): identifies and classifies entities within the text
• Word Sense Disambiguation (WSD): determines the correct meaning of ambiguous words based on context
• Example: “Orange eats a Mary” is grammatically correct but does not make sense semantically.
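Word sense disambiguation can be illustrated with a simplified version of the Lesk algorithm, which picks the sense whose dictionary definition (gloss) overlaps most with the surrounding context words. The glosses, stop-word list and function names below are hypothetical; ties go to the first sense listed.

```python
# Small stop-word list so that function words do not count as overlap.
STOPWORDS = {"the", "a", "an", "of", "on", "by", "to", "in", "is", "and", "or", "that", "i", "at"}

# Hypothetical sense inventory with one short gloss per sense.
SENSES = {
    "bank": {
        "financial_institution": "an institution that accepts deposits of money and makes loans",
        "riverbank": "the sloping land beside a river or lake",
    }
}


def content_words(text: str) -> set[str]:
    """Lowercase, drop full stops, split on whitespace and remove stop words."""
    return {w for w in text.lower().replace(".", " ").split() if w not in STOPWORDS}


def simplified_lesk(word: str, sentence: str) -> str:
    """Return the sense of `word` whose gloss shares the most content words with `sentence`."""
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:  # strict '>' keeps the first-listed sense on ties
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "I sat on the bank of the river."))  # riverbank
```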
DISCOURSE INTEGRATION
• comprehending the relationship between the current sentence and earlier sentences or the
larger context.
• contextualizing text and understanding the overall message conveyed.
• Role- examines how words, phrases, and sentences relate to each other within a larger
context.
• assesses the impact of a word or sentence, and how the combination of sentences affects the
overall meaning.
• helps in understanding implicit references and the flow of information across sentences.
• Example: in "This is unfair!", resolving what "this" refers to requires examining the preceding or following sentences.
PRAGMATIC ANALYSIS
• focusing on interpreting the inferred meaning of a text beyond its literal content.
• Role: aims to grasp these deeper meanings in communication, i.e., what the writer
or speaker truly intends to convey.
• Importance of Understanding Intentions - the word "Hello" can have various
interpretations depending on the tone and context in which it is spoken.
• Example: "Hello! What time is it?" might be a straightforward request for the
current time, but it could also imply concern about being late.
NLP PIPELINE
• An NLP pipeline is a sequence of interconnected steps that
systematically transform raw text data into a desired
output.
• It’s analogous to a factory assembly line, where each step
refines the material until it reaches its final form.
• This pipeline is not universal.
• The steps described here form a typical ML pipeline; deep learning pipelines are
slightly different.
• The NLP pipeline is non-linear: stages can have more dynamic connections, allowing
for branching and iteration.
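As a rough sketch, a pipeline can be modelled as an ordered list of stage functions, each consuming the previous stage's output. The stage functions here (`clean`, `tokenize`, `count_features`) are hypothetical placeholders; this version is strictly linear, whereas a real NLP pipeline may branch or loop back to an earlier stage.

```python
from typing import Any, Callable

# A stage is any function that maps the previous stage's output to a new value.
Stage = Callable[[Any], Any]


def run_pipeline(data: Any, stages: list[Stage]) -> Any:
    """Apply each stage in order, feeding its output to the next stage."""
    for stage in stages:
        data = stage(data)
    return data


# Hypothetical placeholder stages.
def clean(text: str) -> str:
    """Collapse runs of whitespace and lowercase the text."""
    return " ".join(text.split()).lower()


def tokenize(text: str) -> list[str]:
    """Split cleaned text on single spaces."""
    return text.split(" ")


def count_features(tokens: list[str]) -> dict[str, int]:
    """Turn tokens into word-count features."""
    counts: dict[str, int] = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    return counts


print(run_pipeline("  The cat  saw the DOG ", [clean, tokenize, count_features]))
# {'the': 2, 'cat': 1, 'saw': 1, 'dog': 1}
```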
DATA ACQUISITION
Objective: Collect raw text data for creating a robust dataset.
• Data Available:
• On Your Desk: Begin text preprocessing immediately.
• In Databases: Collaborate with data engineers to retrieve
data.
• Insufficient Data: Use data augmentation techniques:
• Synonym replacement.
• Bigram flip.
• Back translation.
• Adding noise.
• Data from External Sources:
• Use public datasets (Kaggle, UCI, government
repositories).
• Extract data using web scraping (e.g., BeautifulSoup,
Scrapy).
• Access APIs (e.g., Twitter, Reddit, news aggregators).
• Extract text from PDFs (e.g., PyPDF2, PDFMiner).
• No Existing Data:
• Collaborate with trusted clients for anonymized data.
• Generate synthetic data through surveys, interviews, or
user-generated content.
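Three of the augmentation techniques listed above can be sketched in a few lines of Python. The synonym table is hypothetical, "adding noise" is illustrated with one specific choice (random word deletion), and back translation is omitted because it needs a machine translation system. A seeded `random.Random` is passed in so results are reproducible.

```python
import random

# Hypothetical tiny synonym table; real systems often draw on WordNet or embedding neighbours.
SYNONYMS = {"good": ["great", "fine"], "movie": ["film"], "quick": ["fast"]}


def synonym_replacement(tokens, rng):
    """Replace every token that has an entry in SYNONYMS with a random synonym."""
    return [rng.choice(SYNONYMS[t]) if t in SYNONYMS else t for t in tokens]


def bigram_flip(tokens, rng):
    """Swap one randomly chosen pair of adjacent tokens."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens) - 1)
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def add_noise(tokens, rng, p=0.1):
    """Delete each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept or [rng.choice(tokens)]


rng = random.Random(42)
sentence = "a good movie".split()
print(synonym_replacement(sentence, rng), bigram_flip(sentence, rng), add_noise(sentence, rng))
```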
TEXT PREPROCESSING
Objective: Clean and standardize text for meaningful analysis.
Steps:
• Basic Cleaning:
• Remove HTML tags and irrelevant formatting.
• Handle emojis (convert or remove).
• Perform spell checks for consistency.
• Basic Preprocessing:
• Tokenize text into words or sentences.
• Remove stop words (e.g., “the,” “is”).
• Apply stemming or lemmatization.
• Convert text to lowercase.
• Detect the text’s language.
• Advanced Preprocessing:
• Perform Part-of-Speech (POS) tagging.
• Conduct parsing for grammatical
structure.
• Resolve coreferences for coherent
understanding.
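A minimal sketch of the basic cleaning and preprocessing steps, assuming English text and a small hand-written stop-word list (libraries such as NLTK and spaCy ship much larger ones). Emojis and punctuation are handled by simply dropping them; spell checking, stemming or lemmatization, language detection and the advanced steps are left out.

```python
import re

# Small hypothetical stop-word list.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "it", "this"}


def strip_html(text: str) -> str:
    """Remove HTML tags with a simple regex (fine for snippets, not for arbitrary HTML)."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphanumeric word tokens; punctuation and emojis are dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def preprocess(text: str) -> list[str]:
    """Basic cleaning, tokenization, lowercasing and stop-word removal."""
    tokens = tokenize(strip_html(text))
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("<p>The movie is <b>GREAT</b>!</p>"))  # ['movie', 'great']
```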
FEATURE ENGINEERING
Objective: Convert text into numerical features for models.
• Techniques:
• Bag of Words (BoW): Frequency-based representation of unique words.
• TF-IDF: Weighs word importance based on frequency and rarity.
• One-Hot Encoding: Binary vectors for words, effective for small vocabularies.
• Word Embeddings: Dense vector representations capturing semantic meaning (e.g., Word2Vec,
GloVe, FastText).
• N-Gram Models: Capture sequences of adjacent words (bigrams, trigrams).
• Dependency Parsing: Capture relationships between words through syntactic dependencies.
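A from-scratch sketch of Bag of Words, bigrams and TF-IDF over pre-tokenized documents. It uses one common TF-IDF variant, raw term count × log(N / df), where N is the number of documents and df the number of documents containing the term. Libraries differ (for example, scikit-learn's `TfidfVectorizer` smooths the idf and L2-normalises each vector by default), so absolute scores will not match library output.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Map each unique token to its frequency in the document."""
    return Counter(tokens)


def ngrams(tokens, n=2):
    """Return the list of n-grams (as tuples) of adjacent tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Score every term of every document as count * log(N / df)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return scores


docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
print(tf_idf(docs)[1])  # "the" scores 0.0 because it occurs in every document
```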
MODELLING
Objective: Train models to perform NLP tasks.
• Approaches:
• Heuristic Models:
• Rule-based systems for specific patterns (e.g., keyword matching).
• Machine Learning:
• Support Vector Machines (SVM): Effective for text classification.
• Random Forests: Suitable for sentiment analysis or categorization.
• Deep Learning:
• RNNs: Handle sequence-based tasks like language modeling.
• Transformers: Capture long-range dependencies for tasks like translation and summarization.
• Cloud APIs:
• Google Cloud, Microsoft Azure: Provide pre-trained models for rapid prototyping.
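The heuristic approach can be sketched as keyword matching for intent detection. The intents and keyword sets below are hypothetical examples for a customer-support setting; each intent scores one point per keyword found as a whole word or phrase, and text with no matches falls back to a default label.

```python
import re

# Hypothetical intent rules: each intent is triggered by any of its keywords.
RULES = {
    "refund": {"refund", "money back", "return"},
    "shipping": {"delivery", "shipping", "track"},
}


def classify(text, rules=RULES, default="other"):
    """Return the intent with the most keyword hits, or `default` if none match."""
    lowered = text.lower()
    scores = {
        intent: sum(1 for kw in keywords if re.search(r"\b" + re.escape(kw) + r"\b", lowered))
        for intent, keywords in rules.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


print(classify("Where is my delivery? Can I track it?"))  # shipping
```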
DEPLOYMENT
Objective: Implement the model in real-world applications.
Steps:
• Deployment:
• Integrate the model into production systems.
• Set up infrastructure for scalability and reliability.
• Validate functionality through testing.
• Monitoring:
• Continuously monitor performance and behavior.
• Implement alerts for deviations or anomalies.
• Updates:
• Adapt to dynamic data and retrain
models periodically.
• Maintain version control for
transparency.
• Address evolving user needs based on
feedback.
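Monitoring with alerts can be sketched as a rolling metric compared against a baseline. The class name, window size and tolerance are hypothetical; the metric could be accuracy on user-labelled feedback or any other per-request score. Note that it flags deviations in either direction.

```python
from collections import deque


class MetricMonitor:
    """Track a rolling metric and flag deviations from a baseline (hypothetical defaults)."""

    def __init__(self, baseline, tolerance=0.05, window=100):
        self.baseline = baseline
        self.tolerance = tolerance
        self.values = deque(maxlen=window)  # oldest values drop out automatically

    def record(self, value):
        self.values.append(value)

    def rolling_mean(self):
        return sum(self.values) / len(self.values) if self.values else None

    def should_alert(self):
        """True when the rolling mean is more than `tolerance` away from the baseline."""
        mean = self.rolling_mean()
        return mean is not None and abs(mean - self.baseline) > self.tolerance


monitor = MetricMonitor(baseline=0.9, window=3)
for correct in [1, 1, 0]:  # 1 = prediction confirmed correct by feedback
    monitor.record(correct)
print(monitor.should_alert())  # True: rolling accuracy 0.67 is far below 0.9
```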
CHALLENGES IN NLP
• Diversity in Language and Communication
• Challenges in Sourcing and Preparing Training Data
• Time and Resource Demands for NLP Development
• Dealing with Ambiguity in Phrasing and Meaning
• Correcting Spelling and Grammar Errors
• Addressing Bias and Fairness in NLP Models
• Handling Lexical Ambiguity and Multiple Meanings
• Overcoming Multilingual and Cross-Cultural Barriers
• Minimizing Uncertainty and False Positive Predictions
• Enabling Seamless and Ongoing Conversations
31.
HOW TO OVERCOME NLP CHALLENGES
• Enhance Data Quantity and Quality
• Use high-quality, diverse datasets to train NLP models effectively.
• Apply techniques like data augmentation, data synthesis, and crowdsourcing to address data scarcity.
• Handle Ambiguity in Language
• Train NLP algorithms to disambiguate words and phrases using context and semantic analysis.
• Address Out-of-Vocabulary (OOV) Words
• Implement techniques like tokenization, character-level modeling, and vocabulary expansion to manage
OOV words.
• Tackle Lack of Annotated Data
• Use transfer learning and pre-training to leverage large datasets and apply knowledge to tasks with limited
labeled data.
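Subword tokenization is one way to handle OOV words: an unseen word is split into known pieces instead of becoming a single unknown token. The sketch below uses greedy longest-match-first segmentation in the style of WordPiece, where continuation pieces carry a `##` prefix; the vocabulary is hypothetical, whereas real vocabularies are learned from a corpus.

```python
def subword_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first segmentation; returns [unk] if any part cannot be matched."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical vocabulary.
VOCAB = {"un", "##believ", "##able", "token", "##ization", "##s"}
print(subword_tokenize("unbelievable", VOCAB))  # ['un', '##believ', '##able']
```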
SCOPE OF NLP
Text Processing and Analysis
• Sentiment Analysis: Understanding opinions and sentiments from text data (e.g., social media,
reviews).
• Text Summarization: Generating concise summaries of lengthy documents or articles.
• Topic Modeling: Identifying hidden topics within text datasets.
• Text Classification: Categorizing emails, documents, and news articles into predefined groups.
Human-Computer Interaction
• Chatbots: Enhancing customer service through conversational agents.
• Virtual Assistants: Powering voice-based systems like Siri, Alexa, and Google Assistant.
• Speech-to-Text and Text-to-Speech: Enabling accessibility for visually or hearing-impaired users.
SCOPE OF NLP (CONT…)
Healthcare Applications
• Clinical Text Analysis: Extracting insights from electronic health records (EHRs).
• Medical Chatbots: Offering basic medical advice and appointment scheduling.
• Drug Discovery: Analyzing medical literature for drug development.
• Disease Prediction: Detecting early signs of illness from patient records.
Language Translation and Localization
• Machine Translation: Tools like Google Translate for multilingual communication.
• Localization: Adapting content for cultural and regional relevance.
• Cross-Language Information Retrieval: Searching for information across languages.
SCOPE OF NLP (CONT…)
Business and Marketing
• Customer Sentiment Analysis: Understanding customer feedback and improving products/services.
• Personalized Marketing: Crafting targeted campaigns based on user behavior and preferences.
• Automated Report Generation: Summarizing business insights from data analytics.
Education and E-Learning
• Grammar and Spell Checking: Tools like Grammarly for improving written communication.
• Content Recommendation: Tailoring learning materials based on user progress and preferences.
• Language Learning: Interactive tools for acquiring new languages.
SCOPE OF NLP (CONT…)
Media and Entertainment
• Content Moderation: Detecting inappropriate or harmful content.
• Script Analysis: Generating or analyzing scripts for movies or shows.
• Automated Subtitles: Generating real-time captions for videos.
Legal and Compliance
• Document Review: Analyzing contracts and legal documents for compliance.
• Case Law Analysis: Extracting insights from legal precedents.
• Regulatory Monitoring: Keeping track of changes in compliance requirements.
SCOPE OF NLP (CONT…)
Research and Development
• Knowledge Graphs: Building relationships between entities for research purposes.
• Question-Answering Systems: Advanced AI models for research and academic purposes.
• Scientific Literature Analysis: Summarizing and categorizing research papers.
Emerging Areas
• Emotion Detection: Understanding emotions conveyed in text or speech.
• Real-Time Applications: Real-time language translation and sentiment tracking.
• Ethical AI in NLP: Developing models that mitigate bias and ensure fairness.
• Multimodal NLP: Integrating text with images, audio, and video for deeper insights.
APPLICATIONS
Chatbots
• Simulate human-like conversation using Natural Language Processing (NLP) and Machine
Learning (ML).
• Understand complex language and improve over time by learning from interactions.
• Function through two steps: understanding user input and providing appropriate responses.
Autocomplete in Search Engines
• Suggest possible completions for typed queries based on keyword predictions.
• Analyze vast datasets and patterns to provide relevant suggestions.
• NLP identifies relationships between words to predict user intent.
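Keyword prediction for autocomplete can be approximated with a bigram model built from past queries: suggest the words that most often followed the last typed word. The query log and function names are hypothetical; production systems use far richer models and signals.

```python
from collections import Counter, defaultdict


def build_bigram_model(queries):
    """Count, for every word, which words followed it in past queries."""
    model = defaultdict(Counter)
    for query in queries:
        words = query.lower().split()
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1
    return model


def suggest(model, typed, k=3):
    """Return up to k likely next words for the last word typed so far."""
    words = typed.lower().split()
    if not words:
        return []
    followers = model.get(words[-1], Counter())  # .get avoids adding empty entries
    return [word for word, _ in followers.most_common(k)]


# Hypothetical query log.
QUERIES = ["weather today", "weather tomorrow", "weather today london", "news today"]
MODEL = build_bigram_model(QUERIES)
print(suggest(MODEL, "Weather"))  # ['today', 'tomorrow']
```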
APPLICATIONS (CONT…..)
Voice Assistants
• Examples: Siri, Alexa, Google Assistant.
• Perform tasks such as making calls, setting reminders, and surfing the internet using voice commands.
• Utilize speech recognition, natural language understanding, and NLP for interaction.
Language Translators
• Translate text between languages using Sequence-to-Sequence modeling.
• Transitioned from Statistical Machine Translation (SMT) to neural models for improved
accuracy.
• Example: Google Translate, which identifies patterns and vocabulary of languages.
APPLICATIONS (CONT…..)
Sentiment Analysis
• Analyze user sentiments on social media, reviews, or feedback.
• Employs NLP, text analysis, and computational linguistics to classify sentiments as positive,
negative, or neutral.
• Helps businesses gauge public opinion, understand brand perception, and improve services.
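A lexicon-based classifier is the simplest way to label text as positive, negative or neutral. This sketch assumes a hypothetical mini-lexicon (real lexicons such as VADER or SentiWordNet contain thousands of scored entries) and flips the sign of a word preceded by "not", "never" or "no".

```python
import re

# Hypothetical mini-lexicon of word polarities.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}
NEGATORS = {"not", "never", "no"}


def sentiment(text):
    """Sum word polarities (with simple negation) and map the total to a label."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value  # "not good" counts as negative
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("The service was not good"))  # negative
```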
Grammar Checkers
• Enhance professional and academic writing by correcting grammar and spelling errors.
• Suggest synonyms and improve readability using NLP algorithms trained on large datasets.
• Essential for producing polished and error-free content.
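Spelling correction is often built on edit distance: suggest the dictionary word reachable with the fewest insertions, deletions and substitutions. A minimal sketch, assuming a small hypothetical dictionary and a maximum distance of 2; ties are broken alphabetically, and real checkers also weigh word frequency and context.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def correct(word: str, dictionary: set[str], max_distance: int = 2) -> str:
    """Return the closest dictionary word, or the word unchanged if none is close enough."""
    if word in dictionary:
        return word
    best = min(dictionary, key=lambda w: (edit_distance(word, w), w))
    return best if edit_distance(word, best) <= max_distance else word


print(correct("speling", {"spelling", "spilling", "smelling"}))  # spelling
```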
APPLICATIONS (CONT…..)
Email Classification and Filtering
• Categorize emails into sections like Primary, Social, and Promotions using text classification.
• NLP identifies the context and content of emails to automate sorting.
• Improves productivity by decluttering inboxes and organizing communication.
Electronic Health Records (EHR) Analysis
• Extract and organize unstructured data from clinical notes, discharge summaries, and patient
histories.
• Streamline documentation and provide physicians with actionable insights.
• Enable faster and more accurate diagnosis through data-driven decision-making.