NATURAL LANGUAGE PROCESSING (NLP)
Dr. Siron Anita Susan T
Assistant Professor
SRM Institute of Technology Tiruchirappalli
Overview of NLP: Definition, Scope, Applications of NLP
HISTORY OF NLP
• 1948: First NLP application - dictionary look-up system at Birkbeck College
• 1957: Chomsky's Syntactic Structures - revolutionized linguistics (influenced BNF & RE)
• 1966: ALPAC Report - machine translation, advances in grammar and semantics
• 1970s: NLP influenced by AI - LUNAR, Prolog
• 1980s: Statistical NLP emerges - Hidden Markov model
• 1982: Project Jabberwacky - early chatbot
HISTORY OF NLP (CONT…)
• 1988: FrameNet Project - shallow semantic parsing
• 2001: Word embeddings introduced - NN for language modelling
• 2003: Latent Dirichlet Allocation (LDA) - topic modelling in ML
• 2013: NLP influenced by AI - improved word embeddings (CNN)
• Mar 2016: Microsoft's Tay chatbot - highlighted ethical challenges in AI
• Sep 2016: Google Neural Machine Translation (NMT) - reduced translation errors (deep LSTM)
HOW NLP?
• Natural languages are the languages humans use to communicate.
• Computers have their own programming languages and were not meant to understand natural languages.
• Why not speak to the computer and let it respond in a natural language? This is one of the aims of Natural Language Processing (NLP), e.g., machine translation.
• NLP is rooted in the theory of linguistics.
• NLP is the process of computer analysis of input provided in a human (natural) language, and the conversion of this input into a useful form of representation.
WHAT IS NLP?
• Natural Language Processing is a subfield of Artificial Intelligence that is used to narrow the communication gap between computers and humans.
• Techniques from machine learning and deep neural networks have also been
successfully applied to NLP problems.
• While many practical applications of NLP already exist, NLP has many
unsolved problems.
• The field of NLP is primarily concerned with getting computers to perform
useful and interesting tasks with human languages.
• The field of NLP is secondarily concerned with helping us come to a better
understanding of human language.
• E.g., a customer support service for a product.
WHY IS IT IMPORTANT?
• Faster response
• Without bias
• Manage larger volumes of data
• To learn more
GOALS
• Scientific: enable computers to understand language
• Practical: make use of available data
WHY DO COMPUTERS HAVE DIFFICULTY WITH NLP?
• Computers are good at dealing with structured data (organized, indexed and referenced).
• In NLP, we often deal with unstructured data.
• E.g., social media posts, news articles, emails, and product reviews are text-based unstructured data.
• To process such text, an NLP system has to learn the structure and grammar of the natural language.
• Importantly, a commonly cited estimate is that about 80% of enterprise data is unstructured.
HOW IT WORKS: GENERAL SKETCH
[Diagram: USER and MACHINE; audio/text input is converted to TEXT, PROCESSED (ML), and a RESPONSE is output.]
Natural Language Processing is divided into sub-areas, i.e., Natural Language Generation and Natural Language Understanding, which are, as the names suggest, associated with the generation and understanding of text.
TERMS TO UNDERSTAND
SYNTAX
• Definition: The set of rules that governs the structure and order of words in sentences.
• Importance: Helps in understanding sentence structure and grammar.
• Example:
• Correct Syntax: "The cat sits on the mat."
• Incorrect Syntax: "Cat the mat on sits."
• Applications in NLP:
• Part-of-Speech (POS) tagging.
• Syntactic parsing to analyze grammatical structure.
MORPHOLOGY
• Definition: The study of the structure and formation of words, including roots, prefixes, suffixes,
and inflections.
• Types:
• Inflectional Morphology: Changes in word form to express tense, number, gender, etc. (e.g., "walk"
→ "walked").
• Derivational Morphology: Formation of new words by adding affixes (e.g., "happy" → "happiness").
• Importance: Helps in breaking words into meaningful components.
• Applications in NLP:
• Lemmatization and stemming.
• Spell correction and word segmentation.
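As a rough illustration of the stemming idea above, here is a minimal sketch of a suffix-stripping stemmer. The suffix list and the three-character minimum are assumptions chosen for this example; it is not the Porter algorithm or any library's implementation.

```python
# Toy suffix-stripping stemmer (illustrative only).
# SUFFIXES is an assumed, hand-picked list, checked in the order given.
SUFFIXES = ("ness", "ing", "ed", "es", "s")


def simple_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["walked", "walking", "happiness", "cats"]:
        print(w, "->", simple_stem(w))
```

Note that a stem such as "happi" need not be a dictionary word; lemmatization, by contrast, aims to return a valid base form and usually needs a lexicon.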
SEMANTICS
• Definition: The study of meaning in language, including word meanings and sentence
meanings.
• Importance: Focuses on understanding the meaning of text.
• Example:
• "The bank is by the river" (financial institution vs. riverbank).
• Applications in NLP:
• Word sense disambiguation.
• Named entity recognition (NER).
• Sentiment analysis.
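The "bank" example above can be sketched with a much simplified, Lesk-style overlap heuristic: pick the sense whose cue words overlap most with the sentence. The SENSES table and its cue words are hypothetical and hand-written for this illustration; real systems draw on a lexical resource or a trained model.

```python
import re

# Hypothetical sense inventory: each sense has a few assumed cue words.
SENSES = {
    "bank": {
        "financial institution": {"money", "deposit", "loan", "account"},
        "riverbank": {"river", "water", "shore", "fishing"},
    }
}


def disambiguate(word: str, sentence: str) -> str:
    """Return the sense of `word` whose cue words overlap most with the sentence."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    senses = SENSES[word]
    # On a tie, max() returns the first sense listed in the table.
    return max(senses, key=lambda sense: len(senses[sense] & context))


if __name__ == "__main__":
    print(disambiguate("bank", "The bank is by the river"))
    print(disambiguate("bank", "I opened an account at the bank"))
```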
PRAGMATICS
• Definition: The study of how context influences the interpretation of meaning in language.
• Importance: Goes beyond literal meanings to understand implied meanings and situational
context.
• Example: "Can you pass the salt?"
• Literal reading: a question about ability (answer: "Yes, I can.").
• Pragmatic reading: a request to actually pass the salt.
• Applications in NLP:
• Chatbots and conversational agents.
• Sarcasm and irony detection.
• Context-aware language generation.
PHONOLOGY
• Definition: The study of the sound systems of a language and how sounds
are organized and used.
• Importance: Relevant in speech-related NLP tasks.
• Applications in NLP:
• Speech recognition (converting speech to text).
• Text-to-speech systems (TTS).
• Pronunciation correction tools.
NLU and NLG
• Natural Language Understanding (NLU):
• Involves converting speech or text into useful representations on which analysis can be performed.
• Goal: to resolve ambiguities, obtain context and understand the meaning of what's being said.
• NLU is about semantic relationships and meaning.
• NLU tackles the complexities of language beyond the basic sentence structure.
• Natural Language Generation (NLG):
• Starts from a given internal representation.
• Involves selecting the right words and forming phrases and sentences.
• Sentences need to be ordered so that information is conveyed correctly.
• In short: NLU is analysis, NLG is synthesis.
MAIN APPROACHES ADOPTED BY NLP
• Symbolic approach (from the 1950s)
• rooted in linguistics.
• Given the rules of syntax and grammar, the structure of a text can be obtained.
• Using logic, the meaning can then be derived.
• But rules had to be hand-crafted and were often
numerous.
• They didn't handle colloquial text well.
• Rules worked well for specific use cases but couldn't be
generalized.
• Statistical approach (from the 1980s)
• Rules were learned from data and had associated probabilities.
• ML models such as support vector machines and logistic regression came into use.
• More recently, Deep Learning (DL) models that employ a
neural network of many layers have brought better
accuracy.
• This success is partly due to the more efficient
representations given by word embeddings.
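To make "rules were learned and had associated probabilities" concrete, here is a minimal sketch of a bigram model estimated by counting. The three-sentence corpus is invented for illustration, and no smoothing is applied, which a practical model would need.

```python
from collections import Counter

# Tiny invented corpus, already tokenized.
corpus = [
    ["the", "cat", "sits"],
    ["the", "cat", "sleeps"],
    ["the", "dog", "sits"],
]

context_counts = Counter()  # how often each word is followed by another word
bigram_counts = Counter()   # how often each (previous, next) pair occurs
for sentence in corpus:
    context_counts.update(sentence[:-1])
    bigram_counts.update(zip(sentence, sentence[1:]))


def bigram_prob(prev: str, word: str) -> float:
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen contexts."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


if __name__ == "__main__":
    print(bigram_prob("the", "cat"))   # "the" is followed by "cat" in 2 of 3 cases
    print(bigram_prob("cat", "sits"))  # "cat" is followed by "sits" in 1 of 2 cases
```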
NLP involves different levels (scopes) of analysis.
STAGES OF NLP
LEXICAL AND MORPHOLOGICAL ANALYSIS
• The lexical phase involves scanning text and breaking it down into smaller units (tokens).
• Tokenization is essential for understanding and processing text at the word level.
• In addition to tokenization, various data cleaning and feature extraction techniques are applied, including lemmatization, stop-word removal, and correcting misspelled words.
• Morphological analysis focuses on identifying morphemes. Understanding morphemes is vital for grasping the structure of words and their relationships.
• Types of morphemes: free morphemes (can stand alone, e.g., "cat") and bound morphemes (must attach to another morpheme, e.g., "-s").
• Importance of morphological analysis: understanding word structure, predicting word forms, improving accuracy.
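The lexical phase described above can be sketched with a regular-expression tokenizer followed by stop-word removal. The token pattern and the small STOP_WORDS set are assumptions for this example; real stop-word lists are much longer.

```python
import re

# Small assumed stop-word list for illustration.
STOP_WORDS = {"the", "is", "a", "an", "on", "of"}


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Keep only tokens that are not in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    tokens = tokenize("The cat sits on the mat.")
    print(tokens)
    print(remove_stop_words(tokens))
```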
SYNTACTIC ANALYSIS (PARSING)
• Essential for understanding the structure of a sentence and assessing its grammatical correctness.
• Involves analyzing the relationships between words and ensuring their consistency by comparing their arrangement against standard grammatical rules.
• Role: examines the grammatical structure and relationships within a given text and assigns Part-of-Speech (POS) tags to each word.
• This tagging is crucial for understanding how words relate to each other syntactically and helps in reducing ambiguity.
Sentence: "John eats an apple."
POS Tags:
• John: Proper Noun (NNP)
• eats: Verb (VBZ)
• an: Determiner (DT)
• apple: Noun (NN)
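The tagging shown above can be reproduced with a minimal lookup-based sketch. The LEXICON holds only the four words from the slide and is an assumption for illustration; practical taggers use the surrounding context and a trained model rather than a fixed dictionary.

```python
# Hypothetical mini-lexicon with the tags used on the slide.
LEXICON = {"john": "NNP", "eats": "VBZ", "an": "DT", "apple": "NN"}


def tag(sentence: str) -> list[tuple[str, str]]:
    """Assign a POS tag to each word by dictionary lookup; unknown words get 'UNK'."""
    tokens = sentence.rstrip(".").split()
    return [(tok, LEXICON.get(tok.lower(), "UNK")) for tok in tokens]


if __name__ == "__main__":
    for word, pos in tag("John eats an apple."):
        print(f"{word}: {pos}")
```

A pure lookup cannot resolve words with several possible tags (e.g., "book" as noun or verb), which is exactly where context-aware tagging is needed.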
SEMANTIC ANALYSIS
• Focuses on extracting the meaning from text.
• Concerned with the literal and contextual meaning of words, phrases, and sentences.
• Determines whether the arrangement of words in a sentence makes logical sense.
• Helps in finding context and logic by ensuring the semantic coherence of sentences.
• Key Tasks:
• Named Entity Recognition (NER): identifies and classifies entities within the text.
• Word Sense Disambiguation (WSD): determines the correct meaning of ambiguous words based on context.
• Example: "Orange eats a Mary" is grammatically correct but does not make sense semantically.
DISCOURSE INTEGRATION
• Comprehending the relationship between the current sentence and earlier sentences or the larger context.
• Contextualizing text and understanding the overall message conveyed.
• Role: examines how words, phrases, and sentences relate to each other within a larger context.
• Assesses the impact of a word or sentence and how the combination of sentences affects the overall meaning.
• Helps in understanding implicit references and the flow of information across sentences.
• Example: in "This is unfair!", working out what "this" refers to requires examining the preceding or following sentences.
PRAGMATIC ANALYSIS
• Focuses on interpreting the inferred meaning of a text beyond its literal content.
• Role: aims to grasp the deeper meaning in communication, i.e., what the writer or speaker truly intends to convey.
• Importance of understanding intentions: the word "Hello" can have various interpretations depending on the tone and context in which it is spoken.
• Example: "Hello! What time is it?" might be a straightforward request for the current time, but it could also imply concern about being late.
NLP PIPELINE
• An NLP pipeline is a sequence of interconnected steps that systematically transform raw text data into a desired output.
• It is analogous to a factory assembly line, where each step refines the material until it reaches its final form.
• This pipeline is not universal.
• This is an ML pipeline; deep learning pipelines are slightly different.
• An NLP pipeline is non-linear (stages can have more dynamic connections, allowing for branching and iteration).
DATA ACQUISITION
Objective: Collect raw text data for creating a robust dataset.
• Data Available:
• On Your Desk: Begin text preprocessing immediately.
• In Databases: Collaborate with data engineers to retrieve
data.
• Insufficient Data: Use data augmentation techniques:
• Synonym replacement.
• Bigram flip.
• Back translation.
• Adding noise.
• Data from External Sources:
• Use public datasets (Kaggle, UCI, government
repositories).
• Extract data using web scraping (e.g., BeautifulSoup,
Scrapy).
• Access APIs (e.g., Twitter, Reddit, news aggregators).
• Extract text from PDFs (e.g., PyPDF2, PDFMiner).
• No Existing Data:
• Collaborate with trusted clients for anonymized data.
• Collect new data through surveys, interviews, or user-generated content.
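Two of the augmentation techniques listed above, synonym replacement and bigram flip, can be sketched as follows. The SYNONYMS table is a hypothetical hand-made dictionary; in practice the synonyms would come from a thesaurus or an embedding model.

```python
import random

# Hypothetical synonym table for illustration.
SYNONYMS = {"quick": ["fast", "speedy"], "happy": ["glad", "cheerful"]}


def synonym_replace(tokens: list[str], rng: random.Random) -> list[str]:
    """Replace every token that has an entry in SYNONYMS with a random synonym."""
    return [rng.choice(SYNONYMS[t]) if t in SYNONYMS else t for t in tokens]


def bigram_flip(tokens: list[str], rng: random.Random) -> list[str]:
    """Swap one randomly chosen pair of adjacent tokens."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens) - 1)
    flipped = list(tokens)
    flipped[i], flipped[i + 1] = flipped[i + 1], flipped[i]
    return flipped


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same variants
    tokens = ["the", "quick", "dog", "is", "happy"]
    print(synonym_replace(tokens, rng))
    print(bigram_flip(tokens, rng))
```

Both functions return new lists, so the original sample stays available alongside its augmented variants.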
TEXT PREPROCESSING
Objective: Clean and standardize text for meaningful analysis.
Steps:
• Basic Cleaning:
• Remove HTML tags and irrelevant formatting.
• Handle emojis (convert or remove).
• Perform spell checks for consistency.
• Basic Preprocessing:
• Tokenize text into words or sentences.
• Remove stop words (e.g., “the,” “is”).
• Apply stemming or lemmatization.
• Convert text to lowercase.
• Detect the text’s language.
• Advanced Preprocessing:
• Perform Part-of-Speech (POS) tagging.
• Conduct parsing for grammatical
structure.
• Resolve coreferences for coherent
understanding.
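The basic cleaning steps above (HTML removal, emoji handling, lowercasing) can be sketched with the standard library alone. The emoji character ranges are an assumption that covers common blocks only, and a regex is a shortcut for stripping tags; a proper HTML parser is more robust on real pages.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")
# Assumed, non-exhaustive emoji ranges (pictographs and misc. symbols).
EMOJI_RE = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def basic_clean(text: str) -> str:
    """Remove HTML tags and emojis, decode entities, lowercase, collapse spaces."""
    text = TAG_RE.sub(" ", text)      # drop HTML tags
    text = html.unescape(text)        # e.g. "&amp;" becomes "&"
    text = EMOJI_RE.sub("", text)     # remove emojis (could be mapped to words instead)
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(basic_clean("<p>Great   product &amp; fast delivery \U0001F600</p>"))
```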
FEATURE ENGINEERING
Objective: Convert text into numerical features for models.
• Techniques:
• Bag of Words (BoW): Frequency-based representation of unique words.
• TF-IDF: Weights a word by its frequency within a document and its rarity across the corpus.
• One-Hot Encoding: Binary vectors for words, effective for small vocabularies.
• Word Embeddings: Dense vector representations capturing semantic meaning (e.g., Word2Vec,
GloVe, FastText).
• N-Gram Models: Capture sequences of adjacent words (bigrams, trigrams).
• Dependency Parsing: Capture relationships between words through syntactic dependencies.
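Three of the techniques above (Bag of Words, TF-IDF, n-grams) can be sketched from scratch. The three-document corpus is invented, and the TF-IDF variant used here (relative term frequency times log(N/df), no smoothing) is one common textbook form; library implementations typically add smoothing and normalization, so their numbers differ.

```python
import math
from collections import Counter

# Invented, already-tokenized corpus.
docs = [["nlp", "is", "fun"], ["nlp", "is", "hard"], ["cats", "sleep"]]


def bag_of_words(tokens: list[str]) -> Counter:
    """Count how often each word occurs in one document."""
    return Counter(tokens)


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Weight = (count / doc length) * log(N / document frequency)."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return weights


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return all runs of n adjacent tokens."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    print(bag_of_words(docs[0]))
    print(tf_idf(docs)[0])
    print(ngrams(docs[0], 2))
```

In the first document, "fun" gets a higher weight than "nlp" because it appears in only one of the three documents.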
MODELLING
Objective: Train models to perform NLP tasks.
• Approaches:
• Heuristic Models:
• Rule-based systems for specific patterns (e.g., keyword matching).
• Machine Learning:
• Support Vector Machines (SVM): Effective for text classification.
• Random Forests: Suitable for sentiment analysis or categorization.
• Deep Learning:
• RNNs: Handle sequence-based tasks like language modeling.
• Transformers: Capture long-range dependencies for tasks like translation and summarization.
• Cloud APIs:
• Google Cloud, Microsoft Azure: Provide pre-trained models for rapid prototyping.
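The heuristic (rule-based) approach above can be sketched as a keyword-matching sentiment classifier. The POSITIVE and NEGATIVE word lists are assumptions for this example; such a baseline ignores negation and context, which is where the ML and deep learning models listed above usually do better.

```python
import re

# Assumed keyword lists for illustration.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}


def classify(text: str) -> str:
    """Label text by comparing counts of positive and negative keywords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(classify("Great phone, but terrible battery and poor screen."))
    print(classify("I love it"))
```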
EVALUATION
Objective: Assess model performance.
• Types:
• Intrinsic Evaluation:
• Accuracy, Precision, Recall, F1-Score.
• BLEU for translation, Perplexity for language models.
• Extrinsic Evaluation:
• Business metrics (e.g., customer satisfaction, revenue impact).
• Task-specific metrics (e.g., classification accuracy).
• User-centric evaluation (feedback, surveys).
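The intrinsic metrics above can be computed directly from predicted and true labels. This is a minimal sketch for binary classification; the label lists in the usage example are invented.

```python
def precision_recall_f1(y_true: list[int], y_pred: list[int], positive: int = 1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 1, 0]
    # Here TP = 2, FP = 1, FN = 1, so all three metrics equal 2/3.
    print(precision_recall_f1(y_true, y_pred))
```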
DEPLOYMENT
Objective: Implement the model in real-world applications.
Steps:
• Deployment:
• Integrate the model into production systems.
• Set up infrastructure for scalability and reliability.
• Validate functionality through testing.
• Monitoring:
• Continuously monitor performance and behavior.
• Implement alerts for deviations or anomalies.
• Updates:
• Adapt to dynamic data and retrain models periodically.
• Maintain version control for transparency.
• Address evolving user needs based on feedback.
CHALLENGES IN NLP
• Diversity in Language and Communication
• Challenges in Sourcing and Preparing Training Data
• Time and Resource Demands for NLP Development
• Dealing with Ambiguity in Phrasing and Meaning
• Correcting Spelling and Grammar Errors
• Addressing Bias and Fairness in NLP Models
• Handling Lexical Ambiguity and Multiple Meanings
• Overcoming Multilingual and Cross-Cultural Barriers
• Minimizing Uncertainty and False Positive Predictions
• Enabling Seamless and Ongoing Conversations
HOW TO OVERCOME NLP CHALLENGES
• Enhance Data Quantity and Quality
• Use high-quality, diverse datasets to train NLP models effectively.
• Apply techniques like data augmentation, data synthesis, and crowdsourcing to address data scarcity.
• Handle Ambiguity in Language
• Train NLP algorithms to disambiguate words and phrases using context and semantic analysis.
• Address Out-of-Vocabulary (OOV) Words
• Implement techniques like subword tokenization, character-level modeling, and vocabulary expansion to manage OOV words.
• Tackle Lack of Annotated Data
• Use transfer learning and pre-training to leverage large datasets and apply knowledge to tasks with limited
labeled data.
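Character-level modeling for OOV words can be illustrated by comparing character trigrams: an unseen or misspelled word is matched to the known word it shares the most trigrams with. The vocabulary and the Jaccard similarity measure are assumptions for this sketch; it shows the idea rather than any particular library's method.

```python
def char_ngrams(word: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a word, with boundary markers."""
    padded = f"<{word}>"
    return {padded[i : i + n] for i in range(len(padded) - n + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap between the character n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def nearest_known(word: str, vocab: list[str]) -> str:
    """Map an out-of-vocabulary word to its most similar known word."""
    return max(vocab, key=lambda known: similarity(word, known))


if __name__ == "__main__":
    vocab = ["tokenizing", "translation", "token"]  # assumed known vocabulary
    print(nearest_known("tokenizng", vocab))
```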
SCOPE OF NLP
Text Processing and Analysis
• Sentiment Analysis: Understanding opinions and sentiments from text data (e.g., social media,
reviews).
• Text Summarization: Generating concise summaries of lengthy documents or articles.
• Topic Modeling: Identifying hidden topics within text datasets.
• Text Classification: Categorizing emails, documents, and news articles into predefined groups.
Human-Computer Interaction
• Chatbots: Enhancing customer service through conversational agents.
• Virtual Assistants: Powering voice-based systems like Siri, Alexa, and Google Assistant.
• Speech-to-Text and Text-to-Speech: Enabling accessibility for visually or hearing-impaired users.
SCOPE OF NLP (CONT…)
Healthcare Applications
• Clinical Text Analysis: Extracting insights from electronic health records (EHRs).
• Medical Chatbots: Offering basic medical advice and appointment scheduling.
• Drug Discovery: Analyzing medical literature for drug development.
• Disease Prediction: Detecting early signs of illness from patient records.
Language Translation and Localization
• Machine Translation: Tools like Google Translate for multilingual communication.
• Localization: Adapting content for cultural and regional relevance.
• Cross-Language Information Retrieval: Searching for information across languages.
SCOPE OF NLP (CONT…)
Business and Marketing
• Customer Sentiment Analysis: Understanding customer feedback and improving products/services.
• Personalized Marketing: Crafting targeted campaigns based on user behavior and preferences.
• Automated Report Generation: Summarizing business insights from data analytics.
Education and E-Learning
• Grammar and Spell Checking: Tools like Grammarly for improving written communication.
• Content Recommendation: Tailoring learning materials based on user progress and preferences.
• Language Learning: Interactive tools for acquiring new languages.
SCOPE OF NLP (CONT…)
Media and Entertainment
• Content Moderation: Detecting inappropriate or harmful content.
• Script Analysis: Generating or analyzing scripts for movies or shows.
• Automated Subtitles: Generating real-time captions for videos.
Legal and Compliance
• Document Review: Analyzing contracts and legal documents for compliance.
• Case Law Analysis: Extracting insights from legal precedents.
• Regulatory Monitoring: Keeping track of changes in compliance requirements.
SCOPE OF NLP (CONT…)
Research and Development
• Knowledge Graphs: Building relationships between entities for research purposes.
• Question-Answering Systems: Advanced AI models for research and academic purposes.
• Scientific Literature Analysis: Summarizing and categorizing research papers.
Emerging Areas
• Emotion Detection: Understanding emotions conveyed in text or speech.
• Real-Time Applications: Real-time language translation and sentiment tracking.
• Ethical AI in NLP: Developing models that mitigate bias and ensure fairness.
• Multimodal NLP: Integrating text with images, audio, and video for deeper insights.
APPLICATIONS
Chatbots
• Simulate human-like conversation using Natural Language Processing (NLP) and Machine
Learning (ML).
• Understand complex language and improve over time by learning from interactions.
• Function through two steps: understanding user input and providing appropriate responses.
Autocomplete in Search Engines
• Suggest possible completions for typed queries based on keyword predictions.
• Analyze vast datasets and patterns to provide relevant suggestions.
• NLP identifies relationships between words to predict user intent.
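Autocomplete can be sketched as ranking past queries that start with the typed prefix by how often they occurred. The QUERY_LOG is invented, and frequency ranking is a simplifying assumption; production systems use far larger logs and additional signals.

```python
from collections import Counter

# Invented query log for illustration.
QUERY_LOG = [
    "nlp tutorial",
    "nlp pipeline",
    "nlp tutorial",
    "natural language",
    "nlp pipeline",
    "nlp tutorial",
]


def suggest(prefix: str, log: list[str], k: int = 2) -> list[str]:
    """Return up to k past queries starting with the prefix, most frequent first."""
    counts = Counter(q for q in log if q.startswith(prefix.lower()))
    return [query for query, _ in counts.most_common(k)]


if __name__ == "__main__":
    print(suggest("nlp", QUERY_LOG))
    print(suggest("na", QUERY_LOG))
```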
APPLICATIONS (CONT…)
Voice Assistants
• Examples: Siri, Alexa, Google Assistant.
• Perform tasks such as making calls, setting reminders, and surfing the internet using voice commands.
• Utilize speech recognition, natural language understanding, and NLP for interaction.
Language Translators
• Translate text between languages using Sequence-to-Sequence modeling.
• Have transitioned from Statistical Machine Translation (SMT) to more advanced neural models for improved accuracy.
• Example: Google Translate, which learns the patterns and vocabulary of languages.
APPLICATIONS (CONT…)
Sentiment Analysis
• Analyze user sentiments on social media, reviews, or feedback.
• Employ NLP, text analysis, and computational linguistics to classify sentiments as positive, negative, or neutral.
• Helps businesses gauge public opinion, understand brand perception, and improve services.
Grammar Checkers
• Enhance professional and academic writing by correcting grammar and spelling errors.
• Suggest synonyms and improve readability using NLP algorithms trained on large datasets.
• Essential for producing polished and error-free content.
APPLICATIONS (CONT…)
Email Classification and Filtering
• Categorize emails into sections like Primary, Social, and Promotions using text classification.
• NLP identifies the context and content of emails to automate sorting.
• Improves productivity by decluttering inboxes and organizing communication.
Electronic Health Records (EHR) Analysis
• Extract and organize unstructured data from clinical notes, discharge summaries, and patient
histories.
• Streamline documentation and provide physicians with actionable insights.
• Enable faster and more accurate diagnosis through data-driven decision-making.
SUMMARY OF THE SESSION

operating system notes for II year IV semester students

  • 1.
    NATURAL LANGUAGE PROCESSING(NLP) Dr. Siron AnitaSusan T Assistant Professor SRM Institute of Technology Tiruchirappalli Overview of NLP: Definition, Scope, Applications of NLP
  • 2.
    HISTORY OF NLP 1948 1957 1966 1970s FirstNLP Application - Dictionary Look-up system at Birkbeck college Chomsky's Syntactic Structures – Revolution linguistics (influence BNF &RE) ALPAC Report - machine translation, advances in grammar and semantics NLP influenced by AI – LUNAR , Prolog 1980s 1982 Statistical NLP Emerges – Hidden Markov model Project Jabberwacky – early chatbox
  • 3.
    HISTORY OF NLP 1988 2001 2003 2013 FrameNetProject – Shallow semantic parsing Word Embeddings Introduced – NN for language modelling Latent Dirichlet Allocation (LDA) – topic modelling in ML NLP influenced by AI – improves Word Embeddings (CNN) Mar 2016 Sep 2016 Microsoft’s Tay Chatbot – highlight ethical challenges in AI Google Neural Machine Translation (NMT) –reduce translation errors(deep LSTM)
  • 4.
    HOW NLP? • Naturallanguages-humans use to communicate • Computers have their own programming languages and were not meant to understand natural languages. • Why not speak to the computer and let it respond in a natural language? This is one of the aims of Natural Language Processing (NLP) – machine Translation • NLP is rooted in the theory of linguistics • The process of computer analysis of input provided in a human language (natural language), and conversion of this input into a useful form of representation.
  • 5.
    WHAT IS NLP? •Natural Language Processing is a subset technique of Artificial Intelligence that is used to narrow the communication gap between the Computer and Human. • Techniques from machine learning and deep neural networks have also been successfully applied to NLP problems. • While many practical applications of NLP already exist, NLP has many unsolved problems. • The field of NLP is primarily concerned with getting computers to perform useful and interesting tasks with human languages. • The field of NLP is secondarily concerned with helping us come to a better understanding of human language. • Eg: Customer support service for a product
  • 6.
    WHY IT ISIMPORTANT? • Faster response • Without Bias • Manage more volume of data • To learn more GOALS • Scientific- Computer to understand • Practical – Using available data
  • 7.
    WHY DO COMPUTERSHAVE DIFFICULTY WITH NLP? • Computers - dealing with structured data(organized, indexed and referenced) • In NLP, we often deal with unstructured data. • Eg: Social media posts, news articles, emails, and product reviews are examples of text-based unstructured data. • To process such text, NLP has to learn the structure and grammar of the natural language. • Importantly, 80% of enterprise data is unstructured.
  • 8.
    HOW IT ISWORKING:GENERAL SKETCH USER MACHINE TEXT PROCESS RESPONSE Input Convert ML Out Audio/Text
  • 9.
    Natural Language Processing isdivided into sub-areas, i.e., Natural Language Generation and Natural Language Understanding, which are, as the name suggests, associated with the generation and understanding of the text. TERMSTO UNDERSTAND
  • 10.
    SYNTAX • Definition: Theset of rules that governs the structure and order of words in sentences. • Importance: Helps in understanding sentence structure and grammar. • Example: • Correct Syntax: "The cat sits on the mat." • Incorrect Syntax: "Cat the mat on sits." • Applications in NLP: • Part-of-Speech (POS) tagging. • Syntactic parsing to analyze grammatical structure.
  • 11.
    MORPHOLOGY • Definition: Thestudy of the structure and formation of words, including roots, prefixes, suffixes, and inflections. • Types: • Inflectional Morphology: Changes in word form to express tense, number, gender, etc. (e.g., "walk" → "walked"). • Derivational Morphology: Formation of new words by adding affixes (e.g., "happy" → "happiness"). • Importance: Helps in breaking words into meaningful components. • Applications in NLP: • Lemmatization and stemming. • Spell correction and word segmentation.
  • 12.
    SEMANTICS • Definition: Thestudy of meaning in language, including word meanings and sentence meanings. • Importance: Focuses on understanding the meaning of text. • Example: • "The bank is by the river" (financial institution vs. riverbank). • Applications in NLP: • Word sense disambiguation. • Named entity recognition (NER). • Sentiment analysis.
  • 13.
    PRAGMATICS • Definition: Thestudy of how context influences the interpretation of meaning in language. • Importance: Goes beyond literal meanings to understand implied meanings and situational context. • Example: • Literal: "Can you pass the salt?" (Yes, I can.) • Pragmatic: (Actually passing the salt). • Applications in NLP: • Chatbots and conversational agents. • Sarcasm and irony detection. • Context-aware language generation.
  • 14.
    PHONOLOGY • Definition: Thestudy of the sound systems of a language and how sounds are organized and used. • Importance: Relevant in speech-related NLP tasks. • Applications in NLP: • Speech recognition (converting speech to text). • Text-to-speech systems (TTS). • Pronunciation correction tools.
  • 15.
    NLU and NLG •Natural Language Understanding (NLU): • involves converting speech or text into useful representations on which analysis can be performed. • Goal- to resolve ambiguities, obtain context and understand the meaning of what's being said. • NLU is about semantic relationships and meaning. • NLU tackles the complexities of language beyond the basic sentence structure. • Natural Language Generation (NLG): • Given an internal representation. • involves selecting the right words, forming phrases and sentences. • Sentences need to ordered so that information is conveyed correctly. analysis synthesis
  • 16.
    MAIN APPROACHES ADOPTEDBY NLP • Symbolic approach (from the 1950s) • rooted in linguistics. • Given the rules of syntax and grammar - obtain the structure of text. • Using logic, we could obtain the meaning. • But rules had to be hand-crafted and were often numerous. • They didn't handle colloquial text well. • Rules worked well for specific use cases but couldn't be generalized. • Statistical approach(from the 1980s) • Rules were learned and they had associated probabilities. • ML models came in with support vector machines and logistic regression. • More recently, Deep Learning (DL) models that employ a neural network of many layers have brought better accuracy. • This success is partly due to the more efficient representations given by word embeddings. NLP involves different levels or scope of analysis.
  • 17.
  • 18.
    LEXICAL AND MORPHOLOGICALANALYSIS • The lexical phase - involves scanning text and breaking it down into smaller units (tokens) . • Tokenization is essential for understanding and processing text at the word level. • In addition to tokenization, various data cleaning and feature extraction techniques are applied, including: Lemmatization, Stopwords Removal, Correcting Misspelled Words. • Morphological Analysis - focusing on identifying morphemes, Understanding morphemes is vital for grasping the structure of words and their relationships. • Types of Morphemes: Free Morphemes and Bound Morphemes • Importance of Morphological Analysis- Understanding Word Structure, Predicting Word Forms, Improving Accuracy
  • 19.
    SYNTACTIC ANALYSIS (PARSING) •essential for understanding the structure of a sentence and assessing its grammatical correctness. • It involves analyzing the relationships between words and ensuring their logical consistency by comparing their arrangement against standard grammatical rules. • Role- examines the grammatical structure and relationships within a given text and assigns Parts-Of-Speech (POS) tags to each word • This tagging is crucial for understanding how words relate to each other syntactically and helps in avoiding ambiguity Sentence: "John eats an apple." POS Tags: • John: Proper Noun (NNP) • eats: Verb (VBZ) • an: Determiner (DT) • apple: Noun (NN)
  • 20.
    SEMANTIC ANALYSIS • focusingon extracting the meaning from text. • concerned with the literal and contextual meaning of words, phrases, and sentences. • determines whether the arrangement of words in a sentence makes logical sense • helps in finding context and logic by ensuring the semantic coherence of sentences. • Key Tasks: • Named Entity Recognition (NER): identifies and classifies entities within the text • Word Sense Disambiguation (WSD): determines the correct meaning of ambiguous words based on context • Example- “Orange eats a Mary” - grammatically correct but does not make sense semantically.
  • 21.
    DISCOURSE INTEGRATION • comprehendingthe relationship between the current sentence and earlier sentences or the larger context. • contextualizing text and understanding the overall message conveyed. • Role- examines how words, phrases, and sentences relate to each other within a larger context. • assesses the impact a word or sentence and how the combination of sentences affects the overall meaning. • helps in understanding implicit references and the flow of information across sentences. • Example: "This is unfair!“ - "this" - need to examine the preceding or following sentences
  • 22.
    PRAGMATIC ANALYSIS • focusingon interpreting the inferred meaning of a text beyond its literal content. • Role- aims to grasp these deeper meanings in communication. i.e what the writer or speaker truly intends to convey? • Importance of Understanding Intentions - the word "Hello" can have various interpretations depending on the tone and context in which it is spoken. • Example: "Hello! What time is it?“ -might be a straightforward request for the current time, but it could also imply concern about being late.
  • 23.
    NLP PIPELINE • NLPpipeline is a sequence of interconnected steps that systematically transform raw text data into a desired output. • It’s analogous to a factory assembly line, where each step refines the material until it reaches its final form. • This pipeline is not universal. • This is ML pipeline and deep learning pipelines are slightly different. • NLP pipeline is non-linear (that means stages can have more dynamic connections, allowing for branching and iteration).
  • 24.
    DATA ACQUISITION Objective: Collectraw text data for creating a robust dataset. • Data Available: • On Your Desk: Begin text preprocessing immediately. • In Databases: Collaborate with data engineers to retrieve data. • Insufficient Data: Use data augmentation techniques: • Synonym replacement. • Bigram flip. • Back translation. • Adding noise. • Data from External Sources: • Use public datasets (Kaggle, UCI, government repositories). • Extract data using web scraping (e.g., BeautifulSoup, Scrapy). • Access APIs (e.g., Twitter, Reddit, news aggregators). • Extract text from PDFs (e.g., PyPDF2, PDFMiner). • No Existing Data: • Collaborate with trusted clients for anonymized data. • Generate synthetic data through surveys, interviews, or user-generated content.
  • 25.
    TEXT PREPROCESSING Objective: Cleanand standardize text for meaningful analysis. Steps: • Basic Cleaning: • Remove HTML tags and irrelevant formatting. • Handle emojis (convert or remove). • Perform spell checks for consistency. • Basic Preprocessing: • Tokenize text into words or sentences. • Remove stop words (e.g., “the,” “is”). • Apply stemming or lemmatization. • Convert text to lowercase. • Detect the text’s language. • Advanced Preprocessing: • Perform Part-of-Speech (POS) tagging. • Conduct parsing for grammatical structure. • Resolve coreferences for coherent understanding.
  • 26.
    FEATURE ENGINEERING Objective: Converttext into numerical features for models. • Techniques: • Bag of Words (BoW): Frequency-based representation of unique words. • TF-IDF: Weighs word importance based on frequency and rarity. • One-Hot Encoding: Binary vectors for words, effective for small vocabularies. • Word Embeddings: Dense vector representations capturing semantic meaning (e.g., Word2Vec, GloVe, FastText). • N-Gram Models: Capture sequences of adjacent words (bigrams, trigrams). • Dependency Parsing: Capture relationships between words through syntactic dependencies.
  • 27.
    MODELLING Objective: Train models to perform NLP tasks. • Approaches: • Heuristic Models: • Rule-based systems for specific patterns (e.g., keyword matching). • Machine Learning: • Support Vector Machines (SVM): Effective for text classification. • Random Forests: Suitable for sentiment analysis or categorization. • Deep Learning: • RNNs: Handle sequence-based tasks like language modeling. • Transformers: Capture long-range dependencies for tasks like translation and summarization. • Cloud APIs: • Google Cloud, Microsoft Azure: Provide pre-trained models for rapid prototyping.
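The heuristic approach is the simplest of these to illustrate. The sketch below is a rule-based sentiment classifier built on keyword matching; the `POSITIVE` and `NEGATIVE` keyword sets are hypothetical and chosen only for the example, and tokens are split on whitespace, so punctuation attached to a word would prevent a match.

```python
# Hypothetical keyword lists chosen for illustration only.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def rule_based_sentiment(text):
    """Label text by comparing counts of positive and negative keywords."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for review in [
    "I love this great phone",
    "terrible battery and poor screen",
    "it arrived on tuesday",
]:
    print(review, "->", rule_based_sentiment(review))
```

A baseline like this needs no training data, which makes it a useful reference point before moving to machine learning or deep learning models.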
  • 28.
    EVALUATION Objective: Assess model performance. • Types: • Intrinsic Evaluation: • Accuracy, Precision, Recall, F1-Score. • BLEU for translation, Perplexity for language models. • Extrinsic Evaluation: • Business metrics (e.g., customer satisfaction, revenue impact). • Task-specific metrics (e.g., classification accuracy). • User-centric evaluation (feedback, surveys).
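The intrinsic classification metrics named above follow directly from counts of true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The sketch below computes them for a binary task; the labels are made-up example data and the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up labels for illustration: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(acc, precision, recall, f1)
```

In this example there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all equal 2/3, and 4 of the 6 predictions are correct.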
  • 29.
    DEPLOYMENT Objective: Implement the model in real-world applications. Steps: • Deployment: • Integrate the model into production systems. • Set up infrastructure for scalability and reliability. • Validate functionality through testing. • Monitoring: • Continuously monitor performance and behavior. • Implement alerts for deviations or anomalies. • Updates: • Adapt to dynamic data and retrain models periodically. • Maintain version control for transparency. • Address evolving user needs based on feedback.
  • 30.
    CHALLENGES IN NLP • Diversity in Language and Communication • Challenges in Sourcing and Preparing Training Data • Time and Resource Demands for NLP Development • Dealing with Ambiguity in Phrasing and Meaning • Correcting Spelling and Grammar Errors • Addressing Bias and Fairness in NLP Models • Handling Lexical Ambiguity and Multiple Meanings • Overcoming Multilingual and Cross-Cultural Barriers • Minimizing Uncertainty and False Positive Predictions • Enabling Seamless and Ongoing Conversations
  • 31.
    HOW TO OVERCOME NLP CHALLENGES • Enhance Data Quantity and Quality • Use high-quality, diverse datasets to train NLP models effectively. • Apply techniques like data augmentation, data synthesis, and crowdsourcing to address data scarcity. • Handle Ambiguity in Language • Train NLP algorithms to disambiguate words and phrases using context and semantic analysis. • Address Out-of-Vocabulary (OOV) Words • Implement techniques like tokenization, character-level modeling, and vocabulary expansion to manage OOV words. • Tackle Lack of Annotated Data • Use transfer learning and pre-training to leverage large datasets and apply knowledge to tasks with limited labeled data.
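One way to picture character-level handling of OOV words is to break each word into character n-grams and compare an unseen word with the known vocabulary by n-gram overlap. The sketch below is an illustrative assumption, not a method prescribed by these slides: the vocabulary, the boundary markers `<` and `>`, and the use of Jaccard similarity are all choices made for the example.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word, with boundary markers."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def nearest_known(word, vocabulary):
    """Find the vocabulary word whose character n-grams overlap most with `word`."""
    target = char_ngrams(word)
    return max(vocabulary, key=lambda v: jaccard(target, char_ngrams(v)))

# Hypothetical vocabulary; "translation" is treated as out-of-vocabulary.
vocabulary = ["translate", "token", "model"]
oov_word = "translation"
match = nearest_known(oov_word, vocabulary)
print(oov_word, "->", match)
```

Because "translation" shares most of its trigrams with "translate", a model that works on character n-grams can still relate the unseen word to something it knows, instead of mapping it to a generic unknown token.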
  • 32.
    SCOPE OF NLP Text Processing and Analysis • Sentiment Analysis: Understanding opinions and sentiments from text data (e.g., social media, reviews). • Text Summarization: Generating concise summaries of lengthy documents or articles. • Topic Modeling: Identifying hidden topics within text datasets. • Text Classification: Categorizing emails, documents, and news articles into predefined groups. Human-Computer Interaction • Chatbots: Enhancing customer service through conversational agents. • Virtual Assistants: Powering voice-based systems like Siri, Alexa, and Google Assistant. • Speech-to-Text and Text-to-Speech: Enabling accessibility for visually or hearing-impaired users.
  • 33.
    SCOPE OF NLP (CONT…) Healthcare Applications • Clinical Text Analysis: Extracting insights from electronic health records (EHRs). • Medical Chatbots: Offering basic medical advice and appointment scheduling. • Drug Discovery: Analyzing medical literature for drug development. • Disease Prediction: Detecting early signs of illness from patient records. Language Translation and Localization • Machine Translation: Tools like Google Translate for multilingual communication. • Localization: Adapting content for cultural and regional relevance. • Cross-Language Information Retrieval: Searching for information across languages.
  • 34.
    SCOPE OF NLP (CONT…) Business and Marketing • Customer Sentiment Analysis: Understanding customer feedback and improving products/services. • Personalized Marketing: Crafting targeted campaigns based on user behavior and preferences. • Automated Report Generation: Summarizing business insights from data analytics. Education and E-Learning • Grammar and Spell Checking: Tools like Grammarly for improving written communication. • Content Recommendation: Tailoring learning materials based on user progress and preferences. • Language Learning: Interactive tools for acquiring new languages.
  • 35.
    SCOPE OF NLP (CONT…) Media and Entertainment • Content Moderation: Detecting inappropriate or harmful content. • Script Analysis: Generating or analyzing scripts for movies or shows. • Automated Subtitles: Generating real-time captions for videos. Legal and Compliance • Document Review: Analyzing contracts and legal documents for compliance. • Case Law Analysis: Extracting insights from legal precedents. • Regulatory Monitoring: Keeping track of changes in compliance requirements.
  • 36.
    SCOPE OF NLP (CONT…) Research and Development • Knowledge Graphs: Building relationships between entities for research purposes. • Question-Answering Systems: Advanced AI models for research and academic purposes. • Scientific Literature Analysis: Summarizing and categorizing research papers. Emerging Areas • Emotion Detection: Understanding emotions conveyed in text or speech. • Real-Time Applications: Real-time language translation and sentiment tracking. • Ethical AI in NLP: Developing models that mitigate bias and ensure fairness. • Multimodal NLP: Integrating text with images, audio, and video for deeper insights.
  • 37.
    APPLICATIONS Chatbots • Simulate human-like conversation using Natural Language Processing (NLP) and Machine Learning (ML). • Understand complex language and improve over time by learning from interactions. • Function through two steps: understanding user input and providing appropriate responses. Autocomplete in Search Engines • Suggest possible completions for typed queries based on keyword predictions. • Analyze vast datasets and patterns to provide relevant suggestions. • NLP identifies relationships between words to predict user intent.
  • 38.
    APPLICATIONS (CONT…..) Voice Assistants • Examples: Siri, Alexa, Google Assistant. • Perform tasks such as making calls, setting reminders, and surfing the internet using voice commands. • Utilize speech recognition, natural language understanding, and NLP for interaction. Language Translators • Translate text between languages using Sequence-to-Sequence modeling. • Transitioned from Statistical Machine Translation (SMT) to advanced NLP models for improved accuracy. • Examples: Google Translate, which identifies patterns and vocabulary of languages.
  • 39.
    APPLICATIONS (CONT…..) Sentiment Analysis • Analyze user sentiments on social media, reviews, or feedback. • Employ NLP, text analysis, and computational linguistics to classify sentiments as positive, negative, or neutral. • Help businesses gauge public opinion, understand brand perception, and improve services. Grammar Checkers • Enhance professional and academic writing by correcting grammar and spelling errors. • Suggest synonyms and improve readability using NLP algorithms trained on large datasets. • Essential for producing polished and error-free content.
  • 40.
    APPLICATIONS (CONT…..) Email Classification and Filtering • Categorize emails into sections like Primary, Social, and Promotions using text classification. • NLP identifies the context and content of emails to automate sorting. • Improves productivity by decluttering inboxes and organizing communication. Electronic Health Records (EHR) Analysis • Extract and organize unstructured data from clinical notes, discharge summaries, and patient histories. • Streamline documentation and provide physicians with actionable insights. • Enable faster and more accurate diagnosis through data-driven decision-making.
  • 41.