Natural Language Processing




                         Jaganadh G
                     Process Expert (NLP, IR & IE)
                           R&D Division
                           365media Inc.
                        Coimbatore, India
                         California, USA
                         Jaganadhg@365media.in
                           www.365media.com




04-06-2010               Govt. Eng. College
                              painav
Outline

  ➢Introduction
  ➢History


  ➢Areas in NLP


  ➢Future of NLP


  ➢References




Question?

  ➢Have you ever used any NLP products or NLP-powered
  tools?




Natural Language Processing



  ➢A sub-field of Artificial Intelligence (AI)
  ➢An interdisciplinary subject


  ➢Aim:


  ➢To build intelligent computers that can interact with
  human beings the way a human being does!


Natural Language?



  ➢Refers to a language spoken by people, e.g. English,
  Japanese, Swahili, as opposed to artificial languages like
  C++, Java, etc.


Definition


  Natural Language Processing is a theoretically motivated
  range of computational techniques for analyzing and
  representing naturally occurring texts/speech at one or
  more levels of linguistic analysis for the purpose of
  achieving human-like language processing for a range of
  tasks or applications.

History



  ●Second World War!
  ●Started with Machine Translation research


  ●Now:


  ●The most promising technology solutions


  ●Labs --> Industry --> Layman




Why NLP?



   ➢Huge amounts of data
   Internet = at least 20 billion pages
       Text data: web sites, blogs, tweets .......
       Audio data: speech .......
   ➢Applications for processing large amounts of text require NLP
   expertise


Why NLP?
  News:
  AN EARTHQUAKE struck Indonesia today - a strapping 7.7 magnitude earthquake that struck early today off the
  northern coast of the island of Sumatra. It caused minor damage and there are no reports of any deaths, although
  electricity was interrupted in several places.

  Location : Indonesia
  Magnitude: 7.7
  Region: Sumatra (Northern Coast)
  Deaths: Nil
  Damage: Minor

  Tweet
  @nokia announces release of new PDA phones see is.gd/iuTuY
  Who: Nokia
  What: Product announcement
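
  The news example above can be approximated with a few hand-written patterns. This is only a minimal sketch: the function name `extract_quake_facts`, the tiny `COUNTRIES` list, and every regular expression are assumptions made for illustration, not the method used in the talk. Real information extraction systems rely on far richer models.

```python
import re

NEWS = (
    "AN EARTHQUAKE struck Indonesia today - a strapping 7.7 magnitude "
    "earthquake that struck early today off the northern coast of the "
    "island of Sumatra. It caused minor damage and there are no reports "
    "of any deaths, although electricity was interrupted in several places."
)

# Hypothetical gazetteer; a real system would use a much larger list.
COUNTRIES = ["Indonesia", "India", "Japan"]


def extract_quake_facts(text):
    """Fill a small fact table from an earthquake report using regex rules."""
    facts = {}

    # A number directly followed by the word "magnitude".
    match = re.search(r"(\d+(?:\.\d+)?)\s+magnitude", text)
    if match:
        facts["Magnitude"] = float(match.group(1))

    # First known country name that appears as a whole word.
    for country in COUNTRIES:
        if re.search(r"\b" + re.escape(country) + r"\b", text):
            facts["Location"] = country
            break

    # The word following "island of" is taken as the region.
    match = re.search(r"island of (\w+)", text)
    if match:
        facts["Region"] = match.group(1)

    # A fixed phrase signals that no deaths were reported.
    if re.search(r"no reports of any deaths", text):
        facts["Deaths"] = "Nil"

    # The adjective between "caused" and "damage" describes the damage.
    match = re.search(r"caused (\w+) damage", text)
    if match:
        facts["Damage"] = match.group(1).capitalize()

    return facts


if __name__ == "__main__":
    for field, value in extract_quake_facts(NEWS).items():
        print(f"{field}: {value}")
```

  Rules like these break as soon as the wording changes, which is one reason such applications need NLP expertise.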




Is NLP really hard to achieve?




Major Areas of Research & Development

  ➢Text Processing
     ➢Morphological Analyzer
     ➢POS Tagging
     ➢Parsing
     ➢Machine Translation .........
  ➢Speech Processing
     ➢Text to Speech (TTS)
     ➢Automatic Speech Recognition (ASR)
     ➢Speech to Speech Translation

Text processing
 ●   Processing raw text
      ● Morphological Analysis
        ● running --> run + ing
      ● POS Tagging
        ● Ram/NNP goes/VBZ to/TO school/NN ..
      ● Stemming
        ● running --> run
      ● Parsing
        ● Identifying sentence structure
        ● S --> NP + VP
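
  The morphological analysis and stemming bullets above (running --> run + ing, running --> run) can be illustrated with a naive suffix stripper. This is a rough sketch under stated assumptions: the `SUFFIXES` tuple, the minimum stem length of 3, and the consonant-doubling rule are all simplifications chosen for illustration. It is not the Porter stemmer or any particular library's algorithm.

```python
# Illustrative English suffixes, longest first; far from exhaustive.
SUFFIXES = ("ing", "ed", "es", "s")


def split_suffix(word):
    """Return (stem, suffix) for a word, e.g. 'running' -> ('run', 'ing')."""
    lower = word.lower()
    for suffix in SUFFIXES:
        # Require at least 3 characters to remain so 'is' or 'red' stay intact.
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            stem = lower[: -len(suffix)]
            # Undo consonant doubling ('runn' -> 'run'); keep 'll' and 'ss'
            # because words such as 'fall' and 'pass' really end that way.
            if (
                len(stem) >= 2
                and stem[-1] == stem[-2]
                and stem[-1] not in "aeiouls"
            ):
                stem = stem[:-1]
            return stem, suffix
    return lower, ""


def stem(word):
    """Stemming keeps only the stem and discards the suffix."""
    return split_suffix(word)[0]


if __name__ == "__main__":
    for w in ["Running", "walked", "school"]:
        base, suffix = split_suffix(w)
        print(w, "-->", base + (" + " + suffix if suffix else ""))
```

  Morphological analysis keeps both parts (stem and suffix), while stemming returns only the stem.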
Text processing



  Machine Translation
  Translating content from one natural language into another
  natural language
  Example: Translating an English sentence into Malayalam
  with the help of software.


Speech processing

  ➢Text to speech
     Converting electronic text to digital speech
  ➢Automatic Speech Recognition


     Automatic transcription of spoken content to
     electronic text
  ➢Speech to speech translation


     Translating spoken content from one language to
     another in real time or offline.
Major Areas of Research & Development
             Industrial Applications
   ➢Search Engines
   ➢Advanced Text Editors
   ➢Commercial Machine Translation Systems
   ➢Information Extraction
   ➢Collaborative Filtering
   ➢Translation Memories
   ➢Computational Advertising
   ➢Fraud Detection
   ➢Sentiment Analysis
   ➢Opinion Mining .........
Some examples


  Document classification

  Document --> ?? / Sports / Arts / History / Science / ??
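
  One very simple way to picture document classification is to count category keywords in the text. The sketch below assumes a hand-made `CATEGORY_KEYWORDS` table (entirely hypothetical) and returns "??" when no keyword matches. Practical classifiers are usually trained on labeled documents rather than written by hand.

```python
import re
from collections import Counter

# Hypothetical keyword lists for the four categories on the slide.
CATEGORY_KEYWORDS = {
    "Sports": {"match", "team", "goal", "player", "cricket"},
    "Arts": {"painting", "music", "gallery", "dance", "film"},
    "History": {"war", "empire", "century", "ancient", "king"},
    "Science": {"experiment", "physics", "theory", "cell", "energy"},
}


def classify(document):
    """Return the category whose keywords occur most often, or '??'."""
    # Lowercase the text and count each alphabetic token.
    tokens = Counter(re.findall(r"[a-z]+", document.lower()))
    # Score a category by summing the counts of its keywords.
    scores = {
        category: sum(tokens[word] for word in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "??"


if __name__ == "__main__":
    print(classify("The team scored a late goal to win the match"))
    print(classify("The experiment confirmed the theory"))
    print(classify("Hello there"))
```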


Information extraction

  Document --> Who did what? When? Where?

  Barack Obama elected as president of US

    Person: Barack Obama -> Who
    Position: President -> What
    Event: elected -> What
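
  The who/what breakdown above can be mimicked with a single template pattern. This is a sketch only: `ELECTION_PATTERN`, the function name `extract_election`, and the extra "Place" slot are assumptions for illustration, and the pattern covers just headlines shaped like "<Person> elected as <position> of <place>".

```python
import re

# Hypothetical template: capitalized name, the verb "elected",
# a one-word position, and a capitalized place.
ELECTION_PATTERN = re.compile(
    r"(?P<person>[A-Z]\w+(?: [A-Z]\w+)*) "
    r"(?P<event>elected) as "
    r"(?P<position>\w+) of "
    r"(?P<place>[A-Z]\w*)"
)


def extract_election(sentence):
    """Return the slots of an election headline, or None if it does not fit."""
    match = ELECTION_PATTERN.search(sentence)
    if match is None:
        return None
    return {
        "Person": match.group("person"),                   # Who
        "Position": match.group("position").capitalize(),  # What
        "Event": match.group("event"),                     # What
        "Place": match.group("place"),                     # Where
    }


if __name__ == "__main__":
    print(extract_election("Barack Obama elected as president of US"))
```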



Sentiment analysis



        #2012 is very good !!??
                bleh :-(




             Toby Segaran's Programming
              Collective Intelligence is a
             nice book. It gives a detailed
                and simple view on ......
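
  A common first approximation to sentiment analysis is a word-list (lexicon) score. The sketch below assumes tiny, hypothetical `POSITIVE` and `NEGATIVE` word sets and a few emoticons; it ignores negation, sarcasm, and context, all of which real systems have to handle.

```python
import re

# Hypothetical mini-lexicons; real lexicons contain thousands of entries.
POSITIVE = {"good", "nice", "great", "excellent"}
NEGATIVE = {"bad", "bleh", "boring", "poor"}
POSITIVE_EMOTICONS = (":-)", ":)")
NEGATIVE_EMOTICONS = (":-(", ":(")


def sentiment(text):
    """Label text as 'positive', 'negative' or 'neutral' by counting cues."""
    lower = text.lower()
    words = re.findall(r"[a-z]+", lower)
    # +1 for every positive word, -1 for every negative word.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    # Emoticons count as cues as well.
    score += sum(lower.count(e) for e in POSITIVE_EMOTICONS)
    score -= sum(lower.count(e) for e in NEGATIVE_EMOTICONS)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(sentiment("bleh :-("))
    print(sentiment("Programming Collective Intelligence is a nice book."))
```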

Collaborative filtering




  The art/technology of making recommendations based on
  user behavior
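
  A minimal sketch of user-based collaborative filtering follows. It assumes a hypothetical `LIKES` table of users and the items they liked, and it uses Jaccard similarity between users, one of several possible similarity measures. Items liked by similar users, but not yet seen by the target user, are recommended.

```python
from collections import defaultdict

# Hypothetical behavior data: user -> set of liked items.
LIKES = {
    "asha": {"book_a", "book_b", "book_c"},
    "binu": {"book_a", "book_b", "book_d"},
    "chitra": {"book_e"},
}


def jaccard(a, b):
    """Similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, likes, top_n=3):
    """Rank unseen items by the summed similarity of the users who liked them."""
    seen = likes[user]
    scores = defaultdict(float)
    for other, items in likes.items():
        if other == user:
            continue
        similarity = jaccard(seen, items)
        if similarity == 0:
            continue  # users with nothing in common contribute nothing
        for item in items - seen:
            scores[item] += similarity
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    print(recommend("asha", LIKES))
```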

Search engines




Semantic web/search




Future of NLP


  ➢Semantic Web/Search
  ➢Sentiment Analysis / Opinion Mining


  ➢Machine Translation


  ➢Advanced Speech Processing Applications


  ➢Social Network Analysis


  ➢Collective Intelligence




NLP in other Domains
   ➢Bio-Medical
   ➢Forensic Science


   ➢Advertisement


   ➢Education


   ➢Politics


   ➢E-governance


   ➢Business Development


   ➢Marketing


   ➢and wherever we use language!
NLP in India
                    IIT Kanpur
                IIT Kharagpur
                      IIT Delhi
                 IIIT Hyderabad
               AU-KBC Chennai
                       C-DAC
                     Microsoft
                        Yahoo
                         AOL
                   365MEDIA
                        Taazaa
                Reuters India
                          .....
Discussion time




             Questions?

About 365media




  Real-time information services
  Started in 1998 with 10 staff in California
  India operations started in 2005 in Coimbatore
  Now 300 employees, 20+ clients




Thanks

                         Jaganadh G
                            Email
             Business - Jaganadhg@365media.in
              Personal - jaganadhg@gmail.com
             http://jaganadhg.freeflux.net/blog


