Alyona Medelyan (Pingar)
                      @zelandiya

    THE NEXT-GENERATION
             SHAREPOINT:
POWERED BY TEXT ANALYTICS
AGENDA
• Information tasks
• Text analytics
• APIs
• Demos
• Conclusions
Information tasks
What do they cost us?
How does SharePoint help?
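Avg. hours per week
[Bar chart: average hours per week spent on information tasks. Bar values: 14.5, 13.3, 9.6, 9.5, 8.8, 8.3, 6.8, 6.7, 5.6, 5.6, 4.3, 4.2, 1. Annotation: = $37K per year per person.]
Source: IDC, Hidden Cost of Information (2005)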
SHAREPOINT SAVES TIME
 Interact with SP from Outlook
       Create docs collaboratively
                   Customize search configuration
                              Use sites, sets & libraries
                                     Define Managed Metadata
                                                       Configure forms
                                                            Design Workflow
Text Analytics
What is it and how does it work?
What tasks does it solve?
WHAT IS TEXT ANALYTICS?
                unstructured data



Linguistics                                  Search
   Statistics                          Data Extraction
  Text Processing                    Document Organization
Machine Learning                    Business Intelligence
Natural Language Processing          Opinion Mining
     Text Mining
TEXT ANALYTICS SAVES MORE TIME
    Compose search reports
        Extract entities
                                        … automatically
        Mine opinions & sentiment
              Cluster search results
                   Redact
                           Summarize
                               Generate metadata
                                              Fill databases
                                                     Profanity check
Text Analytics Software
What companies offer text analytics?
What are open source tools like?
TEXT ANALYTICS: GLOBAL PERSPECTIVE

User adoption grew by 25% in 2010,
 creating an $835 million market, because:

• Unstructured data grows (e.g. social) → Text analytics!
• Text analytics is central to effective information access
• Many successes in NLP: IBM Watson, Wolfram Alpha



                                    Full report by Seth Grimes:
                                  http://altaplana.com/TA2011
APPLICATIONS OF TEXT ANALYTICS
            Search & info access                                    39%
Customer experience management                                      39%
             Brand management                                       39%
                          Research                               36%
          Competitive intelligence                            33%
                Customer service                        26%
                       E-discovery                15%
                      Life sciences               15%
                    Product design                15%
                Online commerce             11%
                            Finance        10%
                               Other      9%
            Content management           8%
                Insurance & fraud        8%
              Military intelligence    7%
                 Law enforcement       6%
                                     Source: http://altaplana.com/TA2011
SEARCH & INFO ACCESS
 METADATA EXTRACTION

Document                  Easy to extract:                Metadata
                          File type, name & location,
                          creation & modification date,
                          authors

           Difficult to extract:
           Keywords,
           people & companies mentioned,
           suppliers & addresses mentioned
SEARCH & INFO ACCESS
KEYWORD EXTRACTION

Document     Candidates                                         Keywords



           Hi All,
           As of today, MetaStock has several new functions.
           The most important new feature is the ability to
           display forward heat rate charts.
           Also, notice that the interface looks different -- this
           reflects and accommodates the new features.
           If you have any questions regarding this new
           version of MetaStock, please contact Bella Santuri.
SEARCH & INFO ACCESS
    KEYWORD EXTRACTION

    Document     Candidates       Properties                        Keywords

    Properties: Frequency, Position, Corpus stats, Relatedness

               Hi All,
               As of today, MetaStock has several new functions.
               The most important new feature is the ability to
               display forward heat rate charts.
               Also, notice that the interface looks different -- this
               reflects and accommodates the new features.
               If you have any questions regarding this new
               version of MetaStock, please contact Bella Santuri.
SEARCH & INFO ACCESS
 KEYWORD EXTRACTION

Document      Candidates       Properties         Scoring        Keywords

Scoring: Heuristic scoring, Machine learning

            Hi All,
            As of today, MetaStock has several new functions.
            The most important new feature is the ability to
            display forward heat rate charts.
            Also, notice that the interface looks different -- this
            reflects and accommodates the new features.
            If you have any questions regarding this new
            version of MetaStock, please contact Bella Santuri.
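
The pipeline on the keyword extraction slides (candidates, properties, scoring, keywords) can be sketched in a few lines. This is a minimal illustration, not the method of any particular product: it uses only two of the listed properties (frequency and position), a hand-picked stopword list, and a made-up heuristic score. The names `STOPWORDS`, `EMAIL` and `extract_keywords` are assumptions introduced here.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the talk).
STOPWORDS = {
    "a", "all", "also", "and", "any", "as", "contact", "has", "have", "hi",
    "if", "is", "looks", "most", "new", "notice", "of", "please",
    "regarding", "several", "that", "the", "this", "to", "today", "you",
}

EMAIL = (
    "Hi All, As of today, MetaStock has several new functions. "
    "The most important new feature is the ability to display forward "
    "heat rate charts. Also, notice that the interface looks different -- "
    "this reflects and accommodates the new features. If you have any "
    "questions regarding this new version of MetaStock, please contact "
    "Bella Santuri."
)

def extract_keywords(text, top_n=3):
    """Rank single-word candidates by frequency plus an early-position bonus."""
    # Candidates: lowercased alphabetic tokens that are not stopwords.
    words = re.findall(r"[A-Za-z]+", text.lower())
    candidates = [w for w in words if w not in STOPWORDS]
    if not candidates:
        return []

    # Properties: how often a candidate occurs and where it first appears.
    freq = Counter(candidates)
    first_pos = {}
    for i, w in enumerate(candidates):
        first_pos.setdefault(w, i)
    n = len(candidates)

    # Heuristic scoring: frequency, plus up to 1 point for appearing early.
    def score(w):
        return freq[w] + (1 - first_pos[w] / n)

    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(freq, key=lambda w: (-score(w), w))[:top_n]

print(extract_keywords(EMAIL))
```

A real system would also generate multi-word candidates ("heat rate charts") and use corpus statistics, or replace the hand-written score with a model trained on annotated examples.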
SEARCH & INFO ACCESS
NAMES EXTRACTION

Document      Examples       Properties       Learning        Names

           If you have any questions regarding this new version of
           MetaStock, please contact Bella Santuri.

Examples:   Training data (annotations)
Properties: NLP, Heuristics, Text mining
Learning:   Machine Learning
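
To make the "Heuristics" box concrete, here is a minimal sketch of one heuristic a names extractor might start from: treat a run of two or more capitalized words as a name candidate. The function name and the regular expression are assumptions for illustration; the slide does not say which heuristics any product uses.

```python
import re

# A run of two or more capitalized words, e.g. "Bella Santuri".
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")

def find_name_candidates(text):
    """Return capitalized word runs that might be person names."""
    return NAME_PATTERN.findall(text)

SENTENCE = (
    "If you have any questions regarding this new version of "
    "MetaStock, please contact Bella Santuri."
)
print(find_name_candidates(SENTENCE))
```

The same pattern also fires on the greeting "Hi All", which is exactly why the slide pairs heuristics with annotated training data and machine learning: a learned model can use context to reject candidates like that.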
<SEARCH + TEXT ANALYTICS> COMPANIES




 Pingar, BasisTech, AlchemyAPI, LanguageComputer, OpenCalais, Extractiv
BRAND & CUSTOMER MANAGEMENT
   SENTIMENT ANALYSIS

 Reviews
Document
Document                                                        Visualization
  Tweets                Sentiment Analysis
                                                                Summary
  Surveys

Naïve approach: Sentiment-words dictionary!

Negative    Positive
  suck      fantastic
 terrible   excellent
  awful     awesome

BUT:
  "If you are reading this because it is your darling fragrance, please
  wear it at home exclusively, and tape the windows shut."

  No sentiment words!
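
The naïve dictionary approach, and the way it fails on the fragrance review, can be sketched as follows. The word lists are the six words from the slide; the function name and the three-way label are assumptions made for this example.

```python
import re

# Word lists taken from the slide.
NEGATIVE = {"suck", "terrible", "awful"}
POSITIVE = {"fantastic", "excellent", "awesome"}

def naive_sentiment(text):
    """Label text by counting dictionary hits; no context is considered."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

REVIEW = (
    "If you are reading this because it is your darling fragrance, please "
    "wear it at home exclusively, and tape the windows shut."
)

print(naive_sentiment("The service was awful"))  # a dictionary word is present
print(naive_sentiment(REVIEW))                   # clearly negative, but no dictionary hit
```

The review contains none of the dictionary words, so the counter sees nothing to score even though a human reader recognizes it as strongly negative.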
BRAND & CUSTOMER MANAGEMENT
   SENTIMENT ANALYSIS

 Reviews
Document
Document                                                  Visualization
  Tweets        Examples     Properties    Learning
                                                          Summary
  Surveys


Training data (annotations)
Lexicon induction
Properties: Presence, Position, Part-of-Speech, Negation, Generalization
Machine Learning

Important:
  Identifying sentiment-bearing sentences
  Attaching sentiment to a topic!
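
Two of the listed properties, presence and negation, can be illustrated with a small feature extractor. This is a sketch of one common convention (marking every word between a negation and the next punctuation mark with a `NOT_` prefix), not necessarily what any vendor on the next slide does. The names `NEGATIONS`, `PUNCTUATION` and `presence_features` are assumptions.

```python
import re

NEGATIONS = {"not", "no", "never"}
PUNCTUATION = set(".,!?;")

def presence_features(text):
    """Return the set of words present, with negated words prefixed by NOT_."""
    features = set()
    negated = False
    for tok in re.findall(r"[a-z']+|[.,!?;]", text.lower()):
        if tok in PUNCTUATION:
            negated = False          # negation scope ends at punctuation
        elif tok in NEGATIONS or tok.endswith("n't"):
            negated = True           # start marking the following words
        else:
            features.add("NOT_" + tok if negated else tok)
    return features

print(sorted(presence_features("This is not good, but fine.")))
```

Features like these would then be fed, together with the annotated training data, into a machine learning classifier, so that "good" and "NOT_good" can receive opposite weights.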
SENTIMENT ANALYSIS COMPANIES
Attensity
AlchemyAPI
Lexalytics
Saplo
Medallia
SAS
RESEARCH
    TEXT SUMMARIZATION
          Address      Hi All,
    Announcement       As of today, MetaStock has several new functions.
           Details     The most important new feature is the ability to
                       display forward heat rate charts.
       More details    Also, notice that the interface looks different -- this
                       reflects and accommodates the new features.
         Conclusion    If you have any questions regarding this new
                       version of MetaStock, please contact Bella Santuri.

Extractive summary:   As of today, MetaStock has several new functions.
Sentence compression: MetaStock has several new functions.
                      The new interface looks different.
Abstractive summary: MetaStock has new features and a new interface.
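
Of the three summary types above, the extractive one is the simplest to sketch: score each sentence and return the best ones verbatim. The scoring below (average document frequency of words longer than three letters) is an assumption chosen for brevity, and the greeting line is left out of the input so that sentence splitting on punctuation stays trivial.

```python
import re
from collections import Counter

EMAIL_BODY = (
    "As of today, MetaStock has several new functions. "
    "The most important new feature is the ability to display forward "
    "heat rate charts. Also, notice that the interface looks different -- "
    "this reflects and accommodates the new features. If you have any "
    "questions regarding this new version of MetaStock, please contact "
    "Bella Santuri."
)

def content_words(text):
    """Crude content-word filter: lowercase words longer than three letters."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3]

def extractive_summary(text, n_sentences=1):
    """Return the n highest-scoring sentences, unchanged."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # sorted() is stable, so earlier sentences win ties.
    return sorted(sentences, key=score, reverse=True)[:n_sentences]

print(extractive_summary(EMAIL_BODY))
```

On this email the sketch happens to select the same sentence as the slide's extractive summary. Sentence compression and abstractive summarization need to rewrite text rather than select it, which is considerably harder.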
TEXT SUMMARIZATION COMPANIES




Lexalytics, Pingar
COMPETITIVE INTELLIGENCE:
ENTITY & ENTITY RELATION EXTRACTION




     Companies:
     OpenCalais, Extractiv, Pingar, Evri, AlchemyAPI, Zemanta
FRAUD INVESTIGATION:
NORMALIZATION OF DATES & NAMES




           Companies:
           Cicero, BasisTech
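
What normalization means in practice can be shown with a small sketch: different surface forms of the same date or name are mapped to one canonical form so that records can be matched. The list of accepted formats, the day-first reading of `dd/mm/yyyy`, and both function names are assumptions for illustration only.

```python
from datetime import datetime

# Accepted input formats (an illustrative, incomplete list).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%B %d, %Y", "%d %b %Y")

def normalize_date(raw):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def normalize_name(raw):
    """Collapse whitespace, turn 'Last, First' into 'First Last', fix casing."""
    name = " ".join(raw.split())
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}"
    return name.title()

print(normalize_date("March 3, 2011"), normalize_date("25/12/2010"))
print(normalize_name("SANTURI,  bella"))
```

Commercial tools go much further, for example transliteration across scripts and nickname matching; this sketch only covers formatting differences.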
OPEN-SOURCE TOOLS
• NLTK – Apache license, Book, Python & academic datasets, nltk.org
• LingPipe – Commercial licenses, Tutorials, Coreference & Chinese segmentation, alias-i.com/lingpipe
• OpenNLP – Apache license, Parsing, MaxEnt ML, incubator.apache.org/opennlp
• GATE – restricted GPL, Training courses, Applications & framework, gate.ac.uk
• Stanford NLP – full GPL, Online docs, Full library, nlp.stanford.edu
APIs
What’s an API and how does it work?
What are the advantages of the API model?
Which API is the right one for you?
API ACCESS
Developer creates an application
    SDK
    usage examples

    a call is an XML message describing the request
    includes API authentication
    calls via a web service

API: an interface that ensures communication
    a protocol specifies how XML needs to be encoded:
    • SOAP
    • REST

ENGINE: software engine solves a specific task
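
A call as "an XML message describing the request" might look like the SOAP envelope built below. Only the SOAP 1.1 envelope namespace is standard; the service namespace, the `ExtractKeywords` operation and the `ApiKey` and `Text` elements are hypothetical placeholders, since the slide does not name a concrete service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERVICE_NS = "http://example.com/textanalytics"        # hypothetical service

def build_soap_call(api_key, text):
    """Build a SOAP envelope carrying an API key and the text to analyze."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Hypothetical operation and parameter names.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}ExtractKeywords")
    ET.SubElement(call, f"{{{SERVICE_NS}}}ApiKey").text = api_key
    ET.SubElement(call, f"{{{SERVICE_NS}}}Text").text = text
    return ET.tostring(envelope, encoding="unicode")

print(build_soap_call("MY-KEY", "MetaStock has several new functions."))
```

In practice an SDK or a generated client would build this message from the service's description, so the developer rarely writes the XML by hand.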
REST API ACCESS FROM A BROWSER
API request
http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=madonna&context=Italian+sculptors+and+painters+of+the+renaissance+favored+the+Virgin+Mary+for+inspiration
API response
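
The request URL on this slide is just a base address plus URL-encoded parameters, so it can be assembled programmatically. The sketch below only builds the URL and makes no network call; the function name is an assumption, while the base URL and parameter names are the ones shown on the slide.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://search.yahooapis.com/WebSearchService/V1/webSearch"

def build_request_url(app_id, query, context):
    """Assemble a REST request URL; urlencode turns spaces into '+'."""
    params = {"appid": app_id, "query": query, "context": context}
    return BASE_URL + "?" + urlencode(params)

url = build_request_url(
    "YahooDemo",
    "madonna",
    "Italian sculptors and painters of the renaissance "
    "favored the Virgin Mary for inspiration",
)
print(url)
# The parameters can be read back out of the URL the same way a server would.
print(parse_qs(urlsplit(url).query)["query"])
```

Because the whole request fits in a URL, it can be pasted into a browser address bar, which is what makes REST convenient for quick experiments.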
SOAP API ACCESS FROM VS2010
SOAP API ACCESS IN POWERSHELL




Read complete blog post “Bulk metadata extraction in SharePoint”:
http://bit.ly/powershell-migrate
API = EASY INTEGRATION & FLEXIBILITY
• Integrate into existing architecture
  via any programming language
• Improve known flaws in the current system/process
• Minimize adoption barriers within the company
  little or no training required for staff
• Only pay for the features you need
• Flexible deployment:
   • Host API on site = Secure data exchange
   • Access the API in the cloud = Save on tech support & hardware
WHICH API IS BEST FOR YOU?
         I need to take some text and get a list of the
         important entities/keywords/phrases.


          APIs compared:
          Y: Term Extractor
          OpenCalais
          BeliefNetworks
          OpenAmplify
          AlchemyAPI (2nd)
          Evri (1st)

          Comparison criteria:
          API restrictions
          Supported languages
          Quality of results
          Semantic links
          Synonyms/Duplicates

                           Blog post on API comparison:
                                      faganm.com/blog
HOW TO CHOOSE AN API:
• Define a specific task
• Think of what features are important
• Get prepared:
  • Subscribe for API keys
  • Get SDKs
  • Learn libraries
• Find representative data
• Build a test framework
• Compare results
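
The last two steps, building a test framework and comparing results, can be as small as the sketch below: score each API's output against a hand-made gold standard with precision, recall and F1, then rank. The API names and their outputs here are invented placeholders, not real results from any service.

```python
def precision_recall_f1(predicted, gold):
    """Compare two keyword lists case-insensitively."""
    predicted = {p.lower() for p in predicted}
    gold = {g.lower() for g in gold}
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def compare_apis(api_outputs, gold):
    """Return (api_name, (precision, recall, f1)) pairs, best F1 first."""
    scores = {name: precision_recall_f1(out, gold) for name, out in api_outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1][2], reverse=True)

# Hand-annotated keywords for one representative document.
GOLD = ["MetaStock", "heat rate charts", "Bella Santuri", "interface"]

# Placeholder outputs standing in for real API responses.
API_OUTPUTS = {
    "api_a": ["MetaStock", "Bella Santuri", "today"],
    "api_b": ["MetaStock"],
}

for name, (p, r, f1) in compare_apis(API_OUTPUTS, GOLD):
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

With real data, the same loop would run over many representative documents and average the scores per API before ranking.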
METADATA EXTRACTION
IN SHAREPOINT
Demo
Pingar’s add-on for SharePoint 2010
built using a text analytics API
INTEGRATING APIS
INTO SCANNING
Video
Using Fuji Xerox SmartConnect and Pingar API
to scan documents in batch into SharePoint



                       http://www.youtube.com/watch?v=kluVp25upag
THE NEXT-GENERATION SHAREPOINT:
POWERED BY TEXT ANALYTICS
• What can be automated?
  • Metadata extraction, Data entry, Opinion mining,
    Sanitization, Doc approval, Summarization, …

• How to integrate text analytics
  into existing SharePoint applications?
  • Easy! Via an API

• How to find the right text analytics API?
  • Review what’s available
  • Set up an experiment
  • Compare results
Thank you to all of our Sponsors

Editor's Notes

  • #2 Opening slide please include
  • #3 How many hours per week does an average computer user spend on searching? What the heck is text analytics: a 101 introduction course. How APIs work and why they are great for both business people and developers.
  • #12 What are your primary applications where text comes into play?