White Paper
Big Data, Little Data, and
Everything in Between –
IBM SPSS Solutions Help
You Bring Analytics to
Everyone
Contents
Executive Summary
Introduction
	Do Your Users Really Have “Big Data?” Does it Matter?
	Trends in Predictive Analytics
	Opportunities Created by Effective Predictive Analytics
Predictive Analytics Next
	Heterogeneous Users and Diverse Use Cases
	Real-time and Embedded Analytics
	The Bottom Line
Big Data – Complicated, Messy, and Really Useful
	Asking the Right Questions
	Yes, Your Users Can Access Hadoop
	Performance and Scalability, No Matter How “Big” the Data
Conclusion: Teaching Users What They Want, Giving Them What They Need
About Ziff Davis B2B
Ziff Davis B2B is a leading provider of research to technology buyers and high-quality
leads to IT vendors. As part of the Ziff Davis family, Ziff Davis B2B has access to over
50 million in-market technology buyers every month and supports the company’s core
mission of enabling technology buyers to make more informed business decisions.
Contact Ziff Davis B2B
100 California Street, Suite 650
San Francisco, CA 94111
Tel: 415.318.7200  |  Fax: 415.318.7219	
Email: marty_fettig@ziffdavis.com
www.ziffdavis.com
Copyright © 2014 Ziff Davis B2B. All rights reserved.
ziffdavis.com 2 of 9
Ziff Davis | White Paper |  Big Data, Little Data and Everything in Between
Executive Summary
“Big Data” has been on corporate radar screens for years now. Unfortunately, outside of the
data scientists and statisticians who spend their days immersed in truly complex datasets,
both end users and key decision makers often struggle to make sense of the data their
organizations collect and generate. Whether an organization’s data is really “big” is even a topic
of debate.
Characterizing a company’s data, though, can’t simply be dismissed as a matter of semantics. It
frequently falls on IT to provide appropriate analytics solutions for heterogeneous users with a
wide range of skill sets, job descriptions, and analytical needs, whether the data being analyzed
is truly unstructured, web-scale data or just too many spreadsheets. Regardless of the type
or scale of data your users need to harness and analyze, they need a straightforward, visual
solution that is easy to use on the front end and highly scalable on the back end. Fortunately,
IBM SPSS Modeler, SPSS Analytical Server, and SPSS Analytical Catalyst provide just such
an ecosystem that can make different kinds of data stores, from Hadoop to those proverbial
spreadsheets, useful sources of business insight and decision support.
Introduction
Modern businesses no longer struggle to collect customer data, record internal metrics, or
even build web-scale data warehouses. As the
volume of available data continues to increase
exponentially and our ability to collect it from a
multitude of sources has improved dramatically,
the real problem has become how to turn all
of this data into usable information. How can
the data drive strategic planning and tactical
decision-making in concrete ways from the
executive boardroom down to specific lines of
business?
In the same way, IT departments have
become quite adept at building storage and
data management infrastructures. Storage
virtualization, cloud technologies, and even
Hadoop clusters let IT collect, store, and manage
all manner of data. However, as the “keepers
of the data”, IT is also frequently asked to
implement and support an analytics solution that
can do more with all of this data than merely
spit out reports. Ease of use and big data
analytics rarely go hand in hand, but whatever
solution IT delivers needs to be flexible and
scalable on the back end
while meeting the needs of a variety of users
on the front end. It needs to connect to existing
data stores and be ready for future sources of
data, the structure and scale of which may not
even be predictable. Too often, this puts IT in the
unenviable position of rolling out a solution to a
very poorly defined problem.
Do Your Users Really Have “Big Data?” Does it Matter?
One way to approach the widely varying needs of end users is to look for multiple solutions
that suit particular analytics requirements. For example, human resources may want to analyze
metrics collected from various departments and job classifications to help determine pay
grades and compensation. The data they need to examine would hardly be considered “big
data” but would be completely overwhelming in basic pivot tables or spreadsheets.
What is Big Data?
	 The term “Big Data” is used
so frequently that it would
hardly seem to require a
definition. Yet it is frequently
misused and misunderstood.
IT administrators know that:
•	 Everyone thinks they have big data
•	 Everyone believes they should be
leveraging big data
•	 If they happen to not have big data
or the tools to analyze it, they want
them…now
	 In reality, data is not so easily
quantified. That said, “big data”
refers to collections of data that
are too large and complicated for
management and analysis with
standard tools. These tools were
built for relational databases that
pre-date a ubiquitous World Wide
Web, machine-to-machine data, and
unstructured data that now dominate
our most challenging analytical tasks.
The marketing department, on the other hand, may be analyzing potentially millions of records
from social media, online advertising, and overall market trends, and attempting to correlate
that information with actual brick-and-mortar point-of-sale data. They may be encountering
unstructured data, data that normalize poorly, and both transactional and historical data in
near real-time. Most people would consider this a far better example of big data than the HR
information being analyzed above.
But does it matter? Probably not, if IT can identify a single, unified analytics platform that can
scale both on the back end and for end users, no matter how “big” their data. Realistically,
the IT department can only support a finite number of tools and it is likely that others in the
organization will want to analyze aggregated data that spans the business, a task made much
harder with disparate data management and analytics tools.
Trends in Predictive Analytics
This shift away from traditional data management paradigms with statisticians and data
scientists as the sole end users of an organization’s data is paralleled by a move away from
strict analytical reporting and towards predictive analytics. Predictive analytics have been a
hallmark of business intelligence and decision support systems for some time, but again, these
systems have largely been the domain of statisticians, with executives enjoying the insights
they provide.
Now, however, systems are emerging that allow a much larger group of end users to use
historical and transactional data to model business problems and predict potential outcomes.
The idea of being “data-driven” is extending beyond the C-Suite and trickling down to the rest
of the organization. Tools for predictive analytics are:
•	 Becoming visual and easier to use so that they are accessible to many users
•	 Becoming differentiated and/or scalable, making them suitable for statisticians to build
advanced models and for line of business employees to intelligently formulate questions
and use them for front-line decision-making
•	 Enabling embedded features such that even customer-facing applications can include
predictive features
Opportunities Created by Effective Predictive Analytics
“Want of foresight, unwillingness to act when action would be simple and effective, lack of
clear thinking, confusion of counsel until the emergency comes…these are the features which
constitute the endless repetition of history.” - Winston Churchill
No, Winston Churchill was not talking about predictive analytics or big data in 1935 when
he made these remarks. But predictive analytics are, in fact, a key enabler of so-called
“organizational learning.” Businesses can ask how to better meet customer needs, respond
to market fluctuations, manage risk, and otherwise seek new competitive advantages
by developing predictive models based on historical data. When implemented correctly,
predictive models let organizations answer critical questions that are far better
addressed statistically than with intuition:
•	 How is the current business environment like environments the organization has
encountered in the past?
•	 What approaches worked well then? What approaches didn’t?
•	 What patterns of customer behavior can we correlate with products, marketing, and
strategic shifts?
•	 What changes led to emerging quality problems or customer complaints?
•	 What is the general perception of our products in social media? And what effects do
particular campaigns have on those perceptions?
While these are high-level questions that predictive analytics can help answer, the right
software can also suggest operational adjustments.
Recent high-profile data breaches also highlight opportunities that can be created by
predictive analytics tools. For example, companies could identify transactional patterns
associated with an ongoing attack and address vulnerabilities before they reach critical scale.
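As a rough illustration of what spotting an unusual transactional pattern can look like, the sketch below flags hours whose transaction count sits far above a trailing baseline. The sample data, the `flag_anomalies` name, the window size, and the threshold are all hypothetical choices made for this example; it is not how any particular product detects attacks.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=6, threshold=3.0):
    """Return the indexes of values that sit more than `threshold` standard
    deviations above the mean of the preceding `window` values."""
    flagged = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline gives no scale to compare against,
            # so this sketch simply skips the point.
            continue
        if (counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical hourly transaction counts with one obvious spike.
hourly_counts = [100, 104, 98, 101, 97, 103, 99, 102, 340, 101]
suspicious_hours = flag_anomalies(hourly_counts)
```

A trailing window is used so that each hour is judged only against what came before it, which is the situation an operations team would actually be in while an incident is unfolding.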
Predictive Analytics Next
Large organizations have used predictive analytics for years. Researchers have employed
predictive techniques and tools to model everything from climate change to the efficacy
of cancer drugs. However, the next generation of predictive tools is here. These tools are
accessible enough to find their way into the hands of end users, and embedded predictive
analytics are increasingly being surfaced to customers in online applications and e-commerce.
As a result, we’re seeing predictive tools pushed down to operations and moving into the realm
of not just business intelligence but “predictive intelligence.”
As businesses in all sectors look to create cultures of data, IT departments are being asked to
identify solutions that empower end users with robust predictive tools. The traditional “decision
support system” is too far removed from daily decision making and is better suited to strategic
planning.
Heterogeneous Users and Diverse Use Cases
Instead, increasingly savvy users are demanding access to streams of real-time data,
vital historical information, and far more complex data than the distilled reports that many
businesses provide. Interactive dashboards that include predictive analytics and deep
drill-down and visualization capabilities are quickly replacing simple BI scorecards.
Simultaneously, the data czars in an organization (usually statisticians and data scientists)
need to be able to develop increasingly sophisticated analytics applications to surface to
users.
Real-time and Embedded Analytics
An advanced analytics platform must at once:
•	 Access data stores from across an organization
•	 Support the development of complex applications and deep data insights
•	 Be nearly transparent to most end users
The multifaceted nature of current (and future) analytical needs is driving the growth of
embedded analytics. In particular, embedded predictive analytics support everything from
customer recommendation engines to line of business applications like CRM that improve
customer service and responsiveness in sales and marketing teams.
In fact, for predictive analytics to be truly transformative in an organization and accessible
to the broadest cross-section of users, a growing number of IT and BI professionals believe
that users shouldn’t even realize they are accessing predictive tools. Rather, they should be
application-embedded such that users are seamlessly provided with decision support, without
any need to conduct their own analyses. For example:
•	 Field agents in homeland security positions should not need to log into a separate
analytics application to gain insight into emerging threats based on increased chatter on
social media
•	 Customers visiting a website should automatically be presented with product
recommendations tied to past purchases, profiles built from similar users, and their current
locations
•	 Insurance agents should have a complete view of a client’s risk profile that aggregates
everything from credit scores to prior claims to healthcare data
The Bottom Line for IT
Businesses, and their IT departments in particular, must substantially alter their definition
of users to include customers, partners, internal end users, developers, statisticians, and
executives. Fundamentally, when IT groups are asked to implement predictive analytics
solutions, they are actually being asked to provide an ecosystem of platforms and tools
suitable for every user covered by this new definition, enabling them to make better decisions
faster.
As we’ve seen, responsibility for business analytics is increasingly being taken on at every level
of the modern enterprise, from executive to operational, with statisticians and data scientists
leading the most complex and strategically important analytical initiatives. Although IT has
some analytical needs in its own right (e.g., tracking hardware capacity and application readiness),
IT’s real focus is on providing platforms. Integrated platforms that can support all of the
following are hard to find in the market today:
1.	 Complex analytics with hooks into Hadoop and other varied data stores
2.	 More basic standalone analytics needs
3.	 Executive-level decision support
4.	 Embedded predictive analytics
No discussion of predictive analytics tools would be complete without considering how
they handle Big Data. As we will see in the next section, IBM SPSS Modeler, SPSS Analytical
Server, and SPSS Analytical Catalyst form exactly the sort of integrated platform outlined
above that can address both Big Data needs and satisfy requirements for analysis of local
data stores.
Big Data – Complicated, Messy, and
Really Useful
Actually using Big Data in meaningful and
insightful ways to influence customers and,
as described in the discussion of predictive
analytics above, “make better decisions faster”
is one of the greatest challenges facing
organizations today. Big Data is messy for
several reasons:
•	 Its scale is such that many tools buckle
under the sheer volume of records involved
•	 Data often don’t fit (because of their
inherent structure or lack thereof) into the
neat, glorified spreadsheets to which users
are accustomed
•	 Data must often be aggregated from
sources that were never meant to be
merged and joined to generate insights
All of these challenges aside, organizations can’t afford to ignore their vast stores of data
if they wish to remain competitive. Similarly, IT can’t afford, in a very literal sense, to simply
accumulate massive datasets and not deploy platforms that enable users to leverage them in
real time for both operational and strategic purposes.
What is Hadoop?
Hadoop is an open source technology
for storing, indexing, and analyzing
very complicated datasets. Inspired by
techniques Google published for processing
web-scale search data (MapReduce and the
Google File System) and developed as an
Apache open source project,
Hadoop has grown into a mature tool for
distributed storage and analysis of data
that fits poorly into standard relational
tables.
Though incredibly powerful, Hadoop is
not only complicated but often poorly
understood outside the data science
community. As with Big Data, IT often
receives mandates to implement
Hadoop because every other data-driven
organization is using it…Aren’t they?
Asking the Right Questions
One of the most challenging aspects of Big Data analytics is simply being able to ask the
right questions. In traditional data collection activities like clinical drug trials or educational
assessments, questions and hypotheses are formulated in advance and data structures are
built specifically to answer those questions: “Is this curriculum associated with a statistically
significant improvement in test scores?” and “Does treatment with this medication improve
clinical outcomes when compared to placebo?”
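To make the "formulated in advance" style of question concrete, here is a minimal sketch of a one-sided permutation test for the curriculum example. The scores are invented, and `permutation_p_value` is a hypothetical helper written for this illustration; a real assessment study would rely on a vetted statistics package and a design chosen by a statistician.

```python
import random


def permutation_p_value(treated, control, n_permutations=5000, seed=0):
    """Estimate how often a random relabelling of the pooled scores gives a
    mean difference at least as large as the one actually observed."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if diff >= observed:
            at_least_as_large += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return observed, (at_least_as_large + 1) / (n_permutations + 1)


# Hypothetical test scores for classes using the new and the old curriculum.
new_curriculum = [78, 85, 90, 88, 92, 81, 87, 89]
old_curriculum = [70, 72, 68, 75, 71, 69, 74, 73]
observed_gain, p_value = permutation_p_value(new_curriculum, old_curriculum)
```

A small `p_value` suggests that a gain this large would be unusual if the curriculum made no difference. The point of the sketch is that the question, the groups, and the measure were all fixed before any data was examined.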
With Big Data, however, users need to be able to explore and visualize the data before they
can start asking meaningful questions. Especially with unstructured data, questions can
rarely be precisely formulated in advance. Exploratory tools, though, like those found in IBM
SPSS Modeler, let users connect with statisticians and data scientists, asking much more
open-ended questions. For example, “There appears to be a group of customers who aren’t
returning while another group appears to be quite loyal. Are there underlying characteristics of
the two groups that could explain this split? And have any of our advertising campaigns been
able to bring back customers? What are defining characteristics of the customers we won
back?”
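An open-ended question like the one above often starts with nothing more sophisticated than comparing the two groups side by side. The sketch below does that for a handful of invented customer records; the field names and the `compare_groups` helper are hypothetical, and it stands in for the kind of first look a visual exploration tool would provide rather than reproducing any specific product feature.

```python
from statistics import mean

# Hypothetical customer records; "returned" marks customers who came back.
customers = [
    {"returned": True, "first_order_value": 80, "support_tickets": 0, "days_to_delivery": 2},
    {"returned": True, "first_order_value": 60, "support_tickets": 1, "days_to_delivery": 3},
    {"returned": True, "first_order_value": 70, "support_tickets": 0, "days_to_delivery": 2},
    {"returned": False, "first_order_value": 75, "support_tickets": 3, "days_to_delivery": 7},
    {"returned": False, "first_order_value": 65, "support_tickets": 2, "days_to_delivery": 6},
    {"returned": False, "first_order_value": 70, "support_tickets": 4, "days_to_delivery": 8},
]


def compare_groups(records, flag, fields):
    """For each numeric field, return (mean where flag is true,
    mean where flag is false)."""
    in_group = [r for r in records if r[flag]]
    out_group = [r for r in records if not r[flag]]
    return {
        field: (mean(r[field] for r in in_group),
                mean(r[field] for r in out_group))
        for field in fields
    }


profile = compare_groups(
    customers,
    flag="returned",
    fields=["first_order_value", "support_tickets", "days_to_delivery"],
)
for field, (loyal, lapsed) in profile.items():
    print(f"{field}: returning={loyal:.1f} not returning={lapsed:.1f}")
```

In this made-up data the first order value is identical across groups while delivery time and support tickets differ sharply, which is the sort of observation a line-of-business user could then take to a statistician for proper modeling.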
Statisticians aren’t marketers, quality control engineers, manufacturers, or sales staff. They
have the expertise to answer the questions but require input from lines of business and
subject matter experts to know what questions need answering. Again, this is where IT enters
the picture. IT needs to provide the tools that let salespeople talk to statisticians.
Yes, Your Users Can Access Hadoop
Hadoop is intimidating even to experienced users. Hadoop and the data it is designed to
manage and analyze are simply too complicated for end users to jump in and begin the kinds
of exploratory analysis described above. IBM SPSS Analytical Server, though, provides a
connection to a variety of data sources (including Hadoop) while IBM SPSS Analytical Catalyst gives
users a unique browser-based means of exploring the aggregated data, regardless of its
source. Each of these components contributes to the dialog between data scientists
and users.
Performance and Scalability, No Matter How “Big” the Data
Because this platform can scale from a single-user desktop deployment of SPSS Modeler
to a full-blown predictive analytics ecosystem, the tools include several performance
enhancements. SQL pushback, a technique that lets SQL database servers execute operations
on their own hardware rather than moving the data to the analytics tool first, is built into
SPSS Analytical Server.
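The general idea of pushing work back to the database can be sketched with Python's built-in `sqlite3` module. The table, the data, and both function names are hypothetical, and the example makes no claim about how SPSS generates its SQL; it only contrasts pulling every row to the client with letting the database do the aggregation.

```python
import sqlite3

# Hypothetical in-memory sales table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 30.0), ("west", 45.5)],
)


def totals_client_side(connection):
    """Fetch every row and aggregate in the client process."""
    totals = {}
    for region, amount in connection.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_pushed_back(connection):
    """Let the database aggregate; only one row per region comes back."""
    cursor = connection.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    )
    return dict(cursor.fetchall())


client_totals = totals_client_side(conn)
pushed_totals = totals_pushed_back(conn)
```

Both functions return the same totals. The difference is how much data crosses the connection: the first transfers every row, while the second transfers one row per region, which is what matters once the table holds millions of records.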
SPSS Analytical Server also supports analysis of real-time data streams. While Hadoop is
well-suited to dealing with very large datasets and batch processing of data, real-time data
will quickly overwhelm Hadoop. Analytical Server, on the other hand, can deliver real-time
analytical capabilities on large numbers of large data streams. It also speeds analytics, whether
the results are being delivered to customers in an e-commerce setting or to enterprise users
exploring potential relationships in Big Data applications.
Conclusion: Teaching Users What They Want, Giving Them What They Need
IT has a unique opportunity in IBM SPSS predictive analytics tools to deliver a robust, highly
scalable solution that meets the needs of heterogeneous users in ways that few other
platforms can. In bringing these tools to an organization, IT can then bring a range of predictive
analytics to bear on a variety of business problems. In fact, SPSS predictive software is
a complete solution for harnessing Hadoop, relational databases, and even the mass of
spreadsheets that tend to accumulate in lines of business.
When users aren’t clear on their data analysis needs (and they generally aren’t), tools
like SPSS Modeler are sufficiently flexible to help both IT and statisticians translate user
requirements into data-rich applications. Perhaps more importantly, this ecosystem of tools
can turn data stores that are utterly inaccessible to most users into deeply interactive
environments that connect lines of business to decision-makers and data scientists whose
work would otherwise not be well-informed by “feet on the ground.”
To learn more about IBM SPSS Modeler, Analytical Server, and Catalyst, visit:
http://www-01.ibm.com/software/analytics/applications/big-data/

Big Data, Little Data, and Everything in Between

  • 1.
    White Paper Big Data,Little Data, and Everything in Between – IBM SPSS Solutions Help You Bring Analytics to Everyone Contents Executive Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Do Your Users Really Have “Big Data?” Does it Matter?. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Trends in Predictive Analytics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Opportunities Created by Effective Predictive Analytics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Predictive Analytics Next. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Heterogeneous Users and Diverse Use Cases. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Real-time and Embedded Analytics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 The Bottom Line. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Big Data – Complicated, Messy, and Really Useful. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Asking the Right Questions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 Yes, Your Users Can Access Hadoop. . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 Performance and Scalability, No Matter How “Big” the Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 Conclusion: Teaching Users What They Want, Giving Them What They Need. . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 ® ® About Ziff Davis B2B Ziff Davis B2B is a leading provider of research to technology buyers and high-quality leads to IT vendors. As part of the Ziff Davis family, Ziff Davis B2B has access to over 50 million in-market technology buyers every month and supports the company’s core mission of enabling technology buyers to make more informed business decisions. Contact Ziff Davis B2B 100 California Street, Suite 650 San Francisco, CA 94111 Tel: 415.318.7200  |  Fax: 415.318.7219 Email: marty_fettig@ziffdavis.com www.ziffdavis.com Copyright © 2014 Ziff Davis B2B. All rights reserved.
  • 2.
    ziffdavis.com 2 of9 Ziff Davis | White Paper |  Big Data, Little Data and Everything in Between ziffdavis.com Executive Summary “Big Data” has been on corporate radar screens for years now. Unfortunately, outside of the data scientists and statisticians who spend their days immersed in truly complex datasets, both end users and key decision makers often struggle to make sense of the data their organizations collect and generate. Whether an organization’s data is really “big” is even a topic of debate. Characterizing a company’s data, though, can’t simply be dismissed as a matter of semantics. It frequently falls on IT to provide appropriate analytics solutions for heterogeneous users with a wide range of skill sets, job descriptions, and analytical needs, whether the data being analyzed is truly unstructured, web-scale data or just too many spreadsheets. Regardless of the type or scale of data your users need to harness and analyze, they need a straightforward, visual solution that is easy to use on the front end and highly scalable on the back end. Fortunately, IBM SPSS Modeler, SPSS Analytical Server, and SPSS Analytical Catalyst provide just such an ecosystem that can make different kinds of data stores, from Hadoop to those proverbial spreadsheets, useful sources of business insight and decision support.
  • 3.
    ziffdavis.com 3 of9 Ziff Davis | White Paper |  Big Data, Little Data and Everything in Between Introduction Modern businesses no longer struggle to collect customer data, record internal metrics, or even build web-scale data warehouses. As the volume of available data continues to increase exponentially and our ability to collect it from a multitude of sources has improved dramatically, the real problem has become how to turn all of this data into usable information. How can the data drive strategic planning and tactical decision-making in concrete ways from the executive boardroom down to specific lines of business? In the same way, IT departments have become quite adept at building storage and data management infrastructures. Storage virtualization, cloud technologies, and even Hadoop clusters let IT collect, store, and manage all manner of data. However, as the “keepers of the data”, IT is also frequently asked to implement and support an analytics solution that can do more with all of this data than merely spit out reports. Easy-to-use and big data analytics are not two concepts that usually go hand-in-hand, but whatever solution IT delivers needs to be flexible and scalable on the backend while meeting the needs of a variety of users on the front end. It needs to connect to existing data stores and be ready for future sources of data, the structure and scale of which may not even be predictable. Too often, this puts IT in the unenviable position of rolling out a solution to a very poorly defined problem. Do Your Users Really Have “Big Data?” Does it Matter? One way to approach the widely varying needs of end users is to look for multiple solutions that suit particular analytics requirements. For example, human resources may want to analyze metrics collected from various departments and job classifications to help determine pay grades and compensation. 
The data they need to examine would hardly be considered “big data” but would be completely overwhelming in basic pivot tables or spreadsheets. What is Big Data? The term “Big Data” is used so frequently that it would hardly seem to require a definition. Yet it is frequently misused and misunderstood. IT administrators know that: • Everyone thinks they have big data • Everyone believes they should be leveraging big data • If they happen to not have big data or the tools to analyze it, they want them…now In reality, data is not so easily quantified. That said, “big data” refers to collections of data that are too large and complicated for management and analysis with standard tools. These tools were built for relational databases that pre-date a ubiquitous World Wide Web, machine-to-machine data, and unstructured data that now dominate our most challenging analytical tasks
  • 4.
    ziffdavis.com 4 of9 Ziff Davis | White Paper |  Big Data, Little Data and Everything in Between The marketing department, on the other hand, may be analyzing potentially millions of records from social media, online advertising, and overall market trends, and attempting to correlate that information with actual brick and mortar point of sale data. They may be encountering unstructured data, data that normalize poorly, and both transactional and historical data in near real-time. Most people would consider this a far better example of big data than the HR information being analyzed above. But does it matter? Probably not, if IT can identify a single, unified analytics platform that can scale both on the back end and for end users, no matter how “big” their data. Realistically, the IT department can only support a finite number of tools and it is likely that others in the organization will want to analyze aggregated data that spans the business, a task made much harder with disparate data management and analytics tools. Trends in Predictive Analytics This shift away from traditional data management paradigms with statisticians and data scientists as the sole end users of an organization’s data is paralleled by a move away from strict analytical reporting and towards predictive analytics. Predictive analytics have been a hallmark of business intelligence and decision support systems for some time, but again, these systems have largely been the domain of statisticians and with executives enjoying the insights they provide. Now, however, systems are emerging that allow a much larger group of end users to use historical and transactional data to model business problems and predict potential outcomes. The idea of being “data-driven” is extending beyond the C-Suite and trickling down to the rest of the organization. 
Tools for predictive analytics are: • Becoming visual and easier to use so that they are accessible to many users • Becoming differentiated and/or scalable, making them suitable for statisticians to build advanced models and for line of business employees to intelligently formulate questions and use them for front-line decision-making • Enabling embedded features such that even customer-facing applications can include predictive features Opportunities Created by Effective Predictive Analytics “Want of foresight, unwillingness to act when action would be simple and effective, lack of clear thinking, confusion of counsel until the emergency comes…these are the features which constitute the endless repetition of history.” - Winston Churchill No, Winston Churchill was not talking about predictive analytics or big data in 1935 when he made these remarks. But predictive analytics are, in fact, a key enabler of so-called “organizational learning.” Businesses can ask how to better meet customer needs, respond
to market fluctuations, manage risk, and otherwise seek new competitive advantages by developing predictive models based on historical data. When predictive models are implemented correctly, organizations can use them to answer critical questions that are far better addressed statistically than with intuition:
• How is the current business environment like environments the organization has encountered in the past?
• What approaches worked well then? What approaches didn’t?
• What patterns of customer behavior can we correlate with products, marketing, and strategic shifts?
• What changes led to emerging quality problems or customer complaints?
• What is the general perception of our products in social media? And what effects do particular campaigns have on those perceptions?

While these are high-level questions that predictive analytics can help answer, the right software can also suggest operational adjustments. Recent high-profile data breaches also highlight opportunities that can be created by predictive analytics tools. For example, companies could identify transactional patterns associated with an ongoing attack and address vulnerabilities before the attack reaches critical scale.

Predictive Analytics Next

Large organizations have used predictive analytics for years. Researchers have employed predictive techniques and tools to model everything from climate change to the efficacy of cancer drugs. However, the next generation of predictive tools is here. These tools are accessible enough to find their way into the hands of end users, and embedded predictive analytics are increasingly being surfaced to customers in online applications and ecommerce.
As a result, we’re seeing predictive tools pushed down to operations and moving into the realm of not just business intelligence but “predictive intelligence.” As businesses in all sectors look to create cultures of data, IT departments are being asked to identify solutions that empower end users with robust predictive tools. The traditional “decision support system” is too far removed from daily decision making and is better suited to strategic planning.

Heterogeneous Users and Diverse Use Cases

Instead, increasingly savvy users are demanding access to streams of real-time data, vital historical information, and far more complex data than the distilled reports that many businesses provide. Interactive dashboards that include predictive analytics and deep drill-down and visualization capabilities are quickly replacing simple BI scorecards.
Simultaneously, the data czars in an organization (usually statisticians and data scientists) need to be able to develop increasingly sophisticated analytics applications to surface to users.

Real-time and Embedded Analytics

An advanced analytics platform must at once:
• Access data stores from across an organization
• Support the development of complex applications and deep data insights
• Be nearly transparent to most end users

The multifaceted nature of current (and future) analytical needs is driving the growth of embedded analytics. In particular, embedded predictive analytics support everything from customer recommendation engines to line-of-business applications like CRM that improve customer service and responsiveness in sales and marketing teams. In fact, for predictive analytics to be truly transformative in an organization and accessible to the broadest cross-section of users, a growing number of IT and BI professionals believe that users shouldn’t even realize they are accessing predictive tools. Rather, the tools should be embedded in applications such that users are seamlessly provided with decision support, without any need to conduct their own analyses.
For example:
• Field agents in homeland security positions should not need to log into a separate analytics application to gain insight into emerging threats based on increased chatter on social media
• Customers visiting a website should automatically be presented with product recommendations tied to past purchases, profiles built from similar users, and their current locations
• Insurance agents should have a complete view of a client’s risk profile that aggregates everything from credit scores to prior claims to healthcare data

The Bottom Line for IT

Businesses, and their IT departments in particular, must substantially alter their definition of users to include customers, partners, internal end users, developers, statisticians, and executives. Fundamentally, when IT groups are asked to implement predictive analytics solutions, they are actually being asked to provide an ecosystem of platforms and tools suitable for every user covered by this new definition, enabling them to make better decisions faster.
As we’ve seen, responsibility for business analytics is increasingly being taken on at every level of modern enterprises, from executive to operational, with statisticians and data scientists leading the most complex and strategically important analytical initiatives. Although IT has some analytical needs in its own right (e.g., tracking hardware capacity and application readiness), IT’s real focus is on providing platforms. Integrated platforms that can support all of the following are hard to find in the market today:
1. Complex analytics with hooks into Hadoop and other varied data stores
2. More basic standalone analytics needs
3. Executive-level decision support
4. Embedded predictive analytics

No discussion of predictive analytics tools would be complete without considering how they handle Big Data. As we will see in the next section, IBM SPSS Modeler, SPSS Analytical Server, and SPSS Analytical Catalyst form exactly the sort of integrated platform outlined above, one that can address Big Data needs and also satisfy requirements for analysis of local data stores.

Big Data – Complicated, Messy, and Really Useful

Actually using Big Data in meaningful and insightful ways to influence customers and, as described in the discussion of predictive analytics above, “make better decisions faster” is one of the greatest challenges facing organizations today. Big Data is messy for several reasons:
• Its scale is such that many tools buckle under the sheer volume of records involved
• Data often don’t fit (because of their inherent structure or lack thereof) into the neat, glorified spreadsheets to which users are accustomed
• Data must often be aggregated from sources that were never meant to be merged and joined to generate insights

All of these challenges aside, organizations can’t afford to ignore their vast stores of data if they wish to remain competitive. Similarly, IT can’t afford, in a very literal sense, to simply accumulate massive datasets and not deploy platforms that enable users to leverage them in real time for both operational and strategic purposes.

What is Hadoop?
Hadoop is an open source technology for storing, indexing, and analyzing very complicated datasets. Inspired by Google’s published designs for distributed storage and processing (the Google File System and MapReduce), Hadoop has grown into a mature tool for distributed storage and analysis of data that fits poorly into standard relational tables. Though incredibly powerful, Hadoop is not only complicated but often poorly understood outside the data science community. As with Big Data, IT often receives mandates to implement Hadoop because every other data-driven organization is using it… Aren’t they?

Asking the Right Questions

One of the most challenging aspects of Big Data analytics is simply being able to ask the right questions. In traditional data collection activities like clinical drug trials or educational assessments, questions and hypotheses are formulated in advance and data structures are built specifically to answer those questions: “Is this curriculum associated with a statistically significant improvement in test scores?” and “Does treatment with this medication improve clinical outcomes when compared to placebo?”

With Big Data, however, users need to be able to explore and visualize the data before they can start asking meaningful questions. Especially with unstructured data, questions can rarely be precisely formulated in advance. Exploratory tools, though, like those found in IBM SPSS Modeler, let users connect with statisticians and data scientists, asking much more open-ended questions. For example, “There appears to be a group of customers who aren’t returning while another group appears to be quite loyal. Are there underlying characteristics of the two groups that could explain this split? And have any of our advertising campaigns been able to bring back customers? What are defining characteristics of the customers we won back?”

Statisticians aren’t marketers, quality control engineers, manufacturers, or sales staff. They have the expertise to answer the questions but require input from lines of business and subject matter experts to know what questions need answering. Again, this is where IT enters the picture. IT needs to provide the tools that let salespeople talk to statisticians.
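To make the open-ended customer question above concrete, here is a minimal sketch of a first exploratory pass, written in plain Python rather than in any SPSS tool. The records, the field names (`returned`, `orders`, `avg_basket`), and the helper `profile_groups` are all hypothetical, invented for illustration; they do not come from any IBM product or real dataset. The sketch simply splits customers into those who returned and those who did not, then averages a few characteristics per group so that a statistician and a salesperson have something specific to discuss.

```python
from statistics import mean

# Hypothetical records: every field name and value below is invented for illustration.
customers = [
    {"id": 1, "returned": True,  "orders": 12, "avg_basket": 50.0},
    {"id": 2, "returned": True,  "orders": 8,  "avg_basket": 70.0},
    {"id": 3, "returned": False, "orders": 2,  "avg_basket": 20.0},
    {"id": 4, "returned": False, "orders": 4,  "avg_basket": 40.0},
    {"id": 5, "returned": True,  "orders": 10, "avg_basket": 60.0},
]


def profile_groups(records, flag, features):
    """Split records on a boolean field and average each numeric feature per group."""
    groups = {True: [], False: []}
    for record in records:
        groups[bool(record[flag])].append(record)
    return {
        label: {f: mean(r[f] for r in rows) for f in features} if rows else {}
        for label, rows in groups.items()
    }


profile = profile_groups(customers, "returned", ["orders", "avg_basket"])
for label, averages in profile.items():
    print("returned" if label else "did not return", averages)
```

A comparison of group averages like this only suggests where the two groups differ; deciding whether a difference is meaningful, or whether a campaign caused it, is the kind of follow-up question that still calls for a statistician.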
Yes, Your Users Can Access Hadoop

Hadoop is intimidating even to experienced users. Hadoop and the data it is designed to manage and analyze are simply too complicated for end users to jump in and begin the kinds of exploratory analysis described above. IBM SPSS Analytical Server, though, provides a connection to a variety of data sources (including Hadoop), while IBM SPSS Catalyst gives users a unique browser-based means of exploring the aggregated data, regardless of its source. Each of these components contributes to the dialog between data scientists and users.

Performance and Scalability, No Matter How “Big” the Data

Because this platform can scale from a single-user desktop deployment of SPSS Modeler to a full-blown predictive analytics ecosystem, the tools include several performance enhancements. SQL pushback, a technique that allows SQL database servers to execute code on their own hardware so that data need not be extracted before it is processed, is built into SPSS Analytical Server. SPSS Analytical Server also supports analysis of real-time data streams. While Hadoop is well-suited to dealing with very large datasets and batch processing of data, real-time data will quickly overwhelm Hadoop. Analytical Server, on the other hand, can deliver real-time analytical capabilities on large numbers of large data streams. It also speeds analytics, whether the results are being delivered to customers in an e-commerce setting or to enterprise users exploring potential relationships in Big Data applications.

Conclusion: Teaching Users What They Want, Giving Them What They Need

IT has a unique opportunity in IBM SPSS predictive analytics tools to deliver a robust, highly scalable solution that meets the needs of heterogeneous users in ways that few other platforms can. In bringing these tools to an organization, IT can then bring a range of predictive analytics to bear on a variety of business problems. In fact, SPSS predictive software is a complete solution for harnessing Hadoop, relational databases, and even the mass of spreadsheets that tend to accumulate in lines of business.

When users aren’t clear on their data analysis needs (and they generally aren’t), tools like SPSS Modeler are sufficiently flexible to help both IT and statisticians translate user requirements into data-rich applications. Perhaps more importantly, this ecosystem of tools can make data stores that are utterly inaccessible to most users into deeply interactive environments that connect lines of business to decision-makers and data scientists whose work would otherwise not be well-informed by “feet on the ground.”

To learn more about IBM SPSS Modeler, Analytical Server, and Catalyst, visit: http://www-01.ibm.com/software/analytics/applications/big-data/