Cloud Computing (KCS-713):
Unit-5: Cloud Technologies And Advancements
Hadoop
Dr. Radhey Shyam
Professor
Department of Computer Science and Engineering
SRMGPC Lucknow
(Affiliated to Dr. A.P.J. Abdul Kalam Technical University, Lucknow)
Unit-5 has been compiled/prepared by Dr. Radhey Shyam, with grateful acknowledgment to those who made their course contents freely available. Feel free to use this study material for your own academic purposes. For any query, you may contact me at shyam0058@gmail.com.
Date: November 26, 2021
Cloud Computing (KCS713)
(UNIT – V)
Cloud Technologies And Advancements Hadoop
1. Hadoop - MapReduce
Hadoop is an open-source software framework used to develop data processing applications that are executed in a distributed computing environment. The Hadoop architecture has two basic components:
The first is HDFS (Hadoop Distributed File System), for storage. It allows you to store data of various formats across a cluster.
The second is YARN, for resource management in Hadoop. It allows parallel processing over the data stored across HDFS.
Fig. : Hadoop Framework
MapReduce is the core component for data processing in the Hadoop framework. It is a processing technique built on the divide-and-conquer approach and is made up of two different tasks: Map and Reduce. Map takes a set of data and converts it into another set of data, in which individual elements are broken down into tuples (key-value pairs). The Reduce task then takes the output from a map as its input and combines those data tuples into a smaller set of tuples, producing the final result.
How does the MapReduce algorithm work? The whole process goes through four phases of execution, namely splitting, mapping, shuffling, and reducing. The data goes through the following phases:
Input Splits: In this phase the input (say, a data set) is divided into fixed-size pieces called input splits.
Mapping: This is the first processing phase of a map-reduce program. Each input split is handed to a mapping function, which performs the required computation on it; the splits are processed in parallel. For the word-count example below, the output of the Map function is a set of key-value pairs of the form <word, frequency>.
Shuffling: The Shuffle function is also known as the “Combine function”. It performs the following two sub-steps:
• Merging
• Sorting
This phase consumes the output of the mapping phase and performs these two sub-steps on every key-value pair.
o The Merging step combines all key-value pairs which have the same key.
o The Sorting step takes the input from the Merging step and sorts all key-value pairs by their keys.
Finally, the Shuffle function returns a list of <Key, List<Value>> pairs, sorted by key, to the next step.
Reducing: In this phase, the output values from the shuffling phase are aggregated. For each key, the reduce function combines the list of values and returns a single output value. In short, this phase summarizes the complete dataset.
Let's understand this with an example. Consider that you have the following input data for your MapReduce program:
Welcome to Hadoop Class
Hadoop is good, Hadoop is bad
The final output of the MapReduce task is:
bad        1
Class      1
good       1
Hadoop     3
is         2
to         1
Welcome    1
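The same word-count flow can also be sketched in plain Python. This is only an illustrative simulation of the four phases, not actual Hadoop code; in a real job the map and reduce functions would run as distributed Hadoop tasks (for example via Hadoop Streaming).

    from collections import defaultdict

    # Input data, exactly as in the example above
    data = ["Welcome to Hadoop Class", "Hadoop is good, Hadoop is bad"]

    # 1. Splitting: each line becomes one input split
    splits = data

    # 2. Mapping: emit a <word, 1> pair for every word in a split
    def map_phase(split):
        for word in split.replace(",", "").split():
            yield (word, 1)

    mapped = [pair for split in splits for pair in map_phase(split)]

    # 3. Shuffling: merge pairs that share a key, then sort by key
    groups = defaultdict(list)
    for word, count in mapped:
        groups[word].append(count)
    shuffled = sorted(groups.items())        # list of <Key, List<Value>> pairs

    # 4. Reducing: aggregate each key's list of values into a single value
    result = [(word, sum(counts)) for word, counts in shuffled]
    print(result)

Python's default string sort is case-sensitive, so the printed order differs slightly from the table above, but the per-word counts are the same.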
2. VirtualBox: VirtualBox is open-source software for virtualizing the x86 computing architecture [x86 is the Intel CPU architecture; today, the term "x86" is used generally to refer to any 32-bit processor compatible with the x86 instruction set]. It acts as a hypervisor, creating a VM (virtual machine) in which the user can run another OS (operating system); a short automation sketch follows the guest OS list below.
The operating system in which VirtualBox runs is called the "host" OS. The operating system running in the VM is called the "guest" OS. VirtualBox supports Windows, Linux, and macOS as its host OS.
Guest operating systems supported by VirtualBox include:
o Windows 10, 8, 7, XP, Vista, 2000, NT, and 98.
o Solaris and OpenSolaris
o MS-DOS.
o OS/2
o QNX
o BeOS
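As a small illustration of VirtualBox acting as a hypervisor, the following Python sketch drives the VBoxManage command-line tool that ships with VirtualBox. It is a sketch only: it assumes VBoxManage is on the PATH, and the VM name "demo-guest" and its settings are placeholders.

    import subprocess

    def vbox(*args):
        # Run one VBoxManage sub-command and return its standard output
        return subprocess.run(["VBoxManage", *args], check=True,
                              capture_output=True, text=True).stdout

    # Create and register a new (empty) guest VM definition on the host
    vbox("createvm", "--name", "demo-guest", "--ostype", "Ubuntu_64", "--register")

    # Give the guest some memory and CPUs
    vbox("modifyvm", "demo-guest", "--memory", "2048", "--cpus", "2")

    # List all VMs registered with this VirtualBox host
    print(vbox("list", "vms"))

    # Boot the guest without opening a GUI window
    vbox("startvm", "demo-guest", "--type", "headless")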
3. Google App Engine (GAE): Google App Engine is a Platform-as-a-Service offering. Amongst Google's various cloud-based products, App Engine has become quite popular.
It is a service for developing and hosting web applications in Google's data centers, belonging to the Platform as a Service (PaaS) category of cloud computing. Applications are required to be written in one of a few supported languages, namely Java, Python, Go, PHP, etc. It is basically a cloud-computing platform through which applications can be run in a serverless environment. App Engine supports the delivery, testing and development of software on demand in a cloud computing environment that supports millions of users and is highly scalable.
The company extends its platform and infrastructure to the Cloud through its app engine. It
presents the platform to those who want to develop SaaS solutions at competitive costs.
Features of App Engine
A. Runtimes and Languages: You can use Go, Java, PHP or Python to write an App Engine application. You can develop and test an app locally using the SDK, which contains tools for deploying apps. Every language has its own SDK and runtime (a minimal Python example follows this list). Your code is executed in a:
• Java 7 environment by the Java runtime
• Python 2.7 environment by the Python runtime
• PHP 5.4 environment by the PHP runtime
• Go 1.2 environment by the Go runtime
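As an illustration, a minimal application for the Python 2.7 runtime might look like the sketch below. It is a sketch only: it assumes the webapp2 framework bundled with that runtime, the route and message are placeholders, and an accompanying app.yaml would declare the runtime and map URLs to this module.

    # main.py -- a minimal handler for the App Engine Python 2.7 runtime
    import webapp2

    class MainPage(webapp2.RequestHandler):
        def get(self):
            # Respond to GET / with a plain-text greeting
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.write('Hello from Google App Engine!')

    # app.yaml would declare "runtime: python27" and route "/.*" to this app
    app = webapp2.WSGIApplication([('/', MainPage)], debug=True)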
B. Generally Available Features: These are covered by the deprecation policy and the service-level agreement of App Engine. Any changes made to such a feature are backward-compatible, and the implementation of such a feature is usually stable. These features include data storage, retrieval, and search; communications; process management; computation; and app configuration and management.
C. Features in Preview: These features are expected to ultimately become generally available in a future release of App Engine. However, their implementation might change in backward-incompatible ways while they are in preview. They include Sockets, MapReduce and the Google Cloud Storage Client Library.
D. Experimental Features: These might or might not become generally available in app engine
releases in the future. The experimental features include Appstats Analytics, Restore / Backup /
Datastore Admin, Task Queue Tagging, MapReduce, Task Queue REST API, OAuth, Prospective
Search, PageSpeed and OpenID.
Advantages of Google App Engine: There are many advantages of Google App Engine that help take your app ideas to the next level. These include:
Infrastructure for Security: Around the world, the Internet infrastructure that Google has is probably the most secure. There has rarely been any unauthorized access to date, as application data and code are stored on highly secure servers.
Faster Time to Market: Quickly releasing a product or service to market is the most important thing for every business. Streamlining the development and maintenance of an app is critical when it comes to deploying the product fast. With the help of Google App Engine, a business can quickly develop:
• Feature-rich apps with a quick development process
• The backend application in a PaaS-style environment
• NoSQL-style storage, flexible data storage, or Google Cloud SQL for relational database support
Quick to Start: With no product or hardware to purchase and maintain, you can prototype and
deploy the app to your users without taking much time.
Easy to Use: Google App Engine (GAE) incorporates the tools that you need to develop, test,
launch, and update the applications.
Rich Set of APIs & Services:
Google App Engine has several built-in APIs and services that allow developers to build robust and feature-rich apps. These include the following (a short usage sketch follows the list):
• Access to the application log
• Blobstore, to serve large data objects
• Google Cloud Storage
• SSL support
• PageSpeed Service
• Google Cloud Endpoints, for mobile applications
• URL Fetch API, Users API, Memcache API, Channel API, XMPP API, Files API
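A hedged sketch of how a few of these bundled services are called from the Python 2.7 runtime follows. The helper names, cache lifetime and URL handling are illustrative assumptions, and the code presumes the App Engine SDK's google.appengine.api package is available.

    # Assumes the service libraries bundled with the App Engine Python 2.7 SDK
    from google.appengine.api import memcache, urlfetch, users

    def cached_fetch(url):
        # Memcache API: reuse a previously fetched page for ten minutes
        page = memcache.get(url)
        if page is None:
            page = urlfetch.fetch(url).content      # URL Fetch API
            memcache.set(url, page, time=600)
        return page

    def greet_current_user():
        # Users API: identify the signed-in Google account, if any
        user = users.get_current_user()
        return 'Hello, %s' % user.nickname() if user else 'Hello, guest'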
Platform Independence: You can move all your data to another environment without any difficulty
as there are not many dependencies on the app engine platform.
Cost Savings: You don’t have to hire engineers to manage your servers or to do that yourself. You
can invest the money saved into other parts of your business.
Performance and Reliability: Google is among the leading global brands worldwide, so when you discuss performance and reliability you have to keep that in mind. Over the past 15 years, the company has set new benchmarks with the performance of its services and products. App Engine provides the same reliability and performance as any other Google product.
4. Programming Environment for GAE :
Build and deploy applications on a fully managed platform. Scale your applications seamlessly from
zero to planet scale without having to worry about managing the underlying infrastructure. With zero
server management and zero configuration deployments, developers can focus only on building
great applications without the management overhead. App Engine enables developers to stay more
productive and agile by supporting popular development languages and a wide range of developer
tools.
Open and familiar languages and tools: Quickly build and deploy applications using many of the
popular languages like Java, PHP, Node.js, Python, C#, .Net, Ruby, and Go or bring your own
language runtimes and frameworks if you choose.
Manage resources from the command line, debug source code in production, and run API backends
easily, using industry-leading tools such as Cloud SDK, Cloud Source Repositories, IntelliJ IDEA,
Visual Studio, and PowerShell.
Just add code: Focus just on writing code, without the worry of managing the underlying
infrastructure. With capabilities such as automatic scaling-up and scaling-down of your application
between zero and planet scale, fully managed patching and management of your servers, you can
offload all your infrastructure concerns to Google. Protect your applications from security threats
using App Engine firewall capabilities, identity and access management (IAM) rules, and managed
SSL/ TLS certificates.
Pay only for what you use: Choose to run your applications in a serverless environment without
the worry of over or under provisioning. App Engine automatically scales depending on your
application traffic and consumes resources only when your code is running. You will only need to
pay for the resources you consume.
Features:
Popular languages
Open and flexible
Fully managed
Monitoring, logging, and diagnostics
Application versioning
Traffic splitting
Application security
Services ecosystem
5. Open Stack:
OpenStack is an open source cloud computing platform that allows businesses to control large pools
of compute, storage and networking in a data centre. It uses pooled virtual resources to build and
manage private and public clouds.
So OpenStack is an Infrastructure-as-a-Service (IaaS) solution that consists of a set of interrelated services. OpenStack is highly configurable: there are many different ways to use it, which makes it a flexible tool that is able to work along with other software.
Another reason to adopt OpenStack is that it supports different hypervisors (Xen, VMware or
kernel-based virtual machine [KVM] for instance) and several virtualization technologies (such as
bare metal or high-performance computing).
OpenStack components: The OpenStack cloud platform is not a single thing, but a group of
software modules that serve different purposes. OpenStack components are shaped by open source
contributions from the developer community, and adopters can implement some or all of these
components. Key OpenStack components, by category, include:
o Compute- “Nova” is a full management and access tool for OpenStack compute resources, handling scheduling, creation, and deletion (see the Python sketch after this list).
o Storage- “Swift” an object storage service;
o Networking and content delivery- “Neutron” connects the networks across other
OpenStack services.
o Data and analytics- “Searchlight” a data indexing and search service;
o Security and compliance- “Barbican” a management service for passwords, encryption
keys and X.509 Certificates;
o Deployment- “Kolla” a service for container deployment;
o Management- “Rally” an OpenStack benchmark service;
o Applications- “Solum” a software development tool;
o Monitoring- “Monasca” a high-speed metrics monitoring and alerting service;
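As a hedged illustration of how the Compute (Nova) component is driven programmatically, the sketch below uses the openstacksdk Python library. The cloud name "mycloud" and the image, flavor and network IDs are placeholders that would come from your own OpenStack deployment.

    import openstack

    # Credentials are read from a clouds.yaml entry named "mycloud" (placeholder)
    conn = openstack.connect(cloud="mycloud")

    # Nova: boot a new server from an existing image and flavor (IDs are placeholders)
    server = conn.compute.create_server(
        name="demo-instance",
        image_id="IMAGE_ID",
        flavor_id="FLAVOR_ID",
        networks=[{"uuid": "NETWORK_ID"}],
    )
    server = conn.compute.wait_for_server(server)

    # Nova also handles listing and deleting compute resources
    for s in conn.compute.servers():
        print(s.name, s.status)
    conn.compute.delete_server(server)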
OpenStack pros and cons:
► Avoid vendor lock-in - Vendor lock-in makes a customer dependent on a vendor for products and services, unable to use another vendor without substantial switching costs. The most common vendor lock-in is the operating system: when custom programs are written for a specific operating system, it is time-consuming and costly to convert those programs to another platform.
► Strong security - it has outstanding security features that keep you secure all the time.
► Open-source - OpenStack is open source, which makes it a favourite cloud software among developers and entrepreneurs. You can change OpenStack according to your growing needs and, because it is open source, you can always add extra features; thus it becomes very flexible software. You can also use it without any restrictions: OpenStack is free of cost, and there are no restrictions on its use.
► Development support - OpenStack has been receiving concrete development support from many prestigious companies and from the top developers of the IT industry for many years.
► An array of services for different tasks.
► Easy to access and manage OpenStack.
But potential enterprise adopters must also consider some drawbacks.
Perhaps the biggest disadvantage of OpenStack is its very size and scope -- such complexity
requires an IT staff to have significant knowledge to deploy the platform and make it work. In some
cases, an organization might require additional staff or a consulting firm to deploy OpenStack, which
adds time and cost.
As open source software, OpenStack is not owned or directed by any one vendor or team.
This can make it difficult to obtain support for the technology -- other than support from the open
source community.
6. Federation in the cloud:
A cloud federation is the deployment and management of multiple external and internal cloud
computing services to match business needs. It means that the functions and resources of two
geographically different clouds are completely available to each other. A federation is the union of
several smaller parts that perform a common action.
Consistency and access controls are managed when two or more independent geographically
distributed clouds share authentication, files, computing resources, control structures or access to
storage resources. This means that the right information must flow from one cloud to the other and
vice-versa.
There are four basic types of federation: 1) Permissive 2) Verified 3) Encrypted 4) Trusted
What happens in a Federated Cloud?
In a federated cloud, the boundary between two clouds is always present. But, the elements of the
boundary which prevent the interoperability of two clouds are removed. The relevancy and visibility
depend on who is doing what kind of action to complete a task.
CLOUD FEDERATION BENEFITS:
1) The federation of cloud resources allows a client to optimize enterprise IT service delivery.
2) The federation of cloud resources allows a client to choose the best cloud service providers, in terms of flexibility, cost and availability of services, to meet a particular business or technological need within their organization.
3) Federation across different cloud resource pools allows applications to run in the most appropriate infrastructure environments.
4) The federation of cloud resources allows an enterprise to distribute workloads around the globe, move data between disparate networks and implement innovative security models for user access to cloud resources.
6.1 Level of federations: Each cloud federation level presents different challenges and operates at
a different layer of the IT stack. It then requires the use of different approaches and technologies.
Taken together, the solutions to the challenges faced at each of these levels constitute a reference
model for a cloud federation.
CONCEPTUAL LEVEL: The conceptual level addresses the challenges in presenting a cloud federation as a favorable solution with respect to the use of services leased by single cloud providers. At this level it is important to clearly identify the advantages, for both service providers and service consumers, of joining a federation, and to describe the new opportunities that a federated environment creates with respect to the single-provider solution.
Elements of concern at this level are:
• Motivations for cloud providers to join a federation.
• Motivations for service consumers to leverage a federation.
• Advantages for providers in leasing their services to other providers.
• Obligations of providers once they have joined the federation.
• Trust agreements between providers.
• Transparency toward consumers.
LOGICAL & OPERATIONAL LEVEL: The logical and operational level of a federated cloud
identifies and addresses the challenges in devising a framework that enables the aggregation of
providers that belong to different administrative domains within a context of a single overlay
infrastructure, which is the cloud federation.
At this level, policies and rules for interoperation are defined. Moreover, this is the layer at which
decisions are made as to how and when to lease a service to—or to leverage a service from—
another provider.
The logical component defines a context in which agreements among providers are settled and
services are negotiated, whereas the operational component characterizes and shapes the dynamic
behaviour of the federation as a result of the single providers’ choices.
This is the level where market-oriented cloud computing (MOCC) is implemented and realized. It is important at this level to address the following challenges:
• How should a federation be represented?
• How should we model and represent a cloud service, a cloud provider, or an agreement?
• How should we define the rules and policies that allow providers to join a federation?
• What are the mechanisms in place for settling agreements among providers?
• What are providers' responsibilities with respect to each other?
• When should providers and consumers take advantage of the federation?
• Which kinds of services are more likely to be leased or bought?
• How should we price resources that are leased, and which fraction of resources should we lease?
The logical and operational level provides opportunities for both academia and industry.
INFRASTRUCTURE LEVEL: The infrastructural level addresses the technical challenges involved in
enabling heterogeneous cloud computing systems to interoperate seamlessly. It deals with the
technology barriers that keep separate cloud computing systems belonging to different
administrative domains. By having standardized protocols and interfaces, these barriers can be
overcome.
At this level it is important to address the following issues:
• What kind of standards should be used?
• How should interfaces and protocols be designed for interoperation?
• Which are the technologies to use for interoperation?
• How can we realize a software system, design platform components, and services enabling
interoperability?
6.2 Federated Services and Applications:
Active Directory Federation Services (AD FS) enables federated identity and access management by securely sharing digital identity and entitlement rights across security and enterprise boundaries. In ADFS, an identity federation is constructed between two organizations. On one side is the federation server, which authenticates the user through standard accepted means using Active Directory and issues tokens containing the user's claims. On the other side is the resource. Federation services
validate this token and accept the claimed identity. This allows the federation to provide a user with
access to resources that essentially belong to another secure server.
It provides a secure, reliable, scalable, and extensible identity federation solution.
6.2.1 ADFS Functionality: ADFS 2.0 uses a true claims-based approach to authentication, authorization, and federation. ADFS takes a standards-based approach to implementing functionality, which allows greater interoperability with other token services and claims-based identity providers (IdPs).
• Claims-Based Authentication Clients: ADFS provides full claims-based authentication (CBA) functionality by supporting both active and passive clients. Passive clients are generally used in website-based activities; most web browsers have built-in passive CBA client functionality. Active clients are a little different; they are mostly used with web services. Active CBA clients are usually developed using the Windows Identity Foundation framework.
• Security Assertion Markup Language (SAML): In order to provide standard token support, ADFS supports the use of SAML. This allows it to be compatible with a wide range of federation technologies. It can interoperate with virtually any implementation that adheres to the SAML standard.
• Federation with Other Secure Token Servers: ADFS supports federation with other Secure Token Servers (STSs). This allows you to trust tokens that were generated by another issuer. The federation server will then perform a token transformation: it will pull the claims from the incoming token and use them to create tokens of its own. The new token can then be used by relying parties that trust your STS.
6.2.2 ADFS Components: An ADFS 2.0 implementation includes several key components, each of which plays a different role in providing the total solution. These components, covered in turn below, include the federation servers, the attribute store, relying parties, and endpoints; a toy sketch of how they interact follows the list.
• Federation Service: The Federation Service is one of the key components of an ADFS environment and serves several purposes. The federation server is the server that manages the tokens; basically, it is the server where the STS is installed. The Federation Service manages the trust relationships with the relying parties and with other IdPs. The federation server can be configured using the Federation Server Configuration Wizard.
• Federation Proxy Servers: Federation Proxy Servers allow external users access to your internal ADFS environment. A Federation Proxy Server can be installed in your DMZ (a demilitarized zone (DMZ) refers to a host or network that acts as a secure, intermediate network or path between an organization's internal network and the external network). External users will authenticate against the proxy, and the proxy will forward the requests to your internal Federation Server. This allows you to authenticate external users without having to let unauthenticated traffic into your internal network.
• Attribute Stores: The attribute store is where the values used for the claims are stored. After authentication, the STS will query the attribute store to find the appropriate user information needed to set the claims and create the token.
• Relying Parties: The relying party is the consumer of the claims created by the STS. Since ADFS supports both active and passive clients, the relying parties can be web applications or web services. The STS must be configured with the configuration information for each relying party that it will support.
• Endpoints: Endpoints are used to provide access to services on the federation server. There are several types of endpoints that can be used with ADFS, including WS-Trust 1.3, WS-Trust 2005, WS-Federation Passive, SAML SSO, Federation Metadata, SAML Artifact Resolution, and WS-Trust WSDL.
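The claims flow described above can be pictured with a toy Python sketch. This is not ADFS code and not a real token format; it only illustrates, under simplified assumptions, how an STS looks up values in an attribute store, issues a token of claims, and how a relying party accepts the claimed identity only when it trusts the issuing STS. All names and URLs are placeholders.

    # Toy illustration of the claims-based flow (not real ADFS or SAML code)
    attribute_store = {                       # stands in for the attribute store
        "alice": {"email": "alice@example.org", "role": "Manager"},
    }

    def issue_token(username, issuer="https://sts.example.org"):
        # STS: after authenticating the user (assumed done), look up attributes
        # in the attribute store and emit them as claims inside a token
        claims = attribute_store[username]
        return {"issuer": issuer, "subject": username, "claims": claims}

    def relying_party(token, trusted_issuers=("https://sts.example.org",)):
        # Relying party: accept the claimed identity only from a trusted STS
        if token["issuer"] not in trusted_issuers:
            raise PermissionError("token issued by an untrusted STS")
        return "access granted to %s (%s)" % (token["subject"],
                                              token["claims"]["role"])

    print(relying_party(issue_token("alice")))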
6.2.3 Future of federation: Cloud federation remains an open issue in the current cloud market. Cloud federation would address many existing limitations in cloud computing:
• Cloud end-users are often tied to a unique cloud provider because of the different APIs, image formats, and access methods exposed by different providers, which make it very difficult for an average user to move their applications from one cloud to another, leading to a vendor lock-in problem.
• Many big companies (e.g. banks, hosting companies, etc.) and also many large institutions maintain several distributed data centers or server farms, for example to serve multiple geographically distributed offices. Resources and networks in these distributed data centers are usually configured as non-cooperative, separate elements, so that every single service or workload is usually deployed in a single site or replicated across multiple sites.
• Many educational and research centers deploy their own computing infrastructures, which usually do not cooperate with other institutions except in specific situations (e.g. in joint projects or initiatives). Many times, even different departments within the same institution maintain their own non-cooperative infrastructures.
This Study Group will evaluate the main challenges to enable the provision of federated cloud
infrastructures, with special emphasis on inter-cloud networking and security issues:
-Security and Privacy
-Interoperability and Portability
-Performance and Networking Cost