@gamussa @hazelcast #oraclecode
IN-MEMORY ANALYTICS
with APACHE SPARK and
HAZELCAST
Who am I?
Solutions Architect, Developer Advocate
@gamussa in internetz
Please follow me on Twitter, I'm very interesting
What’s Apache Spark?
Lightning-Fast Cluster Computing
Run programs up to 100x faster than Hadoop MapReduce in memory,
or 10x faster on disk.
When to use Spark?
Data Science Tasks
when questions are unknown
Data Processing Tasks
when you have too much data
You’re tired of Hadoop
Spark Architecture
RDD
Resilient Distributed Datasets (RDD)
are the primary abstraction in Spark –
a fault-tolerant collection of elements that can be
operated on in parallel
RDD Operations
Two types of operations on RDDs:
transformations and actions
Transformations are lazy
(not computed immediately).
A transformed RDD is recomputed each time
an action is run on it (by default).
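The same lazy-then-forced pattern exists in plain Java streams, which makes a convenient local analogy (this is `java.util.stream`, not the Spark API): intermediate operations only build a pipeline, and a terminal operation, like a Spark action, forces it to run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyDemo {
    static List<String> run() {
        List<String> evaluated = new ArrayList<>();
        // Building the pipeline does no work yet, like an RDD transformation.
        Stream<String> pipeline = Stream.of("a", "b", "c")
                .map(s -> { evaluated.add(s); return s.toUpperCase(); });
        System.out.println("evaluated before terminal op: " + evaluated); // []
        // The terminal operation plays the role of a Spark action and
        // forces the whole pipeline to run.
        List<String> result = pipeline.collect(Collectors.toList());
        System.out.println("evaluated after: " + evaluated); // [a, b, c]
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [A, B, C]
    }
}
```

One difference worth noting: a Java stream can be consumed only once, while an RDD can be recomputed by every action run against it.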
RDD
Transformations
RDD
Actions
RDD
Fault Tolerance
RDD
Construction
Parallelized collections
take an existing Scala or Java collection
and run functions on it in parallel
Hadoop datasets
run functions on each record of a file in the Hadoop
distributed file system (HDFS) or any other storage
system supported by Hadoop
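As a plain-Java analogy for parallelized collections (a `java.util` parallel stream standing in for the cluster; this is not the Spark API, where you would call `JavaSparkContext.parallelize(...)`):

```java
import java.util.List;

public class ParallelDemo {
    // Square every element and sum the results, with the work spread
    // across local threads (where Spark spreads it across cluster nodes).
    static int squareSum(List<Integer> data) {
        return data.parallelStream().mapToInt(n -> n * n).sum();
    }

    public static void main(String[] args) {
        System.out.println(squareSum(List.of(1, 2, 3, 4, 5))); // 55
    }
}
```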
What’s Hazelcast IMDG?
The Fastest In-memory Data Grid
Hazelcast IMDG is an operational, in-memory,
distributed computing platform that manages data
using in-memory storage and performs parallel
execution for breakthrough application speed and scale.
High-Density Caching
In-Memory Data Grid
Web Session Clustering
Microservices Infrastructure
What’s Hazelcast IMDG?
In-memory Data Grid
Apache v2 Licensed
Distributed
Caches (IMap, JCache)
Java Collections (IList, ISet, IQueue)
Messaging (Topic, RingBuffer)
Computation (ExecutorService, MapReduce)
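A minimal embedded-member sketch of the IMap API (class names from the Hazelcast 3.x core API; the map name "movie" is illustrative):

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class ImdgDemo {
    static String demo() {
        // Start an embedded member; in production this would join
        // an existing cluster over the network.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        try {
            // IMap is a distributed, partitioned java.util.Map.
            IMap<Integer, String> movies = hz.getMap("movie");
            movies.put(1, "The Matrix");
            return movies.get(1);
        } finally {
            hz.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```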
[Diagram: primary and backup copies of each partition distributed across the cluster]
final SparkConf sparkConf = new SparkConf()
        .set("hazelcast.server.addresses", "localhost")
        .set("hazelcast.server.groupName", "dev")
        .set("hazelcast.server.groupPass", "dev-pass")
        .set("hazelcast.spark.readBatchSize", "5000")
        .set("hazelcast.spark.writeBatchSize", "5000")
        .set("hazelcast.spark.valueBatchingEnabled", "true");
final JavaSparkContext jsc =
        new JavaSparkContext("spark://localhost:7077", "app", sparkConf);
final HazelcastSparkContext hsc = new HazelcastSparkContext(jsc);
final HazelcastJavaRDD<Object, Object> mapRdd = hsc.fromHazelcastMap("movie");
final HazelcastJavaRDD<Object, Object> cacheRdd = hsc.fromHazelcastCache("my-cache");
Demo
LIMITATIONS
DATA SHOULD NOT BE
UPDATED WHILE READING
FROM SPARK
WHY?
MAP EXPANSION
SHUFFLES THE DATA
INSIDE THE BUCKET
CURSOR DOESN’T POINT TO
CORRECT ENTRY ANYMORE,
DUPLICATE OR MISSING
ENTRIES COULD OCCUR
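A plain `java.util.HashMap` shows the same rehash-during-iteration hazard, except that HashMap's fail-fast iterator at least detects it, while Hazelcast's cursor-based iteration can silently return duplicate or missing entries. A small sketch of the HashMap case:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class CursorDemo {
    static boolean iterateWhileWriting() {
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < 100; i++) map.put(i, i);
        try {
            for (Integer key : map.keySet()) {
                // A write during iteration is a structural modification and
                // may trigger a rehash, moving entries between buckets
                // while the iterator is mid-traversal.
                map.put(key + 1000, key);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            // HashMap fails fast; Hazelcast's cursor has no such check,
            // so it risks duplicate or missing entries instead.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("detected concurrent modification: " + iterateWhileWriting());
    }
}
```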
github.com/hazelcast/hazelcast-spark
THANKS!
Any questions?
You can find me at
@gamussa
viktor@hazelcast.com

[OracleCode SF] In-Memory Analytics with Apache Spark and Hazelcast