COLLECTIONS FRAMEWORK
PROGRAMMAZIONE CONCORRENTE E DISTR.
Università degli Studi di Padova
Dipartimento di Matematica
Corso di Laurea in Informatica, A.A. 2015 – 2016
rcardin@math.unipd.it
Programmazione concorrente e distribuita
SUMMARY
 Introduction
 Collections and iterators
 Linked list
 Array list
 Hash set
 Tree set
 Maps
 Collections framework
2Riccardo Cardin
Programmazione concorrente e distribuita
INTRODUCTION
 Data structures can make a big difference in
programming
 Do you need to search quickly? Do you need to rapidly
insert and remove element? ...
 The first version of Java was supplied with a very
small set of classes
 Vector, Stack, Hashtable, BitSet and Enumeration
 Java needed a serious data structure library
 Without the complexity of C++ STL
 With the benefit of «generic algorithms» of C++ STL
 The collections framework satisfies those needs
3Riccardo Cardin
Programmazione concorrente e distribuita
INTRODUCTION
 Java collection framework separates interfaces
and implementations
 It defines a set of interfaces that represents abstract
data structures
 Each interface is then implemented in different ways,
each with main focus on some aspects
 For example, fast insertion, fast search, fixed memory
consumption, ...
 Using interfaces, you can change the implementation
simply modifying a single statement
4Riccardo Cardin
// ArrayList optimizes random access, LinkedList modification stmts
List<Integer> list = new ArrayList<>();
list = new LinkedList<>()
Programmazione concorrente e distribuita
INTRODUCTION
5Riccardo Cardin
Programmazione concorrente e distribuita
COLLECTIONS AND ITERATORS
 The fundamental interface for collection is the
Collection interface
 The add method adds an element ot the collection
 Returns true if the collection was changed by the addition
 Iterators are used visit elements in the collection
 Implements the Iterator design pattern
6Riccardo Cardin
public interface Collection<E> {
boolean add(E element);
Iterator<E> iterator();
// ...
}
public interface Iterator<E> {
E next();
boolean hasNext();
void remove();
}
Programmazione concorrente e distribuita
COLLECTIONS AND ITERATORS
 Using iterators we have decoupled a collection
from the traversing policies
 The order in which elements are visited depends on
the the collection type
 Using the next method the collection is visited one
element at time
 If the end of the collection is reached an
NoSuchElementException is thrown
 Use the hasNext method to check if the collection
has more elements to visit
 Think of Java iterators as being between elements
 The iterator jumps over the next element
7Riccardo Cardin
Programmazione concorrente e distribuita
COLLECTIONS AND ITERATORS
8Riccardo Cardin
Think of Java iterators as being
between elements
While moving, the iterator returns
the current element
Programmazione concorrente e distribuita
COLLECTIONS AND ITERATORS
 Using next and hasNext it is possibile to
traverse the collection
 As of Java 5, there is an elegant shortcut to looping a
collection using iterators
 To use the for each loop the data structure must
implement Iterable<E> interface
9Riccardo Cardin
for (String element : c) {
// do something with element
}
Collection<String> c = /* ... */;
Iterator<String> iter = c.iterator();
while (iter.hasNext()) {
String element = iter.next();
// do something with element
}
Programmazione concorrente e distribuita
COLLECTIONS AND ITERATORS
 Using an iterator it possibile to remove elements
from a collection
 The remove method removes the element that was
returned by the last call to next
 There is a strong dependency between next and remove: it
is illegal to call remove if it wasn’t preceded by a call to next
 This is the only way to safely modify a collection after
the creation of an iterator
10Riccardo Cardin
for (Iterator<String> iterator = list.iterator();
iterator.hasNext();) {
// Point the iterator to the current element
String string = iterator.next();
iterator.remove();
}
Programmazione concorrente e distribuita
COLLECTIONS AND ITERATORS
11Riccardo Cardin
Programmazione concorrente e distribuita
CONCRETE COLLECTIONS
 All concrete collections implement Collection
or Map interfaces
 We will introduce only few of them
12Riccardo Cardin
Collection type Description
ArrayList An indexed sequence that grows and
shrinks dynamically
LinkedList An ordered sequence that allows efficient
insertion and removal at any location
HashSet An unordered collection that rejects
duplicates
TreeSet A sorted set
HashMap A data structure that stores key/value
associations
Programmazione concorrente e distribuita
LINKED LIST
 A LinkedList<E> is an ordered data structure
that stores each object in a link
 Each link store a reference to the next link of the seq.
 In Java, a linked list is always doubly linked
13Riccardo Cardin
Implements List<E>
interface
Programmazione concorrente e distribuita
LINKED LIST
 Very efficient for remove and add operations
 These ops are made through an Iterator
 Other elements in the list have not to be repositioned after
removal of an element
14Riccardo Cardin
Programmazione concorrente e distribuita
LINKED LIST
 Also di add operation is made efficiently
through an iterator of type ListIterator<E>
 Use List.listIterator method to get one
 New element is added before the current position of
the iterator
 Be carefull of concurrent modification using iterators
 Linked list are very inefficient in random access
15Riccardo Cardin
interface ListIterator<E> extends Iterator<E> {
void add(E element);
E previous()
boolean hasPrevious()
}
for (int i = 0; i < list.size(); i++)
// do something with list.get(i);
Programmazione concorrente e distribuita
LINKED LIST
16Riccardo Cardin
Adding a new element
changes at most two
references
Programmazione concorrente e distribuita
ARRAY LIST
 An ArrayList<E> is an ordered data structure
that is very efficient in random access ops.
 It encapsulate a dynamically reallocated array of
objects
 Adding and removing operation
are not so efficient
 Reallocation of elements is needed
 Use ArrayList instead of Vector
 More efficient due to its not
synchronized methods
17Riccardo Cardin
Programmazione concorrente e distribuita
ARRAY LIST
18Riccardo Cardin
Programmazione concorrente e distribuita
HASH SET
 A Set<E> is a data structure that doesn’t care
about element’s ordering
 A set contain the same element only once
 Search operation performs very efficiently
 A set is a Collection
 An HashSet<E> uses hash codes to distinguish
among elements
 An hashCode is a number that can be derived from
object data
 You must provide an hash function to your classes
 The function must be compatible with the equals method
19Riccardo Cardin
Programmazione concorrente e distribuita
HASH SET
 Hash codes have to be computed quickly
 Hash functions have some hash collision
 Using the correct hash function the number of collision
should be unlikely
 Hash table are implemented as an array of linked lists
20Riccardo Cardin
Using an hash function, the
bucket in which inserting a new
element is equal to
hash(element) % #bucket
If the hash function produceds values
that are randomly distributed, collision
should be rare
buckets
Programmazione concorrente e distribuita
HASH SET
 An HashSet is implemented using an hash table
 The contains method is very efficient, because it has
to lookup the element only in one bucket
 An iterator to hash set visit each bucket in turn
 Because of scattering, they are visited in a seemingly random
order
 The add method adds an element if it is not already
present
 Don’t mutate an element in a set once inserted
 If the hash code of an element were to change, the
element would no longer be in the correct position
21Riccardo Cardin
Programmazione concorrente e distribuita
TREE SET
 A TreeSet<E> is a sorted set
 While iterating over the collection, the elements are
presented in sorted order
 It uses a red-black tree to store data
 Insertion is slower than insertion in an hash table, but it is
still much faster than insertion in an array or linked list
 ...but a tree set automatically sorts the elements ;)
 The type of the elements may implement
Comparable<T>
22Riccardo Cardin
public interface Comparable<T> {
// It returns a value that is < 0, = 0 or > 0
int compareTo(T other);
}
Programmazione concorrente e distribuita
TREE SET
 What if elements do not implement Comparable
or if you need more than on compation alg?
 Provide a Comparator during set construction
 The compare method acts like the compareTo method
 Function object (lambda anyone?!)
 Using an HashSet or a TreeSet?
 There must be a total ordering defined on elements
 You have to implement also the Comparator interface
 Do you neeed elements to be sorted?
23Riccardo Cardin
public interface Comparator<T> {
// Defines how two elements of type T have to be compared
int compare(T a, T b);
}
Programmazione concorrente e distribuita
TREE SET
24Riccardo Cardin
Programmazione concorrente e distribuita
MAPS
 A Map<K,V> is a data structure that allows you
to search an element using a key
 A map stores key/value pairs
 Key must be unique: If you call the put method twice with
the same key, the second value replaces the first one.
 If no info is associated with a key, get returns null
 In sets to find an element you must have a copy of it
 An HashMap hashes the keys
25Riccardo Cardin
// HashMap implements Map
Map<String, Employee> staff = new HashMap<>();
Employee harry = new Employee("Harry Hacker");
staff.put("987-98-9996", harry);
String s = "987-98-9996";
e = staff.get(s); // gets harry
Programmazione concorrente e distribuita
MAPS
 The collection framework does not consider a
map itself as a Collection
 It is possible to obtain views of the map
 If you are interested in all the values of a map, loop over the
enumeration of its entries
 Iterators on views cannot add elements to the map
 An UnsupportedOperationException is thrown
26Riccardo Cardin
Set<K> keySet() // Set of keys
Collection<K> values() // Collection of values
Set<Map.Entry<K, V>> entrySet() // Set of pairs (key,value)
for (Map.Entry<String, Employee> entry : staff.entrySet()) {
String key = entry.getKey();
Employee value = entry.getValue();
// do something with key, value
}
Programmazione concorrente e distribuita
THE FRAMEWORK
 A framework is a set of classes that form the
basis for building advanced functionality
 The Collection framework defines classes to
implement collections
 The main interfaces are Collection and Map
 Insertion interfaces are different between the two types
 To get the elements from a Collection, just iterate over it
 To get a value from a Map, use the get method
27Riccardo Cardin
// To insert an element in a Collection
boolean add(E element)
// To store a key/value pair in a Map
V put(K key, V value)
V get(K key)
Programmazione concorrente e distribuita
THE FRAMEWORK
 A List is an ordered collection
 There is the concept of position of an element
 A list provides random access methods
 Lists provides a specialized iterator, ListIterator
 A Set is a Collection with no duplicates
 The add method can reject a value if already present
 Methods equals and hashCode are used to maintain
elements inside the set
28Riccardo Cardin
void add(int index, E element)
E get(int index)
void remove(int index)
void add(E element)
Programmazione concorrente e distribuita
THE FRAMEWORK
 The framework provides some wrappers
29Riccardo Cardin
Programmazione concorrente e distribuita
THE FRAMEWORK
 The framework have some companion objects
 The Arrays type allows to trasform arrays into List
 The list returned is a view on the array: it is not possible to
change the size of the list; elements are the same
 The Collections type have a bunch of utilities
 The method nCopies builds an illusory immutable list
 Object is stored only once
 Method singleton returns an illusory set with one element
 And so on...
30Riccardo Cardin
static <T> List<T> asList(T... a)
List<String> settings = Collections.nCopies(100, "DEFAULT");
Collections.singleton(anObject)
Programmazione concorrente e distribuita
THE FRAMEWORK
31Riccardo Cardin
Programmazione concorrente e distribuita
EXAMPLES
32Riccardo Cardin
https://github.com/rcardin/pcd-snippets
Programmazione concorrente e distribuita
REFERENCES
 Chap. 13 «Collections», Core Java Volume I - Fundamentals, Cay
Horstmann, Gary Cornell, 2012, Prentice Hall
33Riccardo Cardin

Java - Collections framework

  • 1.
    COLLECTIONS FRAMEWORK PROGRAMMAZIONE CONCORRENTEE DISTR. Università degli Studi di Padova Dipartimento di Matematica Corso di Laurea in Informatica, A.A. 2015 – 2016 rcardin@math.unipd.it
  • 2.
    Programmazione concorrente edistribuita SUMMARY  Introduction  Collections and iterators  Linked list  Array list  Hash set  Tree set  Maps  Collections framework 2Riccardo Cardin
  • 3.
    Programmazione concorrente edistribuita INTRODUCTION  Data structures can make a big difference in programming  Do you need to search quickly? Do you need to rapidly insert and remove element? ...  The first version of Java was supplied with a very small set of classes  Vector, Stack, Hashtable, BitSet and Enumeration  Java needed a serious data structure library  Without the complexity of C++ STL  With the benefit of «generic algorithms» of C++ STL  The collections framework satisfies those needs 3Riccardo Cardin
  • 4.
    Programmazione concorrente edistribuita INTRODUCTION  Java collection framework separates interfaces and implementations  It defines a set of interfaces that represents abstract data structures  Each interface is then implemented in different ways, each with main focus on some aspects  For example, fast insertion, fast search, fixed memory consumption, ...  Using interfaces, you can change the implementation simply modifying a single statement 4Riccardo Cardin // ArrayList optimizes random access, LinkedList modification stmts List<Integer> list = new ArrayList<>(); list = new LinkedList<>()
  • 5.
    Programmazione concorrente edistribuita INTRODUCTION 5Riccardo Cardin
  • 6.
    Programmazione concorrente edistribuita COLLECTIONS AND ITERATORS  The fundamental interface for collection is the Collection interface  The add method adds an element ot the collection  Returns true if the collection was changed by the addition  Iterators are used visit elements in the collection  Implements the Iterator design pattern 6Riccardo Cardin public interface Collection<E> { boolean add(E element); Iterator<E> iterator(); // ... } public interface Iterator<E> { E next(); boolean hasNext(); void remove(); }
  • 7.
    Programmazione concorrente edistribuita COLLECTIONS AND ITERATORS  Using iterators we have decoupled a collection from the traversing policies  The order in which elements are visited depends on the the collection type  Using the next method the collection is visited one element at time  If the end of the collection is reached an NoSuchElementException is thrown  Use the hasNext method to check if the collection has more elements to visit  Think of Java iterators as being between elements  The iterator jumps over the next element 7Riccardo Cardin
  • 8.
    Programmazione concorrente edistribuita COLLECTIONS AND ITERATORS 8Riccardo Cardin Think of Java iterators as being between elements While moving, the iterator returns the current element
  • 9.
    Programmazione concorrente edistribuita COLLECTIONS AND ITERATORS  Using next and hasNext it is possibile to traverse the collection  As of Java 5, there is an elegant shortcut to looping a collection using iterators  To use the for each loop the data structure must implement Iterable<E> interface 9Riccardo Cardin for (String element : c) { // do something with element } Collection<String> c = /* ... */; Iterator<String> iter = c.iterator(); while (iter.hasNext()) { String element = iter.next(); // do something with element }
  • 10.
    Programmazione concorrente edistribuita COLLECTIONS AND ITERATORS  Using an iterator it possibile to remove elements from a collection  The remove method removes the element that was returned by the last call to next  There is a strong dependency between next and remove: it is illegal to call remove if it wasn’t preceded by a call to next  This is the only way to safely modify a collection after the creation of an iterator 10Riccardo Cardin for (Iterator<String> iterator = list.iterator(); iterator.hasNext();) { // Point the iterator to the current element String string = iterator.next(); iterator.remove(); }
  • 11.
    Programmazione concorrente edistribuita COLLECTIONS AND ITERATORS 11Riccardo Cardin
  • 12.
    Programmazione concorrente edistribuita CONCRETE COLLECTIONS  All concrete collections implement Collection or Map interfaces  We will introduce only few of them 12Riccardo Cardin Collection type Description ArrayList An indexed sequence that grows and shrinks dynamically LinkedList An ordered sequence that allows efficient insertion and removal at any location HashSet An unordered collection that rejects duplicates TreeSet A sorted set HashMap A data structure that stores key/value associations
  • 13.
    Programmazione concorrente edistribuita LINKED LIST  A LinkedList<E> is an ordered data structure that stores each object in a link  Each link store a reference to the next link of the seq.  In Java, a linked list is always doubly linked 13Riccardo Cardin Implements List<E> interface
  • 14.
    Programmazione concorrente edistribuita LINKED LIST  Very efficient for remove and add operations  These ops are made through an Iterator  Other elements in the list have not to be repositioned after removal of an element 14Riccardo Cardin
  • 15.
    Programmazione concorrente edistribuita LINKED LIST  Also di add operation is made efficiently through an iterator of type ListIterator<E>  Use List.listIterator method to get one  New element is added before the current position of the iterator  Be carefull of concurrent modification using iterators  Linked list are very inefficient in random access 15Riccardo Cardin interface ListIterator<E> extends Iterator<E> { void add(E element); E previous() boolean hasPrevious() } for (int i = 0; i < list.size(); i++) // do something with list.get(i);
  • 16.
    Programmazione concorrente edistribuita LINKED LIST 16Riccardo Cardin Adding a new element changes at most two references
  • 17.
    Programmazione concorrente edistribuita ARRAY LIST  An ArrayList<E> is an ordered data structure that is very efficient in random access ops.  It encapsulate a dynamically reallocated array of objects  Adding and removing operation are not so efficient  Reallocation of elements is needed  Use ArrayList instead of Vector  More efficient due to its not synchronized methods 17Riccardo Cardin
  • 18.
    Programmazione concorrente edistribuita ARRAY LIST 18Riccardo Cardin
  • 19.
    Programmazione concorrente edistribuita HASH SET  A Set<E> is a data structure that doesn’t care about element’s ordering  A set contain the same element only once  Search operation performs very efficiently  A set is a Collection  An HashSet<E> uses hash codes to distinguish among elements  An hashCode is a number that can be derived from object data  You must provide an hash function to your classes  The function must be compatible with the equals method 19Riccardo Cardin
  • 20.
    Programmazione concorrente edistribuita HASH SET  Hash codes have to be computed quickly  Hash functions have some hash collision  Using the correct hash function the number of collision should be unlikely  Hash table are implemented as an array of linked lists 20Riccardo Cardin Using an hash function, the bucket in which inserting a new element is equal to hash(element) % #bucket If the hash function produceds values that are randomly distributed, collision should be rare buckets
  • 21.
    Programmazione concorrente edistribuita HASH SET  An HashSet is implemented using an hash table  The contains method is very efficient, because it has to lookup the element only in one bucket  An iterator to hash set visit each bucket in turn  Because of scattering, they are visited in a seemingly random order  The add method adds an element if it is not already present  Don’t mutate an element in a set once inserted  If the hash code of an element were to change, the element would no longer be in the correct position 21Riccardo Cardin
  • 22.
    Programmazione concorrente edistribuita TREE SET  A TreeSet<E> is a sorted set  While iterating over the collection, the elements are presented in sorted order  It uses a red-black tree to store data  Insertion is slower than insertion in an hash table, but it is still much faster than insertion in an array or linked list  ...but a tree set automatically sorts the elements ;)  The type of the elements may implement Comparable<T> 22Riccardo Cardin public interface Comparable<T> { // It returns a value that is < 0, = 0 or > 0 int compareTo(T other); }
  • 23.
    Programmazione concorrente edistribuita TREE SET  What if elements do not implement Comparable or if you need more than on compation alg?  Provide a Comparator during set construction  The compare method acts like the compareTo method  Function object (lambda anyone?!)  Using an HashSet or a TreeSet?  There must be a total ordering defined on elements  You have to implement also the Comparator interface  Do you neeed elements to be sorted? 23Riccardo Cardin public interface Comparator<T> { // Defines how two elements of type T have to be compared int compare(T a, T b); }
  • 24.
    Programmazione concorrente edistribuita TREE SET 24Riccardo Cardin
  • 25.
    Programmazione concorrente edistribuita MAPS  A Map<K,V> is a data structure that allows you to search an element using a key  A map stores key/value pairs  Key must be unique: If you call the put method twice with the same key, the second value replaces the first one.  If no info is associated with a key, get returns null  In sets to find an element you must have a copy of it  An HashMap hashes the keys 25Riccardo Cardin // HashMap implements Map Map<String, Employee> staff = new HashMap<>(); Employee harry = new Employee("Harry Hacker"); staff.put("987-98-9996", harry); String s = "987-98-9996"; e = staff.get(s); // gets harry
  • 26.
    Programmazione concorrente edistribuita MAPS  The collection framework does not consider a map itself as a Collection  It is possible to obtain views of the map  If you are interested in all the values of a map, loop over the enumeration of its entries  Iterators on views cannot add elements to the map  An UnsupportedOperationException is thrown 26Riccardo Cardin Set<K> keySet() // Set of keys Collection<K> values() // Collection of values Set<Map.Entry<K, V>> entrySet() // Set of pairs (key,value) for (Map.Entry<String, Employee> entry : staff.entrySet()) { String key = entry.getKey(); Employee value = entry.getValue(); // do something with key, value }
  • 27.
    Programmazione concorrente edistribuita THE FRAMEWORK  A framework is a set of classes that form the basis for building advanced functionality  The Collection framework defines classes to implement collections  The main interfaces are Collection and Map  Insertion interfaces are different between the two types  To get the elements from a Collection, just iterate over it  To get a value from a Map, use the get method 27Riccardo Cardin // To insert an element in a Collection boolean add(E element) // To store a key/value pair in a Map V put(K key, V value) V get(K key)
  • 28.
    Programmazione concorrente edistribuita THE FRAMEWORK  A List is an ordered collection  There is the concept of position of an element  A list provides random access methods  Lists provides a specialized iterator, ListIterator  A Set is a Collection with no duplicates  The add method can reject a value if already present  Methods equals and hashCode are used to maintain elements inside the set 28Riccardo Cardin void add(int index, E element) E get(int index) void remove(int index) void add(E element)
  • 29.
    Programmazione concorrente edistribuita THE FRAMEWORK  The framework provides some wrappers 29Riccardo Cardin
  • 30.
    Programmazione concorrente edistribuita THE FRAMEWORK  The framework have some companion objects  The Arrays type allows to trasform arrays into List  The list returned is a view on the array: it is not possible to change the size of the list; elements are the same  The Collections type have a bunch of utilities  The method nCopies builds an illusory immutable list  Object is stored only once  Method singleton returns an illusory set with one element  And so on... 30Riccardo Cardin static <T> List<T> asList(T... a) List<String> settings = Collections.nCopies(100, "DEFAULT"); Collections.singleton(anObject)
  • 31.
    Programmazione concorrente edistribuita THE FRAMEWORK 31Riccardo Cardin
  • 32.
    Programmazione concorrente edistribuita EXAMPLES 32Riccardo Cardin https://github.com/rcardin/pcd-snippets
  • 33.
    Programmazione concorrente edistribuita REFERENCES  Chap. 13 «Collections», Core Java Volume I - Fundamentals, Cay Horstmann, Gary Cornell, 2012, Prentice Hall 33Riccardo Cardin