OQL querying and indexes with Apache Geode (incubating)

OQL
OQL is a SQL-like language with extended functionality for querying complex objects, object attributes, and methods.
Only a subset of the OQL features is supported.
Advantages of OQL:
● You can query on any arbitrary object
● You can navigate object collections
● You can invoke methods and access the behavior of objects
● You are not required to declare types. Since you do not need type definitions, you can work across multiple
languages
● You are not constrained by a schema

Commonly used Keywords
SELECT * or field projection
FROM "select * from /users"
WHERE "select * from /users where id = 0"
AND "select * from /users where id > 0 and age > 21"
OR "select * from /users where id != 0 or age < 21"
AS "select * from /users as u where u.id <> 0", "select * from /users u where u.id > 0"
COUNT "select count(*) from /users"
DISTINCT "select distinct(*) from /users", "select distinct(name) from /users"
IN "select * from /users u where u.id in set (0, 1, 2)",
"select * from /users u where u.id in (select id from /employees e)"
LIMIT "select * from /users u limit 5"
LIKE "select * from /users u where u.name like '%a%'"
NOT "select * from /users u where NOT (u.id = 2)"
ORDER BY "select * from /users u where u.name = 'Joe' order by u.id"
TO_DATE (parsed using SimpleDateFormat) to_date('05/09/10', 'yy/dd/yy'), to_date('050910', 'yyddMM')
That’s not all! More keywords and information can be found in the Geode Documentation
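Because TO_DATE patterns are parsed with SimpleDateFormat, a pattern can be checked in plain Java before it goes into a query. The sketch below uses only the JDK; the class and method names (ToDateSketch, toDate, yearMonthDay) are made up for illustration and are not part of Geode.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class ToDateSketch {
    // Parses dateStr with a SimpleDateFormat pattern (hypothetical helper, JDK only).
    static Date toDate(String dateStr, String formatStr) {
        try {
            return new SimpleDateFormat(formatStr, Locale.ROOT).parse(dateStr);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse " + dateStr + " as " + formatStr, e);
        }
    }

    // Breaks a Date into {year, month (1-12), day of month} in the default time zone.
    static int[] yearMonthDay(Date date) {
        Calendar cal = new GregorianCalendar();
        cal.setTime(date);
        return new int[] {
            cal.get(Calendar.YEAR), cal.get(Calendar.MONTH) + 1, cal.get(Calendar.DAY_OF_MONTH)
        };
    }

    public static void main(String[] args) {
        // Pattern 'yyddMM' reads "050910" as year 05, day 09, month 10.
        int[] ymd = yearMonthDay(toDate("050910", "yyddMM"));
        System.out.println(ymd[0] + "-" + ymd[1] + "-" + ymd[2]);
    }
}
```

Note that the order of the pattern letters, not the look of the string, decides which digits are the day and which are the month.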

Geode Specific Keywords
IS_DEFINED
● Query function. Returns TRUE if the expression does not evaluate to UNDEFINED.
IS_UNDEFINED
● Query function. Returns TRUE if the expression evaluates to UNDEFINED. In most queries, undefined values are
not included in the query results. The IS_UNDEFINED function allows undefined values to be included, so you can
identify elements with undefined values.

Geode Specific Keywords Continued
<trace> “<trace> select * from /users u where u.id = 0”
Example log output:
No Indexes used:
● [info 2015/05/26 10:25:35.102 PDT Server <main> tid=0x1] Query Executed in 9.619656 ms; rowCount =
99; indexesUsed(0) "select * from /users u where id > 0 and status='active'"
One index used:
● [info 2015/05/26 10:25:35.317 PDT Server <main> tid=0x1] Query Executed in 1.5342 ms; rowCount =
199; indexesUsed(1):sampleIndex-1(Results: 199) "select count * from /users u where u.id > 0"
When more than one index is used:
● [info 2015/05/26 10:25:35.673 PDT Server <main> tid=0x1] Query Executed in 2.43847 ms; rowCount =
199; indexesUsed(2):sampleIndex-2(Results: 100),sampleIndex-1(Results: 199) "select * from /users u
where u.id > 0 OR u.status='active'"
To trace all queries without adding <trace> to each one:
System.setProperty("gemfire.Query.VERBOSE", "true");
<hint 'indexName'> or <hint 'indexName1', 'indexName2'>
Example: "<hint 'nameIndex'> select * from /users u where u.name = 'Joe' and u.age > 10"

Query Bind Parameters
What
Similar to a SQL prepared statement
Parameters are written as '$' followed by a number, starting from 1
Examples:
String queryString = "SELECT DISTINCT * FROM /exampleRegion p WHERE p.status = $1 and p.symbol = $2";
...
Object[] params = {"sold", "abc"};
SelectResults results = (SelectResults) query.execute(params);
Possible Exceptions
QueryParameterCountInvalidException
TypeMismatchException
Bind region as a parameter
● Binding a region as a parameter requires the actual Region object, not its string name
"SELECT DISTINCT * FROM $1 p WHERE p.status = $2"

Field visibility and Method Invocation
The query engine first tries to read the value from a public field. If no public field is found, it makes a get call
using the field name with its first character uppercased (firstName becomes getFirstName()).
Examples:
SELECT DISTINCT * FROM /users u where u.firstName = 'Joe'
SELECT DISTINCT * FROM /users u where u.getFirstName() = 'Joe'
SELECT DISTINCT * FROM /users u where u.combineFullName() = 'Joe''s Full Name'
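The lookup order described above (public field first, then a getter named after the field) can be pictured with plain reflection. This is only an illustration of the rule, not Geode's actual implementation; AttributeResolutionSketch, its User class, and resolve are hypothetical names.

```java
class AttributeResolutionSketch {
    // Hypothetical domain class: one public field, one private field with a getter.
    public static class User {
        public String status = "active";
        private final String firstName = "Joe";

        public String getFirstName() {
            return firstName;
        }
    }

    // Resolves an attribute name: public field first, then get<Name>().
    static Object resolve(Object target, String name) {
        Class<?> type = target.getClass();
        try {
            try {
                return type.getField(name).get(target); // getField sees public fields only
            } catch (NoSuchFieldException noPublicField) {
                String getter = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
                return type.getMethod(getter).invoke(target);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot resolve attribute " + name, e);
        }
    }

    public static void main(String[] args) {
        User user = new User();
        System.out.println(resolve(user, "status"));    // read from the public field
        System.out.println(resolve(user, "firstName")); // read through getFirstName()
    }
}
```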

Type conversions
The Geode query engine implicitly performs the following conversions:
Binary Numeric Promotion
The query processor performs binary numeric promotion on the operands of the following operators:
● Operators <, <=, >, >=, =, and <>
1. If either operand is of type double, the other is converted to double
2. Otherwise, if either operand is of type float, the other is converted to float
3. Otherwise, if either operand is of type long, the other is converted to long
4. Otherwise, both operands are converted to type int
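The four promotion rules can be written out as a small Java sketch. It mirrors the rules as listed on the slide and makes no claim about how the query processor is coded internally; NumericPromotionSketch and its methods are hypothetical names.

```java
class NumericPromotionSketch {
    // Picks the comparison type by applying the four rules in order.
    static Class<?> promotedType(Number left, Number right) {
        if (left instanceof Double || right instanceof Double) return Double.class;
        if (left instanceof Float || right instanceof Float) return Float.class;
        if (left instanceof Long || right instanceof Long) return Long.class;
        return Integer.class; // everything else is compared as int
    }

    // Compares two numbers after converting both to the promoted type.
    static int compare(Number left, Number right) {
        Class<?> type = promotedType(left, right);
        if (type == Double.class) return Double.compare(left.doubleValue(), right.doubleValue());
        if (type == Float.class) return Float.compare(left.floatValue(), right.floatValue());
        if (type == Long.class) return Long.compare(left.longValue(), right.longValue());
        return Integer.compare(left.intValue(), right.intValue());
    }

    public static void main(String[] args) {
        System.out.println(promotedType(1, 2.0).getSimpleName()); // int vs double
        System.out.println(compare(3, 3L));                       // int vs long
    }
}
```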
Temporal Type Conversion
java.util.Date, java.sql.Date, java.sql.Time, and java.sql.Timestamp are all treated the same; when compared with
each other, they are compared as nanosecond quantities
Enum conversions are not done implicitly; a toString() call is needed
Query Evaluation of Float.NaN and Double.NaN
Float.NaN and Double.NaN are not evaluated as primitives; instead, they are compared in the same manner as
the JDK methods Float.compareTo and Double.compareTo
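The difference between primitive NaN comparison and the JDK compareTo methods is easy to see in plain Java. The sketch below shows JDK behavior only; NaNComparisonSketch and its helper names are made up for illustration.

```java
class NaNComparisonSketch {
    // Primitive comparison: NaN is never equal to anything, including itself.
    static boolean primitiveEquals(double a, double b) {
        return a == b;
    }

    // compareTo-style comparison: NaN equals NaN and sorts above every other value.
    static int compareLikeJdk(double a, double b) {
        return Double.valueOf(a).compareTo(b);
    }

    public static void main(String[] args) {
        System.out.println(primitiveEquals(Double.NaN, Double.NaN));
        System.out.println(compareLikeJdk(Double.NaN, Double.NaN));
        System.out.println(compareLikeJdk(Double.NaN, Double.POSITIVE_INFINITY));
        System.out.println(Float.valueOf(Float.NaN).compareTo(Float.NaN));
    }
}
```

Under compareTo semantics a filter such as "= NaN" can match, and NaN sorts after every other value, which would not be the case with primitive comparison.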

Query a Partitioned Region
Operations summary:
1.) “Coordinating” node calculates where all data resides
2.) Creates and executes tasks to query data on remote nodes
a.) Each node will execute the query, using any indexes the node currently has
3.) Executes query on local node
4.) On failure, recalculates where the data from the failed nodes now resides
5.) Re-executes the failed tasks on the remote nodes where that data now resides
6.) Combines the results and returns them

Query Monitor
Query Timeout
Set the system property gemfire.Cache.MAX_QUERY_EXECUTION_TIME (disabled by default, with a value of -1)
ResourceManager - Monitoring Queries for Low Memory
Helps prevent out of memory exceptions when querying or creating indexes.
This feature is automatically enabled when you set a critical-heap-percentage attribute for the resource-manager
element in cache.xml or by using cache.getResourceManager().setCriticalHeapPercentage(float heapPercentage) API.
When this is enabled, the query timeout defaults to 5 hours if one has not been set explicitly.
Queries will be cancelled with QueryExecutionLowMemoryException; index creation will fail with InvalidIndexException
Set the system property - gemfire.cache.DISABLE_QUERY_MONITOR_FOR_LOW_MEMORY to true to disable.
Partitioned Region Queries and Low Memory
Partitioned region queries are likely causes for out-of-memory exceptions. If query monitoring is enabled, partitioned
region queries drop or ignore results that are being gathered by other servers if the executing server is low in
memory.

Indexing
Why use an index?
● Significantly improves query speed
● The query no longer has to iterate through the entire region when a matching index can be used
Additional Info:
● Indexed fields must implement Comparable
● Provides a simple way to index fields, nested object fields, nested collections of objects/fields, and nested maps
Types:
● Functional Index
● Functional (Compact) Index
● Map index
● Hash Index
● Primary Key Index

Functional Index
A sorted index, internally represented as a tuple and copy of the value
How to create
qs.createIndex("indexName", "d.name", "/users u, u.dependents d"); // (List or Set)
qs.createIndex("indexName", "d.name", "/users u, u.dependents.values d"); // (Map)
Representation
Key    | Values
Sonny  | Collection: [(User:Joe, Sonny)]
Cheryl | Collection: [(User:Joe, Cheryl), (User:John, Cheryl)]
Example query
"select * from /users u, u.dependents d where d.name = 'Sonny'"
Restrictions:
Cannot be created on overflow regions
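The representation above can be modeled with a sorted map from index key to a collection of (user, dependent) tuples. This is a conceptual sketch in plain Java, not Geode's internal data structure; RangeIndexSketch, User, and the method names are hypothetical, and the sample data is taken from the table on this slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

class RangeIndexSketch {
    // Hypothetical stand-in for a region value with a nested collection.
    static final class User {
        final String name;
        final List<String> dependents;

        User(String name, String... dependents) {
            this.name = name;
            this.dependents = Arrays.asList(dependents);
        }
    }

    static List<User> sampleUsers() {
        return Arrays.asList(new User("Joe", "Sonny", "Cheryl"), new User("John", "Cheryl"));
    }

    // Sorted index: dependent name -> list of {user name, dependent name} tuples.
    static TreeMap<String, List<String[]>> buildIndex(List<User> users) {
        TreeMap<String, List<String[]>> index = new TreeMap<>();
        for (User user : users) {
            for (String dependent : user.dependents) {
                index.computeIfAbsent(dependent, key -> new ArrayList<>())
                     .add(new String[] {user.name, dependent});
            }
        }
        return index;
    }

    // Answers "where d.name = ?" from the index instead of scanning every user.
    static List<String> usersWithDependent(TreeMap<String, List<String[]>> index, String dependent) {
        List<String> userNames = new ArrayList<>();
        List<String[]> tuples = index.get(dependent);
        if (tuples != null) {
            for (String[] tuple : tuples) {
                userNames.add(tuple[0]);
            }
        }
        return userNames;
    }

    public static void main(String[] args) {
        TreeMap<String, List<String[]>> index = buildIndex(sampleUsers());
        System.out.println(index.keySet());                    // keys come back sorted
        System.out.println(usersWithDependent(index, "Sonny"));
    }
}
```

Keeping the keys sorted is what lets such an index serve range comparisons as well as equality.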

Functional Index (Compact)
Memory savings over the non-compact index, at the expense of extra work during index maintenance.
How to create
qs.createIndex("user names", "u.name", "/users u");
qs.createIndex("user names", "u.nestedObject.fieldName", "/users u");
Representation
Key   | Values
Joe   | Region Entry
John  | [Region Entry, Region Entry]
Jerry | Collection(Region Entry, Region Entry)
Restrictions:
Index maintenance is synchronous
Only when there is one iterator in the from clause (example: /users u)
Additional Info:
What about updates in progress?
What about “in place modification”

Key Index
Creating a key index makes the query service aware of the relationship between the values in the region and the keys
in the region.
This allows the query service to translate a query using a key into a get.
How to create:
qs.createKeyIndex("indexName", "u.id", "/users u");
Example Query:
"select * from /users u where u.id = 1"
Restrictions:
Equality comparisons only

Hash Index
The good
Saves on memory due to not storing index key values
Hash values are computed from index key
The bad
Slower maintenance and query times
Only a slight savings in memory
Name is a bit misleading
Representation
Array: [ RE, RE, null, RE, REMOVED, null, RE, ...]
How to create
qs.createHashIndex("indexName", "u.name", "/users u");
Restrictions:
Only equality based queries
Single iterator

Map Index
Allows indexing a map field of an object
How to create:
qs.createIndex("indexName", "u.name[*]", "/users u");
qs.createIndex("indexName", "u.name['first', 'middle']", "/users u");
In Gfsh:
gfsh>create index --name="IndexName" --expression="u.name['first', 'middle']" --region="/users u"
Example of query:
"SELECT * FROM /users u WHERE u.name['first'] = 'John' OR u.name['last'] = 'Smith'"
Gotcha:
Using u.name.get('first') will not create or query the map index.
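One way to picture a map index is as one sorted sub-index per indexed map key. The sketch below models that idea in plain Java; it is a conceptual illustration, not Geode's implementation, and MapIndexSketch, its methods, and the sample users are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapIndexSketch {
    // Sample data: user label -> the user's "name" map field.
    static Map<String, Map<String, String>> sampleUsers() {
        Map<String, Map<String, String>> users = new LinkedHashMap<>();
        users.put("Joe Bob", name("first", "Joe", "last", "Bob"));
        users.put("John Jacob Schmidt", name("first", "John", "middle", "Jacob", "last", "Schmidt"));
        users.put("Jerry Schmidt", name("first", "Jerry", "last", "Schmidt"));
        return users;
    }

    // Builds a map from alternating key/value arguments.
    static Map<String, String> name(String... keyValues) {
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    // One sorted sub-index per indexed map key: map key -> (map value -> user labels).
    static Map<String, TreeMap<String, List<String>>> buildIndex(
            Map<String, Map<String, String>> users, String... indexedKeys) {
        Map<String, TreeMap<String, List<String>>> index = new HashMap<>();
        for (String key : indexedKeys) {
            TreeMap<String, List<String>> subIndex = new TreeMap<>();
            for (Map.Entry<String, Map<String, String>> user : users.entrySet()) {
                String value = user.getValue().get(key);
                if (value != null) {
                    subIndex.computeIfAbsent(value, v -> new ArrayList<>()).add(user.getKey());
                }
            }
            index.put(key, subIndex);
        }
        return index;
    }

    // Looks up u.name[key] = value; returns an empty list if the key is not indexed.
    static List<String> lookup(Map<String, TreeMap<String, List<String>>> index, String key, String value) {
        List<String> hits = new ArrayList<>();
        TreeMap<String, List<String>> subIndex = index.get(key);
        if (subIndex != null && subIndex.containsKey(value)) {
            hits.addAll(subIndex.get(value));
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, TreeMap<String, List<String>>> index = buildIndex(sampleUsers(), "first", "middle", "last");
        System.out.println(lookup(index, "last", "Schmidt"));
    }
}
```

In this model, an index built on only 'first' and 'middle' has no sub-index for 'last', so a filter on u.name['last'] gets no help from it.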

Map Index...
Each map key ('first', 'middle', 'last') points to its own range index as its value:
Key 'first' -> Range Index
Key   | Value
Joe   | Collection: [(User: Joe Bob, Joe)]
John  | Collection: [(User: John Jacob Schmidt, John)]
Jerry | Collection: [(User: Jerry Schmidt, Jerry)]
Key 'middle' -> Range Index
Key   | Value
Jacob | Collection: [(User: John Jacob Schmidt, Jacob)]
Key 'last' -> Range Index
Key     | Value
Bob     | Collection: [(User: Joe Bob, Bob)]
Schmidt | Collection: [(User: John Jacob Schmidt, Schmidt), (User: Jerry Schmidt, Schmidt)]

Multiple Index Creation
Creating multiple indexes one at a time on a populated region requires iterating that region for each index
This has a significant impact on overflow regions
The same mechanism is used internally when the cache is brought up
Example of multiple index creation:
Cache cache = new CacheFactory().create();
QueryService queryService = cache.getQueryService();
queryService.defineIndex("name1", "indexExpr1", "regionPath1");
queryService.defineIndex("name2", "indexExpr2", "regionPath2");
queryService.defineHashIndex("name3", "indexExpr3", "regionPath2");
queryService.defineKeyIndex("name4", "indexExpr4", "regionPath2");
List<Index> indexes = queryService.createDefinedIndexes();
To clear any defined indexes that have not been created yet
queryService.clearDefinedIndexes();

Querying with Functions
Benefits:
● Allows targeting specific nodes by filtering by partitioning key
● Closer to data
● Logic and computation on results from node, possibly less to send back
Drawbacks:
● More work for users (writing the function)
● More work for users (registering the function)

Equijoin Queries
Restrictions:
● Must be colocated
Problems:
● Slow due to the Cartesian product
● Memory usage due to temporary joined result sets
Some improvements are coming:
● Significantly reduce join time for single iterator filters where indexes can be used:
"select * from /users u, /employees e where u.name = 'John' and u.id = e.id"
"select * from /users u, /employees e where u.name = 'John' and u.age > 21 and u.id = e.id"
"select * from /users u, /employees e, /office o where u.name = 'John' and u.id = e.id and e.location = o.location"

General Tips/Tricks
● From clause of the query and index expression should match
● For AND operators, put the more selective filter first in the query
● Whenever possible, provide a hint to allow the query engine to prefer a specific index
