0
Eric Sedor
Index Automation and Dex
June 2013
2
Agenda
• MongoDB index basics
• Indexing tips and tricks
• Dex automation
• Dex details and demo
• Extras
3
Some notable MongoDB fundamentals
• Good performance starts with indexes
– you create them; they don’t just happen
• Each query uses at most one index
– so index accordingly
• The query optimizer is empirical
– every so often (~1k writes) MongoDB runs a race between
query plans. The first query plan to complete wins.
– query plans are also re-run after certain changes to a
collection (such as adding an index).
4
Proper indexing is critical
• Indexes can improve query performance by 2 to 3
orders of magnitude
– 1000ms query down to <1ms!
• Bad queries don’t just get in their own way, they get
in the way of other things too:
– write lock, queued operations, page faults
• Bad indexing → Memory Apocalypse
– without warning, large portions of your working data topple
out of memory and must be page-faulted back
5
Five key commands
db.adventurers.find(
{"name" : "Eric", "class": "Wizard"}).explain()
db.adventurers.getIndexKeys()
db.adventurers.getIndexes()
db.adventurers.ensureIndex({"name": 1, "class": 1},
{"background": true})
db.adventurers.dropIndex({"name": 1, "class": 1})
6
explain() will reveal a scanAndOrder
• scanAndOrder is almost always bad!
• If MongoDB is re-ordering to satisfy a sort
clause, explain() includes: { scanAndOrder: true }
• MongoDB sorts documents in-memory! (very
expensive!)
– without an index, large result sets are rejected with an error
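A minimal sketch of checking an explain() document for these warning signs. The helper name `flagExplain` and the sample documents are hypothetical; the field names (`cursor`, `n`, `nscanned`, `scanAndOrder`) are the ones explain() reported in the MongoDB versions current at the time of this talk.

```javascript
// Hypothetical helper: inspect an explain()-style document and list warning signs.
function flagExplain(explainDoc) {
  const warnings = [];
  if (explainDoc.scanAndOrder === true) {
    warnings.push("scanAndOrder: results were sorted in memory");
  }
  if (explainDoc.cursor === "BasicCursor") {
    warnings.push("BasicCursor: no index was used");
  }
  return warnings;
}

// Sample documents shaped like explain() output (all values are made up).
const unindexedSort = {
  cursor: "BasicCursor",
  n: 10,
  nscanned: 50000,
  scanAndOrder: true,
};
const indexedSort = {
  cursor: "BtreeCursor name_1_class_1",
  n: 10,
  nscanned: 10,
  scanAndOrder: false,
};

console.log(flagExplain(unindexedSort)); // two warnings
console.log(flagExplain(indexedSort));   // no warnings
```

In the mongo shell, the same helper could be fed the result of db.adventurers.find(...).sort(...).explain().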
7
Know Thy B-Tree
8
An index is a b-tree that maps a sequence of
key values to a list of document pointers
{ "name": 1, "class": 1 }
* (root)
  name -> "Ben"
    class -> "Fighter", "Noble"
  name -> "Eric"
    class -> "Engineer", "Wizard"
the order of the keys really matters!
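Why key order matters can be sketched with a toy model: a sorted array standing in for the b-tree. This is not how MongoDB stores an index internally; the `docId` values and helper names are assumptions made for illustration.

```javascript
// Toy model (not MongoDB internals): a compound index on { name: 1, class: 1 }
// represented as entries sorted by name first, then by class.
const entries = [
  { name: "Eric", class: "Wizard", docId: 4 },
  { name: "Ben", class: "Noble", docId: 2 },
  { name: "Eric", class: "Engineer", docId: 3 },
  { name: "Ben", class: "Fighter", docId: 1 },
];

function compareKeys(a, b) {
  if (a.name !== b.name) return a.name < b.name ? -1 : 1;
  if (a.class !== b.class) return a.class < b.class ? -1 : 1;
  return 0;
}

const index = entries.slice().sort(compareKeys);

// Entries with the same name are contiguous, so a query on name can stop as
// soon as it walks past the matching run. (A real b-tree would also seek to
// the start of the run in logarithmic time; findIndex here is linear.)
function findByName(sortedIndex, name) {
  const start = sortedIndex.findIndex((e) => e.name === name);
  if (start === -1) return [];
  const hits = [];
  for (let i = start; i < sortedIndex.length && sortedIndex[i].name === name; i++) {
    hits.push(sortedIndex[i].docId);
  }
  return hits;
}

// Entries with the same class are scattered across the index, so a query on
// class alone has to examine every entry.
function findByClass(sortedIndex, cls) {
  return sortedIndex.filter((e) => e.class === cls).map((e) => e.docId);
}

console.log(findByName(index, "Eric"));    // [ 3, 4 ]
console.log(findByClass(index, "Wizard")); // [ 4 ]
```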
9
Index key order determines how the b-tree
is constructed
This ordering of keys influences how:
• applicable an index is to a given query
– a query that doesn't include the first field(s) in the index
cannot use the index
• quickly the scope of possible results is pruned
– here is where your data's cardinality weighs in
• documents are sorted in result sets
– did I mention scanAndOrder was bad?
10
Ordering is tricky and especially important
with range operators
The order of fields in an index should be the:
① fields on which you will query for exact values
② fields on which you will sort
③ fields on which you will query for a range of values
($in, $gt, $lt, etc.)
Article explaining this topic in detail:
bit.ly/mongoindex
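The ordering rule above can be sketched as a small function. The names `suggestIndex` and `isRangeCondition` are hypothetical, the list of range operators is limited to a few common ones, and the example query (people named "Eric" younger than 30, sorted by height) is made up.

```javascript
// Operators treated as range conditions in this sketch.
const RANGE_OPERATORS = ["$in", "$gt", "$gte", "$lt", "$lte"];

function isRangeCondition(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    Object.keys(value).some((key) => RANGE_OPERATORS.includes(key))
  );
}

// Hypothetical helper: order index fields as equality, then sort, then range.
function suggestIndex(query, sort) {
  const equality = [];
  const range = [];
  for (const [field, value] of Object.entries(query)) {
    (isRangeCondition(value) ? range : equality).push(field);
  }

  const index = {};
  for (const field of equality) index[field] = 1;
  for (const [field, direction] of Object.entries(sort)) {
    if (!(field in index)) index[field] = direction;
  }
  for (const field of range) {
    if (!(field in index)) index[field] = 1;
  }
  return index;
}

const suggestion = suggestIndex({ age: { $lt: 30 }, name: "Eric" }, { height: 1 });
console.log(JSON.stringify(suggestion)); // {"name":1,"height":1,"age":1}
```

The range field (age) ends up last even though it appears first in the query document.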
11
Put the range field value last in your index
diagram at bit.ly/mongoindex
15
Slow Hell
(like normal hell only slower)
What do we do?
16
Be warned if you...
• Use a variety of query patterns
• Give the app user control over queries
• Use MongoDB like a relational database
• Have many indexes in each collection
17
Don’t die the death of a thousand cuts
• The most expensive queries are not always the
slowest queries.
– 50 queries * 20 ms == 1 s
That’s 1 second other queries can't use!
• Profile your queries and check the <100ms range for
a high volume of expensive but relatively fast
queries
• Remember... bad queries don't just get into their
own way!
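A sketch of ranking queries by total time instead of by individual duration. The entries below are made-up stand-ins for profiler documents; the `shape` field (a query with its values masked) and the helper name `totalTimeByShape` are assumptions for illustration.

```javascript
// Hypothetical helper: sum up time spent per query shape, most expensive first.
function totalTimeByShape(profileEntries) {
  const totals = new Map();
  for (const entry of profileEntries) {
    const current =
      totals.get(entry.shape) || { shape: entry.shape, count: 0, totalMillis: 0 };
    current.count += 1;
    current.totalMillis += entry.millis;
    totals.set(entry.shape, current);
  }
  return Array.from(totals.values()).sort((a, b) => b.totalMillis - a.totalMillis);
}

// Fifty fast queries of one shape and a single slow query of another.
const fastButFrequent = Array.from({ length: 50 }, () => ({
  shape: '{"class": "<class>"}',
  millis: 20,
}));
const slowButRare = [{ shape: '{"name": "<name>"}', millis: 600 }];

const ranked = totalTimeByShape(fastButFrequent.concat(slowButRare));
console.log(ranked[0]); // the 20 ms shape: 50 queries, 1000 ms in total
console.log(ranked[1]); // the 600 ms shape: 1 query, 600 ms in total
```

The 20 ms queries would never show up in a "slowest queries" list, yet together they cost more than the single 600 ms query.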
18
Identify the problematic queries
• Search the log file
– mongod logs any query slower than 100ms (the default slowms threshold)
• Use the database profiler
① Turn it on
db.setProfilingLevel(1) logs slow queries
db.setProfilingLevel(2) logs all queries (helpful but noisy)
② Find the slow queries
db.system.profile.find().sort({millis: -1})
db.system.profile.find({ns: "mongoquest.adventurers"})
db.system.profile.find({op: {$in: ["query", "update", "command"]}})
③ Cleanup
db.setProfilingLevel(0)
db.system.profile.drop()
19
Here’s a hint() if you have too many
indexes
• The query optimizer might choose a suboptimal
index
– It’s empirical, so it is vulnerable to poor conditions at query
time, especially in high-page-fault environments
• Hint your queries to the better index
– db.adventurers.find(…).hint({"myIndex": 1})
20
Introducing...
21
How Dex Works
① Dex iterates over the input
(log or profile collection)
② A LogParser or
ProfileParser extracts
queries from each line of
input.
③ Dex passes the query to a
QueryAnalyzer.
④ The QueryAnalyzer
compares the query to
existing indexes (from left
to right)
⑤ If an index meeting
Dex's criteria does not
already exist, Dex
suggests the best
index for that query
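Steps ③ to ⑤ can be sketched roughly as follows. This is not Dex's actual code: the function name `analyzeQuery` is hypothetical, every query field is assumed to be an equality check, and an index is reduced to its ordered list of field names.

```javascript
// Sketch under simplifying assumptions: compare a query's fields against each
// existing index from left to right and decide whether a new index is needed.
function analyzeQuery(queryFields, existingIndexes) {
  const wanted = new Set(queryFields);
  const fullIndexes = [];
  const partialIndexes = [];

  for (const indexFields of existingIndexes) {
    // Count how many leading index fields the query constrains.
    let matched = 0;
    while (matched < indexFields.length && wanted.has(indexFields[matched])) {
      matched++;
    }
    if (matched === wanted.size) {
      fullIndexes.push(indexFields);     // the index prefix covers every query field
    } else if (matched > 0) {
      partialIndexes.push(indexFields);  // only some leading fields are useful
    }
  }

  const needsRecommendation = fullIndexes.length === 0;
  return {
    fullIndexes,
    partialIndexes,
    needsRecommendation,
    recommendation: needsRecommendation ? queryFields.slice() : null,
  };
}

// A query on name and class against a collection that only has an index on name.
const analysis = analyzeQuery(["name", "class"], [["name"]]);
console.log(JSON.stringify(analysis));
```

With only a { name: 1 } index in place, the sketch reports a partial match and recommends an index on name and class.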
The Heart of Dex
22
Dex understands that order of fields in an index
should be:
① Equivalency checks {a:1}
② Sorts .sort({b: 1})
③ Range checks {c: {$in: [1, 2]}}
23
Using Dex is easy
Install using pip:
> sudo pip install dex
Usage: dex [<options>] uri
> dex -f my/mongod/data/path/mongodb.log
mongodb://myUser:myPass@myHost:12345/myDb
> dex -p mongodb://myUser:myPass@myHost:12345/myDb
24
Demo
25
'runStats': {
'linesRecommended': 76,
'linesProcessed': 76,
'linesPassed': 93
},
'results': [
{
'index': '{"name": 1}',
'totalTimeMillis': 410041,
'namespace': 'mongoquest.adventurers',
'queryCount': 2161,
'avgTimeMillis': 189,
'queries': [
'{"q": {"name": "<name>"}}'
]
},
...
Example of Dex's output
(use -v for a shell command!)
26
> dex -f my/mongod/data/path/mongodb.log
-n "myFirstDb.collectionOne"
mongodb://myUser:myPass@myHost:12345/myFirstDb
> dex -f my/mongod/data/path/mongodb.log
-n "*.collectionOne"
mongodb://myUser:myPass@myHost:12345/admin
> dex -f my/mongod/data/path/mongodb.log
-n "myFirstDb.*" -n "mySecondDb.*"
mongodb://myUser:myPass@myHost:12345/admin
Note the auth to the admin db to run against more than one db!
The namespace filter (-n)
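One way to read the wildcard patterns above, sketched as a standalone matcher. This is an illustrative model, not Dex's implementation; the function names are hypothetical, and "*" is assumed to stand for any whole database or collection name.

```javascript
// Hypothetical matcher for filters such as "myFirstDb.*" or "*.collectionOne".
function matchesNamespace(filter, namespace) {
  const [filterDb, filterColl] = splitNamespace(filter);
  const [db, coll] = splitNamespace(namespace);
  return (
    (filterDb === "*" || filterDb === db) &&
    (filterColl === "*" || filterColl === coll)
  );
}

// Collection names may contain dots, so split on the first dot only.
function splitNamespace(ns) {
  const dot = ns.indexOf(".");
  return [ns.slice(0, dot), ns.slice(dot + 1)];
}

console.log(matchesNamespace("*.collectionOne", "myFirstDb.collectionOne")); // true
console.log(matchesNamespace("myFirstDb.*", "mySecondDb.collectionOne"));    // false
```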
27
For when you want current results, not prior results.
> dex -w -f my/mongod/data/path/mongodb.log
mongodb://myUser:myPass@myHost:12345/myFirstDb
> dex -w -p -n "dbname.*"
mongodb://myUser:myPass@myHost:12345/admin
Watch mode (-w)
28
Focus on longer-running queries
> dex -w -f my/mongod/data/path/mongodb.log
mongodb://myUser:myPass@myHost:12345/myFirstDb -s 1000
> dex -w -p
mongodb://myUser:myPass@myHost:12345/admin --slowms 5000
SlowMS (-s/--slowms)
29
{parsed: ...,
namespace: db.adventurers,
queryAnalysis: {analyzedFields: [{fieldName: name,
fieldType: EQUIV},
{fieldName: class,
fieldType: EQUIV}],
fieldCount: N,
supported: true|false},
indexAnalysis: {fullIndexes: [],
partialIndexes: [{name: 1}],
needsRecommendation: true|false },
recommendation: {namespace: mongoquest.adventurers,
index: {name: 1, class: 1},
shellCommand: db.ensureIndex... } }
Dex's guts
30
Future plans for Dex
Dev/Testing now:
– Aggregation framework, geospatial queries, map/reduce
– min/max/average nscanned and nreturned
– scanAndOrder true/false
Soon:
• Renovation of internals
• Improved index recommendations
– set-wise optimization of index fields
• minimize the number of indexes required to cover all of your
queries
– order-wise optimization of index fields
• measure cardinality for key ordering
31
http://mongolab.org
32
PS
We’re hiring!
33
Questions?
Thank you and good luck out there!
eric@mongolab.com
www.github.com/mongolab/dex
http://mongolab.org
http://blog.mongolab.com/2012/06/introducing-dex-the-index-bot/
http://blog.mongolab.com/2012/07/remote-dex/
http://blog.mongolab.com/2012/06/cardinal-ins/
http://blog.mongolab.com/2013/04/thinking-about-arrays-in-mongodb/

Automated Slow Query Analysis: Dex the Index Robot


Editor's Notes

  • #23 The query is: find all people named X whose age is less than N, sorted by height. Use a composite index with field order (name, height, age).
  • #33 Thank you
  • #34 Thank you