1

I got a big array with data in the following format:

{
  "application": "myapp",
  "buildSystem": {
    "counter": 2361.1,
    "hostname": "host.com",
    "jobName": "job_name",
    "label": "2361",
    "systemType": "sys"
  },
  "creationTime": 1517420374748,
  "id": "123",
  "stack": "OTHER",
  "testStatus": "PASSED",
  "testSuites": [
    {
      "errors": 0,
      "failures": 0,
      "hostname": "some_host",
      "properties": [

        {
          "name": "some_name",
          "value": "UnicodeLittle"
        },
        <MANY MORE PROPERTIES>,
        {
          "name": "sun",
          "value": ""
        }
      ],
      "skipped": 0,
      "systemError": "",
      "systemOut": "",
      "testCases": [
        {
          "classname": "IdTest",
          "name": "has correct representation",
          "status": "PASSED",
          "time": "0.001"
        },
        <MANY MORE TEST CASES>,
        {
          "classname": "IdTest",
          "name": "normalized values",
          "status": "PASSED",
          "time": "0.001"
        }
      ],
      "tests": 8,
      "time": 0.005,
      "timestamp": "2018-01-31T17:35:15",
      "title": "IdTest"
    }
      <MANY MORE TEST SUITES  >,
]}

Where I can distinct three main structures with big data: TestSuites, Properties, and TestCases. My task is to sum all times from each TestSuite so that I can get the total duration of the test. Since the properties and TestCases are huge, the query cannot complete. I would like to select only the "time" value from TestSuites, but it kind of conflicts with the "time" of TestCases in my query:

    db.my_tests.find(
        {
        application: application,
        creationTime:{
            $gte: start_date.valueOf(),
            $lte: end_date.valueOf()
        }
    },
    {
        application: 1, 
        creationTime: 1,
        buildSystem: 1,
        "testSuites.time": 1,
        _id:1
    }
   )

Is it possible to project only the "time" properties from TestSuites without loading the whole schema? I already tried testSuites: 1, testSuites.$.time: 1 without success. Please notice that TestSuites is an array of one element with a dictionary.

I already checked this similar post without success: Mongodb update the specific element from subarray

4
  • Why the <sql> tag? Commented Feb 20, 2018 at 14:51
  • sorry, that's not relevant tho Commented Feb 20, 2018 at 14:57
  • 1
    docs.mongodb.com/manual/reference/operator/aggregation/project You can use $project for this. Commented Feb 20, 2018 at 15:06
  • Already try, see comment bellow, didn't work since TestSuites is an array. Commented Feb 20, 2018 at 15:25

2 Answers 2

1

Following code prints duration of each TestSuite:

query = db.my_collection.aggregate(
    [
        {$match: {

            application: application,
            creationTime:{
                $gte: start_date.valueOf(),
                $lte: end_date.valueOf()
            }
        }
        },
        { $project :
            { duration: { $sum: "$testSuites.time"}}
        } 
    ] 
).forEach(function(doc) 
    {
        print(doc._id)
        print(doc.duration)
    }
)
Sign up to request clarification or add additional context in comments.

Comments

0

Is it possible to project only the "time" properties from TestSuites without loading the whole schema? I already tried testSuites: 1, testSuites.$.time

Answering to your problem of prejecting only the time property of the testSuites document you can simply try projecting it with "testSuites.time" : 1 (you need to add the quotes for the dot notation property references).

My task is to sum all times from each TestSuite so that I can get the total duration of the test. Since the properties and TestCases are huge, the query cannot complete
As for your task, i suggest you try out the mongodb's aggregation framework for your calculations documents tranformations. The aggregations framework option {allowDiskUse : true} will also help you if you are proccessing "large" documents.

2 Comments

Thanks for your response, I tested the aggregation query as follows: db.my_collection.aggregate([ {$match: {creationTime:{ $gte: start_date.valueOf(),$lte: end_date.valueOf() }, application: application}}, { $project : {"testSuites.time" : 1 } } ]). It cannot complete, this query works inside Zeppelin, so IDK if I can allow disk use, this query will iterate through different applications so its actually necessary to select only the necessary fields. I think this projection will work if TestSuites were not an array
Hi, It actually works, other of my queries that were executing after was the breaker. I'll approve your answer, thanks

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.