Mongodb Mapreduce join array

Question

I have a big collection of songs and want to get the most played songs per week, in an array. For example:

{
    "_id" : {
        "title" : "demons savaites hitas",
        "name" : "imagine dragons"
    },
    "value" : {
        "weeks" : [ 
            {
                "played" : 56,
                "week" : 9,
                "year" : 2014
            }
        ]
    }
}

It sometimes becomes:

{
    "_id" : {
        "title" : "",
        "name" : "top 15"
    },
    "value" : {
        "played" : 1,
        "week" : 8,
        "year" : 2014
    }
}

The collection I get the data from is named songs, and new documents are added all the time as songs get played. Artist names and song titles are not unique, and every document in the collection looks like this:

{
    "_id" : ObjectId("530536e3d4ca1a783342f1c8"),
    "week" : 8,
    "artistname" : "City Shakerz",
    "songtitle" : "Love Somebody (Summer 2012 Mix Edit)",
    "year" : 2014,
    "date" : ISODate("2014-02-19T22:57:39.926Z")
}

I now want a mapReduce that adds the new week to the array. At the moment it overwrites it. I also noticed that, with the new mapReduce that builds an array, not all plays get counted.

The new mapReduce with weeks, which is not working:

map = function () {
    if (this.week == 9 && this.year == 2014) {
        emit(
            {title: this.songtitle.toLowerCase(), name: this.artistname.toLowerCase()},
            {played: 1, week: this.week, year: this.year}
        );
    }
}
reduce = function (k, values) {
    var result = {};
    result.weeks = new Array();
    var object = {played: 0, week: 0, year: 0};
    values.forEach(function (value) {
        object.played += value.played;
        object.week = value.week;
        object.year = value.year;
    });
    result.weeks.push(object);
    return result;
}
db.songs.mapReduce(map, reduce, {out: {reduce: "played2"}})
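
A likely cause of both symptoms: MongoDB may call reduce more than once for the same key (feeding an earlier reduce result back in), does not call it at all for a key with a single emitted value, and with out: {reduce: ...} calls it again with the document already stored in the output collection. The value emitted by map and the value returned by reduce therefore need the same shape. Below is a minimal sketch under that assumption, written as plain JavaScript so it can be run outside the shell. The names mapSong and reduceWeeks are hypothetical, and mapSong returns the key/value pair instead of calling emit on this.

```javascript
// Hypothetical helper: the key/value pair the map function would emit for one
// song document. The value already has the final shape {weeks: [...]}.
function mapSong(doc) {
  return {
    key: { title: doc.songtitle.toLowerCase(), name: doc.artistname.toLowerCase() },
    value: { weeks: [{ played: 1, week: doc.week, year: doc.year }] }
  };
}

// Reduce that returns the same shape it receives, so it can safely be re-run
// on its own output. Entries are merged per (year, week), summing "played".
function reduceWeeks(key, values) {
  var byWeek = {};
  values.forEach(function (value) {
    value.weeks.forEach(function (w) {
      var id = w.year + "-" + w.week;
      if (!byWeek[id]) {
        byWeek[id] = { played: 0, week: w.week, year: w.year };
      }
      byWeek[id].played += w.played;
    });
  });
  var weeks = Object.keys(byWeek).map(function (id) { return byWeek[id]; });
  // Keep the array in chronological order.
  weeks.sort(function (a, b) { return a.year - b.year || a.week - b.week; });
  return { weeks: weeks };
}
```

In the shell, the map function would call emit(pair.key, pair.value) for each matching document, and reduceWeeks could be passed as the reduce function unchanged.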

This is the old one I'm using, which creates a new document in the collection per week and song:

map = function () {
    if (this.week == 10 && this.year == 2014) {
        emit(
            {title: this.songtitle.toLowerCase(), name: this.artistname.toLowerCase(), week: this.week, year: this.year},
            {count: 1}
        );
    }
}
reduce = function (k, values) {
    var result = {count: 0};
    values.forEach(function (value) {
        result.count += value.count;
    });
    return result;
}
db.songs.mapReduce(map, reduce, {out: {merge: "played"}})

Right now I get the information for the top list from played2 like this:

db.played2.find({'_id.week': 9, '_id.year': 2014}).sort({'value.count': -1}).limit(50)

The line above may contain typos, because I use MongoClient for PHP and had to convert it to JavaScript syntax for you.

What am I doing wrong?

Can you include what the structure of your original collection is? My point is I don't think you want mapReduce for this, and there may be a better way.
– Neil Lunn, Commented Mar 5, 2014 at 22:49
@NeilLunn - I have edited the question with a sample document and how that collection works. It is just a long feed collection of the last played songs; new songs get added all the time, around 10 per second.
– Håkan Nylén, Commented Mar 6, 2014 at 9:43
Try the aggregation statement in the answer. The aggregation pipeline runs much faster than mapReduce, and this seems to suit your desired results.
– Neil Lunn, Commented Mar 6, 2014 at 9:55
I wanted to add the weeks for one songtitle and artistname, mostly because I want to see the changes for a song over the weeks. It's a bit harsh to run an aggregate for each week then.
– Håkan Nylén, Commented Mar 6, 2014 at 10:04
Just change your criteria. If you are only matching one song and artist, then since it is part of the key there will only be that song in the results, for every week it appeared. Drop the limit at the end as you don't need it. The match part is just a standard query like you would issue to find. You are familiar with that, are you not? Any more questions, then comment on the answer rather than your question.
– Neil Lunn, Commented Mar 6, 2014 at 10:09

Håkan Nylén · Accepted Answer · 2014-03-08 23:53:05Z

I found out that I could keep the mapReduce from the code snippet above, then fetch this week with one query and the previous week with another, and use a simple double for loop with an if to update each of this week's entries with its place in the previous week.

I wrote the script in Python and run it, like my mapReduce, as a cron job. For example:

import datetime
import sys

from pymongo import MongoClient

# Week to process: first command-line argument, or last week's ISO week number.
if len(sys.argv) > 1 and sys.argv[1] is not None:
    week = int(sys.argv[1])
else:
    week = (datetime.date.today().isocalendar()[1]) - 1

year = datetime.date.today().year

previous_week = week - 1

client = MongoClient()
db = client.db
played = db.played

print("Updating it for week: " + str(week))

# Note: Cursor.count() and Collection.update() are the legacy PyMongo calls
# in use in 2014; PyMongo 4 removed them.
previous = played.find({"_id.week": previous_week, "_id.year": year}).sort("value.count", -1).limit(50)
thisweek = played.find({"_id.week": week, "_id.year": year}).sort("value.count", -1).limit(50)

thisplace = 1
for f in thisweek:
    previous.rewind()  # Reset the previous-week cursor so it can be iterated again
    place = 1

    if previous.count() > 0:
        checker = True  # Stays True if the song was not in last week's list
        for s in previous:
            if s["_id"]["name"] == f["_id"]["name"] and s["_id"]["title"] == f["_id"]["title"]:
                result = played.update({"_id.week": f["_id"]["week"], "_id.year": f["_id"]["year"], "_id.title": f["_id"]["title"], "_id.name": f["_id"]["name"]}, {"$set": {"place.previous_week":place, "place.this_week":thisplace}})
                checker = False
                print(result)
            place = place + 1
        if checker is True:
            result = played.update({"_id.week": f["_id"]["week"], "_id.year": f["_id"]["year"], "_id.title": f["_id"]["title"], "_id.name": f["_id"]["name"]}, {"$set": {"place.previous_week":0, "place.this_week":thisplace}})
            print(result)
    else:
        result = played.update({"_id.week": f["_id"]["week"], "_id.year": f["_id"]["year"], "_id.title": f["_id"]["title"], "_id.name": f["_id"]["name"]}, {"$set": {"place.previous_week":0, "place.this_week":thisplace}})
        print(result)
    thisplace = thisplace + 1

print("done.")

This seems to work very well. Hopefully MongoDB adds support for updating just a field from mapReduce, so that information can be added to a document without overwriting it.
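
The nested loop in the script above looks up each of this week's songs in last week's list. The same comparison can be sketched with a lookup table keyed by name and title, so each list is walked only once. This is an illustration in plain JavaScript (to match the mapReduce snippets), not a replacement for the Python script: rankChanges and songKey are hypothetical names, the inputs are assumed to be arrays already sorted by play count like the two cursors, and 0 means "not in last week's list", as in the script.

```javascript
// Hypothetical helper: a string key for one song, built from its _id fields.
function songKey(id) {
  return JSON.stringify([id.name, id.title]);
}

// Returns, for every document in thisWeek, its place this week and its place
// in the previous list (0 if it was not there). Places start at 1.
function rankChanges(previous, thisWeek) {
  var previousPlace = {};
  previous.forEach(function (doc, i) {
    previousPlace[songKey(doc._id)] = i + 1;
  });
  return thisWeek.map(function (doc, i) {
    return {
      _id: doc._id,
      place: {
        previous_week: previousPlace[songKey(doc._id)] || 0,
        this_week: i + 1
      }
    };
  });
}
```

Each returned place object matches what the script writes with $set, so the result could be applied with one update per song.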

Community · Accepted Answer · 2017-05-23 12:12:51Z

I'm taking a stab at the structure of your collection based on your input fields, but I don't think mapReduce is the tool you want. Your apparent desired output can be achieved using aggregate:

db.collection.aggregate([
    // Match a specific week and year if you want - remove if you want all
    { "$match": { "year": inputYear, "week": inputWeek } }, 

    // Group to get the total number of times played
    { "$group": {
        "_id": {
            "title": { "$toLower": "$songtitle" },
            "name": { "$toLower": "$artistname" },
            "week": "$week",
            "year": "$year"
        },
        played: { "$sum": 1 }
    }},

    // Sort the results by the most played in the range
    // (after $group, week and year live under _id)
    { "$sort": { "_id.year": -1, "_id.week": -1, "played": -1 } },

    // Optionally limit to the top 15 results
    { "$limit": 15 }

])

That is basically what you appear to be trying to do. It sums up the "number of appearances" as the number of times played. Then we take the additional steps of sorting the results and, optionally (if you can live with looking at one week at a time), limiting the results to a set number. Those last two steps you won't get with mapReduce.
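
To get the shape shown in the question (one document per song with a weeks array), the grouping above could be followed by a second $group that pushes each week's count into an array. This is a sketch only: weeksPerSongPipeline is a hypothetical helper that just builds the pipeline array, and it assumes the songs collection and field names from the question.

```javascript
// Hypothetical helper: builds a pipeline that counts plays per song and week,
// then collects the weekly counts into one "weeks" array per song.
function weeksPerSongPipeline(year) {
  return [
    // Restrict to one year; add songtitle/artistname here to follow one song
    { "$match": { "year": year } },

    // Count plays per song, week and year
    { "$group": {
      "_id": {
        "title": { "$toLower": "$songtitle" },
        "name": { "$toLower": "$artistname" },
        "week": "$week",
        "year": "$year"
      },
      "played": { "$sum": 1 }
    }},

    // Order by week before collecting into the array
    { "$sort": { "_id.year": 1, "_id.week": 1 } },

    // One document per song, with one array entry per week
    { "$group": {
      "_id": { "title": "$_id.title", "name": "$_id.name" },
      "weeks": { "$push": {
        "played": "$played",
        "week": "$_id.week",
        "year": "$_id.year"
      }}
    }}
  ];
}
```

It would be run as db.songs.aggregate(weeksPerSongPipeline(2014)).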

If you are ultimately looking for the "top ten" for each week, as a single query result, then that is what we call the "topN" results problem, which has its own discussion and methods to achieve it.

answered Mar 5, 2014 at 23:22 by Neil Lunn
edited May 23, 2017 at 12:12 by Community Bot

2 Comments

Håkan Nylén Over a year ago

I get Error: Line 13: Unexpected token { on this query

Neil Lunn Over a year ago

@HåkanNylén typo in the posted query. Was missing a closing bracket on the group statement
