I have a large collection of songs and want to get the most played songs per week, stored in an array. For example:

{
    "_id" : {
        "title" : "demons savaites hitas",
        "name" : "imagine dragons"
    },
    "value" : {
        "weeks" : [ 
            {
                "played" : 56,
                "week" : 9,
                "year" : 2014
            }
        ]
    }
}

It sometimes becomes:

{
    "_id" : {
        "title" : "",
        "name" : "top 15"
    },
    "value" : {
        "played" : 1,
        "week" : 8,
        "year" : 2014
    }
}

The collection I get the data from is named songs, and new documents are added all the time as songs get added. Artist names and song titles are not unique, and every document in the collection looks like this:

{
    "_id" : ObjectId("530536e3d4ca1a783342f1c8"),
    "week" : 8,
    "artistname" : "City Shakerz",
    "songtitle" : "Love Somebody (Summer 2012 Mix Edit)",
    "year" : 2014,
    "date" : ISODate("2014-02-19T22:57:39.926Z")
}

I now want a mapReduce that adds the new week to the array. At the moment it overwrites it. I also noticed that, with the new mapReduce, not all plays get counted once I change the value to an array.

The new mapReduce, with weeks, which is not working:

map = function () {
    if (this.week == 9 && this.year == 2014) {
        emit(
            {title: this.songtitle.toLowerCase(), name: this.artistname.toLowerCase()},
            {played: 1, week: this.week, year: this.year}
        );
    }
}
reduce = function (k, values) {
    var result = {};
    result.weeks = new Array();
    var object = {played: 0, week: 0, year: 0};
    values.forEach(function (value) {
        object.played += value.played;
        object.week = value.week;
        object.year = value.year;
    });
    result.weeks.push(object);
    return result;
}
db.songs.mapReduce(map, reduce, {out: {reduce: "played2"}})

This is the old one I'm using, which creates a new document in the collection per week and song:

map = function () {
    if (this.week == 10 && this.year == 2014) {
        emit(
            {title: this.songtitle.toLowerCase(), name: this.artistname.toLowerCase(), week: this.week, year: this.year},
            {count: 1}
        );
    }
}
reduce = function (k, values) {
    var result = {count: 0};
    values.forEach(function (value) {
        result.count += value.count;
    });
    return result;
}
db.songs.mapReduce(map, reduce, {out: {merge: "played"}})

I currently get the information for the top list from played2 like this:

db.played2.find({'_id.week': 9, '_id.year': 2014}).sort({"value.count": -1}).limit(50)

The line above may contain typos, because I use MongoClient for PHP and had to convert it to JavaScript syntax for you.

What am I doing wrong?

  • Can you include the structure of your original collection? My point is that I don't think you want mapReduce for this, and there may be a better way. Commented Mar 5, 2014 at 22:49
  • @NeilLunn - I have edited the question with the document and how that collection works. It is just a long feed collection of the last played songs; new songs get added all the time, around 10 per second. Commented Mar 6, 2014 at 9:43
  • Try the aggregation statement in the answer. The aggregation pipeline runs much faster than map reduce and this seems to suit your desired results. Commented Mar 6, 2014 at 9:55
  • I wanted to add the weeks for one song title and artist name, mostly because I want to see the changes for a song over the weeks. It seems a bit heavy to run an aggregate for each week then. Commented Mar 6, 2014 at 10:04
  • Just change your criteria. If you are only matching one song and artist, then since they are part of the key there will only be that song in the results, for every week it appeared. Drop the limit at the end, as you don't need it. The match part is just a standard query like you would issue to find. You are familiar with that, are you not? If you have any more questions, comment on the answer rather than on your question. Commented Mar 6, 2014 at 10:09

2 Answers

I found out that I could run the mapReduce as in the code snippet above, then fetch this week with one query and the previous week with another, and use a simple nested for loop with an if to update this week's entries with their place from the previous week.

I wrote the script in Python and run it, like my mapReduce, as a cron job. For example:

import datetime
import sys

from pymongo import MongoClient

# Week to rank: first command-line argument, else last week's ISO week number.
if len(sys.argv) > 1:
    week = int(sys.argv[1])
else:
    week = datetime.date.today().isocalendar()[1] - 1

year = datetime.date.today().year

# Note: this does not wrap around at a year boundary.
previous_week = week - 1

client = MongoClient()
db = client.db
played = db.played

print("Updating it for week: " + str(week))

# Top 50 of each week; last week's list is materialized so it can be scanned repeatedly.
previous = list(played.find({"_id.week": previous_week, "_id.year": year}).sort("value.count", -1).limit(50))
thisweek = played.find({"_id.week": week, "_id.year": year}).sort("value.count", -1).limit(50)

thisplace = 1
for f in thisweek:
    # Place 0 means the song was not in last week's top 50.
    previous_place = 0
    place = 1
    for s in previous:
        if s["_id"]["name"] == f["_id"]["name"] and s["_id"]["title"] == f["_id"]["title"]:
            previous_place = place
        place = place + 1

    result = played.update_one(
        {"_id.week": f["_id"]["week"], "_id.year": f["_id"]["year"], "_id.title": f["_id"]["title"], "_id.name": f["_id"]["name"]},
        {"$set": {"place.previous_week": previous_place, "place.this_week": thisplace}},
    )
    print(result.raw_result)
    thisplace = thisplace + 1

print("done.")

This seems to work very well. Hopefully MongoDB adds support for updating just a field in mapReduce, so that information can be added to a document without overwriting it.
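For what it's worth, the array version in the question probably loses counts because its reduce returns a different shape ({weeks: [...]}) than the map emits ({played, week, year}). MongoDB does not call reduce for a key with a single value, which would explain the documents without a weeks array, and it may feed earlier reduce output back into reduce, where value.played is then undefined. Below is a minimal sketch of a shape-consistent pair, written as plain functions so it can be run outside MongoDB. The names mapSong and reduceWeeks are hypothetical; inside mapReduce the map body would use this and call emit(key, value) instead of returning.

```javascript
// Hypothetical helper: builds the key and value that map would emit.
// The value already has the final shape: { weeks: [ {played, week, year} ] }.
function mapSong(doc) {
  return {
    key: { title: doc.songtitle.toLowerCase(), name: doc.artistname.toLowerCase() },
    value: { weeks: [{ played: 1, week: doc.week, year: doc.year }] }
  };
}

// Merges any mix of freshly emitted values and previously reduced values,
// summing plays per (year, week) and returning the same shape it receives.
function reduceWeeks(key, values) {
  var byWeek = {};
  values.forEach(function (value) {
    value.weeks.forEach(function (w) {
      var id = w.year + "-" + w.week;
      if (!byWeek[id]) {
        byWeek[id] = { played: 0, week: w.week, year: w.year };
      }
      byWeek[id].played += w.played;
    });
  });
  var weeks = Object.keys(byWeek).map(function (id) { return byWeek[id]; });
  // Oldest week first.
  weeks.sort(function (a, b) { return a.year - b.year || a.week - b.week; });
  return { weeks: weeks };
}
```

With out: {reduce: "played2"}, a reduce of this kind would merge a new week into the stored array rather than overwrite it. Running the job twice for the same week would add that week's plays twice, so each week should be processed only once.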


I'm taking a stab at the structure of your collection based on your input fields, but I don't think mapReduce is the tool you want. Your apparent desired output can be achieved using aggregate:

db.collection.aggregate([
    // Match a specific week and year if you want - remove if you want all
    { "$match": { "year": inputYear, "week": inputWeek } },

    // Group to get the total number of times played
    { "$group": {
        "_id": {
            "title": { "$toLower": "$songtitle" },
            "name": { "$toLower": "$artistname" },
            "week": "$week",
            "year": "$year"
        },
        "played": { "$sum": 1 }
    }},

    // Sort the results by the most played in the range
    // (after $group, year and week live inside _id)
    { "$sort": { "_id.year": -1, "_id.week": -1, "played": -1 } },

    // Optionally limit to the top 15 results
    { "$limit": 15 }

])

That is basically what you appear to be trying to do. It sums up the "number of appearances" as the number of times played. Then we take the additional steps of sorting the results and, optionally (if you can live with looking at one week at a time), limiting the results to a set number. Those last two steps you won't get with mapReduce.
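To follow a single song across weeks instead, as raised in the comments on the question, the same grouping can be filtered on one title and artist. This is a minimal sketch, assuming the field names from the question; buildSongHistoryPipeline is a hypothetical helper name, and the filter is placed after the $group so that it compares against the lowercased key.

```javascript
// Hypothetical helper: returns a pipeline with one result per week
// for a single song, oldest week first.
function buildSongHistoryPipeline(title, name) {
  return [
    // Count plays per song and week, with a case-insensitive key
    { "$group": {
      "_id": {
        "title": { "$toLower": "$songtitle" },
        "name": { "$toLower": "$artistname" },
        "week": "$week",
        "year": "$year"
      },
      "played": { "$sum": 1 }
    }},
    // Keep only the requested song
    { "$match": {
      "_id.title": title.toLowerCase(),
      "_id.name": name.toLowerCase()
    }},
    // Order chronologically
    { "$sort": { "_id.year": 1, "_id.week": 1 } }
  ];
}
```

In the shell this would be passed straight to aggregate, for example db.songs.aggregate(buildSongHistoryPipeline("Demons", "Imagine Dragons")). Grouping the whole collection before filtering is the simple choice here, not the fastest one.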

If you are ultimately looking for the "top ten" for each week, as a single query result, then you can look at this for a discussion (and methods to achieve) what we call the "topN" results problem.
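One simple way to approach this, which is not necessarily the method that discussion describes, is to run the pipeline without the $match and $limit stages and trim each week on the client. The sketch below assumes rows shaped like the $group output above; topNPerWeek is a hypothetical helper name.

```javascript
// rows: documents such as { _id: { title, name, week, year }, played: 12 }.
// Returns an object keyed by "year-week", each holding at most n rows,
// most played first.
function topNPerWeek(rows, n) {
  var byWeek = {};
  rows.forEach(function (row) {
    var id = row._id.year + "-" + row._id.week;
    if (!byWeek[id]) {
      byWeek[id] = [];
    }
    byWeek[id].push(row);
  });
  Object.keys(byWeek).forEach(function (id) {
    byWeek[id].sort(function (a, b) { return b.played - a.played; });
    byWeek[id] = byWeek[id].slice(0, n);
  });
  return byWeek;
}
```

This pulls every grouped row to the client, so it only suits result sets that fit comfortably in memory.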

2 Comments

I get Error: Line 13: Unexpected token { on this query
@HåkanNylén typo in the posted query. Was missing a closing bracket on the group statement
