
I have an array of about 15,000 javascript objects. Each object has two fields:

{
  name    : "Foo",
  address : "[email protected]"
}

I want to create a new array which only stores unique email addresses and corresponding names. So far I have this method:

// temp1 is my array of 15,000 objects
var arr = [];

for (var i = 0; i<temp1.length; i++){
   var count = 0;
   if(!arr.length){arr.push(temp1[i])};
   for(var x = 0; x<arr.length; x++){
      if(temp1[i].address === arr[x].address){
        count++;
        if(temp1[i].name.length && !arr[x].name.length){arr[x] = temp1[i];} // Choose the new object if the old one has no name field
      }

      if((x === arr.length -1) && count === 0){
         arr.push(temp1[i])
      }
   }
}

I have an added requirement here: if the object already in arr has a blank string as its name field and the temp1 object has a non-blank name, I want to store the temp1 object instead.

My current method takes a good 30s to run in Chrome and this is not ideal.

EDIT: To clarify, I'm asking whether there is a more efficient method in JavaScript to find unique objects in an array. The method above creates a new array, iterates over the original, and for each item loops through everything in the new array to check for duplicates. I'm wondering what's out there that will be more efficient than this.

  • how is the original array of objects created? Does it need to be 15000 or can you reduce it earlier on? Commented Jul 7, 2015 at 13:41
  • So, what is the actual problem that you are having? Commented Jul 7, 2015 at 13:45
  • @depperm Not really - it's coming from an IMAP download of a user's sent/received emails, so these belong to everyone they've contacted. This sorting will be happening on my Node server before being sent to my database. Commented Jul 7, 2015 at 13:46
  • I'm voting to close this question as off-topic because it belongs on codereview.stackexchange.com Commented Jul 7, 2015 at 13:47
  • @Xotic750 The current code I have is very slow, and in real-world examples the original array to be sorted could have 200,000+ objects within so I'm looking for a more efficient method, thought that was clear in the question. Commented Jul 7, 2015 at 13:48

4 Answers

1

Here's another possibility

var tmp = {};

temp1.forEach(function(item) {
    var key = item.address;
    // Declare add locally; without var it leaks as an implicit global
    var add = tmp[key] = tmp[key] || item;
    add.name = add.name || item.name;
});
var addr = Object.keys(tmp).map(function(t) { return tmp[t] });

Caveat: this needs IE9 or later. For older IE versions, use the polyfills described on these pages:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map for map

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/forEach for forEach

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/keys for Object.keys

After taking into consideration comments by @dev-null

var tmp = {}, item, key, add, i, l = temp1.length, addr;
for(i = 0; i < l; i++) {
    item = temp1[i];
    key = item.address;
    add = tmp[key] = tmp[key] || item;
    add.name = add.name || item.name;
}
addr = new Array(Object.keys(tmp).length);
i = 0;
for(key in tmp) {
    addr[i++] = tmp[key];
}

In my tests that's on average twice as fast as my first version (in Firefox, though) and 64 times faster than the OP's original script.

Edit: this is the fastest, though (in Firefox):

var tmp = {}, item, key, add, i, l = temp1.length, addr;
for(i = 0; i < l; i++) {
    item = temp1[i];
    key = item.address;
    add = tmp[key] = tmp[key] || item;
    add.name = add.name || item.name;
}
addr = Object.keys(tmp).map(function(t) { return tmp[t] });
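
One thing worth noting about the snippets above: `add.name = add.name || item.name` fills in the name on the first object seen for an address, so it changes objects that still live in temp1. If that matters, a minimal sketch that stores shallow copies instead could look like this. The function name `uniqueByAddress` and the sample data are made up for illustration, and `Object.create(null)` is used only so that no address can collide with an inherited property.

```javascript
// Hypothetical helper; `list` plays the role of temp1 from the question.
function uniqueByAddress(list) {
  var byAddress = Object.create(null); // plain lookup table, no inherited keys
  var order = [];                      // addresses in first-seen order
  for (var i = 0; i < list.length; i++) {
    var item = list[i];
    var existing = byAddress[item.address];
    if (!existing) {
      // Store a shallow copy so the input objects are left untouched
      byAddress[item.address] = { name: item.name, address: item.address };
      order.push(item.address);
    } else if (!existing.name && item.name) {
      // Earlier entry had a blank name: take the name from this one
      existing.name = item.name;
    }
  }
  var result = [];
  for (var k = 0; k < order.length; k++) {
    result.push(byAddress[order[k]]);
  }
  return result;
}

// Made-up sample data
var sample = [
  { name: "",    address: "a@example.com" },
  { name: "Foo", address: "a@example.com" },
  { name: "Bar", address: "b@example.com" }
];
var deduped = uniqueByAddress(sample);
```

Here `deduped` holds one entry per address, and `sample[0].name` is still the empty string afterwards.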

6 Comments

Cater speed? Don't use .forEach. Forgot the Object.keys polyfill, btw.
I've read mixed reports on .forEach vs for loop over the years - as it's "native code" I'd assume better speed - not sure what "cater speed" means @dev-null
OP is concerned about performance because they are working with large datasets; .forEach is slow by default because it has to check for missing elements in the array. Consider an array like this: var a = []; a[0] = 'a'; a[2] = 'b'; - See how I skipped one? - Running .forEach you will get a, b, but using for-i-len you will get a, undefined, b.
FYI OP is working with V8 in a Node env. V8 is good at compiling JS to machine code, and simple for iterations will usually end up as machine code. Your approach is good, but it would be faster using for-i-len. About Object.keys I have no idea; it all depends on the size of the dataset. Sometimes for-in is faster, sometimes Object.keys is faster. But I bet you a for-i-len iteration would be faster than .map
@dev-null Added two alternatives, both about twice as fast as my original - the first alternative is fastest in Chrome, second alternative is fastest in Firefox
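
The sparse-array behaviour described in the comments above is easy to check. This is only a small demonstration of what each loop visits, not a benchmark; the names `viaForEach` and `viaLoop` are made up for the example.

```javascript
var a = [];
a[0] = 'a';
a[2] = 'b'; // index 1 is never assigned, so the array has a hole

// .forEach skips indexes that were never assigned
var viaForEach = [];
a.forEach(function (value) {
  viaForEach.push(value);
});

// A plain index loop visits every index up to length, holes included
var viaLoop = [];
for (var i = 0, len = a.length; i < len; i++) {
  viaLoop.push(a[i]);
}

console.log(viaForEach); // [ 'a', 'b' ]
console.log(viaLoop);    // [ 'a', undefined, 'b' ]
```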
0
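// temp1 is the source array from the question
var seen = {},
    unique = temp1.filter(function(item) {
        var address = item.address;
        // Keep the item only the first time its address is seen
        return seen.hasOwnProperty(address) ? false : (seen[address] = true);
    });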

You can extend this in the same way to handle the extra requirement of preferring the temp1 object that has a name.
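
As a rough sketch of that extension (not the answerer's code), `seen` can hold the kept object rather than `true`, so a later duplicate with a name can fill in a blank one. This assumes, as in the question, that objects carry only `name` and `address`, so copying the name over is equivalent to swapping the whole object. The sample data is made up.

```javascript
// Made-up sample data standing in for temp1
var temp1 = [
  { name: "",    address: "a@example.com" },
  { name: "Foo", address: "a@example.com" },
  { name: "Bar", address: "b@example.com" }
];

var seen = Object.create(null); // address -> object kept in the result
var unique = temp1.filter(function (item) {
  var kept = seen[item.address];
  if (!kept) {
    seen[item.address] = item;
    return true; // first occurrence of this address: keep it
  }
  if (!kept.name && item.name) {
    kept.name = item.name; // note: this modifies the kept object in place
  }
  return false; // duplicate address: drop it
});
```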

2 Comments

I think you forgot to consider the duplicate addresses where name might be missing: [{address: "[email protected]", "name": ""}, {address: "[email protected]", "name": "test"}]. Should give you [{address: "[email protected]", "name": "test"}]
I did specify the OP could add that part in a similar way, but yes, if you have to be exact, I didn't include that part. (Just set seen[address] to the 'name' if it exists and keep 'true' for the unseen ones, ready to be overwritten if a name isn't included.)
0

I would use a temp object to store the address->names pair:

var tmp = {};

for ( var i = 0; i < temp1.length; i++ ) {
  var obj = temp1[i];
  if ( !tmp[obj.address] ) {
    tmp[obj.address] = obj.name;
  }
}

This will give you an object like this:

{
  "[email protected]": "Foo",
  "[email protected]": "John Doe",
  ....
}

If you want to flip it back to an array afterwards, you might want to store the complete object in tmp instead:

var tmp = {};
for ( var i = 0; i < temp1.length; i++ ) {
  var obj = temp1[i];
  if ( !tmp[obj.address] || !tmp[obj.address].name ) {
    tmp[obj.address] = obj;
  }
}

Which will produce an object like this:

{
  "[email protected]": {
    "name": "John Doe",
    "address": "[email protected]"
  },
  ...
}

Going from object back to an array is fairly simple:

var arr = [], i = 0;
for ( var prop in tmp ) {
  arr[i++] = tmp[prop];
}

This can be optimized further by preallocating the result with new Array(length). To do that, count the number of unique addresses during the initial filtering loop.
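
A minimal sketch of that counting idea, under the assumption that tmp stores whole objects as in the second loop above. The variable names `count` and `n` and the sample data are made up here, and whether preallocating is measurably faster depends on the engine, so it is worth timing on real data.

```javascript
// Made-up sample data standing in for temp1
var temp1 = [
  { name: "",    address: "a@example.com" },
  { name: "Foo", address: "a@example.com" },
  { name: "Bar", address: "b@example.com" }
];

var tmp = Object.create(null);
var count = 0; // number of distinct addresses seen so far

for (var i = 0; i < temp1.length; i++) {
  var obj = temp1[i];
  var stored = tmp[obj.address];
  if (!stored) {
    count++; // only a new address grows the result
    tmp[obj.address] = obj;
  } else if (!stored.name && obj.name) {
    tmp[obj.address] = obj; // replace a nameless entry, count unchanged
  }
}

// Preallocate using the counted length, then fill by index
var arr = new Array(count);
var n = 0;
for (var prop in tmp) {
  arr[n++] = tmp[prop];
}
```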


0

Instead of scanning arr for every item in the loop, build a map keyed by address and look each address up in it. Then, at the end, build the array from the map.

http://jsbin.com/wunedaxabo/edit

var arr = [];
var hashMap = {};

var isEmptyName= function(hashMap, tempObject){

  var name = hashMap[tempObject.address];
  return name.trim().length === 0;

};

// temp1 is the source array from the question
for (var i = 0; i < temp1.length; i++) {
   var tempObject = temp1[i];
   // If the address is not in the hashMap yet, or its stored name is blank, store this name
   if (!hashMap[tempObject.address] || isEmptyName(hashMap, tempObject)) {
     hashMap[tempObject.address] = tempObject.name;
   }
}

// Rebuild the array, keeping both the name and the address of each entry
for (var address in hashMap) {
  arr.push({name: hashMap[address], address: address});
}
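
Since the question mentions this runs on a Node server, the same lookup idea could also be written with an ES2015 `Map`, assuming the Node version in use supports it. This is a sketch rather than a tested replacement: it assumes `name` is always a string (so `.trim()` is safe), and the sample data is made up.

```javascript
// Made-up sample data standing in for the question's array
var temp1 = [
  { name: "  ",  address: "a@example.com" },
  { name: "Foo", address: "a@example.com" },
  { name: "Bar", address: "b@example.com" }
];

var byAddress = new Map(); // address -> object to keep

temp1.forEach(function (item) {
  var stored = byAddress.get(item.address);
  // Store when the address is new, or when the stored name is blank
  // and this item has a non-blank one
  if (!stored || (!stored.name.trim() && item.name.trim())) {
    byAddress.set(item.address, item);
  }
});

// A Map iterates in insertion order; re-setting an existing key keeps its position
var arr = Array.from(byAddress.values());
```

A `Map` avoids any clash between addresses and inherited object properties, at the cost of requiring an ES2015 runtime.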
