I have a large list of string arrays, and within this List<string[]> there can be arrays containing all the same values (possibly at different indexes). I'm looking to find and count these duplicate string arrays and end up with a Dictionary<string[], int>, with the int being the count (though if there is a better way than using a dictionary, I would be interested in hearing it). Does anyone have any advice on how to achieve this? Any and all input is very appreciated, thanks!
3 Answers
You can use LINQ's GroupBy with an IEqualityComparer to compare the string[] arrays:
var items = new List<string[]>()
{
    new[] { "1", "2", "3", "4" },
    new[] { "4", "3", "2", "1" },
    new[] { "1", "2" }
};
var results = items
    .GroupBy(i => i, new UnorderedEnumerableComparer<string>())
    .ToDictionary(g => g.Key, g => g.Count());
The IEqualityComparer for the unordered comparison:
public class UnorderedEnumerableComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    public bool Equals(IEnumerable<T> x, IEnumerable<T> y)
    {
        return x.OrderBy(i => i).SequenceEqual(y.OrderBy(i => i));
    }

    // Just the count of the array. This is a valid hash code (equal
    // sequences always have equal counts), but a weak one: all arrays of
    // the same length collide, which only costs performance.
    public int GetHashCode(IEnumerable<T> obj)
    {
        return obj.Count();
    }
}
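One thing to watch with the snippet above: the resulting Dictionary<string[], int> still uses reference equality for its keys unless the comparer is also passed to ToDictionary. Below is a minimal sketch of a variant, assuming the elements are plain strings compared ordinally. The names UnorderedArrayComparer and DuplicateArrayCounter are made up for illustration and are not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of the comparer above; the name is illustrative only.
public sealed class UnorderedArrayComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[] x, string[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        // Sort both sides ordinally, then compare element by element.
        return x.OrderBy(s => s, StringComparer.Ordinal)
                .SequenceEqual(y.OrderBy(s => s, StringComparer.Ordinal), StringComparer.Ordinal);
    }

    public int GetHashCode(string[] obj)
    {
        if (obj == null) return 0;
        // Summing element hashes is order-independent, so arrays that
        // Equals treats as equal always produce the same hash.
        int hash = 0;
        foreach (string s in obj)
        {
            hash = unchecked(hash + (s == null ? 0 : s.GetHashCode()));
        }
        return hash;
    }
}

public static class DuplicateArrayCounter
{
    public static Dictionary<string[], int> Count(IEnumerable<string[]> items)
    {
        var comparer = new UnorderedArrayComparer();
        return items
            .GroupBy(a => a, comparer)
            // Passing the comparer again lets lookups by content succeed.
            .ToDictionary(g => g.Key, g => g.Count(), comparer);
    }
}
```

With the three arrays from the example, DuplicateArrayCounter.Count(items) should yield two entries, and looking up a freshly created array such as new[] { "3", "1", "4", "2" } should find the group with count 2, because the dictionary compares keys by content rather than by reference.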
You might get duplicate keys if you use the number of occurrences as the dictionary key. I would suggest using a Dictionary<string, int>, where the key is the string and the value is the number of occurrences. Then we can use LINQ:
var results = items.SelectMany(item => item)
    .GroupBy(item => item)
    .ToDictionary(g => g.Key, g => g.Count());
Another approach is a Lookup, which maps each key to one or more values:
var lookup = items.SelectMany(item => item)
    .GroupBy(item => item)
    .ToLookup(c => c.Count(), c => c.Key);
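Both queries here count individual strings across all arrays. If the goal is to count whole arrays regardless of element order while still using a plain string key, one option is to build a canonical key per array by sorting and joining its elements. This is only a sketch: CanonicalKeyCounter and ToKey are hypothetical names, and it assumes no element contains the chosen separator character.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CanonicalKeyCounter
{
    // Assumption: no element contains this separator (ASCII unit separator).
    private const string Separator = "\u001F";

    // Hypothetical helper: sorts the elements ordinally and joins them,
    // so arrays with the same values in any order map to the same key.
    public static string ToKey(string[] array)
    {
        return string.Join(Separator, array.OrderBy(s => s, StringComparer.Ordinal));
    }

    public static Dictionary<string, int> Count(IEnumerable<string[]> items)
    {
        return items
            .GroupBy(a => ToKey(a))
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

Because the keys are ordinary strings, the default dictionary comparer is enough, and a count can be looked up with counts[CanonicalKeyCounter.ToKey(someArray)]. The trade-off is that the original arrays are no longer available as keys.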
6 Comments
imageobject
Hmm I think this is just counting and grouping single strings from all arrays in list? It doesn't need to compare strings, but each array of strings and group by / count array with all same string values
Hari Prasad
In that case the second approach (lookup) should work for you.
Hari Prasad
Ahh... now I get what you mean: you want grouping per array, counting duplicates within that array, is that correct?
imageobject
No, I think your second approach is what I'm looking for, just running into an issue with SelectMany. I think because I actually have string[]'s inside of an object. Error CS0411: The type arguments for method 'System.Linq.Enumerable.SelectMany<TSource,TResult>(this System.Collections.Generic.IEnumerable<TSource>, System.Func<TSource,System.Collections.Generic.IEnumerable<TResult>>)' cannot be inferred from the usage. Try specifying the type arguments explicitly
imageobject
Oh I got the same results with second approach. Nope there aren't any duplicates within each array - pretty much am looking to group and count in a scenario like:
new [] { "camera", "lens", "tripod" } == new [] { "camera", "tripod", "lens" }
import java.util.Scanner;

public class Q1 {
    public static void main(String[] args) {
        System.out.println("String entry here --> ");
        Scanner input = new Scanner(System.in);
        String entry = input.nextLine();
        // Split on runs of whitespace so repeated spaces do not yield empty words.
        String[] words = entry.trim().split("\\s+");
        System.out.println(words.length);
        for (int i = 0; i < words.length; i++) {
            // Skip words already counted as a duplicate of an earlier word.
            if (words[i] == null) {
                continue;
            }
            int count = 0;
            for (int j = i + 1; j < words.length; j++) {
                if (words[i].equals(words[j])) {
                    // Null out the duplicate so it is not reported again.
                    words[j] = null;
                    count++;
                }
            }
            if (count != 0) {
                System.out.println("Count of duplicate " + words[i] + " = " + count);
            }
        }
        input.close();
    }
}
Dictionary<int, string[]> is quite confusing, int to be the count, but Dictionary<string[], int> would make more sense. Thanks for pointing that out.