I have a word-frequency list: strings in alphabetical order, each paired with an int frequency (the frequencies are not sorted). There is no need to read a .txt file or similar for the query, because the user types a "(letters) (number)" query in the console. I don't need to count anything; I need to print the most frequent words for each query typed in the console, for example "AA 12". That query starts with "A", so ideally I would retrieve the words for which StartsWith("A") is true, at least 5 of them, in descending order of frequency while also respecting their A-Z order.

I have read a lot about BSTs, Dictionary, Tuple, SortedList, List, SortedSet, LINQ... and algorithms books, and I learned that keys and values can be sorted ascending, descending, or A-Z, but not by several criteria at the same time. Can someone explain how I can take the query "AA 12", which I have already split into string a = "AA"; and int b = 12;, and run it against a BST (binary search tree) of string,int word-frequency pairs? There is no need to count anything. The query should retrieve the 5 most frequent words that match the string and the int in this 100000-entry word-frequency list and print them to the console, like Google Search autocomplete but more basic.

sample word-frequency A-Z list:

AA 12
AAA 32
AAB 4
AABB 38
BBAA 3
CDDDA 76
...
YZZZ 45
ZZZZZY 356

user-query: "AA 15"

ideal answer:

AAA
AA
AABB
AAB

The code:

    var list = new List<KeyValuePair<string, int>>();

    using (var sr = new StreamReader("C:\\dicti.txt"))
    {
        string line;
        while ((line = sr.ReadLine()) != null)   // read each line until there are no more
        {
            string[] ln = line.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries);
            if (ln.Length < 2) break;            // stop at the first malformed line

            string a = ln[0];
            int b = Convert.ToInt32(ln[1]);
            list.Add(new KeyValuePair<string, int>(a, b));
        }
    }

    // Read the user's query, e.g. "AA 12"
    string word = Console.ReadLine();
    string[] ln2 = word.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries);
    string am = ln2[0];
    int bm = Convert.ToInt32(ln2[1]);

This is the code I've written so far. I'm kind of lost on how to get the values sorted both alphabetically and by frequency, relative to the first letter of the user's query.


This is my current version of the code. It takes 1:15 minutes to get through the complete 1000-word frequency list, so I want to know how I can improve my lambdas to meet the 15-second requirement for the 1000-word frequency list, or what else I can do if lambdas won't work.

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            var dic = new Dictionary<string, int>();

            using (var sr = new StreamReader("C:\\dicti.txt"))
            {
                string line;
                while ((line = sr.ReadLine()) != null)   // read lines until there are no more
                {
                    string[] ln = line.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries);
                    if (ln.Length < 2) break;            // stop at the first malformed line
                    dic.Add(ln[0], Convert.ToInt32(ln[1]));
                }
            }

            int counter = 0;   // number of queries processed so far
            string am;
            int bm;
            do
            {
                counter++;     // count this query so the outer loop terminates
                do
                {
                    string word = Console.ReadLine();
                    string[] ln2 = word.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries);
                    am = ln2[0];
                    bm = Convert.ToInt32(ln2[1]);
                } while (!(am.Length >= 2 && bm >= 1 && bm <= 1000000));   // re-prompt until the query is valid

                // Words with frequency >= bm that start with am, most frequent first.
                // Only 2 results are taken here, although the goal above is the top 5.
                var aj = dic.Where(x => x.Value >= bm && x.Key.StartsWith(am))
                            .OrderByDescending(d => d.Value)
                            .Take(2);

                foreach (var p in aj)
                {
                    Console.WriteLine("{0} ", p.Key);
                }
            } while (counter < 1001);
        }
    }

  • It's unclear what you mean by 'simultaneous sort', and what your requirements are. List<T>.Sort() sorts a list, if that's what you're looking for. If not, please explain how the input given should produce the output. Commented Jan 5, 2012 at 14:46
  • By 'simultaneous sort' I mean: is there a way to sort a word-frequency list of 100000 entries not by a single criterion, but by descending frequency and A-Z at the same time? The problem with List<T>.Sort() is that it focuses on only one criterion and, as I mentioned above, I don't find that convenient for the purpose of the program. The program has a list of 100000 word frequencies, sorted alphabetically only, and has to answer a console query typed by the user, e.g. "AA 223", by retrieving the 4-5 most frequent words that match alphabetically, in descending order. Commented Jan 5, 2012 at 15:09
  • I still don't understand. Do you want to have two copies of a list sorted two different ways? I don't understand the requirements either. It seems the user-query pulls up all the words that contain the given word. I don't see any use for the number. Maybe posting some code would help. Commented Jan 5, 2012 at 15:22
  • Done. Check the code and let me know if it is still unclear. Input: SAC 500 TED 1000 Output: SACK SACRED SACRIFICED TEDDY TEDIOUS Commented Jan 5, 2012 at 15:41
  • The first thing I see: Don't use a List of KeyValuePairs. Use a Dictionary. It doesn't have anything to do with the problem at hand, but I had to mention it. And I still don't see what the number in your query does. Commented Jan 5, 2012 at 16:12

2 Answers

1

Do you want something like this?

    public static IEnumerable<KeyValuePair<string, int>> SearchAndSortBy(Dictionary<string, int> fullSet, string searchFilter)
    {
        return fullSet.Where((pair) => pair.Key.Contains(searchFilter)).OrderByDescending((pair) => pair.Value);
    }

Then you use it like this:

        var mySet = new Dictionary<string, int>();
        mySet.Add("AA", 12);
        mySet.Add("AAA", 32);
        mySet.Add("AAB", 4);
        mySet.Add("AABB", 38);
        mySet.Add("BBAA", 3);
        mySet.Add("CDDDA", 76);
        //...
        mySet.Add("YZZZ", 45);
        mySet.Add("ZZZZZY", 356);

        var results = SearchAndSortBy(mySet, "AA");
        foreach (var item in results)
        {
            Console.Write(item.Key);
            Console.Write(" ");
            Console.WriteLine(item.Value);
        }

And when I run it, I get these results:

AABB 38
AAA 32
AA 12
AAB 4
BBAA 3

I could even change the foreach loop to:

    foreach (var item in results.Take(5))

If I only wanted the top 5.
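If matches in the middle of a word (such as BBAA above) are not wanted, a variation that keeps only the words starting with the filter might look like the sketch below. `PrefixSearch`, `TopByPrefix`, and the `count` parameter are names made up for illustration, and the ordinal (case-sensitive) comparison is an assumption about how the words should be matched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixSearch
{
    // Hypothetical helper: keeps the words that start with `prefix`,
    // orders them by frequency (highest first) and returns at most `count` of them.
    public static IEnumerable<KeyValuePair<string, int>> TopByPrefix(
        Dictionary<string, int> fullSet, string prefix, int count)
    {
        return fullSet
            .Where(pair => pair.Key.StartsWith(prefix, StringComparison.Ordinal)) // assumption: ordinal match
            .OrderByDescending(pair => pair.Value)
            .Take(count);
    }
}
```

With the sample set above, `PrefixSearch.TopByPrefix(mySet, "AA", 5)` would yield AABB, AAA, AA, AAB, and BBAA would no longer appear.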

5 Comments

Interesting solution... I was wondering: is there a way, with this same solution, that when the user enters the string "(ANYLETTER) (ANYNUMBER)" it immediately retrieves the top 5 related words? I tried saving to a list at every sorting iteration, but it isn't working. What would a version look like that doesn't need to store any sorted list before searching for the query?
@thecodingpianist You haven't explained in any sense what the Number should do?
@McKay: You don't need to put brackets around (pair) in your lambdas, and instead of OrderBy with a negative value you would be better off using OrderByDescending, as it expresses more clearly what you are trying to achieve (and the negation might cause unexpected problems if the value ever becomes an unsigned int)
@thecodingpianist You want to display a filtered list to the user. Therefore you need to create this filtered list. There is no way around it. Basically whenever the user types something in you have to rerun your filter. You can optimize it by checking if the user simply added a new letter then you can rerun your filter on the already filtered list. But if the user removes a letter then you need to rerun it on the whole source.
@ChrisWue I really like having parens around member lists. I think it better describes that this is a method. OrderByDescending is a better idea. (I should spend more than 10 seconds thinking about SO Posts?) I'll update thanks.
0

I think you can tweak the OrderBy to achieve your search requirements. Let's take a quick look:

Your input:

AA 12
AAA 32
AAB 4
AABB 38
BBAA 3
CDDDA 76

Desired result for searching "AA"

AAA
AA
AABB
AAB

So AAA comes before AA because it has a higher frequency, but AABB comes after AAA because AAA < AABB alphabetically. Now here comes the problem: AA < AAA also holds, so if you sort your keys alphabetically then AA will always appear before AAA regardless of its frequency.
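To make the problem concrete, here is a small sketch of the naive two-key sort. `PlainSortDemo` and its method are hypothetical names used only for illustration, and ordinal string comparison is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlainSortDemo
{
    // Hypothetical helper: sorts A-Z first, then by frequency (highest first).
    public static List<string> SortAlphabeticallyThenByFrequency(
        IEnumerable<KeyValuePair<string, int>> words)
    {
        return words
            .OrderBy(p => p.Key, StringComparer.Ordinal) // assumption: ordinal comparison
            .ThenByDescending(p => p.Value)              // only applies to equal keys, and every key is distinct
            .Select(p => p.Key)
            .ToList();
    }
}
```

For the four words starting with AA this returns AA, AAA, AAB, AABB: the frequency is never consulted, because no two keys compare as equal.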

But if you "continue" each word with its last character, that is, pad every word to the same length by repeating its last character, then you get what you want by first sorting alphabetically on the padded word and then by frequency:

    public static IEnumerable<KeyValuePair<string, int>> FilterAndSort(IEnumerable<KeyValuePair<string, int>> fullSet, string searchFilter, int maxKeyLength)
    {
        return fullSet
                .Where(p => p.Key.StartsWith(searchFilter))
                .OrderBy(p => p.Key.PadRight(maxKeyLength, p.Key.Last()))
                .ThenByDescending(p => p.Value);
    }

Test:

    List<KeyValuePair<string, int>> list = new List<KeyValuePair<string,int>>
    {
        new KeyValuePair<string, int>("AA", 12),
        new KeyValuePair<string, int>("AAA", 32),
        new KeyValuePair<string, int>("AAB", 4),
        new KeyValuePair<string, int>("AABB", 38),
        new KeyValuePair<string, int>("BBAA", 3),
        new KeyValuePair<string, int>("CDDDA", 76),
    };

    foreach (var p in FilterAndSort(list, "AA", list.Max(p => p.Key.Length)))
    {
        Console.WriteLine("{0} {1}", p.Key, p.Value);
    }

Output:

AAA 32
AA 12
AABB 38
AAB 4

You can optimize this by precomputing the padded words when you read the list. In that case you might want to use a Tuple<string, string, int> (original word, padded word, frequency) instead of a KeyValuePair. It will take up a bit more memory, but you only have to do the padding once instead of on every filter.
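A rough sketch of that precomputation, under a few assumptions: `PaddedIndex`, `Build`, and this overload of `FilterAndSort` are hypothetical names, every word is assumed to be non-empty (otherwise `Last()` throws), and ordinal comparison is used for both the prefix match and the sort.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PaddedIndex
{
    // Hypothetical helper: builds (original word, padded word, frequency) once, at load time.
    // Assumes every word is non-empty.
    public static List<Tuple<string, string, int>> Build(
        IEnumerable<KeyValuePair<string, int>> source)
    {
        var items = source.ToList();
        if (items.Count == 0) return new List<Tuple<string, string, int>>();

        int maxLen = items.Max(p => p.Key.Length);
        return items
            .Select(p => Tuple.Create(p.Key, p.Key.PadRight(maxLen, p.Key.Last()), p.Value))
            .ToList();
    }

    // Same filter and ordering as before, but the padded word is read from Item2
    // instead of being recomputed for every query.
    public static IEnumerable<Tuple<string, string, int>> FilterAndSort(
        IEnumerable<Tuple<string, string, int>> index, string searchFilter)
    {
        return index
            .Where(t => t.Item1.StartsWith(searchFilter, StringComparison.Ordinal))
            .OrderBy(t => t.Item2, StringComparer.Ordinal)
            .ThenByDescending(t => t.Item3);
    }
}
```

The index would be built once with `PaddedIndex.Build(list)` and then reused for every query, for example `PaddedIndex.FilterAndSort(index, "AA")`.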
