I have a file with n columns. The first n-1 columns hold the values of the n-1 attributes, and the n-th column holds the class for each row of the dataset. I want to read the dataset and print a single line of output: n-1 comma-separated * characters, followed in the n-th column by the class with the highest frequency. For example, suppose I have a file dataset1.data which contains:

12,13,14,44,0
11,11,10,34,0
22,54,98,11,2
34,90,78,90,1
44,34,34,33,1
22,54,98,11,0
34,90,78,90,2
44,34,34,33,1
22,54,98,11,2
34,90,78,90,2
44,34,34,33,2

For the above case the output will be: *,*,*,*,2 because class 2 has the highest frequency.

If several classes tie for the highest frequency, the smallest class value should be chosen.

For example:

    12,13,14,44,0
    11,11,10,34,0
    22,54,98,11,2
    34,90,78,90,1
    44,34,34,33,1
    22,54,98,11,0
    34,90,78,90,2
    44,34,34,33,1
    22,54,98,11,2

In this case the output should be *,*,*,*,0, because all the classes have the same frequency and 0 is the smallest class value.

How can I do this? Can anyone help, please?

  • I suggest you keep a dictionary with the counts of each class while reading the lines from the file. What attempts have you made so far? Commented Feb 28, 2022 at 9:46
  • The easiest way is probably to read your file with pandas and then use value_counts(). It will give you the count of each class, and then you can sort and write the result. Commented Feb 28, 2022 at 9:53
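
The dictionary approach from the first comment could look roughly like the sketch below. The helper name count_classes and the in-memory sample are assumptions made for illustration; in practice the open file object for dataset1.data would be passed in instead.

```python
import io


def count_classes(lines):
    """Return {class_label: count}; the last comma-separated field is the class."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        cls = int(line.rsplit(',', 1)[-1])  # only the last column is needed
        counts[cls] = counts.get(cls, 0) + 1
    return counts


# Hypothetical in-memory stand-in for the first four rows of dataset1.data.
sample = io.StringIO("12,13,14,44,0\n11,11,10,34,0\n22,54,98,11,2\n34,90,78,90,1\n")
counts = count_classes(sample)
print(counts)
```

From there, the class to print is the smallest key among those with the largest count.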

1 Answer

You could use collections.Counter:

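from collections import Counter

cls_counts = Counter()
with open('dataset1.data') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines, e.g. an empty line at the end of the file
        row = list(map(int, line.strip().split(',')))
        attrs, cls = row[:-1], row[-1]  # last column is the class, the rest are attributes
        cls_counts[cls] += 1
# Collect every class that reaches the highest count, then print the smallest of them
max_cls_val = max(cls_counts.values())
max_cls_keys = [cls for cls, count in cls_counts.items() if count == max_cls_val]
print(f"{'*,' * len(attrs)}{min(max_cls_keys)}")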

Example Usage 1, Unique class with max count:

dataset1.data:

12,13,14,44,0
11,11,10,34,0
22,54,98,11,2
34,90,78,90,1
44,34,34,33,1
22,54,98,11,0
34,90,78,90,2
44,34,34,33,1
22,54,98,11,2
34,90,78,90,2
44,34,34,33,2

Output:

*,*,*,*,2

Example Usage 2, Multiple classes with max count:

dataset1.data:

12,13,14,44,0
11,11,10,34,0
22,54,98,11,2
34,90,78,90,1
44,34,34,33,1
22,54,98,11,0
34,90,78,90,2
44,34,34,33,1
22,54,98,11,2

Output:

*,*,*,*,0
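
As a variation on the code above, the tie-break can be folded into a single min call by ordering classes on (negative count, class value). This is only a sketch: the function name majority_line is made up for illustration, and it takes any iterable of lines (an open file or a list of strings) so that both examples are easy to check.

```python
from collections import Counter


def majority_line(lines):
    """Build the '*,...,*,<class>' summary line from comma-separated rows."""
    cls_counts = Counter()
    n_attrs = 0
    for line in lines:
        fields = line.strip().split(',')
        if fields == ['']:
            continue  # blank line
        n_attrs = len(fields) - 1  # every column except the last is an attribute
        cls_counts[int(fields[-1])] += 1
    if not cls_counts:
        raise ValueError("no data rows found")
    # Highest count first; among equal counts, the smallest class value wins.
    best = min(cls_counts, key=lambda c: (-cls_counts[c], c))
    return ','.join(['*'] * n_attrs + [str(best)])


# The nine rows of the second example, where classes 0, 1 and 2 each appear three times.
tied = [
    "12,13,14,44,0", "11,11,10,34,0", "22,54,98,11,2",
    "34,90,78,90,1", "44,34,34,33,1", "22,54,98,11,0",
    "34,90,78,90,2", "44,34,34,33,1", "22,54,98,11,2",
]
print(majority_line(tied))  # *,*,*,*,0
```

With a file, the call would be majority_line(f) inside the with open(...) block.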