1

I was wondering how I can find the minimum and maximum values in a dataset, which is basically a text file. It has 50 rows and 50 columns.

I know I can set up a control loop (for loop to be specific) to have it read each row and column, and determine the min/max values. But, I'm not sure how to do that.

I think the rows and columns need to be converted to a list first, and then I need to use the split() function. I tried setting something up as follows, but it doesn't seem to work:

for x in range(4,50): # using that range as an example
    x.split()
    max(4,50)
    print x

New to Python. Please excuse my mistakes.

5 Comments
  • What does the file look like exactly? Can you provide a portion of the file? Commented Oct 21, 2011 at 20:23
  • Here's a script I've written which reads all the lines in a file, places it in a list and loops through the list. It's not the program you're looking for, but it might be of some help to you. Commented Oct 21, 2011 at 20:28
  • @Griffin: Sorry, I should've mentioned it's an ASCII dataset. Here's a sample - cl.ly/BBqr Commented Oct 21, 2011 at 20:28
  • Do you want the minimum and maximum of each row, or the minimum and maximum of the entire dataset, or just what? Is there something special about the first few rows/columns that you want to exclude? Is there something special about the data size? Normally, programmers ignore what they "know" about the size of input data whenever possible, preferring to write something that will handle any amount of data (it's usually just as easy, or even easier, anyway). Commented Oct 21, 2011 at 21:39
  • @KarlKnechtel I need to determine the minimum/maximum of the entire dataset. Commented Oct 21, 2011 at 22:14

4 Answers

3

Try something like this:

data = []
with open('data.txt') as f:
    for line in f:                   # loop over the rows
        fields = line.split()        # parse the columns
        rowdata = map(float, fields) # convert text to numbers
        data.extend(rowdata)         # accumulate the results
print 'Minimum:', min(data)
print 'Maximum:', max(data)

Note that split() takes an optional argument if you want to split on something other than whitespace (commas for example).
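
To make the two parsing steps concrete, here is a small sketch of what split() and map(float, ...) do to a single line. The sample strings are made up for illustration, and the sketch uses Python 3 syntax, where map() returns an iterator and is therefore wrapped in list().

```python
# Hypothetical sample lines; real data would come from the file.
line = "12 7.5  103\n"
fields = line.split()                 # no argument: split on any run of whitespace
numbers = list(map(float, fields))    # convert each text field to a float

csv_line = "12,7.5,103"
csv_fields = csv_line.split(",")      # optional argument: split on commas instead
csv_numbers = [float(x) for x in csv_fields]  # same conversion as a list comprehension

print(fields)
print(numbers)
print(csv_numbers)
```

The list comprehension and the map() call are interchangeable here; which one to use is mostly a matter of taste.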

5 Comments

From the book Introduction to Algorithms (CLRS): if we must find both the minimum and the maximum simultaneously, we can do so using at most 3 * (n // 2) comparisons instead of 2 * n - 2. Should Python provide something like minmax()?
@sunqiang, it's pretty cool that the number of comparisons can be reduced by 25%. While it isn't important enough to put into the Python core, it is an interesting algorithm, so I posted sample code at code.activestate.com/recipes/577916-fast-minmax-function
@Raymond Hettinger, Thanks for providing the recipe in such a short time. another cool example of itertools, :P
@RaymondHettinger - Thank you. I noticed your comments inside the code but just so I understand this correctly, can you elaborate on fields = line.split() if it's ok? I haven't used the map function before. I just read about it on Python Docs. Is that basically making the text into the list first and then converting them to numbers?
Thanks for this bit of code. It worked perfectly cut/paste into a script I'm working on (with only small modifications to fit my needs)!
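
The pairwise idea raised in the comments above can be sketched roughly as follows. This is an independent illustration of the technique, not a copy of the linked recipe, and the function name minmax is only a placeholder. Items are taken two at a time: one comparison orders the pair, then the smaller item is checked against the running minimum and the larger against the running maximum.

```python
def minmax(values):
    """Return (smallest, largest) using about three comparisons per pair of items."""
    it = iter(values)
    try:
        lo = hi = next(it)
    except StopIteration:
        raise ValueError("minmax() arg is an empty sequence") from None
    for x in it:
        y = next(it, x)    # second item of the pair; reuse x if none is left
        if x > y:
            x, y = y, x    # one comparison orders the pair
        if x < lo:
            lo = x         # only the smaller item can lower the minimum
        if y > hi:
            hi = y         # only the larger item can raise the maximum
    return lo, hi


print(minmax([3, 1, 4, 1, 5, 9, 2, 6]))
```

For a 50 x 50 dataset the saving is unlikely to be noticeable; plain min() and max() are simpler and fast enough.
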
3

If the file contains a regular (rectangular) matrix, and you know how many lines of header info it contains, then you can skip over the header info and use NumPy to do this particularly easily:

import numpy as np

with open("file.txt") as f:
    # skip over header info here, e.g. call next(f) once per header line
    X = np.loadtxt(f)
max_per_col = X.max(axis=0)
max_per_row = X.max(axis=1)
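
Since the question asks for the minimum and maximum of the entire dataset, a variant along these lines may be closer to what is needed. It is a sketch under assumptions: the in-memory sample and its two header lines are made up, and skiprows must be set to match the real file. Written for Python 3.

```python
import io

import numpy as np

# Stand-in for the real file: two made-up header lines, then a 2 x 3 matrix.
sample = io.StringIO("header line 1\nheader line 2\n1 2 3\n4 5 6\n")

X = np.loadtxt(sample, skiprows=2)   # skiprows=2 fits this sample only

overall_min = X.min()   # no axis argument: reduce over every element
overall_max = X.max()
print(overall_min, overall_max)
```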

2

Hmmm...are you sure that doesn't apply here? ;) Regardless:

You need to not only split the input lines, you need to convert the text values into numbers. So assuming you've read the input line into in_line, you'd do something like this:

...
row = [float(each) for each in in_line.split()]
rows.append(row) # assuming you have a list called rows
...

Once you have a list of rows, you need to get columns:

...
columns = zip(*rows)
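
As a small illustration of what zip(*rows) produces, here is a sketch with made-up data. It uses Python 3, where zip() is lazy, so the result is wrapped in list() to make it reusable.

```python
# Made-up 2 x 3 dataset standing in for the parsed file.
rows = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]

# zip(*rows) passes each row as a separate argument, so the i-th tuple
# collects the i-th item of every row: in other words, column i.
columns = list(zip(*rows))

col_max = [max(col) for col in columns]
print(columns)
print(col_max)
```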

Then you can just iterate through each row and each column calling max():

...
for each in rows:
    print max(each)
for each in columns:
    print max(each)

Edit: Here's more complete code showing how to open a file, iterate through the lines of the file, close the file, and use the above hints:

in_file = open('thefile.txt', 'r')

rows = []
for in_line in in_file:
    row = [float(each) for each in in_line.split()]
    rows.append(row)

in_file.close() # this'll happen at the end of the script / function / method anyhow

columns = zip(*rows)

for index, row in enumerate(rows):
    print "In row %s, Max = %s, Min = %s" % (index, max(row), min(row))

for index, column in enumerate(columns):
    print "In column %s, Max = %s, Min = %s" % (index, max(column), min(column))

Edit: For new-school goodness, don't use my old, risky file handling. Use the new, safe version:

rows = []
with open('thefile.txt', 'r') as in_file:
    for in_line in in_file:
        row = ....

Now you've got a lot of assurances that you don't accidentally do something bad like leave that file open, even if you throw an exception while reading it. Plus, you can entirely skip in_file.close() without feeling even a little guilty.
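
The claim about exceptions can be checked with a small experiment. The sketch below (Python 3) writes a throwaway file containing a deliberately bad value, lets float() fail inside the with block, and then inspects the file object. The temporary file and the names error_message and closed_after_error are only for illustration.

```python
import os
import tempfile

# Create a throwaway file with a deliberately bad value ("oops").
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("1 2 3\n4 oops 6\n")

rows = []
error_message = None
try:
    with open(path) as in_file:
        for in_line in in_file:
            rows.append([float(each) for each in in_line.split()])
except ValueError as exc:
    error_message = str(exc)          # float("oops") failed inside the with block

closed_after_error = in_file.closed   # the with statement closed the file anyway
os.remove(path)
print(rows, closed_after_error)
```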

5 Comments

Sorry, yes I should've tagged that. I'm new to Python, been practicing but just need help sometimes. Thank you for this. This somewhat makes sense. I'll try this out and post my final code here.
Only thing I'd add would be to consider using the csv module, but this works just as well.
@kolor - that's no problem - it just smelled of homework to me! :) Obviously, to find the minimums, you'll need to iterate through calling min() as well.
@AustinMarshall - I made 2 assumptions: 1) space delimited values and 2) The focus of this exercise was on manipulating the data, not on reading it from the file. I use and love csv but didn't want to get into it here.
@gomad Before I can use in_line.split()... I need to define in_line, right? So I'm using in_line = f.readlines() but I get the following error: AttributeError: 'list' object has no attribute 'split'
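
The AttributeError in the last comment arises because readlines() returns a list of strings, while split() is a string method. A minimal sketch of the distinction, using a made-up in-memory file and Python 3 syntax:

```python
import io

# Stand-in for an open file; the contents are made up.
f = io.StringIO("1 2 3\n4 5 6\n")

lines = f.readlines()    # a list of strings, one per line
# lines.split() would raise AttributeError: a list has no split() method

rows = []
for in_line in lines:    # each in_line is a single string, so split() works
    rows.append([float(each) for each in in_line.split()])

print(rows)
```

Looping over the file object directly, as in the answer, gives the same strings one at a time without building the intermediate list.
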
1

Will this work for you?

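infile = open('my_file.txt', 'r')
file_lines = infile.readlines()
infile.close()

for line in file_lines[6:]:                 # skip the header lines
    items = [int(x) for x in line.split()]  # parse one row into integers
    max_item = max(items)                   # maximum of this row only
    min_item = min(items)                   # minimum of this row only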

6 Comments

Hi jcfollower, thank you. Can you elaborate on items = [int(x) for x in line.split()]? Is it basically trying to find all the integer values in the file?
[int(x) for x in line.split()] means "a list, containing the result of applying int to each x in line.split()". line.split() breaks the text up on whitespace, so that you have a list of "words". int attempts to interpret the text it's given as an integer. So this is creating an integer out of each "word" on a given line. It will fail loudly if there is any garbage in that part of the file. (The [6:] part is basically skipping the header info.)
@jcfollower, I've tested this on my dataset (cl.ly/BBqr) but it seems to be picking the wrong values. Based on my dataset, just by looking at it in Notepad++, the max value should be 232 while the min value should be 15. But the program displays 171 as max and 22 as min.
The for-loop sets the max_item and min_item repeatedly for every line. The assumption was that you would do more work within the for-loop with these values, per line. To get the maximum and minimum of the entire dataset, you would need to pass them all to max and min at once, by creating an items that contains the entire dataset. You should be able to think of a way to do this. Hint: try using .readline explicitly to skip the header, and then using .read to read the rest of the file into a single string. The line.split() trick will treat newlines the same as spaces.
@KarlKnechtel Hmm. Ok. Thank you. I'll try it out. I sort of get it. But the line for line in file_lines[6:]: is skipping the header information anyway, right? And it's reading all other contents besides the header.
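
Following the hint in the comments, one way to get the minimum and maximum of the whole dataset is to skip the header with readline() and then read everything else into a single string. This is only a sketch (Python 3): the in-memory sample, its header text, and the HEADER_LINES value are assumptions to be adapted to the real file.

```python
import io

# Stand-in for the real file: two made-up header lines followed by data.
infile = io.StringIO("header line 1\nheader line 2\n15 171 22\n232 40 99\n")

HEADER_LINES = 2             # assumption for this sample; adjust to the real file
for _ in range(HEADER_LINES):
    infile.readline()        # discard one header line per call

# read() returns the rest as one string; split() treats newlines like spaces.
items = [int(x) for x in infile.read().split()]

print("Minimum:", min(items))
print("Maximum:", max(items))
```

Because items now holds every value in the file, min() and max() are each called once over the full dataset rather than once per row.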