
I'm currently working on a program that reads a data file with 6 columns and a variable number of rows.

The file I got for testing is 26 MB, and the following program converts the first 3 columns into 3 separate lists:

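# Read the space-separated file 'foo' and collect its first three columns
with open('foo', 'r') as f:
    print('running...')
    a = []
    b = []
    c = []
    for line in f:
        x = line.split(' ')  # split each line on single spaces
        a.append(x[0])
        b.append(x[1])
        c.append(x[2])
print(a, b, c, sep='\n')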

I have rechecked this program and the logic looks correct. When I run it on a small file it works, but when I run it on the 26 MB file it stops responding.

Description of the program: The program opens a file named 'foo' and processes it line by line. It splits each line into parts based on the separator passed as an argument to the .split() method. I have used a space as the separator, because the data in the text file is separated by whitespace.
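One detail worth checking with this approach: split(' ') treats every single space as a separator, while split() with no argument splits on any run of whitespace and drops the trailing newline. The short sketch below compares the two on a made-up line; the double space in it is an assumption for illustration, not something confirmed about the actual file.

```python
# A hypothetical line with a double space between the first two fields
# and a trailing newline, as lines read from a file usually have.
line = "406.954744  261.445358 17.451789\n"

by_space = line.split(' ')   # one separator per single space
by_whitespace = line.split() # any run of whitespace is one separator

print(by_space)       # ['406.954744', '', '261.445358', '17.451789\n']
print(by_whitespace)  # ['406.954744', '261.445358', '17.451789']
```

If the real file ever uses more than one space between fields, split(' ') produces empty strings and shifts the column indices, which split() avoids.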

I'm not able to figure out why this program stops responding, and I need help with it!

  • What is the "cutoff", i.e. the file size at which the program stops responding? I've had similar problems when a program was running in the background without any output, so my computer thought it wasn't responding... Commented May 29, 2015 at 16:50
  • I really don't know! You see that "running..." string? After it is printed, the program runs for a very short time and then suddenly stops! I'm not able to debug why this is happening! Commented May 29, 2015 at 16:53
  • 1drv.ms/1eDwv2w - link to the file I'm working on! Commented May 29, 2015 at 16:53
  • While the program is appending the lines there is no output, so your program appears not to be responding even though it is doing something. Also, please try to figure out which file size causes the problem, and try the answer given; it will work for 150 lines. Commented May 29, 2015 at 16:59
  • OK! I'll check it out! Commented May 29, 2015 at 17:00
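Following the point in the comments that a long loop with no output looks frozen, here is a minimal sketch that prints a progress line every N lines while collecting the first three columns. The function name read_first_three_columns and the report interval are hypothetical choices, and an in-memory io.StringIO stands in for the real file.

```python
import io

def read_first_three_columns(lines, report_every=100_000):
    """Collect the first three whitespace-separated fields of each line,
    printing a progress message every `report_every` lines so a long run
    visibly keeps going."""
    a, b, c = [], [], []
    for count, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:  # skip blank lines instead of raising IndexError
            continue
        a.append(fields[0])
        b.append(fields[1])
        c.append(fields[2])
        if count % report_every == 0:
            print(f"lines processed: {count}", flush=True)
    return a, b, c

# With the real file this would be:
#     with open('foo') as fh:
#         a, b, c = read_first_three_columns(fh)
# Here an in-memory stand-in is used instead:
sample = io.StringIO("1 2 3 4 5 6\n7 8 9 10 11 12\n")
a, b, c = read_first_three_columns(sample, report_every=1)
print(a, b, c)  # ['1', '7'] ['2', '8'] ['3', '9']
```

flush=True makes each progress line appear immediately instead of sitting in an output buffer.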

2 Answers


I looked at the file, and it's 419,041 lines, not 150 lines. I tested my own algorithm on a subset of the file, and I'd estimate that the whole thing would take about 40 seconds.
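The answer doesn't show how the estimate was made. One way to get a similar number is to time the parsing of the first few thousand lines and scale it up linearly, as in this sketch. The helper names are hypothetical, the linear scaling is an assumption (it ignores caching and memory effects), and synthetic lines in an io.StringIO stand in for the real file.

```python
import io
import itertools
import time

def time_subset(lines, n):
    """Parse the first n lines with the zip/map approach and return
    (elapsed seconds, number of lines actually parsed)."""
    start = time.perf_counter()
    subset = itertools.islice(lines, n)
    columns = list(zip(*(map(float, line.split()) for line in subset)))
    elapsed = time.perf_counter() - start
    parsed = len(columns[0]) if columns else 0
    return elapsed, parsed

def estimate_total(elapsed, parsed, total_lines):
    """Linear extrapolation: assumes every line costs about the same time."""
    if parsed == 0:
        raise ValueError("no lines were parsed")
    return elapsed / parsed * total_lines

# Stand-in for the real file: 10,000 synthetic 6-column lines.
fake_file = io.StringIO("".join(f"{i} {i} {i} {i} {i} {i}\n" for i in range(10_000)))
elapsed, parsed = time_subset(fake_file, 1_000)
print(f"{parsed} lines in {elapsed:.4f} s, "
      f"estimated {estimate_total(elapsed, parsed, 419_041):.1f} s for 419,041 lines")
```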

Here's the algorithm I used:

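with open('foo', 'r') as fh:
    # Split each line on whitespace, convert every field to float,
    # then transpose the rows into one tuple per column with zip(*...)
    a, b, c, d, e, f = zip(*(map(float, line.split()) for line in fh))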

This creates a tuple for the numbers in each column, converting them from strings to floats.
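To see why zip(*...) turns rows into columns, here is the same expression applied to a tiny in-memory file with 3 columns instead of 6 (the data is made up for illustration).

```python
import io

# Hypothetical 3-column data standing in for the real file.
fake_file = io.StringIO("1 2 3\n4 5 6\n")

# Each line becomes an iterator of floats (one "row"); zip(*rows) then takes
# the 1st item of every row, then the 2nd, and so on, i.e. it yields columns.
a, b, c = zip(*(map(float, line.split()) for line in fake_file))
print(a)  # (1.0, 4.0)
print(b)  # (2.0, 5.0)
print(c)  # (3.0, 6.0)
```

A few consequences of this design: the `*` unpacking reads every row before zip starts, so the whole file is held in memory; zip stops at the shortest row, so a line with fewer fields silently truncates every column; the number of names on the left must match the number of columns; and the results are tuples, so wrap them in list() if lists are needed.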

I then tested your algorithm on the same small file, and found that it took almost twice as long. You may need to wait a full minute or two (depending on your computer's performance) for the file to finish processing. Since there's no output until it's done, it'll look like it's frozen. I also wouldn't recommend printing all the results at the end, because 1) that'll take a long time, 2) all it'll do is reprint the file in a messier way, and 3) most command line terminals don't have a very large buffer, so you'll only be able to scroll back over a small fraction of the output.
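Instead of printing every value at the end, it is usually enough to print each column's length and a few values from each end. A minimal sketch of such a helper (the name summarize and the default of 3 items per end are arbitrary choices):

```python
def summarize(name, values, k=3):
    """Describe a sequence by its length plus its first and last k items,
    instead of printing every element."""
    if len(values) <= 2 * k:
        return f"{name}: {len(values)} values {list(values)}"
    head = ", ".join(str(v) for v in values[:k])
    tail = ", ".join(str(v) for v in values[-k:])
    return f"{name}: {len(values)} values [{head}, ..., {tail}]"

# Usage with made-up data in place of the parsed columns:
a = [float(i) for i in range(10)]
print(summarize("a", a))  # a: 10 values [0.0, 1.0, 2.0, ..., 7.0, 8.0, 9.0]
```

It works on both lists and tuples, so it can be used with either the question's lists or the tuples produced by zip.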



If you use NumPy, you can use genfromtxt:

import numpy as np

a, b, c = np.genfromtxt('foo', usecols=[0, 1, 2], unpack=True)

Does that work with your large file?

EDIT:

OK, so I tried it on your file, and it seems to work fine. So I'm not sure what your problem is.

In [1]: from numpy import genfromtxt

In [2]: a,b,c=genfromtxt('foo',usecols=[0,1,2],unpack=True)

In [3]: a
Out[3]: 
array([ 406.954744,  406.828508,  406.906079, ...,  408.944226,
        408.833872,  408.788698])

In [4]: b
Out[4]: 
array([ 261.445358,  261.454366,  261.602131, ...,  260.46189 ,
        260.252377,  260.650606])

In [5]: c
Out[5]: 
array([ 17.451789,  17.582017,  17.388673, ...,  26.41099 ,  26.481148,
        26.606282])

In [6]: print len(a), len(b), len(c)
419040 419040 419040

4 Comments

No, dude! It seems to work only for 1 line! The txt file has around 150 lines, and every line is split with whitespace as the separator! So I don't think it would work!
Declare the delimiter: a, b, c = np.loadtxt('foo', usecols=[0, 1, 2], delimiter=' ', unpack=True)
You don't need to declare whitespace as the delimiter; that's the default for genfromtxt. I've added some IPython output showing that this method works.
:) This worked for me! I guess I should take my previous comment back!
