I have written a Python script that works on a single file. I couldn't find out how to make it run on multiple files and write a separate output file for each one.

import re

out = open('/home/directory/a.out','w')
infile = open('/home/directory/a.sam','r')

for line in infile:
    if not line.startswith('@'):
        samlist = line.strip().split()
        if 'I' or 'D' in samlist[5]:
            match = re.findall(r'(\d+)I', samlist[5]) # remember to change I and D here as well
            intlist = [int(x) for x in match]
##            if len(intlist) < 10:
            for indel in intlist:
                if indel >= 10:
##                    print indel
            ###intlist contains lengths of insertions in for each read
            #print intlist
                    read_aln_start = int(samlist[3])
                    indel_positions = []
                    for num1, i_or_d, num2, m in re.findall(r'(\d+)([ID])(\d+)?([A-Za-z])?', samlist[5]):
                        if num1:
                            read_aln_start += int(num1)
                        if num2:
                            read_aln_start += int(num2)
                        indel_positions.append(read_aln_start)
                #print indel_positions
                    out.write(str(read_aln_start)+'\t'+str(i_or_d) + '\t'+str(samlist[2])+ '\t' + str(indel) +'\n')
out.close()

I would like my script to take multiple files with names like a.sam, b.sam, c.sam and, for each file, produce a matching output file: aout.sam, bout.sam, cout.sam.

Can you please give me either a solution or a hint?

Regards, Irek

  • Have you tried wrapping that script in a function and passing the names of the input and output file as parameters? Commented Jul 18, 2013 at 9:29
  • if 'I' or 'D' in samlist[5] doesn't do what you think it does. This condition is always true. Commented Jul 18, 2013 at 9:31
  • I don't think it's always true. Only some lines contain I or D. Most of them actually have neither letter, so the condition should be false for those. Commented Jul 18, 2013 at 9:38
  • @Irek Python treats any non-empty string as True, so the above condition is essentially if bool('I') or ('D' in samlist[5]): Commented Jul 18, 2013 at 9:40
  • And the correct way of writing that check is if 'I' in samlist[5] or 'D' in samlist[5] Commented Jul 18, 2013 at 9:41
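
To make the operator-precedence point concrete, here is a minimal sketch. It assumes samlist[5] holds a full CIGAR string such as 20M3I27M; the function names buggy_check and fixed_check are made up for illustration and are not part of the original script.

```python
def buggy_check(cigar):
    # Parsed as ('I') or ('D' in cigar). 'I' is a non-empty string,
    # so the whole expression is truthy regardless of cigar.
    return bool('I' or 'D' in cigar)


def fixed_check(cigar):
    # Test each letter against the CIGAR string separately.
    return 'I' in cigar or 'D' in cigar


# Example CIGAR strings (illustrative values, not taken from the thread).
for cigar in ['50M', '20M3I27M', '10M12D40M']:
    print(cigar, buggy_check(cigar), fixed_check(cigar))
```

Only the second form can ever skip a read, which is presumably what the original condition was meant to do.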

3 Answers

Loop over filenames.

input_filenames = ['a.sam', 'b.sam', 'c.sam']
output_filenames = ['aout.sam', 'bout.sam', 'cout.sam']
for infn, outfn in zip(input_filenames, output_filenames):
    out = open('/home/directory/{}'.format(outfn), 'w')
    infile = open('/home/directory/{}'.format(infn), 'r')
    ...

UPDATE

The following code generates output_filenames from the given input_filenames.

import os

def get_output_filename(fn):
    filename, ext = os.path.splitext(fn)
    return filename + 'out' + ext

input_filenames = ['a.sam', 'b.sam', 'c.sam'] # or glob.glob('*.sam')
output_filenames = [get_output_filename(fn) for fn in input_filenames]  # a real list on Python 2 and 3

9 Comments

Not exactly what I'm looking for. I still need to type out all of the filenames. That's fine until you have 100 files in the directory.
@Irek, I added more code that generates output_filenames from input_filenames.
OK, great. Is it possible to generate the input filenames as well?
@Irek, you can use glob.glob('*.sam'), but that only works the first time. Once the script has run, glob.glob('*.sam') will also pick up the output files, because both input and output filenames end with .sam.
OK. That's not a problem; I write my output to a separate directory, which should do the trick.
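
For what it's worth, the approach discussed in these comments (glob for the inputs, a separate directory for the outputs) might look roughly like the sketch below. The helper name pair_filenames and the output directory /home/directory/out are assumptions made for illustration; the output directory must already exist before any file is opened in it.

```python
import glob
import os


def pair_filenames(indir, outdir, pattern='*.sam'):
    """Return (input_path, output_path) pairs for every file in indir
    that matches pattern. Hypothetical helper, not from the answer."""
    pairs = []
    for inpath in sorted(glob.glob(os.path.join(indir, pattern))):
        stem, ext = os.path.splitext(os.path.basename(inpath))
        # a.sam -> <outdir>/aout.sam
        pairs.append((inpath, os.path.join(outdir, stem + 'out' + ext)))
    return pairs


if __name__ == '__main__':
    # Assumed paths; adjust to your own layout.
    for inpath, outpath in pair_filenames('/home/directory',
                                          '/home/directory/out'):
        print(inpath, '->', outpath)
```

Because the outputs land in a different directory, rerunning the script does not pick up aout.sam and friends as new inputs.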

I'd recommend wrapping that script in a function, using the def keyword, and passing the names of the input and output files as parameters to that function.

def do_stuff_with_files(infile, outfile):
    out = open(outfile,'w')
    infile = open(infile,'r')
    # the rest of your script

Now you can call this function for any combination of input and output file names.

do_stuff_with_files('/home/directory/a.sam', '/home/directory/a.out')

If you want to do this for all files in a certain directory, use the glob library. To generate the output filenames, just replace the last three characters ("sam") with "out".

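import glob
import os
indir, outdir = '/home/directory/', '/home/directory/out/'
# Bare filenames of all .sam files in indir
# (glob.glob1 is an undocumented helper, so plain glob.glob is used here)
files = [os.path.basename(p) for p in glob.glob(indir + '*.sam')]
infiles  = [indir  + f              for f in files]
outfiles = [outdir + f[:-3] + "out" for f in files]
for infile, outfile in zip(infiles, outfiles):
    do_stuff_with_files(infile, outfile)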

5 Comments

glob.glob('/home/directory/*.out') will not work, because the output files don't exist until the script has run.
@falsetru Yes, realized that, too. Borrowed your method for that. ;-)
@tobias_k What if I would also like to write my outfiles to a different directory?
How can I modify the code if I want to process my inFile before writing the outFile? @tobias_k
@user91 What do you mean? You just add that code to do_stuff_with_files.
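
As a rough illustration of "just add that code to do_stuff_with_files", here is one way the function body could look. The per-line processing is a simplified stand-in, not the asker's exact indel logic: it only reports insertions of at least min_len bases. The helper long_insertions, the min_len parameter, and the three output columns are all assumptions for the sake of the example.

```python
import re


def long_insertions(cigar, min_len=10):
    """Return the insertion lengths in a CIGAR string that are >= min_len."""
    return [int(n) for n in re.findall(r'(\d+)I', cigar) if int(n) >= min_len]


def do_stuff_with_files(inpath, outpath, min_len=10):
    # Both files are closed automatically when the with-block ends.
    with open(inpath, 'r') as infile, open(outpath, 'w') as out:
        for line in infile:
            if line.startswith('@'):
                continue  # skip SAM header lines
            fields = line.strip().split()
            if len(fields) < 6:
                continue  # not a full alignment record
            # SAM columns: fields[2] = reference name, fields[3] = position,
            # fields[5] = CIGAR string
            for length in long_insertions(fields[5], min_len):
                out.write('{}\t{}\t{}\n'.format(fields[2], fields[3], length))
```

Keeping the per-line work in its own small function makes it easy to change the processing later without touching the file handling.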

The following script loops over all files with the ".sam" extension in the given directory, performs the specified operation on each one, and writes the results to a separate output file.

import os
# Define the directory containing the files you are working with
path = '/home/directory'
# Get all the files in that directory with the desired
# extension (in this case ".sam")
files = [f for f in os.listdir(path) if f.endswith('.sam')]
# Loop over the files with that extension
for file in files:
    # Open the input file
    with open(path + '/' + file, 'r') as infile:
        # Open the output file
        with open(path + '/' + file.split('.')[0] + 'out.' +
                               file.split('.')[1], 'a') as outfile:
            # Loop over the lines in the input file
            for line in infile:
                # If a line in the input file can be characterized in a
                # certain way, write a different line to the output file.
                # Otherwise write the original line (from the input file)
                # to the output file
                if line.startswith('Something'):
                    outfile.write('A different kind of something')
                else:
                    outfile.write(line)
    # Note the absence of either an infile.close() or an outfile.close()
    # statement. The with-statement handles that for you
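
One caveat with building the output name via file.split('.'): a filename containing more than one dot gets cut at the first dot. A possible alternative, sketched here with pathlib, splits only at the last extension. The helper name output_path_for and the example paths are hypothetical.

```python
from pathlib import Path


def output_path_for(inpath, outdir):
    """Build <outdir>/<stem>out<suffix> for a given input path."""
    inpath = Path(inpath)
    # stem and suffix split only at the last dot in the filename
    return Path(outdir) / (inpath.stem + 'out' + inpath.suffix)


print(output_path_for('/home/directory/a.sam', '/home/directory/out').name)
# → aout.sam
print(output_path_for('sample.v2.sam', 'out').name)
# → sample.v2out.sam
```

With the split('.') approach, the second name would come out as sampleout.v2, which loses the .sam extension.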
