I am on Python 3.4.1, and I don't have access to third-party modules such as pandas or NumPy on my work PC.
I originally wrote a VBA program in Excel: the original data is on Sheet1, the new data is on Sheet2, and the differences between the two sheets go on Sheet3. The program does the following three things:
- Sort the data by the values in the first column (which can be integers or alphanumeric).
- Align the rows so that the first-column values on both sheets match; where a value exists on only one sheet, a blank row is inserted on the other so the remaining rows still line up.
- Create a new results tab and compare the aligned rows. If anything differs between them, the entire row is copied over from the original CSV file.
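Values read from a CSV file are all strings, so a plain sort puts `25` before `5`. A key function can sort numeric first-column values numerically and alphanumeric ones as text. This is a minimal sketch, assuming each row is a list of strings and that numeric keys should come before alphanumeric ones; `first_column_key` is a hypothetical name:

```python
def first_column_key(row):
    """Hypothetical sort key: numeric first-column values sort as integers
    and come before alphanumeric values, which sort as plain text."""
    value = row[0].strip()
    try:
        # Tag 0 = numeric; the tuple layout never compares an int with a str
        return (0, int(value), '')
    except ValueError:
        # Tag 1 = alphanumeric, compared as text
        return (1, 0, value)


rows = [['1', 'b1'], ['2', 'b2'], ['5', 'b5'], ['25', 'b25'], ['7', 'b7'], ['A3', 'x']]
sorted_rows = sorted(rows, key=first_column_key)
print([row[0] for row in sorted_rows])  # ['1', '2', '5', '7', '25', 'A3']
```

With a plain `sorted(rows)` the order would instead be `1, 2, 25, 5, 7, A3`.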
Because the VBA version was excruciatingly slow, I decided to try to learn Python. In Python I can already compare the data, but I still need to sort the rows by the first column and align them.
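Aligning the rows on the first column and inserting a blank where one file lacks a key is a merge-join over two sorted lists. This is a minimal sketch, assuming both lists are already sorted with the same key and that first-column values are unique within each file; `align_sorted_rows` is a hypothetical name, and `None` stands in for the blank row:

```python
def align_sorted_rows(rows1, rows2, key):
    """Walk two lists sorted by `key` in step, pairing rows with equal keys.

    Returns (row1, row2) tuples; where one file has no row for a key,
    that side is None (the equivalent of the blank row added in Excel).
    """
    aligned = []
    i = j = 0
    while i < len(rows1) or j < len(rows2):
        if j == len(rows2):                      # file 2 exhausted
            aligned.append((rows1[i], None))
            i += 1
        elif i == len(rows1):                    # file 1 exhausted
            aligned.append((None, rows2[j]))
            j += 1
        elif key(rows1[i]) == key(rows2[j]):     # same key in both files
            aligned.append((rows1[i], rows2[j]))
            i += 1
            j += 1
        elif key(rows1[i]) < key(rows2[j]):      # key only in file 1
            aligned.append((rows1[i], None))
            i += 1
        else:                                    # key only in file 2
            aligned.append((None, rows2[j]))
            j += 1
    return aligned


def int_key(row):
    # Assumption for this example: first-column values are integers
    return int(row[0])


rows1 = sorted([['1', 'b1'], ['2', 'b2'], ['5', 'b5'], ['25', 'b25'], ['7', 'b7']], key=int_key)
rows2 = sorted([['2', 'b2'], ['1', 'b1'], ['3', 'b3'], ['7', 'b7'], ['25', 'b25']], key=int_key)
aligned = align_sorted_rows(rows1, rows2, int_key)
for left, right in aligned:
    print(left, right)
```

A key that also handles alphanumeric values works the same way, provided both lists were sorted with that same key.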
For example:
Original CSV #1
Column1,Column2,Column3,Column4,Column5
1,b1,c11111,d1,e1
2,b2,c2,d2,e2
5,b5,c5,d5,e5
25,b25,c25,d2555,e25
7,b7,c7,d7,e7
Original CSV #2
Column1,Column2,Column3,Column4,Column5
2,b2,c2,d2,e2
1,b1,c1,d1,e1
3,b3,c3,d3,e3
7,b7,c7,d7,e777
25,b25,c25,d25,e25
The row with key 2 is identical in both files, so it is not copied into either results file.
Results CSV #1
Column1,Column2,Column3,Column4,Column5
1,b1,c11111,d1,e1
5,b5,c5,d5,e5
7,b7,c7,d7,e7
25,b25,c25,d2555,e25
Results CSV #2
Column1,Column2,Column3,Column4,Column5
1,b1,c1,d1,e1
3,b3,c3,d3,e3
7,b7,c7,d7,e777
25,b25,c25,d25,e25
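Keying each file's rows by the first column produces these results directly, without a separate alignment pass: a row is copied when its key is missing from the other file or when any of its fields differ. This is a minimal sketch, assuming first-column values are unique within a file and are integers (as in the example); `diff_by_key` is a hypothetical name:

```python
def diff_by_key(rows1, rows2):
    """Return (result1, result2): the rows from each file whose first-column
    key is absent from the other file or whose fields differ."""
    by_key1 = {row[0]: row for row in rows1}
    by_key2 = {row[0]: row for row in rows2}
    result1, result2 = [], []
    # Visit every key from either file in numeric order
    for key in sorted(set(by_key1) | set(by_key2), key=int):
        row1, row2 = by_key1.get(key), by_key2.get(key)
        if row1 == row2:
            continue  # identical in both files: copy to neither
        if row1 is not None:
            result1.append(row1)
        if row2 is not None:
            result2.append(row2)
    return result1, result2


header = ['Column1', 'Column2', 'Column3', 'Column4', 'Column5']
file1 = [['1', 'b1', 'c11111', 'd1', 'e1'], ['2', 'b2', 'c2', 'd2', 'e2'],
         ['5', 'b5', 'c5', 'd5', 'e5'], ['25', 'b25', 'c25', 'd2555', 'e25'],
         ['7', 'b7', 'c7', 'd7', 'e7']]
file2 = [['2', 'b2', 'c2', 'd2', 'e2'], ['1', 'b1', 'c1', 'd1', 'e1'],
         ['3', 'b3', 'c3', 'd3', 'e3'], ['7', 'b7', 'c7', 'd7', 'e777'],
         ['25', 'b25', 'c25', 'd25', 'e25']]
result1, result2 = diff_by_key(file1, file2)
for row in [header] + result1:
    print(','.join(row))
```

If a file can repeat a first-column value, the dictionaries keep only the last row for that key, so the sketch would need adjusting.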
With the code below, I can accomplish the third step (comparing the rows), but only when the rows are already in the same order in both files.
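import os

# Folder holding the input files (forward slashes also work on Windows)
strpath = 'C:/Users/User/Desktop/compare/'
strFileNameA = 'File1'
strFileNameB = 'File2'

# 'with' closes all four files automatically, even if an error occurs
with open(os.path.join(strpath, strFileNameA + '.csv'), 'r') as testfile1, \
        open(os.path.join(strpath, strFileNameB + '.csv'), 'r') as testfile2, \
        open(os.path.join(strpath, strFileNameA + '-Results.csv'), 'w') as testresult1, \
        open(os.path.join(strpath, strFileNameB + '-Results.csv'), 'w') as testresult2:
    testlist1 = testfile1.readlines()
    testlist2 = testfile2.readlines()

    rows = 0  # data rows compared (header excluded)
    z = 0     # data rows that differ
    # zip() pairs lines by position and stops at the shorter file, so this
    # only gives correct results when both files are already aligned
    for k, (i, j) in enumerate(zip(testlist1, testlist2)):
        # Strip the newline so a missing final newline is not a difference
        i, j = i.rstrip('\n'), j.rstrip('\n')
        if k == 0:
            # Copy the header row to both results files
            testresult1.write(i + '\n')
            testresult2.write(j + '\n')
            continue
        rows += 1
        if i != j:
            testresult1.write(i + '\n')
            testresult2.write(j + '\n')
            z += 1

    if z == 0:
        testresult1.write('Exact match for ' + str(rows) + ' rows\n')
        testresult2.write('Exact match for ' + str(rows) + ' rows\n')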
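The script above compares raw text lines, so a stray space after a comma (such as `1, b1`) counts as a difference. Python's standard `csv` module, available in 3.4, can split each line into fields instead, which makes sorting and comparing by column straightforward. This is a minimal sketch, assuming comma-delimited files with a single header row; `read_csv` and `write_csv` are hypothetical helper names:

```python
import csv
import os
import tempfile


def read_csv(path):
    """Return (header, rows) where each row is a list of field strings."""
    # The csv docs recommend opening CSV files with newline=''
    with open(path, 'r', newline='') as f:
        # skipinitialspace=True ignores a space right after each comma
        reader = csv.reader(f, skipinitialspace=True)
        header = next(reader)
        rows = [row for row in reader if row]  # skip blank lines
    return header, rows


def write_csv(path, header, rows):
    """Write a header row followed by the data rows."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


# Round trip through a temporary folder so the example is self-contained
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'File2.csv')
    with open(path, 'w') as f:
        f.write('Column1,Column2,Column3\n1, b1,c1\n3,b3,c3')  # no final newline
    header, rows = read_csv(path)
    print(header, rows)
    out_path = os.path.join(folder, 'File2-Results.csv')
    write_csv(out_path, header, rows)
    roundtrip = read_csv(out_path)
```

Replacing the `readlines()` calls with a reader like this yields lists of fields that can be sorted, aligned, or keyed by the first column before comparison.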