Any help is greatly appreciated! I have been trying to solve this for the last few days.
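I have two arrays:
import pandas as pd
OldDataSet = {
    'id': [20, 30, 40, 50, 60, 70],
    'OdoLength': [26.12, 43.12, 46.81, 56.23, 111.07, 166.38]}
NewDataSet = {
    'id': [3000, 4000, 5000, 6000, 7000, 8000],
    'OdoLength': [25.03, 42.12, 45.74, 46, 110.05, 165.41]}
df1 = pd.DataFrame(OldDataSet)
df2 = pd.DataFrame(NewDataSet)
# as_matrix() was removed in pandas 1.0. Newer pandas also keeps the dict order
# ('id' first), so select the columns by name: column 0 = OdoLength, column 1 = id
OldDataSetArray = df1[['OdoLength', 'id']].to_numpy()
NewDataSetArray = df2[['OdoLength', 'id']].to_numpy()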
The result that I am trying to get (Array 1 and Array 2 matched by closest difference, using only the numbers left over in Array 2) is:
Array 1 id  Array 1 OdoLength  Array 2 id  Array 2 OdoLength
20          26.12              3000        25.03
30          43.12              4000        42.12
40          46.81              6000        46
50          56.23              7000        110.05
60          111.07             8000        165.41
70          166.38             0           0
Starting at Array 1, ID 20, find the nearest value in Array 2, which in this case is the first one, ID 3000 (26.12 - 25.03 = 1.09). So ID 20 gets matched to ID 3000.

Where it gets tricky is that Array 2 is only walked forward: if a value in Array 2 is passed over because a later value is closer, it is skipped. For example, ID 40 (46.81) is compared to 45.74 (difference 1.07) and 46 (difference 0.81), and the smallest difference is 0.81, from 46 (ID 6000). So ID 40 -> ID 6000, and ID 5000 in Array 2 is now skipped for any future comparisons. So when comparing Array 1 ID 50, it is compared to the next available number in Array 2, 110.05, and Array 1 ID 50 is matched to Array 2 ID 7000. ID 60 then gets the last value, ID 8000, and ID 70 has nothing left to match, so it gets 0 0.
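In case it helps to pin the rule down, here is a minimal sketch of it in plain Python. It assumes both lists are already in the order they should be walked (sorted by OdoLength, as in the sample), that a later Array 2 value must be strictly closer to be taken (ties keep the earlier value, which the description above does not cover), and that an Array 1 row with nothing left gets 0, 0 as in the expected table. The function name match_forward is hypothetical.

```python
def match_forward(old, new):
    """Hypothetical helper: greedily pair each (id, length) in `old`
    with an (id, length) in `new`.

    A pointer `j` marks the first unused row of `new`. For each old row the
    pointer moves forward while the next candidate is strictly closer; every
    candidate passed over is skipped for good.
    """
    result = []
    j = 0
    for old_id, old_len in old:
        if j >= len(new):
            # nothing left in Array 2: pad with zeros, as in the expected table
            result.append((old_id, old_len, 0, 0))
            continue
        # advance while the next unused value is strictly closer than the current one
        while (j + 1 < len(new)
               and abs(old_len - new[j + 1][1]) < abs(old_len - new[j][1])):
            j += 1
        new_id, new_len = new[j]
        result.append((old_id, old_len, new_id, new_len))
        j += 1  # the matched value is used up
    return result


old = [(20, 26.12), (30, 43.12), (40, 46.81), (50, 56.23), (60, 111.07), (70, 166.38)]
new = [(3000, 25.03), (4000, 42.12), (5000, 45.74), (6000, 46), (7000, 110.05), (8000, 165.41)]
for row in match_forward(old, new):
    print(*row)
```

For the sample data this prints the six rows of the expected result, ending with 70 166.38 0 0.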
UPDATE
So here's the code that I have tried, and it gives the matches above for this sample data (it does not output the final "70 0 0" row). It is not the greatest, so if someone has another suggestion please let me know.
import pandas as pd
import operator

OldDataSet = {
    'id': [20, 30, 40, 50, 60, 70],
    'OdoLength': [26.12, 43.12, 46.81, 56.23, 111.07, 166.38]}
NewDataSet = {
    'id': [3000, 4000, 5000, 6000, 7000, 8000],
    'OdoLength': [25.03, 42.12, 45.74, 46, 110.05, 165.41]}
df1 = pd.DataFrame(OldDataSet)
df2 = pd.DataFrame(NewDataSet)
# as_matrix() was removed in pandas 1.0; select the columns by name so that
# column 0 = OdoLength and column 1 = id, which is what the loop below assumes
OldDataSetArray = df1[['OdoLength', 'id']].to_numpy()
NewDataSetArray = df2[['OdoLength', 'id']].to_numpy()

newPos = 1
CurrentNumber = 0
OldArrayLen = len(OldDataSetArray) - 1
NewArrayLen = len(NewDataSetArray) - 1
numberResults = []
for oldPos in range(len(OldDataSetArray)):
    # distance to the most recent unmatched candidate in Array 2
    # (indexing with oldPos here could point at the wrong row or run past the end)
    PreviousNumber = abs(OldDataSetArray[oldPos, 0] - NewDataSetArray[newPos - 1, 0])
    while newPos <= len(NewDataSetArray) - 1:
        CurrentNumber = abs(OldDataSetArray[oldPos, 0] - NewDataSetArray[newPos, 0])
        # if it is the last row for the inner array, then match the next available
        # in Array 1 to that last record
        if newPos == NewArrayLen and oldPos < newPos and oldPos + 1 <= OldArrayLen:
            numberResults.append([OldDataSetArray[oldPos + 1, 1], NewDataSetArray[newPos, 1],
                                  OldDataSetArray[oldPos + 1, 0], NewDataSetArray[newPos, 0]])
        if PreviousNumber < CurrentNumber:
            # the previous candidate was the closest: record the match
            numberResults.append([OldDataSetArray[oldPos, 1], NewDataSetArray[newPos - 1, 1],
                                  OldDataSetArray[oldPos, 0], NewDataSetArray[newPos - 1, 0]])
            newPos += 1
            break
        else:
            # closer or equal candidate: skip the previous one. Using else instead of
            # elif PreviousNumber > CurrentNumber stops a tie from looping forever.
            PreviousNumber = CurrentNumber
            newPos += 1

# sort by Array 1 id
numberResults = sorted(numberResults, key=operator.itemgetter(0))
numberResultsDf = pd.DataFrame(numberResults,
                               columns=['OldId', 'NewId', 'OldOdoLength', 'NewOdoLength'])
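As another suggestion, here is a sketch that works on the two DataFrames directly and returns the full result, including the 0, 0 row for ID 70. Instead of stepping forward one row at a time, it takes the closest value among all remaining Array 2 rows with numpy.argmin and then discards everything up to and including the match. The function match_closest_remaining and the output column names are hypothetical, and it assumes both frames are already in the order they should be matched.

```python
import numpy as np
import pandas as pd


def match_closest_remaining(old_df, new_df):
    """Hypothetical helper: pair each row of old_df with the closest OdoLength
    among the rows of new_df that come after the previous match.

    Rows of new_df before the chosen one are skipped for good. Rows of old_df
    with nothing left to match get 0, 0.
    """
    new_ids = new_df['id'].to_numpy()
    new_lens = new_df['OdoLength'].to_numpy()
    start = 0  # first row of new_df that is still available
    rows = []
    for old_id, old_len in zip(old_df['id'], old_df['OdoLength']):
        if start >= len(new_lens):
            rows.append((old_id, old_len, 0, 0))  # nothing left to match
            continue
        # position of the smallest absolute difference among the remaining rows
        # (np.argmin returns the first one on ties)
        k = start + int(np.argmin(np.abs(new_lens[start:] - old_len)))
        rows.append((old_id, old_len, new_ids[k], new_lens[k]))
        start = k + 1  # the match and everything before it are used up
    return pd.DataFrame(rows, columns=['OldId', 'OldOdoLength', 'NewId', 'NewOdoLength'])


df1 = pd.DataFrame({'id': [20, 30, 40, 50, 60, 70],
                    'OdoLength': [26.12, 43.12, 46.81, 56.23, 111.07, 166.38]})
df2 = pd.DataFrame({'id': [3000, 4000, 5000, 6000, 7000, 8000],
                    'OdoLength': [25.03, 42.12, 45.74, 46, 110.05, 165.41]})
print(match_closest_remaining(df1, df2))
```

When the Array 2 values are strictly increasing, as in the sample, this gives the same pairs as scanning forward until the difference starts to grow; with unsorted or repeated values the two rules can differ, so it is worth checking which one you actually want.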