
I have a function that takes every row (MatchId is not distinct, so each match appears on several rows) together with its paired xG_Team1 and xG_Team2 values, and returns an array. These arrays are then summed to give the SSE.

The problem is that the function iterates over each row, so each MatchId is processed once per row rather than once per match. I want to stop this.

For each distinct MatchId I need the corresponding home and away goal times as lists, i.e. Home_Goal and Away_Goal taken from the Home_Goal_Time and Away_Goal_Time columns of the dataframe, to be used in each iteration. The lists I built (shown further below) don't seem to work.
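A minimal sketch of building one goal list per distinct MatchId, assuming the column names shown in the sample table (Home_Goal_Time, Away_Goal_Time) and assuming a goal time of 0 means "no goal on this row" (which is how the expected Home_Goal = [] for 842079 reads). `goal_list`, `home_goals` and `away_goals` are illustrative names, not part of the original code:

```python
import pandas as pd

# The MatchId and goal-time columns of the sample rows in the question.
df = pd.DataFrame({
    "MatchId": [842079] * 2 + [842080] * 4 + [842634] * 4,
    "Home_Goal_Time": [0, 0, 121, 159, 0, 175, 0, 0, 0, 0],
    "Away_Goal_Time": [87, 115, 0, 0, 169, 0, 70, 142, 176, 0],
})

def goal_list(times):
    # Assumption: 0 means "this side did not score on this row", so drop it.
    return [int(t) for t in times if t != 0]

# One list per distinct MatchId, keyed by MatchId.
home_goals = df.groupby("MatchId")["Home_Goal_Time"].apply(goal_list).to_dict()
away_goals = df.groupby("MatchId")["Away_Goal_Time"].apply(goal_list).to_dict()

print(home_goals[842079], away_goals[842079])   # [] [87, 115]
```

Keying the lists by MatchId, rather than relying on list position, makes it harder to pair one match's xG values with another match's goals.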

   MatchId  Event_Id         EventCode       Team1         Team2  Team1_Goals
0   842079      2053         Goal Away  Huachipato      Cobresal            0
1   842079      2053         Goal Away  Huachipato      Cobresal            0
2   842080      1029         Goal Home      Slovan          lava            3
3   842080      1029         Goal Home      Slovan          lava            3
4   842080      2053         Goal Away      Slovan          lava            3
5   842080      1029         Goal Home      Slovan          lava            3
6   842634      2053         Goal Away     Rosario  Boca Juniors            0
7   842634      2053         Goal Away     Rosario  Boca Juniors            0
8   842634      2053         Goal Away     Rosario  Boca Juniors            0
9   842634      2054  Cancel Goal Away     Rosario  Boca Juniors            0

   Team2_Goals  xG_Team1  xG_Team2  CurrentPlaytime  Home_Goal_Time  Away_Goal_Time
0            2   1.79907   1.19893          2616183               0              87
1            2   1.79907   1.19893          3436780               0             115
2            1   1.70662    1.1995          3630545             121               0
3            1   1.70662    1.1995          4769519             159               0
4            1   1.70662    1.1995          5057143               0             169
5            1   1.70662    1.1995          5236213             175               0
6            2   0.82058    1.3465          2102264               0              70
7            2   0.82058    1.3465          4255871               0             142
8            2   0.82058    1.3465          5266652               0             176
9            2   0.82058    1.3465          5273611               0               0

For example, for MatchId = 842079: Home_Goal = [] and Away_Goal = [87, 115] (a goal time of 0 means no goal).

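import numpy as np

x1 = np.array([1, 0, 0])  # target for a slot with no goal
x2 = np.array([0, 1, 0])  # target for a slot with a home goal
x3 = np.array([0, 0, 1])  # target for a slot with an away goal
m = 1                     # arbitrary constant used to optimise the SSE
k = 196
total_timeslot = 196
Home_Goal = []  # no goal
Away_Goal = []  # no goal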

def sum_squared_diff(x1, x2, x3, y):
    ssd = []
    for k in range(total_timeslot):  # k will take multiple values
        if k in Home_Goal:
            ssd.append(sum((x2 - y) ** 2))
        elif k in Away_Goal:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return ssd

def my_function(row):
    xG_Team1 = row.xG_Team1
    xG_Team2 = row.xG_Team2
    return np.array([1-(xG_Team1*m + xG_Team2*m)/k, xG_Team1*m/k, xG_Team2*m/k])

results = df.apply(lambda row: sum_squared_diff(x1, x2, x3, my_function(row)), axis=1)

results
sum(results.sum())

For the three matches above, the desired outcome is shown below. Computing each match's SSE individually with sum(sum_squared_diff(x1, x2, x3, y)) gives:

MatchId = 842079: 3.984053038520635
MatchId = 842080: 7.882189570700502
MatchId = 842634: 5.929085973050213

Given the size of the original data, what I really need is the total SSE. For the sample data above, adding up the values gives a total SSE of 17.79532858227135. Once I have this, I will try to optimise the SSE by updating the arbitrary constant m.
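Because each of the k slots contributes one of only three constants, the 196-step loop can be replaced by counting slots. And since y is linear in m, the total SSE is a quadratic in m with a closed-form minimiser. The sketch below assumes the sample data with the 0 placeholders already removed; `slot_counts`, `match_sse`, `total_sse` and `best_m` are hypothetical helper names, and the least-squares formula for m is derived here from the formula in my_function rather than taken from the question:

```python
import numpy as np

K = 196                  # total_timeslot / k in the question
X1, X2, X3 = np.eye(3)   # targets: no goal, home goal, away goal

# Per-match inputs from the sample (assumption: 0 placeholders already removed).
MATCHES = [
    # (MatchId, xG_Team1, xG_Team2, home goal slots, away goal slots)
    (842079, 1.79907, 1.19893, [], [87, 115]),
    (842080, 1.70662, 1.19950, [121, 159, 175], [169]),
    (842634, 0.82058, 1.34650, [], [70, 142, 176]),
]

def slot_counts(home, away, k=K):
    """Count home-goal, away-goal and no-goal slots in range(k).

    A slot listed as both home and away counts as home, mirroring the if/elif.
    """
    home_slots = {t for t in home if 0 <= t < k}
    away_slots = {t for t in away if 0 <= t < k} - home_slots
    return len(home_slots), len(away_slots), k - len(home_slots) - len(away_slots)

def match_sse(m, a, b, home, away, k=K):
    """Same value as sum(sum_squared_diff(...)) for one match, without the loop."""
    y = np.array([1 - m * (a + b) / k, m * a / k, m * b / k])
    n_home, n_away, n_none = slot_counts(home, away, k)
    return (n_none * np.sum((X1 - y) ** 2)
            + n_home * np.sum((X2 - y) ** 2)
            + n_away * np.sum((X3 - y) ** 2))

def total_sse(m, matches=MATCHES, k=K):
    return sum(match_sse(m, a, b, h, aw, k) for _, a, b, h, aw in matches)

def best_m(matches=MATCHES, k=K):
    """Closed-form minimiser of total_sse over m.

    y(m) = X1 + m*d with d = [-(a+b), a, b] / k, so each slot contributes
    ||(x - X1) - m*d||^2 and the total is quadratic in m, minimised at
    m* = sum over slots <x - X1, d> / sum over slots <d, d>.
    """
    num = den = 0.0
    for _, a, b, home, away in matches:
        d = np.array([-(a + b), a, b]) / k
        n_home, n_away, _ = slot_counts(home, away, k)
        # No-goal slots have x = X1, so they add nothing to the numerator.
        num += n_home * np.dot(X2 - X1, d) + n_away * np.dot(X3 - X1, d)
        den += k * np.dot(d, d)   # every slot of the match shares the same d
    return num / den

print(round(total_sse(1.0), 4))   # 17.7953
m_star = best_m()
print(f"m* = {m_star:.4f}, total SSE = {total_sse(m_star):.6f}")
```

With the question's m = 1 this reproduces the expected total to about five decimal places. If the model for y later stops being linear in m, a numerical optimiser such as scipy.optimize.minimize_scalar could stand in for best_m.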

Here are the lists I hoped the function would iterate over:

Home_scored = s.groupby('MatchId')['Home_Goal_Time'].apply(list)
Away_scored = s.groupby('MatchId')['Away_Goal_Time'].apply(list)
type(Home_scored)
pandas.core.series.Series

Then I convert them to lists:

Home_Goal = Home_scored.tolist()
Away_Goal = Away_scored.tolist()
type(Home_Goal)
list

Home_Goal
Out[303]: [[0, 0], [121, 159, 0, 175], [0, 0, 0, 0]]

Away_Goal
Out[304]: [[87, 115], [0, 0, 169, 0], [70, 142, 176, 0]]

But the function still behaves as if Home_Goal and Away_Goal were empty lists.
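A likely explanation: Home_Goal and Away_Goal are now lists of lists, and `k in Home_Goal` compares k with each inner list, never with the goal times themselves. Every slot therefore falls through to the x1 branch, which gives the same result as empty lists. A quick check using the Home_Goal value printed above:

```python
# Home_Goal as built in the question: one inner list per MatchId.
Home_Goal = [[0, 0], [121, 159, 0, 175], [0, 0, 0, 0]]

print(121 in Home_Goal)      # False: `in` compares 121 with each inner *list*
print(121 in Home_Goal[1])   # True: searching one match's own list works
print([0, 0] in Home_Goal)   # True: only a whole inner list can match
```

So the function needs to see one match's own inner list at a time, not the outer list of all matches.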

  • Sample output please? (commented Jun 18, 2018 at 13:16)

1 Answer


If you only want to consider one MatchId at a time, you should .groupby('MatchId') first:

df.groupby('MatchId').apply(...)
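One way the apply call could be filled in, as a sketch assuming the sample columns and treating a goal time of 0 as "no goal". `match_sse` is a hypothetical helper that builds one y per match and checks each slot only against that match's goals, keeping the question's slot loop and if/elif order:

```python
import numpy as np
import pandas as pd

m, k = 1, 196                    # as in the question
x1, x2, x3 = np.eye(3)           # no goal / home goal / away goal

# The ten sample rows, reduced to the columns the calculation needs.
df = pd.DataFrame({
    "MatchId": [842079] * 2 + [842080] * 4 + [842634] * 4,
    "xG_Team1": [1.79907] * 2 + [1.70662] * 4 + [0.82058] * 4,
    "xG_Team2": [1.19893] * 2 + [1.1995] * 4 + [1.3465] * 4,
    "Home_Goal_Time": [0, 0, 121, 159, 0, 175, 0, 0, 0, 0],
    "Away_Goal_Time": [87, 115, 0, 0, 169, 0, 70, 142, 176, 0],
})

def match_sse(g):
    """SSE for one MatchId group, using only that match's goal times."""
    # xG is identical on every row of a match, so take the first row.
    a, b = g["xG_Team1"].iloc[0], g["xG_Team2"].iloc[0]
    y = np.array([1 - (a * m + b * m) / k, a * m / k, b * m / k])
    # Assumption: 0 is a "no goal" placeholder, not a real time slot.
    home = set(g.loc[g["Home_Goal_Time"] != 0, "Home_Goal_Time"])
    away = set(g.loc[g["Away_Goal_Time"] != 0, "Away_Goal_Time"])
    ssd = 0.0
    for slot in range(k):
        if slot in home:
            ssd += np.sum((x2 - y) ** 2)
        elif slot in away:
            ssd += np.sum((x3 - y) ** 2)
        else:
            ssd += np.sum((x1 - y) ** 2)
    return ssd

cols = ["xG_Team1", "xG_Team2", "Home_Goal_Time", "Away_Goal_Time"]
per_match = df.groupby("MatchId")[cols].apply(match_sse)
print(per_match)                    # one SSE per MatchId
print(round(per_match.sum(), 4))    # 17.7953
```

The per-match values come out at about 3.98405, 7.88219 and 5.92909, in line with the expected figures in the question. Selecting the needed columns before .apply keeps the grouping column out of each group, which avoids a deprecation warning in recent pandas versions.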

Comment: Thank you for taking the time to look at the question. Could you please take a look at the edited post?
