
I'm writing a physics game, and I'm trying to speed up my motion calculations. Every tick of the update cycle, I call an RK4 routine which calls an ODE function four times, passing updated values (dt/2, dt, etc.) on each call to adjust the integration weighting, so those four calls have to run in sequence. However, the ODE itself calculates the forces on every particle in the simulation, so I want to run those per-particle calculations in parallel, while the four calls themselves still happen sequentially, each one seeing a fully updated dataset of positions and velocities.
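
For reference, the sequential structure I'm describing is just the standard RK4 pattern for r'' = a(r). Here's a rough sketch with illustrative names (not my actual code) to show why the four force evaluations have to run one after another:

def rk4_step(accel, r, v, dt):
    # each accel() call evaluates the forces on every particle (the part I want
    # to parallelise); the four evaluations have to run sequentially because
    # each one uses the k values produced by the previous evaluation
    k1v, k1r = accel(r), v
    k2v, k2r = accel(r + k1r * dt/2), v + k1v * dt/2
    k3v, k3r = accel(r + k2r * dt/2), v + k2v * dt/2
    k4v, k4r = accel(r + k3r * dt),   v + k3v * dt
    v_new = v + dt/6 * (k1v + 2*k2v + 2*k3v + k4v)
    r_new = r + dt/6 * (k1r + 2*k2r + 2*k3r + k4r)
    return r_new, v_new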

I'm trying to use Panda3D's Tasks with a threaded task chain to implement this, but since modifying my code to re-use the same tasks, the positions are no longer updating visually. I'm at a bit of a loss as to how to debug further: the position variable does seem to be getting modified, but for some reason the new values are no longer reaching the NodePaths in the render chain. I suspect this is more of an overall implementation issue, so I'd welcome any tips on straightforward ways to do this, or pointers to online information with more detail than the Panda3D manual (the lack of which is the main thing holding me back from trying the engine's Serials and Parallels features).
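
For what it's worth, the check that makes me think the numpy values themselves are fine is just printing the array next to what the node reports, roughly like this (paraphrased, not my exact code):

# quick sanity check at the end of update(): compare the computed position
# with the position the placeholder node actually reports
print("numpy pos:", pos[0])
print("node  pos:", self.sphereNodep.getChild(0).getPos())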

Here's the code I've added to my __init__() function for my ShowBase class:

self.spheres = np.zeros(sphereNum)
self.thetas = np.zeros(sphereNum)
self.sphere = self.loader.loadModel("../sphere")
self.sphere.setScale(0.5)
self.sphereNodep = NodePath('spheres') # placeholders can be attached to another node to spawn groups of instances
for i in range(sphereNum):
    placeholder = self.sphereNodep.attachNewNode("Sphere-Placeholder")
    placeholder.setScale(0.5)
    placeholder.setPos(pos[i,0], pos[i,1], pos[i,2])
    self.sphere.instanceTo(placeholder)
    self.thetas[i] = 0.0
placeholder = render.attachNewNode("SphereGroup")
placeholder.setPos(-50, 180, 5)
self.sphereNodep.instanceTo(placeholder)

# create a task chain capable of multithreading
self.taskMgr.setupTaskChain('physTaskChain', 
                            numThreads=8, 
                            threadPriority=1)
# initialise variables and weights for rk4 in scope, each as an array of size sphereNum, populated with 3d vectors
self.k = np.array([np.zeros((sphereNum,3)) for _ in range(4)])
self.r = pos
self.v = vel
self.t = 0.0
self.rk4step = 0
self.tasks = []
# generate a task to calculate for each sphere
for i in range(sphereNum):
    self.tasks.append(taskMgr.doMethodLater(0, self.ode, "sphere_ode", extraArgs=[i],
                                            taskChain="physTaskChain", sort=0, appendTask=True))

And here are the two member functions which run as Tasks:

def ode(self, i, task): 
    force = -dLJP(self.r,i) + coul(self.r,i) #+ grav(r,i)                                 #!!! CURRENTLY, ADDING GRAVITY HALVES THE FRAMERATE
    self.k[self.rk4step,i] = np.array([np.transpose(np.transpose(force) / mass[i])])    #!!!!!!!!!!!!!!!!! hacky - only works while masses are equal
    self.v[i] = self.v[i] + self.k[self.rk4step,i] * self.t         # find r'(t) = v(t) from a = r''(t)
    self.r[i] = self.r[i] + self.v[i] * self.t                      # find r(t)
    return task.done    

async def update(self, task):
    global pos, vel

    for i in range(4):
        if (taskMgr.mgr.getActiveTasks().hasTask(self.tasks[sphereNum-1])): # wait for previous round of tasks to finish
            for tsk in taskMgr.mgr.getActiveTasks():
                if (tsk.name == "sphere_ode"):
                    await tsk
        self.r = pos 
        self.rk4step = i
        if (i == 0): 
            self.t = 0
            self.v = vel
        elif (i == 1 or i == 2): 
            self.t = setdt/2. 
            self.v = vel + self.k[i-1] * setdt/2
        else: 
            self.t = setdt
            self.v = vel + self.k[3] * setdt

        for tsk in self.taskMgr.getDoLaters(): # run one task per sphere
            if (tsk.name == "sphere_ode"):
                tsk.again
        
    if (taskMgr.getTasksNamed("sphere_ode") != 0): # wait for previous round of tasks to finish
        for tsk in taskMgr.getTasksNamed("sphere_ode"):
            await tsk

    vel = self.v + setdt/6*(self.k[0] + 2*self.k[1] + 2*self.k[2] + self.k[3])
    pos = self.r

    # update the positions of the spheres:
    for i in range(sphereNum):
        self.sphereNodep.getChild(i).setFluidPos(pos[i,0], pos[i,1], pos[i,2])

    return task.cont

Any advice is appreciated!

  • In the current TaskChains approach, are the variables r, v and dt computed on the fly for each i, before being passed as extraArgs? Commented Jul 19 at 4:57
  • Note that in pure Python there are only two ways to get parallelism across cores: multithreading and multiprocessing. All libraries are wrappers around these (otherwise they could not support pure-Python operations or achieve true parallelism). Multiprocessing introduces a huge copy overhead in many cases, plus expensive syscalls, so it is almost certainly useless here. Multithreading is limited by the GIL (unless you can release the GIL, which is only possible in very recent Python versions and/or with certain modules). Commented Jul 19 at 10:19
  • On top of that, you should make sure not to spawn processes or threads inside the loop, since that will make things pretty slow. Spawning takes significant time (especially from Python code), and I expect the per-frame computation time to be fairly small already, because each frame of a game usually takes no more than a few dozen milliseconds. Commented Jul 19 at 10:21
  • Last but not least, if you call something that looks parallel but is not safe to call from multiple threads, it might seem to work but run much slower (possibly slower than sequential code). This often happens because of race conditions on shared data: when two threads read/write the same parts of the same data, cache-line bouncing typically occurs, which serializes access to the cache line and can create a chain of large latencies for each bounce. And that is the least of the problems, since a race condition often produces wrong results anyway. Commented Jul 19 at 10:27
  • Hey @Hari, the issue is that r and v are recalculated every tick by these tasks, so passing them as extraArgs doesn't work for getting the data in. I fear I might need to give the tasks access to my numpy arrays of r and v values. dt is constant for now! Commented Jul 20 at 18:59
