5

We have one program in Java and one in Python, and need to get them taking together in a ping-pong manner, each time exchanging an integer array of length 100,000, and taking ~ 0.1 - 1 second to do their work:

  1. Java does some work and fires an int array of length 100,000 over to ...
  2. Python, which does some work and fires a new array of length 100,000 back to ...
  3. Java, which does some work ... etc

Note that

  • Each program needs to wait for the other to do it's part.
  • They will run on the same Linux machine.
  • We will do Monte Carlo simulation, so speed is important.

I am more familiar with Java, and understand that a shared memory backed file approach is likely to be the fastest. This seems relevant for the Java side, but how would I get each program to wait/block for the other to complete its work and update the shared memory before the other starts reading? I've heard of something called 'semaphore', but can't figure it out.

These are my fallback ideas, but perhaps they are better?

14
  • 4
    Maybe a stupid suggestion, but if they run on the same machine, couldn't you execute the Python program via Jython (which should simplify the communication)? Commented Apr 29, 2017 at 16:02
  • 1
    Well you can just iterate from 0 to 100,000 and read an integer from a socket Commented Apr 29, 2017 at 16:21
  • 3
    I hope you see how much complexity that switch between Java and python creates. I would vote to either use jython or to drop one side and do all computations either with java or python. Commented Apr 29, 2017 at 16:34
  • 1
    Yes, interprocess communication is relatively simple but gets more complex when you're required to use more than one language Commented Apr 29, 2017 at 16:47
  • 1
    If you are doing Monte Carlo, your bottleneck should be the amount of CPU you are using. Using loopback should be more than fast enough. You can pass 400KB over loopback in a few milli-seconds. Commented Apr 29, 2017 at 18:10

2 Answers 2

2

You could try combining java and python in the same process using jep. The latest release added support for sharing memory between python and java using numpy ndarrays and java direct buffers. This would let you share the data without any copying which should give the best performance possible.

Sign up to request clarification or add additional context in comments.

Comments

1

Go for a fast intermediary data server to assist in communication between them. Redis would do the trick. You'll need two data structures there:

  1. a list (your list of 100,000 items). We'll call that my_project:list for reference.
  2. a lock. This can just be a Redis string set to "Python" or "Java."

Then have the following interaction:

  1. Both Python and Java poll the Redis lock. If it's equal to "Python", it's Python's turn. If "Java," it's Java's turn.
  2. Whichever program's turn it is goes into work mode and does whatever it needs to my_project:list, then it sets the lock to the other program's turn.
  3. Repeat indefinitely.

2 Comments

Makes sense, but worried that the polling strategy would make things much slower than blocking on sockets.
You can run a timeit for this. Polling won't cause more than a couple milliseconds lag tops for you as long as everything is on the same box. If your time horizon is 100ms - 1s anyway, this won't affect your timings in a big way, but will massively simplify your development. You would not have a good time debugging blocking on sockets. You're trading a ~.1 - 5% slowdown for much greater simplicity and development speed. It's a good trade. Use the extra time to you'll save debugging to speed up your Java and Python programs, which will be taking up the vast majority of your processing time.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.