How to Integrate Python into a Scala Stack to Build Realtime Predictive Models
Jerry Chou
Lead Research Engineer
jerry@fliptop.com
Background
• Product pivoted
    • Data search => data analysis
• Build on top of existing infrastructure (hosted on AWS & Azure)
• Need tools for scientific computation
    • Mahout (Java)
    • Weka (Java)
    • Scikit-learn (Python)
Agenda
• Requirements and high level concepts
• Tools for calling Python from Scala
• Decision making
High Level Concept - Before
[Diagram: existing business logic (in both Scala & Java) connected to modeling logic (in Python) running on Node 1, Node 2, …, Node N]
Requirements
• APIs to exploit Python’s modeling power
    • Train, predict, model info query, etc.
• Scalability
    • On-demand Python serving nodes
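As a rough illustration of the API requirement above, the Python side could expose train, predict, and model-info operations behind one small class. This is only a sketch: the class name `ModelService`, its method names, and the toy threshold "model" are assumptions made for illustration, not Fliptop's actual API. A real service would wrap a proper estimator, for example one from scikit-learn.

```python
from statistics import mean


class ModelService:
    """Hypothetical facade over the operations named in the requirements."""

    def __init__(self):
        self._models = {}  # model name -> learned parameters

    def train(self, name, scores, labels):
        """Fit a toy threshold halfway between the mean positive and negative score."""
        pos = [s for s, y in zip(scores, labels) if y == 1]
        neg = [s for s, y in zip(scores, labels) if y == 0]
        self._models[name] = {
            "threshold": (mean(pos) + mean(neg)) / 2,
            "n_samples": len(scores),
        }

    def predict(self, name, score):
        """Return 1 if the score is at or above the learned threshold, else 0."""
        return int(score >= self._models[name]["threshold"])

    def info(self, name):
        """Return a copy of the stored metadata for a trained model."""
        return dict(self._models[name])


service = ModelService()
service.train("demo", scores=[0.1, 0.2, 0.8, 0.9], labels=[0, 0, 1, 1])
print(service.predict("demo", 0.7))
print(service.info("demo"))
```

Keeping all three operations behind one object means the same interface can later be exposed through whichever integration tool is chosen.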
Tools for Scala-Python Integration
• Reimplementation of Python
    • Jython (JPython)
• Communication through JNI
    • Jepp
• Communication through IPC
    • Thrift
• Communication through REST API calls
    • Bottle
Jython (JPython)
• Reimplementation of Python in Java
• Compiles to Java bytecode, either on demand or statically
• Can import and use any Java class
Jython
[Diagram: Scala code and Python code (run by Jython) side by side inside a single JVM]
Jython
• Lacks support for many scientific computing extensions
    • NumPy, SciPy, etc.
• JyNI to the rescue?
    • Not yet ready, even for NumPy
Terrible. Redo everything.
Communication through JNI
• Jepp (Java Embedded Python)
    • Embeds CPython in Java
    • Runs Python code in CPython
    • Leverages both JNI and the Python/C API for integration
Jepp
[Diagram: Scala code in the JVM reaches Python code in the Python interpreter through JNI and Jepp]
TestJepp.scala

    import jep.Jep

    object TestJepp extends App {
      // Start a CPython interpreter embedded in the JVM process
      val jep = new Jep()
      // Run the script so that python_add is defined in the interpreter
      jep.runScript("python_util.py")
      // invoke takes Object arguments, so box the Ints as AnyRef
      val a = (2).asInstanceOf[AnyRef]
      val b = (3).asInstanceOf[AnyRef]
      val sumByPython = jep.invoke("python_add", a, b)
      jep.close()
    }

python_util.py

    def python_add(a, b):
        return a + b
Communication through IPC
• Thrift
    • Developed & open sourced by Facebook
    • IDL-based (Interface Definition Language)
    • Generates server/client code in specified languages
    • Takes care of protocol and transport layer details
    • Comes with generators for Java, Python, C++, etc.
        • No Scala generator
        • Scrooge to the rescue!
Thrift – IDL

TestThrift.thrift

    namespace java python_service_test
    namespace py python_service_test

    service PythonAddService
    {
        i32 pythonAdd (1:i32 a, 2:i32 b),
    }

$ thrift --gen java --gen py TestThrift.thrift
Thrift – Python Server

PythonAddServer.py

    # Assumes the generated gen-py directory is on sys.path
    from thrift.protocol import TBinaryProtocol
    from thrift.server import TServer
    from thrift.transport import TSocket, TTransport

    from python_service_test import PythonAddService

    class ExampleHandler(PythonAddService.Iface):
        def pythonAdd(self, a, b):
            return a + b

    handler = ExampleHandler()
    # The Processor class is generated inside the PythonAddService module
    processor = PythonAddService.Processor(handler)
    # Pass the port by keyword; the first positional argument is not the port in every version
    transport = TSocket.TServerSocket(port=9090)
    tfactory = TTransport.TBufferedTransportFactory()
    pfactory = TBinaryProtocol.TBinaryProtocolFactory()
    server = TServer.TThreadedServer(processor, transport, tfactory, pfactory)
    server.serve()

PythonAddService.py (generated)

    class Iface:
        def pythonAdd(self, a, b):
            pass
Thrift – Scala Client

PythonAddClient.scala

    import org.apache.thrift.protocol.{TBinaryProtocol, TProtocol}
    import org.apache.thrift.transport.{TSocket, TTransport}
    import python_service_test.PythonAddService

    object PythonAddClient extends App {
      val transport: TTransport = new TSocket("localhost", 9090)
      val protocol: TProtocol = new TBinaryProtocol(transport)
      val client = new PythonAddService.Client(protocol)
      transport.open()
      // The method name must match the IDL declaration (pythonAdd)
      val sumByPython = client.pythonAdd(3, 5)
      println("3 + 5 = " + sumByPython)
      transport.close()
    }
Thrift
[Diagram: Scala code in the JVM talks through Thrift to multiple Python interpreters, each running Python code behind its own Thrift endpoint]
• Auto balancing, built-in encryption
Oh, not bad!
REST API Architecture
[Diagram: Scala code in the JVM calls multiple Bottle servers, each running Python code]
• Auto balancer?
• Encoding?
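The REST option could look roughly like the following on the Python side. This sketch swaps Bottle for the standard library's `wsgiref` so that it runs with no third-party dependency (Bottle applications are themselves WSGI applications). The `/add` route, the JSON field names `a`, `b`, and `sum`, and port 8080 are assumptions chosen to mirror the `python_add` example, not details from the slides.

```python
import io
import json
from wsgiref.simple_server import make_server


def python_add(a, b):
    return a + b


def app(environ, start_response):
    """WSGI app exposing python_add as POST /add with a JSON body."""
    if environ["REQUEST_METHOD"] != "POST" or environ.get("PATH_INFO") != "/add":
        body = json.dumps({"error": "not found"}).encode("utf-8")
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [body]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length))
    body = json.dumps({"sum": python_add(payload["a"], payload["b"])}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]


def serve(port=8080):
    """Serve the app over HTTP; blocks until interrupted (not called here)."""
    with make_server("localhost", port, app) as server:
        server.serve_forever()


def call(method, path, data=b""):
    """Invoke the app in-process, the way a WSGI server would, without a socket."""
    seen = {}
    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(data)),
        "wsgi.input": io.BytesIO(data),
    }
    chunks = app(environ, lambda status, headers: seen.update(status=status))
    return seen["status"], json.loads(b"".join(chunks))


print(call("POST", "/add", b'{"a": 3, "b": 5}'))  # ('200 OK', {'sum': 8})
```

Once `serve()` is running, any HTTP client on the Scala side can POST JSON to the endpoint. Encoding is handled by hand with JSON here, which is the "Encoding?" question the slide raises.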
Thrift vs. REST

                      Thrift   REST   Does it matter?
Load Balancer           ✔             No (AWS & Azure)
Encode / Decode         ✔             No (We’re already doing it)
Low Learning Curve               ✔    Maybe
No Dependency                    ✔    Yes
Fliptop’s Architecture
[Diagram: Scala code in the JVM sends requests through a load balancer to multiple Bottle servers, each running Python code]
• 5 Python servers
• ~4,500 requests/sec
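To make the load balancer's role concrete, here is a minimal sketch of round-robin dispatch across five backends. The slides do not say which balancing policy is used (the balancer is provided by the hosting platform), so round robin, the backend names, and the port are all assumptions for illustration only.

```python
from collections import Counter
from itertools import cycle

# Hypothetical backend addresses; the slides only state that there are 5 Python servers.
BACKENDS = [f"python-server-{i}:8080" for i in range(1, 6)]


def round_robin(backends):
    """Return an iterator that yields backends in a fixed rotation, one per request."""
    return cycle(backends)


picker = round_robin(BACKENDS)
# Assign 4,500 simulated requests and count how many each backend receives.
assignments = Counter(next(picker) for _ in range(4500))
for backend, count in sorted(assignments.items()):
    print(backend, count)
```

Under this assumed policy the requests are spread evenly, so capacity grows by adding more Python servers behind the balancer.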
Summary
• Jython
    • (✓) Tight integration with Scala/Java
    • (✗) Lacks support for C extensions (JyNI might help in the future)
• Jepp
    • (✓) Access to high-quality Python extensions at CPython speed
    • (✗) Two runtime environments
• Thrift, REST
    • (✓) Language-independent development
    • (✗) Bigger communication overhead
Thank You
Other tools
• JyNI (Jython Native Interface)
    • A compatibility layer that enables Jython to use native CPython extensions like NumPy or SciPy
    • Binary compatible with existing builds
• Cython
    • A compiler, written largely in Python, that translates Python code (and the Cython superset of Python) to C
• JNA (Java Native Access)
    • JNI-based wrapper providing Java programs access to native shared libraries
• JPE (Java-Python Extension)
    • JNI-based wrapper integrating Java and standard Python
    • Last updated: 2013-03-22

[PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models
