TornadoVM 2.0 Brings Automatic GPU Acceleration and LLM Support to Java
Briefly

TornadoVM 2.0 provides a heterogeneous hardware runtime enabling Java bytecode to be JIT-compiled at runtime and offloaded to multi-core CPUs, GPUs, and FPGAs. It integrates with existing JVMs and handles memory management and kernel execution between Java and accelerators. Compilation targets include OpenCL C, NVIDIA CUDA PTX, and SPIR-V, selectable per system. Workloads with parallelizable loop iterations, such as matrix operations, ML/DL, physics simulations, Black-Scholes, computer vision, and NLP, are good candidates. Two programming models are available: a Loop Parallel API using Java annotations (@Parallel, @Reduce) and a Kernel API exposing GPU-style constructs such as thread IDs and local memory.
"The TornadoVM project recently reached version 2.0, a major milestone for the open-source project that aims to provide a heterogeneous hardware runtime for Java. This release is likely to be of particular interest to teams developing LLM solutions on the JVM. The project automatically accelerates Java programs on multi-core CPUs, GPUs, and FPGAs. It does not replace existing JVMs, but instead adds the capability of offloading Java code to the backends,"
"handling memory management between Java and hardware accelerators, and running the compute-kernels. This capability provides a key component of modern cloud and ML workloads. InfoQ has previously covered the project in 2020 and 2022. TornadoVM compiles Java bytecode at runtime (by acting as a JIT compiler) to one of three backends: OpenCL C, NVIDIA CUDA PTX, and SPIR-V binary. Developers can choose which backends to install and run depending on their specific systems."
"Note that not every sort of Java computation is amenable to being offloaded to TornadoVM. For example, workloads with for-loops that do not have dependencies between iterations are very good candidates, as these allow computation in parallel. In particular, matrix-based applications such as machine learning and deep learning are good candidates. Other good examples of this pattern are physics simulations (e.g., N-body particle computation), financial applications such as Black-Scholes, and a range of applications in computer vision, computational photography, natural language processing, and signal processing."
Read at InfoQ