How to Efficiently Use JAX for Numerical Pipelines

Lomanu4 — Forum staff, Administrator
Registered: 1 Mar 2015 · Messages: 1,481 · Points: 155
Introduction


In the world of numerical computing, transitioning from NumPy to JAX can offer significant performance benefits thanks to JAX's Just-In-Time (JIT) compilation and optimizations for GPU acceleration. However, this transition isn't always smooth for all operations, particularly for those that are heavily transformation-based, such as broadcast_to and moveaxis. In this article, we will explore why these functions may perform slower in JAX compared to NumPy and discuss best practices to optimize their usage.

Understanding the Performance Discrepancy

Why JAX May Be Slower for Basic Operations


The performance of JAX can vary significantly depending on the operations being performed. While JAX is designed for efficient numerical computation and excels at large batch sizes and complex computations, some basic operations may take longer due to:

  • Overhead for GPU/TPU Operations: JAX incurs a setup overhead for transferring data to the device (GPU/TPU) and compiling functions with JIT.
  • Function Compilation: If the function has not been compiled with JIT, JAX must interpret the operation, which can be slower, especially for simpler tasks.
  • Use of Device Arrays: JAX arrays behave differently from NumPy arrays and can lead to performance bottlenecks if not handled properly.
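The compilation overhead in particular is easy to observe directly. The sketch below (a minimal illustration, not part of the benchmark that follows) times the first call to a jitted function, which triggers tracing and compilation, against later calls that reuse the cached executable:

```python
import timeit
import jax
import jax.numpy as jnp

@jax.jit
def scale(x):
    # Trivial element-wise op; its runtime is dominated by dispatch overhead.
    return x * 2.0

x = jnp.ones((4, 4))

# First call: tracing + compilation happen here.
t_first = timeit.timeit(lambda: scale(x).block_until_ready(), number=1)

# Subsequent calls: the compiled executable is fetched from the cache.
t_cached = timeit.timeit(lambda: scale(x).block_until_ready(), number=100) / 100

print(f"first call:  {t_first:.6f} s")
print(f"cached call: {t_cached:.6f} s")
```

On typical hardware the first call is orders of magnitude slower than the cached ones, which is why warm-up calls are essential before benchmarking jitted code.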
Step-by-Step Benchmarking of JAX vs NumPy


To better understand the differences, let's perform a benchmark comparing the use of moveaxis and broadcast_to in both NumPy and JAX for a large batch size.

Preparing the Benchmark Code


Here’s a minimal code example that benchmarks moveaxis combined with broadcast_to, as well as broadcast_to alone in both libraries:

import timeit
import jax
import jax.numpy as jnp
import numpy as np
from jax import jit

# Base transformation matrix
M_np = np.array([[1, 0, 0, 0.5],
                 [0, 1, 0, 0],
                 [0, 0, 1, 0],
                 [0, 0, 0, 1]])

M_jax = jnp.array(M_np)

# Batch size
n = 1_000_000

print("### Benchmark: moveaxis + broadcast_to ###")

# NumPy
t_numpy = timeit.timeit(
    lambda: np.moveaxis(np.broadcast_to(M_np[:, :, None], (4, 4, n)), 2, 0),
    number=10
)
print(f"NumPy: moveaxis + broadcast_to → {t_numpy:.6f} s")

# JAX (eager): block_until_ready() forces the asynchronous computation to finish
t_jax = timeit.timeit(
    lambda: jnp.moveaxis(jnp.broadcast_to(M_jax[:, :, None], (4, 4, n)), 2, 0).block_until_ready(),
    number=10
)
print(f"JAX: moveaxis + broadcast_to → {t_jax:.6f} s")

# JAX JIT
@jit
def broadcast_and_move_jax(M):
    return jnp.moveaxis(jnp.broadcast_to(M[:, :, None], (4, 4, n)), 2, 0)

# Warm-up: the first call triggers tracing and compilation
broadcast_and_move_jax(M_jax).block_until_ready()

t_jit = timeit.timeit(
    lambda: broadcast_and_move_jax(M_jax).block_until_ready(),
    number=10
)
print(f"JAX JIT: moveaxis + broadcast_to → {t_jit:.6f} s")

print("\n### Benchmark: broadcast_to only ###")

# NumPy
t_numpy_b = timeit.timeit(
    lambda: np.broadcast_to(M_np[:, :, None], (4, 4, n)),
    number=10
)
print(f"NumPy: broadcast_to → {t_numpy_b:.6f} s")

# JAX
t_jax_b = timeit.timeit(
    lambda: jnp.broadcast_to(M_jax[:, :, None], (4, 4, n)).block_until_ready(),
    number=10
)
print(f"JAX: broadcast_to → {t_jax_b:.6f} s")

# JAX JIT
@jit
def broadcast_only_jax(M):
    return jnp.broadcast_to(M[:, :, None], (4, 4, n))

# Warm-up
broadcast_only_jax(M_jax).block_until_ready()

t_jit_b = timeit.timeit(
    lambda: broadcast_only_jax(M_jax).block_until_ready(),
    number=10
)
print(f"JAX JIT: broadcast_to → {t_jit_b:.6f} s")

Interpreting the Results


After running the above code, you should see output with the execution times for each operation:

  • NumPy tends to perform better for simpler operations, as indicated by lower execution times.
  • JAX might show slower times when operations are not compiled with JIT, but with JIT, it can often catch up or outperform NumPy in more complex scenarios.
Tips for Optimizing JAX Operations


To maximize performance when using JAX for numerical transformations:

  1. Utilize JIT Compilation: For larger batches and more complex operations, always utilize JIT to speed up execution times.
  2. Optimize Array Structure: Ensure your data is structured to minimize overhead when JAX is transferring to device arrays.
  3. Batch Size Consideration: Adjust your batch size based on your device's capabilities. Sometimes reducing batch size can yield better overall performance.
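Tips 1 and 2 can be combined in a short sketch. This is an illustration under two assumptions not stated in the article: the data starts life as a NumPy array, and the batch dimension can be produced directly with M[None, :, :] instead of a broadcast_to followed by moveaxis. jax.device_put transfers the array to the device once, so repeated calls to the jitted function reuse the device-resident copy:

```python
import jax
import jax.numpy as jnp
import numpy as np

# Host-side data (e.g. loaded from disk as a NumPy array)
M_np = np.eye(4)

# Transfer once; every later call reuses the device-resident array
# instead of implicitly converting the NumPy array each time.
M_dev = jax.device_put(M_np)

# Batch size, closed over as a Python constant so shapes stay static under jit
n = 1000

@jax.jit
def tile_batch(M):
    # Broadcasting with the batch axis already leading avoids the
    # separate moveaxis step from the earlier benchmark.
    return jnp.broadcast_to(M[None, :, :], (n, 4, 4))

out = tile_batch(M_dev)
print(out.shape)  # (1000, 4, 4)
```

Placing the new axis at the front from the start is often cheaper than broadcasting on a trailing axis and reordering afterwards, since no transpose is needed at all.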
Frequently Asked Questions


Q1: Why is JAX slower for simple operations like moveaxis? A1: The overhead from device transfers and compilation can cause JAX to lag behind NumPy for simple operations without JIT.

Q2: How do I know if JIT is beneficial? A2: Compare execution times with and without JIT for your specific use case. JIT typically shows benefits for larger and more complex computations.

Q3: Can I completely switch to JAX for all numerical computations? A3: While JAX is powerful, it may not surpass NumPy in all cases. Analyze performance benchmarks for your specific operations to make an informed choice.

Conclusion


Migrating numerical pipelines from NumPy to JAX can yield considerable performance improvements, especially when leveraging GPU acceleration. However, unlocking the full potential of JAX requires understanding the trade-offs of each library and applying the best practices outlined above.
