Torchzero - modular optimization library

I made a modular optimization library for PyTorch: GitHub - inikishev/torchzero.

The basic idea is that everything is a module, and different modules can be combined. A basic example, Cautious Adam with gradient clipping:

import torchzero as tz

opt = tz.Modular(
    model.parameters(),
    tz.m.ClipValue(4),               # clip gradient values
    tz.m.Adam(),                     # Adam update rule
    tz.m.Cautious(),                 # cautious update masking
    tz.m.LR(1e-3),                   # learning rate
    tz.m.WarmupNormClip(steps=100),  # clip update norm during a 100-step warmup
)
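A tz.Modular optimizer should slot into a normal training loop; here is a minimal sketch, assuming the usual PyTorch zero_grad/backward/step interface (model, dataloader and loss_fn are placeholders):

# minimal training-loop sketch; assumes the standard PyTorch optimizer interface
for inputs, targets in dataloader:
    opt.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    opt.step()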

But the modules can do a lot more, such as gradient approximation and line searches. For example, here is BFGS with a finite-difference gradient approximation and a backtracking line search:

opt = tz.Modular(
    model.parameters(),
    tz.m.FDM(),          # finite-difference gradient approximation
    tz.m.BFGS(),         # BFGS quasi-Newton update
    tz.m.Backtracking(), # backtracking line search
)
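Since FDM estimates the gradient from extra function evaluations and Backtracking re-evaluates the loss along the search direction, a setup like this is presumably driven with a closure, similar to torch.optim.LBFGS. A rough sketch only; the exact closure convention torchzero expects may differ, so check its docs:

# hedged sketch of a closure-driven step (loss_fn, inputs, targets are placeholders);
# FDM estimates the gradient itself, so this closure only returns the loss; whether
# backward() must also be called depends on the library's closure convention
def closure():
    return loss_fn(model(inputs), targets)

opt.step(closure)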

There are modules for sharpness-aware minimization (SAM, ESAM and MSAM) which can now be used with any other optimizer, and various preconditioning modules (Shampoo, SophiaG, AdaHessian, full-matrix Adagrad, Newton, a huge number of quasi-Newton methods, etc.) that can also be applied to any other algorithm. LoRA-like methods for low-memory training with any other optimizer are also possible; this is still work in progress.
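As a sketch of what that composition could look like (the module name and ordering here are assumptions based on the description above; see the module list linked below for the exact names and arguments), sharpness-aware minimization on top of Adam would be something like:

# hedged sketch: a SAM module composed with Adam; check the module list
# for the exact name, arguments and recommended ordering
opt = tz.Modular(
    model.parameters(),
    tz.m.SAM(),
    tz.m.Adam(),
    tz.m.LR(1e-3),
)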

Algorithms can be combined in other ways too, like a sum tz.m.Sum(tz.m.Adam(), tz.m.SOAP()), grafting tz.m.GraftModules(tz.m.Shampoo(), tz.m.RMSprop()), or simply feeding the output of one optimizer into another.
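In code, a sketch built from the module calls quoted above (the surrounding LR module follows the first example):

# sum of the Adam and SOAP updates
opt = tz.Modular(
    model.parameters(),
    tz.m.Sum(tz.m.Adam(), tz.m.SOAP()),
    tz.m.LR(1e-3),
)

# grafting: combine Shampoo and RMSprop via the GraftModules call above
opt = tz.Modular(
    model.parameters(),
    tz.m.GraftModules(tz.m.Shampoo(), tz.m.RMSprop()),
    tz.m.LR(1e-3),
)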

Besides the modularity, many of the modules are algorithms that didn't seem to have PyTorch implementations available, especially quasi-Newton methods.

A list of all modules is available at https://torchzero.readthedocs.io/en/latest/autoapi/torchzero/modules/index.html.

I will also add nonlinear least-squares methods like Gauss-Newton soon; they can be integrated very easily and will automatically support my trust-region modules. Constrained optimization could also be added easily, but I'm not motivated to work on it for now because I don't have any uses for it. Let me know if there is anything else that is missing.
