ZFLOW AI

Software Development

Santa Clara, CA · 61 followers

Bridging Hardware and Software to Optimize AI Systems at Scale

About us

ZFLOW AI is redefining AI infrastructure through simulation and optimization. Our platform helps engineers, researchers, and architects profile AI workloads, explore hardware-software tradeoffs, and optimize performance before deployment.

Website
https://www.zflow.ai
Industry
Software Development
Company size
2-10 employees
Headquarters
Santa Clara, CA
Type
Privately Held
Founded
2024


Updates

  • Great update from the vLLM team: the new plugin system is a big step forward for flexible, upstream-safe LLM serving, and it aligns well with ZFLOW AI’s simulation-driven approach. We need a serving engine whose scheduling, KV-cache behavior, and model variants can be customized without forking, and vLLM’s plugin layer gives us exactly that. It will let us integrate custom optimization logic directly into our simulation stack and deploy the same logic in production through ZFLOW Serve, closing the loop between simulation → optimization → real deployment. Excited to explore deeper synergy here. #AIInfrastructure #LLMInference #vLLM #ZFLOWAI

    Reposted from vLLM (6,519 followers):

    Need to customize vLLM? Don’t fork it. 🔌 vLLM’s plugin system lets you inject surgical modifications without maintaining a fork or monkey-patching entire modules. Blog by Dhruvil Bhatt from AWS SageMaker 👇

    Why plugins > forks:
    • vLLM releases every 2 weeks with 100s of PRs merged
    • Forks require constant rebasing & conflict resolution
    • Monkey patches break on every vLLM upgrade

    How it works:
    • Use VLLMPatch[TargetClass] for precise, class-level mods
    • Register via the vllm.general_plugins entry point
    • Control patches with env vars (VLLM_CUSTOM_PATCHES)
    • Version-guard with the @min_vllm_version decorator

    Example: add priority scheduling to vLLM’s scheduler in ~20 lines. One Docker image serves multiple models with different patches enabled via environment variables. The plugin loads in ALL vLLM processes (main, workers, GPU/CPU) before any inference starts, ensuring consistent behavior across distributed setups.

    Read the full implementation guide with code examples: https://lnkd.in/e4U_xeFa
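
    For readers curious what the wiring looks like, below is a minimal Python sketch of a vLLM general plugin. The vllm.general_plugins entry-point group, and the fact that vLLM invokes registered plugins in every process before inference starts, come from vLLM’s plugin mechanism as described above; the package name my_vllm_patches, the register() logic, and the exact scheduler import are illustrative assumptions (VLLMPatch and @min_vllm_version are helpers from the linked blog and are not reproduced here).

        # setup.py for a hypothetical package "my-vllm-patches" (illustrative sketch)
        from setuptools import setup, find_packages

        setup(
            name="my-vllm-patches",
            version="0.1.0",
            packages=find_packages(),
            entry_points={
                # vLLM discovers general plugins through this entry-point group
                "vllm.general_plugins": [
                    "my_patches = my_vllm_patches:register",
                ],
            },
        )

        # my_vllm_patches/__init__.py (hypothetical module)
        import os

        def register():
            # Called by vLLM in the engine process and every worker before
            # inference starts, so the patch applies consistently across ranks.
            enabled = os.environ.get("VLLM_CUSTOM_PATCHES", "").split(",")
            if "priority_scheduling" in enabled:
                _patch_scheduler()

        def _patch_scheduler():
            # A targeted, class-level modification; a hand-rolled stand-in for
            # what the blog's VLLMPatch[TargetClass] helper automates.
            from vllm.core.scheduler import Scheduler  # path may differ across vLLM versions
            original_schedule = Scheduler.schedule

            def schedule_with_priority(self, *args, **kwargs):
                # ...reorder waiting requests by a priority field here...
                return original_schedule(self, *args, **kwargs)

            Scheduler.schedule = schedule_with_priority

    Enabling the patch is then just setting VLLM_CUSTOM_PATCHES=priority_scheduling on the container, which matches the post’s one-image, many-configurations deployment pattern.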

  • ZFLOW AI:

    ZFLOW AI at CASPA 2025: Unifying Optimization Across the AI Stack

    At this year’s CASPA Annual Conference (Sep 27, 2025), themed “AI Ecosystem Revolution”, our Founder and CEO, and CASPA President & Chairman, Dr. Zhibin (David) Xiao shared how ZFLOW AI is driving the next wave of innovation through a unified optimization platform that connects AI models, compilers, schedulers, and hardware systems.

    More than a simulation tool, ZFLOW AI delivers end-to-end optimization, from model analysis and compiler tuning to runtime scheduling and system-level performance prediction. This approach enables architects and developers to design, evaluate, and optimize the entire AI stack before deployment, accelerating innovation from silicon to cloud to edge.

    Dr. Xiao’s vision reflects the spirit of the AI Ecosystem Revolution: advancing collaboration between software and hardware to make AI systems more efficient, scalable, and accessible for all.

    #AI #AIInfra #CAPEX #Simulation #Optimization #HardwareSoftwareCodesign #ZFLOWAI #CASPA2025 #AIEcosystemRevolution

