
Conversation

@sachanub (Collaborator) commented on Dec 20, 2023

Description


The objective of this PR is to include the GPT Fast model, with weights corresponding to Llama 7B with int4 quantization, in the torch.compile nightly benchmark workflow.

Steps to download the Llama 7B weights on the benchmark host:

Ran a temporary workflow to download the weights with the HUGGING_FACE_HUB_TOKEN in commit 1e6088e.

Results of the successful run: https://github.com/pytorch/serve/actions/runs/7271384883/job/19811851851?pr=2857
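
For reference, a download of roughly this shape can be run on the benchmark host. This is a minimal sketch only: the model ID meta-llama/Llama-2-7b-hf, the use of huggingface-cli, and the target directory are assumptions, not the exact commands from the temporary workflow in 1e6088e.

```bash
# Minimal sketch of a weight download on the benchmark host (assumptions:
# meta-llama/Llama-2-7b-hf as the gated model ID, huggingface-cli as the tool,
# and checkpoints/ as the target directory).
export HUGGING_FACE_HUB_TOKEN=<token with access to the gated Llama repo>
huggingface-cli download meta-llama/Llama-2-7b-hf \
    --local-dir checkpoints/meta-llama/Llama-2-7b-hf
```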

Testing:

Ran the benchmark workflow in commit 31936c9.

Results of the successful run: https://github.com/pytorch/serve/actions/runs/7272224847/job/19813999840?pr=2857
Benchmark report file: report.md

Updates in the benchmark-ab.py script:

Also updated the benchmark-ab.py script to pass -l in the ab commands so that responses of variable length are not counted as errors (https://httpd.apache.org/docs/2.4/programs/ab.html).
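
For illustration, an ab invocation with -l looks like the sketch below; the concurrency, request count, payload file, and endpoint are placeholder values, not the ones generated by benchmark-ab.py.

```bash
# -l tells ab not to report an error when response lengths vary across requests,
# which is expected for generative models with variable output length.
ab -c 10 -n 100 -l -p prompt.json -T application/json \
    http://127.0.0.1:8080/predictions/gpt_fast
```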

@sachanub changed the title from "Include GPT Fast in torch.compile nightly benchmark workflow" to "[WIP] Include GPT Fast in torch.compile nightly benchmark workflow" on Dec 20, 2023
```yaml
gpt_fast:
  7b_int4:
    benchmark_engine: "ab"
    url: https://torchserve.pytorch.org/mar_files/gpt_fast_7b_int4.mar
```
Review comment (Collaborator): Please clearly specify the model in the name, e.g. Llama-2-7b-hf.

```yaml
    backend_profiling: False
    exec_env: "local"
    processors:
      - "cpu"
```
Review comment (Collaborator): cpu should be removed.

@chauhang (Contributor) commented:

@namannandan @lxning What is the work remaining for this PR?
