Xorbits Inference (Xinference)
This page demonstrates how to use Xinference with LangChain.
Xinference is a powerful and versatile library designed to serve LLMs, speech recognition models, and multimodal models, even on your laptop. With Xorbits Inference, you can effortlessly deploy and serve your own or state-of-the-art built-in models using just a single command.
Installation and Setup
Xinference can be installed via pip from PyPI:
pip install "xinference[all]"
LLM
Xinference supports various models compatible with GGML, including chatglm, baichuan, whisper, vicuna, and orca. To view the builtin models, run the command:
xinference list --all
Wrapper for Xinference
You can start a local instance of Xinference by running:
xinference
You can also deploy Xinference in a distributed cluster. To do so, first start an Xinference supervisor on the server where you want to run it:
xinference-supervisor -H "${supervisor_host}"
Then, start the Xinference workers on each of the other servers where you want to run them:
xinference-worker -e "http://${supervisor_host}:9997"
Once Xinference is running, an endpoint will be accessible for model management via the CLI or the Xinference client.
For local deployment, the endpoint will be http://localhost:9997.
For cluster deployment, the endpoint will be http://${supervisor_host}:9997.
Then, you need to launch a model. You can specify the model name and other attributes, including model_size_in_billions and quantization. You can use the command line interface (CLI) to do this. For example:
xinference launch -n orca -s 3 -q q4_0
A model uid will be returned.
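Alternatively, you can launch a model programmatically. The following is a minimal sketch using the Xinference Python client against a local endpoint; the parameters mirror the CLI flags above, but check your installed Xinference version for the exact client API:

from xinference.client import Client

client = Client("http://localhost:9997")

# launch the built-in orca model (3 billion parameters, q4_0 quantization)
model_uid = client.launch_model(
    model_name="orca",
    model_size_in_billions=3,
    quantization="q4_0",
)

print(model_uid)  # pass this UID to the LangChain wrapper below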
Example usage:
from langchain.llms import Xinference

llm = Xinference(
    server_url="http://0.0.0.0:9997",
    model_uid={model_uid},  # replace {model_uid} with the model UID returned from launching the model
)

llm(
    prompt="Q: where can we visit in the capital of France? A:",
    generate_config={"max_tokens": 1024, "stream": True},
)
API Reference: Xinference from langchain.llms
Usage
For more information and detailed examples, refer to the example notebook for Xinference.
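The following is a minimal sketch of using the wrapper in an LLMChain with a PromptTemplate; the prompt is illustrative, and {model_uid} must be replaced with the UID of a launched model:

from langchain import LLMChain, PromptTemplate
from langchain.llms import Xinference

llm = Xinference(
    server_url="http://0.0.0.0:9997",
    model_uid={model_uid},  # replace {model_uid} with an actual model UID
)

# build a simple prompt template and chain it with the Xinference LLM
template = "Where can we visit in the capital of {country}?"
prompt = PromptTemplate(template=template, input_variables=["country"])
llm_chain = LLMChain(prompt=prompt, llm=llm)

answer = llm_chain.run(country="France")
print(answer)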
Embeddings
Xinference also supports embedding queries and documents. See the example notebook for Xinference embeddings for a more detailed demo.
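A minimal sketch of the embeddings wrapper, assuming an embedding-capable model has already been launched and {model_uid} is replaced with its UID:

from langchain.embeddings import XinferenceEmbeddings

xinference = XinferenceEmbeddings(
    server_url="http://0.0.0.0:9997",
    model_uid={model_uid},  # replace {model_uid} with the UID of a launched embedding model
)

query_result = xinference.embed_query("This is a test query")
doc_results = xinference.embed_documents(["This is a test document.", "Another document."])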