Xorbits Inference (Xinference)
This page demonstrates how to use Xinference with LangChain.
Xinference is a powerful and versatile library designed to serve LLMs, speech recognition models, and multimodal models, even on your laptop. With Xorbits Inference, you can effortlessly deploy and serve your own or state-of-the-art built-in models using just a single command.
Installation and Setup
Xinference can be installed via pip from PyPI:
pip install "xinference[all]"
LLM
Xinference supports various models compatible with GGML, including chatglm, baichuan, whisper, vicuna, and orca. To view the builtin models, run the command:
xinference list --all
Wrapper for Xinference
You can start a local instance of Xinference by running:
xinference
You can also deploy Xinference in a distributed cluster. To do so, first start an Xinference supervisor on the server where you want to run it:
xinference-supervisor -H "${supervisor_host}"
Then, start the Xinference workers on each of the other servers where you want to run them:
xinference-worker -e "http://${supervisor_host}:9997"
Once Xinference is running, an endpoint will be accessible for model management via the CLI or the Xinference client.
For local deployment, the endpoint will be http://localhost:9997.
For cluster deployment, the endpoint will be http://${supervisor_host}:9997.
Then, you need to launch a model. You can specify the model name and other attributes, including model_size_in_billions and quantization. You can use the command line interface (CLI) to do this. For example:
xinference launch -n orca -s 3 -q q4_0
A model uid will be returned.
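Alternatively, you can launch a model programmatically. The following is a minimal sketch using the Xinference Python client against a local endpoint; the parameters mirror the CLI flags above, but check your installed Xinference version for the exact client API:

from xinference.client import Client

client = Client("http://localhost:9997")

# launch the built-in orca model (3 billion parameters, q4_0 quantization)
model_uid = client.launch_model(
    model_name="orca",
    model_size_in_billions=3,
    quantization="q4_0",
)

print(model_uid)  # pass this UID to the LangChain wrapper below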
Example usage:
from langchain.llms import Xinference

llm = Xinference(
    server_url="http://0.0.0.0:9997",
    model_uid={model_uid},  # replace {model_uid} with the model UID returned from launching the model
)

llm(
    prompt="Q: where can we visit in the capital of France? A:",
    generate_config={"max_tokens": 1024, "stream": True},
)
API Reference: Xinference from langchain.llms
Usage
For more information and detailed examples, refer to the example notebook for Xinference.
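The following is a minimal sketch of using the wrapper in an LLMChain with a PromptTemplate; the prompt is illustrative, and {model_uid} must be replaced with the UID of a launched model:

from langchain import LLMChain, PromptTemplate
from langchain.llms import Xinference

llm = Xinference(
    server_url="http://0.0.0.0:9997",
    model_uid={model_uid},  # replace {model_uid} with an actual model UID
)

# build a simple prompt template and chain it with the Xinference LLM
template = "Where can we visit in the capital of {country}?"
prompt = PromptTemplate(template=template, input_variables=["country"])
llm_chain = LLMChain(prompt=prompt, llm=llm)

answer = llm_chain.run(country="France")
print(answer)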
Embeddings
Xinference also supports embedding queries and documents. See the example notebook for Xinference embeddings for a more detailed demo.
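A minimal sketch of the embeddings wrapper, assuming an embedding-capable model has already been launched and {model_uid} is replaced with its UID:

from langchain.embeddings import XinferenceEmbeddings

xinference = XinferenceEmbeddings(
    server_url="http://0.0.0.0:9997",
    model_uid={model_uid},  # replace {model_uid} with the UID of a launched embedding model
)

query_result = xinference.embed_query("This is a test query")
doc_results = xinference.embed_documents(["This is a test document.", "Another document."])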