Keywords:

dify, localai, huggingface, model gallery, pre-built binaries (AVX, Advanced Vector Extensions), CPU flagset compatibility, Diffusers backend (AMD), p2p, Federated mode


Tip:

Downloading models with LM Studio from inside China:

  • 1. Edit the three js files index.js, llmworker.js and worker.js under C:\Users\Admin\AppData\Local\LM-Studio\app-0.2.24\resources\app\.webpack\main, plus C:\Users\admin\AppData\Local\LM-Studio\app-0.2.31\resources\app\.webpack\renderer\main_window\index.js (after this change, the model URLs returned by search point directly at hf-mirror.com), replacing huggingface.co with hf-mirror.com. This makes model search work (see the sketch after this list).
  • 2. Search for a model and start the download; it fails, because at this point the download URL still points to huggingface.co.
  • 3. Edit downloads.json under C:\Users\Admin\.cache\lm-studio, replacing huggingface.co with hf-mirror.com.
  • 4. Reopen LM Studio and click the "Try Resume" button under "Model Downloads"; the download then proceeds.
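
A minimal sketch of steps 1 and 3, assuming a Git Bash (or similar POSIX) shell on the Windows host; the app-0.2.x version directory must match your installed LM Studio version:

APP="/c/Users/Admin/AppData/Local/LM-Studio/app-0.2.31/resources/app/.webpack"
# Step 1: point LM Studio's search and download code at the hf-mirror.com mirror
sed -i 's/huggingface\.co/hf-mirror.com/g' \
    "$APP/main/index.js" "$APP/main/llmworker.js" "$APP/main/worker.js" \
    "$APP/renderer/main_window/index.js"
# Step 3: fix up downloads that were already queued against huggingface.co
sed -i 's/huggingface\.co/hf-mirror.com/g' /c/Users/Admin/.cache/lm-studio/downloads.json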

To load a model into LocalAI, you can run it manually (for example: local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf), or configure LocalAI to pull the model from an external source (such as Huggingface) by pointing it at the URL of a YAML configuration file. LocalAI also embeds configurations for a number of popular models in its binary. Below is an example of LocalAI's pre-built model configurations; for how to configure a model from a URL, see the Model customization docs.

https://github.com/mudler/LocalAI/blob/master/examples/configurations/phi-2.yaml
## Important! Substitute with your gist's URL!
docker run -p 8080:8080 localai/localai:v2.19.4-ffmpeg-core https://gist.githubusercontent.com/xxxx/phi-2.yaml


Tip:

  • Dify currently supports several rerank models; on the "Model Provider" page, fill in the API key of the rerank model (e.g. Cohere, Jina).
  • Dify stores its metadata in PostgreSQL.
  • LocalAI's All-in-One image (localai/localai:latest-aio-cpu) ships with a pre-configured set of models and backends so that almost all of LocalAI's features work out of the box; the standard images have no models pre-configured or installed. If there is no GPU, use the CPU images (see the run example after this list).
  • The LocalAI All-in-One (AIO) images come pre-configured with the following features:
  • Text to Speech (TTS)
  • Speech to Text
  • Function calling
  • Large Language Models (LLM) for text generation
  • Image generation
  • Embedding server
  • LocalAI v2.20.0 officially deprecates the gpt4all.cpp and petals backends. The newer llama.cpp offers a superior feature set and better performance, making it the preferred choice going forward.
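
A minimal way to try the AIO CPU image mentioned above (a sketch assuming Docker is installed; the container name and host port are arbitrary):

docker run -p 8080:8080 --name local-ai -ti localai/localai:latest-aio-cpu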


1. A locally deployed mudler/LocalAI also supports rerank models: LocalAI is a self-hosted, community-driven, OpenAI-compatible API that runs locally on consumer-grade CPU hardware. It lets you run models on-premises without an internet connection or external servers. Use the /v1/models endpoint to list the available models, or the /v1/completions endpoint to generate text completions.
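
A sketch of a rerank request, assuming LocalAI's Jina-compatible /v1/rerank endpoint is available and a reranker model has already been installed; the model name used here is illustrative:

curl http://localhost:8080/v1/rerank -H "Content-Type: application/json" -d '{
  "model": "jina-reranker-v1-base-en",
  "query": "Organic skincare products for sensitive skin",
  "documents": [
    "Organic skincare for sensitive skin with aloe vera",
    "New makeup trends focus on bold colors"
  ],
  "top_n": 1
}'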


2. Deploy LocalAI with the go-skynet/helm-charts project (go-skynet helm chart repository, github.com)

helm repo add go-skynet https://go-skynet.github.io/helm-charts/
values.yaml
deployment:
  image:
    repository: registry.cn-beijing.aliyuncs.com/mizy/localai 
    tag: master-aio-cpu 
# https://localai.io/basics/container/#all-in-one-images    
persistence:
  models: 
    enabled: true
    storageClass: rook-cephfs
    accessModes: ReadWriteMany
    size: 50Gi
    globalMount: /models
  output:
    enabled: true
    storageClass: rook-cephfs
    accessModes: ReadWriteMany
    size: 5Gi
    globalMount: /tmp/generated
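
With the values above saved as values.yaml, the chart can then be installed (a sketch; the release name and namespace are illustrative):

helm repo update
helm install local-ai go-skynet/local-ai -f values.yaml -n local-ai --create-namespace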


3. Copy the model into the host's /models directory yourself and mount it into the container, because access to huggingface.co fails by default.

mkdir models

# Download luna-ai-llama2 to models/
wget https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GGUF/resolve/main/luna-ai-llama2-uncensored.Q4_0.gguf -O models/luna-ai-llama2

# Use a template from the examples, if needed
cp -rf prompt-templates/getting_started.tmpl models/luna-ai-llama2.tmpl

docker run -p 8080:8080 -v $PWD/models:/models -ti --rm quay.io/go-skynet/local-ai:latest --models-path /models --context-size 700 --threads 4

# Now the API is accessible at localhost:8080
curl http://localhost:8080/v1/models
# {"object":"list","data":[{"id":"luna-ai-llama2","object":"model"}]}

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "luna-ai-llama2",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9
   }'
# {"model":"luna-ai-llama2","choices":[{"message":{"role":"assistant","content":"I'm doing well, thanks. How about you?"}}]}

4. How URL addresses are resolved:

  • For example, in LocalAI/aio/cpu/embeddings.yaml the model value huggingface://mudler/all-MiniLM-L6-v2/ggml-model-q4_0.bin is resolved to the URL https://huggingface.co/mudler/all-MiniLM-L6-v2/resolve/main/ggml-model-q4_0.bin
name: text-embedding-ada-002
backend: bert-embeddings
parameters:
  model: huggingface://mudler/all-MiniLM-L6-v2/ggml-model-q4_0.bin

usage: |
    You can test this model with curl like this:

    curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
      "input": "Your text string goes here",
      "model": "text-embedding-ada-002"
    }'
  • The source code that performs this URL resolution is shown below (note that an address starting directly with https://hf-mirror.com matches none of the branches below and is treated as a direct download)
LocalAI/.github/check_and_update.py
# Function to parse the URI and determine download method
def parse_uri(uri):
    if uri.startswith('huggingface://'):
        repo_id = uri.split('://')[1]
        return 'huggingface', repo_id.rsplit('/', 1)[0]
    elif 'huggingface.co' in uri:
        parts = uri.split('/resolve/')
        if len(parts) > 1:
            repo_path = parts[0].split('https://huggingface.co/')[-1]
            return 'huggingface', repo_path
    return 'direct', uri
  • huggingface:// is expanded to https://huggingface.co/%s/%s/resolve/%s/%s; github:// is expanded to https://raw.githubusercontent.com/%s/%s/%s/%s
LocalAI\pkg\downloader\uri.go
const (
	HuggingFacePrefix = "huggingface://"
	GithubURI         = "github:"
	GithubURI2        = "github://"
)

func (s URI) ResolveURL() string {
	switch {
	case strings.HasPrefix(string(s), HuggingFacePrefix):
		repository := strings.Replace(string(s), HuggingFacePrefix, "", 1)
		// convert repository to a full URL.
		// e.g. TheBloke/Mixtral-8x7B-v0.1-GGUF/mixtral-8x7b-v0.1.Q2_K.gguf@main -> https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF/resolve/main/mixtral-8x7b-v0.1.Q2_K.gguf
		owner := strings.Split(repository, "/")[0]
		repo := strings.Split(repository, "/")[1]
		branch := "main"
		if strings.Contains(repo, "@") {
			branch = strings.Split(repository, "@")[1]
		}
		filepath := strings.Split(repository, "/")[2]
		if strings.Contains(filepath, "@") {
			filepath = strings.Split(filepath, "@")[0]
		}
		return fmt.Sprintf("https://huggingface.co/%s/%s/resolve/%s/%s", owner, repo, branch, filepath)
	case strings.HasPrefix(string(s), GithubURI):
		parts := strings.Split(string(s), ":")
		repoParts := strings.Split(parts[1], "@")
		branch := "main"
		if len(repoParts) > 1 {
			branch = repoParts[1]
		}
		repoPath := strings.Split(repoParts[0], "/")
		org := repoPath[0]
		project := repoPath[1]
		projectPath := strings.Join(repoPath[2:], "/")
		return fmt.Sprintf("https://raw.githubusercontent.com/%s/%s/%s/%s", org, project, branch, projectPath)
	}
	return string(s)
}

5. Problems with local deployment of LocalAI

  • Running the ./local-ai-Linux-x86_64 binary directly:
[root@k8s-master01 aiWorkSpace]# ./local-ai-Linux-x86_64 
./local-ai-Linux-x86_64: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by ./local-ai-Linux-x86_64)
./local-ai-Linux-x86_64: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./local-ai-Linux-x86_64)
  • Running with docker may also have problems because the CPU has no AVX support (see part 7 of this article):
docker run -p 8080:8080 localai/localai:v2.19.4-ffmpeg-core https://gitee.com/mi_zy/LocalAI/raw/master/examples/configurations/phi-2.yaml
CPU: no AVX / AVX2 / AVX512 found
INF Downloading "https://gitee.com/mi_zy/LocalAI/raw/master/examples/configurations/phi-2.yaml"
INF Downloading "https://hf-mirror.com/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q8_0.gguf"
--- test
curl http://localhost:8080/v1/chat/completions -H "Content-Type:  
  application/json" -d '{ "model": "phi-2", "messages": [{"role": "user",     
  "content": "How are you doing?", "temperature": 0.1}] }' 
--- error
INF [llama-cpp] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
ERR Server error error="rpc error: code = Unknown desc = unimplemented" ip=172.17.0.1 latency=26.45223802s method=POST status=500 url=/v1/chat/completions
  • Running on k8s also fails, with the error:
{"error":{"code":500,"message":"could not load model - all backends returned error: [llama-cpp]: could not load model: rpc error: code = Unavailable
  • Windows / Linux PC with a processor that supports AVX2 (typically newer PCs)
  • I’m getting a ‘SIGILL’ error, what’s wrong? (FAQ | LocalAI documentation)
  • Your CPU probably does not have support for certain instructions that are compiled by default in the pre-built binaries. If you are running in a container, try setting REBUILD=true and disable the CPU instructions that are not compatible with your CPU. For instance: CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF" make build
  • In TensorFlow 1.5.0 the pre-built binaries changed significantly: they are built against CUDA 9 and cuDNN 7 in an Ubuntu 16 container, which can cause glibc compatibility problems on Ubuntu 14. Starting with TF 1.6, the pre-built binaries use the AVX instruction set, which can break compatibility with older CPUs. (Applications with heavy vector computation gain significant performance on hardware that supports the corresponding instruction sets, but may perform poorly on hardware that does not; a quick check is sketched after this list.)
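
A quick sketch for checking both constraints above (glibc version and CPU instruction sets) on a Linux host:

# Which glibc does the host provide? The ./local-ai-Linux-x86_64 binary above requires GLIBC_2.32 / GLIBC_2.34.
ldd --version | head -n 1

# Which SIMD extensions does the CPU expose? If avx/avx2/avx512f are missing,
# the default pre-built backends fail with SIGILL / "no AVX found".
grep -m 1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(sse4_2|avx|avx2|fma|avx512f)$'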

6. The model gallery is a curated collection of model configurations for LocalAI that supports one-click installation of models directly from the LocalAI web interface. LocalAI simplifies model installation and provides ways to preload models at startup and to download and install them at runtime.

  • Models can be installed manually by copying them into the models directory;
  • The API or the web interface can be used to configure, download, and verify the model assets (an install call via the API is sketched below).

Models compatible with LocalAI must be quantized in the gguf format.
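
Installing a model that is part of a gallery can be done through the same /models/apply API; a sketch assuming the default gallery is enabled, with an illustrative gallery@model id:

LOCALAI=http://localhost:8080
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
     "id": "localai@hermes-2-pro-mistral"
   }'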

How to install a model that is not part of a gallery: specify the model configuration file URL (url), the name to install the model under (name), extra files to install (files), and configuration overrides (overrides); see the curl examples below.

LOCALAI=http://localhost:8080
curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
     "config_url": "https://raw.githubusercontent.com/mudler/LocalAI/master/embedded/models/hermes-2-pro-mistral.yaml"
   }'
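
A sketch of the same endpoint using the url/name/files/overrides fields described above; every value below is a placeholder, not a real artifact:

curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
     "url": "<URL of a model config YAML>",
     "name": "my-model",
     "files": [
        {
          "uri": "<URL of an extra file, e.g. the gguf weights>",
          "filename": "my-model.gguf",
          "sha256": "<sha256 of that file>"
        }
     ],
     "overrides": {
        "parameters": { "temperature": 0.2 }
     }
   }'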

7. Building a custom container image (-DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_AVX=OFF)

  • Diffusers: a backend that can generate images and video from text. In some cases you may want to rebuild LocalAI from source (for example to take advantage of Apple Silicon acceleration), or build a custom container image with your own backends.
  • CPU flagset compatibility. LocalAI uses different backends based on ggml and llama.cpp to run models. If your CPU doesn’t support common instruction sets, you can disable them during build:
CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_AVX=OFF -DGGML_FMA=OFF" make build
To have effect on the container image, you need to set REBUILD=true:
# instead of the plain: docker run quay.io/go-skynet/localai
docker run --rm -ti -p 8080:8080 -e DEBUG=true -e MODELS_PATH=/models -e THREADS=1 -e REBUILD=true -e CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_AVX=OFF -DGGML_FMA=OFF" -v $PWD/models:/models quay.io/go-skynet/local-ai:latest
This rebuild ran on k8s-master01 for 2 days before exiting with an error:
/usr/bin/upx backend-assets/grpc/local-store
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2020
UPX 3.96        Markus Oberhumer, Laszlo Molnar & John Reiser   Jan 23rd 2020
        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
  25669892 ->   9831668   38.30%   linux/amd64   local-store                   
Packed 1 file.
I local-ai build info:
I BUILD_TYPE: hipblas
I GO_TAGS: 
I LD_FLAGS: -s -w -X "github.com/mudler/LocalAI/internal.Version=v2.19.4" -X "github.com/mudler/LocalAI/internal.Commit=af0545834fd565ab56af0b9348550ca9c3cb5349"
I UPX: /usr/bin/upx
CGO_LDFLAGS="-O3 --rtlib=compiler-rt -unwindlib=libgcc -lhipblas -lrocblas --hip-link -L/opt/rocm/lib/llvm/lib" go build -ldflags "-s -w -X "github.com/mudler/LocalAI/internal.Version=v2.19.4" -X "github.com/mudler/LocalAI/internal.Commit=af0545834fd565ab56af0b9348550ca9c3cb5349"" -tags "" -o local-ai ./
  • Instruction set extensions are additional instructions that can boost performance by performing the same operation on multiple data objects at once. They include SSE (Streaming SIMD Extensions) and AVX (Advanced Vector Extensions).
  • The Win10 host CPU is a Xeon® Silver 4210; its instruction set extensions include Intel® SSE4.2, Intel® AVX, Intel® AVX2 and Intel® AVX-512, with 1 AVX-512 FMA unit.
  • The k8s host, a DL388 G7, has an Intel(R) Xeon(R) CPU E5606, whose instruction set extensions only include SSE (Streaming SIMD Extensions).