腾讯云-HAI域探秘——自行搭建AI对话服务对话

单独访问效果：

vscode内运行效果：

我们使用腾讯云来创建，有完整的操作流程，很方便我们搭建使用。

一、服务创建

创建效果：

步骤1：

步骤2：(这里可以下拉选择大一些的硬盘)

剩余的时间需要等待。

创建完毕效果：

二、操作面板介绍

1、chatglm_gradio:

我们可以直接通过这个网址进行对话操作。

2、jupyter_lab:

创建控制台窗口，可以在这里进行具体的代码编辑与运行。

三、基础服务示例(jupyter_lab操作)

1、进入并启动服务

代码语言：javascript

复制

cd /root/ChatGLM2-6B/
python api.py

运行起来能看到有信息提示。

2、启动后开启访问端口(8000)

进入到服务详情。

添加防火墙可通过的端口号。

添加效果：

3、Python接口访问效果

添加后即可访问：http://你的公网IP:8000/ 的这个接口，具体服务参数如下列代码：

代码语言：javascript

复制

import requests
定义测试数据，以及FastAPI服务器的地址和端口
server_url = "http://0.0.0.0"  # 请确保将地址和端口更改为您的API服务器的实际地址和端口

test_data = {

"prompt": "'电影雨人讲的是什么？'",

"history": [],

"max_length": 50,

"top_p": 0.7,

"temperature": 0.95

}
发送HTTP POST请求
response = requests.post(server_url, json=test_data)
处理响应
if response.status_code == 200:

result = response.json()

print("Response:", result["response"])

print("History:", result["history"])

print("Status:", result["status"])

print("Time:", result["time"])

else:

print("Failed to get a valid response. Status code:", response.status_code)

访问效果：

四、正式服务代码

1、修改【openai-api.py】文件

使用以下代码覆盖原有的代码：

代码语言：javascript

复制

# coding=utf-8
Implements API for ChatGLM2-6B in OpenAI's format. (https://platform.openai.com/docs/api-reference/chat)
Usage: python openai_api.py
Visit http://localhost:8000/docs for documents.
import time

import torch

import uvicorn

from pydantic import BaseModel, Field

from fastapi import FastAPI, HTTPException

from fastapi.middleware.cors import CORSMiddleware

from contextlib import asynccontextmanager

from typing import Any, Dict, List, Literal, Optional, Union

from transformers import AutoTokenizer, AutoModel

from sse_starlette.sse import ServerSentEvent, EventSourceResponse
@asynccontextmanager

async def lifespan(app: FastAPI): # collects GPU memory

yield

if torch.cuda.is_available():

torch.cuda.empty_cache()

torch.cuda.ipc_collect()
app = FastAPI(lifespan=lifespan)
app.add_middleware(

CORSMiddleware,

allow_origins=["*"],

allow_credentials=True,

allow_methods=["*"],

allow_headers=["*"],

)
class ModelCard(BaseModel):

id: str

object: str = "model"

created: int = Field(default_factory=lambda: int(time.time()))

owned_by: str = "owner"

root: Optional[str] = None

parent: Optional[str] = None

permission: Optional[list] = None
class ModelList(BaseModel):

object: str = "list"

data: List[ModelCard] = []
class ChatMessage(BaseModel):

role: Literal["user", "assistant", "system"]

content: str
class DeltaMessage(BaseModel):

role: Optional[Literal["user", "assistant", "system"]] = None

content: Optional[str] = None
class ChatCompletionRequest(BaseModel):

model: str

messages: List[ChatMessage]

temperature: Optional[float] = None

top_p: Optional[float] = None

max_length: Optional[int] = None

stream: Optional[bool] = False
class ChatCompletionResponseChoice(BaseModel):

index: int

message: ChatMessage

finish_reason: Literal["stop", "length"]
class ChatCompletionResponseStreamChoice(BaseModel):

index: int

delta: DeltaMessage

finish_reason: Optional[Literal["stop", "length"]]
class ChatCompletionResponse(BaseModel):

model: str

object: Literal["chat.completion", "chat.completion.chunk"]

choices: List[Union[ChatCompletionResponseChoice, ChatCompletionResponseStreamChoice]]

created: Optional[int] = Field(default_factory=lambda: int(time.time()))
@app.get("/v1/models", response_model=ModelList)

async def list_models():

global model_args

model_card = ModelCard(id="gpt-3.5-turbo")

return ModelList(data=[model_card])
@app.post("/v1/chat/completions", response_model=ChatCompletionResponse)

async def create_chat_completion(request: ChatCompletionRequest):

global model, tokenizer
if request.messages[-1].role != &#34;user&#34;:
    raise HTTPException(status_code=400, detail=&#34;Invalid request&#34;)
query = request.messages[-1].content

prev_messages = request.messages[:-1]
if len(prev_messages) &gt; 0 and prev_messages[0].role == &#34;system&#34;:
    query = prev_messages.pop(0).content + query

history = []
if len(prev_messages) % 2 == 0:
    for i in range(0, len(prev_messages), 2):
        if prev_messages[i].role == &#34;user&#34; and prev_messages[i+1].role == &#34;assistant&#34;:
            history.append([prev_messages[i].content, prev_messages[i+1].content])

if request.stream:
    generate = predict(query, history, request.model)
    return EventSourceResponse(generate, media_type=&#34;text/event-stream&#34;)

response, _ = model.chat(tokenizer, query, history=history)
choice_data = ChatCompletionResponseChoice(
    index=0,
    message=ChatMessage(role=&#34;assistant&#34;, content=response),
    finish_reason=&#34;stop&#34;
)

return ChatCompletionResponse(model=request.model, choices=[choice_data], object=&#34;chat.completion&#34;)

async def predict(query: str, history: List[List[str]], model_id: str):

global model, tokenizer
choice_data = ChatCompletionResponseStreamChoice(
    index=0,
    delta=DeltaMessage(role=&#34;assistant&#34;),
    finish_reason=None
)
chunk = ChatCompletionResponse(model=model_id, choices=[choice_data], object=&#34;chat.completion.chunk&#34;)
#yield &#34;{}&#34;.format(chunk.json(exclude_unset=True, ensure_ascii=False))
yield &#34;{}&#34;.format(chunk.model_dump_json(exclude_unset=True))

current_length = 0

for new_response, _ in model.stream_chat(tokenizer, query, history):
    if len(new_response) == current_length:
        continue

    new_text = new_response[current_length:]
    current_length = len(new_response)

    choice_data = ChatCompletionResponseStreamChoice(
        index=0,
        delta=DeltaMessage(content=new_text),
        finish_reason=None
    )
    chunk = ChatCompletionResponse(model=model_id, choices=[choice_data], object=&#34;chat.completion.chunk&#34;)
    #yield &#34;{}&#34;.format(chunk.json(exclude_unset=True, ensure_ascii=False))
    yield &#34;{}&#34;.format(chunk.model_dump_json(exclude_unset=True))

choice_data = ChatCompletionResponseStreamChoice(
    index=0,
    delta=DeltaMessage(),
    finish_reason=&#34;stop&#34;
)
chunk = ChatCompletionResponse(model=model_id, choices=[choice_data], object=&#34;chat.completion.chunk&#34;)
#yield &#34;{}&#34;.format(chunk.json(exclude_unset=True, ensure_ascii=False))
yield &#34;{}&#34;.format(chunk.model_dump_json(exclude_unset=True))
yield &#39;[DONE]&#39;

if name == "main":

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", revision="v1.0", trust_remote_code=True)

model = AutoModel.from_pretrained("THUDM/chatglm2-6b", revision="v1.0", trust_remote_code=True).cuda()

# 多显卡支持，使用下面两行代替上面一行，将num_gpus改为你实际的显卡数量

# from utils import load_model_on_gpus

# model = load_model_on_gpus("THUDM/chatglm2-6b", num_gpus=2)

model.eval()
uvicorn.run(app, host=&#39;0.0.0.0&#39;, port=8000, workers=1)</code></pre></div></div><h5 id="cp5sg" name="2%E3%80%81%E8%BF%90%E8%A1%8C%E3%80%90openai-api.py%E3%80%91%E6%96%87%E4%BB%B6%EF%BC%8C%E6%9C%8D%E5%8A%A1%E7%AB%AF%E5%BC%80%E5%90%AF%E6%9C%8D%E5%8A%A1"> 2、运行【<strong>openai-api.py</strong>】文件，服务端开启服务</h5><p>在控制台直接输入python openai-api.py即可运行</p><figure class=""><div class="rno-markdown-img-url" style="text-align:center"><div class="rno-markdown-img-url-inner" style="width:100%"><div style="width:100%"><img src="https://cdn.static.attains.cn/app/developer-bbs/upload/1723319216348495924.png" /></div></div></div></figure><h4 id="1l33g" name="%E4%BA%94%E3%80%81%E5%8F%AF%E8%A7%86%E5%8C%96%E9%A1%B5%E9%9D%A2%E6%90%AD%E5%BB%BA">五、可视化页面搭建</h4><h5 id="22le" name="1%E3%80%81%E5%9C%A8%E5%88%9B%E5%BB%BAcloud-Studio%E7%9A%84%E6%97%B6%E5%80%99%E9%80%89%E6%8B%A9%E3%80%90%E5%BA%94%E7%94%A8%E6%8E%A8%E8%8D%90%E3%80%91">1、在创建cloud Studio的时候选择【应用推荐】</h5><p>选择【ChatGPT Next Web】</p><figure class=""><div class="rno-markdown-img-url" style="text-align:center"><div class="rno-markdown-img-url-inner" style="width:100%"><div style="width:100%"><img src="https://cdn.static.attains.cn/app/developer-bbs/upload/1723319216814882994.png" /></div></div></div></figure><h5 id="9k07q" name="2%E3%80%81Fork%E9%A1%B9%E7%9B%AE">2、Fork项目</h5><figure class=""><div class="rno-markdown-img-url" style="text-align:center"><div class="rno-markdown-img-url-inner" style="width:100%"><div style="width:100%"><img src="https://cdn.static.attains.cn/app/developer-bbs/upload/1723319217461556997.png" /></div></div></div></figure><h5 id="b7vs0" name="3%E3%80%81%E4%BF%AE%E6%94%B9%E3%80%90.env.template%E3%80%91%E6%96%87%E4%BB%B6">3、修改【<strong>.env.template</strong>】文件</h5><figure class=""><div class="rno-markdown-img-url" style="text-align:center"><div class="rno-markdown-img-url-inner" style="width:69.9%"><div style="width:100%"><img src="https://cdn.static.attains.cn/app/developer-bbs/upload/1723319218098819713.png" /></div></div></div></figure><p>直接替换我下面的就行，但是需要替换一下你服务的IP。</p><div class="rno-markdown-code"><div class="rno-markdown-code-toolbar"><div class="rno-markdown-code-toolbar-info"><div class="rno-markdown-code-toolbar-item is-type"><span class="is-m-hidden">代码语言：</span>javascript</div></div><div class="rno-markdown-code-toolbar-opt"><div class="rno-markdown-code-toolbar-copy"><i class="icon-copy"></i><span class="is-m-hidden">复制</span></div></div></div><div class="developer-code-block"><pre class="prism-token token line-numbers language-javascript"><code class="language-javascript" style="margin-left:0"># Your openai api key. (required)

OPENAI_API_KEY="hongMuXiangXun"
Access passsword, separated by comma. (optional)
CODE=
You can start service behind a proxy
PROXY_URL=http://你的IP:8000
Override openai api request base url. (optional)
Default: https://api.openai.com
Examples: http://your-openai-proxy.com
BASE_URL=http://你的IP:8000
Specify OpenAI organization ID.(optional)
Default: Empty
OPENAI_ORG_ID=
(optional)
Default: Empty
If you do not want users to input their own API key, set this value to 1.
HIDE_USER_API_KEY=
(optional)
Default: Empty
If you do not want users to use GPT-4, set this value to 1.
DISABLE_GPT4=
(optional)
Default: Empty
If you do not want users to query balance, set this value to 1.
HIDE_BALANCE_QUERY=