
whisper-api: a high-performance open-source project for speech recognition and speech translation, compatible with the OpenAI API protocol

2024/10/24 4:29:52  Source: https://blog.csdn.net/weixin_40986713/article/details/140502682

whisper-api

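Introduction

This project wraps Whisper, OpenAI's open-source speech recognition model, in an API compatible with the OpenAI ChatGPT interface.

Software architecture

The API is built on open-source libraries such as uvicorn, fastapi, and openai-whisper to provide a high-performance interface.

More details: https://blog.csdn.net/weixin_40986713/article/details/138712293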

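Usage
  1. Download the code
  2. Install ffmpeg: https://ffmpeg.org/download.html
  3. Install the dependencies: in the project root, run pip install -r requirements.txt
  4. Run the code: in the project root, run python main.py

Once the service has started, http://0.0.0.0:3003 is the address to connect to.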
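A client call against this address can be sketched as follows. This is a minimal example using only the Python standard library, not code from the project: the helper names build_transcription_request and transcribe are hypothetical, and the form field name "file" and the {"text": ...} response shape are taken from the startup code in this article.

```python
import json
import os
import urllib.request
import uuid


def build_transcription_request(base_url, token, filename, audio_bytes):
    """Build a multipart/form-data POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    # The server reads the upload from a form field named "file".
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(base_url, token, path):
    """Upload an audio file to the running service and return the recognized text."""
    with open(path, "rb") as audio:
        request = build_transcription_request(
            base_url, token, os.path.basename(path), audio.read()
        )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

With the service running locally, transcribe("http://127.0.0.1:3003", "sk-tarzan", "sample.wav") would return the text for a hypothetical sample.wav. Note that 0.0.0.0 is the address the server binds to; a client normally connects through 127.0.0.1 or the host's IP address.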

Startup code

import json
import os
import tempfile
import time

import uvicorn
from fastapi import FastAPI, UploadFile, File, Security, HTTPException
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials

from whisper_script import WhisperHandler

app = FastAPI()
security = HTTPBearer()
# Read the token at import time. uvicorn imports this file as the module
# "main", so an assignment inside the __main__ block below would not reach
# the app instance that actually serves requests.
env_bearer_token = os.getenv("ACCESS_TOKEN", "sk-tarzan")
model_size = os.getenv("MODEL_SIZE", "base")
language = os.getenv("LANGUAGE", "Chinese")


def cleanup_temp_file(path):
    if os.path.exists(path):
        os.remove(path)


with open('options.json', 'r') as options:
    # Read and parse the file contents with json.load()
    load_options = json.load(options)


# Speech recognition
@app.post("/v1/audio/transcriptions")
async def transcribe(file: UploadFile = File(...), credentials: HTTPAuthorizationCredentials = Security(security)):
    if env_bearer_token is not None and credentials.credentials != env_bearer_token:
        raise HTTPException(status_code=401, detail="Invalid token")
    file_bytes = await file.read()
    return {"text": audio_to_text(file_bytes, 'transcribe')}


# Speech translation
@app.post("/v1/audio/translations")
async def translate(file: UploadFile = File(...), credentials: HTTPAuthorizationCredentials = Security(security)):
    if env_bearer_token is not None and credentials.credentials != env_bearer_token:
        raise HTTPException(status_code=401, detail="Invalid token")
    file_bytes = await file.read()
    return {"text": audio_to_text(file_bytes, 'translate')}


def audio_to_text(file_bytes, task):
    start_time = time.time()
    max_file_size = 500 * 1024 * 1024
    if len(file_bytes) > max_file_size:
        raise HTTPException(status_code=400, detail="File is too large")
    temp_path = None
    try:
        # Write the upload to a temporary file so it can be passed on by path
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as temp_audio:
            temp_audio.write(file_bytes)
            temp_path = temp_audio.name
        model_size = load_options.get("model_size")
        language = load_options.get("language")
        prompts = {
            "verbose": load_options.get("verbose"),
            "temperature": load_options.get("temperature"),
            "compression_ratio_threshold": load_options.get("compression_ratio_threshold"),
            "logprob_threshold": load_options.get("logprob_threshold"),
            "no_speech_threshold": load_options.get("no_speech_threshold"),
            "condition_on_previous_text": load_options.get("condition_on_previous_text"),
            "initial_prompt": load_options.get("initial_prompt"),
            "word_timestamps": load_options.get("word_timestamps"),
            "prepend_punctuations": load_options.get("prepend_punctuations"),
            "append_punctuations": load_options.get("append_punctuations")
        }
        print('temp_path', temp_path)
        handler = WhisperHandler(temp_path, model_size=model_size, language=language, task=task, prompt=prompts)
        result = handler.transcribe()
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
    finally:
        # Delete the temporary file once the request is done, rather than
        # deferring cleanup to process exit, so files do not pile up
        if temp_path is not None:
            cleanup_temp_file(temp_path)
    end_time = time.time()
    print(f"audio to text took {end_time - start_time:.2f} seconds")
    return result['text']


if __name__ == "__main__":
    try:
        uvicorn.run("main:app", reload=True, host="0.0.0.0", port=3003)
    except Exception as e:
        print(f"API failed to start!\nError:\n{e}")

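In audio_to_text, every entry of the prompts dictionary comes from load_options.get(...), which returns None for any key missing from options.json. How WhisperHandler treats those None values is not shown in this article. The sketch below assumes they should simply be dropped so that only explicitly configured options are passed on; the helper name build_decode_options is hypothetical.

```python
# Option names copied from the prompts dictionary in the startup code.
OPTION_KEYS = (
    "verbose",
    "temperature",
    "compression_ratio_threshold",
    "logprob_threshold",
    "no_speech_threshold",
    "condition_on_previous_text",
    "initial_prompt",
    "word_timestamps",
    "prepend_punctuations",
    "append_punctuations",
)


def build_decode_options(load_options):
    """Return only the options that options.json actually sets (assumed behavior)."""
    return {
        key: load_options[key]
        for key in OPTION_KEYS
        if load_options.get(key) is not None
    }
```

With the options.json shown in this article, which sets only model_size and language, the helper returns an empty dictionary.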
Open-source repository

Project repository: https://gitee.com/taisan/whisper-api

Docker
  1. Build the Docker image
docker build -t whisper .

  2. Start the container with docker run

GPU mode

docker run -itd --name whisper-api -p 3003:3003 --gpus all --restart=always whisper
  • Default: ACCESS_TOKEN=sk-tarzan

CPU mode

docker run -itd --name whisper-api -p 3003:3003 --restart=always whisper
  • Default: ACCESS_TOKEN=sk-tarzan

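Authentication mode

docker run -itd --name whisper-api -p 3003:3003 -e ACCESS_TOKEN=yourtoken --gpus all --restart=always whisper
docker run -itd --name whisper-api -p 3003:3003 -e ACCESS_TOKEN=yourtoken --restart=always whisper
  • Replace yourtoken with the auth token you want to use. API calls must send it in the request header as Authorization: Bearer yourtoken (with the default token, that is Authorization: Bearer sk-tarzan).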
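The check behind the Authorization header can be sketched on its own. The service code compares tokens with !=; the version below swaps in hmac.compare_digest, a constant-time comparison from the Python standard library, as one possible variation. The function name is_authorized is hypothetical, and the fallback to sk-tarzan mirrors the default token mentioned above.

```python
import hmac
import os


def is_authorized(authorization_header, expected_token=None):
    """Return True if the header carries the expected bearer token."""
    if expected_token is None:
        # Same default as the service: ACCESS_TOKEN, else sk-tarzan
        expected_token = os.getenv("ACCESS_TOKEN", "sk-tarzan")
    # Split "Bearer <token>" into its scheme and the presented token
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    return hmac.compare_digest(presented.strip().encode(), expected_token.encode())
```

For example, is_authorized("Bearer yourtoken", "yourtoken") accepts the request, while a header with a different token or a non-Bearer scheme is rejected.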

Viewing Docker logs

docker logs -f [container ID or container name]

Configuration file

options.json

{
  "model_size": "base",
  "language": "Chinese"
}
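The startup code reads MODEL_SIZE and LANGUAGE from the environment and also reads model_size and language from options.json. One way to combine the two is sketched below. It assumes that values in options.json should take precedence over the environment defaults; both that precedence and the function name read_options are assumptions, not documented behavior of the project.

```python
import json
import os


def read_options(path="options.json"):
    """Load options.json on top of defaults taken from the environment."""
    options = {
        "model_size": os.getenv("MODEL_SIZE", "base"),
        "language": os.getenv("LANGUAGE", "Chinese"),
    }
    # Values present in the file override the environment defaults
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as config_file:
            options.update(json.load(config_file))
    return options
```

Calling read_options() in the project root would then yield a dictionary that always contains model_size and language, even if options.json omits them or is missing.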
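  • The service can be combined with one-api to plug into open-source RAG projects such as FastGPT. A tutorial is available here:
    "Connecting FastGPT to a local Whisper model for voice input"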
