experiments with websocket-stream [Review required] #3054

Open
wants to merge 1 commit into
base: master
1 change: 1 addition & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -83,6 +83,7 @@ option(WHISPER_BUILD_SERVER "whisper: build server example" ${WHISPER_STANDALO
# 3rd party libs
option(WHISPER_CURL "whisper: use libcurl to download model from an URL" OFF)
option(WHISPER_SDL2 "whisper: support for libSDL2" OFF)
option(WEBSOCKET "whisper: support for websocket" OFF)

if (CMAKE_SYSTEM_NAME MATCHES "Linux")
option(WHISPER_FFMPEG "whisper: support building and linking with ffmpeg libs (avcodec, swresample, ...)" OFF)
6 changes: 4 additions & 2 deletions examples/CMakeLists.txt
@@ -100,8 +100,10 @@ if (EMSCRIPTEN)
add_subdirectory(bench.wasm)
elseif(CMAKE_JS_VERSION)
add_subdirectory(addon.node)
else()
add_subdirectory(cli)
else()
if (WEBSOCKET)
add_subdirectory(cli)
endif()
add_subdirectory(bench)
add_subdirectory(server)
add_subdirectory(quantize)
8 changes: 8 additions & 0 deletions examples/websocket-stream/CMakeLists.txt
@@ -0,0 +1,8 @@
set(TARGET whisper-websocket-stream)
add_executable(${TARGET} main.cpp whisper-server.cpp message-buffer.cpp)
find_package(ixwebsocket)
find_package(CURL REQUIRED)
include(DefaultTargetOptions)
target_link_libraries(${TARGET} PRIVATE common whisper ixwebsocket z CURL::libcurl ${CMAKE_THREAD_LIBS_INIT})

install(TARGETS ${TARGET} RUNTIME)
90 changes: 90 additions & 0 deletions examples/websocket-stream/README.md
@@ -0,0 +1,90 @@
# WebSocket Whisper Stream Example

This example demonstrates a WebSocket-based real-time audio transcription service built on the Whisper model. The server receives audio from clients over WebSocket connections, transcribes it with Whisper, and sends the transcriptions back over the same connections.

## Features

- Real-time audio transcription
- WebSocket communication for audio and transcription data
- Configurable parameters for model, language, and processing settings
- Integration with backend services via HTTP requests

## Usage

Run the server with the following command:

```bash
./build/bin/whisper-websocket-stream -m ./models/ggml-large-v3-turbo.bin -t 8 --host 0.0.0.0 --port 9002 --forward-url http://localhost:8080/completion
```

### Parameters

- `-m` or `--model`: Path to the Whisper model file.
- `-t` or `--threads`: Number of threads for processing.
- `-H` or `--host`: Hostname or IP address to bind the server to.
- `-p` or `--port`: Port number for the server.
- `-f` or `--forward-url`: URL to forward transcriptions to a backend service.
- `-nm` or `--max-messages`: Maximum number of messages before sending to the backend.
- `-l` or `--language`: Spoken language for transcription.
- `-vth` or `--vad-thold`: Voice activity detection threshold.
- `-tr` or `--translate`: Enable translation to English.
- `-ng` or `--no-gpu`: Disable GPU usage.
- `-bs` or `--beam-size`: Beam size for beam search.

## Building

To build the server, follow these steps:

```bash
# Install the IXWebSocket dependency
git clone --depth 1 https://github.com/machinezone/IXWebSocket/
cd IXWebSocket
mkdir -p build && cd build && cmake -GNinja .. && sudo ninja -j"$(nproc)" install
cd ../..

# Build the project (CUDA is optional; omit -DGGML_CUDA=ON for a CPU-only build)
git clone --depth 1 https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
mkdir -p build && cd build
cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DWEBSOCKET=ON -DGGML_CUDA=ON ..
ninja -j"$(nproc)"

# Run the server
./bin/whisper-websocket-stream --help
```

## Client Integration

Clients connect to the WebSocket server and send audio data as binary messages; the bundled `index.html` sends 16 kHz mono audio as 16-bit PCM. The server processes the audio and sends transcriptions back through the same WebSocket connection.

### Example Client Code (JavaScript)

```javascript
const socket = new WebSocket('ws://localhost:9002');

socket.onopen = () => {
  console.log('Connected to WebSocket server');
};

socket.onmessage = (event) => {
  console.log('Transcription:', event.data);
};

socket.onclose = () => {
  console.log('Disconnected from WebSocket server');
};

// Function to send audio data to the server
function sendAudioData(audioData) {
  socket.send(audioData);
}
```
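
On the server side, 16-bit PCM received from a client has to be turned into the floating-point samples that Whisper consumes. The following is a minimal sketch of that conversion, not the code from `whisper-server.cpp`. The function name `pcm16_to_float` is hypothetical, and the sketch assumes mono samples in the host's byte order (little-endian on common platforms).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: interprets a binary WebSocket payload as signed
// 16-bit mono PCM (host byte order) and scales it to floats in [-1.0, 1.0).
std::vector<float> pcm16_to_float(const char* data, std::size_t n_bytes) {
    // A trailing odd byte cannot form a full sample and is ignored.
    const std::size_t n_samples = n_bytes / sizeof(int16_t);
    std::vector<float> out(n_samples);
    for (std::size_t i = 0; i < n_samples; ++i) {
        int16_t sample;
        // memcpy avoids unaligned reads from the raw payload.
        std::memcpy(&sample, data + i * sizeof(int16_t), sizeof(sample));
        out[i] = static_cast<float>(sample) / 32768.0f;
    }
    return out;
}
```

The resulting vector could then be appended to a per-client buffer before being handed to the transcription step.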

## Backend Integration

The server can forward transcriptions to a backend service via HTTP POST requests. Set the `--forward-url` option to the backend service URL, and use `--max-messages` to control how many messages are collected before each request.
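
The batching behaviour can be sketched as follows. This is an illustration under stated assumptions rather than the PR's `MessageBuffer`: the class name `BatchBuffer` and its methods are hypothetical. It collects newline-separated messages and returns a complete batch once the threshold is reached, so the caller can perform the HTTP request without holding the lock.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical batching buffer (not the PR's MessageBuffer).
class BatchBuffer {
public:
    explicit BatchBuffer(std::size_t max_messages) : max_messages_(max_messages) {}

    // Appends one message. Returns the full batch when the threshold is
    // reached, otherwise an empty string.
    std::string add(const std::string& msg) {
        std::lock_guard<std::mutex> lock(mtx_);
        pending_ += msg;
        pending_ += '\n';
        if (++count_ < max_messages_) {
            return {};
        }
        return take_locked();
    }

    // Returns whatever is pending, for example when a client disconnects.
    std::string drain() {
        std::lock_guard<std::mutex> lock(mtx_);
        return take_locked();
    }

private:
    // Caller must hold mtx_. Hands out the pending text and resets the state.
    std::string take_locked() {
        std::string out = std::move(pending_);
        pending_.clear();
        count_ = 0;
        return out;
    }

    std::mutex mtx_;
    std::string pending_;
    std::size_t count_ = 0;
    std::size_t max_messages_;
};
```

Keeping the state inside each instance means that batches from different connections do not mix, and returning the payload instead of sending it inside `add` keeps network I/O out of the critical section.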

## Dependencies
- whisper.cpp
- ixwebsocket for WebSocket communication
- libcurl for HTTP requests
15 changes: 15 additions & 0 deletions examples/websocket-stream/client-session.h
@@ -0,0 +1,15 @@
#ifndef CLIENT_SESSION_H
#define CLIENT_SESSION_H
#include <vector>
#include <mutex>
#include <atomic>
#include "ixwebsocket/IXWebSocketServer.h"
#include "message-buffer.h"
// Per-connection state: buffered audio plus the messages queued for the backend.
struct ClientSession {
    std::vector<float> pcm_buffer;
    std::mutex mtx;
    std::atomic<bool> active{true};
    ix::WebSocket *connection = nullptr; // initialized so an unset session is detectable
    MessageBuffer buffToBackend;
};
#endif
61 changes: 61 additions & 0 deletions examples/websocket-stream/index.html
@@ -0,0 +1,61 @@
<!DOCTYPE html>
<html>
<head>
    <title>Mic to WebSocket</title>
</head>
<body>
    <button id="startBtn">Start Mic</button>
    <div id="status"></div>

    <script>
        const startBtn = document.getElementById('startBtn');
        const statusDiv = document.getElementById('status');
        let isRecording = false;
        let socket;
        let stream;
        let audioContext;

        // Convert Float32 samples in [-1, 1] to signed 16-bit PCM.
        function floatTo16BitPCM(input) {
            const output = new Int16Array(input.length);
            for (let i = 0; i < input.length; i++) {
                output[i] = Math.max(-1, Math.min(1, input[i])) * 0x7FFF;
            }
            return output;
        }

        startBtn.addEventListener('click', async () => {
            if (!isRecording) {
                try {
                    // Adjust the address to match the server's --host and --port.
                    socket = new WebSocket('ws://192.168.2.109:9002');

                    stream = await navigator.mediaDevices.getUserMedia({ audio: true });
                    audioContext = new AudioContext({ sampleRate: 16000 });
                    const source = audioContext.createMediaStreamSource(stream);

                    const processor = audioContext.createScriptProcessor(1024, 1, 1);

                    source.connect(processor);
                    processor.connect(audioContext.destination);
                    processor.onaudioprocess = (e) => {
                        const input = e.inputBuffer.getChannelData(0);
                        const int16Data = floatTo16BitPCM(input);

                        if (socket.readyState === WebSocket.OPEN) {
                            socket.send(int16Data.buffer);
                        }
                    };

                    statusDiv.textContent = 'Recording...';
                    startBtn.textContent = 'Stop';
                    isRecording = true;
                } catch (err) {
                    console.error('Error accessing microphone:', err);
                    statusDiv.textContent = 'Error accessing microphone';
                }
            } else {
                if (socket) socket.close();
                // Release the microphone and the audio graph so capture actually stops.
                if (stream) stream.getTracks().forEach((track) => track.stop());
                if (audioContext) audioContext.close();
                statusDiv.textContent = 'Stopped';
                startBtn.textContent = 'Start Mic';
                isRecording = false;
            }
        });
    </script>
</body>
</html>
74 changes: 74 additions & 0 deletions examples/websocket-stream/main.cpp
@@ -0,0 +1,74 @@
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <string>
#include "whisper.h"
#include "server-params.h"
#include "whisper-server.h"

#define CONVERT_FROM_PCM_16
// Defined here and referenced from message-buffer.cpp via extern declarations.
std::string forward_url = "http://127.0.0.1:8080/completion";
size_t max_messages = 1000;

void print_usage(int /*argc*/, char** argv, const ServerParams& params) {
    fprintf(stderr, "\n");
    fprintf(stderr, "usage: %s [options]\n", argv[0]);
    fprintf(stderr, "\n");
    fprintf(stderr, "options:\n");
    fprintf(stderr, " -h, --help show this help message and exit\n");
    fprintf(stderr, " -H HOST, --host HOST [%-7s] hostname or ip\n", params.host.c_str());
    fprintf(stderr, " -p PORT, --port PORT [%-7d] server port\n", params.port);
    fprintf(stderr, " -f FORWARD_URL, --forward-url FORWARD_URL [%-7s] forward url\n", forward_url.c_str());
    fprintf(stderr, " -t N, --threads N [%-7d] number of threads\n", params.n_threads);
    // max_messages is a size_t, so it needs %zu rather than %d.
    fprintf(stderr, " -nm max_messages, --max-messages max_messages [%-7zu] max messages before send to backend\n", max_messages);
    fprintf(stderr, " -m FNAME, --model FNAME [%-7s] model path\n", params.model.c_str());
    fprintf(stderr, " -l LANG, --language LANG [%-7s] spoken language\n", params.language.c_str());
    fprintf(stderr, " -vth N, --vad-thold N [%-7.2f] voice activity threshold\n", params.vad_thold);
    fprintf(stderr, " -tr, --translate [%-7s] translate to english\n", params.translate ? "true" : "false");
    fprintf(stderr, " -ng, --no-gpu [%-7s] disable GPU\n", params.use_gpu ? "false" : "true");
    fprintf(stderr, " -bs N, --beam-size N [%-7d] beam size for beam search\n", params.beam_size);
    fprintf(stderr, "\n");
}

bool parse_params(int argc, char** argv, ServerParams& params) {
    for (int i = 1; i < argc; i++) {
        std::string arg = argv[i];

        // Returns the value that follows the current flag; exits if it is missing,
        // instead of reading past the end of argv.
        auto next_value = [&]() -> const char* {
            if (i + 1 >= argc) {
                fprintf(stderr, "error: missing value for argument: %s\n", arg.c_str());
                exit(1);
            }
            return argv[++i];
        };

        if (arg == "-h" || arg == "--help") {
            print_usage(argc, argv, params);
            exit(0);
        }
        else if (arg == "-H" || arg == "--host") { params.host = next_value(); }
        else if (arg == "-p" || arg == "--port") { params.port = std::stoi(next_value()); }
        else if (arg == "-f" || arg == "--forward-url") { forward_url = next_value(); }
        else if (arg == "-t" || arg == "--threads") { params.n_threads = std::stoi(next_value()); }
        else if (arg == "-nm" || arg == "--max-messages") { max_messages = std::stoi(next_value()); }
        else if (arg == "-m" || arg == "--model") { params.model = next_value(); }
        else if (arg == "-l" || arg == "--language") { params.language = next_value(); }
        else if (arg == "-vth" || arg == "--vad-thold") { params.vad_thold = std::stof(next_value()); }
        else if (arg == "-tr" || arg == "--translate") { params.translate = true; }
        else if (arg == "-bs" || arg == "--beam-size") { params.beam_size = std::stoi(next_value()); }
        else if (arg == "-ng" || arg == "--no-gpu") { params.use_gpu = false; }
        else {
            fprintf(stderr, "error: unknown argument: %s\n", arg.c_str());
            print_usage(argc, argv, params);
            return false;
        }
    }
    return true;
}

int main(int argc, char** argv) {
    ServerParams params;
    try {
        if (!parse_params(argc, argv, params)) {
            return 1;
        }
    } catch (const std::exception& e) {
        // std::stoi and std::stof throw on malformed numeric values.
        fprintf(stderr, "error: invalid argument value: %s\n", e.what());
        return 1;
    }
    if (params.port < 1 || params.port > 65535) {
        // Report and exit instead of throwing an exception that nothing catches.
        fprintf(stderr, "error: invalid port number: %d\n", params.port);
        return 1;
    }
    if (params.language != "auto" && whisper_lang_id(params.language.c_str()) == -1) {
        fprintf(stderr, "error: unknown language '%s'\n", params.language.c_str());
        return 1;
    }

    WhisperServer server(params);
    server.run();
    return 0;
}
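
Review note: the option parser reads flag values straight from `argv` and converts them with `std::stoi`, so a missing or malformed value is worth guarding against. The sketch below shows one possible shape for such guards. Both helper names (`next_value`, `parse_port`) are hypothetical and are not part of the PR.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: stores the value following the flag at argv[i] in `out`
// and advances i. Returns false (leaving i untouched) when the flag is last.
bool next_value(int argc, char** argv, int& i, std::string& out) {
    if (i + 1 >= argc) {
        return false;
    }
    out = argv[++i];
    return true;
}

// Hypothetical helper: parses a TCP port, rejecting trailing characters,
// non-numeric input, and values outside 1..65535.
bool parse_port(const std::string& text, int& port) {
    try {
        std::size_t used = 0;
        const int value = std::stoi(text, &used);
        if (used != text.size() || value < 1 || value > 65535) {
            return false;
        }
        port = value;
        return true;
    } catch (const std::exception&) {
        // std::stoi throws std::invalid_argument or std::out_of_range.
        return false;
    }
}
```

With helpers along these lines, the parser can print a clear error and return `false` rather than crashing on input such as `--port` with no value or `--port 80x`.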
79 changes: 79 additions & 0 deletions examples/websocket-stream/message-buffer.cpp
@@ -0,0 +1,79 @@
#include <cstdio>
#include <mutex>
#include <sstream>
#include <string>
#include <curl/curl.h>

#include "message-buffer.h"
extern std::string forward_url;
extern size_t max_messages;
namespace {
    // Note: this state lives at file scope, so it is shared by every MessageBuffer instance.
    std::stringstream ss;
    std::mutex mtx;
    size_t current_count = 0;
    // libcurl write callback: appends the response body to the std::string in userdata.
    size_t write_callback(char* ptr, size_t size, size_t nmemb, void* userdata) {
        ((std::string*)userdata)->append(ptr, size * nmemb);
        return size * nmemb;
    }
}

void MessageBuffer::add_message(const char* msg) {
    bool should_flush = false;
    {
        std::lock_guard<std::mutex> lock(mtx);
        ss << msg << '\n';
        should_flush = (++current_count >= max_messages);
    }
    // Flush after releasing the lock: flush() locks mtx itself, and std::mutex
    // is not recursive, so calling it while holding the lock would deadlock.
    if (should_flush) {
        flush();
    }
}

std::string MessageBuffer::get_payload() {
    std::lock_guard<std::mutex> lock(mtx);
    return ss.str();
}

void MessageBuffer::flush() {
    std::string payload;
    {
        // Take the pending text and reset the buffer in one critical section.
        std::lock_guard<std::mutex> lock(mtx);
        payload = ss.str();
        ss.str(""); // clear string stream
        current_count = 0;
    }
    if (!payload.empty()) {
        send_via_http(payload); // network I/O happens without holding the lock
    }
}

void MessageBuffer::send_via_http(const std::string& data) {
    CURL* curl = curl_easy_init();
    if (!curl) {
        fprintf(stderr, "CURL init failed\n");
        return;
    }

    // make headers
    struct curl_slist* headers = NULL;
    headers = curl_slist_append(headers, "Content-Type: text/plain");
    std::string cid_header = "X-Connection-ID: " + connection_id;
    headers = curl_slist_append(headers, cid_header.c_str());

    // config curl
    std::string response;
    printf("sending to %s\n", forward_url.c_str());
    curl_easy_setopt(curl, CURLOPT_URL, forward_url.c_str());
    curl_easy_setopt(curl, CURLOPT_POST, 1L);
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, data.c_str());
    // CURLOPT_POSTFIELDSIZE takes a long, so cast explicitly.
    curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, (long)data.size());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
    curl_easy_setopt(curl, CURLOPT_TIMEOUT, 5L);
    curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, 2L);

    // run curl, retrying up to 3 times
    for (int retry = 0; retry < 3; ++retry) {
        response.clear(); // drop partial output from a failed attempt
        CURLcode res = curl_easy_perform(curl);
        if (res == CURLE_OK) {
            printf("[Response (%s)]: %s\n", connection_id.c_str(), response.c_str());
            break;
        }
        fprintf(stderr, "[CURL error]: %s\n", curl_easy_strerror(res));
    }

    // clean
    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
}
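
Review note: the retry loop in `send_via_http` follows a common bounded-retry pattern. As a hedged illustration only, the pattern can be factored into a small helper that is independent of libcurl. The function name `retry` and its return convention are assumptions made for this sketch.

```cpp
#include <functional>

// Hypothetical helper: calls attempt() up to max_attempts times and stops at
// the first success. Returns the 1-based number of the successful attempt,
// or 0 if every attempt failed.
int retry(int max_attempts, const std::function<bool()>& attempt) {
    for (int n = 1; n <= max_attempts; ++n) {
        if (attempt()) {
            return n;
        }
    }
    return 0;
}
```

In the HTTP case, the callable would wrap a single request and report whether it succeeded; any response buffer should be cleared at the start of each attempt so output from a failed attempt is not carried over.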
14 changes: 14 additions & 0 deletions examples/websocket-stream/message-buffer.h
@@ -0,0 +1,14 @@
#ifndef MESSAGE_BUFFER_H
#define MESSAGE_BUFFER_H
#include <string>
// Collects transcription messages and forwards them to the backend over HTTP.
class MessageBuffer {
public:
    std::string connection_id;
    void add_message(const char* msg);

    std::string get_payload();

    void flush();

    void send_via_http(const std::string& data);
};
#endif
23 changes: 23 additions & 0 deletions examples/websocket-stream/server-params.h
@@ -0,0 +1,23 @@
#ifndef SERVER_PARAMS_H
#define SERVER_PARAMS_H
#include <algorithm>
#include <cstdint>
#include <string>
#include <thread>
struct ServerParams {
    int32_t port = 9002;
    // hardware_concurrency() may return 0 when the count is unknown, so clamp to at least 1.
    int32_t n_threads = std::max(1, std::min(4, (int32_t)std::thread::hardware_concurrency()));
    int32_t audio_ctx = 0;
    int32_t beam_size = -1;

    float vad_thold = 0.6f;

    bool translate = false;
    bool print_special = false;
    bool no_timestamps = true;
    bool tinydiarize = false;
    bool use_gpu = true;
    bool flash_attn = true;

    std::string language = "en";
    std::string model = "ggml-large-v3-turbo.bin";
    std::string host = "0.0.0.0";
};
#endif