* Add support for --carry-initial-prompt
* PR fixes for ruby and go
* Refactoring for readability
* WIP 1
* WIP 2
* PR fixes
* More PR fixes
* PR fix
* Further simplification
* d'oh
* One more logic fix
* Update src/whisper.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Truncate prompt_past0 upon initialization
* Slight simplification
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* cli: Fix assignment for vad_min_silence_duration_ms
Found and fixed this simple copy/paste error
* server : fix vad_min_silence_duration_ms assignment
---------
Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
This commit adds a check for the platform in use and adjusts the path to
the addon.node shared library accordingly.
The motivation for this change is that on Windows the addon.node library is
built into build\bin\Release, while on Linux it is built into build/Release.
Resolves: https://github.com/ggml-org/whisper.cpp/issues/3360
* stream.wasm : add language selection support
This commit adds support for selecting the language in the stream.wasm
example. This includes adding the model `base`, which supports
multilingual transcription, and allowing the user to select a language
from a dropdown menu in the HTML interface.
The motivation for this is that it allows users to transcribe audio in
various languages.
Refs: https://github.com/ggml-org/whisper.cpp/issues/3347
* squash! stream.wasm : add language selection support
Remove strdup() for language in stream.wasm and update the button text for
base (should not be "base.en" but just "base").
This commit adds a note to the README files of the WASM examples
about the `WHISPER_WASM_SINGLE_FILE` option.
The motivation for this is that currently this option is not documented
and might be surprising to users who expect a separate .wasm file to be
generated.
Refs: https://github.com/ggml-org/whisper.cpp/issues/3290
* fix 404 link
* update link in whisper.wasm example
* update example in command.wasm
* update link in bench.wasm example
* update link in stream.wasm example
* Add DTW model large-v3-turbo parameters to server.cpp example
DTW support is available in whisper.cpp and the large-v3-turbo model has already been added to the sources, but it hasn't been added to the server.cpp file, so the server cannot make use of it. This commit hopefully corrects that issue.
* match original linebreak of original server.cpp file after adding large.v3.turbo dtw
* whisper : add version function
This commit adds a version function to the whisper API.
The motivation for this is that it might be convenient to have a way to
programmatically check the version.
Example usage:
```c++
printf("Using whisper version: %s\n", whisper_version());
```
Will output:
```console
Using whisper version: 1.7.6
```
* examples : add version to android example CMakeLists.txt
* stream : add nullptr check of whisper_context
This commit adds a check to ensure that the `whisper_context` is not
null after initialization.
The motivation for this is that currently, if the initialization fails,
the program continues to run, leading to a segmentation fault. This sort
of check is performed by other examples like whisper-cli.
Refs: https://github.com/ggml-org/whisper.cpp/issues/3280#issuecomment-3003778035
* examples : add nullptr check for whisper_context
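The pattern described above can be sketched as follows. This is only an illustration, not the code from the commit: `demo_context`, `demo_init_from_file`, `demo_free` and `run_example` are hypothetical stand-ins for the real whisper.cpp types and functions, assumed here only to follow the same "return a null pointer on failure" convention.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for whisper_context (the real type lives in whisper.h).
struct demo_context {
    std::string model_path;
};

// Hypothetical stand-in for the real initializer. It returns nullptr on
// failure, which is the convention the check relies on.
static demo_context * demo_init_from_file(const std::string & path) {
    if (path.empty()) {
        return nullptr; // simulate a model that could not be loaded
    }
    return new demo_context{path};
}

static void demo_free(demo_context * ctx) {
    delete ctx;
}

// Returns a process exit code: 0 on success, non-zero if initialization failed.
static int run_example(const std::string & model_path) {
    demo_context * ctx = demo_init_from_file(model_path);
    if (ctx == nullptr) {
        // Bail out early instead of dereferencing a null context later on.
        fprintf(stderr, "error: failed to initialize whisper context\n");
        return 2;
    }

    // ... normal processing would use ctx here ...

    demo_free(ctx);
    return 0;
}
```

Returning early with a non-zero code keeps the failure visible to the caller instead of letting later code crash on the null pointer.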
* android : update CMakeLists.txt to use FetchContent for ggml
This commit updates the CMakeLists.txt file for the Android Whisper
example to use FetchContent for managing the ggml library.
The motivation for this change is to avoid having to make manual changes to
the CMakeLists.txt file after syncing the ggml library.
I've built and run the example locally to verify that it works as
expected.
Refs: https://github.com/ggml-org/whisper.cpp/pull/3265#issuecomment-2986715717
* android.java : update cmake to use FetchContent for ggml
This commit updates the CMake configuration for the Android Java example
to use `FetchContent` for including the `ggml` library. To be able to
use FetchContent we also update the `compileSdkVersion` and
`targetSdkVersion` to 31, and the `buildToolsVersion` to '30.0.3'.
This also required an update to the Gradle plugin version to 7.4.0.
The motivation for this change is to avoid having to make manual changes to
the CMakeLists.txt file after syncing the ggml library.
This commit adds a conversion from stereo to mono in the
`read_audio_data` function of `common-whisper.cpp`.
The motivation for this change is that prior to commit
7d3da68f79 ("examples : use miniaudio for
direct decoding flac, mp3, ogg and wav (#2759)"), there was a step that
read stereo int16 data -> pcm16 (448512 samples), then converted it to
mono (224256 samples), and also converted it to stereo in `pcmf32s`.
The middle step here seems to have been missed when rewriting the code to
use Miniaudio, which caused issues when transcribing stereo audio files.
For example, currently using the audio sample in the linked issue the
output is:
```console
[00:00:00.000 --> 00:00:03.000] (speaker 1) Sous-titres réalisés para la communauté d'Amara.org
```
And with the change in this commit the output is:
```console
[00:00:00.000 --> 00:00:01.500] (speaker 1) *sonnerie de téléphone*
[00:00:01.500 --> 00:00:07.000] (speaker 1) Salut jeune homme !
[00:00:07.000 --> 00:00:08.500] (speaker 0) C'est vrai que je te dérange ?
[00:00:08.500 --> 00:00:10.500] (speaker 1) Ah pas du tout, pas du tout, pas du tout !
[00:00:10.500 --> 00:00:12.500] (speaker 1) J'étais en train de...
[00:00:12.500 --> 00:00:14.500] (speaker 1) de préparer un courrier
```
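As a rough illustration of the stereo-to-mono step described above, the sketch below downmixes interleaved stereo samples by averaging the two channels of each frame. The function name `stereo_to_mono` and the averaging approach are assumptions made for this example; the actual code in `read_audio_data` may perform the conversion differently.

```cpp
#include <cstddef>
#include <vector>

// Downmix interleaved stereo samples (L0, R0, L1, R1, ...) to mono by
// averaging the left and right channel of each frame.
// Hypothetical helper for illustration, not the function from the commit.
static std::vector<float> stereo_to_mono(const std::vector<float> & interleaved) {
    const size_t n_frames = interleaved.size() / 2; // a trailing odd sample is ignored
    std::vector<float> mono(n_frames);
    for (size_t i = 0; i < n_frames; ++i) {
        mono[i] = 0.5f * (interleaved[2*i] + interleaved[2*i + 1]);
    }
    return mono;
}
```

With the sample counts mentioned above, an input of 448512 interleaved stereo samples yields 224256 mono samples.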
Resolves: https://github.com/ggml-org/whisper.cpp/issues/3092
* server : add Voice Activity Detection (VAD) support
This commit adds support for Voice Activity Detection (VAD) in the
server example.
The motivation for this is to enable VAD processing when using
whisper-server.
Resolves: https://github.com/ggml-org/whisper.cpp/issues/3089
* server : add VAD parameters to usage in README.md [no ci]
This commit also adds a few missing parameters.
* server : fix conflicting short options [no ci]
This commit adds support for language detection in the Whisper Node.js
addon example. It also updates the node addon to return an object
instead of an array as the results.
The motivation for this change is to enable the inclusion of the
detected language in the result, in addition to the transcription
segments.
For example, when using the `detect_language` option, the result will
now be:
```console
{ language: 'en' }
```
And if the `language` option is set to "auto", it will also return:
```console
{
language: 'en',
transcription: [
[
'00:00:00.000',
'00:00:07.600',
' And so my fellow Americans, ask not what your country can do for you,'
],
[
'00:00:07.600',
'00:00:10.600',
' ask what you can do for your country.'
]
]
}
```
This commit enables the node addon to suppress all output, even the
result of the transcription, if the no_prints parameter is set to true.
The motivation for this is that for the node addon there is a
fulfillment handler/success callback to process the transcription
result. And it might be useful to be able to disable the printing of
the transcription result to the console, so that the user can handle
the result in their own way.
Refs: https://github.com/ggml-org/whisper.cpp/issues/3176