whisper.cpp

Commit Graph

Author	SHA1	Message	Date
Pedro Probst	58210d6a76	examples : fix node compilation (#2115 ) * node : fix compilation and update examples * node : fix readme * Update addon.node test	2024-05-02 22:52:55 +01:00
Georgi Gerganov	b0c3cbf2e8	main : pass nullptr when regex is empty (#2070 )	2024-04-17 12:23:47 +03:00
Emmanuel Schmidbauer	9fab28135c	server : add dtw (#2044 ) * server.cpp: add dtw * Update examples/server/server.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-15 22:16:58 +03:00
Pedro Probst	1b5439a6c2	node : support no timestamps (#2048 ) * fix: node: do not compute timestamps if you do not need them * feat: add no_timestamps parameter to node addon	2024-04-15 20:03:34 +03:00
Kendrick Taylor	5c554c04ff	whisper.nvim : fix missing reference to "model" variable (#2049 )	2024-04-15 19:41:28 +03:00
Ikko Eltociear Ashimine	c383f091a1	whisper : update grammar-parser.cpp (#2058 ) preceeding -> preceding	2024-04-15 19:40:27 +03:00
ulatekh	c15b4cda7d	common : fix file-handle leak in read_wav() (#2026 ) Now it cleans up in case of error.	2024-04-09 18:34:34 +03:00
Rotem Dan	d3cfb6ca2b	main : set stdin to binary mode on Windows (#2025 )	2024-04-09 18:33:32 +03:00
ulatekh	671b4bde6c	main : allow a response-file as the sole parameter (#2019 ) * The "main" example now allows a response-file as the sole parameter. A response-file is a text file with command-line parameters, one per line. Prefix the name of the response-file with "@" to identify it as such. It's used under MS Windows to work around command-line length limits. It may be useful under other platforms to simplify character-escaping. * minor : style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-09 18:31:16 +03:00
ulatekh	c8eeb93a6a	whisper : suppress tokens with a regex (#1997 ) * Allow a regular expression to describe tokens to suppress. Example: --suppress-tokens-re "[,\.]\|[ ]?[0-9]+" will suppress commas, periods, and numeric tokens. Technique inspired by https://github.com/openai/whisper/discussions/1041 Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Blind change to fix Java test. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-09 18:27:28 +03:00
ulatekh	319fe5146e	cmake : create solution folders (#2004 ) * Create solution folders in the CMake build. * Fixed non-SDL2 build. * Fixed emscripten build.	2024-04-09 18:23:33 +03:00
Georgi Gerganov	81a3c41aa0	talk-llama : sync llama.cpp	2024-04-07 16:21:08 +03:00
ulatekh	fc366b807a	main : add command-style grammar (#1998 ) * Implemented command-style grammar in the main example. Mostly just copied the relevant parts from the command example. * main : code style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-28 12:02:10 +02:00
Georgi Gerganov	9fb308d90f	make : add grammar parser to common objects	2024-03-28 11:59:48 +02:00
Georgi Gerganov	2948c740a2	sync : ggml (#2001 ) * sync : update scripts * sync : ggml * talk-llama : sync llama.cpp * make : WHISPER_CUBLAS -> WHISPER_CUDA * ci : try to fix sycl build * talk-llama : fix make build	2024-03-27 18:55:10 +02:00
Georgi Gerganov	1558ec5a16	whisper : improve handling of prompts (#1981 ) * whisper : improve handling of prompts * whisper : add whisper_token_count helper	2024-03-25 14:48:19 +02:00
Mohammadreza Hendiani	04e48094e4	readme : add Fedora dependencies (#1970 ) * README.md fix documentation and added fedora linux dependencies for stream build * fix documentation and added fedora linux dependencies for command build * fix documentation and added fedora linux dependencies for talk build * fix documentation and added fedora linux dependencies for talk-llama build * reverted back mistakenly removed MacOS documentation	2024-03-20 18:42:11 +02:00
denersc	741abb162c	whisper : token-level timestamps with DTW (#1485 ) * whisper.cpp: impl dtw algo * WIP: producing and placing DTW timestamps on tokens * Fix compile and assertion errors. Attempt to DTW timestamp with single_segment=false. * Fix mistake causing incorrect alignment of dtw timestamps * implement N_TOP_MOST and CUSTOM alignment heads setting * whisper: fix typo on alignment heads enum * Fix issues related to changes in whisper.cpp * Fixed excessive memory use when using DTW timestamps. Other minor fixes to DTW timestamping function * decoder: save cross QKs only if requested * Calling median filter with ggml_map_custom1 * Reimpl aheads n_top_most and custom. Sanity checks on chosen aheads * Copying cross QKs from decoder backend correctly * dtw: cleanup * Fix incorrect n_frames passed to dtw when near end of audio * Fix aheads_masks_init for backend != CPU * whisper : minor style * main : add dtw (wip) * whisper: fix invalid memory access in aheads_masks_init * main : add dtw (cont) * whisper : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-20 18:25:26 +02:00
Jo Liss	e7794a868f	examples : rename --audio-context to --audio-ctx per help text (#1953 )	2024-03-18 17:53:33 +02:00
Georgi Gerganov	de4d067f1e	talk-llama : sync llama.cpp	2024-03-15 14:21:59 +02:00
slaren	f60ccfd83b	update examples and tests	2024-03-15 14:01:14 +02:00
Georgi Gerganov	2f5a5a66dd	talk-llama : use llama_decode instead of llama_eval	2024-03-08 12:04:43 +02:00
Georgi Gerganov	8e409d1113	talk-llama : sync llama.cpp	2024-03-08 11:55:50 +02:00
Georgi Gerganov	05d1b61af4	talk-llama : sync llama.cpp	2024-03-08 11:52:47 +02:00
F1L1P	2e2626b167	examples : Auto lowercase language parameter in main.cpp (#1928 ) * Auto lowercase language parameter * Update examples/main/main.cpp Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com> --------- Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>	2024-03-06 22:25:10 +00:00
zhouwg	c0c0ae2dea	examples : fix typo in bench.cpp (#1933 )	2024-03-06 22:21:44 +00:00
zhouwg	f22d27a385	whisper.android.java : fix returns in JNI (#1929 )	2024-03-05 15:59:26 +02:00
Georgi Gerganov	25d313b38b	talk-llama : sync llama.cpp	2024-02-28 13:04:05 +02:00
Georgi Gerganov	1711bb3881	sync : llama.cpp (ggml/0)	2024-02-28 13:00:30 +02:00
Andrew S	0d8fd8483a	stream.wasm : fix invalid memory access when no segments (#1902 ) No segments may be returned when a smaller sample buffer (EG 2048 samples) is sent to the worker.	2024-02-26 10:12:35 +02:00
Georgi Gerganov	3170841ed9	talk-llama : sync llama.cpp	2024-02-25 20:00:10 +02:00
Georgi Gerganov	578e47e70c	sync : llama.cpp (ggml/0)	2024-02-25 19:58:46 +02:00
Tamotsu Takahashi	f18738f247	talk, talk-llama : pass text_to_speak as a file (#1865 ) * talk-llama: pass file instead of arg it is too hard to quote text in a portable way * talk-llama: pass heard_ok as a file * talk-llama: let eleven-labs.py accept options Options: -v voice, -s savefile, -p (--play) * talk-llama: check installed commands in "speak" Pass "-q" to eleven-labs.py to skip checking whether elevenlabs is installed * talk-llama: pass voice_id again in order to sync talk with talk-llama * talk: sync with talk-llama Passing text_to_speak as a file is safer and more portable cf. https://stackoverflow.com/a/59036879/45375 * talk and talk-llama: get all installed voices in speak.ps1 * talk and talk-llama: get voices from api * talk and talk-llama: add more options to eleven-labs.py and remove DEFAULT_VOICE because it is deprecated (https://www.reddit.com/r/ElevenLabs/comments/1830abt/what_happened_to_bella/) ``` usage: eleven-labs.py [-q] [-l] [-h] [-n NAME \| -v NUMBER] [-f KEY=VAL] [-s FILE \| -p] [TEXTFILE] options: -q, --quick skip checking the required library action: TEXTFILE read the text file (default: stdin) -l, --list show the list of voices and exit -h, --help show this help and exit voice selection: -n NAME, --name NAME get a voice object by name (default: Arnold) -v NUMBER, --voice NUMBER get a voice object by number (see --list) -f KEY=VAL, --filter KEY=VAL filter voices by labels (default: "use case=narration") this option can be used multiple times filtering will be disabled if the first -f has no "=" (e.g. -f "any") output: -s FILE, --save FILE save the TTS to a file (default: audio.mp3) -p, --play play the TTS with ffplay ``` * examples: add speak_with_file() as suggested in the review * talk and talk-llama: ignore to_speak.txt	2024-02-24 09:24:47 +02:00
Abhilash Majumder	a0ddd8392c	whisper : add SYCL support (#1863 ) * add changes from llama upstream * add sycl abstraction * add sycl build * update cmake * add sycl build config * fix bug * fix bug * refactor build * fix bug * update build * call build * use sycl header * add examples * add target * fix typecast in quant.c * readd fp16 and readme * fix quant typecast * add sample * add readme * remove cxx file check	2024-02-23 09:22:24 +02:00
Georgi Gerganov	a2506909b1	talk-llama : sync llama.cpp	2024-02-22 23:30:53 +02:00
Georgi Gerganov	5fdb27ff80	ggml : 32-bit arm compat (#1891 ) * ggml : 32-bit arm compat * ggml : add ggml_vqtbl1q_s8 impl * ggml : cont	2024-02-22 18:31:40 +02:00
Georgi Gerganov	ce411498f6	sync : llama.cpp (ggml/0) ggml-ci	2024-02-22 15:12:36 +02:00
Davidson Francis	c56344b509	main : fix file existence check in main.cpp (#1889 ) In commit `dda4b0e` of PR #1872, I've introduced a check for the existence of files before loading the model. However, I haven't considered the case where whisper.cpp might read from stdin as well, and in such cases, the checks should ignore the "-" argument as it does not represent a regular file. Additionally, this commit removes the usage of 'stat()' in favor of the recently introduced function 'is_file_exist()' in common.cpp from PR #1871. Apologies for the bug introduced in the previous PR and any inconvenience it may have caused.	2024-02-22 15:01:08 +02:00
Georgi Gerganov	59119f4f20	talk-llama : sync llama.cpp	2024-02-20 12:09:57 +02:00
Georgi Gerganov	83afebe872	common : add IQ1_S (ggml/0) ggml-ci	2024-02-19 15:53:25 +02:00
Davidson Francis	dda4b0ed06	main : check if input files exist before proceeding (#1872 ) Until the most recent commit (`3d42463`), the main.cpp sample file does not check whether the input files exist or not. Consequently, the model is loaded first before reporting whether there was a failure or not when processing a file. In environments with HDD, this can take about 50 seconds or more, depending on the loaded model. This commit addresses this issue by checking in advance whether the input files exist or not.	2024-02-19 10:51:26 +02:00
Felix	07d04280be	examples : clean up common code (#1871 ) move some utility functions into common.h	2024-02-19 10:50:15 +02:00
Georgi Gerganov	551529290d	talk-llama : sync llama.cpp	2024-02-12 10:39:58 +02:00
dscripka	a6fb6ab597	examples : added audio_ctx argument to main and server (#1857 ) * added audio_ctx argument to main and server examples * Better default value Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * better default value (again) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-12 09:19:07 +02:00
Georgi Gerganov	f273e66dc6	examples : initialize context params properly (#1852 )	2024-02-11 16:39:12 +02:00
Georgi Gerganov	02b4c52c12	talk-llama : sync llama.cpp	2024-02-10 10:10:59 +02:00
Valentin Gosu	80e8a2ea39	server : allow CORS request with authorization headers (#1850 ) Whisper plugin in Obsidian requires an API key which is then sent as an authorization header. However, the presence of an authorization header requires a CORS Preflight, so both the OPTIONS method and the Access-Control-Allow-Headers: authorization must be handled.	2024-02-09 17:42:41 +02:00
Neuman Vong	19f8048139	whisper.android : how to build with CLBlast (#1809 ) * FetchContent * OpenCL * Documentation and make optional * Specify GGML build options in build.gradle * Use gradle properties * @ggerganov Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * @gpokat --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-09 17:39:05 +02:00
Georgi Gerganov	434b8f3b96	talk-llama : stream response (#1121 )	2024-02-06 19:56:12 +02:00
Georgi Gerganov	7a74e929c8	sync : ggml (#0 )	2024-01-30 21:30:26 +02:00
JacobLinCool	ae5c4f7340	common : fix wav buffer detection (#1819 )	2024-01-30 19:35:08 +02:00
JacobLinCool	baa30bacdb	server : add fields to `verbose_json` response (#1802 ) * server: include additional fields in the verbose_json response as OpenAI does * server: show request examples on home page * server: todo note for compression_ratio and no_speech_prob * server: add simple demo form to the homepage	2024-01-30 14:15:55 +02:00
Georgi Gerganov	e72e4158de	talk-llama : sync llama.cpp	2024-01-28 19:44:10 +02:00
Georgi Gerganov	52cce82493	common : fix input buffer check (#1812 )	2024-01-27 17:33:09 +02:00
Georgi Gerganov	ef3c9ed9eb	talk-llama : sync llama.cpp	2024-01-27 17:24:53 +02:00
Michael Rienstra	4bbb60efce	docs : make model options / model install methods clearer (#1806 ) * Make models more "discoverable" * Clean up code block language identifiers * make 3 options clearer * undo Prettier formatter change * docs: `$` shell prompt, consistently * docs: minor changes	2024-01-26 17:39:54 +02:00
Neuman Vong	d6b9be21d7	whisper.android : return output from benchmarks (#1785 ) Benchmarks are failing because JNI expects a jstring and the benchmarks are missing a return statement (i.e., returning null). The functions actually build a jstring but don't return it, so this seems to have been an oversight. This patch returns the jstring and now the benchmarks run successfully. Fixes #1783.	2024-01-19 16:17:38 +02:00
Ryan Hitchman	c0329acde8	server : implement "verbose_json" format with token details (#1781 ) * examples/server: implement "verbose_json" format with token details. This is intended to mirror the format of openai's Python whisper.transcribe() return values. * server: don't write WAV to a temporary file if not converting * server: use std::lock_guard instead of manual lock/unlock	2024-01-18 22:58:42 +02:00
Georgi Gerganov	1f50a7d29f	sync : llama.cpp	2024-01-17 21:23:33 +02:00
Benjamin Heiniger	f6614155e4	talk-llama : optional wake-up command and audio confirmation (#1765 ) * talk-llama: add optional wake-word detection from command * talk-llama: add optional audio confirmation before generating answer * talk-llama: fix small formatting issue in output * talk-llama.cpp: fix Windows build	2024-01-16 15:52:01 +02:00
Przemysław Pawełczyk	f5f159c320	server : fix building and simplify lib deps on Windows (#1772 ) * make : fix server example building on MSYS2 environments (Windows) It was not working since commit `eff3570f78` when server was introduced. * cmake : simplify server example lib deps on Windows server uses httplib::Server, not httplib::SSLServer, so there is no need to mention cryptographic libraries in target_link_libraries. Winsock (ws2_32) suffices here. Also use plain library names like we use in other places.	2024-01-15 15:48:13 +02:00
Georgi Gerganov	6ebba525f1	talk-llama : sync llama.cpp	2024-01-14 18:08:20 +02:00
Georgi Gerganov	2a5874441d	talk-llama : llama.cpp	2024-01-14 11:06:28 +02:00
Georgi Gerganov	d08445c9ad	sync : ggml	2024-01-14 10:55:18 +02:00
Georgi Gerganov	f001a3b7b6	talk-llama : sync llama.cpp	2024-01-14 00:13:17 +02:00
RhinoDevel	db078a9ba8	talk-llama : add optional CLI arg to set the bot name (#1764 )	2024-01-13 20:51:35 +02:00
james wolf	a13a7da5ad	examples : add python example for transcription (#1744 ) * rebase and add simple python interface * moved python files to examples/python	2024-01-13 19:37:18 +02:00
Georgi Gerganov	40ae0962f4	talk-llama : sync llama.cpp	2024-01-12 22:04:51 +02:00
George Hindle	fbcb52d3cd	server : add more parameters to server api (#1754 ) * feat(server): add more parameters to server api * fix(server): reset params to original parsed values for each request	2024-01-12 13:42:52 +02:00
George Hindle	f7908f9bb8	params : don't compute timestamps when not printing them (#1755 )	2024-01-12 13:24:38 +02:00
Georgi Gerganov	00b7a4be02	talk-llama : sync llama.cpp	2024-01-11 22:10:10 +02:00
Georgi Gerganov	32e71a1861	sync : ggml	2024-01-11 21:54:17 +02:00
Georgi Gerganov	9c857cf280	sync : llama.cpp	2024-01-11 21:50:01 +02:00
RhinoDevel	bcc1658cd0	talk-llama : add optional Piper TTS support (#1749 ) Add commented-out command to optionally use Piper (https://github.com/rhasspy/piper) as text-to-speech solution for the talk-llama example. Piper voices sound almost like real people which is a big improvement (e.g.) from something like espeak.	2024-01-10 16:15:28 +02:00
Emmanuel Schmidbauer	c46886f599	server : add request path option (#1741 )	2024-01-08 22:39:51 +00:00
Georgi Gerganov	29f78392c1	main : add cli option to disable system prints (#1740 )	2024-01-08 16:41:28 +02:00
Georgi Gerganov	022756a872	server : fix server temperature + add temperature_inc (#1729 ) * server : fix server temperature + add temperature_inc * server : change dashes to underscores in parameter names	2024-01-07 13:35:14 +02:00
Georgi Gerganov	3b8c2dff57	talk-llama : sync latest llama.cpp	2024-01-06 17:22:57 +02:00
Georgi Gerganov	ab0a8593c5	whisper.swiftui : add .gitignore	2024-01-04 15:00:27 +02:00
Tamotsu Takahashi	d87de61ae6	ci : build with CLBlast + ggml-opencl use GGML_API (#1576 ) * Build with CLBlast * Declare GGML_API After rebasing, examples/talk-llama failed: "D:\a\whisper.cpp\whisper.cpp\build\ALL_BUILD.vcxproj" (build target) (1) -> "D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj" (default target) (14) -> (Link target) -> llama.obj : error LNK2019: unresolved external symbol ggml_cl_free_data referenced in function "public: __cdecl llama_model::~llama_model(void)" (??1llama_model@@QEAA@XZ) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj] llama.obj : error LNK2019: unresolved external symbol ggml_cl_transform_tensor referenced in function "public: void __cdecl llama_model_loader::load_all_data(struct ggml_context ,void (__cdecl)(float,void ),void ,struct llama_mlock *)" (?load_all_data@llama_model_loader@@QEAAXPEAUggml_context@@P6AXMPEAX@Z1PEAUllama_mlock@@@Z) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj] D:\a\whisper.cpp\whisper.cpp\build\bin\Release\talk-llama.exe : fatal error LNK1120: 2 unresolved externals [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]	2023-12-29 12:23:27 +02:00
Georgi Gerganov	3a5302108d	sync : ggml (ggml_scale, ggml_row_size, etc.) (#1677 ) * sync : ggml * sync : llama.cpp * talk-llama : fix obsolete param * ggml-alloc : fix ggml_tallocr_is_own * talk.wasm : update to new ggml * ggml : fix type punning in ggml_scale * ggml : cuda jetson + arm quants warnings	2023-12-22 17:53:39 +02:00
bobqianic	d2419030b0	examples : Revert CMakeLists.txt for talk-llama (#1669 )	2023-12-21 22:48:52 +02:00
Georgi Gerganov	940de9dbe9	wchess : update README.md	2023-12-14 22:00:47 +02:00
Georgi Gerganov	375585c07c	wchess : update readme	2023-12-14 17:51:14 +02:00
fraxy-v	fd99ece8e3	wchess : whisper assisted chess (#1595 ) * wchess: whisper assisted chess * wchess: fix allowed moves in check * wchess: touchstart, touchend events * wchess: css, disabled button * wchess : html touches * wchess : minor fixes and code style * wchess : bump encoder context to 1280 * wchess : index.html * wchess : fix CI warnings * wchess : add array header * wchess : build static library * wchess : display grammar * wchess : update UX * wchess : add comment * wchess : add README --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-12-14 15:58:26 +02:00
Kreijstal	ec03661b20	cmake : target windows 8 or above for prefetchVirtualMemory in llama-talk (#1617 ) Since we use prefetchVirtualMemory we specify we target win 8 or above, otherwise other compilers will refuse to use the prefetchVirtualMemory api, (I understand you are loading it dynamically but the header definition has this limitation)	2023-12-12 11:35:00 +00:00
Kreijstal	6335933a5b	cmake : Fix bug in httplib.h for mingw (#1615 ) Fix bug in httplib.h for mingw, please see https://github.com/yhirose/cpp-httplib/issues/1669	2023-12-10 17:47:52 +00:00
Georgi Gerganov	9521ba6801	whisper.objc : disable timestamps for real-time transcription	2023-12-08 13:43:37 +02:00
Oleg Sidorov	3163090d89	server : pass max-len argument to the server (#1574 ) This commit fixes the missing parameter binding for max-len between the input arguments and wparams.	2023-12-05 23:01:45 +02:00
Aleksander Andrzejewski	a0ec3fac54	Server : Add support for .vtt format to Whisper server (#1578 ) - The code comes from examples/main - The output mimetype is set to text/vtt Example usage: ```shell curl 127.0.0.1:8080/inference \ -H "Content-Type: multipart/form-data" \ -F file="@samples/jfk.wav" \ -F temperature="0.2" \ -F response-format="vtt" ```	2023-11-30 23:44:26 +00:00
Oleg Sidorov	6559b538e5	server : backport .srt output format (#1565 ) This commit adds a support of .srt format to Whisper server. The code is effectively backported from examples/main. The output mimetype is set to application/x-subrip as per https://en.wikipedia.org/wiki/SubRip. Example usage: curl 127.0.0.1:8080/inference \ -H "Content-Type: multipart/form-data" \ -F file="@<file-path>" \ -F temperature="0.2" \ -F response-format="srt"	2023-11-28 15:42:58 +02:00
Kasumi	6b094b6dfe	server : set default CORS headers to allow all (#1567 )	2023-11-28 11:55:20 +02:00
Hang	641f2f4282	readme : update help (#1560 )	2023-11-27 12:04:08 +02:00
Ismatulla Mansurov	23c21e92eb	server : automatically convert audio on the server (#1539 ) * server : automatically convert audio on the server * server : remove redundant comments * server : automatic conversion refactor * server : update server readme * server : remove unnecessary comments and tabs * server : put back remove calling * server : apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server : check ffmpeg before the server launch * server : fix indentation * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server : fix function typo calling * server : fix function typo calling * server : add warning in readme --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-27 11:28:34 +02:00
ecneladis	a5881d619c	server : add --print-realtime param (#1541 ) * server : add --print-realtime param * Fix duplicate realtime output	2023-11-24 09:35:02 +02:00
Okabintaro	8328d1900f	fix(server): typo in temperature parameter (#1545 ) Also fixed another typo in comments.	2023-11-23 20:59:36 +02:00
Felix	5c7be85fdc	Change temp file name for server application (#1535 ) Avoid issue of removing file if it exists in the current working directory	2023-11-22 09:23:36 +01:00
Felix	9ac88f2b57	Close file after writing in server application (#1533 ) Fix of mistake leaving file open while reading it again as wav	2023-11-21 20:36:10 +01:00
Georgi Gerganov	46f5b6cb08	server : add video to readme	2023-11-21 17:30:43 +02:00
Felix	eff3570f78	server : add a REST Whisper server example with OAI-like API (#1380 ) * Add first draft of server * Added json support and base funcs for server.cpp * Add more user input via api-request also some clean up * Add request params and load post function Also some general clean up * Remove unused function * Add readme * Add exception handlers * Update examples/server/server.cpp * make : add server target * Add magic curl syntax Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-20 21:40:24 +02:00
Georgi Gerganov	a01b2e0971	sdl : fix audio callback (#1523 )	2023-11-20 13:16:38 +02:00
Georgi Gerganov	bebf0da983	quantize : add support for K-quant types	2023-11-16 16:18:24 +02:00
Sam Pullara	7883d1cae4	talk-llama : improve quote and backtick handling (#1364 ) * ISSUE-1329: replace " with ' so it doesn't try to execute code in backticks. * Typo * Update to keep possessives in the output Closes the ' then puts a ' in quotes then reopens the ' to escape the ' characters.	2023-11-16 10:34:05 +02:00
Georgi Gerganov	ccc85b4ff8	talk-llama : enable GPU by default	2023-11-15 21:33:00 +02:00
Georgi Gerganov	bfbaa4dce5	whisper : make large version explicit + fix data size units (#1493 )	2023-11-15 19:42:25 +02:00
Georgi Gerganov	b6c5f49b78	whisper : add batched decoding (#1486 ) * whisper : add whisper_batch * whisper : move kv_self to whisper_state * whisper : full batched decoding support * whisper : fix memory leak in whisper_batch * whisper : fix mem leak again + remove obsolete function * whisper : clear kv cache when using whisper_decode API * whisper : speed-up sampling * whisper : fix decoders initializer * bench : add batch size 5 bench * whisper : add comment about the KV cache size * whisper : add check for max number of decoders * whisper : avoid starting sampling threads with bs=1 * whisper : enable beam-search by default * cuda : sync llama.cpp fixes	2023-11-15 16:12:52 +02:00
Evan Jones	3e5c7feeff	whisper : add grammar-based sampling (#1229 ) * whisper : add grammar-based sampling * build : fix after master merge * command : fix exception when recognizing the command * whisper : fine-tuning grammar functionality * command : grammar-related improvements - option to read grammar from file - add sample grammars for colors and chess moves - fine-tune the performance further * grammars : add assistant + update comments * command : enable beam-search, add "no_timestamps", add "context", add p * whisper : remove comment --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-13 10:51:34 +02:00
rlapray	c23598e4ca	talk-llama : add n_gpu_layers parameter (#1475 )	2023-11-13 10:04:16 +02:00
Tong Li	54a08bde29	examples : add whisper.android.java for compatibility with older Android versions using Java (#1382 ) * save the recorded audio to a file * Alignment -help * Save the correct audio * change to a consistent coding style * Correct typo * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * Correct variable misuse * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * add .bin .cxx/ .gradle/ cmake-build-debug/ to gitignore add whisper.android.java * Added support for older versions of Android of Java * add examples for android java * add README.md for android java * add fullTranscribeWithTime * add toString() method and tests * change return type to void * update to v1.4.1 * add WhisperService * change to whisper_full_get_segment_t1 * add method transcribeDataWithTime * modified toString ``` return "[" + start + " --> " + end + "]:" + sentence; ``` * Optimize code logic * update text view on handle * set max lines * change Chinese to English * Update bindings/java/build.gradle * Update .gitignore * add android.java to github action * change android.java to android_java in build.yml * remove gradle * change jdk to temurin in android_java of CI * change jdk to temurin 11 in android_java of CI * add x to gradlew * set api-level for android_java of CI * Update examples/whisper.android.java/app/src/main/jni/whisper/CMakeLists.txt * add ndk version in build.gradle * remove local.properties * add testFullTranscribeWithTime --------- Co-authored-by: litongmacos <litongjava@qq.com> Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>	2023-11-12 18:31:58 +02:00
Georgi Gerganov	b0502836b8	whisper : add full CUDA and Metal offloading (#1472 ) * whisper : migrate to ggml-backend * whisper : fix logit reading * whisper : fix tensor allocation during load * whisper : fix beam-search with CUDA * whisper : free backends + fix compile warning * whisper : print when CUDA is enabled * whisper : fix CoreML * make : clean-up * talk : fix compile warning * whisper : support ggml_conv with CUDA and Metal (#1473) * ggml : add CUDA support for ggml_conv * whisper : remove ggml_repeat for conv bias + single backend * cuda : fix im2col kernel * metal : add im2col support + mul mat-vec f16 x f16 * bench-all : add q4 models * whisper : clean-up * quantize-all : fix * ggml : im2col opts * whisper : avoid whisper_model_data wrapper * whisper : add note that ggml_mul_mat_pad does not work with CUDA * whisper : factor out graph compute in common function * whisper : fixes * whisper : fix UB with measure buffers * whisper : try to fix the parallel whisper_state functionality (#1479) * whisper : try to fix the parallel whisper_state functionality * whisper : fix multi-state Metal * whisper : free backend instances in whisper_state	2023-11-12 15:31:08 +02:00
Jakub Ráček	37947203e6	talk-llama : add language auto detect (#1467 ) * Add '-l auto' to talk-llama example * Update examples/talk-llama/talk-llama.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-09 19:21:44 +02:00
Sindre Sorhus	d03c60dd7f	ios : add support for Swift Package Manager (#1370 ) * Add support for Swift * Make it build in Xcode * Use the SPM package in the SwiftUI example app	2023-11-07 23:53:31 +02:00
Georgi Gerganov	2cdfc4e025	whisper : add support for large v3 (#1444 ) * whisper : add support for large v3 * bench : fix build + fix go bindings * bench : fix n_mels * models : update readme	2023-11-07 15:30:18 +02:00
Tobrun	973111088b	android : decouple example into a library and app module (#1445 )	2023-11-07 14:27:33 +02:00
Georgi Gerganov	b629d2d4fe	cmake : fix talk-llama build	2023-11-07 11:03:21 +02:00
Jhen-Jie Hong	75dc800d21	talk-llama : fix n_gpu_layers usage again (#1442 )	2023-11-07 10:51:27 +02:00
Jhen-Jie Hong	3989b29a9b	examples : fix n_gpu_layers usage in talk-llama (#1441 )	2023-11-07 01:36:23 +00:00
Jhen-Jie Hong	0463028bc2	whisper : add context param to disable gpu (#1293 ) * whisper : check state->ctx_metal not null * whisper : add whisper_context_params { use_gpu } * whisper : new API with params & deprecate old API * examples : use no-gpu param && whisper_init_from_file_with_params * whisper.objc : enable metal & disable on simulator * whisper.swiftui, metal : enable metal & support load default.metallib * whisper.android : use new API * bindings : use new API * addon.node : fix build & test * bindings : update java binding * bindings : add missing whisper_context_default_params_by_ref WHISPER_API for java * metal : use SWIFTPM_MODULE_BUNDLE for GGML_SWIFT and reuse library load * metal : move bundle var into block * metal : use SWIFT_PACKAGE instead of GGML_SWIFT * style : minor updates --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-06 11:04:24 +02:00
Georgi Gerganov	f96e1c5b78	sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.) (#1422 ) * sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.) * metal : allow env metal variable to override resource path (#1415) * Allow env variable to override resource path * Update ggml-metal.m --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * sync : restore common / main from `master` * sync : restore whisper from `master` * talk-llama : update to latest llama.cpp * ruby : fix build * ggml : fix 32-bit ARM build * ggml : fix MIN / MAX macro collisions + update ios bindings * ggml : fix ifdefs and MIN / MAX again * examples : fix Obj-C and Swift examples * ggml : fix 32-bit ARM compatibility * ggml : one more attempt to fix 32-bit ARM compat * whisper : fix support for larger graphs --------- Co-authored-by: Chris Raethke <codesoda@users.noreply.github.com>	2023-11-03 21:35:05 +02:00
Asad Memon	d445098c8f	talk-llama : move up-to-date demo to top (#1417 )	2023-11-02 18:50:13 +02:00
Georgi Gerganov	74de25158e	talk-llama : add an up-to-date demo video	2023-11-02 15:28:48 +02:00
Aarni Koskela	bce49a260e	examples : Implement JSON output for Token-Level data in main (#1358 )	2023-10-31 19:54:52 +00:00
ai-at-home	dfe4bc6e59	README : Update README in stream to clarify where to compile from (Issue #1400 ) * Clarify doc about where to compile from * Update examples/stream/README.md * Update examples/stream/README.md * Update README.md --------- Co-authored-by: AI @ Home <> Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>	2023-10-29 17:11:13 +00:00
mkiol	940cdb1396	whisper : abort callback improvements (#1345 ) * whisper : initialize abort_callback to null * whisper : add example how to use abort_callback	2023-10-08 17:22:24 +03:00
bobqianic	08fa34882f	examples : move wav_writer from stream.cpp to common.h (#1317 ) * Allocate class on the stack instead of on the heap * Add class wav_writer * fix some minor issues * fix some minor issues * remove potential misleading API	2023-10-03 22:56:11 +03:00
brunofaustino	c76c11e59c	examples: Update the README for Talk - fixing the gpt2 URL (#1334 )	2023-10-01 04:21:32 +08:00
litong	707507ff6d	Examples: Add save audio to file option in stream.cpp (#1310 ) * save the recorded audio to a file * Alignment -help * Save the correct audio * change to a consistent coding style * Correct typo * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * Correct variable misuse * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp --------- Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>	2023-09-22 23:43:21 +08:00
Evgeny Kuznetsov	700f63a806	bench: fix missing include <cstring> (#1303 )	2023-09-18 15:51:10 +08:00
Georgi Gerganov	1ca4041b86	talk-llama : update to latest llama.cpp	2023-09-15 20:06:31 +03:00
Georgi Gerganov	93935980f8	whisper : Metal and ggml-alloc support (#1270 ) * metal : init * whisper : factor out graph builds * whisper : allocate encoder and decoder using ggml-alloc * whisper : ggml-alloc is now supported * whisper : CoreML support ggml-alloc * build : fix ggml-alloc * ios : update submodule * extra : update sync-ggml.sh script to also sync ggml-alloc * ci : see if this is causing the crash * whisper : refactor ggml-alloc init * whisper.android : try to fix build * whisper : initial Metal version * ci : try to debug vmem issue * metal : decoder works on GPU! * metal : add multi-decoder support * ggml : fix ggml_nbytes (probably temp solution) * metal : run "cross" step on the GPU * whisper : remove ggml_repeat in the encoder * whisper : offload the Encoder to Metal * ggml : use simpler ggml_bytes() implementation * ggml-alloc : try to make CI happy by reducing vram to 128GB * whisper : add whisper_allocr to wrap ggml_allocr * whisper : factor out alloc init in a function * cmake : update to support Metal build * whisper : add <functional> header * objc : fix build (no Metal yet) * ios : add Metal support * swiftui : fix build * metal : speed-up KQ multiplication * metal : sync latest llama.cpp kernels * readme : add Metal info * ios : update submodule * coreml : add code to toggle Core ML config (CPU, ANE, GPU) * bench : fix timings by running a pre-heat * bench : start benching the decoder * whisper : add ggml_mul_mat_pad * bench : fix uninitialized vars * whisper : add comment for disabling mul-mat padding * whisper : add description of ggml_mul_mat_pad * whisper : clean-up ggml_mul_mat_pad * metal : remove the "concurrent" flag * bench : variable n_past * ios : update SPM package	2023-09-15 12:18:18 +03:00
Przemysław Pawełczyk	b55b505690	build : do not use _GNU_SOURCE gratuitously (#1129 ) * Do not use _GNU_SOURCE gratuitously. What is needed to build whisper.cpp and examples is availability of stuff defined in The Open Group Base Specifications Issue 6 (https://pubs.opengroup.org/onlinepubs/009695399/) known also as Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions, plus some stuff from BSD that is not specified in POSIX.1. Well, that was true until NUMA support was added recently in ggml, so enable GNU libc extensions for Linux builds to cover that. There is no need to penalize musl libc which simply follows standards. Not having feature test macros in source code gives greater flexibility to those wanting to reuse it in 3rd party app, as they can build it with minimal FTM (_XOPEN_SOURCE=600) or other FTM depending on their needs. It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2. * examples : include SDL headers before other headers Avoid macOS build error when _DARWIN_C_SOURCE is not defined, brought by SDL2 relying on Darwin extension memset_pattern4/8/16 (from string.h). * make : enable BSD extensions for DragonFlyBSD to expose RLIMIT_MEMLOCK * make : use BSD-specific FTMs to enable alloca on BSDs * make : fix OpenBSD build by exposing newer POSIX definitions * cmake : follow recent FTM improvements from Makefile	2023-09-07 12:36:14 +03:00
Georgi Gerganov	2818de21ff	examples : fix build + compile warnings (close #1256 )	2023-09-07 12:33:12 +03:00
Digipom	afa5477d1c	whisper.android : bump gradle plugin and dependencies + a lint pass (#1255 )	2023-09-07 12:15:59 +03:00
Digipom	f990610776	whisper.android : address ARM's big.LITTLE arch by checking cpu info (#1254 ) Addresses https://github.com/ggerganov/whisper.cpp/issues/1248	2023-09-06 18:32:30 +03:00
Georgi Gerganov	59a3d0cb57	ggml : sync (ggml-alloc, GPU, eps, etc.) (#1220 ) * ggml : sync (ggml-alloc, GPU, eps, etc.) * ggml : fix build * wasm : fix build	2023-09-05 13:54:40 +03:00
Jhen-Jie Hong	99d3c105f5	whisper.android : fix cmake multiple libraries build (#1224 ) * whisper.android : fix multiple libraries build * fix flags for default target	2023-08-30 14:45:13 +03:00
AustinMroz	175ffa64ee	examples : vim plugin and LSP server (#1144 ) * Initial proof of concept Vim plugin At present, this is likely only slightly better than feature parity with the existing whisper.nvim Known issues: Trailing whitespace Up to an existing length(5 seconds) of speech may be processed when listening is enabled CPU cycles are spent processing speech even when not listening. Fixing these issues is likely dependent upon future efforts to create a dedicated library instead of wrapping examples/stream * Support $WHISPER_CPP_HOME environment variable A minor misunderstanding of the whisper.nvim implementation resulted in a plugin that was functional, but not a drop in replacement as it should be now. * Initial progress on LSP implementation Libcall is nonviable because the library is immediately freed after a call is made. Further investigation has shown Language Server Protocol as a promising alternative that both simplifies the required logic on the vimscript side and increases the ease with which plugins for other editors could be made in the future. This is a very large undertaking and my progress has slowed substantially. Work is far from being in a usable state, but I wish to keep track of major refactors for organizational purposes. * Rewrite audio windowing of guided transcription One of the defining goals of this venture is allowing consecutive commands to be rattled off without the existing deadzones of the current implementation. * Add unguided_transcription. Cleanup. The unguided transcription implementation heavily borrows from existing example implementations and the guided_transcription logic. A high level pass was done to check that method arguments are accurate to what inputs are actually required. A first attempt at cancellation support was added for record keeping, but will be deleted in a future commit. * Fix compilation. Resolves a large number of compilation errors. No testing has been done yet for execution errors. 
Update Makefile and .gitignore * Functional unguided_transcription * Functional guided_transcription Fix commandset_list being passed by value Properly register the first token of a multitoken command * Minor changes before time fix I've apparently made an awfully major mistake in thinking that unix time was in milliseconds and will be changing all timekeeping code to use standardized methods. In preparation for this is a number of minor bugfixes. Output is manually flushed. An echo method has been added. registerCommandset now wraps the returned index * Swap timekeeping to use std::chrono * Add work in progress lsp backed whisper.vim plugin Current progress blockers are Adding modality awareness to the command processing (specifically, motion prompting) Improving the VAD to be a little more responsive (testing start of activity) * Reworked vim plugin command loop * Fix change inside Multiple bug fixes that, crucially, bring the plugin to the point where a demonstration video is possible Add better echo messaging so whisper_log isn't required Add loading complete message as indicator when listening has started Insert/append are actually included in command sets Some more heavy handed corrections to prevent a double exit when leaving insert mode As a somewhat hacky fix, the very first space is removed when inserting. This cleans up most use cases, but leaves me unsatisfied with the few cases it would be desired. * Forcibly set commandset_index to 0 after subinsert Also remove unnecessary ! to use builtin vim command * Fix upper A minor scope mistake was causing upper'd inputs to be eaten. This was fixed and echoing was slightly improved for clarity. * Fix formatting Corrects indentation to 4 spaces as project standard Slightly better error support for malformed json input * Remove obsolete vim plugin * Add json.hpp library The same library that is used for the llama.cpp server * Minor cleanups add lsp to the make clean directive. 
remove a redundant params definition. reorder whisper.vim logging for subtranscriptions Corrections to unlets (variables of argument scope appear immutable) * Fix indentation. Fallback for subTranscription Indentation has been changed to 4 spaces. Unit testing has been set up, I'm opting not to include it in the repository for now. It however, has revealed a bug in the state logic where a subtranscription can be initiated without having a saved command When this occurs, append is added as a fallback * Move audio polling logic to a subfunction While work on the improved vad will continue, It's grown to be a little out of scope. Instead, a future commit will perform multiple detection passes at substretches of audio when a backlog of audio exists. To facilitate this, and prevent code duplication, the vad code has been moved into a subfunction shared by both the unguided and guided transcription functions. * Test for voice over subchunks if backlog > 1s As the existing VAD implementation only checks for a falling edge at the end of an audio chunk, it fails to detect voice in cases where the recorded voice is only at the beginning of the audio. To ameliorate this, when the timestamp would cause analysis of audio over a second in length, it is split into 1 second length subchunks which are individually tested. Results are promising, but there seems to be a remaining bug with unguided transcription likely related to saving context * Limit the maximum length of audio input. This existing VAD implementation only detects falling edges, which means any gap in the user's speaking is processed for transcription. This simply establishes a constant maximum length depending on the type of transcription. Unguided gets a generous 10 seconds and guided, 2. While quick testing showed that commands are generally around a half a second to a second, limiting commands to an even second resulted in extreme degradation of quality. 
(Seemingly always the same output for a given commandset) * Unguided timestamp tracking, cleanup Unguided transcriptions were not set up to allow for passing of timestamp data forward, but have been corrected. No_context is now always set to false. While conceptually desirable for the quality of guided transcription, It was seemingly responsible for prior command inputs ghosting in unguided transcription. Save and Run are now tracked by command number instead of command text. While command_text was provided for convenience, I wish to keep command index authoritative. This gives greater consistency and potentially allows for end users to rename or even translate the spoken versions of these commands * By default, maintain mode. Previously, mode was reset to 0 unless otherwise set. In addition to causing some edge cases, this didn't mesh well with the existing approach to visual mode. With this change, initial tests indicate visual mode is functional. * Add undo breaks before subtranscriptions Subtranscriptions use undo as a hack to allow for partial responses to be displayed. However, scripts don't cause an undo break mid execution unless specifically instructed to. This meant that multiple unguided transcriptions from a single session would cause a latter to undo a former. This is now fixed and undo should be reasonably usable as a command. * Append instead of insert for new undo sequence When entering and leaving insert mode with `i`, the cursor shifts one column to the left. This is remedied by using append instead of insert for setting these breaks in the undo sequence `-` was also added to the pronunciation dictionary to be pronounced as minus as it was causing a particularly high failure rate. * Move undo sequence breaks to command execution Previously, undo sequence breaks were triggered when there was a command that caused a move to insert mode. 
This caused commands that changed state (like delete or paste) to be bundled together into the last command that caused text to be entered. * Fix repeat. Add space, carrot, dollar commands Repeat (.) wasn't being tracked properly just like undo and is being manually tracked now. While efforts have been made to properly handle spaces, it was particularly finicky to add a single space when one is needed. A special 'space' command has been added to insert a single space and move the cursor after it. Carrot and Dollar commands have been added for start of line and end of line respectively. These are both simple to implement, and just a matter of defining a pronunciation. * Return error on duplicate in commandset Not every command in the commandset tokenizes to a single token. Because of this, it's possible that two commands could resolve to the same single token after subsequent tokens are discarded. This commit adds a simple check for duplicates when a commandset is registered and returns an error if so. Additional code will be required later on the vim side to actually process this error. * Add support for user-defined commands This adds a user definable dictionary from spoken keys to strings or funcrefs. All keys are added to the commandlist and when spoken, trigger the corresponding function. Like "save" and "run", these user commands are only available when the command buffer is empty. * Add readme, update cmake * Add area commandset. Refactor spoken_dict Area commands (inside word, around sentence...) have been given a commandset as considered earlier. Verbose definitions for spoken_dict entries now use dicts instead of lists. This shortens the definition for most keys that require it and scales better with the addition of further commandsets * Add mark, jump. Fix change under visual. Mark (m) and jump (') have been added. 
When a visual selection was executed upon a command that initiated a subtranscription (change) the area of the visual selection is not properly tracked which causes the attempt to stream in partial response to fail. This is solved by disabling partial transcriptions from being streamed when a subtranscription is started while in visual mode. * Accommodate ignorecase. Fix change. From testing on older different versions of vim, the test for distinguishing an 'R' replace all from an 'r' replace could fail if ignorecase was set. The comparison has been changed to explicitly require case matching Change detection has been moved to the execution section as it was missing the change+motion case. * Support registers. Fix README typo There's no logic to prevent doubled register entry, but the functional result is equivalent to if the same key order was typed into vim. A minor typo in the readme. I've mismemorized the mnemonic for 't' as 'to' instead of till., but 'to' can't be used as it's a homophone with '2'. While there was no mistake in the actual logic, it was misleading to use 'to' in the readme.	2023-08-27 21:35:06 +03:00
bobqianic	7e54df414e	whisper : significantly improve the inference quality (#1148 ) * Fix MSVC compile error C3688 Instead of simply using 'add_compile_options(/utf-8)' to address the MSVC compile error C3688, a better approach would be to handle it in a way that prevents passing '/utf-8' to NVCC. * Significantly improve inference quality In the function `log_mel_spectrogram_worker_thread`, there's an array out-of-bounds issue occurring during the calculation of complex number moduli. This issue is causing disruptions in the FFT spectrum, which, in turn, is reducing the quality of inference. * Significantly improve inference quality At last, I've pinpointed the actual source of the problem. Given that the frequency spectrum generated from real input data is symmetrical around the Nyquist frequency, there's a for-loop within the `log_mel_spectrogram_worker_thread` function that attempts to fold the frequency spectrum. Regrettably, a bug within this for-loop is causing a frame shift in the frequency spectrum. The previous attempt to remedy this, which involved using `fft_size + 1` when calculating the modulus, was merely a band-aid solution and did not address the underlying issue. * Addressed a few minor issues Fixed the issue of `fft_out` continuously expanding. Resolved the fallback caused by using 'break' instead of `fft_in[j] = 0`. * Significantly improve inference quality Thanks for your patience everyone. It's finally sorted out. Now, the right side of the FFT spectrum is being flipped over to the left, and the amplitudes at corresponding positions on the left and right are added together (the spectrum on the left needs to be shifted by one position), then the average is calculated. FFT_OUT[0] is no longer discarded, making full use of the limited space to pack in more information. 
* Add annotation and performance improvement * Calculate FFT only when fft_in are not all zero * Some minor performance improvement * Fixed a bug impacting inference quality * The first version after all the analysis is completed. * Fix some bugs and add debug mode * Fixed several bugs * Temporarily disable speed-up mode and add debug mode. * Add debug mode * Disable speed-up mode and add debug mode * Fix CI error (#1) * Fix error * Fix error * Fixed several bugs including [BLANK_AUDIO] problem * Remove Hard-coded hann window * Some Final Fix (#2) * Fix error * Fix error * Probably the last commit * Probably the last commit * whisper : minor coding style changes * whisper : remove debug from public API --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-08-27 19:51:33 +03:00
junkfood	20a80972f4	whisper.android : migrate from ndk-build to CMake (#1204 )	2023-08-27 19:35:16 +03:00
Yunès	7ef3f3837e	main : log probs to text file (#1205 ) * token/probability file generated with -ls * code comment cleaning * main : indentations --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-08-27 19:09:06 +03:00
Jhen-Jie Hong	a4bb2df36a	quantize : fix load vocab crash when len is 128 (#1160 ) * quantize : fix load vocab crash when len is 128 * ci : add quantize job	2023-08-06 11:04:42 +03:00
Duncan McConnell	b948361956	examples : add tinydiarization support for streaming (#1137 )	2023-08-03 11:24:07 +03:00
Hrishikesh Barman	925915ae37	whisper : move progress calculation out of whisper.cpp (#1081 ) Current `progress_step` was hardcoded into whisper.cpp, this resulted in bindings having to access progress only at that step even if progress callback was being called at every iteration. With this change we get greater granularity progress reporting from whisper.cpp and bindings/implementations can define their own progress step.	2023-07-25 18:53:34 +03:00
AustinMroz	97f4a7fee0	examples : add Vim plugin (#1131 ) * Initial proof of concept Vim plugin At present, this is likely only slightly better than feature parity with the existing whisper.nvim Known issues: Trailing whitespace Up to an existing length(5 seconds) of speech may be processed when listening is enabled CPU cycles are spent processing speech even when not listening. Fixing these issues is likely dependent upon future efforts to create a dedicated library instead of wrapping examples/stream * Support $WHISPER_CPP_HOME environment variable A minor misunderstanding of the whisper.nvim implementation resulted in a plugin that was functional, but not a drop in replacement as it should be now.	2023-07-25 18:34:23 +03:00
Georgi Gerganov	4774d2feb0	whisper : minor OpenVINO refactoring (#1037 ) Hopefully I didn't break something - haven't tested	2023-07-04 20:28:27 +03:00
Ryan Metcalfe	62b81276e0	whisper : add OpenVINO support (#1037 ) * openvino: use OpenVINO encoder inference * openvino: add python script for OpenVINO model generation * whisper: Fix 'unused' warnings when OpenVINO isn't enabled in build * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * whisper: Fix compilation error * whisper: revert whisper_get_openvino_path_encoder & whisper_get_openvino_path_cache to non-const func signatures * cmake: Add openvino-encoder as separate object target * whisper : minor style fixes * minor : indentation fixes --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-07-04 15:56:11 +03:00
Akash Mahajan	c8d0f5fe98	whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058 ) * add HuggingFace mirror to download ggml model * support tdrz via simple hack overriding solm tokens * fix incorrect translate/transcribe token_ids that are not static const * add apollo 13 sample for tdrz demo * render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token * extend whisper_segment with speaker_turn_next field and save in json output * fix failing go build * slipped in some python syntax whoops * whisper : finalize tinydiarize support (add flag + fixes) * whisper : tdrz support for word-level timestamps (respect max_len) * java : try to fix tests after adding tdrz_enable flag * main : remove TODO leftover * java : fix params order list after adding "tdrz_enable" * whisper : fix solm and add nosp token * main : print tinydiarize help --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-07-04 09:45:00 +03:00
Georgi Gerganov	fdf58a6668	talk-llama : fix new rope interface	2023-07-03 19:24:01 +03:00
Georgi Gerganov	8ba42095c5	Revert "ggml : do not use _GNU_SOURCE gratuitously (#1027 )" This reverts commit `3f7a03ebe3`.	2023-07-02 21:53:52 +03:00
Georgi Gerganov	d6509bf78d	ggml : sync latest repo (mostly refactoring changes)	2023-07-02 21:46:09 +03:00
Przemysław Pawełczyk	85ed71aaec	talk-llama : fix build on macOS (#1062 ) * talk-llama : use posix_madvise() instead of madvise() derived from BSD sed -i 's,\<madvise\>,posix_&,g;s,\<MADV_,POSIX_&,g' examples/talk-llama/llama-util.h * make : enable Darwin extensions for macOS builds This is an attempt at fixing macOS build error coming from the fact that RLIMIT_MEMLOCK define is not available there without Darwin extensions.	2023-06-28 22:34:50 +03:00
Przemysław Pawełczyk	3f7a03ebe3	ggml : do not use _GNU_SOURCE gratuitously (#1027 ) * Do not use _GNU_SOURCE gratuitously. What is needed to build whisper.cpp and examples is availability of stuff defined in The Open Group Base Specifications Issue 6 (https://pubs.opengroup.org/onlinepubs/009695399/) known also as Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions. There is no need to penalize musl libc which simply follows standards. Not having feature test macros in source code gives greater flexibility to those wanting to reuse it in 3rd party app, as they can build it with minimal FTM (_XOPEN_SOURCE=600) or other FTM depending on their needs. It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2. * examples : include SDL headers before other headers This is an attempt at fixing macOS build error coming from SDL2 relying on Darwin extension memset_pattern4/8/16 coming from Apple's string.h.	2023-06-25 16:34:30 +03:00
Przemysław Pawełczyk	62642bb61c	talk-llama : fix build after ggml sync (#1049 ) sed -i 's,GGML_BACKEND_CUDA,GGML_BACKEND_GPU,g' examples/talk-llama/llama.cpp	2023-06-25 16:13:50 +03:00
Roddur Dasgupta	f11f33f1c0	models : cd statements are quoted to allow spaces in path (#1041 )	2023-06-25 15:27:28 +03:00
Colin	14baf2e7f3	main : add diarization support for all current output types (#1031 ) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-06-25 15:07:57 +03:00
Georgi Gerganov	5feb0dffba	ggml : sync latest ggml lib	2023-06-25 14:30:44 +03:00
faker	598f607e28	main : gracefully exit when invalid params are passed (#1002 ) * Refactor whisper_params_parse to return false on failure * Updated help flag behavior	2023-06-25 13:51:59 +03:00
Nicholas Albion	5b9e59bc07	`speak` scripts for Windows	2023-06-01 22:45:00 +10:00
geniusnut	ce6f747064	whisper.android : support decode wav file has 2 channels (#972 )	2023-05-31 10:13:14 +03:00
DGdev91	5e2b3407ef	examples : update elevenlabs scripts to use official python API (#837 ) * Update elevenlabs example to use ufficial python API * Update elevenlabs example to use official python API	2023-05-24 21:11:01 +03:00
Georgi Gerganov	77eab3fbfe	talk-llama : sync latest llama.cpp (close #922 , close #954 )	2023-05-23 14:04:39 +03:00
Georgi Gerganov	e410cfc3ce	ggml : sync latest ggml repo - new Q4 and Q8 quantization - updated CUDA	2023-05-20 18:56:30 +03:00
Georgi Gerganov	0cb820e0f9	talk-llama : fix build + sync latest llama.cpp	2023-05-14 18:46:42 +03:00
Georgi Gerganov	e693074aa6	ggml : sync latest ggml - New Q4 and Q5 formats - Various improvements	2023-05-14 18:04:23 +03:00
Rich Jones	d652cf12ec	main : fix help for --no-timestamps arg (#908 )	2023-05-14 17:54:57 +03:00
Jhen-Jie Hong	5300117471	whisper.objc : enable Core ML in example & fix segmentation fault (#910 ) * coreml : update encoder header import path * coreml : force objc_arc in whisper-encoder.mm * whisper.objc : create coreml/ group link * whisper.objc : add coreml model link * whisper.objc : update readme * coreml : use -fobjc-arc for coreml/whisper-encoder.mm * ci: create dummy .mlmodelc for pass ios build * whisper.objc : update readme --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-05-14 09:47:02 +03:00
Luis Herrera	4e4d00c67a	talk-llama : only copy used KV cache in get / set state (#890 ) --------- Co-authored-by: ejones <evan.q.jones@gmail.com>	2023-05-08 20:59:21 +03:00
Luis Herrera	0bf680fea2	talk-llama : fix session prompt load (#854 )	2023-05-02 20:05:27 +03:00
CRD716	b806420873	whisper : add detect-language mode (#853 ) * add detectlanguage flag * renaming and help * no idea why that last one didn't commit * run language detection if dl is set * help message fix * various fixes * fix quitting * fix language being english on print	2023-05-02 19:51:52 +03:00
Luis Herrera	be5911a9f3	talk-llama : add --session support (#845 ) * feat: adding session support * readme: adding --session info in examples/talk-llama * llama: adding session fixes * readme: updating session doc * talk-llama: update the value of need_to_save_session to true in order to save the session in the subsequent interaction * talk-llama: adding missing function which updates session_tokens	2023-05-01 20:18:10 +03:00
Georgi Gerganov	7765770f89	whisper : add memory sizes for Q8_0 (close #846 )	2023-05-01 10:03:56 +03:00
Baffin Lee	872a85ae94	whisper.wasm : fix typo in readme (#832 )	2023-05-01 09:28:05 +03:00
Georgi Gerganov	c94c469592	whisper : fix quantize bug (#842 ) * whisper : debug * whisper : fix bug during quantization	2023-04-30 22:50:04 +03:00
Georgi Gerganov	4a7d49af95	examples : fix + refactor Levenshtein distance	2023-04-30 19:12:49 +03:00
Georgi Gerganov	794b162a46	whisper : add integer quantization support (#540 ) * whisper : add integer quantization support * examples : add common-ggml + prepare to add "quantize" tool * whisper : quantization tool ready * whisper : fix F32 support * whisper : try to fix shared lib linkage * wasm : update quantized models to Q5 * bench.wasm : remove "medium" button * bench.wasm : fix custom model button * ggml : add Q5_0 and Q5_1 WASM SIMD * wasm : add quantized models to all WASM examples * wasm : bump DB version number to 2 * talk-llama : update example to latest llama.cpp * node : increase test timeout to 10s * readme : add information for model quantization * wasm : add links to other examples	2023-04-30 18:51:57 +03:00
Georgi Gerganov	5fd1bdd7fc	whisper : add GPU support via cuBLAS (#834 ) * make : add WHISPER_CUBLAS * make : fix CUBLAS build * whisper : disable Flash Attention + adjust memory buffers * whisper : remove old commented code * readme : add cuBLAS instructions * cmake : add WHISPER_CUBLAS option * gitignore : ignore build-cublas	2023-04-30 12:14:33 +03:00
Zollner	5cc17418c7	whisper.android : add some tips (#816 )	2023-04-29 11:00:20 +03:00
Laytan Laats	70567eff23	main : escape quotes in csv output (#815 )	2023-04-23 19:01:59 +03:00
Taras Glek	02ec83c5d5	stream : flush upon finishing inference (#811 )	2023-04-23 17:00:30 +03:00
Philipp Zabel	2bd4b8d577	examples : add missing #include <cstdint> (#798 ) common.cpp uses uint8_t and uint64_t, which are defined in <cstdint>.	2023-04-23 16:52:52 +03:00
Tauseef Mohiuddin	eecf2c3d41	main : update escape_double_quotes() function (#776 ) Updated the escape_double_quotes() function such that the function now escapes both double quotes and backslashes in the input string. Changes Made: - Renamed the function to escape_quotes_and_backslashes - Modified the condition in the first loop to increment the value of 'escaped_length' for both double quotes and backslashes. - Modified the condition in second loop to add a backslash before the current character if it is a double quote or a backslash. Resolves: #769	2023-04-23 16:47:30 +03:00
Georgi Gerganov	f19e23fbd1	whisper : restore decoder temperature fallbacks I disabled this because there were many complaints about slow decoding. The current implementation does not allow batching the decoders when using the "best of" or "beam size" parameters, so the decoding time is proportional to the number of decoders, which is obviously not great. However, now there are even more complaints about wrong decodings and repetition. So, making a compromise by re-enabling the fallbacks, but defaulting to just 2 "best of" / "beam size" decoders. Also, the temperature step is increased from 0.2 to 0.4 - i.e. from maximum of 5 fallbacks to maximum of 2. Also, the stream example now has fallbacks enabled by default. close #471 #477 #508 #612 #719 #731	2023-04-15 16:12:55 +03:00
Bader-eddine Ouaich	2c856fb9e5	whisper : fix potential memory leaks (#740 ) * fix potential memory leak if whisper_init_state failed * fix potential memory leak if gpt2_init failed	2023-04-14 20:05:56 +03:00
Ali Alameh	2c4ac2627d	stream : support language auto-detect (#501 ) #445 fix Language auto-detect "auto" flag does not work using the stream tool	2023-04-14 20:02:18 +03:00
DGdev91	001083a769	talk, talk-llama : add basic example script for eleven-labs tts (#728 )	2023-04-14 19:53:58 +03:00
Maciek	78548dc03f	talk-llama : correct default speak.sh path (#720 ) There is `speak.sh` file in `./examples/talk-llama` as described in README. However `./examples/talk/speak.sh` is used in `talk-llama.cpp`, this commit corrects that.	2023-04-14 19:36:09 +03:00
LittleLoli	66110dafcc	main : add lrc output support (#718 ) * add lrc output support. * fix wrong comment	2023-04-14 19:35:33 +03:00
Georgi Gerganov	514cd04452	whisper : fix bug in prompt processing (close #705 ) Was dereferencing a dangling pointer	2023-04-14 19:17:07 +03:00
Georgi Gerganov	114df388fe	talk-llama : increase context to 2048	2023-04-10 23:09:15 +03:00
Georgi Gerganov	ea36831459	talk-llama : update to latest llama.cpp (improved performance)	2023-04-10 22:59:13 +03:00
InconsolableCellist	5e6e2187a3	talk-llama : fixing usage message for talk-llama (#687 ) "-ml" instead of "-mg" for specifying the llama file	2023-03-30 00:10:20 +03:00
Georgi Gerganov	a7f1f33715	main : add <cstring> header	2023-03-29 23:59:45 +03:00
Lucas Zanek	86ecfc6333	whisper.addon : fixed test to new async implementation (#686 ) * fixed blocking code on node addon * modify the example to run async * format * added logic to see the whisper output * added logic to see the whisper output * removed extra function for more clean example * fixed whisper test to new async implementation	2023-03-29 23:59:17 +03:00
Egor Egorov	0f759f125d	main : fix typo in JSON output (#648 ) * typo in JSON output * fix double quotes in JSON output	2023-03-29 23:26:39 +03:00
Jhen-Jie Hong	eefed45e37	whisper : add initial_prompt param (#645 )	2023-03-29 23:23:23 +03:00
Jonno	21c1e6afc5	whisper.swiftui : update README.md (#682 ) - Slight tweaks to README for improved comprehension.	2023-03-29 23:04:38 +03:00
Evan Jones	a47e812a54	talk-llama : add alpaca support (#668 )	2023-03-29 23:01:14 +03:00
Georgi Gerganov	e5c197d8aa	talk-llama : add discussion link	2023-03-28 10:11:34 +03:00
Georgi Gerganov	7cd1d3bc34	talk-llama : try to fix windows build ..	2023-03-27 22:40:59 +03:00
Georgi Gerganov	4a0deb8b1e	talk-llama : add new example + sync ggml from llama.cpp (#664 ) * talk-llama : talk with LLaMA AI * talk.llama : disable EOS token * talk-llama : add README instructions * ggml : fix build in debug	2023-03-27 21:00:32 +03:00
Lucas Zanek	21165580a1	Nodejs Addon blocking main thread. Implemented Napi::AsyncWorker (#642 ) * fixed blocking code on node addon * modify the example to run async * format * added logic to see the whisper output * added logic to see the whisper output * removed extra function for more clean example	2023-03-22 22:19:22 +02:00
Jhen-Jie Hong	1d749919e3	whisper.objc : add `-O3 -DNDEBUG` in release mode (#640 )	2023-03-22 22:16:04 +02:00
Leo Moll	8fcd1a3b32	main : provide option for creating JSON output (#615 ) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-22 21:37:36 +02:00
Georgi Gerganov	1beff6f66d	models : change HF hosting from dataset to model	2023-03-22 20:44:56 +02:00
Takeshi Inoue	09e9068007	whisper.android : support benchmark for Android example. (#542 ) * whisper.android: Support benchmark for Android example. * whisper.android: update screenshot in README. * update: Make text selectable for copy & paste. * Update whisper.h to restore API name Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * whisper.android: Restore original API names. --------- Co-authored-by: tinoue <tinoue@xevo.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-07 21:36:30 +02:00
venkr	b597c5a779	qual-bench.sh : add quality comparison tool, and update main.cpp to allow using a font file (#569 )	2023-03-06 19:18:11 +02:00
Takeshi Inoue	a3fb6c507f	whisper.android : enable fp16 intrinsics (FP16_VA) which is supported by ARMv8.2 or later. (#572 )	2023-03-06 19:15:57 +02:00
sandrohanea	59fdcd19c8	whisper : add whisper_state + default state on the whisper_context (#523 ) * Added whisper state + default state on the whisper_context * Fixed some examples and bindings * Fixed whisper_n_len (which was used in some binding) and added whisper_n_len_from_state * Fixed comments * whisper : reuse kv_cache_free() and fix compiler warnings * whisper : clean-up the API comments --------- Co-authored-by: Sandro Hanea <sandrohanea@microsoft.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-05 21:42:19 +02:00
Georgi Gerganov	478289a4b3	whisper : set no_context == true by default (#537 )	2023-03-05 20:53:43 +02:00
HY. Kelvin Lee	72af0f5697	main : add csv header (#552 )	2023-03-02 18:32:16 +02:00
Georgi Gerganov	f254e78737	yt-wsp.sh : print help on empty args	2023-02-18 09:42:31 +02:00
conradg	69e6e4644a	main : fix std in input (#503 ) if we don't add this as an explicit check, then we get an "error: unknown argument: -" later on	2023-02-15 19:31:16 +02:00
Georgi Gerganov	09d7d2b68e	examples : refactor in order to reuse code and reduce duplication (#482 ) * examples : refactor common code into a library * examples : refactor common SDL code into a library * make : update Makefile to use common libs * common : fix MSVC M_PI .. * addon.node : link common lib	2023-02-15 19:28:10 +02:00
genevera (she/her)	459753342d	yt-wsp.sh : add unique filename generation (#495 ) Co-authored-by: genevera <genevera@noreply.users.github.com>	2023-02-14 20:12:51 +02:00
Qianhe Chen	ab1916fc59	ci : add node addon test and optimize compilation configuration (#468 ) * addon: implement node addon call whisper through cpp * addon: modify the license to MIT * addon: remove iostream * addon: rename dir * addon: fix typo * addon: configure cmake to build when cmake-js is used * ci: add addon.node test ci * addon: remove build WHISPER_BUILD_TESTS * addon: update build command * addon: add test * addon: add test file * addon: adapt to compile on Windows * addon: fix typo * addon: reuse jfk.wav Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * addon: reuse jfk.wav --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-02-05 15:02:08 +02:00
Matija Pevec	d012b5c7e4	whisper : add "split_on_word" flag when using "max_len" option (#455 ) * Update whisper.cpp * fix: trim function * feat: added flag to split on word * fix: arguments for main	2023-02-05 14:44:23 +02:00
Georgi Gerganov	f3ee4a9673	whisper : reduce memory usage during inference (#431 ) * ggml : add "scratch" buffer support * ggml : support for scratch ring-buffer * ggml : bug fix in ggml_repeat() * ggml : error on scratch buffer overflow * whisper : use scratch buffers during inference (base model only) * whisper : update memory usage for all models * whisper : fix encoder memory usage * whisper : use whisper_context functions instead of macros * whisper : fix FF + remove it from README * ggml : reuse ggml_new_i32 * ggml : refactor the scratch buffer storage * whisper : reorder scratch buffers in the decoder * main : add option to disable temp fallback * Update README.md	2023-02-04 09:45:52 +02:00
Qianhe Chen	c306a7fd89	addon.node : using whisper as a Node.js addon (#443 ) * addon: implement node addon call whisper through cpp * addon: modify the license to MIT * addon: remove iostream * addon: rename dir * addon: fix typo * addon: configure cmake to build when cmake-js is used	2023-02-04 09:10:25 +02:00
Taisei Mima	86ef64a855	wasm : fix typo in helper.js (#459 )	2023-02-04 08:49:15 +02:00
Alex Bacart	3b1960520a	main : CSV format export trimmed spaces fix (#444 ) * Update main.cpp Removed string trimming * Update main.cpp * Update main.cpp * Revert "Update main.cpp" This reverts commit `d8924fdcfe`. * Revert "Update main.cpp" This reverts commit `252e508d85`.	2023-02-04 08:48:35 +02:00
Eric Tendian	47737b2e82	livestream.sh : run main with model arg instead of default (#453 ) Actually utilizes the $model var when calling ./main.	2023-01-27 01:13:31 +02:00
Georgi Gerganov	60337f5306	wasm : check if navigator.storage.estimate() is available Safari does not support it	2023-01-25 20:00:59 +02:00
Ondrej Kokes	11f61cecd6	whisper.wasm : add labels for easier radio selection (#435 )	2023-01-23 20:49:00 +02:00
Georgi Gerganov	f583e2d2f5	main : we had accidentally disabled the temperature fallback .. (#291 )	2023-01-18 22:51:41 +02:00
Georgi Gerganov	206fc93396	whisper.wasm : add small and small.en models	2023-01-18 21:58:55 +02:00
Chia-Hsiang Cheng	472a473fd1	main : add an option to accept optional output filenames (#424 ) * Add an option to accept optional output filenames * Format the file Co-authored-by: Chia-Hsiang Cheng <gary.chiahsiang.cheng@gmail.com>	2023-01-18 21:26:31 +02:00
Georgi Gerganov	9ba66c2fad	stream : fix handling of --step == --length (#416 )	2023-01-18 21:22:52 +02:00
Georgi Gerganov	1ccb8a46a5	bench : fix Windows linkage by moving ggml benches in whisper lib ..	2023-01-18 21:19:50 +02:00
Georgi Gerganov	1290fc6457	bench : add memcpy and ggml_mul_mat benchmarks	2023-01-18 20:31:46 +02:00
Digipom	49b529ba74	whisper.android : add support for loading directly from asset in C (#415 )	2023-01-16 21:57:35 +02:00
Georgi Gerganov	c9aeb33676	stream : fix --keep_context argument to be used correctly (#354 )	2023-01-16 19:37:40 +02:00
Georgi Gerganov	c3991bbb24	Update README.md	2023-01-15 14:08:12 +02:00
Georgi Gerganov	fafd78945d	bench.wasm : print system info	2023-01-15 11:34:03 +02:00
Georgi Gerganov	8de452c18b	Improve decoding (#291 ) * whisper : prepare infra for new decoding strategies * whisper : apply logit filters and compute logprobs * whisper : add whisper_get_logits() * whisper : separate self and cross attention memory Initial step needed for supporting parallel decoders * whisper : move probs_id buffer to whisper_context * whisper : refactor kv cache into separate struct * whisper : move self-attention kv cache to whisper_decoder * whisper : wip decoding parameters + strategies * whisper : wip decoding parameters + strategies (part 2) * whisper : wip decoding parameters + strategies (part 3) * whisper : wip decoding parameters + strategies (part 4) * whisper : fix prompt_past update to not include prompt_init * whisper : temperature + best_of support * whisper : support for compression_ration_threshold We actually use entropy, but it is similar * command : fix example to use logits instead of obsolete probs * whisper : handle empty sequence ranking * whisper : add WHISPER_DEBUG + diagnostic prints + new main args * whisper : minor fixes * whisper : add beam-search support * whisper : bug fix when there no previous context * whisper : add comments * stream : disable temperature fallback For real-time processing, we always want a single decoder running at T=0 * whisper.swiftui : update example - fix paths + add empty folders	2023-01-15 11:29:57 +02:00
Georgi Gerganov	a6dbd9188b	stream : fix a bug that inserted a lot of empty audio at the start The quality was terrible due to this	2023-01-14 19:20:47 +02:00
Syahmi Azhar	1512545149	whisper : add loader class to allow loading from buffer and others (#353 ) * whisper : add loader to allow loading from other than file * whisper : rename whisper_init to whisper_init_from_file * whisper : add whisper_init_from_buffer * android : Delete local.properties * android : load models directly from assets * whisper : adding <stddef.h> needed for size_t + code style Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-01-08 13:03:33 +02:00
Georgi Gerganov	52a3e0c92a	ggml : improve vec_dot_f16 unrolling in flash_attn_f16	2023-01-08 11:41:18 +02:00
Georgi Gerganov	d1ea1220ff	command : clean-up / refactoring / formatting (#383 )	2023-01-07 21:43:24 +02:00
David	9c4a1522f6	command : always-prompt mode (#383 )	2023-01-07 21:41:11 +02:00
Georgi Gerganov	87dd4a3081	talk.wasm : bump memory usage + update whisper.js	2023-01-06 21:13:44 +02:00
Georgi Gerganov	6b351bb669	command : add "guided-mode" video demo in the README.md	2023-01-06 18:59:26 +02:00
Georgi Gerganov	b3c865083e	ci : add emscripten build	2023-01-05 22:10:20 +02:00
Georgi Gerganov	a0d4f8e65c	main : make whisper_print_segment_callback() more readable (close #371 )	2023-01-05 21:45:05 +02:00
Georgi Gerganov	196d738974	minor : close #370 + Makefile build info print change	2023-01-05 21:35:45 +02:00
Andy Maloney	84c6b42e65	cmake : update to 3.19 (#351 ) - update from 3.0 (from 2014) to 3.19 (from 2020) - move some global setting onto the targets (through a cmake include)	2023-01-05 21:22:48 +02:00
Georgi Gerganov	a466c3404d	stream : fix data race on bool + avoid division-by-zero	2023-01-02 10:20:50 +02:00
Andy Maloney	f00509d57c	command : refactor to split command list & general transcription modes (#331 ) This makes it easier to understand if you're looking for only one of the capabilities.	2022-12-31 14:08:57 +02:00
Niels Mayer	a593b932e4	main : add -ocsv, aka --output-csv to output a CSV file Adds -ocsv, aka --output-csv feature to examples/main, which outputs a CSV file containing lines formatted as follows <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, "<transcript-line-including-commas>".	2022-12-29 14:04:00 +02:00
Andy Maloney	331c0bbddc	examples : fix memory leak on failure to load gpt2 model (#323 )	2022-12-23 20:19:07 +02:00
Andy Maloney	dc90efd504	examples : small code cleanups (#322 ) - remove unnecessary initialization of string to "" - use empty() instead of checking size() - use emplace_back instead of push_back - use nullptr instead of NULL - remove unnecessary call to .data() on string - use character overload of find_first_of() instead of passing a string	2022-12-23 20:18:51 +02:00

564 Commits