whisper.cpp

Commit Graph

Author	SHA1	Message	Date
Georgi Gerganov	267e15a46d	cuda : avoid async allocs in CUDA mel code	2024-06-12 09:52:15 +03:00
Georgi Gerganov	420b6abc54	cuda : fix HIPBLAS build (#2234)	2024-06-11 19:14:38 +03:00
Georgi Gerganov	99804b0f3e	cuda : fix bounds check for src0 rows in MMVQ kernel (#2231) * cuda : fix bounds check for src0 rows in MMVQ kernel * Update ggml-cuda/mmvq.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2024-06-11 17:39:01 +03:00
Georgi Gerganov	c55964c956	ci : fix CUDA builds (#2232)	2024-06-11 17:21:30 +03:00
Borislav Stanimirov	20c542c713	whisper : auto-grow working areas for mel_calc_cuda (#2227) * whisper : auto-grow working areas for mel_calc_cuda, fixes #2226 * whisper : only calculate mel spectrogram on GPU if audio is <= 5 min	2024-06-10 21:51:32 +03:00
Georgi Gerganov	c2bdb960cd	whisper : free whisper_mel instances (#2220)	2024-06-10 11:00:15 +03:00
Georgi Gerganov	87acd6d629	whisper : whisper_state/backend fixes (#2217) * whisper : fixes * ci : WHISPER_CUBLAS -> WHISPER_CUDA	2024-06-06 18:51:36 +03:00
Borislav Stanimirov	f842d31171	whisper : calculate mel spectrogram directly into a ggml_tensor (#2208) * whisper : calculate mel spectrogram directly into a ggml_tensor * whisper : remove unused temp buffer from state * whisper : fix not initializing wstate.embd_enc	2024-06-06 16:20:46 +03:00
Borislav Stanimirov	ffef323c4c	whisper : add CUDA-specific computation mel spectrograms (#2206) * whisper : use polymorphic class to calculate mel spectrogram * whisper : add cuda-specific mel spectrogram calculation * whisper : conditionally compile cufftGetErrorString to avoid warnings * build : add new files to makefile * ruby : add new files to conf script * build : fix typo in makefile * whisper : suppress cub warning for deprecated C++ std in whisper-mel-cuda	2024-06-04 09:32:23 +03:00
Borislav Stanimirov	af5833e298	whisper : remove `speed_up` and `phase_vocoder` functions (#2198) whisper : fix cast warning * whisper : remove phase_vocoder functions, ref #2195 * whisper : remove speed_up from whisper_full_params, closes #2195	2024-05-31 11:37:29 +03:00
Martin Delille	b87494bb8f	readme : add conan badge (#2196) * Add conan badge * Fix markdown formating	2024-05-30 15:43:28 +03:00
Carlos Zoido	ad130431aa	readme : add install instructions for Conan (#2189)	2024-05-30 15:06:15 +03:00
Borislav Stanimirov	e130b66642	whisper: use global cache for sin/cos vals and Hann window (#2194) - also rename Hanning to Hann as it's named after Julius von Hann as per Wikipedia	2024-05-29 19:09:21 +03:00
Georgi Gerganov	c7b6988678	release : v1.6.2	2024-05-27 10:35:09 +03:00
Georgi Gerganov	05042a782d	Revert "whisper : remove extra backend instance (huh?)" (#2182) This reverts commit `4caa64b73e`.	2024-05-27 10:20:25 +03:00
Daniel Valdivia	a7dc2aab16	server : fix typo (#2181) A simple comment typo, PR can be dismissed	2024-05-25 10:46:22 +03:00
Todd	22d46b7ba4	ruby : update bindings (#2154) * update library files * update whispercpp * not needed for gem	2024-05-22 23:02:52 +03:00
Georgi Gerganov	c10db6ea28	release : v1.6.1	2024-05-21 18:44:37 +03:00
William Tambellini	1b51fdf170	examples : add support for decoding input with ffmpeg (Linux) (#2133) - search for ffmpeg libs/headers at cmake time - added ffmpeg-transcode.cpp into libcommon if ffmpeg on - hooked ffmpeg trancoding in common read_wav(...) - passed test: ./main -m ggml-base.en.bin -f samples/jfk.mp3	2024-05-21 18:31:41 +03:00
Pedro Probst	adee3f9c1f	node : add flash_attn param (#2170)	2024-05-20 09:08:48 +03:00
Tamotsu Takahashi	4798be1f9a	ci: Update build.yml to suppress warnings about node.js versions (#2166) * Update actions to suppress warnings about old node.js https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/ * Update actions/upload-artifact, specify android cmdline-tools-version * Use java 20 gradle 8.1 complains against 21 https://docs.gradle.org/current/userguide/compatibility.html	2024-05-19 11:49:26 +03:00
Georgi Gerganov	08981d1bac	release : v1.6.0	2024-05-15 09:59:48 +03:00
Georgi Gerganov	7094ea5e75	whisper : use flash attention (#2152) * whisper : use flash attention in the encoder * whisper : add kv_pad * whisper : remove extra backend instance (huh?) * whisper : use FA for cross-attention * whisper : use FA for self-attention * whisper : simplify encoder FA * whisper : add flash_attn runtime parameter * scripts : add bench log * scripts : add M1 Pro bench log	2024-05-15 09:38:19 +03:00
petterreinholdtsen	9d5771ae43	talk-llama : reject runs without required arguments (#2153) * Extended talk-llama example to reject runs without required arguments. Print warning and exit if models are not specified on the command line. * Update examples/talk-llama/talk-llama.cpp * Update examples/talk-llama/talk-llama.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-14 21:32:41 +03:00
Georgi Gerganov	f56b8305c4	sync : ggml	2024-05-14 19:16:32 +03:00
Georgi Gerganov	1056ad762c	metal : support FA without mask + add asserts (llama/7278) * ggml : fa without mask + add asserts ggml-ci * metal : support non-contiguous KV ggml-ci	2024-05-14 19:16:29 +03:00
Radoslav Gerganov	c451080c8b	ggml : add RPC backend (llama/6829) * ggml : add RPC backend The RPC backend proxies all operations to a remote server which runs a regular backend (CPU, CUDA, Metal, etc). * set TCP_NODELAY * add CI workflows * Address review comments * fix warning * implement llama_max_devices() for RPC * Address review comments * Address review comments * wrap sockfd into a struct * implement get_alignment and get_max_size * add get_device_memory * fix warning * win32 support * add README * readme : trim trailing whitespace * Address review comments * win32 fix * Address review comments * fix compile warnings on macos	2024-05-14 19:16:29 +03:00
Neo Zhang	8e7c22fbdb	rm wait() (llama/7233)	2024-05-14 19:16:29 +03:00
Johannes Gäßler	e57e95eb0d	CUDA: add FP32 FlashAttention vector kernel (llama/7188) * CUDA: add FP32 FlashAttention vector kernel * fixup! CUDA: add FP32 FlashAttention vector kernel * fixup! fixup! CUDA: add FP32 FlashAttention vector kernel * fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel	2024-05-14 19:16:29 +03:00
Georgi Gerganov	130f43e4b8	scripts : sync ggml-rpc	2024-05-14 19:15:35 +03:00
thewh1teagle	d8356a1cc2	whisper : fix model path encoding in windows (#2086) * fix: model path encoding in windows * fix: convert model path to wide string only for MSVC compiler	2024-05-14 09:43:41 +03:00
Georgi Gerganov	4ef8d9f44e	server : return utf-8 (#2138)	2024-05-13 15:33:46 +03:00
Pedro Probst	3928dbd206	node : add audio_ctx and audio buffer params (#2123) * node : add audio_ctx param * node : support passing audio buffer directly * node : parse audio_ctx in index.js --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-13 15:22:23 +03:00
aldorof	2ced6f0742	cmake : fix HIP/ROCm build (#2102)	2024-05-13 15:18:43 +03:00
valVk	30f73109b8	node : add additional params (#2000) * Add additional params to addon.node * Add comma_in_time as parameter * Fix tests	2024-05-13 15:15:43 +03:00
Mark Karpelès	17fa62d3d3	js : remove un-needed request header from fetchRemote (#2119)	2024-05-13 15:13:19 +03:00
Georgi Gerganov	1da5edcde0	cmake : fix metal embed sources path (#2110)	2024-05-13 15:09:59 +03:00
Daniel Ziegenberg	0bb05b113d	main : dont print timings with --no-prints (#2108) Signed-off-by: Daniel Ziegenberg <daniel@ziegenberg.at>	2024-05-13 15:00:19 +03:00
Daniel Ziegenberg	f141b2b938	main : add options for temperature control (#2088) Add two options: ``` -tp, --temperature N [0.00 ] The sampling temperature, between 0 and 1 -tpi, --temperature-inc N [0.20 ] The increment of temperature, between 0 and 1 ``` The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit. Signed-off-by: Daniel Ziegenberg <daniel@ziegenberg.at>	2024-05-13 14:59:44 +03:00
Georgi Gerganov	2b434c449e	whisper : switch back to F32 mask (#0)	2024-05-13 14:43:43 +03:00
zhangjixiong	e93081f83f	whisper.android : update example, add field to print timestamp (#2072)	2024-05-13 14:30:03 +03:00
Xingchen Song(宋星辰)	b6bbce4ae9	cmake : fix json INTERFACE library (#2069)	2024-05-13 14:29:39 +03:00
mashizora	7705dc52da	main : fix double quote escaping in csv output (#2090)	2024-05-13 11:55:32 +03:00
Georgi Gerganov	e6acaf9d91	metal : tune soft_max number of threads (#0)	2024-05-13 11:02:26 +03:00
Georgi Gerganov	2c81e6fd51	whisper : remove old flash attn code (#0)	2024-05-13 11:02:26 +03:00
Georgi Gerganov	9506267ce5	ggml : try fix ppc64 (#0)	2024-05-13 11:02:26 +03:00
Georgi Gerganov	fbeb80b5f0	ggml : remove oboslete alibi code (skipme) (#0)	2024-05-13 11:02:26 +03:00
Georgi Gerganov	3fa7d29876	talk-llama : sync llama.cpp	2024-05-13 11:02:26 +03:00
Georgi Gerganov	fe179ae0cc	sync : ggml	2024-05-13 11:02:26 +03:00
Hong Bo PENG	40aeeeecc4	ggml : optimize for ppc64le using VSX intrinsics (ggml/784) * optimize for ppc64le using VSX intrinsics * 1. code clean up by removing comments about overflow concern. 2. fix typo in suffix of scaling. * Continue to fix typo in suffix of scaling for QK_K <> 256 --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-13 11:02:26 +03:00

1307 Commits