The pm_runtime_enable will increase power disable depth.
Thus a pairing decrement is needed on the error handling
path to keep it balanced according to context.
Fixes: f7b2b5dd6a ("crypto: omap-aes - add error check for pm_runtime_get_sync")
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The patch 'irqchip/gic-v3-its: Balance initial LPI affinity across CPUs'
set the IRQ to an uncentain CPU. If an IRQ is bound to the CPU used by the
thread which is sending request, the throughput will be just half.
So allocate a 'work_queue' and set as 'WQ_UNBOUND' to do the back half work
on some different CPUS.
Signed-off-by: Yang Shen <shenyang39@huawei.com>
Reviewed-by: Zaibo Xu <xuzaibo@huawei.com>
Reviewed-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch moves the curve25519_selftest into curve25519.h so
we don't get a warning from gcc complaining about a missing
prototype.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2,
and <crypto/sha3.h> contains declarations for SHA-3.
This organization is inconsistent, but more importantly SHA-1 is no
longer considered to be cryptographically secure. So to the extent
possible, SHA-1 shouldn't be grouped together with any of the other SHA
versions, and usage of it should be phased out.
Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and
<crypto/sha2.h>, and make everyone explicitly specify whether they want
the declarations for SHA-1, SHA-2, or both.
This avoids making the SHA-1 declarations visible to files that don't
want anything to do with SHA-1. It also prepares for potentially moving
sha1.h into a new insecure/ or dangerous/ directory.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Clang warns:
drivers/crypto/amcc/crypto4xx_core.c:921:60: warning: operator '?:' has
lower precedence than '|'; '|' will be evaluated first
[-Wbitwise-conditional-parentheses]
(crypto_tfm_alg_type(req->tfm) == CRYPTO_ALG_TYPE_AEAD) ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
drivers/crypto/amcc/crypto4xx_core.c:921:60: note: place parentheses
around the '|' expression to silence this warning
(crypto_tfm_alg_type(req->tfm) == CRYPTO_ALG_TYPE_AEAD) ?
^
)
drivers/crypto/amcc/crypto4xx_core.c:921:60: note: place parentheses
around the '?:' expression to evaluate it first
(crypto_tfm_alg_type(req->tfm) == CRYPTO_ALG_TYPE_AEAD) ?
^
(
1 warning generated.
It looks like this should have been a logical OR so that
PD_CTL_HASH_FINAL gets added to the w bitmask if crypto_tfm_alg_type
is either CRYPTO_ALG_TYPE_AHASH or CRYPTO_ALG_TYPE_AEAD. Change the
operator so that everything works properly.
Fixes: 4b5b79998a ("crypto: crypto4xx - fix stalls under heavy load")
Link: https://github.com/ClangBuiltLinux/linux/issues/1198
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Wang Qing reports that IS_ERR_OR_NULL() should be matched with
PTR_ERR_OR_ZERO(), not PTR_ERR().
As it turns out, the error path always returns an error code,
i.e. NULL is never returned.
Update the code accordingly - s/IS_ERR_OR_NULL/IS_ERR.
Reported-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Instead of copying the calculated authentication tag to memory and
calling crypto_memneq() to verify it, use vector bytewise compare and
min across vector instructions to decide whether the tag is valid. This
is more efficient, and given that the tag is only transiently held in a
NEON register, it is also safer, given that calculated tags for failed
decryptions should be withheld.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Fix aead auth setting key process error. if use soft shash function, driver
need to use digest size replace of the user input key length.
Signed-off-by: Kai Ye <yekai13@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Based on lessons learnt from optimizing the 32-bit version of this driver,
we can simplify the arm64 version considerably, by reordering the final
two stores when the last block is not a multiple of 64 bytes. This removes
the need to use permutation instructions to calculate the elements that are
clobbered by the final overlapping store, given that the store of the
penultimate block now follows it, and that one carries the correct values
for those elements already.
While at it, simplify the overlapping loads as well, by calculating the
address of the final overlapping load upfront, and switching to this
address for every load that would otherwise extend past the end of the
source buffer.
There is no impact on performance, but the resulting code is substantially
smaller and easier to follow.
Cc: Eric Biggers <ebiggers@google.com>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Introduce new API, qat_uclo_set_cfg_ae_mask(), to allow the load of the
firmware image to a subset of Acceleration Engines (AEs). This is
required by the next generation of QAT devices to be able to load
different firmware images to the device.
Signed-off-by: Jack Xu <jack.xu@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Introduce the chip info structure which contains device specific
information. The initialization path has been split between common and
hardware specific in order to facilitate the introduction of the next
generation hardware.
Signed-off-by: Jack Xu <jack.xu@intel.com>
Co-developed-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Signed-off-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Move the definition of ICP_QAT_AE_OFFSET, ICP_QAT_CAP_OFFSET,
LOCAL_TO_XFER_REG_OFFSET and ICP_QAT_EP_OFFSET from qat_hal.c to
icp_qat_hal.h to avoid the definition of generation specific constants
in qat_hal.c.
Signed-off-by: Jack Xu <jack.xu@intel.com>
Co-developed-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Signed-off-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Change the API and the behaviour of the qat_hal_start() function.
With this change, the function starts under the hood all acceleration
engines (AEs) and there is no longer need to call it for each engine.
Signed-off-by: Jack Xu <jack.xu@intel.com>
Co-developed-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Signed-off-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Improve the way micro instructions (FW code) are uploaded to Accelerator
Engines (AEs). If code starts at PC zero (absolute addressing), read
uwords with no relative address. Otherwise, use relative addressing to
the page region.
Signed-off-by: Jack Xu <jack.xu@intel.com>
Co-developed-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Signed-off-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Do not mask the AE number with the AE mask when accessing the AE local
CSRs. Bit 12 of the local CSR address is the start of AE number so just
take out the AE mask here.
Signed-off-by: Jack Xu <jack.xu@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The return value of qat_hal_rd_ae_csr() is always a CSR value and never
a status and should not be stored in the status variable of
qat_hal_put_rel_rd_xfer().
This removes the assignment as qat_hal_rd_ae_csr() is not expected to
fail.
A more comprehensive handling of the theoretical corner case which could
result in a fail will be submitted in a separate patch.
Fixes: 8c9478a400 ("crypto: qat - reduce stack size with KASAN")
Signed-off-by: Jack Xu <jack.xu@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Fiona Trahe <fiona.trahe@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Implement infrastructure for the Multiple Object File (MOF) format
in the firmware loader. This will allow to load a specific firmware
image contained inside an MOF file.
This patch is based on earlier work done by Pingchao Yang.
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Jack Xu <jack.xu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes all the sparse warnings in cavium/nitrox:
- Fix endianness warnings by adding the correct markers to unions.
- Add missing header inclusions for prototypes.
- Move nitrox_sriov_configure prototype into the isr header file.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The current NEON based ChaCha implementation for ARM is optimized for
multiples of 4x the ChaCha block size (64 bytes). This makes sense for
block encryption, but given that ChaCha is also often used in the
context of networking, it makes sense to consider arbitrary length
inputs as well.
For example, WireGuard typically uses 1420 byte packets, and performing
ChaCha encryption involves 5 invocations of chacha_4block_xor_neon()
and 3 invocations of chacha_block_xor_neon(), where the last one also
involves a memcpy() using a buffer on the stack to process the final
chunk of 1420 % 64 == 12 bytes.
Let's optimize for this case as well, by letting chacha_4block_xor_neon()
deal with any input size between 64 and 256 bytes, using NEON permutation
instructions and overlapping loads and stores. This way, the 140 byte
tail of a 1420 byte input buffer can simply be processed in one go.
This results in the following performance improvements for 1420 byte
blocks, without significant impact on power-of-2 input sizes. (Note
that Raspberry Pi is widely used in combination with a 32-bit kernel,
even though the core is 64-bit capable)
Cortex-A8 (BeagleBone) : 7%
Cortex-A15 (Calxeda Midway) : 21%
Cortex-A53 (Raspberry Pi 3) : 3%
Cortex-A72 (Raspberry Pi 4) : 19%
Cc: Eric Biggers <ebiggers@google.com>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The extra tests in the manager actually require the manager to be
selected too. Otherwise the linker gives errors like:
ld: arch/x86/crypto/chacha_glue.o: in function `chacha_simd_stream_xor':
chacha_glue.c:(.text+0x422): undefined reference to `crypto_simd_disabled_for_test'
Fixes: 2343d1529a ("crypto: Kconfig - allow tests to be disabled when manager is disabled")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
At the time xts fallback tfm allocation fails the device struct
hasn't been enabled yet in the caam xts tfm's private context.
Fix this by using the device struct from xts algorithm's private context
or, when not available, by replacing dev_err with pr_err.
Fixes: 9d9b14dbe0 ("crypto: caam/jr - add fallback for XTS with more than 8B IV")
Fixes: 83e8aa9121 ("crypto: caam/qi - add fallback for XTS with more than 8B IV")
Fixes: 36e2d7cfdc ("crypto: caam/qi2 - add fallback for XTS with more than 8B IV")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Reviewed-by: Iuliana Prodan <iuliana.prodan@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>