TBE CPU Autovectorization
FP8/16/32 Autovec Implementation Methods
- template<typename InType, typename IndexType, typename OffsetType, typename OutType> static bool ALWAYS_INLINE EmbeddingSpMDM_autovec (const int64_t block_size, const int64_t output_size, const int64_t index_size, const int64_t data_size, const InType *input, const IndexType *indices, const OffsetType *offsets_or_lengths, const float *weights, bool normalize_by_lengths, OutType *out, const bool is_weight_positional, const bool use_offsets, const int64_t output_stride, const int64_t input_stride, const bool no_bag, const bool is_bf16_out, const bool is_bf16_in)
Autovectorized version of method EmbeddingSpMDM_ref for FP32 weight type.
- Template Parameters:
  - InType – input data type (uint8_t is used)
  - IndexType – index data type (int64_t is used)
  - OffsetType – offset data type (int32_t is used)
  - OutType – output data type (float is used)
- Parameters:
  - block_size – Number of elements in a block (int64_t)
  - output_size – Number of elements in output (int64_t)
  - index_size – Number of elements in index (int64_t)
  - data_size – Number of elements in data (int64_t)
  - input – Address of input (InType*)
  - indices – Address of index (IndexType*)
  - offsets_or_lengths – Address of offset (OffsetType*)
  - weights – Weights of sum; optional, can be null for non-weighted sum (float*)
  - normalize_by_lengths – Whether to normalize by lengths (bool)
  - out – Address of output (OutType*)
  - is_weight_positional – If true, weight is positional; set to false for FP32 autovec implementation (bool)
  - use_offsets – If true, use offsets instead of lengths; set to true for FP32 autovec implementation (bool)
  - output_stride – If -1, output_stride is same as block_size; set to -1 for FP32 autovec implementation (int64_t)
  - input_stride – If -1, input_stride is same as block_size; set to -1 for FP32 autovec implementation (int64_t)
  - scale_bias_last – If true, scale and bias appear at end of each row; set to true for FP32 autovec implementation (bool)
  - no_bag – If true, no embedding bag; set to false for FP32 autovec implementation (bool)
  - is_bf16_out – If true, output is BFLOAT16 type; set to false for FP32 autovec implementation (bool)
  - is_bf16_in – If true, input is BFLOAT16 type; set to false for FP32 autovec implementation (bool)
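To make the role of these parameters concrete, here is a minimal scalar sketch of the pooled sum they describe. It is not the FBGEMM implementation. The name pooled_sum_sketch is hypothetical, the input is taken as float directly rather than as raw bytes, and the code assumes use_offsets is true, weights are per-index (is_weight_positional is false), both strides equal block_size, and offsets holds output_size + 1 entries.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar sketch of a pooled embedding sum; not FBGEMM code.
// Assumptions: offsets has output_size + 1 entries, weights (if non-null)
// has one entry per index, and rows are tightly packed (stride == block_size).
inline bool pooled_sum_sketch(
    int64_t block_size,
    int64_t output_size,
    int64_t index_size,
    int64_t data_size,
    const float* input,
    const int64_t* indices,
    const int32_t* offsets,
    const float* weights,
    bool normalize_by_lengths,
    float* out) {
  assert(block_size > 0);
  for (int64_t bag = 0; bag < output_size; ++bag) {
    const int64_t begin = offsets[bag];
    const int64_t end = offsets[bag + 1];
    // Reject malformed offsets instead of reading out of bounds.
    if (begin < 0 || end < begin || end > index_size) {
      return false;
    }
    float* out_row = out + bag * block_size;
    for (int64_t j = 0; j < block_size; ++j) {
      out_row[j] = 0.0f;
    }
    for (int64_t i = begin; i < end; ++i) {
      const int64_t row = indices[i];
      // Reject indices that fall outside the embedding table.
      if (row < 0 || row >= data_size) {
        return false;
      }
      const float w = weights ? weights[i] : 1.0f;
      const float* in_row = input + row * block_size;
      // Simple inner loop over the block; the kind of loop a compiler
      // can typically autovectorize.
      for (int64_t j = 0; j < block_size; ++j) {
        out_row[j] += w * in_row[j];
      }
    }
    if (normalize_by_lengths && end > begin) {
      const float scale = 1.0f / static_cast<float>(end - begin);
      for (int64_t j = 0; j < block_size; ++j) {
        out_row[j] *= scale;
      }
    }
  }
  return true;
}
```

The boolean return mirrors the documented signature; in this sketch it signals that every offset and index was in range.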
- template<typename IndexType, typename OffsetType, typename OutType> static bool ALWAYS_INLINE EmbeddingSpMDMFP8_autovec (const int64_t block_size, const int64_t output_size, const int64_t index_size, const int64_t data_size, const uint8_t *input, const IndexType *indices, const OffsetType *offsets_or_lengths, const float *weights, bool normalize_by_lengths, OutType *out, const bool is_weight_positional, const bool use_offsets, const int64_t output_stride, const int64_t input_stride, const int exponent_bits, const int exponent_bias, const bool is_bf16_out)
Autovectorized version of method EmbeddingSpMDM_ref for FP8 weight type.
- Template Parameters:
  - IndexType – index data type (int64_t is used)
  - OffsetType – offset data type (int32_t is used)
  - OutType – output data type (float is used)
- Parameters:
  - block_size – Number of elements in a block (int64_t)
  - output_size – Number of elements in output (int64_t)
  - index_size – Number of elements in index (int64_t)
  - data_size – Number of elements in data (int64_t)
  - input – Address of input (uint8_t*)
  - indices – Address of index (IndexType*)
  - offsets_or_lengths – Address of offset (OffsetType*)
  - weights – Weights of sum; optional, can be null for non-weighted sum (float*)
  - normalize_by_lengths – Whether to normalize by lengths (bool)
  - out – Address of output (OutType*)
  - is_weight_positional – If true, weight is positional; set to false for FP8 autovec implementation (bool)
  - use_offsets – If true, use offsets instead of lengths; set to true for FP8 autovec implementation (bool)
  - output_stride – If -1, output_stride is same as block_size; set to -1 for FP8 autovec implementation (int64_t)
  - input_stride – If -1, input_stride is same as block_size (int64_t)
  - exponent_bits – Bits to use in exponent (int)
  - exponent_bias – Bias to use in exponent (int)
  - is_bf16_out – If true, output is BFLOAT16 type; set to false for FP8 autovec implementation (bool)
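The exponent_bits and exponent_bias parameters determine how each input byte maps to a float. The sketch below shows one plausible decoding. It assumes a generic layout of 1 sign bit, exponent_bits exponent bits, and the remaining low bits as mantissa, with subnormals when the exponent field is zero and no special handling of infinities or NaN. The helper name decode_fp8_sketch is hypothetical, and the exact bit layout FBGEMM uses may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper; assumes layout [sign:1][exponent:exponent_bits][mantissa:rest].
// Not FBGEMM code, and no Inf/NaN encodings are recognized.
inline float decode_fp8_sketch(uint8_t byte, int exponent_bits, int exponent_bias) {
  assert(exponent_bits >= 1 && exponent_bits <= 6);
  const int mantissa_bits = 7 - exponent_bits;
  const int sign = (byte >> 7) & 0x1;
  const int exponent = (byte >> mantissa_bits) & ((1 << exponent_bits) - 1);
  const int mantissa = byte & ((1 << mantissa_bits) - 1);

  float magnitude;
  if (exponent == 0) {
    // Subnormal: no implicit leading one, fixed exponent of (1 - bias).
    magnitude = std::ldexp(static_cast<float>(mantissa),
                           1 - exponent_bias - mantissa_bits);
  } else {
    // Normal: implicit leading one is added to the mantissa.
    magnitude = std::ldexp(static_cast<float>(mantissa + (1 << mantissa_bits)),
                           exponent - exponent_bias - mantissa_bits);
  }
  return sign ? -magnitude : magnitude;
}
```

Under these assumptions, with exponent_bits = 4 and exponent_bias = 7 the byte 0x38 decodes to 1.0 and 0x3C to 1.5.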