Module core::arch::x86_64 1.27.0
Platform-specific intrinsics for the x86_64 platform.
See the module documentation for more details.
Structs
__m128bh  Experimental 128-bit wide set of eight ‘u16’ types, x86-specific 
__m256bh  Experimental 256-bit wide set of 16 ‘u16’ types, x86-specific 
__m512  Experimental 512-bit wide set of sixteen 
__m512bh  Experimental 512-bit wide set of 32 ‘u16’ types, x86-specific 
__m512d  Experimental 512-bit wide set of eight 
__m512i  Experimental 512-bit wide integer vector type, x86-specific 
CpuidResult  Result of the 
__m128  128-bit wide set of four 
__m128d  128-bit wide set of two 
__m128i  128-bit wide integer vector type, x86-specific 
__m256  256-bit wide set of eight 
__m256d  256-bit wide set of four 
__m256i  256-bit wide integer vector type, x86-specific 
Constants
_MM_CMPINT_EQ  Experimental Equal 
_MM_CMPINT_FALSE  Experimental False 
_MM_CMPINT_LE  Experimental Less-than-or-equal 
_MM_CMPINT_LT  Experimental Less-than 
_MM_CMPINT_NE  Experimental Not-equal 
_MM_CMPINT_NLE  Experimental Not less-than-or-equal 
_MM_CMPINT_NLT  Experimental Not less-than 
_MM_CMPINT_TRUE  Experimental True 
_MM_MANT_NORM_1_2  Experimental interval [1, 2) 
_MM_MANT_NORM_P5_1  Experimental interval [0.5, 1) 
_MM_MANT_NORM_P5_2  Experimental interval [0.5, 2) 
_MM_MANT_NORM_P75_1P5  Experimental interval [0.75, 1.5) 
_MM_MANT_SIGN_NAN  Experimental DEST = NaN if sign(SRC) = 1 
_MM_MANT_SIGN_SRC  Experimental sign = sign(SRC) 
_MM_MANT_SIGN_ZERO  Experimental sign = 0 
_MM_PERM_AAAA  Experimental 
_MM_PERM_AAAB  Experimental 
_MM_PERM_AAAC  Experimental 
_MM_PERM_AAAD  Experimental 
_MM_PERM_AABA  Experimental 
_MM_PERM_AABB  Experimental 
_MM_PERM_AABC  Experimental 
_MM_PERM_AABD  Experimental 
_MM_PERM_AACA  Experimental 
_MM_PERM_AACB  Experimental 
_MM_PERM_AACC  Experimental 
_MM_PERM_AACD  Experimental 
_MM_PERM_AADA  Experimental 
_MM_PERM_AADB  Experimental 
_MM_PERM_AADC  Experimental 
_MM_PERM_AADD  Experimental 
_MM_PERM_ABAA  Experimental 
_MM_PERM_ABAB  Experimental 
_MM_PERM_ABAC  Experimental 
_MM_PERM_ABAD  Experimental 
_MM_PERM_ABBA  Experimental 
_MM_PERM_ABBB  Experimental 
_MM_PERM_ABBC  Experimental 
_MM_PERM_ABBD  Experimental 
_MM_PERM_ABCA  Experimental 
_MM_PERM_ABCB  Experimental 
_MM_PERM_ABCC  Experimental 
_MM_PERM_ABCD  Experimental 
_MM_PERM_ABDA  Experimental 
_MM_PERM_ABDB  Experimental 
_MM_PERM_ABDC  Experimental 
_MM_PERM_ABDD  Experimental 
_MM_PERM_ACAA  Experimental 
_MM_PERM_ACAB  Experimental 
_MM_PERM_ACAC  Experimental 
_MM_PERM_ACAD  Experimental 
_MM_PERM_ACBA  Experimental 
_MM_PERM_ACBB  Experimental 
_MM_PERM_ACBC  Experimental 
_MM_PERM_ACBD  Experimental 
_MM_PERM_ACCA  Experimental 
_MM_PERM_ACCB  Experimental 
_MM_PERM_ACCC  Experimental 
_MM_PERM_ACCD  Experimental 
_MM_PERM_ACDA  Experimental 
_MM_PERM_ACDB  Experimental 
_MM_PERM_ACDC  Experimental 
_MM_PERM_ACDD  Experimental 
_MM_PERM_ADAA  Experimental 
_MM_PERM_ADAB  Experimental 
_MM_PERM_ADAC  Experimental 
_MM_PERM_ADAD  Experimental 
_MM_PERM_ADBA  Experimental 
_MM_PERM_ADBB  Experimental 
_MM_PERM_ADBC  Experimental 
_MM_PERM_ADBD  Experimental 
_MM_PERM_ADCA  Experimental 
_MM_PERM_ADCB  Experimental 
_MM_PERM_ADCC  Experimental 
_MM_PERM_ADCD  Experimental 
_MM_PERM_ADDA  Experimental 
_MM_PERM_ADDB  Experimental 
_MM_PERM_ADDC  Experimental 
_MM_PERM_ADDD  Experimental 
_MM_PERM_BAAA  Experimental 
_MM_PERM_BAAB  Experimental 
_MM_PERM_BAAC  Experimental 
_MM_PERM_BAAD  Experimental 
_MM_PERM_BABA  Experimental 
_MM_PERM_BABB  Experimental 
_MM_PERM_BABC  Experimental 
_MM_PERM_BABD  Experimental 
_MM_PERM_BACA  Experimental 
_MM_PERM_BACB  Experimental 
_MM_PERM_BACC  Experimental 
_MM_PERM_BACD  Experimental 
_MM_PERM_BADA  Experimental 
_MM_PERM_BADB  Experimental 
_MM_PERM_BADC  Experimental 
_MM_PERM_BADD  Experimental 
_MM_PERM_BBAA  Experimental 
_MM_PERM_BBAB  Experimental 
_MM_PERM_BBAC  Experimental 
_MM_PERM_BBAD  Experimental 
_MM_PERM_BBBA  Experimental 
_MM_PERM_BBBB  Experimental 
_MM_PERM_BBBC  Experimental 
_MM_PERM_BBBD  Experimental 
_MM_PERM_BBCA  Experimental 
_MM_PERM_BBCB  Experimental 
_MM_PERM_BBCC  Experimental 
_MM_PERM_BBCD  Experimental 
_MM_PERM_BBDA  Experimental 
_MM_PERM_BBDB  Experimental 
_MM_PERM_BBDC  Experimental 
_MM_PERM_BBDD  Experimental 
_MM_PERM_BCAA  Experimental 
_MM_PERM_BCAB  Experimental 
_MM_PERM_BCAC  Experimental 
_MM_PERM_BCAD  Experimental 
_MM_PERM_BCBA  Experimental 
_MM_PERM_BCBB  Experimental 
_MM_PERM_BCBC  Experimental 
_MM_PERM_BCBD  Experimental 
_MM_PERM_BCCA  Experimental 
_MM_PERM_BCCB  Experimental 
_MM_PERM_BCCC  Experimental 
_MM_PERM_BCCD  Experimental 
_MM_PERM_BCDA  Experimental 
_MM_PERM_BCDB  Experimental 
_MM_PERM_BCDC  Experimental 
_MM_PERM_BCDD  Experimental 
_MM_PERM_BDAA  Experimental 
_MM_PERM_BDAB  Experimental 
_MM_PERM_BDAC  Experimental 
_MM_PERM_BDAD  Experimental 
_MM_PERM_BDBA  Experimental 
_MM_PERM_BDBB  Experimental 
_MM_PERM_BDBC  Experimental 
_MM_PERM_BDBD  Experimental 
_MM_PERM_BDCA  Experimental 
_MM_PERM_BDCB  Experimental 
_MM_PERM_BDCC  Experimental 
_MM_PERM_BDCD  Experimental 
_MM_PERM_BDDA  Experimental 
_MM_PERM_BDDB  Experimental 
_MM_PERM_BDDC  Experimental 
_MM_PERM_BDDD  Experimental 
_MM_PERM_CAAA  Experimental 
_MM_PERM_CAAB  Experimental 
_MM_PERM_CAAC  Experimental 
_MM_PERM_CAAD  Experimental 
_MM_PERM_CABA  Experimental 
_MM_PERM_CABB  Experimental 
_MM_PERM_CABC  Experimental 
_MM_PERM_CABD  Experimental 
_MM_PERM_CACA  Experimental 
_MM_PERM_CACB  Experimental 
_MM_PERM_CACC  Experimental 
_MM_PERM_CACD  Experimental 
_MM_PERM_CADA  Experimental 
_MM_PERM_CADB  Experimental 
_MM_PERM_CADC  Experimental 
_MM_PERM_CADD  Experimental 
_MM_PERM_CBAA  Experimental 
_MM_PERM_CBAB  Experimental 
_MM_PERM_CBAC  Experimental 
_MM_PERM_CBAD  Experimental 
_MM_PERM_CBBA  Experimental 
_MM_PERM_CBBB  Experimental 
_MM_PERM_CBBC  Experimental 
_MM_PERM_CBBD  Experimental 
_MM_PERM_CBCA  Experimental 
_MM_PERM_CBCB  Experimental 
_MM_PERM_CBCC  Experimental 
_MM_PERM_CBCD  Experimental 
_MM_PERM_CBDA  Experimental 
_MM_PERM_CBDB  Experimental 
_MM_PERM_CBDC  Experimental 
_MM_PERM_CBDD  Experimental 
_MM_PERM_CCAA  Experimental 
_MM_PERM_CCAB  Experimental 
_MM_PERM_CCAC  Experimental 
_MM_PERM_CCAD  Experimental 
_MM_PERM_CCBA  Experimental 
_MM_PERM_CCBB  Experimental 
_MM_PERM_CCBC  Experimental 
_MM_PERM_CCBD  Experimental 
_MM_PERM_CCCA  Experimental 
_MM_PERM_CCCB  Experimental 
_MM_PERM_CCCC  Experimental 
_MM_PERM_CCCD  Experimental 
_MM_PERM_CCDA  Experimental 
_MM_PERM_CCDB  Experimental 
_MM_PERM_CCDC  Experimental 
_MM_PERM_CCDD  Experimental 
_MM_PERM_CDAA  Experimental 
_MM_PERM_CDAB  Experimental 
_MM_PERM_CDAC  Experimental 
_MM_PERM_CDAD  Experimental 
_MM_PERM_CDBA  Experimental 
_MM_PERM_CDBB  Experimental 
_MM_PERM_CDBC  Experimental 
_MM_PERM_CDBD  Experimental 
_MM_PERM_CDCA  Experimental 
_MM_PERM_CDCB  Experimental 
_MM_PERM_CDCC  Experimental 
_MM_PERM_CDCD  Experimental 
_MM_PERM_CDDA  Experimental 
_MM_PERM_CDDB  Experimental 
_MM_PERM_CDDC  Experimental 
_MM_PERM_CDDD  Experimental 
_MM_PERM_DAAA  Experimental 
_MM_PERM_DAAB  Experimental 
_MM_PERM_DAAC  Experimental 
_MM_PERM_DAAD  Experimental 
_MM_PERM_DABA  Experimental 
_MM_PERM_DABB  Experimental 
_MM_PERM_DABC  Experimental 
_MM_PERM_DABD  Experimental 
_MM_PERM_DACA  Experimental 
_MM_PERM_DACB  Experimental 
_MM_PERM_DACC  Experimental 
_MM_PERM_DACD  Experimental 
_MM_PERM_DADA  Experimental 
_MM_PERM_DADB  Experimental 
_MM_PERM_DADC  Experimental 
_MM_PERM_DADD  Experimental 
_MM_PERM_DBAA  Experimental 
_MM_PERM_DBAB  Experimental 
_MM_PERM_DBAC  Experimental 
_MM_PERM_DBAD  Experimental 
_MM_PERM_DBBA  Experimental 
_MM_PERM_DBBB  Experimental 
_MM_PERM_DBBC  Experimental 
_MM_PERM_DBBD  Experimental 
_MM_PERM_DBCA  Experimental 
_MM_PERM_DBCB  Experimental 
_MM_PERM_DBCC  Experimental 
_MM_PERM_DBCD  Experimental 
_MM_PERM_DBDA  Experimental 
_MM_PERM_DBDB  Experimental 
_MM_PERM_DBDC  Experimental 
_MM_PERM_DBDD  Experimental 
_MM_PERM_DCAA  Experimental 
_MM_PERM_DCAB  Experimental 
_MM_PERM_DCAC  Experimental 
_MM_PERM_DCAD  Experimental 
_MM_PERM_DCBA  Experimental 
_MM_PERM_DCBB  Experimental 
_MM_PERM_DCBC  Experimental 
_MM_PERM_DCBD  Experimental 
_MM_PERM_DCCA  Experimental 
_MM_PERM_DCCB  Experimental 
_MM_PERM_DCCC  Experimental 
_MM_PERM_DCCD  Experimental 
_MM_PERM_DCDA  Experimental 
_MM_PERM_DCDB  Experimental 
_MM_PERM_DCDC  Experimental 
_MM_PERM_DCDD  Experimental 
_MM_PERM_DDAA  Experimental 
_MM_PERM_DDAB  Experimental 
_MM_PERM_DDAC  Experimental 
_MM_PERM_DDAD  Experimental 
_MM_PERM_DDBA  Experimental 
_MM_PERM_DDBB  Experimental 
_MM_PERM_DDBC  Experimental 
_MM_PERM_DDBD  Experimental 
_MM_PERM_DDCA  Experimental 
_MM_PERM_DDCB  Experimental 
_MM_PERM_DDCC  Experimental 
_MM_PERM_DDCD  Experimental 
_MM_PERM_DDDA  Experimental 
_MM_PERM_DDDB  Experimental 
_MM_PERM_DDDC  Experimental 
_MM_PERM_DDDD  Experimental 
_XABORT_CAPACITY  Experimental Transaction abort due to the transaction using too much memory. 
_XABORT_CONFLICT  Experimental Transaction abort due to a memory conflict with another thread. 
_XABORT_DEBUG  Experimental Transaction abort due to a debug trap. 
_XABORT_EXPLICIT  Experimental Transaction explicitly aborted with xabort. The parameter passed to xabort is available with
_XABORT_NESTED  Experimental Transaction abort in an inner nested transaction. 
_XABORT_RETRY  Experimental Transaction retry is possible. 
_XBEGIN_STARTED  Experimental Transaction successfully started. 
_CMP_EQ_OQ  Equal (ordered, non-signaling) 
_CMP_EQ_OS  Equal (ordered, signaling) 
_CMP_EQ_UQ  Equal (unordered, non-signaling) 
_CMP_EQ_US  Equal (unordered, signaling) 
_CMP_FALSE_OQ  False (ordered, non-signaling) 
_CMP_FALSE_OS  False (ordered, signaling) 
_CMP_GE_OQ  Greater-than-or-equal (ordered, non-signaling) 
_CMP_GE_OS  Greater-than-or-equal (ordered, signaling) 
_CMP_GT_OQ  Greater-than (ordered, non-signaling) 
_CMP_GT_OS  Greater-than (ordered, signaling) 
_CMP_LE_OQ  Less-than-or-equal (ordered, non-signaling) 
_CMP_LE_OS  Less-than-or-equal (ordered, signaling) 
_CMP_LT_OQ  Less-than (ordered, non-signaling) 
_CMP_LT_OS  Less-than (ordered, signaling) 
_CMP_NEQ_OQ  Not-equal (ordered, non-signaling) 
_CMP_NEQ_OS  Not-equal (ordered, signaling) 
_CMP_NEQ_UQ  Not-equal (unordered, non-signaling) 
_CMP_NEQ_US  Not-equal (unordered, signaling) 
_CMP_NGE_UQ  Not-greater-than-or-equal (unordered, non-signaling) 
_CMP_NGE_US  Not-greater-than-or-equal (unordered, signaling) 
_CMP_NGT_UQ  Not-greater-than (unordered, non-signaling) 
_CMP_NGT_US  Not-greater-than (unordered, signaling) 
_CMP_NLE_UQ  Not-less-than-or-equal (unordered, non-signaling) 
_CMP_NLE_US  Not-less-than-or-equal (unordered, signaling) 
_CMP_NLT_UQ  Not-less-than (unordered, non-signaling) 
_CMP_NLT_US  Not-less-than (unordered, signaling) 
_CMP_ORD_Q  Ordered (non-signaling) 
_CMP_ORD_S  Ordered (signaling) 
_CMP_TRUE_UQ  True (unordered, non-signaling) 
_CMP_TRUE_US  True (unordered, signaling) 
_CMP_UNORD_Q  Unordered (non-signaling) 
_CMP_UNORD_S  Unordered (signaling) 
_MM_EXCEPT_DENORM  See 
_MM_EXCEPT_DIV_ZERO  See 
_MM_EXCEPT_INEXACT  See 
_MM_EXCEPT_INVALID  See 
_MM_EXCEPT_MASK  
_MM_EXCEPT_OVERFLOW  See 
_MM_EXCEPT_UNDERFLOW  See 
_MM_FLUSH_ZERO_MASK  
_MM_FLUSH_ZERO_OFF  See 
_MM_FLUSH_ZERO_ON  See 
_MM_FROUND_CEIL  round up and do not suppress exceptions 
_MM_FROUND_CUR_DIRECTION  use MXCSR.RC; see 
_MM_FROUND_FLOOR  round down and do not suppress exceptions 
_MM_FROUND_NEARBYINT  use MXCSR.RC and suppress exceptions; see 
_MM_FROUND_NINT  round to nearest and do not suppress exceptions 
_MM_FROUND_NO_EXC  suppress exceptions 
_MM_FROUND_RAISE_EXC  do not suppress exceptions 
_MM_FROUND_RINT  use MXCSR.RC and do not suppress exceptions; see
_MM_FROUND_TO_NEAREST_INT  round to nearest 
_MM_FROUND_TO_NEG_INF  round down 
_MM_FROUND_TO_POS_INF  round up 
_MM_FROUND_TO_ZERO  truncate 
_MM_FROUND_TRUNC  truncate and do not suppress exceptions 
_MM_HINT_ET0  See 
_MM_HINT_ET1  See 
_MM_HINT_NTA  See 
_MM_HINT_T0  See 
_MM_HINT_T1  See 
_MM_HINT_T2  See 
_MM_MASK_DENORM  See 
_MM_MASK_DIV_ZERO  See 
_MM_MASK_INEXACT  See 
_MM_MASK_INVALID  See 
_MM_MASK_MASK  
_MM_MASK_OVERFLOW  See 
_MM_MASK_UNDERFLOW  See 
_MM_ROUND_DOWN  See 
_MM_ROUND_MASK  
_MM_ROUND_NEAREST  See 
_MM_ROUND_TOWARD_ZERO  See 
_MM_ROUND_UP  See 
_SIDD_BIT_MASK  Mask only: return the bit mask 
_SIDD_CMP_EQUAL_ANY  For each character in 
_SIDD_CMP_EQUAL_EACH  The strings defined by 
_SIDD_CMP_EQUAL_ORDERED  Search for the defined substring in the target 
_SIDD_CMP_RANGES  For each character in 
_SIDD_LEAST_SIGNIFICANT  Index only: return the least significant bit (Default) 
_SIDD_MASKED_NEGATIVE_POLARITY  Negates results only before the end of the string 
_SIDD_MASKED_POSITIVE_POLARITY  Do not negate results before the end of the string 
_SIDD_MOST_SIGNIFICANT  Index only: return the most significant bit 
_SIDD_NEGATIVE_POLARITY  Negates results 
_SIDD_POSITIVE_POLARITY  Do not negate results (Default) 
_SIDD_SBYTE_OPS  String contains signed 8-bit characters 
_SIDD_SWORD_OPS  String contains signed 16-bit characters 
_SIDD_UBYTE_OPS  String contains unsigned 8-bit characters (Default) 
_SIDD_UNIT_MASK  Mask only: return the byte mask 
_SIDD_UWORD_OPS  String contains unsigned 16-bit characters 
_XCR_XFEATURE_ENABLED_MASK 
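The `_MM_FROUND_*` descriptions above read as a rounding direction combined with an exception flag (for example, "round up and do not suppress exceptions"). The following is a minimal sketch of that composition. It uses local constants rather than the module's own, and their numeric values are an assumption (taken to match Intel's C headers), since this page does not list them. The `describe` helper is hypothetical.

```rust
// Local mirrors of the _MM_FROUND_* building blocks. The numeric values are
// an assumption (taken to match Intel's C headers); this page does not list them.
const FROUND_TO_NEAREST_INT: i32 = 0x00;
const FROUND_TO_NEG_INF: i32 = 0x01;
const FROUND_TO_POS_INF: i32 = 0x02;
const FROUND_TO_ZERO: i32 = 0x03;
const FROUND_CUR_DIRECTION: i32 = 0x04;
const FROUND_RAISE_EXC: i32 = 0x00;
const FROUND_NO_EXC: i32 = 0x08;

/// Hypothetical helper: turns a rounding-control value into the wording
/// used in the constant list.
fn describe(mode: i32) -> String {
    // Bit 2 selects "use whatever MXCSR.RC says"; otherwise bits 0-1 pick the direction.
    let direction = if mode & FROUND_CUR_DIRECTION != 0 {
        "use MXCSR.RC"
    } else {
        match mode & 0x03 {
            0x00 => "round to nearest",
            0x01 => "round down",
            0x02 => "round up",
            _ => "truncate",
        }
    };
    // Bit 3 suppresses floating-point exceptions.
    let exceptions = if mode & FROUND_NO_EXC != 0 {
        "suppress exceptions"
    } else {
        "do not suppress exceptions"
    };
    format!("{} and {}", direction, exceptions)
}

fn main() {
    let modes = [
        ("NINT", FROUND_TO_NEAREST_INT | FROUND_RAISE_EXC),
        ("FLOOR", FROUND_TO_NEG_INF | FROUND_RAISE_EXC),
        ("CEIL", FROUND_TO_POS_INF | FROUND_RAISE_EXC),
        ("TRUNC", FROUND_TO_ZERO | FROUND_RAISE_EXC),
        ("RINT", FROUND_CUR_DIRECTION | FROUND_RAISE_EXC),
        ("NEARBYINT", FROUND_CUR_DIRECTION | FROUND_NO_EXC),
    ];
    for (name, mode) in modes.iter() {
        println!("{:<10} {:#04x}  {}", name, mode, describe(*mode));
    }
}
```

Under these assumed values, each combined constant is the bitwise OR of one direction constant and one exception constant, which is why the list describes them in two parts.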

Functions
_MM_SHUFFLE  Experimental A utility function for creating masks to use with Intel shuffle and permute intrinsics. 
_bittest^{⚠}  Experimental Returns the bit in position 
_bittest64^{⚠}  Experimental Returns the bit in position 
_bittestandcomplement^{⚠}  Experimental Returns the bit in position 
_bittestandcomplement64^{⚠}  Experimental Returns the bit in position 
_bittestandreset^{⚠}  Experimental Returns the bit in position 
_bittestandreset64^{⚠}  Experimental Returns the bit in position 
_bittestandset^{⚠}  Experimental Returns the bit in position 
_bittestandset64^{⚠}  Experimental Returns the bit in position 
_kadd_mask32^{⚠}  Experimental avx512bw Add 32-bit masks in a and b, and store the result in k. 
_kadd_mask64^{⚠}  Experimental avx512bw Add 64-bit masks in a and b, and store the result in k. 
_kand_mask16^{⚠}  Experimental avx512f Compute the bitwise AND of 16-bit masks a and b, and store the result in k. 
_kand_mask32^{⚠}  Experimental avx512bw Compute the bitwise AND of 32-bit masks a and b, and store the result in k. 
_kand_mask64^{⚠}  Experimental avx512bw Compute the bitwise AND of 64-bit masks a and b, and store the result in k. 
_kandn_mask16^{⚠}  Experimental avx512f Compute the bitwise NOT of 16-bit masks a and then AND with b, and store the result in k. 
_kandn_mask32^{⚠}  Experimental avx512bw Compute the bitwise NOT of 32-bit masks a and then AND with b, and store the result in k. 
_kandn_mask64^{⚠}  Experimental avx512bw Compute the bitwise NOT of 64-bit masks a and then AND with b, and store the result in k. 
_knot_mask16^{⚠}  Experimental avx512f Compute the bitwise NOT of 16-bit mask a, and store the result in k. 
_knot_mask32^{⚠}  Experimental avx512bw Compute the bitwise NOT of 32-bit mask a, and store the result in k. 
_knot_mask64^{⚠}  Experimental avx512bw Compute the bitwise NOT of 64-bit mask a, and store the result in k. 
_kor_mask16^{⚠}  Experimental avx512f Compute the bitwise OR of 16-bit masks a and b, and store the result in k. 
_kor_mask32^{⚠}  Experimental avx512bw Compute the bitwise OR of 32-bit masks a and b, and store the result in k. 
_kor_mask64^{⚠}  Experimental avx512bw Compute the bitwise OR of 64-bit masks a and b, and store the result in k. 
_kxnor_mask16^{⚠}  Experimental avx512f Compute the bitwise XNOR of 16-bit masks a and b, and store the result in k. 
_kxnor_mask32^{⚠}  Experimental avx512bw Compute the bitwise XNOR of 32-bit masks a and b, and store the result in k. 
_kxnor_mask64^{⚠}  Experimental avx512bw Compute the bitwise XNOR of 64-bit masks a and b, and store the result in k. 
_kxor_mask16^{⚠}  Experimental avx512f Compute the bitwise XOR of 16-bit masks a and b, and store the result in k. 
_kxor_mask32^{⚠}  Experimental avx512bw Compute the bitwise XOR of 32-bit masks a and b, and store the result in k. 
_kxor_mask64^{⚠}  Experimental avx512bw Compute the bitwise XOR of 64-bit masks a and b, and store the result in k. 
_load_mask32^{⚠}  Experimental avx512bw Load 32-bit mask from memory into k. 
_load_mask64^{⚠}  Experimental avx512bw Load 64-bit mask from memory into k. 
_mm256_abs_epi64^{⚠}  Experimental avx512f,avx512vl Compute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst. 
_mm256_aesdec_epi128^{⚠}  Experimental avx512vaes,avx512vl Performs one round of an AES decryption flow on each 128-bit word (state) in 
_mm256_aesdeclast_epi128^{⚠}  Experimental avx512vaes,avx512vl Performs the last round of an AES decryption flow on each 128-bit word (state) in 
_mm256_aesenc_epi128^{⚠}  Experimental avx512vaes,avx512vl Performs one round of an AES encryption flow on each 128-bit word (state) in 
_mm256_aesenclast_epi128^{⚠}  Experimental avx512vaes,avx512vl Performs the last round of an AES encryption flow on each 128-bit word (state) in 
_mm256_alignr_epi32^{⚠}  Experimental avx512f,avx512vl Concatenate a and b into a 64-byte intermediate result, shift the result right by imm8 32-bit elements, and store the low 32 bytes (8 elements) in dst. 
_mm256_alignr_epi64^{⚠}  Experimental avx512f,avx512vl Concatenate a and b into a 64-byte intermediate result, shift the result right by imm8 64-bit elements, and store the low 32 bytes (4 elements) in dst. 
_mm256_bitshuffle_epi64_mask^{⚠}  Experimental avx512bitalg,avx512vl Considers the input 
_mm256_broadcast_f32x4^{⚠}  Experimental avx512f,avx512vl Broadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of dst. 
_mm256_broadcast_i32x4^{⚠}  Experimental avx512f,avx512vl Broadcast the 4 packed 32-bit integers from a to all elements of dst. 
_mm256_broadcastmb_epi64^{⚠}  Experimental avx512cd,avx512vl Broadcast the low 8-bits from input mask k to all 64-bit elements of dst. 
_mm256_broadcastmw_epi32^{⚠}  Experimental avx512cd,avx512vl Broadcast the low 16-bits from input mask k to all 32-bit elements of dst. 
_mm256_clmulepi64_epi128^{⚠}  Experimental avx512vpclmulqdq,avx512vl Performs a carry-less multiplication of two 64-bit polynomials over the finite field GF(2^k) in each of the 2 128-bit lanes. 
_mm256_cmp_epi8_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed signed 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm256_cmp_epi16_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed signed 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm256_cmp_epi32_mask^{⚠}  Experimental avx512f,avx512vl Compare packed signed 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm256_cmp_epi64_mask^{⚠}  Experimental avx512f,avx512vl Compare packed signed 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm256_cmp_epu8_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed unsigned 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm256_cmp_epu16_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed unsigned 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm256_cmp_epu32_mask^{⚠}  Experimental avx512f,avx512vl Compare packed unsigned 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm256_cmp_epu64_mask^{⚠}  Experimental avx512f,avx512vl Compare packed unsigned 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm256_cmp_pd_mask^{⚠}  Experimental avx512f,avx512vl Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm256_cmp_ps_mask^{⚠}  Experimental avx512f,avx512vl Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm256_cmpeq_epi8_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed signed 8-bit integers in a and b for equality, and store the results in mask vector k. 
_mm256_cmpeq_epi16_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed signed 16-bit integers in a and b for equality, and store the results in mask vector k. 
_mm256_cmpeq_epi32_mask^{⚠}  Experimental avx512f,avx512vl Compare packed 32-bit integers in a and b for equality, and store the results in mask vector k. 
_mm256_cmpeq_epi64_mask^{⚠}  Experimental avx512f,avx512vl Compare packed 64-bit integers in a and b for equality, and store the results in mask vector k. 
_mm256_cmpeq_epu8_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed unsigned 8-bit integers in a and b for equality, and store the results in mask vector k. 
_mm256_cmpeq_epu16_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed unsigned 16-bit integers in a and b for equality, and store the results in mask vector k. 
_mm256_cmpeq_epu32_mask^{⚠}  Experimental avx512f,avx512vl Compare packed unsigned 32-bit integers in a and b for equality, and store the results in mask vector k. 
_mm256_cmpeq_epu64_mask^{⚠}  Experimental avx512f,avx512vl Compare packed unsigned 64-bit integers in a and b for equality, and store the results in mask vector k. 
_mm256_cmpge_epi8_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed signed 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k. 
_mm256_cmpge_epi16_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed signed 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k. 
_mm256_cmpge_epi32_mask^{⚠}  Experimental avx512f,avx512vl Compare packed signed 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k. 
_mm256_cmpge_epi64_mask^{⚠}  Experimental avx512f,avx512vl Compare packed signed 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k. 
_mm256_cmpge_epu8_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed unsigned 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k. 
_mm256_cmpge_epu16_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed unsigned 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k. 
_mm256_cmpge_epu32_mask^{⚠}  Experimental avx512f,avx512vl Compare packed unsigned 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k. 
_mm256_cmpge_epu64_mask^{⚠}  Experimental avx512f,avx512vl Compare packed unsigned 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k. 
_mm256_cmpgt_epi8_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed signed 8-bit integers in a and b for greater-than, and store the results in mask vector k. 
_mm256_cmpgt_epi16_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed signed 16-bit integers in a and b for greater-than, and store the results in mask vector k. 
_mm256_cmpgt_epi32_mask^{⚠}  Experimental avx512f,avx512vl Compare packed signed 32-bit integers in a and b for greater-than, and store the results in mask vector k. 
_mm256_cmpgt_epi64_mask^{⚠}  Experimental avx512f,avx512vl Compare packed signed 64-bit integers in a and b for greater-than, and store the results in mask vector k. 
_mm256_cmpgt_epu8_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed unsigned 8-bit integers in a and b for greater-than, and store the results in mask vector k. 
_mm256_cmpgt_epu16_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed unsigned 16-bit integers in a and b for greater-than, and store the results in mask vector k. 
_mm256_cmpgt_epu32_mask^{⚠}  Experimental avx512f,avx512vl Compare packed unsigned 32-bit integers in a and b for greater-than, and store the results in mask vector k. 
_mm256_cmpgt_epu64_mask^{⚠}  Experimental avx512f,avx512vl Compare packed unsigned 64-bit integers in a and b for greater-than, and store the results in mask vector k. 
_mm256_cmple_epi8_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed signed 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k. 
_mm256_cmple_epi16_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed signed 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k. 
_mm256_cmple_epi32_mask^{⚠}  Experimental avx512f,avx512vl Compare packed signed 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k. 
_mm256_cmple_epi64_mask^{⚠}  Experimental avx512f,avx512vl Compare packed signed 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k. 
_mm256_cmple_epu8_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed unsigned 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k. 
_mm256_cmple_epu16_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed unsigned 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k. 
_mm256_cmple_epu32_mask^{⚠}  Experimental avx512f,avx512vl Compare packed unsigned 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k. 
_mm256_cmple_epu64_mask^{⚠}  Experimental avx512f,avx512vl Compare packed unsigned 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k. 
_mm256_cmplt_epi8_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed signed 8-bit integers in a and b for less-than, and store the results in mask vector k. 
_mm256_cmplt_epi16_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed signed 16-bit integers in a and b for less-than, and store the results in mask vector k. 
_mm256_cmplt_epi32_mask^{⚠}  Experimental avx512f,avx512vl Compare packed signed 32-bit integers in a and b for less-than, and store the results in mask vector k. 
_mm256_cmplt_epi64_mask^{⚠}  Experimental avx512f,avx512vl Compare packed signed 64-bit integers in a and b for less-than, and store the results in mask vector k. 
_mm256_cmplt_epu8_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed unsigned 8-bit integers in a and b for less-than, and store the results in mask vector k. 
_mm256_cmplt_epu16_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed unsigned 16-bit integers in a and b for less-than, and store the results in mask vector k. 
_mm256_cmplt_epu32_mask^{⚠}  Experimental avx512f,avx512vl Compare packed unsigned 32-bit integers in a and b for less-than, and store the results in mask vector k. 
_mm256_cmplt_epu64_mask^{⚠}  Experimental avx512f,avx512vl Compare packed unsigned 64-bit integers in a and b for less-than, and store the results in mask vector k. 
_mm256_cmpneq_epi8_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed signed 8-bit integers in a and b for not-equal, and store the results in mask vector k. 
_mm256_cmpneq_epi16_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed signed 16-bit integers in a and b for not-equal, and store the results in mask vector k. 
_mm256_cmpneq_epi32_mask^{⚠}  Experimental avx512f,avx512vl Compare packed 32-bit integers in a and b for not-equal, and store the results in mask vector k. 
_mm256_cmpneq_epi64_mask^{⚠}  Experimental avx512f,avx512vl Compare packed signed 64-bit integers in a and b for not-equal, and store the results in mask vector k. 
_mm256_cmpneq_epu8_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed unsigned 8-bit integers in a and b for not-equal, and store the results in mask vector k. 
_mm256_cmpneq_epu16_mask^{⚠}  Experimental avx512bw,avx512vl Compare packed unsigned 16-bit integers in a and b for not-equal, and store the results in mask vector k. 
_mm256_cmpneq_epu32_mask^{⚠}  Experimental avx512f,avx512vl Compare packed unsigned 32-bit integers in a and b for not-equal, and store the results in mask vector k. 
_mm256_cmpneq_epu64_mask^{⚠}  Experimental avx512f,avx512vl Compare packed unsigned 64-bit integers in a and b for not-equal, and store the results in mask vector k. 
_mm256_conflict_epi32^{⚠}  Experimental avx512cd,avx512vl Test each 32-bit element of a for equality with all other elements in a closer to the least significant bit. Each element’s comparison forms a zero extended bit vector in dst. 
_mm256_conflict_epi64^{⚠}  Experimental avx512cd,avx512vl Test each 64-bit element of a for equality with all other elements in a closer to the least significant bit. Each element’s comparison forms a zero extended bit vector in dst. 
_mm256_cvtepi16_epi8^{⚠}  Experimental avx512bw,avx512vl Convert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst. 
_mm256_cvtepi32_epi8^{⚠}  Experimental avx512f,avx512vl Convert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst. 
_mm256_cvtepi32_epi16^{⚠}  Experimental avx512f,avx512vl Convert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst. 
_mm256_cvtepi64_epi8^{⚠}  Experimental avx512f,avx512vl Convert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst. 
_mm256_cvtepi64_epi16^{⚠}  Experimental avx512f,avx512vl Convert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst. 
_mm256_cvtepi64_epi32^{⚠}  Experimental avx512f,avx512vl Convert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst. 
_mm256_cvtepu32_pd^{⚠}  Experimental avx512f,avx512vl Convert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst. 
_mm256_cvtne2ps_pbh^{⚠}  Experimental avx512bf16,avx512vl Convert packed single-precision (32-bit) floating-point elements in two 256-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 256-bit wide vector. Intel’s documentation 
_mm256_cvtneps_pbh^{⚠}  Experimental avx512bf16,avx512vl Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst. Intel’s documentation 
_mm256_cvtpd_epu32^{⚠}  Experimental avx512f,avx512vl Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst. 
_mm256_cvtph_ps^{⚠}  Experimental f16c Converts the 8 x 16-bit half-precision float values in the 128-bit vector
_mm256_cvtps_epu32^{⚠}  Experimental avx512f,avx512vl Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst. 
_mm256_cvtps_ph^{⚠}  Experimental f16c Converts the 8 x 32-bit float values in the 256-bit vector 
_mm256_cvtsepi16_epi8^{⚠}  Experimentalavx512bw,avx512vl Convert packed signed 16bit integers in a to packed 8bit integers with signed saturation, and store the results in dst. 
_mm256_cvtsepi32_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 32bit integers in a to packed 8bit integers with signed saturation, and store the results in dst. 
_mm256_cvtsepi32_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 32bit integers in a to packed 16bit integers with signed saturation, and store the results in dst. 
_mm256_cvtsepi64_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 64bit integers in a to packed 8bit integers with signed saturation, and store the results in dst. 
_mm256_cvtsepi64_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 64bit integers in a to packed 16bit integers with signed saturation, and store the results in dst. 
_mm256_cvtsepi64_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 64bit integers in a to packed 32bit integers with signed saturation, and store the results in dst. 
_mm256_cvttpd_epu32^{⚠}  Experimentalavx512f,avx512vl Convert packed doubleprecision (64bit) floatingpoint elements in a to packed unsigned 32bit integers with truncation, and store the results in dst. 
_mm256_cvttps_epu32^{⚠}  Experimentalavx512f,avx512vl Convert packed singleprecision (32bit) floatingpoint elements in a to packed unsigned 32bit integers with truncation, and store the results in dst. 
_mm256_cvtusepi16_epi8^{⚠}  Experimentalavx512bw,avx512vl Convert packed unsigned 16bit integers in a to packed unsigned 8bit integers with unsigned saturation, and store the results in dst. 
_mm256_cvtusepi32_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 32bit integers in a to packed unsigned 8bit integers with unsigned saturation, and store the results in dst. 
_mm256_cvtusepi32_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 32bit integers in a to packed unsigned 16bit integers with unsigned saturation, and store the results in dst. 
_mm256_cvtusepi64_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 64bit integers in a to packed unsigned 8bit integers with unsigned saturation, and store the results in dst. 
_mm256_cvtusepi64_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 64bit integers in a to packed unsigned 16bit integers with unsigned saturation, and store the results in dst. 
_mm256_cvtusepi64_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 64bit integers in a to packed unsigned 32bit integers with unsigned saturation, and store the results in dst. 
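The saturating conversions listed above clamp each element to the range of the narrower type instead of discarding high bits. The following is a minimal scalar sketch of that per-lane behaviour in plain Rust; it does not call the intrinsics, and the helper names are hypothetical.

```rust
/// Scalar model of one lane of a signed-saturating 32-bit to 8-bit narrowing
/// (the behaviour described for the `cvtsepi32_epi8` family).
/// Hypothetical helper, not part of `core::arch`.
fn narrow_i32_to_i8_saturating(x: i32) -> i8 {
    // Clamp into the i8 range first, so the cast cannot wrap.
    x.clamp(i8::MIN as i32, i8::MAX as i32) as i8
}

/// Scalar model of one lane of an unsigned-saturating 32-bit to 8-bit narrowing
/// (the behaviour described for the `cvtusepi32_epi8` family).
fn narrow_u32_to_u8_saturating(x: u32) -> u8 {
    x.min(u8::MAX as u32) as u8
}

/// Apply the signed model to the eight 32-bit lanes of a 256-bit vector.
fn narrow_lanes(a: [i32; 8]) -> [i8; 8] {
    a.map(narrow_i32_to_i8_saturating)
}
```

For instance, under this model an input lane of 300 narrows to 127 and a lane of -300 narrows to -128.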
_mm256_dbsad_epu8^{⚠}  Experimentalavx512bw,avx512vl Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8bit integers in a compared to those in b, and store the 16bit results in dst. Four SADs are performed on four 8bit quadruplets for each 64bit lane. The first two SADs use the lower 8bit quadruplet of the lane from a, and the last two SADs use the upper 8bit quadruplet of the lane from a. Quadruplets from b are selected from within 128bit lanes according to the control in imm8, and each SAD in each 64bit lane uses the selected quadruplet at 8bit offsets. 
_mm256_dpbf16_ps^{⚠}  Experimentalavx512bf16,avx512vl Compute dotproduct of BF16 (16bit) floatingpoint pairs in a and b, accumulating the intermediate singleprecision (32bit) floatingpoint elements with elements in src, and store the results in dst. Intel’s documentation 
_mm256_dpbusd_epi32^{⚠}  Experimentalavx512vnni,avx512vl Multiply groups of 4 adjacent pairs of unsigned 8bit integers in a with corresponding signed 8bit integers in b, producing 4 intermediate signed 16bit results. Sum these 4 results with the corresponding 32bit integer in src, and store the packed 32bit results in dst. 
_mm256_dpbusds_epi32^{⚠}  Experimentalavx512vnni,avx512vl Multiply groups of 4 adjacent pairs of unsigned 8bit integers in a with corresponding signed 8bit integers in b, producing 4 intermediate signed 16bit results. Sum these 4 results with the corresponding 32bit integer in src using signed saturation, and store the packed 32bit results in dst. 
_mm256_dpwssd_epi32^{⚠}  Experimentalavx512vnni,avx512vl Multiply groups of 2 adjacent pairs of signed 16bit integers in a with corresponding 16bit integers in b, producing 2 intermediate signed 32bit results. Sum these 2 results with the corresponding 32bit integer in src, and store the packed 32bit results in dst. 
_mm256_dpwssds_epi32^{⚠}  Experimentalavx512vnni,avx512vl Multiply groups of 2 adjacent pairs of signed 16bit integers in a with corresponding 16bit integers in b, producing 2 intermediate signed 32bit results. Sum these 2 results with the corresponding 32bit integer in src using signed saturation, and store the packed 32bit results in dst. 
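The dot-product entries above describe the work done for each 32bit lane. As a rough scalar sketch of the `dpbusd` and `dpbusds` descriptions for a single lane (plain Rust, hypothetical helper names, not the intrinsics themselves):

```rust
/// Scalar model of one 32-bit lane of the `dpbusd` description:
/// four unsigned bytes from `a` times four signed bytes from `b`,
/// summed into `src` with wrapping arithmetic. Hypothetical helper.
fn dpbusd_lane(src: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    let mut acc = src;
    for i in 0..4 {
        // u8 * i8 always fits in a signed 16-bit intermediate.
        let prod = (a[i] as i16) * (b[i] as i16);
        acc = acc.wrapping_add(prod as i32);
    }
    acc
}

/// Scalar model of the saturating variant (`dpbusds`): the sum is formed
/// in a wider type and clamped to the i32 range.
fn dpbusds_lane(src: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    let sum: i64 = (0..4).map(|i| (a[i] as i64) * (b[i] as i64)).sum();
    (src as i64 + sum).clamp(i32::MIN as i64, i32::MAX as i64) as i32
}
```

The two helpers differ only when the accumulator would leave the signed 32bit range: the first wraps around, the second stays at the limit.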
_mm256_extractf32x4_ps^{⚠}  Experimentalavx512f,avx512vl Extract 128 bits (composed of 4 packed singleprecision (32bit) floatingpoint elements) from a, selected with imm8, and store the result in dst. 
_mm256_extracti32x4_epi32^{⚠}  Experimentalavx512f,avx512vl Extract 128 bits (composed of 4 packed 32bit integers) from a, selected with IMM1, and store the result in dst. 
_mm256_fixupimm_pd^{⚠}  Experimentalavx512f,avx512vl Fix up packed doubleprecision (64bit) floatingpoint elements in a and b using packed 64bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting. 
_mm256_fixupimm_ps^{⚠}  Experimentalavx512f,avx512vl Fix up packed singleprecision (32bit) floatingpoint elements in a and b using packed 32bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting. 
_mm256_getexp_pd^{⚠}  Experimentalavx512f,avx512vl Convert the exponent of each packed doubleprecision (64bit) floatingpoint element in a to a doubleprecision (64bit) floatingpoint number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element. 
_mm256_getexp_ps^{⚠}  Experimentalavx512f,avx512vl Convert the exponent of each packed singleprecision (32bit) floatingpoint element in a to a singleprecision (32bit) floatingpoint number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element. 
_mm256_getmant_pd^{⚠}  Experimentalavx512f,avx512vl Normalize the mantissas of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. 
_mm256_getmant_ps^{⚠}  Experimentalavx512f,avx512vl Normalize the mantissas of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. The mantissa is normalized to the interval specified by interv, which can take the following values: _MM_MANT_NORM_1_2 (interval [1, 2)), _MM_MANT_NORM_P5_2 (interval [0.5, 2)), _MM_MANT_NORM_P5_1 (interval [0.5, 1)), _MM_MANT_NORM_P75_1P5 (interval [0.75, 1.5)). The sign is determined by sc, which can take the following values: _MM_MANT_SIGN_SRC (sign = sign(src)), _MM_MANT_SIGN_ZERO (sign = 0), _MM_MANT_SIGN_NAN (dst = NaN if sign(src) = 1). 
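The getexp and getmant entries above split a floatingpoint value into an exponent, floor(log2(|x|)), and a normalized mantissa. A minimal scalar sketch for normal, finite, nonzero `f64` inputs follows; it models only the [1, 2) interval with the source sign kept (the case named by _MM_MANT_NORM_1_2 and _MM_MANT_SIGN_SRC), ignores zeros, denormals, infinities and NaN, and uses hypothetical helper names.

```rust
/// Scalar model of the getexp description for a normal, finite, nonzero f64:
/// returns the unbiased exponent, i.e. floor(log2(|x|)), as a float.
/// Hypothetical helper; special values are not handled.
fn getexp_normal(x: f64) -> f64 {
    let biased = ((x.to_bits() >> 52) & 0x7ff) as i64;
    (biased - 1023) as f64
}

/// Scalar model of the getmant description for the [1, 2) interval with the
/// source sign preserved: keep the sign and fraction bits and force the
/// exponent field to the bias, so the magnitude lands in [1, 2).
fn getmant_1_2(x: f64) -> f64 {
    let bits = x.to_bits();
    f64::from_bits((bits & 0x800f_ffff_ffff_ffff) | (1023u64 << 52))
}
```

Under this model 10.0 decomposes into a mantissa of 1.25 and an exponent of 3.0, since 1.25 * 2^3 = 10.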
_mm256_gf2p8affine_epi64_epi8^{⚠}  Experimentalavx512gfni,avx512bw,avx512vl Performs an affine transformation on the packed bytes in x. That is, it computes a*x+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8bit immediate value. Each pack of 8 bytes in x is paired with the 64bit word at the same position in a. 
_mm256_gf2p8affineinv_epi64_epi8^{⚠}  Experimentalavx512gfni,avx512bw,avx512vl Performs an affine transformation on the inverted packed bytes in x. That is, it computes a*inv(x)+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64bit word at the same position in a. 
_mm256_gf2p8mul_epi8^{⚠}  Experimentalavx512gfni,avx512bw,avx512vl Performs a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1. 
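The GF(2^8) entries above all use the reduction polynomial x^8 + x^4 + x^3 + x + 1. As an illustration of what one byte lane computes, here is a scalar shift-and-xor sketch in plain Rust. The helper names are hypothetical, and the brute-force inverse is chosen for clarity, not speed.

```rust
/// Scalar model of one byte lane of a GF(2^8) multiplication with the
/// reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).
/// Hypothetical helper, not part of `core::arch`.
fn gf2p8_mul(mut a: u8, mut b: u8) -> u8 {
    let mut product = 0u8;
    for _ in 0..8 {
        // Add (xor) the current multiple of `a` if the low bit of `b` is set.
        if b & 1 != 0 {
            product ^= a;
        }
        // Multiply `a` by x, reducing modulo the polynomial on overflow.
        let carry = a & 0x80 != 0;
        a <<= 1;
        if carry {
            a ^= 0x1B;
        }
        b >>= 1;
    }
    product
}

/// Multiplicative inverse in the same field, with inv(0) defined as 0,
/// found by exhaustive search over the 255 nonzero elements.
fn gf2p8_inv(x: u8) -> u8 {
    if x == 0 {
        return 0;
    }
    (1..=255u8).find(|&y| gf2p8_mul(x, y) == 1).unwrap()
}
```

This is the same field used by AES, so the worked products in the AES specification can serve as a check on the sketch.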
_mm256_insertf32x4^{⚠}  Experimentalavx512f,avx512vl Copy a to dst, then insert 128 bits (composed of 4 packed singleprecision (32bit) floatingpoint elements) from b into dst at the location specified by imm8. 
_mm256_inserti32x4^{⚠}  Experimentalavx512f,avx512vl Copy a to dst, then insert 128 bits (composed of 4 packed 32bit integers) from b into dst at the location specified by imm8. 
_mm256_load_epi32^{⚠}  Experimentalavx512f,avx512vl Load 256bits (composed of 8 packed 32bit integers) from memory into dst. mem_addr must be aligned on a 32byte boundary or a generalprotection exception may be generated. 
_mm256_load_epi64^{⚠}  Experimentalavx512f,avx512vl Load 256bits (composed of 4 packed 64bit integers) from memory into dst. mem_addr must be aligned on a 32byte boundary or a generalprotection exception may be generated. 
_mm256_loadu_epi8^{⚠}  Experimentalavx512bw,avx512vl Load 256bits (composed of 32 packed 8bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary. 
_mm256_loadu_epi16^{⚠}  Experimentalavx512bw,avx512vl Load 256bits (composed of 16 packed 16bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary. 
_mm256_loadu_epi32^{⚠}  Experimentalavx512f,avx512vl Load 256bits (composed of 8 packed 32bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary. 
_mm256_loadu_epi64^{⚠}  Experimentalavx512f,avx512vl Load 256bits (composed of 4 packed 64bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary. 
_mm256_lzcnt_epi32^{⚠}  Experimentalavx512cd,avx512vl Counts the number of leading zero bits in each packed 32bit integer in a, and stores the results in dst. 
_mm256_lzcnt_epi64^{⚠}  Experimentalavx512cd,avx512vl Counts the number of leading zero bits in each packed 64bit integer in a, and stores the results in dst. 
_mm256_madd52hi_epu64^{⚠}  Experimentalavx512ifma,avx512vl Multiply packed unsigned 52bit integers in each 64bit element of

_mm256_madd52lo_epu64^{⚠}  Experimentalavx512ifma,avx512vl Multiply packed unsigned 52bit integers in each 64bit element of

_mm256_mask2_permutex2var_epi8^{⚠}  Experimentalavx512vbmi,avx512vl Shuffle 8bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set). 
_mm256_mask2_permutex2var_epi16^{⚠}  Experimentalavx512bw,avx512vl Shuffle 16bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set). 
_mm256_mask2_permutex2var_epi32^{⚠}  Experimentalavx512f,avx512vl Shuffle 32bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set). 
_mm256_mask2_permutex2var_epi64^{⚠}  Experimentalavx512f,avx512vl Shuffle 64bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set). 
_mm256_mask2_permutex2var_pd^{⚠}  Experimentalavx512f,avx512vl Shuffle doubleprecision (64bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set). 
_mm256_mask2_permutex2var_ps^{⚠}  Experimentalavx512f,avx512vl Shuffle singleprecision (32bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set). 
_mm256_mask3_fmadd_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm256_mask3_fmadd_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm256_mask3_fmaddsub_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm256_mask3_fmaddsub_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm256_mask3_fmsub_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm256_mask3_fmsub_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm256_mask3_fmsubadd_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm256_mask3_fmsubadd_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm256_mask3_fnmadd_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm256_mask3_fnmadd_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm256_mask3_fnmsub_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm256_mask3_fnmsub_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
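In the mask3 entries above, the third operand c serves both as the addend and as the fallback for lanes whose mask bit is clear. A minimal scalar sketch of the `mask3_fmadd_pd` description over four `f64` lanes follows (plain Rust with a hypothetical helper name; the argument order is chosen for the sketch and is not taken from the intrinsic signature).

```rust
/// Scalar model of the mask3 fused multiply-add description for four f64
/// lanes: lane i becomes a[i] * b[i] + c[i] when bit i of `k` is set,
/// and keeps c[i] otherwise. Hypothetical helper, not the intrinsic.
fn mask3_fmadd_pd_model(a: [f64; 4], b: [f64; 4], c: [f64; 4], k: u8) -> [f64; 4] {
    // Start from c, so unselected lanes are already correct.
    let mut dst = c;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            // mul_add computes a * b + c with a single rounding.
            dst[i] = a[i].mul_add(b[i], c[i]);
        }
    }
    dst
}
```

With k = 0b0101 only lanes 0 and 2 are recomputed; lanes 1 and 3 pass c through unchanged.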
_mm256_mask_abs_epi8^{⚠}  Experimentalavx512bw,avx512vl Compute the absolute value of packed signed 8bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_abs_epi16^{⚠}  Experimentalavx512bw,avx512vl Compute the absolute value of packed signed 16bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_abs_epi32^{⚠}  Experimentalavx512f,avx512vl Compute the absolute value of packed signed 32bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_abs_epi64^{⚠}  Experimentalavx512f,avx512vl Compute the absolute value of packed signed 64bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_add_epi8^{⚠}  Experimentalavx512bw,avx512vl Add packed 8bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_add_epi16^{⚠}  Experimentalavx512bw,avx512vl Add packed 16bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_add_epi32^{⚠}  Experimentalavx512f,avx512vl Add packed 32bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_add_epi64^{⚠}  Experimentalavx512f,avx512vl Add packed 64bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_add_pd^{⚠}  Experimentalavx512f,avx512vl Add packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_add_ps^{⚠}  Experimentalavx512f,avx512vl Add packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_adds_epi8^{⚠}  Experimentalavx512bw,avx512vl Add packed signed 8bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_adds_epi16^{⚠}  Experimentalavx512bw,avx512vl Add packed signed 16bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_adds_epu8^{⚠}  Experimentalavx512bw,avx512vl Add packed unsigned 8bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_adds_epu16^{⚠}  Experimentalavx512bw,avx512vl Add packed unsigned 16bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
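The writemask pattern repeated in the entries above (compute where the mask bit is set, copy from src elsewhere) can be sketched once as a generic scalar helper. This is an illustration in plain Rust with a hypothetical name; it assumes at most 32 lanes so that a `u32` mask suffices.

```rust
/// Scalar model of a writemasked binary operation: lane i of the result is
/// op(a[i], b[i]) when bit i of `k` is set, and src[i] otherwise.
/// Hypothetical helper; assumes N <= 32 so that `k` covers every lane.
fn mask_lanes<T: Copy, const N: usize>(
    src: [T; N],
    k: u32,
    a: [T; N],
    b: [T; N],
    op: impl Fn(T, T) -> T,
) -> [T; N] {
    let mut dst = src;
    for i in 0..N {
        if (k >> i) & 1 == 1 {
            dst[i] = op(a[i], b[i]);
        }
    }
    dst
}
```

Passing `i32::wrapping_add` as `op` models the plain add entries, while `i8::saturating_add` or `u8::saturating_add` models the adds entries, where results stay at the type's limits instead of wrapping.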
_mm256_mask_alignr_epi8^{⚠}  Experimentalavx512bw,avx512vl Concatenate pairs of 16byte blocks in a and b into a 32byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_alignr_epi32^{⚠}  Experimentalavx512f,avx512vl Concatenate a and b into a 64byte intermediate result, shift the result right by imm8 32bit elements, and store the low 32 bytes (8 elements) in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_alignr_epi64^{⚠}  Experimentalavx512f,avx512vl Concatenate a and b into a 64byte intermediate result, shift the result right by imm8 64bit elements, and store the low 32 bytes (4 elements) in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
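The element-wise alignr entries above treat a as the upper half and b as the lower half of a double-width value. The following scalar sketch models the 32bit variant over eight lanes; it is plain Rust with a hypothetical helper name, and it assumes that only the low three bits of imm8 are used for the 256bit form.

```rust
/// Scalar model of the masked alignr description for eight 32-bit lanes:
/// concatenate `a` (upper half) and `b` (lower half), shift right by whole
/// elements, keep the low eight lanes, then merge with `src` under `k`.
/// Hypothetical helper; masking imm8 to three bits is an assumption.
fn mask_alignr_epi32_model(
    src: [i32; 8],
    k: u8,
    a: [i32; 8],
    b: [i32; 8],
    imm8: u32,
) -> [i32; 8] {
    let shift = (imm8 & 0b111) as usize;
    let mut dst = src;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Index into the 16-lane concatenation: b is lanes 0..8, a is 8..16.
            let j = i + shift;
            dst[i] = if j < 8 { b[j] } else { a[j - 8] };
        }
    }
    dst
}
```

With a shift of 3 and every mask bit set, the result holds lanes 3 to 7 of b followed by lanes 0 to 2 of a.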
_mm256_mask_and_epi32^{⚠}  Experimentalavx512f,avx512vl Performs elementbyelement bitwise AND between packed 32bit integer elements of a and b, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_and_epi64^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise AND of packed 64bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_andnot_epi32^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise NOT of packed 32bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_andnot_epi64^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise NOT of packed 64bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_avg_epu8^{⚠}  Experimentalavx512bw,avx512vl Average packed unsigned 8bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_avg_epu16^{⚠}  Experimentalavx512bw,avx512vl Average packed unsigned 16bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_bitshuffle_epi64_mask^{⚠}  Experimentalavx512bitalg,avx512vl Considers the input 
_mm256_mask_blend_epi8^{⚠}  Experimentalavx512bw,avx512vl Blend packed 8bit integers from a and b using control mask k, and store the results in dst. 
_mm256_mask_blend_epi16^{⚠}  Experimentalavx512bw,avx512vl Blend packed 16bit integers from a and b using control mask k, and store the results in dst. 
_mm256_mask_blend_epi32^{⚠}  Experimentalavx512f,avx512vl Blend packed 32bit integers from a and b using control mask k, and store the results in dst. 
_mm256_mask_blend_epi64^{⚠}  Experimentalavx512f,avx512vl Blend packed 64bit integers from a and b using control mask k, and store the results in dst. 
_mm256_mask_blend_pd^{⚠}  Experimentalavx512f,avx512vl Blend packed doubleprecision (64bit) floatingpoint elements from a and b using control mask k, and store the results in dst. 
_mm256_mask_blend_ps^{⚠}  Experimentalavx512f,avx512vl Blend packed singleprecision (32bit) floatingpoint elements from a and b using control mask k, and store the results in dst. 
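The blend entries above do not spell out which input a set mask bit selects. In this scalar sketch, a set bit takes the lane from b and a clear bit takes it from a. That reading follows Intel's description of these operations and should be treated as an assumption of the example. The helper name is hypothetical.

```rust
/// Scalar model of a mask blend over eight 32-bit lanes: lane i comes from
/// `b` when bit i of `k` is set, and from `a` otherwise.
/// Hypothetical helper, not the intrinsic.
fn mask_blend_epi32_model(k: u8, a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    let mut dst = a;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = b[i];
        }
    }
    dst
}
```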
_mm256_mask_broadcast_f32x4^{⚠}  Experimentalavx512f,avx512vl Broadcast the 4 packed singleprecision (32bit) floatingpoint elements from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_broadcast_i32x4^{⚠}  Experimentalavx512f,avx512vl Broadcast the 4 packed 32bit integers from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_broadcastb_epi8^{⚠}  Experimentalavx512bw,avx512vl Broadcast the low packed 8bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_broadcastd_epi32^{⚠}  Experimentalavx512f,avx512vl Broadcast the low packed 32bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_broadcastq_epi64^{⚠}  Experimentalavx512f,avx512vl Broadcast the low packed 64bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_broadcastsd_pd^{⚠}  Experimentalavx512f,avx512vl Broadcast the low doubleprecision (64bit) floatingpoint element from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_broadcastss_ps^{⚠}  Experimentalavx512f,avx512vl Broadcast the low singleprecision (32bit) floatingpoint element from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_broadcastw_epi16^{⚠}  Experimentalavx512bw,avx512vl Broadcast the low packed 16bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cmp_epi8_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 8bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmp_epi16_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 16bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmp_epi32_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 32bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmp_epi64_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 64bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmp_epu8_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 8bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmp_epu16_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 16bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmp_epu32_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 32bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmp_epu64_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 64bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmp_pd_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed doubleprecision (64bit) floatingpoint elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmp_ps_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed singleprecision (32bit) floatingpoint elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
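The masked compare entries above return a bitmask and not a vector: bit i is set only when the predicate holds for lane i and bit i of the zeromask k1 is also set. A minimal scalar sketch for eight signed 32bit lanes follows. The `CmpInt` enum is a hypothetical stand-in for the _MM_CMPINT_* constants, not their numeric encoding.

```rust
/// Hypothetical stand-in for the `_MM_CMPINT_*` predicates.
#[derive(Clone, Copy)]
enum CmpInt {
    Eq,
    Lt,
    Le,
    False,
    Ne,
    Nlt,
    Nle,
    True,
}

/// Scalar model of a masked integer compare over eight signed 32-bit lanes:
/// bit i of the result is set when the predicate holds for (a[i], b[i])
/// and bit i of the zeromask `k1` is set; all other bits are zero.
fn mask_cmp_epi32_mask_model(k1: u8, a: [i32; 8], b: [i32; 8], pred: CmpInt) -> u8 {
    let mut k = 0u8;
    for i in 0..8 {
        let hit = match pred {
            CmpInt::Eq => a[i] == b[i],
            CmpInt::Lt => a[i] < b[i],
            CmpInt::Le => a[i] <= b[i],
            CmpInt::False => false,
            CmpInt::Ne => a[i] != b[i],
            CmpInt::Nlt => a[i] >= b[i],
            CmpInt::Nle => a[i] > b[i],
            CmpInt::True => true,
        };
        if hit && (k1 >> i) & 1 == 1 {
            k |= 1 << i;
        }
    }
    k
}
```

The dedicated cmpeq, cmpge, cmpgt and cmple entries correspond to fixing the predicate in this model; the epu variants would compare the same bit patterns as unsigned values.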
_mm256_mask_cmpeq_epi8_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 8bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpeq_epi16_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 16bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpeq_epi32_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed 32bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpeq_epi64_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed 64bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpeq_epu8_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 8bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpeq_epu16_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 16bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpeq_epu32_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 32bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpeq_epu64_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 64bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpge_epi8_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 8bit integers in a and b for greaterthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpge_epi16_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 16bit integers in a and b for greaterthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpge_epi32_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 32bit integers in a and b for greaterthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpge_epi64_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 64bit integers in a and b for greaterthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpge_epu8_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 8bit integers in a and b for greaterthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpge_epu16_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 16bit integers in a and b for greaterthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpge_epu32_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 32bit integers in a and b for greaterthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpge_epu64_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 64bit integers in a and b for greaterthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpgt_epi8_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 8bit integers in a and b for greaterthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpgt_epi16_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 16bit integers in a and b for greaterthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpgt_epi32_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 32bit integers in a and b for greaterthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpgt_epi64_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 64bit integers in a and b for greaterthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpgt_epu8_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 8bit integers in a and b for greaterthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpgt_epu16_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 16bit integers in a and b for greaterthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpgt_epu32_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 32bit integers in a and b for greaterthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpgt_epu64_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 64bit integers in a and b for greaterthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmple_epi8_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 8bit integers in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmple_epi16_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 16bit integers in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmple_epi32_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 32bit integers in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmple_epi64_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 64bit integers in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmple_epu8_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 8bit integers in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmple_epu16_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 16bit integers in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmple_epu32_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 32bit integers in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmple_epu64_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 64bit integers in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmplt_epi8_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 8bit integers in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmplt_epi16_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 16bit integers in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmplt_epi32_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 32bit integers in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmplt_epi64_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 64bit integers in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmplt_epu8_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 8bit integers in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmplt_epu16_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 16bit integers in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmplt_epu32_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 32bit integers in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmplt_epu64_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 64bit integers in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpneq_epi8_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 8bit integers in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpneq_epi16_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 16bit integers in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpneq_epi32_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 32bit integers in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpneq_epi64_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 64bit integers in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpneq_epu8_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 8bit integers in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpneq_epu16_mask^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 16bit integers in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpneq_epu32_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 32bit integers in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_cmpneq_epu64_mask^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 64bit integers in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_mask_compress_epi8^{⚠}  Experimentalavx512vbmi2,avx512vl Contiguously store the active 8bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src. 
_mm256_mask_compress_epi16^{⚠}  Experimentalavx512vbmi2,avx512vl Contiguously store the active 16bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src. 
_mm256_mask_compress_epi32^{⚠}  Experimentalavx512f,avx512vl Contiguously store the active 32bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src. 
_mm256_mask_compress_epi64^{⚠}  Experimentalavx512f,avx512vl Contiguously store the active 64bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src. 
_mm256_mask_compress_pd^{⚠}  Experimentalavx512f,avx512vl Contiguously store the active doubleprecision (64bit) floatingpoint elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src. 
_mm256_mask_compress_ps^{⚠}  Experimentalavx512f,avx512vl Contiguously store the active singleprecision (32bit) floatingpoint elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src. 
_mm256_mask_conflict_epi32^{⚠}  Experimentalavx512cd,avx512vl Test each 32bit element of a for equality with all other elements in a closer to the least significant bit using writemask k (elements are copied from src when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst. 
_mm256_mask_conflict_epi64^{⚠}  Experimentalavx512cd,avx512vl Test each 64bit element of a for equality with all other elements in a closer to the least significant bit using writemask k (elements are copied from src when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst. 
_mm256_mask_cvt_roundps_ph^{⚠}  Experimentalavx512f,avx512vl Convert packed singleprecision (32bit) floatingpoint elements in a to packed halfprecision (16bit) floatingpoint elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepi8_epi16^{⚠}  Experimentalavx512bw,avx512vl Sign extend packed 8bit integers in a to packed 16bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepi8_epi32^{⚠}  Experimentalavx512f,avx512vl Sign extend packed 8bit integers in a to packed 32bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepi8_epi64^{⚠}  Experimentalavx512f,avx512vl Sign extend packed 8bit integers in the low 4 bytes of a to packed 64bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepi16_epi8^{⚠}  Experimentalavx512bw,avx512vl Convert packed 16bit integers in a to packed 8bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepi16_epi32^{⚠}  Experimentalavx512f,avx512vl Sign extend packed 16bit integers in a to packed 32bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepi16_epi64^{⚠}  Experimentalavx512f,avx512vl Sign extend packed 16bit integers in a to packed 64bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepi16_storeu_epi8^{⚠}  Experimentalavx512bw,avx512vl Convert packed 16bit integers in a to packed 8bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_cvtepi32_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed 32bit integers in a to packed 8bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepi32_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed 32bit integers in a to packed 16bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepi32_epi64^{⚠}  Experimentalavx512f,avx512vl Sign extend packed 32bit integers in a to packed 64bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepi32_pd^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 32bit integers in a to packed doubleprecision (64bit) floatingpoint elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepi32_ps^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 32bit integers in a to packed singleprecision (32bit) floatingpoint elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepi32_storeu_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed 32bit integers in a to packed 8bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_cvtepi32_storeu_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed 32bit integers in a to packed 16bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_cvtepi64_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed 64bit integers in a to packed 8bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepi64_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed 64bit integers in a to packed 16bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepi64_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed 64bit integers in a to packed 32bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepi64_storeu_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed 64bit integers in a to packed 8bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_cvtepi64_storeu_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed 64bit integers in a to packed 16bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_cvtepi64_storeu_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed 64bit integers in a to packed 32bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_cvtepu8_epi16^{⚠}  Experimentalavx512bw,avx512vl Zero extend packed unsigned 8bit integers in a to packed 16bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepu8_epi32^{⚠}  Experimentalavx512f,avx512vl Zero extend packed unsigned 8bit integers in the low 8 bytes of a to packed 32bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepu8_epi64^{⚠}  Experimentalavx512f,avx512vl Zero extend packed unsigned 8bit integers in the low 4 bytes of a to packed 64bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepu16_epi32^{⚠}  Experimentalavx512f,avx512vl Zero extend packed unsigned 16bit integers in a to packed 32bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepu16_epi64^{⚠}  Experimentalavx512f,avx512vl Zero extend packed unsigned 16bit integers in the low 8 bytes of a to packed 64bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepu32_epi64^{⚠}  Experimentalavx512f,avx512vl Zero extend packed unsigned 32bit integers in a to packed 64bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtepu32_pd^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 32bit integers in a to packed doubleprecision (64bit) floatingpoint elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtne2ps_pbh^{⚠}  Experimentalavx512bf16,avx512vl Convert packed singleprecision (32bit) floatingpoint elements in two vectors a and b to packed BF16 (16bit) floatingpoint elements, and store the results in single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation 
_mm256_mask_cvtneps_pbh^{⚠}  Experimentalavx512bf16,avx512vl Convert packed singleprecision (32bit) floatingpoint elements in a to packed BF16 (16bit) floatingpoint elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation 
_mm256_mask_cvtpd_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed doubleprecision (64bit) floatingpoint elements in a to packed 32bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtpd_epu32^{⚠}  Experimentalavx512f,avx512vl Convert packed doubleprecision (64bit) floatingpoint elements in a to packed unsigned 32bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtpd_ps^{⚠}  Experimentalavx512f,avx512vl Convert packed doubleprecision (64bit) floatingpoint elements in a to packed singleprecision (32bit) floatingpoint elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtph_ps^{⚠}  Experimentalavx512f,avx512vl Convert packed halfprecision (16bit) floatingpoint elements in a to packed singleprecision (32bit) floatingpoint elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtps_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed singleprecision (32bit) floatingpoint elements in a to packed 32bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtps_epu32^{⚠}  Experimentalavx512f,avx512vl Convert packed singleprecision (32bit) floatingpoint elements in a to packed unsigned 32bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtps_ph^{⚠}  Experimentalavx512f,avx512vl Convert packed singleprecision (32bit) floatingpoint elements in a to packed halfprecision (16bit) floatingpoint elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtsepi16_epi8^{⚠}  Experimentalavx512bw,avx512vl Convert packed signed 16bit integers in a to packed 8bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtsepi16_storeu_epi8^{⚠}  Experimentalavx512bw,avx512vl Convert packed signed 16bit integers in a to packed 8bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_cvtsepi32_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 32bit integers in a to packed 8bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtsepi32_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 32bit integers in a to packed 16bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtsepi32_storeu_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 32bit integers in a to packed 8bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_cvtsepi32_storeu_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 32bit integers in a to packed 16bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_cvtsepi64_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 64bit integers in a to packed 8bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtsepi64_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 64bit integers in a to packed 16bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtsepi64_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 64bit integers in a to packed 32bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtsepi64_storeu_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 64bit integers in a to packed 8bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_cvtsepi64_storeu_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 64bit integers in a to packed 16bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_cvtsepi64_storeu_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 64bit integers in a to packed 32bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_cvttpd_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed doubleprecision (64bit) floatingpoint elements in a to packed 32bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvttpd_epu32^{⚠}  Experimentalavx512f,avx512vl Convert packed doubleprecision (64bit) floatingpoint elements in a to packed unsigned 32bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvttps_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed singleprecision (32bit) floatingpoint elements in a to packed 32bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvttps_epu32^{⚠}  Experimentalavx512f,avx512vl Convert packed singleprecision (32bit) floatingpoint elements in a to packed unsigned 32bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtusepi16_epi8^{⚠}  Experimentalavx512bw,avx512vl Convert packed unsigned 16bit integers in a to packed unsigned 8bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtusepi16_storeu_epi8^{⚠}  Experimentalavx512bw,avx512vl Convert packed unsigned 16bit integers in a to packed unsigned 8bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_cvtusepi32_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 32bit integers in a to packed unsigned 8bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtusepi32_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 32bit integers in a to packed unsigned 16bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtusepi32_storeu_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 32bit integers in a to packed 8bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_cvtusepi32_storeu_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 32bit integers in a to packed unsigned 16bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_cvtusepi64_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 64bit integers in a to packed unsigned 8bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtusepi64_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 64bit integers in a to packed unsigned 16bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtusepi64_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 64bit integers in a to packed unsigned 32bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_cvtusepi64_storeu_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 64bit integers in a to packed 8bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_cvtusepi64_storeu_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 64bit integers in a to packed 16bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_cvtusepi64_storeu_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 64bit integers in a to packed 32bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr. 
_mm256_mask_dbsad_epu8^{⚠}  Experimentalavx512bw,avx512vl Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8bit integers in a compared to those in b, and store the 16bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Four SADs are performed on four 8bit quadruplets for each 64bit lane. The first two SADs use the lower 8bit quadruplet of the lane from a, and the last two SADs use the upper 8bit quadruplet of the lane from a. Quadruplets from b are selected from within 128bit lanes according to the control in imm8, and each SAD in each 64bit lane uses the selected quadruplet at 8bit offsets. 
_mm256_mask_div_pd^{⚠}  Experimentalavx512f,avx512vl Divide packed doubleprecision (64bit) floatingpoint elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_div_ps^{⚠}  Experimentalavx512f,avx512vl Divide packed singleprecision (32bit) floatingpoint elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_dpbf16_ps^{⚠}  Experimentalavx512bf16,avx512vl Compute dotproduct of BF16 (16bit) floatingpoint pairs in a and b, accumulating the intermediate singleprecision (32bit) floatingpoint elements with elements in src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation 
_mm256_mask_dpbusd_epi32^{⚠}  Experimentalavx512vnni,avx512vl Multiply groups of 4 adjacent pairs of unsigned 8bit integers in a with corresponding signed 8bit integers in b, producing 4 intermediate signed 16bit results. Sum these 4 results with the corresponding 32bit integer in src, and store the packed 32bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_dpbusds_epi32^{⚠}  Experimentalavx512vnni,avx512vl Multiply groups of 4 adjacent pairs of unsigned 8bit integers in a with corresponding signed 8bit integers in b, producing 4 intermediate signed 16bit results. Sum these 4 results with the corresponding 32bit integer in src using signed saturation, and store the packed 32bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_dpwssd_epi32^{⚠}  Experimentalavx512vnni,avx512vl Multiply groups of 2 adjacent pairs of signed 16bit integers in a with corresponding 16bit integers in b, producing 2 intermediate signed 32bit results. Sum these 2 results with the corresponding 32bit integer in src, and store the packed 32bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_dpwssds_epi32^{⚠}  Experimentalavx512vnni,avx512vl Multiply groups of 2 adjacent pairs of signed 16bit integers in a with corresponding 16bit integers in b, producing 2 intermediate signed 32bit results. Sum these 2 results with the corresponding 32bit integer in src using signed saturation, and store the packed 32bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_expand_epi8^{⚠}  Experimentalavx512vbmi2,avx512vl Load contiguous active 8bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_expand_epi16^{⚠}  Experimentalavx512vbmi2,avx512vl Load contiguous active 16bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_expand_epi32^{⚠}  Experimentalavx512f,avx512vl Load contiguous active 32bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_expand_epi64^{⚠}  Experimentalavx512f,avx512vl Load contiguous active 64bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_expand_pd^{⚠}  Experimentalavx512f,avx512vl Load contiguous active doubleprecision (64bit) floatingpoint elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_expand_ps^{⚠}  Experimentalavx512f,avx512vl Load contiguous active singleprecision (32bit) floatingpoint elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_extractf32x4_ps^{⚠}  Experimentalavx512f,avx512vl Extract 128 bits (composed of 4 packed singleprecision (32bit) floatingpoint elements) from a, selected with imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_extracti32x4_epi32^{⚠}  Experimentalavx512f,avx512vl Extract 128 bits (composed of 4 packed 32bit integers) from a, selected with IMM1, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_fixupimm_pd^{⚠}  Experimentalavx512f,avx512vl Fix up packed doubleprecision (64bit) floatingpoint elements in a and b using packed 64bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting. 
_mm256_mask_fixupimm_ps^{⚠}  Experimentalavx512f,avx512vl Fix up packed singleprecision (32bit) floatingpoint elements in a and b using packed 32bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting. 
_mm256_mask_fmadd_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_fmadd_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_fmaddsub_pd^{⚠}  Experimental avx512f,avx512vl Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_fmaddsub_ps^{⚠}  Experimental avx512f,avx512vl Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_fmsub_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_fmsub_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_fmsubadd_pd^{⚠}  Experimental avx512f,avx512vl Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_fmsubadd_ps^{⚠}  Experimental avx512f,avx512vl Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_fnmadd_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_fnmadd_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_fnmsub_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_fnmsub_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
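For the fused multiply-add family above, inactive lanes are copied from a rather than from a separate src operand. The following scalar sketch models that on four `f64` lanes. The function names and array types are illustrative assumptions, and the even/odd convention in the `fmaddsub` model (subtract in even-indexed lanes, add in odd-indexed lanes) follows the usual x86 definition rather than anything stated in the entries themselves.

```rust
/// Scalar model (hypothetical helper) of a masked fused multiply-add on four lanes.
fn mask_fmadd_pd_model(a: [f64; 4], k: u8, b: [f64; 4], c: [f64; 4]) -> [f64; 4] {
    // Lanes with a clear mask bit keep the value from `a`.
    let mut dst = a;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            // mul_add computes a*b + c with a single rounding step.
            dst[i] = a[i].mul_add(b[i], c[i]);
        }
    }
    dst
}

/// Scalar model (hypothetical helper) of the masked "fmaddsub" variant.
/// Assumption: even-indexed lanes subtract c, odd-indexed lanes add c.
fn mask_fmaddsub_pd_model(a: [f64; 4], k: u8, b: [f64; 4], c: [f64; 4]) -> [f64; 4] {
    let mut dst = a;
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            let addend = if i % 2 == 0 { -c[i] } else { c[i] };
            dst[i] = a[i].mul_add(b[i], addend);
        }
    }
    dst
}
```

The `fmsub`, `fnmadd` and `fnmsub` entries differ only in the signs applied to the product and to c.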
_mm256_mask_getexp_pd^{⚠}  Experimentalavx512f,avx512vl Convert the exponent of each packed doubleprecision (64bit) floatingpoint element in a to a doubleprecision (64bit) floatingpoint number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. 
_mm256_mask_getexp_ps^{⚠}  Experimentalavx512f,avx512vl Convert the exponent of each packed singleprecision (32bit) floatingpoint element in a to a singleprecision (32bit) floatingpoint number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. 
_mm256_mask_getmant_pd^{⚠}  Experimentalavx512f,avx512vl Normalize the mantissas of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. 
_mm256_mask_getmant_ps^{⚠}  Experimentalavx512f,avx512vl Normalize the mantissas of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. 
_mm256_mask_gf2p8affine_epi64_epi8^{⚠}  Experimental avx512gfni,avx512bw,avx512vl Perform an affine transformation on the packed bytes in x. That is, compute a*x+b over the Galois field GF(2^8) for each packed byte, where a is an 8x8 bit matrix and b is a constant 8-bit immediate value. Each group of 8 bytes in x is paired with the 64-bit word at the same position in a. The results are stored using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_gf2p8affineinv_epi64_epi8^{⚠}  Experimental avx512gfni,avx512bw,avx512vl Perform an affine transformation on the inverted packed bytes in x. That is, compute a*inv(x)+b over the Galois field GF(2^8) for each packed byte, where a is an 8x8 bit matrix and b is a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each group of 8 bytes in x is paired with the 64-bit word at the same position in a. The results are stored using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_gf2p8mul_epi8^{⚠}  Experimental avx512gfni,avx512bw,avx512vl Perform a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1. The results are stored using writemask k (elements are copied from src when the corresponding mask bit is not set). 
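The GF(2^8) multiplication referred to above can be sketched per byte with the classic shift-and-xor method, reducing by x^8 + x^4 + x^3 + x + 1 (the low byte of that polynomial is 0x1B). The helper names below are hypothetical, and `[u8; 32]` with a `u32` mask stands in for the 256-bit vector and its 32-bit writemask; this is a scalar illustration, not how the intrinsic is implemented.

```rust
/// Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1
/// (hypothetical scalar helper).
fn gf2p8mul_byte(mut a: u8, mut b: u8) -> u8 {
    let mut product = 0u8;
    for _ in 0..8 {
        // Add (xor) the current multiple of `a` if the low bit of `b` is set.
        if b & 1 == 1 {
            product ^= a;
        }
        // Multiply `a` by x, reducing when the x^8 term would appear.
        let carry = a & 0x80 != 0;
        a <<= 1;
        if carry {
            a ^= 0x1B;
        }
        b >>= 1;
    }
    product
}

/// Scalar model (hypothetical helper) of the masked byte-wise multiply:
/// lanes with a clear mask bit are copied from `src`.
fn mask_gf2p8mul_epi8_model(src: [u8; 32], k: u32, a: [u8; 32], b: [u8; 32]) -> [u8; 32] {
    let mut dst = src;
    for i in 0..32 {
        if (k >> i) & 1 == 1 {
            dst[i] = gf2p8mul_byte(a[i], b[i]);
        }
    }
    dst
}
```

This is the same field used by AES, so the well-known product 0x57 * 0x83 = 0xC1 is a convenient sanity check.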
_mm256_mask_insertf32x4^{⚠}  Experimentalavx512f,avx512vl Copy a to tmp, then insert 128 bits (composed of 4 packed singleprecision (32bit) floatingpoint elements) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_inserti32x4^{⚠}  Experimentalavx512f,avx512vl Copy a to tmp, then insert 128 bits (composed of 4 packed 32bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_lzcnt_epi32^{⚠}  Experimental avx512cd,avx512vl Count the number of leading zero bits in each packed 32-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_lzcnt_epi64^{⚠}  Experimental avx512cd,avx512vl Count the number of leading zero bits in each packed 64-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_madd_epi16^{⚠}  Experimentalavx512bw,avx512vl Multiply packed signed 16bit integers in a and b, producing intermediate signed 32bit integers. Horizontally add adjacent pairs of intermediate 32bit integers, and pack the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_maddubs_epi16^{⚠}  Experimentalavx512bw,avx512vl Multiply packed unsigned 8bit integers in a by packed signed 8bit integers in b, producing intermediate signed 16bit integers. Horizontally add adjacent pairs of intermediate signed 16bit integers, and pack the saturated results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_max_epi8^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 8bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_max_epi16^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 16bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_max_epi32^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 32bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_max_epi64^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 64bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_max_epu8^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 8bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_max_epu16^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 16bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_max_epu32^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 32bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_max_epu64^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 64bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_max_pd^{⚠}  Experimentalavx512f,avx512vl Compare packed doubleprecision (64bit) floatingpoint elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_max_ps^{⚠}  Experimentalavx512f,avx512vl Compare packed singleprecision (32bit) floatingpoint elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_min_epi8^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 8bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_min_epi16^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 16bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_min_epi32^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 32bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_min_epi64^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 64bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_min_epu8^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 8bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_min_epu16^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 16bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_min_epu32^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 32bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_min_epu64^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 64bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_min_pd^{⚠}  Experimentalavx512f,avx512vl Compare packed doubleprecision (64bit) floatingpoint elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_min_ps^{⚠}  Experimentalavx512f,avx512vl Compare packed singleprecision (32bit) floatingpoint elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_mov_epi8^{⚠}  Experimentalavx512bw,avx512vl Move packed 8bit integers from a into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_mov_epi16^{⚠}  Experimentalavx512bw,avx512vl Move packed 16bit integers from a into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_mov_epi32^{⚠}  Experimentalavx512f,avx512vl Move packed 32bit integers from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_mov_epi64^{⚠}  Experimentalavx512f,avx512vl Move packed 64bit integers from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_mov_pd^{⚠}  Experimentalavx512f,avx512vl Move packed doubleprecision (64bit) floatingpoint elements from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_mov_ps^{⚠}  Experimentalavx512f,avx512vl Move packed singleprecision (32bit) floatingpoint elements from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_movedup_pd^{⚠}  Experimentalavx512f,avx512vl Duplicate evenindexed doubleprecision (64bit) floatingpoint elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_movehdup_ps^{⚠}  Experimentalavx512f,avx512vl Duplicate oddindexed singleprecision (32bit) floatingpoint elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_moveldup_ps^{⚠}  Experimentalavx512f,avx512vl Duplicate evenindexed singleprecision (32bit) floatingpoint elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_mul_epi32^{⚠}  Experimentalavx512f,avx512vl Multiply the low signed 32bit integers from each packed 64bit element in a and b, and store the signed 64bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_mul_epu32^{⚠}  Experimentalavx512f,avx512vl Multiply the low unsigned 32bit integers from each packed 64bit element in a and b, and store the unsigned 64bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_mul_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_mul_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_mulhi_epi16^{⚠}  Experimentalavx512bw,avx512vl Multiply the packed signed 16bit integers in a and b, producing intermediate 32bit integers, and store the high 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_mulhi_epu16^{⚠}  Experimentalavx512bw,avx512vl Multiply the packed unsigned 16bit integers in a and b, producing intermediate 32bit integers, and store the high 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_mulhrs_epi16^{⚠}  Experimentalavx512bw,avx512vl Multiply packed signed 16bit integers in a and b, producing intermediate signed 32bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
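The "truncate to the 18 most significant bits, round by adding 1, and store bits [16:1]" step in the mulhrs entry above amounts to a rounded Q15 fixed-point multiply. A per-lane scalar sketch, with a hypothetical helper name, might look like this:

```rust
/// Scalar model (hypothetical helper) of the per-lane mulhrs computation.
fn mulhrs_i16_model(a: i16, b: i16) -> i16 {
    // Full signed 32-bit product of the two 16-bit inputs.
    let product = a as i32 * b as i32;
    // Keep the top 18 bits (shift right by 14), add 1 to round,
    // then drop the lowest bit so that bits [16:1] remain.
    (((product >> 14) + 1) >> 1) as i16
}
```

Treating the inputs as Q15 fractions, 0.5 * 0.5 (16384 * 16384) yields 8192, which is 0.25.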
_mm256_mask_mullo_epi16^{⚠}  Experimentalavx512bw,avx512vl Multiply the packed 16bit integers in a and b, producing intermediate 32bit integers, and store the low 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_mullo_epi32^{⚠}  Experimentalavx512f,avx512vl Multiply the packed 32bit integers in a and b, producing intermediate 64bit integers, and store the low 32 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_multishift_epi64_epi8^{⚠}  Experimentalavx512vbmi,avx512vl For each 64bit element in b, select 8 unaligned bytes using a bytegranular shift control within the corresponding 64bit element of a, and store the 8 assembled bytes to the corresponding 64bit element of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_or_epi32^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise OR of packed 32bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_or_epi64^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise OR of packed 64bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_packs_epi16^{⚠}  Experimentalavx512bw,avx512vl Convert packed signed 16bit integers from a and b to packed 8bit integers using signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_packs_epi32^{⚠}  Experimentalavx512bw,avx512vl Convert packed signed 32bit integers from a and b to packed 16bit integers using signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_packus_epi16^{⚠}  Experimentalavx512bw,avx512vl Convert packed signed 16bit integers from a and b to packed 8bit integers using unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_packus_epi32^{⚠}  Experimentalavx512bw,avx512vl Convert packed signed 32bit integers from a and b to packed 16bit integers using unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_permute_pd^{⚠}  Experimentalavx512f,avx512vl Shuffle doubleprecision (64bit) floatingpoint elements in a within 128bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_permute_ps^{⚠}  Experimentalavx512f,avx512vl Shuffle singleprecision (32bit) floatingpoint elements in a within 128bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_permutevar_pd^{⚠}  Experimentalavx512f,avx512vl Shuffle doubleprecision (64bit) floatingpoint elements in a within 128bit lanes using the control in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_permutevar_ps^{⚠}  Experimentalavx512f,avx512vl Shuffle singleprecision (32bit) floatingpoint elements in a within 128bit lanes using the control in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_permutex2var_epi8^{⚠}  Experimentalavx512vbmi,avx512vl Shuffle 8bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_permutex2var_epi16^{⚠}  Experimentalavx512bw,avx512vl Shuffle 16bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_permutex2var_epi32^{⚠}  Experimentalavx512f,avx512vl Shuffle 32bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_permutex2var_epi64^{⚠}  Experimentalavx512f,avx512vl Shuffle 64bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_permutex2var_pd^{⚠}  Experimentalavx512f,avx512vl Shuffle doubleprecision (64bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_permutex2var_ps^{⚠}  Experimentalavx512f,avx512vl Shuffle singleprecision (32bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
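The "selector and index in idx" wording in the permutex2var entries above can be made concrete with a scalar model for eight 32-bit lanes. In this sketch the low three bits of each idx element are taken as the lane index and bit 3 as the selector between a and b; that bit layout is an assumption based on the usual x86 definition, and the function name and array types are hypothetical.

```rust
/// Scalar model (hypothetical helper) of a masked two-source permute on
/// eight 32-bit lanes.
fn mask_permutex2var_epi32_model(a: [i32; 8], k: u8, idx: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    // Lanes with a clear mask bit keep the value from `a`.
    let mut dst = a;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Assumption: bits [2:0] pick the lane, bit 3 picks the source.
            let offset = (idx[i] & 0b0111) as usize;
            let from_b = idx[i] & 0b1000 != 0;
            dst[i] = if from_b { b[offset] } else { a[offset] };
        }
    }
    dst
}
```

Because inactive lanes fall back to a, the mask form can be read as "overwrite selected lanes of a with a shuffle of a and b".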
_mm256_mask_permutex_epi64^{⚠}  Experimentalavx512f,avx512vl Shuffle 64bit integers in a within 256bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_permutex_pd^{⚠}  Experimentalavx512f,avx512vl Shuffle doubleprecision (64bit) floatingpoint elements in a within 256bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_permutexvar_epi8^{⚠}  Experimentalavx512vbmi,avx512vl Shuffle 8bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_permutexvar_epi16^{⚠}  Experimentalavx512bw,avx512vl Shuffle 16bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_permutexvar_epi32^{⚠}  Experimentalavx512f,avx512vl Shuffle 32bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_permutexvar_epi64^{⚠}  Experimentalavx512f,avx512vl Shuffle 64bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_permutexvar_pd^{⚠}  Experimentalavx512f,avx512vl Shuffle doubleprecision (64bit) floatingpoint elements in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_permutexvar_ps^{⚠}  Experimentalavx512f,avx512vl Shuffle singleprecision (32bit) floatingpoint elements in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
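_mm256_mask_popcnt_epi8^{⚠}  Experimental avx512bitalg,avx512vl Count the number of logical 1 bits in each packed 8-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_popcnt_epi16^{⚠}  Experimental avx512bitalg,avx512vl Count the number of logical 1 bits in each packed 16-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_popcnt_epi32^{⚠}  Experimental avx512vpopcntdq,avx512vl Count the number of logical 1 bits in each packed 32-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_popcnt_epi64^{⚠}  Experimental avx512vpopcntdq,avx512vl Count the number of logical 1 bits in each packed 64-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 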
_mm256_mask_rcp14_pd^{⚠}  Experimental avx512f,avx512vl Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. 
_mm256_mask_rcp14_ps^{⚠}  Experimental avx512f,avx512vl Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. 
_mm256_mask_rol_epi32^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 32bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_rol_epi64^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 64bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_rolv_epi32^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 32bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_rolv_epi64^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 64bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_ror_epi32^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 32bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_ror_epi64^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 64bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_rorv_epi32^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 32bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_rorv_epi64^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 64bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
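The variable rotate entries above combine a per-lane rotate with the usual writemask rule. A scalar sketch for the 32-bit left-rotate case follows; the function name and array types are hypothetical, and reducing the count modulo 32 is an assumption about how out-of-range counts behave.

```rust
/// Scalar model (hypothetical helper) of a masked per-lane left rotate on
/// eight 32-bit lanes.
fn mask_rolv_epi32_model(src: [u32; 8], k: u8, a: [u32; 8], b: [u32; 8]) -> [u32; 8] {
    // Lanes with a clear mask bit keep the value from `src`.
    let mut dst = src;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Assumption: the rotate count is taken modulo the lane width.
            dst[i] = a[i].rotate_left(b[i] % 32);
        }
    }
    dst
}
```

The right-rotate entries can be modelled the same way with `rotate_right`, and the `rol`/`ror` forms use one immediate count for every lane instead of a per-lane count from b.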
_mm256_mask_roundscale_pd^{⚠}  Experimentalavx512f,avx512vl Round packed doubleprecision (64bit) floatingpoint elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_roundscale_ps^{⚠}  Experimentalavx512f,avx512vl Round packed singleprecision (32bit) floatingpoint elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_rsqrt14_pd^{⚠}  Experimental avx512f,avx512vl Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. 
_mm256_mask_rsqrt14_ps^{⚠}  Experimental avx512f,avx512vl Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. 
_mm256_mask_scalef_pd^{⚠}  Experimentalavx512f,avx512vl Scale the packed doubleprecision (64bit) floatingpoint elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_scalef_ps^{⚠}  Experimentalavx512f,avx512vl Scale the packed singleprecision (32bit) floatingpoint elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_set1_epi8^{⚠}  Experimentalavx512bw,avx512vl Broadcast 8bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_set1_epi16^{⚠}  Experimentalavx512bw,avx512vl Broadcast 16bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_set1_epi32^{⚠}  Experimentalavx512f,avx512vl Broadcast 32bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_set1_epi64^{⚠}  Experimentalavx512f,avx512vl Broadcast 64bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_shldi_epi16^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 16bit integers in a and b producing an intermediate 32bit result. Shift the result left by imm8 bits, and store the upper 16bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_shldi_epi32^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 32bit integers in a and b producing an intermediate 64bit result. Shift the result left by imm8 bits, and store the upper 32bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_shldi_epi64^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 64bit integers in a and b producing an intermediate 128bit result. Shift the result left by imm8 bits, and store the upper 64bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_shldv_epi16^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 16bit integers in a and b producing an intermediate 32bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_shldv_epi32^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 32bit integers in a and b producing an intermediate 64bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_shldv_epi64^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 64bit integers in a and b producing an intermediate 128bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_shrdi_epi16^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 16bit integers in b and a producing an intermediate 32bit result. Shift the result right by imm8 bits, and store the lower 16bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_shrdi_epi32^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 32bit integers in b and a producing an intermediate 64bit result. Shift the result right by imm8 bits, and store the lower 32bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_shrdi_epi64^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 64bit integers in b and a producing an intermediate 128bit result. Shift the result right by imm8 bits, and store the lower 64bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_shrdv_epi16^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 16bit integers in b and a producing an intermediate 32bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_shrdv_epi32^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 32bit integers in b and a producing an intermediate 64bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
_mm256_mask_shrdv_epi64^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 64bit integers in b and a producing an intermediate 128bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). 
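The concatenate-and-shift entries above (`shldi`/`shldv` and `shrdi`/`shrdv`) may be easier to follow as a scalar model of a single lane. The sketch below models one 16-bit lane in plain Rust and does not call the intrinsics. The helper names `shld16` and `shrd16` are hypothetical, and reducing the shift count modulo 16 is an assumption that the entries above do not state.

```rust
/// Scalar model of one 16-bit lane of a "shift left double" (shldi/shldv).
/// `a` forms the upper half and `b` the lower half of the 32-bit intermediate.
fn shld16(a: u16, b: u16, count: u32) -> u16 {
    let wide = ((a as u32) << 16) | (b as u32);
    // Assumption: the count is reduced modulo the lane width (16).
    let shifted = wide << (count & 15);
    // Keep the upper 16 bits of the shifted intermediate.
    (shifted >> 16) as u16
}

/// Scalar model of one 16-bit lane of a "shift right double" (shrdi/shrdv).
/// Here `b` forms the upper half and `a` the lower half of the intermediate.
fn shrd16(a: u16, b: u16, count: u32) -> u16 {
    let wide = ((b as u32) << 16) | (a as u32);
    // Assumption: the count is reduced modulo the lane width (16).
    let shifted = wide >> (count & 15);
    // Keep the lower 16 bits of the shifted intermediate.
    shifted as u16
}
```

With `a = 0x1234`, `b = 0xABCD` and a count of 4, the left variant pulls the top nibble of `b` into the low end of `a`, and the right variant pulls the low nibble of `b` into the top of `a`. The masked forms then apply the writemask per lane on top of this result.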
_mm256_mask_shuffle_epi8^{⚠}  Experimentalavx512bw,avx512vl Shuffle 8bit integers in a within 128bit lanes using the control in the corresponding 8bit element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_shuffle_epi32^{⚠}  Experimentalavx512f,avx512vl Shuffle 32bit integers in a within 128bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_shuffle_f32x4^{⚠}  Experimentalavx512f,avx512vl Shuffle 128bits (composed of 4 singleprecision (32bit) floatingpoint elements) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_shuffle_f64x2^{⚠}  Experimentalavx512f,avx512vl Shuffle 128bits (composed of 2 doubleprecision (64bit) floatingpoint elements) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_shuffle_i32x4^{⚠}  Experimentalavx512f,avx512vl Shuffle 128bits (composed of 4 32bit integers) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_shuffle_i64x2^{⚠}  Experimentalavx512f,avx512vl Shuffle 128bits (composed of 2 64bit integers) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_shuffle_pd^{⚠}  Experimentalavx512f,avx512vl Shuffle doubleprecision (64bit) floatingpoint elements within 128bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_shuffle_ps^{⚠}  Experimentalavx512f,avx512vl Shuffle singleprecision (32bit) floatingpoint elements in a within 128bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_shufflehi_epi16^{⚠}  Experimentalavx512bw,avx512vl Shuffle 16bit integers in the high 64 bits of 128bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128bit lanes of dst, with the low 64 bits of 128bit lanes being copied from a to dst, using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_shufflelo_epi16^{⚠}  Experimentalavx512bw,avx512vl Shuffle 16bit integers in the low 64 bits of 128bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128bit lanes of dst, with the high 64 bits of 128bit lanes being copied from a to dst, using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_sll_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_sll_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_sll_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_slli_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_slli_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_slli_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_sllv_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_sllv_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_sllv_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_sqrt_pd^{⚠}  Experimentalavx512f,avx512vl Compute the square root of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_sqrt_ps^{⚠}  Experimentalavx512f,avx512vl Compute the square root of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_sra_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_sra_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_sra_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_srai_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_srai_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_srai_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_srav_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_srav_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_srav_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_srl_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_srl_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_srl_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_srli_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_srli_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_srli_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_srlv_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_srlv_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_srlv_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_sub_epi8^{⚠}  Experimentalavx512bw,avx512vl Subtract packed 8bit integers in b from packed 8bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_sub_epi16^{⚠}  Experimentalavx512bw,avx512vl Subtract packed 16bit integers in b from packed 16bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_sub_epi32^{⚠}  Experimentalavx512f,avx512vl Subtract packed 32bit integers in b from packed 32bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_sub_epi64^{⚠}  Experimentalavx512f,avx512vl Subtract packed 64bit integers in b from packed 64bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_sub_pd^{⚠}  Experimentalavx512f,avx512vl Subtract packed doubleprecision (64bit) floatingpoint elements in b from packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_sub_ps^{⚠}  Experimentalavx512f,avx512vl Subtract packed singleprecision (32bit) floatingpoint elements in b from packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_subs_epi8^{⚠}  Experimentalavx512bw,avx512vl Subtract packed signed 8bit integers in b from packed 8bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_subs_epi16^{⚠}  Experimentalavx512bw,avx512vl Subtract packed signed 16bit integers in b from packed 16bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_subs_epu8^{⚠}  Experimentalavx512bw,avx512vl Subtract packed unsigned 8bit integers in b from packed unsigned 8bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_subs_epu16^{⚠}  Experimentalavx512bw,avx512vl Subtract packed unsigned 16bit integers in b from packed unsigned 16bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_ternarylogic_epi32^{⚠}  Experimentalavx512f,avx512vl Bitwise ternary logic that provides the capability to implement any threeoperand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 32bit integer, the corresponding bit from src, a, and b are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using writemask k at 32bit granularity (32bit elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_ternarylogic_epi64^{⚠}  Experimentalavx512f,avx512vl Bitwise ternary logic that provides the capability to implement any threeoperand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 64bit integer, the corresponding bit from src, a, and b are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using writemask k at 64bit granularity (64bit elements are copied from src when the corresponding mask bit is not set). 
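The 3-bit index lookup described in the two ternary-logic entries can be illustrated with a scalar model. The sketch below is plain Rust over `[u32; 8]` arrays and does not call the intrinsic. The function names are hypothetical, and it assumes the bit from `src` is the most significant index bit, followed by `a`, then `b`.

```rust
/// Scalar model of the per-bit lookup for one 32-bit element.
/// Assumption: index = (src_bit << 2) | (a_bit << 1) | b_bit.
fn ternarylogic_u32(src: u32, a: u32, b: u32, imm8: u8) -> u32 {
    let mut dst = 0u32;
    for bit in 0..32 {
        let s = (src >> bit) & 1;
        let x = (a >> bit) & 1;
        let y = (b >> bit) & 1;
        let index = (s << 2) | (x << 1) | y;
        // imm8 acts as an 8-entry truth table indexed by the three input bits.
        let out = ((imm8 as u32) >> index) & 1;
        dst |= out << bit;
    }
    dst
}

/// Writemask applied at 32-bit granularity: lanes whose mask bit is clear
/// keep the value from `src`.
fn mask_ternarylogic_model(
    src: [u32; 8],
    k: u8,
    a: [u32; 8],
    b: [u32; 8],
    imm8: u8,
) -> [u32; 8] {
    let mut dst = src;
    for j in 0..8 {
        if (k >> j) & 1 == 1 {
            dst[j] = ternarylogic_u32(src[j], a[j], b[j], imm8);
        }
    }
    dst
}
```

Under this indexing, an `imm8` of `0x96` yields a three-way XOR and `0xCA` yields a bitwise select that takes bits of `a` where `src` is 1 and bits of `b` where `src` is 0.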
_mm256_mask_test_epi8_mask^{⚠}  Experimentalavx512bw,avx512vl Compute the bitwise AND of packed 8bit integers in a and b, producing intermediate 8bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is nonzero. 
_mm256_mask_test_epi16_mask^{⚠}  Experimentalavx512bw,avx512vl Compute the bitwise AND of packed 16bit integers in a and b, producing intermediate 16bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is nonzero. 
_mm256_mask_test_epi32_mask^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise AND of packed 32bit integers in a and b, producing intermediate 32bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is nonzero. 
_mm256_mask_test_epi64_mask^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise AND of packed 64bit integers in a and b, producing intermediate 64bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is nonzero. 
_mm256_mask_testn_epi8_mask^{⚠}  Experimentalavx512bw,avx512vl Compute the bitwise AND of packed 8bit integers in a and b, producing intermediate 8bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero. 
_mm256_mask_testn_epi16_mask^{⚠}  Experimentalavx512bw,avx512vl Compute the bitwise AND of packed 16bit integers in a and b, producing intermediate 16bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero. 
_mm256_mask_testn_epi32_mask^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise AND of packed 32bit integers in a and b, producing intermediate 32bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero. 
_mm256_mask_testn_epi64_mask^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise AND of packed 64bit integers in a and b, producing intermediate 64bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero. 
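The `test` and `testn` entries above produce a bit mask rather than a vector. A scalar sketch of the 32-bit forms follows, in plain Rust with hypothetical function names and an 8-bit mask for 8 lanes. It models `testn` as reporting lanes where `a AND b` is zero, which is an assumption about the underlying instruction's behaviour.

```rust
/// Scalar model: set bit j of the result when lane j of (a & b) is nonzero
/// and bit j of the incoming mask `k1` is set.
fn mask_test_epi32_model(k1: u8, a: [u32; 8], b: [u32; 8]) -> u8 {
    let mut k = 0u8;
    for j in 0..8 {
        if (k1 >> j) & 1 == 1 && (a[j] & b[j]) != 0 {
            k |= 1 << j;
        }
    }
    k
}

/// Scalar model of the negated test: set bit j when lane j of (a & b) is zero
/// and bit j of the incoming mask `k1` is set.
fn mask_testn_epi32_model(k1: u8, a: [u32; 8], b: [u32; 8]) -> u8 {
    let mut k = 0u8;
    for j in 0..8 {
        if (k1 >> j) & 1 == 1 && (a[j] & b[j]) == 0 {
            k |= 1 << j;
        }
    }
    k
}
```

For any inputs, the two results are disjoint and their union equals `k1`, since every enabled lane is either zero or nonzero after the AND.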
_mm256_mask_unpackhi_epi8^{⚠}  Experimentalavx512bw,avx512vl Unpack and interleave 8bit integers from the high half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_unpackhi_epi16^{⚠}  Experimentalavx512bw,avx512vl Unpack and interleave 16bit integers from the high half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_unpackhi_epi32^{⚠}  Experimentalavx512f,avx512vl Unpack and interleave 32bit integers from the high half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_unpackhi_epi64^{⚠}  Experimentalavx512f,avx512vl Unpack and interleave 64bit integers from the high half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_unpackhi_pd^{⚠}  Experimentalavx512f,avx512vl Unpack and interleave doubleprecision (64bit) floatingpoint elements from the high half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_unpackhi_ps^{⚠}  Experimentalavx512f,avx512vl Unpack and interleave singleprecision (32bit) floatingpoint elements from the high half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_unpacklo_epi8^{⚠}  Experimentalavx512bw,avx512vl Unpack and interleave 8bit integers from the low half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_unpacklo_epi16^{⚠}  Experimentalavx512bw,avx512vl Unpack and interleave 16bit integers from the low half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_unpacklo_epi32^{⚠}  Experimentalavx512f,avx512vl Unpack and interleave 32bit integers from the low half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_unpacklo_epi64^{⚠}  Experimentalavx512f,avx512vl Unpack and interleave 64bit integers from the low half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_unpacklo_pd^{⚠}  Experimentalavx512f,avx512vl Unpack and interleave doubleprecision (64bit) floatingpoint elements from the low half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_unpacklo_ps^{⚠}  Experimentalavx512f,avx512vl Unpack and interleave singleprecision (32bit) floatingpoint elements from the low half of each 128bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_xor_epi32^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise XOR of packed 32bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm256_mask_xor_epi64^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise XOR of packed 64bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
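The `mask_` entries up to this point use a writemask, while the `maskz_` entries that follow use a zeromask. The difference is only in what an inactive lane receives. The sketch below models both for a 32-bit add over 8 lanes in plain Rust. The function names are hypothetical, and wrapping addition is assumed for the lane arithmetic.

```rust
/// Scalar model of a writemask add: inactive lanes are copied from `src`.
fn mask_add_epi32_model(src: [i32; 8], k: u8, a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    let mut dst = src;
    for j in 0..8 {
        if (k >> j) & 1 == 1 {
            // Assumption: lane addition wraps on overflow.
            dst[j] = a[j].wrapping_add(b[j]);
        }
    }
    dst
}

/// Scalar model of a zeromask add: inactive lanes are zeroed.
/// This is the writemask form with an all-zero `src`.
fn maskz_add_epi32_model(k: u8, a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    mask_add_epi32_model([0; 8], k, a, b)
}
```

With `k = 0b0000_0101`, only lanes 0 and 2 hold sums; the other lanes hold the `src` values in the writemask form and zero in the zeromask form.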
_mm256_maskz_abs_epi8^{⚠}  Experimentalavx512bw,avx512vl Compute the absolute value of packed signed 8bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_abs_epi16^{⚠}  Experimentalavx512bw,avx512vl Compute the absolute value of packed signed 16bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_abs_epi32^{⚠}  Experimentalavx512f,avx512vl Compute the absolute value of packed signed 32bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_abs_epi64^{⚠}  Experimentalavx512f,avx512vl Compute the absolute value of packed signed 64bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_add_epi8^{⚠}  Experimentalavx512bw,avx512vl Add packed 8bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_add_epi16^{⚠}  Experimentalavx512bw,avx512vl Add packed 16bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_add_epi32^{⚠}  Experimentalavx512f,avx512vl Add packed 32bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_add_epi64^{⚠}  Experimentalavx512f,avx512vl Add packed 64bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_add_pd^{⚠}  Experimentalavx512f,avx512vl Add packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_add_ps^{⚠}  Experimentalavx512f,avx512vl Add packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_adds_epi8^{⚠}  Experimentalavx512bw,avx512vl Add packed signed 8bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_adds_epi16^{⚠}  Experimentalavx512bw,avx512vl Add packed signed 16bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_adds_epu8^{⚠}  Experimentalavx512bw,avx512vl Add packed unsigned 8bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_adds_epu16^{⚠}  Experimentalavx512bw,avx512vl Add packed unsigned 16bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_alignr_epi8^{⚠}  Experimentalavx512bw,avx512vl Concatenate pairs of 16byte blocks in a and b into a 32byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_alignr_epi32^{⚠}  Experimentalavx512f,avx512vl Concatenate a and b into a 64byte intermediate result, shift the result right by imm8 32bit elements, and store the low 32 bytes (8 elements) in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_alignr_epi64^{⚠}  Experimentalavx512f,avx512vl Concatenate a and b into a 64byte intermediate result, shift the result right by imm8 64bit elements, and store the low 32 bytes (4 elements) in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
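The element-wise `alignr` entries above can be read as indexing into a 16-element window. The sketch below models the 32-bit zeromask form in plain Rust. The function name is hypothetical, and two points are assumptions: that `a` forms the upper half of the concatenation with `b` the lower half, and that only the low three bits of the shift count are used.

```rust
/// Scalar model of a zeromasked element-wise align-right over 8 x 32-bit lanes.
fn maskz_alignr_epi32_model(k: u8, a: [i32; 8], b: [i32; 8], imm8: u32) -> [i32; 8] {
    // Assumption: only the low three bits of the count are significant.
    let shift = (imm8 & 7) as usize;
    // Assumption: b occupies elements 0..8 and a occupies elements 8..16.
    let mut wide = [0i32; 16];
    wide[..8].copy_from_slice(&b);
    wide[8..].copy_from_slice(&a);
    let mut dst = [0i32; 8];
    for j in 0..8 {
        if (k >> j) & 1 == 1 {
            // Shifting right by `shift` elements moves element j + shift to j.
            dst[j] = wide[j + shift];
        }
    }
    dst
}
```

With `b` holding 0 through 7, `a` holding 8 through 15, and a shift of 3, the unmasked result is the window 3 through 10.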
_mm256_maskz_and_epi32^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise AND of packed 32bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_and_epi64^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise AND of packed 64bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_andnot_epi32^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise NOT of packed 32bit integers in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_andnot_epi64^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise NOT of packed 64bit integers in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_avg_epu8^{⚠}  Experimentalavx512bw,avx512vl Average packed unsigned 8bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_avg_epu16^{⚠}  Experimentalavx512bw,avx512vl Average packed unsigned 16bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_broadcast_f32x4^{⚠}  Experimentalavx512f,avx512vl Broadcast the 4 packed singleprecision (32bit) floatingpoint elements from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_broadcast_i32x4^{⚠}  Experimentalavx512f,avx512vl Broadcast the 4 packed 32bit integers from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_broadcastb_epi8^{⚠}  Experimentalavx512bw,avx512vl Broadcast the low packed 8bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_broadcastd_epi32^{⚠}  Experimentalavx512f,avx512vl Broadcast the low packed 32bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_broadcastq_epi64^{⚠}  Experimentalavx512f,avx512vl Broadcast the low packed 64bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_broadcastsd_pd^{⚠}  Experimentalavx512f,avx512vl Broadcast the low doubleprecision (64bit) floatingpoint element from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_broadcastss_ps^{⚠}  Experimentalavx512f,avx512vl Broadcast the low singleprecision (32bit) floatingpoint element from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_broadcastw_epi16^{⚠}  Experimentalavx512bw,avx512vl Broadcast the low packed 16bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_compress_epi8^{⚠}  Experimentalavx512vbmi2,avx512vl Contiguously store the active 8bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero. 
_mm256_maskz_compress_epi16^{⚠}  Experimentalavx512vbmi2,avx512vl Contiguously store the active 16bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero. 
_mm256_maskz_compress_epi32^{⚠}  Experimentalavx512f,avx512vl Contiguously store the active 32bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero. 
_mm256_maskz_compress_epi64^{⚠}  Experimentalavx512f,avx512vl Contiguously store the active 64bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero. 
_mm256_maskz_compress_pd^{⚠}  Experimentalavx512f,avx512vl Contiguously store the active doubleprecision (64bit) floatingpoint elements in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero. 
_mm256_maskz_compress_ps^{⚠}  Experimentalavx512f,avx512vl Contiguously store the active singleprecision (32bit) floatingpoint elements in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero. 
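As a rough illustration of the compress entries above, the following scalar sketch packs the active lanes toward the low end of the result and zeroes the rest. `maskz_compress_epi32_model` is a hypothetical name used only for this example; it models eight 32-bit lanes and assumes bit `i` of `k` marks lane `i` as active.

```rust
/// Scalar model (hypothetical helper) of a zero-masked compress over
/// eight 32-bit lanes: active lanes are stored contiguously from lane 0.
fn maskz_compress_epi32_model(k: u8, a: [i32; 8]) -> [i32; 8] {
    let mut dst = [0i32; 8];
    let mut next = 0; // next free slot in dst
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[next] = a[i];
            next += 1;
        }
    }
    dst // slots at index >= next stay zero
}
```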
_mm256_maskz_conflict_epi32^{⚠}  Experimentalavx512cd,avx512vl Test each 32bit element of a for equality with all other elements in a closer to the least significant bit using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst. 
_mm256_maskz_conflict_epi64^{⚠}  Experimentalavx512cd,avx512vl Test each 64bit element of a for equality with all other elements in a closer to the least significant bit using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst. 
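The conflict-detection description above is easier to follow as a loop. This is a scalar sketch only, with a hypothetical function name; it assumes that for lane `j`, bit `i` of the result (for `i < j`) is set when lane `i` holds the same value, and that a clear mask bit zeroes the whole lane.

```rust
/// Scalar model (hypothetical helper) of zero-masked conflict detection
/// over eight 32-bit lanes.
fn maskz_conflict_epi32_model(k: u8, a: [u32; 8]) -> [u32; 8] {
    let mut dst = [0u32; 8];
    for j in 0..8 {
        if (k >> j) & 1 == 1 {
            // Compare lane j only with lanes closer to the least significant end.
            for i in 0..j {
                if a[i] == a[j] {
                    dst[j] |= 1 << i;
                }
            }
        }
    }
    dst
}
```

For the input `[7, 3, 7, 7, 3, 1, 1, 9]` with all mask bits set, lane 3 yields `0b101` because lanes 0 and 2 hold the same value.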
_mm256_maskz_cvt_roundps_ph^{⚠}  Experimentalavx512f,avx512vl Convert packed singleprecision (32bit) floatingpoint elements in a to packed halfprecision (16bit) floatingpoint elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepi8_epi16^{⚠}  Experimentalavx512bw,avx512vl Sign extend packed 8bit integers in a to packed 16bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepi8_epi32^{⚠}  Experimentalavx512f,avx512vl Sign extend packed 8bit integers in a to packed 32bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepi8_epi64^{⚠}  Experimentalavx512f,avx512vl Sign extend packed 8bit integers in the low 4 bytes of a to packed 64bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepi16_epi8^{⚠}  Experimentalavx512bw,avx512vl Convert packed 16bit integers in a to packed 8bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepi16_epi32^{⚠}  Experimentalavx512f,avx512vl Sign extend packed 16bit integers in a to packed 32bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepi16_epi64^{⚠}  Experimentalavx512f,avx512vl Sign extend packed 16bit integers in a to packed 64bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepi32_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed 32bit integers in a to packed 8bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepi32_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed 32bit integers in a to packed 16bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepi32_epi64^{⚠}  Experimentalavx512f,avx512vl Sign extend packed 32bit integers in a to packed 64bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepi32_pd^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 32bit integers in a to packed doubleprecision (64bit) floatingpoint elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepi32_ps^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 32bit integers in a to packed singleprecision (32bit) floatingpoint elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepi64_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed 64bit integers in a to packed 8bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepi64_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed 64bit integers in a to packed 16bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepi64_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed 64bit integers in a to packed 32bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepu8_epi16^{⚠}  Experimentalavx512bw,avx512vl Zero extend packed unsigned 8bit integers in a to packed 16bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepu8_epi32^{⚠}  Experimentalavx512f,avx512vl Zero extend packed unsigned 8bit integers in the low 8 bytes of a to packed 32bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepu8_epi64^{⚠}  Experimentalavx512f,avx512vl Zero extend packed unsigned 8bit integers in the low 4 bytes of a to packed 64bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepu16_epi32^{⚠}  Experimentalavx512f,avx512vl Zero extend packed unsigned 16bit integers in a to packed 32bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepu16_epi64^{⚠}  Experimentalavx512f,avx512vl Zero extend packed unsigned 16bit integers in the low 8 bytes of a to packed 64bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepu32_epi64^{⚠}  Experimentalavx512f,avx512vl Zero extend packed unsigned 32bit integers in a to packed 64bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtepu32_pd^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 32bit integers in a to packed doubleprecision (64bit) floatingpoint elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtne2ps_pbh^{⚠}  Experimentalavx512bf16,avx512vl Convert packed singleprecision (32bit) floatingpoint elements in two vectors a and b to packed BF16 (16bit) floatingpoint elements, and store the results in single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation 
_mm256_maskz_cvtneps_pbh^{⚠}  Experimentalavx512bf16,avx512vl Convert packed singleprecision (32bit) floatingpoint elements in a to packed BF16 (16bit) floatingpoint elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation 
_mm256_maskz_cvtpd_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed doubleprecision (64bit) floatingpoint elements in a to packed 32bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtpd_epu32^{⚠}  Experimentalavx512f,avx512vl Convert packed doubleprecision (64bit) floatingpoint elements in a to packed unsigned 32bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtpd_ps^{⚠}  Experimentalavx512f,avx512vl Convert packed doubleprecision (64bit) floatingpoint elements in a to packed singleprecision (32bit) floatingpoint elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtph_ps^{⚠}  Experimentalavx512f,avx512vl Convert packed halfprecision (16bit) floatingpoint elements in a to packed singleprecision (32bit) floatingpoint elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtps_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed singleprecision (32bit) floatingpoint elements in a to packed 32bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtps_epu32^{⚠}  Experimentalavx512f,avx512vl Convert packed singleprecision (32bit) floatingpoint elements in a to packed unsigned 32bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtps_ph^{⚠}  Experimentalavx512f,avx512vl Convert packed singleprecision (32bit) floatingpoint elements in a to packed halfprecision (16bit) floatingpoint elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtsepi16_epi8^{⚠}  Experimentalavx512bw,avx512vl Convert packed signed 16bit integers in a to packed 8bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtsepi32_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 32bit integers in a to packed 8bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtsepi32_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 32bit integers in a to packed 16bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtsepi64_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 64bit integers in a to packed 8bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtsepi64_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 64bit integers in a to packed 16bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtsepi64_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed signed 64bit integers in a to packed 32bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvttpd_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed doubleprecision (64bit) floatingpoint elements in a to packed 32bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvttpd_epu32^{⚠}  Experimentalavx512f,avx512vl Convert packed doubleprecision (64bit) floatingpoint elements in a to packed unsigned 32bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvttps_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed singleprecision (32bit) floatingpoint elements in a to packed 32bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvttps_epu32^{⚠}  Experimentalavx512f,avx512vl Convert packed singleprecision (32bit) floatingpoint elements in a to packed unsigned 32bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtusepi16_epi8^{⚠}  Experimentalavx512bw,avx512vl Convert packed unsigned 16bit integers in a to packed unsigned 8bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtusepi32_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 32bit integers in a to packed unsigned 8bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtusepi32_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 32bit integers in a to packed unsigned 16bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtusepi64_epi8^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 64bit integers in a to packed unsigned 8bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtusepi64_epi16^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 64bit integers in a to packed unsigned 16bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_cvtusepi64_epi32^{⚠}  Experimentalavx512f,avx512vl Convert packed unsigned 64bit integers in a to packed unsigned 32bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
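The narrowing conversions above differ only in how out-of-range values are handled: `cvtepi*` truncates, `cvtsepi*` saturates as signed, and `cvtusepi*` saturates as unsigned. A per-element sketch for the 32-bit to 8-bit case follows; the function names are hypothetical and the code models single elements, not whole vectors or masks.

```rust
/// Truncation: keep the low 8 bits, as in the `cvtepi32_epi8` family.
fn narrow_truncate(x: i32) -> i8 {
    x as i8
}

/// Signed saturation: clamp to [-128, 127], as in the `cvtsepi32_epi8` family.
fn narrow_saturate_signed(x: i32) -> i8 {
    x.clamp(i8::MIN as i32, i8::MAX as i32) as i8
}

/// Unsigned saturation: clamp to [0, 255], as in the `cvtusepi32_epi8` family.
fn narrow_saturate_unsigned(x: u32) -> u8 {
    x.min(u8::MAX as u32) as u8
}
```

For an input of 300, truncation yields 44 (300 mod 256), signed saturation yields 127, and unsigned saturation yields 255.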
_mm256_maskz_dbsad_epu8^{⚠}  Experimentalavx512bw,avx512vl Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8bit integers in a compared to those in b, and store the 16bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Four SADs are performed on four 8bit quadruplets for each 64bit lane. The first two SADs use the lower 8bit quadruplet of the lane from a, and the last two SADs use the upper 8bit quadruplet of the lane from a. Quadruplets from b are selected from within 128bit lanes according to the control in imm8, and each SAD in each 64bit lane uses the selected quadruplet at 8bit offsets. 
_mm256_maskz_div_pd^{⚠}  Experimentalavx512f,avx512vl Divide packed doubleprecision (64bit) floatingpoint elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_div_ps^{⚠}  Experimentalavx512f,avx512vl Divide packed singleprecision (32bit) floatingpoint elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_dpbf16_ps^{⚠}  Experimentalavx512bf16,avx512vl Compute dotproduct of BF16 (16bit) floatingpoint pairs in a and b, accumulating the intermediate singleprecision (32bit) floatingpoint elements with elements in src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation 
_mm256_maskz_dpbusd_epi32^{⚠}  Experimentalavx512vnni,avx512vl Multiply groups of 4 adjacent pairs of unsigned 8bit integers in a with corresponding signed 8bit integers in b, producing 4 intermediate signed 16bit results. Sum these 4 results with the corresponding 32bit integer in src, and store the packed 32bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_dpbusds_epi32^{⚠}  Experimentalavx512vnni,avx512vl Multiply groups of 4 adjacent pairs of unsigned 8bit integers in a with corresponding signed 8bit integers in b, producing 4 intermediate signed 16bit results. Sum these 4 results with the corresponding 32bit integer in src using signed saturation, and store the packed 32bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_dpwssd_epi32^{⚠}  Experimentalavx512vnni,avx512vl Multiply groups of 2 adjacent pairs of signed 16bit integers in a with corresponding 16bit integers in b, producing 2 intermediate signed 32bit results. Sum these 2 results with the corresponding 32bit integer in src, and store the packed 32bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_dpwssds_epi32^{⚠}  Experimentalavx512vnni,avx512vl Multiply groups of 2 adjacent pairs of signed 16bit integers in a with corresponding 16bit integers in b, producing 2 intermediate signed 32bit results. Sum these 2 results with the corresponding 32bit integer in src using signed saturation, and store the packed 32bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
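The difference between the plain and saturating dot-product entries above shows up only when the accumulation overflows. The sketch below models one 32-bit lane of the unsigned-by-signed byte variants; both function names are hypothetical, and the plain variant is assumed to wrap on overflow.

```rust
/// One 32-bit lane of a `dpbusd`-style operation (hypothetical model):
/// four u8 * i8 products are summed and added to `src` with wrap-around.
fn dpbusd_lane_model(src: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    let sum: i32 = (0..4).map(|i| a[i] as i32 * b[i] as i32).sum();
    src.wrapping_add(sum)
}

/// Same lane for the `dpbusds`-style operation: the final add saturates.
fn dpbusds_lane_model(src: i32, a: [u8; 4], b: [i8; 4]) -> i32 {
    let sum: i32 = (0..4).map(|i| a[i] as i32 * b[i] as i32).sum();
    src.saturating_add(sum)
}
```

Each product of a `u8` and an `i8` fits in a signed 16-bit value, which is why the descriptions speak of intermediate signed 16-bit results.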
_mm256_maskz_expand_epi8^{⚠}  Experimentalavx512vbmi2,avx512vl Load contiguous active 8bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_expand_epi16^{⚠}  Experimentalavx512vbmi2,avx512vl Load contiguous active 16bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_expand_epi32^{⚠}  Experimentalavx512f,avx512vl Load contiguous active 32bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_expand_epi64^{⚠}  Experimentalavx512f,avx512vl Load contiguous active 64bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_expand_pd^{⚠}  Experimentalavx512f,avx512vl Load contiguous active doubleprecision (64bit) floatingpoint elements from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_expand_ps^{⚠}  Experimentalavx512f,avx512vl Load contiguous active singleprecision (32bit) floatingpoint elements from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
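Expand is the inverse of compress: consecutive low elements of the source are scattered to the lanes whose mask bit is set. A scalar sketch under the same assumptions as before (hypothetical function name, eight 32-bit lanes, bit `i` of `k` controls lane `i`):

```rust
/// Scalar model (hypothetical helper) of a zero-masked expand over
/// eight 32-bit lanes.
fn maskz_expand_epi32_model(k: u8, a: [i32; 8]) -> [i32; 8] {
    let mut dst = [0i32; 8];
    let mut next = 0; // next unread element of a
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[next];
            next += 1;
        }
    }
    dst // lanes with a clear mask bit stay zero
}
```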
_mm256_maskz_extractf32x4_ps^{⚠}  Experimentalavx512f,avx512vl Extract 128 bits (composed of 4 packed singleprecision (32bit) floatingpoint elements) from a, selected with imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_extracti32x4_epi32^{⚠}  Experimentalavx512f,avx512vl Extract 128 bits (composed of 4 packed 32bit integers) from a, selected with IMM1, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_fixupimm_pd^{⚠}  Experimentalavx512f,avx512vl Fix up packed doubleprecision (64bit) floatingpoint elements in a and b using packed 64bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting. 
_mm256_maskz_fixupimm_ps^{⚠}  Experimentalavx512f,avx512vl Fix up packed singleprecision (32bit) floatingpoint elements in a and b using packed 32bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting. 
_mm256_maskz_fmadd_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_fmadd_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_fmaddsub_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_fmaddsub_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_fmsub_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_fmsub_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_fmsubadd_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_fmsubadd_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
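The `fmaddsub` and `fmsubadd` entries differ in which lanes add and which subtract. The scalar sketch below assumes the lane parity used in Intel's pseudocode: `fmaddsub` subtracts `c` in even lanes and adds it in odd lanes, and `fmsubadd` does the opposite. Function names are hypothetical, and `f64::mul_add` stands in for the fused multiply-add.

```rust
/// Scalar model (hypothetical helper) of zero-masked fmaddsub on four f64 lanes:
/// even lanes compute a*b - c, odd lanes compute a*b + c.
fn maskz_fmaddsub_pd_model(k: u8, a: [f64; 4], b: [f64; 4], c: [f64; 4]) -> [f64; 4] {
    let mut dst = [0.0f64; 4];
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            let signed_c = if i % 2 == 0 { -c[i] } else { c[i] };
            dst[i] = a[i].mul_add(b[i], signed_c);
        }
    }
    dst
}

/// Scalar model (hypothetical helper) of zero-masked fmsubadd on four f64 lanes:
/// even lanes compute a*b + c, odd lanes compute a*b - c.
fn maskz_fmsubadd_pd_model(k: u8, a: [f64; 4], b: [f64; 4], c: [f64; 4]) -> [f64; 4] {
    let mut dst = [0.0f64; 4];
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            let signed_c = if i % 2 == 0 { c[i] } else { -c[i] };
            dst[i] = a[i].mul_add(b[i], signed_c);
        }
    }
    dst
}
```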
_mm256_maskz_fnmadd_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_fnmadd_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_fnmsub_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_fnmsub_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_getexp_pd^{⚠}  Experimentalavx512f,avx512vl Convert the exponent of each packed doubleprecision (64bit) floatingpoint element in a to a doubleprecision (64bit) floatingpoint number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. 
_mm256_maskz_getexp_ps^{⚠}  Experimentalavx512f,avx512vl Convert the exponent of each packed singleprecision (32bit) floatingpoint element in a to a singleprecision (32bit) floatingpoint number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. 
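For normal, finite inputs, the `floor(log2(x))` value mentioned above is just the unbiased exponent field of the floating-point encoding. The sketch below reads that field directly; the names are hypothetical, and zero, denormal, infinite, and NaN inputs are deliberately not modelled.

```rust
/// Unbiased exponent of a normal, finite f64, returned as f64.
/// Special values (0, denormals, infinity, NaN) are NOT handled here.
fn getexp_normal_f64(x: f64) -> f64 {
    let biased = ((x.to_bits() >> 52) & 0x7ff) as i64; // 11-bit exponent field
    (biased - 1023) as f64
}

/// Scalar model (hypothetical helper) of a zero-masked getexp on four f64 lanes.
fn maskz_getexp_pd_model(k: u8, a: [f64; 4]) -> [f64; 4] {
    let mut dst = [0.0f64; 4];
    for i in 0..4 {
        if (k >> i) & 1 == 1 {
            dst[i] = getexp_normal_f64(a[i]);
        }
    }
    dst
}
```

The sign bit is ignored, so `-10.0` and `10.0` both map to `3.0`.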
_mm256_maskz_getmant_pd^{⚠}  Experimentalavx512f,avx512vl Normalize the mantissas of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. 
_mm256_maskz_getmant_ps^{⚠}  Experimentalavx512f,avx512vl Normalize the mantissas of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. 
_mm256_maskz_gf2p8affine_epi64_epi8^{⚠}  Experimentalavx512gfni,avx512bw,avx512vl Performs an affine transformation on the packed bytes in x. That is, it computes a*x+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8bit immediate value. Each pack of 8 bytes in x is paired with the 64bit word at the same position in a. 
_mm256_maskz_gf2p8affineinv_epi64_epi8^{⚠}  Experimentalavx512gfni,avx512bw,avx512vl Performs an affine transformation on the inverted packed bytes in x. That is, it computes a*inv(x)+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64bit word at the same position in a. 
_mm256_maskz_gf2p8mul_epi8^{⚠}  Experimentalavx512gfni,avx512bw,avx512vl Performs a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1. 
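The per-byte GF(2^8) multiplication with the reduction polynomial x^8 + x^4 + x^3 + x + 1 can be sketched with the usual shift-and-xor loop. This models a single byte pair only; `gf2p8mul_byte_model` is a hypothetical name, and masking across lanes is left out.

```rust
/// Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1
/// (hypothetical scalar model of one lane).
fn gf2p8mul_byte_model(mut a: u8, mut b: u8) -> u8 {
    let mut product = 0u8;
    for _ in 0..8 {
        if b & 1 != 0 {
            product ^= a; // addition in GF(2^8) is xor
        }
        let carry = a & 0x80 != 0;
        a <<= 1; // multiply a by x
        if carry {
            a ^= 0x1b; // reduce: low 8 bits of the polynomial 0x11b
        }
        b >>= 1;
    }
    product
}
```

This is the same field used by AES, so `0x53` and `0xCA` multiply to `0x01`.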
_mm256_maskz_insertf32x4^{⚠}  Experimentalavx512f,avx512vl Copy a to tmp, then insert 128 bits (composed of 4 packed singleprecision (32bit) floatingpoint elements) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_inserti32x4^{⚠}  Experimentalavx512f,avx512vl Copy a to tmp, then insert 128 bits (composed of 4 packed 32bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_lzcnt_epi32^{⚠}  Experimentalavx512cd,avx512vl Count the number of leading zero bits in each packed 32bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_lzcnt_epi64^{⚠}  Experimentalavx512cd,avx512vl Count the number of leading zero bits in each packed 64bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
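A scalar sketch of the zero-masked leading-zero count, using `u32::leading_zeros` per lane. The function name is hypothetical; the model assumes eight 32-bit lanes and that an all-zero element counts as 32 leading zeros.

```rust
/// Scalar model (hypothetical helper) of a zero-masked leading-zero count
/// over eight 32-bit lanes.
fn maskz_lzcnt_epi32_model(k: u8, a: [u32; 8]) -> [u32; 8] {
    let mut dst = [0u32; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].leading_zeros();
        }
    }
    dst
}
```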
_mm256_maskz_madd_epi16^{⚠}  Experimentalavx512bw,avx512vl Multiply packed signed 16bit integers in a and b, producing intermediate signed 32bit integers. Horizontally add adjacent pairs of intermediate 32bit integers, and pack the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_maddubs_epi16^{⚠}  Experimentalavx512bw,avx512vl Multiply packed unsigned 8bit integers in a by packed signed 8bit integers in b, producing intermediate signed 16bit integers. Horizontally add adjacent pairs of intermediate signed 16bit integers, and pack the saturated results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_max_epi8^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 8bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_max_epi16^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 16bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_max_epi32^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 32bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_max_epi64^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 64bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_max_epu8^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 8bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_max_epu16^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 16bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_max_epu32^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 32bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_max_epu64^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 64bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_max_pd^{⚠}  Experimentalavx512f,avx512vl Compare packed doubleprecision (64bit) floatingpoint elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_max_ps^{⚠}  Experimentalavx512f,avx512vl Compare packed singleprecision (32bit) floatingpoint elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_min_epi8^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 8bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_min_epi16^{⚠}  Experimentalavx512bw,avx512vl Compare packed signed 16bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_min_epi32^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 32bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_min_epi64^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 64bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_min_epu8^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 8bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_min_epu16^{⚠}  Experimentalavx512bw,avx512vl Compare packed unsigned 16bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_min_epu32^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 32bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_min_epu64^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 64bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_min_pd^{⚠}  Experimentalavx512f,avx512vl Compare packed doubleprecision (64bit) floatingpoint elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_min_ps^{⚠}  Experimentalavx512f,avx512vl Compare packed singleprecision (32bit) floatingpoint elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
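The zeromask behavior shared by the maskz_max and maskz_min entries above can be modeled in scalar Rust. This is only a sketch for illustration, not the intrinsic itself: `maskz_max_epi32_model` is a hypothetical helper, and it assumes eight 32-bit lanes (one 256-bit vector) with bit i of k controlling lane i.

```rust
/// Scalar model of a zeromasked signed 32-bit maximum over 8 lanes.
/// Hypothetical helper for illustration; not part of core::arch.
fn maskz_max_epi32_model(k: u8, a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    let mut dst = [0i32; 8];
    for i in 0..8 {
        // Bit i of the mask decides whether lane i receives a result.
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].max(b[i]);
        }
        // Lanes whose mask bit is clear stay zero.
    }
    dst
}
```

With k = 0b0000_0101 only lanes 0 and 2 carry a maximum; every other lane is zero regardless of the inputs.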
_mm256_maskz_mov_epi8^{⚠}  Experimentalavx512bw,avx512vl Move packed 8bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_mov_epi16^{⚠}  Experimentalavx512bw,avx512vl Move packed 16bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_mov_epi32^{⚠}  Experimentalavx512f,avx512vl Move packed 32bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_mov_epi64^{⚠}  Experimentalavx512f,avx512vl Move packed 64bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_mov_pd^{⚠}  Experimentalavx512f,avx512vl Move packed doubleprecision (64bit) floatingpoint elements from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_mov_ps^{⚠}  Experimentalavx512f,avx512vl Move packed singleprecision (32bit) floatingpoint elements from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_movedup_pd^{⚠}  Experimentalavx512f,avx512vl Duplicate evenindexed doubleprecision (64bit) floatingpoint elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_movehdup_ps^{⚠}  Experimentalavx512f,avx512vl Duplicate oddindexed singleprecision (32bit) floatingpoint elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_moveldup_ps^{⚠}  Experimentalavx512f,avx512vl Duplicate evenindexed singleprecision (32bit) floatingpoint elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_mul_epi32^{⚠}  Experimentalavx512f,avx512vl Multiply the low signed 32bit integers from each packed 64bit element in a and b, and store the signed 64bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_mul_epu32^{⚠}  Experimentalavx512f,avx512vl Multiply the low unsigned 32bit integers from each packed 64bit element in a and b, and store the unsigned 64bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_mul_pd^{⚠}  Experimentalavx512f,avx512vl Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_mul_ps^{⚠}  Experimentalavx512f,avx512vl Multiply packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_mulhi_epi16^{⚠}  Experimentalavx512bw,avx512vl Multiply the packed signed 16bit integers in a and b, producing intermediate 32bit integers, and store the high 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_mulhi_epu16^{⚠}  Experimentalavx512bw,avx512vl Multiply the packed unsigned 16bit integers in a and b, producing intermediate 32bit integers, and store the high 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_mulhrs_epi16^{⚠}  Experimentalavx512bw,avx512vl Multiply packed signed 16bit integers in a and b, producing intermediate signed 32bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_mullo_epi16^{⚠}  Experimentalavx512bw,avx512vl Multiply the packed 16bit integers in a and b, producing intermediate 32bit integers, and store the low 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_mullo_epi32^{⚠}  Experimentalavx512f,avx512vl Multiply the packed 32bit integers in a and b, producing intermediate 64bit integers, and store the low 32 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
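The per-element arithmetic behind the mulhi, mulhrs, and mullo entries above can be sketched in scalar Rust. The function names are hypothetical helpers, not core::arch items, and masking is left out so that only the 16-bit multiply step is shown.

```rust
/// High 16 bits of the 32-bit signed product (model of one mulhi_epi16 lane).
fn mulhi_epi16_model(a: i16, b: i16) -> i16 {
    ((a as i32 * b as i32) >> 16) as i16
}

/// Low 16 bits of the 32-bit product (model of one mullo_epi16 lane).
fn mullo_epi16_model(a: i16, b: i16) -> i16 {
    (a as i32 * b as i32) as i16
}

/// Rounded high multiply (model of one mulhrs_epi16 lane):
/// keep the 18 most significant bits, add 1 to round, then take bits [16:1].
fn mulhrs_epi16_model(a: i16, b: i16) -> i16 {
    let product = a as i32 * b as i32;
    (((product >> 14) + 1) >> 1) as i16
}
```

Read as Q15 fixed point, `mulhrs_epi16_model(0x4000, 0x4000)` multiplies 0.5 by 0.5 and yields 0x2000, which is 0.25.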
_mm256_maskz_multishift_epi64_epi8^{⚠}  Experimentalavx512vbmi,avx512vl For each 64bit element in b, select 8 unaligned bytes using a bytegranular shift control within the corresponding 64bit element of a, and store the 8 assembled bytes to the corresponding 64bit element of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_or_epi32^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise OR of packed 32bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_or_epi64^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise OR of packed 64bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_packs_epi16^{⚠}  Experimentalavx512bw,avx512vl Convert packed signed 16bit integers from a and b to packed 8bit integers using signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_packs_epi32^{⚠}  Experimentalavx512bw,avx512vl Convert packed signed 32bit integers from a and b to packed 16bit integers using signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_packus_epi16^{⚠}  Experimentalavx512bw,avx512vl Convert packed signed 16bit integers from a and b to packed 8bit integers using unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_packus_epi32^{⚠}  Experimentalavx512bw,avx512vl Convert packed signed 32bit integers from a and b to packed 16bit integers using unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
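The difference between the signed saturation of the packs entries and the unsigned saturation of the packus entries above comes down to the clamping range. A minimal scalar sketch for the 16-bit to 8-bit case follows; the helper names are hypothetical, and the way results from a and b are interleaved per 128-bit lane is not modeled.

```rust
/// Signed saturation of one 16-bit element to 8 bits (packs-style).
fn saturate_i16_to_i8(x: i16) -> i8 {
    x.clamp(i8::MIN as i16, i8::MAX as i16) as i8
}

/// Unsigned saturation of one signed 16-bit element to 8 bits (packus-style).
/// Negative inputs clamp to 0, inputs above 255 clamp to 255.
fn saturate_i16_to_u8(x: i16) -> u8 {
    x.clamp(0, u8::MAX as i16) as u8
}
```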
_mm256_maskz_permute_pd^{⚠}  Experimentalavx512f,avx512vl Shuffle doubleprecision (64bit) floatingpoint elements in a within 128bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_permute_ps^{⚠}  Experimentalavx512f,avx512vl Shuffle singleprecision (32bit) floatingpoint elements in a within 128bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_permutevar_pd^{⚠}  Experimentalavx512f,avx512vl Shuffle doubleprecision (64bit) floatingpoint elements in a within 128bit lanes using the control in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_permutevar_ps^{⚠}  Experimentalavx512f,avx512vl Shuffle singleprecision (32bit) floatingpoint elements in a within 128bit lanes using the control in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_permutex2var_epi8^{⚠}  Experimentalavx512vbmi,avx512vl Shuffle 8bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_permutex2var_epi16^{⚠}  Experimentalavx512bw,avx512vl Shuffle 16bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_permutex2var_epi32^{⚠}  Experimentalavx512f,avx512vl Shuffle 32bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_permutex2var_epi64^{⚠}  Experimentalavx512f,avx512vl Shuffle 64bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_permutex2var_pd^{⚠}  Experimentalavx512f,avx512vl Shuffle doubleprecision (64bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_permutex2var_ps^{⚠}  Experimentalavx512f,avx512vl Shuffle singleprecision (32bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_permutex_epi64^{⚠}  Experimentalavx512f,avx512vl Shuffle 64bit integers in a within 256bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_permutex_pd^{⚠}  Experimentalavx512f,avx512vl Shuffle doubleprecision (64bit) floatingpoint elements in a within 256bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_permutexvar_epi8^{⚠}  Experimentalavx512vbmi,avx512vl Shuffle 8bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_permutexvar_epi16^{⚠}  Experimentalavx512bw,avx512vl Shuffle 16bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_permutexvar_epi32^{⚠}  Experimentalavx512f,avx512vl Shuffle 32bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_permutexvar_epi64^{⚠}  Experimentalavx512f,avx512vl Shuffle 64bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_permutexvar_pd^{⚠}  Experimentalavx512f,avx512vl Shuffle doubleprecision (64bit) floatingpoint elements in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_permutexvar_ps^{⚠}  Experimentalavx512f,avx512vl Shuffle singleprecision (32bit) floatingpoint elements in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
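The permutexvar entries above pick each destination element by index from anywhere in the source vector. A scalar sketch for eight 32-bit lanes is shown below. `maskz_permutexvar_epi32_model` is a hypothetical helper, and it assumes that only the low three bits of each index are used, since eight lanes need three index bits.

```rust
/// Scalar model of a zeromasked cross-lane permute over 8 x 32-bit lanes.
/// Hypothetical helper for illustration; not part of core::arch.
fn maskz_permutexvar_epi32_model(k: u8, idx: [u32; 8], a: [i32; 8]) -> [i32; 8] {
    let mut dst = [0i32; 8];
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Assumption: only the low 3 bits of the index select a source lane.
            dst[i] = a[(idx[i] & 7) as usize];
        }
    }
    dst
}
```

An index vector of [7, 6, 5, 4, 3, 2, 1, 0] with a full mask reverses the lanes; clearing the upper four mask bits zeroes the upper half of the result.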
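_mm256_maskz_popcnt_epi8^{⚠}  Experimental avx512bitalg,avx512vl Count the number of logical 1 bits in each packed 8-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_popcnt_epi16^{⚠}  Experimental avx512bitalg,avx512vl Count the number of logical 1 bits in each packed 16-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_popcnt_epi32^{⚠}  Experimental avx512vpopcntdq,avx512vl Count the number of logical 1 bits in each packed 32-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_popcnt_epi64^{⚠}  Experimental avx512vpopcntdq,avx512vl Count the number of logical 1 bits in each packed 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 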
_mm256_maskz_rcp14_pd^{⚠}  Experimental avx512f,avx512vl Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. 
_mm256_maskz_rcp14_ps^{⚠}  Experimental avx512f,avx512vl Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. 
_mm256_maskz_rol_epi32^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 32bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_rol_epi64^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 64bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_rolv_epi32^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 32bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_rolv_epi64^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 64bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_ror_epi32^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 32bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_ror_epi64^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 64bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_rorv_epi32^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 32bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_rorv_epi64^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 64bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
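The rotate entries above (rol, rolv, ror, rorv) move bits around the ends of each element instead of discarding them. A scalar sketch for one 32-bit element is shown below. The helper names are hypothetical, and the count is assumed to be reduced modulo the element width.

```rust
/// Model of one rol_epi32 lane: rotate left, count taken modulo 32 (assumption).
fn rol_epi32_model(a: u32, count: u32) -> u32 {
    a.rotate_left(count % 32)
}

/// Model of one ror_epi32 lane: rotate right, count taken modulo 32 (assumption).
fn ror_epi32_model(a: u32, count: u32) -> u32 {
    a.rotate_right(count % 32)
}
```

Rotating left and then right by the same count returns the original value, which is a quick sanity check when porting code that uses these operations.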
_mm256_maskz_roundscale_pd^{⚠}  Experimentalavx512f,avx512vl Round packed doubleprecision (64bit) floatingpoint elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_roundscale_ps^{⚠}  Experimentalavx512f,avx512vl Round packed singleprecision (32bit) floatingpoint elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_rsqrt14_pd^{⚠}  Experimental avx512f,avx512vl Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. 
_mm256_maskz_rsqrt14_ps^{⚠}  Experimental avx512f,avx512vl Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. 
_mm256_maskz_scalef_pd^{⚠}  Experimentalavx512f,avx512vl Scale the packed doubleprecision (64bit) floatingpoint elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_scalef_ps^{⚠}  Experimentalavx512f,avx512vl Scale the packed singleprecision (32bit) floatingpoint elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_set1_epi8^{⚠}  Experimentalavx512bw,avx512vl Broadcast 8bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_set1_epi16^{⚠}  Experimentalavx512bw,avx512vl Broadcast the low packed 16bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_set1_epi32^{⚠}  Experimentalavx512f,avx512vl Broadcast 32bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_set1_epi64^{⚠}  Experimentalavx512f,avx512vl Broadcast 64bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shldi_epi16^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 16bit integers in a and b producing an intermediate 32bit result. Shift the result left by imm8 bits, and store the upper 16bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shldi_epi32^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 32bit integers in a and b producing an intermediate 64bit result. Shift the result left by imm8 bits, and store the upper 32bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shldi_epi64^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 64bit integers in a and b producing an intermediate 128bit result. Shift the result left by imm8 bits, and store the upper 64bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shldv_epi16^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 16bit integers in a and b producing an intermediate 32bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shldv_epi32^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 32bit integers in a and b producing an intermediate 64bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shldv_epi64^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 64bit integers in a and b producing an intermediate 128bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shrdi_epi16^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 16bit integers in b and a producing an intermediate 32bit result. Shift the result right by imm8 bits, and store the lower 16bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shrdi_epi32^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 32bit integers in b and a producing an intermediate 64bit result. Shift the result right by imm8 bits, and store the lower 32bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shrdi_epi64^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 64bit integers in b and a producing an intermediate 128bit result. Shift the result right by imm8 bits, and store the lower 64bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shrdv_epi16^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 16bit integers in b and a producing an intermediate 32bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shrdv_epi32^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 32bit integers in b and a producing an intermediate 64bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shrdv_epi64^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 64bit integers in b and a producing an intermediate 128bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
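The concatenate-and-shift step described in the shldi and shrdi entries above can be sketched for one 16-bit element as follows. The helper names are hypothetical, masking is omitted, and the shift count is assumed to be reduced to the low four bits (that is, modulo the 16-bit element width).

```rust
/// Model of one shldi_epi16 lane: concatenate a (high) and b (low),
/// shift left, and keep the upper 16 bits.
fn shldi_epi16_model(a: u16, b: u16, imm8: u32) -> u16 {
    let concat = ((a as u32) << 16) | b as u32;
    // Assumption: only the low 4 bits of the count are used.
    ((concat << (imm8 & 15)) >> 16) as u16
}

/// Model of one shrdi_epi16 lane: concatenate b (high) and a (low),
/// shift right, and keep the lower 16 bits.
fn shrdi_epi16_model(a: u16, b: u16, imm8: u32) -> u16 {
    let concat = ((b as u32) << 16) | a as u32;
    // Assumption: only the low 4 bits of the count are used.
    (concat >> (imm8 & 15)) as u16
}
```

In both cases a shift count of zero returns a unchanged; the bits that enter a as it shifts come from b.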
_mm256_maskz_shuffle_epi8^{⚠}  Experimentalavx512bw,avx512vl Shuffle packed 8bit integers in a according to shuffle control mask in the corresponding 8bit element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shuffle_epi32^{⚠}  Experimentalavx512f,avx512vl Shuffle 32bit integers in a within 128bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shuffle_f32x4^{⚠}  Experimentalavx512f,avx512vl Shuffle 128bits (composed of 4 singleprecision (32bit) floatingpoint elements) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shuffle_f64x2^{⚠}  Experimentalavx512f,avx512vl Shuffle 128bits (composed of 2 doubleprecision (64bit) floatingpoint elements) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shuffle_i32x4^{⚠}  Experimentalavx512f,avx512vl Shuffle 128bits (composed of 4 32bit integers) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shuffle_i64x2^{⚠}  Experimentalavx512f,avx512vl Shuffle 128bits (composed of 2 64bit integers) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shuffle_pd^{⚠}  Experimentalavx512f,avx512vl Shuffle doubleprecision (64bit) floatingpoint elements within 128bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shuffle_ps^{⚠}  Experimentalavx512f,avx512vl Shuffle singleprecision (32bit) floatingpoint elements in a within 128bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shufflehi_epi16^{⚠}  Experimental avx512bw,avx512vl Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of dst, with the low 64 bits of 128-bit lanes being copied from a to dst, using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_shufflelo_epi16^{⚠}  Experimental avx512bw,avx512vl Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of dst, with the high 64 bits of 128-bit lanes being copied from a to dst, using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_sll_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_sll_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_sll_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_slli_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_slli_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_slli_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_sllv_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_sllv_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_sllv_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_sqrt_pd^{⚠}  Experimentalavx512f,avx512vl Compute the square root of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_sqrt_ps^{⚠}  Experimentalavx512f,avx512vl Compute the square root of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_sra_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_sra_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_sra_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_srai_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_srai_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_srai_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_srav_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_srav_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_srav_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_srl_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_srl_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_srl_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_srli_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_srli_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_srli_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_srlv_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_srlv_epi32^{⚠}  Experimentalavx512f,avx512vl Shift packed 32bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_srlv_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_sub_epi8^{⚠}  Experimentalavx512bw,avx512vl Subtract packed 8bit integers in b from packed 8bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_sub_epi16^{⚠}  Experimentalavx512bw,avx512vl Subtract packed 16bit integers in b from packed 16bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_sub_epi32^{⚠}  Experimentalavx512f,avx512vl Subtract packed 32bit integers in b from packed 32bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_sub_epi64^{⚠}  Experimentalavx512f,avx512vl Subtract packed 64bit integers in b from packed 64bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_sub_pd^{⚠}  Experimentalavx512f,avx512vl Subtract packed doubleprecision (64bit) floatingpoint elements in b from packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_sub_ps^{⚠}  Experimentalavx512f,avx512vl Subtract packed singleprecision (32bit) floatingpoint elements in b from packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_subs_epi8^{⚠}  Experimentalavx512bw,avx512vl Subtract packed signed 8bit integers in b from packed 8bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_subs_epi16^{⚠}  Experimentalavx512bw,avx512vl Subtract packed signed 16bit integers in b from packed 16bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_subs_epu8^{⚠}  Experimentalavx512bw,avx512vl Subtract packed unsigned 8bit integers in b from packed unsigned 8bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_subs_epu16^{⚠}  Experimentalavx512bw,avx512vl Subtract packed unsigned 16bit integers in b from packed unsigned 16bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_ternarylogic_epi32^{⚠}  Experimentalavx512f,avx512vl Bitwise ternary logic that provides the capability to implement any threeoperand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 32bit integer, the corresponding bits from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using zeromask k at 32bit granularity (32bit elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_ternarylogic_epi64^{⚠}  Experimentalavx512f,avx512vl Bitwise ternary logic that provides the capability to implement any threeoperand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 64bit integer, the corresponding bits from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using zeromask k at 64bit granularity (64bit elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_unpackhi_epi8^{⚠}  Experimentalavx512bw,avx512vl Unpack and interleave 8bit integers from the high half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_unpackhi_epi16^{⚠}  Experimentalavx512bw,avx512vl Unpack and interleave 16bit integers from the high half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_unpackhi_epi32^{⚠}  Experimentalavx512f,avx512vl Unpack and interleave 32bit integers from the high half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_unpackhi_epi64^{⚠}  Experimentalavx512f,avx512vl Unpack and interleave 64bit integers from the high half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_unpackhi_pd^{⚠}  Experimentalavx512f,avx512vl Unpack and interleave doubleprecision (64bit) floatingpoint elements from the high half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_unpackhi_ps^{⚠}  Experimentalavx512f,avx512vl Unpack and interleave singleprecision (32bit) floatingpoint elements from the high half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_unpacklo_epi8^{⚠}  Experimentalavx512bw,avx512vl Unpack and interleave 8bit integers from the low half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_unpacklo_epi16^{⚠}  Experimentalavx512bw,avx512vl Unpack and interleave 16bit integers from the low half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_unpacklo_epi32^{⚠}  Experimentalavx512f,avx512vl Unpack and interleave 32bit integers from the low half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_unpacklo_epi64^{⚠}  Experimentalavx512f,avx512vl Unpack and interleave 64bit integers from the low half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_unpacklo_pd^{⚠}  Experimentalavx512f,avx512vl Unpack and interleave doubleprecision (64bit) floatingpoint elements from the low half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_unpacklo_ps^{⚠}  Experimentalavx512f,avx512vl Unpack and interleave singleprecision (32bit) floatingpoint elements from the low half of each 128bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_xor_epi32^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise XOR of packed 32bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_maskz_xor_epi64^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise XOR of packed 64bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). 
_mm256_max_epi64^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 64bit integers in a and b, and store packed maximum values in dst. 
_mm256_max_epu64^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 64bit integers in a and b, and store packed maximum values in dst. 
_mm256_min_epi64^{⚠}  Experimentalavx512f,avx512vl Compare packed signed 64bit integers in a and b, and store packed minimum values in dst. 
_mm256_min_epu64^{⚠}  Experimentalavx512f,avx512vl Compare packed unsigned 64bit integers in a and b, and store packed minimum values in dst. 
_mm256_movepi8_mask^{⚠}  Experimentalavx512bw,avx512vl Set each bit of mask register k based on the most significant bit of the corresponding packed 8bit integer in a. 
_mm256_movepi16_mask^{⚠}  Experimentalavx512bw,avx512vl Set each bit of mask register k based on the most significant bit of the corresponding packed 16bit integer in a. 
_mm256_movm_epi8^{⚠}  Experimentalavx512bw,avx512vl Set each packed 8bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k. 
_mm256_movm_epi16^{⚠}  Experimentalavx512bw,avx512vl Set each packed 16bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k. 
_mm256_multishift_epi64_epi8^{⚠}  Experimentalavx512vbmi,avx512vl For each 64bit element in b, select 8 unaligned bytes using a bytegranular shift control within the corresponding 64bit element of a, and store the 8 assembled bytes to the corresponding 64bit element of dst. 
_mm256_or_epi32^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise OR of packed 32bit integers in a and b, and store the results in dst. 
_mm256_or_epi64^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise OR of packed 64bit integers in a and b, and store the results in dst. 
_mm256_permutex2var_epi8^{⚠}  Experimentalavx512vbmi,avx512vl Shuffle 8bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst. 
_mm256_permutex2var_epi16^{⚠}  Experimentalavx512bw,avx512vl Shuffle 16bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst. 
_mm256_permutex2var_epi32^{⚠}  Experimentalavx512f,avx512vl Shuffle 32bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst. 
_mm256_permutex2var_epi64^{⚠}  Experimentalavx512f,avx512vl Shuffle 64bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst. 
_mm256_permutex2var_pd^{⚠}  Experimentalavx512f,avx512vl Shuffle doubleprecision (64bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst. 
_mm256_permutex2var_ps^{⚠}  Experimentalavx512f,avx512vl Shuffle singleprecision (32bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst. 
_mm256_permutex_epi64^{⚠}  Experimentalavx512f,avx512vl Shuffle 64bit integers in a within 256bit lanes using the control in imm8, and store the results in dst. 
_mm256_permutex_pd^{⚠}  Experimentalavx512f,avx512vl Shuffle doubleprecision (64bit) floatingpoint elements in a within 256bit lanes using the control in imm8, and store the results in dst. 
_mm256_permutexvar_epi8^{⚠}  Experimentalavx512vbmi,avx512vl Shuffle 8bit integers in a across lanes using the corresponding index in idx, and store the results in dst. 
_mm256_permutexvar_epi16^{⚠}  Experimentalavx512bw,avx512vl Shuffle 16bit integers in a across lanes using the corresponding index in idx, and store the results in dst. 
_mm256_permutexvar_epi32^{⚠}  Experimentalavx512f,avx512vl Shuffle 32bit integers in a across lanes using the corresponding index in idx, and store the results in dst. 
_mm256_permutexvar_epi64^{⚠}  Experimentalavx512f,avx512vl Shuffle 64bit integers in a across lanes using the corresponding index in idx, and store the results in dst. 
_mm256_permutexvar_pd^{⚠}  Experimentalavx512f,avx512vl Shuffle doubleprecision (64bit) floatingpoint elements in a across lanes using the corresponding index in idx, and store the results in dst. 
_mm256_permutexvar_ps^{⚠}  Experimentalavx512f,avx512vl Shuffle singleprecision (32bit) floatingpoint elements in a across lanes using the corresponding index in idx, and store the results in dst. 
_mm256_popcnt_epi8^{⚠}  Experimentalavx512bitalg,avx512vl For each packed 8bit integer maps the value to the number of logical 1 bits. 
_mm256_popcnt_epi16^{⚠}  Experimentalavx512bitalg,avx512vl For each packed 16bit integer maps the value to the number of logical 1 bits. 
_mm256_popcnt_epi32^{⚠}  Experimentalavx512vpopcntdq,avx512vl For each packed 32bit integer maps the value to the number of logical 1 bits. 
_mm256_popcnt_epi64^{⚠}  Experimentalavx512vpopcntdq,avx512vl For each packed 64bit integer maps the value to the number of logical 1 bits. 
_mm256_rcp14_pd^{⚠}  Experimentalavx512f,avx512vl Compute the approximate reciprocal of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14. 
_mm256_rcp14_ps^{⚠}  Experimentalavx512f,avx512vl Compute the approximate reciprocal of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14. 
_mm256_rol_epi32^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 32bit integer in a to the left by the number of bits specified in imm8, and store the results in dst. 
_mm256_rol_epi64^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 64bit integer in a to the left by the number of bits specified in imm8, and store the results in dst. 
_mm256_rolv_epi32^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 32bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst. 
_mm256_rolv_epi64^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 64bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst. 
_mm256_ror_epi32^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 32bit integer in a to the right by the number of bits specified in imm8, and store the results in dst. 
_mm256_ror_epi64^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 64bit integer in a to the right by the number of bits specified in imm8, and store the results in dst. 
_mm256_rorv_epi32^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 32bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst. 
_mm256_rorv_epi64^{⚠}  Experimentalavx512f,avx512vl Rotate the bits in each packed 64bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst. 
_mm256_roundscale_pd^{⚠}  Experimentalavx512f,avx512vl Round packed doubleprecision (64bit) floatingpoint elements in a to the number of fraction bits specified by imm8, and store the results in dst. 
_mm256_roundscale_ps^{⚠}  Experimentalavx512f,avx512vl Round packed singleprecision (32bit) floatingpoint elements in a to the number of fraction bits specified by imm8, and store the results in dst. 
_mm256_scalef_pd^{⚠}  Experimentalavx512f,avx512vl Scale the packed doubleprecision (64bit) floatingpoint elements in a using values from b, and store the results in dst. 
_mm256_scalef_ps^{⚠}  Experimentalavx512f,avx512vl Scale the packed singleprecision (32bit) floatingpoint elements in a using values from b, and store the results in dst. 
_mm256_shldi_epi16^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 16bit integers in a and b producing an intermediate 32bit result. Shift the result left by imm8 bits, and store the upper 16bits in dst. 
_mm256_shldi_epi32^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 32bit integers in a and b producing an intermediate 64bit result. Shift the result left by imm8 bits, and store the upper 32bits in dst. 
_mm256_shldi_epi64^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 64bit integers in a and b producing an intermediate 128bit result. Shift the result left by imm8 bits, and store the upper 64bits in dst. 
_mm256_shldv_epi16^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 16bit integers in a and b producing an intermediate 32bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16bits in dst. 
_mm256_shldv_epi32^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 32bit integers in a and b producing an intermediate 64bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32bits in dst. 
_mm256_shldv_epi64^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 64bit integers in a and b producing an intermediate 128bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64bits in dst. 
_mm256_shrdi_epi16^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 16bit integers in b and a producing an intermediate 32bit result. Shift the result right by imm8 bits, and store the lower 16bits in dst. 
_mm256_shrdi_epi32^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 32bit integers in b and a producing an intermediate 64bit result. Shift the result right by imm8 bits, and store the lower 32bits in dst. 
_mm256_shrdi_epi64^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 64bit integers in b and a producing an intermediate 128bit result. Shift the result right by imm8 bits, and store the lower 64bits in dst. 
_mm256_shrdv_epi16^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 16bit integers in b and a producing an intermediate 32bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16bits in dst. 
_mm256_shrdv_epi32^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 32bit integers in b and a producing an intermediate 64bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32bits in dst. 
_mm256_shrdv_epi64^{⚠}  Experimentalavx512vbmi2,avx512vl Concatenate packed 64bit integers in b and a producing an intermediate 128bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64bits in dst. 
_mm256_shuffle_f32x4^{⚠}  Experimentalavx512f,avx512vl Shuffle 128bits (composed of 4 singleprecision (32bit) floatingpoint elements) selected by imm8 from a and b, and store the results in dst. 
_mm256_shuffle_f64x2^{⚠}  Experimentalavx512f,avx512vl Shuffle 128bits (composed of 2 doubleprecision (64bit) floatingpoint elements) selected by imm8 from a and b, and store the results in dst. 
_mm256_shuffle_i32x4^{⚠}  Experimentalavx512f,avx512vl Shuffle 128bits (composed of 4 32bit integers) selected by imm8 from a and b, and store the results in dst. 
_mm256_shuffle_i64x2^{⚠}  Experimentalavx512f,avx512vl Shuffle 128bits (composed of 2 64bit integers) selected by imm8 from a and b, and store the results in dst. 
_mm256_sllv_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst. 
_mm256_sra_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a right by count while shifting in sign bits, and store the results in dst. 
_mm256_srai_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a right by imm8 while shifting in sign bits, and store the results in dst. 
_mm256_srav_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst. 
_mm256_srav_epi64^{⚠}  Experimentalavx512f,avx512vl Shift packed 64bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst. 
_mm256_srlv_epi16^{⚠}  Experimentalavx512bw,avx512vl Shift packed 16bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst. 
_mm256_store_epi32^{⚠}  Experimentalavx512f,avx512vl Store 256bits (composed of 8 packed 32bit integers) from a into memory. mem_addr must be aligned on a 32byte boundary or a generalprotection exception may be generated. 
_mm256_store_epi64^{⚠}  Experimentalavx512f,avx512vl Store 256bits (composed of 4 packed 64bit integers) from a into memory. mem_addr must be aligned on a 32byte boundary or a generalprotection exception may be generated. 
_mm256_storeu_epi8^{⚠}  Experimentalavx512bw,avx512vl Store 256bits (composed of 32 packed 8bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary. 
_mm256_storeu_epi16^{⚠}  Experimentalavx512bw,avx512vl Store 256bits (composed of 16 packed 16bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary. 
_mm256_storeu_epi32^{⚠}  Experimentalavx512f,avx512vl Store 256bits (composed of 8 packed 32bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary. 
_mm256_storeu_epi64^{⚠}  Experimentalavx512f,avx512vl Store 256bits (composed of 4 packed 64bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary. 
_mm256_ternarylogic_epi32^{⚠}  Experimentalavx512f,avx512vl Bitwise ternary logic that provides the capability to implement any threeoperand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 32bit integer, the corresponding bits from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst. 
_mm256_ternarylogic_epi64^{⚠}  Experimentalavx512f,avx512vl Bitwise ternary logic that provides the capability to implement any threeoperand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 64bit integer, the corresponding bits from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst. 
_mm256_test_epi8_mask^{⚠}  Experimentalavx512bw,avx512vl Compute the bitwise AND of packed 8bit integers in a and b, producing intermediate 8bit values, and set the corresponding bit in result mask k if the intermediate value is nonzero. 
_mm256_test_epi16_mask^{⚠}  Experimentalavx512bw,avx512vl Compute the bitwise AND of packed 16bit integers in a and b, producing intermediate 16bit values, and set the corresponding bit in result mask k if the intermediate value is nonzero. 
_mm256_test_epi32_mask^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise AND of packed 32bit integers in a and b, producing intermediate 32bit values, and set the corresponding bit in result mask k if the intermediate value is nonzero. 
_mm256_test_epi64_mask^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise AND of packed 64bit integers in a and b, producing intermediate 64bit values, and set the corresponding bit in result mask k if the intermediate value is nonzero. 
_mm256_testn_epi8_mask^{⚠}  Experimentalavx512bw,avx512vl Compute the bitwise AND of packed 8bit integers in a and b, producing intermediate 8bit values, and set the corresponding bit in result mask k if the intermediate value is zero. 
_mm256_testn_epi16_mask^{⚠}  Experimentalavx512bw,avx512vl Compute the bitwise AND of packed 16bit integers in a and b, producing intermediate 16bit values, and set the corresponding bit in result mask k if the intermediate value is zero. 
_mm256_testn_epi32_mask^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise AND of packed 32bit integers in a and b, producing intermediate 32bit values, and set the corresponding bit in result mask k if the intermediate value is zero. 
_mm256_testn_epi64_mask^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise AND of packed 64bit integers in a and b, producing intermediate 64bit values, and set the corresponding bit in result mask k if the intermediate value is zero. 
_mm256_xor_epi32^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise XOR of packed 32bit integers in a and b, and store the results in dst. 
_mm256_xor_epi64^{⚠}  Experimentalavx512f,avx512vl Compute the bitwise XOR of packed 64bit integers in a and b, and store the results in dst. 
_mm512_abs_epi8^{⚠}  Experimentalavx512bw Compute the absolute value of packed signed 8bit integers in a, and store the unsigned results in dst. 
_mm512_abs_epi16^{⚠}  Experimentalavx512bw Compute the absolute value of packed signed 16bit integers in a, and store the unsigned results in dst. 
_mm512_abs_epi32^{⚠}  Experimentalavx512f Computes the absolute values of packed 32bit integers in 
_mm512_abs_epi64^{⚠}  Experimentalavx512f Compute the absolute value of packed signed 64bit integers in a, and store the unsigned results in dst. 
_mm512_abs_pd^{⚠}  Experimentalavx512f Finds the absolute value of each packed doubleprecision (64bit) floatingpoint element in v2, storing the results in dst. 
_mm512_abs_ps^{⚠}  Experimentalavx512f Finds the absolute value of each packed singleprecision (32bit) floatingpoint element in v2, storing the results in dst. 
_mm512_add_epi8^{⚠}  Experimentalavx512bw Add packed 8bit integers in a and b, and store the results in dst. 
_mm512_add_epi16^{⚠}  Experimentalavx512bw Add packed 16bit integers in a and b, and store the results in dst. 
_mm512_add_epi32^{⚠}  Experimentalavx512f Add packed 32bit integers in a and b, and store the results in dst. 
_mm512_add_epi64^{⚠}  Experimentalavx512f Add packed 64bit integers in a and b, and store the results in dst. 
_mm512_add_pd^{⚠}  Experimentalavx512f Add packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst. 
_mm512_add_ps^{⚠}  Experimentalavx512f Add packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst. 
_mm512_add_round_pd^{⚠}  Experimentalavx512f Add packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst. 
_mm512_add_round_ps^{⚠}  Experimentalavx512f Add packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst. 
_mm512_adds_epi8^{⚠}  Experimentalavx512bw Add packed signed 8bit integers in a and b using saturation, and store the results in dst. 
_mm512_adds_epi16^{⚠}  Experimentalavx512bw Add packed signed 16bit integers in a and b using saturation, and store the results in dst. 
_mm512_adds_epu8^{⚠}  Experimentalavx512bw Add packed unsigned 8bit integers in a and b using saturation, and store the results in dst. 
_mm512_adds_epu16^{⚠}  Experimentalavx512bw Add packed unsigned 16bit integers in a and b using saturation, and store the results in dst. 
_mm512_aesdec_epi128^{⚠}  Experimentalavx512vaes,avx512f Performs one round of an AES decryption flow on each 128bit word (state) in 
_mm512_aesdeclast_epi128^{⚠}  Experimentalavx512vaes,avx512f Performs the last round of an AES decryption flow on each 128bit word (state) in 
_mm512_aesenc_epi128^{⚠}  Experimentalavx512vaes,avx512f Performs one round of an AES encryption flow on each 128bit word (state) in 
_mm512_aesenclast_epi128^{⚠}  Experimentalavx512vaes,avx512f Performs the last round of an AES encryption flow on each 128bit word (state) in 
_mm512_alignr_epi8^{⚠}  Experimentalavx512bw Concatenate pairs of 16byte blocks in a and b into a 32byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst. 
_mm512_alignr_epi32^{⚠}  Experimentalavx512f Concatenate a and b into a 128byte intermediate result, shift the result right by imm8 32bit elements, and store the low 64 bytes (16 elements) in dst. 
_mm512_alignr_epi64^{⚠}  Experimentalavx512f Concatenate a and b into a 128byte intermediate result, shift the result right by imm8 64bit elements, and store the low 64 bytes (8 elements) in dst. 
_mm512_and_epi32^{⚠}  Experimentalavx512f Compute the bitwise AND of packed 32bit integers in a and b, and store the results in dst. 
_mm512_and_epi64^{⚠}  Experimentalavx512f Compute the bitwise AND of 512 bits (composed of packed 64bit integers) in a and b, and store the results in dst. 
_mm512_and_si512^{⚠}  Experimentalavx512f Compute the bitwise AND of 512 bits (representing integer data) in a and b, and store the result in dst. 
_mm512_andnot_epi32^{⚠}  Experimentalavx512f Compute the bitwise NOT of packed 32bit integers in a and then AND with b, and store the results in dst. 
_mm512_andnot_epi64^{⚠}  Experimentalavx512f Compute the bitwise NOT of 512 bits (composed of packed 64bit integers) in a and then AND with b, and store the results in dst. 
_mm512_andnot_si512^{⚠}  Experimentalavx512f Compute the bitwise NOT of 512 bits (representing integer data) in a and then AND with b, and store the result in dst. 
_mm512_avg_epu8^{⚠}  Experimentalavx512bw Average packed unsigned 8bit integers in a and b, and store the results in dst. 
_mm512_avg_epu16^{⚠}  Experimentalavx512bw Average packed unsigned 16bit integers in a and b, and store the results in dst. 
_mm512_bitshuffle_epi64_mask^{⚠}  Experimentalavx512bitalg Considers the input 
_mm512_broadcast_f32x4^{⚠}  Experimentalavx512f Broadcast the 4 packed singleprecision (32bit) floatingpoint elements from a to all elements of dst. 
_mm512_broadcast_f64x4^{⚠}  Experimentalavx512f Broadcast the 4 packed doubleprecision (64bit) floatingpoint elements from a to all elements of dst. 
_mm512_broadcast_i32x4^{⚠}  Experimentalavx512f Broadcast the 4 packed 32bit integers from a to all elements of dst. 
_mm512_broadcast_i64x4^{⚠}  Experimentalavx512f Broadcast the 4 packed 64bit integers from a to all elements of dst. 
_mm512_broadcastb_epi8^{⚠}  Experimentalavx512bw Broadcast the low packed 8bit integer from a to all elements of dst. 
_mm512_broadcastd_epi32^{⚠}  Experimentalavx512f Broadcast the low packed 32bit integer from a to all elements of dst. 
_mm512_broadcastmb_epi64^{⚠}  Experimentalavx512cd Broadcast the low 8 bits from input mask k to all 64bit elements of dst. 
_mm512_broadcastmw_epi32^{⚠}  Experimentalavx512cd Broadcast the low 16 bits from input mask k to all 32bit elements of dst. 
_mm512_broadcastq_epi64^{⚠}  Experimentalavx512f Broadcast the low packed 64bit integer from a to all elements of dst. 
_mm512_broadcastsd_pd^{⚠}  Experimentalavx512f Broadcast the low doubleprecision (64bit) floatingpoint element from a to all elements of dst. 
_mm512_broadcastss_ps^{⚠}  Experimentalavx512f Broadcast the low singleprecision (32bit) floatingpoint element from a to all elements of dst. 
_mm512_broadcastw_epi16^{⚠}  Experimentalavx512bw Broadcast the low packed 16bit integer from a to all elements of dst. 
_mm512_bslli_epi128^{⚠}  Experimentalavx512bw Shift 128bit lanes in a left by imm8 bytes while shifting in zeros, and store the results in dst. 
_mm512_bsrli_epi128^{⚠}  Experimentalavx512bw Shift 128bit lanes in a right by imm8 bytes while shifting in zeros, and store the results in dst. 
_mm512_castpd128_pd512^{⚠}  Experimentalavx512f Cast vector of type __m128d to type __m512d; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castpd256_pd512^{⚠}  Experimentalavx512f Cast vector of type __m256d to type __m512d; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castpd512_pd128^{⚠}  Experimentalavx512f Cast vector of type __m512d to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castpd512_pd256^{⚠}  Experimentalavx512f Cast vector of type __m512d to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castpd_ps^{⚠}  Experimentalavx512f Cast vector of type __m512d to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castpd_si512^{⚠}  Experimentalavx512f Cast vector of type __m512d to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castps128_ps512^{⚠}  Experimentalavx512f Cast vector of type __m128 to type __m512; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castps256_ps512^{⚠}  Experimentalavx512f Cast vector of type __m256 to type __m512; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castps512_ps128^{⚠}  Experimentalavx512f Cast vector of type __m512 to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castps512_ps256^{⚠}  Experimentalavx512f Cast vector of type __m512 to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castps_pd^{⚠}  Experimentalavx512f Cast vector of type __m512 to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castps_si512^{⚠}  Experimentalavx512f Cast vector of type __m512 to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castsi128_si512^{⚠}  Experimentalavx512f Cast vector of type __m128i to type __m512i; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castsi256_si512^{⚠}  Experimentalavx512f Cast vector of type __m256i to type __m512i; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castsi512_pd^{⚠}  Experimentalavx512f Cast vector of type __m512i to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castsi512_ps^{⚠}  Experimentalavx512f Cast vector of type __m512i to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castsi512_si128^{⚠}  Experimentalavx512f Cast vector of type __m512i to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_castsi512_si256^{⚠}  Experimentalavx512f Cast vector of type __m512i to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
_mm512_clmulepi64_epi128^{⚠}  Experimentalavx512vpclmulqdq,avx512f Performs a carryless multiplication of two 64bit polynomials over the finite field GF(2^k) in each of the 4 128bit lanes. 
_mm512_cmp_epi8_mask^{⚠}  Experimentalavx512bw Compare packed signed 8bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm512_cmp_epi16_mask^{⚠}  Experimentalavx512bw Compare packed signed 16bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm512_cmp_epi32_mask^{⚠}  Experimentalavx512f Compare packed signed 32bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm512_cmp_epi64_mask^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm512_cmp_epu8_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 8bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm512_cmp_epu16_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 16bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm512_cmp_epu32_mask^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm512_cmp_epu64_mask^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm512_cmp_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm512_cmp_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm512_cmp_round_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm512_cmp_round_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. 
_mm512_cmpeq_epi8_mask^{⚠}  Experimentalavx512bw Compare packed signed 8bit integers in a and b for equality, and store the results in mask vector k. 
_mm512_cmpeq_epi16_mask^{⚠}  Experimentalavx512bw Compare packed signed 16bit integers in a and b for equality, and store the results in mask vector k. 
_mm512_cmpeq_epi32_mask^{⚠}  Experimentalavx512f Compare packed 32bit integers in a and b for equality, and store the results in mask vector k. 
_mm512_cmpeq_epi64_mask^{⚠}  Experimentalavx512f Compare packed 64bit integers in a and b for equality, and store the results in mask vector k. 
_mm512_cmpeq_epu8_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 8bit integers in a and b for equality, and store the results in mask vector k. 
_mm512_cmpeq_epu16_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 16bit integers in a and b for equality, and store the results in mask vector k. 
_mm512_cmpeq_epu32_mask^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b for equality, and store the results in mask vector k. 
_mm512_cmpeq_epu64_mask^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b for equality, and store the results in mask vector k. 
_mm512_cmpeq_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b for equality, and store the results in mask vector k. 
_mm512_cmpeq_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b for equality, and store the results in mask vector k. 
_mm512_cmpge_epi8_mask^{⚠}  Experimentalavx512bw Compare packed signed 8bit integers in a and b for greaterthanorequal, and store the results in mask vector k. 
_mm512_cmpge_epi16_mask^{⚠}  Experimentalavx512bw Compare packed signed 16bit integers in a and b for greaterthanorequal, and store the results in mask vector k. 
_mm512_cmpge_epi32_mask^{⚠}  Experimentalavx512f Compare packed signed 32bit integers in a and b for greaterthanorequal, and store the results in mask vector k. 
_mm512_cmpge_epi64_mask^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b for greaterthanorequal, and store the results in mask vector k. 
_mm512_cmpge_epu8_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 8bit integers in a and b for greaterthanorequal, and store the results in mask vector k. 
_mm512_cmpge_epu16_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 16bit integers in a and b for greaterthanorequal, and store the results in mask vector k. 
_mm512_cmpge_epu32_mask^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b for greaterthanorequal, and store the results in mask vector k. 
_mm512_cmpge_epu64_mask^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b for greaterthanorequal, and store the results in mask vector k. 
_mm512_cmpgt_epi8_mask^{⚠}  Experimentalavx512bw Compare packed signed 8bit integers in a and b for greaterthan, and store the results in mask vector k. 
_mm512_cmpgt_epi16_mask^{⚠}  Experimentalavx512bw Compare packed signed 16bit integers in a and b for greaterthan, and store the results in mask vector k. 
_mm512_cmpgt_epi32_mask^{⚠}  Experimentalavx512f Compare packed signed 32bit integers in a and b for greaterthan, and store the results in mask vector k. 
_mm512_cmpgt_epi64_mask^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b for greaterthan, and store the results in mask vector k. 
_mm512_cmpgt_epu8_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 8bit integers in a and b for greaterthan, and store the results in mask vector k. 
_mm512_cmpgt_epu16_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 16bit integers in a and b for greaterthan, and store the results in mask vector k. 
_mm512_cmpgt_epu32_mask^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b for greaterthan, and store the results in mask vector k. 
_mm512_cmpgt_epu64_mask^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b for greaterthan, and store the results in mask vector k. 
_mm512_cmple_epi8_mask^{⚠}  Experimentalavx512bw Compare packed signed 8bit integers in a and b for lessthanorequal, and store the results in mask vector k. 
_mm512_cmple_epi16_mask^{⚠}  Experimentalavx512bw Compare packed signed 16bit integers in a and b for lessthanorequal, and store the results in mask vector k. 
_mm512_cmple_epi32_mask^{⚠}  Experimentalavx512f Compare packed signed 32bit integers in a and b for lessthanorequal, and store the results in mask vector k. 
_mm512_cmple_epi64_mask^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b for lessthanorequal, and store the results in mask vector k. 
_mm512_cmple_epu8_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 8bit integers in a and b for lessthanorequal, and store the results in mask vector k. 
_mm512_cmple_epu16_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 16bit integers in a and b for lessthanorequal, and store the results in mask vector k. 
_mm512_cmple_epu32_mask^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b for lessthanorequal, and store the results in mask vector k. 
_mm512_cmple_epu64_mask^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b for lessthanorequal, and store the results in mask vector k. 
_mm512_cmple_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b for lessthanorequal, and store the results in mask vector k. 
_mm512_cmple_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b for lessthanorequal, and store the results in mask vector k. 
_mm512_cmplt_epi8_mask^{⚠}  Experimentalavx512bw Compare packed signed 8bit integers in a and b for lessthan, and store the results in mask vector k. 
_mm512_cmplt_epi16_mask^{⚠}  Experimentalavx512bw Compare packed signed 16bit integers in a and b for lessthan, and store the results in mask vector k. 
_mm512_cmplt_epi32_mask^{⚠}  Experimentalavx512f Compare packed signed 32bit integers in a and b for lessthan, and store the results in mask vector k. 
_mm512_cmplt_epi64_mask^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b for lessthan, and store the results in mask vector k. 
_mm512_cmplt_epu8_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 8bit integers in a and b for lessthan, and store the results in mask vector k. 
_mm512_cmplt_epu16_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 16bit integers in a and b for lessthan, and store the results in mask vector k. 
_mm512_cmplt_epu32_mask^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b for lessthan, and store the results in mask vector k. 
_mm512_cmplt_epu64_mask^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b for lessthan, and store the results in mask vector k. 
_mm512_cmplt_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b for lessthan, and store the results in mask vector k. 
_mm512_cmplt_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b for lessthan, and store the results in mask vector k. 
_mm512_cmpneq_epi8_mask^{⚠}  Experimentalavx512bw Compare packed signed 8bit integers in a and b for notequal, and store the results in mask vector k. 
_mm512_cmpneq_epi16_mask^{⚠}  Experimentalavx512bw Compare packed signed 16bit integers in a and b for notequal, and store the results in mask vector k. 
_mm512_cmpneq_epi32_mask^{⚠}  Experimentalavx512f Compare packed 32bit integers in a and b for notequal, and store the results in mask vector k. 
_mm512_cmpneq_epi64_mask^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b for notequal, and store the results in mask vector k. 
_mm512_cmpneq_epu8_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 8bit integers in a and b for notequal, and store the results in mask vector k. 
_mm512_cmpneq_epu16_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 16bit integers in a and b for notequal, and store the results in mask vector k. 
_mm512_cmpneq_epu32_mask^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b for notequal, and store the results in mask vector k. 
_mm512_cmpneq_epu64_mask^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b for notequal, and store the results in mask vector k. 
_mm512_cmpneq_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b for notequal, and store the results in mask vector k. 
_mm512_cmpneq_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b for notequal, and store the results in mask vector k. 
_mm512_cmpnle_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b for notlessthanorequal, and store the results in mask vector k. 
_mm512_cmpnle_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b for notlessthanorequal, and store the results in mask vector k. 
_mm512_cmpnlt_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b for notlessthan, and store the results in mask vector k. 
_mm512_cmpnlt_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b for notlessthan, and store the results in mask vector k. 
_mm512_cmpord_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b to see if neither is NaN, and store the results in mask vector k. 
_mm512_cmpord_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b to see if neither is NaN, and store the results in mask vector k. 
_mm512_cmpunord_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b to see if either is NaN, and store the results in mask vector k. 
_mm512_cmpunord_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b to see if either is NaN, and store the results in mask vector k. 
_mm512_conflict_epi32^{⚠}  Experimentalavx512cd Test each 32bit element of a for equality with all other elements in a closer to the least significant bit. Each element’s comparison forms a zero extended bit vector in dst. 
_mm512_conflict_epi64^{⚠}  Experimentalavx512cd Test each 64bit element of a for equality with all other elements in a closer to the least significant bit. Each element’s comparison forms a zero extended bit vector in dst. 
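The conflict entries describe a per-element comparison against all lower-indexed elements. A minimal scalar sketch of that reading, using a hypothetical helper `conflict_epi32_model` over a plain array rather than the intrinsic, might look like this. It assumes "closer to the least significant bit" means "at a lower element index" and that bit j of each result element records a match with element j.

```rust
/// Hypothetical scalar model of the `_mm512_conflict_epi32` description.
/// Assumption: bit `j` of `dst[i]` is set when `a[i] == a[j]` for some `j < i`.
pub fn conflict_epi32_model(a: [u32; 16]) -> [u32; 16] {
    let mut dst = [0u32; 16];
    for i in 0..16 {
        // Only elements closer to the least significant end (j < i) are tested.
        for j in 0..i {
            if a[i] == a[j] {
                dst[i] |= 1 << j;
            }
        }
    }
    dst
}
```

Element 0 has no lower neighbours, so its result is always 0 under this model.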
_mm512_cvt_roundepi32_ps^{⚠}  Experimentalavx512f Convert packed signed 32bit integers in a to packed singleprecision (32bit) floatingpoint elements, and store the results in dst. 
_mm512_cvt_roundepu32_ps^{⚠}  Experimentalavx512f Convert packed unsigned 32bit integers in a to packed singleprecision (32bit) floatingpoint elements, and store the results in dst. 
_mm512_cvt_roundpd_epi32^{⚠}  Experimentalavx512f Convert packed doubleprecision (64bit) floatingpoint elements in a to packed 32bit integers, and store the results in dst. 
_mm512_cvt_roundpd_epu32^{⚠}  Experimentalavx512f Convert packed doubleprecision (64bit) floatingpoint elements in a to packed unsigned 32bit integers, and store the results in dst. 
_mm512_cvt_roundpd_ps^{⚠}  Experimentalavx512f Convert packed doubleprecision (64bit) floatingpoint elements in a to packed singleprecision (32bit) floatingpoint elements, and store the results in dst. 
_mm512_cvt_roundph_ps^{⚠}  Experimentalavx512f Convert packed halfprecision (16bit) floatingpoint elements in a to packed singleprecision (32bit) floatingpoint elements, and store the results in dst. 
_mm512_cvt_roundps_epi32^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed 32bit integers, and store the results in dst. 
_mm512_cvt_roundps_epu32^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed unsigned 32bit integers, and store the results in dst. 
_mm512_cvt_roundps_pd^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed doubleprecision (64bit) floatingpoint elements, and store the results in dst. 
_mm512_cvt_roundps_ph^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed halfprecision (16bit) floatingpoint elements, and store the results in dst. 
_mm512_cvtepi8_epi16^{⚠}  Experimentalavx512bw Sign extend packed 8bit integers in a to packed 16bit integers, and store the results in dst. 
_mm512_cvtepi8_epi32^{⚠}  Experimentalavx512f Sign extend packed 8bit integers in a to packed 32bit integers, and store the results in dst. 
_mm512_cvtepi8_epi64^{⚠}  Experimentalavx512f Sign extend packed 8bit integers in the low 8 bytes of a to packed 64bit integers, and store the results in dst. 
_mm512_cvtepi16_epi8^{⚠}  Experimentalavx512bw Convert packed 16bit integers in a to packed 8bit integers with truncation, and store the results in dst. 
_mm512_cvtepi16_epi32^{⚠}  Experimentalavx512f Sign extend packed 16bit integers in a to packed 32bit integers, and store the results in dst. 
_mm512_cvtepi16_epi64^{⚠}  Experimentalavx512f Sign extend packed 16bit integers in a to packed 64bit integers, and store the results in dst. 
_mm512_cvtepi32_epi8^{⚠}  Experimentalavx512f Convert packed 32bit integers in a to packed 8bit integers with truncation, and store the results in dst. 
_mm512_cvtepi32_epi16^{⚠}  Experimentalavx512f Convert packed 32bit integers in a to packed 16bit integers with truncation, and store the results in dst. 
_mm512_cvtepi32_epi64^{⚠}  Experimentalavx512f Sign extend packed 32bit integers in a to packed 64bit integers, and store the results in dst. 
_mm512_cvtepi32_pd^{⚠}  Experimentalavx512f Convert packed signed 32bit integers in a to packed doubleprecision (64bit) floatingpoint elements, and store the results in dst. 
_mm512_cvtepi32_ps^{⚠}  Experimentalavx512f Convert packed signed 32bit integers in a to packed singleprecision (32bit) floatingpoint elements, and store the results in dst. 
_mm512_cvtepi32lo_pd^{⚠}  Experimentalavx512f Performs elementbyelement conversion of the lower half of packed 32bit integer elements in v2 to packed doubleprecision (64bit) floatingpoint elements, storing the results in dst. 
_mm512_cvtepi64_epi8^{⚠}  Experimentalavx512f Convert packed 64bit integers in a to packed 8bit integers with truncation, and store the results in dst. 
_mm512_cvtepi64_epi16^{⚠}  Experimentalavx512f Convert packed 64bit integers in a to packed 16bit integers with truncation, and store the results in dst. 
_mm512_cvtepi64_epi32^{⚠}  Experimentalavx512f Convert packed 64bit integers in a to packed 32bit integers with truncation, and store the results in dst. 
_mm512_cvtepu8_epi16^{⚠}  Experimentalavx512bw Zero extend packed unsigned 8bit integers in a to packed 16bit integers, and store the results in dst. 
_mm512_cvtepu8_epi32^{⚠}  Experimentalavx512f Zero extend packed unsigned 8bit integers in a to packed 32bit integers, and store the results in dst. 
_mm512_cvtepu8_epi64^{⚠}  Experimentalavx512f Zero extend packed unsigned 8bit integers in the low 8 bytes of a to packed 64bit integers, and store the results in dst. 
_mm512_cvtepu16_epi32^{⚠}  Experimentalavx512f Zero extend packed unsigned 16bit integers in a to packed 32bit integers, and store the results in dst. 
_mm512_cvtepu16_epi64^{⚠}  Experimentalavx512f Zero extend packed unsigned 16bit integers in a to packed 64bit integers, and store the results in dst. 
_mm512_cvtepu32_epi64^{⚠}  Experimentalavx512f Zero extend packed unsigned 32bit integers in a to packed 64bit integers, and store the results in dst. 
_mm512_cvtepu32_pd^{⚠}  Experimentalavx512f Convert packed unsigned 32bit integers in a to packed doubleprecision (64bit) floatingpoint elements, and store the results in dst. 
_mm512_cvtepu32_ps^{⚠}  Experimentalavx512f Convert packed unsigned 32bit integers in a to packed singleprecision (32bit) floatingpoint elements, and store the results in dst. 
_mm512_cvtepu32lo_pd^{⚠}  Experimentalavx512f Performs elementbyelement conversion of the lower half of packed 32bit unsigned integer elements in v2 to packed doubleprecision (64bit) floatingpoint elements, storing the results in dst. 
_mm512_cvtne2ps_pbh^{⚠}  Experimentalavx512bf16,avx512f Convert packed singleprecision (32bit) floatingpoint elements in two 512bit vectors a and b to packed BF16 (16bit) floatingpoint elements, and store the results in a 
_mm512_cvtneps_pbh^{⚠}  Experimentalavx512bf16,avx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed BF16 (16bit) floatingpoint elements, and store the results in dst. 
_mm512_cvtpd_epi32^{⚠}  Experimentalavx512f Convert packed doubleprecision (64bit) floatingpoint elements in a to packed 32bit integers, and store the results in dst. 
_mm512_cvtpd_epu32^{⚠}  Experimentalavx512f Convert packed doubleprecision (64bit) floatingpoint elements in a to packed unsigned 32bit integers, and store the results in dst. 
_mm512_cvtpd_ps^{⚠}  Experimentalavx512f Convert packed doubleprecision (64bit) floatingpoint elements in a to packed singleprecision (32bit) floatingpoint elements, and store the results in dst. 
_mm512_cvtpd_pslo^{⚠}  Experimentalavx512f Performs an elementbyelement conversion of packed doubleprecision (64bit) floatingpoint elements in v2 to singleprecision (32bit) floatingpoint elements and stores them in dst. The elements are stored in the lower half of the results vector, while the remaining upper half locations are set to 0. 
_mm512_cvtph_ps^{⚠}  Experimentalavx512f Convert packed halfprecision (16bit) floatingpoint elements in a to packed singleprecision (32bit) floatingpoint elements, and store the results in dst. 
_mm512_cvtps_epi32^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed 32bit integers, and store the results in dst. 
_mm512_cvtps_epu32^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed unsigned 32bit integers, and store the results in dst. 
_mm512_cvtps_pd^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed doubleprecision (64bit) floatingpoint elements, and store the results in dst. 
_mm512_cvtps_ph^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed halfprecision (16bit) floatingpoint elements, and store the results in dst. 
_mm512_cvtpslo_pd^{⚠}  Experimentalavx512f Performs elementbyelement conversion of the lower half of packed singleprecision (32bit) floatingpoint elements in v2 to packed doubleprecision (64bit) floatingpoint elements, storing the results in dst. 
_mm512_cvtsepi16_epi8^{⚠}  Experimentalavx512bw Convert packed signed 16bit integers in a to packed 8bit integers with signed saturation, and store the results in dst. 
_mm512_cvtsepi32_epi8^{⚠}  Experimentalavx512f Convert packed signed 32bit integers in a to packed 8bit integers with signed saturation, and store the results in dst. 
_mm512_cvtsepi32_epi16^{⚠}  Experimentalavx512f Convert packed signed 32bit integers in a to packed 16bit integers with signed saturation, and store the results in dst. 
_mm512_cvtsepi64_epi8^{⚠}  Experimentalavx512f Convert packed signed 64bit integers in a to packed 8bit integers with signed saturation, and store the results in dst. 
_mm512_cvtsepi64_epi16^{⚠}  Experimentalavx512f Convert packed signed 64bit integers in a to packed 16bit integers with signed saturation, and store the results in dst. 
_mm512_cvtsepi64_epi32^{⚠}  Experimentalavx512f Convert packed signed 64bit integers in a to packed 32bit integers with signed saturation, and store the results in dst. 
_mm512_cvtsi512_si32^{⚠}  Experimentalavx512f Copy the lower 32bit integer in a to dst. 
_mm512_cvtt_roundpd_epi32^{⚠}  Experimentalavx512f Convert packed doubleprecision (64bit) floatingpoint elements in a to packed 32bit integers with truncation, and store the results in dst. 
_mm512_cvtt_roundpd_epu32^{⚠}  Experimentalavx512f Convert packed doubleprecision (64bit) floatingpoint elements in a to packed unsigned 32bit integers with truncation, and store the results in dst. 
_mm512_cvtt_roundps_epi32^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed 32bit integers with truncation, and store the results in dst. 
_mm512_cvtt_roundps_epu32^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed unsigned 32bit integers with truncation, and store the results in dst. 
_mm512_cvttpd_epi32^{⚠}  Experimentalavx512f Convert packed doubleprecision (64bit) floatingpoint elements in a to packed 32bit integers with truncation, and store the results in dst. 
_mm512_cvttpd_epu32^{⚠}  Experimentalavx512f Convert packed doubleprecision (64bit) floatingpoint elements in a to packed unsigned 32bit integers with truncation, and store the results in dst. 
_mm512_cvttps_epi32^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed 32bit integers with truncation, and store the results in dst. 
_mm512_cvttps_epu32^{⚠}  Experimentalavx512f Convert packed singleprecision (32bit) floatingpoint elements in a to packed unsigned 32bit integers with truncation, and store the results in dst. 
_mm512_cvtusepi16_epi8^{⚠}  Experimentalavx512bw Convert packed unsigned 16bit integers in a to packed unsigned 8bit integers with unsigned saturation, and store the results in dst. 
_mm512_cvtusepi32_epi8^{⚠}  Experimentalavx512f Convert packed unsigned 32bit integers in a to packed unsigned 8bit integers with unsigned saturation, and store the results in dst. 
_mm512_cvtusepi32_epi16^{⚠}  Experimentalavx512f Convert packed unsigned 32bit integers in a to packed unsigned 16bit integers with unsigned saturation, and store the results in dst. 
_mm512_cvtusepi64_epi8^{⚠}  Experimentalavx512f Convert packed unsigned 64bit integers in a to packed unsigned 8bit integers with unsigned saturation, and store the results in dst. 
_mm512_cvtusepi64_epi16^{⚠}  Experimentalavx512f Convert packed unsigned 64bit integers in a to packed unsigned 16bit integers with unsigned saturation, and store the results in dst. 
_mm512_cvtusepi64_epi32^{⚠}  Experimentalavx512f Convert packed unsigned 64bit integers in a to packed unsigned 32bit integers with unsigned saturation, and store the results in dst. 
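The narrowing conversions above come in three flavours: truncation (cvtepi), signed saturation (cvtsepi) and unsigned saturation (cvtusepi). The difference is easiest to see on a single element. The helpers below are hypothetical scalar illustrations for the 32bit to 8bit case, not part of this module.

```rust
/// Truncation: keep only the low 8 bits of the 32-bit value.
pub fn truncate_i32_to_i8(x: i32) -> i8 {
    x as i8
}

/// Signed saturation: clamp to the range of `i8` before narrowing.
pub fn saturate_i32_to_i8(x: i32) -> i8 {
    x.clamp(i8::MIN as i32, i8::MAX as i32) as i8
}

/// Unsigned saturation: clamp to the range of `u8` before narrowing.
pub fn saturate_u32_to_u8(x: u32) -> u8 {
    x.min(u8::MAX as u32) as u8
}
```

For an input of 300, truncation yields 44 (the low byte 0x2C), signed saturation yields 127, and unsigned saturation yields 255.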
_mm512_dbsad_epu8^{⚠}  Experimentalavx512bw Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8bit integers in a compared to those in b, and store the 16bit results in dst. Four SADs are performed on four 8bit quadruplets for each 64bit lane. The first two SADs use the lower 8bit quadruplet of the lane from a, and the last two SADs use the upper 8bit quadruplet of the lane from a. Quadruplets from b are selected from within 128bit lanes according to the control in imm8, and each SAD in each 64bit lane uses the selected quadruplet at 8bit offsets. 
_mm512_div_pd^{⚠}  Experimentalavx512f Divide packed doubleprecision (64bit) floatingpoint elements in a by packed elements in b, and store the results in dst. 
_mm512_div_ps^{⚠}  Experimentalavx512f Divide packed singleprecision (32bit) floatingpoint elements in a by packed elements in b, and store the results in dst. 
_mm512_div_round_pd^{⚠}  Experimentalavx512f Divide packed doubleprecision (64bit) floatingpoint elements in a by packed elements in b, and store the results in dst. 
_mm512_div_round_ps^{⚠}  Experimentalavx512f Divide packed singleprecision (32bit) floatingpoint elements in a by packed elements in b, and store the results in dst. 
_mm512_dpbf16_ps^{⚠}  Experimentalavx512bf16,avx512f Compute dotproduct of BF16 (16bit) floatingpoint pairs in a and b, accumulating the intermediate singleprecision (32bit) floatingpoint elements with elements in src, and store the results in dst. Intel’s documentation 
_mm512_dpbusd_epi32^{⚠}  Experimentalavx512vnni Multiply groups of 4 adjacent pairs of unsigned 8bit integers in a with corresponding signed 8bit integers in b, producing 4 intermediate signed 16bit results. Sum these 4 results with the corresponding 32bit integer in src, and store the packed 32bit results in dst. 
_mm512_dpbusds_epi32^{⚠}  Experimentalavx512vnni Multiply groups of 4 adjacent pairs of unsigned 8bit integers in a with corresponding signed 8bit integers in b, producing 4 intermediate signed 16bit results. Sum these 4 results with the corresponding 32bit integer in src using signed saturation, and store the packed 32bit results in dst. 
_mm512_dpwssd_epi32^{⚠}  Experimentalavx512vnni Multiply groups of 2 adjacent pairs of signed 16bit integers in a with corresponding 16bit integers in b, producing 2 intermediate signed 32bit results. Sum these 2 results with the corresponding 32bit integer in src, and store the packed 32bit results in dst. 
_mm512_dpwssds_epi32^{⚠}  Experimentalavx512vnni Multiply groups of 2 adjacent pairs of signed 16bit integers in a with corresponding 16bit integers in b, producing 2 intermediate signed 32bit results. Sum these 2 results with the corresponding 32bit integer in src using signed saturation, and store the packed 32bit results in dst. 
_mm512_extractf32x4_ps^{⚠}  Experimentalavx512f Extract 128 bits (composed of 4 packed singleprecision (32bit) floatingpoint elements) from a, selected with imm8, and store the result in dst. 
_mm512_extractf64x4_pd^{⚠}  Experimentalavx512f Extract 256 bits (composed of 4 packed doubleprecision (64bit) floatingpoint elements) from a, selected with imm8, and store the result in dst. 
_mm512_extracti32x4_epi32^{⚠}  Experimentalavx512f Extract 128 bits (composed of 4 packed 32bit integers) from a, selected with IMM2, and store the result in dst. 
_mm512_extracti64x4_epi64^{⚠}  Experimentalavx512f Extract 256 bits (composed of 4 packed 64bit integers) from a, selected with IMM1, and store the result in dst. 
_mm512_fixupimm_pd^{⚠}  Experimentalavx512f Fix up packed doubleprecision (64bit) floatingpoint elements in a and b using packed 64bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting. 
_mm512_fixupimm_ps^{⚠}  Experimentalavx512f Fix up packed singleprecision (32bit) floatingpoint elements in a and b using packed 32bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting. 
_mm512_fixupimm_round_pd^{⚠}  Experimentalavx512f Fix up packed doubleprecision (64bit) floatingpoint elements in a and b using packed 64bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting. 
_mm512_fixupimm_round_ps^{⚠}  Experimentalavx512f Fix up packed singleprecision (32bit) floatingpoint elements in a and b using packed 32bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting. 
_mm512_fmadd_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst. 
_mm512_fmadd_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst. 
_mm512_fmadd_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst. 
_mm512_fmadd_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst. 
_mm512_fmaddsub_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst. 
_mm512_fmaddsub_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst. 
_mm512_fmaddsub_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst. 
_mm512_fmaddsub_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst. 
_mm512_fmsub_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst. 
_mm512_fmsub_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst. 
_mm512_fmsub_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst. 
_mm512_fmsub_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst. 
_mm512_fmsubadd_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst. 
_mm512_fmsubadd_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst. 
_mm512_fmsubadd_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst. 
_mm512_fmsubadd_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst. 
_mm512_fnmadd_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst. 
_mm512_fnmadd_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst. 
_mm512_fnmadd_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst. 
_mm512_fnmadd_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst. 
_mm512_fnmsub_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst. 
_mm512_fnmsub_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst. 
_mm512_fnmsub_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst. 
_mm512_fnmsub_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst. 
_mm512_getexp_pd^{⚠}  Experimentalavx512f Convert the exponent of each packed doubleprecision (64bit) floatingpoint element in a to a doubleprecision (64bit) floatingpoint number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element. 
_mm512_getexp_ps^{⚠}  Experimentalavx512f Convert the exponent of each packed singleprecision (32bit) floatingpoint element in a to a singleprecision (32bit) floatingpoint number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element. 
_mm512_getexp_round_pd^{⚠}  Experimentalavx512f Convert the exponent of each packed doubleprecision (64bit) floatingpoint element in a to a doubleprecision (64bit) floatingpoint number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element. 
_mm512_getexp_round_ps^{⚠}  Experimentalavx512f Convert the exponent of each packed singleprecision (32bit) floatingpoint element in a to a singleprecision (32bit) floatingpoint number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element. 
_mm512_getmant_pd^{⚠}  Experimentalavx512f Normalize the mantissas of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. 
_mm512_getmant_ps^{⚠}  Experimentalavx512f Normalize the mantissas of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. The mantissa is normalized to the interval specified by interv, which can take the following values: _MM_MANT_NORM_1_2: interval [1, 2); _MM_MANT_NORM_P5_2: interval [0.5, 2); _MM_MANT_NORM_P5_1: interval [0.5, 1); _MM_MANT_NORM_P75_1P5: interval [0.75, 1.5). The sign is determined by sc, which can take the following values: _MM_MANT_SIGN_SRC: sign = sign(src); _MM_MANT_SIGN_ZERO: sign = 0; _MM_MANT_SIGN_NAN: dst = NaN if sign(src) = 1. 
_mm512_getmant_round_pd^{⚠}  Experimentalavx512f Normalize the mantissas of packed doubleprecision (64bit) floatingpoint elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. 
_mm512_getmant_round_ps^{⚠}  Experimentalavx512f Normalize the mantissas of packed singleprecision (32bit) floatingpoint elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*x.significand, where k depends on the interval range defined by interv and the sign depends on sc and the source sign. 
_mm512_gf2p8affine_epi64_epi8^{⚠}  Experimentalavx512gfni,avx512bw,avx512f Performs an affine transformation on the packed bytes in x. That is, it computes a*x+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8bit immediate value. Each pack of 8 bytes in x is paired with the 64bit word at the same position in a. 
_mm512_gf2p8affineinv_epi64_epi8^{⚠}  Experimentalavx512gfni,avx512bw,avx512f Performs an affine transformation on the inverted packed bytes in x. That is, it computes a*inv(x)+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64bit word at the same position in a. 
_mm512_gf2p8mul_epi8^{⚠}  Experimentalavx512gfni,avx512bw,avx512f Performs a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1. 
_mm512_i32gather_epi32^{⚠}  Experimentalavx512f Gather 32bit integers from memory using 32bit indices. 32bit elements are loaded from addresses starting at base_addr and offset by each 32bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8. 
_mm512_i32gather_epi64^{⚠}  Experimentalavx512f Gather 64bit integers from memory using 32bit indices. 64bit elements are loaded from addresses starting at base_addr and offset by each 32bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8. 
_mm512_i32gather_pd^{⚠}  Experimentalavx512f Gather doubleprecision (64bit) floatingpoint elements from memory using 32bit indices. 64bit elements are loaded from addresses starting at base_addr and offset by each 32bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8. 
_mm512_i32gather_ps^{⚠}  Experimentalavx512f Gather singleprecision (32bit) floatingpoint elements from memory using 32bit indices. 32bit elements are loaded from addresses starting at base_addr and offset by each 32bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8. 
_mm512_i32scatter_epi32^{⚠}  Experimentalavx512f Scatter 32bit integers from a into memory using 32bit indices. 32bit elements are stored at addresses starting at base_addr and offset by each 32bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8. 
_mm512_i32scatter_epi64^{⚠}  Experimentalavx512f Scatter 64bit integers from a into memory using 32bit indices. 64bit elements are stored at addresses starting at base_addr and offset by each 32bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8. 
_mm512_i32scatter_pd^{⚠}  Experimentalavx512f Scatter doubleprecision (64bit) floatingpoint elements from a into memory using 32bit indices. 64bit elements are stored at addresses starting at base_addr and offset by each 32bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8. 
_mm512_i32scatter_ps^{⚠}  Experimentalavx512f Scatter singleprecision (32bit) floatingpoint elements from a into memory using 32bit indices. 32bit elements are stored at addresses starting at base_addr and offset by each 32bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8. 
_mm512_i64gather_epi32^{⚠}  Experimentalavx512f Gather 32bit integers from memory using 64bit indices. 32bit elements are loaded from addresses starting at base_addr and offset by each 64bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8. 
_mm512_i64gather_epi64^{⚠}  Experimentalavx512f Gather 64bit integers from memory using 64bit indices. 64bit elements are loaded from addresses starting at base_addr and offset by each 64bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8. 
_mm512_i64gather_pd^{⚠}  Experimentalavx512f Gather doubleprecision (64bit) floatingpoint elements from memory using 64bit indices. 64bit elements are loaded from addresses starting at base_addr and offset by each 64bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8. 
_mm512_i64gather_ps^{⚠}  Experimentalavx512f Gather singleprecision (32bit) floatingpoint elements from memory using 64bit indices. 32bit elements are loaded from addresses starting at base_addr and offset by each 64bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8. 
_mm512_i64scatter_epi32^{⚠}  Experimentalavx512f Scatter 32bit integers from a into memory using 64bit indices. 32bit elements are stored at addresses starting at base_addr and offset by each 64bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8. 
_mm512_i64scatter_epi64^{⚠}  Experimentalavx512f Scatter 64bit integers from a into memory using 64bit indices. 64bit elements are stored at addresses starting at base_addr and offset by each 64bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8. 
_mm512_i64scatter_pd^{⚠}  Experimentalavx512f Scatter doubleprecision (64bit) floatingpoint elements from a into memory using 64bit indices. 64bit elements are stored at addresses starting at base_addr and offset by each 64bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8. 
_mm512_i64scatter_ps^{⚠}  Experimentalavx512f Scatter singleprecision (32bit) floatingpoint elements from a into memory using 64bit indices. 32bit elements are stored at addresses starting at base_addr and offset by each 64bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8. 
_mm512_insertf32x4^{⚠}  Experimentalavx512f Copy a to dst, then insert 128 bits (composed of 4 packed singleprecision (32bit) floatingpoint elements) from b into dst at the location specified by imm8. 
_mm512_insertf64x4^{⚠}  Experimentalavx512f Copy a to dst, then insert 256 bits (composed of 4 packed doubleprecision (64bit) floatingpoint elements) from b into dst at the location specified by imm8. 
_mm512_inserti32x4^{⚠}  Experimentalavx512f Copy a to dst, then insert 128 bits (composed of 4 packed 32bit integers) from b into dst at the location specified by imm8. 
_mm512_inserti64x4^{⚠}  Experimentalavx512f Copy a to dst, then insert 256 bits (composed of 4 packed 64bit integers) from b into dst at the location specified by imm8. 
_mm512_int2mask^{⚠}  Experimentalavx512f Converts integer mask into bitmask, storing the result in dst. 
_mm512_kand^{⚠}  Experimentalavx512f Compute the bitwise AND of 16bit masks a and b, and store the result in k. 
_mm512_kandn^{⚠}  Experimentalavx512f Compute the bitwise NOT of 16bit mask a and then AND with b, and store the result in k. 
_mm512_kmov^{⚠}  Experimentalavx512f Copy 16bit mask a to k. 
_mm512_knot^{⚠}  Experimentalavx512f Compute the bitwise NOT of 16bit mask a, and store the result in k. 
_mm512_kor^{⚠}  Experimentalavx512f Compute the bitwise OR of 16bit masks a and b, and store the result in k. 
_mm512_kortestc^{⚠}  Experimentalavx512f Performs bitwise OR between k1 and k2, storing the result in dst. CF flag is set if dst consists of all 1’s. 
_mm512_kunpackb^{⚠}  Experimentalavx512f Unpack and interleave 8 bits from masks a and b, and store the 16bit result in k. 
_mm512_kxnor^{⚠}  Experimentalavx512f Compute the bitwise XNOR of 16bit masks a and b, and store the result in k. 
_mm512_kxor^{⚠}  Experimentalavx512f Compute the bitwise XOR of 16bit masks a and b, and store the result in k. 
_mm512_load_epi32^{⚠}  Experimentalavx512f Load 512bits (composed of 16 packed 32bit integers) from memory into dst. mem_addr must be aligned on a 64byte boundary or a generalprotection exception may be generated. 
_mm512_load_epi64^{⚠}  Experimentalavx512f Load 512bits (composed of 8 packed 64bit integers) from memory into dst. mem_addr must be aligned on a 64byte boundary or a generalprotection exception may be generated. 
_mm512_load_pd^{⚠}  Experimentalavx512f Load 512bits (composed of 8 packed doubleprecision (64bit) floatingpoint elements) from memory into dst. mem_addr must be aligned on a 64byte boundary or a generalprotection exception may be generated. 
_mm512_load_ps^{⚠}  Experimentalavx512f Load 512bits (composed of 16 packed singleprecision (32bit) floatingpoint elements) from memory into dst. mem_addr must be aligned on a 64byte boundary or a generalprotection exception may be generated. 
_mm512_load_si512^{⚠}  Experimentalavx512f Load 512bits of integer data from memory into dst. mem_addr must be aligned on a 64byte boundary or a generalprotection exception may be generated. 
_mm512_loadu_epi8^{⚠}  Experimentalavx512bw Load 512bits (composed of 64 packed 8bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary. 
_mm512_loadu_epi16^{⚠}  Experimentalavx512bw Load 512bits (composed of 32 packed 16bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary. 
_mm512_loadu_epi32^{⚠}  Experimentalavx512f Load 512bits (composed of 16 packed 32bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary. 
_mm512_loadu_epi64^{⚠}  Experimentalavx512f Load 512bits (composed of 8 packed 64bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary. 
_mm512_loadu_pd^{⚠}  Experimentalavx512f Loads 512bits (composed of 8 packed doubleprecision (64bit) floatingpoint elements) from memory into result. 
_mm512_loadu_ps^{⚠}  Experimentalavx512f Loads 512bits (composed of 16 packed singleprecision (32bit) floatingpoint elements) from memory into result. 
_mm512_loadu_si512^{⚠}  Experimentalavx512f Load 512bits of integer data from memory into dst. mem_addr does not need to be aligned on any particular boundary. 
_mm512_lzcnt_epi32^{⚠}  Experimentalavx512cd Counts the number of leading zero bits in each packed 32bit integer in a, and stores the results in dst. 
_mm512_lzcnt_epi64^{⚠}  Experimentalavx512cd Counts the number of leading zero bits in each packed 64bit integer in a, and stores the results in dst. 
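_mm512_madd52hi_epu64^{⚠}  Experimentalavx512ifma Multiply packed unsigned 52bit integers in each 64bit element of b and c to form a 104bit intermediate result. Add the high 52bit unsigned integer from the intermediate result with the corresponding unsigned 64bit integer in a, and store the results in dst. 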
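_mm512_madd52lo_epu64^{⚠}  Experimentalavx512ifma Multiply packed unsigned 52bit integers in each 64bit element of b and c to form a 104bit intermediate result. Add the low 52bit unsigned integer from the intermediate result with the corresponding unsigned 64bit integer in a, and store the results in dst. 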
_mm512_madd_epi16^{⚠}  Experimentalavx512bw Multiply packed signed 16bit integers in a and b, producing intermediate signed 32bit integers. Horizontally add adjacent pairs of intermediate 32bit integers, and pack the results in dst. 
_mm512_maddubs_epi16^{⚠}  Experimentalavx512bw Vertically multiply each unsigned 8bit integer from a with the corresponding signed 8bit integer from b, producing intermediate signed 16bit integers. Horizontally add adjacent pairs of intermediate signed 16bit integers, and pack the saturated results in dst. 
_mm512_mask2_permutex2var_epi8^{⚠}  Experimentalavx512vbmi Shuffle 8bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set). 
_mm512_mask2_permutex2var_epi16^{⚠}  Experimentalavx512bw Shuffle 16bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set). 
_mm512_mask2_permutex2var_epi32^{⚠}  Experimentalavx512f Shuffle 32bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set). 
_mm512_mask2_permutex2var_epi64^{⚠}  Experimentalavx512f Shuffle 64bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set). 
_mm512_mask2_permutex2var_pd^{⚠}  Experimentalavx512f Shuffle doubleprecision (64bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set). 
_mm512_mask2_permutex2var_ps^{⚠}  Experimentalavx512f Shuffle singleprecision (32bit) floatingpoint elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set). 
_mm512_mask2int^{⚠}  Experimentalavx512f Converts bit mask k1 into an integer value, storing the results in dst. 
_mm512_mask3_fmadd_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmadd_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmadd_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmadd_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmaddsub_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmaddsub_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmaddsub_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmaddsub_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmsub_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmsub_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmsub_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmsub_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmsubadd_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmsubadd_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmsubadd_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fmsubadd_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fnmadd_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fnmadd_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fnmadd_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fnmadd_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fnmsub_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fnmsub_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fnmsub_round_pd^{⚠}  Experimentalavx512f Multiply packed doubleprecision (64bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
_mm512_mask3_fnmsub_round_ps^{⚠}  Experimentalavx512f Multiply packed singleprecision (32bit) floatingpoint elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set). 
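The mask3 fnmadd and fnmsub entries above differ only in the sign applied to c, and both fall back to c for lanes whose mask bit is clear. The following is a minimal scalar sketch of that behaviour over eight f64 lanes. The helper names mask3_fnmadd_pd_model and mask3_fnmsub_pd_model are hypothetical and are not part of core::arch; the sketch models the described semantics with f64::mul_add rather than calling the intrinsics, and it ignores the rounding control of the _round_ variants.

```rust
/// Hypothetical scalar model: -(a*b) + c per lane, inactive lanes keep c.
fn mask3_fnmadd_pd_model(a: [f64; 8], b: [f64; 8], c: [f64; 8], k: u8) -> [f64; 8] {
    let mut dst = c; // mask3 forms copy from c when the mask bit is not set
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Fused multiply-add of the negated product with c.
            dst[i] = (-a[i]).mul_add(b[i], c[i]);
        }
    }
    dst
}

/// Hypothetical scalar model: -(a*b) - c per lane, inactive lanes keep c.
fn mask3_fnmsub_pd_model(a: [f64; 8], b: [f64; 8], c: [f64; 8], k: u8) -> [f64; 8] {
    let mut dst = c;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            dst[i] = (-a[i]).mul_add(b[i], -c[i]);
        }
    }
    dst
}

fn main() {
    let (a, b, c) = ([2.0; 8], [3.0; 8], [10.0; 8]);
    let add = mask3_fnmadd_pd_model(a, b, c, 0b0000_0001);
    let sub = mask3_fnmsub_pd_model(a, b, c, 0b0000_0001);
    println!("fnmadd: {:?}", &add[..2]); // lane 0 computed, lane 1 copied from c
    println!("fnmsub: {:?}", &sub[..2]);
}
```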
_mm512_mask_abs_epi8^{⚠}  Experimentalavx512bw Compute the absolute value of packed signed 8bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_abs_epi16^{⚠}  Experimentalavx512bw Compute the absolute value of packed signed 16bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
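_mm512_mask_abs_epi32^{⚠}  Experimentalavx512f Computes the absolute value of packed 32bit integers in a, and stores the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 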
_mm512_mask_abs_epi64^{⚠}  Experimentalavx512f Compute the absolute value of packed signed 64bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_abs_pd^{⚠}  Experimentalavx512f Finds the absolute value of each packed doubleprecision (64bit) floatingpoint element in v2, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_abs_ps^{⚠}  Experimentalavx512f Finds the absolute value of each packed singleprecision (32bit) floatingpoint element in v2, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_add_epi8^{⚠}  Experimentalavx512bw Add packed 8bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_add_epi16^{⚠}  Experimentalavx512bw Add packed 16bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_add_epi32^{⚠}  Experimentalavx512f Add packed 32bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_add_epi64^{⚠}  Experimentalavx512f Add packed 64bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_add_pd^{⚠}  Experimentalavx512f Add packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_add_ps^{⚠}  Experimentalavx512f Add packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_add_round_pd^{⚠}  Experimentalavx512f Add packed doubleprecision (64bit) floatingpoint elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_add_round_ps^{⚠}  Experimentalavx512f Add packed singleprecision (32bit) floatingpoint elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_adds_epi8^{⚠}  Experimentalavx512bw Add packed signed 8bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_adds_epi16^{⚠}  Experimentalavx512bw Add packed signed 16bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_adds_epu8^{⚠}  Experimentalavx512bw Add packed unsigned 8bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_adds_epu16^{⚠}  Experimentalavx512bw Add packed unsigned 16bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
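The writemask wording used by the add and adds entries above can be read as "compute the lane if its mask bit is set, otherwise keep the lane from src". A minimal scalar sketch of that rule for the signed 8bit saturating add follows. The helper mask_adds_epi8_model is hypothetical and only models the described semantics in plain Rust; it does not call the intrinsic.

```rust
/// Hypothetical scalar model of a writemask-merged saturating add on 64 i8 lanes.
fn mask_adds_epi8_model(src: [i8; 64], k: u64, a: [i8; 64], b: [i8; 64]) -> [i8; 64] {
    let mut dst = src; // lanes with a clear mask bit are copied from src
    for i in 0..64 {
        if (k >> i) & 1 == 1 {
            // Saturation clamps to i8::MIN..=i8::MAX instead of wrapping.
            dst[i] = a[i].saturating_add(b[i]);
        }
    }
    dst
}

fn main() {
    let dst = mask_adds_epi8_model([-1; 64], 0b101, [100; 64], [100; 64]);
    // Lanes 0 and 2 are active and saturate; lanes 1 and 3 keep src.
    println!("{:?}", &dst[..4]);
}
```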
_mm512_mask_alignr_epi8^{⚠}  Experimentalavx512bw Concatenate pairs of 16byte blocks in a and b into a 32byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_alignr_epi32^{⚠}  Experimentalavx512f Concatenate a and b into a 128byte intermediate result, shift the result right by imm8 32bit elements, and store the low 64 bytes (16 elements) in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_alignr_epi64^{⚠}  Experimentalavx512f Concatenate a and b into a 128byte intermediate result, shift the result right by imm8 64bit elements, and store the low 64 bytes (8 elements) in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
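The concatenate-and-shift description of the alignr entries above can be sketched in scalar form for the 32bit case. The helper mask_alignr_epi32_model is hypothetical. The sketch assumes that b supplies the low half of the concatenation, a the high half, and that only the low four bits of imm8 are used as the element shift; check these assumptions against the Intel documentation before relying on them.

```rust
/// Hypothetical scalar model of a writemask-merged element-wise align-right.
fn mask_alignr_epi32_model(
    src: [i32; 16],
    k: u16,
    a: [i32; 16],
    b: [i32; 16],
    imm8: u8,
) -> [i32; 16] {
    let shift = (imm8 & 0xF) as usize; // assumption: only imm8[3:0] is used
    let mut dst = src;
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            // Index into the 32-element concatenation: b is low, a is high.
            let idx = i + shift;
            dst[i] = if idx < 16 { b[idx] } else { a[idx - 16] };
        }
    }
    dst
}

fn main() {
    let mut a = [0i32; 16];
    let mut b = [0i32; 16];
    for i in 0..16 {
        b[i] = i as i32; // low half: 0..=15
        a[i] = 100 + i as i32; // high half: 100..=115
    }
    let dst = mask_alignr_epi32_model([-1; 16], 0xFFFF, a, b, 3);
    println!("{:?}", dst);
}
```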
_mm512_mask_and_epi32^{⚠}  Experimentalavx512f Performs elementbyelement bitwise AND between packed 32bit integer elements of a and b, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_and_epi64^{⚠}  Experimentalavx512f Compute the bitwise AND of packed 64bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_andnot_epi32^{⚠}  Experimentalavx512f Compute the bitwise NOT of packed 32bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_andnot_epi64^{⚠}  Experimentalavx512f Compute the bitwise NOT of packed 64bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_avg_epu8^{⚠}  Experimentalavx512bw Average packed unsigned 8bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_avg_epu16^{⚠}  Experimentalavx512bw Average packed unsigned 16bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_bitshuffle_epi64_mask^{⚠}  Experimentalavx512bitalg Considers the input 
_mm512_mask_blend_epi8^{⚠}  Experimentalavx512bw Blend packed 8bit integers from a and b using control mask k, and store the results in dst. 
_mm512_mask_blend_epi16^{⚠}  Experimentalavx512bw Blend packed 16bit integers from a and b using control mask k, and store the results in dst. 
_mm512_mask_blend_epi32^{⚠}  Experimentalavx512f Blend packed 32bit integers from a and b using control mask k, and store the results in dst. 
_mm512_mask_blend_epi64^{⚠}  Experimentalavx512f Blend packed 64bit integers from a and b using control mask k, and store the results in dst. 
_mm512_mask_blend_pd^{⚠}  Experimentalavx512f Blend packed doubleprecision (64bit) floatingpoint elements from a and b using control mask k, and store the results in dst. 
_mm512_mask_blend_ps^{⚠}  Experimentalavx512f Blend packed singleprecision (32bit) floatingpoint elements from a and b using control mask k, and store the results in dst. 
_mm512_mask_broadcast_f32x4^{⚠}  Experimentalavx512f Broadcast the 4 packed singleprecision (32bit) floatingpoint elements from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_broadcast_f64x4^{⚠}  Experimentalavx512f Broadcast the 4 packed doubleprecision (64bit) floatingpoint elements from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_broadcast_i32x4^{⚠}  Experimentalavx512f Broadcast the 4 packed 32bit integers from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_broadcast_i64x4^{⚠}  Experimentalavx512f Broadcast the 4 packed 64bit integers from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_broadcastb_epi8^{⚠}  Experimentalavx512bw Broadcast the low packed 8bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_broadcastd_epi32^{⚠}  Experimentalavx512f Broadcast the low packed 32bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_broadcastq_epi64^{⚠}  Experimentalavx512f Broadcast the low packed 64bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_broadcastsd_pd^{⚠}  Experimentalavx512f Broadcast the low doubleprecision (64bit) floatingpoint element from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_broadcastss_ps^{⚠}  Experimentalavx512f Broadcast the low singleprecision (32bit) floatingpoint element from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_broadcastw_epi16^{⚠}  Experimentalavx512bw Broadcast the low packed 16bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set). 
_mm512_mask_cmp_epi8_mask^{⚠}  Experimentalavx512bw Compare packed signed 8bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmp_epi16_mask^{⚠}  Experimentalavx512bw Compare packed signed 16bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmp_epi32_mask^{⚠}  Experimentalavx512f Compare packed signed 32bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmp_epi64_mask^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmp_epu8_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 8bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmp_epu16_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 16bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmp_epu32_mask^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmp_epu64_mask^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmp_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmp_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmp_round_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmp_round_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpeq_epi8_mask^{⚠}  Experimentalavx512bw Compare packed signed 8bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpeq_epi16_mask^{⚠}  Experimentalavx512bw Compare packed signed 16bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpeq_epi32_mask^{⚠}  Experimentalavx512f Compare packed 32bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpeq_epi64_mask^{⚠}  Experimentalavx512f Compare packed 64bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpeq_epu8_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 8bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpeq_epu16_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 16bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpeq_epu32_mask^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpeq_epu64_mask^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpeq_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpeq_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpge_epi8_mask^{⚠}  Experimentalavx512bw Compare packed signed 8bit integers in a and b for greaterthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpge_epi16_mask^{⚠}  Experimentalavx512bw Compare packed signed 16bit integers in a and b for greaterthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpge_epi32_mask^{⚠}  Experimentalavx512f Compare packed signed 32bit integers in a and b for greaterthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpge_epi64_mask^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b for greaterthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpge_epu8_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 8bit integers in a and b for greaterthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpge_epu16_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 16bit integers in a and b for greaterthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpge_epu32_mask^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b for greaterthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpge_epu64_mask^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b for greaterthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpgt_epi8_mask^{⚠}  Experimentalavx512bw Compare packed signed 8bit integers in a and b for greaterthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpgt_epi16_mask^{⚠}  Experimentalavx512bw Compare packed signed 16bit integers in a and b for greaterthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpgt_epi32_mask^{⚠}  Experimentalavx512f Compare packed signed 32bit integers in a and b for greaterthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpgt_epi64_mask^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b for greaterthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpgt_epu8_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 8bit integers in a and b for greaterthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpgt_epu16_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 16bit integers in a and b for greaterthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpgt_epu32_mask^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b for greaterthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpgt_epu64_mask^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b for greaterthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmple_epi8_mask^{⚠}  Experimentalavx512bw Compare packed signed 8bit integers in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmple_epi16_mask^{⚠}  Experimentalavx512bw Compare packed signed 16bit integers in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmple_epi32_mask^{⚠}  Experimentalavx512f Compare packed signed 32bit integers in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmple_epi64_mask^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmple_epu8_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 8bit integers in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmple_epu16_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 16bit integers in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmple_epu32_mask^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmple_epu64_mask^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmple_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmple_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b for lessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmplt_epi8_mask^{⚠}  Experimentalavx512bw Compare packed signed 8bit integers in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmplt_epi16_mask^{⚠}  Experimentalavx512bw Compare packed signed 16bit integers in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmplt_epi32_mask^{⚠}  Experimentalavx512f Compare packed signed 32bit integers in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmplt_epi64_mask^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmplt_epu8_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 8bit integers in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmplt_epu16_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 16bit integers in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmplt_epu32_mask^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmplt_epu64_mask^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmplt_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmplt_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b for lessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpneq_epi8_mask^{⚠}  Experimentalavx512bw Compare packed signed 8bit integers in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpneq_epi16_mask^{⚠}  Experimentalavx512bw Compare packed signed 16bit integers in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpneq_epi32_mask^{⚠}  Experimentalavx512f Compare packed 32bit integers in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpneq_epi64_mask^{⚠}  Experimentalavx512f Compare packed signed 64bit integers in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpneq_epu8_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 8bit integers in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpneq_epu16_mask^{⚠}  Experimentalavx512bw Compare packed unsigned 16bit integers in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpneq_epu32_mask^{⚠}  Experimentalavx512f Compare packed unsigned 32bit integers in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpneq_epu64_mask^{⚠}  Experimentalavx512f Compare packed unsigned 64bit integers in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpneq_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpneq_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b for notequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpnle_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b for notlessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpnle_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b for notlessthanorequal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpnlt_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b for notlessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpnlt_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b for notlessthan, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpord_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b to see if neither is NaN, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpord_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b to see if neither is NaN, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpunord_pd_mask^{⚠}  Experimentalavx512f Compare packed doubleprecision (64bit) floatingpoint elements in a and b to see if either is NaN, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
_mm512_mask_cmpunord_ps_mask^{⚠}  Experimentalavx512f Compare packed singleprecision (32bit) floatingpoint elements in a and b to see if either is NaN, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). 
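The masked comparison entries above all produce a bit mask rather than a vector, and the zeromask k1 clears any result bit whose lane is inactive. A minimal scalar sketch for signed 32bit lanes follows. The helper mask_cmp_epi32_mask_model is hypothetical; the predicate numbering is assumed to follow the _MM_CMPINT_* constants (EQ = 0, LT = 1, LE = 2, FALSE = 3, NE = 4, NLT = 5, NLE = 6, TRUE = 7).

```rust
/// Hypothetical scalar model of a zeromasked integer compare producing a 16-bit mask.
fn mask_cmp_epi32_mask_model(k1: u16, a: [i32; 16], b: [i32; 16], imm8: u8) -> u16 {
    let mut k = 0u16;
    for i in 0..16 {
        // Predicate encoding assumed to match the _MM_CMPINT_* constants.
        let hit = match imm8 & 0b111 {
            0 => a[i] == b[i], // EQ
            1 => a[i] < b[i],  // LT
            2 => a[i] <= b[i], // LE
            3 => false,        // FALSE
            4 => a[i] != b[i], // NE
            5 => a[i] >= b[i], // NLT
            6 => a[i] > b[i],  // NLE
            _ => true,         // TRUE
        };
        // Zeromask: a result bit survives only if the k1 bit is set.
        if hit && (k1 >> i) & 1 == 1 {
            k |= 1 << i;
        }
    }
    k
}

fn main() {
    let mut a = [0i32; 16];
    for i in 0..16 {
        a[i] = i as i32; // lanes hold 0..=15
    }
    let b = [8i32; 16];
    let lt = mask_cmp_epi32_mask_model(0x0F0F, a, b, 1);
    let le = mask_cmp_epi32_mask_model(0x0F0F, a, b, 2);
    println!("lt = {:#06x}, le = {:#06x}", lt, le);
}
```

In this sketch the lessthan and lessthanorequal results differ only in lane 8, where a equals b.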
_mm512_mask_compress_epi8^{⚠}  Experimentalavx512vbmi2 Contiguously store the active 8bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src. 
_mm512_mask_compress_epi16^{⚠}  Experimentalavx512vbmi2 Contiguously store the active 16bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src. 
_mm512_mask_compress_epi32^{⚠}  Experimentalavx512f Contiguously store the active 32bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src. 
_mm512_mask_compress_epi64^{⚠}  Experimentalavx512f Contiguously store the active 64bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src. 
_mm512_mask_compress_pd^{⚠}  Experimentalavx512f Contiguously store the active doubleprecision (64bit) floatingpoint elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src. 
_mm512_mask_compress_ps^{⚠}  Experimentalavx512f Contiguously store the active singleprecision (32bit) floatingpoint elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src. 
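The compress entries above pack the active lanes of a into the low lanes of dst and leave the remaining upper lanes as they were in src. A minimal scalar sketch for 32bit integers follows; mask_compress_epi32_model is a hypothetical helper that models the description and does not call the intrinsic.

```rust
/// Hypothetical scalar model of a masked compress on 16 i32 lanes.
fn mask_compress_epi32_model(src: [i32; 16], k: u16, a: [i32; 16]) -> [i32; 16] {
    let mut dst = src; // lanes past the packed run pass through from src
    let mut next = 0; // next free low lane in dst
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[next] = a[i];
            next += 1;
        }
    }
    dst
}

fn main() {
    let mut a = [0i32; 16];
    for i in 0..16 {
        a[i] = 10 + i as i32; // lanes hold 10..=25
    }
    // Active lanes are 0, 2, 5 and 7.
    let dst = mask_compress_epi32_model([-1; 16], 0b1010_0101, a);
    println!("{:?}", dst);
}
```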
_mm512_mask_conflict_epi32^{⚠}  Experimentalavx512cd Test each 32bit element of a for equality with all other elements in a closer to the least significant bit using writemask k (elements are copied from src when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst. 
_mm512_mask_conflict_epi64^{⚠}  Experimentalavx512cd Test each 64bit element of a for equality with all other elements in a closer to the least significant bit using writemask k (elements are copied from src when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst. 
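The conflict entries above are easier to follow with a worked case: for each active lane j, bit l of the result is set when an earlier lane l (l less than j) holds the same value. A minimal scalar sketch for 32bit lanes follows. The helper mask_conflict_epi32_model is hypothetical and only models the described behaviour.

```rust
/// Hypothetical scalar model of masked conflict detection on 16 u32 lanes.
fn mask_conflict_epi32_model(src: [u32; 16], k: u16, a: [u32; 16]) -> [u32; 16] {
    let mut dst = src; // inactive lanes are copied from src
    for j in 0..16 {
        if (k >> j) & 1 == 1 {
            let mut bits = 0u32;
            // Compare lane j only against lanes closer to the least significant end.
            for l in 0..j {
                if a[l] == a[j] {
                    bits |= 1 << l;
                }
            }
            dst[j] = bits; // zero-extended bit vector of matches
        }
    }
    dst
}

fn main() {
    let mut a = [0u32; 16];
    for i in 0..16 {
        a[i] = 100 + i as u32; // unique filler values
    }
    a[0] = 7;
    a[1] = 3;
    a[2] = 7;
    a[3] = 7;
    a[4] = 3;
    let dst = mask_conflict_epi32_model([0xFFFF_FFFF; 16], 0xFFFF, a);
    println!("{:?}", &dst[..5]);
}
```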
_mm512_mask_cvt_roundepi32_ps^{⚠}  Experimental avx512f Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundepu32_ps^{⚠}  Experimental avx512f Convert packed unsigned 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundpd_epi32^{⚠}  Experimental avx512f Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundpd_epu32^{⚠}  Experimental avx512f Convert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundpd_ps^{⚠}  Experimental avx512f Convert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundph_ps^{⚠}  Experimental avx512f Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundps_epi32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundps_epu32^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundps_pd^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvt_roundps_ph^{⚠}  Experimental avx512f Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtepi8_epi16^{⚠}  Experimental avx512bw Sign extend packed 8-bit integers in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtepi8_epi32^{⚠}  Experimental avx512f Sign extend packed 8-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtepi8_epi64^{⚠}  Experimental avx512f Sign extend packed 8-bit integers in the low 8 bytes of a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtepi16_epi8^{⚠}  Experimental avx512bw Convert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtepi16_epi32^{⚠}  Experimental avx512f Sign extend packed 16-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtepi16_epi64^{⚠}  Experimental avx512f Sign extend packed 16-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtepi16_storeu_epi8^{⚠}  Experimental avx512bw Convert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
_mm512_mask_cvtepi32_epi8^{⚠}  Experimental avx512f Convert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtepi32_epi16^{⚠}  Experimental avx512f Convert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtepi32_epi64^{⚠}  Experimental avx512f Sign extend packed 32-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtepi32_pd^{⚠}  Experimental avx512f Convert packed signed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_cvtepi32_ps^{⚠}  Experimental avx512f Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
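The masked conversions listed above all follow the same writemask pattern: convert each lane, then keep the converted value only where the mask bit is set. The plain-Rust sketch below models this for the signed 32-bit integer to single-precision case. The name `mask_cvtepi32_ps_model` is hypothetical and not part of `core::arch`; the sketch assumes lane 0 maps to mask bit 0 and uses Rust's `as f32` cast, which rounds to nearest, as a stand-in for the default rounding mode.

```rust
/// Scalar model of a writemasked i32 -> f32 conversion over sixteen lanes.
/// Hypothetical helper for illustration only; not part of core::arch.
fn mask_cvtepi32_ps_model(src: [f32; 16], k: u16, a: [i32; 16]) -> [f32; 16] {
    // Lanes whose mask bit is clear are copied from src unchanged.
    let mut dst = src;
    for lane in 0..16 {
        if (k >> lane) & 1 == 1 {
            // Converted value replaces the src value only for active lanes.
            dst[lane] = a[lane] as f32;
        }
    }
    dst
}
```

Passing `k = 0x00FF` converts the low eight lanes and leaves the high eight lanes holding whatever was in `src`.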