On Sun, 7 Apr 2019 23:06:45 +0500
Nikita Zlobin <cook60020tmp@mail.ru> wrote:
> I really did not recognize that nasty trick, clearing xmm0 :).
> Also i understood, why SSE can't be used there. Without integer
> division support it is undoable with SSE - replacing with
> multiplication means conversion to float.
>
I recently discovered a fast integer division algorithm that speeds up
repeated divisions by the same divisor. I got it working this way, but
then found out that gcc already uses this method, so it is still
doable with SSE. On the other hand, I still can't find enough places
where working with a color as a single integer, rather than as separate
channel values, would bring a meaningful benefit... One such place is
the accumulator used for averaging. While the input is uint8_t[4], the
accumulator is uint16_t[4]. I have to either work with them element by
element or use masks, bit shifts and OR for each element... just to
prepare a single value and store it (either uint32_t[2] or just one
uint64_t).
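
To make the accumulator case concrete, here is a minimal sketch of the
"masks, bit shifts and OR" variant, assuming four 16-bit lanes packed
into one uint64_t. The helper names (widen_px, lane, lane_avg) are
made up for illustration and are not taken from any existing code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: spread one pixel (uint8_t[4]) into four 16-bit
 * lanes of a single uint64_t, channel i in bits 16*i .. 16*i+15. */
static uint64_t widen_px(const uint8_t px[4])
{
    return  (uint64_t)px[0]
         | ((uint64_t)px[1] << 16)
         | ((uint64_t)px[2] << 32)
         | ((uint64_t)px[3] << 48);
}

/* Read lane i (0..3) back out of the packed accumulator. */
static uint16_t lane(uint64_t acc, int i)
{
    return (uint16_t)(acc >> (16 * i));
}

/* Average of lane i after n pixels were added to the accumulator. */
static uint8_t lane_avg(uint64_t acc, int i, unsigned n)
{
    assert(n > 0 && n <= 257);  /* 257 * 255 = 65535 still fits a lane */
    return (uint8_t)(lane(acc, i) / n);
}
```

With this layout a single 64-bit addition, acc += widen_px(px),
accumulates all four channels at once, as long as no lane exceeds
65535 (at most 257 pixels). The cost is the three shifts and ORs spent
per pixel on widening, which is exactly the overhead in question.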
It looks like benchmarks are necessary, along with these intrinsics, to
test whether integer SSE is really better than what gcc produces.
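
As a possible starting point for such a benchmark, here is a sketch of
division by a fixed divisor on 16-bit values, once in scalar form and
once with SSE2 intrinsics. It assumes the round-up multiply-and-shift
method described by Granlund and Montgomery; I have not checked that
this is exactly what gcc emits for a given divisor. The struct and
function names are hypothetical. The SSE2 path is compiled only when
__SSE2__ is defined, otherwise the scalar loop is used.

```c
#include <assert.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Precomputed constants for dividing uint16_t values by a fixed d. */
struct div16 { uint16_t m; unsigned sh1, sh2; };

static struct div16 div16_init(uint16_t d)
{
    assert(d >= 1);
    unsigned L = 0;
    while ((1u << L) < d) L++;              /* L = ceil(log2(d)) */
    struct div16 c;
    c.m   = (uint16_t)((65536u * ((1u << L) - d)) / d + 1);
    c.sh1 = L < 1 ? L : 1;
    c.sh2 = L > 1 ? L - 1 : 0;
    return c;
}

/* Scalar reference: x / d with one multiply, adds and shifts. */
static uint16_t div16_scalar(uint16_t x, struct div16 c)
{
    uint16_t t = (uint16_t)(((uint32_t)x * c.m) >> 16);  /* high half */
    return (uint16_t)((t + ((x - t) >> c.sh1)) >> c.sh2);
}

/* Divide eight uint16_t values in place. */
static void div16_block(uint16_t v[8], struct div16 c)
{
#ifdef __SSE2__
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    /* High 16 bits of the unsigned 16x16 product, per lane. */
    __m128i t = _mm_mulhi_epu16(x, _mm_set1_epi16((short)c.m));
    __m128i u = _mm_srl_epi16(_mm_sub_epi16(x, t),
                              _mm_cvtsi32_si128((int)c.sh1));
    __m128i q = _mm_srl_epi16(_mm_add_epi16(t, u),
                              _mm_cvtsi32_si128((int)c.sh2));
    _mm_storeu_si128((__m128i *)v, q);
#else
    for (int i = 0; i < 8; i++)
        v[i] = div16_scalar(v[i], c);
#endif
}
```

The constants are computed once per divisor with div16_init(), then
div16_block() is called on each group of eight values. Timing it
against a plain loop of v[i] / d, compiled with optimization, would
show whether the hand-written version gains anything over the
compiler's own code.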
_______________________________________________
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
https://lists.linuxaudio.org/listinfo/linux-audio-dev
Received on Wed Apr 10 16:15:01 2019