2292 Commits

Author SHA1 Message Date
Clément Bœsch
4bb4fa28e3 Merge commit '5801f9ed245ca5ebb57b0b5183de7a24aaece133'
* commit '5801f9ed245ca5ebb57b0b5183de7a24aaece133':
  h264_intrapred: x86: Update comments left behind in 95c89da36e

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-23 11:58:01 +01:00
Clément Bœsch
9954d5b44e Merge commit 'd9dccc03890a976dba59d66ed3b5aceeaa33d14c'
* commit 'd9dccc03890a976dba59d66ed3b5aceeaa33d14c':
  hevc: x86: Refactor IDCT macro declarations

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-23 11:54:53 +01:00
James Almer
30cadfe071 avcodec/lossless_videodsp: use ptrdiff_t for length parameters
Signed-off-by: James Almer <jamrial@gmail.com>
2017-03-22 18:38:35 -03:00
Clément Bœsch
af607b7e07 lavc/huffyuvdsp: only transmit the pix_fmt instead of the whole avctx
Only the pixel format is required in that init function. This will also
simplify the incoming merge.
2017-03-22 16:22:20 +01:00
Clément Bœsch
c66bd8f3ff Merge commit 'b57e38f52cc3f31a27105c28887d57cd6812c3eb'
* commit 'b57e38f52cc3f31a27105c28887d57cd6812c3eb':
  ac3dsp: x86: Replace inline asm for in-decoder downmixing with standalone asm

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-22 12:49:29 +01:00
Clément Bœsch
e39d4ff150 Merge commit '43717469f9daa402f6acb48997255827a56034e9'
* commit '43717469f9daa402f6acb48997255827a56034e9':
  ac3dsp: Reverse matrix in/out order in downmix()

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-22 11:29:46 +01:00
James Almer
aee046a895 x86/audiodsp: remove an unnecessary movss
2017-03-22 00:14:56 -03:00
James Almer
9a0fbb9ca9 Merge commit '2caa93b813adc5dbb7771dfe615da826a2947d18'
* commit '2caa93b813adc5dbb7771dfe615da826a2947d18':
  mpegaudiodsp: Change type of array stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-21 16:04:22 -03:00
James Almer
a8474df944 Merge commit 'e4a94d8b36c48d95a7d412c40d7b558422ff659c'
* commit 'e4a94d8b36c48d95a7d412c40d7b558422ff659c':
  h264chroma: Change type of stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-21 15:20:45 -03:00
James Almer
5a49097b42 Merge commit '2ec9fa5ec60dcd10e1cb10d8b4e4437e634ea428'
* commit '2ec9fa5ec60dcd10e1cb10d8b4e4437e634ea428':
  idct: Change type of array stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-21 14:29:52 -03:00
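
The stride-type commits above all switch stride parameters to ptrdiff_t. A minimal sketch of why a signed, pointer-sized stride is convenient, assuming a hypothetical helper copy_rows() that is not part of FFmpeg: the row offset is computed at pointer width, so a negative stride (walking an image bottom-up) needs no extra widening from int.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration, not an FFmpeg function.
 * Copies h rows of w bytes each. Strides are in bytes and may be
 * negative, in which case the pointer refers to the last stored row. */
static void copy_rows(uint8_t *dst, ptrdiff_t dst_stride,
                      const uint8_t *src, ptrdiff_t src_stride,
                      int w, int h)
{
    for (int y = 0; y < h; y++) {
        /* y is converted to ptrdiff_t, so the offset is signed and
         * pointer-sized on both 32-bit and 64-bit targets. */
        const uint8_t *s = src + y * src_stride;
        uint8_t       *d = dst + y * dst_stride;
        memcpy(d, s, (size_t)w);
    }
}
```

Passing a pointer to the last row together with a negative src_stride flips the block vertically while copying; the same call with positive strides is a plain copy.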
Clément Bœsch
f54da138e9 Merge commit '009adfd4fbdd78a890a4a65d6f141c467bb027fa'
* commit '009adfd4fbdd78a890a4a65d6f141c467bb027fa':
  x86: fpel: Remove unnecessary sign extend

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-21 15:02:31 +01:00
Clément Bœsch
ad98af27f7 Merge commit 'de2ae3c1fae5a2eb539b9abd7bc2a9ca8c286ff0'
* commit 'de2ae3c1fae5a2eb539b9abd7bc2a9ca8c286ff0':
  lavc: add clobber tests for the new encoding/decoding API

The merge only reorders what we already have.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-21 14:43:53 +01:00
Clément Bœsch
83cd80d10a Merge commit '12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5'
* commit '12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5':
  audiodsp/x86: yasmify vector_clipf_sse
  audiodsp: reorder arguments for vector_clipf

Merged the version from Libav after a discussion with James Almer on
IRC:

19:22 <ubitux> jamrial: opinion on 12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5?
19:23 <ubitux> it was apparently yasmified differently
19:23 <ubitux> (it depends on the previous commit arg shuffle)
19:24 <ubitux> i don't see the magic movsxdifnidn in your port btw
19:24 <ubitux> it's a port from 1d36defe94
19:25 <jamrial> seems better thanks to said arg shuffle
19:25 <jamrial> the loop is the same, but init is simpler
19:25 <jamrial> probably worth merging
19:25 <ubitux> OK
19:25 <ubitux> thanks
19:26 <jamrial> curious they didn't make len ptrdiff_t after the previous bunch of commits, heh
19:26 <ubitux> yeah indeed

Both commits are merged at the same time to prevent a conflict with our
existing yasmified ff_vector_clipf_sse.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 22:35:07 +01:00
Clément Bœsch
43a4c729d4 Merge commit '75d98e30afab61542faab3c0f11880834653bd6b'
* commit '75d98e30afab61542faab3c0f11880834653bd6b':
  audiodsp/x86: clear the high bits of the order parameter on 64bit

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 18:44:00 +01:00
Clément Bœsch
072fad7cf5 Merge commit '1d6c76e11febb58738c9647c47079d02b5e10094'
* commit '1d6c76e11febb58738c9647c47079d02b5e10094':
  audiodsp/x86: fix ff_vector_clip_int32_sse2

No functional changes, only cosmetics. This issue was fixed in
9a9e2f1c8a.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 18:42:37 +01:00
Clément Bœsch
e07fa3008b Merge commit 'de452e503734ebb0fdbce86e9d16693b3530fad3'
* commit 'de452e503734ebb0fdbce86e9d16693b3530fad3':
  pixblockdsp: Change type of stride parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 15:58:32 +01:00
Ilia
2f3d10a01a avcodec/vp9: avx2 implementation of ipred_dl_16x16_16
vp9_diag_downleft_16x16_10bpp_c: 263.0
vp9_diag_downleft_16x16_10bpp_sse2: 44.7
vp9_diag_downleft_16x16_10bpp_ssse3: 32.5
vp9_diag_downleft_16x16_10bpp_avx: 31.9
vp9_diag_downleft_16x16_10bpp_avx2: 25.7
vp9_diag_downleft_16x16_12bpp_c: 264.7
vp9_diag_downleft_16x16_12bpp_sse2: 44.4
vp9_diag_downleft_16x16_12bpp_ssse3: 32.0
vp9_diag_downleft_16x16_12bpp_avx: 32.4
vp9_diag_downleft_16x16_12bpp_avx2: 25.5

Benchmarked with 10000 runs

Signed-off-by: Ilia <zakne0ne@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2017-03-20 09:47:43 -04:00
Mirage Abeysekara
5eb4f95bef h264pred: added AVX2 implementation for tm_vp8 16x16.
checkasm --bench results with 5000 runs

pred16x16_tm_vp8_c: 302.8
pred16x16_tm_vp8_mmx: 101.4
pred16x16_tm_vp8_mmxext: 95.5
pred16x16_tm_vp8_sse2: 95.1
pred16x16_tm_vp8_avx2: 38.2

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2017-03-20 09:45:42 -04:00
James Almer
6966a5e4d7 Merge commit '721d57e608dc4fd6c86f27c5ae76ef559d646220'
* commit '721d57e608dc4fd6c86f27c5ae76ef559d646220':
  vp56: Separate VP5 and VP6 dsp initialization

Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 17:15:24 -03:00
James Almer
663640d745 Merge commit '3fd22538bc0e0de84b31335266b4b1577d3d609e'
* commit '3fd22538bc0e0de84b31335266b4b1577d3d609e':
  prores: Change type of stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 15:30:13 -03:00
James Almer
aec42ebc27 Merge commit 'f81be06cf614919d71ded29b8f595bef40123ad8'
* commit 'f81be06cf614919d71ded29b8f595bef40123ad8':
  cavs: Change type of stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 15:23:52 -03:00
James Almer
4e4dfcac58 Merge commit '802727b538b484e3f9d1345bfcc4ab24cfea8898'
* commit '802727b538b484e3f9d1345bfcc4ab24cfea8898':
  vp8: Update some assembly comments left unchanged in bd66f073fe

Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 15:18:31 -03:00
James Almer
4004d33fcb Merge commit 'd9d26a3674f31f482f54e936fcb382160830877a'
* commit 'd9d26a3674f31f482f54e936fcb382160830877a':
  vp56: Change type of stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 14:54:25 -03:00
Clément Bœsch
6a42a54b9d Merge commit '6892df9294d93322d43255ada299507465bc93c8'
* commit '6892df9294d93322d43255ada299507465bc93c8':
  vp3: Change type of stride parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-19 18:41:26 +01:00
Clément Bœsch
8695ce73ca Merge commit 'e2b9993558b6adee42dcc6eb385a14943aaca974'
* commit 'e2b9993558b6adee42dcc6eb385a14943aaca974':
  simple_idct: x86: Drop disabled IDCT implementation

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-19 16:11:11 +01:00
Clément Bœsch
8286c359ad Merge commit 'e99ecda55082cb9dde8fd349361e169dc383943a'
* commit 'e99ecda55082cb9dde8fd349361e169dc383943a':
  checkasm: add vp9 MC tests.
  vp9mc/x86: sse2 MC assembly.
  vp9mc/x86: add AVX and AVX2 MC
  vp9mc/x86: rename ff_* to ff_vp9_*
  vp9mc/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext
  vp9mc/x86: simplify a few inits.
  vp9mc/x86: add 16px functions (64bit only).

Noop (aside from a formatting comment in vp9mc.asm). We already have all
of this. We should consider making a final diff between the two projects
when the dust settles.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-16 20:25:39 +01:00
Clément Bœsch
a4f5e79f7c Merge commit '89466de4aeaf5e359489b81b8a9920a2bc7936d6'
* commit '89466de4aeaf5e359489b81b8a9920a2bc7936d6':
  vp9/x86: rename vp9dsp to vp9mc

File was already renamed, only the top description is updated.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-16 20:10:47 +01:00
James Almer
e632fe9bab Merge commit '3c504bc3599f00bfc5923adc114beef34bce11d0'
* commit '3c504bc3599f00bfc5923adc114beef34bce11d0':
  x86: deduplicate some constants

Merged-by: James Almer <jamrial@gmail.com>
2017-03-15 22:07:28 -03:00
Michael Niedermayer
835d9f299c avcodec/x86/cavsdsp: Put MMX code under mmx check
Without this the FPU state becomes trashed and causes mysterious
fate failures with cpuflags=0

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-06 16:47:17 +01:00
James Darnley
33de0fee2c avcodec/h264: enable sse2 chroma deblock/loop filter functions
Between 1.00 and 1.16 times faster on Intel Yorkfield Core 2 Quad.
Between 1.11 and 1.39 times faster on Intel Kaby Lake Pentium.
2017-02-27 13:22:06 +01:00
James Darnley
cd893b9307 avcodec/h264: add avx 8-bit 4:2:2 chroma h intra deblock/loop filter
~1.37x faster (147 vs. 108 cycles) compared to mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
0e16b3e2be avcodec/h264: add avx 8-bit 4:2:0 chroma h intra deblock/loop filter
~1.10x faster (69 vs. 63 cycles) compared to mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
987ffe4b8d avcodec/h264: add avx 8-bit chroma v intra deblock/loop filter
~1.14x faster (90 vs 78 cycles) compared with mmxext
2017-02-27 13:22:06 +01:00
James Darnley
88307b3eec avcodec/h264: add avx 8-bit 4:2:2 chroma h deblock/loop filter
~1.21x faster (68 vs. 56 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
ac096fc82d avcodec/h264: add avx 8-bit 4:2:0 chroma h deblock/loop filter
~1.14x faster (93 vs. 81 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
5c56758843 avcodec/h264: add avx 8-bit chroma v deblock/loop filter
~1.24x faster (101 vs. 81 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
5336887867 avcodec/h264: sse2, avx h luma mbaff deblock/loop filter
x86-64 only

Yorkfield:
- sse2: ~2.17x (434 vs. 200 cycles)

Nehalem:
- sse2: ~2.94x (409 vs. 139 cycles)

Skylake:
- sse2: ~3.10x (370 vs. 119 cycles)
- avx:  ~3.29x (370 vs. 112 cycles)
2017-02-18 20:26:52 +01:00
James Darnley
e18bc2114f avcodec/h264: add named parameters to x86 function
2017-02-18 20:26:50 +01:00
James Darnley
9d815b7424 avcodec/x86: deduplicate PASS8ROWS macro
2017-02-18 20:26:49 +01:00
James Almer
c8467abbad x86/rv34dsp: add ff_rv34_idct_dc_add_sse2
Also disable ff_rv34_idct_dc_add_mmx on x86_64 as the presence of sse2
is guaranteed in such builds.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-02-02 17:51:21 -03:00
James Almer
ab5c4d006d x86/vp8dsp: add ff_vp8_idct_dc_add_sse2
Also disable ff_vp8_idct_dc_add_mmx on x86_64 as the presence of sse2
is guaranteed in such builds.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-02-02 17:18:58 -03:00
Michael Niedermayer
536ac72f46 Revert "Merge commit '0a39c9ac0bfd7345fe676b4e2707d9cec3cbb553'"
The assumption this is based on is wrong: the code is not always run with bitexact flags.

This reverts commit a956164e1e, reversing
changes made to f6005907fd.

Approved-by: James Almer <jamrial@gmail.com>
2017-02-01 02:01:07 +01:00
James Almer
ba5d089381 Merge commit 'd06dfaa5cbdd20acfd2364b16c0f4ae4ddb30a65'
* commit 'd06dfaa5cbdd20acfd2364b16c0f4ae4ddb30a65':
  x86: huffyuv: Use EXTERNAL_SSSE3_FAST convenience macro where appropriate

Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 15:36:49 -03:00
James Almer
ac774cfa57 Merge commit '4efab89332ea39a77145e8b15562b981d9dbde68'
* commit '4efab89332ea39a77145e8b15562b981d9dbde68':
  x86: Use *_FAST/*_SLOW CPU feature detection macros where appropriate

Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 15:08:19 -03:00
James Almer
a956164e1e Merge commit '0a39c9ac0bfd7345fe676b4e2707d9cec3cbb553'
* commit '0a39c9ac0bfd7345fe676b4e2707d9cec3cbb553':
  x86: hpeldsp: Don't check for bitexact flag when initializing VP3-specific code

Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 14:59:29 -03:00
James Almer
f6005907fd Merge commit '95c1df929b92d81454656c222a35ec5f7db576b4'
* commit '95c1df929b92d81454656c222a35ec5f7db576b4':
  x86: hpeldsp: Drop unused function parameters

Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 14:56:11 -03:00
James Almer
4d0e89ce27 Merge commit 'c3e83ad3b7d75f3597f47ada2616ba4479665009'
* commit 'c3e83ad3b7d75f3597f47ada2616ba4479665009':
  x86: hpeldsp: Use EXTERNAL_SSE2_FAST where appropriate

Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 14:53:27 -03:00
James Almer
ca8a3978e5 Merge commit '1dfc3cf89d0eb026af28be46294b85d79499ffb5'
* commit '1dfc3cf89d0eb026af28be46294b85d79499ffb5':
  x86: hpeldsp: Split off VP3-specific bits into a separate file

Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 14:49:29 -03:00
Clément Bœsch
7c300a8ed4 lavc/hevc: remove a few random spaces to reduce diff with libav
2017-01-31 17:02:24 +01:00
Clément Bœsch
78d16eb452 Merge commit 'fca3c3b61952aacc45e9ca54d86a762946c21942'
* commit 'fca3c3b61952aacc45e9ca54d86a762946c21942':
  hevc: Add AVX2 DC IDCT

Mostly noop as we already have that code.

In the ASM, code is merged with the exception of SECTION which is kept
uppercase for consistency with the rest of the codebase.

Still in the ASM, the prototype comment is fixed to honor the '_' added
from the original commit.

idct_dc_proto() is dropped as it's not used anymore here.

Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-01-31 16:53:37 +01:00