FLV spec only has AVC end of sequence tag, and the EOS tag has a
CodecID as other video data packet. MPEG4 doesn't conformance to
the spec, but it's there for a decade. So only 'fix' the EOS tag
rather than remove it completely.
Reviewed-by: Steven Liu <lq@chinaffmpeg.org>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
HLS segments may be MPEG-TS or fragmented MP4, so those (de)muxers are
required for reading/writing HLS media segments.
Fixes functionality with --disable-everything --enable-demuxer=hls
--enable-muxer=hls
When encode RGB frame, Intel driver convert RGB to YUV, so we cannot
set RGB colorspace to VPL/MSDK.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
CC libavcodec/libx265.o
libavcodec/libx265.c: In function ‘libx265_encode_frame’:
libavcodec/libx265.c:781:5: warning: this ‘else’ clause does not guard... [-Wmisleading-indentation]
else
^~~~
libavcodec/libx265.c:782:1: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘else’
FF_DISABLE_DEPRECATION_WARNINGS
^~~
Signed-off-by: Marton Balint <cus@passwd.hu>
ISOBMFF (14496-12) made this field ('channelcount') in the
AudioSampleEntry structure non-template¹ somewhere before the
release of the 2022 edition. As for ETSI TS 126 244 AKA 3GPP
file format (V16.1.0, 2020-10), it does not seem contain any
references limiting the channelcount entry in AudioSampleEntry
or in its own definition of EVSSampleEntry.
fate-mov-mp4-chapters test had to be adjusted as it output a
mono vorbis stream, which would now be properly marked as such
in the container.
1: As per 14496-12:
Fields shown as “template” in the box descriptions are fields
which are coded with a default value unless a derived
specification defines their use and permits writers to use
other values than the default.
When building FFMPEG in the MSYS environment under Windows, one
must not use forward slashes ('/') for command-line options. It
appears that the MSYS shell interprets these as absolute paths and
then automatically rewrites them into equivalent Windows paths. For
example, the '/nologo' switch below gets rewritten to something like
'C:/Program Files/Git/nologo', and this obviously breaks the build.
Thankfully, most M$ tools accept dashes ('-') as well.
Signed-off-by: Ziemowit Łąski <15880281+zlaski@users.noreply.github.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Their usefulness is questionable, very few decoders set them, and their type
should have been int64_t. A replacement field can be added later if a valid use
case is found.
Signed-off-by: Marton Balint <cus@passwd.hu>
Frame counters can overflow relatively easily (INT_MAX number of frames is
slightly more than 1 year for 60 fps content), so make sure we use 64 bit
values for them.
Also deprecate the old 32 bit frame_number attribute.
Signed-off-by: Marton Balint <cus@passwd.hu>
Many filters accept user-provided data that is cumbersome to provide as
text strings - e.g. binary files or very long text. For that reason such
filters typically provide a option whose value is the path from which
the filter loads the actual data.
However, filters doing their own IO internally is a layering violation
that the callers may not expect, and is thus best avoided. With the
recently introduced graph segment parsing API, loading option values
from files can now be handled by the caller.
This commit makes use of the new API in ffmpeg CLI. Any option name in
the filtergraph syntax can now be prefixed with a slash '/'. This will
cause ffmpeg to interpret the value as the path to load the actual value
from.
Callers currently have two ways of adding filters to a graph - they can
either
- create, initialize, and link them manually
- use one of the avfilter_graph_parse*() functions, which take a
(typically end-user-written) string, split it into individual filter
definitions+options, then create filters, apply options, initialize
filters, and finally link them - all based on information from this
string.
A major problem with the second approach is that it performs many
actions as a single atomic unit, leaving the caller no space to
intervene in between. Such intervention would be useful e.g. to
- modify filter options;
- supply hardware device contexts;
both of which typically must be done before the filter is initialized.
Callers who need such intervention are then forced to invent their own
filtergraph parsing, which is clearly suboptimal.
This commit aims to address this problem by adding a new modular
filtergraph parsing API. It adds a new avfilter_graph_segment_parse()
function to parse a string filtergraph description into an intermediate
tree-like representation (AVFilterGraphSegment and its children).
This intermediate form may then be applied step by step using further
new avfilter_graph_segment*() functions, with user intervention possible
between each step.
Also, replace an AVFilterContext argument with a logging context+private
class, as those are the only things needed in this function.
Will be useful in future commits.
This script generates the current general assembly voters according to
the criteria of '20 commits in the last 36 months'.
Signed-off-by: J. Dekker <jdek@itanimul.li>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
libavutil/color_utils contains some avpriv_ symbols that map
enum AVTransferCharacteristic values to gamma-curve approximations and
to the actual transfer functions to invert them (i.e. -> linear).
There's two issues with this:
(1) avpriv is evil and should be avoided whenever possible
(2) libavutil/csp.h exposes a public API for handling color that
already handles primaries and matricies
I don't see any reason this API has to be private, so this commit takes
the functionality from avutil/color_utils and merges it into avutil/csp
with an exposed av_ API rather than the previous avpriv_ API.
Every reference to the previous API has been updated to point to the
new one. color_utils.h has been deleted as well. This should not break
any applications as it only contained avpriv_ symbols in the first
place, so nothing in that header could be referenced by other
applications.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
It has been deprecated in 94d68a41fa
and can't be set via AVOptions. The only codecs that use it
(the MPEG-1/2 encoders) have private options for this.
So remove it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This commit does for AVOutputFormat what commit
20f9727018 did for AVCodec:
It adds a new type FFOutputFormat, moves all the internals
of AVOutputFormat to it and adds a now reduced AVOutputFormat
as first member.
This does not affect/improve extensibility of both public
or private fields for muxers (it is still a mess due to lavd).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
It is the most commonly used field and moving it to the start
e.g. allows to encode the offset in a pointer+offset addressing
mode on one byte on x86.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Analogous to -enc_stats*, but happens right before muxing. Useful
because bitstream filters and the sync queue can modify packets after
encoding and before muxing. Also has access to the muxing timebase.
Since at least 4.4.3, -ab/-b:a help text was in the video section
of ffmpeg -h, but these are audio options.
Signed-off-by: Marth64 <marth64@proxyid.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Current HLS implementation simply skip a failed segment to catch up
the stream, but this is not optimal for some use cases like livestream
recording.
Add an option to retry a failed segment to ensure the output file is
a complete stream.
Signed-off-by: gnattu <gnattuoc@me.com>
Reviewed-by: Steven Liu <liuqi05@kuaishou.com>
We parse the fallback cHRM on decode and correctly determine that we
have BT.709 primaries, but unknown TRC. This causes us to write cICP
where we shouldn't. Primaries without transfer can be handled entirely
by cHRM, so we should only write cICP if we actually know the transfer
function.
Additionally, we should avoid writing cICP if there's an ICC profile
because the spec says decoders must prioritize cICP over the ICC
profile.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
We need to construct the output format list separatedly from the input
format list, because we need to adhere to two extra requirements:
1. Big-endian output formats are always unsupported (runtime error)
2. Combining 'vulkan' with an explicit out_format that is not supported
by the vulkan frame allocation code is illegal and will crash (abort)
As a free side benefit, this rewrite fixes a possible memory leak in the
`fail` path that was present in the old code.
Signed-off-by: Niklas Haas <git@haasn.dev>
It only works on Linux
$ ffmpeg -loglevel verbose -init_hw_device qsv=intel -f lavfi -i \
yuvtestsrc -vf "format=uyvy422,vpp_qsv=format=nv12" -f null -
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Since many distributions ship libjxl 0.7.0 still, we'd still prefer to
compile against that, but don't want to lose the features that require
libjxl 0.8.0 or greater. For this reason I've added preprocessor #ifdef
guards around the features that aren't necessarily in libjxl 0.7.0.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Splits the currently handled subtitle at random access point
packets that can be configured to follow a specific output stream.
Currently only subtitle streams which are directly mapped into the
same output in which the heartbeat stream resides are affected.
This way the subtitle - which is known to be shown at this time
can be split and passed to muxer before its full duration is
yet known. This is also a drawback, as this essentially outputs
multiple subtitles from a single input subtitle that continues
over multiple random access points. Thus this feature should not
be utilized in cases where subtitle output latency does not matter.
Co-authored-by: Andrzej Nadachowski <andrzej.nadachowski@24i.com>
Co-authored-by: Bernard Boulay <bernard.boulay@24i.com>
Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
This way we can call process_subtitles without causing the decoded
frame counter to get bumped.
Additionally, this now takes into mention all of the decoded
subtitle frames without fix_sub_duration latency/buffering, or filtering
out decoded reset/end subtitles without any rendered rectangles, which
matches the original intent in 4754345027
.
Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
This enables us to later call this when generating additional
subtitles for splitting purposes.
Co-authored-by: Andrzej Nadachowski <andrzej.nadachowski@24i.com>
Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
QSVDeintContext and VPPContext have the same base context, and all
features in deinterlace_qsv are implemented in vpp_qsv filter, so
deinterlace_qsv can be taken as a special case of vpp_qsv filter, and we
may use VPPContext with a different option array, preinit callback and
support pixel formats to implement deinterlace_qsv, then remove
QSVDeintContext.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Like what we did for scale_qsv filter, we use QSVVPPContext as a base
context to manage MFX session for deinterlace_qsv filter.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
This is used to control the output at frame rate or field rate when
deinterlace is expected and framerate is not specified.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)
x86inc can automatically determine whether to use REP_RET rather than
REP in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessary, despite the return being nowhere near a
branch.
The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.
In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
libjxl only accepts 16-bit buffers with its API, but it can
accept 9-bit to 15-bit input via a 16-bit buffer, provided the flag
is set declaring the buffer to be of the respective significant depth.
Likewise, it can only provide pixel data on decode as a 16-bit buffer
(if higher than 8) but does provide the metadata tagging the actual bit
depth.
This commit causes libjxlenc.c and libjxldec.c to respect this metadata
and tag/read it accordingly from AVCodecContext->bits_per_raw_sample.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
mfenc sets FF_CODEC_CAP_INIT_CLEANUP, so calling mf_close() on
failure inside mf_init() results in a double-free.
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Only warn if the advanced_editlist option is enabled (it is enabled
by default though) so we don't print one warning for each track, and
demote the warning to AV_LOG_LEVEL_VERBOSE; this message does get
generated whenever parsing a fragmented MP4 file, regardless of
whether the file actually uses multiple edits or not.
Later when parsing the mov structures, the demuxer does warn if
the file did contain multiple edits which would require the
advanced_editlist option enabled for decoding correctly.
Adjust the warning message for the case when the file seemed like it
actually would have needed handling of advanced edit lists, to
reflect the fact that it doesn't help to try set the option as
it has been automatically disabled.
Signed-off-by: Martin Storsjö <martin@martin.st>
The construct of using offsetof on a (potentially anonymous) struct
defined within the offsetof expression, while supported by all
current compilers, has been declared explicitly undefined by the
C standards committee [1].
Clang recently got a change to identify this as an issue [2];
initially it was treated as a hard error, but it was soon after
softened into a warning under the -Wgnu-offsetof-extensions option
(not enabled automatically as part of -Wall though).
Nevertheless - in this particular case, it's trivial to fix the
code not to rely on the construct that the standards committee has
explicitly called out as undefined.
[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2350.htm
[2] https://reviews.llvm.org/D133574
Signed-off-by: Martin Storsjö <martin@martin.st>
If the dictionary provided on input contains multiple entries for an
option (relevant for flags modifying the previous value with '+' or
'-') and the option is not found in the target object, only the last
entry would be returned to the caller.
Pass AV_DICT_MULTIKEY to av_dict_set() to make sure all such entries are
returned.
QSVScaleContext and VPPContext have the same base context, and all
features in scale_qsv are implemented in vpp_qsv filter, so scale_qsv
can be taken as a special case of vpp_qsv filter, and we may use
VPPContext with a different option array, preinit callback and supported
pixel formats to implement scale_qsv then remove QSVScaleContext
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Use QSVVPPContext as a base context of QSVScaleContext, hence we may
re-use functions defined for QSVVPPContext to manage MFX session for
scale_qsv filter.
In addition, system memory has been taken into account in
QSVVVPPContext, we may add support for non-QSV pixel formats in the
future.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Hyper Encode uses Intel integrated and discrete graphics on one system
to accelerate encoding of a single video stream.
Depending on the selected parameters and codecs, performance gain on AlderLake iGPU + ARC Gfx up to 1.6x.
More information: https://www.intel.co.uk/content/www/uk/en/architecture-and-technology/adaptix/deep-link.html
Developer guide: https://github.com/oneapi-src/oneVPL-intel-gpu/blob/main/doc/HyperEncode_FeatureDeveloperGuide.md
Hyper Encode is supported only on Windows and requires D3D11 and oneVPL.
To enable Hyper Encode need to specify:
-Hyper Encode mode (-dual_gfx on or dual_gfx adaptive)
-Encoder: h264_qsv or hevc_qsv
-BRC: VBR, CQP or ICQ
-Lowpower mode (-low_power 1)
-Closed GOP for AVC or strict GOP for HEVC -idr_interval = 0 used by default
Depending on the encoding parameters, the following parameters may need
to be adjusted:
-g recommended >= 30 for better performance
-async_depth recommended >= 30 for better performance
-extra_hw_frames recommended equal to async_depth value
-bf recommended = 0 for better performance
In the cases with fast encoding (-preset veryfast) there may be no
performance gain due to the fact that the decode is slower than the encode.
Command line examples:
ffmpeg.exe -init_hw_device qsv:hw,child_device_type=d3d11va,child_device=0 -v verbose -y -hwaccel qsv -extra_hw_frames 60 -async_depth 60 -c:v h264_qsv -i bbb_sunflower_2160p_60fps_normal.mp4
-async_depth 60 -c:v h264_qsv -preset medium -g 60 -low_power 1 -bf 0 -dual_gfx on output.h265
Signed-off-by: galinart <artem.galin@intel.com>
Let's ignore the index table if the number of index entries does not match the
index duration (or the special AVID index entry counts).
Fixes: OOM
Fixes: 50551/clusterfuzz-testcase-minimized-ffmpeg_dem_MXF_fuzzer-6607795234930688
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Marton Balint <cus@passwd.hu>
This is intended to be a more convenient replacement for
reordered_opaque.
Add support for it in the two encoders that offer
AV_CODEC_CAP_ENCODER_REORDERED_OPAQUE: libx264 and libx265. Other
encoders will be supported in future commits.
Some encoders (ffv1, flac, adx) are marked with AV_CODEC_CAP_DELAY onky
in order to be flushed at the end, otherwise they behave as no-delay
encoders.
Add a capability to mark these encoders. Use it for setting pts
generically.
bink supports 16x16 blocks in chroma planes thus we need to allocate enough.
Fixes: out of array access
Fixes: 55026/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_BINK_fuzzer-6013915371012096
Reviewed-by: Peter Ross <pross@xvid.org>
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This brings ff_vf_ssim360 in line with its declaration in allfilters.c;
this discrepancy is actually undefined behaviour.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The FILTER_INPUTS and FILTER_OUTPUTS macros already set
AVFilter.(inputs|outputs); Clang therefore emits a warning for
this: "initializer overrides prior initialization of this subobject
[-Winitializer-overrides]"
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Don't use "static const" for compile time float constants, but use
defines. This fixes the following error:
src/libavfilter/vf_ssim360.c(549): error C2099: initializer is not a constant
Signed-off-by: Martin Storsjö <martin@martin.st>
Customized SSIM for various projections (and stereo formats) of 360 images and videos.
Further contributions by:
Ashok Mathew Kuruvilla
Matthieu Patou
Yu-Hui Wu
Anton Khirnov
Suggested-By: ffmpeg@fb.com
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Since those fields will be overridden by videotoolbox_start(), they
should never be set by user, it can trigger memory leaks otherwise.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
In the code path of av_videotoolbox_default_init/init2(),
avctx->internal->hwaccel_priv_data is NULL and passed to
decoder_cb.decompressionOutputRefCon. Then it will be dereferenced
inside videotoolbox_decoder_callback().
Delay videotoolbox_star() until ff_videotoolbox_common_init() to
fix the bug.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
It's the minimum of all child protocols max_packet_size. Can be used
like this:
ffmpeg -re -i cctv.mp4 -c copy -f mpegts \
-protocol_whitelist 'tee,file,udp' \
'tee:out.ts|udp://127.0.0.1:6666?pkt_size=1316'
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
AVMediaCodecDeviceContext without surface or native_window is
useless, it shouldn't be created at all. Such dummy AVHWDeviceContext
is allowed before, and it's used by mpv player. Creating a ANativeWindow
automatically breaks such usecases.
So disable creating a ANativeWindow by default. It can be enabled
via the create_window flag, or by set the AVDictionary of
av_hwdevice_ctx_create(). The downside is that
ffmpeg -hwaccel mediacodec -i input.mp4 \
-c:a copy -c:v hevc_mediacodec output.mp4
use ByteBuffer mode which isn't as efficient as before. The upside
is libavfilter works now, which should be less surprise.
To enable create_window on ffmpeg command line, use
ffmpeg -hwaccel mediacodec \
-init_hw_device mediacodec=mediacodec,create_window=1 \
-i input.mp4 -c:a copy -c:v hevc_mediacodec output.mp4
Users should know what it is to enable create_window. It should
be OK to take sometime to figure out the option. And there are comments
inside hwcontext_mediacodec.h to help user figure it out.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
This commit adds both decode and encode support for cICP chunks, which
allow a PNG image's pixel data to be tagged by any of the enum values in
H.273, without an ICC profile.
Upon decode, if a cICP chunk is present, the PNG decoder will tag output
AVFrames with the resulting enum color, and ignore iCCP, sRGB, gAMA, and
cHRM chunks, as per the spec.
Upon encode, if the color space is known and specified, and it is not sRGB,
the PNG encoder will output a cICP chunk containing the color space. If the
color space is sRGB, then it will output an sRGB chunk instead of a cICP
chunk. If the color space of the input is not unspecified, it will not output
a cICP chunk tagging the PNG as unspecified.
In either the sRGB case or the non-SRGB case, gAMA and cHRM are still written
as fallbacks provided the info is known.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
If an sRGB chunk is present in the PNG file, this commit will cause the
png decoder to ignore the cHRM and gAMA chunks and tag the resulting AVFrames
with BT.709 primaries, and ISO/IEC 61966-2-1 transfer. If these tags are
present in the AVFrame, pngenc.c already writes this chunk, so no change was
needed on the encode-side.
The PNG spec does not define what happens if sRGB and iCCP are present at
the same time, it just recommends that this not happen. As of this patch,
the decoder will have the ICC profile take precedence, and it will not tag
the pixel data as sRGB.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
The cHRM chunk is descriptive. That is, it describes the primaries that should
be used to interpret the pixel data in the PNG file. This is notably different
from Mastering Display Metadata, which describes which subset of the presented
gamut is relevant. MDM describes a gamut and says colors outside the gamut are
not required to be preserved, but it does not actually describe the gamut that
the pixel data from the frame resides in. Thus, to decode a cHRM chunk present
in a PNG file to Mastering Display Metadata is incorrect.
This commit changes this behavior so the cHRM chunk, if present, is decoded to
color metadata. For example, if the cHRM chunk describes BT.709 primaries, the
resulting AVFrame will be tagged with AVCOL_PRI_BT709, as a description of its
pixel data. To do this, it utilizes libavutil/csp.h, which exposes a funcction
av_csp_primaries_id_from_desc, to detect which enum value accurately describes
the white point and primaries represented by the cHRM chunk.
This commit also changes pngenc.c to utilize the libavuitl/csp.h API, since it
previously duplicated code contained in that API. Instead, taking advantage of
the API that exists makes more sense. pngenc.c does properly utilize the color
tags rather than incorrectly using MDM, so that required no change.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Gamma 2.2 and Gamma 2.8 are tagged in the file as 0.45455 and 0.35714,
respectively (i.e. 1/2.2 and 1/2.8). Trying to identify them as 2.2 and
2.8 instead of these values will cause the transfer function to not
properly be recognized. This patch fixes this.
bits_*_signed(0) will currently invoke an undefined shift by
8 * sizeof(int).
Add bits_*_signed_nz() that only works for n>0, analogous to
bits_read_nz(). Add an explicit check for n=0 in bits_*_signed().
Found-by: James Almer
This change improves the performance and multicore scalability of the vp9
codec for streaming single-pass encoded videos by taking advantage of up
to 64 cores in the system. The current thread limit for ffmpeg codecs is 16
(MAX_AUTO_THREADS in pthread_internal.h) due to a limitation in H.264 codec
that prevents more than 16 threads being used.
Experiments show that increasing the thread limit to 64 for vp9 improves
the performance for encoding 4K raw videos for streaming by up to 47%
compared to 16 threads, and from 20-30% for 32 threads, with the same quality
as measured by the VMAF score.
Rationale for this change:
Vp9 uses tiling to split the video frame into multiple columns; tiles must
be at least 256 pixels wide, so there is a limit to how many tiles can be
used. The tiles can be processed in parallel, and more tiles mean more CPU
threads can be used. 4K videos can make use of 16 threads, and 8K videos
can use 32. Row-mt can double the number of threads so 64 threads can be used.
Signed-off-by: James Zern <jzern@google.com>
Fixes: out of array access
Fixes: 48567/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_ZERO12V_fuzzer-6714182078955520.fuzz
Fixes: 48567/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_ZERO12V_fuzzer-6698145212137472.fuzz
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The same members between QSVVPPContext and VPPContext are removed from
VPPContext, and async_depth is moved from QSVVPPParam to QSVVPPContext
so that all QSV filters using QSVVPPContext may support async depth.
In addition, we may use QSVVPPContext as base context in other QSV
filters in the future so that we may re-use functions defined in
qsvvpp.c for other QSV filters.
This commit shouldn't change the functionality of vpp_qsv / overlay_qsv.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
This is in preparation for reusing the code for other QSV filters. E.g.
deinterlacing_qsv may have an option array without format option
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
QSV filters may set this flag in preinit callback to turn on / off pass
through mode
This is in preparation for reusing the code for other QSV filters. E.g.
scale_qsv filter doesn't support pass through mode.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Set the expected default value for options in this callback, hence we
have the right values even if these options are not included in the
option arrray.
This is in preparation for reusing the code for other QSV filters.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Special values are:
0 = original width/height
-1 = keep original aspect
This is in preparation for reusing the code for other QSV filters.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
This patch provides default value if the expression is NULL.
This is in preparation for reusing the code for other QSV filters.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Also fix the naming style in enum var_name.
This is in preparation for reusing the code for other QSV filters.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
segment_time and segment_times are defined as duration specifications, not
timestamps, so calculation of segment duration must account for initial
timestamp. Fixed.
FATE ref for segment-mp4-to-ts changed on account of avoiding premature
segment cut at the end of the first segment.
Fixes: out of array access
Fixes: 48567/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WBMP_fuzzer-6652634692190208
Fixes: 48567/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WBMP_fuzzer-6653703453278208
Fixes: 48567/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WBMP_fuzzer-6668020758216704
Fixes: 48567/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WBMP_fuzzer-6684749875249152
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This ensures all defined types are supported, even if only to report
their presence and print their size if a custom implementation is
not added.
Signed-off-by: James Almer <jamrial@gmail.com>
Defined by H.274, this SEI message is utilized by iPhones to save
the nominal ambient viewing environment for the display of recorded
HDR content. The contents of the message are exposed to API users
as AVFrame side data containing AVAmbientViewingEnvironment.
As the DV RPU test sample is from an iPhone and includes Ambient
Viewing Environment SEI messages, its test result gets updated.
Parsing should probably be enabled for all codecs, at least for headers,
but e.g. the AAC parser produces 1-byte packets of zero padding with it,
so I'm just enabling it for EAC3 for the moment.
Removed the unnecessary calls to ff_format_io_close
this patch introduced in dashenc_delete_file.
dashenc_delete_file functions open a
new HTTP connection regardless of the http_persistent value,
So change their behaviour to keep http connections open
if http_persistent is set.
Signed-off-by: Basel Sayeh <basel.sayeh@hotmail.com>
Removed the unnecessary calls to ff_format_io_close
this patch introduced in hls_delete_file.
hls_delete_file functions open a new HTTP connection
regardless of the http_persistent value,
So change their behaviour to keep http connections open
if http_persistent is set
Signed-off-by: Basel Sayeh <basel.sayeh@hotmail.com>
The HEIF specification permits specifying the looping behavior of
animated sequences by using the EditList (elst) box. The track
duration will be set to the total duration of all the loops (or
infinite) and the duration of a single loop will be set in the edit
list box.
The default behavior is to loop infinitely.
Compliance verification:
* This was added in libavif recently [1] and the files produced by
ffmpeg after this change have EditList boxes similar to the ones
produced by libavif (and avifdec is able to parse the loop count as
intended).
* ComplianceWarden is ok with the produced files.
* Chrome is able to play back the produced files.
[1] 4d2776a3af
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Allow specifying the movie_timescale options to AVIF ouptut.
This also makes sure that when movie_timescale is not specified,
the default value of 1000 is used instead of 0. Animated AVIF
files which don't specify the movie_timescale will have the
correct duration written in the track and movie headers after this
change (instead of writing 0).
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Fixes: signed integer overflow: -2889074 * 2048 cannot be represented in type 'int'
Fixes: 51363/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_BONK_fuzzer-5660734784143360
Fixes: 48567/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_BONK_fuzzer-6617680050520064
Fixes: 48567/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_BONK_fuzzer-6743951854141440
No check is done for the overflow as this was rejected in last review, see the ML
Note: the 2nd and 3rd testcase was assigned by ossfuzz to a unrelated theora issue (48567)
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array access
Fixes: 48567/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SGI_fuzzer-6704753329700864
Fixes: 48567/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SGI_fuzzer-6683986844057600
Fixes: 48567/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SGI_fuzzer-6697387691474944
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 2147481600 + 13408 cannot be represented in type 'int'
Fixes: 53963/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_H264_fuzzer-4650467311616000
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
android_get_device_api_level() is a static inline before API level
29. It was implemented via __system_property_get(). We can do the
same thing, but I don't want to mess up with __system_property_get.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
There is no reason to enforce a high minimum. In the context
of streaming only a few output buffers and capture buffers
are even needed for continuous playback. This also helps
alleviate memory pressure when decoding 4K media.
Signed-off-by: Aman Karmani <aman@tmm1.net>
The current code will apply them if the options string does not contain
a 'flags' substring, and will do so by appending the graph-level option
string to the filter option string (with the standard ':' separator).
This is flawed in at least the following ways:
- naive substring matching without actually parsing the options string
may lead to false positives (e.g. flags are specified by shorthand)
and false negatives (e.g. the 'flags' substring is not actually the
option name)
- graph-level sws options are not limited to flags, but may set
arbitrary sws options
This commit simply applies the graph-level options with
av_set_options_string() and lets them be overridden as desired by the
user-specified filter options (if any). This is also shorter and avoids
extra string handling.
This function currently treats AVFilterContext options and
filter-private options differently: the former are immediately applied,
while the latter are stored in a dictionary to be applied later.
There is no good reason for having two branches - storing all options in
the dictionary is simpler and achieves the same effect (since it is
later applied with av_opt_set_dict()).
This will also be useful in future commits.
Current code first sets AVFilterContext-level options, then aplies the
leftover on the filter's private data. This is unnecessary, applying the
options to AVFilterContext with the AV_OPT_SEARCH_CHILDREN flag
accomplishes the same effect.
Current code may, depending on the muxer, decide to use VSYNC_VFR tagged
with the specified framerate, without actually performing framerate
conversion. This is clearly wrong and against the documentation, which
states unambiguously that -r should produce CFR output for video
encoding.
FATE test changes:
* nuv-rtjpeg: replace -r with '-enc_time_base -1', which keeps the
original timebase. Output frames are now produced with proper
durations.
* filter-mpdecimate: just drop the -r option, it is unnecessary
* filter-fps-r: remove, this test makes no sense and actually
produces broken VFR output (with incorrect frame durations).
It is a URL rewriter for IPFS gateways, not an actual implementation of
IPFS, and naming it as such was both incorrect and misleading.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Advanced edit list support is entirely broken for fragmented MP4s,
currently. mov_fix_index is never run in mov_build_index, since
in fragmented MP4s the stco, stsz, stts, and stsc boxes have zero
entries, with the index being filled in as each fragment's trun
box is seen.
The result of this is that the skip samples is never set properly,
since half the code thinks it doesn't need to, as advanced_editlist
is enabled, but as mov_fix_index is never called, it doesnt get set.
This means that any edits for e.g. priming are not properly applied
as skip samples side data.
This also means remuxing to fragmented MP4 from progressive MP4 with
lavf will quietly drop the edit list, currently.
Example:
$ ffmpeg -loglevel quiet -advanced_editlist 1 -i non_fragmented.mp4 -f md5 -
MD5=d02d929f8eb4edef624758a298d5f7c6
$ ffmpeg -loglevel quiet -advanced_editlist 0 -i non_fragmented.mp4 -f md5 -
MD5=d02d929f8eb4edef624758a298d5f7c6
$ ffmpeg -loglevel quiet -advanced_editlist 1 -i fragmented.mp4 -f md5 -
MD5=e38b110f586fa886ff94e0ca98a95d59 <-- wrong, extra samples are output instead of being skipped
$ ffmpeg -loglevel quiet -advanced_editlist 0 -i fragmented.mp4 -f md5 -
MD5=d02d929f8eb4edef624758a298d5f7c6
We cannot call mov_fix_index after reading a trun box
since mov_fix_index seems to assume it is only called once, on a
fully complete index, an multiple calls to it don't seem like
they'd work, so the "best" option seems to be disabling advanced
edit list support entirely for the time being, as it is broken
for these types of files.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
The cached bitstream reader was originally written by Alexandra Hájková
for Libav, with significant input from Kostya Shishkov and Luca Barbato.
It was then committed to FFmpeg in ca079b0954, by merging it with the
implementation of the current bitstream reader.
This merge makes the code of get_bits.h significantly harder to read,
since it now contains two different bitstream readers interleaved with
#ifdefs. Additionally, the code was committed without proper authorship
attribution.
This commit re-adds the cached bitstream reader as a standalone header,
as it was originally developed. It will be made useful in following
commits.
Integration by Anton Khirnov.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Current code stores a pointer to allocated data in libx265 and frees it
when the encoded packet is retrieved. This will leak if the packet is
never retrieved, e.g. if the encoder is closed without being flushed.
Restructure the code such that only indices to an array stored in our
private data are given to libx265. This ensures no allocated memory can
be lost.
Commit 18f24527eb accidentally made side data only packets be handled like a
flush request. Fix this regression by effectively ignoring them as was the
original intention.
Signed-off-by: James Almer <jamrial@gmail.com>
Fixes: signed integer overflow: 0 - -9223372036854775808 cannot be represented in type 'long int'
Fixes: fate-cover-art-aiff-id3v2-remux, fate-cover-art-mp3-id3v2-remux and fate-mov-cover-image
under ubsan.
Signed-off-by: James Almer <jamrial@gmail.com>
It appears faster than the iterative method on my machine (1.06x
faster), so I'm guessing compilers improved over time (the iterative
version was slightly faster in the past).
Currently, in case of equality on the first color channel, the order of
the ref colors is defined by the hashing function. This commit makes the
sorting deterministic and improve the hierarchical ordering.
This variable is used only for the running weight (used to reach the
target median). The places where we actually need the box weight are
changed to use box->weight.
The variance computation is simple enough now (since we can use the axis
squared errors) that it doesn't need to have a complex lazy computation
logic.
The equalities in the w{r,g,b} range checks make sure longest is never
0. Even if the alpha ended up being selected in get_next_color() it
would cause underread memory accesses in its caller (colormap_insert).
This reverts commit dea673d0d5.
This change cannot work for several reasons, the most obvious ones are:
- the alpha is being part of the scoring of the color difference, even
though we can not interpret the alpha as part of the perception of the
color (we don't even know if it's premultiplied or postmultiplied)
- the colors are averaged with their alpha value which simply cannot
work
The command proposed in the original thread of the patch actually
produces a completely broken file:
ffmpeg -y -loglevel verbose -i fate-suite/apng/o_sample.png -filter_complex "split[split1][split2];[split1]palettegen=max_colors=254:use_alpha=1[pal1];[split2][pal1]paletteuse=use_alpha=1" -frames:v 1 out.png
We can see that many color pixels are off, but more importantly some
colors have a random alpha value: https://imgur.com/eFQ2UK7
I don't see any easy fix for this unfortunately, the approach appears to
be flawed by design.
New option can be used to avoid creating very short segments with inputs
whose GOP size is variable or unharmonic with segment_time.
Only effective with segment_time.
Similar to how the encoder looks at frame color information to write the frame
header bitstream.
Should workaround ticket #10091, where container level color parameters passed
to the decoder context were being overwritten.
Signed-off-by: James Almer <jamrial@gmail.com>
Some encoders, like flac, can send side data only packets at the end.
Eventually, said extradata update should ideally be used to update the header
when writting to seekable output, but for now, ignore them.
Should fix the undefined behavior of passing NULL to memcpy().
Signed-off-by: James Almer <jamrial@gmail.com>
PFM (aka Portable FloatMap) encodes its scanlines from bottom-to-top,
not from top-to-bottom, unlike other NetPBM formats. Without this
patch, FFmpeg ignores this exception and decodes/encodes PFM images
mirrored vertically from their proper orientation.
For reference, see the NetPBM tool pfmtopam, which encodes a .pam
from a .pfm, using the correct orientation (and which FFmpeg reads
correctly). Also compare ffplay to magick display, which shows the
correct orientation as well.
See: http://www.pauldebevec.com/Research/HDR/PFM/ and see:
https://netpbm.sourceforge.net/doc/pfm.html for descriptions of this
image format.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: James Almer <jamrial@gmail.com>
EAGAIN is not correct in this scenario. FFCodec.cb.decode() callback decoders
always return the amount of bytes consumed from the input packet (if any), and
report if a frame was generated by setting got_frame.
Signed-off-by: James Almer <jamrial@gmail.com>
Add encoding of 32 bit-per-sample PCM to FLAC files to libavcodec.
Coding to this format is at this point considered experimental and
-strict experimental is needed to get ffmpeg to encode such files.
Fixes: signed integer overflow: 574590586 - -1875616554 cannot be represented in type 'int'
Fixes: 53914/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_FIXED_fuzzer-5037125846564864
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
They're not currently used, so they don't need to be there.
Vulkan stabilized the decode extensions less than a week ago, and their
name prefixes were changed from EXT to KHR. It's a bit too soon to be
depending on it, so rather than bumping, just remove these for now.
Fixes: OOM testcase
Fixes: 51527/clusterfuzz-testcase-minimized-ffmpeg_dem_LAF_fuzzer-5453663505612800
OOM can still happen after this as an arbitrary sized block is allocated and read
this would require a redesign or some limit on the sample rate.
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This filter, when used in the "pad" mode, currently makes the
distinction between limited and full range solely by testing for YUVJ
pixel formats at link setup time. This is deprecated and should be
improved to perform the detection based on the per-frame metadata.
In order to make this distinction based on color range metadata, which
is only known at the time of filtering frames, for simplicity, we simply
allocate two copies of the "black" frame - one for limited range and the
other for full range metadata. This could be done more dynamically (e.g.
as-needed or simply by blitting the appropriate pixel value directly),
but this change is relatively simple and preserves the structure of the
existing code.
This commit actually fixes a bug in FATE - the new output is correct for
the first time. The previous md5 ref was of a frame that incorrectly
combined full-range pixel data with limited-range black fields. The
corresponding result has been updated.
Signed-off-by: Niklas Haas <git@haasn.dev>
This filter currently makes the distinction between limited and full
range by testing for the deprecated YUVJ pixel formats at link setup
time. This is deprecated and should be improved to perform the detection
based on the per-frame metadata.
Rewrite it to calculate the black pixel threshold at the time of
filtering a frame, when metadata about the frame's color range is known.
Doing it this way has the small side benefit of being able to handle
streams with variable metadata, and is not a meaningful performance
cost.
Signed-off-by: Niklas Haas <git@haasn.dev>
Bugfix: The OpenVino DNN backend in the 'async' mode sets
'task->inference_done' to 'complete' prior to data copy from
OpenVino output buffer to task's output frame.
This order causes task destroy in ff_dnn_get_result_common()
prior to model output processing.
Signed-off-by: Rafik Saliev <rafik.f.saliev@intel.com>
It works since most of Android devices don't output B frames by
default. The behavior is documented by Android now, although there
is some exception in history, which should have been fixed now.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Use input PTS as DTS has multiple problems:
1. If there is no reordering, it's better to just use the output
PTS as DTS, since encoder may change the timestamp value (do it
on purpose or rounding error).
2. If there is reordering, input PTS should be shift a few frames
as DTS to satisfy the requirement of PTS >= DTS. I can't find a
reliable way to determine how many frames to be shift. For example,
we don't known if the encoder use hierarchical B frames. The
max_num_reorder_frames can be get from VUI, but VUI is optional.
3. Encoder dropping frames makes the case worse. Android has an
BITRATE_MODE_CBR_FD option to allow it explicitly.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
simple_idct.asm is 32 bit-only since
bfb28b5ce8,
whereas simple_idct10.asm is x64-only. So don't build
the ultimately unneeded and empty files, as some linkers
complain about this: "ranlib: file:
libavcodec/libavcodec.a(simple_idct.o) has no symbols"
(this is from an Xcode toolchain as reported by Ronald S. Bultje).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
We shouldn't be providing links to unverified and non-FFmpeg-controlled
content in our documentation.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
We do not endorse or recommend specific third party servers or companies
that users' data will be funneled through.
It is also incorrectly describing how FFmpeg currently works.
Should have been part of 412922cc6f but was
missed.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Attach the AVPacket instead of only a few values to the corresponding Dav1dData
struct and use it to set all output frame props.
Signed-off-by: James Almer <jamrial@gmail.com>
Only one codec using mjpegdec.c actually creates multiple
frames from a single packet, namely SMVJPEG. The other can
use the ordinary decode callback just fine. This e.g. has
the advantage of confining the special SP5X/AMV code to sp5xdec.c.
This reverts most of commit e9a2a8777317d91af658f774c68442ac4aa726ec;
of course it is not a simple revert: Way too much has changed;
furthermore, outright reverting the sp5xdec.c changes would readd
a stack packet to sp5x_decode_frame() which is not desired.
In order to avoid this without modifying the given AVPacket,
a variant of ff_mjpeg_decode_frame() with explicit buf and size
parameters has been added.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
AVID content is not supposed to be SMVJPEG; given that
both these codecs involve manipulating image dimensions
and cropping dimensions, it makes sense to restrict
the AVID codepaths to non-SMVJPEG codecs in order not
to have to think about what if SMVJPEG happens to
have a codec tag indicating AVID.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
xv30be is an obnoxious format that I shouldn't have included in the
first place. xv30 packs 3 10bit channels into 32bits and while our
byte-oriented logic can handle Little Endian correctly, it cannot
handle Big Endian. To avoid that, I marked xv30be as a bitstream
format, but while that didn't produce FATE errors, it turns out that
the existing read/write code silently produces incorrect results, which
can be revealed via ubsan.
In all likelyhood, the correct fix here is to remove the format. As
this format is only used by Intel vaapi, it's only going to show up
in LE form, so we could just drop the BE version. But I don't want to
deal with creating a hole in the pixfmt list and all the weirdness that
comes from that. Instead, I decided to write the correct read/write
code for it.
And that code isn't too bad, as long as it's specialised for this
format, as the channels are all bit-aligned inside a 32bit word.
Various parts of the code assume that width can be divided by various powers of 2
without rounding
Fixes: out of array access
Fixes: 53623/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_VQC_fuzzer-6209269924233216
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
There can be more than one available render node, and it's not
guaranteed the first node we come across is the correct one. In
particular, 'vgem' devices are common, and are
never VAAPI-enabled and thus not valid here.
We have a 'kernel_driver' arg already for specifying a single driver we
*do* want, but it doesn't support a negation, nor a list. It's easier
just to automatically skip 'vgem' anyway, to avoid foisting this burden
on users.
This has precedent in libva-utils already:
bfb6b98ed62a exclude vgem node and invalid drm node in vainfo
bfb6b98ed6
Signed-off-by: Brian Norris <briannorris@chromium.org>
PI, PHI and E are defined in libavutil/eval.c, and user may use these
constants for scale_qsv filter, so we needn't re-define these variables
in vf_scale_qsv.c
Reviewed-by: Gyan Doshi <ffmpeg@gyani.pro>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
When process yuv420 frames, FFmpeg uses same alignment on Y/U/V
planes. VPL and MSDK use Y plane's pitch / 2 as U/V planes's
pitch, which makes U/V planes 16-bytes aligned. We need to set
a separate alignment to meet runtime's behaviour.
Now alignment is changed to 16 so that the linesizes of U/V planes
meet the requirment of VPL/MSDK. Add get_buffer.video callback to
qsv filters to change the default get_buffer behaviour.
Now the commandline works fine:
ffmpeg -f rawvideo -pix_fmt yuv420p -s:v 3082x1884 \
-i ./3082x1884.yuv -vf 'vpp_qsv=w=2466:h=1508' -f rawvideo \
-pix_fmt yuv420p 2466_1508.yuv
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Making it point to the input packet results in different behavior during flush,
where its contents will be that of an empty packet instead of containing the
props from the last input packet fed to the decoder.
After this change, decoding with more than one thread will shield the same
results as using a single thread.
Signed-off-by: James Almer <jamrial@gmail.com>
Use the opaque field instead to keep this value.
No functional change, but removes the hack that made the packet technically
invalid, allowing the safe usage of functions like av_packet_ref() on it
if required.
Signed-off-by: James Almer <jamrial@gmail.com>
The idea behind last_pkt_props was to store the properties of the last packet
fed to the decoder. Any sort of queueing required by CODEC_CAP_DELAY decoders
that consume several packets before they start outputting frames should be done
by the decoders in question. An example of this is libdav1d.
This is required for the following commits that will fix last_pkt_props in
frame threading scenarios, as well as maintain its contents during flush.
This revers commit 022a12b306.
Signed-off-by: James Almer <jamrial@gmail.com>
This will be needed for the following commit, after which ff_get_buffer() will
stop setting frame->pts to AV_NOPTS_VALUE.
Signed-off-by: James Almer <jamrial@gmail.com>
This will be needed for a following commit, after which ff_get_buffer() will
stop setting frame->pts to AV_NOPTS_VALUE.
Signed-off-by: James Almer <jamrial@gmail.com>
This commit tests it in a way that automatically checks
that using av_dict_iterate() is equivalent to using
av_dict_get() with key "" and AV_DICT_IGNORE_SUFFIX.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Untested with "non fuzzed" samples as i have no such file
The reference 5.6.0 decoder appears to also have undefined behavior in the lossless codepath for this
Fixes: shift exponent 32 is too large for 32-bit type 'int'
Fixes: 50930/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WAVPACK_fuzzer-6319201949712384
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The Encoding field (and the \fe tag) allows to limit font selection to
only those fonts declaring support for the specified codepage in their
OS/2's table "Code Page Character Range" field.
Particularly, Encoding=0 means only font's declaring support for "ANSI",
or rather "Latin (Western European)", are allowed to be selected.
Specifying Encoding=1 allows all fonts to be considered.
We do not want to limit font selection, so specify Encoding=1.
NB: at the time of writing libass only partially supports this field,
thus hiding the issue in any libass-based renderer. A VSFilter-based
DirectShow filter or XySubFilter will reveal the issue when a font not
declaring support for latin characters is specified in a style.
Colour values used in ASS files without a "YCbCr Matrix" header set to
"None" are subject to colour mangling, due to how ASS was historically
conceived. A more in-depth description can be found in the documetation
inside libass' public ass_types.h header. The important part is, if this
header is not set to "None", the final output colours can deviate from
the literal value specified in the file. When converting from non-ASS
formats we do not want any colour shift to happen, so let's set the
appropiate header.
NB: ffmpeg's subtitle filter, does not follow libass' documentation
regarding colour mangling, thus hiding the bug. Anything based on
VSFilter, XySubFilter or e.g. mpv do and might show the issue.
(Of course native ASS subs, which _do_ rely on colour mangling won't
work properly with the subtitle filter, but this can be fixed another
time)
There is no v4 ASS. There are versions 1 to 4 of SSA (with only v4
being supported by softsub renderers), ASS which declares itself as v4+
or v4.00+, and the rarely used v4++/v4.00++, which isn't fully supported
by libass.
As reflected by the [V4+ Styles] section header, FFmpeg uses ASS, not
SSA v4, so adjust the comment accordingly.
avx512 on Skylake-X (Xeon D-2123IT):
1.19x faster (970±91.2 vs. 817±104.4 decicycles) compared with avx2
avx512icl on Ice Lake (Xeon Silver 4316):
2.52x faster (1350±5.3 vs. 535±9.5 decicycles) compared with avx2
Instead of manually assembling the string, use av_dict_get_string
which handles things like proper escaping too (even though it is
not yet needed here).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This unfortunately involved adding some parameters
to ff_h2645_sei_to_frame() that will be mostly unused.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is valid for HEVC; in fact, the ATSC-HEVC spec [1] simply
refers to the relevant H.264 spec.
It is also trivial to implement now: Just move applying AFD
to ff_h2645_sei_to_frame() and stop ignoring AFD when parsing
a HEVC SEI containing it.
A FATE-test for this has been added.
[1]: https://www.atsc.org/atsc-documents/a3412017-video-hevc/
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There are only slight differences between H.264 and HEVC
for this side data, so it makes sense to share the code.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This commit only factors out freeing the common SEI parts,
not whether the fields indicating whether an SEI is present
shall be reset. H.264 and HEVC differ in this regard
(ff_h264_sei_uninit() really resets, whereas ff_hevc_reset_sei()
only uninits.) and neither actually honours the persistency
as prescribed by the relevant specs.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Currently, several components select atsc_a53, despite
not using anything from it themselves. They only select
it because parsing SEI messages adds an indirect dependency.
But using direct dependencies is more natural, so add
dedicated subsystems for them.
It already allows to remove a superfluous dependency of
the HEVC QSV encoder on hevc_sei and atsc_a53.
Adding new subsystems only becomes effective after a reconfiguration.
In order to force this, some needed headers (which are only included
implicitly before this commit) were included explicitly in
libavformat/allformats.c.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SEI NALUs and several SEI messages are naturally byte-aligned,
so reading them via the bytestream-API is more natural.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
VPP in the SDK requires the frame rate to be set to a valid value,
otherwise init will fail, so always set a default framerate when the
input link doesn't have a valid framerate.
Reviewed-by: Soft Works <softworkz@hotmail.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
The SDK will return error if feeding h264_qsv encoder p010 input.
$ ffmpeg -f lavfi -i testsrc -vf "format=p010" -c:v h264_qsv -f null -
[...]
[h264_qsv @ 0x55584888b080] Current pixel format is unsupported
[h264_qsv @ 0x55584888b080] some encoding parameters are not supported
by the QSV runtime. Please double check the input parameters.
Error initializing output stream 0:0 -- Error while opening encoder for
output stream #0:0 - maybe incorrect parameters such as bit_rate, rate,
width or height
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
adaptive_i and adaptive_b cannot work with MFX_GOP_STRICT,
so only enable MFX_GOP_STRICT when these features are disabled.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
The SDK may provides HDR metadata for HDR streams via mfxExtBuffer
attached on output mfxFrameSurface1
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Fixes some MP4F files which have duration in mdhd set to UINT_MAX instead of zero.
Signed-off-by: Sasi Inguva <isasi@google.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Fixes: Timeout (read mostly the same data repeatly)
Fixes: 52457/clusterfuzz-testcase-minimized-ffmpeg_dem_ALP_fuzzer-6610706313379840
Fixes: 53098/clusterfuzz-testcase-minimized-ffmpeg_dem_SOL_fuzzer-6481382981632000
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -2146670226 + -2227242 cannot be represented in type 'int'
Fixes: 51943/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_APAC_fuzzer-5779018251370496
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This causes the RLE decoder to exit before applying the last RLE run
All images i tested with are unchanged, this makes the special case
for handling the last run unused for non truncated images.
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -1094995528 * 8224 cannot be represented in type 'int'
Fixes: 53508/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FFV1_fuzzer-474551033462784
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: negation of -2147483648 cannot be represented in type 'int'; cast to an unsigned type to negate this value to itself
Fixes: 53364/clusterfuzz-testcase-minimized-ffmpeg_BSF_DTS2PTS_fuzzer-4693772269387776
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This is a regression since: adaa06581c
Before this, max_channel and max_matrix_channel where compared for equality
Fixes: out of array access
Fixes: 53340/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TRUEHD_fuzzer-514959011885875
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -1284837070 - 982101618 cannot be represented in type 'int'
Fixes: 53105/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AC3_FIXED_fuzzer-4848015827664896
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Rather than hard-coding AV_PIX_FMT_VULKAN, expand this to the full list
of formats supported by <libplacebo/utils/libav.h>. We re-use the
existing `format` option to allow selecting specific software formats in
addition to specific vulkan hwframe formats.
Some minor changes are necessary to account for the fact that
`ff_vk_filter_config_output` is now only called optionally, the fact
that the output format must now be parsed before `query_format` gets
called, and the fact that we need to call a different function to
retrieve data from the `pl_frame` in the non-hwaccel case.
Signed-off-by: Niklas Haas <git@haasn.dev>
Rather than the encoder timebase. Since the times are parsed as
microseconds, this will not reduce precision, except possibly when
chapter times are used and the chapter timebase happens to be better
aligned with the encoder timebase, which is unlikely.
This will allow parsing the keyframe times earlier (before encoder
timebase is known) in future commits.
There are 8 of them and they are typically used together. Allows to pass
just this struct to forced_kf_apply(), which makes it clear that the
rest of the OutputStream is not accessed there.
Currently, it is done once per slice-thread, leading to
one warning per slice-thread in case a YUVJ pixel format
has been originally used.
This also fixes the anomaly that said parameter are only
updated for the user-facing context (whose values are retrievable
via av_opt_get()) if slice-threading is not in use.
Fixes ticket #9860.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Initializing slice threads currently uses the function
(sws_init_context()) that is also used for initializing
user-facing contexts with the only difference being that
nb_threads is set to one before initializing the slice contexts.
Yet sws_init_context() also initializes lots of stuff
that is not slice-dependent, i.e. (src|dst)Range. This
currently only works because the code sets these fields
to the same values for all slice contexts. This is not
nice; even worse, it entails that log messages are printed
once per slice context (and therefore fill the screen).
This commit lays the groundwork to fix this.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The encoder is sensitive to changes in precision, and its test target
was a compromise. It was already close to failing on x87 FPUs.
ff_mdct_init used double precision entirely from the scale to computing
the MDCT exp tables. av_tx_init uses single-precision for the scale,
with a small input change which was enough to tip the test into failing on
x87 FPUs.
Increase the fuzz factor in line with other AAC encoder tests to fix.
AVCodecContext.frame_number is actually only incremented
in case encoding was successfull; if e.g. the ff_alloc_packet()
below fails, it won't be incremented and therefore it is possible
for the previous_frame buffer to be allocated for multiple
first frames, leaking every one except the last.
So check for whether there already is a previous frame instead.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The earlier code did not account for the frame header as well
as the block headers; furthermore, in case a large part of
a block is unused (due to padding), the output size may
exceed 3 * width * height (where the dimensions correspond
to the visible pixels) due to the overhead of the zlib header,
so use the padded dimensions to calculate the maximum packet size
(which is also what the actual call to compress2() uses).
Fixes ticket #10053.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes the crash in ticket #10050.
Also ensure that we don't overflow before ff_get_encode_buffer().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Bring it up to date with current practice, as the current text does not
cover everything. Drop the reference to unprefixed exported symbols,
which do not exist anymore.
It describes a general development policy, not code formatting. It is
also not true, as these days we tend to prioritize correctness, safety,
and completeness over code size.
It is currently very out of touch with reality.
* declare we are using C99 fully, rather than C90 plus extensions
* mention our use of stdatomic.h
* mention forbidden C99 features, like VLAs and complex numbers
Do it in set_dispositions() rather than during stream creation.
Since at this point all other stream information is known, this allows
setting disposition based on metadata, which implements #10015. This
also avoids an extra allocated string in OutputStream that was unused
after of_open().
Replace it with an array of streams in each InputFile. This is a more
accurate reflection of the actual relationship between InputStream and
InputFile.
Analogous to what was previously done to output streams in
7ef7a22251.
Encoding init code will currently fall back to a 25fps default when no
framerate is known or specified, but only if there is a known source
input stream. There is no good reason for this condition, so drop it.
VP6 alpha in EA format is a second VP6 encoded video stream where only the Y
component is used and is interpreted as the alpha channel of the first VP6
stream. The alpha VP6 stream is muxed separately from the main VP6 stream, has
its own stream headers and packet headers. In theory the two streams might not
even have the same resolution (although most likely that is not something that
is seen or supported in the wild), but the format is capable of doing it.
Merged VP6 alpha (also known as the VP6A codec) means that a packet of the
video stream contains the corresponding packet of both VP6 substreams like
this:
{OffsetOfAlpha, DataPacket, AlphaDataPacket}
So data and alpha data of a frame is merged to a single packet, this is how VP6
video with alpha is muxed in FLV and SWF.
The first approach is more like how the demuxer sees data in the EA format,
unfortunately it is different to what the FLV or SWF format expects, so -
having no better place for it in the framework - I decided to do an optional
format conversion in the EA demuxer.
Signed-off-by: Marton Balint <cus@passwd.hu>
Add qsv_transcode example which shows how to use qsv to do hardware
accelerated transcoding, also show how to dynamically set encoding
parameters.
examples:
Normal usage:
qsv_transcode input.mp4 h264_qsv output.mp4 "g 60"
Dynamic setting usage:
qsv_transcode input.mp4 hevc_qsv output.mp4 "g 60 asyne_depth 1"
100 "g 120"
This command initializes codec with gop_size 60 and change it to
120 after 100 frames
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Since frame->buf[0] is always NULL in this case, av_buffer_unref
has no effect. If it's not NULL, double-free will happen.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
At the beginning of decoding, if we feed mediacodec too fast, the
input port will return try again. It takes some time for mediacodec
to consume bitstream and output frame. So the output port also return
try again. It possible that mediacodec_receive_frame doesn't consume
any AVPacket and no AVFrame is output. Then both avcodec_send_packet()
and avcodec_receive_frame() return EAGAIN, which shouldn't happen.
This bug can be produced with decoding benchmark on Pixel 3.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Same principle as previous commit, with sufficiently huge rgb2yuv table
values this produces wrong results and undefined behavior.
The unsigned produces the same incorrect results. That is probably
ok as these cases with huge values seem not to occur in any real
use case.
Fixes: signed integer overflow
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Large rgb2yuv tables and high pixel values cause the intermediate
int32_t of ru*r + gu*g + bu*b to exceed INT_MAX, which is undefined
behavior. This causes libswscale built with LLVM -fsanitize=undefined to
assert. Using unsigned integers instead has defined behavior and
produces identical results, and makes rgb64ToUV_c_template match
rgb64ToY_c_template.
Fixes: signed integer overflow
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
maximum_buffer_size_ms should only be set if both are specified or if
the user sets it through -svtav1-params buf-sz=val
Signed-off-by: Christopher Degawa <ccom@randomderp.com>
The check of vpx_rac_is_end check(s) are added originally from
1afd246960. It causes a regression
of some vp8 stream. b6b9ac5698 fixes
the regression by a sort of band-aid way. This fixes the wrongness
of the original commit. vpx_rac_is_end() should be called against
the bool decoder for the vp8 headr context, not one for each
coefficient. Reference is vp8_dixie_tokens_process_row() in token.c
in spec 20.16.
Fixes: Ticket 8069
Fixes: regression of 1afd246960.
Fixes: b6b9ac5698
Co-authored-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This previous expression multiplied a constant (outlink->h) that was
guaranteed to be 0 at this point, thus making it always a no-op.
Fix the calculation, and also properly reset the SAR to 1:1 as is now
necessary (the failure to do so previously hid this bug's existence).
As a result of a typo in the source code, this option was completely
non-functional. In order to fix it, without breaking the current default
behavior, explicitly change this default to 0.
This behavior is also consistent with how other scale filters behave by
default, so it's probably best to enshrine it anyways.
After commit c0b93, it's possible that `ff_vk_filter_config_input` never
gets called, leading to `s->vkctx.input_format` being left unset. This
broke the format auto-selection logic in `libplacebo_config_output`,
resulting in a default to yuv420p, instead of defaulting to the input
format as intended.
Fixes: c0b93c4f8b
x265_sei is available since X265_BUILD 88. Bump required version
to 89 to fix the regression from commit 1f58503013, and remove a
conditional compilation.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
svt-av1 v1.2.0 has deprecated vbv_bufsize in favor of using
- maximum_buffer_size_ms (--buf-sz)
- starting_buffer_level_ms (--buf-initial-sz)
- optimal_buffer_level_ms (--buf-optimal-sz)
and vbv_bufsize has not been in use since svt-av1 v0.8.6
Signed-off-by: Christopher Degawa <christopher.degawa@intel.com>
compressed_ten_bit_format has been deprecated upstream and has no effect
and can be removed. Plus, technically it was never used in the first place
since it would require the app (ffmpeg) to set it and do additional
processing of the input frames.
Also simplify alloc_buffer by removing calculations relating to the
non-existant processing.
Signed-off-by: Christopher Degawa <christopher.degawa@intel.com>
Profile can be derived from values codecpar pixel format only with software
formats. For hardware formats, we're forced to parse a frame header to get
the required information.
Signed-off-by: James Almer <jamrial@gmail.com>
Drop the reference to directly committing code, because
- it is highly discouraged and very rarely done these days
- there is no good reason NOT to submit patches for review
Add a link to the development policy chapter.
Frame limiting is now handled using sync queues. This code prevents the
sync queue from triggering EOF, resulting in unnecessarily many frames
being decoded, filtered, and then discarded.
Found-by: U. Artie Eoff <ullysses.a.eoff@intel.com>
Specificaly, the of_add_attachments() call (which can add attachment
streams to the output) and the check whether the output file contains
any streams. They both logically belong in create_streams().
The DVD subtitle parser handles two types of packets: "normal"
packets with a 16-bit length, and HD-DVD packets that set the
16-bit length to 0 and encode a 32-bit length in the next four
bytes. This implies that HD-DVD packets are at least six bytes
long, but the code didn't actually verify this.
The faulty length check results in an out of bounds read for
zero-length "normal" packets that occur in the input, which are
only 2 bytes long, but get misinterpreted as an HD-DVD packet.
When this happens the parser reads packet_len from beyond the
end of the input buffer. The subtitle stream is not correctly
decoded after this point due to the garbage packet_len.
Fixing this is pretty simple: fix the length check so packets
less than 6 bytes long will not be mistakenly parsed as HD-DVD
packets.
Signed-off-by: Aidan MacDonald <aidanmacdonald.0x0@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This is done only to the inputs, not the outputs, because we always
output vulkan hwframes.
Doing so also requires keeping track of backing textures for non-hwdec
formats, but is otherwise a mostly straightforward change to the format
query function. Special care needs to be taken to avoid crashing on
older libplacebo due to AV_PIX_FMT_RGBF32LE et al.
Instead of doing it ad-hoc in `filter_frame`. This is not a huge change
on its own, but paves the way for adding support for more formats in the
future, in particular formats other than AV_PIX_FMT_VULKAN.
This commit enabled assembly code with intel AVX512 VNNI and added unit test for sobel filter
sobel_c: 4537
sobel_avx512icl 2136
Signed-off-by: bwang30 <bin.wang@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
block_index is write-only for the H.261 decoder, so
don't update it by calling ff_update_block_index().
Instead use a function of our own to set/update dest.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also remove some internal.h inclusions which have been
unnecessarily added recently.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The max == 0 case can be removed too but i left it as 50% of the cases use it
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Due to the lack of an active AMF maintainer at the moment, as well
as plans to add the av1 encoder and other improvements of AMF,
I added myself to the maintainers. Timely review and merging
patches targeting AMF integration should improve support
of AMD GPUs and APUs in FFmpeg.
For the last couple of years I have been working on AMF related
patches to ffmpeg and other open source projects.
added a new option 'a53cc' (on by default, as in libx264) for rendering
AV_FRAME_DATA_A53_CC as hevc sei payloads.
the code is a blend of the libx265.c code for writing
AV_FRAME_DATA_SEI_UNREGISTERED with the libx264.c code for writing atsc
a/53 payloads.
Up until now, the ClearVideo decoder separates parsing tiles
and actually using the parsed information: The information is
instead stored in structures which are constantly allocated
and freed. This commit changes this to use the information
immediately, avoiding said allocations. This e.g. reduced
the amount of allocations for [1] from 2,866,462 to 24,720.
For said sample decoding speed improved by 143%.
[1]: https://samples.ffmpeg.org/V-codecs/UCOD/AccordianDance-300.avi
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
postProcess in postprocess_template.c copies a PPContext
to the stack, works with this copy and then copies
it back again. Said local copy uses a hardcoded alignment
of eight, although PPContext has alignment 32 since
cbe27006ce
(this commit was in anticipation of AVX2 code that never landed).
This leads to misalignment in the filter-(pp|pp1|pp2|pp3|qp)
FATE-tests which UBSan complains about. So avoid the local copy.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
postprocess.c currently has C, MMX, MMXEXT, 3DNow as well as
SSE2 versions of its internal functions. But given that only
ancient 32-bit x86 CPUs don't support SSE2, the MMX, MMXEXT
and 3DNow versions are obsolete and are therefore removed by
this commit. This saves about 56KB here.
(The SSE2 version in particular is not really complete,
so that it often falls back to MMXEXT (which means that
there were some identical (apart from the name) MMXEXT
and SSE2 functions; this duplication no longer exists
with this commit.)
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The input data is multiplied by `s->offset` to get normalized output.
`s->target_tp` and `true_peak` is not in dB,
so `s->offset` should be calculated by division instead of subtraction.
Signed-off-by: Rui Zhu <real.zhurui@gmail.com>
Should fix fate failures on Windowx x86 targets, where long is 32 bits.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: James Almer <jamrial@gmail.com>
The amount of lines printed is too high for the verbose level, and the debug
level is a better fit for their content.
Signed-off-by: James Almer <jamrial@gmail.com>
Some formats like FLV can dynamically add streams during packet reading.
FFprobe does check for this and reallocates the global stream info, but does
not reallocate InputFrame's streams and decoders when this happens, which,
as a result, could have caused flushing to occur on an out of bounds stream
index, since the flush loop iterates over fmt_ctx's nb_streams, and not
ifile's, despite using ifile's streams.
This fixes an out of bounds read and segfult.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Since e6afa61be9, no components in
libavcodec enable CONFIG_MDCT. This fixes building "make testprogs".
Signed-off-by: Martin Storsjö <martin@martin.st>
Add skip_frame support to qsvenc. Use per-frame metadata
"qsv_skip_frame" to control it. skip_frame option defines the behavior
of qsv_skip_frame.
no_skip: Frame skipping is disabled.
insert_dummy: Encoder inserts into bitstream frame where all macroblocks
are encoded as skipped.
insert_nothing: Similar to insert_dummy, but encoder inserts nothing.
The skipped frames are still used in brc. For example, gop still include
skipped frames, and the frames after skipped frames will be larger in
size.
brc_only: skip_frame metadata indicates the number of missed frames
before the current frame.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Without this change, MSDK/VPL always defaults the HEVC tier to MAIN if the -level is specified.
Also, according to the HEVC specs, only level >= 4 can support High Tier.
Signed-off-by: nyanmisaka <nst799610810@gmail.com>
This e.g. allows compilers to bake the offset implied
by using ff_vc1_b_field_mvpred_scales[3] into the
general offset; for certain arches this is also necessary
in order to avoid building suboptimal code.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is the only encoder supporting quarter samples.
This also allows to remove the qpeldsp dependency from
mpegvideo_enc.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The MPEG-4 decoder is the only decoder based upon H.263 that
supports quarterpel motion vectors.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The only thing from the H.263 decoder that is reachable
by the VC-1 decoder is ff_h263_decode_init(); but it does
not even use all of it; e.g. h263dsp is unused and so are
the VLCs initialized in ff_h263_decode_init() (they amount
to about 77KB which are now no longer touched).
Notice that one could also call ff_idctdsp_init()
directly instead of ff_mpv_idct_init(); one could even
do so in ff_vc1_init_common().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Decoders that might use quarter pixel motion estimation
(namely MPEG-4 as well as the VC-1 family) currently
use MpegEncContext.me.qpel_(put|avg) as scratch space
for pointers to arrays of function pointers.
(MotionEstContext contains such pointers as it supports
quarter pixel motion estimation.) The MotionEstContext
is unused apart from this for the decoding part of
mpegvideo.
Using the context at all is for decoding is actually
unnecessary and easily avoided: All codecs with
quarter pixels set me.qpel_avg to qdsp.avg_qpel_pixels_tab,
so one can just unconditionally use this in ff_mpv_reconstruct_mb().
MPEG-4 sets qpel_put to qdsp.put_qpel_pixels_tab
or to qdsp.put_no_rnd_qpel_pixels_tab based upon
whether the current frame is a b-frame with no_rounding
or not, while the VC-1-based decoders set it to
qdsp.put_qpel_pixels_tab unconditionally. Given
that no_rounding is always zero for VC-1, using
the same check for VC-1 as well as for MPEG-4 would work.
Since ff_mpv_reconstruct_mb() already has exactly
the right check (for hpeldsp), it can simply be reused.
(This change will result in ff_mpv_motion() receiving
a pointer to an array of NULL function pointers instead
of a NULL pointer for codecs without qpeldsp (like MPEG-1/2).
It doesn't matter.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
vc1_decode_skip_blocks() is only called if the current picture
is a P frame. So setting pict_type to AV_PICTURE_TYPE_P
is redundant; removing it makes pict_type read-only in vc1_block.c
(as it should be).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The only msmpeg4 code that is ever executed by the VC-1 based
decoders is ff_msmpeg4_decode_init() and what is directly
reachable from it. This is:
a) A call to av_image_check_size(), then ff_h263_decode_init(),
b) followed by setting [yc]_dc_scale_table and initializing
scantable/permutations.
c) Afterwards, some static tables are initialized.
d) Finally, slice_height is set.
The replacement for ff_msmpeg4_decode_init() performs a)
just like now; it also sets [yc]_dc_scale_table,
but it only initializes inter_scantable and intra_scantable
and not permutated_intra_[hv]_scantable: The latter are only
used inside decode_mb callbacks which are only called
in ff_h263_decode_frame() which is unused for VC-1.*
The static tables initialized in c) are not used at all by
VC-1 (the ones that are used have been factored out in
previous commits); this avoids touching 327KiB of .bss.
slice_height is also not used by the VC-1 decoder (setting
it in ff_msmpeg4_decode_init() is probably redundant after
b34397b4cd).
*: It follows from this that the VC-1 decoder is not really
based upon the H.263 decoder either; changing this will
be done in a future commit.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is in preparation for splitting VC-1 from msmpeg4.
(msmpeg4data.c was originally intended to be just this;
9488b966c7 changed it).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is done since 16af29a7a6
(and is actually unnecessary, because the tables initialized
in ff_msmpeg4_decode_init() are only ever used in vc1_block.c
which is only entered after a call to ff_msmpeg4_decode_init())
in a very ugly manner; said manner had the byproduct of
involving lots of unnecessary allocations and even opening
and closing a hwaccel in case one is used.
This commit achieves the aim of 16af29a7a6
by initializing the VLCs used by VC-1 in ff_vc1_init_common().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
VC1 shares some VLCs with MSMPEG-4, but vc1_block.c
simply duplicates the defines instead of including
the appropriate headers; furthermore, use a proper
prefix for these defines: DC_VLC_BITS is also used
by other codecs.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
and include said header at the place where the VLCs are created.
This allows to make said tables static.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
ff_vc1_norm6_spec has been removed in commit
356be9307c (and it seems that it
has never been used); the declarations of the 8x8_zz arrays meanwhile
have been added in f0c02e1cbc
without having ever been defined.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The defines in vc1data.c are duplicates of the ones in vc1data.h;
they are also pointless, because they are not used anywhere.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is unnecessary to initialize the VLCs: The only VLC
that was only ever used by the code reachable from the parser
was ff_vc1_bfraction_vlc; and this VLC has been removed.
Yet vc1dsp is still needed for startcode_find_candidate.
Maybe this should be factored out of vc1dsp in a later
commit.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Added in b50be4e38dc83389925dc14f24fa11e660d32197;
this check was racy back then (as the VLC could be initialized
concurrently) and it is redundant (always-false)
since commit c742ab4e81.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This check has been added in c617bed34f,
merging ee769c6a7c to fix
a possible segfault if AVCodecContext.codec is not set
as it may be during parsing. While this fixes the segfault,
it has the unfortunate side effect that it makes the output
of the parser dependent on whether a decoder is set (and
ultimately available). The fix later applied in
5d2be71b9e does not have this
downside and makes checking AVCodecContext.codec superfluous.
So remove this check.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There are sill many users of these APIs within libav*, so this commit
introduced too many deprecation warnings, making compilation too noisy and
potentially hiding legit warnings.
Once the remaining users are ported, this can be reapplied.
This reverts commit 76d0038579.
The encoder is fixed point, and uses an MDCT only for analysis. Due
to the slightly different rounding, the encoder makes a different
decision, so the tests have to be adjusted as well.
This patch replaces the transform used in AAC with lavu/tx and removes
the limitation on only being able to decode 960-sample files
with the float decoder.
This commit also removes a whole bunch of unnecessary and slow
lifting steps the decoder did to compensate for the poor accuracy
of the old integer transformation code.
Overall float decoder speedup on Zen 3 for 64kbps: 32%
Mostly consistent formatting and consistently ordering of
warnings/notes to be next to the description.
Additionally group the AV_DICT_* macros.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This is a more explicit iteration API rather than using the "magic"
av_dict_get(d, "", t, AV_DICT_IGNORE_SUFFIX) which is not really
trivial to grasp what it does when casually reading through code.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
dts != pts is actually a spec violation for AV1, given it has no
reordering in the classical sense.
We don't really need the whole timestamp queue in this case and can just
pass through the timestamp as is for both dts and pts.
The encoder seems to be trading blows with hevc_nvenc.
In terms of quality at low bitrate cbr settings, it seems to
outperform it even. It produces fewer artifacts and the ones it
does produce are less jarring to my perception.
At higher bitrates I had a hard time finding differences between
the two encoders in terms of subjective visual quality.
Using the 'slow' preset, av1_nvenc outperformed hevc_nvenc in terms
of encoding speed by 75% to 100% while performing above tests.
Needless to say, it always massively outperformed h264_nvenc in terms
of quality for a given bitrate, while also being slightly faster.
Makes it possible to use deinterlacers which output one frame for each field as fallback if field
matching fails (combmatch=full).
Currently, the documented example with fallback on a post-deinterlacer will only work in case the
deinterlacer outputs one frame per first field (as yadif=mode=0). The reason for that is that
fieldmatch will attempt to match the second field regardless of whether it recognizes the end
result is still interlaced. This produces garbled output with for example mixed telecined 24fps and
60i content combined with a field-based deinterlaced such as yadif=mode=1.
This patch orders fieldmatch to revert to using the second field of the current frame in case the
end result is still interlaced and a post-deinterlacer is assumed to be used.
Signed-off-by: lovesyk <lovesyk@users.noreply.github.com>
Negligible speed difference for avx2 on Zen 2 (Ryzen 5700X) and
Broadwell (Xeon E5-2620 v4):
1690±4.3 decicycles vs. 1693±78.4
1439±31.1 decicycles vs 1429±16.7
Moderate speedup with avx512 on Skylake-X (Xeon D-2123IT):
1.22x faster (793±0.8 vs. 649±5.5 decicycles) compared with avx2
Better speedup with avx512icl on Ice Lake (Xeon Silver 4316):
1.77x faster (784±1.8 vs. 442±11.6 decicycles) compared with avx2
Co-authors:
Henrik Gramner <henrik@gramner.com>
Kieran Kunhya <kierank@obe.tv>
In av1_spec.pdf page 38/669, there is a sentence below:
if ( frame_type == KEY_FRAME && show_frame ) {
for ( i = 0; i < NUM_REF_FRAMES; i++) {
RefValid[ i ] = 0
......
}
......
}
This shows that the condition of invalidating current
DPB frames should be the coming frame_type is KEY_FRAME plus
show_frame is equal to 1. Otherwise, some of the frames
in sequence after KEY_FRAME still refer to the reference frames
before KEY_FRAME, and if these before KEY_FRAME reference
frames were invalidated, these frames could not find their
reference frames, and it could cause image corruption.
Mesa fix is in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19386
Reviewed-by: Fei Wang <fei.w.wang@intel.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
The order in which the channels are coded in the bitstream do not always follow
the native, bitmask-based order of channels both signaled by the WAV container
and forced by this same decoder. This is the case with layouts containing an
LFE channel, as it's always coded last.
Fixes ticket #9964.
Signed-off-by: James Almer <jamrial@gmail.com>
Generalize the checks for channels in all positions, and properly support
the three height groups (normal, top, bottom) instead of manually setting
the relevant channels for the latter two after the normal height tags were
parsed.
Signed-off-by: James Almer <jamrial@gmail.com>
If PCE defines channels not covered by those in the standard configurations
then don't try to come up with some made up layout and just return them in the
coded order.
Fixes al08_44.mp4 from the conformance suite, now reporting and decoding all 48
channels instead of 10.
Signed-off-by: James Almer <jamrial@gmail.com>
Set the correct amount of tags in tags_per_config[].
Also, there are no channels that correspond to a side element in this
configuration, so reflect this in the list of known/supported channel layouts.
Signed-off-by: James Almer <jamrial@gmail.com>
The IMF CPL contains an optional timecode start address. This patch reads the
latter, if present, into the context's timecode metadata parameter.
This addresses https://trac.ffmpeg.org/ticket/9842.
The current adjustment of input start times just adjusts the tsoffset.
And it does so, by resetting the tsoffset to nullify the new start time.
This leads to breakage of -copyts, ignoring of input_ts_offset, breaking
of -isync as well as breaking wrap correction.
Fixed by taking cognizance of these parameters, and by correcting start times
just before sync offsets are applied.
Provide arm64 neon optimized implementations for hscale16To19 with
filter sizes 4, 8 and X4.
The tests and benchmarks run on AWS Graviton 2 instances.
The results from a checkasm tool are shown below.
hscale_16_to_19__fs_4_dstW_512_c: 6216.0
hscale_16_to_19__fs_4_dstW_512_neon: 2257.0
hscale_16_to_19__fs_8_dstW_512_c: 10417.7
hscale_16_to_19__fs_8_dstW_512_neon: 3112.5
hscale_16_to_19__fs_12_dstW_512_c: 14890.5
hscale_16_to_19__fs_12_dstW_512_neon: 3899.0
hscale_16_to_19__fs_16_dstW_512_c: 19006.5
hscale_16_to_19__fs_16_dstW_512_neon: 5341.2
hscale_16_to_19__fs_32_dstW_512_c: 36629.5
hscale_16_to_19__fs_32_dstW_512_neon: 9502.7
hscale_16_to_19__fs_40_dstW_512_c: 45477.5
hscale_16_to_19__fs_40_dstW_512_neon: 11552.0
(Note, the checkasm tests for these functions haven't been
merged since they fail on x86.)
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Add arm64 neon implementations for hscale 16 to 15 with filter
sizes 4, 8 and X4.
The tests and benchmarks run on AWS Graviton 2 instances.
The results from a checkasm tool are shown below.
hscale_16_to_15__fs_4_dstW_512_c: 6703.5
hscale_16_to_15__fs_4_dstW_512_neon: 2298.0
hscale_16_to_15__fs_8_dstW_512_c: 10983.0
hscale_16_to_15__fs_8_dstW_512_neon: 3216.5
hscale_16_to_15__fs_12_dstW_512_c: 15526.0
hscale_16_to_15__fs_12_dstW_512_neon: 3993.0
hscale_16_to_15__fs_16_dstW_512_c: 20183.5
hscale_16_to_15__fs_16_dstW_512_neon: 5369.7
hscale_16_to_15__fs_32_dstW_512_c: 39315.2
hscale_16_to_15__fs_32_dstW_512_neon: 9511.2
hscale_16_to_15__fs_40_dstW_512_c: 48995.7
hscale_16_to_15__fs_40_dstW_512_neon: 11570.0
(Note, the checkasm tests for these functions haven't been
merged since they fail on x86.)
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Add arm64 neon implementations for hscale 8 to 19 with filter
sizes 4, 4X and 8. Both implementations are based on very similar ones
dedicated to hscale 8 to 15. The major changes refer to saving
the data - instead of writing the result as int16_t it is done
with int32_t.
These functions are heavily inspired on patches provided by J. Swinney
and M. Storsjö for hscale8to15 which were slightly adapted for
hscale8to19.
The tests and benchmarks run on AWS Graviton 2 instances. The results
from a checkasm tool shown below.
hscale_8_to_19__fs_4_dstW_512_c: 5663.2
hscale_8_to_19__fs_4_dstW_512_neon: 1259.7
hscale_8_to_19__fs_8_dstW_512_c: 9306.0
hscale_8_to_19__fs_8_dstW_512_neon: 2020.2
hscale_8_to_19__fs_12_dstW_512_c: 12932.7
hscale_8_to_19__fs_12_dstW_512_neon: 2462.5
hscale_8_to_19__fs_16_dstW_512_c: 16844.2
hscale_8_to_19__fs_16_dstW_512_neon: 4671.2
hscale_8_to_19__fs_32_dstW_512_c: 32803.7
hscale_8_to_19__fs_32_dstW_512_neon: 5474.2
hscale_8_to_19__fs_40_dstW_512_c: 40948.0
hscale_8_to_19__fs_40_dstW_512_neon: 6669.7
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
The FFmpeg encoder does not increment the temporal reference field, so use
that knowledge plus the extradata field to detect buggy versions of the encoder.
Fixes ticket #128.
The SVQ1 interframe mean VLC symbols -128 and 128 are incorrectly swapped
in our SVQ1 implementation, resulting in visible artifacts for some videos.
This patch unswaps the order of these two symbols.
The most noticable example of the artiacts caused by this error can be observed in
https://trac.ffmpeg.org/attachment/ticket/128/svq1_set.7z '352_288_k_50.mov'.
The artifacts are not observed when using the reference decoder
(QuickTime 7.7.9 x86 binary).
As a result of this patch, the reference data for the fate-svq1 test
($SAMPLES/svq1/marymary-shackles.mov) must be modified. For this file, our
decoder output is now bitwise identical to the reference decoder. I have
tested patch with various other samples and they are all now bitwise identical.
ff_mpeg1_dc_scale_table is the default value for
[yc]_dc_scale_table (as set by ff_mpv_common_defaults()).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This e.g. allows compilers to bake the offset implied
by using ff_mpeg12_dc_scale_table[3] (as the SpeedHQ encoder
does) into the general offset; for certain arches this is
also necessary in order to avoid building suboptimal code.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These tables are only accessed in ff_set_qscale()
which only accesses values 1..31 as well as in
encode_picture() in mpegvideo_enc.c, accessing
the value with index 8. So make these tables smaller.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For encoders, mpeg_quant is an option of the MPEG-4 encoder
and therefore constant. This implies that one can set
the dct_unquantize_(intra|inter) function pointers during init.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The frei0r API requires linesize to be width * 4.
Since the align property of ff_default_get_video_buffer2
specifies line alignment, not buffer alignment, the
previous value of 16 breaks this requirement for frames
whose width is a multiple of 8, but not a multiple of 16
as it adds extra padding to ensure line aligment.
Fix Trac ticket #9873
This basically reverts cc0380222a.
At the time of said commit, cleanup on init failure was very
buggy. For codecs with the INIT_CLEANUP cap, the close function
could be called on error even before the private data has been
allocated; and when using frame threading the same could also
happen even without said flag. Some mpegvideo decoders were
affected by the latter.
Yet both of these issues have been fixed long ago, the latter
in commit e9b6617579. Therefore
the workaround in ff_mpv_common_end() can be removed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It avoids checks and allows to make ff_wmv2_decode_mb() static;
furthermore, it allows to avoid a config_components.h inclusion.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only encoders need two sets of int16_t [12][64]
(one to save the current best state and one for the current
working state); decoders need only one. This saves 1.5KiB
per slice context for a decoder.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is only used by metasound.c, so this allows to make
the data static.
To do this, remove metasound_data.h, rename metasound_data.c
into metasound_data.h (and add inclusion guards, remove ff_
prefixes etc.).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Namely into a header metasound_twinvq_data.h included
in twinvq.c (the common file of MetaSound and TwinVQ).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This can be achieved by moving the AVOnce out of the structure
containing the function pointers; the latter can then be made
const.
This also has the advantage of eliminating padding in the structure
(sizeof(AVOnce) is four here) and allowing the AVOnces to be put
into .bss (dependening upon the implementation).
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is possible to avoid the factors array for the power-of-two
tables for which said array is unused by using a different
structure for initialization for power-of-two tables than for
non-power-of-two-tables. This saves 3*15*16B from .data.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Treat the 32 bit stride registers as signed.
Alternatively, we could make the stride arguments ptrdiff_t instead
of int, and changing all of the assembly to operate on these
registers with their full 64 bit width, but that's a much larger
and more intrusive change (and risks missing some operation, which
would clamp the intermediates to 32 bit still).
Fixes: https://trac.ffmpeg.org/ticket/9985
Signed-off-by: Martin Storsjö <martin@martin.st>
Support for building with older versions of MSVC (with the
c99wrap/c99conv frontend) was removed in
ce943dd6ac.
Signed-off-by: Martin Storsjö <martin@martin.st>
ff_rl_init() initializes RLTable.(max_level|max_run|index_run);
max_run is unused by the SpeedHQ encoder (as well as MPEG-1/2).
Furthermore, it initializes these things twice (for two passes),
but the second half of this is never used.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
ff_rl_init() initializes RLTable.(max_level|max_run|index_run);
max_run is unused by the MPEG-1/2 encoders (as well as SpeedHQ).
Furthermore, it initializes these things twice (for two passes),
but the second half of this is never used.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows to exploit that ff_rl_mpeg1 and ff_rl_mpeg2
only differ in their VLC table.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
ff_rl_mpeg1 and ff_rl_mpeg2 differ only in RLTable.table_vlc,
which ff_rl_init() does not use to initialize RLTable.max_level,
RLTable.max_run and RLTable.index_run. This implies
that these tables agree for ff_rl_mpeg1 and ff_rl_mpeg2;
hence one can just use one of them and avoid calling ff_rl_init()
ff_rl_mpeg2 alltogether.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This will allow to remove ff_rl_mpeg2 soon and remove
all uses of RLTable in MPEG-1/2/SpeedHQ later.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
A two byte sync word is not enough to ensure we got a real syncframe, nor are
all the range checks we do in the first seven bytes. Do therefore an integrity
check for the sync frame in order to prevent the parser from filling avctx with
bogus information.
Signed-off-by: James Almer <jamrial@gmail.com>
Have it only find frame boundaries. The stream props will then be filled once
we have an assembled frame.
Signed-off-by: James Almer <jamrial@gmail.com>
Code freeing the struct on failure is kept for backwards compatibility, but
should be removed in the next major bump, and the existing lavf user adapted.
Signed-off-by: James Almer <jamrial@gmail.com>
for videos with wmv9 rectangles, the region drawn by ff_mss12_decode_rect
may be less than the entire video area. the wmv9 rectangles are used to
calculate the ff_mss12_decode_rect draw region.
Fixes tickets #3255 and #4043
Duplicates of the standard JPEG quantization tables were found in the
AGM, MSS34(dsp), NUV and VP31 codecs. This patch elimates those duplicates,
placing a single copy in jpegquanttables.c.
GCC 11 has a bug: When it creates clones of recursive functions
(to inline some parameters), it clones a recursive function
eight times by default, even when this exceeds the recursion
depth. This happens with encode_block() in libavcodec/svq1enc.c
where a parameter level is always in the range 0..5;
but GCC 11 also creates functions corresponding to level UINT_MAX
and UINT_MAX - 1 (on -O3; -O2 is fine).
Using such levels would produce undefined behaviour and because
of this GCC emits bogus -Warray-bounds warnings for these clones.
Since commit d08b2900a9, certain
symbols that are accessed like ff_svq1_inter_multistage_vlc[level]
are declared with hidden visibility, which allows compilers
to bake the offset implied by level into the instructions
if level is a compile-time constant as it is in the clones.
Yet this leads to insane offsets for level == UINT_MAX which
can be incompatible with the supported offset ranges of relocations.
This happens in the small code model (the default code model for
AArch64).
This commit therefore works around this bug by disabling cloning
recursive functions for GCC 10 and 11. GCC 10 is affected by the
underlying bug (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102513), so the workaround
also targets it, although it only produces three versions of
encode_block(), so it does not seem to trigger the actual issue here.
The issue has been mitigated in GCC 12.1 (it no longer creates clones
for impossible values; see also commit
1cb7fd317c), so the workaround
does not target it.
Reported-by: J. Dekker <jdek@itanimul.li>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: J. Dekker <jdek@itanimul.li>
The current code will override the *_disable fields (set by -vn/-an
options) when creating output streams for unlabeled complex filtergraph
outputs, in order to disable automatic mapping for the corresponding
media type.
However, this will apply not only to automatic mappings, but to manual
ones as well, which should not happen. Avoid this by adding local
variables that are used only for automatic mappings.
Specifically recording_time and stop_time - use local variables instead.
OptionsContext should be input-only to this code. Will allow making it
const in future commits.
This is similar to what was done before for output files and will allow
introducing demuxer-private state in future commits
Unlike for muxing, the code is moved to existing ffmpeg_demux.c rather
than to a new file. The reason is just file size - the demuxing code is
much smaller than muxing.
The next commit and other commits in future will use more ExtParam
buffers.
And combine 2 free functions into single one
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Up until now, av_aes_init() uses a->round_key[0].u8 + t
as dst of memcpy where it is intended for t to greater
than 16 (u8 is an uint8_t[16]); given that round_key itself
is an array, it is actually intended for the dst to be
in a latter round_key member. To do this properly,
just cast a->round_key to unsigned char*.
This fixes the srtp, aes, aes_ctr, mov-3elist-encrypted,
mov-frag-encrypted and mov-tenc-only-encrypted
FATE-tests with (Clang-)UBSan.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The AES code uses av_aes_block, a union consisting of
uint64_t[2], uint32_t[4], uint8_t[4][4] and uint8_t[16].
subshift() performs byte-wise manipulations of two av_aes_blocks,
but when encrypting, it does so with a shift of two bytes;
more precisely, it uses
"av_aes_block *s1 = (av_aes_block *) (s0[0].u8 - s)"
and lateron uses the uint8_t[16] member to access s0.
Yet av_aes_block requires to be suitably aligned for
the uint64_t[2] member, which s0[0].u8 - 2 is certainly
not. This is in violation of 6.3.2.3 (7) of C11. UBSan
reports this in the aes_ctr, mov-3elist-encrypted,
mov-frag-encrypted, mov-tenc-only-encrypted and srtp
tests.
Furthermore, there is another issue here: The pointer points
outside of s0; this works, because all the accesses lateron
use an index >= 3. (Clang-)UBSan reports this as
"runtime error: index -2 out of bounds for type 'uint8_t[16]'".
This commit fixes both of these issues: The latter issue
is fixed by applying an offset of "+ 3" during the cast
and subtracting this from the indices used lateron.
The former issue is solved by not casting to av_aes_block*
at all; instead simply cast to unsigned char*.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For an int array[8][2] using &array[8][0] (which is an int*
pointing to the element beyond the last element of array)
triggers a "runtime error: index 8 out of bounds for type 'int[8][2]'"
from (Clang-)UBSan in the fate-vsynth(1|2|_lena)-snow tests.
I don't know whether this is really undefined behaviour or does not
actually fall under the "pointer arithmetic with the element beyond
the last element of the array is allowed as long as it is not
accessed" exception". All I know is that the code itself does not
read from beyond the last element of the array.
Nevertheless rewrite the code to a form that UBSan does not complain
about.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Namely ScanTable.permutated. The rest of the IDCTDSP-API
is unused as cavs has its own idct.
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is the part of ff_init_scantable() that is used
by all users of said function.
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also rename the scantable variable to idct_permutation
to better reflect what it actually is.
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The eatqi decoder uses a custom IDCT and actually does not
use the IDCTDSP API at all. Somehow it was nevertheless
used to simply apply the identity permutation on ff_zigzag_direct.
This commit stops doing so.
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The eatgq decoder uses a custom IDCT and actually does not
use the IDCTDSP API at all. Somehow it was nevertheless
used to simply apply the identity permutation on ff_zigzag_direct.
This commit stops doing so. It also renames perm to scantable,
because it is only the scantable as given by the spec without
any further permutation performed by us.
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The eamad decoder uses a custom IDCT and actually does not
use the IDCTDSP API at all. Somehow it was nevertheless
used to simply apply the identity permutation on ff_zigzag_direct.
This commit stops doing so.
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
summary: This patch modifies the `curves` filter with new `interp` option
to let user pick the existing natural cubic spline interpolation
and the new PCHIP interapolation.
reason: The natural cubic spline does not impose monotonicity between
the keypoints. As such, the fitted curve may vary wildly against
user's intension. The PCHIP interpolation is not as smooth as
the natural spline but guarantees the monotonicity. Providing
both options enhances users experience (e.g., reduces the number
of keypoints to realize the desired curve). See the related bug
report for the example of an ill-interpolated curve.
alternate solution:
Both Photoshop and GIMP appear to use monotonic interpolation in
their curve tools, which were the models for this filter. As
such, an alternate solution is to drop the natural spline and
go without the `interp` option.
related bug report: https://trac.ffmpeg.org/ticket/9947 (filed by myself)
Signed-off-by: Takeshi (Kesh) Ikuma <tikuma@hotmail.com>
Renaming the decoder to speedhqdec.c makes sense since
we have an encoder in speedhqenc.c. Given that ff_rl_speedhq
is also used by the encoder, it is kept in speedhq.c
and a proper header for it is created to replace the ad-hoc
declaration in speedhqenc.c. This also allows to remove
the check for CONFIG_SPEEDHQ_DECODER, as it is always true
when speedhqdec.c is compiled.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The number of channels that is checked here is automatically
valid because it has just been set by us (based upon an entry
in codec_props).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It just contains duplicates of values from AVCodecContext
as well as an write-only pointer to said AVCodecContext.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: signed integer overflow: 9223372036854550860 + 530259564 cannot be represented in type 'long'
Fixes: 49093/clusterfuzz-testcase-minimized-ffmpeg_dem_ASF_O_fuzzer-4697179192688640
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
codec_id is always AV_CODEC_ID_MPEG4 for mpeg4_decode_mb(),
as the MPEG-4 decoder is the only decoder for which
ff_mpeg4_decode_picture_header() as well as decode_init()
are ever called and these are the only places where
the decode_mb function pointer is ever set to mpeg4_decode_mb().
ff_mpeg4_workaround_bugs() is also only called for the MPEG-4
decoder (the caller checks the codec id).
(ff_mpeg4_decode_picture_header() is also called for the MPEG-4
parser, but it never uses the decode_mb function pointer.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is set in the vol header and is therefore a sequence and
not only a picture property.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is only used by gmc/gmc1 which is only used by the MPEG-4
decoder, so move it to Mpeg4DecContext and rename it
to Mpeg4VideoDSP. Also compile it iff the MPEG-4 decoder is compiled.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
None of the MPEG-1/2 codecs support frame threading,
so one can optimize the check for it away.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This has the advantage of not having to check for whether
a given MpegEncContext is actually a decoder or an encoder
context at runtime.
To do so, mpv_reconstruct_mb_internal() is moved into a new
template file that is included by both mpegvideo_enc.c
and mpegvideo_dec.c; the decoder-only code (mainly lowres)
are also moved to mpegvideo_dec.c. The is_encoder checks are
changed to #if IS_ENCODER in order to avoid having to include
headers for decoder-only functions in mpegvideo_enc.c.
This approach also has the advantage that it is easy to adapt
mpv_reconstruct_mb_internal() to using different structures
for decoders and encoders (e.g. the check for whether
a macroblock should be processed for the encoder or not
uses MpegEncContext elements that make no sense for decoders
and should not be part of their context).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Up until now, we inlined lowres_flag as well as is_mpeg12
independently (unless CONFIG_SMALL was true); this commit
changes this to instead inline mpv_reconstruct_mb_internal()
(at most) four times, namely once for encoders, once for decoders
using lowres and once for non-lowres mpeg-1/2 decoders and once
for non-lowres non-mpeg-1/2 decoders (mpeg-1/2 is not inlined
in case of CONFIG_SMALL). This is neutral performance-wise,
but proved beneficial size-wise: It saved 1776B of .text
for GCC 11 or 1344B for Clang 14 (both -O3 x64).
Notice that inlining is_mpeg12 for is_encoder would not really
be beneficial, as the encoder codepath does mostly not depend
on is_mpeg12 at all.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There are two types of checks for whether the current codec
is MPEG-1/2 in mpv_reconstruct_mb_internal(): Those that are
required for correctness and those that are not; an example
of the latter is "is_mpeg12 || (s->codec_id != AV_CODEC_ID_WMV2)".
The reason for the existence of such checks is that
mpv_reconstruct_mb_internal() has the av_always_inline attribute
and is_mpeg12 is usually inlined, so that in case we are dealing
with MPEG-1/2 the above check can be completely optimized away.
But is_mpeg12 is not always inlined: it is not in case
CONFIG_SMALL is true in which case is_mpeg12 is always zero,
so that the checks required for correctness need to check
out_format explicitly. This is currently done via a macro
in mpv_reconstruct_mb_internal(), so that the fact that
it is CONFIG_SMALL that determines this is encoded at two places.
This commit changes this by making is_mpeg12 a three-state:
DEFINITELY_MPEG12, MAY_BE_MPEG12 and NOT_MPEG12. In the second
case, one has to resort to check out_format, in the other cases
is_mpeg12 can be taken at face-value. This will allow to make
inlining is_mpeg12 more flexible in a future commit.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Both the FFV1 decoder and encoder use a template of their own
to generate code multiple times. They also use a common template,
used by both decoder and encoder templates which is currently
instantiated in ffv1.h (and therefore also in ffv1.c, which
doesn't need it at all).
All these templates have the prerequisite that two macros
are defined, namely RENAME() and TYPE. The codec-specific
templates call the functions generated via the common template
via the RENAME() macro and therefore the macros used for
the common template must coincide with the macros used for
the codec-specific templates. But then it is better to not
instantiate the common template in ffv1.h, but in the codec
specific templates.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Some parts of mpegvideo.c behave differently depending
upon whether AVCodecContext.draw_horiz_band is set or not.
This differing behaviour makes lots of FATE tests fail
and leads to garbage output, although setting this callback
is not supposed to change the output at all.
These checks have been added in commits
3994623df2 and
b68ab2609c. The commit messages
do not contain a real reason for adding the checks and it is
indeed a mystery to me. But removing these checks fixes
the FATE tests when one adds an (empty) draw_horiz_band
when using a codec that claims to support it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
librav1e provides a function to create extradata, so use it instead of
extracting the sequence header OBU from packets.
Signed-off-by: James Almer <jamrial@gmail.com>
Up until now, ff_startcode_find_candidate_c() simply casts
an uint8_t* to uint64_t*/uint32_t* to read 64/32 bits at a time
in case HAVE_FAST_UNALIGNED is true. Yet this ignores the
alignment requirement of these types as well as effective type
rules of the C standard. This commit therefore replaces these
direct accesses with AV_RN64/32; this also improves
readability.
UBSan reported these unaligned accesses which happened in 233
FATE-tests involving H.264 and VC-1 (this has also been reported
in tickets #8138 and #8485); these tests are fixed by this commit.
The output of GCC with -O3 is unchanged for aarch64, loongarch,
ppc and x64 (as well as for arches like alpha for which
HAVE_FAST_UNALIGNED is never true in the first place).
There was only a slight difference for mips and arm.
I don't know about the speed impact of them.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Now that we have proper options for defining display matrix
overrides, this should no longer be required.
fftools does not have its own versioning, so for now the define is
just set to 1 and disables the functionality if set to zero.
This enables overriding the rotation as well as horizontal/vertical
flip state of a specific video stream on the input side.
Additionally, switch the singular test that was utilizing the rotation
metadata to instead override the input display rotation, thus leading
to the same result.
This e.g. allows compilers to bake the "+ 256" offset
used to access ff_v2_dc_(lum|chroma)_table into
the general offset; for certain arches this is also necessary
in order to avoid building suboptimal code.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These tables are not exported as avpriv symbols, but instead
included into every library using them. Therefore they
can be mark with the hidden elf visibility. For certain arches
this is necessary in order to avoid building suboptimal code;
for other arches it just allows the compiler to simplify accesses
like ff_mjpeg_bits_dc_luminance + 1 because the "+ 1" can be baked
into the offset.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is now possible since OutputStream is a child of OutputFile and the
code allocating it can access MuxStream. Avoids the overhead and extra
complexity of allocating two objects instead of one.
Similar to what was previously done for OutputFile/Muxer.
Replace it with an array of streams in each OutputFile. This is a more
accurate reflection of the actual relationship between OutputStream and
OutputFile. This is easier to handle and will allow further
simplifications in future commits.
This is now possible since the code allocating OutputFile can see
sizeof(Muxer). Avoids the overhead and extra complexity of allocating
two objects instead of one.
Similar to what is done e.g. for AVStream/FFStream in lavf.
ffmpeg_opt.c currently contains code for
- parsing the options provided on the command line
- opening and initializing input files based on these options
- opening and initializing output files based on these options
The code dealing with each of these is for the most part disjoint, so it
makes sense to move them to separate files. Beyond reducing the quite
considerable size of ffmpeg_opt.c, this will also allow exposing muxer
internals (currently private to ffmpeg_mux.c) to the initialization
code, thus removing the awkward separation currently in place.
qsvenc makes a copy when the input in system memory is not padded as the
SDK requires, however the padding area is not filled with right data
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Doxygen does not properly form references that span multiple levels,
so instead reword it a bit and manually add the references to what
they should point to.
The FF_PAD_STRUCTURE is too complex for doxygen to be able to
properly handle, resulting in completely broken AVBPrint documentation.
To fix that, tell Doxygen what to expand that macro to.
Avoids doxy to interpret these as internal references forced by the #
character, fixing the warnings:
warning: explicit link request to 'nanoTime()' could not be resolved
warning: explicit link request to 'releaseOutputBuffer(int,long)'
could not be resolved
The intention here was probably to document this as use of
conditionals does not make sense in a comment.
Fixes doxy warning:
warning: explicit link request to 'if' could not be resolved
Separate the blocks to make the grouping easier to grasp,
add the file properly to the group and fix the file description
incorrectly used as filename, fixing the following doxy warning:
warning: the name 'Colorspace' supplied as the argument in
the \file statement is not an input file
Make it a bit easier to grasp the grouping when not
unnecessarily splitting comment blocks.
Additionally do not try to add lavu_video_stereo3d to itself, resolving
the following doxy warning:
warning: Refusing to add group lavu_video_stereo3d to itself
Make it a bit easier to grasp the grouping when not
unnecessarily splitting comment blocks.
Additionally do not try to add lavu_video_spherical to itself, resolving
the following doxy warning:
warning: Refusing to add group lavu_video_spherical to itself
Make it a bit easier to grasp the grouping when not
unnecessarily splitting comment blocks.
Additionally do not try to add lavu_video_display to itself, resolving
the following doxy warning:
warning: Refusing to add group lavu_video_display to itself
This is more natural, because said context is only used
for the duration of one call to svq1_encode_frame().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The encoder uses ff_svq1_inter_mean_vlc + 256 and setting
hidden visibility allows to bake this "+ 256" into the
general offset of ff_svq1_inter_mean_vlc and the code
accessing it.
For certain arches, this is also required for the compiler
to not produce overtly pessimistic code that can't be fixed
up by the linker lateron.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Currently, SVQ1EncContext is defined in a header that is also
included by the arch-specific code that initializes the one
and only dsp function that this encoder uses directly.
But the arch-specific functions to set this dsp function
do not need anything from SVQ1EncContext. This commit therefore
adds a small SVQ1EncDSPContext whose only member is said
function pointer and renames svq1enc.h to svq1encdsp.h
to avoid exposing unnecessary internals to these init
functions (and the whole mpegvideo with it).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Bayer sources are read in groups of 2 lines (e.g. for a
BGGR flavor, the first row contains only B and G samples,
while the second row contains only G and R samples). They
need to be read as a whole.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This fixes a regression from commit 36117968ad.
wrapped_url_read() used to be able to return positive number from
ffurl_read(). It relies on the result to check if EOF is reached in
async_buffer_task().
But FIFO callbacks must return 0 on success. This should be handled
in ring_write() instead.
Test case:
ffmpeg -f lavfi -i testsrc -t 1 test.mp4
ffmpeg -i async:test.mp4
Signed-off-by: Guangyu Sun <gsun@roblox.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This simplifies the code as there is no other place the error buffer
is needed, so the av_err2str helper macro can be used.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
av_err2str which is a wrapper for av_strerror already calls
strerror_r if available and if not has a fallback for the other
error codes that would be handled by that, so manually calling
strerror again if it fails is not necessary.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
av_err2str which is a wrapper for av_strerror already calls
strerror_r if available and if not has a fallback for the other
error codes that would be handled by that, so manually calling
strerror again if it fails is not necessary.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This filter currently keeps the input timebase, but produces CFR output.
It is thus simpler to use 1/output fps as the output timebase.
Also, set output frame durations.
Currently it would essentially change the find_stream_info setting for
the file it was specified for and all following files, which is unusual
and somewhat unexpected behaviour for a per-file option and not even
documented to behave like this.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
VSETVLI xd, x0, ...' has rather nonobvious semantics:
- If xd is x0, then it preserves the current vector length.
- If xd is not x0, it sets the vector length to the supported maximum.
Also somewhat confusingly, while VMV.X.S always does its thing
regardless of the selected vector length, VMV.S.X does _nothing_ if the
selected vector length is zero.
So the current code breaks fails to initialise the accumulator if we
are unlucky to have a selected vector length of zero on entry. Fix it
by forcing the vector length to one.
The binkaudio decoders don't need mdct or sinewin at all;
and binkaudio_dct doesn't need rdft directly (but nevertheless
uses it indirectly via dct).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Add an AV_PIX_FMT_NE macro for RGB32FBE/RGB32FLE and also one for
RGBA32FBE/RGBA32FLE for packed 32-bit float RGB samples, and also
packed 32-bit float RGBA samples, respectively.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Leo Izen <leo.izen@gmail.com>
There is no MMX code for (add|put|put_signed)_pixels_clamped
since commit bfb28b5ce8, so use
declare_func instead of declare_func_emms() to also test that
we are not in MMX mode after return.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There is no MMX code for diff_bytes since commit
230ea38de1, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There is no MMX code for add_int16 since commit
4b6ffc2880, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There is no MMX code for llviddsp after commit
fed07efcde, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There is no MMX code for pixblockdsp after commit
92b5800277, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There is no MMX code for audiodsp after commit
3d716d38ab, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There is no MMX code for blockdsp after commit
ee551a21dd, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There is no MMX code for vc1_inv_trans_8x8 or
vc1_unescape_buffer, so use declare_func instead of
declare_func_emms() to also test that we are not in MMX
mode after return.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Currently it is done in several different ways, which
might cause needless dependencies or in case of
tx_float_neon.S is incorrect.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Currently, it is accessed via a pointer (ff_celt_window)
exported from opustab.h which points inside a static array
(ff_celt_window_padded) in opustab.h. Instead export
ff_celt_window_padded directly and make opustab.h
a static const pointer pointing inside ff_celt_window_padded.
Also mark all the declarations in opustab.h as hidden,
so that the compiler knows that ff_celt_window has a fixed
offset from the code even when compiling position-independent
code.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
HEVC spec has CRA frame which allows random access with open GOP, hence
it can achieve higher compression efficiency.
Removing the entry was suggested by Andreas
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
AV_PIX_FMT_P012, AV_PIX_FMT_Y212 and AV_PIX_FMT_XV36 are used in
FFmpeg and MFX_FOURCC_P016, MFX_FOURCC_Y216, and MFX_FOURCC_Y416 are used
in the SDK
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
P012, Y212 and XV36 are used for 12bit content in FFmpeg VAAPI, so
these formats should be used in FFmpeg QSV too, however the SDK only
declares support for P016, Y216 and Y416. So this commit fudged mappings
between AV_PIX_FMT_P012 and MFX_FOURCC_P016, AV_PIX_FMT_Y212 and
MFX_FOURCC_Y216, AV_PIX_FMT_XV36 and MFX_FOURCC_Y416.
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
XV30 is used for 10bit 4:4:4 content in FFmpeg VAAPI, so XV30 should be
used for 10bit 4:4:4 content in FFmpeg QSV too because QSV is based on
VAAPI on Linux. However the SDK only declares support for Y410 but does
nothing with the alpha in Y410, so this commit fudged a mapping between
AV_PIX_FMT_XV30 and MFX_FOURCC_Y410.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
We can't get Shift from bit depth for some formats in the SDK. For
example, bit depth is 10, however Shift is 0 for Y410 (XV30 in FFmpeg).
In order to support these formats in the next commits, this patch
specified Shift for each format
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Although the DSP function only uses single precision from RISC-V F, the
caller may leave double precision values in the spilled registers if the
calling convention supports double precision hardware floats. Then, we
need to save and restore FS registers as double precision.
Conversely, we do not need to save anything at all if an integer calling
convention is in use. However we can assume that single precision floats
are supported, since the Zve32f extension implies the F extension.
So for the sake of simplicity, we always save at least single precision
values.
In theory, we should even save quadruple precision values if the LP64Q
ABI is in use. I have yet to see a compiler that supports it though.
This adds a variant of the postfilter for use with 512-bit vectors.
Half a vector is enough to perform the scalar product. Normally a whole
vector would be used anyhow. Indeed fractional multiplers are no faster
than the unit multipler.
But in this particular function, a full vector makes up 16 samples,
which would be loaded at each iteration of the outer loop. The minimum
guaranteed CELT postfilter period is only 15. Accounting for the edges,
we can only safely preload up to 13 samples.
The fractional multipler is thus used to cap the selected vector length
to a safe value of 8 elements or 256 bits.
Likewise, we have the 1024-bit variant with the quarter multipler. In
theory, a 2048-bit one would be possible with the eigth multipler, but
that length is not even defined in the specifications as of yet, nor is
it supported by any emulator - forget actual hardware.
This adds a variant of the postfilter for use with 256-bit vectors.
As a single vector is then large enough to perform the scalar product,
the group multipler is reduced to just one at run-time.
The different vector type is passed via register. Unfortunately,
there is no VSETIVL instruction, so the constant vector size (5) also
needs to be passed via a register.
On most cases, the vector type (VTYPE) for the RISC-V Vector extension
is supplied as an immediate value, with either of the VSETVLI or
VSETIVLI instructions. There is however a third instruction VSETVL
which takes the vector type from a general purpose register. That is so
the type can be selected at run-time.
This introduces a macro to load a (valid) vector type into a register.
The syntax follows that of VSETVLI and VSETIVLI, with element size,
group multiplier, then tail and mask policies.
This is implemented for a vector size of 128-bit. Since the scalar
product in the inner loop covers 5 samples or 160 bits, we need a group
multipler of 2.
To avoid reconfiguring the vector type, the outer loop, which loads
multiple input samples sticks to the same multipler. Consequently, the
outer loop loads 8 samples per iteration. This is safe since the minimum
period of the CELT codec is 15 samples.
The same code would also work, albeit needlessly inefficiently with a
vector length of 256 bits. A proper implementation will follow instead.
Fixes the following integer overflows:
libavfilter/vf_rotate.c:273:13: runtime error: signed integer overflow: 92951468 + 2058533568 cannot be represented in type 'int'
libavfilter/vf_rotate.c:273:37: runtime error: signed integer overflow: 39684 * 54149 cannot be represented in type 'int'
libavfilter/vf_rotate.c:272:13: runtime error: signed integer overflow: 247587320 + 1900985032 cannot be represented in type 'int'
libavfilter/vf_rotate.c:272:37: runtime error: signed integer overflow: 42584 * 50430 cannot be represented in type 'int'
libavfilter/vf_rotate.c:272:50: runtime error: signed integer overflow: 65083 * 52912 cannot be represented in type 'int'
libavfilter/vf_rotate.c:273:50: runtime error: signed integer overflow: 65286 * 38044 cannot be represented in type 'int'
Fixes ticket #9799, different output with different compilers.
Since commit 4fc2531fff opus.c
contains only the celt stuff shared between decoder and encoder.
meanwhile, opus_celt.c is decoder-only. So the new names
reflect the actual content better than the current ones.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The PutBitContext has already been flushed a few lines above
and nothing has been written to it in the meantime.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
(There is a small issue that is now being treated differently:
The earlier code would record a position in a buffer that
is being written to via put_bits(), then write data,
then overwrite the byte at the position recorded earlier
and only then flush the PutBitContext. In case there was
no writeout in the meantime, said flush would overwrite
what one has just written. This never happened in my tests,
but maybe it can happen. In this case this commit fixes
this issue by flushing before overwriting the old data.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
ff_square_tab is always used with an offset; if this table
is marked as hidden, the compiler can infer that it and
therefore also ff_square_tab + 256 have a fixed offset
from the code. This allows to avoid performing "+ 256"
at runtime by baking it into the offset from the code to
the table.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This codec uses BswapDSP, BlockDSP and IDCTDSP.
The former never used MMX, the latter does not use it
for idct_put since bfb28b5ce8
and BlockDSP does not use it since commit
ee551a21dd.
Therefore this emms_c() is can be removed.
(It was actually always redundant, because its caller
(decode_simple_internal()) calls emms_c() itself afterwards.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These matrices are only used for MJPEG, not for LJPEG.
So only check them for the former. This is in preparation
for removing said matrices from LJPEG altogether
(i.e. sending NULL matrices).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The codes here have the property that the long codes
are to the left of the tree (each zero bit child node
is by definition to the left of its one bit sibling);
they also have the property that among codes of the same length,
the symbol is ascending from left to right.
These properties can be used to create the codes from
the lengths in only two passes over the array of lengths
(the current code uses one pass for each length, i.e. 32):
First one counts how many nodes of each length there are.
Then one calculates the range of codes of each length
(possible because the codes are ordered by length in the tree).
This enables one to calculate the actual codes with only
one further traversal of the length array.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
While the share of elements used by both is quite big, the amount
of code shared between the decoders and encoders is negligible.
Therefore one can easily split the context if one wants to.
The reasons for doing so are that the non-shared elements
are non-negligible: The stats array which is only used by
the encoder takes 524288B of 868904B (on x64); similarly,
pix_bgr_map which is only used by the decoder takes 16KiB.
Furthermore, using a shared context also entails inclusions
of unneeded headers like put_bits.h for the decoder and get_bits.h
for the encoder (and all of these and much more for huffyuv.c).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These parameters are easily accessible whereever they
are accessed, so using copies from HYuvContext is
unnecessary.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
av_pix_fmt_get_chroma_sub_sample() is superfluous if one
already has an AVPixFmtDescriptor.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is nearly unused anyway, so stop use the field altogether.
This is in preparation for splitting HYuvContext into
decoder and encoder contexts.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
All codecs here have the FF_CODEC_CAP_INIT_CLEANUP set,
so ff_huffyuv_common_end() will be called automatically
in encode_end() on error.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The ffvhuff encoder has AVCodec.pix_fmts set and therefore
encode_preinit_video() checks that the used pixel format
is permissible.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
avio_seek() is called inside check(). Seeking to 'off' then seeking
to 'off + i' is unefficient, and it can loop 64 * 1024 times in the
worst case. When probe a malformed file over HTTP, it looks like
stucked forvever. ffio_ensure_seekback() doesn't solve the issue
when the stream is seekable but slow.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Add PCR at keyframe can be undesirable when -pcr_period is
specified. Add an flag to disable this behavior.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
There is no MMX code for loop filters since commit
6a551f1405, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
PixblockDSP does not use MMX functions any more since
92b5800277 and FDCTDSP
since d402ec6be9.
BswapDSP never used MMX, so that the emms_c() here
is unnecessary.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This also enables the compiler to optimize the implicit
checks performed by the PutBit-API away (Clang does so).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The current pbc might be small for an obu frame, so a new pbc is
required then parse this obu frame again. Because
CodedBitstreamAV1Context has already been updated for this obu frame, we
need to restore CodedBitstreamAV1Context, otherwise
CodedBitstreamAV1Context doesn't match this obu frame when parsing obu
frame again, e.g. CodedBitstreamAV1Context.order_hint.
$ ffmpeg -i input.ivf -c:v copy -f null -
[...]
[av1_frame_merge @ 0x558bc3d6f880] ref_order_hint[i] does not match
inferred value: 20, but should be 22.
[av1_frame_merge @ 0x558bc3d6f880] Failed to write unit 1 (type 6).
[av1_frame_merge @ 0x558bc3d6f880] Failed to write packet.
[obu @ 0x558bc3d6e040] av1_frame_merge filter failed to send output
packet
Reviewed-by: James Almer <jamrial@gmail.com>
Reviewed-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
It does not require anything that is being set between
the new position where it is called and the old position
where it used to be called; and nothing that it sets
gets overwritten between these two positions.
Doing so allows to remove a check lateron.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It does not require anything that is being set between
the new position where it is called and the old position
where it used to be called; and nothing that it sets
gets overwritten between these two positions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Thanks, Pierre-Anthony. I've updated the patch to remove the unnecessary UL and it's now using mxf_match_uid() to detect the EKLV packet.
Signed-off-by: Richard Ayres <richard.ayres@bydeluxe.com>
Using unsigned and negative linesizes doesn't really work.
Use ptrdiff_t instead. This fixes the fraps-v0 and fraps-v1
FATE tests with negative linesizes.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Use ptrdiff_t instead of unsigned for linesizes.
Fixes the armovie-escape124 FATE test with negative linesizes.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
c93.c used an int for the stride and an unsigned for the current
linenumber. This does not work when using negative linesizes.
So use ptrdiff_t for stride and int for linenumber.
This fixes the cyberia-c93 FATE test when using negative linesizes.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The data in SGI images is stored planar, so exporting
it via planar pixel formats is natural.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SGI is intra-frame only; the decoder therefore does not
maintain any state between frames, so remove
the private context.
Also rename depth to nb_components.
Reviewed-by: Tomas Härdin <git@haerdin.se>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The earlier code used "p->data[0] + p->linesize[0] * s->height" with
the latter being unsigned, which gives the wrong value for negative
linesizes. There is also a not so obvious problem with this:
In case of negative linesizes, the last line is the start of
the allocated buffer, so using the line after the last line
would involve undefined pointer arithmetic. So don't do it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes segfaults with negative linesizes; in particular,
this affected the sunraster-(1|8|24)bit-(raw|rle) and
sunraster-8bit_gray-raw FATE tests.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
A lot of the stuff in ASV1Context is actually only used
by decoders or encoders, but not both: Of the seven contexts
in ASV1Context, only the BswapDSPContext is used by both.
So splitting makes sense.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Simply taking the Zbb REV8 instruction into use in a simple loop gives
some significant savings:
bswap_buf_c: 1081.0
bswap_buf_rvb_b: 771.0
But we can also use the 64-bit REV8 as a pseudo-SIMD instruction with
just one additional shift, and one fewer load, effectively doubling the
bandwidth. Consequently, this patch is useful even if the compile-time
target has Zbb enabled for C code:
bswap_buf_c: 1081.0
bswap_buf_rvb_b: 341.0 (this patch)
On the other hand, this approach fails miserably for bswap16_buf as the
ratio of shifts and stores becomes unfavorable compared to naïve C:
bswap16_buf_c: 1542.0
bswap16_buf_rvb_b: 1803.7
Unrolling to process 128 bits (4 samples) at a time actually worsens
performance ever so slightly:
bswap_buf_c: 1081.0
bswap_buf_rvb_b: 408.5
Unfortunately, it is common, and will remain so, that the Bit
manipulations are not enabled at compilation time. This is an official
policy for Debian ports in general (though they do not support RISC-V
officially as of yet) to stick to the minimal target baseline, which
does not include the B extension or even its Zbb subset.
For inline helpers (CPOP, REV8), compiler builtins (CTZ, CLZ) or
even plain C code (MIN, MAX, MINU, MAXU), run-time detection seems
impractical. But at least it can work for the byte-swap DSP functions.
To avoid data dependencies, this does the following unroll, which
requires one extra but probably free addition:
coeff = (b * left_weight) >> decorr_shift;
b += a;
a -= coeff;
b -= coeff;
swap(a, b);
We never guard against a user freeing/stealing the private context;
and returning AVERROR(EIO) is inappropriate.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Move ROUND_MUL* macros to their only users and the Celt
macros to opus_celt.h. Also improve the other headers
a bit while at it.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Simply don't include opus_pvq.h in opus_celt.h: The latter only
uses pointers to CeltPVQ.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
opus.h (which is used by all the Opus code) currently includes
several structures only used by the parser and the decoder;
several elements of OpusContext are even only used by the decoder.
This commit therefore moves the part of OpusContext that is shared
between these two components (and used by ff_opus_parse_extradata())
out into a new structure and moves all the other accompanying
structures and functions to a new header, opus_parse.h; the
functions itself are also moved to a new file, opus_parse.c.
(This also allows to remove several spurious dependencies
of the Opus parser and encoder.)
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
More than 2 channels seems unsupported, the code seems to just output empty extra channels
Fixes: Timeout
Fixes: 51569/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SPEEX_fuzzer-5511509165342720
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 119760682 - -2084600173 cannot be represented in type 'int'
Fixes: 50993/clusterfuzz-testcase-minimized-ffmpeg_dem_VIVIDAS_fuzzer-6745781167587328
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Otherwise p->linesize[0] * y will be evaluated as an unsigned
which leads to segfaults in case linesize is negative.
This happens in the apng-dispose-previous FATE-test in case
one makes get_buffer return pictures with negative linesizes.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This might happen in avio_write() if size == 0
when the direct codepath is taken. It is undefined behaviour
according to the spec although it happens to work in practice.
Fixes the webm-webvtt-remux FATE-test under UBSan.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
partial_corr is an int16_t and so the av_clipl_int32()
never clips and can be removed. This also avoids
undefined left-shifts of negative numbers.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Return immediately if not enough leftover bits are available
when flushing. This is simpler and also avoids an
init_get_bits(gb, NULL, 0) (which currently leads to NULL + 0,
which is UB; this affects the lossless-wma(|-1|-2|-rawtile)
FATE tests).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Happens when flushing. This triggers NULL + 0 (which is UB) in
init_get_bits_xe (which previously errored out, but the return value
has not been checked) and in copy_bits().
This fixes the wmavoice-(7|11|19)k FATE-tests with UBSan.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
src2 is used in CAVS_SUBPIX_HV iff FULL is true (it is exactly
for the egpr functions); otherwise it might be NULL. So check
for FULL before doing pointer arithmetic.
Fixes a "src/libavcodec/cavsdsp.c:524:1: runtime error: applying
non-zero offset 8 to null pointer" from UBSan.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
As long as ff_mpeg12_common_init() existed in mpeg12.c,
it added a dependency of mpeg12.o on mpegvideodata.o
(which provides ff_mpeg2_dc_scale_table, which is used
in ff_mpeg12_common_init()). mpegvideodata.o is normally
provided by the mpegvideo subsystem and therefore several
codecs and the MPEG-1/2 parser added a configure dependency
on said subsystem (additionally, the eatqi decoder just
added a Makefile dependency on mpegvideodata.o).
Given that ff_mpeg12_common_init() is no more, these dependencies
can be removed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It only sets [yc]_dc_scale_table and these tables are only
read in ff_set_qscale(); but the MPEG-1/2 decoders don't call
ff_set_qscale() at all.
(Furthermore, given that intra_dc_precision is always zero
for a decoder at this point, ff_mpeg12_common_init()
actually set these pointers to what ff_mpv_common_defaults()
already set them.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is better place for these declarations than
mpeg12data.h as RL VLC are just a variant of VLCs.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Starting with an h264 implementation. Can be extended to support other codecs.
A few caveats:
- OpenGOP streams are currently not supported. The firt packet must be an IDR
frame.
- In some streams, a few frames at the end may not get a reordered PTS when
they reference frames past EOS. The code added to derive timestamps from
previous frames needs to extended.
Addresses ticket #502.
Signed-off-by: James Almer <jamrial@gmail.com>
Provide optimized implementation for vsse_intra8 for arm64.
Performance tests are shown below.
- vsse_5_c: 87.7
- vsse_5_neon: 26.2
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Co-authored-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide optimized implementation of vsse8 for arm64.
Performance comparison tests are shown below.
- vsse_1_c: 141.5
- vsse_1_neon: 32.5
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Add vectorized implementation of nsse8 function.
Performance comparison tests are shown below.
- nsse_1_c: 256.0
- nsse_1_neon: 82.7
Benchmarks and tests run with checkasm tool on AWS Graviton 3.
Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide optimized implementation of pix_abs8 function for arm64.
Performance comparison tests are shown below:
pix_abs_1_1_c: 162.5
pix_abs_1_1_neon: 27.0
pix_abs_1_2_c: 174.0
pix_abs_1_2_neon: 23.5
pix_abs_1_3_c: 203.2
pix_abs_1_3_neon: 34.7
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Co-authored-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Apparently this option was intended (the context contains a
currently-unused frame_rate field), but was never added. This results in
the output timebase being unset after config_output(), so the input
audio timebase ends up being used for video output, which is clearly
wrong.
Add an option for setting output video framerate. Also set output frame
durations.
It has been deprecated in favor of the aresample filter for almost 10
years.
Another thing this option can do is drop audio timestamps and have them
generated by the encoding code or the muxer, but
- for encoding, this can already be done with the setpts filter
- for muxing this should almost never be done as timestamp generation by
the muxer is deprecated, but people who really want to do this can use
the setts bitstream filter
The butterflies_fixed function pointer declaration specifies av_restrict
for the first two pointer arguments. So the corresponding function
definitions should honor this declaration.
MSVC emits warning C4113 for this.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Users can't make anything with its content.
Making it opaque might allow us to avoid one level of indirection.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
vorbis.h currently contains stuff only used by the native
Vorbis codecs and some Vorbis tables, which are also used by
Opus and libvorbis. Therefore split the data out into a header
of its own.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Don't increment back_frame if it does not correspond
to a real buffer. To do this, handle copying from
the back frame separately from the "use coded value"
codepath; also use memcpy for the former, as the
chunks here are typically worth it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This check is intended to be avoid buffer overflows,
yet there are four problems with it:
1. It has an in-built off-by-one error: len == out_end - out
is perfectly fine and nothing to worry about.
This off-by-one error led to the pixel in the lower-right corner
not being set properly for the back frame of the sample from
the rl2 FATE-test. This pixel is copied to every frame which
is the reason for the update to the reference file of said test.
With this patch, the output of the decoder matches the output
as captured from the reference decoder* (apart from the fact
that said reference somehow lacks the top part of the frame
(copied over from the background frame)).
2. Given that the stride of the buffer may be different
from the width of the video (despite one pixel taking one byte),
there is a second check lateron making the first check redundant
(if one returns immediately; a simple break at the second check
is not sufficient, because it only exits the inner loop).
3. The check is based around the assumption of the stride being
positive (it has this in common with the other check which
will be fixed in a future commit).
4. Even after fixing the off-by-one error, the check in
question is still triggered by all the non-background frames
in the FATE sample as well as by A1100100.RL2. In all these
cases, they use len == 255 and val == 128. For videos with
background frame this just means "copy from the background
frame", which would be done anyway lateron.* Yet for videos
without it copying it is necessary to avoid leaving
uninitialized parts in the video.
*: Available in https://samples.mplayerhq.hu/game-formats/voyeur-rl2/
**: Due to this, the code that copies the rest from the
back frame is no longer executed for any of the samples
available on the sample server. Given that these are only
the files from the demo version of this game, I don't know
whether this code is executed for any file in existence or not.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is only used by three of the thirty files that (potentially
indirectly) include mpeg4audio.h. Twenty of these files won't
have a put_bits.h inclusion any more after this patch.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows av_mediacodec_release_buffer to be called safely after
the decoder is closed, this was already the case with delay_flush=1.
Note that this causes holding onto frames to keep the decoding context
alive which is generally considered to be the intended behavior.
Signed-off-by: sfan5 <sfan5@live.de>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Namely to lavu/tests/pixelutils.c. This way, this function will
not be included into actual binaries any more.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Instead use av_pix_fmt_desc_next(). It is still possible
to check its return values by comparing it with the
(currently) expected values and the code does so.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
ff_check_pixfmt_descriptors() was added in commit
20e99a9c10. At this time,
the values of enum AVPixelFormat were not contiguous;
instead there was a jump from 111 to 291 (or from 115
to 295 depending upon AV_PIX_FMT_ABI_GIT_MASTER).
ff_check_pixfmt_descriptors() accounts for this
by skipping empty descriptors. Yet this issue no longer
exists: There are no holes.
The check for said holes makes GCC believe that the name
can be NULL; because it is used as argument corresponding to
%s in a log statement, it therefore emits a warning
(since d75c4693fe). Therefore
this commit simply removes these checks.
Also move the checks for name before the log statement.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is currently 64-bit only because the stack spilling code would not
assemble on RV32I (and it would corrupt s0 and s1 on RV128I, in theory).
This could be added later in the unlikely that someone wants it.
Also remove a variable that is only used in this commented-out
codeblock. This fixes a -Wunused-variable warning.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Set preset default value to MFX_TARGETUSAGE_UNKNOWN. Let runtime to
decide the targetUsage, so that ffmpeg-qsv can keep up with runtime's
update.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Unset qsv_h264 and qsv_hevc's default settings. Let runtime to decide
these parameters, so that it can choose the best parameter and ffmpeg-qsv
can keep up with runtime's update.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
This avoids one redundant load per row; pix3 from the previous
iteration can be used as pix2 in the next one.
Before: Cortex A53 A72 A73
pix_abs_0_2_neon: 138.0 59.7 48.0
After:
pix_abs_0_2_neon: 109.7 50.2 39.5
Signed-off-by: Martin Storsjö <martin@martin.st>
Fixes the j2k-dwt FATE-test; also fixes#9945.
(I don't know whether the multiplication can overflow.)
Reviewed-by: Tomas Härdin <git@haerdin.se>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Possible since 48ac225db2.
Furthermore, the current code would not work on mips
in case ff_lsp2polyf() were used outside of lsp.c,
because it is not compiled on mips since commit
3827a86eac at all;
instead it is overridden with a static av_always_inline
function which only works for the callers in lsp.c.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Will avoid a forward declaration lateron.
Also adapt the function to modern style while at it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
They will be mistaken for the sentinel of the arrays
they are in, thereby hiding the 6.1, 7.1 and 22.2 layouts.
(This doesn't really matter, as these arrays are informational
only for decoders.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
AVCodec.channel_layouts is deprecated and Clang (unlike GCC)
warns when setting this field in a codec definition.
Fortunately, Clang (unlike GCC) allows to use
FF_DISABLE_DEPRECATION_WARNINGS inside a definition (of an FFCodec),
so that one can create simple macros to set AVCodec.channel_layouts
that also suppress deprecation warnings for Clang.
(Notice that some of the codec definitions were already
inside FF_DISABLE/ENABLE_DEPRECATION_WARNINGS (that were not
guarded by FF_API_OLD_CHANNEL_LAYOUT); these have been removed.
Also notice that setting AVCodec.channel_layouts was not guarded
by FF_API_OLD_CHANNEL_LAYOUT either, so testing disabling it
it without removing all the codeblocks would not have worked.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Pointers to void can be converted to any pointer to incomplete or
object type and back; but they are nevertheless not completely generic
pointers: There is no provision in the C standard that guarantees their
convertibility with function pointers. C90 lacks a generic function
pointer, C99 made every function pointer a generic function pointer and
still disallows the convertibility with void *. Both GCC as well as
Clang warn about this when using -pedantic.
Therefore use unions to avoid these conversions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Don't place it as doxy specific for the order field, and generalize it both to
also cover already defined orders and to not make it seem like the user is
required to handle a layout they don't fully support or understand.
Signed-off-by: James Almer <jamrial@gmail.com>
av_display_rotation_get will return NAN when the display matrix is invalid,
which would end up printing NAN as an integer in the rotation field. This
is poor for multiple reasons:
* Users of ffprobe have no way of discerning "valid but ugly rotation from
display matrix" from "invalid display matrix".
* It can have unintended consequences on some platforms, such as Linux x86_64,
where NAN is equal to INT64_MIN, which, for example, when printed as JSON,
which uses floating point for all numbers, can end up as invalid JSON or wit
a number that cannot be reserialized as an integer at all.
Since NAN is av_display_rotation_get's error case, just print 0 (no rotation)
when that happens.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
This starts with one-time initialisation of the 26 constant factors
like 08edacc248. That is done with
the scalar instruction set. While the formula can readily be vectored,
the gains would (probably) be more than lost in transfering the results
back to FP registers (or suitably reshuffling them into vector
registers).
Note that the main loop could likely be scheduled sligthly better by
expanding the filter macro and interleaving loads with arithmetic.
It is not clear yet if that would be relevant for vector processing (as
opposed to traditional SIMD).
We could also use fewer vectors, but there is not much point in sparing
them (they are *all* callee-clobbered).
This uses the following vectorisation:
for (i = 0; i < blocksize; i++) {
ang[i] = mag[i] - copysignf(fmaxf(ang[i], 0.f), mag[i]);
mag[i] = mag[i] - copysignf(fminf(ang[i], 0.f), mag[i]);
}
RVV defines a total of 12 different extensions, including:
- 5 different instruction subsets:
- Zve32x: 8-, 16- and 32-bit integers,
- Zve32f: Zve32x plus single precision floats,
- Zve64x: Zve32x plus 64-bit integers,
- Zve64f: Zve32f plus Zve64x,
- Zve64d: Zve64f plus double precision floats.
- 6 different vector lengths:
- Zvl32b (embedded only),
- Zvl64b (embedded only),
- Zvl128b,
- Zvl256b,
- Zvl512b,
- Zvl1024b,
- and the V extension proper: equivalent to Zve64f and Zvl128b.
In total, there are 6 different possible sets of supported instructions
(including the empty set), but for convenience we allocate one bit for
each type sets: up-to-32-bit ints (RVV_I32), floats (RVV_F32),
64-bit ints (RVV_I64) and doubles (RVV_F64).
Whence the vector size is needed, it can be retrieved by reading the
unprivileged read-only vlenb CSR. This should probably be a separate
helper macro if needed at a later point.
RV64G supports MIN & MAX instructions natively only on floating point
registers, not general purpose ones. The later would require the Zbb
extension. Due to that, it is actually faster to perform the clipping
"properly" in FPU.
Benchmarks on SiFive U74-MC (courtesy of Shanghai StarFive Tech):
audiodsp.vector_clipf_c: 29551.5
audiodsp.vector_clipf_rvf: 17871.0
Also tried unrolling with 2 or 8 elements but it gets worse either way.
This introduces compile-time and run-time CPU detection on RISC-V. In
practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of
I, F and D extensions, and if it does, it probably won't have run-time
detection. So the flags are essentially always set.
But as things stand, checkasm wants them that way. Compare the ARMV8
flag on AArch64. We are nowhere near running short on CPU flag bits.
When force_original_aspect_ratio and force_divisible_by are both
used, dimensions are now rounded to the nearest allowed multiple of
force_divisible_by rather than first rounding to the nearest integer and
then rounding in a static direction. This results in less distortion of
the aspect ratio.
Reviewed-by: Thierry Foucu <tfoucu@google.com>
Signed-off-by: Tristan Schmelcher <tschmelcher@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The general demuxing API uses parsers and decoders. Therefore
FFStream contains pointers to AVCodecContexts and
AVCodecParserContext and lavf/internal.h includes lavc/avcodec.h.
Yet actually only a few files files really use these; and it is best
when this number stays small. Therefore this commit uses opaque
structs in lavf/internal.h for these contexts and stops including
avcodec.h.
This also avoids including lavc/codec_desc.h implicitly. All other
headers are implicitly included as now (mostly through codec.h).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
avcodec_enum_to_chroma_pos() and avcodec_chroma_pos_to_enum()
deal with enum AVChromaLocation which is defined in lavu.
These functions are therefore replaced by
av_chroma_location_enum_to_pos() and av_chroma_location_pos_to_enum().
This commit provides the necessary deprecations. Also already make
these functions wrappers around the corresponding lavu functions
as not doing so would force one to disable deprecation warnings.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
They are intended as replacements for avcodec_enum_to_chroma_pos()
and avcodec_chroma_pos_to_enum().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
They are also frequently used in libavformat.
This change does not cause any breakage as avcodec.h
includes defs.h.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This also tests writing slice data in the unaligned mode
(some of these files use CAVLC) as well as updating
side data as well as parsing ISOBMFF avcc extradata.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is unnecessary because an av_shrink_packet() a few lines below
will set the size; furthermore, it is actually harmful, because
av_shrink_packet() does nothing in case the size already matches,
so that the packet's padding is not correctly zeroed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is e.g. legal for an ISOBMFF avcc to contain zero parameter sets.
In this case the annex B that we produce would be empty and therefore
useless. This happens e.g. with mov/frag_overlap.mp4 from the
FATE-suite.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There is no check for whether these supposedly redundant PPS
are actually redundant. One could check via memcmp which would
work in practice* (because all content buffers are initially
zero-allocated), but this is not portable as compilers may
trash padding inside structures as they wish.
In case the PPS is not really redundant the output is garbage.
This happens with several files from the FATE-suite. E.g.
h264-conformance/CVCANLMA2_Sony_C.jsv doesn't decode correctly
any more, whereas h264-conformance/CABA3_TOSHIBA_E.264 even
fails in ff_cbs_write_packet(), because the inferred value
of num_ref_idx_l0_active_minus1 mismatches with the value set
in the slice (this happens when num_ref_idx_l0_default_active_minus1
changes in the PPS; the value in the slice header is inferred from
the original PPS's num_ref_idx_l0_default_active_minus1).
*: Unless slice_group_id is used, i.e. unless slice_group_map_type
is six.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: signed integer overflow: 1917019860 + 265558963 cannot be represented in type 'int'
Fixes: 48798/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DST_fuzzer-4833165046317056
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Maybe timestamp / duration validity should be checked earlier
Fixes: 50993/clusterfuzz-testcase-minimized-ffmpeg_dem_WEBM_DASH_MANIFEST_fuzzer-6586894739177472
Fixes: signed integer overflow: 0 - -9223372036854775808 cannot be represented in type 'long'
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
There is probably a better place to check for this, but better
here than nowhere
Fixes: signed integer overflow: -9223372036824775808 - 86400000000 cannot be represented in type 'long'
Fixes: 50993/clusterfuzz-testcase-minimized-ffmpeg_dem_SBG_fuzzer-6601162580688896
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 9223372036851135042 + 15666854 cannot be represented in type 'long'
Fixes: 50993/clusterfuzz-testcase-minimized-ffmpeg_dem_SBG_fuzzer-6573717339111424
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 2138820085 + 16130322 cannot be represented in type 'int'
Fixes: 50993/clusterfuzz-testcase-minimized-ffmpeg_dem_LIVE_FLV_fuzzer-6704728165187584
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The check could be made more strict
Fixes: signed integer overflow: 36 * 538976288 cannot be represented in type 'int'
Fixes: 50993/clusterfuzz-testcase-minimized-ffmpeg_dem_GENH_fuzzer-6539389873815552
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: negation of -2147483648 cannot be represented in type 'int'; cast to an unsigned type to negate this value to itself
Fixes: 50993/clusterfuzz-testcase-minimized-ffmpeg_dem_DHAV_fuzzer-6604736532447232
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 1099511693312 * 538976288 cannot be represented in type 'long'
Fixes: 50993/clusterfuzz-testcase-minimized-ffmpeg_dem_CAF_fuzzer-6565048815845376
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
avoids overflows with it
Fixes: signed integer overflow: 9223372036846866010 + 4294967047 cannot be represented in type 'long'
Fixes: 50993/clusterfuzz-testcase-minimized-ffmpeg_dem_ASF_O_fuzzer-6538296768987136
Fixes: 50993/clusterfuzz-testcase-minimized-ffmpeg_dem_ASF_O_fuzzer-657169555665715
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Use the stream duration as last resort, as an off-by-one result of the
"st->duration / (caf->packets - 1)" calculation can break playback on some
devices.
Also, don't write the sample_rate value propagated by encoders like libopus.
The sample rate of the audio fed to it is irrelevant after being encoded.
Fixes ticket #9930.
Signed-off-by: James Almer <jamrial@gmail.com>
MJPEG-B is an intra-codec, so it makes no sense to keep the reference.
It is unused lateron anyway.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This could be improved further by not allocating the buffers
that won't be needed lateron in the first place.
Reviewed-by: Tomas Härdin <tjoppen@acc.umu.se>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes assertion failures after avcodec_flush_buffers(), where
stashed hwaccel state is present, but prev_thread is NULL.
Found-by: Wang Bin <wbsecg1@gmail.com>
To support non-aligned buffers during the post-transform step, just iterate
backwards over the array.
This allows using the 15xN-point FFT, with which the speed is 2.1 times
faster than our old libavcodec implementation.
~4x faster than the C version.
The shuffles in the 15pt dim1 are seriously expensive. Not happy with it,
but I'm contempt.
Can be easily converted to pure AVX by removing all vpermpd/vpermps
instructions.
There are two issues here. Firstly, the floating-point comparison
is always true. Seconly, the code depends on the default value of
min_hard_comp implicitly, which can be dangerous.
Partially fixes ticket 9859.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
According to the HEIF specification (ISO/IEC 23008-12) Section
7.5.3.1, tracks with handler_type 'auxv' must contain a 'auxi' box
in its SampleEntry to notify the nature of the auxiliary track to the
decoder.
The content is the same as the 'auxC' box. So parameterize and re-use
the existing function.
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
Signed-off-by: James Zern <jzern@google.com>
this maps to the vpxenc argument with the same name and the
VP9E_SET_MIN_GF_INTERVAL codec control
Signed-off-by: James Zern <jzern@google.com>
Reviewed-by: Vignesh Venkatasubramanian <vigneshv@google.com>
The input complex factors are constant for each iterations. This
substitudes 4 loads, 2 additions and 2 subtractions per iteration of
the inner-loop with another 4 loads. Thus effectively 4 arithmetic
operations per iteration of the inner loop are avoided, i.e. 24
operations per iteration of the outer loop, or 24 * (n - 1) operations
in total.
If the inner loop is not unrolled by the compiler, this also might
also save some pointer arithmetic as most instruction sets do not
have addressing modes with negated register offsets (12 - j). Unless
the compiler is optimising for code size, this is unlikely though.
Fixes: signed integer overflow: 9223372036854775807 - -2146905566 cannot be represented in type 'long'
Fixes: 50993/clusterfuzz-testcase-minimized-ffmpeg_dem_MXF_fuzzer-6570996594769920
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
In case a SupplementalProperty node exists in an adaptationset,
it is searched for a "schemeIdUri" property via xmlGetProp().
Whatever xmlGetProp() returns is then compared via av_strcasecmp()
to a string literal. xmlGetProp() can return NULL, namely in case
no "schemeIdUri" exists and (given that this string is allocated)
presumably also on allocation failure. No check for NULL is done,
so this may crash.
Furthermore, the string returned by xmlGetProp() needs to be freed
with xmlFree(), but this is not done either.
This commit fixes both of these issues; they existed since this code
has been added in 10d008f0fd.
This has been found while investigating ticket #9697. The continuous
leaks might very well be the reason behind the observed slowdown.
Reviewed-by: Steven Liu <lingjiujianke@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
In case new orders are added in the future, existing library users can still
use the layout simply by ignoring everything but the channel count in it, so
make this explicit.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: James Almer <jamrial@gmail.com>
ff_encode_preinit() ensures that the channel layout is equivalent
to one of the channel layouts in AVCodec.ch_layout; given that
all of these channel layouts have distinct numbers of channels,
one can therefore uniquely determine the channel layout by
the number of channels.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This encoder has AVCodec.ch_layouts set, so ff_encode_preinit()
ensures that the used channel layout is equivalent to one of
these.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
ff_encode_preinit() has already checked that the channel layout
is equivalent to one of the layouts in AVCodec.ch_layouts.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These encoders have AVCodec.ch_layouts set, so ff_encode_preinit()
has already checked that the used channel layout is equivalent
to one of these native layouts. Therefore one can simply
compare the channel masks (with the added complication
that one has to use av_channel_layout_subset() to get it,
because the channel layout is not guaranteed to have
AV_CHANNEL_ORDER_NATIVE).
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The encoder actually creates files with side channels, not back
channels. See thd_layout in mlp_parse.h.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The encoders using this have AVCodec.ch_layouts set, so that
this is checked generically.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
None of the decoders here have the AV_CODEC_CAP_CHANNEL_CONF set,
so that it is already checked generically that the number of channels
is positive.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The pcm_bluray encoder has AVCodec.ch_layouts set, so that
ff_encode_preinit() checks that the channel layout in use
is equivalent to one of the layouts from AVCodec.ch_layouts.
Yet equivalent is not the same as identical; in particular,
custom layouts equivalent to native layouts are possible
(and necessary if one wants to use the name/opaque fields
with an ordinary channel layout), so one must not simply
use AVChannelLayout.u.mask. Use av_channel_layout_subset()
instead.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This decoder does not have the AV_CODEC_CAP_CHANNEL_CONF set,
so that number of channels has to be set by the user before
avcodec_open2().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Now that it is ensured that the old and new channel count/layout
values coincide if the old ones are set, the consistency of the
AVChannelLayout (which is checked before we reach this point)
implies the consistency of the old values, making these checks
here dead code. So remove them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This ensures that if AVCodecContext.channels or
AVCodecContext.channel_layout are set, AVCodecContext.ch_layout
has the equivalent values after this block.
(In case these values are set inconsistently, the consistency check
for ch_layout below will error out.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
In particular, check the provided channel layout for encoders
without AVCodec.ch_layouts set. This fixes an infinite loop
in the WavPack encoder (and maybe other issues in other encoders
as well) in case the channel count is zero.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The wrapper for the legacy channel layout API already sets
AVCodecContext.channels based upon AVCodecContext.channel_layout
if the latter is set while the former is unset.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Provide optimized implementation for pix_median_abs8 function.
Performance comparison tests are shown below.
- median_sad_1_c: 277.0
- median_sad_1_neon: 82.0
Benchmarks and tests run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide optimized implementation for vsad8_intra function.
Performance comparison tests are shown below.
- vsad_5_c: 94.7
- vsad_5_neon: 20.7
Benchmarks and tests run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide optimized implementation for pix_median_abs16 function.
Performance comparison tests are shown below.
- median_sad_0_c: 720.5
- median_sad_0_neon: 127.2
Benchmarks and tests run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
When determining whether a packet should be decrypted,
should use the stsd_id of the fragment where the current packet is located.
Reviewed-by: Zhao Zhili <zhilizhao@tencent.com>
Signed-off-by: Wang Yaqiang <wangyaqiang03@kuaishou.com>
Old one was written with the assumption only even inputs would be given.
This very messy replacement supports even and odd inputs, and supports
AVX2 for extra speed. The buffers given are usually quite big (4k samples),
so the speedup is worth it.
The new SSE version is still faster than the old inline asm version by 33%.
Also checkasm is provided to make sure this monstrosity works.
This fixes some FATE tests.
Clang's static analyzer complains that leaving the variable
uninitialized could lead to a code path where the uninitialized value is
written to at the end of this function.
This patch simply zero-initializes that variable to avoid that.
Signed-off-by: Will Cassella <cassew@google.com>
Signed-off-by: James Almer <jamrial@gmail.com>
The aim of this test is to show the interleavement
of the file generated in the first pass; so make the
interleavement queue in the framecrc muxer in the second
pass as small as possible so that the framecrc muxer does not
fix wrong interleavement of the input file behind our backs.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
enc_dec is designed for raw input and output and computes
the PSNR between these two. The input of the shortest-sub
test is the idx file of a vobsub sub+idx combination
and the output is the output of framecrc of said vobsub
subtitle muxed into Matroska together with a synthesized
video. Calculating the PSNR between these two files makes
no sense, therefore switch to a transcode test, where
the ref file file contains the output of framecrc directly,
making the interleavement better visible in the ref file
at the cost of a larger ref file (>400 lines).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The MXF demuxer does not currently set AVStream::avg_frame_rate and ::r_frame_rate
when J2K essence is wrapped according to SMPTE ST 422.
Reviewed-by: Tomas Härdin <tjoppen@acc.umu.se>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Basically reverts af15c17daa.
Flipping a picture by modifying the pointers is so common
that even users of direct rendering should take it into account.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Up until now, libswscale/output.c used a macro to write
an output pixel which involved a call to av_pix_fmt_desc_get()
to find out whether the input pixel format is BE or LE
despite this being known at compile-time (there are templates
per pixfmt). Even worse, these calls are made in a loop,
so that e.g. there are eight calls to av_pix_fmt_desc_get()
for every pixel processed in yuv2rgba64_X_c_template()
for 64bit RGB formats.
This commit modifies these macros to ensure that isBE()
is evaluated at compile-time. This saved 41184B of .text
for me (GCC 11.2, -O3). Of course, it also improved performance.
E.g. ffmpeg_g -f lavfi -i testsrc2,format=yuva420p -pix_fmt rgba64le \
-threads 1 -t 1:00 -f null - (which uses yuv2rgba64le_X_c,
which is an invocation of yuv2rgba64_X_c_template() mentioned above),
performance improved from 95589 to 41387 decicycles for one call
to yuv2packedX; for the be variant the numbers went down from
76087 to 43024 decicycles.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Up until now, libswscale/input.c used a macro to read
an input pixel which involved a call to av_pix_fmt_desc_get()
to find out whether the input pixel format is BE or LE
despite this being known at compile-time (there are templates
per pixfmt). Even worse, these calls are made in a loop,
so that e.g. there are six calls to av_pix_fmt_desc_get()
for every pair of UV pixel processed in
rgb64ToUV_half_c_template().
This commit modifies these macros to ensure that isBE()
is evaluated at compile-time. This saved 9743B of .text
for me (GCC 11.2, -O3). For a simple RGB64LE->YUV420P
transformation like
ffmpeg -f lavfi -i haldclutsrc,format=rgba64le -pix_fmt yuv420p \
-threads 1 -t 1:00 -f null -
the amount of decicycles spent in rgb64LEToUV_half_c
(which is created via the template mentioned above)
decreases from 19751 to 5341; for RGBA64BE the number
went down from 11945 to 5393. For shared builds (where
the call to av_pix_fmt_desc_get() is indirect) the old numbers
are 15230 for RGBA64BE and 27502 for RGBA64LE, whereas
the numbers with this patch are indistinguishable from
the numbers from a static build.
Also make the macros that are touched conform to the
usual convention of using uppercase names while just at it.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Up until now, using NULL as key in av_dict_get() on a non-empty
AVDictionary would crash; using NULL as key in av_dict_set()
would also crash for a non-empty AVDictionary unless AV_DICT_MULTIKEY
was set; in case the dictionary was initially empty or AV_DICT_MULTIKEY
was set, it was even possible for av_dict_set() to succeed when
adding a NULL key, namely when one uses a value != NULL and
the AV_DICT_DONT_STRDUP_VAL flag. Using av_dict_get() on such
an AVDictionary will usually lead to crashes, though.
Fix this by actually checking for key in both functions; error out
if they are NULL.
While just at it, also stop relying on av_strdup(NULL) to return NULL
in av_dict_set().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The compiler cannot infer that the two float vectors do not alias,
causing unnecessary extra loads and serialisation. This patch caches
the two input values in local variables so that compiler can optimise
individual loop iterations.
While this probably never overflows, we are better safe than sorry.
The callback prototype should probably also use ptrdiff_t or size_t,
but I diggress (this would affect the DSP callback prototype).
Do this by setting AVCodecInternal.pad_samples.
This prevents reading into the frame's padding and writing
into the packet's padding.
This actually happened in our FATE tests (where the number of samples
is 2 mod 4), which therefore needed to be updated.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Some audio codecs work with atomic units that decode to a fixed
number of audio samples with this number being so small that it is
common to put multiple of these atoms into one packet. In these
cases it makes no sense to pad the last frame to the big frame_size,
so allow encoders to set the number of samples that they want
the last frame to be padded to instead.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
In particular, check that there is only one small last frame
in case the encoder has the AV_CODEC_CAP_SMALL_LAST_FRAME set.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The APTX (HD) decoder decodes blocks of four (six) bytes to four
output samples. It makes no sense to handle incomplete blocks:
They would just lead to synchronization errors, in which case
the complete frame is discarded. So only handle complete blocks.
This also avoids reading from the packet's padding and writing
into the frame's padding.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Just because we try to put multiple units of block_align bytes
(the atomic units for APTX and APTX HD) into one packet
does not mean that packets with fewer units than the
one we wanted are corrupt; only those packets that are not
a multiple of block_align are.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This field was misunderstood: It gives the number of samples
in a packet, not the number of bytes. Its usage was wrong for APTX HD.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Currently the APTX (HD) codecs set frame_size if unset and check
whether it is divisible by block_size (corresponding to block_align
as used by other codecs). But this is based upon a misunderstanding
of the API: frame_size is not in bytes, but in samples.
Said value is also not intended to be set by the user at all,
but set by encoders and (possibly) decoders if the number of channels
in a frame is constant. The latter condition is not fulfilled here,
so only set it for encoders. Given that the encoder can handle any
number of samples as long as it is divisible by four and given that
it worked to set a custom frame size before, the encoders accept
any multiple of four; otherwise the value is set to the value
that it already had for APTX: 1024 samples (per channel).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
APTX decodes four bytes of input to four stereo samples; APTX HD
does the same with six bytes of input. So it can be easily supported
in av_get_audio_frame_duration().
This fixes invalid durations and (derived) timestamps of demuxed
APTX HD packets and therefore fixed the timestamp in the aptx-hd
FATE test.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
We have de- and encoders for APTX and APTX HD, yet not FATE tests.
This commit therefore adds a transcoding test to utilize them.
Furthermore, during creating these tests it turned out that
the duration is set incorrectly for APTX HD. This will be fixed
in a future commit.
(Thanks to Andriy Gelman for finding an issue in an earlier version
that used a 192kHz input sample which does not work reliably accross
platforms.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This block was scheduled for removal, which means that no validation would have
taken place after the old API was removed.
It was algo going to mistakenly remove an unrelated bits_per_coded_sample
check.
Signed-off-by: James Almer <jamrial@gmail.com>
The opaque parameter for the callback is set in videotoolbox_start(),
called when the hwaccel is initialized. When frame threading is used,
avctx will be the context corresponding to the frame thread currently
doing the decoding. Using this same codec context in all subsequent
invocations of the decoder callback (even those triggered by a different
frame thread) is unsafe, and broken after
cc867f2c09, since each frame thread now
cleans up its hwaccel state after decoding each frame.
Fix this by passing hwaccel_priv_data as the opaque parameter, which
exists in a single instance forwarded between all frame threads.
The only other use of AVCodecContext in the decoder output callback is
as a logging context. For this purpose, store a logging context in
hwaccel_priv_data.
This is no longer used since 4608996772.
It also has no implementations other than the plain C one.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Since introducing the various packed formats used by VAAPI (and p012),
we've noticed that there's actually a gap in how
av_find_best_pix_fmt_of_2 works. It doesn't actually assign any value
to having the same bit depth as the source format, when comparing
against formats with a higher bit depth. This usually doesn't matter,
because av_get_padded_bits_per_pixel() will account for it.
However, as many of these formats use padding internally, we find that
av_get_padded_bits_per_pixel() actually returns the same value for the
10 bit, 12 bit, 16 bit flavours, etc. In these tied situations, we end
up just picking the first of the two provided formats, even if the
second one should be preferred because it matches the actual bit depth.
This bug already existed if you tried to compare yuv420p10 against p016
and p010, for example, but it simply hadn't come up before so we never
noticed.
But now, we actually got a situation in the VAAPI VP9 decoder where it
offers both p010 and p012 because Profile 3 could be either depth and
ends up picking p012 for 10 bit content due to the ordering of the
testing.
In addition, in the process of testing the fix, I realised we have the
same gap when it comes to chroma subsampling - we do not favour a
format that has exactly the same subsampling vs one with less
subsampling when all else is equal.
To fix this, I'm introducing a small score penalty if the bit depth or
subsampling doesn't exactly match the source format. This will break
the tie in favour of the format with the exact match, but not offset
any of the other scoring penalties we already have.
I have added a set of tests around these formats which will fail
without this fix.
Update description for ssim and ms_ssim libvmaf options to specify
feature=float_ssim and feature=float_ms_ssim which are used to request
ssim and ms_ssim values in the latest versions of libvmaf.
Signed-off-by: Yondon Fu <yondon.fu@gmail.com>
Fixes: signed integer overflow: -2147448926 + -198321 cannot be represented in type 'int'
Fixes: 48798/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_APE_fuzzer-5739619273015296
Fixes: 48798/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_APE_fuzzer-6744428485672960
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 127 + 2147483536 cannot be represented in type 'int'
Fixes: 48798/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MOBICLIP_fuzzer-6014034970804224
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 17121181824 * 538976288 cannot be represented in type 'long long'
Fixes: 48798/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_EXR_fuzzer-5915330316206080
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes regression with tickets/4364/L1004220.DNG
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The patch fixes the bugs that occurred when running
fate-checkasm-hevc_pel on MIPS paltform.
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This patch fixes a bug where the fate-checkasm-motion fails when
h is not a multiple of 8.
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The code to initialize it takes more space (in .text) than
the table to be initialized (namely 86B vs 64B for GCC 11.2
with -O3 in an av_cold function), so hardcode the table.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Most of the VLCs used here have a max_depth of two;
some have a max_depth of one. Therefore one can just use two
and avoid the runtime check for whether one should
perform another round of LUT lookup in case the first read
did not read a complete codeword.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Up until now, initializing the dca VLC tables uses ff_init_vlc_sparse()
with length tables of type uint8_t and code tables of type uint16_t
(except for the LBR tables, which uses length and symbols of type
uint8_t; these tables are interleaved). In case of the quant index
codebooks these arrays were accessed via tables of pointers to the
individual tables.
This commit changes this: First, we switch to ff_init_vlc_from_lengths()
to replace the uint16_t code tables by uint8_t symbol tables
(this necessitates ordering the tables from left-to-right in the tree
first). These symbol tables are interleaved with the length tables.
Furthermore, these tables are combined in order to remove the table of
pointers to individual tables, thereby avoiding relocations (for x64
elf systems this amounts to 96*24B = 2304B saved in .rela.dyn) and
saving 1280B from .data.rel.ro (for 64bit systems). Meanwhile the
savings in .rodata amount to 2709 + 2 * 334 = 3377B. Due to padding
the actual savings are higher: The ELF x64 ABI requires objects >= 16B
to be padded to 16B and lots of the tables have 2^n + 1 elements
of these were from replacing uint16_t codes with uint8_t symbols;
the rest was due to the fact that combining the tables eliminated
padding (the ELF x64 ABI requires objects >= 16B to be padded to 16B
and lots of the tables have 2^n + 1 elements)). Taking this into
account gives savings of 4548B. (GCC by default uses an even higher
alignment (controlled by -malign-data); for it the savings are 5748B.)
These changes also necessitated to modify the init code for
the encoder tables.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Up until now, the encoder used the same tables that the decoder
uses to create its VLCs. These have the downside of requiring
the encoder to offset the tables at runtime as well as having
to read from separate tables for the length as well as the code
of the symbol to encode. The former are uint8_t, the latter uint16_t,
so using a joint table would require padding, but this doesn't
matter when these tables are generated at runtime, because they
live in the .bss segment.
Also move these init functions as well as the functions that
actually use them to dcaenc.c, because they are encoder-specific.
This also allows to remove an inclusion of PutBitContext from
dcahuff.h (and indirectly from all dca-decoder files).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It increases the size of one VLC from two to three bits, thereby
requiring four more VLCEntries (16 bytes .bss), but it allows to
inline the number of bits used when reading them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The ff_dca_vlc_transition_mode VLCs don't use an offset at all,
so just use ordinary VLCs for them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
GCC 12 apparently believes that negative palette sizes are
possible (they are not, as this has already been checked during
init) and therefore emits a -Wstringop-overflow= for the memcpy.
Using unsigned avoids this.
(To be honest, there might be a compiler bug involved.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This might be useful in case this decoder were changed to support
new extradata passed via side-data.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This code is only called once during init, so none of the buffers
here have been allocated already.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
183132872a made the iff demuxer
output extradata and made the decoder parse said extradata.
To make this extradata extensible, it came with its own internal
length field (containing the offset of the palette at the end
of the extradata). Furthermore, in order to support mid-stream
extradata changes, the packets returned by the demuxer also have
such a length field (containing the offset of the actual packet
data). Therefore the packet parsing the extradata accepted its
input from both AVPackets as well as from ordinary extradata.
Yet the demuxer never made use of this "feature": The packet's
length field always indicated that the packet data starts
immediately after the length field.
Later, commit cb928fc448 stopped
appending the length field to the packets' data; of course,
it also stopped searching for extradata in this data.
Instead it added code to parse the packet's header to the function
that parses extradata. This made this function consist of two disjoint
parts, one of which is only reachable if this function is called
from init (when parsing extradata) and one of which is reachable
when parsing packet headers.
Therefore this commit splits this function into two.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Otherwise the buffer might be too small. Fixes assert violations
when encoding mono audio with exactly one sample.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For example, if the jpeg contains exif information
and the rotation direction is included in the exif,
the displaymatrix will be set on the side_data of the frame when decoding.
However, when ffplay is used to play the image,
only the side data in the stream will be determined.
It does not check whether the frame also contains rotation information,
causing it to play in the wrong direction
Reviewed-by: Zhao Zhili <zhilizhao@tencent.com>
Signed-off-by: Wang Yaqiang <wangyaqiang03@kuaishou.com>
It is currently calling av_channel_layout_describe()
unnecessarily.
Reviewed-by: Nicolas George <george@nsup.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
If a developer using FFmpeg libraries seeks into an earlier position and calls
avcodec_flush_buffers() afterwards as recommended, the Vorbis decoder will drop
the next frame, since buffer flushing clears the first_frame flag. As a result,
the audio samples the calling code receives may be ahead of the requested seek
position, which is unacceptable in some use cases such as playing a looping
sound effect.
This commit records the presentation timestamp of the first frame and
determines after that if the new frame is the first frame (possible after
seeking to the start) by comparing its pts to the stored pts.
When appending two values (due to AV_DICT_APPEND), the earlier code
would first zero-allocate a buffer of the required size and then
copy both parts into it via av_strlcat(). This is problematic,
as it leads to quadratic performance in case of frequent enlargements.
Fix this by using av_realloc() (which is hopefully designed to handle
such cases in a better way than simply throwing the buffer we already
have away) and by copying the string via memcpy() (after all, we already
calculated the strlen of both strings).
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
If a key already exists in an AVDictionary and the AV_DICT_APPEND flag
is set, the old entry is at first discarded from the dictionary, but
a pointer to the value is kept. Lateron enough memory to store the
appended string is allocated; should this allocation fail, the old string
is not freed and hence leaks. This commit changes this by moving
creating the combined value to an earlier point in the function,
which also ensures that the AVDictionary is unchanged in case of errors.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
We know that an AVDictionary is not empty if we have just added
an entry to it, so only check for it being empty on the branch
that does not do so.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
av_strlcpy() returns the length of the src string to enable
the caller to check for truncation. It is currently used in
the following way in dump_metadata(): Every metadata value
is searched for \b, \n, \v, \f, \r and then the data up to
the first of these characters found is copied to a small
temporary buffer via av_strlcpy() (but of course not more
than fits into said buffer) and then printed; all characters up
to the character found earlier are then treated as consumed.
But this is bad performance-wise if the while string is big
and contains many of these characters, because av_strlcpy()
will unnecessarily calculate the length of the whole remaining string.
(dump_metadata() actually ignored the return value of av_strlcpy().)
Fix this by not copying the data to a temporary buffer at all.
Instead just use %.*s to bound the number of characters output.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It may be NULL, as is the case for D3D11VA_VLD.
Running "ffmpeg -h decoder=h264" on a Windows build
Before:
Decoder h264 [H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10]:
Supported hardware devices: dxva2 (null) d3d11va cuda
After:
Decoder h264 [H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10]:
Supported hardware devices: dxva2 d3d11va cuda
Signed-off-by: James Almer <jamrial@gmail.com>
If it's unsupported or invalid, then there's no point trying to rebuild it
using a value that may have been derived from the same layout to begin with.
Move the checks before the attempts at copying the layout while at it.
Fixes ticket #9908.
Signed-off-by: James Almer <jamrial@gmail.com>
This reverts commit 2c8dc7e953.
The loongarch headers have been fixed, so that this wrapper
is no longer necessary.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This reverts commit 6c9a60ada4.
The loongarch headers have been fixed, so that this workaround
is no longer necessary.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
If the target supports the Basic bit-manipulation (Zbb) extension, then
the REV8 instruction is available to reverse byte order.
Note that this instruction only exists at the "XLEN" register size,
so we need to right shift the result down to the data width.
If Zbb is not supported, then this patchset does nothing. Support for
run-time detection is left for the future. Currently, there are no
bits in auxv/ELF HWCAP for Z-extensions, so there are no clean ways to
do this.
RISC-V defines the CLZ instruction as part of the ratified Zbb subset
of the (not yet ratified) bit mapulation extension (B). We can detect
it from the __riscv_zbb predefined constant. At least GCC 12 already
supports this correctly.
Note that the macro will be non-zero if supported, zero if enabled
in the compiler flags (e.g. -march=rv64gzbb) but not known to the
compiler, and undefined otherwise.
This uses the architected RISC-V 64-bit cycle counter from the
RISC-V unprivileged instruction set.
In 64-bit and 128-bit, this is a straightforward CSR read.
In 32-bit mode, the 64-bit value is exposed as two CSRs, which
cannot be read atomically, so a loop is necessary to detect and fix up
the race condition where the bottom half wraps exactly between the two
reads.
These tests test both the demuxer as well as the muxer
wherever possible. It is not always possible due to the fact
that the muxer supports more codecs than the demuxer.
The spdif demuxer does currently not set the need_parsing flag.
If one were to set this to AVSTREAM_PARSE_FULL, the test results
would change as follows:
- For spdif-aac-remux, the packets are currently padded to 16bits,
i.e. if the actual packet size is odd, there is a padding byte.
The parser splits this byte away into a one byte packet of its own.
Insanely, these one byte packets get the same duration as normal
packets, i.e. timing is ruined.
- The DCA-remux tests get proper duration/timestamps.
- In the spdif-mp2-remux test the demuxer marks the stream as
being MP2; the parser sets it to MP3 and this triggers
the "Codec change in IEC 61937" codepath; this test therefore
returns only two packets with the parser.
- For spdif-mp3-remux some bytes end up in different packets:
Some input packets of this file have an odd length (417B instead
of 418B like all the other packets) and are padded to 418B.
Without a parser, all returned packets from the spdif-demuxer
are 418B. With a parser, the packets that were originally 417B
are 417B again, but the padding byte has not been discarded,
but added to the next packet which is now 419B.
This fixes "Multiple frames in a packet" warning and avoids
an "Invalid data found when processing input" error when decoding.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When compiling FFmpeg with GCC-9, some very random segfaults were
observed in code which had previously called down into the SBC encoder
NEON assembly routines. This was caused by these functions clobbering
some of the vfp callee saved registers (d8 - d15 aka q4 - q7). GCC was
using these registers to save local variables, but after these
functions returned, they would contain garbage.
Fix by reallocating the registers in the two affected functions in
the following way:
ff_sbc_analyze_4_neon: q2-q5 => q8-q11, then q1-q4 => q8-q11
ff_sbc_analyze_8_neon: q2-q9 => q8-q15
The reason for using these replacements is to keep closely related
sets of registers consecutively numbered which hopefully makes the
code more easy to follow. Since this commit only reallocates
registers, it should have no performance impact.
Signed-off-by: James Cowgill <jcowgill@debian.org>
Signed-off-by: Martin Storsjö <martin@martin.st>
It is not needed on x64, because the AV_COPY* and AV_ZERO*
macros never use MMX on x64.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Provide optimized implementation for vsse_intra16 for arm64.
Performance tests are shown below.
- vsse_4_c: 155.2
- vsse_4_neon: 36.2
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide optimized implementation for vsad_intra16 function for arm64.
Performance comparison tests are shown below.
- vsad_4_c: 177.5
- vsad_4_neon: 23.5
Benchmarks and tests are run with checkasm tool on AWS Gravtion 3.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide optimized implementation of vsse16 for arm64.
Performance comparison tests are shown below.
- vsse_0_c: 257.7
- vsse_0_neon: 59.2
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide optimized implementation of vsad16 function for arm64.
Performance comparison tests are shown below.
- vsad_0_c: 285.2
- vsad_0_neon: 39.5
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Co-authored-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Add "slice" intra refresh type to h264_qsv and hevc_qsv. This type means
horizontal refresh by slices without overlapping. Also update the doc.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
This duration is equal to the longest duration in all track's tkhd atoms, which
may be comprised of the sum of all edit lists in each track. Empty edit lists
in tracks represent start_time, and the actual media duration is stored in the
mdhd atom.
This change lets the generic demux code derive the longest track duration taken
from mdhd atoms, so the correct duration and start_time combination will be
reported.
Should fix ticket #9775.
Reviewed-by: zhilizhao(赵志立) <quinkblack@foxmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
These macros are definitions, not only declarations and therefore
should not contain a semicolon. Such a semicolon is actually
spec-incompliant, but compilers happen to accept them.
Reviewed-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The latest commit of Loongson MMI macro replaces were incorrect.
It makes a mass of green tints on HEVC videos when playing. I've
compared it with the older MMI implementation, and found out that
several lines have been replaced by wrong macros.
Signed-off-by: Qi Tiezheng <qitiezheng@360.cn>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
AV_PIX_FMT_VUYX is used in FFmpeg and MFX_FOURCC_AYUV is used in the SDK
Reviewed-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
AV_PIX_FMT_VUYX is used for 8bit 4:4:4 content in FFmpeg VAAPI, so
AV_PIX_FMT_VUYX should be used for 8bit 4:4:4 content in FFmpeg QSV too
because QSV is based on VAAPI on Linux. However the SDK only declares
support for AYUV and does nothing with the alpha, so this commit fudged
a mapping between AV_PIX_FMT_VUYX and MFX_FOURCC_AYUV.
Reviewed-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Currently AVBR is disabled and VBR is the default method if maxrate is
not specified on Linux, but AVBR is the default one if maxrate is not
specified on Windows. In order to make user experience better accross
Linux and Windows, use VBR by default on Windows if maxrate is not
specified. User need to set both avbr_accuracy and avbr_convergence to
non-zero explicitly and not to specify maxrate if AVBR is expected.
In addition, AVBR works for H264 and HEVC only in the SDK.
$ ffmpeg.exe -v verbose -f lavfi -i yuvtestsrc -vf "format=nv12" -c:v
vp9_qsv -f null -
The FFV1 decoder only uses the last frame's data to conceal
errors. The encoder does not have this problem and therefore
only uses the current frame and none of the ThreadFrames.
So only allocate them for the decoder.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
By using a symbol table one can already bake in applying
a LUT on the return value of get_vlc2(). So change the
symbol table for the vec2 and vec4 tables to avoid
using the symbol_to_vec2/4 LUTs.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It allows to replace tables of big codes (uint16_t and uint32_t)
by tables of smaller symbols (mostly uint8_t).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These codes are already ordered from left-to-right in the tree,
so one can just use ff_init_vlc_static_from_lengths().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Instead reuse the destination RL VLC as scratch space.
This is possible, because the (implicit) codes here are already
ordered from left-to-right in the tree and because the codelengths
are increasing, which implies that mapping from VLC entries to the
corresponding entries used to initialize the VLC is monotonically
increasing. This means that one can reuse the right end of the
destination RL VLC to store the tables used to initialize the VLC
with.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is possible because the codes are already ordered
from left to right in the tree. It avoids having to create
the codes ourselves and will enable the codes table
to be removed altogether once the encoder stops using it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Right now, it is nearly ordered by "left codes in the tree first";
the only exception is the escape value which has been put at the
end. This commit moves it to the place it should have according
to the above order. This is in preparation for further commits.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This matches a similar cap on the number of automatic threads
in libavcodec/pthread_slice.c.
On systems with lots of cores, this fixes a couple fate failures
in 32 bit mode on such machines (where spawning a huge number of
threads runs out of address space).
Signed-off-by: Martin Storsjö <martin@martin.st>
Instead of the potentially adjusted ones. Otherwise, if config_props() is
called again and if using force_original_aspect_ratio, the already adjusted
values could be altered again.
Example command line
scale=size=1920x1000:force_original_aspect_ratio=decrease:force_divisible_by=2
user value 1920x1000 -> 1920x798 on init_dict() -> 1918x798 on frame
change when eval_mode == EVAL_MODE_INIT, which after e645a1ddb9 could be at the
very first frame.
Signed-off-by: James Almer <jamrial@gmail.com>
This state is not refcounted, so make sure it always has a well-defined
owner.
Remove the block added in 091341f2ab, as
this commit also solves that issue in a more general way.
Mention:
- that it is legacy and optional (every hwaccel that uses it can also
work with hwcontext, though some optional information can only be
signalled throught hwaccel_context)
- that it can be used for encoders (only qsvenc currently)
- ownership and lifetime
This commit implements an iMDCT in pure assembly.
This is capable of processing any mod-8 transforms, rather than just
power of two, but since power of two is all we have assembly for
currently, that's what's supported.
It would really benefit if we could somehow use the C code to decide
which function to jump into, but exposing function labels from assebly
into C is anything but easy.
The post-transform loop could probably be improved.
This was somewhat annoying to write, as we must support arbitrary
strides during runtime. There's a fast branch for stride == 4 bytes
and a slower one which uses vgatherdps.
Zen 3 benchmarks for stride == 4 for old (av_imdct_half) vs new (av_tx):
128pt:
2811 decicycles in av_tx (imdct),16775916 runs, 1300 skips
3082 decicycles in av_imdct_half,16776751 runs, 465 skips
256pt:
4920 decicycles in av_tx (imdct),16775820 runs, 1396 skips
5378 decicycles in av_imdct_half,16776411 runs, 805 skips
512pt:
9668 decicycles in av_tx (imdct),16775774 runs, 1442 skips
10626 decicycles in av_imdct_half,16775647 runs, 1569 skips
1024pt:
19812 decicycles in av_tx (imdct),16777144 runs, 72 skips
23036 decicycles in av_imdct_half,16777167 runs, 49 skips
Inside a function, the second ';' in ";;" is just a null statement,
but it is actually illegal outside of functions. Compilers
nevertheless accept it without warning, except when in -pedantic
mode when e.g. Clang emits a -Wextra-semi warning. Therefore
remove the unnecessary ';'.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The field is not specific to Opus.
The mp2fixed encoder signals initial_padding and is used
by both the matroska-encoding-delay test as well as
the lavf-mkv tests which necessitated several FATE ref changes.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Matroska requires pts to be >= 0 with a slight exception:
It has a mechanism to deal with codec delay, i.e. with
the data added at the beginning that does not correspond
to actual input data and should be discarded by the player.
Only the audio actually intended to be output needs to have
a timestamp >= 0.
In order to avoid unnecessary timestamp shifting, this patch
allows muxers to inform the shifting code about this so that
it can take it into account.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Matroska generally requires timestamps to be nonnegative, but
there is an exception: Data that corresponds to encoder delay
and is not supposed to be output anyway can have a negative
timestamp. This is achieved by using the CodecDelay header
field: The demuxer has to subtract this value from the raw
(nonnegative) timestamps of the corresponding track.
Therefore the muxer has to add this value first to write
this raw timestamp.
Support for writing CodecDelay has been added in FFmpeg commit
d92b1b1bab and in Libav commit
a1aa37dd0b. The former simply
wrote the header field and did not apply any timestamp offsets,
leading to desynchronisation (if one uses multiple tracks).
The latter applied it at two places, but not at the one where
it actually matters, namely in mkv_write_block(), leading to
the same desynchronisation as with the former commit. It furthermore
used the wrong stream timebase to convert the delay to the
stream's timebase, as the conversion used the timebase from
before avpriv_set_pts_info().
When the latter was merged in 82e4f39883,
it was only done in a deactivated state that still did not
offset the timestamps when muxing due to "assertion failures
and av sync errors". a1aa37dd0b
made it definitely more likely to run into assertion failures
(namely if the relative block timestamp doesn't fit into an int16_t).
Yet all of the above issues have been fixed (in commits
962d631573,
5d3953a5dc and
4ebeab15b0. This commit therefore
enables applying CodecDelay, fixing ticket #7182.
There is just one slight regression from this: If one has input
with encoder delay where the first timestamp is negative, but
the pts of the part of the data that is actually intended to be
output is nonnegative, then the timestamps will currently by default
be shifted to make them nonnegative before they reach the muxer;
the muxer will then ensure that the shifted timestamps are retained.
Before this commit, the muxer did not ensure this; instead the
timestamps that the demuxer will output were shifted and
if the first timestamp of the actually intended output was zero
before shifting, then this unintentional shift just cancels
the shift performed before the packet reached the muxer.
(But notice that this only applies if all the tracks use the same
CodecDelay, or the relative sync between tracks will be impaired.)
This happens in the matroska-opus-remux and matroska-ogg-opus-remux
FATE tests. Future commits will forward the information that
the Matroska muxer has a limited capability to handle negative
timestamps so that the shifting in libavformat can take advantage
of it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Opus can be decoded to multiple samplerates (namely 48kHz, 24KHz,
16Khz, 12 KHz and 8Khz); libopus as well as our encoder wrapper
support these sample rates. The OpusHead contains a field for
this original samplerate. Yet the pre-skip (and the granule-position
in the Ogg-Opus mapping in general) are always in the 48KHz clock,
irrespective of the original sample rate.
Before commit c3c22bee63, our libopus
encoder was buggy: It did not account for the fact that the pre-skip
field is always according to a 48kHz clock and wrote a too small
value in case one uses the encoder with a sample rate other than 48kHz;
this discrepancy between CodecDelay and OpusHead led to Firefox
rejecting such streams.
In order to account for that, said commit made the muxer always use
48kHz instead of the actual sample rate to convert the initial_padding
(in samples in the stream's sample rate) to ns. This meant that both
fields are now off by the same factor, so Firefox was happy.
Then commit f4bdeddc3c fixed the issue
in libopusenc; so the OpusHead is correct, but the CodecDelay is
still off*. This commit fixes this by effectively reverting
c3c22bee63.
*: Firefox seems to no longer abort when CodecDelay and OpusHead
are off.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is possible for the trailing padding to be zero, namely
e.g. if the AV_PKT_DATA_SKIP_SAMPLES side data is used
for leading padding. Matroska supports this (use a negative
DiscardPadding), but players do not; at least Firefox refuses
to play such a file. So for now only write DiscardPadding
if it is trailing padding and nonzero.
The fate-matroska-ogg-opus-remux was affected by this.
(I wish CodecDelay would not exist and DiscardPadding would
be used to instead trim the codec delay away (with the Block
timestamp corresponding to the time at which the actually
output audio is output).)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Creating CFHD RL VLC tables works by first extending
the codes by the sign, followed by creating a VLC,
followed by deriving the RL VLC from this VLC (which
is then discarded). Extending the codes uses stack arrays.
The tables used to initialize the VLC are already sorted
from left-to-right in the tree. This means that the
corresponding VLC entries are generally also ascending,
but not always: Entries from subtables always follow
the corresponding main table although it is possible
for the right-most node to fit into the main table.
This suggests that one can try to use the final destination
buffer as scratch buffer for the tables with sign included.
Unfortunately it works for neither of the tables if one
uses the right-most part of the RL VLC buffer as scratch buffer;
using the left-most part of the RL VLC buffer as scratch buffer
might work if one traverses the VLC entries from end to start.
But it works only for the little RL VLC (table 9), not for table 18.
Therefore this patch uses the RL VLC buffer for table 9
as scratch buffer for creating the bigger table 18.
Afterwards the left part of the buffer for table 9 is
used as scratch buffer to create table 9.
This fixes the cfhd part of ticket #9399 (if it is not already fixed).
Notice that I do not consider the previous stack usage excessive.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
cfhddata.c initializes a RL VLC table via code tables and
corresponding tables for length, run and level. code and length
tables are used to initialize a VLC, no symbol table is used.
Afterwards the symbols of said VLC are just the indices of
the corresponding entries in the code and length table that
were used for initialization; they can therefore be used
to get the matching level and run entry and they are not used
for anything else. Therefore one can just permute these tables
without changing the resulting RL VLC tables.
This commit does just this. It permutes these tables so that
the code tables are ordered from left to right in the resulting
tree and then switches to ff_init_vlc_from_lengths(), which
allows to remove the codes table altogether.
Given that these tables are constructed on the stack, this
also reduces stack usage, potentially fixing part of #9399.
(The size of the tables on the stack decreases from 4752 to
2640.)
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
cfhd.c checked for level being equal to a certain codebook-
dependent constant and to run being two. The first check is
actually redundant, as all codebooks contain only one (real)
entry with run == 2 (as is usual with VLCs, this one real entry
has several corresponding entries in the table). But given
that no entry has a run of zero (except incomplete entries
which just signal that one needs to do another round of parsing),
one can actually use that as sentinel. This patch does so.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Demuxers are not allowed to do this and few callers, if any, will handle
this correctly. Send the AV_SIDE_DATA_PARAM_CHANGE_SAMPLE_RATE side data
instead.
Demuxers are not supposed to update AVCodecParameters after the stream
was seen by the caller. This value is not important enough to support
dynamic updates for.
The mov demuxer only returns DV audio, video packets are discarded.
It first reads the data to be parsed into a packet. Then both this
packet and the pointer to its data are passed together to
avpriv_dv_produce_packet(), which parses the data and partially
overwrites the packet. This is confusing and potentially dangerous, so
just pass NULL and avoid pointless packet modification.
The struct is quite small and the decoder and the encoder use different
fields from it, so benefits from reusing it are small.
This allows making the buf field const.
The function contains only two assignments, setting DVVideoContext.avctx
and AVCodecContext.chroma_sample_location. However, the decoder does not
use the former, and the encoder should not be setting the latter.
Therefore move the first assignment to dvenc and the second to dvdec.
Make the encoder warn if the user-signalled chroma sample location does
not match the supported one, and return an error on higher compliance
levels.
Initialized to 1:1, but if the script sets these properties, it
will be set to those instead (0:0 disables it, apparently).
Signed-off-by: Stephen Hutchinson <qyot27@gmail.com>
If we want to be able to map between VAAPI and Vulkan (to do Vulkan
filtering), we need to have matching formats on each side.
The mappings here are not exact. In the same way that P010 is still
mapped to full 16 bit formats, P012 has to be mapped that way as well.
Similarly, VUYX has to be mapped to an alpha-equipped format, and XV36
has to be mapped to a fully 16bit alpha-equipped format.
While Vulkan seems to fundamentally lack formats with an undefined,
but physically present, alpha channel, it has have 10X6 and 12X4
formats that you could imagine using for P010, P012 and XV36, but these
formats don't support the STORAGE usage flag. Today, hwcontext_vulkan
requires all formats to be storable because it wants to be able to use
them to create writable images. Until that changes, which might happen,
we have to restrict the set of formats we use.
Finally, when mapping a Vulkan image back to vaapi, I observed that
the VK_FORMAT_R16G16B16A16_UNORM format we have to use for XV36 going
to Vulkan is mapped to Y416 when going to vaapi (which makes sense as
it's the exact matching format) so I had to add an entry for it even
though we don't use it directly.
With the necessary pixel formats defined, we can now expose support for
the remaining 10/12bit combinations that VAAPI can handle.
Specifically, we are adding support for:
* HEVC
** 12bit 420
** 10bit 422
** 12bit 422
** 10bit 444
** 12bit 444
* VP9
** 10bit 444
** 12bit 444
These obviously require actual hardware support to be usable, but where
that exists, it is now enabled.
Note that unlike YUVA/YUVX, the Intel driver does not formally expose
support for the alphaless formats XV30 and XV360, and so we are
implicitly discarding the alpha from the decoder and passing undefined
values for the alpha to the encoder. If a future encoder iteration was
to actually do something with the alpha bits, we would need to use a
formal alpha capable format or the encoder would need to explicitly
accept the alphaless format.
These are the formats we want/need to use when dealing with the Intel
VAAPI decoder for 12bit 4:2:0, 12bit 4:2:2, 10bit 4:4:4 and 12bit 4:4:4
respectively.
As with the already supported Y210 and YUVX (XVUY) formats, they are
based on formats Microsoft picked as their preferred 4:2:2 and 4:4:4
video formats, and Intel ran with it.
P12 and Y212 are simply an extension of 10 bit formats to say 12 bits
will be used, with 4 unused bits instead of 6.
XV30, and XV36, as exotic as they sound, are variants of Y410 and Y412
where the alpha channel is left formally undefined. We prefer these
over the alpha versions because the hardware cannot actually do
anything with the alpha channel and respecting it is just overhead.
Y412/XV46 is a normal looking packed 4 channel format where each
channel is 16bits wide but only the 12msb are used (like P012).
Y410/XV30 packs three 10bit channels in 32bits with 2bits of alpha,
like A/X2RGB10 style formats. This annoying layout forced me to define
the BE version as a bitstream format. It seems like our pixdesc
infrastructure can handle the LE version being byte-defined, but not
when it's reversed. If there's a better way to handle this, please
let me know. Our existing X2 formats all have the 2 bits at the MSB
end, but this format places them at the LSB end and that seems to be
the root of the problem.
There are no particular reasons to force the compiler to use the same
register as output and input operand. This forces an extra MOV
instruction if the input value needs to be reused after the swap.
In most cases, this makes no differences, as the compiler will seleect
the same register for both operands either way.
Signed-off-by: Martin Storsjö <martin@martin.st>
There are no particular reasons to force the compiler to use the same
register as output and input operand. This forces an extra MOV
instruction if the input value needs to be reused after the swap.
In most cases, this makes no differences, as the compiler will seleect
the same register for both operands either way.
Signed-off-by: Martin Storsjö <martin@martin.st>
It is advantageous for ff_crop_tab, as the base pointer used to
access this table is not the first element of it. But the real
base pointer is still at a constant offset from the code/the GOT
and can therefore be accessed relative to the instruction pointer
(if supported by the arch) or relative to the GOT; without this,
one has to first load address of ff_crop_tab (potentially via
the GOT) and then offset manually (which is what the earlier code
did).
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It reduces typing: Before this patch, there were 11 callbacks
that exceeded the 80 char line length limit; now there are zero.
It also allows to remove ONLY_IF_THREADS_ENABLED() in
libavutil/internal.h.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It reduces typing: Before this patch, there were 105 codecs
whose long_name-definition exceeded the 80 char line length
limit. Now there are only nine of them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It has been deprecated in b4f59beeb4,
but the attribute_deprecated was not set and there was no entry
in APIchanges. This commit adds these and schedules it for removal.
Given that the reason behind the deprecation is exactly the same
as in av_fopen_utf8(), reuse its FF_API_AV_FOPEN_UTF8.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
They are unused since d63443b968.
Furthermore, they were always in the wrong header:
libavutil/internal.h is auto-included almost everywhere, but
FF_SYMVER would only ever be used at a few places, so a proper
header of its own would be appropriate for it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Currently, the frame-threaded decoding API still supports thread-unsafe
callbacks. If one uses a thread-unsafe get_buffer2() callback,
calls to av_frame_unref() by the decoder are serialized, because
it is presumed that the underlying deallocator is thread-unsafe.
The frame-threaded encoder seems to have been written with this
restriction in mind: It always serializes unreferencing its AVFrames,
although no documentation forces it to do so.
This commit schedules to change this behaviour as soon as thread-unsafe
callbacks are removed. For this reason, the FF_API_THREAD_SAFE_CALLBACKS
define is reused.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The AArch64 assembly accesses those symbols directly, without
indirection via e.g. the GOT on ELF. In order for this not to
require text relocations, those symbols need to be resolved fully
at link time, i.e. those symbols can't be interposable.
Normally, so far, this is achieved when linking shared libraries
in two ways; we have a version script (libavcodec/libavcodec.v) which
marks all symbols that don't start with av* as local. Additionally,
we try to add -Wl,-Bsymbolic to the linker options if supported,
making sure that such symbol references are resolved fully at link
time, instead of making them interposable.
When the libavcodec static library is linked into another shared
library, there's no guarantee that it uses similar options (even though
that would be favourable), which would end up requiring text relocations
in the AArch64 assembly.
Explicitly mark the symbols that are accessed from AArch64 assembly
as hidden, so that they are resolved fully at link time even without
the version script and -Wl,-Bsymbolic.
Signed-off-by: Martin Storsjö <martin@martin.st>
It makes no sense here, as flac_parse_block_header()
is not even supposed to advance the caller's pointer.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
(The FLAC parser currently ignores the streaminfo block;
therefore some of this is decoder-only. Given that the FLAC
parser should probably use the streaminfo block, this stuff
is moved to flac_parse.h.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This field is not really used by the decoder at all:
It is only output in some debug log message, but this debug
log message should better use the value read from the streaminfo
instead of a second-guessed value from the decoder.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The HEVC code currently uses an array of arrays of NALUs; one such array
contains all the SPS NALUs, one all PPS NALUs etc. The array of arrays
is grown dynamically via av_reallocp_array(), but given that the latter
function automatically frees its buffer upon reallocation error,
it may only be used with PODs, which this case is not. Even worse:
While the pointer to the arrays is reset, the counter for the number
of arrays is not, leading to a segfault in hvcc_close().
Fix this by avoiding the allocations of the array of arrays altogether.
This is easily possible because their number is bounded (by five).
Furthermore, as a byproduct we can ensure that the code always
produces the recommended ordering of VPS-SPS-PPS-SEI (which was
not guaranteed before).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is designed to improve and unify error handling for
allocation failures for the many (often small) allocations that we have
in the fftools. These typically either don't return an error message
or an error message that is not really helpful to the user
and can be replaced by a generic error message without loss of
information.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The threshold of 5 is arbitrary, both smaller and larger should work fine
Fixes: Stack overflow
Fixes: 50603/clusterfuzz-testcase-minimized-ffmpeg_dem_ASF_O_fuzzer-6049302564175872
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
update_video_stats() currently uses OutputStream.data_size to print the
total size of the encoded stream so far and the average bitrate.
However, that field is updated in the muxer thread, right before the
packet is sent to the muxer. Not only is this racy, but the numbers may
not match even if muxing was in the main thread due to bitstream
filters, filesize limiting, etc.
Introduce a new counter, data_size_enc, for total size of the packets
received from the encoder and use that in update_video_stats(). Rename
data_size to data_size_mux to indicate its semantics more clearly.
No synchronization is needed for data_size_mux, because it is only read
in the main thread in print_final_stats(), which runs after the muxer
threads are terminated.
It is either equal to OutputStream.enc_ctx->codec, or NULL when enc_ctx
is NULL. Replace the use of enc with enc_ctx->codec, or the equivalent
enc_ctx->codec_* fields where more convenient.
ost->enc is always non-NULL here, since
- this code is never called for streamcopy
- opening the output file will fail if an encoder cannot be found, so
filters are never initialized
This code cannot be triggered, since after 90944ee3ab opening the
output file will abort if an encoder cannot be found and streamcopy was
not explicitly requested.
It races with the demuxing thread. Instead, send the information along
with the demuxed packets.
Ideally, the code should stop using the stream-internal parsing
completely, but that requires considerably more effort.
Fixes races, e.g. in:
- fate-h264-brokensps-2580
- fate-h264-extradata-reload
- fate-iv8-demux
- fate-m4v-cfr
- fate-m4v
Adds an option to use constant bitrate instead of average bitrate to the
videotoolbox encoders. This is enabled via -constant_bit_rate true.
macOS 13 is required for this option to work.
Signed-off-by: Sebastian Beckmann <beckmann.sebastian@outlook.de>
Signed-off-by: Rick Kern <kernrj@gmail.com>
This has been broken since the start, and it was only discovered
when I started testing my replacement for the FFT.
Disable it, since there's no point in fixing slower code that's about
to be removed anyway.
The vfp version is not affected.
Fixes: signed integer overflow: -6322983228386819992 - 5557477266266529857 cannot be represented in type 'long'
Fixes: 50112/clusterfuzz-testcase-minimized-ffmpeg_dem_IFF_fuzzer-6329186221948928
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: Timeout
Fixes no testcase, this is the same idea as similar attacks against XML parsers
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
It is only used by encoders; in fact, AVCodecContext.time_base
is only used by encoders, so it is only useful for encoders.
Also constify the AVCodecContext parameter in it.
Also fixup the other headers a bit while removing now unnecessary
internal.h inclusions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Decoder-only, as the dimensions are set by the user when encoding.
Also fixup the other headers a bit while removing unnecessary internal.h
inclusions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only used by decoders (encoders have ff_encode_alloc_frame()).
Also clean up the other headers a bit while removing now redundant
internal.h inclusions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only used by decoders.
Also clean up the headers a bit while removing now unnecessary
internal.h inclusions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Up until now, these encoders received non-refcounted packets
(whose data was owned by the corresponding AVCodecContext)
from ff_alloc_packet(); these packets were made refcounted lateron
by av_packet_make_refcounted() generically.
This commit makes these encoders accept user-supplied buffers by
replacing av_packet_make_refcounted() with an equivalent function
that is based upon get_encode_buffer().
(I am pretty certain that one can also set the flag for mpegvideo-
based encoders, but I want to double-check this later. What is certain
is that it reallocates the buffer owned by the AVCodecContext
which should maybe be moved to encode.c, so that proresenc_kostya.c
and ttaenc.c can make use of it, too.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The encode-callback (the callback used by the FF_CODEC_CB_TYPE_ENCODE
encoders) is currently called in two places: encode_simple_internal()
and by the worker threads of frame-threaded encoders.
After the call, some packet properties are set based upon
the corresponding AVFrame properties and the packet is made
refcounted if it isn't already. So there is some code duplication.
There was also non-duplicated code in encode_simple_internal()
which is executed even when using frame-threading. This included
an emms_c() (which is needed for frame-threading, too, if it is
needed for the single-threaded case, because there are allocations
(via av_packet_make_refcounted()) immediately after returning
from the encode-callback).
Furthermore, some further properties are only set in
encode_simple_internal(): For audio, pts and duration are derived
from the corresponding fields of the frame if the encoder does not
have the AV_CODEC_CAP_DELAY set. Yet this is wrong for frame-threaded
encoders, because frame-threading always introduces delay regardless
of whether the underlying codec has said cap. This only worked because
there are no frame-threaded audio encoders.
This commit fixes the code duplication and the above issue by factoring
this code out and reusing it in both places. It would work in case
of audio codecs with frame-threading, because now the values are
derived from the correct AVFrame.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Instead of indicating whether we got a packet by setting
pkt->data and pkt->size to zero.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
AVCodecInternal.frame_thread_encoder is only set iff
active_thread_type is FF_THREAD_FRAME.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
As vaapi doesn't actually do anything useful with the alpha channel,
and we have an alphaless format available, let's use that instead.
The changes here are mostly 1:1 switching, but do note the explicit
change in the number of declared channels from 4 to 3 to reflect that
the alpha is being ignored.
As we already have support for VUYA, I figured I should do the small
amount of work to support VUYX as well. That means a little refactoring
to share code.
This is the alphaless version of VUYA that I introduced recently. After
further discussion and noting that the Intel vaapi driver explicitly
lists XYUV as a support format for encoding and decoding 8bit 444
content, we decided to switch our usage and avoid the overhead of
having a declared alpha channel around.
Note that I am not removing VUYA, as this turned out to have another
use, which was to replace the need for v408enc/dec when dealing with
the format.
The vaapi switching will happen in the next change
The fastest fast Fourier transform in not just the west, but the world,
now for the most popular toy ISA.
On a high level, it follows the design of the AVX2 version closely,
with the exception that the input is slightly less permuted as we don't have
to do lane switching with the input on double 4pt and 8pt.
On a low level, the lack of subadd/addsub instructions REALLY penalizes
any attempt at writing an FFT. That single register matters a lot,
and reloading it simply takes unacceptably long.
In x86 land, vendors would've noticed developers need this.
In ARM land, you get a badly designed complex multiplication instruction
we cannot use, that's not present on 95% of devices. Because only
compilers matter, right?
Future optimization options are very few, perhaps better register
management to use more ld1/st1s.
All timings below are in cycles:
A53:
Length | C | New (lavu) | Old (lavc) | FFTW
------ |-------------|-------------|-------------|-----
4 | 842 | 420 | 1210 | 1460
8 | 1538 | 1020 | 1850 | 2520
16 | 3717 | 1900 | 3700 | 3990
32 | 9156 | 4070 | 8289 | 8860
64 | 21160 | 9931 | 18600 | 19625
128 | 49180 | 23278 | 41922 | 41922
256 | 112073 | 53876 | 93202 | 101092
512 | 252864 | 122884 | 205897 | 207868
1024 | 560512 | 278322 | 458071 | 453053
2048 | 1295402 | 775835 | 1038205 | 1020265
4096 | 3281263 | 2021221 | 2409718 | 2577554
8192 | 8577845 | 4780526 | 5673041 | 6802722
Apple M1
New - Total for len 512 reps 2097152 = 1.459141 s
Old - Total for len 512 reps 2097152 = 2.251344 s
FFTW - Total for len 512 reps 2097152 = 1.868429 s
New - Total for len 1024 reps 4194304 = 6.490080 s
Old - Total for len 1024 reps 4194304 = 9.604949 s
FFTW - Total for len 1024 reps 4194304 = 7.889281 s
New - Total for len 16384 reps 262144 = 10.374001 s
Old - Total for len 16384 reps 262144 = 15.266713 s
FFTW - Total for len 16384 reps 262144 = 12.341745 s
New - Total for len 65536 reps 8192 = 1.769812 s
Old - Total for len 65536 reps 8192 = 4.209413 s
FFTW - Total for len 65536 reps 8192 = 3.012365 s
New - Total for len 131072 reps 4096 = 1.942836 s
Old - Segfaults
FFTW - Total for len 131072 reps 4096 = 3.713713 s
Thanks to wbs for some simplifications, assembler fixes and a review
and to jannau for giving it a look.
It is unnecessary since ee551a21ddcbf81afe183d9489c534ee80f263a0;
but it was redundant even before that, because decode_simple_internal()
calls emms_c().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There's no warranty that vpx_codec_encode() will generate a list with the same
amount of packets for both the yuv planes encoder and the alpha plane encoder,
so queueing packets based on what the main encoder returns will fail when the
amount of packets in both lists differ.
Queue all data packets for every vpx_codec_encode() call from both encoders
before attempting to assemble output AVPackets out of them.
Fixes ticket #9884
Reviewed-by: Vignesh Venkatasubramanian <vigneshv@google.com>
Signed-off-by: James Almer <jamrial@gmail.com>
This is somewhat redundant with the is_decoded check. Maybe
there is a nicer solution
Fixes: Null pointer dereference
Fixes: 49584/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-5297367351427072
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -1948269928 * 10 cannot be represented in type 'int'
Fixes: 49451/clusterfuzz-testcase-minimized-ffmpeg_dem_SUBVIEWER_fuzzer-6344614822412288
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Don't silently replace it with the default layout for the amount of channels
from the requested layout.
Should fix ticket #9869
Signed-off-by: James Almer <jamrial@gmail.com>
Fixes FATE-failures with the the filter-2xbr filter-3xbr filter-4xbr
filter-ep2x filter-ep3x filter-hq2x filter-hq3x filter-hq4x
filter-paletteuse-bayer filter-paletteuse-bayer0
filter-paletteuse-nodither and filter-paletteuse-sierra2_4a tests
when using 32bit x86 with CPUFLAGS ranging from "mmx+mmxext" to
"mmx+mmxext+sse+sse2+sse3" (the relevant function is only overwritten
when using SSSE3).
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
By checking immediately whether the first allocation was successfull
one can simplify the cleanup code in case of errors.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
APNG works with a single reference frame and an output frame.
According to the spec, decoding APNG works by decoding
the current IDAT/fdAT chunks (which decodes to a rectangular
subregion of the whole image region), followed by either
overwriting the region of the output frame with the newly
decoded data or by blending the newly decoded data with
the data from the reference frame onto the current subregion
of the output frame. The remainder of the output frame
is just copied from the reference frame.
Then the reference frame might be left untouched
(APNG_DISPOSE_OP_PREVIOUS), it might be replaced by the output
frame (APNG_DISPOSE_OP_NONE) or the rectangular subregion
corresponding to the just decoded frame has to be reset
to black (APNG_DISPOSE_OP_BACKGROUND).
The latter case is not handled correctly by our decoder:
It only performs resetting the rectangle in the reference frame
when decoding the next frame; and since commit
b593abda6c it does not reset
the reference frame permanently, but only temporarily (i.e.
it only affects decoding the frame after the frame with
APNG_DISPOSE_OP_BACKGROUND). This is a problem if the
frame after the APNG_DISPOSE_OP_BACKGROUND frame uses
APNG_DISPOSE_OP_PREVIOUS, because then the frame after
the APNG_DISPOSE_OP_PREVIOUS frame has an incorrect reference
frame. (If it is not followed by an APNG_DISPOSE_OP_PREVIOUS
frame, the decoder only keeps a reference to the output frame,
which is ok.)
This commit fixes this by being much closer to the spec
than the earlier code: Resetting the background is no longer
postponed until the next frame; instead it is applied to
the reference frame.
Fixes ticket #9602.
(For multithreaded decoding it was actually already broken
since commit 5663301560d77486c7f7c03c1aa5f542fab23c24.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Some code in alimiter assumes that there are 2 channels, resulting in
clipping if the loudest channel is 3 or above and an out-of-bounds read if
the input is monophonic. Fix that in 2 places.
Signed-off-by: David Flater <dave@flaterco.com>
Add adaptive_i/b feature to hevc_qsv. Adaptive_i allows changing of
frame type from P and B to I. Adaptive_b allows changing of frame type
frome B to P.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
According to its documentation it returns "pts of the last muxed packet
+ its duration", but the value it actually returns right now is
(possibly guessed) dts after muxer-internal bitstream filtering (if
any).
This function was added for ffmpeg.c, but it is not used there anymore.
Since the value it returns is ill-defined and so inappropriate for any
serious use, deprecate it.
Just return AVERROR_EXTERNAL immediately when encode error.
The other logic should keep the old behavior before commit 7c05b7951.
Suggested-By: Zhao Zhili <zhilizhao@tencent.com>
Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
This commit moves the encoder-only allocations of the tables
owned solely by the main encoder context to mpegvideo_enc.c.
This avoids checks in mpegvideo.c for whether we are dealing
with an encoder; it also improves modularity (if encoders are
disabled, this code will no longer be included in the binary).
And it also avoids having to reset these pointers at the beginning
of ff_mpv_common_init() (in case the dst context is uninitialized,
ff_mpeg_update_thread_context() simply copies the src context
into dst which therefore may contain pointers not owned by it,
but this does not happen for encoders at all).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes frame-threaded decoding of samples created by concatting
a video with data partitioning and a video not using it.
(Only the MPEG-4 decoder sets this, so it is synced in
mpeg4_update_thread_context() despite being a MpegEncContext-field.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is by no means perfect, since at least ddagrab will return scRGB
data with values outside of 0.0f to 1.0f for HDR values.
Its primary purpose is to be able to work with the format at all.
_Float16 support was available on arm/aarch64 for a while, and with gcc
12 was enabled on x86 as long as SSE2 is supported.
If the target arch supports f16c, gcc emits fairly efficient assembly,
taking advantage of it. This is the case on x86-64-v3 or higher.
Same goes on arm, which has native float16 support.
On x86, without f16c, it emulates it in software using sse2 instructions.
This has shown to perform rather poorly:
_Float16 full SSE2 emulation:
frame=50074 fps=848 q=-0.0 size=N/A time=00:33:22.96 bitrate=N/A speed=33.9x
_Float16 f16c accelerated (Zen2, --cpu=znver2):
frame=50636 fps=1965 q=-0.0 Lsize=N/A time=00:33:45.40 bitrate=N/A speed=78.6x
classic half2float full software implementation:
frame=49926 fps=1605 q=-0.0 Lsize=N/A time=00:33:17.00 bitrate=N/A speed=64.2x
Hence an additional check was introduced, that only enables use of
_Float16 on x86 if f16c is being utilized.
On aarch64, a similar uplift in performance is seen:
RPi4 half2float full software implementation:
frame= 6088 fps=126 q=-0.0 Lsize=N/A time=00:04:03.48 bitrate=N/A speed=5.06x
RPi4 _Float16:
frame= 6103 fps=158 q=-0.0 Lsize=N/A time=00:04:04.08 bitrate=N/A speed=6.32x
Since arm/aarch64 always natively support 16 bit floats, it can always
be considered fast there.
I'm not aware of any additional platforms that currently support
_Float16. And if there are, they should be considered non-fast until
proven fast.
IEEE-754 differentiates two different kind of NaNs.
Quiet and Signaling ones. They are differentiated by the MSB of the
mantissa.
For whatever reason, actual hardware conversion of half to single always
sets the signaling bit to 1 if the mantissa is != 0, and to 0 if it's 0.
So our code has to follow suite or fate-testing hardware float16 will be
impossible.
This avoids triggering overflows in the filters, and avoids stray
test failures in the approximate functions on x86; due to rounding
differences, one implementation might overflow while another one
doesn't.
Signed-off-by: Martin Storsjö <martin@martin.st>
Some muxers, such as GPAC, create files with only one sidx, but two streams
muxed into the same fragments pointed to by this sidx.
Prevously, in such a case, when we seeked in such files, we fell back
to, for example, using the sidx associated with the video stream, to
seek the audio stream, leaving the seekhead in the wrong place.
We can still do this, but we need to take care to compare timestamps
in the same time base.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
They are already synced generically in update_context_from_thread()
in pthread_frame.c.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is unused since 3575a495f6
(and the error message is dangerous: av_get_pix_fmt_name(format)
returns NULL iff av_pix_fmt_desc_get(format) returns NULL
and using a NULL string for %s would be UB).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Improves the grepability of the code.
(Furthermore, I hope that no compiler will really call memset
for 28 bytes.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is done later in ff_mpv_frame_start() (and nobody uses
current_picture_ptr between setting it in ff_mpv_frame_start()).
(The reason the vsynth*-h263-obmc ref files change is because
the call to ff_find_unused_picture() now happens after the older
pictures have been unreferenced in ff_mpv_frame_start(),
so that their slots in the picture array can be immediately
reused; the obmc code is somehow buggy and changes its output
depending on the earlier contents of the motion_val buffer.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Provide optimized implementation of pix_abs8 function for arm64.
Performance comparison tests are shown below.
- pix_abs_1_0_c: 101.2
- pix_abs_1_0_neon: 22.5
- sad_1_c: 101.2
- sad_1_neon: 22.5
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide optimized implementation of sse8 function for arm64.
Performance comparison tests are shown below.
- sse_1_c: 130.7
- sse_1_neon: 29.7
Benchmarks and tests run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide optimized implementation of pix_abs16_y2 function for arm64.
Performance comparison tests are shown below.
pix_abs_0_2_c: 317.2
pix_abs_0_2_neon: 37.5
Benchmarks and tests run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide neon implementation for sse4 function.
Performance comparison tests are shown below.
- sse_2_c: 80.7
- sse_2_neon: 31.0
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide neon implementation for sse16 function.
Performance comparison tests are shown below.
- sse_0_c: 268.2
- sse_0_neon: 43.5
Benchmarks and tests run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
c11fb46731 led to a regression whereby the return code for missing
input or input probe is overridden by writer close return code and
hence not conveyed in the exit code.
Since d69d12a5b9 these av_assert2()
(or more exactly, the ones in hadamard8_diff8x8_c() and
hadamard8_intra8x8_c()) are hit. So just remove all of these asserts.
(If the test were improved to know which functions expect h == 8
and which support any value, the asserts could be readded
at the appropriate places.)
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This directory dependency is normally added implicitly by rules
in ffbuild/common.mak; for tools it's created by a rule for TOOLOBJS.
TOOLOBJS is populated implicitly from TOOLS, and decode_simple.o
doesn't end up there because it's an odd occurrance of a lone
object file in the tools subdirectory, not belonging to any other
tool.
Signed-off-by: Martin Storsjö <martin@martin.st>
Previously, the checkasm test always passed h=8, so no other cases
were tested.
Out of the me_cmp functions, in practice, some functions are hardcoded
to always assume a 8x8 block (ignoring the h parameter), while others
do use the parameter. For those with hardcoded height, both the
reference C function and the assembly implementations ignore the
parameter similarly.
The documentation for the functions indicate that heights between
w/2 and 2*w, within the range of 4 to 16, should be supported. This
patch just tests random heights in that range, without knowing what
width the current function actually uses.
Signed-off-by: Martin Storsjö <martin@martin.st>
The height is hardcoded in some of the me_cmp functions, but not
in all of them. But in the case of all other functions, it's hardcoded
in the same place in SIMD functions as in the C reference functions,
while this one function differs from the behaviour of the C code.
(Before 542765ce3e, there were a
couple other sad8_*_mmx functions with similar hardcoded height.)
Signed-off-by: Martin Storsjö <martin@martin.st>
This commit adds new code paths for vscale when filterSize is 2, 4, or
8. By using specialized code with unrolling to match the filterSize we
can improve performance.
On AWS c7g (Graviton 3, Neoverse V1) instances:
before after
yuv2yuvX_2_0_512_accurate_neon: 558.8 268.9
yuv2yuvX_4_0_512_accurate_neon: 637.5 434.9
yuv2yuvX_8_0_512_accurate_neon: 1144.8 806.2
yuv2yuvX_16_0_512_accurate_neon: 2080.5 1853.7
Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Use scalar times vector multiply accumlate instructions instead of
vector times vector to remove the need for replicating load instructions
which are slightly slower.
On AWS c7g (Graviton 3, Neoverse V1) instances:
yuv2yuvX_8_0_512_accurate_neon: 1144.8 987.4
yuv2yuvX_16_0_512_accurate_neon: 2080.5 1869.4
Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Change the reference to exactly match the C reference in swscale,
instead of exactly matching the x86 SIMD implementations (which
differs slightly). Test with and without SWS_ACCURATE_RND - if this
flag isn't set, the output must match the C reference exactly,
otherwise it is allowed to be off by 2.
Mark a couple x86 functions as unavailable when SWS_ACCURATE_RND
is set - apparently this discrepancy hasn't been noticed in other
exact tests before.
Add a test for yuv2plane1.
Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Use it instead of AVStream.codecpar in the main thread. While
AVStream.codecpar is documented to only be updated when the stream is
added or avformat_find_stream_info(), it is actually updated during
demuxing. Accessing it from a different thread then constitutes a race.
Ideally, some mechanism should eventually be provided for signalling
parameter updates to the user. Then the demuxing thread could pick up
the changes and propagate them to the decoder.
This specialization handles the case where filtersize is 4 mod 8, e.g.
12, 20, etc. Aarch64 was previously using the c function for this case.
This implementation speeds up that case significantly.
hscale_8_to_15__fs_12_dstW_512_c: 6234.1
hscale_8_to_15__fs_12_dstW_512_neon: 1505.6
Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
frag_stream_info->index_entry isn't the first sample/trun index.
cenc.frag_index_entry_base failed to catch the case since
current_index > 0.
Fix ticket #9807.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
frag_index.current is used by cenc_filter, and is updated inside
mov_read_moof. It can out of sync regarding to mov_read_packet.
Partly fix ticket #9807.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Convert the input from a scatter to a gather instead,
which is faster and better for SIMD.
Also, add a pre-shuffled exptab version to avoid
gathering there at all. This doubles the exptab size,
but the speedup makes it worth it. In SIMD, the
exptab will likely be purged to a higher cache
anyway because of the FFT in the middle, and
the amount of loads stays identical.
For a 960-point inverse MDCT, the speedup is 10%.
This makes it possible to write sane and fast SIMD
versions of inverse MDCTs.
A gateway can see everything, and we should not be shipping a hardcoded
default from a third party company; it's a security risk.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
mpegvideo uses an array of Pictures and when it is done with using
them, it only unreferences them incompletely: Some buffers are kept
so that they can be reused lateron if the same slot in the Picture
array is reused, making this a sort of a bufferpool.
(Basically, a Picture is considered used if the AVFrame's buf is set.)
Yet given that other pieces of the decoder may have a reference to
these buffers, they need not be writable and are made writable using
av_buffer_make_writable() when preparing a new Picture. This involves
reading the buffer's data, although the old content of the buffer
need not be retained.
Worse, this read can be racy, because the buffer can be used by another
thread at the same time. This happens for Real Video 3 and 4.
This commit fixes this race by no longer copying the data;
instead the old buffer is replaced by a new, zero-allocated buffer.
(Here are the details of what happens with three or more decoding threads
when decoding rv30.rm from the FATE-suite as happens in the rv30 test:
The first decoding thread uses the first slot of its picture array
to store its current pic; update_thread_context copies this for the
second thread that decodes a P-frame. It uses the second slot in its
Picture array to store its P-frame. This arrangement is then copied
to the third decode thread, which decodes a B-frame. It uses the third
slot in its Picture array for its current frame.
update_thread_context copies this to the next thread. It unreferences
the third slot containing the other B-frame and then it reuses this
slot for its current frame. Because the pic array slots are only
incompletely unreferenced, the buffers of the previous B-frame are
still in there and they are not writable; in fact the previous
thread is concurrently writing to them, causing races when making
the buffer writable.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
At this point active_thread_type is set iff active_thread_type
is set to FF_THREAD_FRAME iff AVCodecInternal.frame_thread_encoder
is set.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Discontinuity detection/correction is left in the main thread, as it is
entangled with InputStream.next_dts and related variables, which may be
set by decoding code.
Fixes races e.g. in fate-ffmpeg-streamloop after
aae9de0cb2.
This will allow to move normal offset handling to demuxer thread, since
discontinuities currently have to be processed in the main thread, as
the code uses some decoder-produced values.
InputFile.ts_offset can change during transcoding, due to discontinuity
correction. This should not affect the streamcopy starting timestamp.
Cf. bf2590aed3
The AviSynth C API requires using avs_release_video_frame
whenever avs_get_frame has been used, but the recent addition
of frameprop reading to the demuxer was missing this in
avisynth_create_stream_video.
Signed-off-by: Stephen Hutchinson <qyot27@gmail.com>
This allows user to build FFmpeg against Intel oneVPL. oneVPL 2.6
is the required minimum version when building Intel oneVPL code.
It will fail to run configure script if both libmfx and libvpl are
enabled.
It is recommended to use oneVPL for new work, even for currently available
hardwares [1]
Note the preferred child device type is d3d11va for libvpl on Windows.
The commands below will use d3d11va if d3d11va is available on Windows.
$ ffmpeg -hwaccel qsv -c:v h264_qsv ...
$ ffmpeg -qsv_device 0 -hwaccel qsv -c:v h264_qsv ...
$ ffmpeg -init_hw_device qsv=qsv:hw_any -hwaccel qsv -c:v h264_qsv ...
$ ffmpeg -init_hw_device qsv=qsv:hw_any,child_device=0 -hwaccel qsv -c:v h264_qsv ...
User may use child_device_type option to specify child device type to
dxva2 or derive a qsv device from a dxva2 device
$ ffmpeg -init_hw_device qsv=qsv:hw_any,child_device=0,child_device_type=dxva2 -hwaccel qsv -c:v h264_qsv ...
$ ffmpeg -init_hw_device dxva2=d3d9:0 -init_hw_device qsv=qsv@d3d9 -hwaccel qsv -c:v h264_qsv ...
[1] https://www.intel.com/content/www/us/en/develop/documentation/upgrading-from-msdk-to-onevpl/top.html
If qsv hwdevice is available, use the mfxLoader handle in qsv hwdevice
to create mfx session. Otherwise create mfx session with a new mfxLoader
handle.
This is in preparation for oneVPL support
The following Cflags has been added to libmfx.pc, so mfx/ prefix is no
longer needed when including mfx headers in FFmpeg.
Cflags: -I${includedir} -I${includedir}/mfx
Some old versions of libmfx have the following Cflags in libmfx.pc
Cflags: -I${includedir}
We may add -I${includedir}/mfx to CFLAGS when running 'configure
--enable-libmfx' for old versions of libmfx, if so, mfx headers without
mfx/ prefix can be included too.
If libmfx comes without pkg-config support, we may do a small change to
the settings of the environment(e.g. set -I/opt/intel/mediasdk/include/mfx
instead of -I/opt/intel/mediasdk/include to CFLAGS), then the build can
find the mfx headers without mfx/ prefix
After applying this change, we won't need to change #include for mfx
headers when mfx headers are installed under a new directory.
This is in preparation for oneVPL support (mfx headers in oneVPL are
installed under vpl directory)
The data structures for VP9 in mfxvp9.h is wrapped by
MFX_VERSION_NEXT, which means those data structures have never been used
in a public release. Actually MFX_CODEC_VP9 and other VP9 stuffs are
added in mfxstructures.h. In addition, mfxdefs.h is included in
mfxvp9.h, so we may use the check in this patch for MFX_CODEC_VP9
This is in preparation for oneVPL support because mfxvp9.h is removed
from oneVPL [1]
[1]: https://github.com/oneapi-src/oneVPL
Intel's oneVPL is a successor to MediaSDK, but removed some obsolete
features of MediaSDK[1], some early versions of oneVPL still use libmfx
as library name[2]. However some of obsolete features, including OPAQUE
memory, multi-frame encode, user plugins and LA_EXT rate control mode
etc, have been enabled in QSV, so user can not use --enable-libmfx to
enable QSV if using an early version of oneVPL SDK. In order to ensure
user builds FFmpeg against a right version of libmfx, this patch added a
check for version < 2.0 and warning message about the used obsolete
features.
[1] https://spec.oneapi.io/versions/latest/elements/oneVPL/source/VPL_intel_media_sdk.html
[2] https://github.com/oneapi-src/oneVPL
It is the proper place to set it, directly besides mb_width and
mb_stride. The reason for doing it the way it is done now seems
to be that the code does not create more slice contexts than necessary
(i.e. not more than one per row), so that this number needs to be
known before setting the number of slices. But this can always be
arranged by just moving the code that sets the number of slices.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These fields are only ever set by the encoder for the current picture
and for no other picture. So only one set of these values needs to
exist, so move them to MpegEncContext.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Poisoning returned buffers is based around the implicit assumption
that the contents of said buffers are transient. Yet this is not true
for the buffer pools used by the various hardware contexts which store
important state in there that needs to be preserved.
Furthermore, the current code is also based on the assumption
that the complete buffer pointed to by AVBuffer->data coincides with
AVBufferRef->data; yet an implementation might store some data of its
own before the actual user-visible data (accessible via AVBufferRef)
which would be broken by the current code.
(This is of course yet more proof that the AVBuffer API is not the right
tool for the hardware contexts.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Of all the buffers that are made writable, three are always allocated
and the other four are allocated iff any one of them is allocated;
so one can replace the seven checks for existence with one.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Up until now, ff_wmv2_decode_secondary_picture_header() only
set the mb_type array for non I-pictures, so that the decoding
process uses the earlier values of this array; this affects
the output of the wmv8-x8intra FATE-test (which this patch
therefore updates). These earlier values were set when decoding
earlier frames or when the buffer was initially zero-allocated.
A consequence of this is that the output of this test would be
random if ff_find_unused_picture() would select the unused picture
to return at random. Furthermore decoding from a keyframe onwards
depends upon the earlier state of the decoder.
This patch therefore zeroes said array when decoding an I picture.
(It is not claimed that zero is the right value to fill the array with.
I just don't know.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Include two values for it, a default one that sets/keeps the current behavior,
where the frame event generated by the primary input will have a timestamp
equal or higher than frames in secondary input, plus a new one where the
secondary input frame will be that with the absolute closest timestamp to that
of the frame event one.
Addresses ticket #9689, where the new optional behavior produces better frame
syncronization.
Reviewed-by: Nicolas George <george@nsup.org>
Signed-off-by: James Almer <jamrial@gmail.com>
Stores the item ids of all the items found in the file and
processes the primary item at the end of the meta box. This patch
does not change any behavior. It sets up the code for parsing
alpha channel (and possibly images with 'grid') in follow up
patches.
Reviewed-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
Signed-off-by: James Zern <jzern@google.com>
These tables are only used by encoders and only for the current picture;
ergo they need not be put into the picture at all, but rather into
the encoder's context. They also don't need to be refcounted,
because there is only one owner.
In contrast to this, the earlier code refcounts them which
incurs unnecessary overhead. These references are not unreferenced
in ff_mpeg_unref_picture() (they are kept in order to have something
like a buffer pool), so that several buffers are kept at the same
time, although only one is needed, thereby wasting memory.
The code also propagates references to other pictures not part of
the pictures array (namely the copy of the current/next/last picture
in the MpegEncContext which get references of their own). These
references are not unreferenced in ff_mpeg_unref_picture() (the
buffers are probably kept in order to have something like a pool),
yet if the current picture is a B-frame, it gets unreferenced
at the end of ff_mpv_encode_picture() and its slot in the picture
array will therefore be reused the next time; but the copy of the
current picture also still has its references and therefore
these buffers will be made duplicated in order to make them writable
in the next call to ff_mpv_encode_picture(). This is of course
unnecessary.
Finally, ff_find_unused_picture() is supposed to just return
any unused picture and the code is supposed to work with it;
yet for the vsynth*-mpeg4-adap tests the result depends upon
the content of these buffers; given that this patchset
changes the content of these buffers (the initial content is now
the state of these buffers after encoding the last frame;
before this patch the buffers used came from the last picture
that occupied the same slot in the picture array) their ref-files
needed to be changed. This points to a bug somewhere (if one removes
the initialization, one gets uninitialized reads in
adaptive_quantization in ratecontrol.c).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Sufficiently recent Intel hardware is able to do encoding of 8bit 4:4:4
content in HEVC and VP9. The main requirement here is that the frames
must be provided in the AYUV format.
Enabling support is done by adding the appropriate encoding profiles
and noting that AYUV is officially a four channel format with alpha so
we must state that we expect all four channels.
vaapi_decode_find_best_format currently does not set the
VA_SURFACE_ATTRIB_SETTABLE flag on the pixel format attribute that it
returns.
Without this flag, the attribute will be ignored by vaCreateSurfaces,
meaning that the driver's default logic for picking a pixel format will
kick in.
So far, this hasn't produced visible problems, but when trying to
decode 4:4:4 content, at least on Intel, the driver will pick the
444P planar format, even though the decoder can only return the AYUV
packed format.
The hwcontext_vaapi code that sets surface attributes when picking
formats does not have this bug.
Applications may use their own logic for finding the best format, and
so may not hit this bug. eg: mpv is unaffected.
The present default value of 0 will render the overlay video invisible.
A default of 1.0 is consistent with most common use cases.
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Reviewed-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Directly branch into the special 64-point deinterleave
subroutine rather than going through the general deinterleave.
64-point transform timings on Zen 3:
Before:
1974 decicycles in av_tx (fft),16776864 runs, 352 skips
After:
1956 decicycles in av_tx (fft),16775378 runs, 1838 skips
This codepath is enabled by default on arm, if the linux perf API
is available, unless disabled with --disable-linux-perf.
Signed-off-by: Martin Storsjö <martin@martin.st>
-stream_loop is currently handled by destroying the demuxer thread,
seeking, then recreating it anew. This is very messy and conflicts with
the future goal of moving each major ffmpeg component into its own
thread.
Handle -stream_loop directly in the demuxer thread. Looping requires the
demuxer to know the duration of the file, which takes into account the
duration of the last decoded audio frame (if any). Use a thread message
queue to communicate this information from the main thread to the
demuxer thread.
This avoids a potential race with the demuxer adding new streams. It is
also more efficient, since we no longer do inter-thread transfers of
packets that will be just discarded.
This undocumented feature runtime-enables dumping input packets. I can
think of no reasonable real-world use case that cannot also be
accomplished in a different way. Keeping this functionality would
interfere with the following commit moving it to the input thread (then
setting the variable would require locking or atomics, which would be
unnecessarily complicated for a feature that probably nobody uses).
There are currently three possible modes for an output stream:
1) The stream is produced by encoding output from some filtergraph. This
is true when ost->enc_ctx != NULL, or equivalently when
ost->encoding_needed != 0.
2) The stream is produced by copying some input stream's packets. This
is true when ost->enc_ctx == NULL && ost->source_index >= 0.
3) The stream is produced by attaching some file directly. This is true
when ost->enc_ctx == NULL && ost->source_index < 0.
OutputStream.stream_copy is currently used to identify case 2), and
sometimes to confusingly (or even incorrectly) identify case 1). Remove
it, replacing its usage with checking enc_ctx/source_index values.
The fLaC and dfLa box IDs have been registered with the MP4 RA
(they are now listed at https://mp4ra.org/#/codecs) and support
for muxing FLAC in MP4 has been experimental in ffmpeg for
6 years now, since Nov 21, 2016
This patch removes the experimental status and removes the MP4
object type, as none has been registered for FLAC as it was not
deemed necessary.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
In addition to .eac3, .ec3 is also commonly used by people to name raw
E-AC-3 streams. Enables automatic recognition of the eac3 format for
the .ac3 extension.
For instance Dolby Digital Plus software only support files with
.ec3. Files with .eac3 are not supported. Check issue #18 in the
public dlb_mp4base repository from DolbyLaboratories.
Signed-off-by: Ruben Gonzalez <rgonzalez@fluendo.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The functions to replace parameter sets are only called
after the respective parameter set has just been read or
has just been written; all of these functions check
that the id field is within the appropriate range.
So the checks in the replace-functions can be removed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is no longer used.
Also rename ff_cbs_alloc_unit_content2 to ff_cbs_alloc_unit_content.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
cbs_jpeg was the last user of CBS that didn't use
CodedBitstreamUnitTypeDescriptors.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The code just creates new references without allocating
new buffers for the subobjects; therefore the actual data pointer
stays valid and need not be updated.
Also remove an assert that ensured that the calculation
for updating the pointer makes sense.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This updates the list closer to reality.
Iam not a professional server admin, iam happy to help maintain the box as i have
done in the past. But iam not qualified nor volunteering to fix sudden problems
nor do i do major upgrades (i lack the experience to recover the box remotely if
something goes wrong) and also iam not maintaining backups ATM (our backup system
had a RAID-5 failure, raz is working on setting a new one up)
Maybe this should be signaled in a different way than spliting the lines but ATM
people ping me if something is wrong and what i do is mainly mail/ping raz
and try to find another root admin so raz is not the only active & professional
admin on the team. It would be more efficient if people contact raz and others
directly instead of depending on my waking up and forwarding a "ffmpeg.org" is down note
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
It's not modified, so we can simply use a const pointer to it.
Also check the return value of the other copy in the function while at it.
Reviewed-by: Nicolas George <george@nsup.org>
Signed-off-by: James Almer <jamrial@gmail.com>
nvenc does not appear to use these values as inputs for its built in RGB
to YUV conversion, but instead sets them on the output as-is.
Testing indicates the input is expected to be sRGB, with the output
ending up as limited range bt.470.
Up until now, there were four copies of
quantize_and_encode_band_cost_(ZERO|[SU]QUAD|[SU]PAIR|ESC|NONE|NOISE|STEREO)
(namely in aaccoder.o, aacenc_is.o, aacenc_ltp.o, aacenc_pred.o).
As 43b378a0d3 says, this is meant to
enable more optimizations.
Yet neither GCC nor Clang performed such optimizations: The functions
in the aforementioned files are not optimized according to
the specifics of the calls in the respective file. Therefore
duplicating them is a waste of space; this commit therefore stops doing
so. The remaining copy is placed into aaccoder.c (which is the only
place where the "round to zero" variant of quantize_and_encode_band()
is used, so that this can be completely internal to aaccoder.c).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also constify the corresponding code in mpegvideo.c that handles
lowres.
(Unfortunately, not everything that is const could be constified:
ref_picture could be made const uint8_t* const* if C allowed the
safe automatic conversion from uint8_t**; and pix_op, qpix_op
could be made to point to const function pointers, but C's handling
of const in pointers to arrays is broken.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is possible now that the HEVC DSP API treats them as const.
Also constify the AVFrames whose data is used as source in such
cases.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
__lasx_xvldx does not accept a pointer to const (in fact,
no function in lasxintrin.h does so), although it is not allowed
to modify the pointed-to buffer. Therefore this commit adds a wrapper
for it in order to constify the H264Chroma API in a later commit.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
__lsx_vldx does not accept a pointer to const (in fact,
no function in lsxintrin.h does so), although it is not allowed
to modify the pointed-to buffer. Therefore this commit adds a wrapper
for it in order to constify the HEVC DSP functions in a later commit.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Now that we have a combination of capable hardware (new enough Intel)
and a mutually understood format ("AYUV"), we can declare support for
decoding 8bit 4:4:4 content via VAAPI.
This requires listing AYUV as a supported format, and then adding VAAPI
as a supported hwaccel for the relevant codecs (HEVC and VP9). I also
had to add VP9Profile1 to the set of supported profiles for VAAPI as it
was never relevant before.
The "AYUV" format is defined by Microsoft as their preferred format for
4:4:4 content, and so it is the format used by Intel VAAPI and QSV.
As Microsoft like to define their byte ordering in little-endian
fashion, the memory order is reversed, and so our pix_fmt, which
follows memory order, has a reversed name (VUYA).
To do so, store the pointer to the VLC table and not to the VLC.
This is possible, because all the VLCs of the same type use
the same number of bits.
Also use a const VLCElem*, because the target is static and must
therefore not be modified after its initialization.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The msmpeg4 decoders/encoders share a common set of prerequisites,
ergo it makes sense to use common subsystems for them. This also
allows to remove the CONFIG_MSMPEG4_DECODER/ENCODER ad-hoc defines
(which violated the CONFIG_ namespace).
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
IntraX8 uses WMV2DSP directly, so it should have a direct dependency
on it. Also remove the indirect Makefile dependency of the VC-1 decoder
on wmv2dsp.o. Notice that since the addition of the MIPS WMV2DSP
implementation building only the VC-1 decoder would fail, because
no Makefile dependency VC1->wmv2dsp_init_mips.o has been added.
This is of course fixed by this commit.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Firstly, the timestamps generated from framerate are inaccurate for
variable framerate mode.
Secondly, the timestamps always start from zero, while pts/dts can
start from nonzero. FLV demuxer rejects such index with message:
"Found invalid index entries, clearing the index".
Usually a HW decoder is expected when user specifies a HW acceleration
method via -hwaccel option, however the current implementation doesn't
take HW acceleration method into account, it is possible to select a SW
decoder.
For example:
$ ffmpeg -hwaccel vaapi -i av1.mp4 -f null -
$ ffmpeg -hwaccel nvdec -i av1.mp4 -f null -
$ ffmpeg -hwaccel vdpau -i av1.mp4 -f null -
[...]
Stream #0:0 -> #0:0 (av1 (libdav1d) -> wrapped_avframe (native))
libdav1d is selected in this case even if vaapi, nvdec or vdpau is
specified.
After applying this patch, the native av1 decoder (with vaapi, nvdec or
vdpau support) is selected for decoding(libdav1d is still used for
probing format).
$ ffmpeg -hwaccel vaapi -i av1.mp4 -f null -
$ ffmpeg -hwaccel nvdec -i av1.mp4 -f null -
$ ffmpeg -hwaccel vdpau -i av1.mp4 -f null -
[...]
Stream #0:0 -> #0:0 (av1 (native) -> wrapped_avframe (native))
Tested-by: Mario Roy <marioeroy@gmail.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
After applying this patch, the desired HW acceleration method is known
before selecting decoder, so we may take HW acceleration method into
account when selecting decoder for input stream in the next commit
There should be no functional changes in this patch
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This function is bitrotten: It uses different parameters
than the corresponding ASM functions which replaced it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
ea41e6d637 forgot
the int->ptrdiff_t switch for the stride. Libav didn't
do it because Libav had already dropped support for Alpha
at that point.
Only compilation has been tested for this commit.
(It might be that the ASM-versions of me_cmp_func functions
need to be updated as well.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The generated files are endian-dependent, so no checksums
may be part of the ref files.
Fixes ticket #9854.
Tested-by: Sebastian Ramacher <sramacher@debian.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The output file (even the filesize) of the recently added
EXR tests depends on the endianness; therefore checksums
of these files must not be part of the ref file. Therefore
this commit adds an option (unused for now) to disable these
checksums on a per-test basis.
In order to avoid having to check twice, the checksum and
the filesize info are moved to immediately follow one another;
this results into updates to the ref files of all lavf-image tests.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes outputting silence on the second channel when decoding Parametic Stereo
HE-AAC.
Closes ticekt #3361.
Signed-off-by: James Almer <jamrial@gmail.com>
The DXGI_OUTDUPL_FRAME_INFO type isn't available in Windows API
subsets other than "desktop", while the IDXGIOutput1 interface is
available for all API subsets.
This fixes compilation for UWP/"Windows Store" configurations (and
older API subsets like Windows Phone).
Signed-off-by: Martin Storsjö <martin@martin.st>
Fixes -Werror=format-security build failures when building with
disabled optimizations and (according to fate.ffmpeg.org also with
several other old GCC versions).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Modifying the main context from a slice thread is (usually)
a data race, so it must not happen. So only use a pointer to const
to access the main context.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Modifying the main context from a slice thread is (usually)
a data race, so it must not happen. So only use a pointer to const
to access the main context.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Modifying the main context from a slice thread is (usually)
a data race, so it must not happen. So only use a pointer to const
to access the main context.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Modifying the main context from a slice thread is (usually)
a data race, so it must not happen. So only use a pointer to const
to access the main context.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Modifying the main context from a slice thread is (usually)
a data race, so it must not happen. So only use a pointer to const
to access the main context.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Modifying the main context from a slice thread is (usually)
a data race, so it must not happen. So only use a pointer to const
to access the main context.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Modifying the main context from a slice thread is (usually)
a data race, so it must not happen. So only use a pointer to const
to access the main context.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Modifying the main context from a slice thread is (usually)
a data race, so it must not happen. So only use a pointer to const
to access the main context.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Modifying the main context from a slice thread is (usually) a data race,
so it must not happen. So only use a pointer to const to access
the main context.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is possible now that ff_thread_await_progress() accepts
a const ThreadFrame*.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is possible for most of the callers, because e.g. only
the MPEG-4 decoder can have bits_per_raw_sample > 8.
Also most mpegvideo-based codecs are 420 only.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These modifications were actually meant to be applied to
the coded_frame, yet 08b31a72db
changed this and so this code has not been removed when coded_frame
has been removed in 11bc790893.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
String literals are allowed to be deduplicated (and toolchains
are already capable of doing so), yet the same is not allowed
for named arrays (even when they contain strings). Therefore
use a const char *const pointing to an unnamed string literal
for ttml_default_namespacing.
Reviewed-by: Jan Ekström <jeebjp@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The decoder is meant to use it as a fallback if the value in extradata is
invalid.
Regression since d199099be.
Signed-off-by: James Almer <jamrial@gmail.com>
Since this behavior is intentional, use the VERBOSE level instead of WARNING as
it's nothing the user should worry about.
Signed-off-by: James Almer <jamrial@gmail.com>
This tests the new "-flags2 icc_profiles" option by making sure the
embedded ICC profile gets correctly detected as sRGB.
Signed-off-by: Niklas Haas <git@haasn.dev>
Only if requested, and only if the codec signals support for ICC
profiles. Implementation roughly matches the functionality of the
existing vf_iccgen filter, albeit with some reduced flexibility and no
caching.
Ideally, we'd also only do this on the first frame (e.g. mjpeg, apng),
but there's no meaningful way for us to distinguish between this case
and e.g. somebody using the image2 muxer, in which case we'd want to
attach ICC profiles to every frame in the stream.
Closes: #9672
Signed-off-by: Niklas Haas <git@haasn.dev>
Implementation for the decode side of the ICC profile API, roughly
matching the behavior of the existing vf_iccdetect filter.
Closes: #9673
Signed-off-by: Niklas Haas <git@haasn.dev>
Handling this in general code makes more sense than handling it in
individual codec files, because it would be a lot of unnecessary code
duplication for the plenty of formats that support exporting ICC
profiles (jpg, png, tiff, webp, jxl, ...).
encode.c and decode.c will be in charge of initializing this state as
needed, so we merely need to make sure to uninit it afterwards from the
common destructor path.
Signed-off-by: Niklas Haas <git@haasn.dev>
This functionally already exists, but as pointed out in #9672 and #9673,
requiring users to manually include filters is clumsy, error-prone and
hard to use together with tools like ffplay.
To streamline ICC profile support, add a new AVCodecContext flag to
globally enable reading and writing ICC profiles, automatically, for all
appropriate media types.
Note that this commit only includes the new API. The implementation is
split off to separate commits for readability.
Signed-off-by: Niklas Haas <git@haasn.dev>
Codecs that can read/write ICC profiles deserve a special capability so
the common logic in encode.c/decode.c can decide whether or not there
needs to be any special handling for ICC profiles. The motivation here
is to be able to use it to decide whether or not an ICC profile needs to
be generated in the encode path, but it might as well get added to
decoders as well for purely informative reasons.
It's not entirely clear to me whether the "thp" and "smvjpeg" variants
of "mjpeg" should have this capability set or not, given that the code
technically supports it but I somehow doubt these files may contain
them. In either case, this cap is purely informative for decoders so it
doesn't matter too much either way.
It's also not entirely clear whether the "amv" encoder should signal ICC
profile support, but again erring on the side of caution, we probably
*shouldn't* be generating (and encoding!) ICC profiles for this type of
media file.
Signed-off-by: Niklas Haas <git@haasn.dev>
We will need this helper inside libavcodec in the future, so move it
there, leaving behind an #include to the raw source file in its old
location in libvfilter. This approach is inspired by the handling of
vulkan.c, and avoids us needing to expose any of it publicly (or
semi-publicly) in e.g. libavutil, thus avoiding any ABI headaches.
It's debatable whether the actual code belongs in libavcodec or
libavfilter, but I decided to put it into libavcodec because it
conceptually deals with encoding and decoding ICC profiles, and will be
used to decode embedded ICC profiles in image files.
Signed-off-by: Niklas Haas <git@haasn.dev>
GPU hang is one of the most typical errors on Intel GPUs in
case something goes wrong. It's important to recognize it
explicitly for easier bugs triage. Also, this error code
can be used to trigger GPU recovery path in self-written
applications.
There were 2 other statuses which MediaSDK can ppotentially return,
MFX_ERR_NONE_PARTIAL_OUTPUT and MFX_ERR_REALLOC_SURFACE. Adding
them as well.
v2: move MFX_ERR_NONE_PARTIAL_OUTPUT next to MFX_WRN_* (Haihao)
Signed-off-by: Hon Wai Chow <hon.wai.chow@intel.com>
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
The streamcopy initialization code briefly needs an AVCodecContext to
apply AVOptions to. Allocate a temporary codec context, do not use the
encoding one.
Instead replace VP56mv by new and identical structures VP8mv and VP9mv.
Also replace VP56Frame by VP8FrameType in vp8.h and use that
in VP8 code. Also remove VP56_FRAME_GOLDEN2, as this has only
been used by VP8, and use VP8_FRAME_ALTREF as replacement for
its usage in VP8 as this is more in line with VP8 verbiage.
This allows to remove all inclusions of vp56.h from everything
that is not VP5/6. This also removes implicit inclusions
of hpeldsp.h, h264chroma.h, vp3dsp.h and vp56dsp.h from all VP8/9
files.
(This also fixes a build issue: If one compiles with -O0 and disables
everything except the VP8-VAAPI encoder, the file containing
ff_vpx_norm_shift is not compiled, yet this is used implicitly
by vp56_rac_gets_nn() which is defined in vp56.h; it is unused
by the VP8-VAAPI encoder and declared as av_unused, yet with -O0
unused noninline functions are not optimized away, leading to
linking failures. With this patch, said function is not included
in vaapi_encode_vp8.c any more.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Even resolution or number of picture stores changes, we still need
follow no_output_of_prior_pics_flag in next IDR.
Tested-by: Fei Wang <fei.w.wang@intel.com>
Signed-off-by: Xu Guangxin <guangxin.xu@intel.com>
suppose
a. You have 3 frames, 0, 1, 4096.
b. The ltMask is 0xfff and use_msb is 0.
c. The 0, 1 are lt refs for 4096.
d. you are decoding frame 4096, and get the 0 frame.
Since 4096 & ltMask is 0 too, even you want get 0, find_ref_idx may give you 4096.
add_candidate_ref will report an error for this
Tested-by: Fei Wang <fei.w.wang@intel.com>
Signed-off-by: Xu Guangxin <guangxin.xu@intel.com>
We will generate a new frame for a missed reference. The frame can only
be used for reference. We assign an invalid decode sequence to it, so
it will not be involved in any dpb process.
Tested-by: Fei Wang <fei.w.wang@intel.com>
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Signed-off-by: Xu Guangxin <guangxin.xu@intel.com>
According to C.5.2.2, item 2. When we got an IRAP, and the
NoOutputOfPriorPicsFlag = 0, we need bump all outputable frames.
Tested-by: Fei Wang <fei.w.wang@intel.com>
Signed-off-by: Xu Guangxin <guangxin.xu@intel.com>
It conflicts with the name of the test using the testtool
in libavformat.mak.
Fixes ticket #9841.
Reviewed-by: Pierre-Anthony Lemieux <pal@sandflow.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It used to be allocated separately, so that the pointer to it
is copied to all HEVCContexts, so that all slice-threads
use the same. This is completely unnecessary now that there
is only one HEVCContext any more. There is just one minor
complication left: The slice-threads only get a pointer to
const HEVCContext, but they need to modify the common CABAC
state. Fix this by adding a pointer to the common CABAC state
to HEVCLocalContext and document why it exists.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
While just at it, also use av_calloc() instead of zeroing
the array ourselves in a loop.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Actually, ff_slice_thread_allocz_entries() always already
allocates zeroed entries, so ff_reset_entries() was already
unnecessary. Make this more clear by renaming it to
ff_slice_thread_allocz_entries().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The HEVC decoder has both HEVCContext and HEVCLocalContext
structures. The latter is supposed to be the structure
containing the per-slicethread state.
Yet up until now that is not how it is handled in practice:
Each HEVCLocalContext has a unique HEVCContext allocated for it
and each of these coincides except in exactly one field: The
corresponding HEVCLocalContext. This makes it possible to pass
the HEVCContext everywhere where logically a HEVCLocalContext
should be used. And up until recently, this is how it has been done.
Yet the preceding patches changed this, making it possible
to avoid allocating redundant HEVCContexts.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Right now the code passes a list of ints whose entry #i
is just i as opaque parameter to hls_decode_entry_wpp
via execute2; said list is even constantly allocated and freed.
This commit stops doing so and instead passes the list of
HEVCLocalContext* instead, so that the main HEVCContext
can be avoided in accessing the HEVCLocalContext.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The HEVC decoder has both HEVCContext and HEVCLocalContext
structures. The latter is supposed to be the structure
containing the per-slicethread state.
Yet that is not how it is handled in practice: Each HEVCLocalContext
has a unique HEVCContext allocated for it and each of these
coincides except in exactly one field: The corresponding
HEVCLocalContext. This makes it possible to pass the HEVCContext
everywhere where logically a HEVCLocalContext should be used.
This commit stops doing this for lavc/hevcdec.c itself.
It also constifies what can be constified in order to make
the nonconst stuff stand out more.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The HEVC decoder has both HEVCContext and HEVCLocalContext
structures. The latter is supposed to be the structure
containing the per-slicethread state.
Yet that is not how it is handled in practice: Each HEVCLocalContext
has a unique HEVCContext allocated for it and each of these
coincides except in exactly one field: The corresponding
HEVCLocalContext. This makes it possible to pass the HEVCContext
everywhere where logically a HEVCLocalContext should be used.
This commit stops doing this for lavc/hevcpred as well as
the corresponding mips code; the latter is untested.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The HEVC decoder has both HEVCContext and HEVCLocalContext
structures. The latter is supposed to be the structure
containing the per-slicethread state.
Yet that is not how it is handled in practice: Each HEVCLocalContext
has a unique HEVCContext allocated for it and each of these
coincides except in exactly one field: The corresponding
HEVCLocalContext. This makes it possible to pass the HEVCContext
everywhere where logically a HEVCLocalContext should be used.
This commit stops doing this for lavc/hevc_cabac.c; it also constifies
everything that is possible in order to ensure that no slice thread
accidentally modifies the main HEVCContext state.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The HEVC decoder has both HEVCContext and HEVCLocalContext
structures. The latter is supposed to be the structure
containing the per-slicethread state.
Yet that is not how it is handled in practice: Each HEVCLocalContext
has a unique HEVCContext allocated for it and each of these
coincides with the main HEVCContext except in exactly one field:
The corresponding HEVCLocalContext.
This makes it possible to pass the HEVCContext everywhere where
logically a HEVCLocalContext should be used.
This led to confusion in the first version of what eventually became
commit c8bc0f66a8:
Before said commit, the initialization of the Rice parameter derivation
state was incorrect; the fix for single-threaded as well as
frame-threaded decoding was to add backup stats to HEVCContext
that are used when the cabac state is updated*, see
https://ffmpeg.org/pipermail/ffmpeg-devel/2020-August/268861.html
Yet due to what has been said above, this does not work for
slice-threading, because the each HEVCLocalContext has its own
HEVCContext, so the Rice parameter state would not be transferred
between threads.
This is fixed in c8bc0f66a8
by a hack: It rederives what the previous thread was and accesses
the corresponding HEVCContext.
Fix this by treating the Rice parameter state the same way
the ordinary CABAC parameters are shared between threads:
Make them part of the same struct that is shared between
slice threads. This does not cause races, because
the parts of the code that access these Rice parameters
are a subset of the parts of code that access the CABAC parameters.
*: And if the persistent_rice_adaptation_enabled_flag is set.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The HEVC decoder has both HEVCContext and HEVCLocalContext
structures. The latter is supposed to be the structure
containing the per-slicethread state.
Yet that is not how it is handled in practice: Each HEVCLocalContext
has a unique HEVCContext allocated for it and each of these
coincides with the main HEVCContext except in exactly one field:
The corresponding HEVCLocalContext.
This makes it possible to pass the HEVCContext everywhere where
logically a HEVCLocalContext should be used.
This commit stops doing this for lavc/hevc_filter.c; it also constifies
everything that is possible in order to ensure that no slice thread
accidentally modifies the main HEVCContext state.
There are places where this was not possible, namely with the SAOParams
in sao_filter_CTB() or with sao_pixels_buffer_h in copy_CTB_to_hv().
Both of these instances lead to data races, see
https://fate.ffmpeg.org/report.cgi?time=20220629145651&slot=x86_64-archlinux-gcc-tsan-slices
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The HEVC decoder has both HEVCContext and HEVCLocalContext
structures. The latter is supposed to be the structure
containing the per-slicethread state.
Yet that is not how it is handled in practice: Each HEVCLocalContext
has a unique HEVCContext allocated for it and each of these
coincides except in exactly one field: The corresponding
HEVCLocalContext. This makes it possible to pass the HEVCContext
everywhere where logically a HEVCLocalContext should be used.
This commit stops doing this for lavc/hevc_mvs.c; it also constifies
everything that is possible in order to ensure that no slice thread
accidentally modifies the main HEVCContext state.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is safe for a slice thread to read the main context
and therefore it is safe to add a pointer to const HEVCContext
(namely the parent context) to each HEVCLocalContext.
It is also safe (and actually redundant) to add a pointer
to a logcontext to HEVCLocalContext.
Doing so allows to pass the HEVCLocalContext as context in
the parts of the code that is run slice-threaded when slice-threading
is in use (currently these parts of the code use ordinary
HEVCContext*). This way one is not tempted to modify
the main context from the slice contexts.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The slicethread contexts need to be initialized for
every frame, not only the first one, so one can
remove the initialization when allocating these contexts,
because the ordinary per-frame initialization will
initialize them again just a few lines below.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Otherwise, there is no guarantee that the various av_log-messages
are not interrupted by another log statement. The latter may originate
from anywhere else, even the HEVC decoder itself, as happens when
one uses frame-threading to decode the BUMPING_A_ericsson_1.bit
sample from the FATE-suite.
Furthermore, the earlier approach suffered from the fact that
various parts of the logmsg were output with different loglevels
and that checking stopped after having encountered the first
plane with MD5 mismatch, although it is probably interesting to
know whether other planes are incorrect, too.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Dump all input/output names to OVModel struct. In case other funcs use
them for reporting errors or locating issues.
Signed-off-by: Ting Fu <ting.fu@intel.com>
encode_block() in svq1enc.c looks like the following:
static int encode_block(int block[7][256], int level)
{
int best_score = 0;
for (unsigned x = 0; x < level; x++) {
int v = block[1][x];
block[level][x] = 0;
best_score += v * v;
}
if (level > 0 && best_score > 64) {
int score = 0;
score += encode_block(block, level - 1);
score += encode_block(block, level - 1);
if (score < best_score) {
best_score = score;
}
}
return best_score;
}
When called from outside of encode_block(), it is always called with
level == 5.
This triggers a bug [1] in GCC: On -O3, it creates eight clones of
encode_block with different values of level inlined into it. The clones
with negative values are of course useless*, but they also lead to
-Warray-bounds warnings, because they access block[-1].
This has been mitigated in GCC 12: It no longer creates clones
for parameters that it knows are impossible. Somehow switching levels
to unsigned makes GCC know this. Therefore this commit does this.
(For GCC 11, this changes the warning to "array subscript 4294967295 is
above array bounds" from "array subscript -1 is below array bounds".)
[1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102513
*: These clones can actually be discarded when compiling with
-ffunction-sections.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
filter_mb_mbaff_edgev() and filter_mb_mbaff_edgecv()
have a function parameter whose expected size depends upon
another parameter: It is 2 * bsi + 1 (with bsi always being 1 or 2).
This array is declared as const int16_t[7], yet some of the callers
with bsi == 1 call it with only an const int16_t[4] available.
This leads to -Wstringop-overread warnings from GCC 12.1.
This commit fixes these by replacing [7] with [/* 2 * bsi + 1 */],
so that the expected range and its dependence on bsi is immediately
visible.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
check_block_inter() currently does this when calling check_block().
This leads to a -Wstringop-overflow= warning when compiling with
GCC 12.1.
Given that the main part of the body of check_block() consists
of an "if (intra) { ... } else { ... }" which is true iff
check_block() is not called from check_block_inter(),
it makes sense to fix this by just inlining check_block()
check_block_inter() and turning check_block() into a new
check_block_intra() (with the inter parts removed, of course).
This should also not make much of a difference for the generated code
given that both check_block() as well as check_block_inter()
are already marked as av_always_inline, so this commit follows
this route to fix the issue.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
multiswap_step() and multiswap_inv_step() both only require
six keys; in all current callers, these keys are part of
an array of twelve keys, yet in some of these callers the keys
given to these functions point to the second half of these
twelve keys, so that only six keys are available to these functions.
This led to -Wstringop-overread warnings when compiling with GCC 12.1.
Fix these by adapting the declaration of these functions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Using tail calls with functions returning void is forbidden
(C99/C11 6.8.6.4: "A return statement with an expression shall not appear
in a function whose return type is void.") GCC emits a warning
because of this when using -pedantic: "ISO C forbids ‘return’ with
expression, in function returning void"
Reviewed-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It retrieves the muxer's internal timestamp with under-defined
semantics. Continuing to use this value would also require
synchronization once the muxer is moved to a separate thread.
Replace the value with last_mux_dts.
This field means different things when the video is encoded (number of
frames emitted to the encoding sync queue/encoder by the video sync
code) or copied (number of packets sent to the muxer sync queue).
Print the value of packets_written instead, which means the same thing
in both cases. It is also more accurate, since packets may be dropped by
the sync queue or bitstream filters.
Same issues apply to it as to -shortest.
Changes the results of the following tests:
- matroska-flac-extradata-update
The test reencodes two input FLAC streams into three output FLAC
streams. The last output stream is limited to 8 frames. The current
code results in the first two output streams having 12 frames, after
this commit all three streams have 8 frames and are the same length.
This new result is better, since it is predictable.
- mkv-1242
The test streamcopies one video and one audio stream, video is limited
to 11 frames. The new result shortens the audio stream so that it is
not longer than the video.
The -shortest option (which finishes the output file at the time the
shortest stream ends) is currently implemented by faking the -t option
when an output stream ends. This approach is fragile, since it depends
on the frames/packets being processed in a specific order. E.g. there
are currently some situations in which the output file length will
depend unpredictably on unrelated factors like encoder delay. More
importantly, the present work aiming at splitting various ffmpeg
components into different threads will make this approach completely
unworkable, since the frames/packets will arrive in effectively random
order.
This commit introduces a "sync queue", which is essentially a collection
of FIFOs, one per stream. Frames/packets are submitted to these FIFOs
and are then released for further processing (encoding or muxing) when
it is ensured that the frame in question will not cause its stream to
get ahead of the other streams (the logic is similar to libavformat's
interleaving queue).
These sync queues are then used for encoding and/or muxing when the
-shortest option is specified.
A new option – -shortest_buf_duration – controls the maximum number of
queued packets, to avoid runaway memory usage.
This commit changes the results of the following tests:
- copy-shortest[12]: the last audio frame is now gone. This is
correct, since it actually outlasts the last video frame.
- shortest-sub: the video packets following the last subtitle packet are
now gone. This is also correct.
The following commits will add a new buffering stage after bitstream
filters, which should not be taken into account for choosing next
output.
OutputStream.last_mux_dts is also used by the muxing code to make up
missing DTS values - that field is now moved to the muxer-private
MuxStream object.
The current placement of this free is historical - it used to be
followed by avcodec_close(), since removed.
The proper place for freeing the stats is currently right before the
encoder context itself is freed.
It is currently called from two places:
- output_packet() in ffmpeg.c, which submits the newly available output
packet to the muxer
- from of_check_init() in ffmpeg_mux.c after the header has been
written, to flush the muxing queue
Some packets will thus be processed by this function twice, so it
requires an extra parameter to indicate the place it is called from and
avoid modifying some state twice.
This is fragile and hard to follow, so split this function into two.
Also rename of_write_packet() to of_submit_packet() to better reflect
its new purpose.
The muxing queue currently lives in OutputStream, which is a very large
struct storing the state for both encoding and muxing. The muxing queue
is only used by the code in ffmpeg_mux, so it makes sense to restrict it
to that file.
This makes the first step towards reducing the scope of OutputStream.
Figure out earlier whether the output stream/file should be bitexact and
store this information in a flag in OutputFile/OutputStream.
Stop accessing the muxer in set_encoder_id(), which will become
forbidden in future commits.
The current code postpones closing the files until after printing the
final report, which accesses the output file size. Deal with this by
storing the final file size before closing the file.
Move the file size checking code to ffmpeg_mux. Use the recently
introduced of_filesize(), making this code consistent with the size
shown by print_report().
Move header_written into it, which is not (and should not be) used by
any code outside of ffmpeg_mux.
In the future this context will contain more muxer-private state that
should not be visible to other code.
The amount of padding samples reported by containers take into account the
extended samplerate in HE-AAC.
Fixes ticket #9671.
Signed-off-by: James Almer <jamrial@gmail.com>
This issue would cause segmetaion fault when running srcnn model with
sr filter by TensorFlow backend. This filter would free the frame incorectly.
Signed-off-by: Ting Fu <ting.fu@intel.com>
Using parameter from AVCodecContext to reset qsv codec is more suitable
for MFXVideoENCODE_Reset()'s usage. Per-frame metadata is more suitable
for the usage of mfxEncodeCtrl being passed to
MFXVideoENCODE_EncodeFrameAsync(). Now change it to use the value
from AVCodecContext.
Because q->param is passed to both "in" and "out" parameters when call
MFXVideoENCODE_Query(), the value in q->param may be changed. New
variables are added to store old configuration, so that we can detect
real parameter change.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Dividing one line log into several av_log() call is not thread safe. Now
merge these strings into one av_log() call.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
The only duration field currently present in AVFrame is pkt_duration,
which is semantically restricted to those frames that are output by
decoders.
Add a new field that stores the frame's duration without regard for how
that frame was produced. Deprecate pkt_duration.
It need not be writable; in fact, it is often not writable even if
the packet sent to the decoder was writable, because the generic code
calls av_packet_ref() on it. It is never writable if a user
drains the decoder after every packet, because in this case the decode
callback is called from avcodec_send_packet().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
wrapped_avframe_decode() uses an AVFrame as dst in av_frame_move_ref()
after having called ff_decode_frame_props() to attach side-date
to this very frame. This leaks all the side-data and metadata
that ff_decode_frame_props() has attached.
This happens in various fate-filter-metadata tests since
6ca43a9675.
These particular leaks (which affect metadata-only)
could be fixed by not adding metadata side-data to AVPackets
in libavdevice if they are also available from the AVFrames.
Yet this would break users that extract the metadata from
AVPackets.
The changes to FATE happen because of the way av_dict_set()
works when it overwrites an already existing entry:
It overwrites the entry to be overwritten with the last entry
and adds the new entry at the end. The end result is that
the first entry of the dict is the second-to-last-entry of
the original dict, the last entry of the dict is the last
entry of the old dict and the first count - 2 entries
of the original dict are at positions 1..count - 2 in their
original order.
Reviewed-by: Timo Rothenpieler <timo@rothenpieler.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
and remove FF_CODEC_CAP_INIT_THREADSAFE
All our native codecs are already init-threadsafe
(only wrappers for external libraries and hwaccels
are typically not marked as init-threadsafe yet),
so it is only natural for this to also be the default state.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is in preparation of switching the default init-thread-safety
to a codec being init-thread-safe.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Streams with all zero sample_delta in 'stts' have all zero dts.
They have higher chance be chose by mov_find_next_sample(), which
leads to seek again and again.
For example, GoPro created a 'GoPro SOS' stream:
Stream #0:4[0x5](eng): Data: none (fdsc / 0x63736466), 13 kb/s (default)
Metadata:
creation_time : 2022-06-21T08:49:19.000000Z
handler_name : GoPro SOS
With 'ffprobe -show_frames http://example.com/gopro.mp4', ffprobe
blocks until all samples in 'GoPro SOS' stream are consumed first.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
This avoids an extra copy of potentially quite big video frames.
Instead of copying the entire frames data into a rawvideo packet it
packs the frame into a wrapped avframe packet and passes it through
as-is.
Unfortunately, wrapped avframes are set up to be video frames, so the
audio frames continue to be copied.
Additionally, this enabled passing through video frames that previously
were impossible to process, like hardware frames or other special
formats that couldn't be packed into a rawvideo packet.
According to API docs avdevice_list_devices(), avdevice_list_input_sources()
and avdevice_list_input_sinks() should return the number of autodetected
devices on success. This is redundant with AVDeviceInfoList->nb_devices so it
was not noticed earlier that none of the underlying device list functions work
like that.
Let's fix it in generic code to make it in line with the API docs.
Fixes ticket #9820.
Signed-off-by: Marton Balint <cus@passwd.hu>
This patch prevents the libjxl encoder wrapper from failing to
encode images when the input video has untagged primaries. It will
instead assume BT.709/sRGB primaries and print a warning.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
The max height is currently documented as 16; the max difference per
pixel is 255, and a .8h element can easily contain 16*255, thus keep
accumulating in two .8h vectors, and just do the final accumulationat the
end. This should work for heights up to 256.
This requires a minor register renumbering in ff_pix_abs16_xy2_neon.
Before: Cortex A53 A72 A73 Graviton 3
pix_abs_0_0_neon: 97.7 47.0 37.5 22.7
pix_abs_0_1_neon: 154.0 59.0 52.0 25.0
pix_abs_0_3_neon: 179.7 96.7 87.5 41.2
After:
pix_abs_0_0_neon: 96.0 39.2 31.2 22.0
pix_abs_0_1_neon: 150.7 59.7 46.2 23.7
pix_abs_0_3_neon: 175.7 83.7 81.7 38.2
Signed-off-by: Martin Storsjö <martin@martin.st>
Using absolute-difference-accumulate does use twice the amount of
absolute-difference instructions, but avoids the need for the
uaddl and add instructions, reducing the total number of instructions
by 3.
These can be interleaved in the rest of the calculation, to avoid
tight dependencies at the end. Unfortunately, this is marginally
slower on Cortex A53, but faster on A72 and A73.
Before: Cortex A53 A72 A73 Graviton 3
pix_abs_0_3_neon: 175.7 109.2 92.0 41.2
After:
pix_abs_0_3_neon: 179.7 96.7 87.5 41.2
Signed-off-by: Martin Storsjö <martin@martin.st>
This is a per-file input option that adjusts an input's timestamps
with reference to another input, so that emitted packet timestamps
account for the difference between the start times of the two inputs.
Typical use case is to sync two or more live inputs such as from capture
devices. Both the target and reference input source timestamps should be
based on the same clock source.
If either input lacks starting timestamps, then no sync adjustment is made.
Provide neon implementation for pix_abs16_x2 function.
Performance tests of implementation are below.
- pix_abs_0_1_c: 283.5
- pix_abs_0_1_neon: 39.0
Benchmarks and tests run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
av_log(NULL, AV_LOG_WARNING, "Multiple %s options specified for stream %d, only the last option '-%s%s%s "SPECIFIER_OPT_FMT_##type"' will be used.\n",\
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.