SWGDE

published documents

SWGDE Core Technical Concepts for Time-Based Analysis of Digital Video Files

19-v-002

Disclaimer:

As a condition to the use of this document and the information contained therein, the SWGDE requests notification by e-mail before or contemporaneous to the introduction of this document, or any portion thereof, as a marked exhibit offered for or moved into evidence in any judicial, administrative, legislative or adjudicatory hearing or other proceeding (including discovery proceedings) in the United States or any Foreign country. Such notification shall include: 1) the formal name of the proceeding, including docket number or similar identifier; 2) the name and location of the body conducting the hearing or proceeding; 3) subsequent to the use of this document in a formal proceeding please notify SWGDE as to its use and outcome; 4) the name, mailing address (if available) and contact information of the party offering or moving the document into evidence. Notifications should be sent to secretary@swgde.org.

It is the reader’s responsibility to ensure they have the most current version of this document. It is recommended that previous versions be archived.

Redistribution Policy:

SWGDE grants permission for redistribution and use of all publicly posted documents created by SWGDE, provided that the following conditions are met:

  1. Redistribution of documents or parts of documents must retain the SWGDE cover page containing the disclaimer.
  2. Neither the name of SWGDE nor the names of contributors may be used to endorse or promote products derived from its documents.
  3. Any reference or quote from a SWGDE document must include the version number (or create date) of the document and mention if the document is in a draft status.

Requests for Modification:

SWGDE encourages stakeholder participation in the preparation of documents. Suggestions for modifications are welcome and must be forwarded to the Secretary in writing at secretary@swgde.org. The following information is required as a part of the response:

  1. Submitter’s name
  2. Affiliation (agency/organization)
  3. Address
  4. Telephone number and email address
  5. Document title and version number
  6. Change from (note document section number)
  7. Change to (provide suggested text where appropriate; comments not including suggested text will not be considered)
  8. Basis for change

Intellectual Property:

Unauthorized use of the SWGDE logo or documents without written permission from SWGDE is a violation of our intellectual property rights.

Individuals may not misstate or overrepresent duties and responsibilities of SWGDE work. This includes claiming oneself as a contributing member without actively participating in SWGDE meetings; claiming oneself as an officer of SWGDE without serving as such; claiming sole authorship of a document; or using the SWGDE logo on any material or curriculum vitae.

Any mention of specific products within SWGDE documents is for informational purposes only; it does not imply a recommendation or endorsement by SWGDE.


1. Purpose

The purpose of this document is to provide core technical concepts for time-based analysis of digital video files.

2. Scope

This document provides core technical concepts for understanding frame rate and frame timing information for video in multimedia file formats. The intended audience is individuals performing or utilizing time-based video analysis in investigative or legal settings, which may include determining the speed, duration, or timing of persons or objects in video. This document's audience should have an understanding of forensic video analysis.

This is a foundational document that provides technical core concepts that are used in SWGDE Best Practice for Frame Timing Analysis of H.264 Video Stored in ISO Base Media File Formats [1].

3. Limitations

Due to the wide variety of proprietary digital video recording devices and file formats, a singular approach to determining frame rate and timing cannot be applied to all video files.

This document does not address the process by which images are captured, sampled, and/or encoded; it focuses on the interpretation of data once it has been encoded into a binary format.

This document does not address the forensic use of photogrammetry. See SWGDE Best Practices for the Forensic Use of Photogrammetry for more information [2].

4. Introduction

Video is the electronic representation of a sequence of images, depicting either stationary or moving scenes. Central to video are the concepts of time and duration. In digital video files, time refers to the specific location of a frame within the timeline of the video file. Duration refers to the length of time an image is displayed. Determining the frame timing within a video file has several applications and may be particularly helpful in determining the value of an unknown variable, such as speed or duration, during an event of interest.

Digital video containers and encoding formats define methods to encode timing information within binary streams or packages. Proper decoding of timing information is critical for the ability of software to provide accurate playback of digital video.

Commercial or open source tools are available to aid in the determination of speed, duration, and timing of events captured on video for both investigative and forensic examinations in civil and criminal litigation. For example, a frame information report can be generated with FFmpeg (See SWGDE Technical Notes on FFmpeg, Section 11.3) [3].

Note: Different video formats will display varying frame information depending on their encoding.

5. Core Technical Concepts

Timing information in a multimedia video file is not stored directly in a single location; instead, it must be decoded and calculated from a number of data elements stored throughout the file. The following concepts should be understood prior to conducting any frame timing analysis of video files.

5.1 Time Base

Modern multimedia containers manage the timing for presenting video frames by using a timestamp for each frame, instead of relying solely on an expression of frame rate, which is an average across all frames in the video stream. For example, if the container relied solely on a number such as 25 frames per second (FPS), which implies that a new frame is presented on screen every 0.04 seconds on the timeline, there would be no mechanism to provide more precision for videos where the frame rate is not constant throughout the stream. In order to provide the desired precision, modern multimedia containers introduced a time base value (i.e., a unit of time that represents one tick, or one part, of a second). A time base of 30,000 means that one tick represents 1/30,000th of a second. The term timescale¹ refers to the reciprocal of the time base.²

For example, a video stream may have a time base of 30,000 parts per second. Duration for the video stream and individual frames will be encoded as units of this time base value. In order to map a duration to seconds, the time base is used. In this example, if the duration of the stream is encoded as 600,600 time base units, dividing this duration value by the time base value yields the duration of the stream in seconds.

Time base (T)
Duration of the Stream in Time Base units (DSTB)
Duration of the Stream in seconds (DSS)

DSS = DSTB ÷ T
DSS = 600,600 ÷ 30,000
DSS = 20.02 seconds
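
To make the arithmetic concrete, the following minimal Python sketch performs the same conversion. The function name and values are illustrative only and mirror the worked example above.

```python
from fractions import Fraction

def stream_duration_seconds(duration_tb_units: int, time_base: int) -> Fraction:
    """Convert a stream duration from time base units to seconds."""
    # Fraction avoids floating-point rounding in the intermediate result.
    return Fraction(duration_tb_units, time_base)

# Example from Section 5.1: a 600,600-unit duration at a time base of 30,000.
duration = stream_duration_seconds(600_600, 30_000)
print(float(duration))  # 20.02 (seconds)
```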

5.2 Frame Duration

Each frame is displayed on screen for a specific amount of time, known as the frame's duration. The duration of a frame is often encoded in the time base units of the media stream. Video streams with a constant frame rate contain frames that share a common duration; streams with variable frame rates contain frames with differing durations.

As an example of constant frame duration, a video stream might have a time base of 30,000 parts per second. In this stream, frame 1 has an encoded frame duration of 1,001 time base units, frame 2 has an encoded frame duration of 1,001 time base units, and frame 3 has an encoded frame duration of 1,001 time base units.

Duration of Frame in Time Base units (DFTB)
Duration of Frame in seconds (DFS)
Time base (T)
Frame number (n)

DFS(n) = DFTB ÷ T
DFS(1) = 1,001 ÷ 30,000 = 0.03336667
DFS(2) = 1,001 ÷ 30,000 = 0.03336667
DFS(3) = 1,001 ÷ 30,000 = 0.03336667

As an example of variable frame duration, a video stream might have a time base of 90,000 parts per second. In this stream, frame 1 has an encoded frame duration of 6,001 time base units, frame 2 has an encoded frame duration of 5,999 time base units, and frame 3 has an encoded frame duration of 6,000 time base units.

DFS(n) = DFTB ÷ T
DFS(1) = 6,001 ÷ 90,000 = 0.06667778
DFS(2) = 5,999 ÷ 90,000 = 0.06665556
DFS(3) = 6,000 ÷ 90,000 = 0.06666667
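
The same arithmetic applies per frame. The sketch below (illustrative names; example values from this section) converts each frame's encoded duration to seconds for both the constant and variable cases.

```python
def frame_durations_seconds(durations_tb: list[int], time_base: int) -> list[float]:
    """Convert each frame's duration from time base units to seconds."""
    return [d / time_base for d in durations_tb]

# Constant frame duration: time base 30,000; every frame lasts 1,001 units.
print(frame_durations_seconds([1_001, 1_001, 1_001], 30_000))
# -> [0.03336666..., 0.03336666..., 0.03336666...]

# Variable frame duration: time base 90,000; durations differ per frame.
print(frame_durations_seconds([6_001, 5_999, 6_000], 90_000))
# -> [0.06667777..., 0.06665555..., 0.06666666...]
```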

5.3 Timecode

Timecode for a particular video frame is the elapsed time from the start point of the video stream to the point at which the given frame should be displayed on screen. Using the time base and the frame duration of each preceding frame in the sequence, the elapsed time is calculated to determine the point in time at which the given frame is to be displayed. Each frame's timecode can be calculated in this way. The initial start point of timecode within a video stream might not be 0; rather, it might be an arbitrary value.

For example, a video stream might have a time base of 30,000. In this stream, frame 1 has an encoded frame duration of 1,001, frame 2 has an encoded frame duration of 1,001, and frame 3 has an encoded frame duration of 1,001. We can determine the timecode for each frame with the following calculations, assuming in this case that the timecode starts at 0:

Timecode of Frame n in Time Base units (TCTB)
Duration of Frame in Time Base units (DFTB)

TCTB(n) = TCTB(n − 1) + DFTB(n − 1)
TCTB(1) = 0
TCTB(2) = 0 + 1,001 = 1,001
TCTB(3) = 1,001 + 1,001 = 2,002

In most implementations, timecode for a video frame will be calculated based on the time base of the specific video stream within the multimedia container.
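
Because each frame's timecode is the running sum of the durations of all preceding frames, the calculation lends itself to a short sketch. The following is illustrative only, using the example values above and a start value of 0.

```python
from itertools import accumulate

def frame_timecodes_tb(durations_tb: list[int], start: int = 0) -> list[int]:
    """Timecode of each frame, in time base units: the running sum of the
    durations of all preceding frames, offset by the stream's start value."""
    # A frame's timecode excludes its own duration, so the final total is dropped.
    return list(accumulate(durations_tb[:-1], initial=start))

# Example from Section 5.3: three frames of 1,001 units each, starting at 0.
print(frame_timecodes_tb([1_001, 1_001, 1_001]))  # [0, 1001, 2002]
```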

5.4 Presentation Time

Presentation time is the calculated point in time, in seconds, from the start of the video stream timeline at which a specific frame is to be displayed on the screen.

The timecode of each frame, in time base units, can be divided by the time base in order to identify the presentation time for the frame in seconds. For example, continuing the example from Section 5.3 with a time base of 30,000, presentation time can be calculated as follows:

Presentation Time of Frame n in seconds (PT)
Timecode of Frame n in Time Base units (TCTB)
Time base (T)

PT(n) = TCTB(n) ÷ T
PT(1) = 0 ÷ 30,000 = 0
PT(2) = 1,001 ÷ 30,000 = 0.03336667
PT(3) = 2,002 ÷ 30,000 = 0.06673333

Identifying presentation time is important because the order and timing in which frames are presented is not necessarily the same as the way the frames are stored within the multimedia container. In order to present a frame on screen in the correct playback sequence, a video decoder must know when to construct the image for on-screen display.
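
A minimal sketch of this conversion, continuing the example values from Section 5.3 (the function name is illustrative):

```python
def presentation_times_seconds(timecodes_tb: list[int], time_base: int) -> list[float]:
    """Divide each frame's timecode (in time base units) by the time base."""
    return [tc / time_base for tc in timecodes_tb]

# Timecodes from Section 5.3 with a time base of 30,000.
print(presentation_times_seconds([0, 1_001, 2_002], 30_000))
# -> [0.0, 0.03336666..., 0.06673333...]
```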

5.5 Decode Time

Decode time is the time at which a video frame is meant to be decoded by the playback engine. For uncompressed video, as well as for many compressed video implementations, decode time and presentation time will be the same for a given frame. This is also often the case with streaming video applications. For example, MPEG-2 Part 1 transport streams used in broadcasting deliver video frames in a packetized stream that is processed in sequence of receipt. However, when certain types of compression are applied to video streams found in multimedia containers, the decode time might differ from the presentation time for a given video frame. In these cases, it might be necessary to decode a subsequent frame prior to display in order to have reference data available. This scenario will be found in video files containing bidirectional frames (B-frames), because a later reference frame, such as an intra-coded frame (I-frame) or predicted frame (P-frame), must be decoded before the B-frame can be composed for on-screen display.
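
As a simplified illustration of this reordering, the sketch below models a hypothetical group of frames carrying both decode and presentation timestamps; sorting the decoded frames by presentation time restores display order. The frame pattern and timestamp values are invented for illustration and do not represent any particular file format.

```python
# Hypothetical frames listed in decode order (the order stored in the container).
# Each tuple: (frame type, decode time, presentation time) in time base units.
# The P-frame the B-frames reference is decoded first, even though it is
# displayed after them.
decode_order = [
    ("I", 0, 0),
    ("P", 1_001, 3_003),
    ("B", 2_002, 1_001),
    ("B", 3_003, 2_002),
]

# A player buffers decoded frames and presents them in presentation-time order.
display_order = sorted(decode_order, key=lambda frame: frame[2])
print([f"{ftype}@{pt}" for ftype, _dts, pt in display_order])
# -> ['I@0', 'B@1001', 'B@2002', 'P@3003']
```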

5.6 Frame Rate

SWGDE Technical Overview of Digital Video Files defines frame rate as a measure of the video display rate in frames per second [4]. Frame rate can be calculated from the media stream's time base and the duration of each frame.

When all frames in a video stream share a common duration, the interval of time between frames will also be constant. When frames in a video stream have differing durations, the interval of time between frames will be variable. Because frame duration can be constant or variable, frame rate is typically expressed as an average rate of frames displayed on screen per second.

For example, a video stream may have a time base of 30,000 and the duration of each frame encoded as 1,001 time base units. The average frame rate in this case is obtained by dividing the time base by the duration of each frame. The frame rate can be calculated in frames per second (FPS) as follows:

Time base (T)
Average Duration of all Frames in the Stream (FD)
Average Frame Rate for the Stream (RF)

RF = T ÷ FD
RF = 30,000 ÷ 1,001 = 29.97003, or 29.97 FPS³
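
The same calculation in a short sketch (illustrative names; values from the example above):

```python
def average_frame_rate(time_base: int, average_frame_duration_tb: float) -> float:
    """Average frame rate in FPS: time base divided by average frame duration."""
    return time_base / average_frame_duration_tb

# Example from Section 5.6: time base 30,000; each frame lasts 1,001 units.
print(round(average_frame_rate(30_000, 1_001), 5))  # 29.97003
```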

6. Additional Considerations

6.1 Source

The source that encoded the digital video should be known and verified. Playback timing may be affected when digital files are transcoded. Absent the original video file, the effects of transcoding on a video file may not be readily apparent to the user.

6.2 Media Playback Software

The media playback software may have an effect on both playback timing and viewable information. For example, a file that plays back in one manner in the native player may play back at a different speed in another player, depending on the player's interpretation (or use) of the internal timing information discussed above. Furthermore, one player may show date, time, and frame identification information, while another does not. The display of timestamps is dependent on the player's ability to decode container information used by proprietary file formats.

6.3 Drop Frame Timecode

A common standard for expressing frame timing information is SMPTE⁴ timecode (SMPTE 12M), which expresses the frame timecode (discussed above) in HH:MM:SS form while adding a fourth segment that represents an individual frame number within a given second: HH:MM:SS:FF. When non-integer frame rates such as 29.97 are employed, attempting to express those values as valid SMPTE timecode will ultimately result in a disparity between the clock on the wall and the timecode clock. In order to correct for this disparity, drop-frame timecode was introduced. It should be noted that SMPTE drop-frame timecode is displayed using a “;” or a “.” rather than a “:” to delineate values.

Drop-frame timecode skips timecode values (without reducing the actual number of frames) so that the HH:MM:SS portion of the SMPTE timecode stays in step with the HH:MM:SS on a wall clock, as the sketch below illustrates. Examiners should understand that commercial non-linear editing applications may display time (and elapsed time) differently depending on the frame rate of the source video stream.
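
For illustration, the commonly published frame-count-to-drop-frame conversion can be sketched as follows. This is a sketch of the widely circulated algorithm (two timecode values are skipped at the start of every minute, except each tenth minute), not an excerpt from SMPTE 12M, and its output should be verified against a trusted tool before use in casework.

```python
def drop_frame_timecode(frame_number: int) -> str:
    """Convert a 0-based frame count to 29.97 FPS SMPTE drop-frame timecode."""
    drop = 2                                   # timecode values skipped per minute
    frames_per_minute = 30 * 60 - drop         # 1,798 frames in a drop-frame minute
    frames_per_10_minutes = frames_per_minute * 10 + drop  # 17,982

    # Add back the skipped timecode values so the labels land correctly.
    tens, rem = divmod(frame_number, frames_per_10_minutes)
    if rem > drop:
        frame_number += drop * 9 * tens + drop * ((rem - drop) // frames_per_minute)
    else:
        frame_number += drop * 9 * tens

    frames = frame_number % 30
    seconds = (frame_number // 30) % 60
    minutes = (frame_number // (30 * 60)) % 60
    hours = frame_number // (30 * 3600)
    # Drop-frame timecode conventionally uses ";" before the frame field.
    return f"{hours:02d}:{minutes:02d}:{seconds:02d};{frames:02d}"

print(drop_frame_timecode(1_799))   # 00:00:59;29
print(drop_frame_timecode(1_800))   # 00:01:00;02  (;00 and ;01 are skipped)
print(drop_frame_timecode(17_982))  # 00:10:00;00  (tenth minute: no skip)
```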

Failing to account for these variations can result in inaccurate or incomplete results. Individuals conducting time-based analysis should be familiar with these core concepts and incorporate them into their calculations.

7. References

[1] Scientific Working Group on Digital Evidence, “SWGDE Best Practice for Frame Timing Analysis of H.264 Video Stored in ISO Base Media File Formats,” 2019. [Online]. Available: https://www.swgde.org/documents/draft-released-for-comment/

[2] Scientific Working Group on Digital Evidence, “SWGDE Best Practices for the Forensic Use of Photogrammetry,” 2015. [Online]. Available: https://www.swgde.org/documents

[3] Scientific Working Group on Digital Evidence, “SWGDE Technical Notes on FFmpeg,” 2018. [Online]. Available: https://www.swgde.org/documents

[4] Scientific Working Group on Digital Evidence, “SWGDE Technical Overview of Digital Video Files,” 2017. [Online]. Available: https://www.swgde.org/documents

History

Revision  | Issue Date | Section | History
1.0 DRAFT | 2019-06-06 | All     | Initial draft created and voted by SWGDE for release as a Draft for Public Comment.
1.0 DRAFT | 2019-07-16 | All     | Formatting and technical edit performed for release as a Draft for Public Comment.
1.0       | 2019-09-19 |         | No changes following public comment period. SWGDE voted to publish as Approved.
1.0       | 2019-09-29 |         | Formatted for release as Approved version 1.0.
1.0       | 2020-09-17 |         | Voted for release as final publication.

¹ Time base is a value used to denominate timestamps. The scale used in the time base is often referred to as the timescale. For example, a video stream with a timescale of 30,000 has a time base of 1/30,000.

² Note that multiple media streams (e.g., audio, video) with varying time bases and durations may exist within a multimedia container. For example, body camera video may not have audio in the first 30 seconds of the video because of a pre-recording event (when the camera is activated, the video goes back 30 seconds but audio is not captured), which results in the audio duration being shorter than the video duration. When evaluating a multimedia container, one may find multiple time bases present in the metadata. It is important for analysts to know which time base refers to which media stream. The container will also store a master time base used for synchronization of all media time bases in the container, which allows for optimal playback.

³ Although this is a common example, values for parts-per-second and units of time base may vary.

⁴ Society of Motion Picture and Television Engineers

Version: 1.0 (September 17, 2020)