Benchmark for Anonymous Video Analytics - Dataset
The dataset was collected in settings that mimic the real-world signage-camera setups used for Anonymous Video Analytics (AVA). It comprises 16 videos recorded at different locations such as airports, malls, subway stations, and pedestrian areas. Outdoor videos were recorded at different times of day such as morning, afternoon, and evening. All videos were captured with fixed Internet Protocol (IP) or USB cameras fitted with wide or narrow lenses to mimic real-world use cases, at 1920x1080 resolution and 30 fps. Video durations range from 2 minutes 32 seconds to 6 minutes 26 seconds, totaling over 78 minutes and over 141,000 frames. The videos feature 34 professional actors.
A sample frame from each location is shown below. The mall location covers two settings, indoors (Mall-1/2) and outdoors (Mall-3/4), each with two videos recorded at different times.
Airport-1 | Airport-2 | Airport-3 | Airport-4 |
---|---|---|---|
Mall-1/2 | Mall-3/4 | Pedestrian-1 | Pedestrian-2 |
Pedestrian-3 | Pedestrian-4 | Pedestrian-5 | Subway-1 |
Subway-2 | Subway-3 | | |
Annotations
A professional team of annotators used the Computer Vision Annotation Tool (CVAT) to fully annotate all videos with the following attributes: the face and body of each person, identity, age, gender, attention, pose, orientation, and occlusions (figure below). Annotations are provided as XML files in CVAT format.
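As a rough illustration of reading such files, the sketch below parses a small in-memory sample with Python's standard `xml.etree.ElementTree`. It assumes the files follow CVAT's video export layout (`track` elements containing per-frame `box` elements with `xtl`, `ytl`, `xbr`, `ybr` coordinates). The label names `person` and `face` and the attribute name `gender` are hypothetical placeholders, so check them against the actual files before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in CVAT video-export style; labels and attribute
# names are placeholders, not taken from the dataset files.
SAMPLE_XML = """<annotations>
  <track id="0" label="person">
    <box frame="0" outside="0" occluded="0" xtl="100.0" ytl="50.0" xbr="180.0" ybr="300.0">
      <attribute name="gender">female</attribute>
    </box>
    <box frame="1" outside="0" occluded="1" xtl="102.0" ytl="50.0" xbr="182.0" ybr="300.0">
      <attribute name="gender">female</attribute>
    </box>
  </track>
  <track id="1" label="face">
    <box frame="0" outside="0" occluded="0" xtl="120.0" ytl="60.0" xbr="150.0" ybr="95.0"/>
  </track>
</annotations>"""


def read_boxes(xml_text):
    """Flatten every visible box of every track into a list of dicts."""
    root = ET.fromstring(xml_text)
    records = []
    for track in root.iter("track"):
        for box in track.iter("box"):
            # CVAT marks frames where the object has left the view with outside="1".
            if box.get("outside") == "1":
                continue
            records.append({
                "track_id": int(track.get("id")),
                "label": track.get("label"),
                "frame": int(box.get("frame")),
                "bbox": tuple(float(box.get(k)) for k in ("xtl", "ytl", "xbr", "ybr")),
                "occluded": box.get("occluded") == "1",
                "attributes": {a.get("name"): a.text for a in box.iter("attribute")},
            })
    return records


records = read_boxes(SAMPLE_XML)
for record in records:
    print(record["frame"], record["label"], record["bbox"])
```

To read an annotation file from disk instead, `ET.parse(path).getroot()` can replace the `ET.fromstring` call.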
To keep the analytics from focusing on very small people (those far from the signage), who are likely to have no OTS, and to simplify the annotation process, we define a region in some scenarios where people are omitted and therefore not annotated. We refer to these regions as ignore areas; they are shown with white shading in the sample frames. See the paper for further information.
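When evaluating a detector against these annotations, detections that fall inside an ignore area would typically be discarded before scoring. The sketch below shows one possible way to do that, assuming the ignore area is available as a polygon in pixel coordinates and that a box counts as ignored when its center lies inside the polygon. Both the polygon and the center-point rule are assumptions made for illustration; the criterion actually used by the benchmark is described in the paper.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def drop_ignored(boxes, ignore_polygon):
    """Keep only boxes (xtl, ytl, xbr, ybr) whose center is outside the polygon."""
    kept = []
    for xtl, ytl, xbr, ybr in boxes:
        cx, cy = (xtl + xbr) / 2, (ytl + ybr) / 2
        if not point_in_polygon(cx, cy, ignore_polygon):
            kept.append((xtl, ytl, xbr, ybr))
    return kept


# Hypothetical ignore area: the top strip of a 1920x1080 frame.
IGNORE_AREA = [(0, 0), (1920, 0), (1920, 300), (0, 300)]
boxes = [(900, 100, 930, 180), (800, 400, 1000, 1000)]
kept = drop_ignored(boxes, IGNORE_AREA)
print(kept)
```

Here the first box is small and centered in the ignored strip, so only the second box is kept.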
The annotations maintain the identity of each person throughout a video, even if the person exits and re-enters the field of view, and across all videos.
Each video contains between 11 and 158 unique people. In total, the dataset annotations cover 785 unique people and over 748,000 annotated bounding boxes of people.
Dataset summary:
Name | Daytime | Illuminance [lux] | Length [min:sec] | Frames | Unique people (all) | Unique people (OTS) | People boxes | People boxes per frame | Face boxes | Face boxes per frame |
---|---|---|---|---|---|---|---|---|---|---|
Airport-1 | - | 500 | 5:21 | 9629 | 37 | 29 | 22062 | 2.4 ± 1.0 | 12832 | 1.4 ± 1.1 |
Airport-2 | - | 500 | 5:34 | 10008 | 35 | 29 | 23600 | 2.7 ± 1.4 | 14214 | 1.6 ± 1.2 |
Airport-3 | - | 500 | 6:26 | 11578 | 47 | 44 | 26704 | 2.4 ± 1.2 | 17849 | 1.6 ± 1.0 |
Airport-4 | - | 500 | 5:08 | 9247 | 61 | 56 | 43685 | 4.7 ± 2.0 | 17792 | 1.9 ± 1.2 |
Mall-1 | - | 300 | 4:38 | 8344 | 158 | 111 | 106852 | 12.8 ± 2.3 | 45835 | 5.5 ± 1.8 |
Mall-2 | - | 300 | 3:41 | 6626 | 145 | 105 | 95417 | 14.4 ± 3.7 | 42779 | 6.5 ± 2.3 |
Mall-3 | - | 800 | 5:25 | 9740 | 33 | 30 | 37120 | 3.8 ± 1.5 | 18906 | 1.9 ± 1.2 |
Mall-4 | - | 800 | 6:04 | 10931 | 53 | 50 | 47113 | 4.3 ± 1.6 | 32038 | 2.9 ± 1.3 |
Pedestrian-1 | Afternoon | 60000 | 5:40 | 10202 | 18 | 17 | 39680 | 4.0 ± 1.7 | 19859 | 2.0 ± 1.4 |
Pedestrian-2 | Afternoon | 40000 | 6:15 | 11262 | 56 | 40 | 58477 | 5.2 ± 1.7 | 25042 | 2.2 ± 1.6 |
Pedestrian-3 | Midday-overcast | 7000 | 5:41 | 10220 | 27 | 25 | 22738 | 2.3 ± 1.2 | 13915 | 1.4 ± 1.0 |
Pedestrian-4 | Midday-shade | 5500 | 4:32 | 8166 | 27 | 25 | 33031 | 4.0 ± 1.4 | 16248 | 2.0 ± 1.0 |
Pedestrian-5 | Evening | 250 | 2:58 | 5350 | 11 | 11 | 24476 | 4.6 ± 1.6 | 13504 | 2.5 ± 1.8 |
Subway-1 | - | 180 | 3:13 | 5795 | 17 | 17 | 36828 | 6.5 ± 3.1 | 25884 | 4.6 ± 2.7 |
Subway-2 | - | 180 | 2:32 | 4549 | 29 | 28 | 45125 | 9.9 ± 2.8 | 24248 | 5.3 ± 2.4 |
Subway-3 | - | 200 | 5:45 | 10342 | 31 | 29 | 85358 | 8.5 ± 2.9 | 35460 | 3.6 ± 1.7 |
Overall | - | [180,60000] | 78:53 | 141989 | 785 | 646 | 748266 | 5.4 ± 3.9 | 376405 | 2.7 ± 2.1 |
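As a quick sanity check, the Overall row can be reproduced by summing the per-video rows. The short script below copies the length, frame, unique-people, and box counts from the table and recomputes the totals; it also checks that each frame count is within one second of what the stated 30 fps would give. The variable names are illustrative only.

```python
# Rows copied from the summary table:
# (name, length "min:sec", frames, unique people, people boxes, face boxes)
ROWS = [
    ("Airport-1", "5:21", 9629, 37, 22062, 12832),
    ("Airport-2", "5:34", 10008, 35, 23600, 14214),
    ("Airport-3", "6:26", 11578, 47, 26704, 17849),
    ("Airport-4", "5:08", 9247, 61, 43685, 17792),
    ("Mall-1", "4:38", 8344, 158, 106852, 45835),
    ("Mall-2", "3:41", 6626, 145, 95417, 42779),
    ("Mall-3", "5:25", 9740, 33, 37120, 18906),
    ("Mall-4", "6:04", 10931, 53, 47113, 32038),
    ("Pedestrian-1", "5:40", 10202, 18, 39680, 19859),
    ("Pedestrian-2", "6:15", 11262, 56, 58477, 25042),
    ("Pedestrian-3", "5:41", 10220, 27, 22738, 13915),
    ("Pedestrian-4", "4:32", 8166, 27, 33031, 16248),
    ("Pedestrian-5", "2:58", 5350, 11, 24476, 13504),
    ("Subway-1", "3:13", 5795, 17, 36828, 25884),
    ("Subway-2", "2:32", 4549, 29, 45125, 24248),
    ("Subway-3", "5:45", 10342, 31, 85358, 35460),
]
FPS = 30  # frame rate stated in the dataset description


def to_seconds(length):
    """Convert a "min:sec" string to seconds."""
    minutes, seconds = length.split(":")
    return int(minutes) * 60 + int(seconds)


total_seconds = sum(to_seconds(row[1]) for row in ROWS)
total_frames = sum(row[2] for row in ROWS)
total_people = sum(row[3] for row in ROWS)
total_people_boxes = sum(row[4] for row in ROWS)
total_face_boxes = sum(row[5] for row in ROWS)

# Largest gap between a listed frame count and length * 30 fps, in frames.
max_frame_gap = max(abs(row[2] - to_seconds(row[1]) * FPS) for row in ROWS)

print(f"Length: {total_seconds // 60}:{total_seconds % 60:02d}")  # 78:53
print(f"Frames: {total_frames}")                                  # 141989
print(f"Unique people: {total_people}")                           # 785
print(f"People boxes: {total_people_boxes}")                      # 748266
print(f"Face boxes: {total_face_boxes}")                          # 376405
```

Note that the overall unique-people count equals the sum of the per-video counts.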
Intel is committed to respecting human rights and avoiding complicity in human rights abuses. See Intel's Global Human Rights Principles. Intel's products and software are intended only to be used in applications that do not cause or contribute to a violation of an internationally recognized human right.