Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Hongwei Xue*,
Tiankai Hang*,
Yanhong Zeng*,
Yuchong Sun*,
Bei Liu,
Huan Yang,
Jianlong Fu,
Baining Guo
June, 2022
Abstract
We collect a large-scale video-language dataset that is both the first high-resolution dataset of its kind, comprising 371.5k hours of 720p video, and the most diversified, covering 15 popular YouTube categories. We also propose HD-VILA, a novel High-resolution and Diversified VIdeo-LAnguage pre-training model, for many visual tasks.
Publication
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition