7 results
TubeDETR: Spatio-Temporal Video Grounding with Transformers
CVPR 2022 - IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun 2022, New Orleans, United States
Conference paper
hal-03625586v2
Learning Visual Language Models for Video Understanding
Computer Vision and Pattern Recognition [cs.CV]. Ecole Normale Superieure de Paris - ENS Paris, 2023. English. ⟨NNT : ⟩
Thesis
tel-04307117v2
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
CVPR 2023 - IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun 2023, Vancouver, Canada
Conference paper
hal-04039246v1
VidChapters-7M: Video Chapters at Scale
NeurIPS 2023 - Conference on Neural Information Processing Systems - Track on Datasets and Benchmarks, Dec 2023, New Orleans (LA), United States
Conference paper
hal-04217697v1
Just Ask: Learning to Answer Questions from Millions of Narrated Videos
ICCV 2021 - IEEE International Conference on Computer Vision, Oct 2021, Montréal, Canada
Conference paper
hal-03328749v1
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
NeurIPS 2022 - 36th Conference on Neural Information Processing Systems, Nov 2022, New Orleans, United States
Conference paper
hal-03807016v2
Learning to Answer Visual Questions from Web Videos
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, ⟨10.1109/tpami.2022.3173208⟩
Journal article
hal-03664182v1