Accéder directement au contenu

Julien Abadji

2
Documents
Identifiants chercheurs

Présentation

Interested in software engineering revolving around generation and filtering of huge multilingual corpora.

Publications

Image document

Towards a Cleaner Document-Oriented Multilingual Crawled Corpus

Julien Abadji , Pedro Ortiz Suarez , Laurent Romary , Benoît Sagot
Thirteenth Language Resources and Evaluation Conference - LREC 2022, Jun 2022, Marseille, France
Communication dans un congrès hal-03536361v1
Image document

Ungoliant: An Optimized Pipeline for the Generation of a Very Large-Scale Multilingual Web Corpus

Julien Abadji , Pedro Javier Ortiz Suárez , Laurent Romary , Benoît Sagot
CMLC 2021 - 9th Workshop on Challenges in the Management of Large Corpora, Jul 2021, Limerick / Virtual, Ireland. ⟨10.14618/ids-pub-10468⟩
Communication dans un congrès hal-03301590v1