

Julien Abadji

Interested in software engineering for the generation and filtering of very large multilingual corpora.

Conference papers (1 document)

  • Julien Abadji, Pedro Javier Ortiz Suárez, Laurent Romary, Benoît Sagot. Ungoliant: An Optimized Pipeline for the Generation of a Very Large-Scale Multilingual Web Corpus. CMLC 2021 - 9th Workshop on Challenges in the Management of Large Corpora, Jul 2021, Limerick / Virtual, Ireland. ⟨10.14618/ids-pub-10468⟩. ⟨hal-03301590⟩

Preprints, Working Papers, ... (1 document)

  • Julien Abadji, Pedro Ortiz Suarez, Laurent Romary, Benoît Sagot. Towards a Cleaner Document-Oriented Multilingual Crawled Corpus. 2022. ⟨hal-03536361⟩