Richard Dufour


Professor in Computer science

Speech and Language processing

Computer Science Lab of Nantes (LS2N) - NLP/TALN team - Nantes University


Research topics

  • Natural language processing
  • Information extraction
  • Social network analysis
  • Speech recognition

Summary

My research began in automatic speech recognition (ASR), in particular the processing of spontaneous speech. This theme runs through the ANR EPAC project, where my work led me to take an interest in adapting ASR models. I also worked on information extraction from ASR output, both in the ANR EPAC project, for the detection of spontaneous speech, and in the ANR PERCOL project, for the recognition and correction of proper names.

My research topics then broadened to both written and spoken documents, centered on the problem of representing the words contained in documents so that they can be used for other tasks. This led me to work on document representations at a higher level than the single word, in particular in the ANR SuMACC project. Taking into account the temporal aspect of words and documents has also been part of my research, notably in the ANR ContNomina project.

Language processing issues with a strong interdisciplinary component have taken an increasingly important place in the research I have been involved in for several years (GaFes, TheVoice and RePoGa projects). In particular, they have raised original evaluation issues, requiring the setting up of experimental frameworks that often did not exist (I am the manager of the ANR DIETS project). We therefore proposed new working frameworks in which the human sciences dimension was an important issue, which also allowed us to offer original approaches, for example for the exploration of digital social networks or the study of the acted voice.

Academic and scientific responsibilities

  • Since January 2022: Co-Head of the NLP (Natural Language Processing) team with Florian Boudin.
  • April 2020-September 2021: Coordinator of the Language and Cognition scientific axis of the Carnot Cognition Institute. Co-coordination with Nuria Gala (50%).
  • January 2020-September 2021: Head of the Computer Science Master's degree in Digital Society Software Engineering (ILSEN) at Avignon University.
  • September 2012-September 2021: Head of Communication for the Center for Education and Research in Computer Science (CERI) at Avignon University.

Projects, partners and evaluation campaigns

Lead of a funded project

  • 2021-...: ANR DIETS: Automatic diagnosis of errors of end-to-end speech transcription systems from users perspective. Experts involved: Jane Wottawa - LIUM (Le Mans University), Arnaud Rey - LPC (Aix-Marseille University), Yannick Estève and Mickaël Rouvier - LIA (Avignon University).

Participation in funded projects

  • 2018-2021: ANR The Voice: Study of dubbing voices. Partners: IRCAM, Dubbing Brothers.
  • 2015-2018: ANR GaFes: Study of uses via data collected on the Internet and re-editorialization of content captured or produced by Internet users. Partners: Centre Norbert Elias, Syllabs, GECE.
  • 2013-2017: ANR ContNomina: Identification of multimedia concepts. Partners: Eurecom, Syllabs, Wikio.
  • 2013-2014: ANR SuMACC: Identification of multimedia concepts through collaboration patterns. Partners: Eurecom, Syllabs, Wikio.
  • 2012-2014: ANR PERCOL: Person identification in audiovisual streams. Partners: Orange Labs, LIF (Aix-Marseille University), LILF (University of Lille).
  • 2007-2010: ANR EPAC: Automatic transcription of spontaneous speech. Partners: LIUM (Le Mans University), IRIT (University of Toulouse), LI (University of Tours), LIA (University of Avignon).

Industrial collaborations

  • Orkis: Ph.D. thesis of Killian Janod.
  • Aday (formerly Européenne des Données (EDD)): Ph.D. thesis of Mohamed Bouaziz.
  • Zenidoc: Master in computer science of Yanis Labrak.

Evaluation campaigns participation

Scientific supervision

Past thesis students

  • Mathias Quillot (2018-2022). Title: A first step towards the characterization of the information conveyed by acted voices. Co-supervised with Jean-François Bonastre (50%)
  • Adrien Gresse (2015-2020). Title: The art of the voice: characterizing the vocal information in an artistic choice. Co-supervised with Vincent Labatut (30%) and Jean-François Bonastre (40 %)
  • Mohamed Bouaziz (2013-2017). Title: Recurrent neural networks for sequence classification in parallel audiovisual streams. Co-supervised with Mohamed Morchid (30%) and Georges Linarès (40 %)
  • Killian Janod (2013-2017). Title: The representation of documents by neural networks for the comprehension of spoken documents. Co-supervised with Mohamed Morchid (30%) and Georges Linarès (40 %)
  • Mohamed Morchid (2011-2014). Title: Robust representations of noisy documents in homogeneous spaces. Co-supervised with Georges Linarès (50 %)

Current thesis students

  • Léane Jourdan (2022-...). Title: Neural approaches for modeling and analyzing the argumentative structure of research articles. Co-supervised with Nicolas Hernandez (50 %) and Florian Boudin (30 %)
  • Yanis Labrak (2022-...). Title: Speech processing for medical domain. Co-supervised with Mickaël Rouvier (50 %)
  • Thibault Bañeras Roux (2021-...). Title: Automatic analysis of errors in automatic speech recognition systems from end-users reception. ANR DIETS Project. Co-supervised with Jane Wottawa (33%) and Mickaël Rouvier (33%)
  • Arthur Amalvy (2021-...). Title: Language processing and relationship modeling for the unified representation of narrative documents. Co-supervised with Vincent Labatut (50%)
  • Noé Cécillon (2019-...). Title: Combination of content and structure by representation learning: application to the analysis of textual documents. Co-supervised with Vincent Labatut (30%) and Georges Linarès (40%)

Master Interns

  • Rima Boubekeur - Master 2 (5 months - March to July 2022). Title: Automatic generation of hashtags for short text messages from Twitter. Co-supervised with Florian Boudin (50 %)
  • Quentin Raymondaud - Master 2 (2021-2022). Title: Explainability of deep neural networks in speech processing. Co-supervised with Mickaël Rouvier (50%)
  • Yanis Labrak - Master 1 and 2 (2020-2022). Title: Language processing for the analysis of medical reports. Work carried out within the framework of industrial collaboration with the company Zenidoc.
  • Louis Aracil - Master 1 (2020-2021). Title: Tools for the analysis of the e-reputation of hotels. Co-supervised with Yannick Estève (50%). Work carried out as part of the industrial collaboration with the company Aha Concepts at Home Abroad.
  • Noé Cécillon - Master 2 (6 months - February to August 2019). Title: Exploring characteristics of graph embeddings for the detection of abusive messages. Co-supervised with Vincent Labatut (50 %)
  • Adrien Gresse - Master 2 (6 months - February to August 2015). Title: Recommendation of movie music. Co-supervised with Georges Linarès (50 %)
  • Mathias Quillot - Master 2 (Alternating student 2015-2017). Title: Conception and realization of the observatory of festivals as part of the ANR project GaFes. Co-supervised with Georges Linarès (50 %)
  • Mathias Quillot - Licence 2 and 3 (3 months - 2014/2015). Title: Demonstrator for the ANR ContNomina project. Co-supervised with Georges Linarès (50 %)

Open source tools

  • Alert (by Noé Cécillon): a tool for the detection of abusive messages in online conversations using characteristics related to the content and the conversational graph.
  • POET (by Yanis Labrak): an extended POS tagging tool for French. Available demo.

Open corpus

Full professor at LS2N - NLP/TALN team - Nantes University (since 2021)

Associate professor at LIA - Avignon University (2012-2021)

From September 2012 to September 2021, I was an associate professor at the Computer Science Laboratory of Avignon (LIA) in France. My research interests included automatic speech recognition, natural language processing and information extraction. I was particularly interested in the automatic recognition of person names in speech transcriptions and their diachronic aspect, as well as in the problem of evaluating speech recognition system performance in an application context. I was also involved in various projects funded by the French National Research Agency (ANR) and in several evaluation campaigns.

Post-doctoral researcher at Orange Labs (2011-2012)

From June 2011 to June 2012, I was a post-doctoral researcher at Orange Labs in Lannion (France). I mainly worked on the detection, characterization and correction of speech recognition errors. The objective of this research was to find the error regions (i.e. consecutive errors) contained in automatic transcriptions, and then to categorize them in order to better understand the nature of the errors. These errors can be of various kinds: errors on person names, on proper names, or errors due to homophony.

Continuing this work on error detection, we proposed a solution to correct person name errors in automatic transcriptions. This work is directly linked to the REPERE challenge (défi REPERE) project. Errors on person names can have a direct impact, for example, on document indexing. We chose to correct these errors automatically using the error region detection described above, applying a correction approach at the phonetic level. Indeed, wrongly transcribed words can be phonetically very close to the person name that should have been recognized. Our solution compares the phonetic sequence of a targeted error region with the phonetic sequences of all person names contained in a dictionary. The person name with the closest phonetic sequence is chosen as the correction. For example, the person name Sébastien Chabal ("s ei b a s t i in ch a b a l") has the closest phonetic sequence to the error region "s ei b a t i in ch a r a d e" (c'est bah tiens charade).
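As a rough illustration of the phonetic comparison described above, here is a minimal Python sketch. It assumes that "closest phonetic sequence" is measured with a plain Levenshtein distance over phonemes; the actual distance used in this work is not specified here. The function names are hypothetical, and every dictionary entry other than Sébastien Chabal is an invented example.

```python
def phoneme_edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (lists of strings)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_name(error_region, name_dictionary):
    """Return the name whose phonetization is closest to the error region."""
    target = error_region.split()
    return min(
        name_dictionary,
        key=lambda name: phoneme_edit_distance(target, name_dictionary[name].split()),
    )


# Hypothetical dictionary: only the first entry comes from the text above,
# the other two phonetizations are made up for illustration.
names = {
    "Sébastien Chabal": "s ei b a s t i in ch a b a l",
    "Jean Dupont": "j an d u p on",
    "Marie Martin": "m a r i m a r t in",
}

region = "s ei b a t i in ch a r a d e"
print(closest_name(region, names))
```

A unit-cost edit distance treats all phoneme confusions as equally likely; a weighted distance based on acoustic similarity would be a natural refinement, but that goes beyond what the text states.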

Research intern at M*Modal (2010)

I had the opportunity to do a 4-month research internship (June to October 2010) at M*Modal in Pittsburgh (USA). The major activity of this company is to provide a perfect transcription of medical reports. 

I mainly worked on the automatic phonetization of the words contained in the dictionary of the company's multilingual ASR system. I proposed a strategy to estimate a confidence measure for each automatic phonetization proposed by the grapheme-to-phoneme tool. This confidence score is intended to guide the manual correction of the automatic phonetizations: human correctors focus first on the words with a very poor automatic phonetization (low confidence score). I also proposed a solution that automatically chooses the n-best automatic pronunciations of a word using audio documents.

Assistant professor at LIUM (2010-2011)

From October 2010 to June 2011, I was an assistant professor at the Computer Science Laboratory (LIUM) of the University of Le Mans (France). I mainly worked on applying the automatic spontaneous speech detection system proposed during my Ph.D. thesis to the characterization of multimedia documents, and particularly to the speaker role recognition problem. The initial study sought to highlight the link between speech spontaneity and the role of a speaker in a show. Our initial intuition was that, for example, a journalist tends to prepare his discourse, while an interviewee tends to produce less structured and less fluent talk (and so more spontaneous speech). For this study, we used a 100-hour radio broadcast corpus manually annotated with speaker roles and types of shows (a corpus built in the context of the EPAC project). I then applied the automatic type-of-speech detection system proposed during my Ph.D. thesis.

The second part of my work concerned the automatic recognition of speaker roles in radio broadcast shows using the features already extracted to detect spontaneous speech. We wanted to demonstrate that our type-of-speech detection system could provide an alternative to the speaker role recognition systems already proposed. The results showed that speaker role recognition was possible with this approach: 74.4% of the speakers were associated with their correct role.

Ph.D. in Computer science at LIUM (2007-2010)

Automatic transcription of spontaneous speech

Defended on 1 December 2010 at the University of Le Mans (France).
Thesis manuscript is available online (in French): https://tel.archives-ouvertes.fr/tel-00595465/document

Thesis committee:

President
Martine ADDA-DECKER (LPP/CNRS - Université de Paris 3)

Members
Guillaume GRAVIER (IRISA/CNRS - Université de Rennes 1)
Denis JOUVET (LORIA/INRIA - Université de Nancy)

Advisors
Paul DELÉGLISE (LIUM - Université du Maine)
Yannick ESTÈVE (LIUM - Université du Maine)

Thesis abstract

Automatic speech recognition (ASR) systems already reach a sufficient level of performance to be integrated in various applications (human-machine dialogue, information extraction, automatic indexing…). Nonetheless, in the context of large vocabulary continuous speech recognition (LVCSR), the transcription quality may vary depending on the type of speech used in the documents. Indeed, ASR systems perform well when dealing with prepared speech, close to read text, while they have much more difficulty transcribing spontaneous speech, which is characterized by various specificities (disfluencies, ungrammaticality, decreased speech fluency…).

The work of this thesis concerns the processing of spontaneous speech and was carried out within the EPAC project. The main objective is to propose ways to improve the performance of ASR systems on this type of speech. In our work, we chose to address spontaneous speech as a special object of study requiring specific treatments.

Thus, in a first step, we propose a tool to automatically detect spontaneous speech, based on the specificities of this type of speech. This system is important because it allows us, in a second step, to propose a semi-supervised approach for adapting the acoustic and language models of the ASR system to spontaneous speech. Transcriptions resulting from this adaptation offer recognition hypotheses different from those provided by the "classic" system. A significant reduction in word error rate has been observed using the combination of the two systems (classic and adapted).

The need for specific solutions finally oriented part of our work toward correcting a specifically linguistic problem: homophony. We seek to correct the transcripts provided by an ASR system, using a method offering specific solutions to specific homophony problems. The proposed method, applied as a post-processing step, corrects some homophone word errors, regardless of the ASR system used.
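The word error rate mentioned in this abstract is usually defined as the number of word substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The Python sketch below shows this standard definition only; it is not the evaluation tooling used in the thesis, and the function name and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, computed row by row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substitution (sat -> sit) and one deletion (the).
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.3f}")
```

Comparing the WER of the classic system, the adapted system and their combination on the same reference transcripts is the kind of measurement behind the reduction reported above.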

Habilitation to supervise research (French diploma) - HDR (2020)

Natural Language Processing: Studies and contributions at the frontiers of interdisciplinarity

HDR committee:

Jean-François BONASTRE (LIA - Avignon University)
Yannick ESTEVE (LIA - Avignon University)
Emmanuel ETHIS (INSEAC, CNAM - Guingamp)
Philippe LANGLAIS (DIRO - University of Montreal)
Georges LINARES (LIA - Avignon University)
Emmanuel MORIN (LS2N - University of Nantes)
Sophie ROSSET (LIMSI - University Paris-Saclay)
Pascale SEBILLOT (IRISA - University of Rennes 1)

Abstract

Natural language processing (NLP) is a vast field of research integrating many scientific themes (automatic speech recognition, automatic document indexing, machine translation, speech synthesis, etc.). This manuscript offers an overview of the various research works in which I have been able to participate in recent years, putting into perspective the evolution of my work, which has led me to work in collaboration with other scientific disciplines for the advancement of the NLP domain. The first part of the manuscript is devoted to one of the historical issues, namely the representation of written and spoken content. We then see, in the second part, some of the works we have carried out on performance and evaluation in language processing, ranging from the analysis and characterization of automatic speech recognition errors, to their correction. The third part shows the evolution of my research activities, which then turned towards interdisciplinary issues for language processing, with our work on the exploration of social networks for the analysis of events, the detection of abusive messages, and finally voice dubbing and voice recommendation. This last work has notably enabled collaborations with researchers in sociology, as well as in complex networks.



Conference papers (1 document)

  • Xavier Bost, Ilaria Brunetti, Luis Adrian Cabrera Diego, Jean-Valère Cossu, Andréa Carneiro Linhares, et al.. Systèmes du LIA à DEFT'13. DEFT2013, Jun 2013, Les Sables d'Olonne, France. ⟨hal-01313065⟩