EMNLP 2017 Workshop on Speech-Centric Natural Language Processing (SCNLP)

2017 1st Workshop on Speech-Centric Natural Language Processing (SCNLP)

September 7, 2017, Copenhagen, Denmark

We are happy to introduce the 1st SCNLP workshop, which was held at EMNLP 2017!

SCNLP's goal is to unite the ASR and NLP communities to discuss new frameworks for exploiting the rich information present in the speech signal to improve the capabilities of natural language processing applications such as conversational agents, question-answering systems, machine translation, and search. SCNLP encourages novel contributions that revisit the conventional NLP problems with a focus on incorporating the richness of spoken language, as well as contributions that promote cross-fertilization between statistical methods for ASR and NLP.

We envision SCNLP as a platform to promote collaboration between the ASR and NLP communities and to seek ways to lower the barrier of entry for researchers interested in working in the exciting intersection between Speech and Natural Language Processing.

The inaugural workshop is hosted on NLP home territory. Future iterations of the workshop will alternate between Speech and NLP venues to encourage the cross-fertilization of ideas.

Proceedings

The proceedings of SCNLP 2017 may be downloaded [here].

For specific papers, please download them from the schedule below.

Workshop Schedule & Logistics

The workshop took place in the Kastrup Airport room in the "CPH Conference" facility. The CPH Conference was a part of DGI-byen, located right next to the conference venue. There was also a round-table session to discuss relevant issues in speech-centric NLP.

08:50 - 09:00 Opening Remarks
09:00 - 10:00 Invited Talk

Modelling turn-taking in spoken interaction
Gabriel Skantze
KTH Royal Institute of Technology in Stockholm

One of the most fundamental aspects of spoken dialogue is the organization of speaking between the participants. Since it is difficult to speak and listen at the same time, the interlocutors need to take turns speaking, and this turn-taking has to be coordinated somehow. This coordination is achieved using verbal and non-verbal signals, expressed in the face and voice, including syntax, prosody and gaze. Contrary to this, spoken dialogue systems typically use a very simplistic, silence-based model of turn-taking, which often results in interruptions or sluggish responses. In this talk, I will give an overview of several studies on how to model turn-taking in spoken interaction, with a special focus on multi-modal, human-robot interaction. These studies show that humans in interaction with a human-like robot make use of the same coordination signals typically found in studies on human-human interaction, and that it is possible to automatically detect and combine these cues to facilitate real-time coordination. The studies also show that humans react naturally to such signals when used by a robot, without being given any special instructions. Finally, I will present recent work on how Recurrent Neural Networks can be used to train a predictive, continuous model of turn-taking from human-human interaction data.

10:00 - 10:30 Session I

Functions of Silences towards Information Flow in Spoken Conversation
[PDF]
Shammur Absar Chowdhury, Evgeny Stepanov, Morena Danieli, Giuseppe Riccardi
University of Trento

10:30 - 11:00 Coffee Break
11:00 - 12:30 Session II

Encoding Word Confusion Networks with Recurrent Neural Networks for Dialog State Tracking
[PDF] [Slides]
Glorianna Jagfeld¹ and Ngoc Thang Vu²
¹Institute for Natural Language Processing, University of Stuttgart, ²University of Stuttgart
Analyzing Human and Machine Performance In Resolving Ambiguous Spoken Sentences
[PDF] [PDF]
Hussein Ghaly¹ and Michael Mandel²
¹City University of New York, ²Brooklyn College, CUNY
Parsing transcripts of speech
[PDF] [Slides] [Data]
Andrew Caines¹, Michael McCarthy², Paula Buttery¹
¹University of Cambridge, ²University of Nottingham
Enriching ASR Lattices with POS Tags for Dependency Parsing
[PDF] [Slides]
Moritz Stiefel¹ and Ngoc Thang Vu²
¹IMS, University of Stuttgart, ²University of Stuttgart

12:30 - 14:00 Lunch - Øksnehallen
14:00 - 15:30 Session III

End-to-End Information Extraction without Token-Level Supervision
[PDF] [Slides]
Rasmus Berg Palm¹, Dirk Hovy², Florian Laws³, Ole Winther¹
¹Technical University Denmark, ²Center for Language Technology, University of Copenhagen, ³University of Stuttgart
Spoken Term Discovery for Language Documentation using Translations
[PDF] [Slides]
Antonios Anastasopoulos¹, Sameer Bansal², David Chiang¹, Sharon Goldwater², Adam Lopez²
¹University of Notre Dame, ²University of Edinburgh
Amharic-English Speech Translation in Tourism Domain
[PDF] [Slides]
Michael Melese¹, Laurent Besacier², Million Meshesha¹
¹Addis Ababa University, Addis Ababa, Ethiopia, ²LIG Laboratory, UJF, BP53, 38041 Grenoble Cedex 9, France
Speech- and Text-driven Features for Automated Scoring of English Speaking Tasks
[PDF] [Slides]
Anastassia Loukina, Nitin Madnani, Aoife Cahill
Educational Testing Service

15:30 - 16:00 Coffee Break
16:00 - 16:25 Session IV

Improving coreference resolution with automatically predicted prosodic information
[PDF] [Slides]
Ina Roesiger, Sabrina Stehwien, Arndt Riester, Ngoc Thang Vu
University of Stuttgart

16:25 - 17:50 Round-table: Issues in Speech-centric NLP
17:50 - 18:00 Closing

Call for Papers

We invite submissions of both long and short papers on original and unpublished work. Similar to the main conference, submissions are limited to 8 pages. All accepted submissions will be presented as posters. Additionally, selected submissions will be presented orally.

Topics of interest include but are not limited to:

Joint ASR/NLP modeling using deep learning
Spoken query reformulation for Question/Answering systems
ASR error modeling and evaluation for NLP
Emotive Speech Synthesis for Spoken dialogue systems
Word-sense disambiguation for speech
Information extraction from speech transcripts
Domain adaptation (Adapting textual NLP training data to speech-centric tasks)
Spoken language translation
Rich speech transcription
Disfluency and uncertainty detection
NLP with ASR lattices/confusion networks
Speech segmentation for NLP
Discourse and Speech Processing

All submissions should conform to the EMNLP 2017 two-column format, using the provided LaTeX style files (they will be posted on the conference site). Authors are strongly discouraged from modifying the style files. Please do not use other templates (e.g., Word). Submissions that do not conform to the required styles, including paper size, margin width, and font size restrictions, will be rejected without review.

Our paper submission portal is currently open. Visit https://www.softconf.com/emnlp2017/scnlp to submit your paper.

Program Committee

We are excited to have a strong program committee consisting of research leaders spanning the Speech and NLP communities.

Workshop Organizers

Nicholas Ruiz (Interactions, USA)
Srinivas Bangalore (Interactions, USA)

Program Committee

Francisco Casacuberta (Universitat Politècnica de València)
Eric Fosler-Lussier (The Ohio State University, USA)
Dilek Hakkani-Tür (Google, USA)
Xiaodong He (Microsoft Research, USA)
Peter Heeman (Oregon Health & Science University, USA)
Julia Hirschberg (Columbia University, USA)
Preethi Jyothi (IIT Bombay, India)
Gakuto Kurata (IBM Research, Tokyo, Japan)
Lin-shan Lee (National Taiwan University)
Yang Liu (University of Texas at Dallas, USA)
Karen Livescu (Toyota Technological Institute at Chicago, USA)
Raymond Mooney (University of Texas at Austin, USA)
Satoshi Nakamura (Nara Institute of Science and Technology, Japan)
Mari Ostendorf (University of Washington, USA)
Giuseppe Riccardi (University of Trento, Italy)
Andrew Rosenberg (IBM T.J. Watson Research Center)
Isabel Trancoso (Laboratório de Sistemas de Língua Falada, Lisbon)
Jason Williams (Microsoft Research, USA)

Motivation

Language technologies have come of age and are playing an increasingly vital role in our everyday lives. From human-machine conversational technologies to text and speech analytics, we are routinely in contact with language technologies, with or without our knowledge. This progress is directly attributable to robust accuracy improvements in the automatic speech recognition (ASR) and natural language processing (NLP) communities. While both communities use data-driven techniques to achieve robustness, the opportunity to jointly optimize the robustness in a majority of speech-driven natural language processing systems is widely ignored; instead, speech-centric NLP tasks predominantly rely on a sequential application of independently optimized ASR and NLP tools.

While advancements in ASR have been demonstrated through significant reductions in word error rate evaluation scores for a variety of word transcription tasks, the standard ASR evaluation metric does not account for the varied uses of the transcriptions in downstream NLP tasks. Furthermore, the impact of rich para-lexical information latent in speech on downstream tasks has not received sufficient attention due to the disproportionate emphasis on word transcription in speech processing. Likewise, although NLP research has begun to address the problem of extra-grammatical and telegraphic texts in user-generated social media, the traditional focus of the field has been on well-edited written texts. As a result, the majority of speech-centric NLP systems do not exploit the weighted multi-string hypotheses typically produced by speech recognizers, but instead treat the problem as a simple ASR-NLP pipeline which transforms ASR outputs into text-like input, such as N-best word hypotheses, prior to processing with conventional NLP tools. Such approaches result in a suboptimal quality of output with potentially significant room for improvement by leveraging the rich information available from speech input.
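As a rough illustration of the contrast drawn above, the following minimal Python sketch compares a conventional pipeline that keeps only the top ASR hypothesis with one that weights every hypothesis in an N-best list by its ASR score. The function names, the toy keyword classifier, and the example scores are all hypothetical and chosen purely for illustration; they are not taken from any system presented at the workshop.

```python
from collections import defaultdict


def one_best_label(nbest, classify):
    """Conventional pipeline: classify only the top-scoring ASR hypothesis."""
    text, _ = max(nbest, key=lambda hyp: hyp[1])
    return classify(text)


def weighted_label(nbest, classify):
    """Sum normalized ASR scores per predicted label across all hypotheses."""
    total = sum(score for _, score in nbest)
    mass = defaultdict(float)
    for text, score in nbest:
        mass[classify(text)] += score / total
    return max(mass, key=mass.get)


def toy_classify(text):
    # Hypothetical keyword classifier standing in for any downstream NLP tool.
    return "book_flight" if "flight" in text.split() else "other"


# Hypothetical N-best list: (hypothesis text, ASR posterior-like score).
nbest = [
    ("book a fright to boston", 0.40),
    ("book a flight to boston", 0.35),
    ("book a flight to austin", 0.25),
]

print(one_best_label(nbest, toy_classify))
print(weighted_label(nbest, toy_classify))
```

In this toy case the top hypothesis contains a recognition error, so the one-best pipeline misses the intent, whereas the weighted variant recovers it because the remaining hypotheses jointly carry more score mass. Lattices and confusion networks generalize the same idea beyond a flat list.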

The purpose of this workshop is to unite the ASR and NLP communities to discuss new frameworks for exploiting the rich information present in the speech signal to improve the capabilities of natural language processing applications such as conversational agents, question-answering systems, machine translation, and search. In addition to acoustic environment information, the audio signal may contain speaker-specific features which may identify the emotional state, demographic information, and the presence of uncertainty in the speaker’s utterance: features which may influence the output of the NLP component. For example, a dialogue system may infer negative feedback from the consumer’s responses and switch to a different dialogue strategy to obtain the necessary information to carry out its task. We invite contributions that revisit the conventional NLP problems with a focus on incorporating the richness of spoken language, as well as contributions that promote cross-fertilization between statistical methods for ASR and NLP.

Anti-Harassment Policy

The open exchange of ideas, the freedom of thought and expression, and respectful scientific debate are central to the aims and goals of the ACL. These require a community and an environment that recognizes the inherent worth of every person and group, that fosters dignity, understanding, and mutual respect, and that embraces diversity. For these reasons, ACL is dedicated to providing a harassment-free experience for all the members, as well as participants at our events and in our programs.

Harassment and hostile behavior are unwelcome at any ACL conference, associated event, or in ACL-affiliated on-line discussions. This includes: speech or behavior that intimidates, creates discomfort, or interferes with a person's participation or opportunity for participation in a conference or an event. We aim for ACL-related activities to be an environment where harassment in any form does not happen, including but not limited to: harassment based on race, gender, religion, age, color, appearance, national origin, ancestry, disability, sexual orientation, or gender identity. Harassment includes degrading verbal comments, deliberate intimidation, stalking, harassing photography or recording, inappropriate physical contact, and unwelcome sexual attention. The policy is not intended to inhibit challenging scientific debate, but rather to promote it through ensuring that all are welcome to participate in a shared spirit of scientific inquiry.

It is the responsibility of the community as a whole to promote an inclusive and positive environment for our scholarly activities. In addition, anyone who experiences harassment or hostile behavior may contact any current member of the ACL Executive Committee or contact Priscilla Rasmussen (acl [AT] aclweb.org), who is usually available at the registration desk during ACL conferences. Members of the executive committee will be instructed to keep any such contact in strict confidence, and those who approach the committee will be consulted before any actions are taken.

Contact

Send us an email at scnlp {AT} interactions.com.