2017 1st Workshop on Speech-Centric Natural Language Processing (SCNLP)

September 7, 2017, Copenhagen, Denmark

We are happy to introduce the 1st SCNLP workshop, which will be held at EMNLP 2017!

SCNLP's goal is to unite the ASR and NLP communities to discuss new frameworks for exploiting the rich information present in the speech signal to improve the capabilities of natural language processing applications such as conversational agents, question-answering systems, machine translation, and search. SCNLP encourages novel contributions that revisit the conventional NLP problems with a focus on incorporating the richness of spoken language, as well as contributions that promote cross-fertilization between statistical methods for ASR and NLP.

We envision SCNLP as a platform to promote collaboration between the ASR and NLP communities and to seek ways to lower the barrier to entry for researchers interested in working at the exciting intersection of Speech and Natural Language Processing.

The inaugural workshop is hosted on NLP home territory. Future iterations of the workshop will alternate between Speech and NLP venues to encourage the cross-fertilization of ideas.

We look forward to seeing you at EMNLP!

Workshop Organizers

News and Important Dates

[August 9]

The workshop schedule has been posted.

[July 5]

We need a few more days for paper review. We have pushed our notification date back to July 7, 2017. We apologize for the inconvenience.

[June 2]

In response to several requests from our participants, we are extending the paper submission deadline to June 9, 2017. Thanks for your participation!

[February 17]

We recently added the ACL Anti-Harassment Policy to this page. We stand behind our higher calling to be kind stewards of knowledge and wisdom by valuing the researcher as well as his or her research.

[February 15]

Paper submission portal is open!


Workshop Schedule

The workshop will consist of one 50-minute invited talk and a total of 10 presentations, each lasting 20 minutes plus 5 minutes for questions.
There will also be a round-table session to discuss relevant issues in speech-centric NLP.

September 7, 2017
8:50 - 9:00    Opening Remarks
Nicholas Ruiz and Srinivas Bangalore
9:00 - 10:00    Invited Talk
   Modelling turn-taking in spoken interaction
Gabriel Skantze
KTH Royal Institute of Technology in Stockholm

One of the most fundamental aspects of spoken dialogue is the organization of speaking between the participants. Since it is difficult to speak and listen at the same time, the interlocutors need to take turns speaking, and this turn-taking has to be coordinated somehow. This coordination is achieved using verbal and non-verbal signals, expressed in the face and voice, including syntax, prosody and gaze. Contrary to this, spoken dialogue systems typically use a very simplistic, silence-based model of turn-taking, which often results in interruptions or sluggish responses. In this talk, I will give an overview of several studies on how to model turn-taking in spoken interaction, with a special focus on multi-modal, human-robot interaction. These studies show that humans in interaction with a human-like robot make use of the same coordination signals typically found in studies on human-human interaction, and that it is possible to automatically detect and combine these cues to facilitate real-time coordination. The studies also show that humans react naturally to such signals when used by a robot, without being given any special instructions. Finally, I will present recent work on how Recurrent Neural Networks can be used to train a predictive, continuous model of turn-taking from human-human interaction data.

10:00 - 10:30    Session I
   Functions of Silences towards Information Flow in Spoken Conversation
Shammur Absar Chowdhury, Evgeny Stepanov, Morena Danieli, Giuseppe Riccardi
University of Trento
10:30 - 11:00    Coffee Break
11:00 - 12:30    Session II
   Encoding Word Confusion Networks with Recurrent Neural Networks for Dialog State Tracking
Glorianna Jagfeld (1) and Ngoc Thang Vu (2)
(1) Institute for Natural Language Processing, University of Stuttgart; (2) University of Stuttgart
   Analyzing Human and Machine Performance In Resolving Ambiguous Spoken Sentences
Hussein Ghaly (1) and Michael Mandel (2)
(1) City University of New York; (2) Brooklyn College, CUNY
   Parsing transcripts of speech
Andrew Caines (1), Michael McCarthy (2), Paula Buttery (1)
(1) University of Cambridge; (2) University of Nottingham
   Enriching ASR Lattices with POS Tags for Dependency Parsing
Moritz Stiefel (1) and Ngoc Thang Vu (2)
(1) IMS, University of Stuttgart; (2) University of Stuttgart
12:30 - 14:00    Lunch
14:00 - 15:30    Session III
   End-to-End Information Extraction without Token-Level Supervision
Rasmus Berg Palm (1), Dirk Hovy (2), Florian Laws (3), Ole Winther (1)
(1) Technical University Denmark; (2) Center for Language Technology, University of Copenhagen; (3) University of Stuttgart
   Spoken Term Discovery for Language Documentation using Translations
Antonios Anastasopoulos (1), Sameer Bansal (2), David Chiang (1), Sharon Goldwater (2), Adam Lopez (2)
(1) University of Notre Dame; (2) University of Edinburgh
   Amharic-English Speech Translation in Tourism Domain
Michael Melese (1), Laurent Besacier (2), Million Meshesha (1)
(1) Addis Ababa University, Addis Ababa, Ethiopia; (2) LIG Laboratory, UJF, BP53, 38041 Grenoble Cedex 9, France
   Speech- and Text-driven Features for Automated Scoring of English Speaking Tasks
Anastassia Loukina, Nitin Madnani, Aoife Cahill
Educational Testing Service
15:30 - 16:00    Coffee Break / Poster Discussion
16:00 - 16:25    Session IV
   Improving coreference resolution with automatically predicted prosodic information
Ina Roesiger, Sabrina Stehwien, Arndt Riester, Ngoc Thang Vu
University of Stuttgart
16:25 - 17:50    Round-table: Issues in Speech-centric NLP
17:50 - 18:00    Closing

Call for Papers

We invite submissions of both long and short papers on original and unpublished work. Similar to the main conference, submissions are limited to 8 pages. All accepted submissions will be presented as posters. Additionally, selected submissions will be presented orally.

Topics of interest include but are not limited to:

All submissions should conform to the EMNLP 2017 two-column format, using the provided LaTeX style files (they will be posted on the conference site). Authors are strongly discouraged from modifying the style files. Please do not use other templates (e.g., Word). Submissions that do not conform to the required styles, including paper size, margin width, and font size restrictions, will be rejected without review.

Our paper submission portal is currently open. Visit https://www.softconf.com/emnlp2017/scnlp to submit your paper.

Program Committee

We are excited to have a strong program committee consisting of research leaders spanning the Speech and NLP communities.
This list is still growing.


Language technologies have come of age and are playing an increasingly vital role in our everyday lives. From human-machine conversational technologies to text and speech analytics, we are routinely in contact with language technologies, with or without our knowledge. This progress is directly attributable to robust accuracy improvements achieved by the automatic speech recognition (ASR) and natural language processing (NLP) communities. While both communities use data-driven techniques to achieve robustness, the opportunity to jointly optimize robustness in the majority of speech-driven natural language processing systems is widely ignored; instead, speech-centric NLP tasks predominantly rely on a sequential application of independently optimized ASR and NLP tools.

While advancements in ASR have been demonstrated through significant reductions in word error rate evaluation scores for a variety of word transcription tasks, the standard ASR evaluation metric does not account for the varied uses of the transcriptions in downstream NLP tasks. Furthermore, the impact of rich para-lexical information latent in speech on downstream tasks has not received sufficient attention due to the disproportionate emphasis on word transcription in speech processing. Likewise, although NLP research has begun to address the problem of extra-grammatical and telegraphic texts in user-generated social media, the traditional focus of the field has been on well-edited written texts. As a result, the majority of speech-centric NLP systems do not exploit the weighted multi-string hypotheses typically produced by speech recognizers, but instead treat the problem as a simple ASR-NLP pipeline which transforms ASR outputs into text-like input, such as N-best word hypotheses, prior to processing with conventional NLP tools. Such approaches result in a suboptimal quality of output with potentially significant room for improvement by leveraging the rich information available from speech input.

The purpose of this workshop is to unite the ASR and NLP communities to discuss new frameworks for exploiting the rich information present in the speech signal to improve the capabilities of natural language processing applications such as conversational agents, question-answering systems, machine translation, and search. In addition to acoustic environment information, the audio signal may contain speaker-specific features which may identify the emotional state, demographic information, and the presence of uncertainty in the speaker’s utterance: features which may influence the output of the NLP component. For example, a dialogue system may infer negative feedback from the consumer’s responses and switch to a different dialogue strategy to obtain the necessary information to carry out its task. We invite contributions that revisit the conventional NLP problems with a focus on incorporating the richness of spoken language, as well as contributions that promote cross-fertilization between statistical methods for ASR and NLP.

Anti-Harassment Policy

The open exchange of ideas, the freedom of thought and expression, and respectful scientific debate are central to the aims and goals of the ACL. These require a community and an environment that recognizes the inherent worth of every person and group, that fosters dignity, understanding, and mutual respect, and that embraces diversity. For these reasons, ACL is dedicated to providing a harassment-free experience for all the members, as well as participants at our events and in our programs.

Harassment and hostile behavior are unwelcome at any ACL conference, associated event, or in ACL-affiliated on-line discussions. This includes: speech or behavior that intimidates, creates discomfort, or interferes with a person's participation or opportunity for participation in a conference or an event. We aim for ACL-related activities to be an environment where harassment in any form does not happen, including but not limited to: harassment based on race, gender, religion, age, color, appearance, national origin, ancestry, disability, sexual orientation, or gender identity. Harassment includes degrading verbal comments, deliberate intimidation, stalking, harassing photography or recording, inappropriate physical contact, and unwelcome sexual attention. The policy is not intended to inhibit challenging scientific debate, but rather to promote it through ensuring that all are welcome to participate in a shared spirit of scientific inquiry.

It is the responsibility of the community as a whole to promote an inclusive and positive environment for our scholarly activities. In addition, anyone who experiences harassment or hostile behavior may contact any current member of the ACL Executive Committee or contact Priscilla Rasmussen (acl [AT] aclweb.org), who is usually available at the registration desk during ACL conferences. Members of the executive committee will be instructed to keep any such contact in strict confidence, and those who approach the committee will be consulted before any actions are taken.


Send us an email at scnlp {AT} interactions.com.