September 7, 2017, Copenhagen, Denmark
We are happy to introduce the 1st Workshop on Speech-Centric Natural Language Processing (SCNLP), which was held at EMNLP 2017!
SCNLP's goal is to unite the ASR and NLP communities to discuss new frameworks for exploiting the rich information present in the speech signal to improve the capabilities of natural language processing applications such as conversational agents, question-answering systems, machine translation, and search. SCNLP encourages novel contributions that revisit the conventional NLP problems with a focus on incorporating the richness of spoken language, as well as contributions that promote cross-fertilization between statistical methods for ASR and NLP.
We envision SCNLP as a platform to promote collaboration between the ASR and NLP communities and to seek ways to lower the barrier of entry for researchers interested in working in the exciting intersection between Speech and Natural Language Processing.
The inaugural workshop is hosted on NLP home territory. Future iterations of the workshop will alternate between Speech and NLP venues to encourage the cross-fertilization of ideas.
The proceedings of SCNLP 2017 may be downloaded [here].
For specific papers, please download them from the schedule below.
The workshop took place in the Kastrup Airport room in the "CPH Conference" facility. The CPH Conference was a part of DGI-byen, located right next to the conference venue. There was also a round-table session to discuss relevant issues in speech-centric NLP.
One of the most fundamental aspects of spoken dialogue is the organization of speaking between the participants. Since it is difficult to speak and listen at the same time, the interlocutors need to take turns speaking, and this turn-taking has to be coordinated somehow. This coordination is achieved using verbal and non-verbal signals, expressed in the face and voice, including syntax, prosody and gaze. In contrast, spoken dialogue systems typically use a very simplistic, silence-based model of turn-taking, which often results in interruptions or sluggish responses. In this talk, I will give an overview of several studies on how to model turn-taking in spoken interaction, with a special focus on multi-modal, human-robot interaction. These studies show that humans in interaction with a human-like robot make use of the same coordination signals typically found in studies on human-human interaction, and that it is possible to automatically detect and combine these cues to facilitate real-time coordination. The studies also show that humans react naturally to such signals when used by a robot, without being given any special instructions. Finally, I will present recent work on how Recurrent Neural Networks can be used to train a predictive, continuous model of turn-taking from human-human interaction data.
We invite submissions of both long and short papers on original and unpublished work. Similar to the main conference, submissions are limited to 8 pages. All accepted submissions will be presented as posters. Additionally, selected submissions will be presented orally.
Topics of interest include but are not limited to:
All submissions should conform to the EMNLP 2017 two-column format, using the provided LaTeX style files (they will be posted on the conference site). Authors are strongly discouraged from modifying the style files. Please do not use other templates (e.g., Word). Submissions that do not conform to the required styles, including paper size, margin width, and font size restrictions, will be rejected without review.
Our paper submission portal is currently open. Visit https://www.softconf.com/emnlp2017/scnlp to submit your paper.
We are excited to have a strong program committee consisting of research leaders spanning the Speech and NLP communities.
Language technologies have come of age and are playing an increasingly vital role in our everyday lives. From human-machine conversational technologies to text and speech analytics, we are routinely in contact with language technologies, with or without our knowledge. This progress is directly attributable to robust accuracy improvements in the automatic speech recognition (ASR) and natural language processing (NLP) communities. While both communities use data-driven techniques to achieve robustness, the opportunity to jointly optimize the robustness in a majority of speech-driven natural language processing systems is widely ignored; instead speech-centric NLP tasks predominantly rely on a sequential application of independently optimized ASR and NLP tools.
While advancements in ASR have been demonstrated through significant reductions in word error rate evaluation scores for a variety of word transcription tasks, the standard ASR evaluation metric does not account for the varied uses of the transcriptions in downstream NLP tasks. Furthermore, the impact of rich para-lexical information latent in speech on downstream tasks has not received sufficient attention due to the disproportionate emphasis on word transcription in speech processing. Likewise, although NLP research has begun to address the problem of extra-grammatical and telegraphic texts in user-generated social media, the traditional focus of the field has been on well-edited written texts. As a result, the majority of speech-centric NLP systems do not exploit the weighted multi-string hypotheses typically produced by speech recognizers, but instead treat the problem as a simple ASR-NLP pipeline which transforms ASR outputs into text-like input, such as N-best word hypotheses, prior to processing with conventional NLP tools. Such approaches result in a suboptimal quality of output with potentially significant room for improvement by leveraging the rich information available from speech input.
The purpose of this workshop is to unite the ASR and NLP communities to discuss new frameworks for exploiting the rich information present in the speech signal to improve the capabilities of natural language processing applications such as conversational agents, question-answering systems, machine translation, and search. In addition to acoustic environment information, the audio signal may contain speaker-specific features which may identify the emotional state, demographic information, and the presence of uncertainty in the speaker’s utterance: features which may influence the output of the NLP component. For example, a dialogue system may infer negative feedback from the consumer’s responses and switch to a different dialogue strategy to obtain the necessary information to carry out its task. We invite contributions that revisit the conventional NLP problems with a focus on incorporating the richness of spoken language, as well as contributions that promote cross-fertilization between statistical methods for ASR and NLP.
The open exchange of ideas, the freedom of thought and expression, and respectful scientific debate are central to the aims and goals of the ACL. These require a community and an environment that recognizes the inherent worth of every person and group, that fosters dignity, understanding, and mutual respect, and that embraces diversity. For these reasons, ACL is dedicated to providing a harassment-free experience for all the members, as well as participants at our events and in our programs.
Harassment and hostile behavior are unwelcome at any ACL conference, associated event, or in ACL-affiliated on-line discussions. This includes speech or behavior that intimidates, creates discomfort, or interferes with a person's participation or opportunity for participation in a conference or an event. We aim for ACL-related activities to be an environment where harassment in any form does not happen, including but not limited to: harassment based on race, gender, religion, age, color, appearance, national origin, ancestry, disability, sexual orientation, or gender identity. Harassment includes degrading verbal comments, deliberate intimidation, stalking, harassing photography or recording, inappropriate physical contact, and unwelcome sexual attention. The policy is not intended to inhibit challenging scientific debate, but rather to promote it through ensuring that all are welcome to participate in a shared spirit of scientific inquiry.
It is the responsibility of the community as a whole to promote an inclusive and positive environment for our scholarly activities. In addition, anyone who experiences harassment or hostile behavior may contact any current member of the ACL Executive Committee or contact Priscilla Rasmussen (acl [AT] aclweb.org), who is usually available at the registration desk during ACL conferences. Members of the executive committee will be instructed to keep any such contact in strict confidence, and those who approach the committee will be consulted before any actions are taken.