Tutorials

ASHA 2024 Invited Short Course on How AI Can Be Leveraged To Power Clinical Speech Assessment

Artificial intelligence (AI) is increasingly being used in a variety of clinical speech applications, ranging from pronunciation and articulation assessment to remote patient monitoring and improving the efficiency of practice. This short course introduces AI and machine learning (ML) concepts to a clinical audience, using practical examples to illustrate their application to clinical speech assessment, along with their advantages and disadvantages. The course will also include a demonstration of how these concepts are seamlessly integrated into a real-world remote assessment platform based on conversational AI.

The short course was designed with a companion software notebook, generated entirely automatically using ChatGPT, for execution within Google Colab – a free, cloud-hosted Jupyter environment for Python. The notebook illustrates how carefully designed prompts to ChatGPT can generate Python code that takes in a given speech sample and produces an automated summary report that plots the signal and computes several measures of interest, such as speaking duration, sound pressure level (SPL), and pitch. Note that you will need a Google account to run each cell of the code (by clicking the “Play” button in the top left margin of each cell), and a ChatGPT account to generate the code in the notebook from the input prompts.
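
For a flavor of what such prompt-generated analysis code looks like, here is a minimal sketch along the same lines, assuming a hypothetical input file speech_sample.wav and the open-source librosa and matplotlib packages. This is an illustrative example, not the notebook's actual generated code; note also that absolute SPL requires a calibrated microphone, so the level below is reported in dB relative to full scale.

```python
# A minimal sketch of the kind of analysis the notebook performs, assuming a
# hypothetical file "speech_sample.wav"; illustrative only, not the notebook's
# actual ChatGPT-generated code.
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

y, sr = librosa.load("speech_sample.wav", sr=None)  # keep native sample rate

# Speaking duration: sum of non-silent intervals (30 dB below peak = silence)
intervals = librosa.effects.split(y, top_db=30)
speaking_dur = sum(end - start for start, end in intervals) / sr

# Level: RMS energy in dB relative to full scale (uncalibrated, so not true SPL)
rms = np.sqrt(np.mean(y ** 2))
level_db = 20 * np.log10(rms + 1e-12)

# Pitch: fundamental frequency via the pYIN algorithm, averaged over voiced frames
f0, voiced_flag, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                  fmax=librosa.note_to_hz("C6"), sr=sr)
mean_f0 = np.nanmean(f0[voiced_flag])

# Plot the waveform as a simple one-panel "summary report"
fig, ax = plt.subplots(figsize=(10, 3))
librosa.display.waveshow(y, sr=sr, ax=ax)
ax.set(title=f"Speaking time: {speaking_dur:.1f} s | "
             f"Level: {level_db:.1f} dBFS | Mean f0: {mean_f0:.0f} Hz")
plt.tight_layout()
plt.show()
```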


ASHA 2023 Invited Masterclass on Integrating Speech Biomarkers and Outcome Measures Into Clinical Practice: Opportunities and Challenges (with Yana Yunusova)

With recent advancements in technology, automatic speech assessment is rapidly becoming available for use in research, clinical, and home settings. Understanding the state of the field and the readiness of these technologies in the context of research on biomarkers and outcome measures is essential when considering their clinical adoption and implementation. This tutorial will focus on outlining measurement development practices for both biomarkers and outcome measures, as well as current research on the automatic detection and monitoring of speech changes in brain diseases, including Parkinson’s disease, ALS, mood disorders, and autism. It will describe frameworks for biomarker and outcome measure validation and the challenges associated with their development. The session will conclude with recommendations for clinicians on implementing this knowledge in their clinical practice.

This tutorial was co-developed with Dr. Yana Yunusova, Professor of Speech-Language Pathology at the University of Toronto. Her research program focuses on the development of cutting-edge technologies for the automatic assessment of orofacial dysfunction and speech disorders, as well as novel methods of speech therapy, aiming to impact clinical practice in the fields of speech-language pathology and neurology.


Interspeech 2022 Invited Keynote Lecture on Multimodal Dialog Technologies for Neurological and Mental Health

This keynote talk reviews the various modalities of health information that are useful for developing remote clinical assessments in the real world at scale. I first present an overview of these modalities — speech acoustics, natural language, conversational dynamics, orofacial and full-body movement, eye gaze, respiration, cardiopulmonary function, and neural activity — each of which can be extracted from various signal sources: audio, video, text, or sensors. I further motivate their clinical utility with examples of how information from each modality can help characterize how different disorders affect different aspects of patients’ spoken communication. I argue that combining multiple modalities of health information improves scientific interpretability, performance on downstream health applications such as early detection and progress monitoring, technological robustness, and user experience. Throughout, I illustrate how these principles can be leveraged for remote clinical assessment at scale using a real-world case study of the Modality assessment platform.
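
As a toy illustration of the fusion argument (not the Modality platform's actual pipeline), the sketch below concatenates hypothetical per-session acoustic and linguistic feature vectors and compares unimodal and fused classifiers. The synthetic random data makes the accuracy numbers meaningless; only the mechanics of feature-level fusion carry over to real features.

```python
# A toy sketch of multimodal feature-level fusion, with synthetic data
# standing in for real per-session features; illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200  # hypothetical number of recording sessions

# Stand-ins for per-session feature vectors from two modalities
acoustic = rng.normal(size=(n, 20))    # e.g., pitch, intensity, pause stats
linguistic = rng.normal(size=(n, 10))  # e.g., lexical diversity, syntax depth
labels = rng.integers(0, 2, size=n)    # e.g., patient vs. control

# Unimodal baselines vs. a simple fused model on concatenated features
for name, X in [("acoustic", acoustic),
                ("linguistic", linguistic),
                ("fused", np.hstack([acoustic, linguistic]))]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, labels,
                          cv=5, scoring="accuracy").mean()
    print(f"{name:10s} accuracy: {acc:.2f}")
```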

For more details, also see the following paper: Vikram Ramanarayanan (2024). Multimodal Technologies for Remote Assessment of Neurological and Mental Health, in: Journal of Speech, Language, and Hearing Research. [pdf]


Interspeech 2020 Tutorial on Spoken Language Processing for Language Learning and Assessment (with Klaus Zechner and Keelan Evanini)

This tutorial provides an in-depth survey of the state of the art in spoken language processing for language learning and assessment from a practitioner’s perspective. The first part of the tutorial will discuss in detail the acoustic, speech, and language processing challenges involved in recognizing and dealing with native and non-native speech, from both adults and children of different language backgrounds, at scale. The second part will examine the current state of the art in approaches to automated scoring of monolog speech data along various dimensions of spoken language proficiency. The final part will look at a key challenge facing the field at the moment – automatically generating targeted feedback that helps language learners improve their overall spoken language proficiency – as well as current hot topics such as the automated scoring of dialog and multimodal data for language learning and assessment.
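
To make the scoring idea concrete, here is a minimal, hypothetical sketch of feature-based scoring: it computes simple fluency measures from time-aligned recognizer output and combines them with made-up linear weights. Production scoring engines use far richer feature sets and models trained against human ratings.

```python
# A toy sketch of fluency-feature scoring, assuming hypothetical word-level
# timestamps from a speech recognizer; weights below are invented for
# illustration, not learned from data.
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float

def fluency_features(words):
    """Compute simple fluency measures from time-aligned words."""
    total = words[-1].end - words[0].start
    speech = sum(w.end - w.start for w in words)
    pauses = [b.start - a.end for a, b in zip(words, words[1:])
              if b.start - a.end > 0.15]  # count pauses longer than 150 ms
    return {
        "speaking_rate": len(words) / total,  # words per second
        "phonation_ratio": speech / total,    # proportion of time speaking
        "mean_pause": sum(pauses) / len(pauses) if pauses else 0.0,
    }

# Hypothetical alignment for a short spoken response
words = [Word("the", 0.0, 0.2), Word("cat", 0.3, 0.6),
         Word("sat", 1.4, 1.7), Word("down", 1.8, 2.2)]
feats = fluency_features(words)

# A toy linear scorer with made-up weights (real systems learn these)
score = (2.0 * feats["speaking_rate"] + 3.0 * feats["phonation_ratio"]
         - 1.0 * feats["mean_pause"])
print(feats, f"score={score:.2f}")
```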

The presenters, based at Educational Testing Service R&D in Princeton and San Francisco, USA, have more than 40 years of combined R&D experience in spoken language processing for education, speech recognition, spoken dialog systems, and automated speech scoring.


Interspeech 2018 Tutorial on Spoken Dialog Technology for Educational Domain Applications (with Keelan Evanini and David Suendermann-Oeft)

This tutorial introduces participants to the basics of designing conversational applications in the educational domain using spoken and multimodal dialog technology. The increasing maturity of automated conversational technologies in recent years holds much promise for developing intelligent agents that can guide one or more phases of student instruction, learning, and assessment. In language learning applications, spoken dialog systems (SDSs) can be an effective way to improve conversational skills, because an SDS provides a convenient means for people to both practice and obtain feedback on different aspects of their conversational skills in a new language. Such systems allow learners to make mistakes without feeling incompetent, empowering them to improve their skills for when they do speak with native speakers. From the assessment perspective, well-designed dialog agents have the potential to elicit and evaluate the full range of English speaking skills (such as turn-taking abilities, politeness strategies, and pragmatic competence) required for successful communication. These technologies can potentially personalize education for each learner, providing a natural and practical learning interface that adapts to individual strengths and weaknesses in real time so as to increase the efficacy of instruction.

The tutorial assumes no prior knowledge of dialog technology or intelligent tutoring systems and demonstrates the use of open-source software tools in building conversational applications. The first part of the tutorial covers the state of the art in dialog technologies for educational domain applications, with a particular focus on language learning and assessment. This includes an introduction to the various components of spoken dialog systems and how they can be applied to develop conversational applications in the educational domain, as well as some advanced topics such as methods for speech scoring. The final part of the tutorial (not fully represented in the slides below) is dedicated to a hands-on application-building session, in which participants design and deploy their own dialog application from scratch on the HALEF cloud-based dialog platform using the open-source OpenVXML design toolkit, allowing a better understanding of how such systems can be designed and built.
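
As a language-agnostic taste of what such a dialog application encodes (OpenVXML applications are authored visually and deployed as VoiceXML), here is a toy form-filling dialog flow written as a plain Python state machine. This is an illustrative sketch, not HALEF or OpenVXML code.

```python
# A toy finite-state dialog flow, illustrating the kind of form-filling
# interaction a dialog application encodes; names and prompts are invented.
DIALOG = {
    "greet":   {"prompt": "Hi! Are you calling to book an interview? (yes/no)",
                "next": {"yes": "ask_day", "no": "goodbye"}},
    "ask_day": {"prompt": "Great. Which day works for you, Monday or Tuesday?",
                "next": {"monday": "confirm", "tuesday": "confirm"}},
    "confirm": {"prompt": "You're booked. Goodbye!", "next": {}},
    "goodbye": {"prompt": "Okay, goodbye!", "next": {}},
}

def run_dialog(state="greet"):
    while True:
        node = DIALOG[state]
        print("SYSTEM:", node["prompt"])
        if not node["next"]:  # terminal node: no outgoing transitions
            break
        user = input("USER: ").strip().lower()
        # Stay in the same state (i.e., re-prompt) on out-of-grammar input,
        # much as a VoiceXML <nomatch> handler would
        state = node["next"].get(user, state)

if __name__ == "__main__":
    run_dialog()
```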