In a recent conversation, Javed Khan, senior vice president and general manager of Cisco Collaboration, stressed that AI must be integrated seamlessly into workflows to be effective. Audio data is central to Cisco’s generative AI work, and training AI to tell apart the barking of different types of dogs and to detect various accents and dialects remains an ongoing challenge.
Khan highlighted that good audio data is essential for accurate transcription, which in turn is foundational for AI applications. Cisco’s advantage lies in its extensive audio and video data, built up in part through acquisitions such as Voicea and Babblelabs. The acquired technologies form the basis for its transcription and translation tools, along with the ability to create artificial data for training purposes.
The conversation also delved into the complexity of analyzing accents and dialects in data. Khan described the meticulous process of acquiring, labeling, and validating data to account for variations in pronunciation and contextual meaning across different speakers. Human validation remains crucial, especially for nuances such as accents and variations in background noise.
Unsurprisingly, training AI models is time-consuming and involves continual refinement based on client feedback and additional data needs. Khan shared a humorous anecdote about dealing with background noise from dogs during the pandemic, which illustrated how AI must account for differences between types of barking patterns.
The discussion touched on the complexity of voice and video data analysis, including challenges such as American Sign Language, which requires consideration of facial expressions and body positions. Despite these complexities, Khan noted that video intelligence solutions for sign language interpretation are relatively easy to develop, because the number of signs is finite while accents vary widely.
Overall, the conversation underscored the importance of high-quality audio data for AI applications and the ongoing efforts to refine AI models for diverse linguistic and cultural contexts. Cisco’s focus on leveraging its robust audio and video data assets, combined with careful data labeling and validation processes, highlights its commitment to developing effective AI solutions for transcription and translation tasks.
Article Source
https://www.nojitter.com/ai-speech-technologies/conversations-collaboration-cisco’s-javed-khan-how-detecting-accents-takes