Are pipelines that combine acoustic and linguistic models acceptable?

Dear Organizer,
We currently have a pipeline designed for cognitive impairment detection that integrates acoustic transformer models with handcrafted acoustic features. In addition, the pipeline incorporates linguistic models—both handcrafted and transformer-based—to analyze syntactic and semantic aspects of language as well as speech fluency.

Could you please clarify whether you are specifically seeking a pipeline built exclusively on acoustic models, or whether a pipeline that integrates linguistic models with acoustic models would also be acceptable? Thank you.

We encourage solvers to explore all possible features, including linguistic and semantic ones, as long as those features are derived from the acoustic data. Note that features should be generated automatically: you could use a pretrained model to get text from the audio, but you should not manually transcribe audio to text.
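To make the "generated automatically" point concrete, here is a minimal sketch of an audio-to-text-to-features step. Everything in it is an assumption for illustration: the function names, the `FILLERS` list, and the three features are hypothetical, and `transcribe` stands in for whatever pretrained speech-to-text model you choose. It is not a required or recommended feature set.

```python
import re
from typing import Callable, Dict

# Hypothetical filler list, chosen only for illustration.
FILLERS = {"um", "uh", "er", "hmm"}


def linguistic_features(transcript: str) -> Dict[str, float]:
    """Compute a few simple text features from an automatic transcript."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    n = len(tokens)
    if n == 0:
        # An empty transcript yields zeroed features rather than an error.
        return {"n_tokens": 0.0, "type_token_ratio": 0.0, "filler_rate": 0.0}
    return {
        "n_tokens": float(n),
        # Lexical diversity: unique tokens divided by total tokens.
        "type_token_ratio": len(set(tokens)) / n,
        # Share of tokens that are filled pauses (a rough fluency proxy).
        "filler_rate": sum(t in FILLERS for t in tokens) / n,
    }


def features_from_audio(
    audio_path: str, transcribe: Callable[[str], str]
) -> Dict[str, float]:
    """Audio -> automatic transcript -> features, with no manual step.

    `transcribe` is an assumed callable wrapping a pretrained ASR model;
    it takes a path to an audio file and returns the recognized text.
    """
    transcript = transcribe(audio_path)
    return linguistic_features(transcript)
```

Because the transcript comes from a model call rather than a person, the linguistic features remain derived from the acoustic data. Swapping in a different ASR model only changes the `transcribe` argument.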

Does that answer your question?