Job title: Research Engineer / Research Scientist (Speech, Audio, and Music Foundation Models)
Job type: Permanent
Emp type: Full-time
Industry: Consulting
Functional Expertise: Consultant
Salary: Negotiable
Location: Tokyo
Job published: 2026-06-26
Job ID: 73461

Job Description

Career Opportunity: Research Engineer / Scientist (Speech, Audio, and Music Foundation Models) in Tokyo, Japan

Note: This position is open to both Japan-based and overseas candidates. No Japanese language proficiency is required; business-level English is sufficient.

■ Research Engineer / Scientist

■ Company Overview

Join a cutting-edge AI research team dedicated to advancing Speech, Audio, and Music Foundation Models. This role offers the opportunity to conduct world-class research using one of Japan's largest AI computing infrastructures while developing next-generation speech language models with real-world impact.

You will work alongside leading researchers to push the boundaries of Speech AI, Large Language Models (LLMs), Automatic Speech Recognition (ASR), Text-to-Speech (TTS), Speech Translation, and Spoken Dialogue Systems.

■ Your Role and Responsibilities 

●Research and develop state-of-the-art Speech Language Models.

●Design and improve technologies for Automatic Speech Recognition (ASR), Speech Synthesis (TTS), Speech Translation, and Spoken Dialogue Systems.

●Build, generate, and preprocess large-scale speech datasets.

●Develop novel training methods combining linguistic knowledge with speech processing.

●Design benchmarks to evaluate Speech Foundation Models.

●Publish research findings through top-tier academic conferences, journals, and patent applications.

●Collaborate with multidisciplinary AI research teams on large-scale model development.

 

■ Experience and Qualifications

●Experience developing machine learning models in one or more of the following areas: Speech Recognition, Speech Synthesis, Speech Translation, or Spoken Dialogue Systems.

●Strong programming skills in Python.

●Hands-on experience with PyTorch or similar deep learning frameworks.

●Experience using Git/GitHub for collaborative software development.

●Degree in Computer Science, Artificial Intelligence, Machine Learning, or a related field (Master's degree preferred).

●Strong problem-solving skills and the ability to drive large-scale AI training initiatives.

 

Preferred:

●Ph.D. in Computer Science, AI, Speech Processing, or a related field.

●Publications at leading AI conferences such as ICASSP, INTERSPEECH, ACL, and EMNLP.

●Experience with distributed model training.

●Business-level Japanese and English communication skills.

 

■ Good Reasons to Join

●Work on next-generation Speech Foundation Models with real-world impact.

●Annual salary: ¥6.5M – ¥18M, determined based on experience and qualifications. Performance incentives may be provided separately.

●Access one of Japan's largest AI computing infrastructures.

●Collaborate with internationally recognized AI researchers.

●Publish research at leading international conferences.

●Contribute to technologies used by millions of users worldwide.

●Flexible work environment with strong support for research and innovation.

 

■ Work Location

Tokyo, Japan


If you're passionate about Speech AI, Audio AI, Music Foundation Models, Deep Learning, and Generative AI research, and want to work on cutting-edge technologies with significant real-world impact, we'd love to hear from you.

 

📩 Send your updated CV to kanika.pal@talisman-corporation.com

Details will be shared during the meeting.
