Job title:	Research Engineer / Research Scientist (Speech, Audio, and Music Foundation Models)
Job type:	Permanent
Emp type:	Full-time
Industry:	Consulting
Functional Expertise:	Consulting
Salary:	Negotiable
Location:	Tokyo
Job published:	2026-06-26
Job ID:	73461

Job Description

Career Opportunity: Research Engineer / Scientist (Speech, Audio, and Music Foundation Models) in Tokyo, Japan

Note: This position is open to both Japan-based and overseas candidates. No Japanese language proficiency is required; business-level English is sufficient.

■ Research Engineer / Scientist

■ Company Overview

Join a cutting-edge AI research team dedicated to advancing Speech, Audio, and Music Foundation Models. This role offers the opportunity to conduct world-class research using one of Japan's largest AI computing infrastructures while developing next-generation speech language models with real-world impact.

You will work alongside leading researchers to push the boundaries of Speech AI, Large Language Models (LLMs), Automatic Speech Recognition (ASR), Text-to-Speech (TTS), Speech Translation, and Spoken Dialogue Systems.

■ Your Role and Responsibilities

●Research and develop state-of-the-art Speech Language Models.

●Design and improve technologies for Automatic Speech Recognition (ASR), Speech Synthesis (TTS), Speech Translation, and Spoken Dialogue Systems.

●Build, generate, and preprocess large-scale speech datasets.

●Develop novel training methods combining linguistic knowledge with speech processing.

●Design benchmarks to evaluate Speech Foundation Models.

●Publish research findings through top-tier academic conferences, journals, and patent applications.

●Collaborate with multidisciplinary AI research teams on large-scale model development.

■ Experience and Qualifications

●Experience developing machine learning models in one or more of the following areas: speech recognition, speech synthesis, speech translation, or spoken dialogue systems.

●Strong programming skills in Python.

●Hands-on experience with PyTorch or similar deep learning frameworks.

●Experience using Git/GitHub for collaborative software development.

●Degree in Computer Science, Artificial Intelligence, Machine Learning, or a related field (Master's degree preferred).

●Strong problem-solving skills and the ability to drive large-scale AI training initiatives.

Preferred:

●Ph.D. in Computer Science, AI, Speech Processing, or a related field.

●Publications at leading AI conferences such as ICASSP, INTERSPEECH, ACL, and EMNLP.

●Experience with distributed model training.

●Business-level Japanese and English communication skills.

■ Good Reasons to Join

●Work on next-generation Speech Foundation Models with real-world impact.

●Annual Salary: ¥6.5M – ¥18M

●Performance incentives may be provided separately.

●Salary is determined based on experience and qualifications.

●Access one of Japan's largest AI computing infrastructures.

●Collaborate with internationally recognized AI researchers.

●Publish research at leading international conferences.

●Contribute to technologies used by millions of users worldwide.

●Flexible work environment with strong support for research and innovation.

■ Work Location

Tokyo, Japan

If you're passionate about Speech AI, Audio AI, Music Foundation Models, Deep Learning, and Generative AI research, and want to work on cutting-edge technologies with significant real-world impact, we'd love to hear from you.

📩 Send your updated CV to kanika.pal@talisman-corporation.com

Details will be shared during the meeting.
