Job Description
Career Opportunity: Research Engineer/Scientist (Speech, Audio, and Music Foundation Models) in Tokyo, Japan
Note: This position is open to both Japan-based and overseas candidates. No Japanese language proficiency is required; business-level English is sufficient.
■ Research Engineer/Scientist
■ Company Overview
Join a cutting-edge AI research team dedicated to advancing Speech, Audio, and Music Foundation Models. This role offers the opportunity to conduct world-class research using one of Japan's largest AI computing infrastructures while developing next-generation speech language models with real-world impact.
You will work alongside leading researchers to push the boundaries of Speech AI, Large Language Models (LLMs), Automatic Speech Recognition (ASR), Text-to-Speech (TTS), Speech Translation, and Spoken Dialogue Systems.
■ Your Role and Responsibilities
●Research and develop state-of-the-art Speech Language Models.
●Design and improve technologies for Automatic Speech Recognition (ASR), Speech Synthesis (TTS), Speech Translation, and Spoken Dialogue Systems.
●Build, generate, and preprocess large-scale speech datasets.
●Develop novel training methods combining linguistic knowledge with speech processing.
●Design benchmarks to evaluate Speech Foundation Models.
●Publish research findings through top-tier academic conferences, journals, and patent applications.
●Collaborate with multidisciplinary AI research teams on large-scale model development.
■ Experience and Qualifications
●Experience developing machine learning models in one or more of the following areas: Speech Recognition, Speech Synthesis, Speech Translation, or Spoken Dialogue Systems.
●Strong programming skills in Python.
●Hands-on experience with PyTorch or similar deep learning frameworks.
●Experience using Git/GitHub for collaborative software development.
●Degree in Computer Science, Artificial Intelligence, Machine Learning, or a related field (Master's degree preferred).
●Strong problem-solving skills and the ability to drive large-scale AI training initiatives.
Preferred:
●Ph.D. in Computer Science, AI, Speech Processing, or a related field.
●Publications at leading AI conferences such as ICASSP, INTERSPEECH, ACL, and EMNLP.
●Experience with distributed model training.
●Business-level Japanese communication skills in addition to English.
■ Good Reasons to Join
●Work on next-generation Speech Foundation Models with real-world impact.
●Annual salary: ¥6.5M – ¥18M, determined based on experience and qualifications. Performance incentives may be provided separately.
●Access one of Japan's largest AI computing infrastructures.
●Collaborate with internationally recognized AI researchers.
●Publish research at leading international conferences.
●Contribute to technologies used by millions of users worldwide.
●Flexible work environment with strong support for research and innovation.
■ Work Location
Tokyo, Japan
If you're passionate about Speech AI, Audio AI, Music Foundation Models, Deep Learning, and Generative AI research, and want to work on cutting-edge technologies with significant real-world impact, we'd love to hear from you.
📩 Send your updated CV to kanika.pal@talisman-corporation.com
Details will be shared during the meeting.