Job title: Data Engineer (SR)
Job type: Permanent
Employment type: Full-time
Industry: IT & Telecommunications
Functional Expertise: Technical (IT)
Salary: Negotiable
Job published: 2026-01-28
Job ID: 69110

Job Description

Career Opportunity for a Data Engineer in Japan!

 

■ Data Engineer

 

■ Company Overview

A research-driven organization at the intersection of AI and robotics, focused on advancing next-generation intelligent systems through cutting-edge research and real-world applications.

 

■ Your Role and Responsibilities 

● Design and implement large-scale data pipelines that cover the full lifecycle of high-quality datasets for robotics foundation models: collection, processing, curation, and publishing.

● Design, build, and maintain data schemas, storage solutions, and query interfaces that let vision-language-action (VLA) researchers efficiently discover, query, and consume curated datasets.

● Collaborate closely with VLA researchers to capture evolving data requirements and continuously improve data pipelines through analysis and experimentation.

● Design and scale distributed data-processing pipelines capable of handling petabyte-scale multimodal datasets (e.g., RGB/Depth, point clouds) with full lineage and reproducibility. 

● Define data-quality metrics and build feedback loops to continuously monitor and improve data quality.

 

■ Experience and Qualifications

● Master’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience).

● 5+ years of professional experience in data engineering or data platform development.

● Experience designing and operating large-scale ETL/ELT pipelines using Spark, Flink, Ray, or a similar distributed engine.

● Experience designing or leading implementations using Delta Lake, Apache Iceberg, or Hudi.

● Experience integrating with Trino, Athena, Databricks SQL, or Glue/Unity Catalog.

● Experience defining schema evolution, ACID compliance, partitioning strategy, time travel, and cost-performance optimization.

 

■ Additional Preferred Qualifications

● Experience working with terabyte- or petabyte-scale datasets.

● Expertise in data lake storage systems such as Apache Iceberg or Delta Lake, used together with query systems such as Trino and catalog systems such as Nessie.

● Expertise in distributed processing frameworks such as Spark, Flink, or Ray.

 

■ Good Reasons to Join

● Be part of a research-first environment.

 

■ Work Location

Tokyo, Japan
 

Details will be provided during the meeting.
