The Platform ML team builds the ML side of our client's state-of-the-art internal training framework, which is used to train their cutting-edge models.
Our client works on distributed model execution as well as the interfaces and implementation for model code, training, and inference.
Our client's priorities are to maximize training throughput (how quickly a new model can be trained) and researcher throughput (how quickly new models can be developed), with the goal of accelerating progress towards AGI.
Our client frequently collaborates with other teams to speed up the development of new capabilities.
About the Role
As a Distributed Systems/ML engineer, you will work on improving the training throughput of their internal training framework while enabling researchers to experiment with new ideas. This requires good engineering (for example, designing, implementing, and optimizing state-of-the-art AI models), writing bug-free machine learning code (surprisingly difficult!), and acquiring deep knowledge of the performance of supercomputers. In all the projects this role pursues, the goal is to push the field forward.
They're looking for people who love optimizing performance and understanding distributed systems, and who cannot stand having bugs in their code.
In this role, you will:
Work with researchers to enable them to develop the next generation of models.
You may be a good fit if you:
Have run small-scale ML experiments.
Love figuring out how systems work and continuously produce ideas for how to make them faster while minimizing complexity and maintenance burden.
Have strong software engineering skills and are proficient in Python.
If you are interested in finding out more about this career opportunity, please email your resume to Cani Fan, email@example.com
Web: www.charterhouse.com.hk
Charterhouse Partnership Hong Kong is here to assist you in your job search. Our experienced recruitment consultants will provide you with career advice and help you develop a tailored job search strategy.