The mission
The web was created by scientists, for scientists, to foster scientific collaboration and drive progress for a better world. Join our team to take the web back to its roots and achieve that original mission.
We're a passionate team of pragmatic optimists from around the world and from many different backgrounds. Together, we focus on building great products that change the way scientists communicate for the better.
We love what we do. We connect the world of science and make research open to all.
The position
As part of ResearchGate's data engineering team, you will work at the core of our data pipelines. These pipelines not only help our Analytics and Business departments make the right decisions but also enable product teams to craft the data-driven features that make science on ResearchGate more effective and faster. Join us and help shape our data infrastructure to be reliable, robust, and fast!
Responsibilities
- Become an essential member of our Machine Learning Infrastructure Architecture Team and shape the long-term vision of ML at ResearchGate
- Develop a system that enables data teams to quickly iterate on ML-based workloads and easily deploy their models to our production systems
- Ensure that the data pipelines we use at ResearchGate are ready for future challenges
- Provide technical leadership and partner with fellow engineers to architect, design, and build infrastructure that scales and stays highly available while reducing operational overhead
- Engineer efficient, adaptable and scalable data architectures to make building and maintaining big data applications easy and enjoyable for others
- Build fault-tolerant, self-healing, adaptive, and highly accurate data-processing pipelines
- Work with data scientists, data analysts, backend engineers, and product managers to solve problems, identify trends and leverage the data we produce
- Build workflows involving large datasets and/or machine learning models in production using distributed computing and big data processing concepts and technologies
Requirements
- Experience in designing and implementing data pipelines and ML applications
- Experience working with data at petabyte scale
- Experience designing and operating robust distributed systems
- Experience in Python is a must; experience in Java is a plus
- Working knowledge of relational databases and query authoring (SQL)
- Experience using technologies like Kafka, Hadoop, Hive, and Flink
- Experience with machine learning tools, frameworks, and libraries such as Jupyter Notebook, scikit-learn, PyTorch, or TensorFlow, and with languages such as Python or R, is a plus