The mission
The web was created by scientists, for scientists, to foster scientific collaboration and drive progress for a better world. Join our team to take the web back to its roots and achieve that original mission.
We're a passionate team of pragmatic optimists from around the world and from many different backgrounds. Together, we focus on building great products that change the way scientists communicate for the better.
We love what we do. We connect the world of science and make research open to all.
The position
As part of ResearchGate's data engineering team, you will work at the core of our data pipelines. These pipelines not only help our Analytics and Business departments make the right decisions but also enable product teams to craft the data-driven features that make science on ResearchGate more effective and faster. Join us and help shape our data infrastructure to be reliable, robust, and fast!
Responsibilities
- Become an essential member of our Machine Learning Infrastructure Architecture Team and shape the long-term vision of ML at ResearchGate
- Develop a system that enables data teams to quickly iterate on ML-based workloads and easily deploy their models to our production systems
- Ensure that the data pipelines we use at ResearchGate are ready for future challenges
- Provide technical leadership and partner with fellow engineers to architect, design, and build infrastructure that scales and stays highly available while reducing operational overhead
- Engineer efficient, adaptable and scalable data architectures to make building and maintaining big data applications easy and enjoyable for others
- Build fault-tolerant, self-healing, adaptive, and highly accurate data-processing pipelines
- Work with data scientists, data analysts, backend engineers, and product managers to solve problems, identify trends and leverage the data we produce
- Build workflows involving large datasets and/or machine learning models in production using distributed computing and big data processing concepts and technologies
Requirements
- Experience in designing and implementing data pipelines and ML applications
- Experience working with data at petabyte scale
- Experience designing and operating robust distributed systems
- Experience in Python is a must; experience in Java is a plus
- Working knowledge of relational databases and query authoring (SQL)
- Experience using technologies like Kafka, Hadoop, Hive, and Flink
- Experience with machine learning tools, frameworks, and libraries such as Jupyter Notebook, scikit-learn, PyTorch, or TensorFlow, and with languages such as Python or R, is a plus