YOUR MISSION
- Own the data warehousing and data harvesting pipeline
- Discover and acquire additional datasets for Machine Translation by researching open- or closed-source data and/or web crawling
- Improve data pipelining in the Machine Translation team
- Ensure data and model scalability
YOUR RESPONSIBILITIES
- Design and implement scalable processes (think terabytes of text) for storing, versioning, and documenting data
- Develop automatic data harvesting services for monolingual and bilingual textual data
- Architect, develop and maintain data web services for consuming the harvested data
- Build and automate data preprocessing pipelines for the training of Machine Translation models
- Introduce data pipelining solutions for data preprocessing needs using technologies like Airflow, Luigi, or similar
- Work day-to-day with researchers to improve data collection processes
- Prepare data for Machine Translation model training
YOUR PROFILE
- You have 2+ years of experience as a Data Engineer, Python Engineer, or in a related role
- Strong software engineering experience with an eye for clean, future-proof code
- Strong data wrangling skills, including extracting, transforming, cleaning, and augmenting data, as well as standardizing data formats
- Comfortable with Linux/Unix tools and bash
- Experience working in the cloud (e.g. GCP, Azure, AWS)
- Experience with productionization tools such as Docker, Jenkins, GitHub Actions, and Kubernetes
- Experience with Machine Learning and/or Data Science is a plus, but not required
- You have solid communication skills and experience interacting with stakeholders from a multidisciplinary team
- You are fluent in English and feel comfortable in a fast-paced and international environment