Main Duties and Responsibilities
- Help us create AI/ML-ready datasets from petabytes of raw data and metadata
- Automate the integration of different data sources into coherent data flows and pipelines, including support for data normalization and result calculation
- Design and build systems and architectures for ETL processes
- Perform system & data testing
- Understand and apply FAIR data principles
- Adhere strictly to compliance & regulatory requirements
- Build algorithms
Essential Requirements
- Master's degree in Computer Science, Engineering, or Bioinformatics, plus 5 years of relevant experience
- Excellent programming skills (Python, C++, R)
- Experience designing and implementing RESTful APIs and web services
- Ability to interact with various data sources, both structured and unstructured (e.g. HDFS, SQL, NoSQL)
- Experience working across multiple scientific compute environments to create data workflows and pipelines (e.g. HPC, cloud, Unix/Linux systems)
Desirable:
- Expertise with biological/health data
- Experience modelling data and information for graph/network representation
- Experience working with metadata models, controlled vocabularies, and ontologies
- Ability to understand, map, integrate, and document complex data relationships and business rules
- Familiarity with data quality, cleaning and masking techniques
- Familiarity with modern frameworks and concepts for scalable and distributed computation (containerization and orchestration, e.g. Kubernetes; specialized frameworks such as Spark, Hadoop, …)
- Experience with image processing and computer graphics
- Experience with cloud computing
We are looking forward to your application!
Link: https://jobs.definiens.com/Definiens/search/