Data Lake Engineer

Summary

VinBrain is a company funded by Vingroup, the largest conglomerate in Vietnam by market capitalization. Our mission is to perform cutting-edge research and development of AI, Machine Learning, and Deep Learning technologies and products that will lead to improved healthcare systems and quality of life. At VinBrain, we believe the greatest promise of Big Data lies in healthcare. We believe that by solving unique and challenging problems at the intersection of (medical) Big Data, AI, IoT, and IoP (Internet of People), we can improve outcomes for patients around the globe. We have assembled seasoned leaders, engineers, and entrepreneurs with years of experience from world-class companies such as Microsoft, Amazon, Adobe, and Google to build the platform and services to achieve this. If you are motivated to be part of something special and to be a catalyst for improving lives and building healthier communities, we want to talk to you.


Who we are looking for: experienced frontend and backend engineers who will play a central role in building out the large-scale intelligent products that run in the cloud to improve patient and doctor experiences. As a member of our dynamic, agile team, you will help define product features, design and implement those features, drive operational excellence, and spearhead the engineering best practices that enable a high-quality product.

 

Responsibilities:

  • Design, develop, and maintain a secure and scalable data lake on Azure to store structured and unstructured data from various healthcare sources, including HIS, PACS, RIS, LIS, and EMR.
  • Collaborate with cross-functional teams to understand data requirements and ensure the data lake architecture aligns with business objectives.
  • Implement data governance, data security, and compliance measures to protect sensitive healthcare information.
  • Optimize data storage, retrieval, and processing capabilities to ensure high performance and low latency.
  • Develop and maintain data pipelines for data ingestion, transformation, and integration into the Azure data lake.
  • Work closely with data scientists and machine learning engineers to enable seamless integration of Generative AI and Large Language Models (LLMs) for patient EMR summarization, smart search of medical information, and AI-driven predictions.
  • Use Azure DevOps and CI/CD pipelines to automate deployment and ensure continuous integration and delivery of data lake components.
  • Practice disciplined software engineering (e.g., automated testing, code reviews).

 

Qualifications:

  • Bachelor's degree in computer science or a related field, or equivalent practical experience.
  • Proven experience building and optimizing data lakes using Azure services (Azure Data Lake Storage, Azure Data Factory, Azure Databricks, etc.) in a healthcare or related industry.
  • Strong expertise in data modeling, data warehousing, and ETL processes.
  • Proficiency in programming languages such as Python, SQL, or Java.
  • Experience with a variety of SQL and NoSQL databases.
  • Hands-on experience with cloud-based architectures, microservices, and containerization (Docker, Kubernetes).
  • Familiarity with data governance, compliance standards (HIPAA, GDPR), and healthcare data interoperability.
  • Excellent problem-solving skills and ability to work collaboratively in a team environment.
  • Strong communication skills to convey complex technical concepts to non-technical stakeholders.
  • Azure certifications (e.g., Microsoft Certified: Azure Data Engineer Associate) are a plus.
