Data Engineer

Medcare MSO is one of the largest USA-based Healthcare IT organizations in Pakistan, with 950+ people on board. We implement best practices and adopt state-of-the-art technology tools to achieve results. As a Data Engineer, you will be responsible for building and maintaining the data infrastructure that powers Medcare’s AI and analytics systems. You will work closely with ML Engineers and Product teams to develop reliable, scalable, and secure data pipelines across structured and unstructured healthcare data sources. Your role will focus on ensuring high-quality, accessible, and compliant data that enables downstream machine learning models and business-critical workflows. This role requires strong fundamentals in data engineering, a solid understanding of data systems, and the ability to work in regulated environments with sensitive data.

 

Position: Data Engineer

Shift Timing: 2:00 PM – 11:00 PM

Location: M.M. Alam Road, Lahore

 

Key Responsibilities:

  • Build, maintain, and optimize batch and near real-time data pipelines for ingesting, transforming, and loading data from multiple internal and external sources
  • Develop robust ETL/ELT workflows for structured and semi-structured healthcare data, ensuring data quality, consistency, and reliability
  • Write efficient, scalable SQL queries and Python-based data processing scripts for large datasets
  • Collaborate with ML Engineers to prepare and serve clean, feature-ready datasets for training and inference workflows
  • Implement data validation, quality checks, and monitoring to ensure pipeline reliability and integrity
  • Work with orchestration tools to schedule, manage, and monitor pipeline execution
  • Ensure secure handling of sensitive healthcare data, including PHI-safe practices, access control, and audit logging
  • Maintain and optimize data storage systems including data warehouses, data lakes, and hybrid architectures
  • Contribute to schema design, data modeling, and standardization of data structures across systems
  • Support data versioning, lineage tracking, and reproducibility of datasets used in ML workflows
  • Collaborate with engineering teams to integrate data pipelines with downstream applications and APIs
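The validation and quality-check responsibilities above can be illustrated with a minimal Python sketch: a row-level check that partitions an incoming batch into loadable rows and quarantined rows with reasons. The field names (`record_id`, `visit_date`) are hypothetical examples, not part of this posting.

```python
# Illustrative data-quality check of the kind a pipeline might run before
# loading. Field names are assumptions for the sketch, not a real schema.
from datetime import datetime

REQUIRED_FIELDS = ("record_id", "visit_date")


def validate_record(record: dict) -> list[str]:
    """Return a list of validation errors for one record (empty = valid)."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            errors.append(f"missing required field: {field}")
    # Reject rows whose visit_date is present but not ISO-8601 parseable.
    raw = record.get("visit_date")
    if raw:
        try:
            datetime.fromisoformat(raw)
        except ValueError:
            errors.append(f"unparseable visit_date: {raw!r}")
    return errors


def split_batch(records: list[dict]):
    """Partition a batch into loadable rows and quarantined (row, errors) pairs."""
    good, bad = [], []
    for rec in records:
        errs = validate_record(rec)
        if errs:
            bad.append((rec, errs))
        else:
            good.append(rec)
    return good, bad
```

In production this kind of check would typically run as a task inside an orchestrator (e.g., an Airflow DAG), with quarantined rows written to a dead-letter table and surfaced through pipeline monitoring rather than silently dropped.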

 

Qualifications:

  • Bachelor’s degree in Computer Science, Data Engineering, or a related discipline
  • 2–3 years of experience in Data Engineering, with a strong preference for 1+ year of rigorous hands-on experience, including freelance or independent project work that demonstrates real-world problem solving
  • Demonstrated ability to independently write clean, efficient, production-grade code, with a strong emphasis on first-principles problem solving rather than reliance on code-generation tools or low-code/no-code platforms
  • Strong proficiency in SQL and Python for data processing and transformation
  • Hands-on experience building ETL/ELT pipelines using tools such as Airflow, dbt, or Azure Data Factory
  • Solid understanding of data modeling concepts, including normalization, denormalization, and schema design
  • Experience working with relational and non-relational databases
  • Familiarity with data warehouses and data lake architectures
  • Understanding of data quality, validation techniques, and pipeline monitoring
  • Experience working with large-scale datasets and performance optimization
  • Familiarity with healthcare data or regulated environments is a plus
  • Experience with Microsoft Azure (e.g., Azure Data Factory, Synapse, Blob Storage) and serverless data workflows is a plus
  • Strong problem-solving skills with the ability to work on ambiguous data challenges
  • Strong cross-functional communication and documentation skills
  • Demonstrated ability to write efficient and reliable data pipelines from first principles, without over-reliance on low-code tools or automated pipeline generators