Case Study: Research Data Management System

1. Business Need

A healthcare organization needed to improve how it accessed and retrieved data for epidemiological research. The existing process was manual: field selection, population filtering, and dataset structuring were all handled through Word documents and email threads, resulting in inconsistencies, errors, and inefficiencies across research teams.

2. The Challenge

  • Aggregating data from diverse sources (Data Lake, databases)
  • Translating technical tables into human-readable research fields
  • Coordinating between multiple roles: researcher, epidemiologist, and data engineer
  • Maintaining consistency and version control across iterations
  • Supporting external inputs (e.g., CSV files)

3. The Solution

Phase 1: ETL to Data Lake

Scheduled Talend jobs automatically extract data from the various sources and load it into Cloudera Hadoop. The data is standardized and stored in Parquet format for structured access.

Phase 2: Web-Based Research Management

A full-featured web application (React + Spring Boot) was developed, providing:

  • Support for calculated fields and external research documents
  • Human-readable catalog of all Ministry of Health data sources
  • Advanced field filtering with version history per iteration
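To illustrate how a catalog of readable field names and a per-iteration version history might fit together, here is a minimal TypeScript sketch. It is not the system's actual data model: every name in it (`CatalogField`, `FieldFilter`, `ExtractionProtocol`, `toLabels`, the `"source.column"` key format) is a hypothetical chosen for the example.

```typescript
// Hypothetical data model; none of these names come from the actual system.
interface CatalogField {
  source: string; // technical table name
  column: string; // technical column name
  label: string;  // human-readable name shown to researchers
}

interface FieldFilter {
  field: string; // "source.column" key into the catalog (assumed format)
  operator: "eq" | "gte" | "lte";
  value: string | number;
}

interface ProtocolVersion {
  version: number;
  fields: string[]; // selected "source.column" keys
  filters: FieldFilter[];
  author: string;
}

class ExtractionProtocol {
  private readonly history: ProtocolVersion[] = [];

  // Each iteration appends a new version instead of editing the previous one.
  commit(fields: string[], filters: FieldFilter[], author: string): ProtocolVersion {
    const next: ProtocolVersion = {
      version: this.history.length + 1,
      fields: [...fields],
      filters: filters.map((f) => ({ ...f })),
      author,
    };
    this.history.push(next);
    return next;
  }

  latest(): ProtocolVersion | undefined {
    return this.history[this.history.length - 1];
  }

  // Fields added or removed between two versions, for review between iterations.
  diff(from: number, to: number): { added: string[]; removed: string[] } {
    const before = this.get(from).fields;
    const after = this.get(to).fields;
    return {
      added: after.filter((f) => before.indexOf(f) === -1),
      removed: before.filter((f) => after.indexOf(f) === -1),
    };
  }

  private get(version: number): ProtocolVersion {
    const found = this.history.find((v) => v.version === version);
    if (!found) throw new Error(`Unknown version ${version}`);
    return found;
  }
}

// Resolve technical keys to catalog labels; unknown keys are returned unchanged.
function toLabels(keys: string[], catalog: CatalogField[]): string[] {
  const byKey = new Map<string, string>(
    catalog.map((c): [string, string] => [`${c.source}.${c.column}`, c.label])
  );
  return keys.map((k) => byKey.get(k) ?? k);
}
```

Keeping every iteration as a separate, append-only version is one simple way to make the history auditable; a production system would more likely persist these versions in the database than hold them in memory.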

Phase 3: Output Generation

The system generates both HTML previews and XLSX files for structured data delivery. Final outputs are sent to the data engineering team for extraction execution.
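As a rough sketch of the HTML preview step, the TypeScript below renders an extraction specification as a simple table. The `SpecRow` shape, the column headings, and the function names are assumptions made for illustration, not the system's actual output format. XLSX generation is left out because it would depend on whichever spreadsheet library the project uses.

```typescript
// Hypothetical shape of one row in the extraction spec; not the system's actual schema.
interface SpecRow {
  label: string;  // human-readable field name
  source: string; // technical table.column
  filter: string; // filter description, empty when none
}

// Escape text so field names and filters cannot inject markup into the preview.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a minimal HTML fragment: a title followed by one table row per field.
function renderPreview(title: string, rows: SpecRow[]): string {
  const body = rows.map(
    (r) =>
      `<tr><td>${escapeHtml(r.label)}</td><td>${escapeHtml(r.source)}</td><td>${escapeHtml(r.filter)}</td></tr>`
  );
  return [
    `<h1>${escapeHtml(title)}</h1>`,
    "<table>",
    "<tr><th>Field</th><th>Source</th><th>Filter</th></tr>",
    ...body,
    "</table>",
  ].join("\n");
}
```

Escaping matters here because filter expressions often contain characters such as `<` and `>`, which would otherwise be interpreted as markup in the preview.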

4. Results

  • Reduced turnaround time from days to hours
  • Unified, auditable extraction protocol
  • Minimized manual errors

5. Technologies Used

  • Frontend: React + TypeScript
  • Backend: Spring Boot (REST API)
  • Database: PostgreSQL
  • Data Lake: Cloudera (HDFS)
  • ETL: Talend
