Case Study: Research Data Management System

1. Business Need

A healthcare organization needed to improve how data for epidemiological research was requested and retrieved. The existing process was manual. Field selection, population filtering, and dataset structuring were all handled through Word documents and email threads, which led to inconsistencies, errors, and inefficiencies across research teams.

2. The Challenge

Aggregating data from diverse sources (Data Lake, databases)
Translating technical tables into human-readable research fields
Coordinating work among multiple roles: researchers, epidemiologists, and data engineers
Maintaining consistency and version control across iterations
Supporting external inputs (e.g., CSV files)
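The case study does not say how uploaded CSV files are processed. As one illustration, here is a minimal TypeScript sketch (TypeScript to match the frontend stack) that reads a researcher-supplied CSV, for example a list of codes to filter on, into rows keyed by the header row. The function name `parseCsv` is hypothetical; it follows the RFC 4180 quoting rules (quoted fields may contain commas or line breaks, and `""` inside quotes is a literal quote).

```typescript
// Hypothetical helper: parses CSV text into objects keyed by the header row.
// Quoting follows RFC 4180; CRLF, LF, and lone CR are all accepted as breaks.
function parseCsv(text: string): Array<Record<string, string>> {
  const records: string[][] = [];
  let record: string[] = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch; // commas and line breaks are literal inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      record.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one break
      record.push(field);
      records.push(record);
      record = [];
      field = "";
    } else {
      field += ch;
    }
  }
  // Flush the final record when the file does not end with a line break.
  if (field !== "" || record.length > 0) {
    record.push(field);
    records.push(record);
  }

  const [first, ...rest] = records;
  if (!first) return [];
  const header = first.map(h => h.trim());
  const rows: Array<Record<string, string>> = [];
  for (const r of rest) {
    if (r.length === 1 && r[0] === "") continue; // skip blank lines
    const row: Record<string, string> = {};
    header.forEach((name, j) => {
      row[name] = r[j] ?? ""; // missing trailing cells become empty strings
    });
    rows.push(row);
  }
  return rows;
}
```

A hand-written parser keeps the sketch dependency-free; in practice a maintained CSV library would be the safer choice for large or irregular files.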

3. The Solution

Phase 1: ETL to Data Lake

Scheduled Talend ETL jobs load data from the various source systems into the Cloudera Hadoop data lake, where it is standardized and stored in Parquet format for structured access.

Phase 2: Web-Based Research Management

A web application (React frontend, Spring Boot backend) was developed to manage research data requests. Key features include:

Support for calculated fields and external research documents
Human-readable catalog of all Ministry of Health data sources
Advanced field filtering with version history per iteration
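The features above combine a field catalog, calculated fields, filters, and per-iteration version history. The following is a minimal TypeScript sketch of how such a request could be modeled, assuming each iteration is stored as a new snapshot rather than edited in place. The interfaces, field names, and filter operators are hypothetical illustrations, not the system's actual schema, and the real backend is Spring Boot, so this shows only the shape of the idea.

```typescript
// Hypothetical data model for a versioned research extraction request.
interface CatalogField {
  technicalName: string; // column name in the Data Lake table
  displayName: string;   // human-readable label shown to researchers
  source: string;        // originating data source
}

interface CalculatedField {
  displayName: string;
  inputs: string[];      // technical names of the catalog fields it derives from
  expression: string;    // stored as text for the data engineer to implement
}

interface Filter {
  field: string;         // technical name of a catalog field
  operator: "=" | ">=" | "<=" | "in";
  value: string | number | string[];
}

interface RequestVersion {
  version: number;
  fields: string[];
  calculatedFields: CalculatedField[];
  filters: Filter[];
  note: string;          // why this iteration was created
}

class ExtractionRequest {
  private readonly versions: RequestVersion[] = [];

  constructor(private readonly catalog: Record<string, CatalogField>) {}

  // Appends a new snapshot instead of overwriting the previous one, so every
  // iteration of the request stays available for audit.
  addIteration(change: Omit<RequestVersion, "version">): RequestVersion {
    // Every referenced field must exist in the human-readable catalog.
    const referenced: string[] = [...change.fields];
    change.filters.forEach(f => referenced.push(f.field));
    change.calculatedFields.forEach(c => referenced.push(...c.inputs));
    const unknown = referenced.filter(
      name => !Object.prototype.hasOwnProperty.call(this.catalog, name),
    );
    if (unknown.length > 0) {
      throw new Error(`Unknown catalog fields: ${unknown.join(", ")}`);
    }
    // Copy the top-level arrays and shallow-freeze the snapshot, so later
    // changes to the caller's arrays do not alter stored history
    // (nested objects are not deep-copied).
    const snapshot: RequestVersion = Object.freeze({
      version: this.versions.length + 1,
      fields: change.fields.slice(),
      calculatedFields: change.calculatedFields.slice(),
      filters: change.filters.slice(),
      note: change.note,
    });
    this.versions.push(snapshot);
    return snapshot;
  }

  history(): readonly RequestVersion[] {
    return this.versions;
  }

  // Field names added and removed between two version numbers.
  diffFields(from: number, to: number): { added: string[]; removed: string[] } {
    const before = this.versions[from - 1]?.fields ?? [];
    const after = this.versions[to - 1]?.fields ?? [];
    return {
      added: after.filter(f => before.indexOf(f) === -1),
      removed: before.filter(f => after.indexOf(f) === -1),
    };
  }
}
```

Validating against the catalog at submission time means a request can only reference fields the researcher could see in the catalog, and the append-only history is what makes each iteration traceable.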

Phase 3: Output Generation

The system generates HTML previews for review and XLSX files that describe the requested dataset in a structured form. The final outputs are handed to the data engineering team, which executes the extraction.
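To illustrate the HTML preview step, here is a minimal TypeScript sketch that renders a table with human-readable column headers and a capped number of rows. The names `PreviewColumn`, `Row`, and `renderPreview`, and the 20-row default, are hypothetical. Values are HTML-escaped because headers and cell contents can come from user input or source data.

```typescript
// Hypothetical preview renderer; not the system's actual output format.
interface PreviewColumn {
  key: string;   // technical field name used in the data rows
  label: string; // human-readable header shown to the researcher
}

type Row = Record<string, string | number | null>;

// Escape characters that would otherwise be interpreted as HTML markup.
// "&" is replaced first so the other entities are not double-escaped.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function renderPreview(columns: PreviewColumn[], rows: Row[], maxRows = 20): string {
  const header = columns.map(c => `<th>${escapeHtml(c.label)}</th>`).join("");
  const body = rows
    .slice(0, maxRows) // a preview only needs a sample, not the full dataset
    .map(row => {
      const cells = columns
        .map(c => {
          const v = row[c.key];
          // Missing or null values render as empty cells.
          return `<td>${v === null || v === undefined ? "" : escapeHtml(String(v))}</td>`;
        })
        .join("");
      return `<tr>${cells}</tr>`;
    })
    .join("");
  return `<table><thead><tr>${header}</tr></thead><tbody>${body}</tbody></table>`;
}
```

For example, a column labeled `Diagnosis <ICD-10>` is emitted as `Diagnosis &lt;ICD-10&gt;`, so it displays literally instead of being parsed as a tag. XLSX generation would normally rely on a spreadsheet library and is not sketched here.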

4. Results

Reduced turnaround time from days to hours
Unified, auditable extraction protocol
Minimized manual errors

5. Technologies Used

Frontend: React + TypeScript
Backend: Spring Boot (REST API)
Database: PostgreSQL
Data Lake: Cloudera (HDFS)
ETL: Talend
