Vitrai Gábor

I am a Data Engineer.

About Me

Professional Summary

I am an enthusiastic Data Engineer with a strong background in Computer Science and more than 3 years of experience. I specialize in ETL development, data pipeline design, and data warehousing using both streaming and batch processing solutions, and I am experienced in leveraging the AWS Cloud and Apache Kafka for data-heavy projects. I have proven reliable as an On-Call Engineer and work well both in a team and independently.

Budapest, Hungary
Hungarian (Native) • English (C1) • Italian (A2)

Skills

ETL Development • Data Pipeline Design • Data Warehousing • Data Migration • Big Data Processing • AWS Cloud • Python/Java/Scala • SQL (PostgreSQL) • Apache Kafka • OpenSearch • CI/CD (GitLab) • Docker • Kubernetes • Prometheus & Grafana • Apache Spark • Streaming Architecture • PagerDuty

Work Experience & Education

Work Experience

2022. Sept. - Present

Data Engineer

SpazioDati Srl., Trento, Italy

Implemented new and maintained existing ETL data processing pipelines and large-scale projects. Acquired, processed, migrated, and stored structured and unstructured data. Relied on AWS Cloud, Apache Kafka, Kubernetes, and Docker to implement dozens of pipelines. Monitored and maintained processes as an On-Call Engineer.

2022. May - 2022. Aug.

Data Engineer Intern

SpazioDati Srl., Trento, Italy

Created Machine Learning models for the task of Postal Address Extraction from Free Text.

2021. Jan. - 2021. May

Teaching Assistant in Software Technology

Eötvös Loránd University, Budapest, Hungary

Taught coding principles, Java, CI, architectures, and project management to Bachelor's students. Took on the roles of Scrum Master and Project Manager.

Education

2020. Sept. - 2022. July

Master of Science in Data Science

University of Trento, Trento (Italy) / Eötvös Loránd University, Budapest (Hungary)

Graduated with 110/110 Cum Laude
Thesis: Multilingual Address Extraction From Websites
EIT Digital Master School Double Degree Program

2017. Sept. - 2020. July

Bachelor of Science in Computer Science

Eötvös Loránd University, Budapest

Graduated with a grade of 5/5
Thesis: Development of a Corporate Management System with the help of Spring Framework

Projects

Cloud Resume Challenge

A production-ready serverless resume website built on AWS, demonstrating cloud architecture best practices, infrastructure as code, and modern CI/CD workflows. This project showcases end-to-end cloud engineering skills from infrastructure provisioning to automated deployments.

Cloud Resume Challenge Infrastructure Architecture

Architecture Highlights

  • Infrastructure as Code (Terraform): All AWS resources provisioned and managed declaratively, including S3 buckets, CloudFront distribution, Route53 DNS records, ACM certificates, Lambda functions, and DynamoDB tables. State stored in versioned S3 backend with DynamoDB state locking to prevent concurrent modifications.
  • CloudFront CDN: Global content delivery with edge caching, Origin Access Control (OAC) for secure S3 access, custom SSL/TLS certificate, and geo-restrictions for European traffic.
  • GitHub Actions CI/CD: Automated testing and deployment pipeline with branch-based workflows - tests run on feature/develop branches, full deployment to AWS on main branch merges. Includes HTML validation, JavaScript linting, S3 sync, and CloudFront cache invalidation.
  • Serverless View Counter: Lambda Function URL with Python runtime, atomic DynamoDB updates, CORS protection, and sub-second response times for real-time visitor tracking; a minimal handler sketch follows this list.
  • Route53 DNS Management: Custom domain with A/AAAA records aliased to CloudFront, automated DNS validation for SSL certificates.
  • AWS Certificate Manager: Free SSL/TLS certificate with automatic renewal, deployed in us-east-1 for CloudFront compatibility.
  • S3 Static Hosting: Versioned bucket with encryption at rest, private access via CloudFront OAC, optimized cache headers for performance.
  • DynamoDB: Serverless NoSQL database with on-demand billing, atomic increment operations, and single-digit millisecond latency.
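
As referenced in the Serverless View Counter item above, the following is a minimal sketch of what such a handler can look like: an atomic ADD update against DynamoDB and a JSON response with a CORS header. The table name, key schema, and allowed origin are illustrative placeholders, not the project's actual values.

```python
# Hypothetical view-counter handler: atomically increments a DynamoDB item
# and returns the new count with a CORS header.
import json

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("resume-visitor-counter")  # placeholder table name


def lambda_handler(event, context):
    # ADD is atomic, so concurrent invocations never lose an increment.
    response = table.update_item(
        Key={"id": "page-views"},  # placeholder partition key
        UpdateExpression="ADD #views :inc",
        ExpressionAttributeNames={"#views": "views"},
        ExpressionAttributeValues={":inc": 1},
        ReturnValues="UPDATED_NEW",
    )
    count = int(response["Attributes"]["views"])
    return {
        "statusCode": 200,
        "headers": {
            # Placeholder origin; the real function would allow only the resume's own domain.
            "Access-Control-Allow-Origin": "https://example.com",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"views": count}),
    }
```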

Separation of Concerns

  • Terraform manages infrastructure: Long-lived resources like S3 buckets, CloudFront, DNS, certificates, Lambda, and DynamoDB.
  • GitHub Actions manages content: Website files (HTML, CSS, JS, images) deployed automatically on code changes; a deployment sketch follows below.
Terraform • AWS CloudFront • AWS Lambda • DynamoDB • S3 • Route53 • ACM • GitHub Actions • Python
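
The GitHub Actions workflow itself is YAML, but the content-deployment step it automates boils down to two AWS calls: syncing the site files to S3 and invalidating the CloudFront cache. Below is a Python/boto3 sketch of that step; the bucket name, distribution ID, and local content directory are placeholders, not the project's actual values.

```python
# Hypothetical content-deployment step: upload site files to S3 and
# invalidate the CloudFront cache so edge locations serve the new version.
import mimetypes
import time
from pathlib import Path

import boto3

BUCKET = "my-resume-site-bucket"    # placeholder bucket name
DISTRIBUTION_ID = "E1234567890ABC"  # placeholder CloudFront distribution ID
SITE_DIR = Path("website")          # placeholder local content directory

s3 = boto3.client("s3")
cloudfront = boto3.client("cloudfront")

# Upload every file under SITE_DIR, using the relative path as the object key.
for path in SITE_DIR.rglob("*"):
    if path.is_file():
        key = path.relative_to(SITE_DIR).as_posix()
        content_type, _ = mimetypes.guess_type(path.name)
        s3.upload_file(
            str(path),
            BUCKET,
            key,
            ExtraArgs={"ContentType": content_type or "binary/octet-stream"},
        )

# Invalidate all paths so CloudFront picks up the fresh content immediately.
cloudfront.create_invalidation(
    DistributionId=DISTRIBUTION_ID,
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/*"]},
        "CallerReference": str(time.time()),
    },
)
```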

Certifications

Achievements

Migrated legacy batch pipelines to streaming solutions

My team maintained large-scale legacy pipelines that processed data in batches. Together, we migrated away from these legacy systems and replaced them with state-of-the-art streaming solutions, providing live updates and a more cost-effective infrastructure.
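
To illustrate the shape of the change, the sketch below shows the streaming side of such a migration: a long-running Kafka consumer (using the kafka-python client, with placeholder broker and topic names) that applies a per-record transformation as messages arrive instead of waiting for a nightly batch run. It is not the production pipeline, only a minimal example of the pattern.

```python
# Minimal streaming-consumer sketch: the per-record logic that a nightly
# batch job used to apply now runs continuously as messages arrive.
import json

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "source-records",                    # placeholder input topic
    bootstrap_servers="localhost:9092",  # placeholder broker address
    group_id="streaming-migration-demo",
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)


def transform(record: dict) -> dict:
    """Stand-in for the per-record transformation the batch job applied."""
    record["processed"] = True
    return record


for message in consumer:
    result = transform(message.value)
    # A real pipeline would write this to a sink (another topic, OpenSearch,
    # a warehouse table, ...) rather than print it.
    print(result)
```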