Vitrai Gábor
About Me
Professional Summary
I am an enthusiastic Data Engineer with a strong background in Computer Science and more than three years of experience. I specialize in ETL development, data pipeline design, and data warehousing, using both streaming and batch processing solutions. I am experienced in leveraging the AWS Cloud and Apache Kafka for data-heavy projects, have proven reliable as an On-Call Engineer, and work well both in a team and independently.
Skills
Work Experience & Education
Work Experience
Data Engineer
SpazioDati Srl., Trento, Italy
Implemented new and maintained existing ETL data pipelines and large-scale projects. Acquired, processed, migrated, and stored structured and unstructured data. Relied on the AWS Cloud, Apache Kafka, Kubernetes, and Docker to implement dozens of pipelines. Monitored and maintained processes as an On-Call Engineer.
Data Engineer Intern
SpazioDati Srl., Trento, Italy
Created Machine Learning models for the task of Postal Address Extraction from Free Text.
Teaching Assistant in Software Technology
Eötvös Loránd University, Budapest, Hungary
Taught Coding Principles, Java, CI, Architectures, and Project Management to Bachelor students. Served as Scrum Master and Project Manager.
Education
Master of Science in Data Science
University of Trento (Italy) / Eötvös Loránd University (Budapest, Hungary)
Graduated with 110/110 Cum Laude
Thesis: Multilingual Address Extraction From Websites.
EIT Digital Master School Double Degree Program
Bachelor of Science in Computer Science
Eötvös Loránd University, Budapest
Graduated with a grade of 5/5
Thesis: Development of a Corporate Management System Using the Spring Framework
Projects
Cloud Resume Challenge
A production-ready serverless resume website built on AWS, demonstrating cloud architecture best practices, infrastructure as code, and modern CI/CD workflows. This project showcases end-to-end cloud engineering skills from infrastructure provisioning to automated deployments.
Architecture Highlights
- Infrastructure as Code (Terraform): All AWS resources provisioned and managed declaratively, including S3 buckets, CloudFront distribution, Route53 DNS records, ACM certificates, Lambda functions, and DynamoDB tables. State stored in versioned S3 backend with DynamoDB state locking to prevent concurrent modifications.
- CloudFront CDN: Global content delivery with edge caching, Origin Access Control (OAC) for secure S3 access, custom SSL/TLS certificate, and geo-restrictions for European traffic.
- GitHub Actions CI/CD: Automated testing and deployment pipeline with branch-based workflows: tests run on feature/develop branches, and merges to main trigger a full deployment to AWS. Includes HTML validation, JavaScript linting, S3 sync, and CloudFront cache invalidation.
- Serverless View Counter: Lambda Function URL with Python runtime, atomic DynamoDB updates, CORS protection, and sub-second response times for real-time visitor tracking (see the Python sketch after this list).
- Route53 DNS Management: Custom domain with A/AAAA records aliased to CloudFront, automated DNS validation for SSL certificates.
- AWS Certificate Manager: Free SSL/TLS certificate with automatic renewal, deployed in us-east-1 for CloudFront compatibility.
- S3 Static Hosting: Versioned bucket with encryption at rest, private access via CloudFront OAC, optimized cache headers for performance.
- DynamoDB: Serverless NoSQL database with on-demand billing, atomic increment operations, and single-digit millisecond latency.
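To make the view-counter and atomic-increment bullets concrete, here is a minimal sketch of how such a handler might look, assuming a single-item table keyed on `id` with a numeric `view_count` attribute. The table name, key schema, attribute names, and allowed origin below are illustrative assumptions, not the project's actual identifiers.

```python
import json
import os

import boto3

# Table, key, and attribute names are illustrative assumptions;
# the real project may use different identifiers.
TABLE_NAME = os.environ.get("TABLE_NAME", "resume-view-counter")
ALLOWED_ORIGIN = os.environ.get("ALLOWED_ORIGIN", "https://example.com")

table = boto3.resource("dynamodb").Table(TABLE_NAME)


def lambda_handler(event, context):
    # ADD performs the increment inside DynamoDB itself, so concurrent
    # invocations can never read-modify-write over each other.
    response = table.update_item(
        Key={"id": "views"},
        UpdateExpression="ADD view_count :inc",
        ExpressionAttributeValues={":inc": 1},
        ReturnValues="UPDATED_NEW",
    )
    views = int(response["Attributes"]["view_count"])

    return {
        "statusCode": 200,
        "headers": {
            # CORS protection: only the resume site's origin is allowed.
            "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
            "Content-Type": "application/json",
        },
        "body": json.dumps({"views": views}),
    }
```

Because the `ADD` action is applied server-side by DynamoDB, the counter stays correct under concurrent requests without any locking logic in the function itself.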
Separation of Concerns
- Terraform manages infrastructure: Long-lived resources like S3 buckets, CloudFront, DNS, certificates, Lambda, and DynamoDB.
- GitHub Actions manages content: Website files (HTML, CSS, JS, images) deployed automatically on code changes; a minimal Python sketch of the equivalent deploy steps follows below.
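The workflow itself is GitHub Actions YAML; as an illustration of its two AWS deploy steps (S3 sync and cache invalidation), here is a hedged boto3 sketch. The bucket name, distribution ID, and `site/` directory are placeholders; in the real pipeline these come from repository secrets and configuration.

```python
import mimetypes
import time
from pathlib import Path

import boto3

# Placeholder values; the real workflow supplies these via GitHub
# Actions secrets and repository configuration.
BUCKET = "example-resume-bucket"
DISTRIBUTION_ID = "E2EXAMPLE123"
SITE_DIR = Path("site")

s3 = boto3.client("s3")
cloudfront = boto3.client("cloudfront")

# Upload each file with a content type and cache headers, mirroring
# what the workflow's S3 sync step achieves.
for path in SITE_DIR.rglob("*"):
    if path.is_file():
        content_type, _ = mimetypes.guess_type(str(path))
        s3.upload_file(
            str(path),
            BUCKET,
            path.relative_to(SITE_DIR).as_posix(),
            ExtraArgs={
                "ContentType": content_type or "binary/octet-stream",
                "CacheControl": "max-age=3600",
            },
        )

# Invalidate the distribution so CloudFront stops serving stale copies.
cloudfront.create_invalidation(
    DistributionId=DISTRIBUTION_ID,
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/*"]},
        "CallerReference": str(time.time()),
    },
)
```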
Certifications
AWS Certified Cloud Practitioner
Amazon Web Services (2023)
HashiCorp Certified Terraform Associate
HashiCorp (2025)
Achievements
Migrated legacy batch pipelines to streaming solutions
We maintained large-scale legacy pipelines that processed data in batches. With my team, I successfully migrated away from these legacy systems and replaced them with state-of-the-art streaming solutions, providing live updates and a more cost-effective infrastructure.
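For context, the core of a streaming replacement is a consume-process-commit loop rather than a scheduled batch job. Below is a minimal sketch of that pattern using the kafka-python client; the topic, broker address, and group id are illustrative, and the real pipelines' processing logic was naturally far more involved.

```python
import json

from kafka import KafkaConsumer

# Illustrative names only; the production pipelines used their own
# topics, brokers, and consumer groups.
consumer = KafkaConsumer(
    "records.updates",
    bootstrap_servers=["localhost:9092"],
    group_id="pipeline-workers",
    enable_auto_commit=False,  # commit only after successful processing
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    record = message.value
    # Each record is handled as it arrives, so downstream systems see
    # live updates instead of waiting for the next batch run.
    print(f"processed {record}")
    consumer.commit()  # at-least-once delivery semantics
```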