AWS Cloud and Spark Architect

Lovefreedom Solution

Bloomfield, CT, USA

Published: 6/14/2022

Technology

Full Time

Job Description

Primary Job Title: Lead AWS Cloud and Apache Spark Architect

Industry Sector: Enterprise Cloud Data Engineering and Big Data Analytics. We design, deploy and operate high-scale AWS-native data platforms and analytics pipelines for enterprise customers—supporting batch and real-time ML/BI workloads across finance, healthcare, and adtech. This is an onsite U.S. role focused on architecting secure, cost-efficient Spark-based processing at scale.

Role Responsibilities

Architect and deliver AWS-native big data platforms and data lake solutions using S3, EMR, Glue, Redshift and EKS—designing for performance, scale and resiliency.
Lead migration efforts from on-prem Hadoop/Cloudera ecosystems to AWS (EMR/EKS/Glue), defining cutover strategies, data validation, and rollback plans.
Optimize Apache Spark (PySpark/Scala) jobs and clusters for throughput, latency and cost—tuning shuffle, partitioning, memory/executor settings and job scheduling.
Implement IaC and production-grade CI/CD for data pipelines using Terraform/CloudFormation and CI tooling (Jenkins, GitLab CI), including automated testing and deployment safeguards.
Define and enforce security, governance and networking best practices (IAM, VPC design, encryption, data lineage, access controls) for enterprise workloads.
Mentor engineering teams, run architecture reviews, set operational runbooks, and drive capacity planning and observability standards.

Skills and Qualifications

Must-Have: 7+ years hands-on AWS experience (EMR, S3, Glue, Redshift, EC2) and deep Apache Spark expertise (PySpark and/or Scala) including production performance tuning and debugging.
Must-Have: Proven track record migrating on-prem Hadoop or legacy ETL to AWS and operating Spark in EMR/EKS at enterprise scale.
Must-Have: Strong IaC and CI/CD skills (Terraform/CloudFormation, Jenkins/GitLab/GitHub Actions), containerization (Docker) and Kubernetes/EKS experience.
Preferred: Experience with streaming (Kafka/Kinesis), Spark Structured Streaming, Delta Lake or Iceberg and event-driven architectures.
Preferred: Solid understanding of security and compliance (IAM, encryption, SOC2/HIPAA awareness), VPC/networking and observability tooling (CloudWatch, Prometheus, Grafana).
Preferred: Bachelor’s/Master’s in CS or related field and prior leadership/architect role in enterprise data platform projects.

Benefits and Culture Highlights

On-site U.S. role with ownership of high-impact modernization projects and visible cross-functional influence.
Engineering-first culture that values mentorship, technical excellence, and measurable business outcomes.
Learning and development support: conferences, certifications, and hands-on opportunities to build large-scale, production data systems.

Location and Work Type: United States, onsite (candidate must be based in or willing to relocate to the U.S. and work from the office).

Keywords: AWS, Apache Spark, EMR, PySpark, Scala, Terraform, EKS, Kafka, Glue, Redshift, S3, data-lake, streaming, performance tuning, migration, IaC, CI/CD, security, observability.