AWS Glue is a serverless data integration service.
Tech tags:
Related shared content:
- product 2026-03-26: The article discusses the implementation of AWS Glue Data Quality pipelines using Terraform, highlighting two methods: ETL-based and Catalog-based Data Quality validation. It explains how these methods can ensure comprehensive data quality across data lakes and pipelines, using a real-world dataset of NYC yellow taxi trips. The article emphasizes the benefits of Infrastructure as Code (IaC) practices for consistent and repeatable deployments, and provides a step-by-step guide for setting up the necessary resources in AWS.
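A minimal sketch of the Catalog-based path using boto3 (the database, table, and role names are placeholders, and the DQDL rules are illustrative, not taken from the article):

```python
def build_taxi_ruleset() -> str:
    """Compose a DQDL ruleset string; the individual rules are illustrative."""
    rules = [
        'IsComplete "vendor_id"',
        'ColumnValues "passenger_count" between 0 and 10',
        "RowCount > 0",
    ]
    return "Rules = [\n    " + ",\n    ".join(rules) + "\n]"


def main() -> None:
    # AWS-side calls; nyc_taxi/yellow_trips and the role ARN are placeholders.
    import boto3

    glue = boto3.client("glue")
    # Attach the ruleset to the Glue Catalog table ...
    glue.create_data_quality_ruleset(
        Name="taxi-trips-dq",
        Ruleset=build_taxi_ruleset(),
        TargetTable={"DatabaseName": "nyc_taxi", "TableName": "yellow_trips"},
    )
    # ... then kick off an evaluation run against it.
    glue.start_data_quality_ruleset_evaluation_run(
        DataSource={"GlueTable": {"DatabaseName": "nyc_taxi",
                                  "TableName": "yellow_trips"}},
        Role="arn:aws:iam::123456789012:role/GlueDQRole",
        RulesetNames=["taxi-trips-dq"],
    )
```

In the Terraform setup described by the article, the ruleset and run would be declared as resources instead of API calls; the DQDL payload stays the same.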
- project 2026-03-03: The article details Yggdrasil Gaming's migration from Google BigQuery to an AWS-based lakehouse architecture, highlighting the challenges posed by multi-cloud operational complexity and the need for a scalable analytics foundation. It outlines the phased approach taken to establish a new architecture using AWS services, including Amazon S3, Apache Iceberg, and Amazon Athena, which enabled real-time data ingestion and advanced analytics capabilities. The migration resulted in significant cost savings, improved data freshness, and enhanced governance for analytics workloads. The article serves as a case study for organizations looking to modernize their data architecture.
- project 2025-07-22: This Data Processing MCP Server can fully manage the EMR, Athena, and Glue services. You really don't need to write code any more...
- spike 2025-03-07
- project 2025-02-18: A very classic Glue job pipeline to feed AWS Bedrock Knowledge Bases for a RAG use case.
- vision 2024-12-20
- tech1 2024-12-04: S3 Table buckets handle the Iceberg compaction and catalog maintenance tasks for you.
- project 2024-12-05: Twitch has leveraged Views in their Data Lake to enhance data agility, minimize downtime, and streamline development workflows. By using Views as interfaces to the underlying data tables, they've enabled seamless schema modifications, such as column renames and VARCHAR resizing, without requiring data reprocessing. This approach has facilitated rapid responses to data quality issues and supported efficient ETL processes, contributing to a scalable and adaptable data infrastructure.
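The view-as-interface trick can be sketched in a few lines; the view, table, and column names below are hypothetical, not Twitch's:

```python
def view_ddl(view: str, table: str, column_map: dict) -> str:
    """Render CREATE OR REPLACE VIEW DDL that pins the consumer-facing
    column names (keys) to the current physical columns (values), so the
    base table can rename columns without breaking downstream queries."""
    cols = ",\n  ".join(f"{src} AS {alias}" for alias, src in column_map.items())
    return f"CREATE OR REPLACE VIEW {view} AS\nSELECT\n  {cols}\nFROM {table}"


# After the base table renames userid -> user_id, only the mapping changes;
# consumers keep selecting `userid` from the view and no data is rewritten.
ddl = view_ddl(
    "analytics.users_v",
    "raw.users",
    {"userid": "user_id", "name": "full_name"},
)
```

Re-running the DDL is a metadata-only operation, which is what makes schema changes effectively zero-downtime.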
- product 2024-12-20
- product 2024-12-09: Without Iceberg, there is a lot of overhead work to implement the WAP (write-audit-publish) pattern.
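For contrast, a sketch of how Iceberg branching keeps WAP cheap (the table and branch names are placeholders, and the fast_forward call assumes the Iceberg Spark procedures are registered on the session's default catalog):

```python
def wap_statements(table: str, branch: str = "audit") -> list:
    """Spark SQL sequence for a branch-based write-audit-publish cycle
    on an Iceberg table: stage the write on a branch, audit it, then
    fast-forward main to publish."""
    return [
        # 1. Create an isolated branch to stage the write.
        f"ALTER TABLE {table} CREATE BRANCH {branch}",
        # 2. Route subsequent writes of this session to the branch.
        f"SET spark.wap.branch = {branch}",
        # 3. ... run the load job, then run audit/DQ queries on the branch ...
        # 4. Publish: move main to the audited branch head (metadata-only).
        f"CALL system.fast_forward('{table}', 'main', '{branch}')",
        # 5. Clean up the staging branch.
        f"ALTER TABLE {table} DROP BRANCH {branch}",
    ]
```

Without table-format support like this, the same isolation has to be rebuilt by hand with staging tables or path swaps, which is the overhead the note refers to.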
- product 2024-12-12: Build a process that automatically assembles the complete data lineage by merging the partial lineage generated by dbt.
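A minimal sketch of the merge step, assuming each dbt project exports its partial lineage as a node-to-upstreams mapping (the representation is an assumption, not the article's):

```python
from collections import defaultdict


def merge_lineage(*partials: dict) -> dict:
    """Union several partial lineage graphs (node -> set of upstream nodes)
    into one complete graph; nodes shared between fragments stitch the
    per-project graphs together."""
    merged = defaultdict(set)
    for graph in partials:
        for node, upstreams in graph.items():
            merged[node] |= set(upstreams)
    return dict(merged)


# Two projects each know only their own slice of the lineage:
complete = merge_lineage(
    {"stg_orders": {"raw.orders"}},
    {"fct_sales": {"stg_orders"}, "stg_orders": {"raw.refunds"}},
)
```

The automation the note describes would run this after each dbt build, feeding the merged graph into whatever lineage store is in use.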
- project 2022-06-22
- project 2024-10-29: With AWS Lake Formation, create filter packages to control access. A filter package provides a restricted view of a data asset by defining column and row filters on its tables.
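Lake Formation models these filters as data cells filters; a sketch of the payload for lakeformation's create_data_cells_filter call (the account id, names, and predicate are placeholders):

```python
def cells_filter(database: str, table: str, name: str,
                 columns: list, row_predicate: str) -> dict:
    """Build the TableData payload for create_data_cells_filter: expose
    only `columns`, and only the rows matching `row_predicate`."""
    return {
        "TableCatalogId": "123456789012",  # placeholder account id
        "DatabaseName": database,
        "TableName": table,
        "Name": name,
        "RowFilter": {"FilterExpression": row_predicate},
        "ColumnNames": columns,
    }


def main() -> None:
    import boto3

    # Registering the filter makes it grantable like any other LF resource.
    boto3.client("lakeformation").create_data_cells_filter(
        TableData=cells_filter(
            "sales", "orders", "eu_only",
            ["order_id", "country"], "country = 'DE'",
        )
    )
```

A filter package in the note's sense would bundle several such filters, one per table in the data asset.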
- project 2024-11-11: To find user profiles to remove, we use an AWS Lambda function that queries Aurora, DynamoDB, and Athena and records those locations in a DynamoDB table dedicated to GDPR requests.
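A sketch of the record such a Lambda could write to the GDPR table (the item shape and status values are assumptions, not the project's actual schema):

```python
import datetime


def gdpr_item(user_id: str, locations: list) -> dict:
    """Build the DynamoDB item recording every store that holds data for
    `user_id`, so the downstream erasure job knows exactly what to delete."""
    return {
        "user_id": user_id,
        "requested_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        # One entry per hit found in Aurora, DynamoDB, or Athena, e.g.
        # {"store": "aurora", "table": "profiles", "key": "id=42"}.
        "locations": locations,
        "status": "PENDING_DELETE",
    }
```

Keeping the discovered locations in their own table decouples discovery (the fan-out queries) from deletion, which can then be retried per store.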
- product 2024-11-22: A classic RAG solution for this kind of application.