AWS Glue is a serverless data integration service.
Tech tags:
Related shared content:
- product 2026-03-26: The article discusses the implementation of AWS Glue Data Quality pipelines using Terraform, highlighting two methods: ETL-based and Catalog-based Data Quality validation. It explains how these methods can ensure comprehensive data quality across data lakes and pipelines, using a real-world dataset of NYC yellow taxi trips. The article emphasizes the benefits of Infrastructure as Code (IaC) practices for consistent and repeatable deployments, and provides a step-by-step guide for setting up the necessary resources in AWS.
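A minimal sketch of the Catalog-based path using boto3 (the database, table, and role names are placeholders, and the DQDL rules are illustrative, not taken from the article):

```python
def build_taxi_ruleset() -> str:
    """Compose a DQDL ruleset string; the individual rules are illustrative."""
    rules = [
        'IsComplete "vendor_id"',
        'ColumnValues "passenger_count" between 0 and 10',
        "RowCount > 0",
    ]
    return "Rules = [\n    " + ",\n    ".join(rules) + "\n]"


def main() -> None:
    # AWS-side calls; nyc_taxi/yellow_trips and the role ARN are placeholders.
    import boto3

    glue = boto3.client("glue")
    # Attach the ruleset to the Glue Catalog table ...
    glue.create_data_quality_ruleset(
        Name="taxi-trips-dq",
        Ruleset=build_taxi_ruleset(),
        TargetTable={"DatabaseName": "nyc_taxi", "TableName": "yellow_trips"},
    )
    # ... then kick off an evaluation run against it.
    glue.start_data_quality_ruleset_evaluation_run(
        DataSource={"GlueTable": {"DatabaseName": "nyc_taxi",
                                  "TableName": "yellow_trips"}},
        Role="arn:aws:iam::123456789012:role/GlueDQRole",
        RulesetNames=["taxi-trips-dq"],
    )
```

In the Terraform setup described by the article, the ruleset and run would be declared as resources instead of API calls; the DQDL payload stays the same.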
- project 2026-03-03: The article details Yggdrasil Gaming's migration from Google BigQuery to an AWS-based lakehouse architecture, highlighting the challenges posed by multi-cloud operational complexity and the need for a scalable analytics foundation. It outlines the phased approach taken to establish a new architecture using AWS services, including Amazon S3, Apache Iceberg, and Amazon Athena, which enabled real-time data ingestion and advanced analytics capabilities. The migration resulted in significant cost savings, improved data freshness, and enhanced governance for analytics workloads. The article serves as a case study for organizations looking to modernize their data architecture.
- project 2025-07-22: This Data Processing MCP Server can fully manage the EMR, Athena, and Glue services. You really don't need to write code any more...
- spike 2025-03-07
- project 2025-02-18: A very classic Glue job pipeline to feed AWS Bedrock Knowledge Bases for a RAG use case.
- vision 2024-12-20
- tech1 2024-12-04: S3 Table buckets handle the Iceberg compaction and catalog maintenance tasks for you.
- project 2024-12-05: Twitch has leveraged Views in their Data Lake to enhance data agility, minimize downtime, and streamline development workflows. By using Views as interfaces to the underlying data tables, they've enabled seamless schema modifications, such as column renames and VARCHAR resizing, without requiring data reprocessing. This approach has facilitated rapid responses to data quality issues and supported efficient ETL processes, contributing to a scalable and adaptable data infrastructure.
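The view-as-interface trick can be sketched in a few lines; the view, table, and column names below are hypothetical, not Twitch's:

```python
def view_ddl(view: str, table: str, column_map: dict) -> str:
    """Render CREATE OR REPLACE VIEW DDL that pins the consumer-facing
    column names (keys) to the current physical columns (values), so the
    base table can rename columns without breaking downstream queries."""
    cols = ",\n  ".join(f"{src} AS {alias}" for alias, src in column_map.items())
    return f"CREATE OR REPLACE VIEW {view} AS\nSELECT\n  {cols}\nFROM {table}"


# After the base table renames userid -> user_id, only the mapping changes;
# consumers keep selecting `userid` from the view and no data is rewritten.
ddl = view_ddl(
    "analytics.users_v",
    "raw.users",
    {"userid": "user_id", "name": "full_name"},
)
```

Re-running the DDL is a metadata-only operation, which is what makes schema changes effectively zero-downtime.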
- product 2024-12-20
- product 2024-12-09: Without Iceberg, there is a lot of overhead work to implement the WAP (write-audit-publish) pattern.
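For contrast, a sketch of how Iceberg branching keeps WAP cheap (the table and branch names are placeholders, and the fast_forward call assumes the Iceberg Spark procedures are registered on the session's default catalog):

```python
def wap_statements(table: str, branch: str = "audit") -> list:
    """Spark SQL sequence for a branch-based write-audit-publish cycle
    on an Iceberg table: stage the write on a branch, audit it, then
    fast-forward main to publish."""
    return [
        # 1. Create an isolated branch to stage the write.
        f"ALTER TABLE {table} CREATE BRANCH {branch}",
        # 2. Route subsequent writes of this session to the branch.
        f"SET spark.wap.branch = {branch}",
        # 3. ... run the load job, then run audit/DQ queries on the branch ...
        # 4. Publish: move main to the audited branch head (metadata-only).
        f"CALL system.fast_forward('{table}', 'main', '{branch}')",
        # 5. Clean up the staging branch.
        f"ALTER TABLE {table} DROP BRANCH {branch}",
    ]
```

Without table-format support like this, the same isolation has to be rebuilt by hand with staging tables or path swaps, which is the overhead the note refers to.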
- product 2024-12-12: Build a process that automatically assembles the complete data lineage by merging the partial lineage generated by dbt.
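A minimal sketch of the merge step, assuming each dbt project exports its partial lineage as a node-to-upstreams mapping (the representation is an assumption, not the article's):

```python
from collections import defaultdict


def merge_lineage(*partials: dict) -> dict:
    """Union several partial lineage graphs (node -> set of upstream nodes)
    into one complete graph; nodes shared between fragments stitch the
    per-project graphs together."""
    merged = defaultdict(set)
    for graph in partials:
        for node, upstreams in graph.items():
            merged[node] |= set(upstreams)
    return dict(merged)


# Two projects each know only their own slice of the lineage:
complete = merge_lineage(
    {"stg_orders": {"raw.orders"}},
    {"fct_sales": {"stg_orders"}, "stg_orders": {"raw.refunds"}},
)
```

The automation the note describes would run this after each dbt build, feeding the merged graph into whatever lineage store is in use.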
- project 2022-06-22
- project 2024-10-29: With AWS Lake Formation, create filter packages to control access. A filter package provides a restricted view of a data asset by defining column and row filters on its tables.
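Lake Formation models these filters as data cells filters; a sketch of the payload for lakeformation's create_data_cells_filter call (the account id, names, and predicate are placeholders):

```python
def cells_filter(database: str, table: str, name: str,
                 columns: list, row_predicate: str) -> dict:
    """Build the TableData payload for create_data_cells_filter: expose
    only `columns`, and only the rows matching `row_predicate`."""
    return {
        "TableCatalogId": "123456789012",  # placeholder account id
        "DatabaseName": database,
        "TableName": table,
        "Name": name,
        "RowFilter": {"FilterExpression": row_predicate},
        "ColumnNames": columns,
    }


def main() -> None:
    import boto3

    # Registering the filter makes it grantable like any other LF resource.
    boto3.client("lakeformation").create_data_cells_filter(
        TableData=cells_filter(
            "sales", "orders", "eu_only",
            ["order_id", "country"], "country = 'DE'",
        )
    )
```

A filter package in the note's sense would bundle several such filters, one per table in the data asset.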
- project 2024-11-11: To find user profiles to remove, we use an AWS Lambda function that queries Aurora, DynamoDB, and Athena and records those locations in a DynamoDB table dedicated to GDPR requests.
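A sketch of the record such a Lambda could write to the GDPR table (the item shape and status values are assumptions, not the project's actual schema):

```python
import datetime


def gdpr_item(user_id: str, locations: list) -> dict:
    """Build the DynamoDB item recording every store that holds data for
    `user_id`, so the downstream erasure job knows exactly what to delete."""
    return {
        "user_id": user_id,
        "requested_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        # One entry per hit found in Aurora, DynamoDB, or Athena, e.g.
        # {"store": "aurora", "table": "profiles", "key": "id=42"}.
        "locations": locations,
        "status": "PENDING_DELETE",
    }
```

Keeping the discovered locations in their own table decouples discovery (the fan-out queries) from deletion, which can then be retried per store.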
- product 2024-11-22: A classic RAG solution for this kind of application.