Added Date: December 10, 2024
Memo:
Original URL: https://aws.amazon.com/blogs/industries/bmw-cloud-data-hub-a-reference-implementation-of-the-modern-data-architecture-on-aws/
Added Date: December 9, 2024
Memo:
Original URL: https://aws.amazon.com/blogs/big-data/how-bmw-streamlined-data-access-using-aws-lake-formation-fine-grained-access-control/
Added Date: December 8, 2024
Memo: Uses AWS Lake Formation to create filter packages and control access to them. A filter package provides a restricted view of a data asset by defining column and row filters on its tables.
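A filter package can be pictured as a column allow-list plus a row predicate applied to one table. The sketch below models that idea in plain Python over in-memory rows. `FilterPackage`, its fields, and the sample vehicle rows are hypothetical and exist only for illustration. This is not the Lake Formation API, where the closest construct is a data cells filter.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class FilterPackage:
    """Hypothetical model of a filter package: a restricted view of one table."""
    name: str
    columns: List[str]                 # column filter: only these columns are visible
    row_filter: Callable[[Row], bool]  # row filter: only rows matching this predicate are visible

    def apply(self, rows: List[Row]) -> List[Row]:
        # Drop rows that fail the row filter, then project the allowed columns.
        return [
            {col: row[col] for col in self.columns if col in row}
            for row in rows
            if self.row_filter(row)
        ]


# Made-up sample rows: analysts in Germany see only German vehicles and no VIN column.
vehicles = [
    {"vin": "WBA001", "model": "i4", "country": "DE", "mileage": 12000},
    {"vin": "WBA002", "model": "X5", "country": "US", "mileage": 30500},
    {"vin": "WBA003", "model": "iX", "country": "DE", "mileage": 800},
]

de_package = FilterPackage(
    name="vehicles_de_no_vin",
    columns=["model", "country", "mileage"],
    row_filter=lambda r: r["country"] == "DE",
)

print(de_package.apply(vehicles))
```

Access is then granted per package rather than per table, so one table can back several differently restricted views.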
Original URL: https://aws.amazon.com/blogs/database/building-a-gdpr-compliance-solution-with-amazon-dynamodb/
Added Date: December 7, 2024
Memo: To find the user profiles to remove, an AWS Lambda function queries Aurora, DynamoDB, and Athena and records where the user's data is located in a DynamoDB table dedicated to GDPR requests.
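The lookup step can be sketched as a fan-out. Each data store is asked where a given user's records live, and every hit is written into a dedicated requests table. In this hypothetical sketch, plain dictionaries stand in for Aurora, DynamoDB, Athena, and the GDPR requests table. A real Lambda function would call each service's client instead. All names here are assumptions for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-ins for the real data stores: table name -> rows.
AURORA = {"orders": [{"user_id": "u42", "order_id": 1}, {"user_id": "u7", "order_id": 2}]}
DYNAMODB = {"profiles": [{"user_id": "u42", "email": "a@example.com"}]}
ATHENA = {"clickstream": [{"user_id": "u7", "page": "/home"}]}


def make_locator(store_name: str, store: Dict[str, List[dict]]) -> Callable[[str], List[dict]]:
    """Return a function that lists every (store, table) holding rows for a user."""
    def locate(user_id: str) -> List[dict]:
        return [
            {"store": store_name, "table": table}
            for table, rows in store.items()
            if any(row.get("user_id") == user_id for row in rows)
        ]
    return locate


LOCATORS = [
    make_locator("aurora", AURORA),
    make_locator("dynamodb", DYNAMODB),
    make_locator("athena", ATHENA),
]

# Stand-in for the DynamoDB table dedicated to GDPR requests: request_id -> record.
GDPR_REQUESTS: Dict[str, dict] = {}


def handle_erasure_request(request_id: str, user_id: str) -> dict:
    """Collect data locations from every store and record them for later deletion."""
    locations = [loc for locate in LOCATORS for loc in locate(user_id)]
    record = {"user_id": user_id, "locations": locations}
    GDPR_REQUESTS[request_id] = record
    return record


print(handle_erasure_request("req-1", "u42"))
```

Keeping the located records in one table separates discovery from deletion. A later step can work through each location and track progress per request.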
Original URL: https://medium.com/airbnb-engineering/automation-platform-v2-improving-conversational-ai-at-airbnb-d86c9386e0cb
Added Date: December 5, 2024
Memo:
Original URL: https://netflixtechblog.com/netflixs-distributed-counter-abstraction-8d0c45eb66b2
Added Date: December 4, 2024
Memo: Netflix's Distributed Counter Abstraction is a scalable service designed to handle high-throughput counting operations with low latency. It supports two primary counter types: Best-Effort, which offers near-immediate access with potential slight inaccuracies, and Eventually Consistent, which ensures accurate counts with minimal delays. This abstraction is built atop Netflix's TimeSeries Abstraction and is managed via the Data Gateway Control Plane, allowing for flexible configuration and global deployment.
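The trade-off between the two counter types can be sketched in a few lines. This is a hypothetical model, not Netflix's code. The best-effort counter adds straight to an in-memory value, so a retried request is counted twice. The eventually consistent counter records each increment as an event keyed by an idempotency token, which deduplicates retries. A separate rollup step then aggregates the events into a cached total that lags behind recent writes. Class and method names are assumptions.

```python
from typing import Dict


class BestEffortCounter:
    """Fast but approximate: increments apply immediately, and retries double-count."""

    def __init__(self) -> None:
        self.value = 0

    def add(self, delta: int) -> None:
        self.value += delta

    def get(self) -> int:
        return self.value


class EventualCounter:
    """Accurate but delayed: increments are logged as events and summed by a rollup."""

    def __init__(self) -> None:
        self.events: Dict[str, int] = {}  # idempotency token -> delta
        self.rolled_up = 0                # last aggregated total

    def add(self, delta: int, idempotency_token: str) -> None:
        # A retried write with the same token overwrites instead of duplicating.
        self.events[idempotency_token] = delta

    def rollup(self) -> None:
        # Background step (simplified): re-aggregate all events into the cached total.
        self.rolled_up = sum(self.events.values())

    def get(self) -> int:
        # Reads return the last rollup, so they may lag behind recent writes.
        return self.rolled_up


best = BestEffortCounter()
best.add(1)
best.add(1)        # client retry of the same request
print(best.get())  # 2: the retry was counted

ev = EventualCounter()
ev.add(1, "req-a")
ev.add(1, "req-a")  # the same retry, with the same token
print(ev.get())     # 0: not rolled up yet
ev.rollup()
print(ev.get())     # 1: the retry was deduplicated
```

For simplicity, this rollup re-sums every event. A system at scale would more likely aggregate incrementally over time windows.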
Original URL: https://netflixtechblog.com/introducing-netflix-timeseries-data-abstraction-layer-31552f6326f8
Added Date: December 2, 2024
Memo: Netflix's TimeSeries Data Abstraction Layer is designed to efficiently store and query vast amounts of temporal event data with low millisecond latency. It addresses challenges such as high throughput, efficient querying of large datasets, global read and write operations, tunable configurations, handling bursty traffic, and cost efficiency. The abstraction integrates with storage backends like Apache Cassandra and Elasticsearch, offering flexibility and scalability to support Netflix's diverse use cases.
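A common way to keep range queries fast over very large event volumes is to split each series into fixed time buckets. A query then touches only the buckets that overlap its window. The sketch below illustrates that general technique with an in-memory store. The bucket width, class, and method names are assumptions, and the sketch does not reproduce Netflix's actual Cassandra schema.

```python
from bisect import insort
from collections import defaultdict
from typing import DefaultDict, List, Tuple

BUCKET_MS = 60_000  # assumed bucket width: one minute

Event = Tuple[int, str]  # (timestamp in ms, payload)


class TimeSeriesStore:
    """Events partitioned by (series_id, time bucket) to bound range scans."""

    def __init__(self) -> None:
        self.buckets: DefaultDict[Tuple[str, int], List[Event]] = defaultdict(list)

    def write(self, series_id: str, ts_ms: int, payload: str) -> None:
        # Keep each bucket sorted by timestamp so reads come back in time order.
        insort(self.buckets[(series_id, ts_ms // BUCKET_MS)], (ts_ms, payload))

    def read(self, series_id: str, start_ms: int, end_ms: int) -> List[Event]:
        """Return events with start_ms <= ts < end_ms, visiting only overlapping buckets."""
        result: List[Event] = []
        for bucket in range(start_ms // BUCKET_MS, (end_ms - 1) // BUCKET_MS + 1):
            # .get() does not create empty buckets in the defaultdict.
            for ts, payload in self.buckets.get((series_id, bucket), []):
                if start_ms <= ts < end_ms:
                    result.append((ts, payload))
        return result


store = TimeSeriesStore()
store.write("profile:42", 5_000, "play")
store.write("profile:42", 65_000, "pause")
store.write("profile:42", 130_000, "stop")
store.write("profile:7", 70_000, "play")

print(store.read("profile:42", 60_000, 120_000))  # [(65000, 'pause')]
```

The bucket width is the main tuning knob. Narrow buckets make short-window queries cheap but split long windows across many partitions, and wide buckets do the reverse.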
Original URL: https://www.figma.com/blog/the-infrastructure-behind-ai-search-in-figma/
Added Date: December 1, 2024
Memo:
Original URL: https://xie.infoq.cn/article/852f508d81d24a17585793919
Added Date: November 28, 2024
Memo: 58 Group optimized its data integration platform using Apache SeaTunnel to efficiently handle more than 500 billion data messages per day. The effort targeted high reliability, high throughput, low latency, and simpler maintenance. By moving from Kafka Connect to SeaTunnel, the architecture now supports diverse data sources, enhanced task management, and real-time monitoring, with future plans to use AI for diagnostics and to migrate to cloud environments.
Original URL: https://aws.amazon.com/blogs/machine-learning/governing-the-ml-lifecycle-at-scale-part-3-setting-up-data-governance-at-scale/
Added Date: November 28, 2024
Memo: I think the main challenge is still adoption of this kind of system: integrating it and changing existing workflows is a huge cost for stakeholder teams.