Filter articles by tags or search for specific topics:
Original URL: https://www.anthropic.com/research/building-effective-agents
Added Date: January 31, 2025
Memo: The evaluator-optimizer workflow is interesting.
Added Date: January 30, 2025
Memo: Running Spark in local mode inside Kubernetes pods to process small files as they arrive is more efficient than running a large Spark cluster to process a huge number of files in batch.
Original URL: https://www.linkedin.com/blog/engineering/ai/automated-genai-driven-search-quality-evaluation
Added Date: January 29, 2025
Memo:
Original URL: https://engineering.fb.com/2025/01/22/security/how-meta-discovers-data-flows-via-lineage-at-scale/
Added Date: January 28, 2025
Memo: Explains how to efficiently collect and validate lineage metadata across three systems (API, data warehouse, AI inference).
Original URL: https://www.databricks.com/blog/introducing-easier-change-data-capture-apache-spark-structured-streaming
Added Date: January 27, 2025
Memo: The State Reader API enables users to access and analyze Structured Streaming's internal state data. Readers will learn how to leverage the new features to debug, troubleshoot, and analyze state changes efficiently, making streaming workloads easier to manage at scale.
Original URL: https://aws.amazon.com/blogs/database/how-monzo-bank-reduced-cost-of-ttl-from-time-series-index-tables-in-amazon-keyspaces/
Added Date: January 27, 2025
Memo: Monzo Bank optimized their data retention strategy in Amazon Keyspaces by replacing the traditional Time to Live (TTL) approach with a bulk deletion mechanism. By partitioning time-series data across multiple tables, each representing a specific time bucket, they can efficiently drop entire tables of expired data. This method significantly reduces operational costs associated with per-row TTL deletions.
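Sketch for the memo above: a rough illustration of the time-bucket idea, not Monzo's actual implementation. Each row is routed to a table named after its time bucket, and whole tables are dropped once their bucket falls outside the retention window. The bucket width, retention period, keyspace name, and table naming scheme are all assumptions made for this example.

```python
from datetime import date, timedelta

# All constants below are hypothetical values chosen for illustration.
BUCKET_DAYS = 7        # width of one time bucket
RETENTION_DAYS = 30    # how long data must remain readable
KEYSPACE = "metrics"   # assumed keyspace name
BASE = "events"        # assumed base table name
EPOCH = date(1970, 1, 1)


def bucket_index(day: date) -> int:
    """Return the index of the time bucket that contains `day`."""
    return (day - EPOCH).days // BUCKET_DAYS


def table_for(day: date) -> str:
    """Return the name of the table that stores rows for `day`."""
    return f"{BASE}_b{bucket_index(day)}"


def bucket_end(index: int) -> date:
    """Return the last day covered by the bucket with this index."""
    return EPOCH + timedelta(days=(index + 1) * BUCKET_DAYS - 1)


def expired_tables(existing: list[str], today: date) -> list[str]:
    """Return tables whose entire bucket is older than the retention window."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    prefix = f"{BASE}_b"
    expired = []
    for name in existing:
        if not name.startswith(prefix):
            continue  # not one of our bucket tables
        index = int(name[len(prefix):])
        if bucket_end(index) < cutoff:
            expired.append(name)
    return expired


def drop_statements(tables: list[str]) -> list[str]:
    """Build one CQL DROP TABLE statement per expired table."""
    return [f"DROP TABLE IF EXISTS {KEYSPACE}.{name};" for name in tables]
```

Dropping a table removes every row in the bucket in one operation, which is the cost advantage the memo describes over expiring rows one at a time. Reads that span a bucket boundary would need to query more than one table.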
Original URL: https://netflixtechblog.com/introducing-configurable-metaflow-d2fb8e9ba1c6
Added Date: January 26, 2025
Memo:
Original URL: https://www.alibabacloud.com/blog/introducing-fluss-streaming-storage-for-real-time-analytics_601921
Added Date: January 25, 2025
Memo:
Original URL: https://www.alibabacloud.com/blog/why-fluss-top-4-challenges-of-using-kafka-for-real-time-analytics_601879
Added Date: January 25, 2025
Memo:
Original URL: https://www.infoq.cn/article/yfl7SPbwJuFO4XBt2vJo
Added Date: January 24, 2025
Memo: JD.com has developed a comprehensive big data governance framework to manage its extensive data infrastructure, which includes thousands of servers, exabytes of storage, and millions of data models and tasks. The governance strategy focuses on cost reduction, stability, security, and data quality. Key initiatives involve the implementation of audit logs, full-link data lineage, and automated governance platforms. These efforts aim to enhance data management efficiency, ensure data security, and optimize resource utilization across the organization.