Tag: data catalog

Run Apache Spark 3.5.1 workloads 4.5 times faster with Amazon EMR runtime for Apache Spark | Amazon Web Services

Big Data June 21, 2024

The Amazon EMR runtime for Apache Spark is a performance-optimized runtime that is 100% API compatible with open source Apache Spark. It offers faster...

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics | Amazon Web Services

Big Data June 10, 2024

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation | Amazon Web Services

Big Data June 10, 2024

Modernize your data observability with Amazon OpenSearch Service zero-ETL integration with Amazon S3 | Amazon Web Services

Big Data June 5, 2024

Simplify data lake access control for your enterprise users with trusted identity propagation in AWS IAM Identity Center, AWS Lake Formation, and Amazon S3...

Big Data May 29, 2024

Many organizations use external identity providers (IdPs) such as Okta or Microsoft Azure Active Directory to manage their enterprise user identities. These users interact...

Introducing Amazon EMR on EKS with Apache Flink: A scalable, reliable, and efficient data processing platform | Amazon Web Services

Big Data May 28, 2024

AWS recently announced that Apache Flink is generally available for Amazon EMR on Amazon Elastic Kubernetes Service (EKS). Apache Flink is a scalable, reliable,...

Combining Data Management and Data Storytelling to Generate Value – KDnuggets

Big Data May 23, 2024

Lately, I have been focusing on data storytelling and its importance in effectively communicating the results of data analysis to generate value. However, my...

Adaptive Data Governance: What, Why, How – DATAVERSITY

Big Data May 16, 2024

In DATAVERSITY’s 2023 Trends in Data Management survey, about 64% of participants stated that their companies had Data Governance (DG), the formalization and enforcement of data operations across the...

Use AWS Glue Data Catalog views to analyze data | Amazon Web Services

Big Data May 9, 2024

In this post, we show you how to use the new views feature in the AWS Glue Data Catalog. SQL views are a powerful object...

Governing data in relational databases using Amazon DataZone | Amazon Web Services

Big Data May 7, 2024

Data governance is a key enabler for teams adopting a data-driven culture and operational model to drive innovation with data. Amazon DataZone is a...

Introducing Amazon Q data integration in AWS Glue | Amazon Web Services

Big Data April 30, 2024

Today, we’re excited to announce general availability of Amazon Q data integration in AWS Glue. Amazon Q data integration, a new generative AI-powered capability...

Use your corporate identities for analytics with Amazon EMR and AWS IAM Identity Center | Amazon Web Services

Big Data April 26, 2024

To enable your workforce users for analytics with fine-grained data access controls and audit data access, you might have to create multiple AWS Identity...

Orchestrate an end-to-end ETL pipeline using Amazon S3, AWS Glue, and Amazon Redshift Serverless with Amazon MWAA | Amazon Web Services

Big Data April 25, 2024

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that you can use to set up and...

Latest Intelligence

Geospatial Data Analysis with Geemap – KDnuggets

Big Data April 15, 2024

Amazon DataZone announces integration with AWS Lake Formation hybrid access mode for the AWS Glue Data Catalog | Amazon Web Services

Big Data April 8, 2024

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight | Amazon...

Big Data March 29, 2024

10 GitHub Repositories to Master MLOps – KDnuggets

Big Data March 29, 2024
