Apache Spark - Plato Data Intelligence

Introducing Apache Hudi support with AWS Glue crawlers | Amazon Web Services

Big Data November 22, 2023

Apache Hudi is an open table format that brings database and data warehouse capabilities to data lakes. Apache Hudi helps data engineers manage complex challenges, such as...

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics | Amazon Web Services

Big Data November 20, 2023

For any modern data-driven company, having smooth data integration pipelines is crucial. These pipelines pull data from various sources, transform it, and load it...

Speed up queries with the cost-based optimizer in Amazon Athena | Amazon Web Services

Big Data November 17, 2023

Amazon Athena is a serverless, interactive analytics service built on open source frameworks, supporting open table file formats. Athena provides a simplified, flexible way...

Your guide to AWS Analytics at AWS re:Invent 2023 | Amazon Web Services

Big Data November 13, 2023

Join the AWS Analytics team at AWS re:Invent this year, where new ideas and exciting innovations come together. For those in the data...

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark | Amazon Web Services

Big Data November 10, 2023

This post is co-written with Preshen Goobiah and Johan Olivier from Capitec. Apache Spark is a widely-used open source distributed processing system renowned for...

Connect your data for faster decisions with AWS | Amazon Web Services

Big Data November 7, 2023

The most impactful data-driven insights come from connecting the dots between all your data sources—across departments, services, on-premises tools, and third-party applications. But typically,...

What Are the Best Practices for Deploying PySpark on AWS?

Big Data November 7, 2023

In big data and advanced analytics, PySpark has emerged as a powerful tool for processing large datasets and analyzing distributed data. Deploying PySpark on...

Use IAM runtime roles with Amazon EMR Studio Workspaces and AWS Lake Formation for cross-account fine-grained access control | Amazon Web Services

Big Data November 6, 2023

Amazon EMR Studio is an integrated development environment (IDE) that makes it straightforward for data scientists and data engineers to develop, visualize, and debug...

Deploy Amazon QuickSight dashboards to monitor AWS Glue ETL job metrics and set alarms | Amazon Web Services

Big Data November 3, 2023

No matter the industry or level of maturity within AWS, our customers require better visibility into their AWS Glue usage. Better visibility can lend...

Comprehensive Guide to Data Analysis Tools in 2023: Maximize Your Insights

EdTech November 3, 2023

In an era where data is considered the ‘new oil,’ the importance of data analysis cannot be overstated....

GoDaddy benchmarking results in up to 24% better price-performance for their Spark workloads with AWS Graviton2 on Amazon EMR Serverless | Amazon Web Services

Big Data November 2, 2023

This is a guest post co-written with Mukul Sharma, Software Development Engineer, and Ozcan IIikhan, Director of Engineering from GoDaddy. GoDaddy empowers everyday entrepreneurs...

Spark on AWS Lambda: An Apache Spark runtime for AWS Lambda | Amazon Web Services

Big Data October 30, 2023

Spark on AWS Lambda (SoAL) is a framework that runs Apache Spark workloads on AWS Lambda. It’s designed for both batch and event-based workloads,...

Latest Intelligence

20 Technologies in Data Science for Professionals

AI February 5, 2024

Preprocess and fine-tune LLMs quickly and cost-effectively using Amazon EMR Serverless and Amazon SageMaker | Amazon Web Services

AI February 1, 2024

Mastering market dynamics: Transforming transaction cost analytics with ultra-precise Tick History – PCAP and Amazon Athena for Apache Spark | Amazon Web Services

Big Data January 31, 2024

The Only Free Course You Need To Become a Professional Data Engineer – KDnuggets

Big Data January 26, 2024

Use Amazon Athena with Spark SQL for your open-source transactional table formats | Amazon Web Services

Big Data January 24, 2024

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation | Amazon Web Services

Big Data January 17, 2024

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1 | Amazon Web Services

Big Data January 8, 2024

Tag: Apache Spark