Tag: PySpark

Use your corporate identities for analytics with Amazon EMR and AWS IAM Identity Center | Amazon Web Services

To enable your workforce users for analytics with fine-grained data access controls and audit data access, you might have to create multiple AWS Identity...

7 Python Libraries Every Data Engineer Should Know – KDnuggets

As a data engineer, the list of tools and frameworks you’re expected to know can often be daunting. But, at the...

Run interactive workloads on Amazon EMR Serverless from Amazon EMR Studio | Amazon Web Services

Starting from release 6.14, Amazon EMR Studio supports interactive analytics on Amazon EMR Serverless. You can now use EMR Serverless applications as the compute,...

Automate large-scale data validation using Amazon EMR and Apache Griffin | Amazon Web Services

Many enterprises are migrating their on-premises data stores to the AWS Cloud. During data migration, a key requirement is to validate all the data...

Amazon DataZone now integrates with AWS Glue Data Quality and external data quality solutions | Amazon Web Services

Today, we are pleased to announce that Amazon DataZone is now able to present data quality information for data assets. This information empowers end-users...

How Amazon optimized its high-volume financial reconciliation process with Amazon EMR for higher scalability and performance | Amazon Web Services

Account reconciliation is an important step to ensure the completeness and accuracy of financial statements. Specifically, companies must reconcile balance sheet accounts that could...

Working with Window Functions in PySpark

Learning about Window Functions in PySpark can be challenging but worth the effort. Window Functions are a powerful tool for analyzing data and can...

Scale AWS Glue jobs by optimizing IP address consumption and expanding network capacity using a private NAT gateway | Amazon Web Services

As businesses expand, the demand for IP addresses within the corporate network often exceeds the supply. An organization’s network is often designed with some...

Build a pseudonymization service on AWS to protect sensitive data: Part 2 | Amazon Web Services

Part 1 of this two-part series described how to build a pseudonymization service that converts plain text data attributes into a pseudonym or vice...

Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg | Amazon Web Services

As enterprises collect increasing amounts of data from various sources, the structure and organization of that data often need to change over time to...

Data Science and the Go Programming Language – KDnuggets

Sponsored Content: Comments by Tom Miller, Faculty Director of Northwestern University’s MSDS program. Years ago, as a student of applied statistics at the University of Minnesota,...

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success | Amazon Web Services

This post is co-written with Toney Thomas and Ben Vengerovsky from Bluestone. In the ever-evolving world of...

Simplify authentication with native LDAP integration on Amazon EMR | Amazon Web Services

Many companies have corporate identities stored inside identity providers (IdPs) like Active Directory (AD) or OpenLDAP. Previously, customers using Amazon EMR could integrate their...
