Explore Visualizations With AWS Glue Interactive Sessions | Amazon Web Services

AWS Glue interactive sessions offer a powerful way to iteratively explore datasets and fine-tune transformations using Jupyter-compatible notebooks. Interactive sessions enable you to work with a choice of popular integrated development environments (IDEs) in your local environment or with AWS Glue or Amazon SageMaker Studio notebooks on the AWS Management Console, all while seamlessly harnessing the power of a scalable, on-demand Apache Spark backend. This post is part of a series exploring the features of AWS Glue interactive sessions.

AWS Glue interactive sessions now include native support for the matplotlib visualization library (AWS Glue version 3.0 and later). In this post, we look at how we can use matplotlib and Seaborn to explore and visualize data using AWS Glue interactive sessions, facilitating rapid insights without complex infrastructure setup.

Solution overview

You can quickly provision new interactive sessions directly from your notebook without needing to interact with the AWS Command Line Interface (AWS CLI) or the console. You can use magic commands to provide configuration options for your session and install any additional Python modules that are needed.
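As a rough illustration of the configuration magics mentioned above, a first notebook cell might look like the following. The values are placeholders chosen for this sketch, not settings recommended by the post; running %help in the notebook lists the magics that your kernel supports.

```python
# Illustrative session configuration (all values are assumptions)
%glue_version 4.0
%worker_type G.1X
%number_of_workers 2
%idle_timeout 30
```

Configuration magics such as these are typically run before the first statement that starts the session, because they describe how the session is provisioned.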

In this post, we use the classic Iris and MNIST datasets to navigate through a few commonly used visualization techniques using matplotlib on AWS Glue interactive sessions.

Create visualizations using AWS Glue interactive sessions

We start by installing the scikit-learn and Seaborn libraries using the %additional_python_modules Jupyter magic command:

%additional_python_modules scikit-learn, seaborn

You can also upload Python wheel modules to Amazon Simple Storage Service (Amazon S3) and specify the full path as a parameter value to the additional_python_modules magic command.
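A minimal sketch of that option is shown here. The bucket name, prefix, and wheel file name are hypothetical and stand in for your own artifact.

```python
# Hypothetical S3 path to a wheel file; replace with your own bucket and file
%additional_python_modules s3://amzn-s3-demo-bucket/wheels/my_module-0.1.0-py3-none-any.whl
```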

Now, let’s run a few visualizations on the Iris and MNIST datasets.

Create a pair plot using Seaborn to uncover patterns within sepal and petal measurements across the iris species:

import seaborn as sns
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = sns.load_dataset("iris")

# Create a pair plot
sns.pairplot(iris, hue="species")
%matplot plt

Create a violin plot to reveal the distribution of the sepal width measure across the three species of iris flowers:

# Create a violin plot of the Sepal Width measure
plt.figure(figsize=(10, 6))
sns.violinplot(x="species", y="sepal_width", data=iris)
plt.title("Violin Plot of Sepal Width by Species")
plt.show()
%matplot plt
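To read a violin plot against concrete numbers, it can help to print the same distribution as a table. The following is a minimal sketch, assuming scikit-learn's bundled copy of the Iris data (used here so that it runs without a network download); the renamed column and the `summary` variable are choices made for this example only.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for Seaborn's copy;
# its column names differ, so the sepal width column is renamed to match.
bunch = load_iris(as_frame=True)
df = bunch.frame.rename(columns={"sepal width (cm)": "sepal_width"})

# Map the integer target codes to species names
names = {i: str(name) for i, name in enumerate(bunch.target_names)}
df["species"] = df["target"].map(names)

# Count, mean, spread, and quartiles of sepal width for each species
summary = df.groupby("species")["sepal_width"].describe()
print(summary[["mean", "std", "25%", "50%", "75%"]].round(2))
```

The quartile columns correspond to the inner box that Seaborn draws inside each violin by default.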

Create a heat map to display correlations across the iris dataset variables:

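# Calculate the correlation matrix over the numeric columns only
# (the species column holds text labels and cannot be correlated)
correlation_matrix = iris.drop(columns="species").corr()

# Create a heatmap using Seaborn
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm")
plt.title("Correlation Heatmap")
%matplot plt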
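A heat map shows every pairwise correlation at once; to pull out the strongest relationships as a ranked list, the matrix can be flattened. This is a sketch under the assumption that scikit-learn's bundled Iris features are an acceptable stand-in (all of its columns are numeric); the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris features, which are all numeric
features = load_iris(as_frame=True).data
corr = features.corr()

# Keep the upper triangle (excluding the diagonal) so each pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Rank the pairs by absolute correlation, strongest first
pairs = pairs.reindex(pairs.abs().sort_values(ascending=False).index)
print(pairs.head(3))

strongest_pair = pairs.index[0]
```

Ranking by absolute value keeps strong negative correlations near the top of the list as well.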

Create a scatter plot on the MNIST dataset using PCA to visualize distributions among the handwritten digits:

import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA

# Load the MNIST dataset
mnist = fetch_openml('mnist_784', version=1)
X, y = mnist['data'], mnist['target']

# Apply PCA to reduce dimensions to 2 for visualization
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Scatter plot of the reduced data
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y.astype(int), cmap='viridis', s=5)
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA - MNIST Dataset")
plt.colorbar(label="Digit Class")
%matplot plt
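When interpreting a two-component PCA scatter plot, it is useful to know how much of the total variance those two components retain. The sketch below assumes scikit-learn's small bundled digits dataset (1,797 images of 8x8 pixels) as a stand-in for MNIST so that it runs without a download; the figures it prints therefore apply to that smaller dataset, not to MNIST.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Assumption: the bundled digits set replaces MNIST to avoid a large download
X, y = load_digits(return_X_y=True)

# Project the 64 pixel features onto the first two principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Share of the total variance captured by each component
ratios = pca.explained_variance_ratio_
print(f"PC1: {ratios[0]:.1%}, PC2: {ratios[1]:.1%}, total: {ratios.sum():.1%}")
```

A low total suggests that overlapping clusters in the scatter plot may still be separable in the original feature space.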

Create another visualization using matplotlib and the mplot3d toolkit:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Generate mock data
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
x, y = np.meshgrid(x, y)
z = np.sin(np.sqrt(x**2 + y**2))

# Create a 3D plot
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')

# Plot the surface
surface = ax.plot_surface(x, y, z, cmap='viridis')

# Add color bar to map values to colors
fig.colorbar(surface, ax=ax, shrink=0.5, aspect=10)

# Set labels and title
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
ax.set_title('3D Surface Plot Example')
%matplot plt

As illustrated by the preceding examples, you can use any compatible visualization library by installing the required modules and then using the %matplot magic command.

Conclusion

In this post, we discussed how extract, transform, and load (ETL) developers and data scientists can efficiently visualize patterns in their data using familiar libraries through AWS Glue interactive sessions. With this functionality, you can focus on extracting valuable insights from your data, while AWS Glue handles the infrastructure heavy lifting using a serverless compute model. To get started today, refer to Developing AWS Glue jobs with Notebooks and Interactive sessions.

About the authors

Annie Nelson is a Senior Solutions Architect at AWS. She is a data enthusiast who enjoys problem solving and tackling complex architectural challenges with customers.

Keerthi Chadalavada is a Senior Software Development Engineer at AWS Glue. She is passionate about designing and building end-to-end solutions to address customer data integration and analytic needs.

Zach Mitchell is a Sr. Big Data Architect. He works within the product team to enhance understanding between product engineers and their customers while guiding customers through their journey to develop their enterprise data architecture on AWS.

Gal Heyne is a Product Manager for AWS Glue with a strong focus on AI/ML, data engineering, and BI. She is passionate about developing a deep understanding of customers' business needs and collaborating with engineers to design easy-to-use data products.

Source: https://aws.amazon.com/blogs/big-data/explore-visualizations-with-aws-glue-interactive-sessions/
