With regard to educating its community about data science, Analytics Vidhya has long been at the forefront. We periodically hold “DataHour” events to increase community interest in studying data science. These webinars are hosted by top industry experts and they teach and democratize data science knowledge. Here is the knowledge session by Shanthababu Pandian on “Introduction to BigQuery ML.”
About Speaker: Shanthababu Pandian, AI & Data Analytics Lead at Cognizant, has 20+ years of experience in Information Technology (IT). His expertise covers ML, data architecture, and program delivery, including liaising with clients, gathering and eliciting requirements, architecting and devising cost-effective solutions within delivery frameworks, and mitigating project and delivery risks.
BigQuery ML Overview
As you know, many cloud platforms are available to us, but Azure, AWS, and Google Cloud Platform (GCP) dominate digital transformation programs. They compete with one another and each provides strong services to its users. We will talk about Google Cloud Platform (GCP) here. In the speaker's view, Google stands out among its competitors; by his estimate, about 99% of online users use Google in some way, whether through Chrome, Gmail, Google Cloud, or another product.
Google Analytics, for example, is straightforward to use and presents data in charts that are easy to understand. It also handles massive amounts of data well.
BigQuery comes under Big Data Analytics, as seen in the above picture.
What is BigQuery?
BigQuery is a data warehouse for working with large amounts of data. With BigQuery, we can:
- Collect data from various sources
- Analyze data
- Visualize the analysis in multiple ways.
BigQuery can be divided into the parts below:
As you can see in the above diagram, you first need to collect raw data. It can come from Cloud Storage, Google Sheets, databases, and so on. All the collected data is loaded into BigQuery, where it becomes refined data. In BigQuery, you can store data as well as run multiple queries against it. With SQL statements, we can analyze the data and perform machine learning operations. You can then connect the resulting data to Looker, Data Studio, Cloud AI, and similar services for reporting and further analysis. BigQuery is affordable and copes well with massive amounts of data.
Features of BigQuery ML
- Multi-cloud functionality: BigQuery allows data analysis across multiple cloud platforms. It can run the computation on the data right where it is located, so you can query data held on other cloud platforms such as AWS and Azure and gain insights with a consistent data experience across clouds.
- Built-in ML integration (BQ ML): Developers can create and execute machine learning models in BigQuery using simple SQL queries. This removes the need for programming skills in languages such as Python, R, or Java, and lets SQL practitioners build ML models with their existing skills.
- Foundation for BI (BQ BI Engine): This engine is an in-memory analysis service. It lets you analyze the data stored in BigQuery with high concurrency and fast response times. It comes with a SQL interface, which helps it work with BI tools like Looker, Tableau, Power BI, etc.
- Geospatial Analysis (BQ GIS): BigQuery Geographic Information Systems (GIS) supports location and mapping analysis. It can convert latitude and longitude columns into geographic points.
- Automated Data Transfer (BQ Data Transfer Service): This service moves data into BigQuery automatically on a regular schedule. No special coding is required. It also takes care of data backfills to compensate for any gaps or outages during ingestion.
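To make the geospatial feature a little more concrete, here is a minimal sketch in Python that only assembles the SQL text; it does not connect to BigQuery. ST_GEOGPOINT is BigQuery's function for building a geographic point from a longitude and a latitude. The helper name, the table mydataset.stores, and the column names are hypothetical and were not part of the session.

```python
def build_geopoint_query(table: str,
                         lon_col: str = "longitude",
                         lat_col: str = "latitude") -> str:
    """Return SQL that adds a GEOGRAPHY point built from two numeric columns."""
    # ST_GEOGPOINT expects longitude first, then latitude.
    return (
        f"SELECT *, ST_GEOGPOINT({lon_col}, {lat_col}) AS location\n"
        f"FROM `{table}`"
    )


# Hypothetical table name, used only for illustration.
geo_query = build_geopoint_query("mydataset.stores")
print(geo_query)
```

The printed statement could then be pasted into the BigQuery editor and adapted to your own table.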
Below is a simple architecture diagram of the data warehouse we are implementing. Without completing this setup, we cannot run our ML operations.
What is Special in BigQuery ML?
Imagine two friends, David and Paul, talking about ML projects. Paul learned Python and the essentials of ML and started writing code. He was proud of this and bragged to David that no one can work on ML projects without knowing Python.
On the other hand, David has multiple years of experience in writing SQL queries but doesn’t know any programming language. He wants to work on ML projects but is afraid it might be difficult for him without a language like Python, R, or Java.
Paul tells him that it’s impossible to work on ML aspects without the knowledge of primary programming languages.
Like David, many people are proficient in SQL but still can’t work on ML projects.
People like David should explore BigQuery ML. It gives them enough to work on ML projects, and, according to the speaker, most clients want to implement BigQuery ML in their projects. If you know SQL, it is easy to pick up. Below is the machine learning process in 5 easy steps.
5 Crucial Steps to Nail Machine Learning Process
As mentioned before, you first need to collect the data from various sources. Then you need to do data cleaning and feature engineering, which is very important for getting the golden dataset. Next comes model building, which involves selecting the correct ML algorithm. After that, we evaluate the model. Lastly, you deploy the model to production. The significant libraries supporting these steps are NumPy, SciPy, Pandas, Matplotlib, Keras, TensorFlow, scikit-learn, PyTorch, Scrapy and Beautiful Soup.
David is worried that he has to find and learn so many libraries. He assumes that he is not the right person for ML projects and concludes that Paul was correct.
David only knows CRUD (Create, Read, Update, Delete), the basic set of database operations. He wonders whether he can use this knowledge in ML. We will find the answer ahead.
What is BigQuery ML (Machine Learning)?
Google's BigQuery is a serverless, highly scalable, cost-effective cloud data warehouse. It is designed to help you make informed decisions quickly so that you can transform your business. BigQuery ML provides ML capabilities on top of the data stored in BigQuery, using standard SQL syntax.
Now David can relax: yes, he can work on ML projects using his SQL knowledge.
So, if you want to learn something, forget about the complexity and focus on your interest. Much is possible with dedication and basic knowledge.
BigQuery ML supports running models using SQL queries, bridging the gap between data analysts and data scientists.
It supports various models, such as linear regression, binary logistic regression, multiclass logistic regression, K-means clustering, etc.
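As a rough illustration, each of these model families is selected through the model_type option of a CREATE MODEL statement. The sketch below maps the names used above to the option values LINEAR_REG, LOGISTIC_REG, and KMEANS; the Python dictionary and helper are hypothetical conveniences for this article, not part of BigQuery.

```python
# Model families named in the article mapped to BigQuery ML model_type values.
# Binary and multiclass logistic regression share the same model_type.
MODEL_TYPES = {
    "linear regression": "LINEAR_REG",
    "binary logistic regression": "LOGISTIC_REG",
    "multiclass logistic regression": "LOGISTIC_REG",
    "k-means clustering": "KMEANS",
}


def model_type_option(task: str) -> str:
    """Return the model_type fragment for an OPTIONS(...) clause."""
    return f"model_type = '{MODEL_TYPES[task.lower()]}'"


print(model_type_option("K-means clustering"))
```

Looking the value up in one place keeps the SQL text consistent when you try several model families on the same data.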
Below is the complete BigQuery ML workflow for data available in the warehouse.
Steps to implement BigQuery ML:
ETL stands for extracting, transforming, and loading data, in this case into BigQuery. The crucial steps after that are to create a model, evaluate it, and lastly use it to make predictions.
For model creation, the speaker used his own dataset and model, referred to below as mydataset.mymodel. You can also pass options when creating the model, as shown below. For the feature information, you need to give a description.
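For readers who cannot see the slide, here is a minimal sketch of what a model-creation statement can look like. The Python code only assembles the SQL text and does not run it. The name mydataset.mymodel comes from the session; the helper function, the table mydataset.travel_insurance, and the label column claim are assumptions for illustration, and the speaker's actual query may differ.

```python
def build_create_model_sql(model_name: str, model_type: str,
                           label_column: str, source_table: str) -> str:
    """Assemble a BigQuery ML CREATE MODEL statement as plain text."""
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(\n"
        f"  model_type = '{model_type}',\n"
        f"  input_label_cols = ['{label_column}']\n"
        f") AS\n"
        f"SELECT * FROM `{source_table}`"
    )


create_sql = build_create_model_sql(
    model_name="mydataset.mymodel",             # name used in the session
    model_type="LOGISTIC_REG",                  # assumed: a classification task
    label_column="claim",                       # hypothetical label column
    source_table="mydataset.travel_insurance",  # hypothetical training table
)
print(create_sql)
```

The OPTIONS clause is where the model family and the label column are declared; every other column returned by the SELECT is treated as a feature.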
Now, log in to BigQuery using your Google account. It will also ask for your credit or debit card details. On the screen below, you can see the projects created in BigQuery. You can also create new projects. Once a project is created, it has its own project ID.
Click on the option “ADD DATA” and upload your CSV file here. Then click on the “Editor” option, where you can see your datasets.
When you click on Editor2, you’ll see the metrics for the executed models, such as precision, recall, accuracy, and loss.
Click on travel_insurance_model and then “Evaluation” to see the evaluation details, as shown below.
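The figures on the “Evaluation” tab can also be requested in SQL with ML.EVALUATE, and predictions come from ML.PREDICT. The sketch below only builds the two statements as text. The dataset prefix mydataset and the input table mydataset.new_customers are hypothetical; only the model name travel_insurance_model appears in the session.

```python
def build_evaluate_sql(model_name: str) -> str:
    """SQL that returns evaluation metrics for a trained model."""
    # With no input table, ML.EVALUATE uses the data reserved during training.
    return f"SELECT * FROM ML.EVALUATE(MODEL `{model_name}`)"


def build_predict_sql(model_name: str, input_table: str) -> str:
    """SQL that scores every row of input_table with the model."""
    return (
        f"SELECT * FROM ML.PREDICT(MODEL `{model_name}`, "
        f"TABLE `{input_table}`)"
    )


model = "mydataset.travel_insurance_model"  # dataset prefix is assumed
evaluate_sql = build_evaluate_sql(model)
predict_sql = build_predict_sql(model, "mydataset.new_customers")
print(evaluate_sql)
print(predict_sql)
```

Together with model creation, these two statements cover the create, evaluate, and predict steps of the workflow.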
When implementing ML in Python, we typically separate the entire dataset into a training set and a test set, for example 75% and 25%, respectively.
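As a small illustration of that split, here is a sketch that uses only the Python standard library. The function name, the seed, and the sample data are made up for this example; in practice a library such as scikit-learn is often used instead.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))  # stand-in for 100 records
train, test = train_test_split(rows)
print(len(train), len(test))  # 75 25
```

BigQuery ML offers a comparable choice through CREATE MODEL options such as data_split_method and data_split_eval_fraction, so the split can be described in SQL rather than coded by hand.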
We can say that David should get excited now, as he can work on ML projects. He already knows SQL, and with the help of BigQuery, he can get the opportunity to work on some ML tasks.
So, we learned that you can work on ML projects without programming language knowledge. With basic knowledge of SQL and the help of BigQuery, you are good to go. BigQuery ML does require a basic understanding of the ML discipline, but with that, a data analyst can enter the world of machine learning. I hope this article erases your doubts.
The media shown in this article is not owned by Analytics Vidhya and is used from the presenter’s presentation.