Big Data

EHR Data Launches Movement for Patients to Own Health Data


EHR Data has announced the official launch of EHR Data Wavemakers, a movement geared towards educating and empowering individuals to create waves that will push a much-needed change in the healthcare industry—for patients to be able to own and control their health data.

Many people, from all walks of life, have experienced delays and failures in retrieving and sharing their own health records, miscommunication that may have negatively impacted their own or a loved one's medical treatment and care. At the forefront of the EHR Data Wavemakers movement is the digital campaign My EHR Story, which encourages people to share their stories on social media using the hashtag #myEHRstory. The campaign not only raises awareness of the current situation, in which patients struggle to access personal health data that should rightfully be theirs to own and control, but also spearheads a push toward responsible data management built on a blockchain-based global healthcare database.

“EHR Data is a company that’s existed for 41 years in the U.S. It’s not a startup. It is a significant player in the world of healthcare data in the U.S. It’s bringing its 41 years of experience to follow Craig Wright’s lead in empowering people to have more control over their data, in this case, healthcare data. They want to create more patient safety. They’re building the concept of a global electronic health record so that all of your health data can live in one place on the blockchain. As opposed to the current systems we have in the U.S. and many countries where I go to my general practitioner doctor, and they have some of my health records; my dentist has some health records,” Bitcoin Association Founding President Jimmy Nguyen said in a presentation of enterprise solutions built on the Bitcoin SV blockchain during an event in Ljubljana, Slovenia last year.

This movement could not have come at a more appropriate time. As people realize the value of data during the pandemic, now is the time to make waves and enact the changes needed for people to own their data and benefit from it. Moreover, the global healthcare database is built on the Bitcoin SV blockchain, which provides transparency, security, scalability, and immutability. Because the Bitcoin SV blockchain can also accommodate big data and low-cost microtransactions and operates on an economically incentivized model, it is well suited to a global healthcare database.

“Times are changing, and a greater focus is being placed on interoperability and the patients’ absolute right to increased access to their health data. We will spearhead and shepherd this process; it’s high time that there was a centralized location for healthcare data, controlled and permissioned by the patient, that they and their team of providers can access at any time,” Ron Austring, EHR Data Chief Scientist, explained.

Once patients own their health data, institutions will have to ask their permission whenever they want to use it, and patients will be paid for that use. This is in contrast to the current system, where only big businesses profit from collecting people’s data, and it is why change is needed. People must come together to revolutionize the system so they can reclaim ownership of their data.

Visit https://ehrdata.com/wavemakers to become part of this movement and to learn more about how to share your story.

Author: Makkie Maclang

Source: https://bitcoinassociation.net/bitcoin-sv-means-business-why-bsv-is-the-enterprise-friendly-blockchain/

 


Artificial Intelligence

Deep Learning vs Machine Learning: How an Emerging Field Influences Traditional Computer Programming


When two concepts are so tightly intertwined, it can be difficult to treat them as distinct academic topics. That might explain why it’s so hard to separate deep learning from machine learning as a whole. Given the current push for both automation and instant gratification, the topic has received a great deal of renewed attention.

Everything from automated manufacturing workflows to personalized digital medicine could potentially grow to rely on deep learning technology. Defining exactly which aspects of this technical discipline will revolutionize those industries is, however, admittedly much more difficult. Perhaps it’s best to consider deep learning in the context of a broader movement in computer science.

Defining Deep Learning as a Subset of Machine Learning

Machine learning and deep learning are essentially two sides of the same coin. Deep learning is a specific discipline within a much larger field, one that covers a wide variety of trained artificially intelligent agents able to predict the correct response in an equally wide array of situations. What sets deep learning apart from these other techniques is its reliance on many-layered neural networks that learn hierarchical representations of raw data, rather than on features engineered by hand for a specific goal.

Traditional machine learning algorithms usually teach artificial nodes how to respond to stimuli through rote memorization. This is somewhat similar to human teaching techniques that consist of simple repetition, and therefore might be thought of as the computerized equivalent of a student running through times tables until they can recite them. While this is effective in a way, artificially intelligent agents educated in such a manner may not be able to respond to any stimulus outside the realm of their original design specifications.

That’s why deep learning specialists have developed alternative algorithms that are considered somewhat superior to this method, though they are admittedly far more hardware-intensive in many ways. Subroutines used by deep learning agents may be based on generative adversarial networks, convolutional neural network structures, or a practical form of restricted Boltzmann machine. These stand in sharp contrast to the binary trees and linked lists used by conventional machine learning software as well as a majority of modern file systems.
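To make that architectural contrast concrete, here is a minimal, illustrative sketch (not from the original article): a conventional tree-based learner next to a small convolutional network written in PyTorch. The data is random placeholder input, and all names are hypothetical.

import numpy as np
import torch
import torch.nn as nn
from sklearn.tree import DecisionTreeClassifier

# Conventional machine learning: a decision tree over hand-engineered feature vectors.
X = np.random.rand(100, 20)                 # 100 samples, 20 engineered features
y = np.random.randint(0, 2, size=100)
tree = DecisionTreeClassifier(max_depth=5).fit(X, y)

# Deep learning: a small convolutional network that learns its own features
# directly from raw image-like input (1 channel, 28x28 pixels).
class TinyConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 7 * 7, 2)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

net = TinyConvNet()
logits = net(torch.randn(4, 1, 28, 28))     # forward pass on a dummy batch
print(tree.score(X, y), logits.shape)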

Self-organizing maps have also been used widely in deep learning, though their applications in other AI research fields have typically been much less promising. When it comes to the deep learning vs. machine learning debate, however, it’s highly likely that technicians will be looking more for practical applications than for theoretical academic discussion in the coming months. Suffice it to say that machine learning encompasses everything from the simplest AI to the most sophisticated predictive algorithms, while deep learning constitutes a more selective subset of these techniques.

Practical Applications of Deep Learning Technology

Depending on how a particular program is authored, deep learning techniques could be deployed on supervised or semi-supervised neural networks. Theoretically, it would also be possible to do so with a completely unsupervised node layout, and it is this technique that has quickly become the most promising. Unsupervised networks may be useful for medical image analysis, since this application often presents unique pieces of graphical information to a computer program that have to be tested against known inputs.

Traditional binary tree or blockchain-based learning systems have struggled to identify the same patterns in dramatically different scenarios, because the information remains hidden in a structure that was designed to present data in a different way. It is essentially a natural form of steganography, and it has confounded computer algorithms in the healthcare industry. This newer type of unsupervised learning node, however, could essentially teach itself to match these patterns even in a data structure that isn’t organized along the lines a computer would normally expect.

Others have proposed implementing semi-supervised artificially intelligent marketing agents that could eliminate much of the concern over ethics regarding existing deal-closing software. Instead of trying to reach as large a customer base as possible, these tools would calculate the odds of any given individual needing a product at a given time. In order to do so, it would need certain types of information provided by the organization that it works on behalf of, but it would eventually be able to predict all further actions on its own.

While some companies currently rely on tools that use traditional machine learning technology to achieve the same goals, these are often fraught with privacy and ethical concerns. The advent of deep structured learning algorithms has enabled software engineers to come up with new systems that don’t suffer from these drawbacks.

Developing a Private Automated Learning Environment

Conventional machine learning programs often run into serious privacy concerns because they need a huge amount of input in order to draw any usable conclusions. Deep learning image recognition software works by processing a smaller subset of inputs, ensuring that it doesn’t need as much information to do its job. This is of particular importance for those concerned about the possibility of consumer data leaks.

Considering new regulatory stances on many of these issues, this has also quickly become important from a compliance standpoint. As toxicology labs begin using bioactivity-focused deep structured learning packages, it’s likely that regulators will express additional concerns about the amount of information needed to perform any given task on this kind of sensitive data. Computer scientists have had to scale back what some have called a veritable fire hose of bytes that tells more of a story than most would be comfortable with.

In a way, these developments hearken back to an earlier time when it was believed that each process in a system should only have the amount of privileges necessary to complete its job. As machine learning engineers embrace this paradigm, it’s highly likely that future developments will be considerably more secure simply because they don’t require the massive amount of data mining necessary to power today’s existing operations.

Image Credit: toptal.io

Source: https://datafloq.com/read/deep-learning-vs-machine-learning-how-emerging-field-influences-traditional-computer-programming/13652


AI

What did COVID do to all our models?



An interview with Dean Abbott and John Elder about change management, complexity, interpretability, and the risk of AI taking over humanity.


By Heather Fyson, KNIME


After the KNIME Fall Summit, the dinosaurs went back home… well, switched off their laptops. Dean Abbott and John Elder, longstanding data science experts, were invited to the Fall Summit by Michael to join him in a discussion of The Future of Data Science: A Fireside Chat with Industry Dinosaurs. The result was a sparkling conversation about data science challenges and new trends. Since switching off the studio lights, Rosaria has distilled and expanded some of the highlights about change management, complexity, interpretability, and more in the data science world. Let’s see where it brought us.

What is your experience with change management in AI, when reality changes and models have to be updated? What did COVID do to all our models?

 
[Dean] Machine Learning (ML) algorithms assume consistency between past and future. When things change, the models fail. COVID has changed our habits, and therefore our data. Pre-COVID models struggle to deal with the new situation.

[John] A simple example would be the Traffic layer on Google Maps. After lockdowns hit country after country in 2020, Google Maps traffic estimates were very inaccurate for a while. It had been built on fairly stable training data but now that system was thrown completely out of whack.

How do you figure out when the world has changed and the models don’t work anymore?

 
[Dean] Here’s a little trick I use: I partition my data by time and label records as “before” and “after”. I then build a classification model to discriminate the “after” vs. the “before” from the same inputs the model uses. If the discrimination is possible, then the “after” is different from the “before”, the world has changed, the data has changed, and the models must be retrained.
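A minimal sketch of this before/after trick, assuming the data sits in a pandas DataFrame with a date column and numeric feature columns (all names here are illustrative, not Dean's actual code):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def drift_score(df: pd.DataFrame, feature_cols, cutoff: str) -> float:
    # Cross-validated AUC for discriminating "before" vs. "after" the cutoff date.
    labels = (df["date"] >= pd.Timestamp(cutoff)).astype(int)   # 0 = before, 1 = after
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    return cross_val_score(clf, df[feature_cols], labels, cv=5, scoring="roc_auc").mean()

# AUC near 0.5: the two periods look alike; AUC well above 0.5: the data has changed
# and the production models likely need retraining.
# score = drift_score(df, feature_cols=["f1", "f2", "f3"], cutoff="2020-03-01")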

How complicated is it to retrain models in projects, especially after years of customization?

 
[John] Training models is usually the easiest step of all! The vast majority of otherwise successful projects die in the implementation phase. The greatest time is spent in the data cleansing and preparation phase. And the most problems are missed or made in the business understanding / project definition phase. So if you understand what the flaw is and can obtain new data and have the implementation framework in place, creating a new model is, by comparison, very straightforward.

Based on your decades-long experience, how complex is it to put together a really functioning Data Science application?

 
[John] It can vary of course, by complexity. Most of our projects get functioning prototypes at least in a few months. But for all, I cannot stress enough the importance of feedback: You have to talk to people much more often than you want to. And listen! We learn new things about the business problem, the data, or constraints, each time. Not all us quantitative people are skilled at speaking with humans, so it often takes a team. But the whole team of stakeholders has to learn to speak the same language.

[Dean] It is important to talk to our business counterparts. People fear change and don’t want to alter the current status quo. One key problem really is psychological: the analysts are often seen as an annoyance. So we have to build trust between the business counterparts and the analytics geeks. The start of a project should always include the following step: sync up the domain experts / project managers, the analysts, and the IT and infrastructure (DevOps) team so everyone is clear on the objectives of the project and how it will be executed. Analysts are number 11 on the top-10 list of people the business has to see every day! Let’s avoid embodying data scientist arrogance: “The business can’t understand us or our techniques, but we know what works best.” What we tend to forget, however, is that the domain experts are actually experts in the domain we are working in! Translating data science assumptions and approaches into language the domain experts understand is key.

The latest trend now is deep learning; apparently, it can solve everything. I recently got a question from a student asking, “Why do we need to learn other ML algorithms if deep learning is the state of the art for solving data science problems?”

 
[Dean] Deep learning sucked a lot of the oxygen out of the room. It feels so much like the early 1990s, when neural networks ascended with similar optimism! Deep learning is a set of powerful techniques for sure, but they are hard to implement and optimize. XGBoost and other ensembles of trees are also powerful but currently more mainstream. The vast majority of problems we need to solve using advanced analytics really don’t require complex solutions, so start simple; deep learning is overkill in these situations. It is best to apply the Occam’s razor principle: if two models perform the same, adopt the simplest.
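As a small, hypothetical illustration of that Occam's razor rule (not something from the interview), one might compare a simple and a complex model by cross-validation and keep the simpler one unless the complex one is clearly better:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

simple = LogisticRegression(max_iter=1000)
complex_model = GradientBoostingClassifier(random_state=0)

simple_score = cross_val_score(simple, X, y, cv=5).mean()
complex_score = cross_val_score(complex_model, X, y, cv=5).mean()

# Keep the simpler model unless the complex one buys a clear improvement
# (the 1-point tolerance here is purely illustrative).
chosen = complex_model if complex_score - simple_score > 0.01 else simple
print(simple_score, complex_score, type(chosen).__name__)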

About complexity. The other trend, opposite to deep learning, is ML interpretability. Here, you greatly (excessively?) simplify the model in order to be able to explain it. Is interpretability that important?

 
[John] I often find myself fighting interpretability. It is nice, sure, but often comes at too high a cost of the most important model property: reliable accuracy. But many stakeholders believe interpretability is essential, so it becomes a barrier for acceptance. Thus, it is essential to discover what kind of interpretability is needed. Perhaps it is just knowing what the most important variables are? That’s doable with many nonlinear models. Maybe, as with explaining to credit applicants why they were turned down, one just needs to interpret outputs for one case at a time? We can build a linear approximation for a given point. Or, we can generate data from our black box model and build an “interpretable” model of any complexity to fit that data.

Lastly, research has shown that if users have the chance to play with a model – that is, to poke it with trial values of inputs and see its outputs, and perhaps visualize it – they get the same warm feelings of interpretability. Overall, trust – in the people and technology behind the model – is necessary for acceptance, and this is enhanced by regular communication and by including the eventual users of the model in the build phases and decisions of the modeling process.
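For readers who want to try the surrogate-model idea John describes, here is a minimal, illustrative example (an assumed setup, not the interviewees' code): fit an opaque model, then fit a small decision tree to that model's own predictions and read off its rules.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = make_regression(n_samples=2000, n_features=6, noise=0.3, random_state=0)

black_box = GradientBoostingRegressor(random_state=0).fit(X, y)      # accurate but opaque
surrogate = DecisionTreeRegressor(max_depth=3).fit(X, black_box.predict(X))

print(export_text(surrogate))                                        # human-readable rules
print("fidelity R^2:", surrogate.score(X, black_box.predict(X)))     # how well the surrogate mimics the black box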

[Dean] By the way KNIME Analytics Platform has a great feature to quantify the importance of the input variables in a Random Forest! The Random Forest Learner node outputs the statistics of candidate and splitting variables. Remember that, when you use the Random Forest Learner node.

There is an increase in requests for explanations of what a model does. For example, for some security classes, the European Union is demanding verification that the model doesn’t do what it’s not supposed to do. If we have to explain it all, then maybe Machine Learning is not the way to go. No more Machine Learning?

 
[Dean] Maybe full explainability is too hard to obtain, but we can achieve progress by performing a grid search on model inputs to create something like a score card describing what the model does. This is something like regression testing in hardware and software QA. If a formal proof of what models are doing is not possible, then let’s test and test and test! Input shuffling and target shuffling can help to achieve a rough representation of the model behavior.
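A rough, illustrative sketch of those two checks (an assumed implementation, not the speakers' exact procedure): permute one input at a time to see how much performance depends on it, and refit on a shuffled target to estimate how much apparent skill could arise by chance.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Input shuffling: per-feature drop in accuracy when that input is permuted.
imp = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
print("input-shuffling importances:", imp.importances_mean.round(3))

# Target shuffling: accuracy obtained on a scrambled target approximates the chance level.
rng = np.random.default_rng(0)
chance_scores = [
    RandomForestClassifier(random_state=0).fit(X_tr, rng.permutation(y_tr)).score(X_te, y_te)
    for _ in range(20)
]
print("real accuracy:", model.score(X_te, y_te))
print("chance-level accuracy:", np.mean(chance_scores).round(3))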

[John] Talking about understanding what a model does, I would like to raise the problem of reproducibility in science. A huge proportion of journal articles in all fields — 65 to 90% — is believed to be unreplicable. This is a true crisis in science. Medical papers try to tell you how to reproduce their results. ML papers don’t yet seem to care about reproducibility. A recent study showed that only 15% of AI papers share their code.

Let’s talk about Machine Learning Bias. Is it possible to build models that don’t discriminate?

 
[John] (To be a nerd for a second, that word is unfortunately overloaded. To “discriminate” in the ML world is your very goal: to make a distinction between two classes.) But to your real question, it depends on the data (and on whether the analyst is clever enough to adjust for weaknesses in the data): the models will pull out of the data the information reflected therein. The computer knows nothing about the world except for what’s in the data in front of it. So the analyst has to curate the data and take responsibility for those cases reflecting reality. If certain types of people, for example, are under-represented, then the model will pay less attention to them and won’t be as accurate on them going forward. I ask, “What did the data have to go through to get here?” (to get into this dataset) to think of how other cases might have dropped out along the way through the process (that is, survivor bias). A skilled data scientist can look for such problems and think of ways to adjust or correct for them.

[Dean] The bias is not in the algorithms. The bias is in the data. If the data is biased, we’re working with a biased view of the world. Math is just math, it is not biased.

Will AI take over humanity?!

 
[John] I believe AI is just good engineering. Will AI exceed human intelligence? In my experience anyone under 40 believes yes, this is inevitable, and most over 40 (like me, obviously): no! AI models are fast, loyal, and obedient. Like a good German Shepherd dog, an AI model will go and get that ball, but it knows nothing about the world other than the data it has been shown. It has no common sense. It is a great assistant for specific tasks, but actually quite dimwitted.

[Dean] On that note, I would like to report two quotes made by Marvin Minsky in 1961 and 1970, from the dawn of AI, that I think describe well the future of AI.

“Within our lifetime some machines may surpass us in general intelligence” (1961)

“In three to eight years we’ll have a machine with the intelligence of a human being” (1970)

These ideas have been around for a long time. Here is one reason why AI will not solve all the problems: we judge its behavior based on one number, and one number only (the model error). Take predictions of stock prices over the next five years, built using root mean square error as the error metric: a single number like that cannot possibly paint the full picture of what the data are actually doing, and it severely hampers the model’s ability to flexibly uncover the patterns. We all know that RMSE is too coarse a measure. Deep learning algorithms will continue to get better, but we also need to get better at judging how good a model really is. So, no! I do not think that AI will take over humanity.

We have reached the end of this interview. We would like to thank Dean and John for their time and their nuggets of knowledge. Let’s hope we meet again soon!

About Dean Abbott and John Elder

Dean Abbott is Co-Founder and Chief Data Scientist at SmarterHQ. He is an internationally recognized expert and innovator in data science and predictive analytics, with three decades of experience solving problems in omnichannel customer analytics, fraud detection, risk modeling, text mining, and survey analysis. Frequently included in lists of pioneering data scientists, he is a popular keynote speaker and workshop instructor at conferences worldwide, and serves on Advisory Boards for the UC/Irvine Predictive Analytics and UCSD Data Science Certificate programs. He is the author of Applied Predictive Analytics (Wiley, 2014) and co-author of The IBM SPSS Modeler Cookbook (Packt Publishing, 2013).


John Elder founded Elder Research, America’s largest and most experienced data science consultancy, in 1995. With offices in Charlottesville VA, Baltimore MD, Raleigh NC, Washington DC, and London, they’ve solved hundreds of challenges for commercial and government clients by extracting actionable knowledge from all types of data. Dr. Elder co-authored three books — on practical data mining, ensembles, and text mining — two of which won “book of the year” awards. John has created data mining tools, was a discoverer of ensemble methods, chairs international conferences, and is a popular workshop and keynote speaker.


 
Bio: Heather Fyson is the blog editor at KNIME. Initially on the Event Team, her background is actually in translation & proofreading, so by moving to the blog in 2019 she has returned to her real passion of working with texts. P.S. She is always interested to hear your ideas for new articles.

Original. Reposted with permission.


Source: https://www.kdnuggets.com/2021/04/covid-do-all-our-models.html


Big Data

Shapash: Making Machine Learning Models Understandable



Establishing an expectation for trust around AI technologies may soon become one of the most important skills provided by Data Scientists. Significant research investments are underway in this area, and new tools are being developed, such as Shapash, an open-source Python library that helps Data Scientists make machine learning models more transparent and understandable.


By Yann Golhen, MAIF, Lead Data Scientist.

Shapash Web App Demo

Shapash by MAIF is a Python toolkit that helps data scientists understand their machine learning models. It makes it easier to share and discuss model interpretability with non-data specialists: business analysts, managers, and end-users.

Concretely, Shapash provides easy-to-read visualizations and a web app. Shapash displays results with appropriate wording (preprocessing inverse/post-processing). Shapash is useful in an operational context as it enables data scientists to use explicability from exploration to production: you can easily deploy local explainability in production to accompany each of your forecasts/recommendations with a summary of its local explainability.

In this post, we will present the main features of Shapash and how it operates. We will illustrate the implementation of the library on a concrete use case.

Elements of context

Interpretability and explicability of models are hot topics. There are many articles, publications, and open-source contributions about them, but these contributions do not all address the same issues and challenges.

Most data scientists use these techniques for many reasons: to better understand their models, to check that they are consistent and unbiased, as well as for debugging.

However, there is more to it:

Intelligibility matters for pedagogic purposes. Intelligible Machine Learning models can be debated with people that are not data specialists: business analysts, final users…

Concretely, there are two steps in our Data Science projects that involve non-specialists:

Exploratory step & Model fitting

At this step, data scientists and business analysts discuss what is at stake and define the essential data they will integrate into the projects. It requires understanding the subject well and the main drivers of the problem we are modeling.

To do this, data scientists study global explicability, features importance, and which role the model’s top features play. They can also locally look at some individuals, especially outliers. A Web App is interesting at this phase because they need to look at visualizations and graphics. Discussing these results with business analysts is interesting to challenge the approach and validate the model.

Deploying the model in a production environment

That’s it! The model is validated, deployed, and gives predictions to the end-users. Local explicability can bring them a lot of value, but only if there is a way to provide them with a good, useful, and understandable summary. It will be valuable to them for two reasons:

  • Transparency brings trust: they will trust models if they understand them.
  • Humans stay in control: no model is 100% reliable. When they can understand the algorithm’s outputs, users can overturn the algorithm’s suggestions if they think they rest on incorrect data.

Shapash has been developed to help data scientists to meet these needs.

Shapash key features

  • Easy-to-read visualizations, for everyone.
  • A web app: to understand how a model works, you have to look at multiple graphs, feature importance, and the global contribution of a feature to the model. A web app is a useful tool for this.
  • Several methods to show results with appropriate wording (preprocessing inverse, post-processing). You can easily add your data dictionaries, category_encoders objects, or sklearn ColumnTransformer for more explicit outputs.
  • Functions to easily save Pickle files and to export results in tables.
  • Explainability summary: the summary is configurable to fit your needs and to focus on what matters for local explicability.
  • Ability to easily deploy in a production environment and to complete every prediction/recommendation with a local explicability summary for each operational app (batch or API).
  • Shapash is open to several ways of proceeding: it can be used to easily access results or to work on better wording. Very few arguments are required to display results, but the more you work on cleaning and documenting the dataset, the clearer the results will be for the end-user.

Shapash works for regression, binary classification, or multiclass problems. It is compatible with many models: Catboost, Xgboost, LightGBM, Sklearn Ensemble, Linear models, SVM.

Shapash is based on local contributions calculated with Shap (shapley values), Lime, or any technique which allows computing summable local contributions.
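As a point of reference (not part of the original tutorial), such summable local contributions can be computed directly with the shap package for a tree-based model, for example:

import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)   # shape (500, 5): one contribution per feature per row
print(shap_values.shape)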

Installation

You can install the package through pip:

$ pip install shapash

Shapash Demonstration

Let’s use Shapash on a concrete dataset. In the rest of this article, we will show you how Shapash can explore models.

We will use the famous “House Prices” dataset from Kaggle to fit a regressor and predict house prices! Let’s start by loading the Dataset:

import pandas as pd
from shapash.data.data_loader import data_loading

# Load the House Prices dataset and separate the target from the features
house_df, house_dict = data_loading('house_prices')
y_df = house_df['SalePrice'].to_frame()
X_df = house_df[house_df.columns.difference(['SalePrice'])]
house_df.head(3)

Encode the categorical features:

from category_encoders import OrdinalEncoder

# Ordinal-encode the categorical (object) columns
categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']
encoder = OrdinalEncoder(cols=categorical_features).fit(X_df)
X_df = encoder.transform(X_df)

Train, test split and model fitting:

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75)
reg = RandomForestRegressor(n_estimators=200, min_samples_leaf=2).fit(Xtrain, ytrain)

And predict test data:

y_pred = pd.DataFrame(reg.predict(Xtest), columns=['pred'], index=Xtest.index) 

Let’s discover and use Shapash SmartExplainer.

Step 1 — Import

from shapash.explainer.smart_explainer import SmartExplainer 

Step 2 — Initialise a SmartExplainer Object

xpl = SmartExplainer(features_dict=house_dict) # Optional parameter 
  • features_dict: dict that specifies the meaning of each column name of the x pd.DataFrame.

Step 3 — Compile

xpl.compile(
    x=Xtest, model=reg,         # the regressor fitted above
    preprocessing=encoder,      # Optional: use inverse_transform method
    y_pred=y_pred)              # Optional

The compile method accepts another optional parameter, postprocess, which lets you apply custom functions (regex, mapping dict, …) to obtain better wording.

Now, we can display results and understand how the regression model works!

Step 4 — Launching the Web App

app = xpl.run_app() 

The web app link appears in Jupyter output (access the demo here).

There are four parts to this Web App:

Each part interacts with the others to help you explore the model easily.

Features Importance: you can click on each feature to update the contribution plot below.

Contribution plot: how does a feature influence the prediction? Displays a violin or scatter plot of each local contribution of the feature.

Local Plot:

  • Local explanation: which features contribute the most to the predicted value.
  • You can use several buttons/sliders/lists to configure the summary of this local explainability. The different parameters you can shape your summary with are described below, together with the filter method.
  • This web app is a useful tool to discuss with business analysts the best way to summarize the explainability to meet operational needs.

Selection Table: It allows the Web App user to select:

  • A subset to focus the exploration on this subset
  • A single row to display the associated local explanation

How do you use the data table to select a subset? At the top of the table, just below the name of the column that you want to use to filter, specify:

  • =Value, >Value, <Value
  • If you want to select every row containing a specific word, just type that word without “=”

There are a few options available on this web app (top right button). The most important one is probably the size of the sample (default: 1000). To avoid latency, the web app relies on a sample to display the results. Use this option to modify this sample size.

To kill the app:

app.kill() 

Step 5 — The plots

All the plots are available in jupyter notebooks, and the paragraph below describes the key points of each plot.

Feature Importance

The selection parameter allows you to compare the feature importance of a subset. It is useful for detecting specific behavior in a subset.

subset = [ 168, 54, 995, 799, 310, 322, 1374, 1106, 232, 645, 1170, 1229, 703, 66, 886, 160, 191, 1183, 1037, 991, 482, 725, 410, 59, 28, 719, 337, 36 ]
xpl.plot.features_importance(selection=subset) 

Contribution plot

Contribution plots are used to answer questions like:

How does a feature impact my prediction? Does it contribute positively? Is its contribution increasing or decreasing? Are there any threshold effects? For a categorical variable, how does each modality contribute? This plot complements feature importance by adding to the global intelligibility of the model, helping you better understand the influence of a feature on the model.

There are several parameters on this plot. Note that the plot displayed adapts depending on whether you are interested in a categorical or continuous variable (Violin or Scatter) and depending on the type of use case you address (regression, classification).

xpl.plot.contribution_plot("OverallQual") 

Contribution plot applied to a continuous feature.

Classification Case: Titanic Classifier — Contribution plot applied to categorical feature.

Local plot

You can use local plots for local explainability of models.

The filter() and local_plot() methods allow you to test and choose the best way to summarize the signal that the model has picked up. You can use them during the exploratory phase, then deploy the resulting summary in a production environment so the end-user can understand, in a few seconds, which criteria are most influential for each recommendation.

We will publish a second article to explain how to deploy local explainability in production.

Combine the filter and local_plot methods

Use the filter method to specify how to summarize local explainability. You have four parameters to configure your summary:

  • max_contrib: maximum number of criteria to display
  • threshold: minimum value of the contribution (in absolute value) necessary to display a criterion
  • positive: display only positive contribution? Negative? (default None)
  • features_to_hide: list of features you don’t want to display

After defining these parameters, we can display the results with the local_plot() method, or export them with to_pandas().

xpl.filter(max_contrib=8,threshold=100)
xpl.plot.local_plot(index=560) 

Export to pandas DataFrame:

xpl.filter(max_contrib=3,threshold=1000)
summary_df = xpl.to_pandas()
summary_df.head() 

Compare plot

With the compare_plot() method, the SmartExplainer object makes it possible to understand why two or more individuals do not have the same predicted values. The most decisive criterion appears at the top of the plot.

xpl.plot.compare_plot(row_num=[0, 1, 2, 3, 4], max_features=8) 

We hope that Shapash will be useful in building trust in AI. Thank you in advance to all those who will give us their feedback and ideas… Shapash is open source! Feel free to contribute by commenting on this post or directly in the GitHub discussions.

Original. Reposted with permission.


Source: https://www.kdnuggets.com/2021/04/shapash-machine-learning-models-understandable.html
