# Fast Combinatorial Feature Selection with New Definition of Predictive Power

In this article, I propose a simple metric to measure predictive power. It is used for combinatorial feature selection, where a large number of feature combinations need to be ranked automatically and very fast, for instance in the context of transaction scoring, in order to optimize predictive models. This is about rather big data, and we would like to see a Hadoop methodology for the technology proposed here. It can easily be implemented in a MapReduce framework. It was developed by the author in the context of credit card fraud detection and click/keyword scoring. This material will be part of our data science apprenticeship, and included in our Wiley book.

Feature selection is a methodology used to detect the best subset of features, out of dozens or hundreds of features (also called variables or rules). By “best”, we mean with highest predictive power, a concept defined in the following subsection. In short, we want to remove duplicate features, simplify the correlation structure among features somewhat, and remove features that bring no value, such as features taking on random values (and thus lacking predictive power) or features (rules) that are almost never triggered (unless they are perfect fraud indicators when triggered).

The problem is combinatorial in nature. You want a manageable, small set of features (say 20 features) selected from (say) a set of 500 features, to run our hidden decision trees (or some other classification / scoring technique) in a way that is statistically robust. But there are 2.7 × 10^35 combinations of 20 features out of 500, and you would need to evaluate all of them to find the one with maximum predictive power. This problem is computationally intractable, and you need to find an alternate solution. The good thing is that you don’t need to find the absolute maximum; you just need to find a subset of 20 features that is good enough.
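The number of 20-feature subsets of 500 features can be checked quickly with Python's standard `math.comb`. This is only a sanity check on the size of the search space; the variable name `n_subsets` is chosen here for illustration.

```python
import math

# Number of ways to choose 20 features out of 500 (order does not matter).
n_subsets = math.comb(500, 20)

# Print in scientific notation to compare with the order of magnitude in the text.
print(f"{n_subsets:.1e}")  # roughly 2.7e+35
```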

One way to proceed is to compute the predictive power of each feature. Then add one feature at a time to the subset (starting with an empty subset) until you reach either of the following:

• The subset contains the target number of features (20 in this example)
• Adding a new feature does not significantly improve the overall predictive power of the subset (in short, convergence has been attained)

At each iteration, choose the feature to be added from the two remaining features with the highest predictive power: of these two, pick the one that most increases the overall predictive power of the subset under construction. You have now reduced your computations from 2.7 × 10^35 to 40 = 2 × 20.
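The stepwise procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the predictive power metric is not defined in this excerpt, so `subset_power` is a hypothetical caller-supplied function, and `toy_power`, `TOY`, `min_gain`, and the assumption that an empty subset has power 0 are all invented for illustration.

```python
from typing import Callable, Dict, List, Sequence


def greedy_select(
    features: Sequence[str],
    subset_power: Callable[[List[str]], float],  # hypothetical metric, supplied by the caller
    max_size: int = 20,
    candidates_per_step: int = 2,
    min_gain: float = 1e-3,  # assumed threshold for "no significant improvement"
) -> List[str]:
    """Greedy forward selection guided by a caller-supplied power metric."""
    # Rank features once by their individual predictive power.
    remaining = sorted(features, key=lambda f: subset_power([f]), reverse=True)
    selected: List[str] = []
    current = 0.0  # power of the empty subset, assumed to be 0
    while remaining and len(selected) < max_size:
        # Only the top-ranked remaining features are tried at each step.
        scores: Dict[str, float] = {
            f: subset_power(selected + [f]) for f in remaining[:candidates_per_step]
        }
        best = max(scores, key=scores.get)
        if scores[best] - current < min_gain:
            break  # convergence: the best candidate adds too little
        selected.append(best)
        current = scores[best]
        remaining.remove(best)
    return selected


# Toy metric (purely illustrative): features in the same group are redundant,
# so a subset's power is the sum of the best weight found in each group.
TOY = {"a": ("g1", 0.5), "b": ("g1", 0.4), "c": ("g2", 0.3), "d": ("g3", 0.0)}


def toy_power(subset: List[str]) -> float:
    best_per_group: Dict[str, float] = {}
    for name in subset:
        group, weight = TOY[name]
        best_per_group[group] = max(best_per_group.get(group, 0.0), weight)
    return sum(best_per_group.values())


chosen = greedy_select(list(TOY), toy_power, max_size=3)
print(chosen)  # ['a', 'c']
```

In the toy run, feature `b` is skipped because it duplicates `a`, and the loop stops before reaching three features because neither `b` nor `d` improves the subset. With two candidates per step and a 20-feature limit, the loop evaluates at most 40 candidate subsets, in addition to the initial one-time ranking of individual features.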

# Seven Tools for Effective CDO Leadership

The position of Chief Data Officer (CDO) is relatively new in the federal government, and emerging regulations are providing leadership opportunities for the CDO. A new law, the Foundations for Evidence-Based Policymaking Act, went into effect on January 14, 2019, establishing a set of standards and practices for the United States federal government to modernize its data handling.

Title II of this act is called the Open, Public, Electronic and Necessary (OPEN) Government Data Act, which arose out of the 2013 Open Data Policy. The OPEN Government Data Act requires federal agencies to publish a comprehensive inventory of all data assets, made available as machine-readable data in an open format, under open licenses, as well as putting in place a non-politically appointed senior executive (now the CDO) responsible for actively managing data as an asset. “Not just to talk about it, not just try to leverage value for the enterprise, but to treat it like an asset,” said Corlan Budd, Manager of Data, and Analytics, and Technology Strategy with Ernst & Young. He discussed this during his presentation titled The Chief Data Officer as an Effective Leader at the DATAVERSITY® DGVision Conference. He shared seven tools that can help the CDO be a more effective leader, whether in a government agency, or in the private sector.

#### Key Responsibilities

Budd identified four key responsibilities of the CDO:

• Managing data as an asset
• Transforming how the agency interacts with data
• Value generation
• Regulatory Compliance

Previously, government agencies treated data like a by-product of the system without much concern about practices around the data. Now that the CDO is responsible for changing the culture and transforming the way the agency interacts with data, compliance with the Evidence-Based Policymaking Act, as well as a number of other data privacy acts, including HIPAA, is within the CDO’s purview. The CDO is also responsible for value generation, which is measured differently in the government space than it is in the private sector, he said. Rather than valuing the data and trying to monetize it, “we have to support the mission and improve public service,” he said.

#### Culture and the CDO Challenge

Budd quoted Peter Drucker: “Culture will eat strategy for breakfast.” Building an effective strategy is a waste of time if the culture puts up roadblocks to its success. The key to ensuring strategy is embraced rather than ‘eaten for breakfast,’ Budd said, is leadership, yet, “The culture and the organizational dynamics don’t necessarily line up for success immediately.” Cultural factors are dependent on context, and the organizational structure where the CDO resides, whether that is in finance, or risk, or another part of the organization. Support from the CIO and the dynamics of power above the CDO have an effect on autonomy. Culture issues below the CDO often stem from staff buy-in and stakeholder support.

#### Funding and Proving Value

The CDO must show the value of the data itself as well as the value of improving the organization’s relationship with data, while managing expectations about how and when this will happen. Contracts that are project-based or that involve more sophisticated capabilities tend to have an easier time getting funding than program-based proposals that could enhance customer value and provide better service company-wide. With some business units, he said, essentially the only value that they get is the ability to operate their program.

Innovation and transformation provide peak value when C-level
execs are able to make data-driven decisions, optimize performance, and reduce
costs. What often stands in the way of that is culture. The key is to change
from a program or business unit focus to an enterprise-wide approach. “Get
folks in a room and get them talking,” creating an environment that facilitates
conversation among data enthusiasts where they can discuss data issues and leverage
data sharing initiatives. This can provide a lot of value and open up
possibilities for positive cultural change, he said.

#### Assessing Culture: Hofstede’s 6 Dimensions of Culture

Budd suggests using three elements of social psychologist Geert Hofstede’s Six Dimensions of Culture as a guide to qualitatively assess the organizational culture: Individualism vs. collectivism, uncertainty avoidance, and long-term vs. short-term orientation.

• Individualism vs. Collectivism: An individualistic culture values individual performance and recognition over playing a role as part of a larger extended team or group. Loyalties in an individualistic culture are focused on the individual. Collectivist culture loyalties are focused on groups or departments. When building a team environment, everyone has to understand that in some circumstances they will be recognized for individual accomplishment, but in relationship to data, each person has a role as part of a team. “That helps the overall success of not just the chief data officer, but how effectively we can utilize our data and how much value we can get from our data for the entire organization, not just in that C-suite area.”
• Short Term vs. Long Term Orientation: Budd was surprised at how prevalent short-term orientation was throughout his organization, with an almost complete lack of interest in any long-term orientation for strategy. The value of a strategy happens over the course of time, so he suggests finding some of the low-hanging fruit without sacrificing longer-term goals. When focusing on moving the needle from short-term orientation toward the long-term orientation side, “The only way I was able to do that was to satisfy some of the short-term need, at least for the moment,” which gave him enough momentum to focus in on some of the longer-term strategy issues.
• Low vs. High Uncertainty Tolerance: Uncertainty avoidance can be a stumbling block or a wise choice depending on the situation. Concern about investments in new technology is a good idea if the tool is unproven. Stakeholders may have difficulty buying in if there’s a high level of uncertainty about the vision or the likelihood of success, especially if they previously saw a Chief Data Officer who tried something similar and didn’t succeed the first time. With uncertainty avoidance, he considered his efforts a success if there was any move across the halfway point toward risk.

“When you come across a situation where you’re on one extreme of the continuum, figure out how you can move that needle culture-wise back to an acceptable area for your strategy to succeed,” he said.

Budd found two leadership principles from John Maxwell’s 21 Irrefutable Laws of Leadership particularly useful for developing skills needed to adapt to the existing environment and connect with the people in it.

• The Law of the Lid: Leadership ability determines a person’s level of effectiveness. Implementing required changes without buy-in has a negative effect on culture, he said. “There are a lot of things that you just can’t do unless you have consensus.” Understand the importance of developing multiple leadership styles based on the existing culture, such as using a transformative leadership style in some circumstances, and democratic leadership in other circumstances. “When you need to develop consensus, you might have to switch your leadership style to one that’s a little bit more democratic,” he said.
• The Law of Connection: Leaders touch a heart before they ask for a hand. A leader needs to develop a personal connection before successfully affecting culture or leading individuals in the organization, said Budd. “Followers don’t necessarily follow a particular thing, but they will follow your vision, and if they connect with […]

#### Effective Leadership: Influence and Motivate

Three more of Maxwell’s laws, as well as Jim Collins’ Turning the Flywheel provide guidance for learning how to influence and motivate others:

• The Law of Explosive Growth: To add […] essentially lead the entire agency, because everyone is a consumer of data, he said. Identify a group of data consumers and empower them – enable them to […] the impact for culture change has essentially multiplied.”
• The Law of Influence: The true measure of […] build on one another and contribute to a leader’s level of influence. “If we want to be effective, and the measurement of our effectiveness is our influence, then that’s what we need to make sure we’re honing in on.”
• The Law of the Big Mo: Momentum is the leader’s best friend. It’s the little things that lead to the big things.
• The Flywheel Concept: Establish momentum early on in the process by getting some wins and providing short-term value. This is similar to riding a bike or turning a flywheel. “The first couple of strides are always really, really difficult, but once you get that momentum going when you’re riding the bike, then the machine does a lot of the work for you.”

According to Jim Collins’ Good to Great, effectively leading an organization into greatness entails sustaining a certain level of performance and growth over time. “A leader’s lasting value is measured by how things continue after they’re gone,” said Budd, yet often when a leader leaves, their initiatives fall by the wayside. An effective leader uses Maxwell’s Law of Explosive Growth to build sustainability. “‘It takes a leader to raise a leader,’ so the essential strategy for sustainability is to develop leaders who will support your data initiatives into the future.”

#### Effective Leadership: First Things First

To manage short-term value expectations, Budd recommends Stephen Covey’s concept of ‘first things first.’ With effective prioritizing, a leader is able to focus on values, plan ahead, and have opportunities for networking, relationship-building, and impacting the culture.

Budd uses the Eisenhower Decision Matrix as a tool for effectively determining which tasks are important but not urgent, and for moving from reactive to proactive, “Instead of trying to get through the day putting out fires.”

As new activities are added to his plate, Budd uses the chart to
ask himself where they fit in the matrix and whether they line up with his priorities
and strategy. This process, he said, “provides some pretty good immediate
value.” Socializing the Eisenhower matrix can create buy-in and ownership among
team members. When all members participate in thinking through where time
should be spent and work together to ensure that quadrant one
(Important/Urgent) and quadrant two (Important/Not Urgent) are balanced,
priorities are shared and value becomes apparent. “The key also is making sure
that when you do that, you track the value and you measure it, and you
celebrate your win whenever you get one.”

# Key Considerations for Executing a Successful M&A Data Migration or Carve-Out

Mergers, acquisitions, and divestitures are just as much of an undertaking for a CIO as they are for a CFO; they are impactful on both the business and technology side. Determining which SAP systems and data sets to migrate, integrate, or carve out as part of the deal, and then executing on those migrations or carve-outs, can be a costly, lengthy, and incredibly complex process, which in turn impacts your overall timeline. Missteps in the data migration process can result in unnecessary technical debt, potential Transition Services Agreement (TSA) penalties, and even delays in achieving your final goals for the project.

There are some key considerations that I would recommend to companies undergoing mergers, acquisitions, or divestitures when it comes to their data migration needs. Chief among them are the need to build automation into the heart of your migration or carve-out strategy and the importance of aligning with the right software-driven partner to execute a data migration or carve-out that stays on track and meets overall timelines and goals.

#### Create a Clear Plan of Action

Mergers, acquisitions, and
divestitures are incredibly complex processes. Obviously, no business
undertakes one without first outlining a clear plan of action and a timeline
for that plan to proceed along. But it’s crucial that that plan also prioritizes
the data migration side of the operation; it can’t just be a business-facing
process. Data migrations and carve-outs are among the most daunting tasks that
come with executing a merger, acquisition, or divestiture — so getting it right
is critical to accomplishing the broader mission at hand.

While every company’s situation is
different, there are a few key questions that businesses undergoing a merger,
acquisition, or divestiture need to ask themselves to ensure their data needs
aren’t being overlooked:

• Do we need to integrate the company we just bought into our ERP systems? In the case of a divestiture: Do we need to identify and carve out data from our systems?
• Does the company we’re acquiring use SAP or another kind of ERP? Do both companies already share the same kind of ERP?
• What regulatory issues may come up that could lengthen, halt, or delay the process? Are there any potential TSA compliance hurdles that we might come up against?
• One area to consider is sales overlaps. With a 20 percent overlap from balance sheet to balance sheet, this can present a significant potential regulatory obstacle.
• How quickly do we need the data migration or carve-out done?

While these may seem like fundamental first steps, they’re crucial ones. Without a clear outline of your data needs, you could end up in a situation where a merger, acquisition, or divestiture results in the new company taking on excessive levels of technical debt or violating regulatory compliance, which itself carries a whole host of new problems to deal with.

#### Putting Automation Front and Center

Whether you’re integrating or carving out data, the process is incredibly labor-intensive and rife with repetitive tasks. More than that, each decision to be made carries potentially far-reaching consequences for everything from data history preservation and master data relevance to security and compliance. In other words, getting it right the first time is business-critical.

This is all the more reason why automation needs to be treated as an integral part of these processes. Automating data migrations or carve-outs ensures that the volume of menial tasks is executed both quickly and painlessly while leaving the weightier choices to be made manually. Automation ensures decision-makers are essentially only spending their time and resources on the tasks that most require their input, all of which enables IT teams to best allocate and prioritize their resources for performing even the most challenging carve-out or migration plans.

This also comes in handy in the
aftermath of the merger, where automation can speed up post-merger/acquisition
integration projects, both accelerating how quickly and seamlessly the
migration can take hold while providing a new level of insight and control over
the process that can’t otherwise be achieved through traditional, manual
approaches.

#### Executing with Minimal Business Disruptions

After building a plan of action
and wrapping it around an automation-driven strategy, the next consideration
ultimately turns to the go-live date: Can your business handle a disruption
that lasts longer than a weekend? How quickly do you need to execute the data
operations? Just how long is too long?

This might be the last step in the
process, but it’s no less critical. Being able to carry out your new data
migration or carve-out with minimal downtime or disruptions to the business is
essentially the first proving ground of how successful your new merger,
acquisition, or divestiture will be. To that end, businesses undergoing these
transformations need to ensure they’ve aligned themselves with the right
software partner ahead of time. Successful data migrations and carve-outs are
integral to the success of the newly merged or divested company and key to
averting the technical debt or TSA violations that can otherwise knock you off
track. Getting that done on time and in line with your goals requires getting
off on the right foot with the right partner.

With so much at stake, businesses
undergoing a merger, acquisition, or divestiture need nothing less than a
predictable process for executing their data migration and carve-out needs — a
software-driven, end-to-end, automated process that is predictable in its
speed, efficiency, and success rate in delivering on your goals within your
timetable.

# Parallel ways of Data Scientist and Machine Learning

👉 📊 There are endless conversations, debates, and discussions over this popular topic, and for everyone from data science experts to complete newbies, it can be a little overwhelming to know where to start.

🔥 For researchers, students, industry experts, and machine learning (ML) enthusiasts alike, keeping up with the best and latest machine learning research is a matter of finding reliable information. In this blog, we share how data science is evolving with the rising demand for machine learning.

### Inside 🎰 Machine Learning- 👇

In very simple words, every time we pick up our phones to seek information from a search engine like Google or a social media platform like Facebook or Instagram, machine learning is playing its role. It is the role of machine learning to provide the most relevant information and recommendations to the searcher. From searching for good restaurant options to tips for a skincare regime, we contribute to machine learning through our searches on the internet without realizing it.

🎯 Machine learning technology plays a big role in collecting and keeping track of user search behavior data for companies, so that it can be taken into consideration when data scientists or business personnel make important product- or service-related decisions.

🗨 So, this was the explanation of how we interact with machine learning in our daily lives without realizing it. Now let us understand the role of data scientists and how it relates to machine learning.

### 📉 Who is a Data Scientist?

🚀 A data scientist can be described as an expert in extracting meaningful information from heaps of data. They are specialists in gathering and analyzing large sets of structured and unstructured data. With a combination of computer science, statistics, and mathematics, data scientists are analytical experts who utilize their skills both technologically and ethically to find trends and manage data. They analyze, process, and model data, then translate the results to create actionable plans for companies and other organizations.

👩‍💻 With sufficient knowledge of different machine learning techniques and of tools such as Python, SAS, R, and SQL/NoSQL databases, a data scientist can perform the task with very few challenges and easily outperform the competition.

### 🎰 Machine Learning for Data Scientist or Vice Versa? 👇

Considering the role of the data scientist discussed above, machine learning cannot fulfill its purpose without data. This is how machine learning and data science go hand in hand: each is incomplete without the other.

🗨 Machine learning collects the data for data scientists to evaluate and extract meaning from. With the increased use of technology and the internet, ML acts as a spur that pushes data science into high demand.

In the world of 📈 data science, there is never a shortage of tools and algorithms to apply to data. Data science skills therefore also involve the ability to evaluate machine learning and to make machines smart enough to ease the analysis process. Going forward, essential levels of machine learning will become a benchmark for data scientists. 🔻

Seen from a different perspective, machines need to be smart enough to match human abilities, and machine learning is the soul of artificial intelligence.

👨‍⚖️ Data Scientists must understand Machine Learning for the best outcomes and quality results. This can help machines to make the right decisions and smarter actions in real-time with zero human intervention. Hence, Data Scientists must acquire skills in Machine Learning. 👇

### 📖 Conclusion-

In the world of data science, machine learning has already proven its worth: it is turning out to be the best solution for deeper analysis of huge amounts of data. Data scientists must acquire knowledge of ML to stand out in the competitive market.

#### ✍ Author Bio :⤵

Senior Data Scientist and alumnus of IIM-C (Indian Institute of Management – Kolkata) with over 25 years of professional experience, specialized in Data Science, Artificial Intelligence, and Machine Learning.
PMP certified.
ITIL Expert certified; APMG, PEOPLECERT, and EXIN accredited trainer for all modules of ITIL up to Expert. Trained more than 3,000 professionals across the globe. Currently authoring a book on ITIL, “ITIL MADE EASY”.

Conducted myriad project management and ITIL process consulting engagements in various organizations. Performed maturity assessments, gap analyses, and project management process definition, as well as end-to-end implementation of project management best practices.