Data Envelopment Analysis Tutorial
 February 24, 2014
 Vasilis Vryniotis
Data Envelopment Analysis, also known as DEA, is a non-parametric method for performing frontier analysis. It uses linear programming to estimate the efficiency of multiple decision-making units and is commonly used in production, management, and economics. The technique was first proposed by Charnes, Cooper and Rhodes in 1978 and has since become a valuable tool for estimating production frontiers.
Update: The Datumbox Machine Learning Framework is now open-source and free to download. Check out the package com.datumbox.framework.algorithms.dea to see the implementation of Data Envelopment Analysis in Java.
When I first encountered the method 5-6 years ago, I was amazed by the originality of the algorithm, its simplicity, and the cleverness of the ideas behind it. I was even more amazed to see that the technique worked well outside of its usual applications (finance, operations research, etc.), since it can be successfully applied to online marketing, search engine ranking, and the creation of composite metrics. Despite this, today DEA is discussed almost exclusively in a business context. That is why, in this article, I will cover the basic ideas and the mathematical framework behind DEA, and in the next post I will show you some novel applications of the algorithm to web applications.
Why is Data Envelopment Analysis interesting?
Data Envelopment Analysis is a method that enables us to compare and rank records (stores, employees, factories, webpages, marketing campaigns, etc.) based on their features (weight, size, cost, revenue, and other metrics or KPIs) without making any prior assumptions about the importance or weights of those features. The most interesting part of the technique is that it allows us to compare records composed of multiple features with totally different units of measurement. This means we can have records with features measured in kilometers, kilograms, or monetary units and still be able to compare them, rank them, and find the best-, worst-, and average-performing records. Sound interesting? Keep reading.
The description and assumptions of Data Envelopment Analysis
As we discussed earlier, DEA is a method which was invented to measure productivity in business. Thus several of its ideas come from the way that productivity is measured in this context. One of the core characteristics of the method is the separation of the record features into two categories: input and output. For example if we measure the efficiency of a car, we could say that the input is the liters of petrol and the output is the number of kilometers that it travels.
In DEA, all features must be positive, and it is assumed that the higher a feature's value, the greater the input/output it represents. Additionally, Data Envelopment Analysis assumes that the features can be combined linearly, as weighted sums with non-negative weights, to form a ratio between output and input that measures the efficiency of each record. For a record to be efficient it must give us a "good" output relative to the provided input. Efficiency is measured as the ratio of output to input, which is then compared to the ratios of the other records.
The ingenious idea behind DEA
What we have covered so far is common sense and common practice: we use inputs and outputs, weighted sums, and ratios to rank our records. The clever idea of DEA lies in the way the feature weights are calculated. Instead of requiring us to set the weights and decide on the importance of the features before we run the analysis, Data Envelopment Analysis calculates them from the data. Moreover, the weights are NOT the same for every record!
Here is how DEA selects the weights: We try to maximize the ratio of every record by selecting the appropriate feature weights; at the same time though we must ensure that if we use the same weights to calculate the ratios of all the other records, none of them will become larger than 1.
The idea sounds a bit strange at first. Won't this lead to the calculation of differently weighted ratios? The answer is yes. Doesn't this mean that we actually calculate the ratios differently for every record? The answer is again yes. So how does this work? The answer is simple: for every record, given its characteristics, we try to find the "ideal situation" (weights) in which its ratio would be as high as possible, making it as effective as possible. BUT at the same time, given this "ideal situation", none of the output/input ratios of the other records may be larger than 1, meaning that no record can be more than 100% effective! Once we calculate the ratio of every record under its own "ideal situation", we use these ratios to rank the records.
So the main idea of DEA can be summed up as follows: "Find the ideal situation in which each record achieves the best possible ratio score given its characteristics. Then calculate this ideal ratio for each record and use it to compare their effectiveness."
Let’s see an example
Let’s see an example where we could use DEA.
Suppose that we are interested in evaluating the efficiency of the supermarket stores of a particular chain based on a number of characteristics: the total number of employees, the size of store in square meters, the amount of sales that they generate and the number of customers that they serve every month on average. It becomes obvious that finding the most efficient stores requires us to compare records with multiple features.
To apply DEA we must define our inputs and outputs. In this case the outputs are obviously the amount of sales and the number of customers served. The inputs are the number of employees and the size of the store. If we run DEA, we will estimate the output-to-input ratio of every store under its ideal weights (as discussed above). Once we have the ratios, we rank the stores according to their efficiency.
It’s math time!
Now that we have an intuition for how DEA works, it's time to dig into the math.
The efficiency ratio of a particular record i, with input vector x and output vector y (both feature vectors with positive values), is estimated using the following formula:

  efficiency_i = (u_1·y_1i + u_2·y_2i + … + u_s·y_si) / (v_1·x_1i + v_2·x_2i + … + v_m·x_mi)
Where u and v are the weights of each output and input of the record, s is the number of output features and m is the number of input features.
The problem of finding the best/ideal weights for a particular record i can be formulated as follows:

  maximize    efficiency_i = (u_1·y_1i + … + u_s·y_si) / (v_1·x_1i + … + v_m·x_mi)
  subject to  (u_1·y_1k + … + u_s·y_sk) / (v_1·x_1k + … + v_m·x_mk) ≤ 1   for every record k
              u_r ≥ 0, v_j ≥ 0

Again, the above is just the mathematical way of finding the weights u and v that maximize the efficiency of record i, provided that those weights will not make any of the other records more than 100% efficient.
To solve this problem we must use linear programming. Unfortunately, linear programming does not allow fractional objectives, so we must transform the formulation of the problem. The standard trick is to fix the denominator of the ratio to 1 and maximize the numerator:

  maximize    u_1·y_1i + … + u_s·y_si
  subject to  v_1·x_1i + … + v_m·x_mi = 1
              (u_1·y_1k + … + u_s·y_sk) − (v_1·x_1k + … + v_m·x_mk) ≤ 0   for every record k
              u_r ≥ 0, v_j ≥ 0
We should stress that the above linear programming problem gives us the best weights for record i and calculates its efficiency under those optimal weights. The same must be repeated for every record in our dataset, so if we have n records, we have to solve n separate linear programming problems. Here is the pseudocode of how DEA works:
ratio_scores = [];
for every record i {
    i_ratio = get_maximum_effectiveness();
    ratio_scores[i] = i_ratio;
}
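To make this concrete, here is a minimal sketch of the per-record linear problem using SciPy's linprog. The helper name and the toy data are mine, not from the article (whose implementation is the Java one mentioned above): the variables are the output weights u and input weights v, the denominator of record i's ratio is pinned to 1, and no record's ratio may exceed 1.

```python
import numpy as np
from scipy.optimize import linprog

def dea_efficiencies(X, Y):
    """CCR (input-oriented, multiplier form) DEA efficiencies.

    X: (n, m) array of inputs, Y: (n, s) array of outputs.
    Solves one linear program per record and returns its efficiency score.
    """
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    n, m = X.shape
    s = Y.shape[1]
    scores = []
    for i in range(n):
        # Decision variables z = [u_1..u_s, v_1..v_m], all non-negative.
        c = np.concatenate([-Y[i], np.zeros(m)])             # maximize u . y_i
        A_eq = np.concatenate([np.zeros(s), X[i]])[None, :]  # v . x_i = 1
        A_ub = np.hstack([Y, -X])                            # u . y_k - v . x_k <= 0
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                      A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (s + m))
        scores.append(float(-res.fun))
    return scores

# Toy data: one input (employees) and one output (monthly sales) per store.
efficiencies = dea_efficiencies(X=[[2], [4], [8]], Y=[[4], [6], [8]])
print(efficiencies)  # the best store scores 1.0, the rest score below 1
```

With a single input and a single output, the scores reduce to each store's sales-per-employee divided by the best sales-per-employee, which makes for a handy sanity check.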
Limitations of Data Envelopment Analysis
DEA is a great technique, but it has its limitations. You must understand that DEA is like a black box: since the weights used in the effectiveness ratio of each record are different, trying to explain how and why each score was calculated is pointless. Usually we focus on the ranking of the records rather than on the actual values of the effectiveness scores. Also note that the existence of extreme values (outliers) can cause the scores to be very low.
Keep in mind that DEA uses linear combinations of the features to estimate the ratios. Thus, if combining them linearly is not appropriate in our application, we must first apply transformations that make the features suitable for linear combination. Another drawback of the technique is that we have to solve as many linear programming problems as there are records, which requires a lot of computational resources.
Another problem DEA faces is that it does not work well with high-dimensional data. To use DEA, the number of dimensions d = m + s must be significantly lower than the number of observations n. Running DEA when d is close to or larger than n does not provide useful results, since most likely all the records will be found to be optimal. Note that as you add a new output variable (dimension), all the records with the maximum value in that dimension will be found optimal.
Finally, we should note that in the general form of the algorithm, the feature weights in DEA are estimated from the data, so they don't use any prior information about the importance of features that we might have in our problem (though it is possible to incorporate such information as constraints in our linear problem). Additionally, the efficiency scores that are calculated are actually the upper-limit efficiency ratios of each record, since they are calculated under "ideal situations". This means that DEA can be a good solution when it is not possible to make any assumptions about the importance of the features; but if we do have prior information or can quantify their importance, then using alternative techniques is advised.
In the next article, I will show you how to develop an implementation of Data Envelopment Analysis in Java, and we will use the method to estimate the popularity of web pages and articles in social media networks.
If you like the article, take a moment to share it on Twitter or Facebook. 🙂
Source: http://blog.datumbox.com/dataenvelopmentanalysistutorial/
Improved OCR and structured data extraction with Amazon Textract
Optical character recognition (OCR) technology, which enables extracting text from an image, has been around since the mid-20th century and continues to be a research topic today. OCR and document understanding are still vibrant areas of research because they're both valuable and hard problems to solve.
AWS has been investing in improving OCR and document understanding technology, and our research scientists continue to publish research papers in these areas. For example, the research paper Can you read me now? Content aware rectification using angle supervision describes how to tackle the problem of document rectification, which is fundamental to the OCR process on documents. Additionally, the paper SCATTER: Selective Context Attentional Scene Text Recognizer introduces a novel way to perform scene text recognition, which is the task of recognizing text against complex image backgrounds. For more recent publications in this area, see Computer Vision.
Amazon scientists also incorporate these research findings into best-of-breed technologies such as Amazon Textract, a fully managed service that uses machine learning (ML) to identify text and data from tables and forms in documents—such as tax information from a W-2, or values from a table in a scanned inventory report—and recognizes a range of document formats, including those specific to financial services, insurance, and healthcare, without requiring customization or human intervention.
One of the advantages of a fully managed service is the automatic and periodic improvement of the underlying ML models to increase accuracy. You may need to extract information from documents that have been scanned or photographed in different lighting conditions, from a variety of angles, and across numerous document types. As the models are trained on data inputs that encompass these different conditions, they become better at detecting and extracting data.
In this post, we discuss a few recent updates to Amazon Textract that improve the overall accuracy of document detection and extraction.
Currency symbols
Amazon Textract now detects a set of currency symbols (Chinese yuan, Japanese yen, Indian rupee, British pound, and US dollar) and the degree symbol with more precision without much regression on existing symbol detection.
For example, the following is a sample table in a document from a company’s annual report.
The following screenshot shows the output on the Amazon Textract console before the latest update.
Amazon Textract detects all the text accurately. However, the Indian rupee symbol is recognized as an “R” instead of “₹”. The following screenshot shows the output using the updated model.
The rupee symbol is detected and extracted accurately. Similarly, the degree symbol and the other currency symbols (yuan, yen, pound, and dollar) are now supported in Amazon Textract.
Detecting rows and columns in large tables
Amazon Textract released a new table model update that more accurately detects rows and columns of large tables that span an entire page. Overall table detection and extraction of data and text within tables has also been improved.
The following is an example of a table in a personal investment account statement.
The following screenshot shows the Amazon Textract output prior to the new model update.
Even though all the rows, columns, and text are detected properly, the output also contains empty columns. The original table didn't have a clear separation between columns, so the model included extra columns.
The following screenshot shows the output after the model update.
The output now is much cleaner. Amazon Textract still extracts all the data accurately from this table, and now includes the correct number of columns. Similar improvements can be seen in tables that span an entire page, where columns are no longer omitted.
Improved accuracy in forms
Amazon Textract now has higher accuracy on a variety of forms, especially income verification documents such as pay stubs, bank statements, and tax documents. The following screenshot shows an example of such a form.
The preceding form is not high resolution. Regardless, you may have to process such documents in your organization. The following screenshot is the Amazon Textract output using one of the previous models.
Although the older model detected many of the check boxes, it didn’t capture all of them. The following screenshot shows the output using the new model.
With this new model, Amazon Textract accurately detected all the check boxes in the document.
Summary
The improvements to currency symbol and degree symbol detection will launch in the Asia Pacific (Singapore) Region on September 24, 2020, followed by the other Regions where Amazon Textract is available over the next few days. With the latest improvements to Amazon Textract, you can retrieve information from documents with more accuracy. Tables spanning an entire page are detected more accurately; currency symbols (yuan, yen, rupee, pound, and dollar) and the degree symbol are now supported; and key-value pairs and check boxes in financial forms are detected with more precision. To start extracting data from your documents and images, try Amazon Textract for yourself.
About the Author
Raj Copparapu is a Product Manager focused on putting machine learning in the hands of every developer.
Preventing customer churn by optimizing incentive programs using stochastic programming
In recent years, businesses have increasingly looked for ways to integrate the power of machine learning (ML) into business decision-making. This post demonstrates the use case of creating an optimal incentive program to offer to customers identified as being at risk of leaving for a competitor, or churning. It extends a popular ML use case, predicting customer churn, and shows how to optimize an incentive program to address the real business goal of preventing customer churn. We use a large phone company for our use case.
Although it’s usual to treat this as a binary classification problem, the real world is less binary: people become likely to churn for some time before they actually churn. Loss of brand loyalty occurs some time before someone actually buys from a competitor. There’s frequently a slow rise in dissatisfaction over time before someone is finally driven to act. Providing the right incentive at the right time can reset a customer’s satisfaction.
This post builds on the post Gain customer insights using Amazon Aurora machine learning. There we met a telco CEO and heard his concern about customer churn. In that post, we moved from predicting customer churn to intervening in time to prevent it. We built a solution that integrates Amazon Aurora machine learning with the Amazon SageMaker builtin XGBoost algorithm to predict which customers will churn. We then integrated Amazon Comprehend to identify the customer’s sentiment when they called customer service. Lastly, we created a naïve incentive to offer customers identified as being at risk at the time they called.
In this post, we focus on replacing this naïve incentive with an optimized incentive program. Rather than using an abstract cost function, we optimize using the actual economic value of each customer and a limited incentive budget. We use a mathematical optimization approach to calculate the optimal incentive to offer each customer, based on our estimate of the probability that they’ll churn, and the probability that they’ll accept our incentive to stay.
Solution overview
Our incentive program is intended to be used in a system such as that described in the post Gain customer insights using Amazon Aurora machine learning. For simplicity, we’ve built this post so that it can run separately.
We use a Jupyter notebook running on an Amazon SageMaker notebook instance. Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy ML models at any scale. In the Jupyter notebook, we first build and host an XGBoost model, identical to the one in the prior post. Then we run the optimization based on a given marketing budget and review the expected results.
Setting up the solution infrastructure
To set up the environment necessary to run this example in your own AWS account, follow Steps 0 and 1 in the post Simulate quantum systems on Amazon SageMaker to set up an Amazon SageMaker instance.
Then, as in Step 2, open a terminal. Enter the following command to copy the notebook to your Amazon SageMaker notebook instance:
Alternatively, you can review a prerun version of the notebook.
Building the XGBoost model
The first sections of the notebook (Setup, Data Exploration, Train, and Host) are the same as the sample notebook Amazon SageMaker Examples – Customer Churn. The exception is that we capture a copy of the data for later use and add a column to calculate each customer's total spend.
At the end of these sections, we have a running XGBoost model on an Amazon SageMaker endpoint. We can use it to predict which customers will churn.
Assessing and optimizing
In this section of the post and accompanying notebook, we focus on assessing the XGBoost model and creating our optimal incentive program.
We can assess model performance by looking at the prediction scores, as shown in the original customer churn prediction post, Amazon SageMaker Examples – Customer Churn.
So how do we calculate the minimum incentive that will give the desired result? Rather than providing a single program to all customers, can we save money and gain a better outcome by using variable incentives, customized to a customer’s churn probability and value? And if so, how?
We can do so by building on components we’ve already developed.
Assigning costs to our predictions
The costs of churn for the mobile operator depend on the specific actions that the business takes. One common approach is to assign costs, treating each customer’s prediction as binary: they churn or don’t, we predicted correctly or we didn’t. To demonstrate this approach, we must make some assumptions. We assign the true negatives the cost of $0. Our model essentially correctly identified a happy customer in this case, and we won’t offer them an incentive. An alternative is to assign the actual value of the customer’s spend to the true negatives, because this is the customer’s contribution to our overall revenue.
False negatives are the most problematic because they incorrectly predict that a churning customer will stay. We lose the customer and have to pay all the costs of acquiring a replacement customer, including foregone revenue, advertising costs, administrative costs, point of sale costs, and likely a phone hardware subsidy. Such costs typically run in the hundreds of dollars, so for this use case, we assume $500 to be the cost for each false negative. For a better estimate, our marketing department should be able to give us a value to use for the overhead, and we have the actual customer spend for each customer in our dataset.
Finally, we give an incentive to customers that our model identifies as churning. For this post, we assume a onetime retention incentive of $50. This is the cost we apply to both true positive and false positive outcomes. In the case of false positives (the customer is happy, but the model mistakenly predicted churn), we waste the concession. We probably could have spent those dollars more effectively, but it’s possible we increased the loyalty of an already loyal customer, so that’s not so bad. We revise this approach later in this post.
Mapping the customer churn threshold
In previous versions of this notebook, we've shown the effect of false negatives that are substantially more costly than false positives. Instead of optimizing for error based on the number of customers, we've used a cost function that looks like the following equation:

  Cost(C) = cost_TN · TN(C) + cost_FN · FN(C) + cost_FP · FP(C) + cost_TP · TP(C)

FN(C) means that the false negative percentage is a function of the cutoff, C, and similarly for TN, FP, and TP. We want to find the cutoff, C, where the result of the expression is smallest.

We start by using the same values for all customers, to give us a starting point for discussion with the business. With our estimates, this equation becomes the following:

  Cost(C) = $0 · TN(C) + $500 · FN(C) + $50 · FP(C) + $50 · TP(C)
A straightforward way to understand the impact of these numbers is to simply run a simulation over a large number of possible cutoffs. We test 100 possible values, and produce the following graph.
The following output summarizes our results:
The preceding chart shows how picking a threshold too low results in costs skyrocketing as all customers are given a retention incentive. Meanwhile, setting the threshold too high (such as 0.7 or above) results in too many lost customers, which ultimately grows to be nearly as costly. In between, there is a large grey area, where perhaps some more nuanced incentives would create better outcomes.
The overall cost can be minimized at $25,750 by setting the cutoff to 0.13, which is substantially better than the $100,000 or more we would expect to lose by not taking any action.
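The cutoff simulation described above can be sketched as follows. The $500 false-negative and $50 incentive costs come from the text; the churn probabilities and labels below are synthetic stand-ins for the notebook's model outputs, so the minimizing cutoff and cost will not match the $25,750 / 0.13 figures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the notebook's data: a model churn probability
# per customer and the true (held-out) churn outcome.
n_customers = 1000
churn_prob = rng.uniform(0.0, 1.0, n_customers)
churned = rng.uniform(0.0, 1.0, n_customers) < churn_prob

COST_FN = 500.0        # cost of losing a customer we failed to flag
COST_INCENTIVE = 50.0  # incentive paid to every predicted churner (TP and FP)

def total_cost(cutoff):
    """Total campaign cost if we offer the incentive above this cutoff."""
    predicted = churn_prob >= cutoff
    false_negatives = np.sum(churned & ~predicted)
    offers = np.sum(predicted)  # true positives + false positives
    return COST_FN * false_negatives + COST_INCENTIVE * offers

# Sweep 100 candidate cutoffs, as in the post, and keep the cheapest.
cutoffs = np.linspace(0.0, 1.0, 100)
costs = [total_cost(c) for c in cutoffs]
best = cutoffs[int(np.argmin(costs))]
print(f"best cutoff {best:.2f} at cost ${min(costs):,.0f}")
```

At a cutoff of 0 everyone gets the incentive (pure incentive cost); at a cutoff of 1 no one does (pure churn cost); the sweep finds the trade-off point in between.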
We can also calculate the dollar outlay of the program and compare to the total spend of the customers. Here we can see that paying the incentive to all predicted churn customers costs $13,750, and that these customers spend $183,700. (Your numbers may vary, depending on the specific customers chosen for the sample.)
What happens if we instead have a smaller budget for our campaign? We choose a budget of 1% of total customer monthly spend. The following output shows our results:
Our costs change, but it's pretty clear that an incentive of approximately $0.60 is unlikely to change many people's minds.
Can we do better? We could offer a range of incentives to customers that meet different criteria. For example, it’s worth more to the business to prevent a high spend customer from churning than a low spend customer. We could also target the grey area of customers that have less loyalty and could be swayed by another company’s advertising. We explore this in the following section.
Preventing customer churn using mathematical optimization of incentive programs
In this section, we use a more sophisticated approach to developing our customer retention program. We want to tailor our incentives to target the customers most likely to reconsider a churn decision.
Intuitively, we know that we don’t need to offer an incentive to customers with a low churn probability. Also, above some threshold, we’ve already lost the customer’s heart and mind, even if they haven’t actually left yet. So the best target for our incentive is between those two thresholds—these are the customers we can convince to stay.
The problem under investigation is inherently stochastic in that each customer might churn or not, and might accept the incentive (offer) or not. Stochastic programming [1, 2] is an approach for modeling optimization problems that involve uncertainty. Whereas deterministic optimization problems are formulated with known parameters, real-world problems almost invariably include parameters that are unknown at the time a decision should be made. An example would be the construction of an investment portfolio to maximize return. An efficient portfolio would be defined as the portfolio that maximizes the expected return for a given amount of risk (such as standard deviation), or the portfolio that minimizes the risk subject to a given expected return [3].
Our use case has the following elements:
 We know the number of customers, 𝑁.
 We can use the customer’s current spend as the (upper bound) estimate of the profit they generate, P.
 We can use the churn score from our ML model as an estimate of the probability of churn, alpha.
 We use 1% of our total revenue as our campaign budget, C.
 The probability that the customer is swayed, beta, depends on how convincing the incentive is to the customer, which we represent as 𝛾.
 The incentive, c, is what we want to calculate.
We set up our inputs: P (profit), alpha (our churn probabilities, from our preceding model), and C, our campaign budget. We then define the function we wish to optimize, f(c_{i}) being the expected total profit across the 𝑁 customers.
Our goal is to optimally allocate the discount 𝑐_{𝑖} across the 𝑁 customers to maximize the expected total profit. Mathematically, this is equivalent to the following optimization problem:

  maximize    f(c) = Σ_{i=1..N} [ (1 − α_i)·P_i + α_i·β_i·(P_i − c_i) ]
  subject to  Σ_{i=1..N} c_i ≤ C,   c_i ≥ 0,   with β_i = 1 − e^{−γ_i·c_i}
Now we can specify how likely we think each customer is to accept the offer and not churn—that is, how convincing they’ll find the incentive. We represent this as 𝛾 in the formulae.
Although this is a matter of business judgment, we can use the preceding graph to inform that judgment. In this case, the business believes that if the churn probability is below 0.55, the customer is unlikely to churn, even without an incentive; on the other hand, if the customer's churn probability is above 0.95, the customer has little loyalty and is unlikely to be convinced. The real targets for the incentives are the customers with churn probability between 0.55 and 0.95.
We could include that business insight into the optimization by setting the value for the convincing factor 𝛾_{𝑖} as follows:
 𝛾_{𝑖} = 100. This is equivalent to giving the discount less importance as a deciding factor for customers whose churn probability is below 0.55 (they are loyal and less likely to churn) or greater than 0.95 (they will most likely leave despite the retention campaign).
 𝛾_{𝑖} = 1. This is equivalent to saying that the probability that customer i accepts the discount is 𝛽_{𝑖} = 1 − 𝑒^{−𝑐_{𝑖}} for customers with churn probability between 0.55 and 0.95.
When we start to offer these incentives, we can log whether or not each customer accepts the offer and remains a customer. With that information, we can learn this function from experience, and use that learned function to develop the next set of incentives.
Solving the optimization problem
A variety of open-source solvers are available that can solve this optimization problem for us. Examples include SciPy's scipy.optimize.minimize, or faster open-source solvers like GEKKO, which is what we use for this post. For large-scale problems, we recommend commercial optimization solvers such as CPLEX or Gurobi.
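As a rough sketch of this step (using SciPy's SLSQP rather than GEKKO, with simulated stand-ins for P, α, and γ, and with the expected-profit objective written out as my reading of the formulation above):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N = 50
P = rng.uniform(30.0, 100.0, N)   # monthly spend, used as the profit proxy
alpha = rng.uniform(0.0, 1.0, N)  # churn probability from the ML model
C = 0.01 * P.sum()                # campaign budget: 1% of total spend

# Convincing factor per the article: gamma = 1 inside the 0.55-0.95
# churn-probability band, gamma = 100 outside it.
gamma = np.where((alpha >= 0.55) & (alpha <= 0.95), 1.0, 100.0)

def neg_expected_profit(c):
    beta = 1.0 - np.exp(-gamma * c)  # probability the offer is accepted
    # Each customer either stays anyway (prob 1 - alpha) and yields P, or
    # would churn (prob alpha) but is swayed (prob beta), yielding P - c.
    profit = (1.0 - alpha) * P + alpha * beta * (P - c)
    return -profit.sum()

res = minimize(
    neg_expected_profit,
    x0=np.full(N, C / N),  # start from a uniform allocation
    method="SLSQP",
    bounds=[(0.0, None)] * N,
    constraints=[{"type": "ineq", "fun": lambda c: C - c.sum()}],
)
c_opt = res.x
print(f"budget used: {c_opt.sum():.2f} of {C:.2f}")
```

The inequality constraint keeps the total discount within the budget C, and the bounds keep every individual discount non-negative; the uniform and zero allocations make natural baselines for the comparison graph that follows.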
After the optimization task has run, we check how much of our budget has been allocated.
Now we evaluate the expected total profit for the following scenarios:
 Optimal discount allocation, as calculated by our optimization algorithm
 Uniform discount allocation: every customer is offered the same incentive
 No discount
The following graph shows our outcomes.
Lastly, we add the discount to our customer data. We can see how the discount we offer varies across our customer base. The red vertical line shows the value of the uniform discount. The pattern of discounts we offer closely mirrors the pattern in the prediction scores, where many customers aren’t identified as likely churners, and a few are identified as highly likely to churn.
We can also see a sample of the discounts we’d be offering to individual customers. See the following table.
For each customer, we can see their total monthly spend and the optimal incentive to offer them. We can see that the discount varies by churn probability, and we’re assured that the incentive campaign fits within our budget.
Depending on the size of the total budget we allocate, we may occasionally find that we're offering all customers a discount. This discount allocation problem reminds us of the water-filling algorithm in wireless communications [4, 5], where the problem is maximizing the mutual information between the input and the output of a channel composed of several subchannels (such as a frequency-selective channel, a time-varying channel, or a set of parallel subchannels arising from the use of multiple antennas at both sides of the link) with a global power constraint at the transmitter. More power is allocated to the channels with higher gains to maximize the sum of data rates or the capacity of all the channels. The solution to this class of problems can be interpreted as pouring a limited volume of water into a tank, the bottom of which has stair levels determined by the inverse of the subchannel gains.
Unfortunately, our problem doesn't have an intuitive interpretation like the water-filling problem does. This is because, owing to the nature of the objective function, the system of equations and inequalities corresponding to the KKT conditions [6] doesn't admit a closed-form solution.
The optimal incentives calculated here are the result of an optimization routine designed to maximize an economic figure: the expected total profit. Although this approach provides a principled way for marketing teams to make systematic, quantitative, and analytics-driven decisions, it's also important to recall that the objective function being optimized is a proxy measure of the actual total profit. It goes without saying that we can't compute the actual profit based on future decisions (much as we can't maximize actual returns based on future stock prices). But we can explore new ideas using techniques such as the potential outcomes framework [7], which we could use to design strategies for backtesting our solution.
Conclusion
We’ve now taken another step towards preventing customer churn. We built on a prior post in which we integrated our customer data with our ML model to predict churn. We can now experiment with variations on this optimization equation, and see the effect of different campaign budgets or even different theories of how they should be modeled.
To gather more data on effective incentives and customer behavior, we could also test several campaigns against different subsets of our customers. We can collect their responses—do they churn after being offered this incentive, or not—and use that data in a future ML model to further refine the incentives offered. We can use this data to learn what kinds of incentives convince customers with different characteristics to stay, and use that new function within this optimization.
We’ve empowered marketing teams with the tools to make data-driven decisions that they can quickly turn into action. This approach can drive fast iterations on incentive programs, moving at the speed with which our customers make decisions. Over to you, marketing!
Bibliography
[1] S. Uryasev and P. M. Pardalos, Stochastic Optimization: Algorithms and Applications. Norwell, MA: Kluwer Academic, 2001.
[2] J. R. Birge and F. Louveaux, Introduction to Stochastic Programming. New York: Springer-Verlag, 1997.
[3] J. C. Francis and D. Kim, Modern Portfolio Theory: Foundations, Analysis, and New Developments. Hoboken, NJ: John Wiley & Sons, 2013.
[4] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[5] D. P. Palomar and J. R. Fonollosa, “Practical algorithms for a family of waterfilling solutions,” IEEE Trans. Signal Process., vol. 53, no. 2, pp. 686–695, Feb. 2005.
[6] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, UK: Cambridge University Press, 2004.
[7] G. W. Imbens and D. B. Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences. New York: Cambridge University Press, 2015.
About the Authors
Marco Guerriero, PhD, is a Practice Manager for Emergent Technologies and Intelligence Platform for AWS Professional Services. He loves working on ways for emergent technologies such as AI/ML, Big Data, IoT, Quantum, and more to help businesses across different industry verticals succeed within their innovation journey.
Veronika Megler, PhD, is Principal Data Scientist for Amazon.com Consumer Packaging. Until recently, she was the Principal Data Scientist for AWS Professional Services. She enjoys adapting innovative big data, AI, and ML technologies to help companies solve new problems, and to solve old problems more efficiently and effectively. Her work has lately focused on the economic impacts of ML models and on exploring causality.
Oliver Boom is a London-based consultant for the Emerging Technologies and Intelligent Platforms team at AWS. He enjoys solving large-scale analytics problems using big data, data science, and DevOps, and loves working at the intersection of business and technology. In his spare time he enjoys language learning, music production, and surfing.
Dr Sokratis Kartakis is a UK-based Data Science Consultant for AWS. He works with enterprise customers to help them adopt and productionize innovative machine learning (ML) solutions at scale, solving challenging business problems. His focus areas are ML algorithms, ML industrialization, and AI/MLOps. He enjoys spending time with his family outdoors and traveling to new destinations to discover new cultures.
Underwater Autonomous Vehicles Helping Navy Get More for the Money
By AI Trends Staff
As the US Navy strives to reach a 355-ship fleet by 2034, part of a 30-year plan the Navy outlined in 2019, it is finding it economical to use underwater autonomous vehicles. They meet many of the requirements of larger submarines at a fraction of the cost.
For example, the Navy awarded Boeing a $43 million contract to build four Orca Extra Large Unmanned Undersea Vehicles (XLUUVs) last year, described in a release from USNI News.
Boeing based its winning Orca XLUUV design on its Echo Voyager unmanned diesel-electric submersible. The 51-foot-long submersible is launched from a pier and can operate autonomously, sailing up to 6,500 nautical miles without being connected to a manned mother ship, according to the Navy.
The versatility of the Orca at that price is “fairly unheard of in military spending,” according to an account in The National Interest. The nearest equivalent cited was the Navy’s Littoral Combat Ship, which costs $584 million each and has a crew of 40. While the LCS is faster, has an onboard crew, and carries a larger payload, the Orca is autonomous and cheaper by orders of magnitude.
Spurred in part by President Donald Trump’s initiatives to pursue leadership in AI, the Navy is pushing into autonomous vehicles. Earlier this year, the Navy’s autonomous Sea Hunter trimaran, engineered for minesweeping and sub-hunting, made a historic cruise from San Diego to Hawaii and back without a single sailor on board.
The Navy envisions the potential of “robot wolfpacks” of unmanned, remotely operated surface vessels to function as scouts, decoys, and forward electronic warfare platforms, according to an account last year in Breaking Defense.
“Part of the value of having unmanned surface vehicles is you can get capacity at a lower cost,” stated Rear Adm. John Neagley, the Navy’s Program Executive Officer for Unmanned & Small Combatants.
Distributed Maritime Operations Across Platforms is a Goal
The Navy is working on a communications network that can pull information from multiple ships into a single picture. Rear Adm. Douglas Small, who heads the Program Executive Office for Integrated Warfare Systems (PEO IWS), said the long-term goal is “communication as a service.”
“We can improve our naval power immediately… by stitching together things that we have today,” Adm. Small stated. “We’re working really hard on concepts like integrated fire control, expanding that out to every disparate platform we have out there, just expanding our reach and really taking advantage of this concept of distributed maritime operations.”
The network should be agnostic to the hardware on any specific ship. “What’s crucial is to get the technology to the fleet, quickly, so real crews can experiment with it in realworld conditions,” stated Adm. Small.
With ships held in port during the coronavirus pandemic, the Navy and other federal agencies that operate on the water have had an opportunity to use autonomous systems to keep work going. For example, NOAA sent Saildrones to Alaska to perform a critical fisheries survey and coastal mapping, according to an account in SeaPower Magazine.
“We were able to map in pretty shallow areas that would have been hazardous for ships,” stated retired Rear Adm. Tim Gallaudet, deputy administrator of the National Oceanic and Atmospheric Administration and the former Oceanographer of the Navy. He was speaking in a recent webinar hosted by the Marine Technology Society’s Washington section and the company Oceaneering.
The agency is leveraging artificial intelligence, machine learning, autonomous systems, data management and other advances and “applying those technologies in everything we do,” he said, including setting up an AI center for NOAA.
Another example is the unmanned, anti-submarine ship Sea Hunter, launched in 2016, which autonomously navigates open waters and actively coordinates missions with other unmanned sea vessels.
With the introduction of more AI and machine learning into the US Navy come new cybersecurity challenges. A recent report from Thomas Insights warns that machine learning algorithms can be susceptible to manipulation by adversaries, in what is known as adversarial machine learning (AML).
AML is a set of techniques used to dupe ML models into producing false or inaccurate outputs. It can be accomplished by inserting altered or manipulated inputs into the model’s training data, which cybersecurity researchers call “poisoning,” or by crafting inputs that fool an already-trained model, called “evasion.” It can also be executed by making physical, real-world alterations to objects that an AI system is expected to detect and respond to once deployed.
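To make the evasion idea concrete, here is a toy sketch in the spirit of the fast gradient sign method, using a hypothetical trained linear classifier (not any deployed system): the input is nudged in the direction that most increases the model’s loss, flipping its prediction.

```python
import numpy as np

# Weights of a (hypothetical) trained logistic-regression model.
w = np.array([1.5, -2.0, 0.5])
b = 0.1

def predict(x):
    """P(class = 1) under the logistic model."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

x = np.array([2.0, 0.5, 1.0])  # legitimate input, classified as class 1
# For true label 1, the loss gradient w.r.t. x points along -w, so an
# epsilon-sized step along sign(-w) maximally lowers the model's score.
epsilon = 0.8
x_adv = x + epsilon * np.sign(-w)

clean_score = predict(x)    # confidently class 1
adv_score = predict(x_adv)  # pushed across the decision boundary
```

A small, bounded change to each feature is enough to move this input across the decision boundary, which is the same mechanism behind the sticker attacks on stop signs described below.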
These tactics can have serious consequences for both national security and human life.
The most pressing of the scenarios outlined in a 2018 report by the Office of the Director of National Intelligence was AML’s potential to compromise computer vision algorithms. For example, researchers have demonstrated that by strategically placing stickers on a stop sign, they can cause a vehicle’s object detection system to consistently misidentify it as a speed limit sign, putting the driver, passengers, other drivers, and pedestrians at risk.
Countermeasures being tried by defenders include adversarial training, which involves feeding a machine learning algorithm images containing small manipulations, called “perturbations,” to train it to recognize the image despite the manipulation. Other methods include preprocessing and denoising, which automatically remove adversarial noise from inputs, and adversarial example detection, which distinguishes between legitimate and adversarial inputs. These approaches try to ensure that adversarial inputs and alterations are neutralized before they reach the algorithm for classification.
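A minimal adversarial-training loop might look like the following sketch (a toy logistic-regression model with synthetic data, purely for illustration): at each step, the batch is augmented with perturbed copies of the inputs so the model learns to classify them correctly despite the manipulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-feature data with a known linear boundary x0 + x1 = 0.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w, b = np.zeros(2), 0.0
lr, eps = 0.1, 0.1  # learning rate, perturbation budget

for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    # Worst-case small perturbation of each input: a step along the
    # sign of the loss gradient with respect to the input.
    X_adv = X + eps * np.sign((p - y)[:, None] * w)
    # Train on clean and perturbed examples together.
    X_all, y_all = np.vstack([X, X_adv]), np.concatenate([y, y])
    p_all = 1.0 / (1.0 + np.exp(-(X_all @ w + b)))
    grad = p_all - y_all
    w -= lr * (X_all.T @ grad) / len(y_all)
    b -= lr * grad.mean()

accuracy = (((X @ w + b) > 0) == (y == 1)).mean()  # clean accuracy
```

The model keeps high accuracy on clean inputs while also seeing, and learning to resist, the perturbed copies during training.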
Read the source articles and information from USNI News, The National Interest, Breaking Defense, SeaPower Magazine and Thomas Insights.