Novel AI Approaches For Marketing & Advertising

Marketing and advertising are some of the functional areas where AI is expected to drive the most ROI for enterprises. Unfortunately, the industry is moving so fast that it’s challenging for both marketers and technologists to keep up with all the research advances, much less apply them to pressing business problems.

To help you stay aware of state-of-the-art AI applications in marketing, we have prepared an update of our AI marketing research series with 15 recent research papers in marketing attribution, marketing optimization, personalization, marketing analytics, and content generation.

If you find these accessible AI research analyses and summaries useful, you can subscribe below to receive our regular industry updates.

If you’d like to skip around, here are the papers we featured:

  1. Causally Driven Incremental Multi-Touch Attribution Using a Recurrent Neural Network
  2. Shapley Meets Uniform: An Axiomatic Framework for Attribution in Online Advertising
  3. AiAds: Automated and Intelligent Advertising System for Sponsored Search
  4. Time-Aware Prospective Modeling of Users for Online Display Advertising
  5. A Unified Framework for Marketing Budget Allocation
  6. Deep Learning Recommendation Model for Personalization and Recommendation Systems
  7. Personalized Purchase Prediction of Market Baskets with Wasserstein-Based Sequence Matching
  8. Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches
  9. A Deep Probabilistic Model for Customer Lifetime Value Prediction
  10. Context-aware Embedding for Targeted Aspect-based Sentiment Analysis
  11. Progressive Self-Supervised Attention Learning for Aspect-Level Sentiment Analysis
  12. MirrorGAN: Learning Text-to-image Generation by Redescription
  13. High-Fidelity Image Generation With Fewer Labels
  14. Enabling Hyper-Personalisation: Automated Ad Creative Generation and Ranking for Fashion e-Commerce
  15. Towards Controllable and Personalized Review Generation

Marketing Attribution

1. Causally Driven Incremental Multi-Touch Attribution Using a Recurrent Neural Network, by Ruihuan Du, Yu Zhong, Harikesh Nair, Bo Cui, Ruyang Shou

Original Abstract

This paper describes a practical system for Multi-Touch Attribution (MTA) for use by a publisher of digital ads. We developed this system for JD.com, an eCommerce company, which is also a publisher of digital ads in China. The approach has two steps. The first step (‘response modeling’) fits a user-level model for purchase of a product as a function of the user’s exposure to ads. The second (‘credit allocation’) uses the fitted model to allocate the incremental part of the observed purchase due to advertising, to the ads the user is exposed to over the previous T days. To implement step one, we train a Recurrent Neural Network (RNN) on user-level conversion and exposure data. The RNN has the advantage of flexibly handling the sequential dependence in the data in a semi-parametric way. The specific RNN formulation we implement captures the impact of advertising intensity, timing, competition, and user-heterogeneity, which are known to be relevant to ad-response. To implement step two, we compute Shapley Values, which have the advantage of having axiomatic foundations and satisfying fairness considerations. The specific formulation of the Shapley Value we implement respects incrementality by allocating the overall incremental improvement in conversion to the exposed ads, while handling the sequence-dependence of exposures on the observed outcomes. The system is under production at JD.com, and scales to handle the high dimensionality of the problem on the platform (attribution of the orders of about 300M users, for roughly 160K brands, across 200+ ad-types, served about 80B ad-impressions over a typical 15-day period).

Our Summary

The researchers from JD.com, a major Chinese eCommerce firm that is also a publisher of digital ads, introduce a new approach to multi-touch attribution. In particular, they suggest a two-step solution: in the first step, a recurrent neural network (RNN) fits a user-level model of conversion as a function of the user’s exposure to ads; in the second step, Shapley values allocate the incremental part of the purchase to the ads the user was exposed to over the previous T days. The deployment of this approach in production at JD.com demonstrates that the system is effective in the real world and scales to a huge ad-publishing platform with about 300M users, roughly 160K brands, over 200 ad types, and about 80B ad impressions over a typical 15-day period.


What’s the core idea of this paper?

  • The research team introduces a novel two-step approach to multi-touch attribution:
    • Response modeling. To model a user-level conversion as a function of the user’s exposure to ads, the researchers suggest using a recurrent neural network (RNN):
      • The RNN flexibly captures heterogeneity across users and the responsiveness of current purchases to the sequence, intensity, timing, and competitiveness of past ad exposures.
      • The model outputs the probability that a user buys a certain product within a given time period, given the impressions served to the user and a set of user characteristics.
    • Credit allocation. To allocate the incremental part of the observed purchase due to advertising to the advertisements that the user was exposed to over the last T days, the authors suggest computing Shapley values, which have a good theoretical grounding and satisfy fairness considerations.
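
To make the credit-allocation step concrete, below is a minimal Monte Carlo sketch of Shapley-value attribution in Python. It assumes an already-fitted response model exposed as a hypothetical `conv_prob` function (in the paper, the RNN plays this role); unlike the paper’s formulation, this toy version treats exposures as an unordered set and ignores sequence dependence.

```python
import random

def shapley_attribution(ads, conv_prob, n_samples=1000):
    """Monte Carlo estimate of each ad's Shapley credit for conversion lift.

    ads       -- list of ad identifiers the user was exposed to
    conv_prob -- hypothetical function mapping a set of ads to a predicted
                 conversion probability (the fitted response model)
    """
    credit = {ad: 0.0 for ad in ads}
    for _ in range(n_samples):
        order = random.sample(ads, len(ads))   # random permutation of ads
        exposed = set()
        prev = conv_prob(exposed)              # baseline: no ad exposure
        for ad in order:
            exposed.add(ad)
            cur = conv_prob(exposed)
            credit[ad] += cur - prev           # marginal contribution
            prev = cur
    return {ad: c / n_samples for ad, c in credit.items()}
```

By construction, the per-ad credits sum to the total incremental lift between full exposure and no exposure, which is exactly the incrementality property the paper emphasizes.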

What’s the key achievement?

  • Introducing a coherent, theoretically-grounded and data-driven attribution framework that:
    • captures that some orders are more advertising-driven while others are likely to occur irrespective of advertising;
    • helps identify top advertisements that contribute the most to conversion;
    • is scalable to a high-dimensionality advertising platform that serves millions of customers, with these customers exposed to billions of ad impressions.

What does the AI community think?

  • The paper was presented at the AdKDD workshop within the KDD 2019 conference.

What are future research areas?

  • Developing a method for optimal advertiser budget allocation by incorporating the attribution defined by the proposed model.

What are possible business applications?

  • The introduced multi-touch attribution framework can be used by ad publishers, including eCommerce platforms, and also advertisers to assign credit to the various ads they buy.

2. Shapley Meets Uniform: An Axiomatic Framework for Attribution in Online Advertising, by Raghav Singal, Omar Besbes, Antoine Desir, Vineet Goyal, Garud Iyengar

Original Abstract

One of the central challenges in online advertising is attribution, namely, assessing the contribution of individual advertiser actions including emails, display ads and search ads to eventual conversion. Several heuristics are used for attribution in practice; however, there is no formal justification for them and many of these fail even in simple canonical settings. The main contribution in this work is to develop an axiomatic framework for attribution in online advertising. In particular, we consider a Markovian model for the user journey through the conversion funnel, in which ad actions may have disparate impacts at different stages. We propose a novel attribution metric, that we refer to as counterfactual adjusted Shapley value, which inherits the desirable properties of the traditional Shapley value. Furthermore, we establish that this metric coincides with an adjusted “unique-uniform” attribution scheme. This scheme is efficiently computable and implementable and can be interpreted as a correction to the commonly used uniform attribution scheme.

Our Summary

In this paper, the authors from Columbia University take a systematic and theoretically sound approach to measuring attribution in online advertising. First, they represent the user journey through the conversion funnel as an abstract Markov chain model, where at each period the user is in one of finitely many states and an advertiser takes an action after observing this state. Then, the researchers propose a new attribution metric for this Markovian model of user behavior, called the counterfactual adjusted Shapley value. The metric inherits the benefits of the classical Shapley value (SV), such as efficiency, symmetry, and linearity, but in contrast to the classical SV, it can be computed efficiently. The authors also demonstrate that the suggested metric coincides with an adjusted “unique-uniform” attribution scheme.

What’s the core idea of this paper?

  • Despite the importance of the attribution issue in online advertising, the research community has still not defined the “best” attribution measure:
    • Incremental value heuristic (IVH) seems to be the most popular one but lacks theoretical grounding.
    • Shapley value (SV) has strong theoretical justification but cannot be estimated exactly, requiring certain assumptions.
  • The researchers suggest an abstract Markov chain model for representation of the user journey through the conversion funnel:
    • At every period, the user is in one of finitely many states.
    • An advertiser observes the state and takes an action.
    • Instead of the traditional approach of attributing value only to advertising actions, the authors suggest attributing value to each state-action pair.
  • Finally, the paper suggests measuring attribution in the Markovian model using a counterfactual adjusted Shapley value, a metric that is efficiently computable, implementable and interpretable as a correction to the popular uniform attribution scheme.
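
The counterfactual adjusted Shapley value itself requires the paper’s machinery, but the Markovian funnel setting is easy to illustrate with the simpler and widely used removal-effect heuristic: model the journey as a Markov chain, remove one channel at a time, and credit each channel by the resulting drop in conversion probability. All transition probabilities below are toy values.

```python
import numpy as np

states = ["start", "display", "search", "email", "convert", "null"]
# T[i][j]: toy probability of moving from state i to state j
T = np.array([
    [0.0, 0.4, 0.3, 0.10, 0.00, 0.20],  # start
    [0.0, 0.1, 0.2, 0.10, 0.15, 0.45],  # display
    [0.0, 0.1, 0.1, 0.10, 0.30, 0.40],  # search
    [0.0, 0.2, 0.1, 0.05, 0.10, 0.55],  # email
    [0.0, 0.0, 0.0, 0.00, 1.00, 0.00],  # convert (absorbing)
    [0.0, 0.0, 0.0, 0.00, 0.00, 1.00],  # null (absorbing)
])

def p_convert(T, steps=200):
    """Probability of ending in 'convert' when starting from 'start'."""
    d = np.zeros(len(T)); d[0] = 1.0
    for _ in range(steps):
        d = d @ T
    return d[4]

def removal_effect(T, ch):
    """Re-route a channel's incoming traffic to 'null'; measure the drop."""
    T2 = T.copy()
    T2[:, 5] += T2[:, ch]
    T2[:, ch] = 0.0
    return p_convert(T) - p_convert(T2)

effects = {states[c]: removal_effect(T, c) for c in (1, 2, 3)}
total = sum(effects.values())
print({ch: round(e / total, 3) for ch, e in effects.items()})  # normalized credit
```

The paper’s metric refines this kind of analysis by attributing value to state-action pairs and correcting the popular uniform scheme with counterfactual adjustments.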

What’s the key achievement?

  • Suggesting a new metric for measuring attribution in online advertising, a counterfactual adjusted Shapley value, that:
    • inherits the desirable properties of the classical SV;
    • is robust to a mix of network structures;
    • coincides with a unique-uniform attribution scheme;
    • can be easily computed.

What does the AI community think?

  • The paper was accepted for presentation at The Web Conference 2019.

What are future research areas?

  • Developing more canonical settings to verify the appropriateness of the proposed metric.
  • Understanding the statistical efficiency of the algorithms used to estimate this metric.
  • Comparing the output of the suggested scheme with the alternative ones on a real-world dataset.
  • Applying the introduced methodology to other domains.

What are possible business applications?

  • The introduced metric can help advertisers get a better understanding of:
    • the value of specific channels at a given time;
    • the optimal budget allocation.

Marketing Optimization

3. AiAds: Automated and Intelligent Advertising System for Sponsored Search, by Xiao Yang, Daren Sun, Ruiwei Zhu, Tao Deng, Zhi Guo, Jiao Ding, Shouke Qin, Zongyao Ding, Yanfeng Zhu

Original Abstract

Sponsored search has more than 20 years of history, and it has been proven to be a successful business model for online advertising. Based on the pay-per-click pricing model and the keyword targeting technology, the sponsored system runs online auctions to determine the allocations and prices of search advertisements. In the traditional setting, advertisers should manually create lots of ad creatives and bid on some relevant keywords to target their audience. Due to the huge amount of search traffic and a wide variety of ad creations, the limits of manual optimizations from advertisers become the main bottleneck for improving the efficiency of this market. Moreover, as many emerging advertising forms and supplies are growing, it’s crucial for sponsored search platform to pay more attention to the ROI metrics of ads for getting the marketing budgets of advertisers.

In this paper, we present the AiAds system developed at Baidu, which use machine learning techniques to build an automated and intelligent advertising system. By designing and implementing the automated bidding strategy, the intelligent targeting and the intelligent creation models, the AiAds system can transform the manual optimizations into multiple automated tasks and optimize these tasks in advanced methods. AiAds is a brand-new architecture of sponsored search system which changes the bidding language and allocation mechanism, breaks the limit of keyword targeting with end-to-end ad retrieval framework and provides global optimization of ad creation. This system can increase the advertiser’s campaign performance, the user experience and the revenue of the advertising platform simultaneously and significantly. We present the overall architecture and modeling techniques for each module of the system and share our lessons learned in solving several key challenges.

Our Summary

The Baidu research team addresses the key challenges of sponsored search by introducing the AiAds system, an automated and intelligent advertising system, which is based on various machine learning techniques. In particular, they suggest an automated bidding engine to solve the problems of traditional keyword-level manual bidding optimization; an intelligent targeting service for direct matching from query to related ads without the mediation of keywords; and an intelligent framework for automated creation of ad templates based on the available information about the product and business. The results from the online A/B tests and the long-term grouping experiment demonstrate the effectiveness of the AiAds system for advertisers, advertising platforms, and users.


What’s the core idea of this paper?

  • These days, the traditional approach to sponsored search – with manual keyword selection, keyword-level bidding optimization, pay-per-click pricing model, and manual ad creation – is becoming too cumbersome and inefficient.
  • To address the main challenges in sponsored search, the Baidu research team suggests:
    • a new bidding language and corresponding automated bidding strategy for advertisers to optimize campaign performance directly;
    • a direct retrieval and matching model for selecting the ads that best correspond to search queries;
    • a componentized framework for designing and generating ad creations that automatically optimize the content and layout of advertising.

What’s the key achievement?

  • Introducing an automated and intelligent system for sponsored search that:
    • increases campaign performance for advertisers (56% improvement in conversions);
    • enhances the user experience by providing more relevant ads for the given queries;
    • increases the revenue of the advertising platform (47% improvement in revenue).

What does the AI community think?

  • The paper was accepted to KDD 2019, the leading conference in knowledge discovery and data mining.

What are future research areas?

  • Optimizing the ad retrieval model by utilizing more data sources and advanced models.
  • Solving the existing issues in designing a reasonable mechanism for ROI-constrained bidders.

What are possible business applications?

  • The elements of the introduced advertising system can be implemented by other advertising platforms to improve conversions, enhance user experience and increase revenue.

4. Time-Aware Prospective Modeling of Users for Online Display Advertising, by Djordje Gligorijevic, Jelena Gligorijevic, Aaron Flores

Original Abstract

Prospective display advertising poses a great challenge for large advertising platforms as the strongest predictive signals of users are not eligible to be used in the conversion prediction systems. To that end, efforts are made to collect as much information as possible about each user from various data sources and to design powerful models that can capture weaker signals ultimately obtaining good quality of conversion prediction probability estimates. In this study, we propose a novel time-aware approach to model heterogeneous sequences of users’ activities and capture implicit signals of users’ conversion intents. On two real-world datasets, we show that our approach outperforms other, previously proposed approaches, while providing interpretability of signal impact to conversion probability.

Our Summary

The Yahoo Research team addresses the problem of attracting new users with online display advertising. This is a particularly challenging task because strong signals of user interest, such as visits to the advertiser’s website or recent conversions, are unavailable for prospective customers. The researchers therefore suggest gathering all available information about the user as a time-ordered sequence of activities (e.g., search sessions, ad clicks, reservations, shopping carts). They then introduce a sequence learning approach to model these time-ordered, heterogeneous user activities gathered from multiple sources, including a novel time-aware mechanism to capture the temporal aspect of events. The experiments demonstrate the effectiveness and interpretability of the suggested approach.


What’s the core idea of this paper?

  • Advertisers are always interested in getting new customers who have had no previous interactions with the respective advertiser.
  • To address this problem, the Yahoo Research team introduces a novel Deep Time-Aware conversIoN (DTAIN) model:
    • The inputs of the model include a sequence of events, the time difference between events’ timestamps, and the time point of prediction.
    • This information passes through five purpose-built blocks: an events and temporal information embedding block, a temporal attention learning block, a recurrent net block, an attention learning block, and a final classification block.
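
The DTAIN architecture is proprietary, but the central time-aware attention idea can be sketched in PyTorch: embed the time gap between each event and the prediction point, and let it modulate the attention weight over the event sequence. The block below is an illustrative simplification, not the authors’ exact model.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Attention over a user's event sequence, conditioned on time gaps."""
    def __init__(self, dim):
        super().__init__()
        self.time_proj = nn.Linear(1, dim)   # embed log of hours-since-event
        self.score = nn.Linear(dim, 1)

    def forward(self, events, deltas):
        # events: (B, T, dim) embedded activities; deltas: (B, T) hours ago
        t = self.time_proj(torch.log1p(deltas).unsqueeze(-1))
        w = torch.softmax(self.score(torch.tanh(events + t)).squeeze(-1), dim=1)
        return (w.unsqueeze(-1) * events).sum(dim=1)  # time-aware user summary

att = TemporalAttention(dim=32)
summary = att(torch.randn(4, 10, 32), torch.rand(4, 10) * 72.0)  # (4, 32)
```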

What’s the key achievement?

  • Experiments with public and proprietary datasets demonstrate that:
    • Temporal information is important for predicting conversion.
    • DTAIN significantly outperforms several strong baselines with regard to conversion prediction.

What does the AI community think?

  • The paper was presented at the AdKDD workshop within the KDD 2019 conference.

What are future research areas?

  • Developing novel techniques to address significant noise that is present in the data collected from many data sources.

What are possible business applications?

  • The introduced approach can benefit advertisers and ad publishers by effectively predicting the conversion of prospective customers.

5. A Unified Framework for Marketing Budget Allocation, by Kui Zhao, Junhao Hua, Ling Yan, Qi Zhang, Huan Xu, Cheng Yang

Original Abstract

While marketing budget allocation has been studied for decades in traditional business, nowadays online business brings much more challenges due to the dynamic environment and complex decision-making process. In this paper, we present a novel unified framework for marketing budget allocation. By leveraging abundant data, the proposed data-driven approach can help us to overcome the challenges and make more informed decisions. In our approach, a semi-black-box model is built to forecast the dynamic market response and an efficient optimization method is proposed to solve the complex allocation task. First, the response in each market-segment is forecasted by exploring historical data through a semi-black-box model, where the capability of logit demand curve is enhanced by neural networks. The response model reveals relationship between sales and marketing cost. Based on the learned model, budget allocation is then formulated as an optimization problem, and we design efficient algorithms to solve it in both continuous and discrete settings. Several kinds of business constraints are supported in one unified optimization paradigm, including cost upper bound, profit lower bound, or ROI lower bound. The proposed framework is easy to implement and readily to handle large-scale problems. It has been successfully applied to many scenarios in Alibaba Group. The results of both offline experiments and online A/B testing demonstrate its effectiveness.

Our Summary

Online business brings new challenges to the marketing budget allocation process. The environment is very dynamic and budget adjustments need to be made weekly or even daily. The Alibaba research team introduces a unified framework for marketing budget allocation in online business. They suggest a two-step approach, where first, the response in each market segment is learned from historical data, and then, budget allocation is optimized based on learned models. The suggested approach is being applied by the Alibaba Group, demonstrating its effectiveness in handling large-scale problems.


What’s the core idea of this paper?

  • Companies that operate in a dynamic online environment need to approach the marketing budget allocation problem with new data-driven solutions.
  • The Alibaba research team introduces a novel unified framework for marketing budget allocation.
    • First, the market response in each segment is forecast with a semi-black-box model, where the logit demand curve is supported by neural networks.
    • Then, budget allocation is formulated as an optimization problem.
      • The Lagrange multiplier method is applied to address the non-convexity of logit demand curves.
  • Additional business constraints, such as a cost upper bound, profit lower bound, or ROI lower bound, can also be incorporated into the suggested framework.
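
As a rough illustration of the optimization step, the sketch below allocates a fixed budget across segments whose responses follow fitted logit demand curves, subject to a total-cost constraint. The curve parameters are invented, and SciPy’s general-purpose solver stands in for the paper’s more efficient Lagrange-multiplier algorithm.

```python
import numpy as np
from scipy.optimize import minimize

def logit_response(cost, smax, a, b):
    """Toy logit demand curve: predicted sales in a segment given spend."""
    return smax / (1.0 + np.exp(-(a * np.log(cost + 1e-9) + b)))

segments = [(100.0, 1.2, -4.0), (80.0, 0.9, -3.0), (150.0, 1.5, -6.0)]  # made up
budget = 50.0

res = minimize(
    lambda x: -sum(logit_response(xi, *p) for xi, p in zip(x, segments)),
    x0=np.full(len(segments), budget / len(segments)),       # even split to start
    bounds=[(0.0, budget)] * len(segments),
    constraints=[{"type": "ineq", "fun": lambda x: budget - x.sum()}],
)
print(np.round(res.x, 2))  # spend per segment
```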

What’s the key achievement?

  • The proposed framework has been successfully applied to many scenarios in Alibaba Group.
  • The online A/B testing demonstrates that the introduced framework for marketing budget allocation can lead to:
    • weekly sales growth of over 6%
    • with 40% less money spent.

What does the AI community think?

  • The paper was accepted to KDD 2019, the leading conference in knowledge discovery and data mining.

What are future research areas?

  • Exploring the relationship between marketing cost and contextual variables (e.g., brands, cities, consumption time) in the logit response model.
  • Investigating the possibility of supporting boundary constraints on decision variables in the optimization part of the framework.

What are possible business applications?

  • The introduced approach can significantly improve the effectiveness of marketing budget allocation for companies operating in online business.

Personalization in Marketing

6. Deep Learning Recommendation Model for Personalization and Recommendation Systems, by Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang, Narayanan Sundaraman, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu, Alisson G. Azzolini, Dmytro Dzhulgakov, Andrey Mallevich, Ilia Cherniavskii, Yinghai Lu, Raghuraman Krishnamoorthi, Ansha Yu, Volodymyr Kondratenko, Stephanie Pereira, Xianjie Chen, Wenlin Chen, Vijay Rao, Bill Jia, Liang Xiong, Misha Smelyanskiy

Original Abstract

With the advent of deep learning, neural network-based recommendation models have emerged as an important tool for tackling personalization and recommendation tasks. These networks differ significantly from other deep learning networks due to their need to handle categorical features and are not well studied or understood. In this paper, we develop a state-of-the-art deep learning recommendation model (DLRM) and provide its implementation in both PyTorch and Caffe2 frameworks. In addition, we design a specialized parallelization scheme utilizing model parallelism on the embedding tables to mitigate memory constraints while exploiting data parallelism to scale-out compute from the fully-connected layers. We compare DLRM against existing recommendation models and characterize its performance on the Big Basin AI platform, demonstrating its usefulness as a benchmark for future algorithmic experimentation and system co-design.

Our Summary

The Facebook research team addresses personalization by combining perspectives from recommendation systems and predictive analytics. Specifically, they introduce a Deep Learning Recommendation Model (DLRM) that uses embeddings to process sparse features and a multilayer perceptron (MLP) to process dense features. Then, the model combines these features explicitly and defines the event probability using another MLP. The experiments demonstrate the effectiveness of the suggested approach in building a recommender system.


What’s the core idea of this paper?

  • To tackle personalization with neural networks, the Facebook team introduces a Deep Learning Recommendation Model (DLRM):
    • All categorical features are represented by an embedding vector.
    • The continuous features are transformed by a multilayer perceptron (MLP) providing a dense representation of the same length as the embedding vectors.
    • Then, the second-order interaction of different features is computed explicitly following the approach of handling sparse data provided in factorization machines. Namely, the dot product between all pairs of embedding vectors and processed dense features is computed.
    • In the next step, the dot products are combined with the original dense features and post-processed using another MLP.
    • Finally, the output of the MLP is fed into a sigmoid function to give a probability.
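
A minimal PyTorch sketch of the architecture just described (a simplification, not the authors’ open-sourced implementation; cardinalities and layer sizes are placeholders):

```python
import torch
import torch.nn as nn

class MiniDLRM(nn.Module):
    """Embeddings for sparse features, an MLP for dense features, explicit
    pairwise dot-product interactions, and a top MLP ending in a sigmoid."""
    def __init__(self, cardinalities, num_dense, dim=16):
        super().__init__()
        self.embs = nn.ModuleList(nn.Embedding(c, dim) for c in cardinalities)
        self.bottom = nn.Sequential(nn.Linear(num_dense, dim), nn.ReLU())
        n = len(cardinalities) + 1                    # embeddings + dense vector
        self.top = nn.Sequential(nn.Linear(dim + n * (n - 1) // 2, 32),
                                 nn.ReLU(), nn.Linear(32, 1))

    def forward(self, dense, cats):
        d = self.bottom(dense)                                  # (B, dim)
        z = torch.stack([e(cats[:, i]) for i, e in enumerate(self.embs)] + [d], 1)
        dots = torch.bmm(z, z.transpose(1, 2))                  # pairwise dot products
        iu = torch.triu_indices(z.size(1), z.size(1), offset=1)
        pairs = dots[:, iu[0], iu[1]]                           # upper triangle only
        return torch.sigmoid(self.top(torch.cat([d, pairs], 1))).squeeze(1)

model = MiniDLRM(cardinalities=[1000, 500], num_dense=13)
probs = model(torch.randn(4, 13), torch.randint(0, 500, (4, 2)))  # (4,)
```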

What’s the key achievement?

  • Introducing a novel deep learning-based recommendation model that exploits categorical data and demonstrates good performance compared to the existing approaches.
  • Open-sourcing implementation of the introduced model.

What are future research areas?

  • Further improving the model’s performance by tuning hyperparameters and experimenting with model design.

What are possible business applications?

  • Companies can use the open-sourced implementation of the suggested deep learning-based model to enhance their recommender systems with neural networks.

Where can you get implementation code?

  • The authors provide the implementation of DLRM in PyTorch and Caffe2 on GitHub.

7. Personalized Purchase Prediction of Market Baskets with Wasserstein-Based Sequence Matching, by Mathias Kraus and Stefan Feuerriegel

Original Abstract

Personalization in marketing aims at improving the shopping experience of customers by tailoring services to individuals. In order to achieve this, businesses must be able to make personalized predictions regarding the next purchase. That is, one must forecast the exact list of items that will comprise the next purchase, i.e., the so-called market basket. Despite its relevance to firm operations, this problem has received surprisingly little attention in prior research, largely due to its inherent complexity. In fact, state-of-the-art approaches are limited to intuitive decision rules for pattern extraction. However, the simplicity of the pre-coded rules impedes performance, since decision rules operate in an autoregressive fashion: the rules can only make inferences from past purchases of a single customer without taking into account the knowledge transfer that takes place between customers. In contrast, our research overcomes the limitations of pre-set rules by contributing a novel predictor of market baskets from sequential purchase histories: our predictions are based on similarity matching in order to identify similar purchase habits among the complete shopping histories of all customers. Our contributions are as follows: (1) We propose similarity matching based on subsequential dynamic time warping (SDTW) as a novel predictor of market baskets. Thereby, we can effectively identify cross-customer patterns. (2) We leverage the Wasserstein distance for measuring the similarity among embedded purchase histories. (3) We develop a fast approximation algorithm for computing a lower bound of the Wasserstein distance in our setting. An extensive series of computational experiments demonstrates the effectiveness of our approach. The accuracy of identifying the exact market baskets based on state-of-the-art decision rules from the literature is outperformed by a factor of 4.0.

Our Summary

The research team from ETH Zurich addresses market basket prediction by considering the complete shopping histories of all available customers. Specifically, their algorithm learns to identify co-occurrences between shopping histories from different customers. The sequence-based similarity matching is computed according to the Wasserstein distance. Thereby, market baskets are interpreted as probability distributions of products from an assortment. The experiments on three real-world datasets demonstrate that the suggested approach significantly outperforms association rules and naïve heuristics with respect to the accuracy of market basket prediction.


What’s the core idea of this paper?

  • The proposed algorithm for prediction of market baskets includes four steps:
    • Building item embeddings with each item represented as a multi-dimensional vector and similar items being closer together.
    • Utilizing the Wasserstein distance to compute distances between market baskets as the minimum cost of turning one probability distribution of products into the other.
    • Building a k-nearest neighbor sequence matching with a subsequence dynamic time warping (i.e., kNN-SDTW).
    • Making a prediction of the next market basket by choosing the most similar shopping histories.
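
The distance at the heart of the matching step can be sketched with the POT (Python Optimal Transport) library: each basket is treated as a uniform distribution over its items’ embedding vectors, and the Wasserstein distance is the minimum cost of transporting one distribution into the other. The two-dimensional item vectors below are invented for illustration; in practice they would be learned embeddings.

```python
import numpy as np
import ot  # Python Optimal Transport: pip install pot

def basket_distance(basket_a, basket_b, item_vecs):
    """Wasserstein distance between two baskets of items."""
    A = np.array([item_vecs[i] for i in basket_a], dtype=float)
    B = np.array([item_vecs[i] for i in basket_b], dtype=float)
    M = ot.dist(A, B)                         # pairwise squared-Euclidean costs
    a = np.full(len(A), 1.0 / len(A))         # uniform weight per item
    b = np.full(len(B), 1.0 / len(B))
    return ot.emd2(a, b, M)                   # optimal transport cost

item_vecs = {"milk": [0.0, 1.0], "cream": [0.1, 0.9], "pens": [5.0, 5.0]}
print(basket_distance(["milk"], ["cream"], item_vecs))  # small: similar items
print(basket_distance(["milk"], ["pens"], item_vecs))   # large: unrelated items
```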

What’s the key achievement?

  • Introducing a novel market basket prediction algorithm that:
    • can learn hidden structure among products and leverage cross-customer knowledge for improved predictions;
    • achieves scalability by deriving a fast variant of subsequence matching;
    • outperforms baseline models by 2.54% on a multi-category dataset;
    • improves the ratio of correct predictions by a factor of 4.0 for a dataset covering food, office supplies, and furniture.

What does the AI community think?

  • The paper was presented at KDD 2019, the leading conference in knowledge discovery and data mining.

What are future research areas?

  • Exploring the ways to capture complex substitution effects driven by spontaneous purchases or price promotions.

What are possible business applications?

  • The introduced algorithm can help companies achieve higher accuracy in predicting the future purchases of their customers, leading to an improved shopping experience for these customers and increased sales.

Where can you get implementation code?

  • The authors provide the implementation of the presented approach on GitHub.

8. Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches, by Maurizio Ferrari Dacrema, Paolo Cremonesi, Dietmar Jannach

Original Abstract

Deep learning techniques have become the method of choice for researchers working on algorithmic aspects of recommender systems. With the strongly increased interest in machine learning in general, it has, as a result, become difficult to keep track of what represents the state-of-the-art at the moment, e.g., for top-n recommendation tasks. At the same time, several recent publications point out problems in today’s research practice in applied machine learning, e.g., in terms of the reproducibility of the results or the choice of the baselines when proposing new models.

In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in the last years. Only 7 of them could be reproduced with reasonable effort. For these methods, it however turned out that 6 of them can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today’s machine learning scholarship and calls for improved scientific practices in this area.

Our Summary

The researchers question the progress that deep learning techniques bring into the recommender system area. They conduct a systematic analysis of 18 research papers that have introduced new algorithms for proposing top-n recommendations and have been presented at top conferences during the last few years. The authors identify two major issues with this research: (1) lack of reproducibility, with only 7 out of 18 papers providing sufficient information for reproducing their research; (2) lack of progress, with 6 out of 7 reproduced models being outperformed using simple heuristic methods. Thus, the researchers call for more rigorous research practices with respect to the evaluation of new contributions in this area.

What’s the core idea of this paper?

  • Despite neural networks becoming a popular tool for building recommender systems, the progress that these methods introduce compared to simple heuristic practices is questionable.
  • The factors that contribute to this phenomenon include:
    • weak baselines that researchers use when comparing their novel approaches;
    • difficulties with reproducing results across papers as source code is often not shared;
    • using different types of datasets, evaluation protocols, performance measures, and data preprocessing steps, which makes it more difficult to reproduce and compare the introduced approaches.

What’s the key achievement?

  • Demonstrating that new deep learning-based approaches for top-n recommendation tasks are not making much progress compared to methods based on nearest-neighbor or graph-based techniques:
    • Only 7 out of 18 research papers selected for analysis could be reproduced.
    • 6 out of 7 models were outperformed by comparably simple heuristic methods.
    • One model (Mult-VAE) clearly outperformed the baselines but did not consistently perform better than a well-tuned non-neural linear ranking method.

What does the AI community think?

  • The paper was presented at RecSys 2019, the 13th ACM Conference on Recommender Systems.

What are future research areas?

  • Extending the analysis to other publication outlets beyond conferences and other types of recommendation problems.
  • Considering more traditional algorithms as baselines (e.g., matrix factorization).

Where can you get implementation code?

  • The authors provide the implementation of their evaluation on GitHub.

Marketing Analytics

9. A Deep Probabilistic Model for Customer Lifetime Value Prediction, by Xiaojing Wang, Tianqi Liu, Jingang Miao

Original Abstract

Accurate predictions of customers’ future lifetime value (LTV) given their attributes and past purchase behavior enables a more customer-centric marketing strategy. Marketers can segment customers into various buckets based on the predicted LTV and, in turn, customize marketing messages or advertising copies to serve customers in different segments better. Furthermore, LTV predictions can directly inform marketing budget allocations and improve real-time targeting and bidding of ad impressions.

One challenge of LTV modeling is that some customers never come back, and the distribution of LTV can be heavy-tailed. The commonly used mean squared error (MSE) loss does not accommodate the significant fraction of zero value LTV from one-time purchasers and can be sensitive to extremely large LTVs from top spenders. In this article, we model the distribution of LTV given associated features as a mixture of zero point mass and lognormal distribution, which we refer to as the zero-inflated lognormal (ZILN) distribution. This modeling approach allows us to capture the churn probability and account for the heavy-tailedness nature of LTV at the same time. It also yields straightforward uncertainty quantification of the point prediction. The ZILN loss can be used in both linear models and deep neural networks (DNN). For model evaluation, we recommend the normalized Gini coefficient to quantify model discrimination and decile charts to assess model calibration. Empirically, we demonstrate the predictive performance of our proposed model on two real-world public datasets.

Our Summary

In this paper, the Google research team addresses the problem of predicting customers’ future lifetime value (LTV). In particular, they tackle the heavy-tailed distribution of LTV, caused by the large fraction of one-time purchasers and the extremely large LTVs of top spenders. To this end, they suggest modeling LTV with the zero-inflated lognormal (ZILN) distribution, a mixture of a point mass at zero and a lognormal distribution, and using supervised regression to leverage all customer-level attributes. They also measure a model’s ability to differentiate high-value customers from low-value ones with the normalized Gini coefficient. The experiments on two real-world datasets demonstrate the effectiveness of the suggested approach.


What’s the core idea of this paper?

  • Prediction of customer lifetime value is important for a firm’s financial planning, marketing decisions, and customer relationship management.
  • When predicting the LTV of new customers, the commonly used frequency and recency characteristics cannot differentiate among customers. Thus, the authors suggest leveraging customer attributes and purchase characteristics by applying a supervised regression using a deep neural network (DNN).
  • Further, the authors point out the challenges associated with the LTV distribution, which is usually heavy-tailed and volatile due to the high number of non-returning customers and extremely large LTVs for the top spenders:
    • Mean Squared Error (MSE) is not appropriate in this case as it (a) ignores the fact that LTV labels include both zero and continuous values; (b) is highly sensitive to outliers because of the squared term.
    • The solution is to model LTV with the zero-inflated lognormal (ZILN) distribution, which handles both the zeros and the extremely large LTVs by design (see the sketch after this list).
  • The model is evaluated using the normalized Gini coefficient, which is robust to outliers and allows better business interpretation.
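
A compact PyTorch sketch of the ZILN loss, assuming the network emits three outputs per customer (a return-probability logit plus the lognormal’s location and raw scale). This follows the paper’s description, though details such as the softplus parameterization of the scale are illustrative implementation choices.

```python
import math
import torch
import torch.nn.functional as F

def ziln_loss(outputs, y):
    """Zero-inflated lognormal loss.

    outputs -- (B, 3) tensor: [return_logit, mu, raw_sigma]
    y       -- (B,) non-negative LTV labels (0 for non-returning customers)
    """
    logit, mu, raw_sigma = outputs[:, 0], outputs[:, 1], outputs[:, 2]
    sigma = F.softplus(raw_sigma) + 1e-6           # keep the scale positive
    returned = (y > 0).float()
    # Classification term: does the customer come back at all?
    cls = F.binary_cross_entropy_with_logits(logit, returned)
    # Regression term: lognormal negative log-likelihood on positive LTVs only.
    log_y = torch.log(torch.clamp(y, min=1e-6))
    nll = (log_y + torch.log(sigma) + 0.5 * math.log(2 * math.pi)
           + 0.5 * ((log_y - mu) / sigma) ** 2)
    reg = (returned * nll).sum() / returned.sum().clamp(min=1.0)
    return cls + reg
```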

What’s the key achievement?

  • The experiments demonstrate that both deep neural network architecture and ZILN loss contribute to:
    • a higher Spearman’s correlation between true and predicted LTV;
    • a higher normalized Gini coefficient.

What are future research areas?

  • Exploring possible ways to further improve the predictive performance of the introduced approach by experimenting with model architecture and tuning model hyperparameters.

What are possible business applications?

  • The suggested approach to predicting customers’ lifetime value can help marketers improve their financial planning and customer relationship management.

Where can you get implementation code?

  • The implementation of the suggested approach to predicting customers’ lifetime value is available on GitHub.

10. Context-aware Embedding for Targeted Aspect-based Sentiment Analysis, by Bin Liang, Jiachen Du, Ruifeng Xu, Binyang Li, Hejiao Huang

Original Abstract

Attention-based neural models were employed to detect the different aspects and sentiment polarities of the same target in targeted aspect-based sentiment analysis (TABSA). However, existing methods do not specifically pre-train reasonable embeddings for targets and aspects in TABSA. This may result in targets or aspects having the same vector representations in different contexts and losing the context-dependent information. To address this problem, we propose a novel method to refine the embeddings of targets and aspects. Such pivotal embedding refinement utilizes a sparse coefficient vector to adjust the embeddings of target and aspect from the context. Hence the embeddings of targets and aspects can be refined from the highly correlative words instead of using context-independent or randomly initialized vectors. Experiment results on two benchmark datasets show that our approach yields the state-of-the-art performance in TABSA task.

Our Summary

Targeted aspect-based sentiment analysis (TABSA) can be very useful for automated analysis of customer reviews and for understanding reviewers’ attitudes toward different aspects of a product (e.g., price, service, safety). Attention-based neural networks have driven remarkable progress on the TABSA task, but the authors note that existing approaches usually use context-independent or randomly initialized vectors to represent targets and aspects. As a result, semantic information is lost and the interdependence among specific targets, corresponding aspects, and context is not considered. To address this problem, the researchers propose a novel embedding refinement method to obtain context-aware embeddings for TABSA. Specifically, they suggest reconstructing the vector representation of the target from its context using a sparse coefficient vector, so that the target representation is generated from highly correlative words rather than randomly initialized embeddings. The experiments show that the introduced approach leads to state-of-the-art performance on the TABSA task.


What’s the core idea of this paper?

  • The paper introduces a novel embedding refinement method to obtain context-aware embeddings for the TABSA task rather than context-independent or randomly initialized embeddings:
    • A sparse coefficient vector is leveraged to select highly correlated words from the sentence.
    • The representations of target and aspect are adjusted to make these highly-correlated words more valuable.
    • The aspect embedding is fine-tuned so that it is closer to the highly correlated target and further away from the irrelevant targets.
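
The refinement step is essentially sparse coding: reconstruct the target embedding from the context-word embeddings under an L1 penalty, so that only a few highly correlated words receive non-zero weight. Below is an illustrative sketch using scikit-learn’s Lasso, which approximates the idea rather than reproducing the paper’s exact optimization.

```python
import numpy as np
from sklearn.linear_model import Lasso

def refine_embedding(target_vec, context_vecs, alpha=0.05):
    """Rebuild a target embedding as a sparse mix of context-word embeddings."""
    lasso = Lasso(alpha=alpha, fit_intercept=False)
    lasso.fit(context_vecs.T, target_vec)   # columns are context-word vectors
    w = lasso.coef_                         # the sparse coefficient vector
    return context_vecs.T @ w, w

rng = np.random.default_rng(0)
context = rng.normal(size=(6, 50))              # 6 context words, 50-dim vectors
target = 0.7 * context[1] + 0.3 * context[4]    # correlates with two words
refined, weights = refine_embedding(target, context)
print(np.round(weights, 2))  # near-zero everywhere except positions 1 and 4
```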

What’s the key achievement?

  • The experimental results show that incorporating context-aware embeddings of targets and aspects into the neural models significantly improves:
    • aspect detection (by 2.9% in strict accuracy), and
    • sentiment classification (by 1.8% in strict accuracy).

What does the AI community think?

  • The paper was presented at ACL 2019, the leading conference in natural language processing.

What are future research areas?

  • Exploring the extension of the suggested approach to other tasks.

What are possible business applications?

  • The introduced approach to obtaining context-aware embeddings for targeted aspect-based sentiment analysis can significantly improve the accuracy of customer reviews analysis.

11. Progressive Self-Supervised Attention Learning for Aspect-Level Sentiment Analysis, by Jialong Tang, Ziyao Lu, Jinsong Su, Yubin Ge, Linfeng Song, Le Sun, Jiebo Luo

Original Abstract

In aspect-level sentiment classification (ASC), it is prevalent to equip dominant neural models with attention mechanisms, for the sake of acquiring the importance of each context word on the given aspect. However, such a mechanism tends to excessively focus on a few frequent words with sentiment polarities, while ignoring infrequent ones. In this paper, we propose a progressive self-supervised attention learning approach for neural ASC models, which automatically mines useful attention supervision information from a training corpus to refine attention mechanisms. Specifically, we iteratively conduct sentiment predictions on all training instances. Particularly, at each iteration, the context word with the maximum attention weight is extracted as the one with active/misleading influence on the correct/incorrect prediction of every instance, and then the word itself is masked for subsequent iterations. Finally, we augment the conventional training objective with a regularization term, which enables ASC models to continue equally focusing on the extracted active context words while decreasing weights of those misleading ones. Experimental results on multiple datasets show that our proposed approach yields better attention mechanisms, leading to substantial improvements over the two state-of-the-art neural ASC models. Source code and trained models are available.

Our Summary

The authors note that the existing attention mechanism in aspect-level sentiment classification (ASC) tends to focus on several frequent words with sentiment polarities and ignores infrequent ones. To address this problem, they introduce a novel progressive self-supervised attention learning approach for aspect-level sentiment classification. This approach is based on the idea that the context word with the maximum attention weight has a major impact on sentiment prediction. Thus, if the training instance with the respective context word was predicted correctly, this word should be considered in the model training. Otherwise, it should be ignored as it apparently provides inaccurate information for prediction. The researchers incorporate this approach into the neural model by augmenting the training objective with a corresponding regularizer. The experiments on several benchmark datasets demonstrate the effectiveness of the introduced approach.


What’s the core idea of this paper?

  • The existing attention mechanism in aspect-level sentiment classification is prone to overly focus on a few frequent words with sentiment polarities while ignoring the infrequent ones. This often results in poor performance of ASC models.
  • To solve this issue, the researchers introduce a novel progressive self-supervised attention learning approach for ASC models:
    • The approach follows the idea that the context word with the highest attention weight has the greatest impact on the sentiment prediction for the corresponding sentence. Such a word should therefore be kept for model training only if it leads to a correctly predicted training instance (see the sketch after this list).
    • Following this idea, sentiment prediction is iteratively conducted on all training instances.
    • Finally, the training objective is augmented with a regularizer that enforces focus on the extracted active context words while decreasing the weights of the misleading context words.
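
The mining loop can be sketched as follows, assuming a hypothetical `model` callable that returns a prediction and per-token attention weights; the actual method runs this over the whole training corpus, retraining between iterations.

```python
def mine_attention_supervision(model, tokens, label, n_iters=3):
    """Progressively extract active vs. misleading context words.

    model  -- hypothetical callable: token list -> (predicted label, weights)
    tokens -- list of context-word strings for one training instance
    label  -- gold sentiment label for the instance
    """
    active, misleading = [], []
    masked = list(tokens)
    for _ in range(n_iters):
        pred, attn = model(masked)
        top = max(range(len(masked)), key=lambda i: attn[i])
        # Correct prediction: the dominant word helped; otherwise it misled.
        (active if pred == label else misleading).append(masked[top])
        masked[top] = "<mask>"   # hide it so infrequent words can surface
    return active, misleading
```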

What’s the key achievement?

  • Proposing a novel approach to automatically extracting attention supervision information for aspect-level classification models.
  • Demonstrating the effectiveness of the proposed attention learning approach, which significantly improves the performance of two popular ASC models, Memory Network (MN) and Transformation Network (TNet).

What does the AI community think?

  • The paper was presented at ACL 2019, the leading conference in natural language processing.

What are future research areas?

  • Extending the presented approach to other NLP tasks with attention mechanisms, including neural document classification and neural machine translation.

What are possible business applications?

  • The introduced approach to attention learning for aspect-level sentiment analysis can significantly boost the performance of sentiment classification models applied to the analysis of customer reviews.

Where can you get implementation code?

  • The authors provide their source code and trained models on GitHub.

Content Generation

12. MirrorGAN: Learning Text-to-image Generation by Redescription, by Tingting Qiao, Jing Zhang, Duanqing Xu, Dacheng Tao

Original Abstract

Generating an image from a given text description has two goals: visual realism and semantic consistency. Although significant progress has been made in generating high-quality and visually realistic images using generative adversarial networks, guaranteeing semantic consistency between the text description and visual content remains very challenging. In this paper, we address this problem by proposing a novel global-local attentive and semantic-preserving text-to-image-to-text framework called MirrorGAN. MirrorGAN exploits the idea of learning text-to-image generation by redescription and consists of three modules: a semantic text embedding module (STEM), a global-local collaborative attentive module for cascaded image generation (GLAM), and a semantic text regeneration and alignment module (STREAM). STEM generates word- and sentence-level embeddings. GLAM has a cascaded architecture for generating target images from coarse to fine scales, leveraging both local word attention and global sentence attention to progressively enhance the diversity and semantic consistency of the generated images. STREAM seeks to regenerate the text description from the generated image, which semantically aligns with the given text description. Thorough experiments on two public benchmark datasets demonstrate the superiority of MirrorGAN over other representative state-of-the-art methods.

Our Summary

In this paper, the authors address the problem of generating realistic images that match a given text description. They introduce a novel global-local attentive text-to-image-to-text framework called MirrorGAN. It exploits the idea that if the generated image is semantically consistent with a given text description, its redescription created through image-text translation should have exactly the same semantics as the original text description. Thus, in addition to visual realism adversarial loss and text-image paired semantic consistency adversarial loss, the model also includes a text-semantics reconstruction loss based on cross-entropy. The experiments on two public datasets demonstrate that MirrorGAN outperforms other representative state-of-the-art methods with respect to both visual realism and semantic consistency.


What’s the core idea of this paper?

  • To generate visually realistic images that are consistent with a given text description, the authors introduce a novel text-to-image-to-text framework called MirrorGAN.
  • The model exploits the idea of learning text-to-image generation by redescription.
  • It includes three modules:
    • a semantic text embedding module (STEM) for generating word- and sentence-level embeddings;
    • a global-local collaborative attentive module (GLAM) for cascaded image generation;
    • a semantic text regeneration and alignment module (STREAM) for regenerating the text description from the generated image.
  • The model uses two adversarial losses to ensure visual realism and text-image paired semantic consistency, and also employs a text-semantics reconstruction loss based on cross-entropy.
Figure: MirrorGAN architecture
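
A sketch of how the three loss terms could be combined on the generator side; the weighting constant and tensor shapes here are illustrative assumptions rather than the paper’s exact hyperparameters.

```python
import torch
import torch.nn.functional as F

def generator_loss(vis_logits, pair_logits, caption_logits, caption_ids, lam=20.0):
    """Visual realism + text-image consistency + caption reconstruction.

    vis_logits     -- discriminator scores for generated images alone
    pair_logits    -- discriminator scores for (generated image, text) pairs
    caption_logits -- (B, L, V) STREAM outputs for the regenerated description
    caption_ids    -- (B, L) token ids of the original description
    """
    adv_vis = F.binary_cross_entropy_with_logits(
        vis_logits, torch.ones_like(vis_logits))      # fool the visual critic
    adv_pair = F.binary_cross_entropy_with_logits(
        pair_logits, torch.ones_like(pair_logits))    # fool the pairing critic
    recon = F.cross_entropy(                          # redescription must match
        caption_logits.reshape(-1, caption_logits.size(-1)),
        caption_ids.reshape(-1))
    return adv_vis + adv_pair + lam * recon
```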

What’s the key achievement?

  • The experiments on the CUB and COCO datasets demonstrate that MirrorGAN outperforms the state-of-the-art AttnGAN by:
    • improving the Inception Score from 4.36 to 4.56 on CUB and from 25.89 to 26.47 on the COCO dataset, implying higher diversity and better quality of the generated images;
    • getting significantly higher R-precision scores, implying higher semantic consistency of the generated images;
    • generating more convincing images, according to the results of the human perceptual test.

What does the AI community think?

  • The paper was presented at CVPR 2019, the leading conference in computer vision.

What are future research areas?

  • Optimizing the MirrorGAN modules jointly with complete end-to-end training.
  • Employing a more advanced language model like BERT for text embedding and image captioning.

What are possible business applications?

  • The introduced approach to generating realistic images from a given text description can be leveraged in advertising to automatically generate ad creatives.

Where can you get implementation code?

  • The PyTorch implementation of MirrorGAN is available on GitHub.

13. High-Fidelity Image Generation With Fewer Labels, by Mario Lucic, Michael Tschannen, Marvin Ritter, Xiaohua Zhai, Olivier Bachem, Sylvain Gelly

Original Abstract

Deep generative models are becoming a cornerstone of modern machine learning. Recent work on conditional generative adversarial networks has shown that learning complex, high-dimensional distributions over natural images is within reach. While the latest models are able to generate high-fidelity, diverse natural images at high resolution, they rely on a vast quantity of labeled data. In this work we demonstrate how one can benefit from recent work on self- and semi-supervised learning to outperform the state of the art on both unsupervised ImageNet synthesis, as well as in the conditional setting. In particular, the proposed approach is able to match the sample quality (as measured by FID) of the current state-of-the-art conditional model BigGAN on ImageNet using only 10% of the labels and outperform it using 20% of the labels.

Our Summary

The Google Research team investigates several directions for reducing the appetite for labeled data in state-of-the-art generative adversarial networks (GANs). In particular, they show that recent advances in self-supervised and semi-supervised learning can substantially reduce the amount of ground-truth label information required for natural image generation while still achieving state-of-the-art results. Namely, they demonstrate that a pre-trained self-supervised approach can match the state-of-the-art performance of BigGAN using only 20% of the labels. Moreover, adding self-supervision during GAN training works even better, matching BigGAN with only 10% of the labels and outperforming it with 20%.

What’s the core idea of this paper?

  • The paper investigates several avenues for reducing the need for labeled data in natural image generation:
    • Pre-trained approaches:
      • Unsupervised clustering-based method, where cluster assignments are used as a replacement for labels.
      • Semi-supervised method, where the above-mentioned approach is extended with a semi-supervised loss.
    • Co-training approaches:
      • Unsupervised method, where the authors experiment with a single label assigned to all examples and random labels assigned to real images.
      • Semi-supervised method, where labels for the unlabeled images are predicted based on the available labels.
      • Self-supervision during GAN training, where the discriminator is augmented with an auxiliary task – self-supervision through rotation prediction.
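
The co-training self-supervision is easy to sketch: every image is rotated by 0, 90, 180, and 270 degrees, and the discriminator receives an auxiliary 4-way head that must predict the rotation. The tiny discriminator below is a placeholder for illustration.

```python
import torch
import torch.nn as nn

def rotate_batch(x):
    """Return the 4 rotated copies of a batch and their rotation labels."""
    rots = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(x.size(0))
    return torch.cat(rots), labels

class DiscWithRotationHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.real_fake = nn.Linear(16, 1)   # usual GAN head
        self.rotation = nn.Linear(16, 4)    # auxiliary self-supervised head

    def forward(self, x):
        h = self.features(x)
        return self.real_fake(h), self.rotation(h)

x = torch.randn(8, 3, 32, 32)
xr, labels = rotate_batch(x)
_, rot_logits = DiscWithRotationHead()(xr)
aux_loss = nn.functional.cross_entropy(rot_logits, labels)
```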

What’s the key achievement?

  • Achieving a new state of the art in unsupervised image generation with the unsupervised clustering-based approach.
  • Getting state-of-the-art performance using 20% labeled data with the pretrained semi-supervised approach.
  • Matching the state-of-the-art BigGAN with only 10% of the labels and outperforming it with 20% of the labels by applying self-supervision during GAN training.

What does the AI community think?

  • The paper was accepted for oral presentation at ICML 2019, one of the key conferences in machine learning.

What are future research areas?

  • The authors suggest the following directions for future work:
    • exploring the applicability of the introduced techniques for even larger and more diverse datasets than ImageNet;
    • investigating the impact of other self-supervised and semi-supervised approaches on the model performance;
    • investigating the impact of self-supervision in other deep generative models.

What are possible business applications?

  • Considering the scarcity of labeled data in the business setting, the presented techniques can be very beneficial for companies seeking to deploy GANs for automated generation of high-quality ad creatives.

Where can you get implementation code?

  • The authors open-source all the code used in their experiments on GitHub.

14. Enabling Hyper-Personalisation: Automated Ad Creative Generation and Ranking for Fashion e-Commerce, by Sreekanth Vempati, Korah T Malayil, Sruthi V, Sandeep R

Original Abstract

A homepage is the first touch point in the customer’s journey and is one of the prominent channels of revenue for many e-commerce companies. A user’s attention is mostly captured by homepage banner images (also called Ads/Creatives). The set of banners shown and their design influence the customer’s interest and play a key role in optimizing the click-through rates of the banners. Presently, massive and repetitive effort is put in, to manually create aesthetically pleasing banner images. Due to the large amount of time and effort involved in this process, only a small set of banners are made live at any point. This reduces the number of banners created as well as the degree of personalization that can be achieved. This paper thus presents a method to generate creatives automatically on a large scale in a short duration. The availability of diverse banners generated helps in improving personalization as they can cater to the taste of larger audience. The focus of our paper is on generating wide variety of homepage banners that can be made as an input for user-level personalization engine. Following are the main contributions of this paper: 1) We introduce and explain the need for large scale banner generation for e-commerce 2) We present on how we utilize existing deep learning-based detectors which can automatically annotate the required objects/tags from the image. 3) We also propose a Genetic Algorithm based method to generate an optimal banner layout for the given image content, input components and other design constraints. 4) Further, to aid the process of picking the right set of banners, we designed a ranking method and evaluated multiple models. All our experiments have been performed on data from Myntra, one of the top fashion e-commerce players in India.

Our Summary

In this paper, the research team from Myntra, one of the leading e-commerce players in India, introduces its approach to the hyper-personalization of homepage banners. Creating banners manually takes many hours: searching through image libraries, selecting font colors, sizes, and typography, transforming images, and finally combining all the elements into an aesthetically appealing banner. As a result, only a few banners are usually available at any time, which limits the degree of personalization. Myntra instead leverages a genetic algorithm-based method that automatically generates banners from a library of design elements. To pick the right set of banners from the generated ones, the team uses a ranking method built on banner metadata. An online A/B test demonstrates that the hyper-personalization enabled by automatic banner generation results in a 72% increase in click-through rate (CTR).


What’s the core idea of this paper?

  • To enable the hyper-personalization of ad banners, they need to be created automatically.
  • The Myntra research team suggests the following pipeline for the automatic generation of ad creatives such as homepage banners:
    • large-scale automated annotation of all available images, tagging each of them with the relevant data (a generic detector sketch follows this list);
    • feeding the annotated data to the layout-generation module and then to the creation module (see the genetic-algorithm sketch after the figure below);
    • re-ranking the generated banners with a model built on historical data.
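
To illustrate the annotation step, here is a minimal sketch of tagging an image with an off-the-shelf detector (a generic stand-in, not the detectors Myntra uses), assuming torchvision; the 0.7 confidence threshold is an arbitrary choice:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Load a generic pretrained detector (COCO classes) as a stand-in annotator.
model = fasterrcnn_resnet50_fpn(pretrained=True).eval()

image = torch.rand(3, 400, 600)        # placeholder image tensor, values in [0, 1]
with torch.no_grad():
    (pred,) = model([image])           # one prediction dict per input image

# Keep confident detections as tags: (class id, score, bounding box).
tags = [(int(label), float(score), box.tolist())
        for label, score, box in zip(pred["labels"], pred["scores"], pred["boxes"])
        if score > 0.7]
```
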
End-to-end pipeline for automatic creation of banners
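
And here is a minimal genetic-algorithm sketch of the layout step (our assumptions, not Myntra's code): a chromosome holds the (x, y) positions of three banner elements, and a toy fitness function that penalizes overlapping elements stands in for the paper's design constraints.

```python
import random

W, H = 1024, 400                                  # canvas size in pixels
ELEM = [(300, 80), (120, 120), (200, 60)]         # (width, height) of text, logo, CTA

def random_layout():
    """One chromosome: an (x, y) position for each banner element."""
    return [(random.uniform(0, W - w), random.uniform(0, H - h)) for w, h in ELEM]

def overlap(a, sa, b, sb):
    """Intersection area of two axis-aligned rectangles."""
    ax, ay = a
    bx, by = b
    dx = max(0, min(ax + sa[0], bx + sb[0]) - max(ax, bx))
    dy = max(0, min(ay + sa[1], by + sb[1]) - max(ay, by))
    return dx * dy

def fitness(layout):
    # Toy objective: penalize overlapping elements (stand-in for real constraints).
    penalty = sum(overlap(layout[i], ELEM[i], layout[j], ELEM[j])
                  for i in range(len(layout)) for j in range(i + 1, len(layout)))
    return -penalty

def crossover(p1, p2):
    """Child takes each element's position from one parent at random."""
    return [random.choice(pair) for pair in zip(p1, p2)]

def mutate(layout, rate=0.1):
    """Occasionally resample an element's position."""
    return [(random.uniform(0, W - w), random.uniform(0, H - h))
            if random.random() < rate else pos
            for pos, (w, h) in zip(layout, ELEM)]

population = [random_layout() for _ in range(50)]
for _ in range(100):                              # fixed number of generations
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                     # truncation selection
    population = parents + [mutate(crossover(*random.sample(parents, 2)))
                            for _ in range(40)]
best = max(population, key=fitness)               # highest-scoring layout found
```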

What’s the key achievement?

  • Introducing a novel approach to the automatic generation of ad creatives which, according to the experiments:
    • results in a significant CTR increase (by 72%);
    • includes a ranking model that evaluates generated banners in line with human judgment (a minimal ranking sketch follows).
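
For intuition on the ranking step, the sketch below trains a simple click-probability model on banner metadata, assuming scikit-learn; the paper evaluated multiple models, and the feature names and toy numbers here are hypothetical placeholders:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy historical data: each row holds hypothetical banner metadata, e.g.
# [num_text_elements, dominant_hue, layout_symmetry, brand_popularity].
X_hist = np.array([[2, 0.61, 0.9, 0.3],
                   [4, 0.12, 0.4, 0.8],
                   [3, 0.45, 0.7, 0.5],
                   [1, 0.80, 0.6, 0.2]])
y_hist = np.array([1, 0, 1, 0])               # 1 = banner performed well historically

model = LogisticRegression().fit(X_hist, y_hist)

# Score newly generated banners and serve the highest-ranked ones first.
X_new = np.array([[2, 0.50, 0.8, 0.4],
                  [5, 0.20, 0.3, 0.9]])
scores = model.predict_proba(X_new)[:, 1]     # predicted probability of a click
ranked = np.argsort(-scores)                  # banner indices, best first
```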

What does the AI community think?

  • The paper was presented during the Workshop on Recommender Systems in Fashion within RecSys 2019, the 13th ACM Conference on Recommender Systems.

What are future research areas?

  • Exploring the opportunity to further boost personalization by performing online ranking via reinforcement learning.

What are possible business applications?

  • The introduced approach to the automatic generation of aesthetically appealing banners can be leveraged by e-commerce companies, social messaging platforms, and online video content providers.

15. Towards Controllable and Personalized Review Generation, by Pan Li and Alexander Tuzhilin

Original Abstract

In this paper, we propose a novel model RevGAN that automatically generates controllable and personalized user reviews based on the arbitrarily given sentimental and stylistic information. RevGAN utilizes the combination of three novel components, including self-attentive recursive autoencoders, conditional discriminators, and personalized decoders. We test its performance on the several real-world datasets, where our model significantly outperforms state-of-the-art generation models in terms of sentence quality, coherence, personalization and human evaluations. We also empirically show that the generated reviews could not be easily distinguished from the organically produced reviews and that they follow the same statistical linguistics laws.

Our Summary

Only a small fraction of users take the time to write reviews. To encourage users to provide feedback, the researchers from New York University suggest a model for generating controllable and personalized customer reviews. This model, called RevGAN, generates high-quality user reviews based on product descriptions, sentiment labels, and previous reviews. The model has three components: a Self-Attentive Recursive Autoencoder for capturing the hierarchical structure and semantic meaning of user reviews, a Conditional Discriminator for controlling the sentiment and quality of the generated reviews, and a Personalized Decoder for reproducing each user's writing style. The experiments on different datasets show that RevGAN significantly outperforms strong baselines and generates reviews that are hard to distinguish from organic ones.

Self-Attentive Recursive AutoEncoder

What’s the core idea of this paper?

  • The paper seeks to provide an additional tool for encouraging users to leave feedback after purchasing a product or service on an online marketplace.
  • To this end, the authors propose a novel model RevGAN for automated generation of high-quality and personalized user reviews. The model consists of three main components:
    • A Self-Attentive Recursive Autoencoder that maps users’ reviews and product descriptions into continuous embeddings to capture the hierarchical structure and semantic meaning of textual information;
    • A Conditional Discriminator that controls the sentiment of the generated reviews by conditioning the discriminator on the sentiment label (see the sketch below);
    • A Personalized Decoder that decodes the generated review embeddings by taking into account the personalized writing style of the user as captured from the user’s historical reviews.
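
To make the conditioning idea concrete, here is a minimal sketch of a conditional discriminator over review embeddings, assuming PyTorch; all dimensions, names, and the three-way sentiment scheme are our assumptions rather than the authors' implementation:

```python
import torch
import torch.nn as nn

class ConditionalDiscriminator(nn.Module):
    """Judges real vs. fake review embeddings *given* a requested sentiment."""
    def __init__(self, review_dim=256, n_sentiments=3, sent_dim=16):
        super().__init__()
        self.sentiment_emb = nn.Embedding(n_sentiments, sent_dim)
        self.classifier = nn.Sequential(
            nn.Linear(review_dim + sent_dim, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 1),                    # real/fake logit
        )

    def forward(self, review_emb, sentiment):
        # review_emb: (batch, review_dim) embeddings from the autoencoder;
        # sentiment:  (batch,) integer labels, e.g. 0=negative, 1=neutral, 2=positive.
        cond = torch.cat([review_emb, self.sentiment_emb(sentiment)], dim=-1)
        return self.classifier(cond)

disc = ConditionalDiscriminator()
logits = disc(torch.randn(4, 256), torch.tensor([0, 1, 2, 1]))
```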

What’s the key achievement?

  • Introducing a novel model for controllable and personalized review generation that:
    • statistically and empirically outperforms state-of-the-art baselines with respect to sentence quality and coherence;
    • automatically generates reviews that are very similar to the organically-generated ones in terms of style and content.

What does the AI community think?

  • The paper was presented at EMNLP 2019, one of the leading conferences in natural language processing.

What are future research areas?

  • Exploring the methods for generating reviews based on several keywords provided by a user.
  • Developing novel methods for distinguishing automatically generated reviews from the organic ones.

What are possible business applications?

  • The introduced approach to automated review generation may help companies encourage customers to provide feedback even when they don’t have time or don’t want to write a review: customers can simply edit or approve an automatically generated draft instead of writing one from scratch.

To access all of the available research summaries in our AI for Marketing series, check out the other articles in the series, which cover the latest AI & machine learning approaches to 5 aspects of marketing automation.

Enjoy this article? Sign up for more AI for marketing research updates.

We’ll let you know when we release more summary articles like this one.

Source: https://www.topbots.com/ai-marketing-research-papers-2020/
