

China Roundup: facial recognition lawsuit and cashless payments for foreigners



Hello and welcome back to TechCrunch’s China Roundup, a digest of recent events shaping the Chinese tech landscape and what they mean to people in the rest of the world. This week, a lawsuit sparked a debate over the deployment of China’s pervasive facial recognition; meanwhile, in some good news, foreigners in China can finally use cashless payments just like locals.

China’s first lawsuit against face scans

Many argue that China holds an unfair advantage in artificial intelligence because its citizens readily give up the personal data that tech companies want. But at least some people are becoming more privacy-conscious.

This week, a Chinese law professor filed what appears to be the country’s first lawsuit against the use of AI-powered face scans, according to Qianjiang Evening News, a local newspaper in the eastern province of Zhejiang. At issue is a privately owned zoo’s decision to require facial recognition at the entrance for all annual pass holders.

“I’ve always been conservative about gathering facial biometrics data. The collection and use of facial biometrics involve very uncertain security risks,” the professor told the paper, adding that he would nonetheless accept such a requirement from the government for the purpose of “public interest.”

Both the government and businesses in China have aggressively embraced facial recognition in a wide range of scenarios, be it to aid public security checks or to speed up payments at supermarket checkouts. The technology will certainly draw more public scrutiny as it continues to spread. Already, the zoo case is garnering considerable attention. On Weibo, China’s equivalent of Twitter, posts about the suit have generated some 100 million views and 10,000 comments in less than a week. Many share the professor’s concerns over potential leaks and data abuse.

Scan and pay like a local

The other technology that has become ubiquitous in China is cashless payments. For many years, foreign visitors without a Chinese bank account have not been able to participate in the scan-and-pay craze that has received extensive coverage in the West. But the fences are now down.

This week, two of the country’s largest payment systems announced almost simultaneously that they are making it easier for foreigners to pay through their smartphones. Visitors can now pay at a selection of Chinese merchants after linking their overseas credit cards backed by Visa, MasterCard, American Express, Discover Global Network or JCB to Tencent’s WeChat Pay.

“This is to provide travelers, holding 2.6 billion Mastercard cards around the world, with the ability to make simple and smart payments anytime, anywhere in China,” Mastercard said in a company statement.

Alipay, Alibaba’s affiliate, now also allows foreign visitors to use their international credit or debit cards to top up RMB onto a prepaid virtual card issued by Bank of Shanghai. The move is a boon to the large number of foreign tourists in China, who numbered 141 million in 2018.

Also worth your attention

Didi’s controversial carpooling service is finally back this week, more than a year after the feature was suspended following the murders of two female passengers. But the company, which has become synonymous with ride-hailing, was immediately in the hot seat again. The relaunched feature notably included a curfew on women, who could only carpool between 5 a.m. and 8 p.m. The public lambasted the decision as humiliating and discriminatory against women, and Didi responded swiftly by extending the limit to both women and men. The murders triggered a huge backlash against the company, and it has since tried to allay public concerns. At this point, the ride-hailing giant simply can’t afford another publicity debacle.

The government moved to stamp out monopolistic practices by some of China’s largest e-commerce platforms ahead of Singles’ Day, the country’s busiest shopping festival. Merchants have traditionally been forced to supply exclusively to one of these giants, but Beijing wants to put a stop to it and summoned Alibaba, Pinduoduo (in Chinese) and other major retail players for talks on anti-competitive practices this week.

Iqiyi, often hailed as the “Netflix of China,” reported a wider net loss of $516.0 million for the third quarter ended September 30. The good news is that it added 25 million new subscribers to its video streaming platform. Paying members now make up 99.2% of its 105.8 million user base.

36Kr, one of China’s most prominent tech news sites, saw its shares tumble 10% in its Nasdaq debut on Friday. The company generates revenue from subscriptions, advertisements and enterprise “value-added” services. The last segment, according to its prospectus, is designed to “help established companies increase media exposure and brand awareness.”



Maximizing NLP model performance with automatic model tuning in Amazon SageMaker




The field of Natural Language Processing (NLP) has had many remarkable breakthroughs in the past two years. Advanced deep learning models are raising the state-of-the-art performance standards for NLP tasks. To benefit from newly published NLP models, the best approach is to apply a pre-trained language model to a new dataset and fine-tune it for a specific NLP task. This approach is known as transfer learning. Compared to training a model from scratch, it can significantly reduce the resources required for model training, and it can produce decent results even with small amounts of training data.

With the rapid growth of NLP techniques, many NLP frameworks with packaged pre-trained language models have been developed to give users easy access to transfer learning. For example, ULMFiT and Hugging Face’s PyTorch-Transformers are two popular NLP frameworks with pre-trained language models.

This post shows how to fine-tune NLP models using PyTorch-Transformers in Amazon SageMaker and apply the built-in automatic model tuning capability to two NLP datasets: the Microsoft Research Paraphrase Corpus (MRPC) [1] and the Stanford Question Answering Dataset (SQuAD) 1.1 [2]. PyTorch-Transformers is a library with a collection of state-of-the-art pre-trained language models, including BERT, XLNet, and GPT-2. This post uses its pre-trained BERT (Bidirectional Encoder Representations from Transformers) uncased base model, which Google developed [3].

About this blog post
Time to read: 10 minutes
Time to complete: ~20 hours
Cost to complete: ~$160 (at publication time)
Learning level: Advanced (300)
AWS services: Amazon SageMaker, Amazon S3

The detailed step-by-step code walkthrough can be found in the example notebook.

This post demonstrates how to do the following:

  • Run PyTorch-Transformers code from GitHub, which can also run on your local machine, in Amazon SageMaker using the SageMaker PyTorch framework.
  • Optimize hyperparameters using automatic model tuning in Amazon SageMaker.
  • View the sensitivity of NLP models to hyperparameter values.

Setting up PyTorch-Transformers

This post uses an ml.t2.medium notebook instance. For a general introduction to Amazon SageMaker, see Get Started with Amazon SageMaker.

This post uses the MRPC dataset from the General Language Understanding Evaluation (GLUE) benchmark as an example to walk through the key steps for onboarding PyTorch-Transformers into Amazon SageMaker and fine-tuning the model with the SageMaker PyTorch container. To set up PyTorch-Transformers, complete the following steps:

  1. Download the GLUE data by running the script on the GitHub repo and unpacking it to the directory $GLUE_DIR. The following example code in the Jupyter notebook downloads the data and saves it to a folder named glue_data:
    !python --data_dir glue_data --tasks all

  2. Copy the PyTorch-Transformer GitHub code to a local directory in the Amazon SageMaker notebook instance. See the following code:
    !git clone

    After these two steps, you should have a data folder named glue_data and a script folder named pytorch-transformers in the directory of your Jupyter notebook.

  3. Fine-tune the BERT model on the MRPC dataset. Use the script provided in /pytorch-transformers/examples/ as the training script for the PyTorch estimator in Amazon SageMaker.

Before you create the estimator, make the following changes:

  • Check and modify the argparse code in the training script so that the SageMaker estimator can pass input arguments as hyperparameters. For example, you may want to treat do_train and do_eval as hyperparameters and pass a boolean value to them when the PyTorch estimator is called in Amazon SageMaker.
  • Create a requirements.txt file in the same directory as the training script. The requirements.txt file should include packages required by the training script that are not already installed by default in the Amazon SageMaker PyTorch container. For example, pytorch_transformers is one of the packages you need to install.

A requirements.txt file is a text file that lists required packages to be installed by a Python package installer, in this case pip. When you launch training jobs, the Amazon SageMaker container automatically looks for a requirements.txt file in the script source folder and uses pip install to install the packages listed in that file.
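A minimal sketch of the argparse change from the first bullet above. It assumes the training script declares `--do_train` and `--do_eval` as bare `action='store_true'` flags, which cannot receive the explicit `--name value` pairs that SageMaker forwards for each hyperparameter. The helper names `str2bool` and `build_parser` and the `--learning_rate` default are hypothetical and for illustration only.

```python
import argparse


def str2bool(value):
    """Convert a hyperparameter string such as 'True' or 'false' into a bool."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser()
    # Accept an explicit value (e.g. "--do_train True") instead of a bare flag,
    # so the setting can be supplied through the estimator's hyperparameters.
    parser.add_argument("--do_train", type=str2bool, default=False)
    parser.add_argument("--do_eval", type=str2bool, default=False)
    # Illustrative numeric hyperparameter that a tuning job could vary.
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    return parser
```

With this change, `{'do_train': True, 'do_eval': True}` in the estimator's hyperparameters dictionary reaches the script as `--do_train True --do_eval True` and parses to real booleans.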

Fine-tuning the NLP model in Amazon SageMaker

After downloading the data and preparing the training script, you are ready to fine-tune the NLP model in Amazon SageMaker. To launch a training job, complete the following steps:

  1. Upload the data to Amazon S3. See the following code:
    inputs = sagemaker_session.upload_data(path=data_dir, bucket=bucket, key_prefix=s3_prefix)

  2. Configure the training script environment variables and hyperparameter initial settings. See the following code:
    container_data_dir = '/opt/ml/input/data/training'
    container_model_dir = '/opt/ml/model'

    parameters = {
        'model_type': 'bert',
        'model_name_or_path': 'bert-base-uncased',
        'task_name': task_name,
        'data_dir': container_data_dir,
        'output_dir': container_model_dir,
        'num_train_epochs': 3,
        'per_gpu_train_batch_size': 64,
        'per_gpu_eval_batch_size': 64,
        'save_steps': 150,
        'logging_steps': 150,
        # you can add more input arguments here
    }

    In the preceding code, you pass the container directories for input data (/opt/ml/input/data/training) and model artifacts (/opt/ml/model) to the training script. You provide directories relative to the container, instead of local directories, because Amazon SageMaker runs a training job in a Docker container. To launch a training job, you need to pass the training data’s S3 path to the function. During the container creation process, Amazon SageMaker automatically downloads the S3 data and saves it to the directories defined by the container’s environment variables.

    It is important to know the correct model input and output locations in an Amazon SageMaker Docker container. For model input, use /opt/ml/input/data/channel_name/, where the user provides the channel_name (for example, training or testing). For model artifacts, use /opt/ml/model/. For more information, see Amazon SageMaker Containers: a Library to Create Docker Containers and Building your own algorithm container in the GitHub repo.

    The training script loads both training and validation data from the same directory, so you only need to define one data directory and label the channel name as training. The model logging file and trained model artifacts are saved to the directory /opt/ml/model/ in the Docker container and are uploaded to S3 when the training job is complete.

  3. Create a PyTorch estimator and launch a training job. See the following code:
    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(entry_point='',
                        source_dir='./pytorch-transformers/examples/',
                        hyperparameters=parameters,
                        role=role,
                        framework_version='1.1.0',
                        train_instance_count=1,
                        train_instance_type='ml.p3.2xlarge')
    {'training': inputs})

    The training job for the preceding example takes approximately five minutes to complete. You should see the training job listed in the Training jobs section of the Amazon SageMaker console. By choosing the training job, you can see detailed information about the training run, including Amazon CloudWatch logs and the S3 location link for model output and artifacts.
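In step 2, the container paths are passed to the training script explicitly as hyperparameters. A training script can also discover the same locations at run time. The following sketch assumes the SageMaker training toolkit's `SM_CHANNEL_<NAME>` and `SM_MODEL_DIR` environment variables (check them against your container version) and falls back to the default /opt/ml paths described above. `resolve_paths` is a hypothetical helper, not part of PyTorch-Transformers.

```python
import os


def resolve_paths(channel_name="training"):
    """Return (data_dir, model_dir) inside a SageMaker training container.

    Prefers the SM_CHANNEL_<NAME> and SM_MODEL_DIR environment variables
    (assumed to be set by the container) and otherwise falls back to the
    default locations under /opt/ml.
    """
    channel_var = "SM_CHANNEL_" + channel_name.upper()
    default_data_dir = os.path.join("/opt/ml/input/data", channel_name)
    data_dir = os.environ.get(channel_var, default_data_dir)
    model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    return data_dir, model_dir
```

Reading the paths this way keeps the script usable both locally and in the container, at the cost of relying on variables that are not visible in the hyperparameters dictionary.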

Launching automatic model tuning

Test the training setup by completing one full training job without errors. Then you can launch an automatic model tuning job that uses Bayesian optimization with the following steps:

  1. Define an optimization metric. Amazon SageMaker supports predefined metrics that it can read automatically from the training CloudWatch log, which exist for built-in algorithms (such as XGBoost) and frameworks (such as TensorFlow or MXNet). When you use your own training script, you need to tell Amazon SageMaker how to extract your metric from the log with a simple regular expression. See the following code:
    metric_definitions = [{'Name': 'f1_score', 'Regex': r"'f1_': ([0-9\.]+)"}]

    Modify the training script to make sure the model evaluation results are printed to the CloudWatch log. For this post, use the F1 score as the optimization metric for automatic model tuning.

  2. Define the hyperparameter range. See the following code:
    from sagemaker.tuner import ContinuousParameter

    hyperparameter_ranges = {
        'learning_rate': ContinuousParameter(5e-06, 5e-04, scaling_type="Logarithmic")
    }

    For large NLP models, it is better to limit the tuning job to one or two hyperparameters at a time, so the Bayesian optimization is stable and can converge faster. Also keep in mind that different hyperparameter values could require different computing resources. For example, the batch size for deep learning models directly impacts the amount of CPU/GPU memory you need during model training. Setting a hyperparameter range that is compatible with the EC2 training instance capacity is good practice, and helps provide a smooth model tuning experience.

  3. Launch the hyperparameter tuning job. See the following code:
    from time import gmtime, strftime

    from sagemaker.tuner import HyperparameterTuner

    objective_metric_name = 'f1_score'
    tuner = HyperparameterTuner(estimator,
                                objective_metric_name,
                                hyperparameter_ranges,
                                metric_definitions,
                                strategy='Bayesian',
                                objective_type='Maximize',
                                max_jobs=30,
                                max_parallel_jobs=3,
                                early_stopping_type='Auto')
    tuning_job_name = "pt-bert-mrpc-{}".format(strftime("%d-%H-%M-%S", gmtime()))
    {'training': inputs}, job_name=tuning_job_name)
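Because SageMaker extracts the objective only through the regular expression in step 1, it can be worth checking the pattern locally before launching 30 training jobs. The following sketch assumes the training script prints its evaluation results as a Python dictionary with an `f1_` key; `extract_f1` is a hypothetical helper, and the metric values are made up for illustration.

```python
import re

# Same pattern as the metric definition in step 1, written as a raw string.
F1_REGEX = r"'f1_': ([0-9\.]+)"


def extract_f1(log_text):
    """Return every F1 value the pattern finds in a block of log text."""
    return [float(match) for match in re.findall(F1_REGEX, log_text)]


# Hypothetical evaluation results, printed the way a training script might
# print its results dictionary to the log.
results = {"acc_": 0.5, "f1_": 0.75}
log_line = str(results)
print(log_line)              # {'acc_': 0.5, 'f1_': 0.75}
print(extract_f1(log_line))  # [0.75]
```

If the list comes back empty for a real log line, the tuning job will have no objective value to optimize for that job.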

You can monitor the tuning job’s progress in the Amazon SageMaker console. For more information, see Monitor the Progress of a Hyperparameter Tuning Job.
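Step 2 sets `scaling_type="Logarithmic"` for the learning rate. The sketch below illustrates why that matters for a range such as 5e-6 to 5e-4, which spans two orders of magnitude. It compares plain uniform draws with draws that are uniform in log space. It does not reproduce SageMaker's Bayesian search; it only shows how the two scalings spread candidates across the range. The helper names are hypothetical.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw one value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def fraction_below(samples, threshold):
    """Share of samples strictly below the threshold."""
    return sum(s < threshold for s in samples) / len(samples)


rng = random.Random(0)
low, high = 5e-6, 5e-4  # learning-rate range from step 2
n = 10_000
log_samples = [sample_log_uniform(low, high, rng) for _ in range(n)]
lin_samples = [rng.uniform(low, high) for _ in range(n)]

# 5e-5 is the geometric midpoint of the range: one decade on each side.
print(fraction_below(log_samples, 5e-5))  # close to 0.5
print(fraction_below(lin_samples, 5e-5))  # close to 0.09
```

With linear scaling, only about 9% of candidates would land in the lower decade, where the MRPC and SQuAD optima reported below (about 6.5e-5 and 5.7e-5) sit near the boundary.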

Automatic model tuning results for GLUE dataset MRPC

You can easily extract the hyperparameter tuning results into a dataframe for further analysis. See the following code:

import sagemaker

tuner_metrics = sagemaker.HyperparameterTuningJobAnalytics(tuning_job_name).dataframe()

The following graphs show the progression of the learning rate and the model F1 score over training jobs, ordered by training job start time. The Bayesian optimization process uses three parallel jobs for 10 iterations (a total of 30 training jobs). Automatic model tuning took 15 training jobs (five iterations) to find an optimal learning rate and then fine-tuned the learning rate around a value of 6.5e-5 to maximize the F1 score.
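A minimal sketch of how the tuning dataframe could be reduced to the progression described above, without plotting code. It assumes the dataframe exposes `FinalObjectiveValue` and `TrainingStartTime` columns plus one column per tuned hyperparameter, and that you convert it with `tuner_metrics.to_dict("records")`; verify the column names in your own output. `summarize_tuning` is a hypothetical helper.

```python
import math


def summarize_tuning(records, param="learning_rate",
                     objective="FinalObjectiveValue", start="TrainingStartTime"):
    """Order finished tuning jobs by start time and return the best job.

    `records` is a list of dicts, one per training job. Jobs without an
    objective value (None or NaN, e.g. failed or stopped jobs) are skipped.
    Returns (best_record, [(param_value, objective_value), ...]).
    """
    def finished(record):
        value = record.get(objective)
        return value is not None and not math.isnan(value)

    completed = [r for r in records if finished(r)]
    if not completed:
        return None, []
    ordered = sorted(completed, key=lambda r: r[start])
    best = max(completed, key=lambda r: r[objective])
    progression = [(r[param], r[objective]) for r in ordered]
    return best, progression


# Usage, assuming `tuner_metrics` from the snippet above:
# best, progression = summarize_tuning(tuner_metrics.to_dict("records"))
```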

The following graph plots the F1 score vs. learning rate and illustrates the model’s sensitivity to hyperparameters. The results are from the MRPC dev dataset using the pre-trained BERT uncased base model.

The F1 score can vary widely, between 0.75 and 0.92, for learning rates between 1e-5 and 5e-4. The graph also shows an optimal learning rate of 6.47e-5, where the F1 score peaks at 0.918 on the validation dataset. Most of the training jobs (22 of 30) were run near the optimal learning rate, indicating that the Bayesian optimization algorithm searched efficiently.

Automatic model tuning results for SQuAD dataset

You can run a similar automatic model tuning job for the SQuAD 1.1 dataset, a collection of 100K crowd-sourced question and answer pairs. Fine-tuning on SQuAD demonstrates hyperparameter tuning for another NLP task and for a dataset larger than MRPC, which is a collection of approximately 5,000 sentence pairs. The code details can be found in the notebook.

The following graphs show the hyperparameter tuning progression in the same way as the MRPC results. The training uses slightly different settings: two parallel jobs for 15 iterations (30 jobs in total). Again, Bayesian optimization quickly finds an optimal learning rate (around 5.7e-5) after eight jobs (four iterations). There is a spike at job 20, but the tuning jobs then quickly converge back toward the optimal learning rate. The spike may be due to randomization in the Bayesian optimization algorithm, which helps the search avoid getting stuck in a local optimum.

Similar to the MRPC case, the SQuAD model also has a strong sensitivity to hyperparameter values. The following graph of F1 vs. learning rate shows a nice parabolic shape, with the optimal learning rate at 5.73e-5. The corresponding F1 score is 0.884 and the exact match (EM) is 0.812, which are similar to the original BERT paper reporting of F1 at 0.885 and EM at 0.808 for the SQuAD 1.1 dev dataset.

We used a smaller batch size (16) and one epoch in model tuning, compared to a batch size of 32 and three epochs as used in the BERT paper [3].

Cleaning up

To prevent any additional charges, stop the notebook instance and delete the model artifacts saved in S3.


This post showed you how to fine-tune NLP models using Hugging Face’s PyTorch-Transformers library in Amazon SageMaker, and demonstrated the effectiveness of the built-in automatic model tuning in Amazon SageMaker for maximizing model performance through hyperparameter optimization. This approach makes it easy to adopt state-of-the-art language models for NLP problems and to improve their accuracy by using Amazon SageMaker’s automatic model tuning capability.


[1] Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. “GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding.” arXiv preprint arXiv:1804.07461 (2018)

[2] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. “SQuAD: 100,000+ Questions for Machine Comprehension of Text.” arXiv preprint arXiv:1606.05250 (2016)

[3] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” arXiv preprint arXiv:1810.04805 (2018)

 About the authors

Jason Zhu is a Data Scientist with AWS Professional Services focusing on helping customers use machine learning. In his spare time, he enjoys being outdoors and growing his capabilities as a cook.

Xiaofei Ma is an Applied Scientist with AWS AI Labs focusing on developing machine learning-based services for AWS customers. In his spare time, he enjoys reading and traveling.

Kyle Brubaker is a Data Scientist with AWS Professional Services, where he works with customers to develop and implement machine learning solutions on AWS. He enjoys playing soccer, surfing, and finding new shows to watch.




Jeff Bezos’ DMs hint Saudi crown prince knew private info from hack, report says




Who would dare get all up in Bezos’ business?
Image: Alex Wong / Getty Images

It sure looks like Saudi crown prince Mohammed bin Salman spent some time poking around Jeff Bezos’ private business after the Amazon CEO’s phone was reportedly hacked.   

According to the New York Times, investigators have pieced together a timeline and determined that, after the hack, the Saudis learned about Bezos’ private affair with Lauren Sanchez. 

In the spring of 2018, about a month after first meeting bin Salman at a dinner in Los Angeles and exchanging numbers with him, Bezos received a video from bin Salman’s number on WhatsApp that featured “an image of Saudi and Swedish flags overlaid with Arabic text.” After that, investigators note, “the amount of data exiting his phone increased almost three hundredfold.”

While the report doesn’t say whether Bezos clicked the video, investigators suggest he did, giving bad actors access to his phone. And two messages the crown prince sent later suggest he used the hack to look at Bezos’ private information.

The first message was from November 2018, when the crown prince sent Bezos a meme that featured “a woman who strongly resembled Lauren Sanchez,” according to the Times. Sanchez was the woman with whom Bezos was having an affair at the time of the message.

The text of the image read, “Arguing with a woman is like reading the software license agreement. In the end you have to ignore everything and click I agree.” 

Apparently, not even the crown prince of Saudi Arabia can withstand the allure of stale, outdated, “Women, am I right?” jokes. 

Still, the affair was private, and only someone who had access to Bezos’ private messages would have known about it, according to the report.

Bezos and wife MacKenzie announced their divorce in January 2019, and his affair with Sanchez became public shortly afterward. Then, in February 2019, Bezos published a blog post that claimed the National Enquirer was trying to extort him over text messages and dick pics related to his affair.

In his open letter about the extortion attempt, Bezos accused David Pecker, the head of American Media, Inc (AMI), which owns the Enquirer, of pursuing some shady business deals with Saudi Arabia. Bezos also mentioned that Pecker, a close ally of President Donald Trump (who also had a cozy relationship with bin Salman), was “apoplectic” over coverage from the Bezos-owned Washington Post of the October 2018 murder of Post columnist Jamal Khashoggi, allegedly on bin Salman’s orders.

(It’s also worth noting that, in late March 2019, Gavin de Becker, one of Bezos’ investigators, re-emphasized Saudi Arabia’s role in the extortion attempt in an op-ed in The Daily Beast.)

All of this had unfolded when the second message that investigators flagged was sent. On Feb. 14, 2019, Bezos reportedly had phone calls with advisors about the alleged campaign against him from Saudi Arabia. Two days after the call, bin Salman slid into Bezos’ DMs with the message, “there is nothing against you or Amazon from me or Saudi Arabia.” 

While it’s true this could be a coincidence given the public nature of Bezos’ accusations at this point, investigators still noted the timing. 

For its part, Saudi Arabia is claiming it had nothing to do with any of this, of course.

We’ve also reached out to Amazon for comment on the new report.




Unearth the future of agriculture at TC Sessions: Robotics+AI with the CEOs of Traptic, Farmwise and Pyka




Farming is one of the oldest professions, but today those amber waves of grain (and soy) are a test bed for sophisticated robotic solutions to problems farmers have had for millennia. Learn about the cutting edge (sometimes literally) of agricultural robots at TC Sessions: Robotics+AI on March 3 with the founders of Traptic, Pyka, and Farmwise.

You may remember Traptic and its co-founder and CEO, Lewis Anderson, from Disrupt SF 2019, where the company was a finalist in the Startup Battlefield. Traptic has developed a robotic berry picker that identifies ripe strawberries and plucks them off the plants with a gentle grip. It could be the beginning of a new automated era for the fruit industry, which is decades behind grains and other crops when it comes to machine-based harvesting.

Farmwise has a job that’s equally delicate yet involves rough treatment of the plants — weeding. Its towering machine trundles along rows of crops, using computer vision to locate and remove invasive plants, working 24/7, 365 days a year. CEO Sebastian Boyer will speak to the difficulty of this task and how he plans to evolve the machines to become “doctors” for crops, monitoring health and spontaneously removing pests like aphids.

Pyka’s robot is considerably less earthbound than those: an autonomous, all-electric crop-spraying aircraft — with wings! This is a much different challenge from the more stable farming and spraying drones like those of DroneSeed and SkyX, but the choice gives the craft more power and range, hugely important for today’s vast fields. Co-founder Michael Norcia can speak to that scale and his company’s methods of meeting it.

These three companies and founders are at the very frontier of what’s possible at the intersection of agriculture and technology, so expect a fruitful conversation.

$150 Early Bird savings end on Feb. 14! Book your $275 Early Bird Ticket today and put that extra money in your pocket.

Students, grab your super discounted $50 tickets right here. You might just meet your future employer/internship opportunity at this event.

Startups, we only have 5 demo tables left for the event. Book your $2200 demo table here and get in front of some of today’s leading names in the biz. Each table comes with 4 tickets to attend the show.

