AI

How Euler Hermes detects typo squatting with Amazon SageMaker

This is a guest post from Euler Hermes. In their own words, “For over 100 years, Euler Hermes, the world leader in credit insurance, has accompanied its clients to provide simpler and safer digital products, thus becoming a key catalyzer in the world’s commerce.”

Euler Hermes manages more than 600,000 B2B transactions per month and performs data analytics on data from more than 30 million companies worldwide. At-scale artificial intelligence and machine learning (ML) have become the heart of the business.

Euler Hermes uses ML across a variety of use cases. One recent example is typo squatting detection, which came about after an ideation workshop between the Cybersecurity and IT Innovation teams to better protect clients. As it turns out, moving from idea to production has never been easier when your data is in the AWS Cloud and you can put the right tools in the hands of your data scientists in minutes.

Typo squatting, also known as URL hijacking, is a form of cybersecurity attack. It consists of registering internet domain names that closely resemble legitimate, reputable, and well-known ones, with the goal of enabling phishing scams, identity theft, advertising, and malware installation, among other threats. Typo-squatted domains can take many forms, including different top-level domains (TLDs), typos, misspellings, combo squatting, or differently phrased domains.

The challenge we faced was building an ML solution to quickly detect any suspicious domains registered that could be used to exploit the Euler Hermes brand or its products.

To simplify the ML workflow and reduce time-to-market, we opted to use Amazon SageMaker. This fully managed AWS service was a natural choice due to the ability to easily build, train, tune, and deploy ML models at scale without worrying about the underlying infrastructure while being able to integrate with other AWS services such as Amazon Simple Storage Service (Amazon S3) or AWS Lambda. Furthermore, Amazon SageMaker meets the strict security requirements necessary for financial services companies like Euler Hermes, including support for private notebooks and endpoints, encryption of data in transit and at rest, and more.

Solution overview

To build and tune ML models, we used Amazon SageMaker notebooks as the main working tool for our data scientists. The idea was to train an ML model to recognize domains related to Euler Hermes. To accomplish this, we worked on the following two key steps: dataset construction and model building.

Dataset construction

Every ML project requires a lot of data, and our first objective was to build the training dataset.

The dataset of negative examples was composed of 1 million entries randomly picked from Alexa, Umbrella, and publicly registered domains, whereas the dataset of 1 million positive examples was created with a domain generation algorithm (DGA) using Euler Hermes's internal domains.
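
To illustrate the idea, the following sketch generates typo-squatted candidate domains from a seed domain, in the spirit of a DGA. The seed domain, the variant rules, and the TLD list are assumptions for illustration only, not Euler Hermes's actual generator.

import itertools
import random

# Hypothetical seed domains and variant rules; the actual DGA used by Euler Hermes
# is internal and not described in this post.
SEED_DOMAINS = ["eulerhermes.com"]
TLDS = ["com", "net", "org", "io", "co"]

def char_swaps(name):
    """Yield variants with two adjacent characters swapped (typos)."""
    for i in range(len(name) - 1):
        chars = list(name)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        yield "".join(chars)

def char_drops(name):
    """Yield variants with one character removed (misspellings)."""
    for i in range(len(name)):
        yield name[:i] + name[i + 1:]

def generate_candidates(domain):
    """Combine character-level variants with alternative TLDs."""
    name = domain.split(".")[0]
    variants = set(itertools.chain(char_swaps(name), char_drops(name)))
    return {f"{variant}.{tld}" for variant in variants for tld in TLDS}

positives = set()
for seed in SEED_DOMAINS:
    positives |= generate_candidates(seed)

print(random.sample(sorted(positives), 5))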

Model building and tuning

One of the project’s biggest challenges was to decrease the number of false positives to a minimum. On a daily basis, we need to unearth domains related to Euler Hermes from a large dataset of approximately 150,000 publicly registered domains.

We tried two approaches: classical ML models and deep learning.

We considered various models for classical ML, including random forest, logistic regression, and gradient boosting (LightGBM and XGBoost). For these models, we manually created more than 250 features. After an extensive feature-engineering phase, we selected the following as the most relevant (a sketch of how a few of these features can be computed follows the list):

  • Number of FQDN levels
  • Vowel ratio
  • Number of characters
  • Bag of n-grams (top 50 n-grams)
  • TF-IDF features
  • Latent Dirichlet allocation features
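
The sketch below shows how a few of these handcrafted features might be computed with Scikit-learn. The toy domain list, n-gram range, and feature names are illustrative assumptions, not the exact configuration we used.

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

def basic_features(domain):
    """A few of the handcrafted lexical features (illustrative, not exhaustive)."""
    name = domain.lower().rstrip(".")
    labels = name.split(".")
    vowels = sum(c in "aeiou" for c in name)
    return {
        "fqdn_levels": len(labels),                 # number of FQDN levels
        "vowel_ratio": vowels / max(len(name), 1),  # vowel ratio
        "num_chars": len(name),                     # number of characters
    }

# Character n-gram bag (top 50) and TF-IDF features over a toy sample of domains
domains = ["eulerhermes.com", "eu1erhermes.net", "amazon.com"]
bag = CountVectorizer(analyzer="char", ngram_range=(2, 4), max_features=50)
tfidf = TfidfVectorizer(analyzer="char", ngram_range=(2, 4), max_features=50)
X_bag = bag.fit_transform(domains)
X_tfidf = tfidf.fit_transform(domains)

print(basic_features("eu1erhermes.net"), X_bag.shape, X_tfidf.shape)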

For deep learning, we decided to work with recurrent neural networks. The model we adopted was a Bidirectional LSTM (BiLSTM) with an attention layer. We found this model to be the best at extracting a URL’s underlying structure.

The following diagram shows the architecture designed for the BiLSTM model. To avoid overfitting, a Dropout layer was added.

The following code orchestrates the set of layers:

import tensorflow as tf
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense, Flatten
from keras_self_attention import SeqSelfAttention  # pip install keras-self-attention


def AttentionModel_(vocab_size, input_length, hidden_dim):
    model = tf.keras.models.Sequential()
    # Character-level embedding of the domain name
    model.add(Embedding(vocab_size, hidden_dim, input_length=input_length))
    # The BiLSTM returns the full sequence so the attention layer can weight each position;
    # dropout and recurrent_dropout help avoid overfitting
    model.add(Bidirectional(LSTM(units=hidden_dim, return_sequences=True,
                                 dropout=0.2, recurrent_dropout=0.2)))
    # Self-attention over the BiLSTM outputs
    model.add(SeqSelfAttention(attention_activation='sigmoid'))
    # Flatten the (input_length, 2*hidden_dim) attention output before the classifier
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["acc", tf.keras.metrics.FalsePositives()])
    return model

We built and tuned the classical ML and the deep learning models using the Amazon SageMaker-provided containers for Scikit-learn and Keras.

The following table summarizes the results we obtained. The BiLSTM outperformed the other models with a 13% precision improvement compared to the second-best model (LightGBM). For this reason, we put the BiLSTM model into production.

Models        | Precision | F1-Score | ROC-AUC (Area Under the Curve)
Random Forest | 0.832     | 0.841    | 0.908
XGBoost       | 0.870     | 0.876    | 0.921
LightGBM      | 0.880     | 0.883    | 0.928
RNN (BiLSTM)  | 0.996     | 0.997    | 0.997

Model training

For model training, we used Managed Spot Training in Amazon SageMaker, which runs training jobs on Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances. This allowed us to train models at a significantly lower cost than with On-Demand Instances.

Because we predominantly used custom deep learning models, we needed GPU instances for time-consuming neural network training jobs, with training times ranging from minutes to a few hours. Under these constraints, Managed Spot Training was a game-changing solution: it handles Spot capacity and instance-stopping conditions for us, so our data scientists could keep working without interruption.
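
For reference, a training job of this kind might be launched with the SageMaker Python SDK roughly as follows. The script name, instance types, S3 paths, and hyperparameters are placeholders, not the exact values from our pipeline.

import sagemaker
from sagemaker.tensorflow import TensorFlow

role = sagemaker.get_execution_role()

# Hypothetical script name, instance types, S3 paths, and hyperparameters
estimator = TensorFlow(
    entry_point="train_bilstm.py",
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",              # GPU instance for the BiLSTM
    framework_version="2.4.1",
    py_version="py37",
    hyperparameters={"hidden_dim": 128, "epochs": 10},
    use_spot_instances=True,                    # Managed Spot Training
    max_run=3600,                               # cap on actual training time (seconds)
    max_wait=7200,                              # cap on training time plus Spot waiting
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",  # resume after an interruption
)

estimator.fit({"train": "s3://my-bucket/typosquatting/train/"})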

Productizing

Euler Hermes’s cloud principles follow a serverless-first strategy, with an Infrastructure as Code DevOps practice. Systematically, we construct a serverless architecture based on Lambda whenever possible, but when this isn’t possible, we deploy to containers using AWS Fargate.

Amazon SageMaker allows us to deploy our ML models at scale within the same platform on a 100% serverless and scalable architecture. It creates a model endpoint that is ready to serve inference requests. To get inferences for an entire dataset, we use batch transform, which splits the dataset into smaller batches and gets predictions for each one. Batch transform manages all the compute resources required to get inferences, including launching instances and deleting them after the batch transform job is complete.
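
Continuing the sketch above, a batch transform job can be started from the trained estimator roughly as follows; the instance type and S3 paths are again placeholders.

# Create a transformer from the trained estimator and run a batch transform
# over the day's newly registered domains (paths are illustrative)
transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/typosquatting/predictions/",
    strategy="MultiRecord",        # batch several records into each request
)

transformer.transform(
    data="s3://my-bucket/typosquatting/daily-domains/",
    content_type="text/csv",
    split_type="Line",             # treat each line of the input file as one record
)
transformer.wait()                 # block until the batch transform job completes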

The following figure depicts the architecture deployed for the use case in this post.

First, a daily Amazon CloudWatch event triggers a Lambda function with two tasks: downloading all the publicly registered domains and storing them in an Amazon S3 bucket subfolder, and launching the batch transform job. Amazon SageMaker automatically saves the inferences in an S3 bucket that you specify when creating the batch transform job.
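
A minimal sketch of such a Lambda handler, using boto3 to start the batch transform job, might look like the following. The bucket name, model name, and the domain-download helper are assumptions for illustration.

import time
import boto3

s3 = boto3.client("s3")
sagemaker_client = boto3.client("sagemaker")

def handler(event, context):
    """Daily job: store the newly registered domains in S3, then start a batch transform."""
    domains = download_registered_domains()    # hypothetical helper, defined elsewhere
    key = f"daily-domains/{time.strftime('%Y-%m-%d')}.csv"
    s3.put_object(Bucket="my-bucket", Key=key, Body="\n".join(domains).encode("utf-8"))

    sagemaker_client.create_transform_job(
        TransformJobName=f"typosquatting-{int(time.time())}",
        ModelName="typosquatting-bilstm",       # hypothetical SageMaker model name
        TransformInput={
            "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
                                            "S3Uri": f"s3://my-bucket/{key}"}},
            "ContentType": "text/csv",
            "SplitType": "Line",
        },
        TransformOutput={"S3OutputPath": "s3://my-bucket/predictions/"},
        TransformResources={"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
    )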

Finally, a second CloudWatch event monitors the success of the Amazon SageMaker task. If the task succeeds, it triggers a second Lambda function that retrieves the inferred domains, selects those with label 1 (domains related to Euler Hermes or its products), and stores them in another S3 bucket subfolder.
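
The filtering step of this second Lambda function could be sketched as follows, assuming the batch transform writes one "domain,score" line per input record; the bucket, keys, and decision threshold are illustrative assumptions.

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Triggered on batch transform success: keep only the domains predicted as label 1."""
    obj = s3.get_object(Bucket="my-bucket", Key="predictions/2021-05-11.csv.out")
    suspicious = []
    for line in obj["Body"].read().decode("utf-8").splitlines():
        domain, score = line.rsplit(",", 1)     # assumed "domain,score" output format
        if float(score) >= 0.5:                 # label 1: related to Euler Hermes
            suspicious.append(domain)
    s3.put_object(Bucket="my-bucket", Key="suspicious/2021-05-11.csv",
                  Body="\n".join(suspicious).encode("utf-8"))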

Following Euler Hermes’s DevOps principles, all the infrastructure in this solution is coded in Terraform to implement an MLOps pipeline to deploy to production.

Conclusion

Amazon SageMaker provides the tools that our data scientists need to quickly and securely experiment and test while maintaining compliance with strict financial service standards. This allows us to bring new ideas into production very rapidly. With its flexibility and inherent programmability, Amazon SageMaker helped us tackle our main pain point: industrializing ML models at scale. After we train an ML model, we can use Amazon SageMaker to deploy it, and we can automate the entire pipeline following the same DevOps principles and tools we use for all other applications we run on AWS.

In under 7 months, we were able to launch a new internal ML service from ideation to production and can now identify URL squatting fraud within 24 hours after the creation of a malicious domain.

Although our application is ready, we have some additional steps planned. First, we’ll extend the inferences currently stored on Amazon S3 to our SIEM platform. Second, we’ll implement a web interface to monitor the model and allow manual feedback that is captured for model retraining.


About the Authors

Luis Leon is the IT Innovation Advisor responsible for the data science practice in IT at Euler Hermes. He is in charge of the ideation of digital projects as well as managing the design, build, and industrialization of at-scale machine learning products. His main interests are natural language processing, time series analysis, and unsupervised learning.

Hamza Benchekroun is a Data Scientist in the IT Innovation hub at Euler Hermes, focusing on deep learning solutions to increase productivity and guide decision-making across teams. His research interests include natural language processing, time series analysis, semi-supervised learning, and their applications.

Hatim Binani is a data science intern in the IT Innovation hub at Euler Hermes. He is an engineering student in the computer science department at INSA Lyon. His fields of interest are data science and machine learning. He contributed within the IT Innovation team to the deployment of Watson on Amazon SageMaker.

Guillaume Chambert is an IT security engineer at Euler Hermes. As SOC manager, he strives to stay ahead of new threats in order to protect Euler Hermes' sensitive and mission-critical data. He is interested in developing innovative solutions to prevent critical information from being stolen, damaged, or compromised by hackers.

Source: https://aws.amazon.com/blogs/machine-learning/how-euler-hermes-detects-typo-squatting-with-amazon-sagemaker/

Artificial Intelligence

DataRobot expands platform and announces Zepl acquisition

DataRobot, the Boston-based automated machine learning startup, had a bushel of announcements this morning as it expanded its platform to give technical and non-technical users alike something new. It also announced it has acquired Zepl, giving it an advanced development environment where data scientists can bring their own code to DataRobot. The two companies did not share the acquisition price.

Nenshad Bardoliwalla, SVP of Product at DataRobot, says that his company aspires to be the leader in this market, and it believes the path to doing so is appealing to a broad spectrum of user requirements, from those who have little data science understanding to those who can do their own machine learning coding in Python and R.

“While people love automation, they also want it to be [flexible]. They don’t want just automation, but then you can’t do anything with it. They also want the ability to turn the knobs and pull the levers,” Bardoliwalla explained.

To resolve that problem, rather than building a coding environment from scratch, it chose to buy Zepl and incorporate its coding notebook into the platform in a new tool called Composable ML. “With Composable ML and with the Zepl acquisition, we are now providing a really first class environment for people who want to code,” he said.

Zepl was founded in 2016 and raised $13 million along the way, according to Crunchbase data. The company didn't want to reveal the number of employees or the purchase price, but the acquisition gives DataRobot advanced capabilities, especially a notebook environment to call its own to attract more advanced users to the platform. The company plans to incorporate the Zepl functionality into the platform, while also leaving the stand-alone product in place.

Bardoliwalla said that they see the Zepl acquisition as an extension of the automated side of the house, where these tools can work in conjunction with one another with machines and humans working together to generate the best models. “This [generates an] organic mixture of the best of what a system can generate using DataRobot AutoML and the best of what human beings can do and kind of trying to compose those together into something really interesting […],” Bardoliwalla said.

The company is also introducing a no-code AI app builder that enables non-technical users to create apps from a data set with drag-and-drop components. In addition, it's adding a tool to monitor the accuracy of a model over time. Sometimes, after a model has been in production for a while, its accuracy can begin to break down because the data the model is based on is no longer valid. This tool monitors the model's data for accuracy and warns the team when it's starting to fall out of compliance.

Finally, the company is announcing a model bias monitoring tool to help root out model bias that could introduce racist, sexist, or other assumptions into the model. To avoid this, the company has built a tool that identifies when it sees this happening, both in the model building phase and in production. It warns the team of potential bias, while providing suggestions to tweak the model to remove it.

DataRobot is based in Boston and was founded in 2012. It has raised over $750 million and has a valuation of over $2.8 billion, according to Pitchbook.

Source: https://techcrunch.com/2021/05/11/datarobot-expands-platform-and-announces-zepl-acquisition/


Artificial Intelligence

Powering the Next Wave of Healthcare Innovation with AI

Illustration: © IoT For All

There’s no doubt that data is poised to transform healthcare like it has so many other sectors, but it’ll need a helping hand. Today, healthcare providers collect exabytes of patient data from hospitals, clinics, imaging and pathology labs, and more. This data contains a wealth of insight into human health, but its lack of structure and sheer volume means it’s well beyond the limits of human ability to decipher it.

Fortunately, sophisticated AI and machine learning solutions can carry the torch of innovation.

In healthcare, the value of machine learning is its capacity for processing massive data sets that are far beyond the scope of human ability. Raw, unstructured data goes in, and clinical insights come out, helping physicians plan and provide better care at a lower cost. While the sky is the limit as far as the benefits of machine learning, constructing these complex algorithms takes time. In the next five to 10 years, we expect to see medical professionals reaping the dividends of healthcare-based innovation in these areas:

Advanced Image Analysis

Medical professionals are highly trained, and much of their work reflects the tremendous value they add. However, they still spend time on repetitive tasks such as image analysis. In radiology, for example, doctors spend time looking at images from CT scans, MRIs, ultrasounds, PET scans, mammography, and more. AI-assisted imaging solutions use the technology's advanced pattern-recognition capabilities to highlight image features, identify early predictors of cancer, prioritize cases, and cut down on the volume of labor required to perform accurate diagnoses. As AI processes more and more data sets, the technology will inevitably eclipse the ability of human doctors to spot the signs of disease as early as possible.

Disease Detection

Due to its high cost, healthcare imaging generally takes place only to confirm a diagnosis. It’s an effective solution, but one that AI promises to upend and replace. By conducting an in-depth analysis of huge amounts of historical data, AI can predict the possibility of sickness or disease at incredibly early stages. For example, by looking at an entire patient population that closely matches the demographic of a specific individual in addition to the medical history of relatives, AI could conclude that a patient is very likely to develop a malady such as heart disease years before a doctor could ever accurately make a diagnosis.

Drug Discovery

We’ve all seen firsthand how important it is to design and produce effective drugs and vaccines to combat a newly discovered disease. Historically, this process has taken massive investments of time and money, with development timelines extending out to more than a decade in some cases. The ability of AI to cross-reference drugs that are known to be safe and effective and replicate parts of their formulas to suggest new iterations could be groundbreaking, potentially saving countless lives and helping to prevent the next global pandemic.

Digital Consultation

The pandemic undoubtedly spurred innovation in the telehealth space. However, there’s still a long way to go to make virtual visits as effective as a physical visit to the doctor’s office. AI can help close that gap in numerous ways. Machine learning and natural language processing (NLP), for example, will help facilitate symptom collection using just a patient’s voice. Combined with an analysis of the patient’s electronic health record, AI can highlight probable health concerns for doctors to review. By processing information ahead of time, AI increases the volume of patients that doctors can handle, improves the efficacy of virtual visits, and even minimizes the risk of infection from physical interactions as a result.

Source: https://www.iotforall.com/powering-the-next-wave-of-healthcare-innovation-with-ai


Artificial Intelligence

CMU researchers show potential of privacy-preserving activity tracking using radar

Imagine if you could settle/rekindle domestic arguments by asking your smart speaker when the room last got cleaned or whether the bins already got taken out?

Or — for an altogether healthier use-case — what if you could ask your speaker to keep count of reps as you do squats and bench presses? Or switch into full-on ‘personal trainer’ mode — barking orders to peddle faster as you spin cycles on a dusty old exercise bike (who needs a Peloton!).

And what if the speaker was smart enough to just know you’re eating dinner and took care of slipping on a little mood music?

Now imagine if all those activity tracking smarts were on tap without any connected cameras being plugged inside your home.

Another bit of fascinating research from Carnegie Mellon University's Future Interfaces Group opens up these sorts of possibilities, demonstrating a novel approach to activity tracking that does not rely on cameras as the sensing tool.

Installing connected cameras inside your home is of course a horrible privacy risk. Which is why the CMU researchers set about investigating the potential of using millimeter wave (mmWave) doppler radar as a medium for detecting different types of human activity.

The challenge they needed to overcome is that while mmWave offers a “signal richness approaching that of microphones and cameras”, as they put it, data-sets to train AI models to recognize different human activities as RF noise are not readily available (as visual data for training other types of AI models is).

Not to be deterred, they set about synthesizing doppler data to feed a human activity tracking model, devising a software pipeline for training privacy-preserving activity tracking AI models.

The results can be seen in this video — where the model is shown correctly identifying a number of different activities, including cycling, clapping, waving and squats. Purely from its ability to interpret the mmWave signal the movements generate — and purely having been trained on public video data. 

“We show how this cross-domain translation can be successful through a series of experimental results,” they write. “Overall, we believe our approach is an important stepping stone towards significantly reducing the burden of training such as human sensing systems, and could help bootstrap uses in human-computer interaction.”

Researcher Chris Harrison confirms the mmWave doppler radar-based sensing doesn’t work for “very subtle stuff” (like spotting different facial expressions). But he says it’s sensitive enough to detect less vigorous activity — like eating or reading a book.

The motion detection ability of doppler radar is also limited by a need for line-of-sight between the subject and the sensing hardware. (Aka: “It can’t reach around corners yet.” Which, for those concerned about future robots’ powers of human detection, will surely sound slightly reassuring.)

Detection does require special sensing hardware, of course. But things are already moving on that front: Google has been dipping its toe in already, via project Soli — adding a radar sensor to the Pixel 4, for example.

Google’s Nest Hub also integrates the same radar sense to track sleep quality.

“One of the reasons we haven’t seen more adoption of radar sensors in phones is a lack of compelling use cases (sort of a chicken and egg problem),” Harrison tells TechCrunch. “Our research into radar-based activity detection helps to open more applications (e.g., smarter Siris, who know when you are eating, or making dinner, or cleaning, or working out, etc.).”

Asked whether he sees greater potential in mobile or fixed applications, Harrison reckons there are interesting use cases for both.

“I see use cases in both mobile and non mobile,” he says. “Returning to the Nest Hub… the sensor is already in the room, so why not use that to bootstrap more advanced functionality in a Google smart speaker (like rep counting your exercises).

“There are a bunch of radar sensors already used in building to detect occupancy (but now they can detect the last time the room was cleaned, for example).”

“Overall, the cost of these sensors is going to drop to a few dollars very soon (some on eBay are already around $1), so you can include them in everything,” he adds. “And as Google is showing with a product that goes in your bedroom, the threat of a ‘surveillance society’ is much less worry-some than with camera sensors.”

Startups like VergeSense are already using sensor hardware and computer vision technology to power real-time analytics of indoor space and activity for the b2b market (such as measuring office occupancy).

But even with local processing of low-resolution image data, there could still be a perception of privacy risk around the use of vision sensors — certainly in consumer environments.

Radar offers an alternative to such visual surveillance that could be a better fit for privacy-risking consumer connected devices such as ‘smart mirrors‘.

“If it is processed locally, would you put a camera in your bedroom? Bathroom? Maybe I’m prudish but I wouldn’t personally,” says Harrison.

He also points to earlier research which he says underlines the value of incorporating more types of sensing hardware: “The more sensors, the longer tail of interesting applications you can support. Cameras can’t capture everything, nor do they work in the dark.”

“Cameras are pretty cheap these days, so hard to compete there, even if radar is a bit cheaper. I do believe the strongest advantage is privacy preservation,” he adds.

Of course having any sensing hardware — visual or otherwise — raises potential privacy issues.

A sensor that tells you when a child’s bedroom is occupied may be good or bad depending on who has access to the data, for example. And all sorts of human activity can generate sensitive information, depending on what’s going on. (I mean, do you really want your smart speaker to know when you’re having sex?)

So while radar-based tracking may be less invasive than some other types of sensors it doesn’t mean there are no potential privacy concerns at all.

As ever, it depends on where and how the sensing hardware is being used. Albeit, it’s hard to argue that the data radar generates is likely to be less sensitive than equivalent visual data were it to be exposed via a breach.

“Any sensor should naturally raise the question of privacy — it is a spectrum rather than a yes/no question,” agrees Harrison. “Radar sensors happen to be usually rich in detail, but highly anonymizing, unlike cameras. If your doppler radar data leaked online, it’d be hard to be embarrassed about it. No one would recognize you. If cameras from inside your house leaked online, well… ”

What about the compute costs of synthesizing the training data, given the lack of immediately available doppler signal data?

“It isn’t turnkey, but there are many large video corpuses to pull from (including things like Youtube-8M),” he says. “It is orders of magnitude faster to download video data and create synthetic radar data than having to recruit people to come into your lab to capture motion data.

“One is inherently 1 hour spent for 1 hour of quality data. Whereas you can download hundreds of hours of footage pretty easily from many excellently curated video databases these days. For every hour of video, it takes us about 2 hours to process, but that is just on one desktop machine we have here in the lab. The key is that you can parallelize this, using Amazon AWS or equivalent, and process 100 videos at once, so the throughput can be extremely high.”

And while RF signal does reflect, and does so to different degrees off of different surfaces (aka “multi-path interference”), Harrison says the signal reflected by the user “is by far the dominant signal”. Which means they didn’t need to model other reflections in order to get their demo model working. (Though he notes that could be done to further hone capabilities “by extracting big surfaces like walls/ceiling/floor/furniture with computer vision and adding that into the synthesis stage”.)

“The [doppler] signal is actually very high level and abstract, and so it’s not particularly hard to process in real time (much less ‘pixels’ than a camera),” he adds. “Embedded processors in cars use radar data for things like collision braking and blind spot monitoring, and those are low-end CPUs (no deep learning or anything).”

The research is being presented at the ACM CHI conference, alongside another Group project — called Pose-on-the-Go — which uses smartphone sensors to approximate the user’s full-body pose without the need for wearable sensors.

CMU researchers from the Group have also previously demonstrated a method for indoor ‘smart home’ sensing on the cheap (also without the need for cameras), as well as — last year — showing how smartphone cameras could be used to give an on-device AI assistant more contextual savvy.

In recent years they’ve also investigated using laser vibrometry and electromagnetic noise to give smart devices better environmental awareness and contextual functionality. Other interesting research out of the Group includes using conductive spray paint to turn anything into a touchscreen. And various methods to extend the interactive potential of wearables — such as by using lasers to project virtual buttons onto the arm of a device user or incorporating another wearable (a ring) into the mix.

The future of human computer interaction looks certain to be a lot more contextually savvy — even if current-gen ‘smart’ devices can still stumble on the basics and seem more than a little dumb.

Source: https://techcrunch.com/2021/05/11/cmu-researchers-show-potential-of-privacy-preserving-activity-tracking-using-radar/
