In text classification, feature selection is the process of selecting a specific subset of the terms in the training set and using only those terms in the classification algorithm. Feature selection takes place before the classifier is trained.
Update: The Datumbox Machine Learning Framework is now open-source and free to download. Check out the package com.datumbox.framework.machinelearning.featureselection to see the implementation of Chi-square and Mutual Information Feature Selection methods in Java.
The main advantages of feature selection are that it reduces the dimensionality of the data, makes training faster, and can improve accuracy by removing noisy features. As a consequence, feature selection can help us avoid overfitting.
The basic selection algorithm for selecting the k best features is presented below (Manning et al, 2008):
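A minimal Python sketch of this selection loop, following the general shape described by Manning et al. (2008): compute a utility score for every term in the vocabulary with respect to the class, then keep the k highest-scoring terms. The function name `select_features`, the document representation, and the `utility` callback signature are assumptions made for illustration and are not taken from the Datumbox code.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Assumed representation: (set of terms in the document, class label).
Document = Tuple[Set[str], str]


def select_features(
    documents: Iterable[Document],
    target_class: str,
    k: int,
    utility: Callable[[int, int, int, int], float],
) -> List[str]:
    """Return the k terms with the highest utility score for target_class."""
    docs = list(documents)
    vocabulary = set().union(*(terms for terms, _ in docs))
    scored = []
    for term in vocabulary:
        # Contingency counts: first index = term present, second = doc in class.
        n11 = n10 = n01 = n00 = 0
        for terms, label in docs:
            has_term = term in terms
            in_class = label == target_class
            if has_term and in_class:
                n11 += 1
            elif has_term:
                n10 += 1
            elif in_class:
                n01 += 1
            else:
                n00 += 1
        scored.append((utility(n11, n10, n01, n00), term))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]
```

For example, passing `lambda n11, n10, n01, n00: n11` as the utility ranks terms by how many documents of the class contain them; Mutual Information or Chi Square scores can be plugged in the same way.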
In the next sections we present two feature selection algorithms: Mutual Information and Chi Square.
One of the most common feature selection methods is the Mutual Information of term t in class c (Manning et al, 2008). This measures how much information the presence or absence of a particular term contributes to making the correct classification decision on c. The mutual information can be calculated by using the following formula:
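For reference, the mutual information as defined by Manning et al. (2008) can be written as follows. Here U is a random variable that takes the value e_t = 1 when the document contains term t and e_t = 0 otherwise, and C is a random variable that takes the value e_c = 1 when the document is in class c and e_c = 0 otherwise:

```latex
I(U;C) = \sum_{e_t \in \{1,0\}} \sum_{e_c \in \{1,0\}}
         P(U = e_t, C = e_c)\,
         \log_2 \frac{P(U = e_t, C = e_c)}{P(U = e_t)\,P(C = e_c)}
```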
In our calculations, since we use the Maximum Likelihood Estimates of the probabilities we can use the following equation:
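Written out with document counts, the estimate from Manning et al. (2008) takes the form below. The dot notation is assumed to denote a sum over the replaced index, so that N_{1.} = N_{10} + N_{11} is the number of documents containing the term and N_{.1} = N_{01} + N_{11} is the number of documents in the class:

```latex
I(U;C) = \frac{N_{11}}{N}\log_2\frac{N\,N_{11}}{N_{1.}\,N_{.1}}
       + \frac{N_{01}}{N}\log_2\frac{N\,N_{01}}{N_{0.}\,N_{.1}}
       + \frac{N_{10}}{N}\log_2\frac{N\,N_{10}}{N_{1.}\,N_{.0}}
       + \frac{N_{00}}{N}\log_2\frac{N\,N_{00}}{N_{0.}\,N_{.0}}
```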
Here N is the total number of documents and N_tc is the number of documents that have the values e_t (occurrence of term t in the document; it takes the value 1 or 0) and e_c (occurrence of the document in class c; it takes the value 1 or 0) indicated by the two subscripts. Finally, note that all of the aforementioned variables take non-negative values.
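The count-based estimate can be sketched in Python as follows. The function name and argument order are assumptions for illustration; the first subscript refers to the term (1 = present) and the second to the class (1 = in class). Cells with a zero count are skipped, using the usual convention that 0 · log 0 = 0.

```python
import math


def mutual_information(n11: int, n10: int, n01: int, n00: int) -> float:
    """Mutual information (in bits) of a term and a class from document counts."""
    n = n11 + n10 + n01 + n00
    if n == 0:
        return 0.0
    # Each tuple: (joint count, term marginal, class marginal).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    total = 0.0
    for joint, term_marginal, class_marginal in cells:
        if joint > 0:  # a positive joint count implies positive marginals
            total += (joint / n) * math.log2(n * joint / (term_marginal * class_marginal))
    return total
```

With the counts Manning et al. (2008) use for the term "export" and the class "poultry" (N11 = 49, N10 = 27,652, N01 = 141, N00 = 774,106), the score comes out at roughly 0.00011, while a term that is distributed identically inside and outside the class scores 0.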
Another common feature selection method is the Chi Square. The χ² test is used in statistics, among other things, to test the independence of two events. In feature selection, we use it to test whether the occurrence of a specific term and the occurrence of a specific class are independent. We therefore estimate the following quantity for each term and rank the terms by their score:
High χ² scores indicate that the null hypothesis (H0) of independence should be rejected and thus that the occurrence of the term and the class are dependent. If they are dependent, we select the feature for text classification.
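In the notation of Manning et al. (2008), this quantity can be written as below, where N is the observed document count for each combination of e_t and e_c, and E is the count that would be expected if the term and the class were independent:

```latex
\chi^2(D, t, c) = \sum_{e_t \in \{0,1\}} \sum_{e_c \in \{0,1\}}
                  \frac{\left(N_{e_t e_c} - E_{e_t e_c}\right)^2}{E_{e_t e_c}}
```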
The above formula can be rewritten as follows:
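Expressed directly in terms of the four observed counts, the closed form given by Manning et al. (2008) is:

```latex
\chi^2(D, t, c) =
  \frac{(N_{11} + N_{10} + N_{01} + N_{00})\,(N_{11}N_{00} - N_{10}N_{01})^2}
       {(N_{11} + N_{01})\,(N_{11} + N_{10})\,(N_{10} + N_{00})\,(N_{01} + N_{00})}
```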
If we use the Chi Square method, we should select only a predefined number of features whose χ² score is larger than 10.83, which indicates statistical significance at the 0.001 level.
Last but not least, we should note that from a statistical point of view the Chi Square feature selection is inaccurate because of the single degree of freedom, and the Yates correction should be used instead (which makes it harder to reach statistical significance). Thus we should expect that a small part of the selected features are independent of the class. Nevertheless, as Manning et al (2008) showed, these noisy features do not seriously affect the overall accuracy of our classifier.
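A short Python sketch of this rule, assuming the per-term contingency counts have already been collected. The names `chi_square`, `select_by_chi_square`, and `CRITICAL_VALUE_P001` are hypothetical; the score uses the count-based closed form from Manning et al. (2008), without the Yates correction.

```python
from typing import Dict, List, Tuple

# Chi-square critical value for 1 degree of freedom at the 0.001 level.
CRITICAL_VALUE_P001 = 10.83


def chi_square(n11: int, n10: int, n01: int, n00: int) -> float:
    """Chi-square score of a term and a class from the four document counts."""
    numerator = (n11 + n10 + n01 + n00) * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    # A zero denominator means the term or class never varies; score it as 0.
    return numerator / denominator if denominator else 0.0


def select_by_chi_square(
    counts: Dict[str, Tuple[int, int, int, int]], k: int
) -> List[str]:
    """Keep at most k terms whose score exceeds the 0.001 critical value."""
    scored = [(chi_square(*cell_counts), term) for term, cell_counts in counts.items()]
    significant = [(score, term) for score, term in scored if score > CRITICAL_VALUE_P001]
    significant.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in significant[:k]]
```

For the "export" and "poultry" counts used by Manning et al. (2008) (49, 27,652, 141, 774,106), the score is about 284, far above the 10.83 threshold, so the term would be kept.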
Removing noisy/rare features
Another technique that can help us avoid overfitting, reduce memory consumption, and improve speed is to remove all the rare terms from the vocabulary. For example, one can eliminate all the terms that occurred only once across all categories. Removing those terms can reduce memory usage by a significant factor and improve the speed of the analysis. Finally, we should note that this technique can be used in conjunction with the above feature selection algorithms.
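As an illustration, rare-term pruning can be sketched in a few lines of Python. The function name `remove_rare_terms` and the `min_count` parameter are assumptions; with the default of 2, terms that occur only once across all documents are dropped.

```python
from collections import Counter
from typing import Iterable, List, Set


def remove_rare_terms(documents: Iterable[List[str]], min_count: int = 2) -> Set[str]:
    """Return the vocabulary of terms that occur at least min_count times overall."""
    counts = Counter(token for doc in documents for token in doc)
    return {term for term, count in counts.items() if count >= min_count}
```

The resulting vocabulary can then be passed on to a scoring method such as Mutual Information or Chi Square.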
Natural language processing: A cheat sheet
Learn the basics about natural language processing, a cross-discipline approach to making computers hear, process, understand, and duplicate human speech.
It wasn’t too long ago that talking to a computer and having it not only understand, but speak back, was confined to the realm of science fiction, like that of the shipboard computers of Star Trek. The technology of the 24th century’s Starship Enterprise is reality in the 21st century thanks to natural language processing (NLP), a machine learning-driven discipline that gives computers the ability to understand, process, and respond to spoken words and written text.
Make no mistake: NLP is a complicated field that one can spend years studying. This guide contains the basics about NLP, details how it can benefit businesses, and explains where to get started with its implementation.
What is natural language processing?
Natural language processing (NLP) is a cross-discipline approach to making computers hear, process, understand, and duplicate human language. Fields including linguistics, computer science, and machine learning are all a part of the process of NLP, the results of which can be seen in things like digital assistants, chatbots, real-time translation apps, and other language-using software.
The concept of computers learning to understand and use language isn’t a new one. It can arguably be traced all the way back to Alan Turing’s Computing Machinery and Intelligence paper, published in 1950, which is where the idea of the Turing Test comes from.
In brief, Turing attempted to determine whether machines could behave in a way indistinguishable from a human, which fundamentally requires the ability to process language and respond in a sensible way.
Since Turing wrote his paper, a number of approaches to natural language processing have emerged. First came rules-based systems, like ELIZA, which were limited in what they could do to a set of instructions. Systems like ELIZA were easy to distinguish from a human because their formulaic, non-specific responses quickly became repetitive and felt unnatural: They lacked understanding, which is a fundamental part of modern NLP.
With the advent of machine learning, which allows computers to algorithmically develop their own rules based on sample data, natural language processing exploded in ways Turing never could have predicted.
Natural language processing has reached a state where it’s now better at understanding human speech than real humans. Even this impressive milestone still falls short of truly complete NLP, though, because the machine performing the work was simply transcribing language, not being asked to comprehend it.
Modern NLP platforms are also capable of visually processing speech. Facebook’s Rosetta, for example, is able to “extract text in different languages from more than a billion images and video frames in real time,” TechRepublic sister site CNET said.
What are the challenges of natural language processing?
Computers don’t need to understand human speech to speak a language–the machines operate on a kind of linguistic structure that allows them to accept input, process data, and respond to commands.
Human speech isn’t precise by any stretch of the definition: It’s contextual, metaphorical, ambiguous, and spoken imperfectly all the time, and understanding language requires a lot of background and interpretive ability that computers lack.
Computational linguist Ekaterina Kochmar, in a talk about natural language processing, explained that words exist in a sort of imaginary semantic space. In our minds, Kochmar said, we have representations of words, and words with related or similar meanings live close together in a web of semantic understanding.
Thinking of language in that manner allows machine learning tools to be built that let computers algorithmically create their own semantic space, which lets them infer relations between words and better understand natural speech.
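The idea of words living close together in a semantic space can be illustrated with a small Python sketch. The three-dimensional vectors below are hand-written toy values, not learned embeddings, and `cosine_similarity` is simply one common way to measure how close two word vectors are.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors chosen by hand for illustration only.
toy_space = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(toy_space["cat"], toy_space["dog"])
cat_car = cosine_similarity(toy_space["cat"], toy_space["car"])
```

In this toy space "cat" is much closer to "dog" than to "car", which is the kind of relation a real system would infer from large amounts of text rather than from hand-picked numbers.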
That doesn’t mean challenges are overcome, though. Going from understanding simple, precise statements like those given to digital assistants to producing sensible speech on their own is still difficult for NLP programs. Candy hearts produced by artificial intelligence (AI) taught to understand romantic language are predictably absurd, and 1 the Road, a novel written entirely by an artificial neural network, is generally nonsensical with only the most occasional glimpse of semantic understanding, which could be entirely chalked up to chance.
As advanced as natural language processing is in its ability to analyze speech, turn it into data, understand it, and use an algorithm to generate an appropriate response, it still generally lacks the ability to speak on its own or grasp the ambiguity and metaphor that are fundamental to natural language.
We’ve mastered the first part: Understanding. It’s the second part, generating natural speech or human language, that we’re still a bit stuck on. And we might be stuck there for a while, if pioneering mathematician and computer scientist Ada Lovelace is correct: She posited that computers were only able to do what we told them to, and were incapable of originality. Known as Lady Lovelace’s Objection, it’s become a common part of criticism of the Turing Test and thus a criticism of natural language processing: If machines can’t have original thoughts, then is there any way to teach them to use language that isn’t ultimately repetitive?
How is natural language processing used?
Natural language processing has a lot of practical applications for a variety of business uses.
Google Duplex is perhaps the most remarkable use of natural language processing available as an example today. The digital assistant, introduced in 2018, is not only able to understand complex statements, but it also speaks on the phone in a way that’s practically indistinguishable from a human—vocal tics and all. Duplex’s goal is to carry out real-world tasks over the phone, saving Google users time spent making appointments, booking services, placing orders, and more.
Ninety-eight percent of Fortune 500 companies are now using natural language processing software to filter candidates for job searches with products known as applicant tracking systems. These products pick through resumes to look for appropriate keywords and other linguistic elements.
Chatbots are quickly becoming the first line of online customer service, with 68% of consumers saying they had a positive experience speaking with one. These bots use natural language processing to address basic requests and problems, while also being able to elevate requests to humans as needed.
Uses of NLP in healthcare settings are numerous: Physician dictation, processing hand-written records, compiling unstructured healthcare data into usable formats, and connecting natural language to complicated medical billing codes are all potential uses. NLP has also been used recently to screen COVID-19 patients.
NLP can be used to gauge customer attitudes in call center environments, perform “sentiment analysis” on social media posts, support business intelligence analysis, and supplement predictive analytics.
Natural language processing has a potentially endless variety of applications: Anything involving language can, with the right approach, be a use case for NLP, especially if it involves dealing with a large volume of data that would take a human too long to work with.
How can developers learn about natural language processing?
NLP is a complicated topic that a computer scientist could easily spend years learning the ins and outs of. If your objective is being at the cutting edge of NLP research, it’s probably best to think about attending a university known for having a good computational linguistics program.
Developers who want to learn to make use of current NLP technology don’t need to dive that far into the deep end. Text analytics firm MonkeyLearn has an excellent rundown of resources and steps to get started with natural language processing; here are a few key points from its guide.
MonkeyLearn’s guide also has a variety of links in it to articles, research, and journals that any budding NLP developer should be aware of.
What is the best way for businesses to get started with natural language processing?
Every business uses language, so there’s a good chance you can come up with at least one or two uses for natural language processing in your organization—but how do you go from thinking about what NLP could do for you to actually doing it? There are a lot of steps to consider.
For starters, you need to know what your objectives are for NLP in your business. Do you want to use it to aggregate data as an analytics tool, or do you want to build a chatbot that can interact with customers via text on your support portal? Maybe you want to use NLP as the backbone of an e-mail filter, understand customer sentiment, or use it for real-time translation.
No matter what you want NLP to do for your business you need to know your goal before even starting to think about achieving it.
Once you know what you want to do with natural language processing, it’s time to find the right talent to build the system you want. You may already have developers in-house who are familiar with Python and some of the NLP frameworks mentioned above. If that’s the case, get them involved in the planning stages from the very beginning.
If you don’t have anyone in-house who can develop natural language processing software, you’re faced with a choice: Hire new people or bring in a third-party that specializes in NLP solutions.
If you choose to go about your NLP objectives in-house, you’ll need to find the right software solutions or providers for hosting your NLP platform, and there are plenty of recognizable names to choose from.
IBM Watson has options, AWS offers Amazon Comprehend and other NLP services, Microsoft Azure has NLP services as well, as does Google Cloud. Choosing the proper platform will require input from your developers because they’re the ones who will be working with the software every day, and your NLP initiative’s success may hinge on how well they can use the platform.
The path to real-world artificial intelligence
Experts from MIT and IBM held a webinar this week to discuss where AI technologies are today and advances that will help make their usage more practical and widespread.
Artificial intelligence has made significant strides in recent years, but modern AI techniques remain limited, a panel of MIT professors and IBM’s director of the Watson AI Lab said during a webinar this week.
Neural networks can perform specific, well-defined tasks but they struggle in real-world situations that go beyond pattern recognition and present obstacles like limited data, reliance on self-training, and answering questions like “why” and “how” versus “what,” the panel said.
The future of AI depends on enabling AI systems to do something once considered impossible: Learn by demonstrating flexibility, some semblance of reasoning, and/or by transferring knowledge from one set of tasks to another, the group said.
The panel discussion was moderated by David Schubmehl, a research director at IDC, and it began with a question he posed asking about the current limitations of AI and machine learning.
“The striking success right now in particular, in machine learning, is in problems that require interpretation of signals—images, speech and language,” said panelist Leslie Kaelbling, a computer science and engineering professor at MIT.
For years, people have tried to solve problems like detecting faces and images and directly engineering solutions that didn’t work, she said.
We have become good at engineering algorithms that take data and use that to derive a solution, she said. “That’s been an amazing success.” But it takes a lot of data and a lot of computation, so for some problems, formulations that would let us learn from the amount of data available don’t exist yet, Kaelbling said.
One of her areas of focus is in robotics, and it’s harder to get training examples there because robots are expensive and parts break, “so we really have to be able to learn from smaller amounts of data,” Kaelbling said.
Neural networks and deep learning are the “latest and greatest way to frame those sorts of problems and the successes are many,” added Josh Tenenbaum, a professor of cognitive science and computation at MIT.
But when talking about general intelligence and how to get machines to understand the world there is still a huge gap, he said.
“But on the research side … really exciting things are starting to happen to try to capture some steps to more general forms of intelligence [in] machines,” he said. In his work, “we’re seeing ways in which we can draw insights from how humans understand the world and taking small steps to put them in machines.”
Although people think of AI as being synonymous with automation, it is incredibly labor intensive in a way that doesn’t work for most of the problems we want to solve, noted David Cox, IBM director of the MIT-IBM Watson AI Lab.
Echoing Kaelbling, Cox said that leveraging tools today like deep learning requires huge amounts of “carefully curated, bias-balanced data,” to be able to use them well. Additionally, for most problems we are trying to solve, we don’t have those “giant rivers of data” to build a dam in front of to extract some value from that river, Cox said.
Today, companies are more focused on solving some type of one-off problem and even when they have big data, it’s rarely curated, he said. “So most of the problems we love to solve with AI—we don’t have the right tools for that.”
That’s because we have problems with bias and interpretability with humans using these tools and they have to understand why they are making these decisions, Cox said. “They’re all barriers.”
However, he said, there’s enormous opportunity looking at all these different fields to chart a path forward.
That includes using deep learning, which is good for pattern recognition, to help solve difficult search problems, Tenenbaum said.
To develop intelligent agents, scientists need to use all the available tools, said Kaelbling. For example, neural networks are needed for perception as well as higher level and more abstract types of reasoning to decide, for example, what to make for dinner or to decide how to disperse supplies.
“The critical thing technologically is to realize the sweet spot for each piece and figure out what it is good at and not good at. Scientists need to understand the role each piece plays,” she said.
The MIT and IBM AI experts also discussed a new foundational method known as neurosymbolic AI, which is the ability to combine statistical, data-driven learning of neural networks with the powerful knowledge representation and reasoning of symbolic approaches.
Moderator Schubmehl commented that having a combination of neurosymbolic AI and deep learning “might really be the holy grail” for advancing real-world AI.
Kaelbling agreed, adding that it may involve not just those two techniques but others as well.
One of the themes that emerged from the webinar is that there is a very helpful confluence of all types of AI that are now being used, said Cox. The next evolution of very practical AI is going to be understanding the science of finding things and building a system we can reason with and grow and learn from, and determine what is going to happen. “That will be when AI hits its stride,” he said.
Role of Artificial Intelligence in Social Media
Considering that there are 3.81 billion active social media users worldwide, a number that continues to grow, it would not be wrong to say that we live in an era of social media. Various online studies reveal that every smartphone user uses at least one social media application (Instagram, Facebook, Twitter, Tumblr, LinkedIn, Snapchat, etc.).
But, social media is not just about connecting to your friends or family, rather, it has become a perfect place for businesses to find new clients or nurture their relationships with the existing ones. By sharing their thoughts, photos, or videos on these platforms, both the businesses and individuals are adding an unimaginable amount of data that is increasing exponentially with each passing year.
And, if you are wondering how these platforms are managing the same, then the answer is through AI and various other technologies. Yes, AI or artificial intelligence is contributing massively to manage this sea of human data coming to these platforms.
This branch of computer science enables machines to act, think, and behave like human beings. AI and machine learning (a subset of AI) in social media are helping the giant social networking companies make sense of user-generated data to manage various activities. This article is all about the impact of artificial intelligence on social media.
How is AI used in social media?
Managing social media platforms with innumerable users is not child’s play; it requires attention to many things. With artificial intelligence, social networking companies are analyzing voluminous data to find out what’s trending, including different hashtags and patterns. This analysis helps in understanding users’ behavior.
With the help of various algorithms, artificial intelligence can monitor unstructured user comments to offer a personalized experience and to recognize crises. The technology can also help deliver content by analyzing different activities as well as demographics.
Most of the top social networking companies have already adopted AI to scale up their processes and take their business to the next level.
This top social networking platform uses machine learning and AI to serve you content that matches your interests, recognize your face in photos, recommend tag options, identify visuals, and perform various other tasks.
The platform uses AI to detect a face within a complete image in order to create a thumbnail. It utilizes neural networks to work out which section of an image the user would like. Twitter also uses this technology to suggest replies when commenting on a tweet or answering a comment.
This social media platform relies on machine learning and artificial intelligence to predict suitable candidates for a particular job role. Using AI, LinkedIn also highlights the candidates who are actively looking for a new opportunity or are most likely to respond.
There are more than 200 billion pins on Pinterest, and 80% of its users make a purchase based on the personalized content. The platform uses neural networks to show its users content that matches their interests. This means images available on Pinterest are linked by a neural network based on a particular theme.
Apart from these four, other social networking websites are also leveraging AI and machine learning to streamline their processes and deliver an unmatchable user experience.
Benefits of using AI in social media
To recognize images
AI-powered image recognition software and tools help in recognizing various images to understand the change in users’ behavior or pattern. Through complex algorithms, it can go through millions of images to bring out valuable information.
Businesses running over social media can use AI-powered chatbots to answer their customers’ queries in no time. AI-enabled chatbots can efficiently conduct conversations with the consumers and provide them the required answers by understanding the intent of a query. With this, businesses can improve customer experience to a significant level.
Since AI can analyze the nature or intent of a query, comment, or anything else posted by a user, it can help brands identify sentiment and understand how users feel. For this, AI uses another subset known as natural language processing. NLP also helps in finding positive and negative words in a post or comment.
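A very simple way to picture this is a lexicon-based sketch in Python that counts positive and negative words in a post. The word lists and the function name `lexicon_sentiment` are made up for illustration; production sentiment analysis typically relies on trained models rather than fixed word lists.

```python
import re

# Tiny hypothetical lexicons, for illustration only.
POSITIVE_WORDS = {"good", "great", "love", "excellent", "happy"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "awful", "angry"}


def lexicon_sentiment(text: str) -> str:
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(token in POSITIVE_WORDS for token in tokens)
    negative_hits = sum(token in NEGATIVE_WORDS for token in tokens)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"
```

A sketch like this ignores negation, sarcasm, and context, which is part of why learned models are generally preferred in practice.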
AI can help social media platforms to protect the user data and increase the privacy of their information. Through user authentication, pattern detection, fraud prevention, and other features, this technology can help users to improve the security of their social media accounts.
Future of AI in social media
The benefits artificial intelligence brings to social media platforms show that the technology is here to stay. Given the growing number of social media users, it would be no surprise if social media became the biggest marketplace in the future. The technology will help social networking companies deliver a better customer experience and help marketers target the right customers, which will increase the conversion rate and ROI. They will use images to boost the engagement of their targeted audiences and look into their behavior.
How is AI used in social media ads or social media marketing?
AI-powered tools help you to look into your brand’s social media profiles and visitors. Using those tools, you can understand users’ behavior and what they say or post about your brand. This data can further help you to know your global brand equity, recognize new trends, target new audiences based on their interest, and identify new ways of social media promotion.
While social media platforms give individuals and businesses a way to connect with people and targeted audiences, they also allow brands to run paid advertisements based on behavioral targeting and demographics. Having funds is not the only criterion for running a brand advertisement successfully; you need to be creative to get the clicks you want.
There are several AI-based tools that can help you learn how to optimize your ad for conversions and clicks. These tools can also provide other valuable information, such as which language will deliver better results or which words are popular among customers looking for products or services like yours.
AI-enabled tools can also help you to measure the performance of your ad campaign to know whether it is going in the right direction or delivering the desired output or not. AI can also help you to predict market growth and make your ad strategies accordingly. You can increase ROI and get more organic customers for your brand.
Artificial intelligence is helping social media platforms manage their pool of data and make sense of it in order to identify the latest trends, understand user behavior and interests, find and block abusive content, and serve various other purposes. It has a bright future in this industry as it improves user experiences and helps brands serve users better. AI also plays a major role in social media marketing by letting brands measure the performance of the company and identify users who can be converted into customers. To integrate artificial intelligence into your existing social media application or to build a new AI-powered social media app, reach out to a reliable AI development company, or hire AI/ML developers.