Big Data

NoSQL for Beginners


NoSQL can offer an advantage to those entering Data Science and Analytics, as well as to those building applications with high-performance needs that aren’t met by traditional SQL databases.


By Alex Williams, Hosting Data UK

What is NoSQL?

 
NoSQL is essentially the response to SQL’s rigid structure. Although non-relational databases have existed for decades, NoSQL didn’t really take off until the late 2000s, when Amazon and Google both put substantial research and development into it. Since then, it has become an integral part of the modern web, with many of the world’s biggest websites using some form of NoSQL.

So what is NoSQL exactly? Essentially, it is an approach to building databases that neither requires a schema nor stores data in a relational model. There are a variety of NoSQL databases to pick from, each with its own specialization and use cases. As such, NoSQL is incredibly diverse when it comes to filling niches, and you can almost certainly find a NoSQL data model to fit your needs.


Differences Between SQL and NoSQL

 
While SQL refers to a specific language and a well-defined relational model, NoSQL doesn’t, but that doesn’t mean we can’t compare the general philosophies and differences between the two.

Scalability

 
When it comes to SQL, the only real way to scale is to upgrade vertically. That means that you need to buy higher-end and more expensive gear than you already have if you want better performance. Scaling with NoSQL is done through horizontal expansion, so all you really need to do is throw in another shard and you’re basically done.

This means that NoSQL is absolutely great for applications that are likely to grow in the future where hardware may very well be a substantial roadblock.
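
To make the horizontal scaling idea concrete, here is a minimal sketch, in plain Python rather than any particular database, of hash-based sharding: each record’s key decides which shard holds it, so adding capacity means adding shards. The shard names and routing function are hypothetical.

```python
import hashlib

# Hypothetical shard (node) names; adding capacity means appending here.
SHARDS = ["shard-0", "shard-1", "shard-2"]

def shard_for(key: str, shards=SHARDS) -> str:
    """Route a record key to one of the available shards."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

print(shard_for("user:1001"))  # the same key always lands on the same shard
```

Note that naive modulo routing reshuffles most keys whenever the shard count changes; real NoSQL systems typically use consistent hashing or range partitioning to avoid that.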

Schema

 
SQL is built from the ground up essentially to avoid data duplication. That ultimately means that any SQL project requires an expert designer to spend a long time on the schema before implementing it. This step is important in the long run for maintaining data quality, but it can also get quite expensive.

On the other hand, since NoSQL doesn’t require a schema, you avoid the expense and time of that initial design stage. Furthermore, the lack of a schema means that most NoSQL databases are incredibly flexible, allowing you to change or even mix data types and models. This makes administering and working with the database much easier.
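
As a rough illustration of that flexibility, here is a minimal sketch using plain Python structures (no particular database) in which records in the same collection have different shapes, something a fixed SQL schema would not allow without altering tables. The field names are made up for the example.

```python
# Records in one "collection" with different shapes.
products = [
    {"_id": 1, "name": "T-shirt", "sizes": ["S", "M", "L"], "price": 9.99},
    {"_id": 2, "name": "E-book", "file_format": "epub", "price": 4.99},
    {"_id": 3, "name": "Mug", "price": 6.50, "tags": ["kitchen", "gift"]},
]

# A query simply ignores fields a given record doesn't have.
cheap = [p["name"] for p in products if p["price"] < 7]
print(cheap)  # ['E-book', 'Mug']
```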

Performance

 
With SQL, querying data often means joining across multiple tables. With NoSQL, related data is typically kept together in a single structure, so querying is much simpler. A side effect is that NoSQL is much better at dealing with high-performance tasks. For example, Amazon DynamoDB can handle millions of queries a second, which is pretty useful for a global store like Amazon.
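
To illustrate the difference, here is a hypothetical sketch of the same read, an order together with its line items, first as a normalized SQL join and then as a single denormalized document fetched in one lookup. The table, key, and field names are invented for the example.

```python
# Normalized: the read needs a join across two tables.
sql_read = """
SELECT o.id, o.customer, i.sku, i.qty
FROM orders o
JOIN order_items i ON i.order_id = o.id
WHERE o.id = 42;
"""

# Denormalized: the whole order travels together, so the read is one lookup.
orders = {
    "order:42": {
        "customer": "alice",
        "items": [{"sku": "A-100", "qty": 2}, {"sku": "B-7", "qty": 1}],
    }
}
order = orders["order:42"]  # single fetch, no join
print(order["items"])
```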

Support

 
The only really big downside of NoSQL is that it isn’t as established as SQL. Keep in mind that SQL has a head start of several decades, and that maturity shows in how easily information can be found. Similarly, finding an expert in SQL is much easier than finding one for NoSQL.

Finally, since NoSQL isn’t a single database or language but several dozen data models, that granularity further divides any potential expertise and makes NoSQL a matter of specialization. As a result, you will mostly rely on smaller, specialized communities for information and support.

Types of NoSQL Databases

 
While there are over half a dozen NoSQL data models to choose from, we’ll cover the four main ones that are essential to understanding NoSQL databases.


Document Stores

This data model lets you store information as self-contained documents, commonly in formats such as JSON or XML, rather than forcing it into a fixed relational structure. In SQL, semi-structured data like this has to be mapped onto tables first, which ties the data to the schema and can make queries less efficient. Since NoSQL doesn’t use a schema, there’s no need for relational data storage, and no need to tie the two together.

In fact, there is a NoSQL data model that is XML specific, if you want to go that route.
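
As a minimal sketch of the document idea, using plain Python and the standard json module rather than any specific document database, each record below is a self-contained document whose nested fields come back intact on a read.

```python
import json

doc = {
    "type": "article",
    "title": "NoSQL for Beginners",
    "author": {"name": "Alex Williams", "site": "Hosting Data UK"},
    "tags": ["nosql", "databases"],
}

stored = json.dumps(doc)         # roughly what a document store persists
loaded = json.loads(stored)      # and hands back on a read
print(loaded["author"]["name"])  # nested fields survive the round trip
```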

 
Graph

Graph or network data models are built around the idea that the relationships between pieces of data are just as important as the data itself. In this model, information is stored as nodes and relationships: nodes hold the data, and relationships (edges) describe how any two nodes are connected.

As the name might suggest, and if you’ve been following along, this is an excellent data model for exploring how pieces of information relate to one another. Being able to quickly visualize connections across disparate sets of data can offer a massive amount of insight without poring over several hundred pages of records.
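
Here is a minimal sketch of the node-and-relationship idea in plain Python; real graph databases add indexing and a query language such as Cypher or Gremlin on top of this, and the node and relationship names below are invented.

```python
# Nodes hold the data; edges describe how two nodes are related.
nodes = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "acme": {"kind": "company"},
}
edges = [
    ("alice", "WORKS_AT", "acme"),
    ("bob", "WORKS_AT", "acme"),
    ("alice", "KNOWS", "bob"),
]

def neighbours(node, relation):
    """Follow one relationship type outward from a node."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]

print(neighbours("alice", "KNOWS"))  # ['bob']
```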

 
Key-Value Store

As the name suggests, this data model stores information as keys that point to specific values. Since both keys and values can be any piece of data you desire, key-value stores are quite versatile. They are purpose-built for retrieving, storing, and managing associative arrays, and are perfect for high-volume applications.

In fact, Amazon DynamoDB is a key-value store, and this type of data model was pioneered by Amazon itself. Key-value stores are also a general category under which other data models fall, with some types of graph data models essentially functioning like key-value stores.
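
A minimal in-memory sketch of the key-value idea, not tied to any particular product: opaque keys map to arbitrary values, and every operation is a single-key get or put. The keys and values below are made up for the example.

```python
store = {}

def put(key, value):
    store[key] = value

def get(key, default=None):
    return store.get(key, default)

# Keys and values can be whatever the application needs.
put("session:9f2c", {"user_id": 1001, "cart": ["sku-1", "sku-2"]})
put("counter:page_views", 12873)

print(get("session:9f2c")["cart"])  # ['sku-1', 'sku-2']
```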

 
Column-oriented

Whereas SQL traditionally stores data in rows, column-oriented databases store it in columns. Columns are grouped into families, each of which can hold a nearly unlimited number of columns. Writing and reading are done by column as well, so the whole system is very efficient and is built for fast search and access, as well as data aggregation.

That being said, it isn’t that great for complex querying.
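
To make the row-versus-column contrast concrete, here is a minimal sketch in plain Python: the same records laid out by row and then by column, where an aggregation over one column only has to touch that column’s values.

```python
row_oriented = [
    {"id": 1, "city": "London", "amount": 40},
    {"id": 2, "city": "Leeds",  "amount": 25},
    {"id": 3, "city": "London", "amount": 60},
]

column_oriented = {
    "id":     [1, 2, 3],
    "city":   ["London", "Leeds", "London"],
    "amount": [40, 25, 60],
}

# Summing one column reads only that column, not every row.
print(sum(column_oriented["amount"]))  # 125
```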

Conclusion

 
One thing to remember is that NoSQL is not meant as a replacement for SQL so much as a supplement to it. NoSQL mostly uses specialized databases to fill the gaps SQL leaves, and while you absolutely can go without SQL if you so choose, NoSQL does not preclude using it. You may very well find yourself using both SQL and NoSQL.

 
Bio: Alex Williams is a seasoned full-stack developer and the owner of Hosting Data UK. After graduating from the University of London, majoring in IT, Alex worked as a developer leading various projects for clients from all over the world for almost 10 years. Recently, Alex switched to being an independent IT consultant and started his own blog. There, he explores web development, data management, digital marketing, and solutions for online business owners just starting out.


Source: https://www.kdnuggets.com/2020/12/nosql-beginners.html

Big Data

California’s Proposition 24 Confirms the Fate of Data Privacy


By Kyle McNabb

The rolling thunder of data regulations rumbles on — much to the dismay of companies and the delight of consumers. The latest rainmaker (or taker) is California’s Proposition 24. This consumer privacy ballot initiative, containing the Consumer Privacy Rights Act (CPRA), was passed on November 3, 2020, establishing a new standard for data privacy in the state. The CPRA builds on the California Consumer Privacy Act (CCPA), addressing its predecessor’s shortcomings and expediting California’s legislation on data privacy.

While Proposition 24 has been nicknamed CCPA 2.0, it is much more than another drop in the regulatory bucket. It will enforce new requirements that companies must take note of and prepare for — both with their compliance strategies and long-term approach to data privacy, which is clearly here to stay.

What Does Proposition 24 Mean for Data Privacy?

There is a key difference between the CCPA, which just became enforceable months ago, and Proposition 24 (and the CPRA). Proposition 24 will become a state law as written, not legislatively enacted, which means it can’t be amended without more voter action, like another ballot initiative. Why does this matter?

The passing of Proposition 24 in California is further proof that consumers want a say in how they are tracked on the internet and how their data is used by companies. They feel so strongly about these rights that they’ve already improved upon the CCPA and ensured these improvements were more legislatively permanent. That’s telling. Proposition 24 represents more than a surge in regulations: it embodies an awakening of the modern consumer.

With a greater burden placed on businesses to stay on top of cybersecurity audits and risk assessments, it’s increasingly important that they have a handle on how much data lives within their organization, how sensitive it is, and how much risk is involved in their handling of that data.

How Does Proposition 24 Change the CCPA?

The new legislation will ultimately strengthen and give new teeth to the existing CCPA by creating new privacy rights for consumers, obligations for businesses, and enforcement mechanisms through a new state agency. Under Proposition 24, consumers gain the right to:

  1. Correct personal information
  2. Know the length of data retention
  3. Opt out of advertisers using precise geolocation
  4. Restrict usage of sensitive personal information

While the new legislation does roll back requirements on companies to respond to individual data requests and provide full data reports, other laws still require businesses to provide individuals with information about how their data is used. In other words, companies shouldn’t be thinking about relaxing any data privacy and security efforts they have in place. Instead, businesses should look out for four big changes from Proposition 24:

  1. It defines a new category of “sensitive personal information,” which is broader and stricter than just “personal information.” For instance, new stipulations include tripling penalties for violations concerning consumers younger than 16 years old.
  2. It creates a new state agency: the California Privacy Protection Agency (CPPA), the first of its kind in the United States. The CPPA will have full administrative power and oversight for enforcement, including audits.
  3. It prohibits precise geolocation tracking to a location within roughly 250 acres. To accommodate this change, companies will have to adjust their data collection processes.
  4. It allows consumers to limit the use and disclosure of sensitive personal information based on the broader category.

The key here is that the legislation still gives consumers data rights they didn’t have previously, and companies will need to actively make changes to their data collection practices.

How Should Companies Prepare for Proposition 24?

While the new legislation won’t go into effect until the start of 2023, consumers’ right to access their personal information will extend back to data collected by companies on or after January 1, 2022. That gives businesses just a year to prepare for these massive changes, so it’s critical they begin their preparations now. State-specific legislation like this will also drive data privacy regulation to go national. To prepare for the future, businesses must invest in tools that make it easier to protect the privacy of consumers’ information and govern that information in compliance with regulations.

Organizations need to build trust with their data — knowing where it lives, where it came from, and who has touched it. For many companies, trust begins with building an automated “as is” data inventory, which collects metadata from sources inside and outside the business. Proposition 24, like other data privacy regulations, requires that companies can quickly locate all sensitive personal information to respond to data consumer requests or opt-outs. A data inventory automates the scanning and identification of sensitive personal data across the entire organization — giving companies a full view of the information they have and where it is.
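
As a rough, hypothetical sketch of that scanning step, the snippet below flags records that appear to contain sensitive personal information using two illustrative patterns; real data inventory tools cover far more data types, sources, and formats, and the record IDs and patterns here are invented.

```python
import re

# Two illustrative detectors; production tools use many more.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_record(record_id: str, text: str) -> dict:
    """Report which sensitive-data patterns appear in a record."""
    hits = [name for name, pattern in PATTERNS.items() if pattern.search(text)]
    return {"record": record_id, "sensitive_fields": hits}

print(scan_record("crm-001", "Contact jane.doe@example.com, SSN 123-45-6789"))
# {'record': 'crm-001', 'sensitive_fields': ['email', 'us_ssn']}
```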

That said, data intelligence is not enough for compliance alone — companies also need visibility into where sensitive personal information resides within their documents, content, and records, too. This is a major roadblock for companies. Most businesses lack the ability both to find sensitive information within content and to associate that information with a specific person — and it’s only getting worse with remote work and content sprawl. Companies must operationalize privacy compliance in order to adhere to consumer requests around their data. They need a governance strategy that can locate personal information anywhere in the enterprise. Having solutions with capabilities such as rules-based retention, redaction, and auditability of access makes this process much easier, especially when responding to consumer questions/requests.

By implementing a privacy-aware information management strategy, for both structured and unstructured data, organizations can understand their entire ecosystem. Heading into 2021, it will be increasingly important to proactively seek out dark data, tackle compliance, and prepare for current and future data privacy regulations like Proposition 24.

It’s no longer enough to simply manage data and content. As the GDPR, CCPA, and now CPRA have shown, data privacy regulations will only keep coming, and they will be increasingly targeted, intentional, and perhaps even stricter. Companies outside of California, or the EU for that matter, must resist the urge to turn a blind eye while they are not the direct subjects of data regulations. Because while data privacy laws may sound like distant thunder today, the lightning is on its way.

Source: https://www.dataversity.net/californias-proposition-24-confirms-the-fate-of-data-privacy/


AI

Three Reasons the Technical Talent Gap Isn’t to Blame for Failing AI Projects


By David Talby

A shortage of technical talent has long been a challenge for getting AI projects off the ground. While research shows that this may still be the case, it’s not the end-all-be-all and certainly not the only reason so many AI initiatives are doomed from the start.

Deloitte’s recent State of AI in the Enterprise survey found the type of talent most in-demand — AI developers and engineers, AI researchers, and data scientists — was fairly consistent across all levels of AI proficiency. However, business leaders, domain experts, and project managers fell lower on the list. While there’s no disputing that technical talent is valuable and necessary, the lack of attention on the latter titles should be a bigger part of the conversation.

It’s likely that the technical skills gap will persist for the next few years, as university programs play catch up to real-world applications of AI, and organizations implement internal training or opt for outsourcing entirely. That doesn’t mean businesses can wait for these problems to solve themselves or for the talent pool to grow. In order to avoid being one of the 85 percent of AI projects that fail to deliver on their intended promises, there are three areas organizations can focus on to give their projects a fighting chance.

1. Organizational Buy-In: AI-Driven Product, Revenue, and Customer Success

Understanding how AI will work within a professional and product environment and how it translates to a better customer experience and new revenue opportunities is critical — and that spans far beyond the IT team. Being able to train and deploy accurate AI models doesn’t address the question of how to most effectively use them to help your customers. Doing this requires educating all organizational disciplines — sales, marketing, product, design, legal, customer success — on why this is useful and how it will impact their job function.

When done well, new capabilities unlocked by AI enable product teams to completely rethink the user experience. It’s the difference between adding Netflix or Spotify recommendations as a side feature versus designing the user interface around content discovery. More aspirationally, it’s the difference between adding a lane departure alert to your new car versus building a self-driving vehicle that doesn’t have pedals or wheels. Cross-functional collaboration and buy-in on AI projects are a vital part of their success and scaling, and should be a priority from the get-go.

2. Realistic Expectations: The Lab vs. the Real World

We’re at an exciting juncture for AI development, and it’s easy to get caught up in the “new shiny object” mentality. While eagerness to implement new AI-enabled efficiencies is a good thing, jumping in before setting expectations is a sure-fire way to end up disappointed. A real instance of the challenges organizations face when implementing and scaling AI projects comes from a recent Google Research paper about a new deep learning model used to detect diabetic retinopathy from images of patients’ eyes. Diabetic retinopathy, when untreated, causes blindness, but if detected early, it can often be prevented. In response, scientists trained a deep learning model to identify early stages of the disease in order to accelerate detection and prevention.

Google had access to advanced machines for model training and data from environments that followed proper protocols for testing. So, while the technology itself was as accurate as, if not more accurate than, human specialists, this didn’t matter when applied to clinics in rural Thailand. There, the quality of the machines, the lighting in the rooms, and patients’ willingness to participate, for a host of reasons, were quite different from the conditions the model was trained on. The lack of appropriate infrastructure and understanding of practical limitations is a prime example of the discord between Data Science success and business success.

3. The Right Foundation: Tools and Processes to Operate Safely

Successful AI products and services require applied skills in three layers. First, data scientists must be available, productively tooled, and have domain expertise and access to relevant data. While issues such as bias prevention, explainability, and concept drift are becoming well understood, many teams are still struggling with this first layer of technical problems. Second, organizations must learn how to deploy and operate AI models in production. This requires DevOps, SecOps, and newly emerging “AI Ops” tools and processes to be put in place, so models continue working accurately in production over time. Third, product managers and business leaders must be involved from the start in order to redesign how new technical capabilities will be applied to make customers and end-users successful.

There’s been tremendous progress in education and tooling over the past five years, but it’s still early days for operating AI models in production. Unfortunately, design and product management are far behind and are becoming one of the most common barriers to AI success. This is why it might be time for respondents of the aforementioned Deloitte survey to start putting overall business success and organizational buy-in before finding the top technical talent to lead the way. The antidote is investing in hands-on education and training, and fortunately, from the classroom to technical training courses, these are becoming more widely available.

Although a relatively new technology, AI has the power to change how we work and live for the better. That said, like any technology, AI success hinges on proper training, education, buy-in, and well-understood expectations and business value. Aligning all of these factors takes time, so be patient, and be sure to have a strategy in place to ensure your AI efforts deliver.

Source: https://www.dataversity.net/three-reasons-the-technical-talent-gap-isnt-to-blame-for-failing-ai-projects/


Big Data

Traveling in the Age of COVID-19: Big Data Is Watching


By Bernard Brode

With news of the first dose of a vaccine successfully administered, it appears that we might finally be seeing the beginning of the end of the COVID-19 pandemic. However, it’s also clear that the impact of the virus, and the ways we have responded to it, will last for many years, long after the health and economic effects have faded.

Those of us who work in technology have been aware of this for some time, of course. Back at the beginning of the pandemic, we were warning that the security of medical devices might become a very real problem this year. Similarly, we warned that the use of big data to fight the pandemic ran the risk of setting a problematic precedent when it came to the right to personal privacy.

We are now living with the consequences of that decision. Traveling today means greater privacy intrusion than ever before, and we have the pandemic to blame for that. In this article, we’ll look at how we ended up in this position and how we can avoid this becoming the new normal.

Beating the Virus with Big Data

Most of the mainstream analyses of the way that technology has been leveraged to fight the COVID-19 virus have focused on the expansion of data acquisition systems. This was the focus, for instance, of an April article in the New York Times, which set the tone for most of the reporting on the apparent tension between personal privacy and public health surveillance.

That article noted that many countries around the world — from Italy to Israel — have begun to harvest geolocation data from their citizens’ smartphones in order to track their movements. This move was certainly unprecedented and represented a radical expansion of a nation state’s ability to keep track of citizens. In terms of fighting the pandemic, however, it was less than useful.

To understand why, it’s instructive to reflect on this article in HealthITAnalytics, also from April 2020. The interview is with James Hendler, the Tetherless World Professor of Computer, Web, and Cognitive Science at Rensselaer Polytechnic Institute (RPI) and Director of the Rensselaer Institute for Data Exploration and Applications (IDEA). He told the magazine that fighting the virus was not merely a question of being able to collect data; rather, the bottleneck was in being able to manipulate and analyze it in a way that would produce actionable insights.

In other words, Hendler pointed out, fighting the virus is “a big data problem,” and one where “artificial intelligence can play a big role.” And with more than 4.5 billion people already online by the end of 2020, our ability to process and secure this data lags significantly behind our ability to collect it.

Privacy Concerns

This central insight — that analyzing the data produced by large-scale surveillance networks required the deployment of big data tools — is likely to have a remarkable impact on the way that we travel in the next few years.

The biggest impact, for most of us, will be an expansion of the kind of “intelligent” systems that are used to make personalized recommendations for products and services to buy. Several of the companies who run such engines were keen to offer their expertise to public health researchers early in the pandemic. Amazon Web Services, Google Cloud, and others have all offered researchers free access to open datasets and analytics tools to help them develop COVID-19 solutions faster.

Many travelers — indeed, many citizens — should be worried about that. As we noted early this year, asking whether big data can save us from the virus was never really the issue — it was clear that this kind of analysis would be of great utility from a public health perspective. The problem was what would happen to this data after the pandemic and what kind of precedent this surveillance would set.

In other words, most people were happy to have their movements tracked in order to beat the virus, but will governments ever stop tracking us? Or will they merely sell this information to advertising companies?

The New Normal?

Consumers are, of course, aware of these issues. Every time there is an expansion in the surveillance infrastructure used by the state and by advertisers, we see a simultaneous rise in search interest related to online privacy tools intended to prevent this kind of tracking.

However, consumers can only go so far when it comes to protecting themselves and their privacy. Ultimately, in order to prevent our every flight, drive, and even walk from being tracked, we will need to build a legal framework that matches the sophistication of the networks used to collect this information.

There are promising signs that this is happening. STAT’s Casey Ross recently wrote about a number of initiatives that seek to put an inherent limit on governmental ability to share location data outside of specific circumstances — such as a global pandemic.

However, most analysts also agree that there is a glaring inconsistency in arguments that try to limit governments’ abilities to track their citizens: many citizens who claim to worry about the privacy implications are happy to share their location data with private companies that operate under far less stringent protocols and legislation.

As Jack Dunn recently put it on the IAPP website, how can we reasonably evaluate the costs and benefits of Google or Facebook sharing location data with the federal government when it has been perfectly legal for Walgreens to share access to customer data with pharmaceutical advertisers? How does aggregating and anonymizing data safeguard privacy when a user’s personal data can be revealed through other data points?

The Future

This, unfortunately, is the reality of traveling today: even if the government is not tracking your movements, there are plenty of apps on your phone that probably are. Thus, as it did in many other ways, the pandemic has exacerbated existing issues with the way we approach technology rather than representing a totally unprecedented event.

Not that this makes moving forward after the pandemic any easier, of course. But we should recognize that the issues with big data, and with data acquisition more generally, go much deeper than just the past year.

Source: https://www.dataversity.net/traveling-in-the-age-of-covid-19-big-data-is-watching/


Big Data

Slides: Moving from a Relational Model to NoSQL




About the Webinar

Businesses are quickly moving to NoSQL databases to power their modern applications. However, a technology migration involves risk, especially if you have to change your data model. What if you could host a relatively unmodified RDBMS schema on your NoSQL database, then optimize it over time?

We’ll show you how Couchbase makes it easy to:

  • Use SQL for JSON to query your data and create joins (see the sketch after this list)
  • Optimize indexes and perform HashMap queries
  • Build applications and analysis with NoSQL
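
As a hypothetical illustration of the first bullet, the query below sketches what a SQL-for-JSON (N1QL) lookup join between order and customer documents might look like; the bucket, field, and type names are invented for the example rather than taken from the webinar.

```python
# An N1QL-style lookup join, shown as a plain string for illustration only.
n1ql_join = """
SELECT o.order_id, o.total, c.name
FROM `shop` o
JOIN `shop` c ON KEYS o.customer_key
WHERE o.type = "order" AND c.type = "customer";
"""
print(n1ql_join)
```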

About the Speaker

Matthew Groves

Senior Product Marketing Manager, Couchbase

Matthew D. Groves is a guy who loves to code. It doesn’t matter if it’s C#, jQuery, or PHP: He’ll submit pull requests for anything. He has been coding professionally ever since he wrote a QuickBASIC point-of-sale app for his parents’ pizza shop back in the ’90s. He currently works as a Developer Advocate for Couchbase. His free time is spent with his family, watching the Reds, and getting involved in the developer community. He is the author of AOP in .NET (published by Manning), a Pluralsight author, and a Microsoft MVP.

Source: https://www.dataversity.net/slides-moving-from-a-relational-model-to-nosql-2/
