Interactive Exploration and Analysis of Scientific Datasets

Click to learn more about author Martyna Pawletta.

The availability of scientific datasets in Google BigQuery opens new possibilities for the exploration and analysis of public life sciences data. The Google Cloud Platform (GCP) provides a place where SQL queries can be created easily and intuitively to explore huge datasets extremely fast. Here we present a practical example of how you can work effectively with datasets stored in BigQuery using the open-source KNIME Analytics Platform.

In this blog post, we will cover a use case relevant for life sciences research. We will focus on answering some questions from the area of pharmaceutical research by linking and querying different datasets stored in BigQuery.

But don’t worry: Even if you’re not a life science expert, you still might find it useful to see how easy it can be to connect to BigQuery, construct complex queries without needing to write SQL, and explore the results of the queries using KNIME Analytics Platform.

SciWalker Open Data

This example was inspired by the SciWalker Open Data sets that were added to Google BigQuery and announced at the American Chemical Society meeting in San Diego this year. You can find the abstract in the Chemical Information Bulletin, pages 86-87, here.

SciWalker is a comprehensive resource that contains chemistry-related data like molecules, nucleotides, and peptide sequences (overall 211 million unique molecules) that are linked to additional scientific information. The datasets also include clinical and drug-related data with links to different ontologies that allow us to compare data coming from different data sources using different wording.

  • Set up a BigQuery Account first! You’ll find a detailed description on how to set up your BigQuery account in this blog article by Emilio Silvestri.

Once your BigQuery account is configured, you can create your first query using the KNIME database nodes, as demonstrated in the short example below. These nodes let you build SQL queries visually, without needing to write SQL yourself (although you can add SQL if you want or need to).

  • To learn more about nodes provided for databases, check out the KNIME Hub, where you’ll also find more example workflows.
  • Additionally, you will find documentation, the Database Extension Guide, here.

Selecting and Downloading Data

In the short workflow below, we select data from two tables: one contains general information about clinical trials, and the other contains references to literature that has been linked to those clinical trials. The tables are joined on the nct_id column using the DB Joiner node, then reduced with the DB Column Filter node to selected columns such as IDs, title, study phase, and the PubMed ID from the reference table. Additionally, we group the data by nct_id and count how many PubMed references have been registered per study.

In the last step, the DB Reader node is used in order to execute the query and download the data into a table.

Fig. 1: The workflow to select data from two tables: One contains general information about clinical trials and the other references to literature that has been linked to those clinical trials.
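For readers who like to see the SQL behind a visual workflow, here is a minimal sketch of the kind of query the join, column filter, and grouping steps describe. It runs against an in-memory SQLite database rather than BigQuery so that it stays self-contained. Apart from nct_id, every table name, column name, and value is a made-up placeholder, and the SQL that the database nodes actually generate may look different.

```python
import sqlite3

# Hypothetical stand-ins for the two BigQuery tables.
# Only the nct_id column is named in the text; everything else is assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE studies (nct_id TEXT PRIMARY KEY, title TEXT, phase TEXT);
    CREATE TABLE study_references (nct_id TEXT, pmid TEXT);
    INSERT INTO studies VALUES
        ('NCT-A', 'Toy study A', 'Phase 2'),
        ('NCT-B', 'Toy study B', 'Phase 3');
    INSERT INTO study_references VALUES
        ('NCT-A', 'PMID-1'), ('NCT-A', 'PMID-2'), ('NCT-B', 'PMID-3');
""")

# Join on nct_id, keep a few columns, and count references per study.
QUERY = """
    SELECT s.nct_id, s.title, s.phase, COUNT(r.pmid) AS reference_count
    FROM studies AS s
    JOIN study_references AS r ON s.nct_id = r.nct_id
    GROUP BY s.nct_id, s.title, s.phase
    ORDER BY s.nct_id
"""

# Executing the query and fetching the rows plays the role of the last,
# reading step of the workflow.
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With the toy rows above, the first study ends up with two references and the second with one.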

Time to Play

Now that you’ve connected to a BigQuery resource and queried it with the database nodes, we will demonstrate how to interactively explore the data in a few simple steps. In each step you can use an interactive view to select the data you’re interested in, which are then used to create further queries and pull the matching data from BigQuery – and all this without writing code!

Fig. 2: The workflow “Explore Scientific Data Stored on BigQuery.”

Step 1

In the very first step of our exploration journey, we retrieve a list of diseases that are included in the clinical data (clinicaltrials.gov) datasets and standardized according to the disease ontology that is part of the SciWalker data collection. We then use this list to create an autocomplete menu, which we can use to select the disease we want to investigate further. For example, here we will investigate schizophrenia.

Step 2

Selecting a disease brings us – after some data querying, joining, wrangling, and preprocessing – to the next step, where we can explore compounds that have been registered for clinical studies on schizophrenia. We calculate some chemical properties and merge the data with additional information about the clinical trial. In a second table, PubMed references from each study are visible.

To make the view even more interactive, we added web links to the study and reference IDs that will bring you directly to the web pages describing those studies/references.

Let’s select “methotrexate” here, which is known as a chemotherapy agent and immune system suppressant, and see what happens in the next step.

Fig. 3: Interactive view, with additional web links to the study and reference IDs that bring you directly to the web pages describing those studies/references.

Step 3

Here we once again take advantage of the ontologies available in SciWalker. 

The view below shows which chemical classes “methotrexate” belongs to, along with how many other compounds from each of those chemical classes have been registered for clinical studies. One class has to be selected here to move on to the next step. We selected “pteridines,” which seems to be not that popular (with only 21 compounds registered for clinical studies). In the next step, let’s check which 21 compounds those are and for which diseases the studies have been conducted.

Fig. 4: View showing which chemical classes “methotrexate” belongs to, plus how many other compounds from each of those chemical classes have been registered for clinical studies.

Step 4

This view shows a tag cloud with disease and condition names for which studies have been registered for compounds in the selected compound class (here: pteridines). When you select a disease from the tag cloud, the list of compounds in the selected class that are associated with that disease is displayed in the table below.

When we select “Rheumatoid arthritis,” we see that within the class of pteridines three compounds are linked. We see that methotrexate has been tested for schizophrenia and rheumatoid arthritis.

Fig. 5: View showing a tag cloud with disease and condition names for which studies have been registered for compounds in the selected compound class.

Step 5

The last view shows all compounds found in the clinical trials dataset that have been tested for both schizophrenia and rheumatoid arthritis. If you are curious which compounds those are, check out the workflow on the Hub here.
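Conceptually, this last step is a set intersection: collect the compounds tested for each disease, then keep those that appear in both groups. The sketch below illustrates the idea with placeholder compound names. The data and the helper function `compounds_tested_for_all` are hypothetical and are not taken from the workflow or the clinical trials dataset.

```python
from collections import defaultdict

# Toy (compound, disease) pairs with placeholder names; not real trial data.
trials = [
    ("compound_a", "schizophrenia"),
    ("compound_a", "rheumatoid arthritis"),
    ("compound_b", "schizophrenia"),
    ("compound_c", "rheumatoid arthritis"),
]


def compounds_tested_for_all(pairs, diseases):
    """Return the sorted compounds that appear with every disease in `diseases`."""
    by_disease = defaultdict(set)
    for compound, disease in pairs:
        by_disease[disease].add(compound)
    # A disease with no trials contributes an empty set, so the result is empty.
    sets = [by_disease[d] for d in diseases]
    return sorted(set.intersection(*sets)) if sets else []


shared = compounds_tested_for_all(trials, ["schizophrenia", "rheumatoid arthritis"])
print(shared)
```

In the toy data, only compound_a has been tested for both conditions, so it is the only entry in the result.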

Prerequisites to run the example:

  • BigQuery account
  • Simba Driver
  • KNIME Analytics Platform (4.1)
  • KNIME Big Data Extension
  • KNIME Community Extensions – Cheminformatics (including RDKit)

Wrapping Up

In this blog post, we highlighted how to interactively explore and analyze scientific data using Google BigQuery and KNIME Analytics Platform together. We showed that combining these two tools allows us to take advantage of the breadth of data available in BigQuery using the interactive query construction, data analysis, and visualization capabilities in the Analytics Platform. Maybe this sparks further ideas or questions or even allows you to create new hypotheses? 

Though we’ve focused on life sciences data here, the combination of an analytics platform and Google BigQuery can be applied in many different fields, so feel free to give it a try no matter what your use case or industry!

If this makes you curious, start playing with the workflow demonstrated today or look for other examples here on the Hub. 

If you want to explore and do more experiments using freely available scientific datasets on Google BigQuery, check out the Marketplace. There is a lot more data to explore! 

Source: https://www.dataversity.net/interactive-exploration-and-analysis-of-scientific-datasets/

Optimal Dynamics nabs $22M for AI-powered freight logistics

Optimal Dynamics, a New York-based startup applying AI to shipping logistics, today announced that it closed an $18.4 million round led by Bessemer Venture Partners. Optimal Dynamics says that the funds will be used to more than triple its 25-person team and support engineering efforts, as well as bolster sales and marketing departments.

Last-mile delivery logistics tends to be the most expensive and time-consuming part of the shipping process. According to one estimate, last-mile accounts for 53% of total shipping costs and 41% of total supply chain costs. With the rise of ecommerce in the U.S., retail providers are increasingly focusing on fulfilment and distribution at the lowest cost. Particularly in the construction industry, the pandemic continues to disrupt wholesalers — a 2020 Statista survey found that 73% of buyers and users of freight transportation and logistics services experienced an impact on their operations.

Founded in 2016, Optimal Dynamics offers a platform that taps AI to generate shipment plans likely to be profitable — and on time. The fruit of nearly 40 years of R&D at Princeton, the company’s product generates simulations for freight transportation, enabling logistics companies to answer questions about what equipment they should buy, how many drivers they need, daily dispatching, load acceptance, and more.

Simulating logistics

Roughly 80% of all cargo in the U.S. is transported by the 7.1 million people who drive flatbed trailers, dry vans, and other heavy lifters for the country’s 1.3 million trucking companies. The trucking industry generates $726 billion in revenue annually and is forecast to grow 75% by 2026. Even before the pandemic, last-mile delivery was fast becoming the most profitable part of the supply chain, with research firm Capgemini pegging its share of the pie at 41%.

Optimal Dynamics’ platform can perform strategic, tactical, and real-time freight planning, forecasting shipment events as far as two weeks in advance. CEO Daniel Powell, who cofounded the company with his father, Warren Powell, a Princeton professor of operations research and financial engineering, says that the underlying technology was deployed, tested, and iterated with trucking companies, railroads, and energy companies, along with projects in health, ecommerce, finance, and materials science.

“Use of something called ‘high-dimensional AI’ allows us to take in exponentially greater detail while planning under uncertainty. We also leverage clever methods that allow us to deploy robust AI systems even when we have very little training data, a common issue in the logistics industry,” Powell told VentureBeat via email. “The results are … a dramatic increase in companies’ abilities to plan into the future.”

The global logistics market was worth $10.32 billion in 2017 and is estimated to grow to $12.68 billion by 2023, according to Research and Markets. Optimal Dynamics competes with Uber, which offers a logistics service called Uber Freight. San Francisco-based startup KeepTruckin recently secured $149 million to further develop its shipment marketplace. Next Trucking closed a $97 million investment. And Convoy raised $400 million at a $2.75 billion valuation to make freight trucking more efficient.

But Mike Droesch, a partner at BVP and an investor in the 25-employee Optimal Dynamics, says that demand remains strong for the company’s products. “Logistics operators need to consider a staggering number of variables, making this an ideal application for a software-as-a-service product that can help operators make more informed decisions by leveraging Optimal Dynamics’ industry-leading technology. We were really impressed with the combination of their deep technology and the commercial impact that Optimal Dynamics is already delivering to their customers,” he said in a statement.

With the latest funding round, a series A, Optimal Dynamics has raised over $22 million to date. Beyond Bessemer, Fusion Fund, The Westly Group, TenOneTen Ventures, Embark Ventures, FitzGate Ventures, John Larkin, and John Hess also contributed.

VentureBeat

VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative technology and transact. Our site delivers essential information on data technologies and strategies to guide you as you lead your organizations. We invite you to become a member of our community, to access:

  • up-to-date information on the subjects of interest to you
  • our newsletters
  • gated thought-leader content and discounted access to our prized events, such as Transform 2021: Learn More
  • networking features, and more

Become a member

Source: https://venturebeat.com/2021/05/13/optimal-dynamics-nabs-22m-for-ai-powered-freight-logistics/

Code-scanning platform BluBracket nabs $12M for enterprise security

Code security startup BluBracket today announced it has raised $12 million in a series A round led by Evolution Equity Partners. The capital will be used to further develop BluBracket’s products and grow its sales team.

Detecting exploits in source code can be a pain point for enterprises, especially with the onset of containerization, infrastructure as code, and microservices. According to a recent Flexera report, the number of vulnerabilities remotely exploitable in apps reached more than 13,300 from 249 vendors in 2020. In 2019, Barracuda Networks found that 13% of security pros hadn’t patched their web apps over the past 12 months. And in a 2020 survey from Edgescan, organizations said it took them an average of just over 50 days to address critical vulnerabilities in internet-facing apps.

BluBracket, which was founded in 2019 and is headquartered in Palo Alto, California, scans codebases for secrets and blocks future commits from introducing new risks. The platform can monitor real-time risk scores across codebases, git configurations, infrastructure as code, code copies, and code access and resolve issues, detecting passwords and over 50 different types of tokens, keys, and IDs.

Code-scanning automation

Coralogix estimates that developers create 70 bugs per 1,000 lines of code and that fixing a bug takes 30 times longer than writing a line of code. In the U.S., companies spend $113 billion annually on identifying and fixing product defects.

BluBracket attempts to prevent this by proactively monitoring public repositories with the highest risk factors, generating reports for dev teams. It prioritizes commits based on their risk scores, minimizing duplicates using a tracking hash for every secret. A rules engine reduces false positives and scans for regular expressions, as well as sensitive words. And BluBracket sanitizes commit history both locally and remotely, supporting the exporting of reports via download or email.
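To make the regular-expression scanning and hash-based deduplication described above more concrete, here is a minimal sketch of the general technique. It is not BluBracket's implementation: the two patterns, the `tok_` token format, the `scan` function, and the sample lines are all hypothetical, and a production scanner would use far more rules plus the risk scoring the article mentions.

```python
import hashlib
import re

# Hypothetical patterns for illustration; real scanners ship far larger rule sets.
PATTERNS = {
    "password_assignment": re.compile(
        r"password\s*=\s*['\"]([^'\"]+)['\"]", re.IGNORECASE
    ),
    "generic_token": re.compile(r"\b(tok_[A-Za-z0-9]{16})\b"),
}


def scan(lines):
    """Return (kind, line_number, short_hash) for each distinct secret found."""
    seen = set()  # tracking hashes of secrets that were already reported
    findings = []
    for lineno, line in enumerate(lines, start=1):
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                secret = match.group(1)
                # Store a hash rather than the secret itself.
                digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
                if digest in seen:
                    continue  # duplicate of an earlier finding
                seen.add(digest)
                findings.append((kind, lineno, digest[:12]))
    return findings


sample = [
    'db_password = "hunter2-example"',
    'token = "tok_abcdefgh12345678"',
    'PASSWORD = "hunter2-example"',
]
findings = scan(sample)
for finding in findings:
    print(finding)
```

The third sample line repeats the secret from the first, so the tracking hash suppresses it and only two findings are reported.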

BluBracket offers a free product in its Community Edition. Both it and the company’s paid products, Teams and Enterprise, work with GitHub, Bitbucket, and GitLab and offer CI/CD integration with Jenkins, GitHub Actions, and Azure Pipelines.

Above: The Community Edition of BluBracket’s software.

Image Credit: BluBracket

“Since our introduction early last year, the industry has seen through SolarWinds how big of an attack surface code is. Hackers are exploiting credentials and secrets in code, and valuable code is available in the public domain for virtually every company we engage with,” CEO Prakash Linga, who cofounded BluBracket with Ajay Arora, told VentureBeat via email.

BluBracket competes on some fronts with Sourcegraph, a “universal code search” platform that enables developer teams to manage and glean insights from their codebase. It has another rival in Amazon’s CodeGuru, an AI-powered developer tool that provides recommendations for improving code quality. There’s also cloud monitoring platform Datadog, codebase coverage tester Codecov, and feature-piloting solution LaunchDarkly, to name a few.

But BluBracket, which has about 30 employees, says demand for its code security solutions has increased “dramatically” since 2020. Its security products are being used in “dozens” of companies with “thousands” of users, according to Linga.

“DevSecOps and AppSec teams are scrambling, as we all know, to address this growing threat. By enabling their developers to keep these secrets out of code in the first place, our solutions make everyone’s life easier,” Linga continued. “We are excited to work with Evolution on this next stage of our company’s growth.”

Unusual Ventures, Point72 Ventures, SignalFire, and Firebolt Ventures also participated in BluBracket’s latest funding round. The startup had previously raised $6.5 million in a seed round led by Unusual Ventures.

Source: https://venturebeat.com/2021/05/13/code-scanning-platform-blubracket-nabs-12m-for-enterprise-security/

Data governance and security startup Cyral raises $26M

Data security and governance startup Cyral today announced it has raised $26 million, bringing its total to date to $41.1 million. The company plans to put the funds toward expanding its platform and global workforce.

Managing and securing data remains a challenge for enterprises. Just 29% of IT executives give their employees an “A” grade for following procedures to keep files and documents secure, according to Egnyte’s most recent survey. A separate report from KPMG found only 35% of C-suite leaders highly trust their organization’s use of data and analytics, with 92% saying they were concerned about the reputational risk of machine-assisted decisions.

Redwood City, California-based Cyral, which was founded in 2018 by Manav Mital and Srini Vadlamani, uses stateless interception technology to deliver enterprise data governance across platforms, including Amazon S3, Snowflake, Kafka, MongoDB, and Oracle. Cyral monitors activity across popular databases, pipelines, and data warehouses, whether on-premises, hosted, or software-as-a-service-based. And it traces data flows and requests, sending output logs, traces, and metrics to third-party infrastructure and management dashboards.

Cyral can prevent unauthorized access from users, apps, and tools and provide dynamic attribute-based access control, as well as ephemeral access with “just-enough” privileges. The platform supports both alerting and blocking of disallowed accesses and continuously monitors privileges across clouds, tracking and enforcing just-in-time and just-enough privileges for all users and apps.

Identifying roles and anomalies

Beyond this, Cyral can identify users behind shared roles and service accounts to tag all activity with the actual user identity, enabling policies to be specified against them. And it can perform baselining and anomaly detection, analyzing aggregated activity across data endpoints and generating policies for normal activity, which can be set to alert or block anomalous access.

“Cyral is built on a high-performance stateless interception technology that monitors all data endpoint activity in real time and enables unified visibility, identity federation, and granular access controls. [The platform] automates workflows and enables collaboration between DevOps and Security teams to automate assurance and prevent data leakage,” a spokesperson said.

Existing investors, including Redpoint, Costanoa Ventures, A.Capital, and strategic investor Silicon Valley CISO Investments, participated in Cyral’s latest funding round. Since launching in Q2 2020, Cyral — which has 40 employees and occupies a market estimated to be worth $5.7 billion by 2025, according to Markets and Markets — says it has nearly doubled the size of its team and close to quadrupled its valuation.

“This is an emerging market with no entrenched solutions … We’re now working with customers across a variety of industries — finance, health care, insurance, supply chain, technology, and more. They include some of the world’s largest organizations with complex environments and some of the fastest-growing tech companies,” the spokesperson said. “With Cyral, our company was built during the pandemic. We have grown the majority of our company during this time, and it has allowed us to start our company with a remote-first business model.”

Source: https://venturebeat.com/2021/05/13/data-governance-and-security-startup-cyral-raises-26m/
