Fivetran snags $565m funding round as Snowflake attempts to eat its lunch with in-house data integration tools



Automated data integration outfit Fivetran has confirmed a $565m funding round – valuing the company at $5.6bn, roughly the GDP of Montenegro.

Meanwhile, the 2013-founded company has used some of its startup capital to buy data replication firm HVR, which employs log-based change data capture (CDC) technology, in a cash-and-stock deal worth around $700m.

The investment arm of web pioneer Marc Andreessen (a16z) led the Fivetran funding round, which also included General Catalyst, CEAS Investments, and Matrix Partners, and takes the total startup capital to $730m to date.

Promising prebuilt connectors, automated schema migration and SQL transformation, Fivetran, which was founded by CEO George Fraser and COO Taylor Brown, counts sportswear manufacturer ASICS, Autodesk, BJ’s Restaurants, DocuSign, and media giant Lionsgate among its customers.

The company claims its technology offers ready-to-use connectors that automatically adapt as schemas and APIs change, ensuring consistent, reliable access to data and saving time for data wranglers tasked with data warehousing and analytics problems.

In a pre-canned statement, Martin Casado, general partner at Andreessen Horowitz, said: “Fivetran solves the complex challenge of scaling and automating data integration better than anyone else, and joining forces with HVR will expand Fivetran’s capabilities to address the performance and security requirements of the enterprise.”

Fivetran has made strides in recent years, hitching its wagon to that other data darling of the day, Databricks, now valued at an incredible $38bn.

Earlier this year, Gartner offered the firm a notable mention as one of the technologies worth considering outside the main cloud providers’ data management tools.

Despite investor enthusiasm, Fivetran faces challenges. Namely that the cloud providers and other cloud-based data warehousing companies could set out to eat its lunch.

For example, Snowflake has launched its own ETL tools, hoping data engineers, data scientists, and developers will use ETL/ELT, data preparation, and feature engineering tools within the Snowflake environment rather than using third-party tools.

At the time, Philip Howard, research director at Bloor Research, named Fivetran and Matillion as companies that might find the move a challenge to their future growth strategies. ®




Put data first when deploying scale-out file storage for accelerated systems



Sponsored It is easy to spend a lot of time thinking about the compute and interconnect in any kind of high performance computing workload – and hard not to spend just as much time thinking about the storage supporting that workload. It is particularly important to think about the type and volume of the data that will feed into these applications because this, more than any other factor, will determine the success or failure of that workload in meeting the needs of the organization.

It is in vogue these days to have a “cloud first” mentality when it comes to IT infrastructure, but what organizations really need is a “data first” attitude, recognizing that cloud is just a deployment model with a pricing scheme and – perhaps – a deeper pool of resources than many organizations are accustomed to. But those deep pools come at a cost. It is fairly cheap to move data into clouds, or to generate it there and keep it there; however, it can be exorbitantly expensive to move data out of a cloud so it can be used elsewhere.

The new classes of HPC applications, such as machine learning training and data analytics running at scale, tend to feed on or create large datasets, so it is important to have this data first attitude as the system is being architected. The one thing you don’t want to do is find out somewhere between proof of concept and production that you have the wrong storage – or worse yet, find out that your storage can’t keep up with the data as a new workload rolls into production and is a wild success.

“When storage hardware is added as a quick fix without a well thought out strategy around current and future requirements, problems will often arise,” Brian Henderson, director of unstructured data storage product marketing at Dell Technologies, says. “Organizations buy some servers, attach some storage, launch the project, and see how it goes. This type of approach very often leads to problems of scale, problems of performance, problems of sharing the data. What these organizations need is a flexible scale-out file storage solution that enables them to contain all of their disparate data and connect all of it so stakeholders and applications can all quickly and easily access and share it.”

So, it is important to consider some key data storage requirements before the compute and networking components are set in stone in a purchase order.

The first thing to consider is scale, and you should assume scale from the get-go and then find a system that can start small but grow large enough to contain the data and serve disparate systems and data types.

It may be possible to rely on internal storage or a hodgepodge of storage attached to systems or clusters, but HPC and AI workloads – more often than not accelerated by GPUs from NVIDIA – quickly outgrow such arrangements. It is best to assume that compute, storage, and networking will all have to scale as workloads and datasets grow and proliferate. There are many different growth vectors to consider, and forgetting any of them can lead to capacity and performance issues down the road.

And there is an even more subtle element to this storage scale issue that should be considered. Data is archived for both HPC and AI systems. HPC applications take small amounts of initial conditions and create a massive simulation and visualization that reveals something about the real world, while AI systems take massive amounts of information – usually a mix of structured and unstructured data – and distill it into a model that can be used to analyze the real world or react to it. These initial datasets and their models must be preserved for business reasons as well as data governance and regulatory compliance.

You can’t throw this data away even if you want to

“You can’t throw this data away even if you want to,” says Thomas Henson, who is global business development manager for AI and analytics for the Unstructured Data Solutions team at Dell Technologies. “No matter what the vertical industry – automotive, healthcare, transportation, financial services – you might find a defect in the algorithms and litigation is an issue. You will have to show the data that was fed into algorithms that produced the defective result or prove that it didn’t. To a certain extent, the value of that algorithm is the data that was fed into it. And that is just one small example.”

So for hybrid CPU-GPU systems, it is probably best to assume that local storage on the machines will not suffice, and that external storage capable of holding lots of unstructured data will be needed. For economic reasons, as AI and some HPC projects are still in proof of concept phases, it will be useful to start out small and be able to scale capacity and performance fast and on independent vectors, if need be.

The PowerScale all-flash arrays running the OneFS file system from Dell Technologies fit this storage profile. The base system comes in a three-node configuration with up to 11 TB of raw storage and a modest price under six figures, and has been tested in the labs up to 250 nodes in a shared storage cluster that can hold up to 96 PB of data. Dell Technologies has customers running PowerScale arrays at a much higher scale than this, by the way, but they often spawn separate clusters to reduce the potential blast radius of an outage – which is extremely rare.

PowerScale can be deployed on-premises or it can be extended into a number of public clouds with multi-cloud or native cloud integrated options where customers can take advantage of additional compute or other native cloud services.

Performance is the other part of scale that companies need to consider, and this is particularly important when the systems are being accelerated by GPUs. Ever since the early days of GPU compute, NVIDIA has worked to get the CPU and its memory out of the way and to keep it from becoming the bottleneck that keeps GPUs from sharing data (GPUDirect) as they run their simulations or build their models or that keeps GPUs from accessing storage lightning fast (GPUDirect Storage).

If external storage is a necessity for such GPU accelerated systems – there is no way servers with four or eight GPUs will have enough storage to hold the datasets that most HPC and AI applications process – then it seems clear that whatever that storage is has to speak GPUDirect Storage and speak it fast.

The previous record holder was Pavilion Data, which tested a 2.2 PB storage array and was able to read data into a DGX-A100 system based on the new “Ampere” A100 GPUs at 191 GB/sec in file mode. In the lab, Dell Technologies is putting the finishing touches on its GPUDirect Storage benchmark tests running on PowerScale arrays and says it can push the performance considerably higher, at least to 252 GB/sec. And since a PowerScale cluster can scale to 252 nodes in a single namespace, throughput doesn’t stop there and can be pushed far beyond that figure if needed.

“The point is, we know how to optimize for these GPU compute environments,” says Henderson.

The breadth of support for various kinds of systems is another thing to consider while architecting a hybrid CPU-GPU system. The very nature of shared storage is to be shared, and it is important to be able to use the data on the shared storage for other applications. The PowerScale arrays have been integrated with over 250 applications and are certified as supported on many kinds of systems. This is one of the reasons that Isilon and PowerScale storage has over 15,000 customers worldwide.

High performance computing is about more than performance, particularly in an enterprise environment where resources are constrained and having control of systems and data is absolutely critical. So the next thing that must be considered in architecting the storage for GPU-accelerated systems is storage management.

Tooled up

On this front, Dell Technologies brings a number of tools to the party. The first is InsightIQ, which does very specific and detailed storage monitoring and reporting for PowerScale and its predecessor, the Isilon storage array.

Another tool is called CloudIQ, which uses machine learning and predictive analytics techniques to monitor and help manage the full range of Dell Technologies infrastructure products, including PowerStore, PowerMax, PowerScale, PowerVault, Unity XT, XtremIO, and SC Series, as well as PowerEdge servers and converged and hyperconverged platforms such as VxBlock, VxRail, and PowerFlex.

And finally, there is DataIQ, storage monitoring and dataset management software for unstructured data which provides a unified view of unstructured datasets across PowerScale, PowerMax, and PowerStore arrays as well as cloud storage from the big public clouds. DataIQ doesn’t just show you the unstructured datasets but also keeps track of how they are used and moves them to the most appropriate storage – for example, on-premises file systems or cloud-based object storage.

The last consideration is reliability and data protection, which go hand in hand in any enterprise-grade storage platform. The PowerScale arrays have their heritage in Isilon and its OneFS file system, which has been trusted in enterprise, government, and academic HPC institutions for two decades. OneFS and its underlying PowerScale hardware are designed to deliver up to 99.9999 percent availability, while most cloud storage services that handle unstructured data are lucky to offer service agreements for 99.9 percent availability. The former allows for about 31 seconds of downtime a year, while the latter allows for eight hours and 46 minutes.
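The gap between those two availability figures is easy to quantify: allowable annual downtime is simply the fraction of the year the service may be unavailable. A quick sketch of the arithmetic in Python (the function name here is illustrative, not from any vendor tooling):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds

def annual_downtime_seconds(availability_pct: float) -> float:
    """Maximum downtime per year implied by an availability percentage."""
    return SECONDS_PER_YEAR * (1 - availability_pct / 100)

# "Six nines" vs "three nines":
six_nines = annual_downtime_seconds(99.9999)  # ~31.5 seconds a year
three_nines = annual_downtime_seconds(99.9)   # 31,536 seconds, about 8 hours 46 minutes
```

Each extra "nine" cuts the allowable downtime by a factor of ten, which is why the jump from 99.9 to 99.9999 percent is the difference between most of a working day offline and half a minute.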

Moreover, PowerScale is designed to give good performance and maintain data access even if some of the nodes in the storage cluster are down for maintenance or repairing themselves after a component failure. (Component failures are unavoidable for all IT equipment, after all.)

But there is another kind of resiliency that is becoming increasingly important these days: recovery from ransomware attacks.

“We have API-integrated ransomware protection for PowerScale that will detect suspicious behavior on the OneFS file system and alert administrators about it,” says Henderson. “And a lot of our customers are implementing a physically separate, air-gapped cluster setup to maintain a separate copy of all of their data. In the event of a cyberattack, you just shut down the production storage and you have your data, and you are not trying to restore from backups or archives, which could take days or weeks – particularly if you are restoring from cloud archives. Once you are talking about petabytes of data, that could take months.

“We can restore quickly, at storage replication speeds, which is very, very fast. And you have options to host your ransomware defender solution in multi-cloud environments where you can recover your data from a cyber event leveraging a public cloud.”
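Henderson's "months" warning is plain transfer arithmetic: restore time is dataset size divided by sustained throughput. A rough sketch (the 100 MB/s archive-retrieval rate below is an illustrative assumption, not a measured figure):

```python
def restore_time_days(dataset_bytes: float, throughput_bytes_per_sec: float) -> float:
    """Naive restore-time estimate: size divided by sustained throughput."""
    seconds = dataset_bytes / throughput_bytes_per_sec
    return seconds / 86400  # 86,400 seconds per day

PB = 10**15  # one petabyte (decimal)

# Pulling 1 PB back from a cloud archive at a sustained 100 MB/s:
days = restore_time_days(1 * PB, 100 * 10**6)  # roughly 116 days
```

At that rate a single petabyte takes nearly four months to retrieve, before accounting for egress fees or retrieval throttling – which is precisely the scenario the air-gapped local copy is meant to avoid.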

Sponsored by Dell.





Fatal Attraction: Lovely collection, really, but it does not belong anywhere near magnetic storage media



On Call Friday brings the promise of a weekend free from the work laptop but likely shackled to the personal laptop instead. Kick off your two days of downtime with another tale from those brave enough to be On Call.

Today’s story comes from a reader Regomised as “Elliot” who was spending the mid-noughties gainfully employed as third-line support and general server boffin at an insurance company.

Such was his enthusiasm for his job that he referred to the users demanding assistance as “customers” rather than the inconveniences we know they tend to be.

On the day in question, one of these customers called into IT support. The complaint was relatively run of the mill – the user’s Word documents were being corrupted.

The unusual twist here was that the problem was only occurring on the user’s local machine. Documents saved to the network were fine. Local disk, though? Corruption city.

Elliot, being third-line support, was saved from having to make an actual visit to the customer’s desk. Instead, an engineer was sent out. The simplest solution was to swap out the hard disk and reimage the machine, and the user was good to go.

For a while.

A few days later the same user made the same complaint. Documents were getting corrupted. Important insurance business was not being done. What was IT going to do about it?

“A different engineer was sent,” Elliot told us. “And he replaced the PC…”

Sorted? Er, no. It took a few short days before the request for help came in once again. This time the boss wheeled out the big guns and sent in third-line support.

Experience is all. As is the ability to observe one’s surroundings.

“I saw the cause of the file corruption the moment I opened her office door,” he told us.

“She’d plastered the computer case with fridge magnets.”

Inspirational or amusing quotes and holiday knick-knacks might make for handy ways of pinning to-do lists and the artwork of one’s poppets to a domestic appliance. They do not, however, belong anywhere near magnetic media. Sure, a fridge magnet by itself would be unlikely to destroy a drive, but the odd bit of mystery data corruption? Absolutely!

A clearance of the clutter made the problem go away.

“I found it a trifle worrying,” mused Elliot, “that two ‘engineers’ hadn’t made the mental connection between magnets and magnetic media.

“Perhaps it was because they were too young to remember cassette tapes.”

Age, experience, and cunning always trump youthful enthusiasm. Ever been called out to an insoluble problem and fixed it in seconds because you’d seen it all before? Or had to explain to a user that the save icon wasn’t actually the back of a bus? Share your tale with an email to On Call. ®





What’s not to like about Microsoft 365? Only its data-protection tooling



Sponsored Microsoft 365 has truly cemented its position at the center of the corporate productivity universe over the last year and a half.

The platform gives most corporate users pretty much everything they need to do their jobs, whether in the office or remotely, both in terms of collaboration tools and for producing the data and content that is the lifeblood of any organisation.

The problem is that while 365’s communication and document management capabilities are unquestionable, its native security and data protection tools leave much to be desired. And as valuable corporate data is increasingly created and lives in 365, nagging concerns for data specialists can quickly develop into a full-on nightmare.

So, what to do? Well, we think you’ll sleep a little easier by checking out our upcoming webcast, Protect your Microsoft 365, which you can catch on-demand from October 21.

Our own highly collaborative Tim Phillips will be joined by Greg Carter from Rubrik to discuss best practices around protecting Microsoft 365.

They’ll point out exactly where the platform’s native tools fall short of the sort of protection enterprises – indeed all businesses – really need, and explain what holistic protection really means. They’ll also talk through the benefits of centralized management and automated protection and show what that looks like in practice.

Whether your Microsoft 365 estate is large or small, you’ll come away with a much better idea of how to keep your users – and their precious data – safe and sound.

All you need to do is head here, drop in a few details, and we’ll let you know when the webcast is ready to watch on-demand, whatever timezone you’re in. In the meantime … try to get some sleep.

Sponsored by Rubrik





Surrey County Council faces £700k additional SAP support fees as £30m Unit4 ERP set to miss go-live target



Facing a £47.1m budget shortfall, Surrey County Council has been forced to delay the implementation of a £30m Unit4 ERP system, incurring additional annual support fees of £700,000 on its ageing SAP system.

Unit4 was awarded the contract in autumn of 2020 following a competitive tender process to replace a SAP R/3 system first rolled out in 2004 by Capgemini. The plan was to go live with the new cloud-based SaaS system by 1 December 2021, with implementation partners Embridge Consulting and Proactis on the project.

In the business case justifying the project [PDF], the council noted the risks attached to its timeline.

It noted the council would need to “provide notice to SAP by September 2021 to end support & maintenance contract by January 2022.”

“There is a risk that the completion of the implementation phase slips into Q4 2021/22, which will result in the council incurring the full cost of SAP support & maintenance for 2022 (£700k), which becomes due in January 2022.”

In a statement to The Register, the council admitted the implementation of the new system had been delayed until April 2022. “We are having positive ongoing conversations with SAP about support beyond 31 December,” it said.

“The decision was made by the Digital Business and Insights Strategic Programme Board to revise the date in order to maximise the opportunity to ensure the safe and smooth delivery of the new system, give more time to get everyone trained and go-live with the full-set of improvements.”

Not only will the council incur additional support fees payable to the German vendor, it will presumably also have to wait for the imagined benefits of the new system. The business case says it is looking forward to “new efficiencies realised from a reduction in annual running costs of £77k in 2022/23 and rising to £327k in 2023/24.”

Also listed in the outline business case are requirements for other resources outside the main contract, which amount to £394,000 for consultancy support on business requirements and an internal project manager. It also allocates £177,000 for Andrew Richards, interim digital business and insights programme manager, who trades as A C Richards Consulting. The fee covers the years 2019/20 and 2020/21.

The council has been asked if these contracts will be extended into 2021/22 given the delay to the project. ®


