4 Key Observability Metrics for Distributed Applications

A common architectural design pattern these days is to break up an application monolith into smaller microservices. Each microservice is then responsible for a specific aspect or feature of your app. For example, one microservice might be responsible for serving external API requests, while another might handle data fetching for your frontend. 

Designing a robust and fail-safe infrastructure in this way can be challenging; monitoring the operations of all these microservices together can be even harder. It’s best not to simply rely on your application logs for an understanding of your systems’ successes and errors. Setting up proper monitoring will provide you with a more complete picture, but it can be difficult to know where to start. In this post, we’ll cover service areas your metrics should focus on to ensure you’re not missing key insights.

We’re going to make a few assumptions about your app setup. Don’t worry—you don’t need to use any specific framework to start tracking metrics. However, it does help to have a general understanding of the components involved. In other words, how you set up your observability tooling matters less than what you track. 

Since a sufficiently large set of microservices requires some level of coordination, we’re going to assume you are using Kubernetes for orchestration. We’re also assuming you have a time series database like Prometheus or InfluxDB for storing your metrics data. You might also need an ingress controller, such as the one Kong provides to control traffic flow, and a service mesh, such as Kuma, to better facilitate connections between services.

Before implementing any monitoring, it’s essential to know how your services actually interact with one another. Writing out a document that identifies which services and features depend on one another and how availability issues would impact them can help you strategize around setting baseline numbers for what constitutes an appropriate threshold. 
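One way to make such a dependency document actionable is to keep it as data. The sketch below is only an illustration: the service names and the shape of the map are hypothetical, and a real inventory would come from your own architecture. It answers the question "if this service becomes unavailable, which others are affected?"

```ruby
# Hypothetical dependency map: each service lists the services it calls.
DEPENDENCIES = {
  'frontend'     => ['api-gateway'],
  'api-gateway'  => ['cart', 'catalog'],
  'cart'         => ['inventory-db'],
  'catalog'      => ['inventory-db'],
  'inventory-db' => []
}.freeze

# Returns every service that directly or indirectly depends on `failed`.
def impacted_by(failed, dependencies)
  impacted = []
  queue = [failed]
  until queue.empty?
    current = queue.shift
    dependencies.each do |service, deps|
      next unless deps.include?(current)
      next if impacted.include?(service)

      impacted << service
      queue << service
    end
  end
  impacted.sort
end

p impacted_by('inventory-db', DEPENDENCIES)
```

Services with a long list of dependents are natural candidates for tighter thresholds.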

You should be able to see data points from two perspectives: Impact Data and Causal Data. Impact Data represents information that identifies who is being impacted. For example, if there’s a service interruption and responses slow down, Impact Data can help identify what percentage of your active users is affected. 
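As a rough illustration of Impact Data, the following sketch computes the share of active users who experienced at least one slow response. The request records, the user_id and duration_ms fields, and the 1,000 ms threshold are all assumptions made for this example, not values from any particular tool.

```ruby
require 'set'

# Hypothetical request samples: one entry per request observed at the edge.
REQUESTS = [
  { user_id: 'u1', duration_ms: 120 },
  { user_id: 'u2', duration_ms: 2400 },
  { user_id: 'u3', duration_ms: 95 },
  { user_id: 'u2', duration_ms: 1800 },
  { user_id: 'u4', duration_ms: 1300 }
].freeze

# Returns the percentage of distinct users who saw at least one slow request.
def affected_user_percentage(requests, slow_threshold_ms: 1000)
  all_users = requests.map { |r| r[:user_id] }.to_set
  return 0.0 if all_users.empty?

  slow_users = requests
               .select { |r| r[:duration_ms] > slow_threshold_ms }
               .map { |r| r[:user_id] }
               .to_set
  (slow_users.size * 100.0 / all_users.size).round(1)
end

puts affected_user_percentage(REQUESTS)
```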

While Impact Data determines who is being affected, Causal Data identifies what is being affected and why. Kong Ingress, which can monitor network activity, can give us insight into Impact Data. Meanwhile, Kuma can collect and report Causal Data. 

Let’s look at a few data sources and explore the differences between Impact Data and Causal Data that can be collected about them.

Latency

Latency is the amount of time it takes between a user performing an action and its final result. For example, if a user adds an item to their shopping cart, the latency would measure the time between the item addition and the moment the user sees a response that indicates its successful addition. If the service responsible for fulfilling this action degraded, the latency would increase, and without an immediate response, the user might wonder whether the site was working at all. 

To properly track latency in an Impact Data context, it’s necessary to follow a single event throughout its entire lifetime. Sticking with our purchasing example, we might expect the full flow of an event to look like the following:

  • The customer clicks the “Add to Cart” button
  • The browser makes a server-side request, initiating the event
  • The server accepts the request
  • A database query ensures that the product is still in stock
  • The database response is parsed, a response is sent to the user, and the event is complete

To successfully follow this sequence, you should standardize on a naming pattern that identifies both what is happening and when it’s happening, such as customer_purchase.initiate, customer_purchase.queried, customer_purchase.finalized, and so on. Depending on your programming language, you might be able to provide a function block or lambda to the metrics service:

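# With common Ruby StatsD clients (for example dogstatsd-ruby), the block
# form is `time`; `timing` expects an explicit duration in milliseconds.
statsd.time('customer_purchase.initiate') do
  # ...
end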

By providing specific keywords, you ought to be able to home in on which segment of the event was slow when a latency issue occurs.
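To show how stage names make the slow segment easy to spot, here is a minimal, self-contained sketch. StageTimer is a hypothetical in-memory stand-in written for this example, not the StatsD API, and the sleep calls merely simulate work.

```ruby
# Minimal in-memory stand-in for a metrics client (hypothetical, not the
# StatsD API): it records how long each named block took, in milliseconds.
class StageTimer
  attr_reader :timings

  def initialize
    @timings = {}
  end

  # Runs the block, stores its duration under `name`, and returns the
  # block's own result.
  def time(name)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @timings[name] = (elapsed * 1000).round(2)
    result
  end

  # Name of the stage with the largest recorded duration (nil if none).
  def slowest
    @timings.max_by { |_name, ms| ms }&.first
  end
end

timer = StageTimer.new
timer.time('customer_purchase.initiate')  { sleep 0.01 }
timer.time('customer_purchase.queried')   { sleep 0.05 } # simulated slow query
timer.time('customer_purchase.finalized') { sleep 0.01 }

puts timer.timings
puts "slowest stage: #{timer.slowest}"
```

Because every stage shares the customer_purchase prefix, the timings can be grouped per event while still being compared stage by stage.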

Tracking latency in a Causal Data context requires you to track the speed of an event between services, not just the actions performed. In practice, this means timing service-to-service requests:

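# `time` is the block-taking call in common Ruby StatsD clients;
# `histogram` expects an explicit value rather than a block.
statsd.time('customer_purchase.initiate') do
  # Time the service-to-service call separately from the overall event
  statsd.time('customer_purchase.external_database_query') do
    # ...
  end
end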

This shouldn’t be limited to capturing the overall endpoint request/response cycles. That sort of latency tracking is too broad and ought to be more granular. Suppose you have a microservice with an endpoint that makes internal database requests. In that case, you might want to record the moment the request was received, how long the query took, the moment the service sent its response, and the moment the originating client received that response. This way, you can pinpoint precisely where time is spent as the services communicate with one another.
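The granular checkpoints described above can be turned into per-segment durations. This is a sketch under stated assumptions: the checkpoint names and the millisecond values are invented for illustration.

```ruby
# Hypothetical checkpoints for one call to a microservice endpoint, in
# milliseconds since the client sent the request.
EVENT = {
  request_received: 12,
  query_started: 15,
  query_finished: 140,
  response_sent: 146,
  client_received: 160
}.freeze

# Turns ordered checkpoints into durations between consecutive checkpoints.
def segment_durations(checkpoints)
  checkpoints.each_cons(2).map do |(from, start_ms), (to, end_ms)|
    ["#{from}->#{to}", end_ms - start_ms]
  end.to_h
end

durations = segment_durations(EVENT)
durations.each { |segment, ms| puts "#{segment}: #{ms}ms" }
```

In this made-up trace, the database query accounts for most of the elapsed time, which a single end-to-end measurement would not reveal.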

Traffic

You want your application to be useful and popular—but an influx of users can be too much of a good thing if you’re not prepared! Changes in site traffic can be difficult to predict. You might be able to serve user load on a day-to-day basis, but events (both expected and unexpected) can have unanticipated consequences. Is your eCommerce site running a weekend promotion? Did your site go viral because of some unexpected praise? Traffic variances can also be affected by geolocation. Perhaps users in Japan are experiencing traffic load in a way that users in France are not. You might think that your systems are working as intended, but all it takes is a massive influx of users to test that belief. If an event takes 200ms to complete, but your system can only process one event at a time, it might not seem like there’s a problem—until the event queue is suddenly clogged up with work.
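The 200ms scenario can be worked through with simple arithmetic. In this sketch the arrival rates (4 and 8 events per second) and the 60-second window are made-up inputs; only the 200ms service time and the single worker come from the example above.

```ruby
# Each event takes 200 ms and only one is processed at a time.
SERVICE_TIME_MS = 200

# Events the system can complete per second.
def capacity_per_second(service_time_ms, workers: 1)
  workers * 1000.0 / service_time_ms
end

# Backlog after `seconds` of sustained arrivals (never negative).
def backlog_after(seconds, arrivals_per_second, service_time_ms, workers: 1)
  growth = arrivals_per_second - capacity_per_second(service_time_ms, workers: workers)
  [growth * seconds, 0].max.round
end

puts capacity_per_second(SERVICE_TIME_MS)    # 5.0 events per second
puts backlog_after(60, 4, SERVICE_TIME_MS)   # 0: arrivals stay below capacity
puts backlog_after(60, 8, SERVICE_TIME_MS)   # 180: queue grows by 3 per second
```

At four events per second nothing looks wrong; at eight, the queue is 180 events deep after one minute even though each event still takes only 200ms.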

Similar to latency, it’s useful to track the number of events being processed throughout the event’s lifecycle to get a sense of any bottlenecks. For example, tracking the number of jobs in a queue, the number of HTTP requests completed per second, and the number of active users are good starting points for monitoring traffic.

For Causal Data, monitoring traffic involves capturing how services transmit information to one another, similar to how we did it for latency. Your monitoring setup ought to track the number of requests to specific services, their response codes, their payload sizes, and so on—as much about the request and response cycle as necessary. When you need to investigate worsening performance, knowing which service is experiencing problems will help you track the possible source much sooner.
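A minimal sketch of that kind of per-service aggregation follows. The log entries and field names are hypothetical; in practice this data would come from your service mesh or metrics store.

```ruby
# Hypothetical service-to-service request log entries.
CALLS = [
  { service: 'cart',    status: 200, bytes: 512 },
  { service: 'cart',    status: 500, bytes: 128 },
  { service: 'catalog', status: 200, bytes: 2048 },
  { service: 'cart',    status: 200, bytes: 640 }
].freeze

# Aggregates request count, status-code counts, and total payload per service.
def traffic_summary(calls)
  calls.group_by { |c| c[:service] }.transform_values do |entries|
    {
      requests: entries.size,
      by_status: entries.each_with_object(Hash.new(0)) { |e, acc| acc[e[:status]] += 1 },
      total_bytes: entries.sum { |e| e[:bytes] }
    }
  end
end

traffic_summary(CALLS).each { |service, stats| puts "#{service}: #{stats}" }
```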

Error Rates

Tracking error rates is rather straightforward. Any 5xx (or even 4xx) issued as an HTTP response by your server should be tagged and counted. Even situations that you’ve accounted for, such as caught exceptions, should be monitored because they still represent a non-ideal state. These issues can act as warnings for deeper problems stemming from defensive coding that doesn’t address actual problems. 
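The counting described above might look like the following sketch. ErrorCounter is a hypothetical in-memory class written for this example; a real setup would send the increments to a metrics backend instead.

```ruby
# Hypothetical in-memory counter for responses and caught exceptions.
class ErrorCounter
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # Tags every response by status class; 4xx and 5xx count as errors.
  def record_response(status)
    @counts[:total] += 1
    @counts[:client_error] += 1 if (400..499).cover?(status)
    @counts[:server_error] += 1 if (500..599).cover?(status)
  end

  # Caught exceptions are still counted, even though the request survived.
  def record_caught(exception)
    @counts["caught.#{exception.class.name}"] += 1
  end

  # Share of responses that were 4xx or 5xx, as a percentage.
  def error_rate
    return 0.0 if @counts[:total].zero?

    errors = @counts[:client_error] + @counts[:server_error]
    (errors * 100.0 / @counts[:total]).round(1)
  end
end

counter = ErrorCounter.new
[200, 200, 404, 500, 200, 503, 200, 200].each { |s| counter.record_response(s) }

begin
  Integer('not-a-number')
rescue ArgumentError => e
  counter.record_caught(e) # handled, but still a non-ideal state
end

puts counter.error_rate
puts counter.counts
```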

Kuma can capture the error codes and messages thrown by your service, but this represents only a portion of actionable data. For example, you can also capture the arguments which caused the error (in case a query was malformed), the database query issued (in case it timed out), the permissions of the acting user (in case they made an unauthorized attempt), and so on. In short, capturing the state of your service at the moment it produces an error can help you replicate the issue in your development and testing environments.
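As an illustration of capturing that state, the sketch below builds a structured report at the moment an exception is rescued. The field names, the sample query, and the permission string are all hypothetical.

```ruby
require 'json'
require 'time'

# Builds a structured error report. All field names here are illustrative.
def error_report(exception, arguments:, query:, user_permissions:)
  {
    error_class: exception.class.name,
    message: exception.message,
    arguments: arguments,
    query: query,
    user_permissions: user_permissions,
    occurred_at: Time.now.utc.iso8601
  }
end

begin
  raise ArgumentError, 'quantity must be positive'
rescue ArgumentError => e
  report = error_report(
    e,
    arguments: { product_id: 42, quantity: -1 },
    query: 'SELECT stock FROM products WHERE id = 42',
    user_permissions: ['cart:write']
  )
  puts JSON.pretty_generate(report)
end
```

A report like this carries enough context to replay the failing call in a development or testing environment.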

Saturation

You should track the memory usage, CPU utilization, disk reads/writes, and available storage of each of your microservices. If your resource usage regularly spikes during certain hours or operations or increases at a steady rate, this suggests you’re overutilizing your server. While your server may be running as expected, once again, an influx of traffic or other unforeseen occurrences can quickly topple it over.
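The two patterns mentioned above, steady growth and spikes, can be checked with a few lines of code. This is a simplified sketch: the hourly memory samples and the 75 percent threshold are invented for illustration.

```ruby
# Hypothetical memory-usage samples (percent of limit), one per hour.
MEMORY_SAMPLES = [52, 55, 59, 64, 70, 77, 85].freeze

# True when every sample is higher than the one before it.
def steadily_increasing?(samples)
  samples.each_cons(2).all? { |previous, current| current > previous }
end

# Indexes of the samples that exceed the given threshold.
def spikes(samples, threshold:)
  samples.each_index.select { |i| samples[i] > threshold }
end

puts steadily_increasing?(MEMORY_SAMPLES)
p spikes(MEMORY_SAMPLES, threshold: 75)
```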

Kong Ingress only monitors network activity, so it’s not ideal for tracking saturation. However, there are many tools available for tracking this with Kubernetes.

Up to now, we’ve discussed the kinds of metrics that will be important to track in your cloud application. Next, let’s dive into some specific steps you can take to implement this monitoring and observability.

Install Prometheus

Prometheus is the go-to standard for monitoring, an open-source system that is easy to install and integrate with your Kubernetes setup. Installation is especially simple if you use Helm.

First, we create a monitoring namespace:

$ kubectl create namespace monitoring

Next, we use Helm to install Prometheus. We make sure to add the Prometheus charts to Helm as well:

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo add stable https://charts.helm.sh/stable
$ helm repo update
$ helm install -f https://bit.ly/2RgzDtg -n monitoring prometheus prometheus-community/prometheus

The values file referenced at https://bit.ly/2RgzDtg sets the data scrape interval for Prometheus to ten seconds.

Enable Prometheus Plugin in Kong

Assuming you are using Kong Ingress Controller (KIC) for Kubernetes, your next step will be to create a custom resource (a KongClusterPlugin resource) that integrates with the KIC. Create a file called prometheus-plugin.yml:

apiVersion: configuration.konghq.com/v1
kind: KongClusterPlugin
metadata:
  name: prometheus
  annotations:
    kubernetes.io/ingress.class: kong
  labels:
    global: "true"
plugin: prometheus

Install Grafana

Grafana is an observability platform that provides excellent dashboards for visualization of data scraped by Prometheus. We use Helm to install Grafana as follows:

$ helm install grafana stable/grafana -n monitoring --values http://bit.ly/2FuFVfV

You can view the bit.ly URL in the above command to see the specific configuration values for Grafana that we provide upon installation.

Enable Port Forwarding

Now that Prometheus and Grafana are up and running in our Kubernetes cluster, we’ll need access to their dashboards. For this article, we’ll set up basic port forwarding to expose those services. This is a simple way to get access, but it is not very secure and is not advisable for production deployments.

$ POD_NAME=$(kubectl get pods --namespace monitoring -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
$ kubectl --namespace monitoring port-forward $POD_NAME 9090 &
$ POD_NAME=$(kubectl get pods --namespace monitoring -l "app.kubernetes.io/instance=grafana" -o jsonpath="{.items[0].metadata.name}")
$ kubectl --namespace monitoring port-forward $POD_NAME 3000 &

The commands above expose the Prometheus server on port 9090 and the Grafana dashboard on port 3000.

Those simple steps should be sufficient to set you off and running. With Kong Ingress Controller and its integrated Prometheus plugin, capturing metrics with Prometheus and visualizing them with Grafana are quick and simple to set up.

Whenever you need to investigate worsening performance, your Impact Data metrics can help orient you on the magnitude of the problem: they should tell you how many people are affected. Likewise, your Causal Data identifies what isn’t working and why. The former points you to the plume of smoke, and the latter takes you to the fire.

In addition to all of the above, you should also consider the rate at which your metrics are changing. For example, say your traffic numbers are increasing. Observing how quickly those numbers are moving can help you determine when (or if) it’ll become a problem. This is essential for managing upcoming work with regular deployments and changes to your services. It also establishes what an ideal performance metric should be.
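A rough way to reason about rate of change is to project the current trend forward. The sketch below assumes linear growth, which real traffic rarely follows exactly; the readings and the limit of 700 requests per second are hypothetical.

```ruby
# Hypothetical requests-per-second readings taken one minute apart.
READINGS = [400, 430, 460, 490].freeze

# Average change per interval between consecutive readings.
def rate_of_change(readings)
  deltas = readings.each_cons(2).map { |a, b| b - a }
  return 0.0 if deltas.empty?

  deltas.sum.to_f / deltas.size
end

# Intervals until `limit` is reached at the current rate; nil if not growing.
def intervals_until(limit, readings)
  rate = rate_of_change(readings)
  return nil unless rate.positive?

  [((limit - readings.last) / rate).ceil, 0].max
end

puts rate_of_change(READINGS)        # 30.0 more requests per second each minute
puts intervals_until(700, READINGS)  # 7 minutes until the limit at this pace
```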

Google wrote an entire book on site reliability, which is a must-read for any developer. If you’re already running Kong alongside your clusters, plugins such as the Prometheus plugin integrate directly with Prometheus, which means less configuration on your part to monitor and store metrics for your services.


PlatoAi. Web3 Reimagined. Data Intelligence Amplified.
Click here to access.

Source: https://hackernoon.com/4-key-observability-metrics-for-distributed-applications-z11337yh?source=rss

Crunchbase

Egyptian ride-sharing company Swvl plans to go public in a $1.5B SPAC merger

Cairo and Dubai-based ride-sharing company Swvl plans to go public in a merger with special purpose acquisition company Queen’s Gambit Growth Capital, Swvl said Tuesday. The deal will see Swvl valued at roughly $1.5 billion.

Swvl was founded by Mostafa Kandil, Mahmoud Nouh and Ahmed Sabbah in 2017. The trio started the company as a bus-hailing service in Egypt and other ride-sharing services in emerging markets with fragmented public transportation.

Its services, mainly bus-hailing, enable users to make intra-state journeys by booking seats on buses running a fixed route. This is pocket-friendly for residents in these markets compared to single-rider options and helps reduce emissions (Swvl claims it has prevented over 240 million pounds of carbon emissions since inception).

After its Egypt launch, Swvl expanded to Kenya, Pakistan, Jordan and Saudi Arabia. The company also moved its headquarters to Dubai as part of its strategy to become a global company.

Swvl offerings have expanded beyond bus-hailing services. Now, the company offers inter-city rides, car ride-sharing, and corporate services across the 10 cities it operates in across Africa and the Middle East.

Queen’s Gambit, the women-led SPAC in charge of the deal, raised $300 million in January and added $45 million via an underwriters’ overallotment option focusing on startups in clean energy, healthcare and mobility sectors.

The statement also mentions a group of investors — Agility, Luxor Capital and Zain Group — which will contribute $100 million through a private investment in public equity, or PIPE.

Per Crunchbase, Swvl has raised over $170 million. From an African perspective, Swvl features as one of the most venture-backed startups on the continent. The company has been touted to reach unicorn status in the past and will do so when this SPAC merger is completed.

The company will aptly trade under the ticker SWVL. The listing will make it the first Egyptian startup to go public outside Egypt and the second to go public after Fawry. It will also make the mobility company the largest African unicorn debut on any U.S.-listed exchange, beating Jumia’s debut of $1.1 billion on the NYSE. Swvl joins music-streaming platform Anghami as the second startup in the Middle East to go public via a SPAC merger.

Swvl had annual gross revenue of $26 million in 2020, according to the statement, and the company expects its annual gross revenue to increase to $79 million this year and $1 billion by 2025 after expanding to 20 countries across five continents.

On why Queen’s Gambit picked Swvl for this deal, Victoria Grace, founder and CEO, said in a statement that the company fit the profile of what she was looking for: “a disruptive platform that solves complex challenges and empowers underserved populations.”

“Having established a leadership position in key emerging markets, we believe Swvl is ready to capitalize on a truly global market opportunity,” she added.

In May, TechCrunch wrote that SPACs didn’t target African startups for several reasons, including a lack of global appeal and private capital and market satisfaction. Judging by Grace’s comments, Swvl has that global appeal and is ready to venture into the public market despite being in operation for just four years.


Source: https://techcrunch.com/2021/07/28/egyptian-ride-sharing-company-swvl-plans-to-go-public-in-a-1-5b-spac-merger/


HRTech

Google will require vaccines as workers return to the office


In the coming weeks, Google will start requiring workers to be vaccinated before coming into the office, the company announced Wednesday. The tech giant is also extending its voluntary work-from-home policy through late October, as the Delta variant of the Covid-19 virus spreads. 

The vaccination requirement will roll out across US offices first and then expand to other regions. The implementation of the policy will vary depending on local conditions and regulations, as well as the availability of the vaccine. 

Google’s decision follows similar vaccination requirements by a growing number of government agencies and entities like the Mayo Clinic. 

Google first closed its offices in March 2020 and previously said its employees would work remotely until at least September 2021. After announcing it would adopt a “hybrid workforce model” that asked employees to spend at least some time in the office, Google in May said that it expects around 20 percent of its employees to work from home permanently. 

Some of the company’s campuses have started reopening. Google said it will give employees at least 30 days’ notice before implementing its full return-to-office plans. 

“It’s encouraging to see very high vaccination rates for our Google community in areas where vaccines are widely available,” CEO Sundar Pichai wrote in an email to employees Wednesday. “This is a big reason why we felt comfortable opening some of our offices to employees who wanted to return early.”


Source: https://www.zdnet.com/article/google-will-require-vaccines-as-workers-return-to-the-office/#ftag=RSSbaffb68


Techcrunch

Score a free month of Extra Crunch with your TC Sessions: SaaS 2021 pass


Whether you’re just starting to build your SaaS empire or you’re further along in your journey, you don’t want to miss TC Sessions: SaaS 2021 on October 27. This day-long virtual event, dedicated to the increasingly sophisticated world of software-as-a-service, features some of the sector’s biggest names, plenty of actionable advice and ample opportunity to network for, well, ample opportunities.

Learn how to scale, how to manage growth — of your business and of the massive amount of data it generates — and how to keep your products and services safe in an increasingly cyber-hostile world. And that’s just for starters.

Bonus Alert: Buy a TC Sessions: SaaS pass and receive a free, one-month subscription to Extra Crunch, our members-only program featuring exclusive daily articles for founders and startup teams.

Extra Crunch membership gives you the inside scoop and helps you stay ahead of the tech, business and investing trends every startup founder needs to know. Since Extra Crunch launched in 2019, we’ve posted more than 2,000 articles.

You’ll have access to exclusive articles on topics like market analysis, growth and fundraising.

Your membership also includes access to our weekly virtual event series, Extra Crunch Live. We hosted more than 40 events during 2020, and we built more interactivity into our 2021 format. We added a bunch of new stuff, too — like Pitch Deck Teardowns. Check out what’s going on with Extra Crunch Live in 2021.

We’re not quite ready to share the TC Sessions: SaaS event agenda, but register for updates and you’ll know when we announce new speakers, add events and offer ticket discounts.

TC Sessions: SaaS 2021 takes place on October 27. Join your global SaaS community to learn, inspire, connect and grow a stronger business. Buy your SaaS pass here and scoop up a free month of Extra Crunch goodness on us.

Is your company interested in sponsoring or exhibiting at TC Sessions: SaaS 2021 – Marketing & Fundraising? Contact our sponsorship sales team by filling out this form.


Source: https://techcrunch.com/2021/07/28/score-a-free-month-of-extra-crunch-with-your-tc-sessions-saas-2021-pass/
