
# Predicting qualification ranking based on practice session performance for Formula 1 Grand Prix


If you’re a Formula 1 (F1) fan, have you ever wondered why F1 teams perform so differently between qualifying and practice sessions? Why do they have multiple practice sessions in the first place? Can practice session results actually tell us something about the upcoming qualifying session? In this post, we answer these questions and more. We show you how we can predict qualifying results based on practice session performances by harnessing the power of data and machine learning (ML). These predictions are being integrated into the new “Qualifying Pace” insight for each F1 Grand Prix (GP). This work is part of the continuous collaboration between F1 and the Amazon ML Solutions Lab to generate new F1 Insights powered by AWS.

Each F1 GP consists of several stages. The event starts with three practice sessions (P1, P2, and P3), followed by a qualifying (Q) session, and then the final race. Teams approach practice and qualifying sessions differently because these sessions serve different purposes. The practice sessions are the teams’ opportunities to test out strategies and tire compounds to gather critical data in preparation for the final race. They observe the car’s performance with different strategies and tire compounds, and use this to determine their overall race strategy.

In contrast, qualifying sessions determine the starting position of each driver on race day. Teams focus solely on obtaining the fastest lap time. Because of this shift in tactics, Friday and Saturday practice session results often fail to accurately predict the qualifying order.

In this post, we introduce deterministic and probabilistic methods to model the time difference between the fastest lap time in the practice sessions and in the qualifying session (∆t = t_q − t_p). The goal is to more accurately predict the upcoming qualifying standings based on the practice sessions.

## Error sources of ∆t

The delta of the fastest lap time between practice and qualifying sessions (∆t) comes primarily from variations in fuel level and tire grip.

A higher fuel level adds weight to the car and reduces the speed of the car. For practice sessions, teams vary the fuel level as they please. For the second practice session (P2), it’s common to begin with a low fuel level and run with more fuel in the latter part of the session. During qualifying, teams use minimal fuel levels in order to record the fastest lap time. The impact of fuel on lap time varies from circuit to circuit, depending on how many straights the circuit has and how long these straights are.

Tires also play a significant role in an F1 car’s performance. During each GP event, the tire supplier brings various tire types with varying compounds suitable for different racing conditions. Two of these are for wet circuit conditions: intermediate tires for light standing water and wet tires for heavy standing water. The remaining dry running tires can be categorized into three compound types: hard, medium, and soft. These tire compounds provide different grips to the circuit surface. The more grip the tire provides, the faster the car can run.

Past racing results showed that car performance dropped significantly when wet tires were used. For example, in the 2018 Italy GP, because the P1 session was wet and the qualifying session was dry, the fastest lap time in P1 was more than 10 seconds slower than in the qualifying session.

Among the dry running types, the hard tire provides the least grip but is the most durable, whereas the soft tire has the most grip but is the least durable. Tires degrade over the course of a race, which reduces the tire grip and slows down the car. Track temperature and moisture affect the progression of degradation, which in turn changes the tire grip. As in the case with fuel level, tire impact on lap time changes from circuit to circuit.

## Data and attempted approaches

Given this understanding of factors that can impact lap time, we can use fuel level and tire grip data to estimate the final qualifying lap time based on known practice session performance. However, as of this writing, data records to directly infer fuel level and tire grip during the race are not available. Therefore, we take an alternative approach with data we can currently obtain.

The data we used in the modeling were records of fastest lap times for each GP since 1950 and partial years of weather data for the corresponding sessions. The lap time data included the fastest lap time for each session (P1, P2, P3, and Q) of each GP, along with the driver, car and team, and circuit name (publicly available on F1’s website). Track wetness and temperature for each corresponding session was available in the weather data.

We explored two implicit methods with the following model inputs: the team and driver name, and the circuit name. Method one was a rule-based empirical model that attributed the observed ∆t to circuits and teams. We estimated the latent parameter values (fuel level and tire grip differences specific to each team and circuit) based on their known lap time sensitivities. These sensitivities were provided by F1 and calculated through simulation runs on each circuit track. Method two was a regression model with driver and circuit indicators. The regression model learned the sensitivity of ∆t for each driver on each circuit without explicitly knowing the fuel level and tire grip exerted. We developed and compared deterministic models using XGBoost and AutoGluon, and probabilistic models using PyMC3.

We built models using race data from 2014 to 2019, and tested against race data from 2020. We excluded data from before 2014 because there were significant car development and regulation changes over the years. We removed races in which either the practice or qualifying session was wet because ∆t for those sessions were considered outliers.
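The filtering and split described above can be sketched with pandas; the column names and values here are illustrative placeholders, not the actual dataset schema:

```python
import pandas as pd

# Hypothetical lap-time records; schema and values are made up for illustration.
laps = pd.DataFrame({
    "year":        [2013, 2015, 2018, 2019, 2020, 2020],
    "grand_prix":  ["AUS", "AUS", "ITA", "HUN", "AUT", "HUN"],
    "session_wet": [False, False, True, False, False, True],
    "delta_t":     [0.90, 0.45, 10.20, 0.38, 0.41, 7.90],
})

# Drop pre-2014 seasons (major car and regulation changes) and wet sessions,
# whose delta_t values are outliers.
usable = laps[(laps["year"] >= 2014) & ~laps["session_wet"]]

# Train on 2014-2019, hold out 2020 for testing.
train = usable[usable["year"] <= 2019]
test = usable[usable["year"] == 2020]
```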

## Managed model training with Amazon SageMaker

We trained our regression models on Amazon SageMaker.

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy ML models quickly. Specifically for model training, it provides many features to assist with the process.

For our use case, we explored multiple iterations on the choices of model feature sets and hyperparameters. Recording and comparing the model metrics of interest was critical to choosing the most suitable model. The Amazon SageMaker API allowed us to define custom metrics before launching a model training job and to easily retrieve them after the job completed. Using the automatic model tuning feature reduced the mean squared error (MSE) on the test data by 45% compared to the default hyperparameter choice.

We trained an XGBoost model using Amazon SageMaker’s built-in implementation, which let us run model training through a general estimator interface. This approach provided better logging, superior hyperparameter validation, and a larger set of metrics than the original implementation.

## Rule-based model

In the rule-based approach, we reason that the differences of lap times ∆t primarily come from systematic variations of tire grip for each circuit and fuel level for each team between practice and qualifying sessions. After accounting for these known variations, we assume the residuals are small random numbers with a mean of zero. ∆t can be modeled with the following equation:

∆t = ∆t_f(c) · f(t, c) + ∆t_g(c) · g(c) + ε

∆t_f(c) and ∆t_g(c) are the known sensitivities of lap time to fuel mass and tire grip, and ε is the residual. A hierarchy exists among the factors contained in the equation. We assume grip variations for each circuit (g(c)) are at the top level. Under each circuit, there are variations of fuel level across teams (f(t, c)).

To further simplify the model, we neglect ε because we assume it is small. We further assume fuel variation for each team is the same across all circuits (that is, f(t, c) = f(t)). We can simplify the model to the following:

∆t = ∆t_f(c) · f(t) + ∆t_g(c) · g(c)

Because the sensitivities ∆t_f(c) and ∆t_g(c) are known, we can estimate the team fuel variations f(t) and the circuit grip variations g(c) from the data.
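Under these assumptions, estimating f(t) and g(c) reduces to a linear least-squares problem. The sketch below uses made-up sensitivities and latent values rather than the F1-provided simulation numbers; note that without pinning one team or circuit to a reference level, the absolute fuel and grip values are only identified up to a trade-off, although the fitted ∆t values are unique:

```python
import numpy as np

# Assumed per-circuit sensitivities (illustrative, not the F1 simulation values).
teams, circuits = ["T1", "T2"], ["C1", "C2"]
sens_fuel = {"C1": 0.030, "C2": 0.025}   # s of lap time per kg of fuel
sens_grip = {"C1": 0.8, "C2": 1.4}       # s of lap time per unit of grip

# "True" latent values, used here only to simulate observed delta_t.
f_true = {"T1": 10.0, "T2": 20.0}        # fuel-mass variation per team (kg)
g_true = {"C1": 0.5, "C2": 0.3}          # grip variation per circuit

rows, y = [], []
for ti, t in enumerate(teams):
    for ci, c in enumerate(circuits):
        row = np.zeros(len(teams) + len(circuits))
        row[ti] = sens_fuel[c]               # coefficient on f(t)
        row[len(teams) + ci] = sens_grip[c]  # coefficient on g(c)
        rows.append(row)
        y.append(sens_fuel[c] * f_true[t] + sens_grip[c] * g_true[c])

A, y = np.array(rows), np.array(y)
# Minimum-norm least-squares solution of delta_t = dt_f(c)*f(t) + dt_g(c)*g(c).
est, *_ = np.linalg.lstsq(A, y, rcond=None)
f_est, g_est = est[: len(teams)], est[len(teams):]
```

The fitted model reproduces the observed ∆t exactly, even though the individual latent values need an anchoring constraint to be unique.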

The differences in the sensitivities depend on the characteristics of circuits. From the following track maps, we can observe that the Italian GP circuit has fewer corner turns and the straight sections are longer compared to the Singapore GP circuit. Additional tire grip gives a larger advantage in the Singapore GP circuit.

## ML regression model

For the ML regression method, we don’t directly model the relation between ∆t and the fuel level and grip variations. Instead, we fit the following regression model with just the circuit, team, and driver indicator variables:

∆t = Σ_c β_c I_c + Σ_t β_t I_t + Σ_d β_d I_d + ε

I_c, I_t, and I_d represent the indicator variables for circuits, teams, and drivers, each with a learned coefficient.
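A minimal version of this indicator regression, with placeholder circuit, team, and driver labels, can be fit with ordinary least squares:

```python
import numpy as np
import pandas as pd

# Toy observations; all labels and delta_t values are made up.
obs = pd.DataFrame({
    "circuit": ["C1", "C1", "C2", "C2"],
    "team":    ["T1", "T2", "T1", "T2"],
    "driver":  ["D1", "D2", "D1", "D2"],
    "delta_t": [0.50, 0.60, 0.25, 0.35],
})

# One-hot indicator variables I_c, I_t, I_d for circuits, teams, and drivers.
X = pd.get_dummies(obs[["circuit", "team", "driver"]]).to_numpy(dtype=float)
y = obs["delta_t"].to_numpy()

# Ordinary least squares on the indicators.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ coef
```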

## Hierarchical Bayesian model

Another challenge with modeling the race pace was due to noisy measurements in lap times. The magnitude of random effect (ϵ) of ∆t could be non-negligible. Such randomness might come from drivers’ accidental drift from their normal practice at the turns or random variations of drivers’ efforts during practice sessions. With deterministic approaches, such random effect wasn’t appropriately captured. Ideally, we wanted a model that could quantify uncertainty about the predictions. Therefore, we explored Bayesian sampling methods.

With a hierarchical Bayesian model, we account for the hierarchical structure of the error sources. As with the rule-based model, we assume grip variations for each circuit (g(c)) are at the top level. The additional benefit of a hierarchical Bayesian model is that it incorporates individual-level variations when estimating group-level coefficients. It’s a middle ground between two extreme views of data. One extreme is to pool data for every group (circuit and driver) without considering the intrinsic variations among groups. The other extreme is to train a regression model for each circuit or driver. With 21 circuits, this amounts to 21 regression models. With a hierarchical model, we have a single model that considers the variations simultaneously at the group and individual level.

We can mathematically describe the underlying statistical model for the hierarchical Bayesian approach as the following varying intercepts model:

∆t_i = μ_j[i]k[i] + β_p · w_p,i + β_q · w_q,i + β_T · ∆T_i + ε_i,  with μ_jk ~ N(θ_k, σ_μ²)

Here, i represents the index of each data observation, j represents the index of each driver, and k represents the index of each circuit. μ_jk represents the varying intercept for each driver under each circuit, and θ_k represents the varying intercept for each circuit. w_p and w_q represent the wetness level of the track during the practice and qualifying sessions, and ∆T represents the track temperature difference.
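The generative structure of this model can be sketched by simulating from it; all coefficients and variances below are assumed illustrative values, not fitted posteriors:

```python
import numpy as np

rng = np.random.default_rng(0)
n_drivers, n_circuits = 4, 3

# Group level: one intercept theta_k per circuit.
theta = rng.normal(0.5, 0.2, size=n_circuits)
# Individual level: driver-within-circuit intercepts mu_jk, centered on the
# circuit-level intercept theta_k (the "varying intercepts" structure).
mu = rng.normal(theta[None, :], 0.1, size=(n_drivers, n_circuits))

# Assumed fixed-effect weights for practice/qualifying wetness and temperature delta.
beta_wp, beta_wq, beta_dT = 0.8, -0.6, 0.05

def simulate_delta_t(j, k, w_p, w_q, dT, noise=0.05):
    """Draw one delta_t observation for driver j on circuit k."""
    return mu[j, k] + beta_wp * w_p + beta_wq * w_q + beta_dT * dT + rng.normal(0.0, noise)

sample = simulate_delta_t(j=0, k=1, w_p=0.2, w_q=0.0, dT=3.0)
```

In a PyMC3 fit, the same structure runs in reverse: the sampler infers the posterior over θ_k, μ_jk, and the weights from the observed ∆t values.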

## Test models in the 2020 races

After predicting ∆t, we added it into the practice lap times to generate predictions of qualifying lap times. We determined the final ranking based on the predicted qualifying lap times. Finally, we compared predicted lap times and rankings with the actual results.
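The ranking step itself is simple arithmetic; here is a toy example with made-up practice times and predicted ∆t values:

```python
# Fastest practice laps (s) and predicted delta_t per driver; all numbers invented.
practice = {"HAM": 64.50, "VER": 64.42, "BOT": 64.71}
pred_delta = {"HAM": -1.10, "VER": -0.85, "BOT": -1.30}

# Predicted qualifying lap = practice lap + predicted delta_t.
pred_quali = {d: practice[d] + pred_delta[d] for d in practice}

# Sort ascending by predicted lap time to get the predicted qualifying order.
ranking = sorted(pred_quali, key=pred_quali.get)
```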

The following figure compares the predicted rankings and the actual rankings for all three practice sessions for the Austria, Hungary, and Great Britain GPs in 2020 (we exclude P2 for the Hungary GP because the session was wet).

For the Bayesian model, we generated predictions with an uncertainty range based on the posterior samples. This enabled us to predict the drivers’ relative ranking from the median prediction while accounting for unexpected outcomes in the drivers’ performances.

The following figure shows an example of predicted qualifying lap times (in seconds) with an uncertainty range for selected drivers at the Austria GP. If two drivers’ prediction profiles are very close (such as MAG and GIO), it’s not surprising that either driver might be the faster one in the upcoming qualifying session.

## Metrics on model performance

To compare the models, we used mean squared error (MSE) and mean absolute error (MAE) for lap time errors. For ranking errors, we used rank discounted cumulative gain (RDCG). Because only the top 10 drivers gain points during a race, we used RDCG to apply more weight to errors in the higher rankings. For the Bayesian model output, we used the median posterior value to generate the metrics.
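The exact RDCG formula isn’t spelled out here, so the following is one plausible discounted formulation for illustration only: each driver’s gain is the inverse of their true rank, discounted by the log of their predicted position, and normalized so a perfect prediction scores 1.0:

```python
import math

def rdcg(predicted, actual):
    """Rank discounted cumulative gain (assumed formulation, not F1's exact one).
    Drivers near the top of the true order carry more gain, so mistakes in the
    higher rankings are penalized more heavily."""
    true_pos = {d: i + 1 for i, d in enumerate(actual)}
    dcg = sum((1.0 / true_pos[d]) / math.log2(i + 2) for i, d in enumerate(predicted))
    ideal = sum((1.0 / (i + 1)) / math.log2(i + 2) for i in range(len(actual)))
    return dcg / ideal

perfect = rdcg(["A", "B", "C"], ["A", "B", "C"])
swapped = rdcg(["B", "A", "C"], ["A", "B", "C"])
```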

The following table shows the resulting metrics of each modeling approach for the test P2 and P3 sessions. The best model by each metric for each session is highlighted.

| Model | MSE (P2) | MSE (P3) | MAE (P2) | MAE (P3) | RDCG (P2) | RDCG (P3) |
| --- | --- | --- | --- | --- | --- | --- |
| Practice raw | 2.822 | 1.053 | 1.544 | 0.949 | **0.92** | 0.95 |
| Rule-based | **0.349** | 0.186 | **0.462** | 0.346 | 0.88 | 0.95 |
| XGBoost | 0.358 | **0.141** | 0.472 | **0.297** | 0.91 | 0.95 |
| AutoGluon | 0.567 | 0.351 | 0.591 | 0.459 | 0.90 | **0.96** |
| Hierarchical Bayesian | 0.431 | 0.186 | 0.521 | 0.332 | 0.87 | 0.92 |

All models reduced the qualifying lap time prediction errors significantly compared to directly using the practice session results. Using practice lap times directly without considering pace correction, the MSE on the predicted qualifying lap time was up to 2.8 seconds. With ML methods that automatically learned pace variation patterns for teams and drivers on different circuits, we brought the MSE down to less than half a second. The resulting prediction was a more accurate representation of the pace in the qualifying session. In addition, the models improved the prediction of rankings by a small margin. However, no single approach outperformed all others. This observation highlights the effect of random errors in the underlying data.

## Summary

In this post, we described a new Insight developed by the Amazon ML Solutions Lab in collaboration with Formula 1 (F1).

This work is part of the six new F1 Insights powered by AWS that are being released in 2020, as F1 continues to use AWS for advanced data processing and ML modeling. Fans can expect to see this new Insight unveiled at the 2020 Turkish GP, providing predictions of the upcoming qualifying session during each practice session.

Guang Yang is a data scientist at the Amazon ML Solutions Lab where he works with customers across various verticals and applies creative problem solving to generate value for customers with state-of-the-art ML/AI solutions.

# AI clocks first-known ‘binary sextuply-eclipsing sextuple star system’. Another AI will be along shortly to tell us how to pronounce that properly


Astronomers have discovered the first-known “sextuply-eclipsing sextuple star system,” after a neural network flagged it up in data collected by NASA’s Transiting Exoplanet Survey Satellite (TESS).

The star system, codenamed TIC 168789840, is an oddball compared to its peers. Not only does it contain six suns, they’re split into three pairs of eclipsing binary stars. That means the suns in each pair, to an observer, pass directly in front of one another in their orbits. The stars in each pair are gravitationally bound to each other and to every other sun in their system, meaning each pair circles around each other, and around a common center of mass.

To get an idea of how this is structured, consider the star pairs to be labeled A, B, and C. The A pair circles one another every 1.6 days, and the C pair every 1.3 days. Together, the A and C pairs complete a full orbit around their common center of mass in a little under four years.

The remaining pair, B, is much further away, and is described as an outer binary. The B suns revolve around one another every 8.22 days, and it takes them about two thousand years to run a lap around the system’s common center of mass, according to a paper due to appear in The Astrophysical Journal detailing these findings. If that’s all a little mind-boggling, here’s a rough sketch of the system’s structure taken from the paper:

The system’s orbital mechanics (diagram from the paper)

An alien living on a hypothetical planet orbiting one of the inner quadruple stars would see four very bright suns in the sky and another two dimmer ones further away. These stars would periodically disappear, as they eclipsed one another. The chances of anyone observing this, however, are pretty slim to none as it doesn’t look like there are any exoplanets in TIC 168789840.

### Finding the first sextuply-eclipsing sextuple star system with machine learning

NASA’s TESS telescope gathers a massive amount of data. Instead of manually poring over tens of millions of objects, scientists feed the data into machine-learning algorithms designed to highlight the most interesting examples for further examination.

Brian Powell, first author of the study and a data scientist at NASA’s High Energy Astrophysics Science Archive Research Center, trained a classifier to spot eclipsing binary systems.

The neural network looks for the characteristic dip in an object’s light curve, caused when one star passes in front of the other. It assigns a score for the likelihood that it has identified an eclipsing binary system: objects rated above 0.9 on a scale up to 1.0 are considered strong candidates.

The computer-vision model that performs all this comprises approximately 5.5 million parameters, and was trained using more than 40,000 examples on a cluster of eight Nvidia V100 GPUs for approximately two days.

At first, TIC 168789840 didn’t seem so odd. “The neural network was trained to look for the feature of the eclipse in the light curve with no concern as to periodicity,” Powell told The Register.

“Therefore, to the neural net, an eclipsing binary is no different than an eclipsing sextuple, both of them would likely have an output near 1.0.”

Upon closer inspection, however, the scientists were shocked when they realized they had discovered the first-known set of three eclipsing binaries in a single system. The stars locked in each pair are very similar to one another in terms of mass, radius, and temperature.

“The fact that all three binaries show eclipses allows us to determine the radii and relative temperatures of each star. This, together with measurement of the radial velocities, allows us to determine the masses of the stars. Having this much information on a multiple star system of this order is quite rare,” Powell added.

There are 17 or so known sextuple star systems, though TIC 168789840 is the first structure where the sextuple suns are also binary eclipsing stars. Scientists hope that studying all its structural and physical properties will unlock mysteries of how multiple star systems are born. ®

# Governance: Companies mature in their use of AI know that it needs guardrails


Quality governance ensures responsible data models and AI execution, and helps the data models stay true to the business objectives.

The fundamentals of traditional IT governance have focused on service-level agreements like uptime and response time, and also on oversight of areas such as security and data privacy. The beauty of these goals is that they are concrete and easy to understand. This makes them attainable with minimal confusion if an organization is committed to getting the job done.

Unfortunately, governance becomes a much less-definable task in the world of artificial intelligence (AI), and a premature one for many organizations.

“This can come down to the level of AI maturity that a company is at,” said Scott Zoldi, chief analytics officer at FICO. “Companies are in a variety of stages of the AI lifecycle, from exploring use cases and hiring staff, to building the models, and having a couple of instances deployed but not widely across the organization. Model governance comes into play when companies are mature in their use of AI technology, are invested in it, and realize that AI’s predictive and business value should be accompanied by guardrails.”

“Because AI is more opaque than enterprise IT environments, AI requires a governance strategy that asks questions of architectures and that requires architectures to be more transparent,” Zoldi said.

SEE: 3 steps for better data modeling with IT and data science (TechRepublic)

Achieving transparency in AI governance begins with being able to explain, in plain language, the technology behind AI and how it operates to board members, senior management, end users, and non-AI IT staff. Questions that AI practitioners should be able to answer include, but are not limited to: how data is prepared and taken into AI systems, which data is being taken in and why, and how the AI operates on the data to return answers to the questions that the business is asking. AI practitioners should also explain how both the data and what you ask of it continuously change over time as business and other conditions change.

This is a pathway to ensuring responsible data models and AI execution, and also a way to ensure that the data models that a company develops for its AI stay true to its business objectives.

One central AI governance challenge is ensuring that the data and the AI operating on it are as bias-free as possible.

“AI governance is a board-level responsibility to mitigate pressures from regulators and advocacy groups,” Zoldi said. “Boards of directors should care about AI governance because AI technology makes decisions that profoundly affect everyone. Will a borrower be invisibly discriminated against and denied a loan? Will a patient’s disease be incorrectly diagnosed, or a citizen unjustly arrested for a crime he did not commit?”

## How to achieve AI fairness

“The increasing magnitude of AI’s life-altering decisions underscores the urgency with which AI fairness and bias should be ushered onto boards’ agendas,” Zoldi said.

SEE: Equitable tech: AI-enabled platform to reduce bias in datasets released  (TechRepublic)

Zoldi said that to eliminate bias, boards must understand and enforce auditable, immutable AI model governance based on four classic tenets of corporate governance: accountability, fairness, transparency, and responsibility. He believes this can be achieved if organizations focus their AI governance on ethical, efficient, and explainable AI.

Ethical AI ensures that models operate without bias toward a protected group, and are used only in areas where we have confidence in the decisions the models generate. These issues have strong business implications; models that make biased decisions against protected groups aren’t just wrong, they are illegal.

Efficient AI helps AI make the leap from the development lab to making decisions in production that can be accepted with confidence. Otherwise, an inordinate amount of time and resources are invested in models that don’t deliver real-world business value.

Explainable AI makes sure that companies using AI models can meet a growing list of regulations, starting with GDPR, to be able to explain how the model made its decision, and why.

SEE: Encourage AI adoption by moving shadow AI into the daylight (TechRepublic)

Some organizations are already tackling these AI governance challenges, while others are just beginning to think about them.

This is why, when putting together an internal team to address governance, a best practice approach is a three-tiered structure that begins with an executive sponsor at the top to champion AI at a corporate level.

“One tier down, executives such as the CAO, CTO, CFO, and head of legal should lead the oversight of AI governance from a policy and process perspective,” Zoldi said. “Finally, at the blocking-and-tackling level, senior practitioners from the various model development and model delivery areas, who work together with AI technology on a daily basis, should hash out how to meet those corporate governance standards.”

# Gartner: The future of AI is not as rosy as some might think


A Gartner report predicts that the second-order consequences of widespread AI will have massive societal impacts, to the point of making us unsure if and when we can trust our own eyes.

Gartner has released a series of Predicts 2021 research reports, including one that outlines the serious, wide-reaching ethical and social problems it predicts artificial intelligence (AI) will cause in the next several years. In Predicts 2021: Artificial Intelligence and Its Impact on People and Society, five Gartner analysts report on different predictions they believe will come to fruition by 2025. The report calls particular attention to what it calls second-order consequences of artificial intelligence that arise as unintended results of new technologies.

Generative AI, for example, is now able to create amazingly realistic photographs of people and objects that don’t actually exist; Gartner predicts that by 2023, 20% of account takeovers will use deepfakes generated by this type of AI. “AI capabilities that can create and generate hyper-realistic content will have a transformational effect on the extent to which people can trust their own eyes,” the report said.

The report tackles five different predictions for the AI market, and gives recommendations for how businesses can address those challenges and adapt to the future:

• By 2025, pretrained AI models will be largely concentrated among 1% of vendors, making responsible use of AI a societal concern
• In 2023, 20% of successful account takeover attacks will use deepfakes as part of social engineering attacks
• By 2024, 60% of AI providers will include harm/misuse mitigation as a part of their software
• By 2025, 10% of governments will avoid privacy and security concerns by using synthetic populations to train AI
• By 2025, 75% of workplace conversations will be recorded and analyzed for use in adding organizational value and assessing risk

Each of those analyses is enough to make AI-watchers sit up and take notice, but combined they paint a picture of a grim future rife with ethical concerns, potential misuse of AI, and loss of privacy in the workplace.

Concerns over AI’s effect on privacy and truth are sure to be major topics in the coming years if Gartner’s analysts are accurate in their predictions, and successful businesses will need to be ready to adapt quickly to those concerns.

A recurring theme in the report is the establishment of ethics boards at companies that rely on AI, whether as a service or a product. This is mentioned particularly for businesses that plan to record and analyze workplace conversations: Boards with employee representation should be established to ensure fair use of conversation data, Gartner said.

SEE: Natural language processing: A cheat sheet (TechRepublic)

Gartner also recommends that businesses establish criteria for responsible AI consumption and prioritize vendors that “can demonstrate responsible development of AI and clarity in addressing related societal concerns.”

As for security concerns surrounding deepfakes and generative AI, Gartner recommends that organizations should schedule training about deepfakes. “We are now entering a zero-trust world. Nothing can be trusted unless it is certified as authenticated using cryptographic digital signatures,” the report said.

There’s a lot to digest in this report, from figures saying that the best deepfake detection software will top out at a 50% identification rate in the long term, to the prediction that in 2023 a major US corporation will adopt conversation analysis to determine employee compensation. There’s much to be worried about in these analyses, but potential antidotes are included as well. The full report is available at Gartner, but interested parties will need to pay for access.

# Model serving in Java with AWS Elastic Beanstalk made easy with Deep Java Library


Deploying your machine learning (ML) models to run on a REST endpoint has never been easier. Using AWS Elastic Beanstalk and Amazon Elastic Compute Cloud (Amazon EC2) to host your endpoint and Deep Java Library (DJL) to load your deep learning models for inference makes the model deployment process extremely easy to set up. Setting up a model on Elastic Beanstalk is great if you require fast response times on all your inference calls. In this post, we cover deploying a model on Elastic Beanstalk using DJL and sending an image through a POST call to get inference results on what the image contains.

DJL is a deep learning framework written in Java that supports training and inference. DJL is built on top of modern deep learning engines (such as TensorFlow, PyTorch, and MXNet). You can easily use DJL to train your model or deploy your favorite models from a variety of engines without any additional conversion. It contains a powerful model zoo design that allows you to manage trained models and load them in a single line. The built-in model zoo currently supports more than 70 pre-trained and ready-to-use models from GluonCV, HuggingFace, TorchHub, and Keras.

## Benefits

The primary benefit of hosting your model using Elastic Beanstalk and DJL is that it’s very easy to set up and provides consistent sub-second responses to POST requests. With DJL, you don’t need to download any other libraries or worry about importing dependencies for your chosen deep learning framework. Using Elastic Beanstalk has two advantages:

• No cold startup – Compared to an AWS Lambda solution, the EC2 instance is running all the time, so any call to your endpoint runs instantly and there isn’t any overhead when starting up new containers.
• Scalable – Compared to a server-based solution, you can allow Elastic Beanstalk to scale horizontally.

## Configurations

You need to have the following Gradle dependencies set up to run our PyTorch model:

```groovy
plugins {
    id 'org.springframework.boot' version '2.3.0.RELEASE'
    id 'io.spring.dependency-management' version '1.0.9.RELEASE'
    id 'java'
}

dependencies {
    implementation platform("ai.djl:bom:0.8.0")
    implementation "ai.djl.pytorch:pytorch-model-zoo"
    implementation "ai.djl.pytorch:pytorch-native-auto"
    implementation "org.springframework.boot:spring-boot-starter"
    implementation "org.springframework.boot:spring-boot-starter-web"
}
```

## The code

We first create a RESTful endpoint using Java SpringBoot and have it accept an image request. We decode the image and turn it into an `Image` object to pass into our model. The model is autowired by the Spring framework by calling the `model()` method. For simplicity, we create the predictor object on each request, where we pass our image for inference (you can optimize this by using an object pool). When inference is complete, we return the results to the requester. See the following code:

```java
@Autowired
ZooModel<Image, Classifications> model;

/**
 * This method is the REST endpoint where the user can post their images
 * to run inference against a model of their choice using DJL.
 *
 * @param input the request body containing the image
 * @return returns the top 3 probable items from the model output
 * @throws IOException if reading the HTTP request failed
 */
@PostMapping(value = "/doodle")
public String handleRequest(InputStream input) throws IOException {
    Image img = ImageFactory.getInstance().fromInputStream(input);
    try (Predictor<Image, Classifications> predictor = model.newPredictor()) {
        Classifications classifications = predictor.predict(img);
        return GSON.toJson(classifications.topK(3)) + System.lineSeparator();
    } catch (RuntimeException | TranslateException e) {
        logger.error("", e);
        Map<String, String> error = new ConcurrentHashMap<>();
        error.put("status", "Invoke failed: " + e.toString());
        return GSON.toJson(error) + System.lineSeparator();
    }
}

@Bean
public ZooModel<Image, Classifications> model() throws ModelException, IOException {
    Translator<Image, Classifications> translator = ImageClassificationTranslator.builder()
            .optFlag(Image.Flag.GRAYSCALE)
            .setPipeline(new Pipeline(new ToTensor()))
            .optApplySoftmax(true)
            .build();
    Criteria<Image, Classifications> criteria = Criteria.builder()
            .setTypes(Image.class, Classifications.class)
            .optModelUrls(MODEL_URL)
            .optTranslator(translator)
            .build();
    return ModelZoo.loadModel(criteria);
}
```

A full copy of the code is available on the GitHub repo.

Go into the `beanstalk-model-serving` directory and build the project:

```
cd beanstalk-model-serving
./gradlew build
```

This creates a JAR file found in `build/libs/beanstalk-model-serving-0.0.1-SNAPSHOT.jar`

## Deploying to Elastic Beanstalk

To deploy this model, complete the following steps:

1. On the Elastic Beanstalk console, create a new environment.
2. For our use case, we name the environment DJL-Demo.
3. For Platform, select Managed platform.
4. For Platform settings, choose Java 8 and the appropriate branch and version.

5. When selecting your application code, choose Choose file and upload the `beanstalk-model-serving-0.0.1-SNAPSHOT.jar` file that was created in your build.
6. Choose Create environment.

After Elastic Beanstalk creates the environment, we need to update the Software and Capacity boxes in our configuration, located on the Configuration overview page.

1. For the Software configuration, we add an additional setting in the Environment Properties section with the name SERVER_PORT and value 5000.
2. For the Capacity configuration, we change the instance type to t2.small to give our endpoint a little more compute and memory.
3. Choose Apply configuration and wait for your endpoint to update.

Now we can call our Elastic Beanstalk endpoint with our image of a smiley face.

See the following code:

```
curl -X POST -T smiley.png <endpoint>.elasticbeanstalk.com/inference
```

We get the following response:

```json
[
  { "className": "smiley_face", "probability": 0.9874626994132996 },
  { "className": "face", "probability": 0.004804758355021477 },
  { "className": "mouth", "probability": 0.0015588520327582955 }
]
```

The output predicts that a smiley face is the most probable item in our image. Success!
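For a downstream consumer, extracting the top prediction from such a response is straightforward; this snippet parses the JSON shown above:

```python
import json

# The raw JSON body returned by the endpoint (copied from the response above).
response = """[
  {"className": "smiley_face", "probability": 0.9874626994132996},
  {"className": "face", "probability": 0.004804758355021477},
  {"className": "mouth", "probability": 0.0015588520327582955}
]"""

results = json.loads(response)
# Pick the class with the highest probability.
top = max(results, key=lambda r: r["probability"])
```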

## Limitations

If your model isn’t called often and there isn’t a requirement for fast inference, we recommend deploying your models on a serverless service such as Lambda. However, this adds overhead due to the cold startup nature of the service. Hosting your models through Elastic Beanstalk may be slightly more expensive because the EC2 instance runs 24 hours a day, so you pay for the service even when you’re not using it. However, if you expect a lot of inference requests a month, we have found the cost of model serving on Lambda is equal to the cost of Elastic Beanstalk using a t3.small when there are about 2.57 million inference requests to the endpoint.
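As a rough illustration of where such a breakeven point comes from, the sketch below compares an always-on instance against per-request pricing. Every figure here is an assumption for illustration (approximate on-demand rates, function size, and call duration), not a quoted AWS price:

```python
# Assumed prices and workload shape -- placeholders, not quoted AWS rates.
T3_SMALL_PER_HOUR = 0.0208            # USD/hr for the instance (assumed)
LAMBDA_PER_REQUEST = 0.20 / 1e6       # USD per request (assumed)
LAMBDA_GB_SECOND = 0.0000166667       # USD per GB-second of compute (assumed)
MEM_GB, SECONDS_PER_CALL = 0.5, 0.5   # assumed function memory and duration

# The instance runs around the clock, so its monthly cost is fixed.
ec2_monthly = T3_SMALL_PER_HOUR * 24 * 30

def lambda_monthly(requests):
    """Monthly serverless cost: request fee plus compute time."""
    per_request = LAMBDA_PER_REQUEST + MEM_GB * SECONDS_PER_CALL * LAMBDA_GB_SECOND
    return requests * per_request

# Requests per month at which the two options cost the same, under these assumptions.
per_request = LAMBDA_PER_REQUEST + MEM_GB * SECONDS_PER_CALL * LAMBDA_GB_SECOND
breakeven = ec2_monthly / per_request
```

Under these particular assumptions the crossover lands in the low millions of requests per month, the same order of magnitude as the figure quoted above.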

## Conclusion

In this post, we demonstrated how to start deploying and serving your deep learning models using Elastic Beanstalk and DJL. You just need to set up your endpoint with Java Spring, build your JAR file, upload that file to Elastic Beanstalk, update some configurations, and it’s deployed!

We also discussed some of the pros and cons of this deployment process, namely that it’s ideal if you need fast inference calls, but the cost is higher when compared to hosting it on a serverless endpoint with lower utilization.

This demo is available in full in the DJL demo GitHub repo. You can also find other examples of serving models with DJL across different JVM tools like Spark and AWS products like Lambda. Whatever your requirements, there is an option for you.

Follow our GitHub, demo repository, Slack channel, and Twitter for more documentation and examples of DJL!