Big Data

Robinhood, gateway to ‘meme’ stocks, raises $2.1 billion in IPO

By Echo Wang and David French

(Reuters) - Robinhood Markets Inc, the owner of the trading app that emerged as the go-to destination for retail investors speculating on this year’s “meme” stock trading frenzy, raised $2.1 billion in its initial public offering on Wednesday.

The company was seeking to capitalize on individual investors’ fascination with cryptocurrencies and stocks such as GameStop Corp, which have seen wild swings after becoming the subject of trading speculation on social media sites such as Reddit. Robinhood’s monthly active users surged from 11.7 million at the end of December to 21.3 million as of the end of June.

The IPO valued Robinhood at $31.8 billion, a richer valuation relative to revenue than many of its traditional rivals such as Charles Schwab Corp command, though the offering priced at the bottom of the company’s indicated range.

Some investors stayed on the sidelines, citing concerns over the frothy valuation, the risk of regulators cracking down on Robinhood’s business, and even lingering anger with the company’s imposition of trading curbs when the meme stock trading frenzy flared up at the end of January.

Robinhood said it sold 55 million shares in the IPO at $38 apiece, the low end of its $38 to $42 price range. This makes it one of the most valuable U.S. companies to have gone public year-to-date, amid a red-hot market for new listings.

In an unusual move, Robinhood had said it would reserve between 20% and 35% of its shares for its users.

Robinhood’s platform allows users to make unlimited commission-free trades in stocks, exchange-traded funds, options and cryptocurrencies. Its simple interface made it popular with young investors trading from home during the COVID-19 pandemic.

Robinhood enraged some investors and U.S. lawmakers earlier this year when it restricted trading in some popular stocks following a 10-fold rise in deposit requirements at its clearinghouse. It has been at the center of many regulatory probes.

The company disclosed this week that it has received inquiries from U.S. regulators looking into whether its employees traded shares of GameStop and AMC Entertainment Holdings Inc before the trading curbs were placed at the end of January.

In June, Robinhood agreed to pay nearly $70 million to settle an investigation by Wall Street’s own regulator, the Financial Industry Regulatory Authority, for “systemic” failures, including systems outages, providing “false or misleading” information, and weak options trading controls.

The brokerage has also been criticized for relying on “payment for order flow” for most of its revenue, under which it receives fees from market makers for routing trades to them and does not charge users for individual trades.

Critics argue the practice, which is used by many other brokers, creates a conflict of interest, on the grounds that it incentivizes brokers to send orders to whoever pays the higher fees. Robinhood contends that it routes trades based on what is cheapest for its users, and that charging a commission would be more expensive. The U.S. Securities and Exchange Commission is examining the practice.

Robinhood was founded in 2013 by Stanford University roommates Vlad Tenev and Baiju Bhatt. They will hold a majority of the voting power after the offering, regulatory filings showed, with Bhatt holding around 39% of the voting power of outstanding stock and Tenev about 26.2%.

The company’s shares are scheduled to start trading on Nasdaq on Thursday under the ticker “HOOD”.

Goldman Sachs and J.P. Morgan were the lead underwriters in Robinhood’s IPO.

(Reporting by Echo Wang and David French in New York; Editing by Leslie Adler)

Image Credit: Reuters


Source: https://datafloq.com/read/robinhood-gateway-meme-stocks-raises-21-billion-ipo/16712

Big Data

Adventures in MLOps with Github Actions, Iterative.ai, Label Studio and NBDEV


This article documents the authors’ experience building their custom MLOps approach.


By Aaron Soellinger & Will Kunz

When designing the MLOps stack for our project, we needed a solution that allowed for a high degree of customization and the flexibility to evolve as our experimentation dictated. We considered large platforms that encompass many functions, but found them limiting in some key areas. Ultimately, we decided on an approach in which separate specialized tools were implemented for labeling, data versioning, and continuous integration. This article documents our experience building this custom MLOps approach.



Photo by Finding Dan | Dan Grinwis on Unsplash

NBDEV

(Taken from https://github.com/fastai/nbdev)

The classic problem with using Jupyter for development is that moving from prototype to production required copying and pasting code from a notebook into a Python module. NBDEV automates the transition between notebook and module, making the Jupyter notebook an official part of a production pipeline. NBDEV lets the developer declare which module a notebook should create, which notebook cells to export to that module, and which cells are tests. A key capability of NBDEV is its approach to testing within the notebooks, and the NBDEV template even provides a base GitHub Action that runs those tests in the CI/CD framework. The resulting Python module requires no editing by the developer and can easily be integrated into other notebooks or the wider project using Python’s built-in import functionality.
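
To make the directives concrete, here is a minimal sketch of what such a notebook might contain, assuming nbdev’s v1-style comment directives; the module name and function below are illustrative, not taken from the project.

    # cell 1: declare which module this notebook generates
    # default_exp preprocessing

    # cell 2: exported into the generated module when the library is built
    #export
    def normalize(values):
        "Scale a list of numbers into the 0-1 range."
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]

    # cell 3: an ordinary cell acting as a test; it stays in the notebook and runs in CI
    assert normalize([0, 5, 10]) == [0.0, 0.5, 1.0]

Building the library writes normalize into the generated preprocessing module, while the test cell is executed by nbdev’s notebook test runner inside the GitHub Action.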

Iterative.ai: DVC/CML

(Taken from https://iterative.ai/)

The files used in machine learning pipelines are often large archives of binary or compressed data, which existing version control solutions like git either cannot handle or can only handle at prohibitive cost. DVC solves data versioning by representing large datasets as a hash of the file contents, which enables DVC to track changes. It works similarly to git (e.g. dvc add, dvc push). When you run dvc add on your dataset, the dataset is added to .gitignore and tracked for changes by DVC. CML is a project that provides functionality for publishing model artifacts from GitHub Actions workflows into comments attached to GitHub Issues, Pull Requests, and so on. That is important because it helps us start to fill the gap in pull requests between changes to training data and the accuracy and effectiveness of the resulting model.
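
As a rough illustration of the add/push flow above, the sketch below simply shells out to the dvc and git command-line tools (the dataset path is one of the files tracked in this project; the commit message is arbitrary): dvc add writes a small .dvc pointer file and adds the raw data path to a .gitignore, so git only ever sees the pointer, while dvc push uploads the actual contents to the configured remote.

    import os
    import subprocess

    def track_and_push(dataset_path: str = "data/raw_labels/traditional.json") -> None:
        # Hash the dataset and create a <dataset_path>.dvc pointer file;
        # DVC also adds the raw file to a .gitignore next to it.
        subprocess.run(["dvc", "add", dataset_path], check=True)
        # Commit the pointer file and the .gitignore entry, not the data itself.
        gitignore = os.path.join(os.path.dirname(dataset_path), ".gitignore")
        subprocess.run(["git", "add", f"{dataset_path}.dvc", gitignore], check=True)
        subprocess.run(["git", "commit", "-m", f"Track {dataset_path} with DVC"], check=True)
        # Upload the file contents to the DVC remote (S3 in our setup).
        subprocess.run(["dvc", "push"], check=True)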

GitHub Actions

(Taken from https://github.com/features/actions)

We want automated code testing, including building models in the automated testing pipeline. GitHub Actions competes with CircleCI, Travis CI and Jenkins to automate testing around code pushes, commits, pull requests and so on. Since we are already using GitHub to host our repos, we avoid another third-party app by using Actions. In this project we need to use GitHub self-hosted runners to run jobs on an on-prem GPU cluster.

Label Studio

(Taken from https://labelstud.io/)

We did a deep dive into how we’re using Label Studio, found here. Label Studio is a solution for labeling data. It works well and is flexible enough to run in a variety of environments.

Why use them together?

The setup is designed to deploy models faster. That means more data scientists working harmoniously in parallel, transparency in the repository, and faster onboarding time for new people. The goal is to standardize the types of activities that data scientists need to do in the project and to provide clear instructions for them.

The following is a list of tasks we want to streamline with this system design:

  1. Automate ingestion from Label Studio and provide a single point for feeding that data into the model training and evaluation activities.
  2. Automate testing of the data pipeline code, that is, unit tests and re-deployment of the containers used by the process.
  3. Automate testing of the model code, that is, unit tests and re-deployment of the containers used by the process.
  4. Enable automated testing to include model re-training and evaluation criteria. When the model code changes, train a model with the new code and compare it to the existing incumbent model.
  5. Trigger model retraining when the training data changes.

The pipeline for each task is described below.

Traditional CI/CD Pipeline

This pipeline implements automated testing feedback for each pull request, including evaluation of syntax, unit, regression and integration tests. The outcome of this process is a functionally tested Docker image pushed to our private repository. This process maximizes the likelihood that the latest, best code is available in a fully tested image in the repository for downstream tasks. Here’s how the developer lifecycle works in the context of a new feature:



Here we show how the workflow functions while editing the code. Using NBDEV enables us to work directly from the Jupyter notebooks, including writing the tests directly in the notebook. NBDEV requires that all the cells in the notebooks run without exception (unless a cell is flagged not to run). (Image by Author)

Data pipeline

Label Studio currently lacks event hooks for signaling updates when the stored label data changes, so we take a cron-triggered approach and update the dataset every hour. Additionally, while the Label Studio training dataset is small enough, the update can also be done as part of the training pipeline. We can also trigger the data pipeline refresh on demand from the GitHub Actions interface.



The data pipeline feeds from Label Studio, and persists every version of the dataset and relevant inputs to the DVC cache stored in AWS S3. (Image by Author)
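
The export step itself amounts to pulling the project’s annotations over Label Studio’s HTTP API and writing them where DVC expects them. Below is a minimal sketch of such a step, assuming Label Studio’s token-authenticated project export endpoint; it reuses the src.host, src.proj_id and src.out_fp parameter names from the pipeline configuration, but it is an illustration rather than the project’s actual export script.

    import json
    import requests

    def export_labels(host: str, proj_id: int, token: str, out_fp: str) -> None:
        # Request every annotation in the project as a single JSON export.
        resp = requests.get(
            f"{host}/api/projects/{proj_id}/export",
            params={"exportType": "JSON"},
            headers={"Authorization": f"Token {token}"},
            timeout=60,
        )
        resp.raise_for_status()
        # Persist the raw export; DVC tracks this file as a pipeline output.
        with open(out_fp, "w") as f:
            json.dump(resp.json(), f)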

Model Pipeline

The modeling pipeline integrates model training into the CI/CD pipeline for the repository. This enables each pull request not only to run the syntax, unit, integration and regression tests configured on the codebase, but also to provide feedback that includes an evaluation of the newly resulting model.



The workflow in this case runs the model training experiment specified in the configuration file (model_params.yaml) and updates the model artifact (best-model.pth). (Image by Author)
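
Part of the value of running training in the pull request is being able to flag whether the newly trained model beats the incumbent. A minimal sketch of such a gate, assuming both models’ metrics have been written to JSON files (the file paths and metric key below are illustrative):

    import json
    import sys

    def compare_models(candidate_fp: str, incumbent_fp: str, metric: str = "accuracy") -> None:
        with open(candidate_fp) as f:
            candidate = json.load(f)[metric]
        with open(incumbent_fp) as f:
            incumbent = json.load(f)[metric]
        print(f"{metric}: candidate={candidate:.4f} incumbent={incumbent:.4f}")
        # Fail the CI job (and therefore flag the pull request) if the new model is worse.
        if candidate < incumbent:
            sys.exit("Candidate model underperforms the incumbent; failing the check.")

    if __name__ == "__main__":
        compare_models("metrics/candidate.json", "metrics/incumbent.json")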

Benchmark Evaluation Pipeline

The benchmarking pipeline forms an “official submission” process to ensure that all modeling activities are measured against the metrics of the project.



The newly trained model in best-model.pth is evaluated against the benchmark dataset and the results are tagged with the latest commit hash and persisted in AWS S3. (Image by Author)
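
A minimal sketch of how results can be tagged with the commit that produced them and persisted to S3 (the bucket name and metrics path are placeholders; boto3 and git are assumed to be available on the runner):

    import subprocess
    import boto3

    def publish_benchmark(metrics_fp: str = "metrics/benchmark.json",
                          bucket: str = "example-project-benchmarks") -> None:
        # Identify the commit that produced these benchmark results.
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
        ).stdout.strip()
        # Persist the metrics file to S3 under a commit-addressed key.
        boto3.client("s3").upload_file(metrics_fp, bucket, f"benchmarks/{commit}.json")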

Workflow

Here is the DAG definition file used by DVC. It captures the workflow steps and their inputs, and allows for reproducibility across users and machines.

stages:
  labelstudio_export_trad:
    cmd: python pipelines/1_labelstudio_export.py --config_fp pipelines/traditional_pipeline.yaml --ls_token *** --proj_root "."
    params:
      - pipelines/traditional_pipeline.yaml:
          - src.host
          - src.out_fp
          - src.proj_id
  dataset_create_trad:
    cmd: python pipelines/2_labelstudio_todataset.py --config_fp pipelines/create_traditional.yaml --proj_root "."
    deps:
      - data/raw_labels/traditional.json
    params:
      - pipelines/create_traditional.yaml:
          - dataset.bmdata_fp
          - dataset.labels_map
          - dataset.out_fp
          - dataset.rawdata_dir
  train_model_trad:
    cmd: python pipelines/3_train_model.py --config_fp pipelines/model_params.yaml --proj_root "."
    deps:
      - data/traditional_labeling
    params:
      - pipelines/model_params.yaml:
          - dataloader.bs
          - dataloader.size
          - dataloader.train_fp
          - dataloader.valid_fp
          - learner.backbone
          - learner.data_dir
          - learner.in_checkpoint
          - learner.metrics
          - learner.n_out
          - learner.wandb_project_name
          - train.cycles
  labelstudio_export_bench:
    cmd: python pipelines/1_labelstudio_export.py --config_fp pipelines/benchmark_pipeline.yaml --ls_token *** --proj_root "."
    params:
      - pipelines/benchmark_pipeline.yaml:
          - src.host
          - src.out_fp
          - src.proj_id
  dataset_create_bench:
    cmd: python pipelines/2_labelstudio_todataset.py --config_fp pipelines/create_benchmark.yaml --proj_root "."
    deps:
      - data/raw_labels/benchmark.json
    params:
      - pipelines/create_benchmark.yaml:
          - dataset.bmdata_fp
          - dataset.labels_map
          - dataset.out_fp
          - dataset.rawdata_dir
  eval_model_trad:
    cmd: python pipelines/4_eval_model.py --config_fp pipelines/bench_eval.yaml --proj_root "."
    deps:
      - data/models/best-model.pth
    params:
      - pipelines/bench_eval.yaml:
          - eval.bench_fp
          - eval.label_config
          - eval.metrics_fp
          - eval.model_conf
          - eval.overlay_dir

Findings

  1. The GitHub Actions workflow cron trigger is not especially reliable; it does not guarantee timing.
  2. DVC does not behave cleanly inside a GitHub Actions workflow that is triggered on push: it alters the tracker files that are under source control, and when that change is committed it triggers yet another GitHub Actions run.
  3. Using GitHub Actions as the orchestration mechanism for model runs requires a self-hosted runner in order to use a GPU. That means connecting to a GPU instance in the cloud or on-prem, which presents access-control issues. For example, we can’t open-source the exact repo without removing the self-hosted runner configuration, or else anyone could run workloads on our training server simply by pushing code to the project.
  4. NBDEV’s built-in workflow tests the code in the wrong place: it tests the notebook instead of the compiled package. On the one hand, it’s nice to be able to say that tests can be written right into the notebook. On the other hand, testing the notebooks directly leaves open the possibility that the code package created by NBDEV fails even though the notebook ran. What we need is the ability to test the NBDEV-compiled package directly.
  5. NBDEV doesn’t interoperate with “traditional” Python development, in the sense that NBDEV is a one-way street: it allows the project to be developed in the interactive Jupyter notebook style, but makes it impossible to develop the Python modules directly. If at any point the project is converted to “traditional” Python development, testing would need to be accomplished another way.
  6. In the beginning we used Weights & Biases as our experiment tracking dashboard, but there were issues deploying it inside a GitHub Action. The user experience of implementing wandb hit its first hiccup in the Actions workflow, and removing Weights & Biases resolved the problem straight away. Before that, wandb had stood out as the best user experience in MLOps.

Conclusions

Ultimately, it took one week to complete the implementation of these tools for managing our code with GitHub Actions, Iterative.ai tools (DVC & CML) and NBDEV. This provides us with the following capabilities:

  1. Work from Jupyter notebooks as the system of record for the code. We like Jupyter. The main use case it serves is enabling us to work directly on any hardware we can SSH into, by hosting a Jupyter server there and forwarding it to a desktop. To be clear, we would be doing this even if we were not using NBDEV, because the alternative is Vim or some such tool that we don’t like as much. Past experiments to connect to remote servers with VS Code or PyCharm failed. So it’s Jupyter.
  2. Testing the code, and testing the model it creates. Now, as part of the CI/CD pipeline, we can evaluate whether the changes to the repo make the resulting model better, worse, or leave it unchanged. This is all available in the pull request before it is merged into main.
  3. Using the GitHub Actions server as an orchestrator for training runs begins to allow multiple data scientists to work simultaneously in a clearer manner. Going forward, we will see the limitations of this setup for orchestrating the collaborative data science process.

 
Aaron Soellinger formerly worked as a data scientist and software engineer solving problems in finance, predictive maintenance and sports. He currently works as a machine learning systems consultant with Hoplabs on a multi-camera computer vision application.

Will Kunz is a back end software developer, bringing a can-do attitude and dogged determination to challenges. It doesn’t matter if it’s tracking down an elusive bug or adapting quickly to a new technology. If there’s a solution, Will wants to find it.

Original. Reposted with permission.




Source: https://www.kdnuggets.com/2021/09/adventures-mlops-github-actions-iterative-ai-label-studio-and-nbdev.html


Big Data

The Machine & Deep Learning Compendium Open Book


After years in the making, this extensive and comprehensive ebook resource is now available and open for data scientists and ML engineers. Learn from and contribute to this tome of valuable information to support all your work in data science from engineering to strategy to management.


By Ori Cohen, AI/ML/DL Expert, Researcher, Data Scientist.

Partial Topic List From The Machine & Deep Learning Compendium.

Nearly a year ago, I announced the Machine & Deep Learning Compendium, a Google document that I have been writing for the last 4 years. The ML Compendium contains over 500 topics, and it is over 400 pages long.

Today, I’m announcing that the Compendium is fully open. It is now a project on GitBook and GitHub (please star it!). I believe in knowledge sharing, and the Compendium will always be free to everyone.

I see this compendium as a gateway, a frequently visited resource for people of various proficiency levels, from industry data scientists to academics. The compendium will save you countless hours of googling and sifting through articles that may not give you any value.

The Compendium includes around 500 topics that contain various summaries, links, and articles that I have read on numerous topics that I found interesting or that I had needed to learn. It includes the majority of modern machine learning algorithms, statistics, feature selection and engineering techniques, deep-learning, NLP, audio, deep and classic vision, time series, anomaly detection, graphs, experiment management, and much more. In addition, strategic topics, such as data science management and team building, are highlighted as well as other essential topics, such as product management, product design, and a technology stack from a data science perspective.

Please keep in mind that this is a perpetual work in progress on a variety of topics. If you feel that something should be changed, you can now easily contribute using GitBook, GitHub, or contact me.

GitBook

The ML Compendium is a project on GitBook, which means that you can contribute as a GitBook writer. Writing and editing content using the internal editor is easy and intuitive, especially compared to the more advanced option of contributing via GitHub pull requests.

You can visit the mlcompendium.com website or directly access the compendium “book”. As seen in Figure 1, the main topics are on the left and the sub-topics within each main topic are on the right. The search feature is also far more capable than the old method of using CTRL-F inside the original document.

Figure 1: The Machine & Deep Learning Compendium with the GitBook UI.

The following are two topics that may interest you: the natural language processing (NLP) page, shown in Figure 2, and the deep neural nets (DNN) page, shown in Figure 3.

Figure 2: Natural Language Processing.

Figure 3: Deep Neural Nets.

GitHub

Alternatively, you can use GitHub (Figure 4) to contribute content: place the content within the proper topic, then create a pull request from a new branch. Finally, don’t forget to ‘Star’ the project if you like it.

The following is a simple set of instructions for contributing using GitHub:

  1. git clone https://github.com/orico/www.mlcompendium.com.git
  2. git branch mybranch
  3. git switch mybranch
  4. add your content
  5. git add the-edited-file
  6. git commit -m "my content"
  7. git push
  8. create a PR by visiting this link: https://github.com/orico/stateofmlops/pull/new/mybranch

Figure 4: The mlcompendium.com GitHub project.


Original. Reposted with permission.

Bio: Dr. Ori Cohen has a Ph.D. in Computer Science with a focus on machine learning. He is the author of the ML & DL Compendium and the StateOfMLOps.com. He is a lead data scientist at New Relic TLV, doing machine and deep learning research in the field of AIOps & MLOps.




Source: https://www.kdnuggets.com/2021/09/machine-deep-learning-open-book.html
