
PlatoAi. Web3 Reimagined. Data Intelligence Amplified.
Click here to access.



Nine Tools I Wish I Mastered Before My PhD in Machine Learning




Whether you are building a startup or making scientific breakthroughs, these tools will bring your ML pipeline to the next level.

By Aliaksei Mikhailiuk, AI Scientist

Image by Author.


Despite its monumental role in advancing technology, academia is often ignorant of industrial achievements. By the end of my PhD I realised that there is a myriad of great auxiliary tools, overlooked in academia, but widely adopted in industry.

From personal experience I know that learning and integrating new tools can be boring and scary, and can set you back and demotivate you, especially when the current setup is familiar and works.

Dropping bad habits can be difficult. With every tool outlined below I had to accept that the way I did things was suboptimal. However, in the process I have also learnt that effort which shows no results in the moment can pay off tenfold at a later stage.

Below I talk about the tools that I have found very useful for researching and building machine learning applications, both as an academic and an AI engineer. I group the tools in four sections by their purpose: environment isolation, experiment tracking, collaboration and visualisation.

Isolating Environments

Machine learning is an extremely fast-developing field, and hence commonly used packages are updated very often. Despite developers' efforts, newer versions are often not compatible with their predecessors. And that does cause a lot of pain!

Fortunately there are tools to solve this problem!



Docker

Image by Author.


How many times have those NVIDIA drivers caused you trouble? During my PhD I had a university-managed machine that was regularly updated, overnight and without any notice. Imagine my surprise when, the morning after an update, I found out that most of my work was now incompatible with the latest drivers.

Although not directly meant for that, Docker saves you from these misfortunes, which are especially stressful just before a deadline.

Docker lets you wrap software in packages called containers. Containers are isolated units that have their own software, libraries and configuration files. In a simplified view, a container is a separate, independent virtual operating system that has means to communicate with the outside world (in practice it shares the kernel of the host).

Docker has a plethora of ready-made container images for you to use, so it is very easy to get started with the basics without extensive knowledge of how to configure everything yourself.

For those wanting a quick start, check out this tutorial. Amazon AWS has also done a great job explaining why and how to use Docker for machine learning here.
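As a minimal sketch of the container workflow described above, the Python helper below assembles a `docker run` command that mounts a project directory into a container and exposes the host GPUs. The image name `pytorch/pytorch:latest`, the paths and the script name are assumptions chosen for illustration, and the `--gpus` flag additionally assumes that the NVIDIA Container Toolkit is installed on the host.

```python
import shlex
from typing import List, Sequence


def docker_run_command(
    image: str,
    project_dir: str,
    command: Sequence[str],
    use_gpu: bool = True,
) -> List[str]:
    """Build (but do not execute) a `docker run` invocation."""
    args = ["docker", "run", "--rm"]             # --rm: remove the container on exit
    if use_gpu:
        args += ["--gpus", "all"]                # expose all host GPUs to the container
    args += ["-v", f"{project_dir}:/workspace"]  # mount the project into the container
    args += ["-w", "/workspace"]                 # run the command from the mounted folder
    args.append(image)
    args.extend(command)
    return args


# Hypothetical image, path and script, used only for illustration.
cmd = docker_run_command(
    image="pytorch/pytorch:latest",
    project_dir="/home/me/project",
    command=["python", "train.py"],
)
print(shlex.join(cmd))
```

The helper only prints the command; passing `cmd` to `subprocess.run(cmd, check=True)` would execute it on a machine where Docker is installed.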


Conda

Reusing someone's code has become the norm. Someone creates a useful repository on GitHub, you clone the code, install it and get your solution without the need to write anything yourself.

There is a slight inconvenience though. When multiple projects are used together you run into a package management problem: different projects require different versions of the same packages.

I am glad I discovered Conda not so late in my PhD. Conda is a package and environment management system. It lets you create multiple environments and quickly install, run and update packages and their dependencies. You can quickly switch between isolated environments and always be sure that your project interacts only with the packages you expect.

Conda provides its own tutorial on how to create your first environment.
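One way to make sure a project sees only the packages you expect is to compare the active environment against a list of pinned versions. The sketch below uses the standard-library `importlib.metadata`; the function name `check_pins` and the example pins are hypothetical and not part of Conda itself.

```python
import os
import sys
from importlib import metadata
from typing import Dict


def check_pins(pins: Dict[str, str]) -> Dict[str, str]:
    """Return a description of every package whose installed version
    differs from the expected one; an empty dict means all pins match."""
    problems = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = f"not installed (expected {expected})"
            continue
        if installed != expected:
            problems[name] = f"found {installed} (expected {expected})"
    return problems


# Conda sets CONDA_DEFAULT_ENV when an environment is activated;
# sys.prefix points at the interpreter's environment either way.
print("environment:", os.environ.get("CONDA_DEFAULT_ENV", "<none>"), sys.prefix)

# Hypothetical pins, for illustration only.
print(check_pins({"numpy": "1.21.2", "torch": "1.9.0"}))
```

Running such a check at the start of an experiment script is a cheap guard against accidentally launching it from the wrong environment.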

Running, tracking and logging experiments

Rigour and consistency are two essential pillars, without which getting a PhD in an applied field is close to impossible. And if you have ever tried to work with machine learning models, you probably know how easy it is to lose track of the tested parameters. Back in the day, parameter tracking was done in lab notebooks. I am certain these are still very useful in other fields, but in Computer Science we now have tools much more powerful than that.

Weights and biases


Snapshot of the wandb panel for a set of simple metrics — train loss, learning rate and average validation loss. Notice that you can also track system parameters! Image by Author.



Do these names look familiar? If so, then your model tracking skills should be stepped up. This was me in the first year of my PhD. In my defence, I should say that I had a spreadsheet where I would log the details of every experiment and all associated files. However, that was still very convoluted, and every change in how parameters were logged would inevitably affect the post-processing scripts.

Weights and biases (W&B/wandb) is one of the gems that I found quite late, but now use in every project. It lets you track, compare, visualize and optimize machine learning experiments with just a few lines of code. It also lets you track your datasets. Despite a large number of options I found W&B easy to set up and use with a very friendly web interface.

For those interested, check out their quick setup tutorial here.
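To give a feel for the "few lines of code", here is a minimal sketch of how a training loop might report metrics to W&B. The loop and its loss values are synthetic placeholders, and the project name `demo-project` is hypothetical; `train_with_wandb` is only defined here, not called, since it needs the `wandb` package and a logged-in account.

```python
import math
from typing import Callable, Dict, List


def train(log: Callable[[Dict[str, float]], None], epochs: int = 5) -> None:
    """Stand-in training loop: the loss values are synthetic placeholders."""
    learning_rate = 0.1
    for epoch in range(epochs):
        train_loss = math.exp(-0.5 * epoch)  # fake, steadily decreasing loss
        log({"epoch": epoch, "train_loss": train_loss, "lr": learning_rate})
        learning_rate *= 0.9


def train_with_wandb() -> None:
    """Same loop, logged to W&B (needs `pip install wandb` and a login)."""
    import wandb

    config = {"epochs": 5}
    run = wandb.init(project="demo-project", config=config)  # hypothetical project name
    train(wandb.log, epochs=config["epochs"])
    run.finish()


# Dry run without W&B: collect the logged metrics in a list instead.
history: List[Dict[str, float]] = []
train(history.append)
print(history[-1])
```

Passing the logging function in as an argument keeps the training code independent of the tracker, so the same loop can be pointed at a list, at W&B, or at another tool.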



MLflow

Image by Author.


Similar to W&B, MLflow provides functionality for logging code, models and the datasets on which your model has been trained. Although I have used it solely for the purpose of logging data, models and code, it provides functionality well beyond that. It lets you manage the whole ML lifecycle, including experimentation, reproducibility and deployment.

If you want to quickly integrate it into your models, check out this tutorial. Databricks have also shared a very nice explanation of MLflow.
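A minimal sketch of logging parameters and metrics with MLflow could look as follows. The `evaluate` function is a placeholder that returns synthetic losses, and the parameter names are hypothetical; `run_with_mlflow` is only defined here, not called, because it needs the `mlflow` package.

```python
from typing import Dict, List


def evaluate(params: Dict[str, float], epochs: int) -> List[float]:
    """Placeholder for a real training run: returns a synthetic loss per epoch."""
    return [1.0 / (1.0 + params["lr"] * 10 * epoch) for epoch in range(epochs)]


def run_with_mlflow(params: Dict[str, float], epochs: int = 3) -> None:
    """Log parameters and per-epoch metrics to MLflow (needs `pip install mlflow`)."""
    import mlflow

    with mlflow.start_run():
        mlflow.log_params(params)
        for epoch, loss in enumerate(evaluate(params, epochs)):
            mlflow.log_metric("val_loss", loss, step=epoch)


# Dry run of the placeholder alone; call run_with_mlflow(...) once MLflow is installed.
losses = evaluate({"lr": 0.1}, epochs=3)
print(losses)
```

With default settings MLflow writes runs to a local `mlruns` directory, which can then be browsed with the `mlflow ui` command.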


Screen

Leaving the experiments running overnight and hoping that my machine wouldn’t go to sleep was my go-to option in the first half-year of my PhD. When the work went remote, I used to worry about the ssh session breaking after the code had been running for several hours and had almost converged.

I learnt about the screen utility rather late, and so couldn’t save myself from half-baked results in the mornings. But in this case it is indeed better late than never.

Screen lets you launch and use multiple shell sessions from a single ssh session. A process started with screen can be detached from the session and then reattached at a later time. So your experiments can run in the background, without the need to worry about the session closing or the terminal crashing.

The functionality is summarised here.
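The detach and reattach cycle can be sketched with a few commands. The helpers below only assemble the command lines (they do not run them); the session name `exp1` and the script `train.py` are hypothetical.

```python
import shlex
from typing import List, Sequence


def screen_start(session: str, command: Sequence[str]) -> List[str]:
    """Start `command` inside a new, already detached screen session."""
    return ["screen", "-dmS", session, *command]


def screen_reattach(session: str) -> List[str]:
    """Reattach to a detached session by name."""
    return ["screen", "-r", session]


# Hypothetical session name and script.
print(shlex.join(screen_start("exp1", ["python", "train.py"])))
print(shlex.join(screen_reattach("exp1")))
print("screen -ls")  # lists the running sessions
```

When working interactively inside a session, pressing Ctrl-a followed by d detaches it and leaves the process running.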


Collaboration

Academia is notorious for not having proper mechanisms for effective team management. To an extent this is justified by very strict requirements for personal contribution. Nevertheless the pace at which machine learning is progressing needs joint effort. Below are two rather basic tools that would be handy for effective communication, especially in the new realm of remote work.


GitHub

Pretty basic, huh? After seeing all the horror of how people track their code in academia, I cannot stress enough how important it is to be well versed in version control. No more folders named code_v1, code_v2.

GitHub provides a very useful framework for code tracking, merging and reviewing. When a team is building, say, a deep image quality metric, each member can have their own branch of the code and work in parallel. Different parts of the solution can then be merged together. Whenever someone introduces a bug, it is dead easy to revert to the working version. Overall I rank Git as the most important of all the tools I have mentioned in this article.

Check out this step-by-step guide on how to get started quickly.


Lucidchart

Lucidchart was introduced to me recently; before that I was using a very simple interface for creating diagrams. Lucidchart is a thousand times more powerful and has much more versatile functionality. Its major strength is the shared space for collaboration and the ability to make notes next to diagrams. Imagine a giant online whiteboard with a huge set of templates.

For a quick start check this tutorial page by Lucidchart.


Visualisation

Numerous paper submissions, especially unsuccessful ones, have taught me that presentation is often as important as the results. If the reviewer, who usually does not have much time, does not understand the text, the work is rejected straight away. Images made in haste make a poor impression. Someone once told me: “If you cannot make a chart, how can I trust your results?” I disagree with this statement; however, I do agree that the impression does matter.


Inkscape

A picture is worth a thousand words (in fact, correction: 84.1 words).

Inkscape is FREE software for vector graphics. In fact I was taught how to use it in a web-development course during my undergrad. However, I learnt how to enjoy it in full only during my PhD, while working on those pretty pictures for the papers.

Of all the functionality that Inkscape provides, the TexText extension was especially valuable. With this extension you can integrate your LaTeX formulas seamlessly into an image.

There is a myriad of tutorials; however, for the basic functionality I would recommend the ones provided by the Inkscape team here.


Streamlit

Did you ever need to create a simple website to showcase your results or a simple machine learning application? In just a few lines of Python code it’s possible with Streamlit.

I found it particularly useful for paper supplementary materials, however it can be even more useful for easy deployment and showcasing project demos to clients.

For a quick start check out this tutorial.
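As a rough illustration of what "a few lines" means, the sketch below builds a page with a title, a slider and a chart. The function `sine_curve` is a toy stand-in for real results, and the page content is hypothetical; saved as `app.py`, the script would be launched with `streamlit run app.py`.

```python
import math
from typing import List

try:
    import streamlit as st  # optional: only needed to render the page
except ImportError:
    st = None


def sine_curve(n_points: int, frequency: float) -> List[float]:
    """Toy 'result' to display: a sine wave sampled at n_points."""
    return [math.sin(frequency * i / n_points * 2 * math.pi) for i in range(n_points)]


def render() -> None:
    """Draw the page: a title, a slider and a chart that follows the slider."""
    st.title("Experiment demo")
    frequency = st.slider("Frequency", min_value=1, max_value=10, value=3)
    st.line_chart(sine_curve(200, frequency))


if st is not None and __name__ == "__main__":
    render()
```

Streamlit re-runs the script whenever the slider moves, so the chart updates without any explicit callback code.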

Summary and beyond

Finishing my PhD while positioning myself in industry was not easy. But it taught me several important lessons I wish I had learnt at an earlier stage of my PhD.

The most important lesson is that curiosity and readiness to learn and change can greatly impact the quality of your work.

Below is the summary of the tutorials I have mentioned in each section:

Weights and biases: Tutorial

If you liked this article share it with a friend! To read more on machine learning and image processing topics press subscribe!

Have I missed anything? Do not hesitate to leave a note, comment or message me directly!

Bio: Aliaksei Mikhailiuk has a proven track record of researching, developing, deploying and maintaining machine learning algorithms in Computer Vision, Preference Aggregation and Natural Language Processing.

Original. Reposted with permission.



Exclusive-Polish gene project moves to drop Chinese tech on data concerns



By Joanna Plucinska

WARSAW (Reuters) – A European Union-funded project to build a genomic map of Poland plans to drop gene-sequencing technology from China’s BGI Group over concerns about data security, one of the project’s leaders told Reuters.

The Genomic Map of Poland’s concerns stem from questions over how Polish genomic data may be used that relate to national security, said Marek Figlerowicz, a Professor at the Institute of Bioorganic Chemistry at the Polish Academy of Sciences who steers the project.

Figlerowicz said the concerns were initially raised by a report earlier this year from the U.S. National Security Commission on Artificial Intelligence (NSCAI) which said BGI may be serving as a “global collection mechanism for Chinese government genetic databases.”

BGI told Reuters in response that the U.S. report was “disinformation, not borne out by the facts;” China’s Ministry of Foreign Affairs has called it “groundless accusations and smears.”

An NSCAI spokesperson said it stood by its report, and has recommended the United States and allies double-down on techniques to better protect patient privacy. Since 2015, Beijing has restricted foreign researchers from accessing gene data on Chinese people.

In August, a human genetics committee at the Polish Academy of Sciences said a “lack of compliance” by what it called “Far East companies” with the principles of genetic testing ethics raised serious doubts. It did not name any companies or countries but urged labs and scientific institutions that sequence genetic material abroad to stop using biotechnology companies there.

It said about 100,000 complete Polish genomes may already be in “Far Eastern” laboratories, citing a rough estimate which Reuters could not verify. Poland has no control over that sensitive personal data, the committee said.

Figlerowicz said the Genomic Map, which is expected to cost over 100 million zlotys ($25.35 million) and is about halfway through its programme of sequencing 5,000 Polish genomes, has outsourced the work to a third party since 2019.

That company, Bialystok-based Central Europe Genomics Center sp. z o.o. (CEGC), started using BGI’s technology last year, he said; now Figlerowicz said the Genomic Map of Poland had decided not to send any genetic data out of the country and is likely to cancel the contract it has with CEGC. He added that the final decision, still to be approved by the funding institutions, is expected within the next week or two.

As the technology to sequence genetic data has advanced and become cheaper, Figlerowicz said, the mapping project plans to bring the remaining sequencing in-house. The project wants to ensure Poland has genomic “independence” so it can ensure data security.

CEGC did not respond to requests for comment. Poznan-based biotech company Inno-Gene S.A., which holds a minority stake in CEGC, said it did not know of a possible cancellation.

The European Union, which provided about 65% of the total funding for the Genomic Map, did not respond to a request for comment. Neither did Poland’s Ministry of Education and National Information Processing Institute, also involved with funding the project. Poland’s special services declined to comment.


Reuters reported in July that BGI had developed and improved a prenatal screening test sold in at least 52 countries in collaboration with People’s Liberation Army hospitals.

The privacy policy on the test’s website says data collected can be shared when it is “directly relevant to national security or national defence security” in China, although BGI says it hasn’t been asked to do that. BGI uses the pregnant women’s genetic data for research into the traits of populations. It also collaborates with the PLA in other areas of research.

BGI rejects any suggestion that it developed the test, branded NIFTY, in collaboration with the military, and says working with military hospitals is not equivalent.

“BGI takes all aspects of data protection, privacy and ethics extremely seriously,” the company said in a statement on the Polish decision, adding it complies with all applicable data protection laws and regulations.

“We value the business and research relationships we have with all our partners and customers in Poland and we look forward to continuing our collaboration with them.”

($1 = 3.9448 zlotys)

(Additional reporting by Kirsty Needham in Sydney; Edited by Sara Ledwith)

Image Credit: Reuters


Energy trader Vitol, China’s BYD partner on EV infrastructure



LONDON (Reuters) – Vitol has partnered with China’s BYD to allocate an initial $250 million to expand electric vehicle infrastructure in some markets, the energy trading firm said on Wednesday.

“Together they will offer municipal, corporate and others a comprehensive solution including electric vehicles, charging infrastructure and depot design,” the statement said.

Vitol is currently deploying over 300 electric buses in Bogota, Colombia, and is seeking additional opportunities in South America and further afield.

“We are excited by the potential and intend to grow our fleet tenfold in the coming years,” Andrew de Pass, Vitol’s head of renewables, said in the statement.

(Reporting by Julia Payne; editing by Jason Neely)

Image Credit: Reuters


Japan regulator steps in to fix Mizuho’s computer flaws



By Yuki Nitta and Junko Fujita

TOKYO (Reuters) – Japan’s banking regulator will take a role in overseeing computer systems at Mizuho Financial Group’s retail banking arm after glitches that exposed failings at the country’s third-largest lender despite $3.6 billion in upgrades.

The Financial Services Agency’s unusual move reflects the deep regulatory concern over deep-rooted technical problems at Mizuho Bank, which the FSA ordered to improve its business.

“I can’t think of any other cases where the FSA has become so closely involved in inspections of a (bank’s) computer system,” said Brian Waterhouse, senior bank analyst of Windamee Research, who publishes on the Smartkarma platform.

“This shows how much the FSA is concerned about the problems,” Waterhouse added.

The FSA will communicate with Mizuho Bank on the running of its computers, officials said at a briefing, although the regulator stopped short of taking on direct management of the bank’s systems after a series of technical meltdowns this year.

These included widespread ATM outages, causing frustration among customers and undermining confidence in Mizuho Bank.

“Mizuho will submit reports with what’s needed to be done to us and we will point out if changes are needed,” an FSA official told a media briefing, adding that there would be ongoing conversations with the bank.

The problems are all the more notable given that Mizuho spent more than $3.6 billion to overhaul its systems in 2019 following two large-scale breakdowns in 2002 and 2011.

A third-party report commissioned by the bank found its corporate culture was to blame for its tech system failures, creating an atmosphere where managers are reluctant to express opinions and unable to respond well to crises.

Mizuho said in a statement it took the regulator’s punishment seriously and would reassess the need for scheduled system upgrades and updates.

“Our top priority is the stable operation of our IT system, and we will do all in our power to ensure upgrades and updates proceed steadily and securely,” it said.

“All of our employees will continue to work together towards this goal.”

The regulator’s next actions will depend on the outcome of a report from the bank, which is due by Oct. 29.

(Reporting by Yuki Nitta and Junko Fujita; Writing by Ritsuko Ando and David Dolan; Editing by Stephen Coates and Alexander Smith)

Image Credit: Reuters
