Did you know that US retail giant Walmart generates 2.5 petabytes of data from approximately 1,000,000 customers every hour? A petabyte is equal to 1,000,000 gigabytes, the equivalent of 13.3 years of HD video content on YouTube. Admittedly, there are not many companies like Walmart, but even smaller enterprises now generate huge amounts of data, and it is becoming increasingly challenging to take advantage of such an abundance of information. Data science is at the heart of all that, but before we can apply data science, we must do justice to another crucial player – the cloud, and cloud computing in general.
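To get a feel for these magnitudes, here is a minimal Python sketch of the arithmetic. It assumes decimal (SI) units, as the figures above do, and a 365.25-day year; the function names are made up for this illustration.

```python
# Decimal (SI) units, matching the article's "1 PB = 1,000,000 GB".
GB_PER_PB = 1_000_000
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length


def petabytes_to_gigabytes(pb: float) -> float:
    """Convert petabytes to gigabytes using decimal units."""
    return pb * GB_PER_PB


def implied_video_gb_per_hour(pb: float, years_of_video: float) -> float:
    """GB per hour of footage implied by 'pb petabytes = years_of_video of video'."""
    return petabytes_to_gigabytes(pb) / (years_of_video * HOURS_PER_YEAR)


hourly_gb = petabytes_to_gigabytes(2.5)          # data generated per hour, in GB
daily_pb = 2.5 * 24                              # data generated per day, in PB
video_rate = implied_video_gb_per_hour(1, 13.3)  # GB per hour of HD video

print(f"{hourly_gb:,.0f} GB per hour, {daily_pb:.0f} PB per day")
print(f"Implied HD video size: {video_rate:.1f} GB per hour of footage")
```

At the quoted rate, that is 2.5 million gigabytes every hour, or 60 petabytes per day.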
To understand the advantages cloud computing provides when it comes to data science, let us imagine a world with as much data as we have today but without servers. In such an unfortunate scenario, firms would need databases that run locally. So every time you, as a data scientist, wanted to start a new analysis or refresh an existing algorithm, you would have to transfer information from the central database to your machine and then proceed to work locally. This unfortunate world would have several main drawbacks.
This doesn’t sound like a perfect scenario, does it? That’s why we invented servers. But servers turned out to have drawbacks of their own. The most obvious one is that a server needs physical space. A cloud is basically somebody else’s server, so housing it is essentially their problem. Server infrastructure is expensive to buy and set up; cloud infrastructure is already there, simply waiting to be used. In-house data storage requires you to keep backups, ideally in different locations; cloud providers make data available anywhere, anytime, usually backed up on many different servers across the world. Servers also need capacity planning, and for fast-growing companies server needs can be unpredictable even for the current quarter. With in-house servers you usually end up buying more servers than you need at a given time. With the cloud you pay only for what you use.
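The difference between buying for the peak and paying for usage can be sketched in a few lines of Python. Every number here is hypothetical and chosen only for illustration, and the sketch assumes the same unit price for both models, which real cloud pricing may not match.

```python
# All figures are hypothetical, picked only to illustrate the idea.
monthly_demand = [40, 55, 35, 90, 60, 45]  # servers actually needed each month
COST_PER_SERVER_MONTH = 100.0              # assumed identical in both models


def in_house_cost(demand, unit_cost):
    """Fixed capacity sized for the busiest month, paid for every month."""
    capacity = max(demand)
    return capacity * unit_cost * len(demand)


def pay_as_you_go_cost(demand, unit_cost):
    """Cloud-style billing: pay only for what each month actually uses."""
    return sum(demand) * unit_cost


fixed = in_house_cost(monthly_demand, COST_PER_SERVER_MONTH)
elastic = pay_as_you_go_cost(monthly_demand, COST_PER_SERVER_MONTH)
idle_share = 1 - elastic / fixed  # fraction of in-house spend that sits unused

print(f"In-house: {fixed:,.0f}  Pay-as-you-go: {elastic:,.0f}")
print(f"Share of in-house spend on idle capacity: {idle_share:.0%}")
```

The spikier the demand, the larger the share of in-house capacity that sits idle outside the peak month.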
Fortunately, we now have clouds, which outperform local servers in almost every conceivable aspect. Data scientists can now focus on developing great algorithms, testing hypotheses, and taking advantage of all available data, without having to wait hours to see the results of the tests they are performing and certainly without having to worry about how much memory they have left on their computer. Sometimes data scientists do end up waiting long hours for an algorithm to train, but with the cloud they have the option to pay more and get the job done faster. That is yet another advantage of cloud computing over in-house servers.
The biggest winners are smaller firms, as they get cheap access to the same tools as enormous corporations. Cloud technologies are therefore a huge enabler: they create a level playing field and allow small players to compete with much bigger ones. This technological progress has changed several businesses much as the Internet changed commerce. Remember when, all of a sudden, people around the world were able to open eCommerce stores and compete on a global scale with established firms? In the same way, cloud technologies democratise data analysis and data science. Being able to rely on data stored in the cloud makes the lives of data scientists and data analysts much easier. In addition, most cloud providers give data scientists immediate access to preinstalled open-source frameworks. This is not only very convenient but can also be a huge time saver.