NaN-tic May 26, 2022
Data analysis is one of the business tools that has changed and developed the most so far in the twenty-first century. The position of data analyst, barely heard of ten years ago, is today one of the fastest-growing professional profiles in companies of all kinds, large and not so large. In fact, doesn't it seem hard to get through a single day without hearing or reading the words "big data", "data scientist", "machine learning" or "artificial intelligence"?
In this article, we bring things down to earth by clarifying (for laypeople) what some of these concepts are and what they are not. Along the way, you will discover the most common mistakes in data analysis so that you can fix them in your own company or project.
To do this, we interviewed an expert in data analysis: Cristina Campos, data scientist and science communicator. Cristina holds a degree in Physics, specializing in Astrophysics, from the University of La Laguna, and was awarded a scholarship by the Astrophysical Institute of the Canary Islands (IAC) to study planetary nebulae. On finishing her degree, she won the IAC's resident astrophysicist scholarship, placing first, and worked on NASA's Sunrise program. At that stage she specialized in stochastic mathematics and artificial intelligence, studying Finance at the UOC, Artificial Intelligence at Stanford University, and the Master in Mathematics for Financial Instruments at the UAB, and worked on stock market predictions in the banking sector. Today, Cristina Campos focuses on AI, data analysis, and data visualization at Dainso, a company she co-founded.
After talking to Cristina, you realize that she is passionate about science and that she conveys her knowledge with a genuine ability to excite.
First, we have to define what big data is. Simplifying, it is exactly the same work as data analysis; the difference is that big data involves a huge volume of data, whose management requires computational and programming resources that are not within everyone's reach because of their complexity and power. The good news is that most companies do not have big data; at most they have large data, which is a different level. But the science behind structuring and profiting from data is the same regardless of its size: it is data analysis.
Data analysis to know and understand what has already happened, that is, a descriptive model, is what companies usually want, and it is largely safe from what we would call chaos. If, on the other hand, we talk about data analysis to generate predictions from what has happened in the past, the predictive model is different: there, the chances of unforeseen variables increase. Even so, reaching 70% or 80% reliability in a prediction already gives you a great advantage in making business decisions, far better than flipping a coin. These models apply, for example, machine learning techniques and stochastic mathematics, where random movements are taken into account (widely used, for instance, in the financial world).
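To make the idea of "random movements" concrete, here is a minimal sketch (our illustration, not something from the interview) simulating a geometric Brownian motion, a classic stochastic model for asset prices. All the parameter values are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: simulate a geometric Brownian motion, a classic
# stochastic model for asset prices. Parameters are illustrative.
rng = np.random.default_rng(seed=42)

s0 = 100.0      # starting price
mu = 0.05       # annual drift (expected return)
sigma = 0.2     # annual volatility
days = 252      # one trading year
dt = 1.0 / days

# Each step combines a deterministic drift with a random shock.
shocks = rng.normal(loc=(mu - 0.5 * sigma**2) * dt,
                    scale=sigma * np.sqrt(dt),
                    size=days)
prices = s0 * np.exp(np.cumsum(shocks))

print(f"Simulated price after one year: {prices[-1]:.2f}")
```

Running many such simulations gives a distribution of possible outcomes rather than a single number, which is exactly why predictions in this world come with a reliability percentage instead of a certainty.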
Machine learning is precisely an imitation of the human brain's behavior: it imitates our neural connections to learn from what has already happened and, based on that, to predict. Data analysis and predictive models are nothing more than a tool. For example, an experienced salesperson can tell what will sell best from their product catalog, but if the catalog is large they lack information; automating the data analysis will help them not to miss sales opportunities.
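As an illustration only (the interview names no tools), a few lines with scikit-learn show the idea: a model learns from past sales and then scores new catalog items. The column names and data here are entirely hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical sales history: features of past products and whether
# each one sold well. Real catalogs would have far more rows/columns.
history = pd.DataFrame({
    "price":         [10, 25, 40, 8, 60, 15],
    "stock":         [200, 50, 20, 300, 10, 120],
    "months_listed": [12, 6, 24, 3, 36, 9],
    "sold_well":     [1, 1, 0, 1, 0, 1],   # the outcome we want to predict
})

model = RandomForestClassifier(random_state=0)
model.fit(history[["price", "stock", "months_listed"]], history["sold_well"])

# Score a new product the salesperson has no intuition about yet.
new_item = pd.DataFrame({"price": [30], "stock": [80], "months_listed": [1]})
print(model.predict_proba(new_item))  # estimated probability of selling well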
The first mistake is the use of programs that were not designed for today's data analysis, which present the data in a rudimentary way and, in addition, require the data to be collected manually, with all the errors that implies.
The second mistake is how the data is structured: not all data is relevant, and irrelevant data can introduce noise relative to what we want to measure. The result is that we do not get the information we need.
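One way to see the noise problem, as a hedged sketch with synthetic data of our own invention: columns that carry no real information can visibly hurt a model, especially with few observations.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 60

# One genuinely relevant feature plus twenty pure-noise columns.
signal = rng.normal(size=(n, 1))
noise = rng.normal(size=(n, 20))
y = 3.0 * signal[:, 0] + rng.normal(scale=0.5, size=n)

# Cross-validated R^2 drops when the irrelevant columns are included.
for name, X in [("signal only", signal),
                ("signal + 20 noise columns", np.hstack([signal, noise]))]:
    score = cross_val_score(LinearRegression(), X, y, cv=5).mean()
    print(f"{name}: mean R^2 = {score:.2f}")
```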
Third, not having realistic goals about what data analysis can deliver. Especially with predictive models, results are sometimes expected that are simply not possible. These expectations are fueled by the belief that artificial intelligence is superhuman, and it is not, at all.
The fourth difficulty is the dispersion of the data. If you do not use software that integrates all the data and is designed to keep it synchronized across all processes, such as an ERP, the disaggregation of the information works against us.
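A tiny, hypothetical illustration of why dispersion hurts: when, say, sales and customer records live in separate files kept by different departments, every analysis starts with a fragile manual join, and inconsistencies quietly drop data. An integrated system keeps these records linked from the start.

```python
import pandas as pd

# Hypothetical dispersed data: each department keeps its own file.
sales = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": [120.0, 80.0, 200.0, 45.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 4],          # note: customer 3 is missing here
    "region": ["North", "South", "East"],
})

# The manual join silently loses the sale from the unknown customer 3.
merged = sales.merge(customers, on="customer_id", how="inner")
print(merged)
print(f"Revenue lost to inconsistent records: "
      f"{sales['amount'].sum() - merged['amount'].sum():.2f}")
```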
Fifth and finally, the lack of objectivity. If someone is determined to find a particular result in the data, they will find it. It is very important to choose, structure, and combine the data objectively, in addition to modeling it correctly, in order to get as close as possible to the reality being studied. That is why it is very good for agents external to the organization to review the data analysis work.
Data visualization is super important for a very simple reason: in their day to day, people who run a business have to spread their efforts in many directions, and they cannot spend the day going over the figures one by one. So the dashboard has to be a tool that helps them the moment it opens, giving them the most important information at a glance so they can make decisions in time, before things get worse. Being able to anticipate is essential, and visual information saves them many hours.
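As a minimal sketch of that at-a-glance idea (the tooling and the numbers are our assumptions, not the interviewee's): a chart that flags the one figure needing attention, so nobody has to read a table to spot it.

```python
import matplotlib.pyplot as plt

# Hypothetical monthly sales against a target: the point of a dashboard
# is that the problem month stands out visually, without reading numbers.
months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales = [95, 102, 98, 71, 99]
target = 90

colors = ["tab:green" if s >= target else "tab:red" for s in sales]
plt.bar(months, sales, color=colors)
plt.axhline(target, linestyle="--", color="gray", label=f"target = {target}")
plt.title("Monthly sales vs. target")
plt.legend()
plt.show()
```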
The financial sector, like the medical sector, the educational sector, and so many others, has changed a great deal from our parents' generation to ours. Now everything moves very fast: someone can take a two-week intensive seminar in programming or data science, but those of us who dedicated our university years to mathematics, engineering, physics, and so on learned how to think in order to solve problems, and in the world of data analysis and predictive models there are no shortcuts; it takes time.
So much for the interview with Cristina Campos, who has been so kind as to answer our questions.