Recently I have been looking up at the sky a lot, wondering about this and that: is it going to rain, why do those clouds look so good, is it time for bed, or to eat? Yes, I have been on holiday, and the whole world appeared different. I am no longer on holiday but am in the early days of looking at another software application. These are the good times, before the work starts in earnest, so I allow myself a bit of wandering around, as though I am still on holiday: gazing at the horizon and wondering how we might find a way of doing this or that.
When you are invited to look at how an organisation copes with information, technology or anything else in particular, you have to start by gathering the data on which to base an understanding of the organisation and its needs.
There are a great many ways to gather data for business analysis. Many organisations have systems that hold the information, and a lot of the time those systems are immature, fragmented and often stand-alone subsystems (email archives, spreadsheets, single-user databases, lists, word processing documents, slide shows, images, videos and so on). In many cases these systems duplicate each other, or are redundant, incorrect, out of date and living in isolation.
Often the data has not even been recorded and has to be elicited through interviews, conversations, workshops and even interrogation under torture. In these cases it has to be entered manually into a new data store and given time to grow, gradually becoming more complete and authoritative as more evidence is added.
The discoveries we seek need to be as dynamic as the source data: snapshots are not to be trusted. The data needs to be able to expand and be updated, and in doing so to update the conclusions it suggests, which in turn means it needs to be re-collected each time a question is asked of it. The self-same questions need to be asked repeatedly, even though the subject may be presented in an as yet unknown way.
All of these collections from the data sources need to be stored somewhere. The difficulty is that, while parts may have their own structure or be structured in similar ways, finding a structure that allows for all of this information and accommodates its diversity is a big problem. So is the fact that being dynamic demands that the data keeps growing, all of the time, in ways we never anticipated. It will contain structures we did not even know about at the point we were designing a container to place it in.
Is this all sounding familiar? Not in the sense that, yes, we have all faced this problem in the past (most information workers will have), but familiar in the sense that there is this mass of unstructured and ever-changing information out there that is not just useful to us, it is critical. We use it all the time without giving it a second thought. It contains all of the truths and all of the knowledge we need … yes, the web. The need for distributed processing and shared data (the cloud) has already forced technical solutions to the fact that the world of information is messy.
The web was once described as containing all of the information and knowledge in all of the libraries in the world, but tipped off the shelves into great piles lying around in no particular order and impossible to make sense of. It therefore makes sense to think about how the information on the web has evolved from that description and come to be more useful: so useful that we take it for granted and could even believe it was structured in some way so as to make sense to us. The answer to the question of what to do with all this unstructured (or partly structured, or mixed-structure) information is, and was from the early days, out there; all that was needed was a way to Google for it. We all know that now, don’t we?
In collecting data for business analysis we simply need to take the same approach: not try to structure the data and make it fit into our boxes, but put it in places where it sits quite happily in its unstructured form, then use the same methods that we use day to day in the cloud to understand the information we have gathered. Hard as it is to consider breaking away from the well-designed schema that gave so much power to the relational databases of our previous applications, NoSQL data stores are a great place to keep the unstructured data we may be collecting for analysis. They also have very good tools available for sifting through such banks of unstructured data and making sense of the content. The sense we make of it is then capable of holding the structure we yearn for; indeed it is the best place for that structure to reside, even if the source information comprises so many disparate and discrete structures as to appear unstructured.
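To make the idea concrete, here is a minimal sketch in plain Python rather than in any particular NoSQL product. The records, field names and the helper functions `iter_text`, `build_index` and `search` are all invented for illustration. The records are kept exactly as they were gathered, with no shared schema, and the only structure imposed is a keyword index built over whatever text happens to be inside them.

```python
import re
from collections import defaultdict

# Hypothetical records gathered from different sources; note there is no shared schema.
records = [
    {"source": "email", "from": "ann@example.com",
     "subject": "Invoice overdue", "body": "The March invoice is still unpaid."},
    {"source": "spreadsheet", "row": 14, "customer": "Acme", "invoice_total": 1200},
    {"source": "interview", "who": "Bob",
     "notes": ["Invoices are chased by phone", "No central customer list"]},
]


def iter_text(value):
    """Yield every piece of text (keys and values) found anywhere inside a record."""
    if isinstance(value, dict):
        for key, item in value.items():
            yield str(key)
            yield from iter_text(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_text(item)
    else:
        yield str(value)


def build_index(docs):
    """Map each lower-cased word to the set of record positions that mention it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for text in iter_text(doc):
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                index[word].add(doc_id)
    return index


def search(index, query):
    """Return positions of records containing every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


# Rebuild the index whenever the records change, then ask the same question again.
index = build_index(records)
matches = search(index, "invoice")
```

The records never had to agree on a shape; the structure lives in the index and in the question being asked. A real document store and search tool would do far more (stemming, ranking, scale), so treat this only as an illustration of the principle.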
This is not the end of the schema as we know and love it, but any thoughts on an application architecture need to be at least as modern as the most prevalent of all information technologies. Tightly knit, neat containers with a handful of columns are no longer dominant in our world; a single table with a million or so columns is quite common in the world we now move around in. So if your problem is fitting masses of information into your constructs, the answer is to ignore the problem – the solution lies in how you look at this mass of data, not in how you fit it in.
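As a rough illustration of what a very wide table can look like in practice, the sketch below models one in plain Python. It assumes a sparse layout in which each row stores only the columns it actually has; the row keys, column names and the helpers `put`, `column_usage` and `select` are hypothetical and not taken from any specific product.

```python
from collections import Counter

# Each row keeps only the columns it actually uses, so the table can be
# arbitrarily wide without every row paying for every column.
rows = {
    "cust-001": {"name": "Acme", "phone": "555-0100"},
    "cust-002": {"name": "Birch Ltd", "email": "info@birch.example",
                 "last_interview": "2012-03-01"},
    "cust-003": {"name": "Cole & Co", "phone": "555-0199", "twitter": "@coleco"},
}


def put(table, row_key, column, value):
    """Add a value under any column name; new columns need no prior declaration."""
    table.setdefault(row_key, {})[column] = value


def column_usage(table):
    """Discover which columns exist, and how many rows use each one."""
    return Counter(column for columns in table.values() for column in columns)


def select(table, *columns):
    """Look at the data through a chosen set of columns, skipping rows with none of them."""
    return {
        key: {c: cols[c] for c in columns if c in cols}
        for key, cols in table.items()
        if any(c in cols for c in columns)
    }


# A column nobody anticipated when the table was first created.
put(rows, "cust-001", "preferred_courier", "bike")

usage = column_usage(rows)
view = select(rows, "name", "phone")
```

The point of `select` is that the neat, narrow table still exists, but as a way of looking at the data rather than as the container it is stored in.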
In reality clouds contain a fair bit of water vapour and weigh quite a bit more than you would think, so if we put clouds full of water into neat containers like bottles they would be quite heavy. And, as essential as bottles of water are, they would never float in the sky and would not tell us much about the weather – or anything much else really.