The Mysteries of Big Data and the Orient … DB
killerseven
Mapping the world of big data must be a lot like demystifying the antiquated concept of the Orient: an attempt to decipher a mass of unknowns. With the ever-multiplying expanse of data and the natural human desire to understand it as soon as possible, ideally in real time, technology is continually evolving to let us make sense of data, draw connections within it, turn it into actionable insight, and act on that insight in the real world. It’s a huge enterprise. Think of the masses of data collated years ago on legacy database systems, without the capacity for the analysis we have now: there must be relationships within that data that remain undefined, the known unknowns, the unknown knowns, and the known knowns (that Rumsfeld guy was making sense, you see?). It’s fascinating to think what we might learn from the data we have already collected. There is a burning need these days to break down the mysteries of big data, and developers are continually thinking of ways to interpret it, mapping data so that it is intuitive and understandable.
The major way developers have reconceptualized data in order to make sense of it is as a network, tightly connected by relationships. The obvious examples are Facebook and LinkedIn, which map out vast networks of people connected by shared properties such as education, location, interest, or profession. One way of mapping highly connected data is to structure it as a graph, a design that has emerged in recent years as databases have evolved. The main proponent of this approach is Neo4j, which is far and away the leader in the field of graph databases and is used by a huge number of enterprises working with big data. Neo4j has cornered the market, and it’s not hard to see why: it offers a powerful solution with heavy commercial support for enterprise deployments. In truth there aren’t many alternatives out there, but they do exist. OrientDB is a hybrid graph-document database that offers the flexibility of modeling data as either documents or graphs, while incorporating object-oriented concepts as a way of encapsulating relationships. Again, it’s a great example of developers imagining ways to accommodate the myriad different data types and the relationships that connect them all together.
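To make the idea of "people connected by shared properties" concrete, here is a minimal sketch of a property graph in plain Python. It does not use the Neo4j or OrientDB APIs; the `PropertyGraph` class, the `link_by_shared_property` helper, and the sample people are all hypothetical names invented for illustration.

```python
from collections import defaultdict


class PropertyGraph:
    """A toy in-memory property graph (illustrative only, not a real database API)."""

    def __init__(self):
        self.nodes = {}                # node id -> dict of properties
        self.edges = defaultdict(set)  # node id -> set of (relationship, other id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, a, rel, b):
        # Store the relationship in both directions (undirected edge).
        self.edges[a].add((rel, b))
        self.edges[b].add((rel, a))

    def neighbors(self, node_id, rel=None):
        # Return connected node ids, optionally filtered by relationship type.
        return sorted(other for r, other in self.edges[node_id]
                      if rel is None or r == rel)


def link_by_shared_property(graph, key, rel):
    """Connect every pair of nodes that share the same value for `key`."""
    groups = defaultdict(list)
    for node_id, props in graph.nodes.items():
        if key in props:
            groups[props[key]].append(node_id)
    for members in groups.values():
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                graph.add_edge(a, rel, b)


# Hypothetical sample data: three people with a location and a profession.
graph = PropertyGraph()
graph.add_node("ana", location="Lisbon", profession="engineer")
graph.add_node("ben", location="Lisbon", profession="designer")
graph.add_node("cho", location="Seoul", profession="engineer")

link_by_shared_property(graph, "location", "SAME_LOCATION")
link_by_shared_property(graph, "profession", "SAME_PROFESSION")

print(graph.neighbors("ana"))                   # ['ben', 'cho']
print(graph.neighbors("ana", "SAME_LOCATION"))  # ['ben']
```

The point of the sketch is that relationships are stored as first-class data next to the records themselves, so asking "who is connected to Ana, and how?" is a direct lookup rather than a join across tables. A real graph database adds persistence, indexing, and a query language on top of this basic shape.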
The real mystery of the Orient(DB), however, is the relatively low (visible) adoption of a database that offers both innovation and reputedly staggering levels of performance (the claim is that it can store up to 150,000 records a second). The question isn’t just why it hasn’t managed to dent a market essentially owned by Neo4j, but why, on its own merits, more developers haven’t opted for it. The answer may in the end come down to commercial drivers: outside of Europe, OrientDB seems to have struggled to build the kind of traction that would push greater adoption. Or perhaps it comes down to the considerable development and tuning the project requires before it can be used in production. Related to that, maybe OrientDB still has a way to go in terms of enterprise-grade production support. It’s hard to say for sure what the deciding factor is. In many ways it simply restates how difficult it is for startups and new technologies to win adoption, and how long the road to that goal typically is. Regardless, what makes both Neo4j and OrientDB valuable is the way they adapt familiar and unfamiliar programming concepts to reimagine how we represent, model, and interpret connections in data, mapping the information of the world.