I learned about some intriguing new technology recently that drives the data backbone for Yahoo and Google. Highly parallel distributed computing, based on functional programming techniques of recursion.
http://labs.google.com/papers/mapreduce.html
There is also an open source version.
http://wiki.apache.org/hadoop/HadoopMapReduce
I have always been intrigued about the advances in crawlers for unstructure data such as web content, but nothing exists to index and crawl all the structure data that exists within corporate databases or spreadsheets and link it semantically for search and retrieval.
It would be pretty powerful to have an engine that can "search" for all the versions of sales in the south and present it to a user. It nothing else it would drive visibility into the data profliferation and characterize the one version of the truth problem that exists within most corporations.
It has brought to life a topic I spent hours contemplating as part of my masters thesis that I have recently woefully realized has atrophied as I moved on from tackling technical challenges to people and organizational challenges. It might be time to reignite thoughts about the problem and see what else has changed over the last 9 years.
Saturday, July 12, 2008
Subscribe to:
Posts (Atom)