Using Big Data in Astronomy – A Sample MapReduce Project in Hadoop

A New Planet, starting as a disk

Since ancient times humans have wondered if the universe encompasses much more than the surroundings we can perceive through our limited senses: are there not only other continents, and peoples on Earth, but other planets revolving around their own suns?

In the past few decades the amount of data available to astronomers has exploded. Star catalogs have grown from Yale's Bright Star Catalogue, with 9,110 entries (assembled in the 1920s), to today's USNO-B1.0 catalog with roughly 10⁹ objects – and they stand to grow enormously further with current missions such as ESA's Gaia.
Planet counts, too, are exploding – from 50 extrasolar planets known in the year 2000 to 5,195 objects today, with thousands more to come, perhaps tens of thousands, once new missions such as TESS are launched and the rest of NASA's Kepler data is analyzed.

This short YouTube video is a two-minute presentation of my final project for Harvard's HES CSCI-E63 (Big Data Analytics). A much more detailed presentation can be found here.

In my presentation I summarize how MapReduce lets us take the many measurements that make up the empirical data and readily find out, for example, how common stars like our Sun are, and how many Earth-like planets have been observed.

After applying standard MapReduce and Hadoop techniques to the Exoplanet Orbit Database and to the HYG stellar database, I found that distilling a large dataset into a tidy, insightful classification is particularly well suited to MapReduce – its application in the astronomical realm shows the power and elegance of this method in astronomy and the physical sciences.
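To illustrate the kind of classification involved, here is a minimal sketch of a MapReduce-style job that counts stars by spectral class (e.g. G-type stars like our Sun). This is not the project's actual code: the toy CSV layout and the column index are hypothetical stand-ins for the real HYG export, whose field positions would need to be checked, and the map and reduce phases are simulated in memory rather than run under Hadoop.

```python
from collections import defaultdict

SPECT_COL = 2  # hypothetical index of the spectral-type column in this toy layout


def mapper(line):
    """Map phase: emit (spectral_class, 1) for each catalog row."""
    fields = line.split(",")
    spect = fields[SPECT_COL].strip()
    if spect:
        # Reduce a full spectral type such as "G2V" to its broad class "G"
        yield spect[0].upper(), 1


def reducer(pairs):
    """Reduce phase: sum the counts per spectral class, as a Hadoop reducer would."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


# Tiny in-memory stand-in for the star catalog (illustrative rows, not real HYG data)
rows = [
    "0,Sol,G2V",
    "1,Sirius,A1V",
    "2,Alpha Centauri A,G2V",
    "3,Barnard's Star,M4V",
]

pairs = [kv for row in rows for kv in mapper(row)]
print(reducer(pairs))  # → {'G': 2, 'A': 1, 'M': 1}
```

In a real Hadoop Streaming job the mapper and reducer would instead read lines from standard input and write tab-separated key–value pairs to standard output, with Hadoop handling the shuffle and sort between the two phases.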

(New planet and its disk image courtesy of ESO)
