Big Data Platform upgraded to Hadoop 3

The Big Data platform has been upgraded to Hadoop 3 and now also includes Spark 2.4.

The Big Data service supports not only parallel processing of large volumes of information, but also the collection and processing of streaming data, as well as the use of Jupyter Notebooks for exploration and visualization tasks.

The service is based on CDH6, the latest version of Cloudera’s Hadoop distribution. CDH6 is built on Hadoop 3 and offers a stable solution that includes numerous components of the Hadoop ecosystem: YARN, HDFS, MapReduce, Spark, Flume, Hive, Impala, HBase, …

The upgraded platform is accessible via SSH at **hadoop3.cesga.es**, while the old one, based on Hadoop 2, remains accessible at hadoop.cesga.es. To ease the transition, the old platform will remain active until June 30, when it will be shut down.
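During the transition it can be convenient to keep both platforms available under short names. The following is a minimal sketch of an OpenSSH client configuration (`~/.ssh/config`), not an official CESGA recommendation. The host aliases `cesga-hadoop3` and `cesga-hadoop2` and the user name `your_username` are hypothetical placeholders; only the two host names come from this announcement.

```sshconfig
# Upgraded platform (Hadoop 3)
Host cesga-hadoop3
    HostName hadoop3.cesga.es
    # Placeholder: replace with your own account name
    User your_username

# Old platform (Hadoop 2), active only until June 30
Host cesga-hadoop2
    HostName hadoop.cesga.es
    # Placeholder: replace with your own account name
    User your_username
```

With entries like these in place, `ssh cesga-hadoop3` would open a session on the upgraded platform. The second entry can be deleted once the old platform is shut down.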

The new User Guide includes a section explaining how to migrate data from HDFS: http://bigdata.cesga.es/user-guide/migrating_data.html

It is not necessary to migrate the data in the HOME filesystem, since the data on the old platform has been automatically migrated from GlusterFS to the new HOME.

For more information about the platform, including tutorials to quickly learn the different tools, we recommend visiting the portal dedicated to the platform: https://bigdata.cesga.es

This portal includes a web interface (WebUI) that lets you perform the most common tasks from the browser. We also recommend reading the new user guide: http://bigdata.cesga.es/user-guide

NOTE: In some browsers, if you have previously accessed the portal or the WebUI, you will have to clear your browser’s cache to see the new version.
