How to wipe out the DFS in Hadoop?

If you format only the Namenode, it removes only the metadata stored by the Namenode; the temporary storage and the Datanode blocks will still be there. To remove the temporary storage and all the Datanode blocks you need to delete the main Hadoop storage directory from every node. This directory is defined by the hadoop.tmp.dir property, which lives in core-site.xml (hadoop-site.xml in older releases), not hdfs-site.xml.
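For reference, a minimal core-site.xml entry for this property might look like the following (the path shown is only an example; check what your own cluster uses):

```xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- Example value; Hadoop's default is /tmp/hadoop-${user.name} -->
    <value>/tmp/hadoop-${user.name}</value>
  </property>
</configuration>
```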

First, stop all the Hadoop processes from your Namenode. This can be done by running the default stop-all script, which also stops DFS:

  • On Linux – bin/stop-all.sh
  • On Windows – C:\apps\dist\bin\StopHadoop.cmd

Now delete all the files in your main Hadoop storage directory, defined by the hadoop.tmp.dir parameter in core-site.xml. Be sure to perform this action on every machine in your cluster, i.e. Namenodes, JobTrackers, Datanodes, etc. Note that the DFS shell commands below remove files from DFS and therefore only work while DFS is still running; with the daemons already stopped, delete the contents of hadoop.tmp.dir directly on each node instead:

  • On Linux: hadoop dfs -rmr / (while DFS is running), or delete the hadoop.tmp.dir contents locally
  • On Windows:
    • hadoop fs -rmr / (at the Hadoop Command Shell)
    • #rmr / (at the Interactive JavaScript Shell)
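To illustrate the local deletion step on Linux, here is a small sketch; /tmp/hadoop-demo stands in for whatever hadoop.tmp.dir points to on your nodes (the path is an assumption, substitute your own):

```shell
# Hypothetical hadoop.tmp.dir; substitute the value from your core-site.xml.
STORAGE_DIR=/tmp/hadoop-demo

# Simulate an existing storage layout (name and data directories).
mkdir -p "$STORAGE_DIR/dfs/name" "$STORAGE_DIR/dfs/data"

# Wipe it out -- this is what you would run on every node once the daemons are stopped.
rm -rf "$STORAGE_DIR"

# Confirm it is gone.
[ ! -d "$STORAGE_DIR" ] && echo "storage wiped"
```

Run the same deletion on every node in the cluster, not just the Namenode.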

Next, reformat the Namenode as below:

  • hadoop namenode -format

Finally, start your cluster again by running the following command, which will start up DFS again:

  • On Linux: bin/start-all.sh
  • On Windows: C:\apps\dist\bin\StartHadoop.cmd
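Putting the steps together, the Linux sequence can be sketched as a dry run; the run function only echoes each command so you can review the exact order before executing anything for real. HADOOP_HOME and the storage path below are assumptions, adjust them for your cluster:

```shell
# Dry-run sketch: 'run' echoes commands instead of executing them.
run() { echo "would run: $*"; }

HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop}   # assumed install location
TMP_DIR="/tmp/hadoop-$(whoami)"                 # assumed hadoop.tmp.dir value

run "$HADOOP_HOME/bin/stop-all.sh"              # 1. stop all daemons
run rm -rf "$TMP_DIR"                           # 2. wipe local storage on EVERY node
run "$HADOOP_HOME/bin/hadoop" namenode -format  # 3. reformat the Namenode
run "$HADOOP_HOME/bin/start-all.sh"             # 4. bring DFS back up
```

Once the dry run looks right, replace the run wrapper with direct execution, remembering that step 2 must be repeated on each machine in the cluster.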