Run a basic word count MapReduce program to understand the MapReduce paradigm

Java code

Create a file named WordCount.java (the file name must match the public class) and paste the following code into it

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }


  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
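
The job above registers IntSumReducer as both the combiner and the reducer. That is safe here because addition is associative and commutative, so summing partial counts on each mapper's output first does not change the final totals. The following is a minimal plain-Java sketch of that idea, with no Hadoop involved; the class CombinerSketch and its method names are hypothetical and exist only for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Illustration only: simulates map, combine and reduce in memory.
// Nothing here is part of the Hadoop API.
final class CombinerSketch {

  // Map step: emit one (word, 1) pair per token, as TokenizerMapper does.
  static List<Map.Entry<String, Integer>> mapSplit(String text) {
    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
    StringTokenizer itr = new StringTokenizer(text);
    while (itr.hasMoreTokens()) {
      pairs.add(new AbstractMap.SimpleImmutableEntry<>(itr.nextToken(), 1));
    }
    return pairs;
  }

  // Group pairs by key and sum their values, as IntSumReducer does.
  // The same function plays the role of combiner and reducer.
  static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
    Map<String, Integer> sums = new TreeMap<>();
    for (Map.Entry<String, Integer> pair : pairs) {
      sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
    }
    return sums;
  }

  // Each split is mapped and combined locally before the final reduce.
  static Map<String, Integer> runWithCombiner(List<String> splits) {
    List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
    for (String split : splits) {
      shuffled.addAll(sumByKey(mapSplit(split)).entrySet());
    }
    return sumByKey(shuffled);
  }

  // Every raw (word, 1) pair is sent straight to the final reduce.
  static Map<String, Integer> runWithoutCombiner(List<String> splits) {
    List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
    for (String split : splits) {
      shuffled.addAll(mapSplit(split));
    }
    return sumByKey(shuffled);
  }
}
```

Calling runWithCombiner and runWithoutCombiner on the same list of input splits should return equal maps; the combiner only reduces how many pairs travel between the map and reduce steps.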

Run the following command so that the Hadoop libraries are available on the compiler classpath

export HADOOP_CLASSPATH=$(hadoop classpath)

Create a data.txt file and paste the following text into it

my
name
is
ashwin
i
repeat
my
name
is ashwin

Create the input directory on HDFS (the -p flag also creates any missing parent directories)

hadoop fs -mkdir -p /user/ashwin/input

Upload the file to HDFS

hadoop fs -put data.txt /user/ashwin/input

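Make sure the output directory does not already exist on HDFS. The job creates it itself and fails if it is already present, so remove any copy left over from an earlier run

hadoop fs -rm -r -f /user/ashwin/output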

Compile the Java file

javac -classpath ${HADOOP_CLASSPATH} -d /<class_folder> /<path_java_file>

Package the compiled classes into a .jar file

jar -cvf jar_file_name.jar -C /<class_folder> .

Run the job, passing the main class name (WordCount for the code above) followed by the input and output paths

hadoop jar jar_file_name.jar <Class_Name> /user/ashwin/input /user/ashwin/output

Final Output here
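
With the default text output format, the result is written to the output directory as lines of the form word, tab, count, ordered by key; with a single reducer the file is typically named part-r-00000 and can be printed with hadoop fs -cat /user/ashwin/output/part-r-00000. As a rough cross-check, the plain-Java sketch below computes the lines one would expect for a given input without running Hadoop. The class ExpectedOutputSketch and the method expectedLines are hypothetical, and the sketch assumes plain ASCII words, for which Java's String ordering agrees with the byte ordering Hadoop uses for Text keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Illustration only: predicts the word count output for a small input.
// Nothing here is part of the Hadoop API.
final class ExpectedOutputSketch {

  // Returns one "word<TAB>count" line per distinct word, sorted by word.
  static List<String> expectedLines(List<String> inputLines) {
    // TreeMap keeps keys sorted, mirroring the sorted reducer input.
    Map<String, Integer> counts = new TreeMap<>();
    for (String line : inputLines) {
      // Same whitespace tokenization as TokenizerMapper.
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        counts.merge(itr.nextToken(), 1, Integer::sum);
      }
    }
    List<String> lines = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      lines.add(entry.getKey() + "\t" + entry.getValue());
    }
    return lines;
  }
}
```

For the data.txt shown earlier, this yields six lines: ashwin 2, i 1, is 2, my 2, name 2 and repeat 1.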
