Bachelor and Master Theses

Title: NOISY BIG DATA CLASSIFICATION USING MAPREDUCE DISTRIBUTED FUZZY RANDOM FOREST
Subject:
Level: Advanced
Description: Huge amounts of data are generated every day. To make sense of this data, one common approach is classification. In most cases, however, the data contains noise, which makes classification harder. Many solutions exist, but the most common modern approaches use machine learning techniques that enable computers to learn from data and later apply that knowledge to previously unseen examples. This is called supervised learning. State-of-the-art results in this field set a very high bar, mostly due to the recent successes of so-called neural network models, which appear to be highly accurate classifiers. In some areas they achieve better-than-human results. The problem with these state-of-the-art models is that they are not interpretable by human readers, so in domains where interpretability is desirable they fall short of other, more interpretable methods. The fuzzy logic approach addresses both interpretability and noise. More specifically, we modify the fuzzy random forest classifier to make it much faster while maintaining state-of-the-art accuracy. We call our version the Highly random fuzzy forest. This speedup enables us to distribute the algorithm across a cluster of computers that can process much larger amounts of data than was originally possible.
Start date: 2017-01-20
Prel. end date: 2017-05-17
Presentation date: 2017-06-12
Student: Faruk Mustafic fmc16001@student.mdh.se
IDT supervisor: Ning Xiong
ning.xiong@mdh.se, +46-21-151716
Examiner: Shahina Begum
shahina.begum@mdh.se, +46-21-107370

Report and attachments: TR2072.pdf
Size: 1173159
Last updated: 2017-05-17, 14:32


  • Mälardalen University
  • Box 883
  • 721 23 Västerås/Eskilstuna
  • 021-101300, 016-153600
  • Latest update: 2018.03.15