Apache Mahout


What is Apache Mahout?

A mahout is a person who rides and handles an elephant. The project takes its name from its close association with Apache Hadoop, which uses an elephant as its logo.

Before Mahout, it was difficult to apply machine learning techniques quickly to large-scale data. Mahout uses Hadoop to break a problem into multiple parallel tasks, so that ML techniques can run quickly on large datasets.

Apache Mahout is an open source project that is primarily used for creating scalable machine learning algorithms. It implements popular machine learning techniques such as:

  • Recommendation
  • Classification
  • Clustering

Recommendation combines what is known about a user with what is known about the community (USER INFORMATION + COMMUNITY INFORMATION = RECOMMENDATION) to predict whether a given user will prefer a certain product. The Netflix recommendation engine is an example of such a system.
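The "user information + community information" idea can be sketched as a small user-based collaborative filter. This is plain Python for illustration only, not Mahout's API; the ratings table, the function names, and the choice of cosine similarity are all assumptions made for this example.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not from any real dataset.
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob":   {"A": 5.0, "B": 3.0, "C": 4.0, "D": 5.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 1.0},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine_similarity(ratings[user], theirs)
        num += sim * theirs[item]
        den += sim
    return num / den if den else None

def recommend(user, ratings):
    """Rank the items the user has not rated yet by predicted rating."""
    seen = set(ratings[user])
    candidates = {i for r in ratings.values() for i in r} - seen
    scored = [(predict(user, i, ratings), i) for i in candidates]
    scored = [(s, i) for s, i in scored if s is not None]
    return [i for s, i in sorted(scored, reverse=True)]

print(recommend("alice", ratings))
```

Here the user's own ratings are the "user information", and the ratings of similar users are the "community information". A production recommender would also deal with rating bias, sparsity, and scale, which this sketch ignores.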

Classification, also known as categorization, is a machine learning technique that uses known data to determine how the new data should be classified into a set of existing categories. Classification is a form of supervised learning.
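As a minimal sketch of supervised classification, the example below trains a nearest-centroid classifier on labeled data and then assigns new points to the closest known category. Nearest-centroid is used only because it is short; it is not one of the Mahout classifiers listed on this page, and the data and function names are hypothetical.

```python
from collections import defaultdict

def train(samples):
    """Compute one centroid (mean feature vector) per known label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for k, value in enumerate(features):
            sums[label][k] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }

def classify(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((x - y) ** 2 for x, y in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Hypothetical labeled training data: (features, label).
training = [
    ([1.0, 1.0], "small"),
    ([2.0, 1.0], "small"),
    ([8.0, 9.0], "large"),
    ([9.0, 8.0], "large"),
]

model = train(training)
print(classify(model, [2.0, 2.0]))
print(classify(model, [7.0, 9.0]))
```

The labeled training set is what makes this supervised: the categories exist before any new data arrives, and the model only decides which existing category a new point belongs to.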

Clustering is used to form groups or clusters of similar data based on common characteristics. Clustering is a form of unsupervised learning.
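Grouping by common characteristics can be illustrated with a small k-means sketch. This is plain Python and not Mahout's implementation; seeding the centroids with the first k points is an assumption chosen to keep the example deterministic, and the sample points are made up.

```python
def kmeans(points, k, max_iter=100):
    """Basic k-means: assign points to the nearest centroid, then recompute centroids."""
    # Assumption: seed centroids with the first k points (real systems seed more carefully).
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical unlabeled 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

labels, centroids = kmeans(points, k=2)
print(labels)
```

No labels are supplied here: the algorithm discovers the two groups from the distances between points alone, which is what makes clustering unsupervised.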

Core Algorithms

  1. Collaborative Filtering
    • User- and Item-based recommenders
  2. Classification
    • Logistic regression classifier
    • Hidden Markov models classifier
    • Random forest classifier
  3. Clustering
    • Canopy clustering
    • K-means and Fuzzy k-means clustering
    • Spectral clustering
  4. Other Algorithms
    • Dimensionality Reduction
    • Topic Models
    • Frequent Pattern Mining
  5. Taste Collaborative Filtering
    • Taste recommender framework
    • Donated by Sean Owen