Working with Yum on CentOS

Searching/Listing package in List of packages:

List everything available to install from , a list of all packages

  • # yum list
  • # yum list | grep openssl

Example: list all packages starts with python

  • # yum list python*

Example: List of package names as openssl Install:

  • #yum list openssl

List all available versions of a package:

  • # yum –showduplicates list php
    • Available Packages php.x86_64 5.3.3-3.el6_1.3 base php.x86_64 5.3.3-3.el6_2.5 updates php.x86_64 5.3.3-3.el6_2.6 updates php.x86_64 5.3.3-3.el6_2.8 updates
  • # yum –showduplicates list python
    • Installed Packages python.x86_64 2.6.6-29.el6_2.2 @updates Available Packages python.i686 2.6.6-29.el6 base python.x86_64 2.6.6-29.el6 base python.i686 2.6.6-29.el6_2.2 updates python.x86_64 2.6.6-29.el6_2.2 updates

Searching into your Linux Box

Search a specific package in all installed packages in the box

  • # yum search <packagename>

For example search for openssl

  • # yum search openssl


Check if this file belongs which package

  • # yum provides nodejs
  • # yum provides */ssl


Find out the all locations where the package is installed:

  • # yum provides */nodejs


List the dependencies of a specific package

  • # yum deplist nodejs


Installing a package in your Linux Box:

Install Package in your machine:

  • # yum install <package_name>

Install a collection of Packages “group package” in your machine:

  • # yum groupinstall <group package name>

Updating a package in your Linux Box:

To update the list of YUM packages to the latest edition. This may take some time to get all the stuff updated.

  • # yum update
  • #yum groupupdate


Remove an installed package:

  • # yum remove <package name>
  • #yum groupremove <group package name>

Downgrade a package:

  • # yum install yum-allowdowngrade

List of software groups available to install

  • # yum grouplist
  • # yum grouplist | grep gn


List of available repolist for yum

  • #yum repolist all


Clean the package install cache:

  • # yum clean all


Other Yum Commands:

• yum-aliases
• yum-allowdowngrade
• yum-arch
• yum-basearchonly
• yum-changelog
• yum-cron
• yum-downloadonly
• yumex.noarch
• yum-fastestmirror
• yum-filter-data
• yum-noarch
• yum-kernel-module
• yum-kmod
• yum-list-data
• yum-merge-conf
• yum-metadata-parser
• yum-priorities
• yum-protectbase
• yum-protect-packages
• yum-refresh-updatesd
• yum-security
• yum-skip-broken
• yum-tsflags
• yum-updateonboot
• yum-updatesd
• yum-upgrade-helper
• yum-utils
• yum-versionlock


K-means clustering and comparison of K-means clustering algorithms in Python, Java and R

What is K-means Clustering:

The k-means algorithm is one of the simplest clustering techniques commonly used in data analytics.

Original Definition :

The k-means algorithm is an evolutionary algorithm that gains its name from its method of operation. The algorithm clusters observations into k groups, where k is provided as an input parameter. It then assigns each observation to clusters based upon the observation’s proximity to the mean of the cluster. The cluster’s mean is then recomputed and the process begins again.

A very Simple Example:

If you have a collection of dog and you would want to cluster them based on their gender then K =2 would be perfect.

A very complex Example:

If you have the same collection of dog and you would want to cluster them based on their size you could put K=5 (Extra Large, large, medium, small, extra small) however if you would use their family, you may be using K=50, may be cause you just only know 50 types of dog families and this may not be suitable scenario.

 K-means clustering algorithms in Python:

import sys
from math import fabs
from org.apache.pig.scripting import Pig

filename = “student.txt”
k = 4
tolerance = 0.01


# initial centroid, equally divide the space
initial_centroids = “”
last_centroids = [None] * k
for i in range(k):
last_centroids[i] = MIN_SCORE + float(i)/k*(MAX_SCORE-MIN_SCORE)
initial_centroids = initial_centroids + str(last_centroids[i])
if i!=k-1:
initial_centroids = initial_centroids + “:”

P = Pig.compile(“””register udf.jar
DEFINE find_centroid FindCentroid(‘$centroids’);
raw = load ‘student.txt’ as (name:chararray, age:int, gpa:double);
centroided = foreach raw generate gpa, find_centroid(gpa) as centroid;
grouped = group centroided by centroid;
result = foreach grouped generate group, AVG(centroided.gpa);
store result into ‘output’;

converged = False
iter_num = 0
while iter_num<MAX_ITERATION:
Q = P.bind({‘centroids’:initial_centroids})
results = Q.runSingle()
if results.isSuccessful() == “FAILED”:
raise “Pig job failed”
iter = results.result(“result”).iterator()
centroids = [None] * k
distance_move = 0
# get new centroid of this iteration, caculate the moving distance with last iteration
for i in range(k):
tuple =
centroids[i] = float(str(tuple.get(1)))
distance_move = distance_move + fabs(last_centroids[i]-centroids[i])
distance_move = distance_move / k;
Pig.fs(“rmr output”)
print(“iteration ” + str(iter_num))
print(“average distance moved: ” + str(distance_move))
if distance_move<tolerance:
sys.stdout.write(“k-means converged at centroids: [“)
sys.stdout.write(“,”.join(str(v) for v in centroids))
converged = True
last_centroids = centroids[:]
initial_centroids = “”
for i in range(k):
initial_centroids = initial_centroids + str(last_centroids[i])
if i!=k-1:
initial_centroids = initial_centroids + “:”
iter_num += 1

if not converged:
print(“not converge after ” + str(iter_num) + ” iterations”)
sys.stdout.write(“last centroids: [“)
sys.stdout.write(“,”.join(str(v) for v in last_centroids))


K-means clustering algorithms in Java:



import org.apache.pig.EvalFunc;
public class FindCentroid extends EvalFunc<Double> {
double[] centroids;
public FindCentroid(String initialCentroid) {
String[] centroidStrings = initialCentroid.split(“:”);
centroids = new double[centroidStrings.length];
for (int i=0;i<centroidStrings.length;i++)
centroids[i] = Double.parseDouble(centroidStrings[i]);
public Double exec(Tuple input) throws IOException {
double min_distance = Double.MAX_VALUE;
double closest_centroid = 0;
for (double centroid : centroids) {
double distance = Math.abs(centroid – (Double)input.get(0));
if (distance < min_distance) {
min_distance = distance;
closest_centroid = centroid;
return closest_centroid;



K-means clustering algorithms in R:

kmeans =
function(points, ncenters, iterations = 10, distfun = NULL) {
distfun =
function(a,b) norm(as.matrix(a-b), type = ‘F’)
newCenters =
ncenters = ncenters)
for(i in 1:iterations) {
newCenters = kmeans.iter(points, distfun, centers = newCenters)}

kmeans.iter =
function(points, distfun, ncenters = dim(centers)[1], centers = NULL) {
from.dfs(mapreduce(input = points,
map =
if (is.null(centers)) {
function(k,v) keyval(sample(1:ncenters,1),v)}
else {
function(k,v) {
distances = apply(centers, 1, function(c) distfun(c,v))
keyval(centers[which.min(distances),], v)}},
reduce = function(k,vv) keyval(NULL, apply(, vv), 2, mean))), = T)}