A list of error messages encountered while processing Pig commands with Hadoop MapReduce

The possible error messages produced while processing Pig commands with Hadoop MapReduce are listed below. This list is not comprehensive and will be updated to reflect the exact error message along with each error code:

– ||Error Code ||Error Message ||How to Handle ||
– ||1000 ||Error during parsing ||
– ||1001 ||Unable to descirbe schema for alias <alias> ||
– ||1002 ||Unable to store alias <id> ||
– ||1003 ||Unable to find an operator for alias <alias> ||
– ||1004 ||No alias <alias> to <operation> ||
– ||1005 ||No plan for <alias> to <operation> ||
– ||1006 ||Could not find operator in plan ||
– ||1007 ||Found duplicates in schema. <list of duplicate column names> . Please alias the columns with unique names. ||
– ||1008 ||Expected a bag with a single element of type tuple but got a bag schema with multiple elements ||
– ||1009 ||Expected a bag with a single element of type tuple but got an element of type <type> ||
– ||1010 ||getAtomicGroupByType is used only when dealing with atomic <group/join> col ||
– ||1011 ||getTupleGroupBySchema is used only when dealing with <tuple/join> group col ||
– ||1012 ||Each <COGroup/join> input has to have the same number of inner plans ||
– ||1013 ||attributes can either be star (*) or a list of expressions, but not both. ||
– ||1014 ||Problem with input <operator> of User-defined function: <function> ||
– ||1015 ||Error determining fieldschema of constant: <constant> ||
– ||1016 ||Problems in merging user defined schema ||
– ||1017 ||Schema mismatch. A basic type on flattening cannot have more than one column. User defined schema: <schema> ||
– ||1018 ||Problem determining schema during load ||
– ||1019 ||Unable to merge schemas ||
– ||1020 ||Only a BAG or TUPLE can have schemas. Got <type> ||
– ||1021 ||Type mismatch. No useful type for merging. Field Schema: <field schema>. Other Fileld Schema: + otherFs ||
– ||1022 ||Type mismatch. Field Schema: <field schema>. Other Fileld Schema: + otherFs ||
– ||1023 ||Unable to create field schema ||
– ||1024 ||Found duplicate aliases: <alias> ||
– ||1025 ||Found more than one match: <list of aliases> ||
– ||1026 ||Attempt to fetch field: <field> from schema of size <size> ||
– ||1027 ||Cannot reconcile schemas with different sizes. This schema has size <size> other has size of <size> ||
– ||1028 ||Access to the tuple <alias> of the bag is disallowed. Only access to the elements of the tuple in the bag is allowed. ||
– ||1029 ||One of the schemas is null for merging schemas. Schema: <schema> Other schema: <schema> ||
– ||1030 ||Different schema sizes for merging schemas. Schema size: <size> Other schema size: <size> ||
– ||1031 ||Incompatible types for merging schemas. Field schema type: <type> Other field schema type: <type> ||
– ||1032 ||Incompatible inner schemas for merging schemas. Field schema: <schema> Other field schema: <schema> ||
– ||1033 ||Schema size mismatch for merging schemas. Other schema size greater than schema size. Schema: <schema>. Other schema: <schema> ||
– ||1034 ||TypeCastInserter invoked with an invalid operator class name: <operator class name> ||
– ||1035 ||Error getting LOProject’s input schema ||
– ||1036 ||Map key should be a basic type ||
– ||1037 ||Operand of Regex can be CharArray only ||
– ||1038 ||Operands of AND/OR can be boolean only ||
– ||1039 ||Incompatible types in <Addition/Subtraction/Division/Multiplication/Mod/GreaterThan/LesserThan/> operator. left hand side: <type> right hand size: type ||
– ||1040 ||Could not set <Add/Subtract/Multiply/Divide/Mod/UserFunc/BinCond> field schema ||
– ||1041 ||NEG can be used with numbers or Bytearray only ||
– ||1042 ||NOT can be used with boolean only ||
– ||1043 ||Unable to retrieve field schema of operator. ||
– ||1044 ||Unable to get list of overloaded methods. ||
– ||1045 ||Could not infer the matching function for <func spec> as multiple or none of them fit. Please use an explicit cast. ||
– ||1046 ||Multiple matching functions for <funcspec> with input schemas: ( <schema> , <schema>). Please use an explicit cast. ||
– ||1047 ||Condition in BinCond must be boolean ||
– ||1048 ||Two inputs of BinCond must have compatible schemas ||
– ||1049 ||Problem during evaluaton of BinCond output type ||
– ||1050 ||Unsupported input type for BinCond: lhs = <type>; rhs = <type> ||
– ||1051 ||Cannot cast to bytearray ||
– ||1052 ||Cannot cast <type> [with schema <schema>] to <type> with schema <schema> ||
– ||1053 ||Cannot resolve load function to use for casting from <type> to <type> ||
– ||1054 ||Cannot merge schemas from inputs of UNION ||
– ||1055 ||Problem while reading schemas from inputs of <Union/SplitOutput/Distinct/Limit/Cross> ||
– ||1056 ||Problem while casting inputs of Union ||
– ||1057 ||’s inner plan can only have one output (leaf) ||
– ||1058 ||Split’s condition must evaluate to boolean. Found: <type> ||
– ||1059 ||Problem while reconciling output schema of <Sort/Filter/Split> ||
– ||1060 ||Cannot resolve <COGroup/Foreach/Fragment Replicate Join> output schema ||
– ||1061 ||Sorry, group by complex types will be supported soon ||
– ||1062 ||COGroup by incompatible types ||
– ||1063 ||Problem while reading field schema from input while inserting cast ||
– ||1064 ||Problem reading column <col> from schema: <schema> ||
– ||1065 ||Found more than one load function to use: <list of load functions> ||
– ||1066 ||Unable to open iterator for alias <alias> ||
– ||1067 ||Unable to explain alias <alias> ||
– ||1068 ||Using <Map/Bag> as key not supported. ||
– ||1069 ||Problem resolving class version numbers for class <class> ||
– ||1070 ||Could not resolve <class> using imports: <package import list> ||
– ||1071 ||Cannot convert a <type> to <a/an> <type> ||
– ||1072 ||Out of bounds access: Request for field number <number> exceeds tuple size of <size> ||
– ||1073 ||Cannot determine field schema for <object> ||
– ||1074 ||Problem with formatting. Could not convert <object> to <Integer/Long/Float/Double>. ||
– ||1075 ||Received a bytearray from the UDF. Cannot determine how to convert the bytearray to <int/float/long/double/string/tuple/bag/map> ||
– ||1076 ||Problem while reading field schema of cast operator. ||
– ||1077 ||Two operators that require a cast in between are not adjacent. ||
– ||1078 ||Schema size mismatch for casting. Input schema size: <size>. Target schema size: <size> ||
– ||1079 ||Undefined type checking logic for unary operator: ” <operator> ||
– ||1080 ||Did not find inputs for operator: ” <operator> ||
– ||1081 ||Cannot cast to <int/float/long/double/string/tuple/bag/map>. Expected bytearray but received: <type> ||
– ||1082 ||Cogroups with more than 127 inputs not supported. ||
– ||1083 ||setBatchOn() must be called first. ||
– ||1084 ||Invalid Query: Query is null or of size 0. ||
– ||1085 ||operator in <pushBefore/pushAfter> is null. Cannot <pushBefore/pushAfter> null operators. ||
– ||1086 ||First operator in <pushBefore/pushAfter> should have multiple <inputs/outputs>. Found first operator with <size> <inputs/outputs>. ||
– ||1087 ||The <inputNum/outputNum> <num> should be lesser than the number of <inputs/outputs> of the first operator. Found first operator with <size> <inputs/outputs>. ||
– ||1088 ||operator in <pushBefore/pushAfter> should have <at least> one <output/input>. Found <first/second> operator with <no/<size> > <outputs/inputs>. ||
– ||1089 ||Second operator in <pushBefore/pushAfter> should be the <successor/predecessor> of the First operator. ||
– ||1090 ||Second operator can have at most one <incoming/outgoing> edge from First operator. Found <num> edges. ||
– ||1091 ||First operator does not support multiple <outputs/inputs>. On completing the <pushBefore/pushAfter> operation First operator will end up with <num> edges ||
– ||1092 ||operator in swap is null. Cannot swap null operators. ||
– ||1093 ||Swap supports swap of operators with at most one <input/output>. Found <first/second> operator with <size> <inputs/outputs> ||
– ||1094 ||Attempt to insert between two nodes that were not connected. ||
– ||1095 ||Attempt to remove and reconnect for node with multiple <predecessors/successors>. ||
– ||1096 ||Attempt to remove and reconnect for node with <<size>/no> <predecessors/successors>. ||
– ||1097 ||Containing node cannot be null. ||
– ||1098 ||Node index cannot be negative. ||
– ||1099 ||Node to be replaced cannot be null. ||
– ||1100 ||Replacement node cannot be null. ||
– ||1101 ||Merge Join must have exactly two inputs. Found : + <size> + inputs ||
– ||1102 ||Data is not sorted on <left/right> side. Last two keys encountered were: <previous key>, <current key> ||
– ||1103 ||Merge join only supports Filter, Foreach and Load as its predecessor. Found : <operator> ||
– ||1104 ||Right input of merge-join must implement SamplableLoader interface. This loader doesn’t implement it. ||
– ||1105 ||Heap percentage / Conversion factor cannot be set to 0 ||
– ||1106 ||Merge join is possible only for simple column or ‘*’ join keys when using <funcspec> as the loader ||
– ||1107 ||Try to merge incompatible types (eg. numerical type vs non-numeircal type) ||
– ||1108 ||Duplicated schema ||
– ||1109 ||Input ( <input alias> ) on which outer join is desired should have a valid schema ||
– ||1110 ||Unsupported query: You have an partition column (<colname>) inside a <regexp operator/function/cast/null check operator/bincond operator> in the filter condition. ||
– ||1111 ||Use of partition column/condition with non partition column/condition in filter expression is not supported. ||
– ||1112 ||Unsupported query: You have an partition column (<column name>) in a construction like: (pcond and …) or (pcond and …) where pcond is a condition on a partition column. ||
– ||1113 ||Unable to describe schema for nested expression <alias> ||
– ||1114 ||Unable to find schema for nested alias <nested alias> ||
– ||1115 ||Place holder for Howl related errors ||
– ||1116 ||Duplicate udf script (in scripting language) ||
– ||1117 ||Cannot merge schema ||
– ||1118 ||Cannot convert bytes load from BinStorage ||
– ||1119 ||Cannot find LoadCaster class ||
– ||1120 ||Cannot cast complex data ||
– ||1121 ||Python error ||
– ||1122||The arity of cogroup/group by columns do not match||
– ||1123||Cogroup/Group by * is only allowed if the input has a schema||
– ||1124||Mismatch merging expression field schema .. with user specified schema ..||
– ||1125||Error determining field schema from object in constant expression”||
– ||1126||Schema having field with null alias cannot be merged using alias.||
– ||1127||Dereference index out of range in schema.||
– ||1128||Cannot find field dereference field in schema.||
– ||1129|| Referring to column(s) within a column of type .. is not allowed ||
– ||1130|| Datatype of i’th group/join column in j’th relation of statement is incompatible with corresponding column in other relations in the statement ||

– ||2000 ||Internal error. Mismatch in group by arities. Expected: <schema>. Found: <schema> ||
– ||2001 ||Unable to clone plan before compiling ||
– ||2002 ||The output file(s): <filename> already exists ||
– ||2003 ||Cannot read from the storage where the output <filename> will be stored ||
– ||2004 ||Internal error while trying to check if type casts are needed ||
– ||2005 ||Expected <class>, got <class> ||
– ||2006 ||TypeCastInserter invoked with an invalid operator class name: <class> ||
– ||2007 ||Unable to insert type casts into plan ||
– ||2008 ||cannot have more than one input. Found <n> inputs. ||
– ||2009 ||Can not move LOLimit up ||
– ||2010 ||LOFilter should have one input ||
– ||2011 ||Can not insert LOLimit clone ||
– ||2012 ||Can not remove LOLimit after <class> ||
– ||2013 ||Moving LOLimit in front of <class> is not implemented ||
– ||2014 ||Unable to optimize load-stream-store optimization ||
– ||2015 ||Invalid physical operators in the physical plan ||
– ||2016 ||Unable to obtain a temporary path. ||
– ||2017 ||Internal error creating job configuration. ||
– ||2018 ||Internal error. Unable to introduce the combiner for optimization. ||
– ||2019 ||Expected to find plan with single leaf. Found <n> leaves. ||
– ||2020 ||Expected to find plan with UDF leaf. Found <class> ||
– ||2021 ||Internal error. Unexpected operator project(*) in local rearrange inner plan. ||
– ||2022 ||Both map and reduce phases have been done. This is unexpected while compiling. ||
– ||2023 ||Received a multi input plan when expecting only a single input one. ||
– ||2024 ||Expected reduce to have single leaf. Found <n> leaves. ||
– ||2025 ||Expected leaf of reduce plan to always be POStore. Found <class> ||
– ||2026 ||No expression plan found in POSort. ||
– ||2027 ||Both map and reduce phases have been done. This is unexpected for a merge. ||
– ||2028 ||ForEach can only have one successor. Found <n> successors. ||
– ||2029 ||Error rewriting POJoinPackage. ||
– ||2030 ||Expected reduce plan leaf to have a single predecessor. Found <n> predecessors. ||
– ||2031 ||Found map reduce operator with POLocalRearrange as last oper but with no succesor. ||
– ||2032 ||Expected map reduce operator to have a single successor. Found <n> successors. ||
– ||2033 ||Problems in rearranging map reduce operators in plan. ||
– ||2034 ||Error compiling operator <class> ||
– ||2035 ||Internal error. Could not compute key type of sort operator. ||
– ||2036 ||Unhandled key type <type> ||
– ||2037 ||Invalid ship specification. File doesn’t exist: <file> ||
– ||2038 ||Unable to rename <oldName> to <newName> ||
– ||2039 ||Unable to copy <src> to <dst> ||
– ||2040 ||Unknown exec type: <type> ||
– ||2041 ||No Plan to compile ||
– ||2042 ||Internal error. Unable to translate logical plan to physical plan. ||
– ||2043 ||Unexpected error during execution. ||
– ||2044 ||The type <type> cannot be collected as a Key type ||
– ||2045 ||Internal error. Not able to check if the leaf node is a store operator. ||
– ||2046 ||Unable to create FileInputHandler. ||
– ||2047 ||Internal error. Unable to introduce split operators. ||
– ||2048 ||Error while performing checks to introduce split operators. ||
– ||2049 ||Error while performing checks to optimize limit operator. ||
– ||2050 ||Internal error. Unable to optimize limit operator. ||
– ||2051 ||Did not find a predecessor for <Distinct/Filter/Limit/Negative/Null/Sort/Split/Split Output/Store/Stream>. ||
– ||2052 ||Internal error. Cannot retrieve operator from null or empty list. ||
– ||2053 ||Internal error. Did not find roots in the physical plan. ||
– ||2054 ||Internal error. Could not convert <object> to <Integer/Long/Float/Double/Tuple/Bag/Map> ||
– ||2055 ||Did not find exception name to create exception from string: <string> ||
– ||2056 ||Cannot create exception from empty string. ||Pig could not find an exception in the error messages from Hadoop, examine the [[#clientSideLog|client log]] to find more information. ||
– ||2057 ||Did not find fully qualified method name to reconstruct stack trace: <line> ||
– ||2058 ||Unable to set index on the newly created POLocalRearrange. ||
– ||2059 ||Problem with inserting cast operator for <regular expression/binary conditional/unary operator/user defined function/fragment replicate join/cogroup/project/<operator>> in plan. ||
– ||2060 ||Expected one leaf. Found <n> leaves. ||
– ||2061 ||Expected single group by element but found multiple elements. ||
– ||2062 ||Each COGroup input has to have the same number of inner plans.” ||
– ||2063 ||Expected multiple group by element but found single element. ||
– ||2064 ||Unsupported root type in LOForEach: <operator> ||
– ||2065 ||Did not find roots of the inner plan. ||
– ||2066 ||Unsupported (root) operator in inner plan: <operator> ||
– ||2067 ||does not know how to handle type: <type> ||
– ||2068 ||Internal error. Improper use of method getColumn() in POProject ||
– ||2069 ||Error during map reduce compilation. Problem in accessing column from project operator. ||
– ||2070 ||Problem in accessing column from project operator. ||
– ||2071 ||Problem with setting up local rearrange’s plans. ||
– ||2072 ||Attempt to run a non-algebraic function as an algebraic function ||
– ||2073 ||Problem with replacing distinct operator with distinct built-in function. ||
– ||2074 ||Could not configure distinct’s algebraic functions in map reduce plan. ||
– ||2075 ||Could not set algebraic function type. ||
– ||2076 ||Unexpected Project-Distinct pair while trying to set up plans for use with combiner. ||
– ||2077 ||Problem with reconfiguring plan to add distinct built-in function. ||
– ||2078 ||Caught error from UDF: <class> [<message from UDF>] ||
– ||2079 ||Unexpected error while printing physical plan. ||
– ||2080 ||Foreach currently does not handle type <type> ||
– ||2081 ||Unable to setup the <load/store> function. ||
– ||2082 ||Did not expect result of type: <type> ||
– ||2083 ||Error while trying to get next result in POStream. ||
– ||2084 ||Error while running streaming binary. ||
– ||2085 ||Unexpected problem during optimization. Could not find LocalRearrange in combine plan. ||
– ||2086 ||Unexpected problem during optimization. Could not find all LocalRearrange operators. ||
– ||2087 ||Unexpected problem during optimization. Found index: <index> in multiple LocalRearrange operators. ||
– ||2088 ||Unable to get results for: <file specification> ||
– ||2089 ||Unable to flag project operator to use single tuple bag. ||
– ||2090 ||Received Error while processing the <combine/reduce> plan. ||
– ||2091 ||Packaging error while processing group. ||
– ||2092 ||No input paths specified in job. ||
– ||2093 ||Encountered error in package operator while processing group. ||
– ||2094 ||Unable to deserialize object ||
– ||2095 ||Did not get reduce key type from job configuration. ||
– ||2096 ||Unexpected class in SortPartitioner: <class name> ||
– ||2097 ||Failed to copy from: <src> to: <dst> ||
– ||2098 ||Invalid seek option: <options> ||
– ||2099 ||Problem in constructing slices. ||
– ||2100 ||does not exist. ||
– ||2101 ||should not be used for storing. ||
– ||2102 ||”Cannot test a <type> for emptiness. ||
– ||2103 ||Problem while computing <max/min/sum> of <doubles/floats/ints/longs/strings>. ||
– ||2104 ||Error while determining schema of <BinStorage data/input>. ||
– ||2105 ||Error while converting <int/long/float/double/chararray/tuple/bag/map> to bytes ||
– ||2106 ||Error while computing <arity/count/concat/min/max/sum/size> in <class name> ||
– ||2107 ||DIFF expected two inputs but received <n> inputs. ||
– ||2108 ||Could not determine data type of field: <object> ||
– ||2109 ||TextLoader does not support conversion <from/to> <Bag/Tuple/Map/Integer/Long/Float/Double>. ||
– ||2110 ||Unable to deserialize optimizer rules. ||
– ||2111 ||Unable to create temporary directory: <path> ||
– ||2112 ||Unexpected data while reading tuple from binary file. ||
– ||2113 ||SingleTupleBag should never be serialized or serialized. ||
– ||2114 ||Expected input to be chararray, but got <class name> ||
– ||2115 ||Internal error. Expected to throw exception from the backend. Did not find any exception to throw. ||
– ||2116 ||Unexpected error. Could not check for the existence of the file(s): <filename> ||
– ||2117 ||Unexpected error when launching map reduce job. ||
– ||2118 ||Unable to create input slice for: <filename> ||
– ||2119 ||Internal Error: Found multiple data types for map key ||
– ||2120 ||Internal Error: Unable to determine data type for map key ||
– ||2121 ||Error while calling finish method on UDFs. ||
– ||2122 ||Sum of probabilities should be one ||
– ||2123 ||Internal Error: Unable to discover required fields from the loads ||
– ||2124 ||Internal Error: Unexpected error creating field schema ||
– ||2125 ||Expected at most one predecessor of load ||
– ||2126 ||Predecessor of load should be store ||
– ||2127 ||Cloning of plan failed. ||
– ||2128 ||Failed to connect store with dependent load. ||
– ||2129 ||Internal Error. Unable to add store to the split plan for optimization. ||
– ||2130 ||Internal Error. Unable to merge split plans for optimization. ||
– ||2131 ||Internal Error. Unable to connect split plan for optimization. ||
– ||2132 ||Internal Error. Unable to replace store with split operator for optimization. ||
– ||2133 ||Internal Error. Unable to connect map plan with successors for optimization. ||
– ||2134 ||Internal Error. Unable to connect map plan with predecessors for optimization. ||
– ||2135 ||Received error from store function. ||
– ||2136 ||Internal Error. Unable to set multi-query index for optimization. ||
– ||2137 ||Internal Error. Unable to add demux to the plan as leaf for optimization. ||
– ||2138 ||Internal Error. Unable to connect package to local rearrange operator in pass-through combiner for optimization. ||
– ||2139 ||Invalid value type: <type>. Expected value type is DataBag. ||
– ||2140 ||Invalid package index: <index>. Should be in the range between 0 and <package array size>. ||
– ||2141 ||Internal Error. Cannot merge non-combiner with combiners for optimization. ||
– ||2142 ||ReadOnceBag should never be serialized. ||
– ||2143 ||Expected index value within POPackageLite is 0, but found ‘index’. ||
– ||2144 ||Problem while fixing project inputs during rewiring. ||
– ||2145 ||Problem while rebuilding schemas after transformation. ||
– ||2146 ||Internal Error. Inconsistency in key index found during optimization. ||
– ||2147 ||Error cloning POLocalRearrange for limit after sort. ||
– ||2148 ||Error cloning POPackageLite for limit after sort ||
– ||2149 ||Internal error while trying to check if filters can be pushed up. ||
– ||2150 ||Internal error. The push before input is not set. ||
– ||2151 ||Internal error while pushing filters up. ||
– ||2152 ||Internal error while trying to check if foreach with flatten can be pushed down. ||
– ||2153 ||Internal error. The mapping for the flattened columns is empty ||
– ||2154 ||Internal error. Schema of successor cannot be null for pushing down foreach with flatten. ||
– ||2155 ||Internal error while pushing foreach with flatten down. ||
– ||2156 ||Error while fixing projections. Projection map of node to be replaced is null. ||
– ||2157 ||Error while fixing projections. No mapping available in old predecessor to replace column. ||
– ||2158 ||Error during fixing projections. No mapping available in old predecessor for column to be replaced. ||
– ||2159 ||Error during fixing projections. Could not locate replacement column from the old predecessor. ||
– ||2160 ||Error during fixing projections. Projection map of new predecessor is null. ||
– ||2161 ||Error during fixing projections. No mapping available in new predecessor to replace column. ||
– ||2162 ||Error during fixing projections. Could not locate mapping for column <column> in new predecessor. ||
– ||2163 ||Error during fixing projections. Could not locate replacement column for column: <column> in the new predecessor. ||
– ||2164 ||Expected EOP as return status. Found: <returnStatus> ||
– ||2165 ||Problem in index construction. ||
– ||2166 ||Key type mismatch. Found key of type <type> on left side. But, found key of type <type> in index built for right side. ||
– ||2167 ||LocalRearrange used to extract keys from tuple isn’t configured correctly. ||
– ||2168 ||Expected physical plan with exactly one root and one leaf. ||
– ||2169 ||Physical operator preceding <right/left> predicate not found in compiled MR jobs. ||
– ||2170 ||Physical operator preceding both left and right predicate found to be same. This is not expected. ||
– ||2171 ||Expected one but found more then one root physical operator in physical plan. ||
– ||2172 ||Expected physical operator at root to be POLoad. Found : <PhysicalOperator> ||
– ||2173 ||One of the preceding compiled MR operator is null. This is not expected. ||
– ||2174 ||Internal exception. Could not create the sampler job. ||
– ||2175 ||Internal error. Could not retrieve file size for the sampler. ||
– ||2176 ||Error processing right input during merge join ||
– ||2177 ||Prune column optimization: Cannot retrieve operator from null or empty list ||
– ||2178 ||Prune column optimization: The matching node from the optimizor framework is null ||
– ||2179 ||Prune column optimization: Error while performing checks to prune columns. ||
– ||2180 ||Prune column optimization: Only LOForEach and LOSplit are expected ||
– ||2181 ||Prune column optimization: Unable to prune columns. ||
– ||2182 ||Prune column optimization: Only relational operator can be used in column prune optimization. ||
– ||2183 ||Prune column optimization: LOLoad must be the root logical operator. ||
– ||2184 ||Prune column optimization: Fields list inside RequiredFields is null. ||
– ||2185 ||Prune column optimization: Unable to prune columns. ||
– ||2186 ||Prune column optimization: Cannot locate node from successor ||
– ||2187 ||Column pruner: Cannot get predessors ||
– ||2188 ||Column pruner: Cannot prune columns ||
– ||2189 ||Column pruner: Expect schema ||
– ||2190 ||PruneColumns: Cannot find predecessors for logical operator ||
– ||2191 ||PruneColumns: No input to prune ||
– ||2192 ||PruneColumns: Column to prune does not exist ||
– ||2193 ||PruneColumns: Foreach can only have 1 predecessor ||
– ||2194 ||PruneColumns: Expect schema ||
– ||2195 ||PruneColumns: Fail to visit foreach inner plan ||
– ||2196 ||RelationalOperator: Exception when traversing inner plan ||
– ||2197 ||RelationalOperator: Cannot drop column which require * ||
– ||2198 ||LOLoad: load only take 1 input ||
– ||2199 ||LOLoad: schema mismatch ||
– ||2200 ||PruneColumns: Error getting top level project ||
– ||2201 ||Could not validate schema alias ||
– ||2202 ||Error change distinct/sort to use secondary key optimizer ||
– ||2203 ||Sort on columns from different inputs ||
– ||2204 ||Error setting secondary key plan ||
– ||2205 ||Error visiting POForEach inner plan ||
– ||2206 ||Error visiting POSort inner plan ||
– ||2207 ||POForEach inner plan has more than 1 root ||
– ||2208 ||Exception visiting foreach inner plan ||
– ||2209 ||Internal error while processing any partition filter conditions in the filter after the load ||
– ||2210 ||Internal Error in logical optimizer. ||
– ||2211 ||Column pruner: Unable to prune columns. ||
– ||2212 ||Unable to prune plan. ||
– ||2213 ||Error visiting inner plan for ForEach. ||
– ||2214 ||Cannot find POLocalRearrange to set secondary plan. ||
– ||2215 ||See more than 1 successors in the nested plan. ||
– ||2216 ||Cannot get field schema ||
– ||2217 ||Problem setFieldSchema ||
– ||2218 ||Invalid resource schema: bag schema must have tuple as its field ||
– ||2219 ||Attempt to disconnect operators which are not connected ||
– ||2220 ||Plan in inconssistent state, connected in fromEdges but not toEdges ||
– ||2221 ||No more walkers to pop ||
– ||2222 ||Expected LogicalExpressionVisitor to visit expression node ||
– ||2223 ||Expected LogicalPlanVisitor to visit relational node ||
– ||2224 ||Found LogicalExpressionPlan with more than one root ||
– ||2225 ||Projection with nothing to reference ||
– ||2226 ||Cannot fine reference for ProjectExpression ||
– ||2227 ||LogicalExpressionVisitor expects to visit expression plans ||
– ||2228 ||Could not find a related project Expression for Dereference ||
– ||2229 ||Couldn’t find matching uid for project expression ||
– ||2230 ||Cannot get column from project ||
– ||2231 ||Unable to set index on newly create POLocalRearrange ||
– ||2232 ||Cannot get schema ||
– ||2233 ||Cannot get predecessor ||
– ||2234 ||Cannot get group key schema ||
– ||2235 ||Expected an ArrayList of Expression Plans ||
– ||2236 ||User defined load function should implement the LoadFunc interface ||
– ||2237 ||Unsupported operator in inner plan ||
– ||2238 ||Expected list of expression plans ||
– ||2239 ||Structure of schema change ||
– ||2240 ||LogicalPlanVisitor can only visit logical plan ||
– ||2241 ||UID is not found in the schema ||
– ||2242 ||TypeCastInserter invoked with an invalid operator ||
– ||2243 ||Attempt to remove operator that is still connected to other operators ||
– ||2244 ||Hadoop does not return any error message ||
– ||2245 ||Cannot get schema from loadFunc ||
– ||2246 ||Error merging schema ||
– ||2247 ||Cannot determine skewed join schema ||
– ||2248 ||twoLevelAccessRequired==true is not supported with” +”and isSubNameMatch==true. ||
– ||2249 ||While using ‘collected’ on group; data must be loaded via loader implementing CollectableLoadFunc. ||
– ||2250 ||Blocking operators are not allowed before Collected Group. Consider dropping using ‘collected’. ||
– ||2251 ||Merge Cogroup work on two or more relations. To use map-side group-by on single relation, use ‘collected’ qualifier. ||
– ||2252 ||Base loader in Cogroup must implement CollectableLoadFunc. ||
– ||2253 ||Side loaders in cogroup must implement IndexableLoadFunc. ||
– ||2254 ||Currently merged cogroup is not supported after blocking operators. ||
– ||2255 ||POSkewedJoin operator has ” + compiledInputs.length + ” inputs. It should have 2. ||
– ||2256 ||Cannot remove and reconnect node with multiple inputs/outputs ||
– ||2257 ||An unexpected exception caused the validation to stop ||
– ||2258 ||Bug:Two different load functions mapped to an LOCast op ||
– ||2259 ||Cannot instantiate class ||
– ||2260||in split only one of input/output schema is null||
– ||2261||input and output schema size of split differ||
– ||2262||uid mapped to two different load functions ||
– ||2263||expected only one predecessor||
– ||2264||more than one project star as leaf in plan||
– ||2265||Schema not expected for project-star||
– ||2266||Expected single LOGenerate output in innerplan of foreach||
– ||2267||reset on schema at pos greater than schema size||
– ||2268||More than one input found for scalar expression||
– ||2269||No input found for scalar expression||



– ||2997 ||Encountered IOException. ||
– ||2998 ||Unexpected internal error. ||
– ||2999 ||Unhandled internal error. ||
– ||3000 ||IOException caught while compiling POMergeJoin ||
– ||4000 ||The output file(s): <filename> already exists ||
– ||4001 ||Cannot read from the storage where the output <filename> will be stored ||
– ||4002 ||Can’t read jar file: <name> ||
– ||4003 ||Unable to obtain a temporary path. ||
– ||4004 ||Invalid ship specification. File doesn’t exist: <file> ||
– ||4005 ||Unable to rename <oldName> to <newName> ||
– ||4006 ||Unable to copy <src> to <dst> ||
– ||4007 ||Missing <parameter> from hadoop configuration ||
– ||4008 ||Failed to create local hadoop file <file> ||
– ||4009 ||Failed to copy data to local hadoop file <file> ||
– ||6000 ||The output file(s): <filename> already exists ||
– ||6001 ||Cannot read from the storage where the output <filename> will be stored ||
– ||6002 ||Unable to obtain a temporary path. ||
– ||6003 ||Invalid cache specification. File doesn’t exist: <file> ||
– ||6004 ||Invalid ship specification. File doesn’t exist: <file> ||
– ||6005 ||Unable to rename <oldName> to <newName> ||
– ||6006 ||Unable to copy <src> to <dst> ||
– ||6007 ||Unable to check name <name> ||
– ||6008 ||Failed to obtain glob for <pattern> ||
– ||6009 ||Failed to create job client ||
– ||6010 ||Could not connect to HOD ||
– ||6011 ||Failed to run command <command> on server <server>; return code: <code>; error: <error message> ||
– ||6012 ||Unable to run command: <command> on server <server> ||
– ||6013 ||Unable to chmod <executable> . Thread interrupted. ||
– ||6014 ||Failed to save secondary output ‘<fileName>’ of task: <taskId> ||
– ||6015 ||During execution, encountered a Hadoop error. ||
– ||6016 ||Out of memory. ||
– ||6017 ||Execution failed, while processing ‘<fileNames>’ ||
– ||6018 ||Error while reading input ||
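
In practice, these errors surface in the Grunt shell or in the Pig log file as soon as a statement is parsed or compiled. As a minimal, hypothetical sketch (the file name and aliases below are made up), a misspelled alias or a malformed statement typically fails with one of the 1xxx front-end errors above, for example 1000 (Error during parsing) or 1003 (Unable to find an operator for alias <alias>):

A = LOAD 'students.txt' USING PigStorage('\t') AS (name:chararray, gpa:double);

-- 'a' was never defined (aliases are case sensitive), so Pig reports a 1xxx front-end error instead of launching a MapReduce job.
DUMP a;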

Which one to choose: Pig or Hive?

Technically, both will do the job. If you are looking at it purely from an "either Hive or Pig" perspective, it usually means you have not yet defined what you are trying to do. If you first define the data source, the scope, and the desired result representation, and only then decide between Hive and Pig, you will find that they suit different jobs, and choosing one over the other brings real benefits. In the end, both Hive and Pig can be extended with UDFs and UDAFs, which makes them look similar again, so you can then revisit which one was actually the better fit.

For someone with roots in databases and SQL, Hive is the better fit; for scripters and programmers, Pig feels closer to home.

Hive provides a SQL-like interface and a relational model for your data; if your data is truly unstructured, Pig is the better choice. Hive requires the definition of a proper schema, which makes it closer in concept to an RDBMS. You could also say that in Hive you write SQL, while in Pig you execute a sequence of data-flow plans. Both Pig and Hive are abstractions on top of MapReduce, so for fine-grained control and performance you would still need to drop down to MapReduce itself. You can start with Pig and move to MapReduce when you really need to go deeper.
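
To make the difference concrete, here is a minimal, hypothetical Pig Latin sketch (the input file and field names are made up). Where Hive would express an aggregation as a single SQL statement (roughly SELECT dept, COUNT(*) FROM emps GROUP BY dept), Pig builds the same result as a sequence of named data-flow steps:

-- Hypothetical input: tab-separated (employee, dept) records.
emps = LOAD 'employees.txt' USING PigStorage('\t') AS (employee:chararray, dept:chararray);
grpd = GROUP emps BY dept;                               -- step 1: group the relation
cnts = FOREACH grpd GENERATE group AS dept, COUNT(emps); -- step 2: count each group
DUMP cnts;

Each intermediate alias can be inspected with DESCRIBE or DUMP, which is what makes a Pig script feel like a plan you execute rather than a query you declare.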

Resources:

Running Apache Pig (Pig Latin) at Apache Hadoop on Windows Azure

Microsoft Distribution of Apache Hadoop comes with Pig support along with an interactive JavaScript shell, where users can run their Pig queries immediately without any extra configuration. The Apache distribution running on Windows Azure has built-in support for Apache Pig.

Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.

At the present time, Pig’s infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs, for which large-scale parallel implementations already exist (e.g., the Hadoop subproject). Pig’s language layer currently consists of a textual language called Pig Latin, which has the following key properties:

  • Ease of programming: It is trivial to achieve parallel execution of simple, “embarrassingly parallel” data analysis tasks. Complex tasks comprised of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write, understand, and maintain.
  • Optimization opportunities: The way in which tasks are encoded permits the system to optimize their execution automatically, allowing the user to focus on semantics rather than efficiency.
  • Extensibility: Users can create their own functions to do special-purpose processing.

Apache Pig has two execution modes or exectypes:

  • Local Mode: – To run Pig in local mode, you need access to a single machine; all files are installed and run using your local host and file system. Specify local mode using the -x flag (pig -x local).

Example:

$ pig -x local

$ pig

  • Mapreduce Mode: – To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. Mapreduce mode is the default mode; you can, but don’t need to, specify it using the -x flag (pig OR pig -x mapreduce).

Example:

$ pig -x mapreduce

You can run Pig in either mode using the “pig” command (the bin/pig Perl script) or the “java” command (java -cp pig.jar …). To learn more, see the Apache Pig documentation.
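
For example, assuming a script file named myscript.pig (a hypothetical name), the same script can be submitted in either mode:

$ pig -x local myscript.pig

$ pig -x mapreduce myscript.pig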

After you have configured your Hadoop cluster on Windows Azure, you can log in remotely to your Hadoop cluster. To run the Pig scripts, copy the sample Pig files into the C:\Apps\dist\pig folder from the link here.


Now, you can launch the Hadoop Command Line Shortcut and run the command as below:

cd c:\apps\dist\examples\pig

   hadoop fs -copyFromLocal excite.log.bz2 excite.log.bz2

C:\Apps\dist\pig>pig

    grunt> run script1-hadoop.pig
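
Alternatively (assuming the same working directory), the script can be submitted non-interactively instead of from the Grunt shell:

C:\Apps\dist\pig>pig script1-hadoop.pig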

Once the job has started, you can see the job details in the Job Tracker (http://localhost:50030/jobtracker.jsp).


script1-hadoop.pig:


/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

-- Query Phrase Popularity (Hadoop cluster)

-- This script processes a search query log file from the Excite search engine and finds search phrases that occur with particular high frequency during certain times of the day.

-- Register the tutorial JAR file so that the included UDFs can be called in the script.
REGISTER ./tutorial.jar;

-- Use the PigStorage function to load the excite log file into the "raw" bag as an array of records.
-- Input: (user,time,query)
raw = LOAD 'excite.log.bz2' USING PigStorage('\t') AS (user, time, query);

-- Call the NonURLDetector UDF to remove records if the query field is empty or a URL.
clean1 = FILTER raw BY org.apache.pig.tutorial.NonURLDetector(query);

-- Call the ToLower UDF to change the query field to lowercase.
clean2 = FOREACH clean1 GENERATE user, time, org.apache.pig.tutorial.ToLower(query) as query;

-- Because the log file only contains queries for a single day, we are only interested in the hour.
-- The excite query log timestamp format is YYMMDDHHMMSS.
-- Call the ExtractHour UDF to extract the hour (HH) from the time field.
houred = FOREACH clean2 GENERATE user, org.apache.pig.tutorial.ExtractHour(time) as hour, query;

-- Call the NGramGenerator UDF to compose the n-grams of the query.
ngramed1 = FOREACH houred GENERATE user, hour, flatten(org.apache.pig.tutorial.NGramGenerator(query)) as ngram;

-- Use the DISTINCT command to get the unique n-grams for all records.
ngramed2 = DISTINCT ngramed1;

-- Use the GROUP command to group records by n-gram and hour.
hour_frequency1 = GROUP ngramed2 BY (ngram, hour);

-- Use the COUNT function to get the count (occurrences) of each n-gram.
hour_frequency2 = FOREACH hour_frequency1 GENERATE flatten($0), COUNT($1) as count;

-- Use the GROUP command to group records by n-gram only.
-- Each group now corresponds to a distinct n-gram and has the count for each hour.
uniq_frequency1 = GROUP hour_frequency2 BY group::ngram;

-- For each group, identify the hour in which this n-gram is used with a particularly high frequency.
-- Call the ScoreGenerator UDF to calculate a "popularity" score for the n-gram.
uniq_frequency2 = FOREACH uniq_frequency1 GENERATE flatten($0), flatten(org.apache.pig.tutorial.ScoreGenerator($1));

-- Use the FOREACH-GENERATE command to assign names to the fields.
uniq_frequency3 = FOREACH uniq_frequency2 GENERATE $1 as hour, $0 as ngram, $2 as score, $3 as count, $4 as mean;

-- Use the FILTER command to remove all records with a score less than or equal to 2.0.
filtered_uniq_frequency = FILTER uniq_frequency3 BY score > 2.0;

-- Use the ORDER command to sort the remaining records by hour and score.
ordered_uniq_frequency = ORDER filtered_uniq_frequency BY hour, score;

-- Use the PigStorage function to store the results.
-- Output: (hour, n-gram, score, count, average_counts_among_all_hours)
STORE ordered_uniq_frequency INTO 'script1-hadoop-results' USING PigStorage();


The full output of the job is shown below:


c:\Apps\dist\pig>pig
2012-01-10 07:22:23,273 [main] INFO  org.apache.pig.Main - Logging error messages to: c:\Apps\dist\pig\pig_1326180143273.log
2012-01-10 07:22:23,695 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://10.28.202.165:9000
2012-01-10 07:22:24,070 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: 10.28.202.165:9010
grunt> run script1-hadoop.pig
grunt> /*
grunt>  * Licensed to the Apache Software Foundation (ASF) under one
grunt>  * or more contributor license agreements.  See the NOTICE file
grunt>  * distributed with this work for additional information
grunt>  * regarding copyright ownership.  The ASF licenses this file
grunt>  * to you under the Apache License, Version 2.0 (the
grunt>  * "License"); you may not use this file except in compliance
grunt>  * with the License.  You may obtain a copy of the License at
grunt>  *
grunt>  *     http://www.apache.org/licenses/LICENSE-2.0
grunt>  *
grunt>  * Unless required by applicable law or agreed to in writing, software
grunt>  * distributed under the License is distributed on an "AS IS" BASIS,
grunt>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
grunt>  * See the License for the specific language governing permissions and
grunt>  * limitations under the License.
grunt>  */
grunt>
grunt> -- Query Phrase Popularity (Hadoop cluster)
grunt>
grunt> -- This script processes a search query log file from the Excite search engine and finds search phrases that occur with particular high frequency during certain times of the day.
grunt>
grunt>
grunt> -- Register the tutorial JAR file so that the included UDFs can be called in the script.
grunt> REGISTER ./tutorial.jar;
grunt>
grunt> -- Use the PigStorage function to load the excite log file into the "raw" bag as an array of records.
grunt> -- Input: (user,time,query)
grunt> raw = LOAD 'excite.log.bz2' USING PigStorage('\t') AS (user, time, query);
grunt>
grunt>
grunt> -- Call the NonURLDetector UDF to remove records if the query field is empty or a URL.
grunt> clean1 = FILTER raw BY org.apache.pig.tutorial.NonURLDetector(query);
grunt>
grunt> -- Call the ToLower UDF to change the query field to lowercase.
grunt> clean2 = FOREACH clean1 GENERATE user, time, org.apache.pig.tutorial.ToLower(query) as query;
grunt>
grunt> -- Because the log file only contains queries for a single day, we are only interested in the hour.
grunt> -- The excite query log timestamp format is YYMMDDHHMMSS.
grunt> -- Call the ExtractHour UDF to extract the hour (HH) from the time field.
grunt> houred = FOREACH clean2 GENERATE user, org.apache.pig.tutorial.ExtractHour(time) as hour, query;
grunt>
grunt> -- Call the NGramGenerator UDF to compose the n-grams of the query.
grunt> ngramed1 = FOREACH houred GENERATE user, hour, flatten(org.apache.pig.tutorial.NGramGenerator(query)) as ngram;
grunt>
grunt> -- Use the DISTINCT command to get the unique n-grams for all records.
grunt> ngramed2 = DISTINCT ngramed1;
grunt>
grunt> -- Use the GROUP command to group records by n-gram and hour.
grunt> hour_frequency1 = GROUP ngramed2 BY (ngram, hour);
grunt>
grunt> -- Use the COUNT function to get the count (occurrences) of each n-gram.
grunt> hour_frequency2 = FOREACH hour_frequency1 GENERATE flatten($0), COUNT($1) as count;
grunt>
grunt> -- Use the GROUP command to group records by n-gram only.
grunt> -- Each group now corresponds to a distinct n-gram and has the count for each hour.
grunt> uniq_frequency1 = GROUP hour_frequency2 BY group::ngram;
grunt>
grunt> -- For each group, identify the hour in which this n-gram is used with a particularly high frequency.
grunt> -- Call the ScoreGenerator UDF to calculate a "popularity" score for the n-gram.
grunt> uniq_frequency2 = FOREACH uniq_frequency1 GENERATE flatten($0), flatten(org.apache.pig.tutorial.ScoreGenerator($1));
grunt>
grunt> -- Use the FOREACH-GENERATE command to assign names to the fields.
grunt> uniq_frequency3 = FOREACH uniq_frequency2 GENERATE $1 as hour, $0 as ngram, $2 as score, $3 as count, $4 as mean;
grunt>
grunt> -- Use the FILTER command to remove all records with a score less than or equal to 2.0.
grunt> filtered_uniq_frequency = FILTER uniq_frequency3 BY score > 2.0;
grunt>
grunt> -- Use the ORDER command to sort the remaining records by hour and score.
grunt> ordered_uniq_frequency = ORDER filtered_uniq_frequency BY hour, score;
grunt>
grunt> -- Use the PigStorage function to store the results.
grunt> -- Output: (hour, n-gram, score, count, average_counts_among_all_hours)
grunt> STORE ordered_uniq_frequency INTO 'script1-hadoop-results' USING PigStorage();
2012-01-10 07:22:48,614 [main] WARN  org.apache.pig.PigServer - Encountered Warning USING_OVERLOADED_FUNCTION 3 time(s).
2012-01-10 07:22:48,614 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,ORDER_BY,DISTINCT,FILTER
2012-01-10 07:22:48,614 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2012-01-10 07:22:48,958 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: ordered_uniq_frequency: Store(hdfs://10.28.202.165:9000/user/avkash/script1-hadoop-results:PigStorage) - scope-71 Operator Key: scope-71)
2012-01-10 07:22:48,989 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-01-10 07:22:49,083 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2012-01-10 07:22:49,192 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 5
2012-01-10 07:22:49,192 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 5
2012-01-10 07:22:49,349 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-01-10 07:22:49,364 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-01-10 07:22:50,536 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-01-10 07:22:50,536 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting identity combiner class.
2012-01-10 07:22:50,552 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=10408717
2012-01-10 07:22:50,552 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-01-10 07:22:50,646 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-01-10 07:22:51,145 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-01-10 07:22:51,349 [Thread-6] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-01-10 07:22:51,364 [Thread-6] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-01-10 07:22:51,380 [Thread-6] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-01-10 07:22:52,661 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201201092258_0001
2012-01-10 07:22:52,661 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://10.28.202.165:50030/jobdetails.jsp?jobid=job_201201092258_0001
2012-01-10 07:23:59,655 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 10% complete
2012-01-10 07:24:02,655 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 10% complete
2012-01-10 07:24:07,655 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 10% complete
2012-01-10 07:24:12,654 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 10% complete
2012-01-10 07:24:17,654 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 10% complete
2012-01-10 07:24:22,653 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 10% complete
2012-01-10 07:24:23,653 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 13% complete
2012-01-10 07:24:26,653 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 16% complete
2012-01-10 07:24:27,653 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 16% complete
2012-01-10 07:24:42,652 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 19% complete
2012-01-10 07:24:57,229 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-01-10 07:24:57,229 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
.....
.....
.....
2012-01-10 07:29:07,411 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://10.28.202.165:50030/jobdetails.jsp?jobid=job_201201092258_0005
2012-01-10 07:29:36,487 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 90% complete
2012-01-10 07:29:51,501 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 93% complete
2012-01-10 07:30:12,171 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-01-10 07:30:12,187 [main] INFO  org.apache.pig.tools.pigstats.PigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
0.20.203.1-SNAPSHOT     0.8.1-SNAPSHOT  avkash  2012-01-10 07:22:49     2012-01-10 07:30:12     GROUP_BY,ORDER_BY,DISTINCT,FILTER

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
job_201201092258_0001   1       1       54      54      54      39      39      39      clean1,clean2,houred,ngramed1,raw       DISTINCT
job_201201092258_0002   1       1       39      39      39      30      30      30      hour_frequency1,hour_frequency2 GROUP_BY,COMBINER
job_201201092258_0003   1       1       18      18      18      24      24      24      filtered_uniq_frequency,uniq_frequency1,uniq_frequency2,uniq_frequency3        GROUP_BY
job_201201092258_0004   1       1       12      12      12      21      21      21      ordered_uniq_frequency  SAMPLER
job_201201092258_0005   1       1       12      12      12      21      21      21      ordered_uniq_frequency  ORDER_BY        hdfs://10.28.202.165:9000/user/avkash/script1-hadoop-results,

Input(s):
Successfully read 944954 records (10409087 bytes) from: "hdfs://10.28.202.165:9000/user/avkash/excite.log.bz2"

Output(s):
Successfully stored 13528 records (659755 bytes) in: "hdfs://10.28.202.165:9000/user/avkash/script1-hadoop-results"

Counters:
Total records written : 13528
Total bytes written : 659755
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201201092258_0001   ->      job_201201092258_0002,
job_201201092258_0002   ->      job_201201092258_0003,
job_201201092258_0003   ->      job_201201092258_0004,
job_201201092258_0004   ->      job_201201092258_0005,
job_201201092258_0005

2012-01-10 07:30:12,296 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 14 time(s).
2012-01-10 07:30:12,296 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
grunt>


Resources: