????7??????worker???????

??????????С????????????????????????д?????????????????kill??

???????????????????hadoop???????task?????????????????????????800M??????????????泬??800M?????????kill??????

?????????????????????????2???????1?????????д?????????????kill???????????????????泬??????kill??2???????????mapred???????????????á???????800M???????????к??????

????8??MPI?????hadoop????????????

????????????mpi???????????????????в??????????????

??????????????????????Hadoop??????????????????? ?????????????????hadoop????????????????????????hadoop????????mpi???

???????????????????????hdfs???????????????????в????????????????鼴??hadoop??????mpi????????С?????????mpi???????????hadoop?????????????????????

????9??????map reduce?????в?????????

???????????????????????г??????????hadoop??????????

????????????????????map reduce?????в???????????????????rd???????????????????У????tab????棬?????????hadoop?????????????

???????????????????map reduce?????в?????????????????????rd??shell???????????á??????hadoop?????????????shell?????????滻????????????????????

????10??Hadoop???????????bistreaming?????????????

????????????????????????????????????????????????????????????????????????????????

????????????????????????????streaming??bistreaming???????bistreaming???????????????hadoop?? sequence file???????????????к???key length??value length???????hadoop???????????????????????????????????????????????????????

?????????????????????????outputformat=SequenceFileAsBinaryOutputFormat ??????????hadoop dfs -copySeqFileToLocal –ignoreLen?????????????????????????????????????????????д??????????

????11??Hadoop??????????????з?

??????????????????????session??query????У?session?????з??????????map????????????????????????map??????н??????

???????????????hadoop??????????????????С????з??????????????????????ж??????????????hadoop?????????????session?е?????map task?С??????????session???????session???

????????????????????????????????????????????????????????map???з??С???????????????????????????????з????map?????????ж????????????

???????????ò??????

????1???繒????????????????????????

?????????????У????????????????????????????????????????????????????????????????????????????????????????????????????????distcp???

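DistCp (distributed copy) is a tool for large-scale copying within and between clusters. It uses Map/Reduce to handle file distribution, error handling and recovery, and report generation. It takes a list of files and directories as the input to map tasks, and each task copies a portion of the files in the source list.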

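hadoop distcp hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo

This command expands the names of all files and directories under /foo/bar on the nn1 cluster into a temporary file, partitions the work of copying their contents among multiple map tasks, and then has each TaskTracker perform its share of the copy from nn1 to nn2. Note that DistCp operates on absolute paths.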

????????distcp?????????????????????????????????????????????????????????£??????????????????????ж????????????????д????

????2????????????????????

???????????????????????治????800M???????????????????????????????????н??в????

cat input | mapper | sort | reducer > output
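The `cat input | mapper | sort | reducer` pipeline can also be mimicked inside a single process for quick checks. The following is a minimal Python sketch, not part of the original article: the names `run_local`, `wc_mapper`, and `wc_reducer` are hypothetical, the word-count job is only an illustration, and records are assumed to be tab-separated key/value lines.

```python
from itertools import groupby


def wc_mapper(line):
    """Hypothetical mapper: emit one "word<TAB>1" record per word."""
    for word in line.split():
        yield f"{word}\t1"


def wc_reducer(key, values):
    """Hypothetical reducer: sum the counts collected for one key."""
    yield f"{key}\t{sum(int(v) for v in values)}"


def run_local(lines, mapper, reducer):
    """Mimic `cat input | mapper | sort | reducer` in one process."""
    # Map phase: feed every input line to the mapper.
    mapped = [rec for line in lines for rec in mapper(line.rstrip("\n"))]
    # The `sort` stage: order records so that equal keys become adjacent.
    mapped.sort()
    # Reduce phase: group adjacent records by key (text before the first tab).
    output = []
    for key, group in groupby(mapped, key=lambda rec: rec.split("\t", 1)[0]):
        values = [rec.split("\t", 1)[1] for rec in group]
        output.extend(reducer(key, values))
    return output


if __name__ == "__main__":
    for rec in run_local(["a b a", "b c"], wc_mapper, wc_reducer):
        print(rec)
```

Because the mapper and reducer are plain functions here, they can be exercised on a few sample lines before the real executables are submitted to the cluster.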

??????????????????????????????????

????1??Streaming??????????л????????????????cat input?????????BiStreaming??“<key-length><key><value- length><value>”??????????????????????????????????÷??????

cat input | ./reader | ./mapper | ./reducer > output

????reader?????????????mapper???????????keyLength?? key?? valueLength?? value??????????????????????sequencefile????????????reader??
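A reader for the `keyLength`, `key`, `valueLength`, `value` layout can be quite small. The sketch below is an illustration in Python, not the original tool. It assumes each length field is a 4-byte big-endian unsigned integer, which the text does not specify, so `LEN_FMT` may need adjusting for real data. The names `read_records` and `to_text_lines` are hypothetical.

```python
import io
import struct

LEN_FMT = ">I"  # ASSUMPTION: 4-byte big-endian unsigned length fields
LEN_SIZE = struct.calcsize(LEN_FMT)


def _read_exact(stream, n):
    """Read exactly n bytes or fail on a truncated record."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated record")
    return data


def read_records(stream):
    """Yield (key, value) byte pairs from length-prefixed binary data."""
    while True:
        header = stream.read(LEN_SIZE)
        if not header:
            return  # clean end of input
        if len(header) != LEN_SIZE:
            raise EOFError("truncated key length")
        (key_len,) = struct.unpack(LEN_FMT, header)
        key = _read_exact(stream, key_len)
        (val_len,) = struct.unpack(LEN_FMT, _read_exact(stream, LEN_SIZE))
        value = _read_exact(stream, val_len)
        yield key, value


def to_text_lines(stream):
    """Render each record as a "key<TAB>value" line for inspection."""
    for key, value in read_records(stream):
        yield key.decode("utf-8", "replace") + "\t" + value.decode("utf-8", "replace")


if __name__ == "__main__":
    sample = b"".join(
        struct.pack(LEN_FMT, len(part)) + part
        for part in (b"k1", b"hello", b"k2", b"")
    )
    for line in to_text_lines(io.BytesIO(sample)):
        print(line)
```

To use it in a shell pipeline, the same functions could be pointed at `sys.stdin.buffer` instead of the in-memory sample.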

????2????Mapper??Reducer??????hadoop????????????????????????????????Щ?????????????????????????Щ?????????????

????3????????????????????????

??????У??????????????????????????????????????????????????diff???????????汾?????н????

????????hadoop??????????????з?????map???reduce???????????????????????????????????????????????????????????е?????????????????????????????????????????????????diff?????????sort???????diff??

????4????????????

?????????????????map-reduce????????壬??????map-reduce????????????y?????????????????????????????????????map-reduce ???????????????????????Э???????????????????????????ɡ?

?????????????????????-x???????У?????????log????????????????н??????????????????????????н????log???????????Щ????????????