<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>极限手指</title>
	<atom:link href="http://ahei.info/feed" rel="self" type="application/rss+xml" />
	<link>http://ahei.info</link>
	<description>There is nothing I can't do, only things you haven't thought of</description>
	<lastBuildDate>Thu, 01 Dec 2011 04:36:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>The Spooky Shell</title>
		<link>http://ahei.info/shell.htm</link>
		<comments>http://ahei.info/shell.htm#comments</comments>
		<pubDate>Thu, 01 Dec 2011 04:36:35 +0000</pubDate>
		<dc:creator>ahei</dc:creator>
				<category><![CDATA[linux]]></category>

		<guid isPermaLink="false">http://ahei.info/?p=40851</guid>
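		<description><![CDATA[1 Introduction 2 Syntax overview 2.1 Definitions 2.2 Pipelines 2.3 Quoting 2.4 Parameters 2.5 Expansion 2.6 Redirection 3 Tips 4 Tools 4.1 log4sh 4.2 shunit 4.3 bashdb 5 Shell keyboard shortcuts 6 The shell fork bomb 7 Encrypting shell scripts 7.1 shc 7.2 wzsh 8 Essential references 1 Introduction I started writing shell scripts in 2006. At first I wrote simple scripts by copying other people's examples, then read a few shell syntax tutorials found online (I suspect most people learn shell the same way), and concluded that shell was simple, much simpler than other languages. The more scripts I wrote, though, the less that held up. Shell turned out to be downright spooky, with strange errors cropping up all the time. For example: Variable assignment #!/usr/bin/env [...]]]></description>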
			<content:encoded><![CDATA[<div id="table-of-contents">
<div id="text-table-of-contents">
<ul>
<li><a href="#sec-1">1 Introduction </a></li>
<li><a href="#sec-2">2 Syntax overview </a>
<ul>
<li><a href="#sec-2_1">2.1 Definitions </a></li>
<li><a href="#sec-2_2">2.2 Pipelines </a></li>
<li><a href="#sec-2_3">2.3 Quoting </a></li>
<li><a href="#sec-2_4">2.4 Parameters </a></li>
<li><a href="#sec-2_5">2.5 Expansion </a></li>
<li><a href="#sec-2_6">2.6 Redirection </a></li>
</ul>
</li>
<li><a href="#sec-3">3 Tips </a></li>
<li><a href="#sec-4">4 Tools </a>
<ul>
<li><a href="#sec-4_1">4.1 log4sh </a></li>
<li><a href="#sec-4_2">4.2 shunit </a></li>
<li><a href="#sec-4_3">4.3 bashdb </a></li>
</ul>
</li>
<li><a href="#sec-5">5 Shell keyboard shortcuts </a></li>
<li><a href="#sec-6">6 The shell fork bomb </a></li>
<li><a href="#sec-7">7 Encrypting shell scripts </a>
<ul>
<li><a href="#sec-7_1">7.1 shc </a></li>
<li><a href="#sec-7_2">7.2 wzsh </a></li>
</ul>
</li>
<li><a href="#sec-8">8 Essential references </a></li>
</ul>
</div>
</div>
<div id="outline-container-1" class="outline-3">
<h3 id="sec-1"><span class="section-number-3">1</span> Introduction </h3>
<div class="outline-text-3" id="text-1">
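<p>I started writing shell scripts in 2006. At first I wrote simple scripts by copying other people's examples, then read a few shell syntax tutorials found online (I suspect most people learn shell the same way), and concluded that shell was simple, much simpler than other languages. The more scripts I wrote, though, the less that held up. Shell turned out to be downright spooky, with strange errors cropping up all the time. For example:<span id="more-40851"></span>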
</p>
<ol>
<li>
Variable assignment</p>
<pre class="src src-sh"><span style="color: #b22222;">#</span><span style="color: #b22222;">!/usr/bin/</span><span style="color: #a020f0;">env</span><span style="color: #b22222;"> bash
</span>
action = <span style="color: #8b2252;">"$1"</span>

<span style="color: #7a378b;">echo</span> <span style="color: #8b2252;">"You want $action"</span>
</pre>
<p>
Save the code above as test.sh, make it executable, and run ./test.sh exit. You get this error:
</p>
<pre class="example">
./test.sh: line 3: action: command not found
</pre>
<p>Change it slightly:
</p>
<pre class="src src-sh"><span style="color: #b22222;">#</span><span style="color: #b22222;">!/usr/bin/</span><span style="color: #a020f0;">env</span><span style="color: #b22222;"> bash
</span>
<span style="color: #a0522d;">action</span>= <span style="color: #8b2252;">"$1"</span>

<span style="color: #7a378b;">echo</span> <span style="color: #8b2252;">"You want $action"</span>
</pre>
<p>
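That looks better: no error this time. But why is nothing printed at all? <br/><br />
Anyone with experience in other languages is likely to make the same mistakes I did and to wonder: why is a simple assignment so much trouble in shell?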
</p>
</li>
<li>
Testing whether two strings are equal</p>
<pre class="src src-sh"><span style="color: #a020f0;">if</span> [ $<span style="color: #a0522d;">user</span> = <span style="color: #8b2252;">"admin"</span> ]; <span style="color: #a020f0;">then</span>
    <span style="color: #7a378b;">echo</span> <span style="color: #8b2252;">"You are admin!"</span>
<span style="color: #a020f0;">fi</span>
</pre>
<p>
The code above checks whether a user is the administrator, but sometimes it fails at run time with this error:
</p>
<pre class="example">
-bash: [: =: unary operator expected
</pre>
<p>This seems to say that a unary operator was expected. What does that mean? <br/><br />
Some tutorials say the following fixes it:
</p>
<pre class="src src-sh"><span style="color: #a020f0;">if</span> [ x$<span style="color: #a0522d;">user</span> = <span style="color: #8b2252;">"xadmin"</span> ]; <span style="color: #a020f0;">then</span>
    <span style="color: #7a378b;">echo</span> <span style="color: #8b2252;">"You are admin!"</span>
<span style="color: #a020f0;">fi</span>
</pre>
<p>
I tried it and it really does seem to work. Why? Is this "x" magic? Would any other letter be just as magic? <br/><br />
I have seen some people write it like this:
</p>
<pre class="src src-sh"><span style="color: #a020f0;">if</span> [ <span style="color: #8b2252;">"x$user"</span> = <span style="color: #8b2252;">"xadmin"</span> ]; <span style="color: #a020f0;">then</span>
</pre>
<p>
Others write:
</p>
<pre class="src src-sh"><span style="color: #a020f0;">if</span> [ x$<span style="color: #a0522d;">user</span> = xadmin ]; <span style="color: #a020f0;">then</span>
</pre>
<p>
Some even write:
</p>
<pre class="src src-sh"><span style="color: #a020f0;">if</span> [ x$<span style="color: #a0522d;">user</span> = x<span style="color: #8b2252;">"admin"</span> ]; <span style="color: #a020f0;">then</span>
</pre>
<p>
Later I stumbled on the fact that this is enough:
</p>
<pre class="src src-sh"><span style="color: #a020f0;">if</span> [ <span style="color: #8b2252;">"$user"</span> = <span style="color: #8b2252;">"admin"</span> ]; <span style="color: #a020f0;">then</span>
</pre>
<p>
What difference do the double quotes make, and does it matter where they go?
</p>
</li>
<li>
Does read work?</p>
<ol>
<li>
echo str | read a: why does $a not contain the str I wanted?
</li>
<li>
Suppose the first line of file a is "str1 TAB str2". Run:</p>
<pre class="src src-sh"><span style="color: #7a378b;">read</span> str &lt; a
<span style="color: #7a378b;">echo</span> $<span style="color: #a0522d;">str</span>
</pre>
<p>
Why is the output "str1 str2" rather than "str1 TAB str2"? Where did my TAB go?
</p>
</li>
</ol>
</li>
<li>
Magic comments <br/><br />
Some people say the following construct can be used to comment out shell code:</p>
<pre class="src src-sh">: &lt;&lt; COMMENT<span style="color: #ffa54f;">
COMMENT
</span></pre>
<p>
For example:
</p>
<pre class="src src-sh"><span style="color: #7a378b;">echo</span> xxx

: &lt;&lt; COMMENT<span style="color: #ffa54f;">
echo "mm said, you could not touch me!"
COMMENT
</span>
<span style="color: #7a378b;">echo</span> yyy
</pre>
<p>
That does seem to work. But why does the following not work?
</p>
<pre class="src src-sh"><span style="color: #7a378b;">echo</span> xxx

: &lt;&lt; COMMENT<span style="color: #ffa54f;">
file=a
result=$(</span><span style="color: #ff00ff;">grep</span><span style="color: #ffa54f;"> str $file)
COMMENT
</span>
<span style="color: #7a378b;">echo</span> <span style="color: #8b2252;">"mm said, you can touch me!"</span>
</pre>
<p>
Are there other magic comment tricks?
</p>
</li>
</ol>
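<p>There are many more spooky examples like these, yet very few shell tutorials explain these oddities properly. Below, drawing on my own experience, I try to explain the essential parts of shell syntax so that you can quickly locate and fix such problems when you run into them.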
</p>
</div>
</div>
<div id="outline-container-2" class="outline-3">
<h3 id="sec-2"><span class="section-number-3">2</span> Syntax overview </h3>
<div class="outline-text-3" id="text-2">
</div>
<div id="outline-container-2_1" class="outline-4">
<h4 id="sec-2_1"><span class="section-number-4">2.1</span> Definitions </h4>
<div class="outline-text-4" id="text-2_1">
<ul>
<li id="sec-2_1_1">Word <br/><br />
A sequence of characters that the shell treats as a single unit; also called a token
</li>
</ul>
<ul>
<li id="sec-2_1_2">Name (identifier) <br/><br />
A word consisting only of letters, digits, and underscores, and beginning with a letter or an underscore; also called an identifier
</li>
</ul>
<ul>
<li id="sec-2_1_3">Metacharacter <br/>
<pre class="example">
| &amp; ; ( ) &lt; &gt; space tab
</pre>
<p>When unquoted, these characters separate words
</p>
</li>
</ul>
</div>
</div>
<div id="outline-container-2_2" class="outline-4">
<h4 id="sec-2_2"><span class="section-number-4">2.2</span> Pipelines </h4>
<div class="outline-text-4" id="text-2_2">
<pre class="example">
command | command2
command |&amp; command2
</pre>
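<p>The output of command is connected through a pipe to the input of command2. With |&amp;, the standard error of command is sent to command2's input as well (|&amp; requires bash 4 or later).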
</p>
<p>
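Note that command2 runs in a subshell (by default bash runs each command of a pipeline in its own subshell), so any change command2 makes to its environment does not affect the shell that started the pipeline. This explains question 3.1 at the start of this article.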
</p>
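<p>
A minimal sketch of question 3.1 and one way around it, assuming bash's default behaviour (the lastpipe option is off). The variable names a and b are only for illustration:
</p>

```shell
unset a b
# `read a` runs in a subshell of the pipeline, so the assignment is lost
# as soon as the pipeline finishes.
echo str | read a
echo "after pipeline: a='${a:-}'"

# Redirecting from a here-document keeps read in the current shell.
read b <<EOF
str
EOF
echo "after here-document: b='$b'"
```

<p>
A here-string (read b &lt;&lt;&lt; str) works the same way in bash; the here-document form is used here only because it also works in plain sh.
</p>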
</div>
</div>
<div id="outline-container-2_3" class="outline-4">
<h4 id="sec-2_3"><span class="section-number-4">2.3</span> Quoting </h4>
<div class="outline-text-4" id="text-2_3">
<p>Quoting removes the special meaning of certain characters. For example, to use a metacharacter literally you must quote it.
</p>
<p>
There are three quoting mechanisms: the backslash (\), single quotes, and double quotes.
</p>
<p>
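Single quotes suppress the special meaning of every character between them, including the backslash (\), so a single-quoted string cannot contain a single quote (rather unfortunate&hellip;)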
</p>
<p>
Inside double quotes, every special character loses its meaning except <b>$</b>, <b>`</b>, <b>\</b>, and <b>!</b> (the last only when history expansion is enabled).
</p>
<p>
<b>Tip:</b>
</p>
<ul>
<li>
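$'string'<br />
In this form, backslash escape sequences inside string are interpreted; for example, \t becomes a TAB. This is very useful. The field separator for sort must be a single character, so to sort on TAB-separated fields many people type a literal TAB between quotes: sort -t "&lt;TAB&gt;". Many editors turn a TAB into 4 spaces, so this often goes wrong. Instead you can write: sort -t $'\t'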
</li>
</ul>
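<p>
A small sketch of the two quoting tricks above, assuming bash (for $'...') and made-up sample data. It also shows the usual workaround for putting a single quote into single-quoted text:
</p>

```shell
#!/usr/bin/env bash
# $'...' turns \t into a real TAB character.
tab=$'\t'

# Sort TAB-separated records on their second field.
sorted=$(printf 'b\t2\na\t1\n' | sort -t "$tab" -k 2,2)
printf '%s\n' "$sorted"

# A single quote cannot appear inside single quotes: close the quote,
# add an escaped quote, and reopen it.
quoted='it'\''s'
printf '%s\n' "$quoted"
```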
</div>
</div>
<div id="outline-container-2_4" class="outline-4">
<h4 id="sec-2_4"><span class="section-number-4">2.4</span> Parameters </h4>
<div class="outline-text-4" id="text-2_4">
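<p>A parameter is an entity that stores a value. It can be a number (0, 1, 2 &hellip;), a name, or one of certain special characters (@, *, &hellip;). A parameter denoted by a name is also called a variable. Variable assignment looks like this: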
</p>
<pre class="src src-sh"><span style="color: #a0522d;">name</span>=[value]
</pre>
<p>
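There must be no space on either side of the equals sign. If spaces were allowed, how could the shell tell whether you want to run a command called name or assign to name? That is why shell assignment has to be this fussy. With a space after the equals sign, as in action= "$1", the shell sets action to the empty string for the duration of one command and runs "$1" as that command, which is why the second script in the introduction ran exit and printed nothing.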
</p>
<p>
<b>Tips:</b>
</p>
<ul>
<li>
Shell variables also support <b>+=</b>, which in bash appends to the current value
</li>
<li>
A variable assignment placed in front of a command affects only that command, for example:</p>
<pre class="src src-sh"><span style="color: #a0522d;">LANG</span>= sort file
</pre>
<p>
The command above runs sort file with LANG set to the empty string and does not affect later commands. Remember code like this?
</p>
<pre class="src src-sh"><span style="color: #a0522d;">tmp_LANG</span>=$<span style="color: #a0522d;">LANG</span>
<span style="color: #a0522d;">LANG</span>=zh_CN
codes ...
<span style="color: #a0522d;">LANG</span>=$<span style="color: #a0522d;">tmp_LANG</span>
</pre>
</li>
</ul>
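<p>
The assignment rules above can be sketched as follows, assuming bash (for +=). The variable names are hypothetical, and sh -c is used only as a convenient child command that prints what it sees in its environment:
</p>

```shell
#!/usr/bin/env bash
dir=/usr
dir+=/local                 # bash: append to the current value
echo "$dir"

greeting=hello
# The assignment in front of the command is visible only to that command.
inner=$(greeting=bye sh -c 'echo "$greeting"')
echo "inner=$inner outer=$greeting"

# With a space after "=", the name is set to the empty string for the
# command that follows, which is what `action= "$1"` does.
empty=$(action= sh -c 'echo "[$action]"')
echo "$empty"
```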
<ul>
<li id="sec-2_4_1">Positional parameters <br/><br />
$0, $1, &hellip;</p>
<p>
<b>Tips:</b>
</p>
<ul>
<li>
How do you reset the positional parameters? Use set
</li>
<li>
Does $10 work? No, it is parsed as ${1}0. Use ${10}
</li>
</ul>
</li>
</ul>
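<p>
A short sketch of both tips, using made-up single-letter arguments:
</p>

```shell
set -- a b c d e f g h i j k   # reset the positional parameters
count=$#                       # number of positional parameters
tenth=${10}                    # the tenth parameter
oops=$10                       # parsed as ${1} followed by a literal 0
echo "$count $tenth $oops"

shift 2                        # drop the first two parameters
first_after_shift=$1
echo "$first_after_shift"
```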
<ul>
<li id="sec-2_4_2">Special parameters <br/>
<ul>
<li id="sec-2_4_2_1">$* <br/>
<pre class="example">
$* == $1 $2 $3 ...
"$*" == "$1c$2c$3...", where c is the first character of IFS
</pre>
<p>
For <b>IFS</b>, see <a href="#IFS">here</a>
</p>
</li>
</ul>
<ul>
<li id="sec-2_4_2_2">$@ <br/>
<pre class="example">
$@ == $* (when unquoted)
"$@" == "$1" "$2" "$3" ...
</pre>
<p>
What is the difference between <b>$*</b> and <b>$@</b>? See below
</p>
</li>
</ul>
</li>
</ul>
<ul>
<li id="sec-2_4_3">Shell variables <br/>
<ul>
<li id="sec-2_4_3_1"><a name="sec-2_4_3_1" id="sec-2_4_3_1"></a>IFS <br/><br />
Internal Field Separator: used to split words after expansion; the read command also uses it to split its input into words. The default value is &lt;space&gt;&lt;tab&gt;&lt;newline&gt;</p>
</li>
</ul>
<ul>
<li id="sec-2_4_3_2">LANG <br/><br />
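This variable controls the language (locale) of your environment; several shell variables beginning with LC_ control individual aspects of the locale. When you sort a file containing Chinese text, is the result not what you expected? Try LANG=C sort</p>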
</li>
</ul>
<ul>
<li id="sec-2_4_3_3">PATH <br/><br />
Search path for executables</p>
</li>
</ul>
</li>
</ul>
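<p>
As a rough illustration of how read uses IFS, here is a sketch that splits a colon-separated record. The sample line and the variable names are made up for the example:
</p>

```shell
line='root:x:0:0:root:/root:/bin/sh'

# IFS is changed only for this one read command.
# The last variable receives whatever is left of the line.
IFS=: read -r user pass uid rest <<EOF
$line
EOF
echo "user=$user uid=$uid rest=$rest"
```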
</div>
</div>
<div id="outline-container-2_5" class="outline-4">
<h4 id="sec-2_5"><span class="section-number-4">2.5</span> Expansion </h4>
<div class="outline-text-4" id="text-2_5">
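<p>After the command line has been split into words, expansion is performed. The kinds of expansion are brace expansion, tilde expansion, parameter and variable expansion, arithmetic expansion, command substitution, word splitting, and pathname expansion. They are applied in the order listed, except that tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution are performed together in a single left-to-right pass. Some systems also support process substitution.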
</p>
<ul>
<li id="sec-2_5_1">Brace expansion <br/>
<pre class="src src-sh"><span style="color: #7a378b;">echo</span> a{b,c}
ab ac

<span style="color: #7a378b;">echo</span> {1..10}
1 2 3 4 5 6 7 8 9 10

<span style="color: #7a378b;">echo</span> {10..1}
10 9 8 7 6 5 4 3 2 1

<span style="color: #7a378b;">echo</span> {1..10..3}
1 4 7 10

<span style="color: #7a378b;">echo</span> {a..f}
a b c d e f

<span style="color: #7a378b;">echo</span> {a..f..2}
a c e
</pre>
</li>
</ul>
<ul>
<li id="sec-2_5_2">Tilde expansion <br/>
<pre class="src src-sh"><span style="color: #7a378b;">echo</span> ~/sdfa
/home/taoshanwen/sdfa

~+ =&gt; PWD
~- =&gt; OLDPWD
</pre>
</li>
</ul>
<ul>
<li id="sec-2_5_3">Parameter expansion <br/><br />
${parameter} yields the value of parameter. There are many forms:</p>
<ul>
<li>
${parameter:offset}
</li>
<li>
${parameter:offset:length} <br/><br />
Take a substring of parameter</p>
</li>
<li>
${parameter#word}
</li>
<li>
${parameter##word} <br/><br />
Remove a matching prefix (# removes the shortest match, ## the longest)</p>
</li>
<li>
${parameter%word}
</li>
<li>
${parameter%%word} <br/><br />
Remove a matching suffix (% removes the shortest match, %% the longest)</p>
</li>
</ul>
<p>There are many more; see the bash man page
</p>
</li>
</ul>
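<p>
The forms listed above, applied to one made-up path (a sketch; the substring form assumes bash):
</p>

```shell
#!/usr/bin/env bash
path=/home/user/archive.tar.gz   # hypothetical path

file=${path##*/}   # longest prefix matching */ removed  -> archive.tar.gz
dir=${path%/*}     # shortest suffix matching /* removed -> /home/user
base=${file%%.*}   # longest suffix matching .* removed  -> archive
ext=${file#*.}     # shortest prefix matching *. removed -> tar.gz
head=${file:0:7}   # substring: offset 0, length 7       -> archive
echo "$file $dir $base $ext $head"
```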
<ul>
<li id="sec-2_5_4">Command substitution <br/><br />
$(command) or `command`: replaced by the output of command</p>
</li>
</ul>
<ul>
<li id="sec-2_5_5">Arithmetic expansion <br/><br />
$((expression)) evaluates expression as an arithmetic expression, for example:</p>
<pre class="src src-sh"><span style="color: #7a378b;">echo</span> $((9 + 8 * 9))
81

<span style="color: #7a378b;">echo</span> $((9 + 8 ** 9))
134217737
</pre>
</li>
</ul>
<ul>
<li id="sec-2_5_6">Process substitution <br/><br />
Suppose I want to compare the files in two directories, dir1 and dir2. I imagine many people would do this:</p>
<pre class="src src-sh">ls dir1 &gt; 1
ls dir2 &gt; 2
diff 1 2
</pre>
<p>
But try this:
</p>
<pre class="src src-sh">diff &lt;(ls dir1) &lt;(ls dir2)
</pre>
<p>
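It works too. Neat, isn't it? The syntax &lt;(command) is process substitution. &lt;(command) makes the output of command readable through a file name (in bash a named pipe or a /dev/fd entry, which behaves like a temporary file) and passes that name as an argument to another command. In the example above, the output of "ls dir1" is exposed under such a name, and that name becomes the first argument of diff. Another example: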
</p>
<pre class="src src-sh">wget -q -O &gt;(cat) http://baidu.com
</pre>
<p>
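wget normally saves what it downloads to a file, but the command above makes it print the content instead. The "-O" option of wget expects a file name; here we pass &gt;(cat) instead, so wget writes the downloaded content to the substituted file name, and everything written there becomes the standard input of the cat command inside &gt;().<br />
Used flexibly, process substitution is extremely convenient. <b>Highly recommended</b>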
</p>
</li>
</ul>
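<p>
A few more uses of process substitution, sketched under the assumption that the script runs in bash (plain sh does not support this syntax). The input data is made up:
</p>

```shell
#!/usr/bin/env bash
# The loop reads from a process substitution instead of a pipeline,
# so it runs in the current shell and count survives the loop.
count=0
while IFS= read -r line; do
    count=$((count + 1))
done < <(printf 'a\nb\nc\n')
echo "lines: $count"

# The substitution itself expands to a file name.
name=$(echo <(true))
echo "substituted name: $name"

# comm needs sorted input; sort both sides on the fly.
common=$(comm -12 <(printf 'b\na\n' | sort) <(printf 'c\nb\n' | sort))
echo "common: $common"
```

<p>
The first form is also a handy answer to question 3.1: feeding a loop with &lt; &lt;(command) instead of command | keeps the loop's variables in the current shell.
</p>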
<ul>
<li id="sec-2_5_7">Word splitting <br/><br />
<b>The most important step in the shell interpreter, and the source of the shell's spookiness!</b></p>
<p>
If the result of one of the expansions above is not enclosed in double quotes, the shell splits it into words using IFS. For example:
</p>
<pre class="src src-sh"><span style="color: #a0522d;">str</span>=<span style="color: #8b2252;">"a         b          c"</span>

<span style="color: #7a378b;">echo</span> $<span style="color: #a0522d;">str</span>
a b c

<span style="color: #7a378b;">echo</span> <span style="color: #8b2252;">"$str"</span>
a         b          c
</pre>
<p>
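Why does quoting make such a difference? Without double quotes the shell splits the result of the expansion into words. $str expands to "a         b          c", which is split into the three words a, b, and c. These become three arguments to echo, so the output is naturally "a b c".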
</p>
<p>
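Remember question 3.2 from the start of this article? Now you know what happened: the TAB was lost when the unquoted $str was split into words.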
</p>
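<p>
Question 3.2 can be reproduced without a file. This is a minimal sketch; the string stands in for the first line of file a:
</p>

```shell
str=$(printf 'str1\tstr2')   # a value containing a real TAB

unquoted=$(echo $str)        # split into two words, echoed with one space
quoted=$(echo "$str")        # a single word, TAB intact
printf '%s\n' "$unquoted" "$quoted"
```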
<p>
In addition, if the result of an expansion is empty and it is not enclosed in double or single quotes, it is removed altogether. For example:
</p>
<pre class="src src-sh"><span style="color: #b22222;">#</span><span style="color: #b22222;">!/usr/bin/</span><span style="color: #a020f0;">env</span><span style="color: #b22222;"> bash
</span>
<span style="color: #a0522d;">user</span>=<span style="color: #8b2252;">"$1"</span>

mysql -u $<span style="color: #a0522d;">user</span> db -e <span style="color: #8b2252;">"$sql"</span>
</pre>
<p>
If the first argument of the script above is empty, $user is removed and mysql takes db as the user name. The correct code is:
</p>
<pre class="src src-sh">mysql -u <span style="color: #8b2252;">"$user"</span> db -e <span style="color: #8b2252;">"$sql"</span>
</pre>
<p>
So can you now spot what is wrong with the following code?
</p>
<pre class="src src-sh"><span style="color: #a0522d;">str</span>=$(<span style="color: #ff00ff;">cat</span> file)

<span style="color: #a020f0;">for</span> line<span style="color: #a020f0;"> in</span> <span style="color: #8b2252;">"$str"</span>; <span style="color: #a020f0;">do</span>
    <span style="color: #7a378b;">echo</span> <span style="color: #8b2252;">"$line"</span>
<span style="color: #a020f0;">done</span>
</pre>
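<p>
One possible answer, as a sketch: the quoted "$str" is a single word, so the loop body runs exactly once with the whole file in $line, while dropping the quotes would split on every blank, not just on newlines. Reading the file with while read is the usual alternative. The scratch file below is created only for the demonstration:
</p>

```shell
file=$(mktemp)                       # scratch file for the demo
printf 'first line\nsecond  line\n' > "$file"

str=$(cat "$file")
once=0
for line in "$str"; do               # quoted: a single iteration
    once=$((once + 1))
done

n=0
while IFS= read -r line; do          # one iteration per line
    n=$((n + 1))
    last=$line
done < "$file"
rm -f "$file"
echo "for: $once iteration, while: $n lines, last='$last'"
```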
<p>
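This brings us to the difference between $* and $@. Without double quotes they behave identically, and both have the same problem: the expansion undergoes word splitting, so if an argument contains spaces the result may not be what we want. For example: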
</p>
<pre class="src src-sh"><span style="color: #b22222;">#</span><span style="color: #b22222;">!/usr/bin/</span><span style="color: #a020f0;">env</span><span style="color: #b22222;"> bash
</span>
<span style="color: #a020f0;">for</span> i<span style="color: #a020f0;"> in</span> $<span style="color: #a0522d;">*</span>; <span style="color: #a020f0;">do</span>
    <span style="color: #7a378b;">echo</span> $<span style="color: #a0522d;">i</span>
<span style="color: #a020f0;">done</span>
</pre>
<p>
Save the program above as test.sh. It is meant to print each of its arguments:
</p>
<pre class="src src-sh">taoshanwen@taoshanwen-laptop ~$ ./test.sh ab cd ef
ab
<span style="color: #7a378b;">cd</span>
ef

taoshanwen@taoshanwen-laptop ~$ ./test.sh <span style="color: #8b2252;">"ab xx"</span> <span style="color: #8b2252;">"cd yy"</span> <span style="color: #8b2252;">"ef zz"</span>
ab
xx
<span style="color: #7a378b;">cd</span>
yy
ef
zz
</pre>
<p>
That is not what we wanted. So how do we get the arguments exactly as they were passed? "$@" solves it. Try it yourself :)
</p>
</li>
</ul>
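<p>
The difference is easy to see by counting arguments. A small sketch, where count_args is a hypothetical helper that just prints how many arguments it received:
</p>

```shell
count_args() { echo "$#"; }          # hypothetical helper

set -- "ab xx" "cd yy" "ef zz"
unquoted=$(count_args $*)            # split on blanks: 6 words
star=$(count_args "$*")              # everything joined into 1 word
at=$(count_args "$@")                # the original 3 arguments
joined=$(IFS=,; echo "$*")           # first character of IFS is the glue
echo "$unquoted $star $at $joined"
```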
<ul>
<li id="sec-2_5_8">Pathname expansion <br/><br />
If the current directory contains the files ab, ac, and ad, then:</p>
<pre class="src src-sh"><span style="color: #7a378b;">echo</span> a*
ab ac ad
</pre>
</li>
</ul>
<ul>
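<li id="sec-2_5_9">Quote removal <br/><br />
After the expansions above, all unquoted double quotes, single quotes, and backslashes that did not result from one of those expansions are removed, for example:</p>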
<pre class="src src-sh"><span style="color: #7a378b;">echo</span> <span style="color: #8b2252;">"xx"</span> =&gt; xx
<span style="color: #7a378b;">echo</span> a<span style="color: #8b2252;">"xx"</span> =&gt; axx
</pre>
<p>
With all of the above, we now have a rough picture of how the shell interprets a command:
</p>
<p>
<img src="screenshots/shell_interpreter.png"  alt="shell_69e8ac756c50863cb4a741bbc96c7b4b47dd5c66.png" />
</p>
</li>
</ul>
</div>
</div>
<div id="outline-container-2_6" class="outline-4">
<h4 id="sec-2_6"><span class="section-number-4">2.6</span> Redirection </h4>
<div class="outline-text-4" id="text-2_6">
<ul>
<li id="sec-2_6_1">Here Documents <br/>
<pre class="example">
&lt;&lt;[-]word
here-document
delimiter
</pre>
<p>Uses here-document as the standard input of a command. Example:
</p>
<pre class="src src-sh">grep a &lt;&lt; EOF<span style="color: #ffa54f;">
asdf
qweszd
asdf
EOF
</span></pre>
<p>
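If any part of word is quoted, delimiter is the result of quote removal on word, and no expansion is performed inside here-document. If word is unquoted, here-document undergoes parameter expansion, command substitution, and arithmetic expansion.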
</p>
<p>
Now let us look again at the magic comment from the start of this article:
</p>
<pre class="src src-sh">: &lt;&lt; COMMENT<span style="color: #ffa54f;">
COMMENT
</span></pre>
<p>
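<b>":"</b> is a shell builtin that does nothing and returns 0. That makes the trick easy to understand: the commented-out text is simply the standard input of <b>:</b>, and since that command does nothing, the text is effectively a comment. Do you now see why the following fails to act as a comment? How would you fix it?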
</p>
<pre class="src src-sh"><span style="color: #7a378b;">echo</span> xxx

: &lt;&lt; COMMENT<span style="color: #ffa54f;">
file=a
result=$(</span><span style="color: #ff00ff;">grep</span><span style="color: #ffa54f;"> str $file)
COMMENT
</span>
<span style="color: #7a378b;">echo</span> <span style="color: #8b2252;">"mm said, you can touch me!"</span>
</pre>
</li>
</ul>
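<p>
A sketch of the quoting rule, and of one way to make the comment trick safe. The variable name is made up; the commented-out body is the one from the example above:
</p>

```shell
name=world

# Unquoted delimiter: the body is expanded.
expanded=$(cat <<EOF
hello $name
EOF
)

# Quoted delimiter: the body is taken literally.
literal=$(cat <<'EOF'
hello $name
EOF
)
printf '%s\n' "$expanded" "$literal"

# Quoting the delimiter makes the ": <<" trick safe: nothing in the
# body is expanded, so the command substitution below never runs.
: <<'COMMENT'
file=a
result=$(grep str $file)
COMMENT
```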
<ul>
<li id="sec-2_6_2">Here Strings <br/>
<pre class="example">
&lt;&lt;&lt; word
</pre>
<p>Uses word as the standard input of a command. Example:<br />
grep a &lt;&lt;&lt; abc
</p>
</li>
</ul>
</div>
</div>
</div>
<div id="outline-container-3" class="outline-3">
<h3 id="sec-3"><span class="section-number-3">3</span> Tips </h3>
<div class="outline-text-3" id="text-3">
<ul>
<li>
type <br/><br />
This builtin is far more capable than which: it can also find aliases, functions, and builtins</p>
<pre class="example">
taoshanwen@taoshanwen-laptop ~$ type -a ls
ls is aliased to `ls --color -N --show-control-chars'
ls is /bin/ls

taoshanwen@taoshanwen-laptop ~$ type -a [
[ is a shell builtin
[ is /usr/bin/[
</pre>
</li>
<li>
Colorful output <br/></p>
<ol>
<li>
grep has a --color option that highlights the matches, which is very handy
</li>
<li>
Add the following to your .bashrc:</p>
<pre class="src src-sh"><span style="color: #b22222;"># </span><span style="color: #b22222;">less color configure
</span><span style="color: #b22222;"># </span><span style="color: #b22222;">blue
</span><span style="color: #7a378b;">export</span> <span style="color: #a0522d;">LESS_TERMCAP_mb</span>=$<span style="color: #8b2252;">'\E[01;34m'</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">red
</span><span style="color: #7a378b;">export</span> <span style="color: #a0522d;">LESS_TERMCAP_md</span>=$<span style="color: #8b2252;">'\E[01;31m'</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">magenta
</span><span style="color: #7a378b;">export</span> <span style="color: #a0522d;">LESS_TERMCAP_me</span>=$<span style="color: #8b2252;">'\E[01;35m'</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">reset to the default color
</span><span style="color: #7a378b;">export</span> <span style="color: #a0522d;">LESS_TERMCAP_se</span>=$<span style="color: #8b2252;">'\E[0m'</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">yellow
</span><span style="color: #7a378b;">export</span> <span style="color: #a0522d;">LESS_TERMCAP_so</span>=$<span style="color: #8b2252;">'\E[01;44;33m'</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">cyan
</span><span style="color: #7a378b;">export</span> <span style="color: #a0522d;">LESS_TERMCAP_ue</span>=$<span style="color: #8b2252;">'\E[01;36m'</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">green
</span><span style="color: #7a378b;">export</span> <span style="color: #a0522d;">LESS_TERMCAP_us</span>=$<span style="color: #8b2252;">'\E[01;32m'</span>
</pre>
<p>
Your man pages are then guaranteed to be colorful, with the important parts highlighted. Very convenient
</p>
</li>
</ol>
</li>
<li>
Differences between [[ ]] and [ ] <br/></p>
<ol>
<li>
No word splitting or pathname expansion is performed inside [[ ]], so [[ $a = ab ]] is fine. Inside [ ] all expansions are performed, so [ $a = ab ] is not safe.
</li>
<li>
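Inside [[ ]], &lt; and &gt; compare strings using the current locale; inside [ ], &lt; and &gt; compare in ASCII order. Neither compares numbers, which needs watching. For example, try [[ 3 &gt; 11 ]]; echo $?. Does it print 0? Also, [ is merely a builtin command, so you cannot write [ 3 &lt; 2 ] directly: &lt; is a metacharacter and is taken as a redirection. It has to be quoted, like this: [ 3 "&lt;" 2 ]. For numeric comparison use -lt, -gt, and so on.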
</li>
<li>
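In [[ ]], == and != match the left-hand side against a glob pattern on the right, while =~ matches against a regular expression; see the bash man page for details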
</li>
</ol>
</li>
</ul>
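<p>
The three points about [[ ]] can be checked with a short script. This is a sketch that assumes bash; the sample values are made up:
</p>

```shell
#!/usr/bin/env bash
a="x y"
# No word splitting inside [[ ]], so the unquoted $a is safe here.
if [[ $a = "x y" ]]; then safe=yes; else safe=no; fi

# < and > compare strings: "3" sorts after "11".
if [[ 3 > 11 ]]; then string_gt=yes; else string_gt=no; fi
# -gt compares integers.
if [[ 3 -gt 11 ]]; then numeric_gt=yes; else numeric_gt=no; fi

# == matches a glob pattern, =~ matches a regular expression.
if [[ report.txt == *.txt ]]; then glob=yes; else glob=no; fi
re='^report\.[a-z]+$'
if [[ report.txt =~ $re ]]; then regex=yes; else regex=no; fi

echo "$safe $string_gt $numeric_gt $glob $regex"
```

<p>
Keeping the regular expression in a variable and expanding it unquoted after =~ avoids surprises, because quoting any part of the pattern makes that part match literally.
</p>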
</div>
</div>
<div id="outline-container-4" class="outline-3">
<h3 id="sec-4"><span class="section-number-3">4</span> Tools </h3>
<div class="outline-text-3" id="text-4">
</div>
<div id="outline-container-4_1" class="outline-4">
<h4 id="sec-4_1"><span class="section-number-4">4.1</span> log4sh </h4>
<div class="outline-text-4" id="text-4_1">
<p><a href="http://sourceforge.net/projects/log4sh/" target="_blank">http://sourceforge.net/projects/log4sh/</a>, a logging tool for shell scripts, configured in much the same way as the other logging libraries in the log4 family
</p>
</div>
</div>
<div id="outline-container-4_2" class="outline-4">
<h4 id="sec-4_2"><span class="section-number-4">4.2</span> shunit </h4>
<div class="outline-text-4" id="text-4_2">
<p><a href="http://shunit.sourceforge.net/" target="_blank">http://shunit.sourceforge.net/</a>, a unit-testing tool for shell scripts
</p>
</div>
</div>
<div id="outline-container-4_3" class="outline-4">
<h4 id="sec-4_3"><span class="section-number-4">4.3</span> bashdb </h4>
<div class="outline-text-4" id="text-4_3">
<p><a href="http://bashdb.sourceforge.net/" target="_blank">http://bashdb.sourceforge.net/</a>, a debugger for shell scripts
</p>
</div>
</div>
</div>
<div id="outline-container-5" class="outline-3">
<h3 id="sec-5"><span class="section-number-3">5</span> Shell keyboard shortcuts </h3>
<div class="outline-text-3" id="text-5">
<p><a href="http://ahei.info/bash.htm" target="_blank">Working Efficiently in Bash</a>
</p>
</div>
</div>
<div id="outline-container-6" class="outline-3">
<h3 id="sec-6"><span class="section-number-3">6</span> The shell fork bomb </h3>
<div class="outline-text-3" id="text-6">
<pre class="example">
 :(){ :|:&amp; };:
</pre>
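<p>The command above will bring your system to its knees in no time. <b>Use with caution!</b> Use ulimit -u to limit the number of processes a user may run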
</p>
</div>
</div>
<div id="outline-container-7" class="outline-3">
<h3 id="sec-7"><span class="section-number-3">7</span> Encrypting shell scripts </h3>
<div class="outline-text-3" id="text-7">
</div>
<div id="outline-container-7_1" class="outline-4">
<h4 id="sec-7_1"><span class="section-number-4">7.1</span> shc </h4>
<div class="outline-text-4" id="text-7_1">
<p><a href="http://www.datsi.fi.upm.es/~frosal/" target="_blank">http://www.datsi.fi.upm.es/~frosal/</a>, a simple encryption tool that converts a shell script into a binary
</p>
</div>
</div>
<div id="outline-container-7_2" class="outline-4">
<h4 id="sec-7_2"><span class="section-number-4">7.2</span> wzsh </h4>
<div class="outline-text-4" id="text-7_2">
<p><a href="http://wzce.tripod.com/wzsh.html" target="_blank">http://wzce.tripod.com/wzsh.html</a>, a more powerful encryption tool
</p>
</div>
</div>
</div>
<div id="outline-container-8" class="outline-3">
<h3 id="sec-8"><span class="section-number-3">8</span> Essential references </h3>
<div class="outline-text-3" id="text-8">
<ul>
<li>
<a href="http://www.linuxsir.org/main/doc/abs/abs3.7cnhtm/index.html" target="_blank">Advanced Bash-Scripting Guide (Chinese translation)</a>
</li>
<li>
bash man page, <a href="http://ahei.info/chinese-bash-man.htm" target="_blank">bash man page in Chinese</a>
</li>
</ul>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://ahei.info/shell.htm/feed</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Working Efficiently in Bash</title>
		<link>http://ahei.info/bash.htm</link>
		<comments>http://ahei.info/bash.htm#comments</comments>
		<pubDate>Sat, 18 Dec 2010 14:37:12 +0000</pubDate>
		<dc:creator>ahei</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[Intermediate]]></category>

		<guid isPermaLink="false">http://ahei.info/?p=40838</guid>
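		<description><![CDATA[We use Linux heavily in our daily work, and much of that time is spent at the Bash prompt, so working efficiently in Bash really matters. Below I summarize some techniques for efficient Bash use, based on my own experience. 1 Keyboard shortcuts 1.1 Notes 1.2 Highly recommended 1.3 Common shortcuts 1.4 Advanced shortcuts 1.5 Summary 2 History expansion 2.1 Concepts 2.2 Event designators 2.3 Word designators 2.4 Modifiers 2.5 Examples 2.6 Summary 3 Shell tricks 3.1 Here Documents 3.2 Here Strings 3.3 Process substitution 4 Advertisement 1 Keyboard shortcuts 1.1 Notes In the key notation used in this article, C stands for the Ctrl key and M for the Alt key. There is a small pattern to these shortcuts, [...]]]></description>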
			<content:encoded><![CDATA[<p>
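We use Linux heavily in our daily work, and much of that time is spent at the Bash prompt, so working efficiently in Bash really matters. Below I summarize some techniques for efficient Bash use, based on my own experience.<span id="more-40838"></span>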
</p>
<div id="table-of-contents">
<div id="text-table-of-contents">
<ul>
<li><a href="#sec-1">1 Keyboard shortcuts </a>
<ul>
<li><a href="#sec-1_1">1.1 Notes </a></li>
<li><a href="#sec-1_2">1.2 Highly recommended </a></li>
<li><a href="#sec-1_3">1.3 Common shortcuts </a></li>
<li><a href="#sec-1_4">1.4 Advanced shortcuts </a></li>
<li><a href="#sec-1_5">1.5 Summary </a></li>
</ul>
</li>
<li><a href="#sec-2">2 History expansion </a>
<ul>
<li><a href="#sec-2_1">2.1 Concepts </a></li>
<li><a href="#sec-2_2">2.2 Event designators </a></li>
<li><a href="#sec-2_3">2.3 Word designators </a></li>
<li><a href="#sec-2_4">2.4 Modifiers </a></li>
<li><a href="#sec-2_5">2.5 Examples </a></li>
<li><a href="#sec-2_6">2.6 Summary </a></li>
</ul>
</li>
<li><a href="#sec-3">3 Shell tricks </a>
<ul>
<li><a href="#sec-3_1">3.1 Here Documents </a></li>
<li><a href="#sec-3_2">3.2 Here Strings </a></li>
<li><a href="#sec-3_3">3.3 Process substitution </a></li>
</ul>
</li>
<li><a href="#sec-4">4 Advertisement </a></li>
</ul>
</div>
</div>
<div id="outline-container-1" class="outline-3">
<h3 id="sec-1"><span class="section-number-3">1</span> Keyboard shortcuts </h3>
<div class="outline-text-3" id="text-1">
</div>
<div id="outline-container-1_1" class="outline-4">
<h4 id="sec-1_1"><span class="section-number-4">1.1</span> Notes </h4>
<div class="outline-text-4" id="text-1_1">
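<p>In the key notation used in this article, <b>C</b> stands for the Ctrl key and M for the Alt key. There is a small pattern to these shortcuts: those that act on characters generally start with C, and those that act on words generally start with M. If you use SecureCRT, shortcuts starting with Alt do not work by default because Alt is taken for menu accelerators. Go to Options -&gt; Session Options, choose Terminal -&gt; Emulation -&gt; Emacs, and tick "Use ALT as meta key". If you use gnome-terminal, Alt shortcuts do not work by default either, for the same reason. Go to Edit -&gt; Keyboard Shortcuts and untick "Enable menu access keys".<br />
Many of the shortcuts below start with Ctrl, and on many keyboards Ctrl is awkward to reach. Consider <a href="http://emacser.com/capslocak.htm" target="_blank">swapping the Ctrl and Caps Lock keys</a>.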
</p>
</div>
</div>
<div id="outline-container-1_2" class="outline-4">
<h4 id="sec-1_2"><span class="section-number-4">1.2</span> Highly recommended </h4>
<div class="outline-text-4" id="text-1_2">
<ul>
<li id="sec-1_2_1">C-r <br/><br />
What do you do when you want to re-enter a command you typed earlier? I have seen two approaches:</p>
<ol>
<li>
Press the up arrow over and over, hoping to find the command
</li>
<li>
Run history and look for the command, or grep the output of history
</li>
</ol>
<p>There is a better option: press <b>C-r</b> and type a word that the command contains. A command containing that word appears; if it is not the one you want, keep pressing C-r until it is. C-r in action: <br/>
</p>
<pre class="example">
(reverse-i-search)`ls': ls a b c
</pre>
</li>
</ul>
<ul>
<li id="sec-1_2_2">M-. <br/><br />
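I often see people run mkdir long-long-long-name-dir and then type cd followed by that impossibly long directory name. That is when I tell them: after typing cd, press M-. and the long directory name is entered for you. What M-. really does is insert the last argument of the previous command into the current command line. <b>Extremely convenient, strongly recommended</b>. Pressing M-. again brings in the last argument of the command before that. And what if you want the first word of the previous command instead? Use M-0 M-., that is, press M-0 and then M-.. (The numeric prefix selects the word by index, with 0 being the command name itself, so M-1 M-. gives the first argument.) And for the command before the previous one? M-0 M-. M-., of course.</p>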
</li>
</ul>
</div>
</div>
<div id="outline-container-1_3" class="outline-4">
<h4 id="sec-1_3"><span class="section-number-4">1.3</span> Common shortcuts </h4>
<div class="outline-text-4" id="text-1_3">
<ul>
<li id="sec-1_3_1">Program control <br/><br />
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<caption></caption>
<colgroup>
<col class="left" />
<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Action</th>
<th scope="col" class="left">Shortcut</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">Terminate the program running in the foreground</td>
<td class="left">C-c</td>
</tr>
<tr>
<td class="left">Suspend the program running in the foreground</td>
<td class="left">C-z</td>
</tr>
<tr>
<td class="left">Exit the current session, if the cursor is at the start of an empty line</td>
<td class="left">C-d</td>
</tr>
</tbody>
</table>
</li>
</ul>
<ul>
<li id="sec-1_3_2">Cursor movement <br/><br />
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<caption></caption>
<colgroup>
<col class="left" />
<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Action</th>
<th scope="col" class="left">Shortcut</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">Move forward one character</td>
<td class="left">C-f</td>
</tr>
<tr>
<td class="left">Move backward one character</td>
<td class="left">C-b</td>
</tr>
<tr>
<td class="left">Move forward one word</td>
<td class="left">M-f</td>
</tr>
<tr>
<td class="left">Move backward one word</td>
<td class="left">M-b</td>
</tr>
<tr>
<td class="left">Move the cursor to the start of the line</td>
<td class="left">C-a</td>
</tr>
<tr>
<td class="left">Move the cursor to the end of the line</td>
<td class="left">C-e</td>
</tr>
</tbody>
</table>
</li>
</ul>
<ul>
<li id="sec-1_3_3">Editing <br/><br />
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<caption></caption>
<colgroup>
<col class="left" />
<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Action</th>
<th scope="col" class="left">Shortcut</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">Delete one character forward</td>
<td class="left">C-d</td>
</tr>
<tr>
<td class="left">Delete one character backward</td>
<td class="left">C-h</td>
</tr>
<tr>
<td class="left">Delete one word forward</td>
<td class="left">M-d</td>
</tr>
<tr>
<td class="left">Delete one word backward, where words are delimited by punctuation</td>
<td class="left">C-M-h</td>
</tr>
<tr>
<td class="left">Delete one word backward, where words are delimited by whitespace</td>
<td class="left">C-w</td>
</tr>
<tr>
<td class="left">Clear the screen, like the clear command; with this shortcut you no longer need to type clear every time</td>
<td class="left">C-l</td>
</tr>
<tr>
<td class="left">Delete from the cursor to the end of the line</td>
<td class="left">C-k</td>
</tr>
<tr>
<td class="left">Delete from the cursor to the start of the line</td>
<td class="left">C-u</td>
</tr>
<tr>
<td class="left">Paste the first entry of the kill ring</td>
<td class="left">C-y</td>
</tr>
<tr>
<td class="left">Paste the following entries of the kill ring</td>
<td class="left">M-y</td>
</tr>
<tr>
<td class="left">undo</td>
<td class="left">C-/</td>
</tr>
<tr>
<td class="left">Insert the last argument of the previous command</td>
<td class="left">M-.</td>
</tr>
</tbody>
</table>
<p>
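For the difference between C-M-h and C-w, consider this example: <br/><br />
If the text before the cursor is "abc def-ghi", C-M-h deletes ghi, whereas C-w deletes "def-ghi". In other words, C-M-h stops deleting backward at the first character that is not a letter or digit, while C-w stops only at whitespace. <br/>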
</p>
<p>
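Bash has a kill ring. Everything that is deleted (single characters deleted with C-d do not count) goes into this ring. C-y pastes the most recent entry; to paste an older entry, press C-y and then keep pressing M-y until the entry you want appears. <br/>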
</p>
<p>
Sometimes you want to check whether a file contains a TAB character. What do you do? You might reach for grep, type grep, and press TAB. What appears? Nothing! Press it again? You get:
</p>
<pre class="example">
Display all N possibilities? (y or n)
</pre>
<p>Why? Because TAB is the completion key. So is it impossible to type a TAB? Not at all! Press C-v and then TAB. The same applies if you want to type a literal C-a or C-b. <br/>
</p>
</li>
</ul>
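<p>
In a script, where there is no C-v, the same search can be written with bash's $'\t' quoting. A minimal sketch; the scratch file and its contents are made up for the demonstration:
</p>

```shell
#!/usr/bin/env bash
tmp=$(mktemp)                            # scratch file for the demo
printf 'no tabs here\nhas\ttab\n' > "$tmp"

# $'\t' is a real TAB character; -c counts the matching lines.
matches=$(grep -c $'\t' "$tmp")
rm -f "$tmp"
echo "lines containing a TAB: $matches"
```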
<ul>
<li id="sec-1_3_4">Command history <br/><br />
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<caption></caption>
<colgroup>
<col class="left" />
<col class="left" />
</colgroup>
<thead>
<tr>
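<th scope="col" class="left">Action</th>
<th scope="col" class="left">Shortcut</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">Fetch the next command from the history list, like the down arrow</td>
<td class="left">C-n</td>
</tr>
<tr>
<td class="left">Fetch the previous command from the history list, like the up arrow</td>
<td class="left">C-p</td>
</tr>
<tr>
<td class="left">Reverse incremental search through the command history. <b>Very convenient</b> and highly recommended: with it you never have to retype a long command you entered before</td>
<td class="left">C-r</td>
</tr>
<tr>
<td class="left">Execute the current history command and fetch the one that follows it, so that a run of history commands can be repeated in a loop</td>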
<td class="left">C-o</td>
</tr>
</tbody>
</table>
<p>
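After fetching a command from the history list with C-p, pressing C-o executes it and loads the command that follows it in the history, so you can step through a run of history commands. For example, if the history holds 50 commands and the last three are A, B and C, then after going back to A with C-p, repeated C-o presses execute A, B and C in turn and keep cycling through them. C-o is especially handy in a situation like this: while cp is copying a large directory you will want to watch its progress. Many people would retype du -sh dir over and over, but C-o solves this neatly: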
</p>
<ol>
<li>
Type du -sh dir and press Enter to run it
</li>
<li>
Press C-p, then C-o; from then on every C-o runs du -sh dir again
</li>
</ol>
<p>The same problem can also be solved with the watch command:
</p>
<pre class="src src-sh">watch -n 1 -d du -sh dir
</pre>
</li>
</ul>
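<p>
On systems where watch is not installed, a small loop gives a similar effect. This is only a sketch: the function name poll_du and its three arguments (directory, number of runs, pause in seconds) are invented for this illustration.
</p>

```shell
# Hypothetical helper: run "du -sh DIR" COUNT times, pausing PAUSE seconds between runs.
poll_du() {
    dir=$1
    count=$2
    pause=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        du -sh "$dir"
        i=$((i + 1))
        sleep "$pause"
    done
}
```

<p>
For example, poll_du dir 10 1 reports the size of dir ten times, one second apart.
</p>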
</div>
</div>
<div id="outline-container-1_4" class="outline-4">
<h4 id="sec-1_4"><span class="section-number-4">1.4</span> Advanced shortcuts </h4>
<div class="outline-text-4" id="text-1_4">
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<caption></caption>
<colgroup>
<col class="left" />
<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Meaning</th>
<th scope="col" class="left">Shortcut</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">Search forward from the cursor for a character</td>
<td class="left">C-]</td>
</tr>
<tr>
<td class="left">Search backward from the cursor for a character</td>
<td class="left">C-M-]</td>
</tr>
<tr>
<td class="left">Swap the character under the cursor with the one before it; afterwards the cursor moves forward one character</td>
<td class="left">C-t</td>
</tr>
<tr>
<td class="left">Swap the word at the cursor with the word before it; afterwards the cursor moves forward one word</td>
<td class="left">M-t</td>
</tr>
<tr>
<td class="left">Capitalize the word: first letter upper case, the rest lower case</td>
<td class="left">M-c</td>
</tr>
<tr>
<td class="left">Convert the word to lower case</td>
<td class="left">M-l</td>
</tr>
<tr>
<td class="left">Convert the word to upper case</td>
<td class="left">M-u</td>
</tr>
<tr>
<td class="left">Delete all whitespace around the cursor</td>
<td class="left">M-\</td>
</tr>
<tr>
<td class="left">Non-incremental backward search through the history</td>
<td class="left">M-p</td>
</tr>
<tr>
<td class="left">Same as the TAB key</td>
<td class="left">C-i</td>
</tr>
<tr>
<td class="left">Same as the Enter key</td>
<td class="left">C-m/C-j</td>
</tr>
<tr>
<td class="left">Jump back and forth between the current cursor position and the previously marked position</td>
<td class="left">C-x C-x</td>
</tr>
</tbody>
</table>
</div>
</div>
<div id="outline-container-1_5" class="outline-4">
<h4 id="sec-1_5"><span class="section-number-4">1.5</span> Summary </h4>
<div class="outline-text-4" id="text-1_5">
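<p>The shortcuts above are in fact not handled by Bash itself but by a library called <a href="http://www.gnu.org/software/readline/" target="_blank"><b>readline</b></a>. readline is used in many programs, such as gdb and mysql. Ever wondered why the Up and Down arrows recall earlier commands in gdb too? Because it uses readline as well. So once you know readline, you know the editing shortcuts of Bash, gdb, mysql and other programs. readline is a very powerful library with two editing modes, an <a href="http://emacser.com" target="_blank">Emacs</a> mode and a vi mode. The Emacs mode suits the command line very well, and all the shortcuts above refer to it. Its cursor movement and editing keys are very close to those of Emacs itself, so by learning them you are already well on your way into Emacs <img src='http://ahei.info/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> . readline also supports custom key bindings through its own configuration syntax. For details, see man readline or info readline, or read the <a href="http://docs.huihoo.com/homepage/shredderyin/readline.html" target="_blank">readline introduction</a> written by 王垠 (Wang Yin).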
</p>
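<p>
As a small illustration of custom key bindings, the sketch below sets a few readline options from ~/.bashrc through Bash's bind builtin. The function name setup_readline_keys is invented here, and the escape sequences \e[A and \e[B are an assumption: they are what most terminals send for the Up and Down arrows, but yours may differ.
</p>

```shell
# Sketch for ~/.bashrc: a few readline customisations set through the bind builtin.
setup_readline_keys() {
    # Up/Down search the history for lines starting with what is already typed.
    bind '"\e[A": history-search-backward'
    bind '"\e[B": history-search-forward'
    # Make file name completion case-insensitive.
    bind 'set completion-ignore-case on'
}

# Line editing only exists in interactive shells, so do nothing elsewhere.
case $- in
    *i*) setup_readline_keys ;;
esac
```

<p>
The same settings can be written in ~/.inputrc without the bind wrapper, in which case they apply to every readline program rather than to Bash only.
</p>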
</div>
</div>
</div>
<div id="outline-container-2" class="outline-3">
<h3 id="sec-2"><span class="section-number-3">2</span> History expansion </h3>
<div class="outline-text-3" id="text-2">
</div>
<div id="outline-container-2_1" class="outline-4">
<h4 id="sec-2_1"><span class="section-number-4">2.1</span> Concepts </h4>
<div class="outline-text-4" id="text-2_1">
<p>Let's start with an example.<br />
First enter a command:
</p>
<pre class="src src-sh">ls abc def ghi
</pre>
<p>
Then enter:
</p>
<pre class="src src-sh">!!*:s/b/d
</pre>
<p>
The command that actually runs is:
</p>
<pre class="src src-sh">adc def ghi
</pre>
<p>
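Here is what happens: !! selects the previous command "ls abc def ghi" from the history list; * selects all arguments of that command, namely "abc def ghi"; and :s/b/d performs a substitution on those arguments, replacing the first occurrence of b with d. <br/><br />
As this shows, working with a history command takes three steps: <br/>
</p>
<ul>
<li>
First select a command from the history list; the selected command is called an <b>event</b> (the !! above)
</li>
<li>
Then select some of the words of that event; the words of an event are separated by whitespace (the * above)
</li>
<li>
Finally modify the selected words with modifiers (the :s/b/d above)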
</li>
</ul>
</div>
</div>
<div id="outline-container-2_2" class="outline-4">
<h4 id="sec-2_2"><span class="section-number-4">2.2</span> Event Designators </h4>
<div class="outline-text-4" id="text-2_2">
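<p>An event designator selects one command from the history list, i.e. it selects the event <br/>
</p>
<ul>
<li>
!n <br/><br />
Select command number n in the history list
</li>
<li>
!-n <br/><br />
Select the nth most recent command
</li>
<li>
!! <br/><br />
Select the previous command; equivalent to !-1, and similar to recalling it with <b>C-p</b>
</li>
<li>
!string <br/><br />
Select the most recent command that starts with string
</li>
<li>
!?string[?] <br/><br />
Select the most recent command that contains string; the trailing "?" may be omitted if the designator is immediately followed by a newline
</li>
<li>
^string1^string2 <br/><br />
Repeat the previous command, replacing the first occurrence of string1 with string2
</li>
<li>
!# <br/><br />
Refer to everything typed so far on the current command line. For example, typing:</p>
<pre class="src src-sh">more a !#
</pre>
<p>
results in this command being executed: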
</p>
<pre class="src src-sh">more a more a
</pre>
</li>
</ul>
</div>
</div>
<div id="outline-container-2_3" class="outline-4">
<h4 id="sec-2_3"><span class="section-number-4">2.3</span> Word Designators </h4>
<div class="outline-text-4" id="text-2_3">
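<p>A word designator selects some of the words of the chosen event. It must be separated from the event designator by a colon (:), unless it begins with ^, $, *, - or % <br/>
</p>
<ul>
<li>
0 <br/><br />
Select word 0, i.e. the command name. If the event is "ls abc", word designator 0 selects "ls"
</li>
<li>
n <br/><br />
Select the nth word
</li>
<li>
^ <br/><br />
Select the first argument of the command, i.e. word 1
</li>
<li>
$ <br/><br />
Select the last argument of the command
</li>
<li>
% <br/><br />
Select the word matched by the most recent "?string?" search
</li>
<li>
x-y <br/><br />
Select words x through y; -y abbreviates 0-y
</li>
<li>
* <br/><br />
Select all arguments of the command; equivalent to 1-$
</li>
<li>
x* <br/><br />
Abbreviation for x-$
</li>
<li>
x- <br/><br />
Like x*, but without the last word. A bare - selects all words except the last one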
</li>
</ul>
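<p>
The word designators above can be tried out without touching your own history. The sketch below pipes a tiny session into a child bash with history expansion switched on (it is off by default in non-interactive shells). The helper name history_demo is invented for this illustration.
</p>

```shell
# Hypothetical helper: run a two-line session in a child bash that has
# history expansion enabled.
#   $1  the "previous" command
#   $2  a command line that refers to it with history expansion
# bash echoes each expanded line on stderr, which is discarded here.
history_demo() {
    printf '%s\n' 'set -o history -o histexpand' "$1" "$2" | bash --norc 2>/dev/null
}

history_demo 'echo abc def ghi' 'echo first=!!:^ last=!!:$ range=!!:2-3'
```

<p>
Following the definitions above, the second line of the session expands to echo first=abc last=ghi range=def ghi before it is executed.
</p>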
</div>
</div>
<div id="outline-container-2_4" class="outline-4">
<h4 id="sec-2_4"><span class="section-number-4">2.4</span> Modifiers </h4>
<div class="outline-text-4" id="text-2_4">
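<p>Modifiers change the selected words. Several modifiers may be given, each preceded by a colon <br/>
</p>
<ul>
<li>
p <br/><br />
Print the new command without executing it
</li>
<li>
s/old/new/<br />
Replace <b>the first occurrence</b> of old with new. The final delimiter "/" may be omitted if it is the last character on the line. As in sed, another character may be used as the delimiter instead of "/", e.g. s:old:new:. An &amp; in new is replaced by old. If old is empty, the old of the previous substitution is reused.
</li>
<li>
&amp; <br/><br />
Repeat the previous substitution
</li>
<li>
g <br/><br />
Apply the change to all of the selected words, like the trailing g of the s command in sed. It is used together with the :s and :&amp; modifiers; for example, :gs/old/new replaces every occurrence in the event.
</li>
<li>
a <br/><br />
Same as g
</li>
<li>
G <br/><br />
Apply the following :s modifier once to each word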
</li>
</ul>
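<p>
To see the difference between :s, :gs and &amp; in practice, the sketch below runs three tiny sessions in a child bash with history expansion enabled. The helper name history_demo is invented for this illustration.
</p>

```shell
# Hypothetical helper: run a two-line session in a child bash that has
# history expansion enabled; the expanded line that bash echoes on stderr
# is discarded.
history_demo() {
    printf '%s\n' 'set -o history -o histexpand' "$1" "$2" | bash --norc 2>/dev/null
}

# :s changes only the first occurrence in the selected words.
history_demo 'echo abc abc' 'echo !!:*:s/b/X/'
# :gs changes every occurrence.
history_demo 'echo abc abc' 'echo !!:*:gs/b/X/'
# An & in the replacement stands for the matched text.
history_demo 'echo abc abc' 'echo !!:*:s/b/-&-/'
```

<p>
Each call first prints abc abc (the "previous" command) and then the result of the modified command: aXc abc for :s, aXc aXc for :gs, and a-b-c abc for the replacement that uses &amp;.
</p>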
</div>
</div>
<div id="outline-container-2_5" class="outline-4">
<h4 id="sec-2_5"><span class="section-number-4">2.5</span> Examples </h4>
<div class="outline-text-4" id="text-2_5">
<ul>
<li id="sec-2_5_1">Example 1 <br/><br />
Copy a file a.log from a directory on another machine by running:</p>
<pre class="src src-sh">scp user@machine:/home/user/a/a.log .
</pre>
<p>
Later you run:
</p>
<pre class="src src-sh">ls a.log
rm -rf a.log
</pre>
<p>
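Now you also want to copy b/b.log. A global :gs/a/b would also rewrite the a in the host name machine, so replace only the path fragment, using a comma as the delimiter because the pattern contains a slash:
</p>
<pre class="src src-sh">!scp:s,a/a,b/b,
</pre>
<p>
If you only want to see the command that the history expansion produces, without running it, append :p:
</p>
<pre class="src src-sh">!scp:s,a/a,b/b,:p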
</pre>
</li>
</ul>
<ul>
<li id="sec-2_5_2">Example 2 <br/><br />
Copy both a/a.log and b/b.log from another machine in one go:</p>
<pre class="src src-sh">scp user@mbchine:/home/user/a/a.log . &amp;&amp; !#-:gs/a/b
</pre>
<p>
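Here !# is the event designator, selecting what has been typed so far: "scp user@mbchine:/home/user/a/a.log . &amp;&amp;". The "-" is the word designator, selecting all words except the last one (the "&amp;&amp;"), i.e. "scp user@mbchine:/home/user/a/a.log .". The final ":gs/a/b" is the modifier, which globally replaces a with b in the selected words, giving "scp user@mbchine:/home/user/b/b.log .". The command finally executed is therefore "scp user@mbchine:/home/user/a/a.log . &amp;&amp; scp user@mbchine:/home/user/b/b.log .". Note that :gs/a/b replaces every a in the selected words, so this works as intended only because nothing else in them (such as the host name mbchine) contains an a.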
</p>
</li>
</ul>
</div>
</div>
<div id="outline-container-2_6" class="outline-4">
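<h4 id="sec-2_6"><span class="section-number-4">2.6</span> Summary </h4>
<div class="outline-text-4" id="text-2_6">
<p>All of the examples above can also be done with the shortcuts described earlier, but skillful use of history expansion sometimes gets the same job done more efficiently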
</p>
</div>
</div>
</div>
<div id="outline-container-3" class="outline-3">
<h3 id="sec-3"><span class="section-number-3">3</span> Shell tricks </h3>
<div class="outline-text-3" id="text-3">
</div>
<div id="outline-container-3_1" class="outline-4">
<h4 id="sec-3_1"><span class="section-number-4">3.1</span> Here Documents </h4>
<div class="outline-text-4" id="text-3_1">
<pre class="example">
&lt;&lt;[-]word
here-documents
delimiter
</pre>
<p>Feeds the here-document to a command as its standard input, for example:
</p>
<pre class="src src-sh">grep a <span style="color: #00ffff;">&lt;&lt;</span> EOF<span style="color: #ff1493;">
asdf
qweszd
asdf
EOF
</span></pre>
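<p>
One detail worth knowing is that quoting the delimiter word changes how the body is treated. The sketch below contrasts the two forms; the function names greet_expanded and greet_literal are invented for this illustration.
</p>

```shell
name=world

# Unquoted delimiter: $name is expanded inside the here-document.
greet_expanded() {
    cat << EOF
hello $name
EOF
}

# Quoted delimiter: the body is passed through literally.
greet_literal() {
    cat << 'EOF'
hello $name
EOF
}

greet_expanded
greet_literal
```

<p>
The first call prints hello world, the second prints hello $name unchanged. The optional - of the syntax above additionally strips leading TAB characters from the body, which allows the here-document to be indented with tabs inside a script.
</p>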
</div>
</div>
<div id="outline-container-3_2" class="outline-4">
<h4 id="sec-3_2"><span class="section-number-4">3.2</span> Here Strings </h4>
<div class="outline-text-4" id="text-3_2">
<pre class="example">
&lt;&lt;&lt; word
</pre>
<p>Feeds word to the command as its standard input, for example:<br />
grep a &lt;&lt;&lt; abc
</p>
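<p>
Here strings are handy whenever a command should read from a variable instead of a file. The sketch below shows two typical uses; note that here strings are a feature of Bash (and some other shells), not of plain POSIX sh, and that the variable names are only examples.
</p>

```shell
# Split a string into fields without a pipeline; read runs in the current
# shell, so the variables are still set afterwards.
read -r first second rest <<< "alpha beta gamma delta"
echo "$first / $second / $rest"

# Count the words held in a variable.
line="one two three"
wc -w <<< "$line"
```

<p>
With a pipeline such as echo "..." | read first second rest, the variables would be set in a subshell and lost afterwards; the here string avoids that.
</p>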
</div>
</div>
<div id="outline-container-3_3" class="outline-4">
<h4 id="sec-3_3"><span class="section-number-4">3.3</span> Process Substitution </h4>
<div class="outline-text-4" id="text-3_3">
<p>Suppose I want to compare the files in two directories, dir1 and dir2. I guess many people would do this:
</p>
<pre class="src src-sh">ls dir1 <span style="color: #00ffff;">&gt;</span> 1
ls dir2 <span style="color: #00ffff;">&gt;</span> 2
diff 1 2
</pre>
<p>
But try this instead:
</p>
<pre class="src src-sh">diff <span style="color: #00ffff;">&lt;</span><span style="color: #6495ed;">(</span>ls dir1<span style="color: #6495ed;">)</span> <span style="color: #00ffff;">&lt;</span><span style="color: #6495ed;">(</span>ls dir2<span style="color: #6495ed;">)</span>
</pre>
<p>
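That works too, doesn't it? Neat. The &lt;(command) syntax used above is process substitution. &lt;(command) runs command with its output connected to a pipe, and the construct is replaced by a file name (typically a /dev/fd entry or a named pipe) that refers to that pipe; this file name is passed to the other command as an argument. In the command above, the output of "ls dir1" is made available under such a file name, which becomes the first argument of diff. Another example: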
</p>
<pre class="src src-sh">wget -q -O <span style="color: #00ffff;">&gt;</span><span style="color: #6495ed;">(</span>cat<span style="color: #6495ed;">)</span> <a href="http://baidu.com">http://baidu.com</a>
</pre>
<p>
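wget normally saves what it downloads to a file, but with the command above the content is displayed instead. The "-O" option of wget expects a file name; here we pass &gt;(cat) in its place, so whatever wget writes to that file name is fed to the standard input of the cat command inside &gt;().<br />
Used well, process substitution is extremely convenient; <b>highly recommended</b>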
</p>
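<p>
Another common use is feeding sorted input to a command that needs it, without creating temporary files by hand. The sketch below assumes Bash; the function name common_lines is invented for this illustration.
</p>

```shell
# Hypothetical helper: print the lines that two files have in common.
# comm needs sorted input; each <(sort ...) stands in for a sorted copy of the file.
common_lines() {
    comm -12 <(sort "$1") <(sort "$2")
}

# Show what the construct expands to: a file name such as /dev/fd/63.
echo <(true)
```

<p>
For example, common_lines list1.txt list2.txt (with two files of your own) prints only the lines that occur in both files.
</p>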
</div>
</div>
</div>
<div id="outline-container-4" class="outline-3">
<h3 id="sec-4"><span class="section-number-3">4</span> A small plug </h3>
<div class="outline-text-3" id="text-4">
<p>Finally, a small plug: this article was written in Emacs Org Mode (did you notice the last line of the page, "HTML generated by org-mode 7.3 in emacs 23"?). <a href="http://orgmode.org/" target="_blank">Org Mode</a> is a very, very powerful mode built into Emacs and the best tool for practicing <a href="http://zh.wikipedia.org/zh/GTD" target="_blank">GTD</a>. Its features include, but are not limited to: <b>time management</b>, note taking, exporting plain text to html/pdf/latex, and drawing flow charts. These articles may whet your appetite: <a href="http://emacser.com/org-mode.htm" target="_blank">Emacs org mode学习笔记</a>, <a href="http://emacser.com/emacs-ditaa.htm" target="_blank">Emacs中绘图 － ditaa篇</a>, <a href="http://emacser.com/emacs-simple-use.htm" target="_blank">Emacs － 普通人的编辑利器</a>.
</p>
</div>
</div>
<div id="postamble">
<p class="creator">HTML generated by org-mode 7.3 in emacs 23</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://ahei.info/bash.htm/feed</wfw:commentRss>
		<slash:comments>49</slash:comments>
		</item>
		<item>
		<title>Distributed crawling with Nutch</title>
		<link>http://ahei.info/nutch-distributed-crawl.htm</link>
		<comments>http://ahei.info/nutch-distributed-crawl.htm#comments</comments>
		<pubDate>Fri, 12 Feb 2010 15:26:30 +0000</pubDate>
		<dc:creator>ahei</dc:creator>
				<category><![CDATA[Nutch]]></category>
		<category><![CDATA[Intermediate]]></category>
		<category><![CDATA[Distributed]]></category>
		<category><![CDATA[Search engine]]></category>
		<category><![CDATA[ahei]]></category>
		<category><![CDATA[color]]></category>
		<category><![CDATA[control]]></category>
		<category><![CDATA[crawl]]></category>
		<category><![CDATA[crawler]]></category>
		<category><![CDATA[DEA]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[hdfs]]></category>
		<category><![CDATA[ide]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[plugin]]></category>
		<category><![CDATA[readlink]]></category>
		<category><![CDATA[se]]></category>
		<category><![CDATA[ssh]]></category>
		<category><![CDATA[ssh-copy-id]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[top]]></category>
		<category><![CDATA[ubuntu]]></category>
		<category><![CDATA[vi]]></category>
		<category><![CDATA[Crawling]]></category>
		<category><![CDATA[Plugin]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[Configuration file]]></category>

		<guid isPermaLink="false">http://emacser.com/?p=40688</guid>
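		<description><![CDATA[A while ago I wrote an article on basic Nutch usage, which covered crawling on a single machine. Today I will cover distributed crawling with Nutch. Nutch's distributed mode is built on Hadoop, so distributed crawling mainly involves configuring two things: Hadoop and Nutch itself. Hadoop configuration: Hadoop's configuration mainly involves the following files. hadoop-env.sh: hadoop-env.sh holds the environment variables that the Hadoop scripts need. JAVA_HOME: the most important setting in hadoop-env.sh is JAVA_HOME. If it is not set here and your system does not define the environment variable either, running the Hadoop scripts produces the following error: Error: JAVA_HOME is not set. I modified the hadoop script so that, when JAVA_HOME is not set, it is derived automatically with &#8220;which java&#8221;: if &#91;&#91; -z &#34;$JAVA_HOME&#34; &#93;&#93;; then JAVA_HOME=`cd $&#40;dirname $&#40;readlink -m $&#40;which java&#41;&#41;&#41;/../../ &#38;&#38; pwd` fi The idea is simple: which finds the path of the java command, readlink resolves the symlink to the real file, and since the java binary usually lives under $JAVA_HOME/jre/bin/, the directory of the java binary tells you JAVA_HOME. This usually works on Ubuntu but not on Gentoo, where the java command is /usr/bin/run-java-tool. HADOOP_SSH_OPTS: this option is passed to ssh. Hadoop starts the Hadoop programs on the other machines of the cluster through ssh, so this option lets you control how ssh is invoked. When ssh logs in to a machine whose host key you have not accepted yet, i.e. its key is not in your ~/.ssh/known_hosts, you get the following prompt: The authenticity of host 'aheiu (172.0.1.208)' can't be established. [...]]]></description>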
			<content:encoded><![CDATA[<p><img class="alignright" title="Nutch" src="screenshots/nutch-logo.gif"/></p>
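<p>A while ago I wrote an article on <a href="nutch-tutorial.htm" target="_blank">basic Nutch usage</a>, which covered crawling on a single machine. Today I will cover distributed crawling with Nutch.</p>
<p>Nutch's distributed mode is built on Hadoop, so distributed crawling mainly involves configuring two things: Hadoop and Nutch itself.<span id="more-40688"></span></p>
<h4>Hadoop configuration</h4>
<p>Hadoop's configuration mainly involves the following files: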
<ul>
<li>hadoop-env.sh<br />
    hadoop-env.sh holds the environment variables that the Hadoop scripts need.
<ol>
<li>JAVA_HOME<br />
    The most important setting in hadoop-env.sh is JAVA_HOME. If it is not set here and your system does not define the environment variable either, running the Hadoop scripts produces the following error:</p>

<div class="wp_codebox"><table><tr id="p4068818"><td class="code" id="p40688code18"><pre class="text" style="font-family:monospace;">Error: JAVA_HOME is not set.</pre></td></tr></table></div>

<p>      I modified the hadoop script so that, when JAVA_HOME is not set, it is derived automatically with &#8220;which java&#8221;. The code is:</p>

<div class="wp_codebox"><table><tr id="p4068819"><td class="code" id="p40688code19"><pre class="bash" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-z</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$JAVA_HOME</span>&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
    <span style="color: #007800;">JAVA_HOME</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #7a0874; font-weight: bold;">cd</span> $<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #c20cb9; font-weight: bold;">dirname</span> $<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #c20cb9; font-weight: bold;">readlink</span> <span style="color: #660033;">-m</span> $<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #c20cb9; font-weight: bold;">which</span> java<span style="color: #7a0874; font-weight: bold;">&#41;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span><span style="color: #000000; font-weight: bold;">/</span>..<span style="color: #000000; font-weight: bold;">/</span>..<span style="color: #000000; font-weight: bold;">/</span> <span style="color: #000000; font-weight: bold;">&amp;&amp;</span> <span style="color: #7a0874; font-weight: bold;">pwd</span><span style="color: #000000; font-weight: bold;">`</span>
<span style="color: #000000; font-weight: bold;">fi</span></pre></td></tr></table></div>

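<p>      The idea is simple: which finds the path of the java command, readlink resolves the symlink to the real file, and since the java binary usually lives under $JAVA_HOME/jre/bin/, the directory of the java binary tells you JAVA_HOME. This usually works on Ubuntu but not on Gentoo, where the java command is /usr/bin/run-java-tool.</li>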
<li>HADOOP_SSH_OPTS<br />
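        This option is passed to ssh. Hadoop starts the Hadoop programs on the other machines of the cluster through ssh, so this option lets you control how ssh is invoked. When ssh logs in to a machine whose host key you have not accepted yet, i.e. its key is not in your ~/.ssh/known_hosts, you get the following prompt:</p>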

<div class="wp_codebox"><table><tr id="p4068820"><td class="code" id="p40688code20"><pre class="text" style="font-family:monospace;">The authenticity of host 'aheiu (172.0.1.208)' can't be established.
RSA key fingerprint is fa:e0:57:4e:6a:1d:e3:3e:49:86:8f:13:e5:45:47:f0.
Are you sure you want to continue connecting (yes/no)?</pre></td></tr></table></div>

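<p>        You must type yes to continue, which requires manual intervention. How can this be automated?<br />
        ssh has an option, StrictHostKeyChecking, that controls whether the prompt above is shown for a host whose key has not been accepted yet. Logging in with ssh -o StrictHostKeyChecking=no therefore skips the annoying prompt, and also adds the host key to ~/.ssh/known_hosts.</p>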

<div class="wp_codebox"><table><tr id="p4068821"><td class="code" id="p40688code21"><pre class="bash" style="font-family:monospace;"><span style="color: #7a0874; font-weight: bold;">alias</span> <span style="color: #007800;">ssh</span>=<span style="color: #ff0000;">'ssh -o StrictHostKeyChecking=no'</span></pre></td></tr></table></div>

<p>        With this alias you only need to type ssh from now on, not the whole long command.</p>

<div class="wp_codebox"><table><tr id="p4068822"><td class="code" id="p40688code22"><pre class="bash" style="font-family:monospace;"><span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">HADOOP_SSH_OPTS</span>=<span style="color: #ff0000;">&quot;-o StrictHostKeyChecking=no&quot;</span></pre></td></tr></table></div>

<p>        With this setting, starting the Hadoop cluster no longer requires typing yes by hand.
        </li>
<li>HADOOP_PID_DIR<br />
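        When the Hadoop scripts start a Hadoop program, they write its pid to a file; the directory holding these files is the value of HADOOP_PID_DIR. The default is /tmp, so if you start several Hadoop clusters on the same set of machines they overwrite each other's pid files. Set it to a different directory:</p>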

<div class="wp_codebox"><table><tr id="p4068823"><td class="code" id="p40688code23"><pre class="bash" style="font-family:monospace;"><span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">HADOOP_PID_DIR</span>=<span style="color: #800000;">${HADOOP_HOME}</span><span style="color: #000000; font-weight: bold;">/</span>pids</pre></td></tr></table></div>

</li>
</ol>
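<p>
The JAVA_HOME detection described above can be made a little more defensive. The following is only a sketch under stated assumptions: the function name detect_java_home is invented, readlink -f is the GNU coreutils form, and the layout rule (the binary lives in either $JAVA_HOME/bin or $JAVA_HOME/jre/bin) is an assumption that does not hold for wrapper scripts such as the Gentoo one mentioned earlier.
</p>

```shell
# Hypothetical helper: print a guess for JAVA_HOME, or fail if java is not on PATH.
detect_java_home() {
    java_bin=$(command -v java) || return 1
    # Follow every symlink to the real binary (readlink -f is the GNU coreutils form).
    java_real=$(readlink -f "$java_bin") || return 1
    # Strip the trailing /bin/java.
    home=$(dirname "$(dirname "$java_real")")
    # Older JDKs keep the binary in $JAVA_HOME/jre/bin, so step out of jre/ as well.
    case $home in
        */jre) home=$(dirname "$home") ;;
    esac
    printf '%s\n' "$home"
}

# Only fill in JAVA_HOME when it is not set already.
if [ -z "${JAVA_HOME:-}" ]; then
    JAVA_HOME=$(detect_java_home) && export JAVA_HOME
fi
```

<p>
Unlike the one-liner, this version quotes every path, so directories containing spaces are handled, and it leaves JAVA_HOME unexported when no java command is found.
</p>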
<p>    The resulting hadoop-env.sh:</p>

<div class="wp_codebox"><table><tr id="p4068824"><td class="code" id="p40688code24"><pre class="bash" style="font-family:monospace;"><span style="color: #666666; font-style: italic;"># Set Hadoop-specific environment variables here.</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># The only required environment variable is JAVA_HOME.  All others are</span>
<span style="color: #666666; font-style: italic;"># optional.  When running a distributed configuration it is best to</span>
<span style="color: #666666; font-style: italic;"># set JAVA_HOME in this file, so that it is correctly defined on</span>
<span style="color: #666666; font-style: italic;"># remote nodes.</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># The java implementation to use.  Required.</span>
<span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">JAVA_HOME</span>=<span style="color: #000000; font-weight: bold;">/</span>usr<span style="color: #000000; font-weight: bold;">/</span>lib<span style="color: #000000; font-weight: bold;">/</span>jvm<span style="color: #000000; font-weight: bold;">/</span>java-<span style="color: #000000;">6</span>-sun
&nbsp;
<span style="color: #666666; font-style: italic;"># The maximum amount of heap to use, in MB. Default is 1000.</span>
<span style="color: #666666; font-style: italic;"># export HADOOP_HEAPSIZE=2000</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># Extra Java runtime options.  Empty by default.</span>
<span style="color: #666666; font-style: italic;"># export HADOOP_OPTS=-server</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># Extra ssh options.  Default: '-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR'.</span>
<span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">HADOOP_SSH_OPTS</span>=<span style="color: #ff0000;">&quot;-o StrictHostKeyChecking=no&quot;</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># Where log files are stored.  $HADOOP_HOME/logs by default.</span>
<span style="color: #666666; font-style: italic;"># export HADOOP_LOG_DIR=${HADOOP_HOME}/logs</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># File naming remote slave hosts.  $HADOOP_HOME/conf/slaves by default.</span>
<span style="color: #666666; font-style: italic;"># export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># host:path where hadoop code should be rsync'd from.  Unset by default.</span>
<span style="color: #666666; font-style: italic;"># export HADOOP_MASTER=master:/home/$USER/src/hadoop</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># The directory where pid files are stored. /tmp by default.</span>
<span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">HADOOP_PID_DIR</span>=<span style="color: #800000;">${HADOOP_HOME}</span><span style="color: #000000; font-weight: bold;">/</span>pids
&nbsp;
<span style="color: #666666; font-style: italic;"># A string representing this instance of hadoop. $USER by default.</span>
<span style="color: #666666; font-style: italic;"># export HADOOP_IDENT_STRING=$USER</span></pre></td></tr></table></div>

</li>
<li>hadoop-site.xml<br />
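    hadoop-site.xml configures Hadoop's Java programs. As with Nutch, hadoop-default.xml holds the defaults and should not be edited directly; put your own settings in hadoop-site.xml.<br />
    Required settings:
<ol>
<li>hadoop.tmp.dir<br />
        Where Hadoop's DFS data and the temporary data of running MapReduce programs are stored
      </li>
<li>fs.default.name<br />
        IP and port of the namenode
      </li>
<li>mapred.job.tracker<br />
        IP and port of the jobtracker
      </li>
</ol>
<p>Optional settings:
<ol>
<li>mapred.job.tracker.http.address<br />
        IP and port of the jobtracker web interface
      </li>
<li>dfs.http.address<br />
        IP and port of the HDFS web interface
      </li>
<p>      When configuring several Hadoop clusters on one set of machines, you need to change the two options above as well as the namenode and jobtracker options among the required settings.
<li>mapred.map.tasks<br />
        Number of map tasks per job
      </li>
<li>mapred.reduce.tasks<br />
        Number of reduce tasks per job
      </li>
<li>mapred.tasktracker.map.tasks.maximum<br />
        Maximum number of map tasks that each tasktracker can run
      </li>
<li>mapred.tasktracker.reduce.tasks.maximum<br />
        Maximum number of reduce tasks that each tasktracker can run
      </li>
<li>mapred.child.java.opts<br />
        Java options passed to each task process; the default sets the maximum memory to 200M
      </li>
</ol>
<p>    The resulting hadoop-site.xml:</p>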

<div class="wp_codebox_msgheader wp_codebox_hide"><span class="left2">Download <a href="http://ahei.info/wp-content/plugins/wp-codebox/wp-codebox.php?p=40688&amp;download=hadoop-site.xml">hadoop-site.xml</a></span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4068825"><td class="code" id="p40688code25"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;?xml</span> <span style="color: #000066;">version</span>=<span style="color: #ff0000;">&quot;1.0&quot;</span><span style="color: #000000; font-weight: bold;">?&gt;</span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;?xml-stylesheet</span> <span style="color: #000066;">type</span>=<span style="color: #ff0000;">&quot;text/xsl&quot;</span> <span style="color: #000066;">href</span>=<span style="color: #ff0000;">&quot;configuration.xsl&quot;</span><span style="color: #000000; font-weight: bold;">?&gt;</span></span>
&nbsp;
<span style="color: #808080; font-style: italic;">&lt;!-- Put site-specific property overrides in this file. --&gt;</span>
&nbsp;
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;configuration<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>master<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
&nbsp;
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
	<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>hadoop.tmp.dir<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>/opt/crawler/data<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>A base for other temporary directories.
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
&nbsp;
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>fs.default.name<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>hdfs://${master}:9000/<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>The name of the default file system. Either the literal string
      &quot;local&quot; or a host:port for DFS.
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
&nbsp;
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>mapred.job.tracker<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>hdfs://${master}:9001/<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>The host and port that the MapReduce job tracker runs at. If
      &quot;local&quot;, then jobs are run in-process as a single map and reduce task.
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
&nbsp;
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>mapred.job.tracker.http.address<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>0.0.0.0:50030<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      The job tracker http server address and port the server will listen on.
      If the port is 0 then the server will start on a free port.
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
&nbsp;
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>dfs.http.address<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>0.0.0.0:50070<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      The address and the base port where the dfs namenode web ui will listen on.
      If the port is 0 then the server will start on a free port.
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
&nbsp;
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>mapred.map.tasks<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>31<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>The default number of map tasks per job.  Typically set
      to a prime several times greater than number of available hosts.
      Ignored when mapred.job.tracker is &quot;local&quot;.  
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
&nbsp;
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>mapred.reduce.tasks<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>5<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>The default number of reduce tasks per job.  Typically set
      to a prime close to the number of available hosts.  Ignored when
      mapred.job.tracker is &quot;local&quot;.
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
&nbsp;
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>mapred.tasktracker.map.tasks.maximum<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>10<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>The maximum number of map tasks that will be run
      simultaneously by a task tracker.
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
&nbsp;
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>mapred.tasktracker.reduce.tasks.maximum<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>10<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>The maximum number of reduce tasks that will be run
      simultaneously by a task tracker.
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
&nbsp;
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>mapred.child.java.opts<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>-Xmx1024m<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>Java opts for the task tracker child processes.  
      The following symbol, if present, will be interpolated: @taskid@ is replaced 
      by current TaskID. Any other occurrences of '@' will go unchanged.
      For example, to enable verbose gc logging to a file named for the taskid in
      /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
      -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc
&nbsp;
      The configuration variable mapred.child.ulimit can be used to control the
      maximum virtual memory of the child processes. 
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
&nbsp;
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>mapred.task.tracker.http.address<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>0.0.0.0:0<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      The task tracker http server address and port.
      If the port is 0 then the server will start on a free port.
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
&nbsp;
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>dfs.secondary.http.address<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>0.0.0.0:0<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      The secondary namenode http server address and port.
      If the port is 0 then the server will start on a free port.
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
&nbsp;
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>dfs.datanode.address<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>0.0.0.0:0<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      The address where the datanode server will listen to.
      If the port is 0 then the server will start on a free port.
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
&nbsp;
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>dfs.datanode.http.address<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>0.0.0.0:0<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      The datanode http server address and port.
      If the port is 0 then the server will start on a free port.
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
&nbsp;
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>dfs.datanode.ipc.address<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>0.0.0.0:0<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      The datanode ipc server address and port.
      If the port is 0 then the server will start on a free port.
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/configuration<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></td></tr></table></div>

</li>
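<li>master secondarymasters<br />
    By default Hadoop has no master file, only a masters file. A Hadoop cluster can only be started from the master, not from a slave, and despite its name the masters file holds the IP of the secondarynamenode. I modified the Hadoop scripts so that the master file holds the master's IP and the secondarymasters file holds the secondarynamenode's IP.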
  </li>
</ul>
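<p>To make the two files concrete, here is a minimal sketch of how a script could read them. The list_hosts helper is a name made up for illustration and is not part of Hadoop or of my modified scripts; it assumes one host per line, with # starting a comment.</p>

```shell
#!/bin/sh
# Hypothetical helper: print the hosts listed in a host file, one per line.
# $1 is the conf directory, $2 the file name (master, secondarymasters, ...).
# Assumption: one host per line, '#' starts a comment, blank lines are ignored.
list_hosts() {
    sed -e 's/#.*$//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' "$1/$2"
}
```

<p>With this, list_hosts "$HADOOP_CONF_DIR" master would print the master's IP and list_hosts "$HADOOP_CONF_DIR" secondarymasters the secondarynamenode's IP.</p>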
<h4>Nutch configuration</h4>
<ul>
<li>urlfilter<br />
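    plugin.includes contains only urlfilter-regex, and according to the article <a href="nutch-load-conf.htm">"How Nutch loads its configuration files"</a>, crawl-tool.xml has the highest priority. The configuration file used by the urlfilter-regex plugin is therefore the one set in crawl-tool.xml, which defaults to crawl-urlfilter.txt. Change it as follows:</p>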

<div class="wp_codebox"><table><tr id="p4068826"><td class="code" id="p40688code26"><pre class="text" style="font-family:monospace;"># skip file:, ftp:, &amp; mailto: urls
-^(file|ftp|mailto):
&nbsp;
# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
&nbsp;
+.</pre></td></tr></table></div>

</li>
<li><a href="http://ahei.info/c-nutch.htm" class="st_tag internal_tag" rel="tag" title="标签 Nutch 下的日志">nutch</a>-site.xml<br />
    The required settings are http.agent.name and http.robots.agents, the same as in the article <a href="nutch-load-conf.htm">"How Nutch loads its configuration files"</a>.
  </li>
</ul>
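<p>The rules in crawl-urlfilter.txt are tried from top to bottom and the first one that matches decides: + accepts the URL, - rejects it, and a URL that matches nothing is rejected. The following is only a rough sketch of that evaluation order in shell. The url_allowed and check functions are hypothetical names, and grep -E stands in for the Java regular expressions that Nutch really uses, so the two are not equivalent for every pattern.</p>

```shell
#!/bin/sh
# Sketch of first-match-wins rule evaluation (hypothetical helper).
# $1 is the URL; the remaining arguments are rules of the form
# '+regex' (accept) or '-regex' (reject), tried in order.
url_allowed() {
    url=$1
    shift
    for rule in "$@"; do
        regex=${rule#?}              # the rule without its leading sign
        if printf '%s\n' "$url" | grep -Eq -- "$regex"; then
            case $rule in
                +*) return 0 ;;      # first matching rule is '+': accept
                *)  return 1 ;;      # first matching rule is '-': reject
            esac
        fi
    done
    return 1                         # no rule matched: reject
}

# The three rules from the listing above, in the same order.
check() {
    url_allowed "$1" \
        '-^(file|ftp|mailto):' \
        '-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$' \
        '+.'
}
```

<p>Here check http://example.com/index.html succeeds, while check ftp://example.com/file and check http://example.com/logo.png fail. Because the final +. rule matches everything, any URL not rejected by the first two rules is accepted.</p>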
<h4>Some scripts for easier deployment</h4>
<p>I modified some of the Hadoop scripts to make deploying and monitoring Hadoop more convenient.
<ul>
<li>restart-all.sh<br />
    Restarts the Hadoop cluster.
  </li>
<li>all.sh<br />
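    This script lets you run the same command on every machine in the cluster at once. For example, to check whether the logs on the cluster contain any errors, run:</p>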

<div class="wp_codebox"><table><tr id="p4068827"><td class="code" id="p40688code27"><pre class="bash" style="font-family:monospace;">      .<span style="color: #000000; font-weight: bold;">/</span>all.sh <span style="color: #c20cb9; font-weight: bold;">grep</span> ERROR path-of-logs<span style="color: #000000; font-weight: bold;">/</span>hadoop.log</pre></td></tr></table></div>


<div class="wp_codebox"><table><tr id="p4068828"><td class="code" id="p40688code28"><pre class="bash" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">#!/bin/sh</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># Time-stamp: &lt;2010-01-04 16:21:16 Monday by ahei&gt;</span>
&nbsp;
<span style="color: #7a0874; font-weight: bold;">readonly</span> <span style="color: #007800;">PROGRAM_NAME</span>=<span style="color: #ff0000;">&quot;all.sh&quot;</span>
&nbsp;
usage<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
<span style="color: #7a0874; font-weight: bold;">&#123;</span>
    <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;usage: <span style="color: #007800;">${PROGRAM_NAME}</span> -h | [-s] &lt;COMMAND&gt; ...&quot;</span>
    <span style="color: #7a0874; font-weight: bold;">echo</span>
    <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;Options&quot;</span>
    <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;-s<span style="color: #000099; font-weight: bold;">\t</span>sort output&quot;</span>
&nbsp;
    <span style="color: #7a0874; font-weight: bold;">exit</span> <span style="color: #000000;">1</span>
<span style="color: #7a0874; font-weight: bold;">&#125;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #ff0000;">&quot;x$1&quot;</span> = <span style="color: #ff0000;">&quot;x-h&quot;</span> <span style="color: #660033;">-o</span> <span style="color: #ff0000;">&quot;x$1&quot;</span> = <span style="color: #ff0000;">&quot;x--help&quot;</span> <span style="color: #660033;">-o</span> <span style="color: #007800;">$#</span> = <span style="color: #000000;">0</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
    usage
<span style="color: #000000; font-weight: bold;">fi</span>
&nbsp;
<span style="color: #007800;">bin</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #c20cb9; font-weight: bold;">dirname</span> <span style="color: #ff0000;">&quot;$0&quot;</span><span style="color: #000000; font-weight: bold;">`</span>
<span style="color: #007800;">bin</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #7a0874; font-weight: bold;">cd</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$bin</span>&quot;</span>; <span style="color: #7a0874; font-weight: bold;">pwd</span><span style="color: #000000; font-weight: bold;">`</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #ff0000;">&quot;x$1&quot;</span> = <span style="color: #ff0000;">&quot;x-s&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
    <span style="color: #7a0874; font-weight: bold;">shift</span>
    <span style="color: #007800;">output</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #ff0000;">&quot;<span style="color: #007800;">$bin</span>/slaves.sh&quot;</span> <span style="color: #660033;">--hosts</span> master <span style="color: #7a0874; font-weight: bold;">cd</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">${bin}</span>&quot;</span> \<span style="color: #000000; font-weight: bold;">&amp;</span>\<span style="color: #000000; font-weight: bold;">&amp;</span> <span style="color: #ff0000;">&quot;$@&quot;</span><span style="color: #000000; font-weight: bold;">`</span>
    <span style="color: #007800;">output</span>=<span style="color: #ff0000;">&quot;<span style="color: #007800;">$output</span><span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #000000; font-weight: bold;">`</span><span style="color: #ff0000;">&quot;<span style="color: #007800;">$bin</span>/slaves.sh&quot;</span> <span style="color: #660033;">--hosts</span> secondarymasters <span style="color: #7a0874; font-weight: bold;">cd</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">${bin}</span>&quot;</span> \<span style="color: #000000; font-weight: bold;">&amp;</span>\<span style="color: #000000; font-weight: bold;">&amp;</span> <span style="color: #ff0000;">&quot;$@&quot;</span><span style="color: #000000; font-weight: bold;">`</span>
    <span style="color: #007800;">output</span>=<span style="color: #ff0000;">&quot;<span style="color: #007800;">${output}</span><span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #000000; font-weight: bold;">`</span><span style="color: #ff0000;">&quot;<span style="color: #007800;">$bin</span>/slaves.sh&quot;</span> <span style="color: #7a0874; font-weight: bold;">cd</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">${bin}</span>&quot;</span> \<span style="color: #000000; font-weight: bold;">&amp;</span>\<span style="color: #000000; font-weight: bold;">&amp;</span> <span style="color: #ff0000;">&quot;$@&quot;</span><span style="color: #000000; font-weight: bold;">`</span>
    <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">${output}</span>&quot;</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">sort</span>
<span style="color: #000000; font-weight: bold;">else</span>
    <span style="color: #ff0000;">&quot;<span style="color: #007800;">$bin</span>/slaves.sh&quot;</span> <span style="color: #660033;">--hosts</span> master <span style="color: #7a0874; font-weight: bold;">cd</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">${bin}</span>&quot;</span> \<span style="color: #000000; font-weight: bold;">&amp;</span>\<span style="color: #000000; font-weight: bold;">&amp;</span> <span style="color: #ff0000;">&quot;$@&quot;</span>
    <span style="color: #ff0000;">&quot;<span style="color: #007800;">$bin</span>/slaves.sh&quot;</span> <span style="color: #660033;">--hosts</span> secondarymasters <span style="color: #7a0874; font-weight: bold;">cd</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">${bin}</span>&quot;</span> \<span style="color: #000000; font-weight: bold;">&amp;</span>\<span style="color: #000000; font-weight: bold;">&amp;</span> <span style="color: #ff0000;">&quot;$@&quot;</span>
    <span style="color: #ff0000;">&quot;<span style="color: #007800;">$bin</span>/slaves.sh&quot;</span> <span style="color: #7a0874; font-weight: bold;">cd</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">${bin}</span>&quot;</span> \<span style="color: #000000; font-weight: bold;">&amp;</span>\<span style="color: #000000; font-weight: bold;">&amp;</span> <span style="color: #ff0000;">&quot;$@&quot;</span>
<span style="color: #000000; font-weight: bold;">fi</span></pre></td></tr></table></div>

</li>
<li>clean-logs.sh<br />
    Deletes the logs on all machines.</p>

<div class="wp_codebox"><table><tr id="p4068829"><td class="code" id="p40688code29"><pre class="bash" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">#!/bin/sh</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># Time-stamp: &lt;10/24/2008 11:09:33 Friday by ahei&gt;</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># Clean logs on all hadoop daemons.</span>
&nbsp;
<span style="color: #007800;">bin</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #c20cb9; font-weight: bold;">dirname</span> <span style="color: #ff0000;">&quot;$0&quot;</span><span style="color: #000000; font-weight: bold;">`</span>
<span style="color: #007800;">bin</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #7a0874; font-weight: bold;">cd</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$bin</span>&quot;</span>; <span style="color: #7a0874; font-weight: bold;">pwd</span><span style="color: #000000; font-weight: bold;">`</span>
&nbsp;
. <span style="color: #007800;">$bin</span><span style="color: #000000; font-weight: bold;">/</span>hadoop-config.sh
&nbsp;
<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-f</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">${HADOOP_CONF_DIR}</span>/hadoop-env.sh&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
  . <span style="color: #ff0000;">&quot;<span style="color: #007800;">${HADOOP_CONF_DIR}</span>/hadoop-env.sh&quot;</span>
<span style="color: #000000; font-weight: bold;">fi</span>
&nbsp;
<span style="color: #007800;">HADOOP_LOG_DIR</span>=<span style="color: #800000;">${HADOOP_LOG_DIR:-&quot;${HADOOP_HOME}/logs&quot;}</span>
&nbsp;
<span style="color: #ff0000;">&quot;<span style="color: #007800;">${bin}</span>/rm-all.sh&quot;</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">${HADOOP_LOG_DIR}</span>&quot;</span></pre></td></tr></table></div>

</li>
<li>df-all.sh du-all.sh jps-all.sh ll-all.sh mv-all.sh rm-all.sh<br />
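    Each script runs the command named by its prefix on every machine. df-all.sh, for example, runs the df command on all machines. All of these scripts call all.sh.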
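<br />
    As a rough sketch of how such a wrapper could find its command, the snippet below derives the command name from the script's own file name. prefix_command is a hypothetical helper, and the argument convention of all.sh is an assumption, so that call is left commented out.

```bash
#!/usr/bin/env bash

# Hypothetical helper: map a wrapper's file name to the command it stands for,
# e.g. "/opt/hadoop/bin/df-all.sh" -> "df".
prefix_command() {
    local name=${1##*/}               # strip the directory part
    printf '%s\n' "${name%-all.sh}"   # strip the "-all.sh" suffix
}

# A wrapper such as df-all.sh could then be reduced to two lines.
# Assumption: all.sh takes the command followed by its arguments.
# bin=$(cd "$(dirname "$0")" && pwd)
# "$bin"/all.sh "$(prefix_command "$0")" "$@"

prefix_command /opt/hadoop/bin/df-all.sh
prefix_command jps-all.sh
```

    With this approach every *-all.sh file could share the same body, so adding a new wrapper would only need a copy or a symbolic link under a new name.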
  </li>
<li>update-conf.sh<br />
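    Configuring Hadoop means setting the master's IP in two places: the master file, and hadoop-site.xml, where the namenode and jobtracker addresses are configured. Both have to be edited every time, so can we get away with editing only one? Also, for easier management, when I deploy Nutch I lay out the files as /opt/crawler, /opt/crawler/data and /opt/crawler/program: data is hadoop.tmp.dir and program holds the Nutch program, so hadoop.tmp.dir is in fact $HADOOP_HOME/../data. To simplify deployment I wrote update-conf.sh, which copies the contents of the master file into hadoop-site.xml and also updates the hadoop.tmp.dir value there, so the master file is the only thing you need to edit.</p>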

<div class="wp_codebox_msgheader wp_codebox_hide"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://ahei.info/wp-content/plugins/wp-codebox/wp-codebox.php?p=40688&amp;download=update-conf.sh">update-conf.sh</a></span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4068830"><td class="code" id="p40688code30"><pre class="bash" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">#!/usr/bin/env bash</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># Time-stamp: &lt;2010-01-22 15:31:53 Friday by ahei&gt;</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># @version 1.0</span>
<span style="color: #666666; font-style: italic;"># @author ahei</span>
&nbsp;
<span style="color: #007800;">bin</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #c20cb9; font-weight: bold;">dirname</span> <span style="color: #ff0000;">&quot;$0&quot;</span><span style="color: #000000; font-weight: bold;">`</span>
<span style="color: #007800;">bin</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #7a0874; font-weight: bold;">cd</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$bin</span>&quot;</span> <span style="color: #000000; font-weight: bold;">&amp;&amp;</span> <span style="color: #7a0874; font-weight: bold;">pwd</span><span style="color: #000000; font-weight: bold;">`</span>
&nbsp;
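<span style="color: #666666; font-style: italic;"># Follow symbolic links until a non-link is reached, then print the result.</span>
resolveLink<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>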
<span style="color: #7a0874; font-weight: bold;">&#123;</span>
    <span style="color: #007800;">this</span>=<span style="color: #ff0000;">&quot;$1&quot;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">while</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-L</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$this</span>&quot;</span> <span style="color: #000000; font-weight: bold;">&amp;&amp;</span> <span style="color: #660033;">-r</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$this</span>&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">do</span>
        <span style="color: #007800;">link</span>=$<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #c20cb9; font-weight: bold;">readlink</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$this</span>&quot;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
        <span style="color: #007800;">link</span>=$<span style="color: #7a0874; font-weight: bold;">&#40;</span>normalizePath <span style="color: #ff0000;">&quot;<span style="color: #007800;">$link</span>&quot;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
&nbsp;
        <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">${link:0:1}</span>&quot;</span> = <span style="color: #ff0000;">&quot;/&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
            <span style="color: #007800;">this</span>=<span style="color: #ff0000;">&quot;<span style="color: #007800;">$link</span>&quot;</span>
        <span style="color: #000000; font-weight: bold;">else</span>
            <span style="color: #007800;">dir</span>=$<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #c20cb9; font-weight: bold;">dirname</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$this</span>&quot;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
            <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$dir</span>&quot;</span> <span style="color: #000000; font-weight: bold;">!</span>= <span style="color: #ff0000;">&quot;.&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
                <span style="color: #007800;">this</span>=<span style="color: #ff0000;">&quot;<span style="color: #007800;">$dir</span>/<span style="color: #007800;">$link</span>&quot;</span>
            <span style="color: #000000; font-weight: bold;">else</span>
                <span style="color: #007800;">this</span>=<span style="color: #ff0000;">&quot;<span style="color: #007800;">$link</span>&quot;</span>
            <span style="color: #000000; font-weight: bold;">fi</span>
        <span style="color: #000000; font-weight: bold;">fi</span>
    <span style="color: #000000; font-weight: bold;">done</span>
&nbsp;
    <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$this</span>&quot;</span>
<span style="color: #7a0874; font-weight: bold;">&#125;</span>
&nbsp;
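<span style="color: #666666; font-style: italic;"># Rebuild the path from its dirname and basename, which drops trailing slashes.</span>
normalizePath<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>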
<span style="color: #7a0874; font-weight: bold;">&#123;</span>
    <span style="color: #7a0874; font-weight: bold;">local</span> <span style="color: #007800;">path</span>=<span style="color: #ff0000;">&quot;$1&quot;</span>
&nbsp;
    <span style="color: #007800;">dir</span>=$<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #c20cb9; font-weight: bold;">dirname</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$path</span>&quot;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
    <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$dir</span>&quot;</span> <span style="color: #000000; font-weight: bold;">!</span>= <span style="color: #ff0000;">&quot;.&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
        <span style="color: #007800;">path</span>=<span style="color: #007800;">$dir</span><span style="color: #000000; font-weight: bold;">/</span>$<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #c20cb9; font-weight: bold;">basename</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$path</span>&quot;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
    <span style="color: #000000; font-weight: bold;">else</span>
        <span style="color: #007800;">path</span>=$<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #c20cb9; font-weight: bold;">basename</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$path</span>&quot;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
    <span style="color: #000000; font-weight: bold;">fi</span>
&nbsp;
    <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$path</span>&quot;</span>
<span style="color: #7a0874; font-weight: bold;">&#125;</span>
&nbsp;
<span style="color: #007800;">confFile</span>=hadoop-site.xml
&nbsp;
<span style="color: #666666; font-style: italic;"># update master setting</span>
&nbsp;
<span style="color: #7a0874; font-weight: bold;">cd</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$bin</span>&quot;</span><span style="color: #000000; font-weight: bold;">/</span>..<span style="color: #000000; font-weight: bold;">/</span>conf <span style="color: #000000; font-weight: bold;">||</span> <span style="color: #7a0874; font-weight: bold;">exit</span> <span style="color: #000000;">1</span>
<span style="color: #007800;">no</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #c20cb9; font-weight: bold;">grep</span> <span style="color: #660033;">-xE</span> <span style="color: #ff0000;">&quot;[[:space:]]*&lt;name&gt;master&lt;/name&gt;[[:space:]]*&quot;</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$confFile</span>&quot;</span> <span style="color: #660033;">-n</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">tail</span> <span style="color: #660033;">-1</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">awk</span> -F: <span style="color: #ff0000;">'{print $1}'</span><span style="color: #000000; font-weight: bold;">`</span>
<span style="color: #7a0874; font-weight: bold;">let</span> no++
<span style="color: #007800;">master</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #c20cb9; font-weight: bold;">cat</span> master<span style="color: #000000; font-weight: bold;">`</span>
<span style="color: #c20cb9; font-weight: bold;">sed</span> <span style="color: #660033;">-r</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$no</span> s#&lt;value&gt;.*&lt;/value&gt;#&lt;value&gt;<span style="color: #007800;">$master</span>&lt;/value&gt;#g&quot;</span> <span style="color: #660033;">-i</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$confFile</span>&quot;</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># update hadoop.tmp.dir</span>
&nbsp;
<span style="color: #007800;">no</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #c20cb9; font-weight: bold;">grep</span> <span style="color: #660033;">-xE</span> <span style="color: #ff0000;">&quot;[[:space:]]*&lt;name&gt;hadoop.tmp.dir&lt;/name&gt;[[:space:]]*&quot;</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$confFile</span>&quot;</span> <span style="color: #660033;">-n</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">tail</span> <span style="color: #660033;">-1</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">awk</span> -F: <span style="color: #ff0000;">'{print $1}'</span><span style="color: #000000; font-weight: bold;">`</span>
<span style="color: #7a0874; font-weight: bold;">let</span> no++
&nbsp;
<span style="color: #007800;">dataDir</span>=$<span style="color: #7a0874; font-weight: bold;">&#40;</span>resolveLink <span style="color: #ff0000;">&quot;</span>$<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">cd</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$bin</span>&quot;</span><span style="color: #000000; font-weight: bold;">/</span>..<span style="color: #000000; font-weight: bold;">/</span>.. <span style="color: #000000; font-weight: bold;">&amp;&amp;</span> <span style="color: #7a0874; font-weight: bold;">pwd</span><span style="color: #7a0874; font-weight: bold;">&#41;</span><span style="color: #ff0000;">&quot;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span><span style="color: #000000; font-weight: bold;">/</span>data
<span style="color: #c20cb9; font-weight: bold;">sed</span> <span style="color: #660033;">-r</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$no</span> s#&lt;value&gt;.*&lt;/value&gt;#&lt;value&gt;<span style="color: #007800;">$dataDir</span>&lt;/value&gt;#g&quot;</span> <span style="color: #660033;">-i</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$confFile</span>&quot;</span></pre></td></tr></table></div>
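    The line-number technique used by update-conf.sh (find the line holding the property name with grep -n, then rewrite the value on the following line with sed) can be tried on a throwaway file. This is only a sketch: set_property_value and the sample file contents are made up for illustration. Unlike the script, it writes through a temporary file instead of sed -i, and it gives up when the property is not found.

```bash
#!/usr/bin/env bash

# Hypothetical helper: replace the <value> on the line right after the
# last line that consists solely of <name>NAME</name>.
# Assumption: VALUE contains no '#' or '&', which are special to this sed call.
set_property_value() {
    local file=$1 name=$2 value=$3 no tmp

    # grep -n prefixes each match with "LINE:"; keep only the last match.
    no=$(grep -n -xE "[[:space:]]*<name>${name}</name>[[:space:]]*" "$file" |
         tail -n 1 | cut -d: -f1)
    [ -n "$no" ] || return 1   # property not found: leave the file alone

    no=$((no + 1))             # the value is expected on the next line
    tmp=$(mktemp) || return 1
    sed "${no}s#<value>.*</value>#<value>${value}</value>#" "$file" > "$tmp" &&
        mv "$tmp" "$file"
}

# Demo on a temporary file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<property>
  <name>master</name>
  <value>old-host</value>
</property>
EOF

set_property_value "$conf" master 192.168.0.1
cat "$conf"
```

    The early return matters because an empty line number would otherwise be incremented to 1, and sed would then be pointed at the first line of the file.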

</li>
<li><span style="color: #0000ff;">kill-all.sh</span><br />
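    Hadoop's stop-all.sh kills the Hadoop daemons by reading their pid files, so if you accidentally delete those files, stop-all.sh can no longer kill the daemons. kill-all.sh fills this gap: it runs jps to get the pids of all Java processes, selects the daemons' pids by daemon name, and then reads each process's current working directory from /proc; if that directory is the same as HADOOP_HOME, it kills the process.<br />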
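    The selection steps described for kill-all.sh can be sketched roughly as follows. daemon_pids_in_home is a hypothetical helper, not the author's code. It assumes a Linux-style /proc, input lines in the "PID NAME" form that jps prints, and a HADOOP_HOME given as a resolved absolute path. The optional third argument exists only so the function can be tried against a fake directory tree.

```bash
#!/usr/bin/env bash

# Hypothetical helper: read "PID NAME" lines on stdin and print the PIDs whose
# name equals DAEMON (compared in lower case) and whose working directory is
# HOME_DIR. PROC_ROOT defaults to /proc.
daemon_pids_in_home() {
    local daemon=$1 home_dir=$2 proc_root=${3:-/proc}
    local pid name cwd

    while read -r pid name _; do
        # Lower-case the name before comparing, as hadoop-daemon.sh does with tr.
        [ "$(printf '%s' "$name" | tr 'A-Z' 'a-z')" = "$daemon" ] || continue
        # PROC_ROOT/PID/cwd is a symlink to the process's current directory.
        cwd=$(readlink "$proc_root/$pid/cwd") || continue
        [ "$cwd" = "$home_dir" ] && printf '%s\n' "$pid"
    done
    return 0
}

# Possible use (commented out, because it would really kill processes):
# jps | daemon_pids_in_home namenode "$HADOOP_HOME" |
#     while read -r p; do kill "$p"; done
```

    Checking the working directory is what keeps daemons that belong to another Hadoop installation on the same machine from being killed.<br />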
    <span style="color: #0000ff;">ping-all.sh</span><br />
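    This script does not run the ping command on every machine. Instead, it checks whether the daemons on every machine are still alive, which is very handy when managing a Hadoop cluster.<br />
    Both kill-all.sh and ping-all.sh are ultimately implemented by hadoop-daemon.sh, so I list only the code of hadoop-daemon.sh here:</p>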

<div class="wp_codebox_msgheader wp_codebox_hide"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://ahei.info/wp-content/plugins/wp-codebox/wp-codebox.php?p=40688&amp;download=hadoop-daemon.sh">hadoop-daemon.sh</a></span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4068831"><td class="code" id="p40688code31"><pre class="bash" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">#!/bin/sh</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># Time-stamp: &lt;2010-02-03 15:34:22 Wednesday by ahei&gt;</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># Runs a Hadoop command as a daemon.</span>
<span style="color: #666666; font-style: italic;">#</span>
<span style="color: #666666; font-style: italic;"># Environment Variables</span>
<span style="color: #666666; font-style: italic;">#</span>
<span style="color: #666666; font-style: italic;">#   HADOOP_CONF_DIR  Alternate conf dir. Default is ${HADOOP_HOME}/conf.</span>
<span style="color: #666666; font-style: italic;">#   HADOOP_LOG_DIR   Where log files are stored.  ${HADOOP_HOME}/logs by default.</span>
<span style="color: #666666; font-style: italic;">#   HADOOP_MASTER    host:path where hadoop code should be rsync'd from</span>
<span style="color: #666666; font-style: italic;">#   HADOOP_PID_DIR   The pid files are stored. /tmp by default.</span>
<span style="color: #666666; font-style: italic;">#   HADOOP_IDENT_STRING   A string representing this instance of hadoop. $USER by default</span>
<span style="color: #666666; font-style: italic;">#   HADOOP_NICENESS The scheduling priority for daemons. Defaults to 0.</span>
&nbsp;
<span style="color: #007800;">usage</span>=<span style="color: #ff0000;">&quot;Usage: hadoop-daemon.sh [--config &lt;conf-dir&gt;] [--hosts hostlistfile] (start|stop|kill|ping) &lt;hadoop-command&gt; &lt;args...&gt;&quot;</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># if no args specified, show usage</span>
<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #007800;">$#</span> <span style="color: #660033;">-le</span> <span style="color: #000000;">1</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
    <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$usage</span>&quot;</span>
    <span style="color: #7a0874; font-weight: bold;">exit</span> <span style="color: #000000;">1</span>
<span style="color: #000000; font-weight: bold;">fi</span>
&nbsp;
<span style="color: #007800;">bin</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #c20cb9; font-weight: bold;">dirname</span> <span style="color: #ff0000;">&quot;$0&quot;</span><span style="color: #000000; font-weight: bold;">`</span>
<span style="color: #007800;">bin</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #7a0874; font-weight: bold;">cd</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$bin</span>&quot;</span>; <span style="color: #7a0874; font-weight: bold;">pwd</span><span style="color: #000000; font-weight: bold;">`</span>
&nbsp;
. <span style="color: #ff0000;">&quot;<span style="color: #007800;">$bin</span>&quot;</span><span style="color: #000000; font-weight: bold;">/</span>hadoop-config.sh
&nbsp;
<span style="color: #666666; font-style: italic;"># get arguments</span>
<span style="color: #007800;">startStop</span>=<span style="color: #007800;">$1</span>
<span style="color: #7a0874; font-weight: bold;">shift</span>
<span style="color: #007800;">command</span>=<span style="color: #007800;">$1</span>
<span style="color: #7a0874; font-weight: bold;">shift</span>
&nbsp;
hadoop_rotate_log <span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
<span style="color: #7a0874; font-weight: bold;">&#123;</span>
    <span style="color: #007800;">log</span>=<span style="color: #007800;">$1</span>;
    <span style="color: #007800;">num</span>=<span style="color: #000000;">5</span>;
    <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-n</span> <span style="color: #ff0000;">&quot;$2&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
        <span style="color: #007800;">num</span>=<span style="color: #007800;">$2</span>
    <span style="color: #000000; font-weight: bold;">fi</span>
    <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-f</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$log</span>&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span> <span style="color: #666666; font-style: italic;"># rotate logs</span>
        <span style="color: #000000; font-weight: bold;">while</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #007800;">$num</span> <span style="color: #660033;">-gt</span> <span style="color: #000000;">1</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">do</span>
            <span style="color: #007800;">prev</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #c20cb9; font-weight: bold;">expr</span> <span style="color: #007800;">$num</span> - <span style="color: #000000;">1</span><span style="color: #000000; font-weight: bold;">`</span>
            <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-f</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$log</span>.<span style="color: #007800;">$prev</span>&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span> <span style="color: #000000; font-weight: bold;">&amp;&amp;</span> <span style="color: #c20cb9; font-weight: bold;">mv</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$log</span>.<span style="color: #007800;">$prev</span>&quot;</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$log</span>.<span style="color: #007800;">$num</span>&quot;</span>
            <span style="color: #007800;">num</span>=<span style="color: #007800;">$prev</span>
        <span style="color: #000000; font-weight: bold;">done</span>
        <span style="color: #c20cb9; font-weight: bold;">mv</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$log</span>&quot;</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$log</span>.<span style="color: #007800;">$num</span>&quot;</span>;
    <span style="color: #000000; font-weight: bold;">fi</span>
<span style="color: #7a0874; font-weight: bold;">&#125;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-f</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">${HADOOP_CONF_DIR}</span>/hadoop-env.sh&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
    . <span style="color: #ff0000;">&quot;<span style="color: #007800;">${HADOOP_CONF_DIR}</span>/hadoop-env.sh&quot;</span>
<span style="color: #000000; font-weight: bold;">fi</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># get log directory</span>
<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$HADOOP_LOG_DIR</span>&quot;</span> = <span style="color: #ff0000;">&quot;&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
    <span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">HADOOP_LOG_DIR</span>=<span style="color: #ff0000;">&quot;<span style="color: #007800;">$HADOOP_HOME</span>/logs&quot;</span>
<span style="color: #000000; font-weight: bold;">fi</span>
<span style="color: #c20cb9; font-weight: bold;">mkdir</span> <span style="color: #660033;">-p</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$HADOOP_LOG_DIR</span>&quot;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$HADOOP_PID_DIR</span>&quot;</span> = <span style="color: #ff0000;">&quot;&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
    <span style="color: #007800;">HADOOP_PID_DIR</span>=<span style="color: #000000; font-weight: bold;">/</span>tmp
<span style="color: #000000; font-weight: bold;">fi</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$HADOOP_IDENT_STRING</span>&quot;</span> = <span style="color: #ff0000;">&quot;&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
    <span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">HADOOP_IDENT_STRING</span>=<span style="color: #ff0000;">&quot;<span style="color: #007800;">$USER</span>&quot;</span>
<span style="color: #000000; font-weight: bold;">fi</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># some variables</span>
<span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">HADOOP_LOGFILE</span>=hadoop-<span style="color: #007800;">$HADOOP_IDENT_STRING</span>-<span style="color: #007800;">$command</span>-<span style="color: #007800;">$HOSTNAME</span>.log
<span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">HADOOP_ROOT_LOGGER</span>=<span style="color: #ff0000;">&quot;INFO,DRFA&quot;</span>
<span style="color: #007800;">log</span>=<span style="color: #007800;">$HADOOP_LOG_DIR</span><span style="color: #000000; font-weight: bold;">/</span>hadoop-<span style="color: #007800;">$HADOOP_IDENT_STRING</span>-<span style="color: #007800;">$command</span>-<span style="color: #007800;">$HOSTNAME</span>.out
<span style="color: #007800;">pid</span>=<span style="color: #007800;">$HADOOP_PID_DIR</span><span style="color: #000000; font-weight: bold;">/</span>hadoop-<span style="color: #007800;">$HADOOP_IDENT_STRING</span>-<span style="color: #007800;">$command</span>.pid
&nbsp;
<span style="color: #666666; font-style: italic;"># Set default scheduling priority</span>
<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$HADOOP_NICENESS</span>&quot;</span> = <span style="color: #ff0000;">&quot;&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
    <span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">HADOOP_NICENESS</span>=<span style="color: #000000;">0</span>
<span style="color: #000000; font-weight: bold;">fi</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">case</span> <span style="color: #007800;">$startStop</span> <span style="color: #000000; font-weight: bold;">in</span>
    start<span style="color: #7a0874; font-weight: bold;">&#41;</span>
        <span style="color: #c20cb9; font-weight: bold;">mkdir</span> <span style="color: #660033;">-p</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$HADOOP_PID_DIR</span>&quot;</span>
&nbsp;
        <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-f</span> <span style="color: #007800;">$pid</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
            <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #c20cb9; font-weight: bold;">kill</span> <span style="color: #660033;">-0</span> <span style="color: #000000; font-weight: bold;">`</span><span style="color: #c20cb9; font-weight: bold;">cat</span> <span style="color: #007800;">$pid</span><span style="color: #000000; font-weight: bold;">`</span> <span style="color: #000000; font-weight: bold;">&gt;</span> <span style="color: #000000; font-weight: bold;">/</span>dev<span style="color: #000000; font-weight: bold;">/</span>null <span style="color: #000000;">2</span><span style="color: #000000; font-weight: bold;">&gt;&amp;</span><span style="color: #000000;">1</span>; <span style="color: #000000; font-weight: bold;">then</span>
                <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #007800;">$command</span> running <span style="color: #c20cb9; font-weight: bold;">as</span> process <span style="color: #000000; font-weight: bold;">`</span><span style="color: #c20cb9; font-weight: bold;">cat</span> <span style="color: #007800;">$pid</span><span style="color: #000000; font-weight: bold;">`</span>.  Stop it first.
                <span style="color: #7a0874; font-weight: bold;">exit</span> <span style="color: #000000;">1</span>
            <span style="color: #000000; font-weight: bold;">fi</span>
        <span style="color: #000000; font-weight: bold;">fi</span>
&nbsp;
        hadoop_rotate_log <span style="color: #007800;">$log</span>
        <span style="color: #7a0874; font-weight: bold;">echo</span> starting <span style="color: #007800;">$command</span>, logging to <span style="color: #007800;">$log</span>
        <span style="color: #c20cb9; font-weight: bold;">nohup</span> <span style="color: #c20cb9; font-weight: bold;">nice</span> <span style="color: #660033;">-n</span> <span style="color: #007800;">$HADOOP_NICENESS</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$HADOOP_HOME</span>&quot;</span><span style="color: #000000; font-weight: bold;">/</span>bin<span style="color: #000000; font-weight: bold;">/</span>hadoop <span style="color: #660033;">--config</span> <span style="color: #007800;">$HADOOP_CONF_DIR</span> <span style="color: #007800;">$command</span> <span style="color: #ff0000;">&quot;$@&quot;</span> <span style="color: #000000; font-weight: bold;">&gt;</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$log</span>&quot;</span> <span style="color: #000000;">2</span><span style="color: #000000; font-weight: bold;">&gt;&amp;</span><span style="color: #000000;">1</span> <span style="color: #000000; font-weight: bold;">&lt;</span> <span style="color: #000000; font-weight: bold;">/</span>dev<span style="color: #000000; font-weight: bold;">/</span>null <span style="color: #000000; font-weight: bold;">&amp;</span>
        <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #007800;">$!</span> <span style="color: #000000; font-weight: bold;">&gt;</span> <span style="color: #007800;">$pid</span>
        <span style="color: #c20cb9; font-weight: bold;">sleep</span> <span style="color: #000000;">1</span>; <span style="color: #c20cb9; font-weight: bold;">head</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$log</span>&quot;</span>
        <span style="color: #000000; font-weight: bold;">;;</span>
&nbsp;
    stop<span style="color: #7a0874; font-weight: bold;">&#41;</span>
        <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-f</span> <span style="color: #007800;">$pid</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
            <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #c20cb9; font-weight: bold;">kill</span> <span style="color: #660033;">-0</span> <span style="color: #000000; font-weight: bold;">`</span><span style="color: #c20cb9; font-weight: bold;">cat</span> <span style="color: #007800;">$pid</span><span style="color: #000000; font-weight: bold;">`</span> <span style="color: #000000; font-weight: bold;">&gt;</span> <span style="color: #000000; font-weight: bold;">/</span>dev<span style="color: #000000; font-weight: bold;">/</span>null <span style="color: #000000;">2</span><span style="color: #000000; font-weight: bold;">&gt;&amp;</span><span style="color: #000000;">1</span>; <span style="color: #000000; font-weight: bold;">then</span>
                <span style="color: #7a0874; font-weight: bold;">echo</span> stopping <span style="color: #007800;">$command</span>
                <span style="color: #c20cb9; font-weight: bold;">kill</span> <span style="color: #660033;">-9</span> <span style="color: #000000; font-weight: bold;">`</span><span style="color: #c20cb9; font-weight: bold;">cat</span> <span style="color: #007800;">$pid</span><span style="color: #000000; font-weight: bold;">`</span>
            <span style="color: #000000; font-weight: bold;">else</span>
                <span style="color: #7a0874; font-weight: bold;">echo</span> no <span style="color: #007800;">$command</span> to stop
            <span style="color: #000000; font-weight: bold;">fi</span>
        <span style="color: #000000; font-weight: bold;">else</span>
            <span style="color: #7a0874; font-weight: bold;">echo</span> no <span style="color: #007800;">$command</span> to stop
        <span style="color: #000000; font-weight: bold;">fi</span>
        <span style="color: #000000; font-weight: bold;">;;</span>
&nbsp;
    <span style="color: #c20cb9; font-weight: bold;">kill</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
        <span style="color: #007800;">pids</span>=$<span style="color: #7a0874; font-weight: bold;">&#40;</span>jps <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">tr</span> <span style="color: #ff0000;">'[A-Z]'</span> <span style="color: #ff0000;">'[a-z]'</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">awk</span> <span style="color: #ff0000;">&quot;{if (NF &gt; 1 &amp;&amp; <span style="color: #000099; font-weight: bold;">\$</span>2 == <span style="color: #000099; font-weight: bold;">\&quot;</span><span style="color: #007800;">$command</span><span style="color: #000099; font-weight: bold;">\&quot;</span>){print <span style="color: #000099; font-weight: bold;">\$</span>1}}&quot;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
        <span style="color: #007800;">exist</span>=
        <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-n</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$pids</span>&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
            <span style="color: #000000; font-weight: bold;">for</span> p <span style="color: #000000; font-weight: bold;">in</span> <span style="color: #007800;">$pids</span>; <span style="color: #000000; font-weight: bold;">do</span>
                <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$(readlink -m /proc/$p/cwd)</span>&quot;</span> = <span style="color: #ff0000;">&quot;<span style="color: #007800;">$(readlink -m &quot;$HADOOP_HOME&quot;)</span>&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
                    <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;killing <span style="color: #007800;">$command</span> of pid <span style="color: #007800;">$p</span> ...&quot;</span>
                    <span style="color: #c20cb9; font-weight: bold;">kill</span> <span style="color: #660033;">-9</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$p</span>&quot;</span>
                    <span style="color: #007800;">exist</span>=<span style="color: #000000;">1</span>
                <span style="color: #000000; font-weight: bold;">fi</span>
            <span style="color: #000000; font-weight: bold;">done</span>
        <span style="color: #000000; font-weight: bold;">fi</span>
        <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$exist</span>&quot;</span> <span style="color: #000000; font-weight: bold;">!</span>= <span style="color: #000000;">1</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
            <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;Cannot find any <span style="color: #007800;">$command</span> to kill&quot;</span>
        <span style="color: #000000; font-weight: bold;">fi</span>
        <span style="color: #000000; font-weight: bold;">;;</span>
&nbsp;
    <span style="color: #c20cb9; font-weight: bold;">ping</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
        <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-f</span> <span style="color: #007800;">$pid</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span> <span style="color: #000000; font-weight: bold;">&amp;&amp;</span> <span style="color: #c20cb9; font-weight: bold;">kill</span> <span style="color: #660033;">-0</span> <span style="color: #000000; font-weight: bold;">`</span><span style="color: #c20cb9; font-weight: bold;">cat</span> <span style="color: #007800;">$pid</span><span style="color: #000000; font-weight: bold;">`</span> <span style="color: #000000; font-weight: bold;">&gt;</span> <span style="color: #000000; font-weight: bold;">/</span>dev<span style="color: #000000; font-weight: bold;">/</span>null <span style="color: #000000;">2</span><span style="color: #000000; font-weight: bold;">&gt;&amp;</span><span style="color: #000000;">1</span>; <span style="color: #000000; font-weight: bold;">then</span>
            <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$command</span> is alive&quot;</span>
        <span style="color: #000000; font-weight: bold;">else</span>
            <span style="color: #007800;">pids</span>=$<span style="color: #7a0874; font-weight: bold;">&#40;</span>jps <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">tr</span> <span style="color: #ff0000;">'[A-Z]'</span> <span style="color: #ff0000;">'[a-z]'</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">awk</span> <span style="color: #ff0000;">&quot;{if (NF &gt; 1 &amp;&amp; <span style="color: #000099; font-weight: bold;">\$</span>2 == <span style="color: #000099; font-weight: bold;">\&quot;</span><span style="color: #007800;">$command</span><span style="color: #000099; font-weight: bold;">\&quot;</span>){print <span style="color: #000099; font-weight: bold;">\$</span>1}}&quot;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
            <span style="color: #007800;">maybePids</span>=
            <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-n</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$pids</span>&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
                <span style="color: #000000; font-weight: bold;">for</span> p <span style="color: #000000; font-weight: bold;">in</span> <span style="color: #007800;">$pids</span>; <span style="color: #000000; font-weight: bold;">do</span>
                    <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$(readlink -m /proc/$p/cwd)</span>&quot;</span> = <span style="color: #ff0000;">&quot;<span style="color: #007800;">$(readlink -m &quot;$HADOOP_HOME&quot;)</span>&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
                        <span style="color: #007800;">maybePids</span>=<span style="color: #ff0000;">&quot;<span style="color: #007800;">$maybePids</span> <span style="color: #007800;">$p</span>&quot;</span>
                    <span style="color: #000000; font-weight: bold;">fi</span>
                <span style="color: #000000; font-weight: bold;">done</span>
            <span style="color: #000000; font-weight: bold;">fi</span>
            <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-z</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$maybePids</span>&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
                <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$command</span> is dead&quot;</span>
            <span style="color: #000000; font-weight: bold;">else</span>
                <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-f</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$pid</span>&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
                    <span style="color: #007800;">output</span>=<span style="color: #ff0000;">&quot;<span style="color: #007800;">$command</span> pid cannot be found in its pid file <span style="color: #007800;">$pid</span>&quot;</span>
                <span style="color: #000000; font-weight: bold;">else</span>
                    <span style="color: #007800;">output</span>=<span style="color: #ff0000;">&quot;<span style="color: #007800;">${command}</span>'s pid file <span style="color: #007800;">$pid</span> does not exist&quot;</span>
                <span style="color: #000000; font-weight: bold;">fi</span>
                <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$output</span>, but some pids<span style="color: #007800;">$maybePids</span> of <span style="color: #007800;">$command</span> exist&quot;</span>
            <span style="color: #000000; font-weight: bold;">fi</span>
        <span style="color: #000000; font-weight: bold;">fi</span>
        <span style="color: #000000; font-weight: bold;">;;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">*</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
        <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #007800;">$usage</span>
        <span style="color: #7a0874; font-weight: bold;">exit</span> <span style="color: #000000;">1</span>
        <span style="color: #000000; font-weight: bold;">;;</span>
<span style="color: #000000; font-weight: bold;">esac</span></pre></td></tr></table></div>

</li>
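<li><span style="color: #0000ff;">rsync-slaves.sh</span><br />
    Suppose you change a configuration item or modify the program: how do you update the program on every machine? hadoop has already thought of this. By default it calls rsync in hadoop-daemon.sh to sync a single machine with the master. I wrote this separate script to sync all slaves with the master. start-all.sh calls rsync-slaves.sh automatically, so you rarely need to run it by hand. The script skips any file or directory named ignores, so you can put everything you do not want synced into an ignores directory.</p>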

<div class="wp_codebox_msgheader wp_codebox_hide"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://ahei.info/wp-content/plugins/wp-codebox/wp-codebox.php?p=40688&amp;download=rsync-slaves.sh">rsync-slaves.sh</a></span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4068832"><td class="code" id="p40688code32"><pre class="bash" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">#!/bin/sh</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># Time-stamp: &lt;2010-01-14 17:07:05 Thursday by ahei&gt;</span>
&nbsp;
<span style="color: #7a0874; font-weight: bold;">readonly</span> <span style="color: #007800;">PROGRAM_NAME</span>=<span style="color: #ff0000;">&quot;rsync-slaves.sh&quot;</span>
&nbsp;
usage<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
<span style="color: #7a0874; font-weight: bold;">&#123;</span>
    <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;usage: <span style="color: #007800;">${PROGRAM_NAME}</span> [--hosts hostlistfile] [-h]&quot;</span>
    <span style="color: #7a0874; font-weight: bold;">exit</span> <span style="color: #000000;">1</span>
<span style="color: #7a0874; font-weight: bold;">&#125;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #ff0000;">&quot;x$1&quot;</span> = <span style="color: #ff0000;">&quot;x-h&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
    usage
<span style="color: #000000; font-weight: bold;">fi</span>
&nbsp;
<span style="color: #007800;">bin</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #c20cb9; font-weight: bold;">dirname</span> <span style="color: #ff0000;">&quot;$0&quot;</span><span style="color: #000000; font-weight: bold;">`</span>
<span style="color: #007800;">bin</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #7a0874; font-weight: bold;">cd</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$bin</span>&quot;</span>; <span style="color: #7a0874; font-weight: bold;">pwd</span><span style="color: #000000; font-weight: bold;">`</span>
&nbsp;
. <span style="color: #007800;">$bin</span><span style="color: #000000; font-weight: bold;">/</span>hadoop-config.sh
&nbsp;
<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-f</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">${HADOOP_CONF_DIR}</span>/hadoop-env.sh&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
  . <span style="color: #ff0000;">&quot;<span style="color: #007800;">${HADOOP_CONF_DIR}</span>/hadoop-env.sh&quot;</span>
<span style="color: #000000; font-weight: bold;">fi</span>
&nbsp;
<span style="color: #007800;">command</span>=<span style="color: #ff0000;">&quot;mkdir -p <span style="color: #000099; font-weight: bold;">\&quot;</span><span style="color: #007800;">$HADOOP_HOME</span><span style="color: #000099; font-weight: bold;">\&quot;</span> &amp;&amp; rsync -azvh --delete --progress --exclude=logs --exclude=ignores --exclude=pids <span style="color: #000099; font-weight: bold;">\&quot;</span><span style="color: #007800;">${HADOOP_MASTER}</span><span style="color: #000099; font-weight: bold;">\&quot;</span> <span style="color: #000099; font-weight: bold;">\&quot;</span><span style="color: #007800;">${HADOOP_HOME}</span><span style="color: #000099; font-weight: bold;">\&quot;</span> $@&quot;</span>
<span style="color: #ff0000;">&quot;<span style="color: #007800;">$bin</span>&quot;</span><span style="color: #000000; font-weight: bold;">/</span>slaves.sh <span style="color: #007800;">$command</span>
<span style="color: #ff0000;">&quot;<span style="color: #007800;">$bin</span>&quot;</span><span style="color: #000000; font-weight: bold;">/</span>slaves.sh <span style="color: #660033;">--hosts</span> secondarymasters <span style="color: #007800;">$command</span></pre></td></tr></table></div>

</li>
</ul>
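<h4>Deployment</h4>
<p>With configuration covered, it is time to deploy.
<ol>
<li>Set up connectivity between the machines<br />
    hadoop starts the daemon on every node over ssh, so first set up <a href="ssh-copy-id.htm" target="_blank">passwordless login</a> between the machines; otherwise you will be asked for a password every time you start the hadoop cluster.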
  </li>
<li>Start the hadoop cluster

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p40688code33'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4068833"><td class="code" id="p40688code33"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">mkdir</span> <span style="color: #660033;">-p</span> <span style="color: #000000; font-weight: bold;">/</span>opt<span style="color: #000000; font-weight: bold;">/</span>crawler
<span style="color: #c20cb9; font-weight: bold;">cp</span> <span style="color: #660033;">-r</span> nutch <span style="color: #000000; font-weight: bold;">/</span>opt<span style="color: #000000; font-weight: bold;">/</span>crawler<span style="color: #000000; font-weight: bold;">/</span>program
<span style="color: #7a0874; font-weight: bold;">cd</span> <span style="color: #000000; font-weight: bold;">/</span>opt<span style="color: #000000; font-weight: bold;">/</span>crawler<span style="color: #000000; font-weight: bold;">/</span>program<span style="color: #000000; font-weight: bold;">/</span>bin
.<span style="color: #000000; font-weight: bold;">/</span>hadoop namenode <span style="color: #660033;">-format</span>
.<span style="color: #000000; font-weight: bold;">/</span>start-all.sh</pre></td></tr></table></div>

</li>
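<li>Start crawling<br />
    Crawling works the same way as in the article <a href="nutch-load-conf.htm">"How nutch loads its configuration files"</a>, with one difference: the url directory must be stored in hdfs. You can copy a local url directory into hdfs with this command:</p>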

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p40688code34'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4068834"><td class="code" id="p40688code34"><pre class="bash" style="font-family:monospace;">.<span style="color: #000000; font-weight: bold;">/</span>hadoop fs <span style="color: #660033;">-copyFromLocal</span> ignores<span style="color: #000000; font-weight: bold;">/</span>urls urls</pre></td></tr></table></div>

</li>
<li>Check the status of hadoop jobs and tasks
<p>Open http://master:50030 to check the jobtracker status; http://master:50070 lets you browse the contents of hdfs.</p>
</li>
</ol>
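<p>Since step 1 depends on passwordless ssh working for every node, it can help to verify that before running start-all.sh. The following is only a minimal sketch, assuming a host list file with one hostname per line (for example the slaves file in the conf directory). The function names list_hosts and check_hosts and the SSH_CMD override are invented for this illustration and are not part of hadoop:</p>

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the hosts in a host list file,
# dropping comments (# ...), trailing blanks and empty lines.
list_hosts() {
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Hypothetical helper: try a non-interactive login on every host.
# BatchMode=yes makes ssh fail instead of prompting for a password.
# SSH_CMD may be overridden, for example with a stub when testing.
check_hosts() {
    local hostfile=$1 host failed=0
    for host in $(list_hosts "$hostfile"); do
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null >/dev/null 2>&1; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}
```

<p>A call such as check_hosts ../conf/slaves (the path is an assumption about your layout) prints one line per host and returns non-zero if any host still asks for a password.</p>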
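<h4>Deploying multiple hadoop clusters</h4>
<p>What if machines are scarce and you want to run several hadoop clusters on the same set of machines? It is simple: copy the nutch directory to a different location, then give the following properties in hadoop-site.xml values that differ from the other cluster's:<br />
fs.default.name mapred.job.tracker mapred.job.tracker.http.address dfs.http.address</p>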
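<p>To make the four properties concrete, here is a small sketch that prints the hadoop-site.xml entries for an additional cluster. The helper names, the host name master, the base ports 9000 and 9001 and the idea of a fixed port offset are all assumptions made for this example; only the property names and the web ports 50030 and 50070 come from the text above:</p>

```shell
#!/usr/bin/env bash

# Print one <property> element for hadoop-site.xml.
print_property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Print the four properties that must differ between clusters.
# $1 = master host name, $2 = port offset for this cluster.
# Both parameters are illustrative, not hadoop settings.
print_cluster_overrides() {
    local master=$1 offset=$2
    print_property fs.default.name                 "hdfs://$master:$((9000 + offset))"
    print_property mapred.job.tracker              "$master:$((9001 + offset))"
    print_property mapred.job.tracker.http.address "0.0.0.0:$((50030 + offset))"
    print_property dfs.http.address                "0.0.0.0:$((50070 + offset))"
}
```

<p>For example, print_cluster_overrides master 100 prints entries you could paste into the second cluster's hadoop-site.xml; any set of ports works as long as it does not collide with the first cluster.</p>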
]]></content:encoded>
			<wfw:commentRss>http://ahei.info/nutch-distributed-crawl.htm/feed</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>Passwordless ssh login</title>
		<link>http://ahei.info/ssh-copy-id.htm</link>
		<comments>http://ahei.info/ssh-copy-id.htm#comments</comments>
		<pubDate>Fri, 12 Feb 2010 15:13:32 +0000</pubDate>
		<dc:creator>ahei</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[Intermediate]]></category>
		<category><![CDATA[color]]></category>
		<category><![CDATA[expect]]></category>
		<category><![CDATA[keychain]]></category>
		<category><![CDATA[se]]></category>
		<category><![CDATA[ssh]]></category>
		<category><![CDATA[ssh-agent]]></category>
		<category><![CDATA[ssh-copy-id]]></category>
		<category><![CDATA[Configuration]]></category>

		<guid isPermaLink="false">http://emacser.com/?p=40693</guid>
		<description><![CDATA[When you log in to another machine with ssh on linux, you have to type the password interactively. ssh has no option for passing the password directly, because it considers that insecure. But sometimes you need to automate a task, for example logging in to another machine and starting some programs there. What do you do then? Here are several approaches. Using expect: an expect script (ssh.exp) spawns ssh, waits for the password prompt and sends the password; make it executable with chmod +x ssh.exp and run ./ssh.exp. The drawback is that the password is stored in the file in plain text, which is insecure. sshpass: sshpass is designed specifically for non-interactive ssh password login. It can read the password from standard input, take it after its -p option, read it from a file given with -f, or read it from the environment variable SSHPASS with -e. The ssh command goes after sshpass's options, for example: sshpass -p password ssh host -l username. rsync is a synchronization command that runs over ssh, [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright" title="OpenSSH" src="screenshots/openssh.gif" width="200" height="90"/></p>
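<p>When you log in to another machine with ssh on linux, you have to type the password interactively. ssh has no option for passing the password directly, because it considers that insecure. But sometimes you need to automate a task, for example logging in to another machine and starting some programs there. What do you do then?<span id="more-40693"></span></p>
<p>Here are several approaches:</p>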
<ol>
<li><span style="color: #0000ff;">Using expect</span>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://ahei.info/wp-content/plugins/wp-codebox/wp-codebox.php?p=40693&amp;download=ssh.exp">ssh.exp</a></span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4069347"><td class="code" id="p40693code47"><pre class="bash" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">#!/usr/bin/expect</span>
&nbsp;
spawn <span style="color: #c20cb9; font-weight: bold;">ssh</span> <span style="color: #660033;">-o</span> <span style="color: #007800;">StrictHostKeyChecking</span>=no <span style="color: #660033;">-l</span> username <span style="color: #c20cb9; font-weight: bold;">hostname</span>
expect <span style="color: #ff0000;">&quot;*password:&quot;</span>
send <span style="color: #ff0000;">&quot;password<span style="color: #000099; font-weight: bold;">\r</span>&quot;</span>
interact</pre></td></tr></table></div>

<p>Replace username, hostname and password in the code above with your user name, the target machine's ip and your password, save it as ssh.exp, and then run:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p40693code48'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4069348"><td class="code" id="p40693code48"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">chmod</span> +x ssh.exp
.<span style="color: #000000; font-weight: bold;">/</span>ssh.exp</pre></td></tr></table></div>

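<p>This logs you in to the target machine automatically and then hands the session over to you (that is what interact does).<br />
The drawback of this method is that the password is stored in the file in plain text, which is <span style="color: #0000ff;">insecure</span>.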
  </li>
<li><span style="color: #0000ff;">sshpass</span><br />
    sshpass is designed specifically for non-interactive ssh password login. It can read the password from standard input, take it after its -p option, read it from a file given with -f, or read it from the environment variable SSHPASS with -e. The ssh command goes after sshpass's options, for example:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p40693code49'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4069349"><td class="code" id="p40693code49"><pre class="bash" style="font-family:monospace;">sshpass <span style="color: #660033;">-p</span> password <span style="color: #c20cb9; font-weight: bold;">ssh</span> host <span style="color: #660033;">-l</span> username</pre></td></tr></table></div>

<p>    rsync is a synchronization command that runs over ssh. If you also want to run rsync without typing a password, use rsync's --rsh option, for example:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p40693code50'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4069350"><td class="code" id="p40693code50"><pre class="bash" style="font-family:monospace;">rsync <span style="color: #660033;">--rsh</span>=<span style="color: #ff0000;">'sshpass -p password ssh -l username'</span> host.example.com:path .</pre></td></tr></table></div>

<p>    If you want scp to work without a password as well, create the following file:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://ahei.info/wp-content/plugins/wp-codebox/wp-codebox.php?p=40693&amp;download=ssh.sh">ssh.sh</a></span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4069351"><td class="code" id="p40693code51"><pre class="bash" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">#!/usr/bin/env bash</span>
&nbsp;
sshpass <span style="color: #660033;">-p</span> password <span style="color: #c20cb9; font-weight: bold;">ssh</span> <span style="color: #ff0000;">&quot;$@&quot;</span></pre></td></tr></table></div>

<p>    Then use scp like this:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p40693code52'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4069352"><td class="code" id="p40693code52"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">scp</span> <span style="color: #660033;">-S</span> path-of-ssh.sh<span style="color: #000000; font-weight: bold;">/</span>ssh.sh <span style="color: #c20cb9; font-weight: bold;">file</span> user<span style="color: #000000; font-weight: bold;">@</span>host:path</pre></td></tr></table></div>

<p>    This method is simple to use, but it has the same drawback: the password is stored in plain text.
  </li>
<li><span style="color: #0000ff;">Passwordless login with a key file</span>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p40693code53'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4069353"><td class="code" id="p40693code53"><pre class="bash" style="font-family:monospace;"><span style="color: #666666; font-style: italic;"># 1. Generate a key pair</span>
<span style="color: #c20cb9; font-weight: bold;">ssh-keygen</span> <span style="color: #660033;">-t</span> rsa <span style="color: #660033;">-P</span> <span style="color: #ff0000;">&quot;&quot;</span> <span style="color: #660033;">-f</span> ~<span style="color: #000000; font-weight: bold;">/</span>.ssh<span style="color: #000000; font-weight: bold;">/</span>id_rsa
&nbsp;
<span style="color: #666666; font-style: italic;"># 2. Copy this machine's public key to the target machine</span>
<span style="color: #c20cb9; font-weight: bold;">scp</span> ~<span style="color: #000000; font-weight: bold;">/</span>.ssh<span style="color: #000000; font-weight: bold;">/</span>id_rsa.pub username<span style="color: #000000; font-weight: bold;">@</span>remote-hostname:~<span style="color: #000000; font-weight: bold;">/</span>temp</pre></td></tr></table></div>

<p>    After that, log in to the target machine and run:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p40693code54'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4069354"><td class="code" id="p40693code54"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">mkdir</span> <span style="color: #660033;">-p</span> ~<span style="color: #000000; font-weight: bold;">/</span>.ssh
<span style="color: #666666; font-style: italic;"># Set the permissions (required)</span>
<span style="color: #c20cb9; font-weight: bold;">chmod</span> <span style="color: #000000;">700</span> ~<span style="color: #000000; font-weight: bold;">/</span>.ssh
<span style="color: #c20cb9; font-weight: bold;">cat</span> ~<span style="color: #000000; font-weight: bold;">/</span>temp <span style="color: #000000; font-weight: bold;">&gt;&gt;</span> ~<span style="color: #000000; font-weight: bold;">/</span>.ssh<span style="color: #000000; font-weight: bold;">/</span>authorized_keys
<span style="color: #666666; font-style: italic;"># Set the permissions (required)</span>
<span style="color: #c20cb9; font-weight: bold;">chmod</span> <span style="color: #000000;">600</span> ~<span style="color: #000000; font-weight: bold;">/</span>.ssh<span style="color: #000000; font-weight: bold;">/</span>authorized_keys
<span style="color: #c20cb9; font-weight: bold;">rm</span> <span style="color: #660033;">-f</span> ~<span style="color: #000000; font-weight: bold;">/</span>temp</pre></td></tr></table></div>

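<p>    Note: the chmod commands above are required. Some ssh configurations insist that the .ssh directory and authorized_keys be readable and writable only by their owner; if the permissions are wrong, passwordless login will still fail. This is a security measure.<br />
    After these steps you can log in to the target machine without typing a password.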
  </li>
<li><span style="color: #0000ff;"><a href="http://ahei.info/t/ssh-copy-id" class="st_tag internal_tag" rel="tag" title="Posts tagged ssh-copy-id">ssh-copy-id</a></span><br />
    The method above does not take many commands, but think about making one machine log in to 100 machines without a password: that is still tedious. Wouldn't it be nice to wrap those commands in a script? Good idea, but you do not have to write it, because the ssh-copy-id command already does this. Take a look at /usr/bin/ssh-copy-id: it is just a shell script that appends your public key to the authorized_keys file on the target host and fixes the permissions. It does not generate the key pair for you; you still have to do that yourself. Usage:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p40693code55'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4069355"><td class="code" id="p40693code55"><pre class="bash" style="font-family:monospace;">ssh-copy-id <span style="color: #7a0874; font-weight: bold;">&#91;</span>username<span style="color: #000000; font-weight: bold;">@</span><span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #c20cb9; font-weight: bold;">hostname</span></pre></td></tr></table></div>

<p>    Simple, isn't it?</li>
</ol>
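<p>Method 3 above is relatively safe because the password never has to be written to a file in plain text; method 4 is essentially the same as method 3.</p>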
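<p>For the 100-machine case mentioned under ssh-copy-id, a loop over a host file is one way to go. This is a minimal sketch: the function name copy_id_commands and the file format (one "user host" pair per line) are assumptions for illustration, and the function only prints the commands rather than running them.</p>

```shell
# Hypothetical helper: print one ssh-copy-id command per host listed in a file.
# Assumed file format: one "user host" pair per line; blank lines and lines
# starting with '#' are skipped.
copy_id_commands() {
    hosts_file="$1"
    # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
    while read -r user host || [ -n "$user" ]; do
        case "$user" in
            ''|'#'*) continue ;;
        esac
        printf 'ssh-copy-id %s@%s\n' "$user" "$host"
    done < "$hosts_file"
}
```

<p>Reviewing the printed commands first and then piping them to sh keeps the step easy to audit; each ssh-copy-id call still asks for that host's password once.</p>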
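<p>Now let's consider another problem.</p>
<p>Suppose you have set up key-based passwordless login as described above. Anyone who gets hold of your private key can then log in to every machine that trusts it. You can of course protect the private key with a passphrase, which ssh-keygen accepts through its &#8220;-P&#8221; option (&#8220;-N&#8221; is the option its man page documents for a new passphrase). But that raises another problem: every time you log in to a machine that trusts your key, you have to type the passphrase. keychain solves this nicely. It is a front end to ssh-agent that caches the unlocked key in the agent, so you enter the passphrase only the first time you use the key and are not asked again afterwards, which gives you both security and convenience.</p>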
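<p>A typical way to hook keychain into a login shell looks roughly like the following. This is a sketch under assumptions: keychain is installed, the key is ~/.ssh/id_rsa, and the function name load_keychain is made up for illustration.</p>

```shell
# Sketch for ~/.bashrc. keychain starts or reuses an ssh-agent, asks for the
# passphrase if the key is not cached yet, and prints shell code that exports
# the agent's environment variables; eval applies that code to this shell.
load_keychain() {
    key="${1:-id_rsa}"
    if command -v keychain >/dev/null 2>&1; then
        eval "$(keychain --eval --quiet "$key")"
    else
        echo "keychain not found; skipping" >&2
        return 1
    fi
}
```

<p>Calling load_keychain at the end of ~/.bashrc means the passphrase prompt appears in the first shell only; later shells pick up the running agent.</p>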
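<p>With the passwordless login methods covered, it is now easy to make every machine in a cluster trust every other machine, fully automatically.<br />
The approach: generate a key pair on one machine in the cluster and add that machine's own public key to its own authorized_keys file, so the machine can log in to itself without a password. Then use expect to copy that machine's key and authorized_keys file to every other machine. All machines now share the same key and the same authorized_keys file, and since the first machine can log in to itself without a password, any machine can log in to any other (a bit convoluted, isn't it? :)). Because this gives every machine identical keys, it does not work if some machines already have passwordless login configured and their existing keys must not be overwritten; in that case you have to run ssh-copy-id between each pair of machines.<br />
The implementation is as follows.<br />
On one machine in the cluster, run:</p>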

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p40693code56'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4069356"><td class="code" id="p40693code56"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">ssh-keygen</span> <span style="color: #660033;">-t</span> rsa <span style="color: #660033;">-P</span> <span style="color: #ff0000;">&quot;&quot;</span> <span style="color: #660033;">-f</span> ~<span style="color: #000000; font-weight: bold;">/</span>.ssh<span style="color: #000000; font-weight: bold;">/</span>id_rsa
<span style="color: #666666; font-style: italic;"># Required: restrict permissions</span>
<span style="color: #c20cb9; font-weight: bold;">chmod</span> <span style="color: #000000;">700</span> ~<span style="color: #000000; font-weight: bold;">/</span>.ssh
<span style="color: #c20cb9; font-weight: bold;">cat</span> ~<span style="color: #000000; font-weight: bold;">/</span>.ssh<span style="color: #000000; font-weight: bold;">/</span>id_rsa.pub <span style="color: #000000; font-weight: bold;">&gt;&gt;</span> ~<span style="color: #000000; font-weight: bold;">/</span>.ssh<span style="color: #000000; font-weight: bold;">/</span>authorized_keys
<span style="color: #666666; font-style: italic;"># Required: restrict permissions</span>
<span style="color: #c20cb9; font-weight: bold;">chmod</span> <span style="color: #000000;">600</span> ~<span style="color: #000000; font-weight: bold;">/</span>.ssh<span style="color: #000000; font-weight: bold;">/</span>authorized_keys</pre></td></tr></table></div>
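<p>Since wrong permissions are the usual reason key login silently falls back to a password, a quick check can save some head-scratching. The sketch below is an assumption-laden helper, not part of the original setup: the name check_ssh_perms is made up, and it relies on GNU coreutils stat (the -c option).</p>

```shell
# Hypothetical check: verify that the .ssh directory and authorized_keys carry
# the modes set above (700 and 600). Prints each offender and returns 1 if any.
# Assumes GNU stat; BSD and macOS stat use a different option syntax.
check_ssh_perms() {
    ssh_dir="${1:-$HOME/.ssh}"
    status=0
    [ "$(stat -c %a "$ssh_dir")" = "700" ] || { echo "bad mode: $ssh_dir"; status=1; }
    [ "$(stat -c %a "$ssh_dir/authorized_keys")" = "600" ] || { echo "bad mode: $ssh_dir/authorized_keys"; status=1; }
    return "$status"
}
```

<p>Run without arguments it inspects ~/.ssh; passing a directory makes it usable on a copy or in a test.</p>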

<p>Then copy this machine's private key and authorized_keys file to the other machines:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://ahei.info/wp-content/plugins/wp-codebox/wp-codebox.php?p=40693&amp;download=scp-auth.exp">scp-auth.exp</a></span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4069357"><td class="code" id="p40693code57"><pre class="tcl" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/expect</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># Copy the local private key and authorized_keys to $user@$host,</span>
<span style="color: #808080; font-style: italic;"># answering each password prompt with $password.</span>
<span style="color: #ff7700;font-weight:bold;">proc</span> scp <span style="color: #483d8b;">{user password host}</span> <span style="color: black;">&#123;</span>
    <span style="color: #ff7700;font-weight:bold;">global</span> env
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">set</span> home <span style="color: #008000;"><span style="color: #ff3333;">$env</span></span><span style="color: black;">&#40;</span>HOME<span style="color: black;">&#41;</span>
&nbsp;
    spawn ssh -o StrictHostKeyChecking=no <span style="color: #ff3333;">$user</span>@<span style="color: #ff3333;">$host</span> mkdir -p ~/.ssh
    expect <span style="color: #483d8b;">&quot;*password:&quot;</span> <span style="color: #483d8b;">{send &quot;$password\r&quot;}</span>
    <span style="color: #808080; font-style: italic;"># Wait for this command to finish before starting the next one</span>
    expect eof
&nbsp;
    spawn scp -r -o StrictHostKeyChecking=no <span style="color: #ff3333;">$home</span>/.ssh/id_rsa <span style="color: #ff3333;">$user</span>@<span style="color: #ff3333;">$host</span>:~/.ssh
    expect <span style="color: #483d8b;">&quot;*password:&quot;</span> <span style="color: #483d8b;">{send &quot;$password\r&quot;}</span>
    expect eof
&nbsp;
    spawn scp -r -o StrictHostKeyChecking=no <span style="color: #ff3333;">$home</span>/.ssh/authorized_keys <span style="color: #ff3333;">$user</span>@<span style="color: #ff3333;">$host</span>:~/.ssh
    expect <span style="color: #483d8b;">&quot;*password:&quot;</span> <span style="color: #483d8b;">{send &quot;$password\r&quot;}</span>
    expect eof
&nbsp;
    wait
<span style="color: black;">&#125;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># Arguments: user password host</span>
<span style="color: #ff7700;font-weight:bold;">set</span> user <span style="color: black;">&#91;</span><span style="color: #008000;">lindex</span> <span style="color: #008000;"><span style="color: #ff3333;">$argv</span></span> <span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
<span style="color: #ff7700;font-weight:bold;">set</span> password <span style="color: black;">&#91;</span><span style="color: #008000;">lindex</span> <span style="color: #008000;"><span style="color: #ff3333;">$argv</span></span> <span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
<span style="color: #ff7700;font-weight:bold;">set</span> host <span style="color: black;">&#91;</span><span style="color: #008000;">lindex</span> <span style="color: #008000;"><span style="color: #ff3333;">$argv</span></span> <span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span>
&nbsp;
scp <span style="color: #ff3333;">$user</span> <span style="color: #ff3333;">$password</span> <span style="color: #ff3333;">$host</span></pre></td></tr></table></div>

<p>Save the script above as scp-auth.exp, then run:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p40693code58'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4069358"><td class="code" id="p40693code58"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">chmod</span> +x scp-auth.exp
<span style="color: #c20cb9; font-weight: bold;">cat</span> hostlist <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">xargs</span> <span style="color: #660033;">-L</span> <span style="color: #000000;">1</span> .<span style="color: #000000; font-weight: bold;">/</span>scp-auth.exp</pre></td></tr></table></div>

<p>Here hostlist is the list of all machines to copy the files to, one record per line, in the format:<br />
&lt;username&gt; &lt;password&gt; &lt;ip&gt;<br />
Now you can move freely around the cluster without typing a single password!</p>
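<p>Because xargs hands each hostlist line to the script as three separate arguments, a malformed line would shift the arguments silently. A small pre-flight check is sketched below; the function name validate_hostlist is hypothetical.</p>

```shell
# Hypothetical pre-flight check: every non-empty line of the host list must
# have exactly three whitespace-separated fields (username, password, ip).
# Prints the offending lines and returns non-zero if any are found.
validate_hostlist() {
    awk 'NF == 0 { next }
         NF != 3 { printf "line %d: expected 3 fields, got %d\n", NR, NF; bad = 1 }
         END { exit bad }' "$1"
}
```

<p>Running validate_hostlist hostlist before the xargs command catches missing fields early. A password containing whitespace would also be reported, since this line format cannot represent it.</p>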
<p>expect is handy, isn't it? Its main use is to simulate user input for programs that require interaction, such as passwd, ssh, fsck and ftp.</p>
]]></content:encoded>
			<wfw:commentRss>http://ahei.info/ssh-copy-id.htm/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>My WordPress plugins</title>
		<link>http://ahei.info/wordpress-plugins.htm</link>
		<comments>http://ahei.info/wordpress-plugins.htm#comments</comments>
		<pubDate>Thu, 21 Jan 2010 06:00:49 +0000</pubDate>
		<dc:creator>ahei</dc:creator>
				<category><![CDATA[初级]]></category>
		<category><![CDATA[技术杂记]]></category>
		<category><![CDATA[ahei]]></category>
		<category><![CDATA[akismet]]></category>
		<category><![CDATA[color]]></category>
		<category><![CDATA[DEA]]></category>
		<category><![CDATA[Emacs]]></category>
		<category><![CDATA[emacser]]></category>
		<category><![CDATA[emacser.com]]></category>
		<category><![CDATA[favicon]]></category>
		<category><![CDATA[Gallery]]></category>
		<category><![CDATA[Google Analytics]]></category>
		<category><![CDATA[highlight]]></category>
		<category><![CDATA[ide]]></category>
		<category><![CDATA[lightbox]]></category>
		<category><![CDATA[lisp]]></category>
		<category><![CDATA[Permalink]]></category>
		<category><![CDATA[plugin]]></category>
		<category><![CDATA[Redirection]]></category>
		<category><![CDATA[se]]></category>
		<category><![CDATA[seo]]></category>
		<category><![CDATA[snippet]]></category>
		<category><![CDATA[top]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[vi]]></category>
		<category><![CDATA[wordpress]]></category>
		<category><![CDATA[wp-syntax]]></category>
		<category><![CDATA[yasnippet]]></category>
		<category><![CDATA[安装]]></category>
		<category><![CDATA[插件]]></category>
		<category><![CDATA[搜索引擎]]></category>
		<category><![CDATA[模板]]></category>
		<category><![CDATA[配置]]></category>

		<guid isPermaLink="false">http://emacser.com/?p=40617</guid>
		<description><![CDATA[I recently set up my blog on my own WordPress install. When I blogged on yo2, WordPress was already installed and the plugins were chosen for us; users could not upload their own, which saved trouble but was also limiting. This time I had no list of the plugins yo2 used, so I had to track them down one by one from memory, and along the way I found some good ones. I am recording them here for my own reference and for anyone who has not set up WordPress before. [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright" title="WordPress" src="screenshots/wordpress.jpg" width="100" height="100"/></p>
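<p>I recently <a href="blog-reborn.htm" target="_blank">set up my blog</a> on my own WordPress install. When I blogged on yo2, WordPress was already installed and the plugins were chosen by them; users could not upload their own plugins, which saved a lot of trouble but was also inconvenient. This time I had no list of the plugins yo2 used, so I had to track them down one by one from memory. Along the way I found some good ones, which I record here for my own reference and for anyone who has not set up WordPress before. (Note: the plugins are listed roughly in order of importance and power, and all of them can be downloaded by name from the WordPress <a href="http://wordpress.org/extend/plugins/" target="_blank">plugin directory</a>.)<span id="more-40617"></span></p>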
<ul>
<li><span style="color: #0000ff;">Google XML Sitemaps</span><br />
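    Generates an XML sitemap of your site and submits it to search engines such as Google, Yahoo, Ask and Bing. It can also be configured to regenerate and resubmit the sitemap whenever you publish a post, so search engines learn about updates quickly. With this plugin, a new post sometimes shows up in Google within ten or twenty minutes. Very practical.
  </li>
<li><span style="color: #0000ff;">Baidu Sitemap Generator</span><br />
    Similar to Google XML Sitemaps: it also generates an XML sitemap, this one for Baidu's crawler. It exists because Baidu's sitemap format differs from that of other search engines. It cannot notify Baidu when you publish a post.
  </li>
<li><span style="color: #0000ff;">jadedcoder Sticky Permalinks</span><br />
    An impressive plugin: however you change any link on your site, the old link redirects to the new one, so you lose no traffic from changing URLs.
  </li>
<li><span style="color: #0000ff;">Redirection</span><br />
    Redirects one page to another according to patterns you configure. Since almost all links on my site end in .htm, I configured it to redirect every URL ending in .html to the corresponding .htm page. It does not seem to support Chinese URLs.
  </li>
<li><span style="color: #0000ff;">Custom Permalinks</span><br />
    Lets you set an arbitrary custom URL for any category, tag, page or post without conflicting with your permalink settings. Very handy. After installing it, the edit screens for categories, tags, pages and posts gain a &#8220;Custom Permalink&#8221; field where you can enter any URL you like. You can make all your URLs end in &#8220;.htm&#8221; or &#8220;.html&#8221; so that search engines index your blog better.
  </li>
<li><span style="color: #0000ff;">All in One SEO Pack</span><br />
    Does SEO for your site. Works well.
  </li>
<li><span style="color: #0000ff;">Lightbox Gallery</span><br />
    Gives the images on your site the jQuery lightbox effect; there is a demo <a href="emacs.htm" target="_blank">here</a>. Once installed, galleries you add to a post get the lightbox effect automatically. By default the pop-up shows the thumbnail; change the shortcode to</p>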

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p40617code62'); return false;">View Code</a> HTML4STRICT</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4061762"><td class="code" id="p40617code62"><pre class="html4strict" style="font-family:monospace;">[ gallery lightboxsize=&quot;full&quot; ]</pre></td></tr></table></div>

<p>    and the pop-up shows the original image.<br />
    A single image added to a post has no lightbox effect by default; adding rel=&quot;lightbox&quot; to the a tag enables it, like this:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p40617code63'); return false;">View Code</a> HTML4STRICT</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4061763"><td class="code" id="p40617code63"><pre class="html4strict" style="font-family:monospace;"><span style="color: #009900;">&lt;<a href="http://december.com/html/4/element/a.html"><span style="color: #000000; font-weight: bold;">a</span></a> <span style="color: #000066;">href</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;image.jpg&quot;</span> <span style="color: #000066;">rel</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;lightbox&quot;</span> <span style="color: #000066;">title</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;this is a caption&quot;</span>&gt;</span>
  <span style="color: #009900;">&lt;<a href="http://december.com/html/4/element/img.html"><span style="color: #000000; font-weight: bold;">img</span></a> <span style="color: #000066;">src</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;thumbnail.jpg&quot;</span> <span style="color: #000066;">alt</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;&quot;</span> <span style="color: #66cc66;">/</span>&gt;</span>
<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><a href="http://december.com/html/4/element/a.html"><span style="color: #000000; font-weight: bold;">a</span></a>&gt;</span></pre></td></tr></table></div>

</li>
<li><span style="color: #0000ff;">Faster Image Insert</span><br />
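    By default WordPress lets you insert only one image at a time into a post, which is tedious. This plugin lets you insert several at once.
  </li>
<li><span style="color: #0000ff;">Google Analytics for <a href="http://ahei.info/t/wordpress" class="st_tag internal_tag" rel="tag" title="Posts tagged wordpress">WordPress</a></span><br />
    Just enter your Google Analytics account ID and the plugin adds the Google Analytics code to every page automatically.
  </li>
<li><span style="color: #0000ff;">WordPress Thread Comment</span><br />
    Displays comments as a threaded tree and lets you moderate comments from the front end. <a href="auto-complete_yasnippet.htm#comment-72" target="_blank">Here</a> is what it looks like.
  </li>
<li><span style="color: #0000ff;">WP-CodeBox</span><br />
    A very powerful code highlighting plugin based on GeSHi. It <a href="http://wordpress.org/extend/plugins/wp-syntax/other_notes/" target="_blank">supports a great many languages</a>: besides the usual ones, also apt_sources, autoit, bash, cmake, diff, email, lisp and many more. <a href="dea.htm" target="_blank">Here</a> is what it looks like.
  </li>
<li><span style="color: #0000ff;"><a href="http://ahei.info/t/wp-syntax" class="st_tag internal_tag" rel="tag" title="Posts tagged wp-syntax">WP-Syntax</a></span><br />
    Another code highlighting plugin based on GeSHi.
  </li>
<li><span style="color: #0000ff;">WP-RecentComments</span><br />
    After installing it you can add its widget under &#8220;Widgets&#8221;; it shows the latest comments via AJAX, including the smilies in them. The &#8220;Recent Comments&#8221; box in my sidebar is generated by this plugin.
  </li>
<li><span style="color: #0000ff;">Twitter Tools</span><br />
    A Twitter plugin, obviously. You can add a widget under &#8220;Widgets&#8221; that shows your latest tweets; the &#8220;Twitter&#8221; box in my sidebar is generated by it. It also posts a tweet from your account automatically whenever you publish a post.
  </li>
<li><span style="color: #0000ff;">Dagon Design Sitemap Generator</span><br />
    Generates a site map page for your blog; my <a href="map" target="_blank">site map</a> is generated by this plugin.
  </li>
<li><span style="color: #0000ff;">KB robots.txt</span><br />
    Lets you edit robots.txt from the WordPress admin. After installing it you will find that your robots.txt no longer shows its original content. Don't worry: the plugin intercepts requests for http://yourblog/robots.txt and serves its own robots.txt instead. Disable the plugin and your original robots.txt is visible again.
  </li>
<li><span style="color: #0000ff;">My Link Order</span><br />
    Sorts the links on your blog. Quite practical. To use it, drag &#8220;My Link Order&#8221; into the sidebar under &#8220;Widgets&#8221;. When I set up the blog this time I left the old &#8220;Links&#8221; widget in place instead, so the order I defined never took effect, and I suspected the plugin was broken before I finally worked out what was going on.
  </li>
<li><span style="color: #0000ff;">My Page Order</span><br />
    Sorts the pages on your blog. Also quite practical.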
  </li>
<li><span style="color: #0000ff;">Category Order</span><br />
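    Sorts the categories on your blog.
  </li>
<li><span style="color: #0000ff;">Top Level Categories</span><br />
    WordPress category URLs are /category/catname by default; this plugin removes the category prefix so they become /catname.
  </li>
<li><span style="color: #0000ff;">Autolink URI</span><br />
    Automatically turns URLs in your posts into links.
  </li>
<li><span style="color: #0000ff;">Most Commented Widget</span><br />
    After installing it you can add its widget under &#8220;Widgets&#8221; to show the most commented posts. The &#8220;Most Commented&#8221; box in my sidebar is generated by this plugin.
  </li>
<li><span style="color: #0000ff;">NextGEN Gallery</span><br />
    A gallery manager that can import images in bulk. Bulk-importing images with stock WordPress is painful, so I usually have this plugin scan an image directory on the server and add the images to a gallery; it also generates thumbnails for them, and I then write the image HTML by hand. Before scanning, configure it not to compress the original images.
  </li>
<li><span style="color: #0000ff;">Simple Tags</span><br />
    A tag manager. It can generate the tags for a post automatically from your keyword list; the tags on my posts are generated this way.
  </li>
<li><span style="color: #0000ff;">Permalink Finder</span><br />
    Gets visitors to the right page even when they mistype a URL. For example, I have a page at blog-reborn.htm, and a visitor who enters blog-reborna.htm still reaches it.
  </li>
<li><span style="color: #0000ff;"><a href="http://ahei.info/t/akismet" class="st_tag internal_tag" rel="tag" title="Posts tagged akismet">akismet</a></span><br />
    Blocks spam comments, and learns from the spam you already have to recognize spam better. My blog was flooded with spam comments for a while; after I added this plugin things improved a lot, since it flags them as spam automatically.
  </li>
<li><span style="color: #0000ff;">WP-UserOnline</span><br />
    Shows the users and crawlers currently online on your blog.
  </li>
<li><span style="color: #0000ff;">Custom Smilies</span><br />
    Adds smilies to comments; the smilies in my blog's comments come from this plugin.
  </li>
<li><span style="color: #0000ff;">Shockingly Simple Favicon</span><br />
    Add the following code inside the head tag of your site's home page:</p>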

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p40617code64'); return false;">View Code</a> HTML4STRICT</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4061764"><td class="code" id="p40617code64"><pre class="html4strict" style="font-family:monospace;"><span style="color: #009900;">&lt;<a href="http://december.com/html/4/element/link.html"><span style="color: #000000; font-weight: bold;">link</span></a> <span style="color: #000066;">rel</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;shortcut icon&quot;</span> <span style="color: #000066;">href</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;PATH_TO_ICON/favicon.ico&quot;</span><span style="color: #66cc66;">/</span>&gt;</span></pre></td></tr></table></div>

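<p>    Your icon then appears in the browser's address bar.<br />
    Shockingly Simple Favicon is very simple: it just adds this code for you. My blog's address-bar icon is generated by this plugin.
  </li>
<li><span style="color: #0000ff;">WordPress Database Backup</span><br />
    If you have no access to the database administration back end, you can back up your database with this plugin. It also supports scheduled backups and can email the backup file to you.
  </li>
<li><span style="color: #0000ff;">WP-PageNavi</span><br />
    By default WordPress shows only previous-page and next-page links; this plugin shows links to multiple pages. The pagination at the bottom of my blog is generated by it.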
  </li>
</ul>
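<p>The .html to .htm rule described under Redirection boils down to a one-line pattern substitution. The sketch below only illustrates the mapping with sed; the plugin itself is configured in the WordPress admin, and the function name html_to_htm is made up.</p>

```shell
# Illustration only: map a URL path ending in .html to the same path ending
# in .htm; paths that do not end in .html pass through unchanged.
html_to_htm() {
    printf '%s\n' "$1" | sed 's/\.html$/.htm/'
}

html_to_htm /blog-reborn.html   # prints /blog-reborn.htm
html_to_htm /emacs.htm          # prints /emacs.htm
```

<p>Anchoring the pattern at the end of the path keeps a URL that merely contains ".html" somewhere in the middle from being rewritten.</p>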
]]></content:encoded>
			<wfw:commentRss>http://ahei.info/wordpress-plugins.htm/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>My blog is reborn</title>
		<link>http://ahei.info/blog-reborn.htm</link>
		<comments>http://ahei.info/blog-reborn.htm#comments</comments>
		<pubDate>Sun, 17 Jan 2010 14:07:50 +0000</pubDate>
		<dc:creator>ahei</dc:creator>
				<category><![CDATA[我的生活]]></category>
		<category><![CDATA[A record]]></category>
		<category><![CDATA[ahei]]></category>
		<category><![CDATA[bluehost]]></category>
		<category><![CDATA[byethost]]></category>
		<category><![CDATA[control]]></category>
		<category><![CDATA[Emacs]]></category>
		<category><![CDATA[emacser]]></category>
		<category><![CDATA[emacser.com]]></category>
		<category><![CDATA[godaddy]]></category>
		<category><![CDATA[se]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[vi]]></category>
		<category><![CDATA[wordpress]]></category>
		<category><![CDATA[xtreemhost]]></category>
		<category><![CDATA[安装]]></category>
		<category><![CDATA[浏览器]]></category>

		<guid isPermaLink="false">http://emacser.com/?p=40508</guid>
		<description><![CDATA[When the great &#8220;greatest common divisor&#8221; cleaned up the Internet and shut down every blog on yo2, my blog http://ahei.yo2.cn became unreachable. I then tried godaddy's free hosting, but maddeningly, after installing WordPress the admin pages rendered correctly only in IE; apparently the CSS failed to load. So I looked at other free hosts, byethost and xtreemhost, and was disappointed: after importing my posts, both showed a virus warning. [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright" title="hugege" src="screenshots/hugege.jpg" width="100" height="100"/></p>
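<p>When the great &#8220;greatest common divisor&#8221; cleaned up the Internet and shut down every blog on yo2, my blog <a href="http://ahei.yo2.cn">http://ahei.yo2.cn</a> became unreachable. I then tried godaddy's free hosting. Maddeningly, after installing WordPress the admin pages rendered correctly only in IE and not in any other browser; the CSS apparently failed to load (a user on the Shuimu BBS later told me the <a href="http://fivebig.com/blog/2010/01/wordpress-on-godaddy/" target="_blank">fix</a>). With no other choice I looked for more free hosts, <a href="http://ahei.info/t/byethost" class="st_tag internal_tag" rel="tag" title="Posts tagged byethost">byethost</a> and <a href="http://ahei.info/t/xtreemhost" class="st_tag internal_tag" rel="tag" title="Posts tagged xtreemhost">xtreemhost</a>, and was disappointed: after I imported my posts, both showed the following error:<span id="more-40508"></span></p>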

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p40508code66'); return false;">View Code</a> TEXT</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p4050866"><td class="code" id="p40508code66"><pre class="text" style="font-family:monospace;">This webpage appears to be infected with a virus.
&nbsp;
&nbsp;
If you are the webmaster of this site you should log into your account and check / remove any hidden iframes from the page. Once this is completed the page will display.
&nbsp;
You should also change all passwords for your hosting account control panel, then scan your PC with a recent antivirus product / spyware checker, then update all php scripts on your hosting account to the most recent versions. Then you should view the following documents regarding security
&nbsp;
http://www.google.com/search?q=mysql+injection
&nbsp;
http://en.wikipedia.org/wiki/Cross-site_scripting
&nbsp;
http://www.google.com/search?q=php+script+vulnerabilities
&nbsp;
http://en.wikipedia.org/wiki/Remote_File_Inclusion
&nbsp;
http://en.wikipedia.org/wiki/SQL_injection</pre></td></tr></table></div>

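<p>Nothing for it; I would have to buy hosting myself.</p>
<p>I looked at some overseas hosts; dreamhost and bluehost are both rather expensive and not a good fit for someone who just blogs for fun. In the end, on a colleague's recommendation, I bought hosting from <a href="http://hugege.com/" target="_blank">胡戈戈 (hugege)</a>. The server is overseas, the speed is good and the price is low: 600 MB of disk space, 6 GB of traffic per month and up to 3 top-level domains for 100 yuan a year. The blog was up quickly with hardly any problems. The cPanel admin back end is very convenient, and so is binding a domain: with godaddy you only need to point your A record at the IP of the server your hosting is on.</p>
<p>At last, my blog is reborn. If you subscribed to the old blog <a href="http://ahei.yo2.cn">http://ahei.yo2.cn</a>, please resubscribe at <a href="http://emacser.com">http://emacser.com</a>, if you are willing.</p>
<p>To everyone who followed my blog while it was unreachable: my apologies, and my thanks. Your support is what motivates me to keep sharing the joy of learning with you. </p>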
]]></content:encoded>
			<wfw:commentRss>http://ahei.info/blog-reborn.htm/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Finally registered my own domain</title>
		<link>http://ahei.info/emacser.htm</link>
		<comments>http://ahei.info/emacser.htm#comments</comments>
		<pubDate>Sat, 19 Dec 2009 11:10:07 +0000</pubDate>
		<dc:creator>ahei</dc:creator>
				<category><![CDATA[我的生活]]></category>
		<category><![CDATA[ahei]]></category>
		<category><![CDATA[Emacs]]></category>
		<category><![CDATA[emacser]]></category>
		<category><![CDATA[emacser.com]]></category>
		<category><![CDATA[godaddy]]></category>
		<category><![CDATA[paypal]]></category>
		<category><![CDATA[se]]></category>
		<category><![CDATA[域名转向]]></category>

		<guid isPermaLink="false">http://ahei.yo2.cn/?p=40499</guid>
		<description><![CDATA[A while ago a friend suggested I register my own domain and buy hosting, since that would be more stable. Yesterday a colleague told me that godaddy was running a pre-Christmas promotion: enter the coupon code &#8220;BUYCOM99&#8221; when registering a .com domain and pay only 0.99 US dollars. So I decided to register one. emacs.com was taken, but emacser.com was available, so I registered it right away. [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright" title="godaddy" src="screenshots/godaddy.gif" width="160" height="70"/></p>
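<p>A while ago a friend suggested I register my own domain and buy hosting, since that would be more stable. Yesterday a colleague told me that <a href="https://www.godaddy.com/" target="_blank">godaddy</a> was running a pre-Christmas promotion: enter the coupon code &#8220;BUYCOM99&#8221; when registering a .com domain and you get it for 0.99 US dollars. So I decided to register one too. I tried emacs.com first: taken. Then I tried emacser.com: available. Great, register it right away!<span id="more-40499"></span></p>
<p>When it came to choosing a payment method, I was wary because I had read on 月光博客 (Moonlight Blog) that <a href="http://www.williamlong.info/blog/archives/99.html" target="_blank">godaddy may charge your credit card without your consent</a>, so I planned to pay with paypal. I immediately signed up for a paypal account, and no sooner had I finished than a text message told me that my credit card ending in **** had been charged 1 US dollar. Damn. A colleague had told me that paypal charges a small amount to verify that the card is valid, but it still annoyed me. Never mind; back to registering the domain. At the payment step I searched for a long time and found no way to choose paypal. I googled it, to no avail. I had heard that godaddy now supports Alipay as well, and googled that for a long time, again to no avail. How frustrating: was the paypal account I had just opened useless, and the dollar wasted? Fine, I gritted my teeth and paid by credit card. With the domain in hand I set up domain forwarding to my blog. Forwarding is said to take an hour or two, so I waited and waited. An afternoon went by; I could reach my domain through my company's overseas servers, but still not from inside China. Frustrating.</p>
<p>Back home that evening I looked into godaddy domain forwarding. It turns out godaddy's forwarding is blocked by the &#8220;coffin shop&#8221;. Damn the coffin shop! But the coffin shop will never beat the masses who &#8220;have sharp eyes yet do not know the truth&#8221;: I found an article on <a href="http://bleakhand.yo2.cn/articles/godaddy-yu-ming-ding-xiang-di-fang-fa.html" target="_blank">godaddy domain forwarding</a> and finally got it working. You can now reach my blog at <a href="http://emacser.com" target="_blank">http://emacser.com</a>. </p>
<p>Finally, I was not happy that paypal insisted on taking a dollar from me, so I googled it; it turns out <a href="http://www.google.cn/search?hl=zh-CN&#038;newwindow=1&#038;q=paypal+1.95&#038;btnG=Google+%E6%90%9C%E7%B4%A2&#038;aq=f&#038;oq=" target="_blank">this is why</a>. I still don't like it. When you link a bank card to Alipay, it deposits a few cents into your account; paypal instead charges us 2.95 US dollars. Damn. Still, when I phoned paypal support today the agent was very pleasant, if a bit robotic <img src='http://ahei.info/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> .</p>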
]]></content:encoded>
			<wfw:commentRss>http://ahei.info/emacser.htm/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Using tsocks instead of sockscap to forward network requests</title>
		<link>http://ahei.info/tsocks.htm</link>
		<comments>http://ahei.info/tsocks.htm#comments</comments>
		<pubDate>Mon, 07 Dec 2009 13:19:30 +0000</pubDate>
		<dc:creator>ahei</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[中级]]></category>
		<category><![CDATA[ahei]]></category>
		<category><![CDATA[genproxy]]></category>
		<category><![CDATA[LD_PRELOAD]]></category>
		<category><![CDATA[proxy]]></category>
		<category><![CDATA[se]]></category>
		<category><![CDATA[sockscap]]></category>
		<category><![CDATA[ssh]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[top]]></category>
		<category><![CDATA[tsocks]]></category>
		<category><![CDATA[windows]]></category>
		<category><![CDATA[代理]]></category>
		<category><![CDATA[安装]]></category>
		<category><![CDATA[配置]]></category>
		<category><![CDATA[配置文件]]></category>

		<guid isPermaLink="false">http://ahei.yo2.cn/?p=39137</guid>
		<description><![CDATA[Have you ever been in this situation? Machine A has a very fast Internet connection; machine B is on the same LAN but has limited bandwidth. If B's network requests are sent to A first and forwarded by A, B gets the same fast connection. Proxy software can do this, but here is a simpler way to set up a SOCKS proxy: ssh with its &#8220;-D&#8221; option, combined with tsocks to route programs through it. [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright" title="tsocks" src="screenshots/tsocks-logo.jpg"/></p>
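<p>Have you ever been in this situation? Machine A has a very fast Internet connection; machine B is on the same LAN as A but has limited bandwidth. Since transfers between A and B over the LAN are fast, B could get the same fast connection if its network requests were sent to A first and forwarded by A. So how do you forward the requests? Proxy software can obviously do it, but here I offer a simpler way to set up a SOCKS proxy: ssh.<span id="more-39137"></span></p>
<p>ssh is enormously capable; its man page is very detailed and worth reading. Setting up a proxy with ssh relies mainly on its &#8220;-D&#8221; option, which takes an IP address and port in the form ip:port. The IP is the local address to bind and is optional. With this option, ssh opens a tunnel between the local machine and the target machine and listens on the local port you specify.<br />
I wrote a small function to open an ssh proxy:</p>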

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p39137code77'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p3913777"><td class="code" id="p39137code77"><pre class="bash" style="font-family:monospace;">genproxy <span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
<span style="color: #7a0874; font-weight: bold;">&#123;</span>
    <span style="color: #007800;">ip</span>=<span style="color: #ff0000;">&quot;$1&quot;</span>;
    <span style="color: #007800;">user</span>=<span style="color: #ff0000;">&quot;$2&quot;</span>;
    <span style="color: #c20cb9; font-weight: bold;">ssh</span> <span style="color: #660033;">-o</span> <span style="color: #007800;">StrictHostKeyChecking</span>=no <span style="color: #ff0000;">&quot;<span style="color: #007800;">$ip</span>&quot;</span> <span style="color: #660033;">-l</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$user</span>&quot;</span> <span style="color: #660033;">-D</span> <span style="color: #000000;">8888</span> <span style="color: #660033;">-N</span> <span style="color: #660033;">-f</span>
<span style="color: #7a0874; font-weight: bold;">&#125;</span></pre></td></tr></table></div>

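<p>The function's first argument is the IP and its second is the user name. &#8220;-o StrictHostKeyChecking=no&#8221; tells ssh not to stop and ask when the target host's key is unknown or has changed. Without it you may see a prompt like this:</p>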

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p39137code78'); return false;">View Code</a> TEXT</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p3913778"><td class="code" id="p39137code78"><pre class="text" style="font-family:monospace;">The authenticity of host '172.0.1.251 (172.0.1.251)' can't be established.
RSA key fingerprint is 51:18:fe:5f:de:a7:55:ef:7c:d4:6e:ba:bc:9e:a2:7c.
Are you sure you want to continue connecting (yes/no)?</pre></td></tr></table></div>

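<p>&#8220;-N&#8221; tells ssh not to execute any command after connecting to the target machine. Without it, ssh connects and runs the command you specify, or starts your login shell on the target if you specify none, so &#8220;-N&#8221; is the usual choice when, as here, all you need is a listening port.<br />
&#8220;-f&#8221; sends ssh to the background just before command execution.<br />
If the command succeeds, netstat shows that port 8888 is now listening on the local machine. From then on, any network request sent to port 8888 is forwarded by ssh through the tunnel it has just set up to the target machine, which makes it act as a proxy. So how do you get requests to port 8888? Some programs, such as QQ, let you configure a SOCKS proxy, but many do not. What about those? On Windows there is SocksCap: you add program shortcuts to SocksCap and launch the programs from inside it whenever you want them to use the proxy. Very convenient. Is there something like that on Linux? Of course, and it is even more convenient.</p>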
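<p>The netstat check mentioned above can be scripted. The sketch below is only an illustration: port_listening is a hypothetical helper name, and it assumes the text on its standard input looks like the output of &#8220;netstat -ltn&#8221; (or &#8220;ss -ltn&#8221;), where the fourth column is the local address.</p>

```shell
# port_listening PORT
# Reads netstat-style text on stdin and succeeds if some local address
# (assumed to be column 4) ends in ":PORT".
port_listening() {
    awk -v port="$1" '
        { n = split($4, parts, ":") }              # last piece is the port
        n > 1 && parts[n] == port { found = 1 }
        END { exit !found }                        # status 0 only when found
    '
}
```

<p>For example, &#8220;netstat -ltn | port_listening 8888&#8221; should return success once the genproxy function above has started the tunnel.</p>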
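<p>tsocks is a tool similar to SocksCap that redirects a program's network requests.<br />
It is easy to use. After installing tsocks, open its configuration file /etc/tsocks.conf and scroll to the end, where you will find the server and server_port options. These are the IP address and port of the SOCKS server; fill in both. One thing to watch out for: in the example above, when configuring tsocks on machine B, server must be 127.0.0.1, not machine A's IP, because port 8888 is opened on machine B. I made exactly this mistake when I set it up today. tsocks reported &#8220;socks server is not on a local subnet local&#8221;, which was baffling, and it took me a long time to work out. Don't forget this.<br />
Once it is configured, just put tsocks in front of a command, for example:</p>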

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p39137code79'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p3913779"><td class="code" id="p39137code79"><pre class="bash" style="font-family:monospace;">tsocks <span style="color: #c20cb9; font-weight: bold;">wget</span> http:<span style="color: #000000; font-weight: bold;">//</span>www.g.cn</pre></td></tr></table></div>
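<p>For reference, the two tsocks.conf settings described above could look like the sketch below. It writes them to a scratch file rather than to /etc/tsocks.conf; the scratch path and the server_type line are my own assumptions (ssh -D accepts SOCKS 4 and 5, so either value should do), not something taken from the original setup.</p>

```shell
# Scratch copy only (hypothetical path); the real file is /etc/tsocks.conf.
conf_example="${TMPDIR:-/tmp}/tsocks.conf.example"

cat > "$conf_example" <<'EOF'
# The SOCKS server is the local end of the ssh tunnel, not the remote host.
server = 127.0.0.1
server_port = 8888
# Assumption: SOCKS version 5; version 4 is the tsocks default.
server_type = 5
EOF

echo "wrote $conf_example"
```

<p>After reviewing the scratch file, the same lines can be copied into the end of /etc/tsocks.conf.</p>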

<p>With the tsocks prefix, wget goes through the proxy. The drawback is that you have to put tsocks in front of every command. There is a simpler way:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p39137code80'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p3913780"><td class="code" id="p39137code80"><pre class="bash" style="font-family:monospace;">. tsocks <span style="color: #660033;">-on</span></pre></td></tr></table></div>

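<p>Note the leading dot in the command above. It is required. I left it out when I set this up today and got nowhere for a long time, until I read the tsocks man page carefully. So why is the dot needed?<br />
In a shell, when you run</p>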

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p39137code81'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p3913781"><td class="code" id="p39137code81"><pre class="bash" style="font-family:monospace;">.<span style="color: #000000; font-weight: bold;">/</span>test.sh</pre></td></tr></table></div>

<p>the shell starts a child shell process to run test.sh, and nothing test.sh does to environment variables affects the parent shell, whereas</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p39137code82'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p3913782"><td class="code" id="p39137code82"><pre class="bash" style="font-family:monospace;">. .<span style="color: #000000; font-weight: bold;">/</span>test.sh</pre></td></tr></table></div>

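<p>does not start a separate shell process but runs test.sh in the current shell, so any changes test.sh makes to environment variables take effect in the current shell.<br />
Now that the difference between using the dot and omitting it is clear, let's look at the source of tsocks:</p>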

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p39137code83'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p3913783"><td class="code" id="p39137code83"><pre class="bash" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">case</span> <span style="color: #ff0000;">&quot;$1&quot;</span> <span style="color: #000000; font-weight: bold;">in</span>
	-on<span style="color: #7a0874; font-weight: bold;">&#41;</span>
		<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-z</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$LD_PRELOAD</span>&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>
			<span style="color: #000000; font-weight: bold;">then</span>
				<span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">LD_PRELOAD</span>=<span style="color: #ff0000;">&quot;/usr/lib/libtsocks.so&quot;</span>
			<span style="color: #000000; font-weight: bold;">else</span>
				<span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #007800;">$LD_PRELOAD</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">grep</span> <span style="color: #660033;">-q</span> <span style="color: #ff0000;">&quot;/usr/lib/libtsocks\.so&quot;</span> <span style="color: #000000; font-weight: bold;">||</span> \
				<span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">LD_PRELOAD</span>=<span style="color: #ff0000;">&quot;/usr/lib/libtsocks.so <span style="color: #007800;">$LD_PRELOAD</span>&quot;</span>
		<span style="color: #000000; font-weight: bold;">fi</span>
	<span style="color: #000000; font-weight: bold;">;;</span>
	-off<span style="color: #7a0874; font-weight: bold;">&#41;</span>
		<span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">LD_PRELOAD</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #660033;">-n</span> <span style="color: #007800;">$LD_PRELOAD</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">sed</span> <span style="color: #ff0000;">'s/\/usr\/lib\/libtsocks.so \?//'</span><span style="color: #000000; font-weight: bold;">`</span>
		<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-z</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$LD_PRELOAD</span>&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>
			<span style="color: #000000; font-weight: bold;">then</span>
				<span style="color: #7a0874; font-weight: bold;">unset</span> LD_PRELOAD
		<span style="color: #000000; font-weight: bold;">fi</span>
	<span style="color: #000000; font-weight: bold;">;;</span>
	-show<span style="color: #000000; font-weight: bold;">|</span>-sh<span style="color: #7a0874; font-weight: bold;">&#41;</span>
		<span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;LD_PRELOAD=<span style="color: #000099; font-weight: bold;">\&quot;</span><span style="color: #007800;">$LD_PRELOAD</span><span style="color: #000099; font-weight: bold;">\&quot;</span>&quot;</span>
	<span style="color: #000000; font-weight: bold;">;;</span>
	-h<span style="color: #000000; font-weight: bold;">|</span>-?<span style="color: #7a0874; font-weight: bold;">&#41;</span>
      <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;$0: Please see tsocks(1) or read comment at top of $0&quot;</span>
   <span style="color: #000000; font-weight: bold;">;;</span>
	<span style="color: #000000; font-weight: bold;">*</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
		<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-z</span> <span style="color: #ff0000;">&quot;<span style="color: #007800;">$LD_PRELOAD</span>&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>
		<span style="color: #000000; font-weight: bold;">then</span>
			<span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">LD_PRELOAD</span>=<span style="color: #ff0000;">&quot;/usr/lib/libtsocks.so&quot;</span>
		<span style="color: #000000; font-weight: bold;">else</span>
			<span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #007800;">$LD_PRELOAD</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">grep</span> <span style="color: #660033;">-q</span> <span style="color: #ff0000;">&quot;/usr/lib/libtsocks\.so&quot;</span> <span style="color: #000000; font-weight: bold;">||</span> \
			<span style="color: #7a0874; font-weight: bold;">export</span> <span style="color: #007800;">LD_PRELOAD</span>=<span style="color: #ff0000;">&quot;/usr/lib/libtsocks.so <span style="color: #007800;">$LD_PRELOAD</span>&quot;</span>
		<span style="color: #000000; font-weight: bold;">fi</span>
&nbsp;
		<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #007800;">$#</span> = <span style="color: #000000;">0</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>
		<span style="color: #000000; font-weight: bold;">then</span>
			<span style="color: #800000;">${SHELL:-/bin/sh}</span>
		<span style="color: #000000; font-weight: bold;">fi</span>
&nbsp;
		<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #007800;">$#</span> <span style="color: #660033;">-gt</span> <span style="color: #000000;">0</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>
		<span style="color: #000000; font-weight: bold;">then</span>
			<span style="color: #7a0874; font-weight: bold;">exec</span> <span style="color: #ff0000;">&quot;$@&quot;</span>
		<span style="color: #000000; font-weight: bold;">fi</span>
	<span style="color: #000000; font-weight: bold;">;;</span>
<span style="color: #000000; font-weight: bold;">esac</span></pre></td></tr></table></div>

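<p>As the script shows, tsocks redirects network requests by modifying the LD_PRELOAD environment variable. Why does that work? Put simply, LD_PRELOAD names shared libraries that the dynamic linker loads before all others, so functions in those libraries take precedence over functions of the same name that the target program would otherwise get from its other shared libraries. See <a href="http://blog.csdn.net/haoel/archive/2007/05/09/1602108.aspx" target="_blank">this article</a> for details.<br />
So now you can see why tsocks -on needs the leading dot: without it, the change the tsocks script makes to LD_PRELOAD has no effect on the current shell process.<br />
Add</p>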

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p39137code84'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p3913784"><td class="code" id="p39137code84"><pre class="bash" style="font-family:monospace;">. tsocks <span style="color: #660033;">-on</span></pre></td></tr></table></div>

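<p>to your .bashrc and every new shell session will use the proxy. If you want every program to use the proxy, including ones not started from a shell, note that .bashrc is normally read only by bash itself, so LD_PRELOAD must also be set wherever your login or desktop session gets its environment; log in again (or reboot) afterwards for it to take effect.</p>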
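<p>The effect of the leading dot is easy to check with a throwaway script. This is a minimal sketch, not part of tsocks: DEMO_VAR and the temporary script are hypothetical names used only for the experiment.</p>

```shell
# Create a throwaway script that exports a variable.
demo_script="$(mktemp)"
printf 'export DEMO_VAR=set_by_script\n' > "$demo_script"

unset DEMO_VAR
sh "$demo_script"                   # executed: runs in a child process
after_exec="${DEMO_VAR:-unset}"     # the child's export did not reach us

. "$demo_script"                    # sourced: runs in the current shell
after_source="${DEMO_VAR:-unset}"   # the export is visible here

echo "after executing: $after_exec"
echo "after sourcing:  $after_source"
rm -f "$demo_script"
```

<p>The first line printed reports &#8220;unset&#8221; and the second reports &#8220;set_by_script&#8221;, which is the same reason tsocks -on only changes LD_PRELOAD for the current shell when it is sourced.</p>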

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p39137code85'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p3913785"><td class="code" id="p39137code85"><pre class="bash" style="font-family:monospace;">. tsocks <span style="color: #660033;">-off</span></pre></td></tr></table></div>

<p>turns tsocks off, and</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p39137code86'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p3913786"><td class="code" id="p39137code86"><pre class="bash" style="font-family:monospace;">. tsocks <span style="color: #660033;">-sh</span></pre></td></tr></table></div>

<p>prints the current value of LD_PRELOAD.</p>
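<p>The -on and -off branches of the tsocks script boil down to &#8220;prepend the library unless it is already listed&#8221; and &#8220;remove it again&#8221;. The sketch below restates that logic as two hypothetical helpers, preload_add and preload_remove, which work on a string passed as an argument so that nothing is actually preloaded. It uses case and a loop where the real script uses grep and sed, so treat it as an illustration rather than the script's own code.</p>

```shell
lib="/usr/lib/libtsocks.so"

# preload_add CURRENT: print CURRENT with $lib prepended unless already listed.
preload_add() {
    case " $1 " in
        *" $lib "*) printf '%s\n' "$1" ;;        # already present: unchanged
        "  ")       printf '%s\n' "$lib" ;;      # empty list: just the library
        *)          printf '%s\n' "$lib $1" ;;   # otherwise prepend
    esac
}

# preload_remove CURRENT: print CURRENT with every $lib entry dropped.
preload_remove() {
    out=""
    for entry in $1; do                          # split on whitespace
        [ "$entry" = "$lib" ] || out="${out:+$out }$entry"
    done
    printf '%s\n' "$out"
}
```

<p>In a sourced script the result would then be exported, for example with LD_PRELOAD="$(preload_add "$LD_PRELOAD")" followed by export LD_PRELOAD.</p>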
]]></content:encoded>
			<wfw:commentRss>http://ahei.info/tsocks.htm/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How Nutch Loads Its Configuration Files</title>
		<link>http://ahei.info/nutch-load-conf.htm</link>
		<comments>http://ahei.info/nutch-load-conf.htm#comments</comments>
		<pubDate>Mon, 30 Nov 2009 11:42:50 +0000</pubDate>
		<dc:creator>ahei</dc:creator>
				<category><![CDATA[Nutch]]></category>
		<category><![CDATA[Beginner]]></category>
		<category><![CDATA[Search Engine]]></category>
		<category><![CDATA[ahei]]></category>
		<category><![CDATA[crawl]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[se]]></category>
		<category><![CDATA[top]]></category>
		<category><![CDATA[vi]]></category>
		<category><![CDATA[Crawling]]></category>
		<category><![CDATA[Plugins]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[Configuration Files]]></category>

		<guid isPermaLink="false">http://ahei.yo2.cn/?p=37761</guid>
		<description><![CDATA[Nutch has three main kinds of configuration files: configuration files for Nutch plugins, which each plugin loads itself when it is loaded (mostly for filter and normalizer plugins); Nutch's own configuration files, nutch-default.xml and nutch-site.xml; and Hadoop's configuration files, hadoop-default.xml and hadoop-site.xml. The order in which these files are loaded determines their precedence: settings from a lower-precedence file are overridden by those from a higher-precedence one, so understanding the load order is essential to configuring Nutch properly. Below I walk through the Nutch source to see how it loads its configuration files. The main Nutch command is &#8220;./nutch crawl&#8221;, and the main class behind the crawl command is org/apache/nutch/crawl/Crawl.java, so we start from the main method of Crawl.java. The configuration files are loaded mainly by the following code: [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright" title="Nutch" src="screenshots/nutch-logo.gif"/></p>
<p>Nutch has three main kinds of configuration files:</p>
<ul>
<li>Configuration files for Nutch plugins. Each plugin loads its own files when the plugin is loaded; these are mostly the configuration files of filter and normalizer plugins.</li>
<li>Nutch's own configuration files, nutch-default.xml and nutch-site.xml</li>
<li>Hadoop's configuration files, hadoop-default.xml and hadoop-site.xml</li>
</ul>
<p>The order in which these files are loaded determines their precedence: settings from a lower-precedence file are overridden by those from a higher-precedence one, so understanding the load order is essential to configuring Nutch properly. Below I walk through the Nutch source to see how it loads its configuration files.<span id="more-37761"></span></p>
<p>The main Nutch command is &#8220;./nutch crawl&#8221;, and the main class behind the crawl command is org/apache/nutch/crawl/Crawl.java, so we start from the main method of Crawl.java.</p>
<p>The configuration files are loaded mainly by the following code:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p37761code91'); return false;">View Code</a> JAVA</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p3776191"><td class="code" id="p37761code91"><pre class="java" style="font-family:monospace;">  <span style="color: #666666; font-style: italic;">/* Perform complete crawling and indexing given a set of root urls. */</span>
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> main<span style="color: #009900;">&#40;</span><a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Astring+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span style="color: #003399;">String</span></a> args<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Aexception+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span style="color: #003399;">Exception</span></a> <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>args.<span style="color: #006633;">length</span> <span style="color: #339933;">&lt;</span> <span style="color: #cc66cc;">1</span><span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
      <a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Asystem+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span style="color: #003399;">System</span></a>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Usage: Crawl &lt;urlDir&gt; [-dir d] [-threads n] [-depth i] [-topN N] [-r]&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Asystem+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span style="color: #003399;">System</span></a>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;-r<span style="color: #000099; font-weight: bold;">\t</span>remove css and javascript, default is do not remove&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #000000; font-weight: bold;">return</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    Configuration conf <span style="color: #339933;">=</span> NutchConfiguration.<span style="color: #006633;">create</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    conf.<span style="color: #006633;">addResource</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;crawl-tool.xml&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    JobConf job <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> NutchJob<span style="color: #009900;">&#40;</span>conf<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>

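<p>In the code above, &#8220;Configuration conf = NutchConfiguration.create();&#8221; obtains a Configuration object set up for Nutch. NutchConfiguration is the class that manages Nutch's own configuration files, and Configuration is the class that manages Hadoop's configuration files. Let's step into the create method:</p>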

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p37761code92'); return false;">View Code</a> JAVA</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p3776192"><td class="code" id="p37761code92"><pre class="java" style="font-family:monospace;">  <span style="color: #008000; font-style: italic; font-weight: bold;">/** Create a {@link Configuration} for Nutch. */</span>
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> Configuration create<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    Configuration conf <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Configuration<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    addNutchResources<span style="color: #009900;">&#40;</span>conf<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">return</span> conf<span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>The create method first constructs a Configuration object. The Configuration constructors look like this:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p37761code93'); return false;">View Code</a> JAVA</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p3776193"><td class="code" id="p37761code93"><pre class="java" style="font-family:monospace;">  <span style="color: #008000; font-style: italic; font-weight: bold;">/** A new configuration. */</span>
  <span style="color: #000000; font-weight: bold;">public</span> Configuration<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">this</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #008000; font-style: italic; font-weight: bold;">/** A new configuration where the behavior of reading from the default 
   * resources can be turned off.
   * 
   * If the parameter {@code loadDefaults} is false, the new instance
   * will not load resources from the default files. 
   * @param loadDefaults specifies whether to load from the default files
   */</span>
  <span style="color: #000000; font-weight: bold;">public</span> Configuration<span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">boolean</span> loadDefaults<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>LOG.<span style="color: #006633;">isDebugEnabled</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      LOG.<span style="color: #006633;">debug</span><span style="color: #009900;">&#40;</span>StringUtils.<span style="color: #006633;">stringifyException</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> <a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Aioexception+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span style="color: #003399;">IOException</span></a><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;config()&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>loadDefaults<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      resources.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;hadoop-default.xml&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      resources.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;hadoop-site.xml&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

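<p>As this shows, constructing a Configuration object registers hadoop-default.xml first and hadoop-site.xml second, so hadoop-default.xml is loaded before hadoop-site.xml and settings in hadoop-site.xml override those in hadoop-default.xml.<br />
Having seen how Hadoop's configuration files are loaded, let's go back to the create method.<br />
The next call is &#8220;addNutchResources(conf);&#8221;, which is defined as follows:</p>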

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p37761code94'); return false;">View Code</a> JAVA</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p3776194"><td class="code" id="p37761code94"><pre class="java" style="font-family:monospace;">  <span style="color: #008000; font-style: italic; font-weight: bold;">/** Add the standard Nutch resources to {@link Configuration}. */</span>
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> Configuration addNutchResources<span style="color: #009900;">&#40;</span>Configuration conf<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    conf.<span style="color: #006633;">addResource</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;nutch-default.xml&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    conf.<span style="color: #006633;">addResource</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;nutch-site.xml&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">return</span> conf<span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

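<p>Clearly, nutch-default.xml is loaded first and nutch-site.xml after it.<br />
Continuing down the main method, the next call is &#8220;conf.addResource(&quot;crawl-tool.xml&quot;);&#8221;, so crawl-tool.xml is loaded last. This file mainly holds settings for crawling a corporate intranet.</p>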
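<p>From this brief look at the source, the precedence of the Nutch configuration files is:</p>
<ul>
<li>hadoop-site.xml takes precedence over hadoop-default.xml</li>
<li>crawl-tool.xml takes precedence over nutch-site.xml, which in turn takes precedence over nutch-default.xml</li>
</ul>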
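<p>When a setting does not seem to take effect, it helps to know which file wins. The sketch below walks a configuration directory in the load order traced above and reports the last file that mentions a property. winning_file is a hypothetical helper of mine, not a Nutch or Hadoop tool, and it is a plain text search: it assumes the property name sits directly inside its name element with no extra whitespace and that later resources always override earlier ones.</p>

```shell
# Load order as traced in the walkthrough: later files override earlier ones.
load_order="hadoop-default.xml hadoop-site.xml nutch-default.xml nutch-site.xml crawl-tool.xml"

# winning_file CONF_DIR PROPERTY
# Prints the last file in load order that mentions PROPERTY, or an empty
# line if no file does.
winning_file() {
    winner=""
    for f in $load_order; do
        if [ -f "$1/$f" ] && grep -qF "<name>$2</name>" "$1/$f"; then
            winner="$f"                  # a later match replaces an earlier one
        fi
    done
    printf '%s\n' "$winner"
}
```

<p>For example, winning_file "$NUTCH_HOME/conf" http.agent.name prints the name of the file whose value for http.agent.name would be used under these assumptions.</p>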
]]></content:encoded>
			<wfw:commentRss>http://ahei.info/nutch-load-conf.htm/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Basic Usage of Nutch</title>
		<link>http://ahei.info/nutch-tutorial.htm</link>
		<comments>http://ahei.info/nutch-tutorial.htm#comments</comments>
		<pubDate>Wed, 25 Nov 2009 10:43:21 +0000</pubDate>
		<dc:creator>ahei</dc:creator>
				<category><![CDATA[Nutch]]></category>
		<category><![CDATA[Beginner]]></category>
		<category><![CDATA[Search Engine]]></category>
		<category><![CDATA[ahei]]></category>
		<category><![CDATA[control]]></category>
		<category><![CDATA[crawl]]></category>
		<category><![CDATA[crawler]]></category>
		<category><![CDATA[ede]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[ide]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[se]]></category>
		<category><![CDATA[spider]]></category>
		<category><![CDATA[term]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[top]]></category>
		<category><![CDATA[ubuntu]]></category>
		<category><![CDATA[Distributed]]></category>
		<category><![CDATA[Crawling]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[Configuration Files]]></category>

		<guid isPermaLink="false">http://ahei.yo2.cn/?p=37220</guid>
		<description><![CDATA[Nutch is an open-source search engine that covers crawling, indexing and searching, though its main focus is crawling. Below I describe its basic usage. First, download the latest Nutch release from here (1.0 at the time of writing), or get the source directly from here, and unpack it. Then open $NUTCH_HOME/conf/nutch-site.xml, where NUTCH_HOME is the directory containing Nutch. nutch-site.xml is the file for your own Nutch settings. Do not edit nutch-default.xml directly: it holds Nutch's defaults, and anything in nutch-site.xml overrides it (see How Nutch Loads Its Configuration Files for details). You can of course edit nutch-default.xml, but the Nutch project advises against it. Enter the following between &#60;configuration&#62; and &#60;/configuration&#62;: [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright" title="Nutch" src="screenshots/nutch-logo.gif"/></p>
<p><a href="http://lucene.apache.org/nutch/" target="_blank">Nutch</a> is an open-source search engine that covers crawling, indexing and searching, though its main focus is crawling. Below I describe its basic usage.<span id="more-37220"></span></p>
<p>First, download the latest Nutch release from <a href="http://apache.etoak.com/lucene/nutch/" target="_blank">here</a> (1.0 at the time of writing), or get the source directly from <a href="http://svn.apache.org/repos/asf/lucene/nutch/" target="_blank">here</a>, and unpack it. Then open $NUTCH_HOME/conf/nutch-site.xml, where NUTCH_HOME is the directory containing Nutch. nutch-site.xml is the file for your own Nutch settings. Do not edit nutch-default.xml directly: it holds Nutch's defaults, and anything in nutch-site.xml overrides it (see <a href="nutch-load-conf.htm" target="_blank">How Nutch Loads Its Configuration Files</a> for details). You can of course edit nutch-default.xml, but the Nutch project advises against it. Enter the following between &lt;configuration&gt; and &lt;/configuration&gt;:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p37220code103'); return false;">View Code</a> XML</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p37220103"><td class="code" id="p37220code103"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>http.agent.name<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>spider<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>HTTP 'User-Agent' request header. MUST NOT be empty - 
  please set this to a single word uniquely related to your organization.
&nbsp;
  NOTE: You should also check other related properties:
&nbsp;
	http.robots.agents
	http.agent.description
	http.agent.url
	http.agent.email
	http.agent.version
&nbsp;
  and set their values appropriately.
&nbsp;
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
&nbsp;
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>http.robots.agents<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>spider,*<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>The agent strings we'll look for in robots.txt files,
  comma-separated, in decreasing order of precedence. You should
  put the value of http.agent.name as the first agent name, and keep the
  default * at the end of the list. E.g.: BlurflDev,Blurfl,*
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></td></tr></table></div>

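<p>The http.agent.name property is the name of your crawler (as I recall, earlier versions let you leave it empty, but current versions report an error if you do). The http.robots.agents property can also be left unset, but in that case Nutch prints the following while crawling:</p>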

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p37220code104'); return false;">View Code</a> TEXT</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p37220104"><td class="code" id="p37220code104"><pre class="text" style="font-family:monospace;">Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.</pre></td></tr></table></div>

<p>That gets annoying, so leave it unset only if the noise does not bother you.<br />
Then open the file $NUTCH_HOME/conf/crawl-urlfilter.txt and replace MY.DOMAIN.NAME in it with the domain you want to crawl, for example apache.org.</p>
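
<p>The replacement can also be scripted. The following is a minimal sketch, not part of Nutch: set_crawl_domain is a hypothetical helper name, and it assumes the placeholder appears in the file literally as MY.DOMAIN.NAME and that the domain contains no slash.</p>

<div class="wp_codebox"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;"># Hypothetical helper (not part of Nutch): replace the MY.DOMAIN.NAME
# placeholder in a crawl-urlfilter.txt file with the domain to crawl.
set_crawl_domain() {
  filter_file="$1"   # e.g. "$NUTCH_HOME/conf/crawl-urlfilter.txt"
  domain="$2"        # e.g. apache.org
  # sed keeps a copy of the original file with a .bak suffix.
  sed -i.bak "s/MY\.DOMAIN\.NAME/$domain/g" "$filter_file"
}

# Usage (assumes NUTCH_HOME is set):
# set_crawl_domain "$NUTCH_HOME/conf/crawl-urlfilter.txt" apache.org</pre></td></tr></table></div>

<p>The .bak copy makes it easy to restore the original filter file if the crawl later skips URLs you expected it to fetch.</p>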
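<p>With the configuration above in place, you can start crawling. Before that, create a file that lists the URLs to crawl. For example, create a file named urls containing <a href="http://lucene.apache.org/nutch/">http://lucene.apache.org/nutch/</a> and put it in a directory that is also named urls. Nutch takes a directory as input and crawls the URLs in every file under it; it cannot be pointed at a single file (this follows from the characteristics of Hadoop, the distributed system it is built on). Crawling is simple:</p>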

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p37220code105'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p37220105"><td class="code" id="p37220code105"><pre class="bash" style="font-family:monospace;"><span style="color: #007800;">$NUTCH_HOME</span><span style="color: #000000; font-weight: bold;">/</span>bin<span style="color: #000000; font-weight: bold;">/</span>nutch crawl urls <span style="color: #660033;">-dir</span> crawl <span style="color: #660033;">-depth</span> <span style="color: #000000;">2</span></pre></td></tr></table></div>
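
<p>The seed directory has to exist before the crawl command runs. The following is a minimal sketch of preparing it; make_seed_dir is a hypothetical helper name, and the file name urls inside the directory simply follows the example in this post.</p>

<div class="wp_codebox"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;"># Hypothetical helper: create a seed directory holding one file of URLs.
make_seed_dir() {
  seed_dir="$1"   # directory passed to "nutch crawl", e.g. urls
  seed_url="$2"   # first URL to crawl
  mkdir -p "$seed_dir"
  # One URL per line; the file is named "urls" as in the post.
  printf '%s\n' "$seed_url" > "$seed_dir/urls"
}

# Usage, followed by the crawl command (assumes NUTCH_HOME is set):
# make_seed_dir urls http://lucene.apache.org/nutch/
# "$NUTCH_HOME"/bin/nutch crawl urls -dir crawl -depth 2</pre></td></tr></table></div>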

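<p>Here urls is the directory that holds the URLs to crawl, crawl is the output directory (optional; it defaults to "crawl-" followed by the current date and time), and depth is the crawl depth, which defaults to 5. The output looks like this:</p>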

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p37220code106'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p37220106"><td class="code" id="p37220code106"><pre class="bash" style="font-family:monospace;">ahei<span style="color: #000000; font-weight: bold;">@</span>ubuntu3:~<span style="color: #000000; font-weight: bold;">/</span>nutch-<span style="color: #000000;">1.0</span><span style="color: #000000; font-weight: bold;">/</span>bin$ .<span style="color: #000000; font-weight: bold;">/</span>nutch crawl urls <span style="color: #660033;">-dir</span> crawl <span style="color: #660033;">-depth</span> <span style="color: #000000;">2</span>
crawl started <span style="color: #000000; font-weight: bold;">in</span>: crawl
rootUrlDir = urls
threads = <span style="color: #000000;">10</span>
depth = <span style="color: #000000;">2</span>
Injector: starting
Injector: crawlDb: crawl<span style="color: #000000; font-weight: bold;">/</span>crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: <span style="color: #000000; font-weight: bold;">done</span>
Generator: Selecting best-scoring urls due <span style="color: #000000; font-weight: bold;">for</span> fetch.
Generator: starting
Generator: segment: crawl<span style="color: #000000; font-weight: bold;">/</span>segments<span style="color: #000000; font-weight: bold;">/</span><span style="color: #000000;">20091126170222</span>
Generator: filtering: <span style="color: #c20cb9; font-weight: bold;">true</span>
Generator: jobtracker is <span style="color: #ff0000;">'local'</span>, generating exactly one partition.
Generator: Partitioning selected urls by host, <span style="color: #000000; font-weight: bold;">for</span> politeness.
Generator: done.
Fetcher: Your <span style="color: #ff0000;">'http.agent.name'</span> value should be listed first <span style="color: #000000; font-weight: bold;">in</span> <span style="color: #ff0000;">'http.robots.agents'</span> property.
Fetcher: starting
Fetcher: segment: crawl<span style="color: #000000; font-weight: bold;">/</span>segments<span style="color: #000000; font-weight: bold;">/</span><span style="color: #000000;">20091126170222</span>
Fetcher: threads: <span style="color: #000000;">10</span>
QueueFeeder finished: total <span style="color: #000000;">1</span> records.
fetching http:<span style="color: #000000; font-weight: bold;">//</span>lucene.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>
<span style="color: #660033;">-finishing</span> thread FetcherThread, <span style="color: #007800;">activeThreads</span>=<span style="color: #000000;">1</span>
<span style="color: #660033;">-finishing</span> thread FetcherThread, <span style="color: #007800;">activeThreads</span>=<span style="color: #000000;">1</span>
<span style="color: #660033;">-finishing</span> thread FetcherThread, <span style="color: #007800;">activeThreads</span>=<span style="color: #000000;">1</span>
<span style="color: #660033;">-finishing</span> thread FetcherThread, <span style="color: #007800;">activeThreads</span>=<span style="color: #000000;">1</span>
<span style="color: #660033;">-finishing</span> thread FetcherThread, <span style="color: #007800;">activeThreads</span>=<span style="color: #000000;">1</span>
<span style="color: #660033;">-finishing</span> thread FetcherThread, <span style="color: #007800;">activeThreads</span>=<span style="color: #000000;">1</span>
<span style="color: #660033;">-finishing</span> thread FetcherThread, <span style="color: #007800;">activeThreads</span>=<span style="color: #000000;">1</span>
<span style="color: #660033;">-finishing</span> thread FetcherThread, <span style="color: #007800;">activeThreads</span>=<span style="color: #000000;">1</span>
<span style="color: #660033;">-finishing</span> thread FetcherThread, <span style="color: #007800;">activeThreads</span>=<span style="color: #000000;">1</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">1</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">0</span>, fetchQueues.totalSize=<span style="color: #000000;">0</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">1</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">0</span>, fetchQueues.totalSize=<span style="color: #000000;">0</span>
<span style="color: #660033;">-finishing</span> thread FetcherThread, <span style="color: #007800;">activeThreads</span>=<span style="color: #000000;">0</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">0</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">0</span>, fetchQueues.totalSize=<span style="color: #000000;">0</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">0</span>
Fetcher: <span style="color: #000000; font-weight: bold;">done</span>
CrawlDb update: starting
CrawlDb update: db: crawl<span style="color: #000000; font-weight: bold;">/</span>crawldb
CrawlDb update: segments: <span style="color: #7a0874; font-weight: bold;">&#91;</span>crawl<span style="color: #000000; font-weight: bold;">/</span>segments<span style="color: #000000; font-weight: bold;">/</span><span style="color: #000000;">20091126170222</span><span style="color: #7a0874; font-weight: bold;">&#93;</span>
CrawlDb update: additions allowed: <span style="color: #c20cb9; font-weight: bold;">true</span>
CrawlDb update: URL normalizing: <span style="color: #c20cb9; font-weight: bold;">true</span>
CrawlDb update: URL filtering: <span style="color: #c20cb9; font-weight: bold;">true</span>
CrawlDb update: Merging segment data into db.
CrawlDb update: <span style="color: #000000; font-weight: bold;">done</span>
Generator: Selecting best-scoring urls due <span style="color: #000000; font-weight: bold;">for</span> fetch.
Generator: starting
Generator: segment: crawl<span style="color: #000000; font-weight: bold;">/</span>segments<span style="color: #000000; font-weight: bold;">/</span><span style="color: #000000;">20091126170233</span>
Generator: filtering: <span style="color: #c20cb9; font-weight: bold;">true</span>
Generator: jobtracker is <span style="color: #ff0000;">'local'</span>, generating exactly one partition.
Generator: Partitioning selected urls by host, <span style="color: #000000; font-weight: bold;">for</span> politeness.
Generator: done.
Fetcher: Your <span style="color: #ff0000;">'http.agent.name'</span> value should be listed first <span style="color: #000000; font-weight: bold;">in</span> <span style="color: #ff0000;">'http.robots.agents'</span> property.
Fetcher: starting
Fetcher: segment: crawl<span style="color: #000000; font-weight: bold;">/</span>segments<span style="color: #000000; font-weight: bold;">/</span><span style="color: #000000;">20091126170233</span>
Fetcher: threads: <span style="color: #000000;">10</span>
QueueFeeder finished: total <span style="color: #000000;">38</span> records.
fetching http:<span style="color: #000000; font-weight: bold;">//</span>wiki.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>
fetching http:<span style="color: #000000; font-weight: bold;">//</span>issues.apache.org<span style="color: #000000; font-weight: bold;">/</span>jira<span style="color: #000000; font-weight: bold;">/</span>browse<span style="color: #000000; font-weight: bold;">/</span>Nutch
fetching http:<span style="color: #000000; font-weight: bold;">//</span>lucene.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>tutorial.html
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">7</span>, fetchQueues.totalSize=<span style="color: #000000;">35</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">9</span>, fetchQueues.totalSize=<span style="color: #000000;">35</span>
fetching http:<span style="color: #000000; font-weight: bold;">//</span>lucene.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>skin<span style="color: #000000; font-weight: bold;">/</span>breadcrumbs.js
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">9</span>, fetchQueues.totalSize=<span style="color: #000000;">34</span>
Error parsing: http:<span style="color: #000000; font-weight: bold;">//</span>lucene.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>skin<span style="color: #000000; font-weight: bold;">/</span>breadcrumbs.js: org.apache.nutch.parse.ParseException: parser not found <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #007800;">contentType</span>=application<span style="color: #000000; font-weight: bold;">/</span>javascript <span style="color: #007800;">url</span>=http:<span style="color: #000000; font-weight: bold;">//</span>lucene.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>skin<span style="color: #000000; font-weight: bold;">/</span>breadcrumbs.js
	at org.apache.nutch.parse.ParseUtil.parse<span style="color: #7a0874; font-weight: bold;">&#40;</span>ParseUtil.java:<span style="color: #000000;">74</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
	at org.apache.nutch.fetcher.Fetcher<span style="color: #007800;">$FetcherThread</span>.output<span style="color: #7a0874; font-weight: bold;">&#40;</span>Fetcher.java:<span style="color: #000000;">766</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
	at org.apache.nutch.fetcher.Fetcher<span style="color: #007800;">$FetcherThread</span>.run<span style="color: #7a0874; font-weight: bold;">&#40;</span>Fetcher.java:<span style="color: #000000;">552</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
&nbsp;
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">10</span>, fetchQueues.totalSize=<span style="color: #000000;">34</span>
fetching http:<span style="color: #000000; font-weight: bold;">//</span>lucene.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>version_control.html
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">9</span>, fetchQueues.totalSize=<span style="color: #000000;">33</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">10</span>, fetchQueues.totalSize=<span style="color: #000000;">33</span>
fetching http:<span style="color: #000000; font-weight: bold;">//</span>wiki.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>FAQ
fetching http:<span style="color: #000000; font-weight: bold;">//</span>lucene.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>apidocs-<span style="color: #000000;">0.8</span>.x<span style="color: #000000; font-weight: bold;">/</span>index.html
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">8</span>, fetchQueues.totalSize=<span style="color: #000000;">31</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">9</span>, fetchQueues.totalSize=<span style="color: #000000;">31</span>
fetching http:<span style="color: #000000; font-weight: bold;">//</span>lucene.apache.org<span style="color: #000000; font-weight: bold;">/</span>hadoop<span style="color: #000000; font-weight: bold;">/</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">8</span>, fetchQueues.totalSize=<span style="color: #000000;">30</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">10</span>, fetchQueues.totalSize=<span style="color: #000000;">30</span>
fetching http:<span style="color: #000000; font-weight: bold;">//</span>forrest.apache.org<span style="color: #000000; font-weight: bold;">/</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">9</span>, fetchQueues.totalSize=<span style="color: #000000;">29</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">9</span>, fetchQueues.totalSize=<span style="color: #000000;">29</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">10</span>, fetchQueues.totalSize=<span style="color: #000000;">29</span>
fetching http:<span style="color: #000000; font-weight: bold;">//</span>lucene.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>apidocs-<span style="color: #000000;">0.9</span><span style="color: #000000; font-weight: bold;">/</span>index.html
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">9</span>, fetchQueues.totalSize=<span style="color: #000000;">28</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">10</span>, fetchQueues.totalSize=<span style="color: #000000;">28</span>
fetching http:<span style="color: #000000; font-weight: bold;">//</span>lucene.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>credits.html
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">10</span>, fetchQueues.totalSize=<span style="color: #000000;">27</span>
fetching http:<span style="color: #000000; font-weight: bold;">//</span>www.apache.org<span style="color: #000000; font-weight: bold;">/</span>dist<span style="color: #000000; font-weight: bold;">/</span>lucene<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>CHANGES-<span style="color: #000000;">0.9</span>.txt
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">9</span>, fetchQueues.totalSize=<span style="color: #000000;">26</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">9</span>, fetchQueues.totalSize=<span style="color: #000000;">26</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">9</span>, fetchQueues.totalSize=<span style="color: #000000;">26</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">10</span>, fetchQueues.totalSize=<span style="color: #000000;">26</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">10</span>, fetchQueues.totalSize=<span style="color: #000000;">26</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">10</span>, fetchQueues.totalSize=<span style="color: #000000;">26</span>
<span style="color: #660033;">-activeThreads</span>=<span style="color: #000000;">10</span>, <span style="color: #007800;">spinWaiting</span>=<span style="color: #000000;">10</span>, fetchQueues.totalSize=<span style="color: #000000;">26</span></pre></td></tr></table></div>

<p>How do you check the data once the crawl has finished? Use this command:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p37220code107'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p37220107"><td class="code" id="p37220code107"><pre class="bash" style="font-family:monospace;"><span style="color: #007800;">$NUTCH_HOME</span><span style="color: #000000; font-weight: bold;">/</span>bin<span style="color: #000000; font-weight: bold;">/</span>nutch org.apache.nutch.searcher.NutchBean apache</pre></td></tr></table></div>

<p>This command prints the search results for the term apache. By default it searches the crawl directory, as the source code shows:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p37220code108'); return false;">View Code</a> JAVA</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p37220108"><td class="code" id="p37220code108"><pre class="java" style="font-family:monospace;">// File: $NUTCH_HOME<span style="color: #339933;">/</span>src<span style="color: #339933;">/</span>java<span style="color: #339933;">/</span>org<span style="color: #339933;">/</span>apache<span style="color: #339933;">/</span>nutch<span style="color: #339933;">/</span>searcher<span style="color: #339933;">/</span>NutchBean.<span style="color: #006633;">java</span><span style="color: #339933;">:</span><span style="color: #cc66cc;">87</span>
  <span style="color: #000000; font-weight: bold;">public</span> NutchBean<span style="color: #009900;">&#40;</span>Configuration conf, Path dir<span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">IOException</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">conf</span> <span style="color: #339933;">=</span> conf<span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">fs</span> <span style="color: #339933;">=</span> FileSystem.<span style="color: #006633;">get</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">conf</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>dir <span style="color: #339933;">==</span> <span style="color: #000066; font-weight: bold;">null</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      dir <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Path<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">conf</span>.<span style="color: #006633;">get</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;searcher.dir&quot;</span>, <span style="color: #0000ff;">&quot;crawl&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>To search a different directory, add the following to nutch-site.xml:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p37220code109'); return false;">View Code</a> XML</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p37220109"><td class="code" id="p37220code109"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>searcher.dir<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>other-searcher-dir<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/value<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  Path to root of crawl.  This directory is searched (in
  order) for either the file search-servers.txt, containing a list of
  distributed search servers, or the directory &quot;index&quot; containing
  merged indexes, or the directory &quot;segments&quot; containing segment
  indexes.
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/property<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></td></tr></table></div>

<p>The search results look like this:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p37220code110'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p37220110"><td class="code" id="p37220code110"><pre class="bash" style="font-family:monospace;">ahei<span style="color: #000000; font-weight: bold;">@</span>ubuntu3:~<span style="color: #000000; font-weight: bold;">/</span>nutch-<span style="color: #000000;">1.0</span><span style="color: #000000; font-weight: bold;">/</span>bin$ .<span style="color: #000000; font-weight: bold;">/</span>nutch org.apache.nutch.searcher.NutchBean apache
Total hits: <span style="color: #000000;">25</span>
 <span style="color: #000000;">0</span> <span style="color: #000000;">20091126170222</span><span style="color: #000000; font-weight: bold;">/</span>http:<span style="color: #000000; font-weight: bold;">//</span>lucene.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>
 ... Lucene. January <span style="color: #000000;">2005</span>: Nutch Joins Apache Incubator Nutch is a ... determined that the Apache license is the appropriate
 <span style="color: #000000;">1</span> <span style="color: #000000;">20091126170233</span><span style="color: #000000; font-weight: bold;">/</span>http:<span style="color: #000000; font-weight: bold;">//</span>www.apache.org<span style="color: #000000; font-weight: bold;">/</span>
 ... including Apache XML, Apache Jakarta, Apache Cocoon, Apache Xerces, Apache Ant, and Apache ... Source projects such <span style="color: #c20cb9; font-weight: bold;">as</span> NoSQL, Apache ... 
 <span style="color: #000000;">2</span> <span style="color: #000000;">20091126170233</span><span style="color: #000000; font-weight: bold;">/</span>http:<span style="color: #000000; font-weight: bold;">//</span>www.apache.org<span style="color: #000000; font-weight: bold;">/</span>licenses<span style="color: #000000; font-weight: bold;">/</span>
 ... Copyright © <span style="color: #000000;">2009</span> The Apache Software Foundation, Licensed under the ... Apache License, Version <span style="color: #000000;">2.0</span> . Apache ... Apache and the  ... 
 <span style="color: #000000;">3</span> <span style="color: #000000;">20091126170233</span><span style="color: #000000; font-weight: bold;">/</span>http:<span style="color: #000000; font-weight: bold;">//</span>forrest.apache.org<span style="color: #000000; font-weight: bold;">/</span>
 ... Welcome to Apache Forrest apache <span style="color: #000000; font-weight: bold;">&gt;</span> forrest   Welcome Developers Versioned Docs ... Example sites Thanks Related projects Apache Gump Apache ... 
 <span style="color: #000000;">4</span> <span style="color: #000000;">20091126170233</span><span style="color: #000000; font-weight: bold;">/</span>http:<span style="color: #000000; font-weight: bold;">//</span>lucene.apache.org<span style="color: #000000; font-weight: bold;">/</span>
 ... the release of Apache Mahout <span style="color: #000000;">0.1</span>. Apache Mahout is a subproject ... on top of  ... 
 <span style="color: #000000;">5</span> <span style="color: #000000;">20091126170233</span><span style="color: #000000; font-weight: bold;">/</span>http:<span style="color: #000000; font-weight: bold;">//</span>wiki.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>
FrontPage - Nutch Wiki Search: Nutch Wiki Login FrontPage FrontPage RecentChanges FindPage HelpContents Immutable Page Comments Info Attachments More Actions:  ... 
 <span style="color: #000000;">6</span> <span style="color: #000000;">20091126170233</span><span style="color: #000000; font-weight: bold;">/</span>http:<span style="color: #000000; font-weight: bold;">//</span>lucene.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>index.html
 ... Lucene. January <span style="color: #000000;">2005</span>: Nutch Joins Apache Incubator Nutch is a ... determined that the Apache license is the appropriate
 <span style="color: #000000;">7</span> <span style="color: #000000;">20091126170233</span><span style="color: #000000; font-weight: bold;">/</span>http:<span style="color: #000000; font-weight: bold;">//</span>wiki.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>FAQ
 ... all available at http:<span style="color: #000000; font-weight: bold;">//</span>lucene.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>mailing_lists.html . How ... 
 <span style="color: #000000;">8</span> <span style="color: #000000;">20091126170233</span><span style="color: #000000; font-weight: bold;">/</span>http:<span style="color: #000000; font-weight: bold;">//</span>lucene.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>tutorial8.html
 ... http:<span style="color: #000000; font-weight: bold;">//</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">&#91;</span>a-z0-<span style="color: #000000;">9</span><span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #000000; font-weight: bold;">*</span>\.<span style="color: #7a0874; font-weight: bold;">&#41;</span><span style="color: #000000; font-weight: bold;">*</span>apache.org<span style="color: #000000; font-weight: bold;">/</span> This will include any ... <span style="color: #000000; font-weight: bold;">in</span> the domain apache.org . Edit the <span style="color: #c20cb9; font-weight: bold;">file</span> ... 
 <span style="color: #000000;">9</span> <span style="color: #000000;">20091126170233</span><span style="color: #000000; font-weight: bold;">/</span>http:<span style="color: #000000; font-weight: bold;">//</span>lucene.apache.org<span style="color: #000000; font-weight: bold;">/</span>nutch<span style="color: #000000; font-weight: bold;">/</span>tutorial.html
 ... crawl to the apache.org domain, the line ... http:<span style="color: #000000; font-weight: bold;">//</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">&#91;</span>a-z0-<span style="color: #000000;">9</span><span style="color: #7a0874; font-weight: bold;">&#93;</span><span style="color: #000000; font-weight: bold;">*</span>\.<span style="color: #7a0874; font-weight: bold;">&#41;</span><span style="color: #000000; font-weight: bold;">*</span>apache.org<span style="color: #000000; font-weight: bold;">/</span> This will include any</pre></td></tr></table></div>
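
<p>The search output above alternates a hit line (rank, segment name, URL) with a summary line. As a rough sketch only, the hit URLs could be pulled out with <code>sed</code>. The function name <code>extract_hit_urls</code> is hypothetical and not part of Nutch, and the pattern assumes the hit-line layout shown in the sample output (a rank, a space, a 14-digit segment name, a slash, then the URL):</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nutch): extract URLs from NutchBean's text output.
# Assumption: each hit line looks like " RANK SEGMENT/URL", as in the sample above,
# where SEGMENT is a 14-digit timestamp.
extract_hit_urls() {
    # Print only the URL part of lines that match the assumed hit-line layout;
    # the "Total hits" line and the summary lines do not match and are dropped.
    sed -n -E 's|^ *[0-9]+ [0-9]{14}/(https?://.*)$|\1|p'
}

# Demo on lines copied from the sample output. In real use one would pipe
# "./nutch org.apache.nutch.searcher.NutchBean apache" into the function instead.
sample='Total hits: 25
 0 20091126170222/http://lucene.apache.org/nutch/
 ... Lucene. January 2005: Nutch Joins Apache Incubator Nutch is a ...
 1 20091126170233/http://www.apache.org/'
printf '%s\n' "$sample" | extract_hit_urls
```

<p>If a different Nutch version prints hits in another layout, the pattern would need adjusting.</p>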

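<p>Getting started with Nutch is fairly simple. Everything above crawls from a single machine; Nutch can also run on Hadoop, a distributed system, to spread the crawl across several machines. See <a href="nutch-distributed-crawl.htm" target="_blank">Distributed crawling with Nutch</a>.</p>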
]]></content:encoded>
			<wfw:commentRss>http://ahei.info/nutch-tutorial.htm/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

