Linux迷+Python粉 - 字符串https://blog.pythonwood.com/2017-11-30T22:12:00+08:00【实习记】2014-08-29算法学习Boyer-Moore和最长公共子串(LCS)问题——阿里校招题2014-08-29T21:23:00+08:002017-11-30T22:12:00+08:00pythonwoodtag:blog.pythonwood.com,2014-08-29:/2014/08/【实习记】2014-08-29算法学习Boyer-Moore和最长公共子串(LCS)问题——阿里校招题/<!--
昨天的问题
方案一:寻找hash函数,可行性极低。
方案二:载入内存,维护成一个守护进程的服务。难度比较大。
方案三:使用前5位来索引,由前3位增至前5位唯一性,理论上是分拆记录扩大100倍,但可以就地利用mysql,最易行。
方案四:使用方案三,但增加一个表以减少冗余,但代价新开一个表,并且每次查询都select join两个表。
-->
<h3 id="_1">最长公共子串问题分析<a class="headerlink" href="#_1" title="Permanent link">¶</a></h3>
<p>其实这是一个序贯决策问题,可以用动态规划来求解。我们采用一个二维矩阵来记录中间的结果。这个二维矩阵怎么构造呢?直接举个例子吧:”bab”和”caba”(当然我们现在一眼就可以看出来最长公共子串是”ba”或”ab”)</p>
<p>b a b</p>
<p>c 0 0 0 …</p><!--
昨天的问题
方案一:寻找hash函数,可行性极低。
方案二:载入内存,维护成一个守护进程的服务。难度比较大。
方案三:使用前5位来索引,由前3位增至前5位唯一性,理论上是分拆记录扩大100倍,但可以就地利用mysql,最易行。
方案四:使用方案三,但增加一个表以减少冗余,但代价新开一个表,并且每次查询都select join两个表。
-->
<h3 id="_1">最长公共子串问题分析<a class="headerlink" href="#_1" title="Permanent link">¶</a></h3>
<p>其实这是一个序贯决策问题,可以用动态规划来求解。我们采用一个二维矩阵来记录中间的结果。这个二维矩阵怎么构造呢?直接举个例子吧:”bab”和”caba”(当然我们现在一眼就可以看出来最长公共子串是”ba”或”ab”)</p>
<p>b a b</p>
<p>c 0 0 0</p>
<p>a 0 1 0</p>
<p>b 1 0 1</p>
<p>a 0 1 0</p>
<p>我们看矩阵的斜对角线最长的那个就能找出最长公共子串。</p>
<p>不过在二维矩阵上找最长的由1组成的斜对角线也是件麻烦费时的事,下面改进:当要在矩阵是填1时让它等于其左上角元素加1。</p>
<p>b a b</p>
<p>c 0 0 0</p>
<p>a 0 1 0</p>
<p>b 1 0 2</p>
<p>a 0 2 0</p>
<p>这样矩阵中的最大元素就是 最长公共子串的长度。</p>
<p>在构造这个二维矩阵的过程中由于得出矩阵的某一行后其上一行就没用了,所以实际上在程序中可以用一维数组来代替这个矩阵。</p>
<h3 id="_2">代码实践<a class="headerlink" href="#_2" title="Permanent link">¶</a></h3>
<p>根据以上算法 使用C语言实践了一下。</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span> <span class="cpf"><stdlib.h></span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf"><stdio.h></span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf"><string.h></span><span class="cp"></span>
<span class="kt">int</span> <span class="nf">comfix</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">stra</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">strb</span><span class="p">);</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">){</span>
<span class="k">const</span> <span class="kt">char</span>
<span class="o">*</span><span class="n">stra</span> <span class="o">=</span> <span class="s">"hello world"</span><span class="p">,</span>
<span class="o">*</span><span class="n">strb</span> <span class="o">=</span> <span class="s">"malloc"</span><span class="p">;</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"%s,%s: %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">stra</span><span class="p">,</span> <span class="n">strb</span><span class="p">,</span> <span class="n">comfix</span><span class="p">(</span><span class="n">stra</span><span class="p">,</span> <span class="n">strb</span><span class="p">));</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">comfix</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">stra</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">strb</span><span class="p">){</span>
<span class="cm">/*</span>
<span class="cm"> * 变量第一字符</span>
<span class="cm"> * c:char*, l:len</span>
<span class="cm"> * 变量第二字符</span>
<span class="cm"> * s:small, l:large</span>
<span class="cm"> */</span>
<span class="k">const</span> <span class="kt">char</span>
<span class="o">*</span><span class="n">cs</span> <span class="o">=</span> <span class="n">stra</span><span class="p">,</span>
<span class="o">*</span><span class="n">cl</span> <span class="o">=</span> <span class="n">strb</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>
<span class="n">la</span> <span class="o">=</span> <span class="n">strlen</span><span class="p">(</span><span class="n">stra</span><span class="p">),</span>
<span class="n">lb</span> <span class="o">=</span> <span class="n">strlen</span><span class="p">(</span><span class="n">strb</span><span class="p">),</span>
<span class="n">ls</span> <span class="o">=</span> <span class="n">la</span><span class="p">,</span>
<span class="n">ll</span> <span class="o">=</span> <span class="n">lb</span><span class="p">;</span>
<span class="cm">/* 如果不对,就调换呗 */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">lb</span><span class="o"><</span><span class="n">la</span><span class="p">)</span>
<span class="n">cs</span> <span class="o">=</span> <span class="n">strb</span><span class="p">,</span> <span class="n">ls</span> <span class="o">=</span> <span class="n">lb</span><span class="p">,</span> <span class="n">cl</span> <span class="o">=</span> <span class="n">stra</span><span class="p">,</span> <span class="n">ll</span> <span class="o">=</span> <span class="n">la</span><span class="p">;</span>
<span class="cm">/* 矩阵,只保存矩阵的一行即可动态之 */</span>
<span class="kt">int</span> <span class="o">*</span><span class="n">pint</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span><span class="o">*</span><span class="p">)</span><span class="n">malloc</span><span class="p">((</span><span class="n">ls</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span><span class="o">*</span><span class="mi">4</span><span class="p">);</span>
<span class="n">memset</span><span class="p">(</span><span class="n">pint</span> <span class="p">,</span><span class="mi">0</span> <span class="p">,</span> <span class="p">(</span><span class="n">ls</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span><span class="o">*</span><span class="mi">4</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o"><</span><span class="n">ll</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">){</span>
<span class="cm">/* 生成下一行,同时上一行内容被回收 */</span>
<span class="k">for</span> <span class="p">(</span><span class="n">j</span><span class="o">=</span><span class="n">ls</span><span class="p">;</span> <span class="n">j</span><span class="o">></span><span class="n">ret</span><span class="p">;</span> <span class="n">j</span><span class="o">--</span><span class="p">)</span>
<span class="k">if</span> <span class="p">(</span><span class="n">cl</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">==</span><span class="n">cs</span><span class="p">[</span><span class="n">j</span><span class="p">])</span>
<span class="n">pint</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">pint</span><span class="p">[</span><span class="n">j</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">+</span><span class="mi">1</span><span class="p">;</span>
<span class="cm">/* 如果有更大就更新ret */</span>
<span class="k">for</span> <span class="p">(</span><span class="n">j</span><span class="o">=</span><span class="n">ls</span><span class="p">;</span> <span class="n">j</span><span class="o">></span><span class="n">ret</span><span class="p">;</span> <span class="n">j</span><span class="o">--</span><span class="p">)</span>
<span class="k">if</span> <span class="p">(</span><span class="n">pint</span><span class="p">[</span><span class="n">j</span><span class="p">]</span><span class="o">></span><span class="n">ret</span><span class="p">)</span>
<span class="n">ret</span> <span class="o">=</span> <span class="n">pint</span><span class="p">[</span><span class="n">j</span><span class="p">];</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>这种算法非常巧妙地化繁为简,有时换一个思路,就会扩然开朗!</p>
<p>比较喜欢这种锻炼。</p>
<h3 id="_3">参考<a class="headerlink" href="#_3" title="Permanent link">¶</a></h3>
<p>研究了 求最长公共子串问题,顺便研究了字符串匹配</p>
<p>字符串匹配的Boyer-Moore算法 http://www.ruanyifeng.com/blog/2013/05/boyer-moore_string_search_algorithm.html</p>
<p>字符串匹配的<span class="caps">KMP</span>算法 http://www.ruanyifeng.com/blog/2013/05/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm.html</p>
<p>动态规划算法之:最长公共子序列 <span class="amp">&</span> 最长公共子串(<span class="caps">LCS</span>) http://my.oschina.net/leejun2005/blog/117167</p>