Process a block of HTML, ignore content in specific tags

On a blog I wish to pass all of the text for a blog entry through a PHP script to process quotes and some other items into nice typographic characters.

The blog text in question contains HTML and, in particular, highlights code snippets contained within dedicated code blocks. These blocks can appear anywhere in the text, and in multiple places (much like Stack Overflow!).

I do not want those code blocks processed by the typographic scripts I will be using. The processing itself is not the point; being able to apply it selectively is.

I have been able to write a regex to find those blocks:

preg_match_all('/(<pre>(.*?)<\/pre>)/s', $text, $matches);

But I am not sure what the best way is to process the rest of the text and then slot these blocks back into their correct places.

Thanks for your help!

The first solution that comes to mind looks like this:

  • extract all the code blocks
  • remove the code blocks, replacing each with a special marker that will not be affected by your string manipulations; that marker has to be really special (and you could verify that it is not present in the input string)
  • do your manipulations on the string
  • put the code blocks back where the markers are now

In code, it could be something like this (sorry, it’s quite long, and I didn’t include any checks; it’s up to you to add those):

// The code blocks are assumed to be wrapped in <pre> ... </pre> here;
// adjust the patterns below to the tags your blog actually uses.
$str = <<<A
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec sodales lacus et erat accumsan consectetur. Sed lacinia enim vitae erat suscipit fermentum. Quisque lobortis nisi et lacus imperdiet ac malesuada dui imperdiet.
<pre>ThIs Is
CoDe 1</pre>
Donec vestibulum commodo quam rhoncus luctus. Nam vitae ipsum sed nibh dignissim condimentum. Sed ultrices fermentum dapibus. Vivamus mattis nisi nec enim convallis quis aliquet arcu accumsan. Suspendisse potenti. Nullam eget fringilla nunc. Nulla porta justo justo. Nunc consectetur egestas malesuada. Mauris ac nisi ipsum, et accumsan lorem. Quisque interdum accumsan pellentesque. Sed at felis metus. Nulla gravida tincidunt tortor,
<pre>AnD cOdE 2</pre>
nec aliquam tortor ultricies vel. Integer semper libero eu magna congue eget lacinia purus auctor. Nunc volutpat ultricies feugiat. Nullam id mauris eget ipsum ultricies ullamcorper non vel risus. Proin volutpat volutpat interdum. Nulla orci odio, ornare sit amet ullamcorper non, condimentum sagittis libero.
<pre>aNd
CoDe
NuMbEr 3</pre>
Ut non justo at neque convallis luctus ultricies amet.
A;
var_dump($str);

// Extract the code blocks
$matches = array();
preg_match_all('#<pre>(.*?)</pre>#s', $str, $matches);
var_dump($matches);

// Remove the code blocks, leaving a marker in place of each one
$str_nocode = preg_replace('#<pre>.*?</pre>#s', 'THIS_IS_A_NOCODE_MARKER', $str);
var_dump($str_nocode);

// Do whatever you want with $str_nocode
$str_nocode = strtoupper($str_nocode);
var_dump($str_nocode);

// And put back the code blocks, one marker at a time.
// A callback is used so that "$1" or "\1" inside a block is copied
// literally instead of being read as a backreference.
$str_codes = $str_nocode;
foreach ($matches[0] as $code) {
    $str_codes = preg_replace_callback(
        '#THIS_IS_A_NOCODE_MARKER#',
        function ($m) use ($code) { return $code; },
        $str_codes,
        1
    );
}
var_dump($str_codes);

I’ve tried with:

  • code on one line,
  • code on 2 lines,
  • and code on multiple lines

Note: you should really test more than I did, but this should give you a first idea.
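Two cases worth adding to those tests are a code block that contains something like $1 (which preg_replace treats as a backreference when it appears in the replacement string) and input that already contains the marker text. Below is a rough sketch of the same four steps wrapped in a function that covers both. The function name transform_outside_code, the @@digits@@ token format and the <pre> tag are assumptions made for illustration, and the sketch also assumes that your processing leaves "@" and digits untouched:

```php
<?php
// Hypothetical helper: applies $transform to everything outside
// <pre>...</pre> blocks. The <pre> tag is an assumption; adjust the pattern.
function transform_outside_code(string $html, callable $transform): string
{
    // Pick a digits-only token that does not occur in the input, so that
    // case changes or quote/dash substitutions leave it alone.
    do {
        $token = '@@' . random_int(100000000, 999999999) . '@@';
    } while (strpos($html, $token) !== false);

    // Steps 1 and 2: stash each block and leave the token in its place.
    $blocks = [];
    $stripped = preg_replace_callback(
        '#<pre\b[^>]*>.*?</pre>#is',
        function (array $m) use (&$blocks, $token) {
            $blocks[] = $m[0];
            return $token;
        },
        $html
    );
    if ($stripped === null) {
        throw new RuntimeException('Regex failed: ' . preg_last_error());
    }

    // Step 3: process the text that is left.
    $processed = $transform($stripped);

    // Step 4: put the blocks back in order. Plain concatenation is used,
    // so "$1" or "\1" inside a block is copied literally.
    $parts = explode($token, $processed);
    $result = array_shift($parts);
    foreach ($parts as $i => $part) {
        $result .= ($blocks[$i] ?? '') . $part;
    }
    return $result;
}

// Usage, with strtoupper standing in for the typographic processing:
echo transform_outside_code('He said "hi" <pre>echo "$1";</pre> and "bye"', 'strtoupper');
// prints: HE SAID "HI" <pre>echo "$1";</pre> AND "BYE"
```

If the processing step ever removes a token, the blocks after it would end up in the wrong place, so it is still worth checking that the number of tokens before and after processing matches.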

Hope this helps :-)

As a side note: parsing HTML with regexes is generally considered bad practice and often leads to trouble. Something like DOMDocument::loadHTML could be worth a look.
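To give an idea of what the DOM route could look like, here is a minimal sketch that walks the text nodes and skips anything inside the protected elements. The helper name transform_text_nodes, the wrapper id fragment-root and the choice of <pre> and <code> as protected tags are assumptions; DOMDocument, DOMXPath and libxml_use_internal_errors are standard PHP, provided the DOM extension is enabled. Keep in mind that the parser may normalize the markup, so the output is not guaranteed to be byte-for-byte identical to the input outside the processed text.

```php
<?php
// Hypothetical helper: applies $transform to every text node that is not
// inside one of the $skipTags elements (assumed here: <pre> and <code>).
function transform_text_nodes(string $html, callable $transform, array $skipTags = ['pre', 'code']): string
{
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc = new DOMDocument();
    // The meta tag tells the parser the input is UTF-8; the wrapper div
    // gives one known container for the fragment.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div id="fragment-root">' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);

    foreach ($xpath->query('.//text()', $root) as $textNode) {
        // Skip text that has a protected element anywhere above it.
        for ($p = $textNode->parentNode; $p !== null; $p = $p->parentNode) {
            if (in_array(strtolower($p->nodeName), $skipTags, true)) {
                continue 2;
            }
        }
        $textNode->data = $transform($textNode->data);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage, with strtoupper standing in for the typographic processing:
echo transform_text_nodes('<p>hello <b>world</b></p><pre>keep me</pre>', 'strtoupper');
```

Compared with the marker approach, nothing has to be removed and put back, because the code blocks are simply never touched. The price is that the whole entry goes through an HTML parser and serializer.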
