Beginning Regular Expressions 2005.pdf

Chapter 8

When the pattern is edited to (?<=\d)(?=(\d\d\d)+$), you get the results you want. The position following the numeric digit 2 now fails to satisfy the lookahead constraint. It is followed by five numeric digits, and five digits cannot be split into groups of three that run to the end of the string, so the lookahead pattern (\d\d\d)+$ does not match there.

However, the position after the numeric digit 1 still matches. It is followed by six numeric digits, which matches the pattern (\d\d\d)+$. Similarly, the position after the numeric digit 4 matches, because it is followed by three numeric digits, which also matches (\d\d\d)+$. A comma is inserted at both matching positions.
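The behavior described above can be checked with a short script. This is a minimal sketch in Python rather than in the tool used in the chapter. The helper name add_commas and the test value 1234567 are assumptions, chosen because that string is consistent with the digits 1, 2, and 4 discussed in the text.

```python
import re

# The pattern from the text: a position preceded by a digit and followed by
# one or more complete groups of three digits running to the end of the string.
THOUSANDS = re.compile(r"(?<=\d)(?=(\d\d\d)+$)")


def add_commas(digits: str) -> str:
    """Insert a comma at every position the pattern matches (hypothetical helper)."""
    return THOUSANDS.sub(",", digits)


# Assumed test value; the positions after 1 and 4 match, the position after 2 does not.
print(add_commas("1234567"))  # 1,234,567
print(add_commas("123"))      # 123 (no position is followed by a full group of three)
```

Because both the lookbehind and the lookahead are zero-width, the substitution consumes no characters and only inserts the comma at each matching position.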

Exercises

These exercises allow you to test your understanding of some of the techniques for lookahead and lookbehind that were introduced in this chapter:

1. Specify a pattern that will match a sequence of one or more alphabetic characters only if they are followed by a comma character.

2. Create a pattern, using lookbehind and lookahead, to match the word sheep. Do not use the word-boundary metacharacters in your pattern.

Chapter 9

Sensitivity and Specificity of Regular Expressions

This chapter discusses the issues of sensitivity and specificity of regular expression patterns. Sensitivity and specificity relate to two fundamental tasks in all uses of regular expressions: trying to ensure that you match all the text that you want to match and trying to avoid matching text that you don’t want to match.

Assuming that you typically want to manipulate the data that you match in some way, failing to match desired data will mean that part of your intended task remains undone. If you don’t have a good appreciation of your data and the effect on it of the regular expression that you are using, you can be completely unaware that you have missed some data. At least, you are unaware that you have missed it until your manager or a customer calls and complains.

Conversely, matching and manipulating undesired data may well corrupt parts of your data. Whether that data corruption leads to minor typos or more serious problems depends on your data, what its intended use is, and the extent and severity of the undesired changes you unintentionally make to it. Again, the undesired effects can impact adversely on customer satisfaction. So sensitivity and specificity are issues to take seriously.
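The two failure modes described above can be made concrete with a small experiment. The following Python sketch is illustrative only: the sample strings, the three patterns, and the evaluate helper are hypothetical and do not come from the chapter. It reports which desired strings a pattern misses (a sensitivity problem) and which undesired strings it matches (a specificity problem).

```python
import re

# Hypothetical sample data: strings we want to match and strings we do not.
wanted = ["cat", "cats", "Cat"]
unwanted = ["concatenate", "category", "scatter"]


def evaluate(pattern, flags=0):
    """Return (missed, spurious) for a pattern against the sample data."""
    regex = re.compile(pattern, flags)
    missed = [s for s in wanted if not regex.search(s)]      # desired text not matched
    spurious = [s for s in unwanted if regex.search(s)]      # undesired text matched
    return missed, spurious


# A loose pattern: matches everything wanted, but also everything unwanted.
loose = evaluate(r"cat", re.IGNORECASE)

# A strict pattern: matches nothing unwanted, but misses "cats" and "Cat".
strict = evaluate(r"\bcat\b")

# A pattern adjusted to this particular data set: no misses, no spurious matches.
balanced = evaluate(r"\bcats?\b", re.IGNORECASE)

for name, result in [("loose", loose), ("strict", strict), ("balanced", balanced)]:
    print(name, result)
```

The balanced pattern only works because it was tuned to this sample data, which is why a good appreciation of the real data matters before settling on a pattern.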

In this chapter, you will learn the following:

What sensitivity and specificity are

How to work out how far you should go in investing time and effort in maximizing sensitivity and/or specificity

How to use regular expression techniques to give an optimal balance of sensitivity and specificity

How the detail of the data source can affect sensitivity and specificity

How to gain a better balance of sensitivity and specificity in the Star Training Company example