BreakIterator Class
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
<strong>[icu enhancement]</strong> ICU's replacement for java.text.BreakIterator.
[Android.Runtime.Register("android/icu/text/BreakIterator", ApiSince=24, DoNotGenerateAcw=true)]
public abstract class BreakIterator : Java.Lang.Object, IDisposable, Java.Interop.IJavaPeerable, Java.Lang.ICloneable
[<Android.Runtime.Register("android/icu/text/BreakIterator", ApiSince=24, DoNotGenerateAcw=true)>]
type BreakIterator = class
inherit Object
interface ICloneable
interface IJavaObject
interface IDisposable
interface IJavaPeerable
Remarks
<strong>[icu enhancement]</strong> ICU's replacement for java.text.BreakIterator. Methods, fields, and other functionality specific to ICU are labeled '<strong>[icu]</strong>'.
A class that locates boundaries in text. This class defines a protocol for objects that break up a piece of natural-language text according to a set of criteria. Instances or subclasses of BreakIterator can be provided, for example, to break a piece of text into words, sentences, or logical characters according to the conventions of some language or group of languages.
We provide five built-in types of BreakIterator:
<ul>
<li>getTitleInstance() returns a BreakIterator that locates title boundaries.
<li>getSentenceInstance() returns a BreakIterator that locates boundaries between sentences. This is useful for triple-click selection, for example.
<li>getWordInstance() returns a BreakIterator that locates boundaries between words. This is useful for double-click selection or "find whole words" searches. This type of BreakIterator makes sure there is a boundary position at the beginning and end of each legal word. (Numbers count as words, too.) Whitespace and punctuation are kept separate from real words.
<li>getLineInstance() returns a BreakIterator that locates positions where it is legal for a text editor to wrap lines. This is similar to word breaking, but not the same: punctuation and whitespace are generally kept with words (you don't want a line to start with whitespace, for example), and some special characters can force a position to be considered a line-break position or prevent a position from being a line-break position.
<li>getCharacterInstance() returns a BreakIterator that locates boundaries between logical characters. Because of the structure of the Unicode encoding, a logical character may be stored internally as more than one Unicode code point. (A with an umlaut may be stored as an a followed by a separate combining umlaut character, for example, but the user still thinks of it as one character.) This iterator allows various processes (especially text editors) to treat as characters the units of text that a user would think of as characters, rather than the units of text that the computer sees as "characters".
</ul>
The text boundary positions are found according to the rules described in Unicode Standard Annex #29, Text Boundaries, and Unicode Standard Annex #14, Line Breaking Properties. These are available at http://www.unicode.org/reports/tr29/ and http://www.unicode.org/reports/tr14/, respectively.
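The difference between stored code units and user-perceived characters can be illustrated with a small sketch. It assumes the JDK's java.text.BreakIterator as a stand-in, since android.icu.text.BreakIterator is only available on Android; the methods used here (getCharacterInstance(), setText(), first(), next(), DONE) exist on both. The class name UserCharacterCount is hypothetical.

```java
import java.text.BreakIterator;

// Sketch only: uses java.text.BreakIterator as a stand-in for
// android.icu.text.BreakIterator (an assumption made so it runs on a plain JDK).
final class UserCharacterCount {
    // Counts logical (user-perceived) characters by stepping over character boundaries.
    static int count(String text) {
        BreakIterator chars = BreakIterator.getCharacterInstance();
        chars.setText(text);
        int count = 0;
        chars.first();
        while (chars.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a' followed by a combining diaeresis, then "bc": four chars in storage.
        String text = "a\u0308bc";
        System.out.println(text.length() + " code units, "
                + count(text) + " user-perceived characters");
    }
}
```

A text editor would typically use such boundaries for caret movement and deletion, so that a base letter and its combining mark are never split.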
BreakIterator's interface follows an "iterator" model (hence the name), meaning it has a concept of a "current position" and methods like first(), last(), next(), and previous() that update the current position. All BreakIterators uphold the following invariants:
<ul>
<li>The beginning and end of the text are always treated as boundary positions.
<li>The current position of the iterator is always a boundary position (random-access methods move the iterator to the nearest boundary position before or after the specified position, not to the specified position).
<li>DONE is used as a flag to indicate when iteration has stopped. DONE is only returned when the current position is the end of the text and the user calls next(), or when the current position is the beginning of the text and the user calls previous().
<li>Break positions are numbered by the positions of the characters that follow them. Thus, under normal circumstances, the position before the first character is 0, the position after the first character is 1, and the position after the last character is 1 plus the index of the last character (that is, the length of the string).
<li>The client can change the position of an iterator, or the text it analyzes, at will, but cannot change the behavior. A user who wants different behavior must instantiate a new iterator.
</ul>
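The invariants above can be checked with a short sketch. It again assumes java.text.BreakIterator as a stand-in for the Android ICU class; the class name BoundaryInvariants and its helper are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Sketch only: java.text.BreakIterator stands in for android.icu.text.BreakIterator.
final class BoundaryInvariants {
    // Collects every boundary position, from first() until next() reports DONE.
    static List<Integer> boundaries(BreakIterator it, String text) {
        it.setText(text);
        List<Integer> out = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            out.add(pos);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Hello world";
        BreakIterator words = BreakIterator.getWordInstance();
        System.out.println(boundaries(words, text));            // [0, 5, 6, 11]
        System.out.println(words.last() == text.length());      // true: end of text is a boundary
        System.out.println(words.next() == BreakIterator.DONE); // true: next() at the end
        System.out.println(words.following(2));                 // 5: snaps to the next boundary
        System.out.println(words.current());                    // 5: current position is a boundary
    }
}
```

Note that following(2) does not leave the iterator at offset 2; it moves to the first boundary after that offset.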
BreakIterator accesses the text it analyzes through a CharacterIterator, which makes it possible to use BreakIterator to analyze text in any text-storage vehicle that provides a CharacterIterator interface.
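As a minimal sketch of feeding text through a CharacterIterator, the following uses the JDK's StringCharacterIterator with java.text.BreakIterator (a stand-in for the Android ICU class, which also accepts a CharacterIterator). The class name CharacterIteratorSource is hypothetical; any other CharacterIterator implementation could be passed in the same way.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.ArrayList;
import java.util.List;

// Sketch only: java.text.BreakIterator stands in for android.icu.text.BreakIterator.
final class CharacterIteratorSource {
    // Returns the sentence boundaries of whatever text the CharacterIterator exposes.
    static List<Integer> sentenceBoundaries(CharacterIterator source) {
        BreakIterator sentences = BreakIterator.getSentenceInstance();
        sentences.setText(source);
        List<Integer> out = new ArrayList<>();
        for (int pos = sentences.first(); pos != BreakIterator.DONE; pos = sentences.next()) {
            out.add(pos);
        }
        return out;
    }

    public static void main(String[] args) {
        CharacterIterator source = new StringCharacterIterator("Hi there. Bye now.");
        System.out.println(sentenceBoundaries(source));
    }
}
```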
<b>Note:</b> Some types of BreakIterator can take a long time to create, and instances of BreakIterator are not currently cached by the system. For optimal performance, keep instances of BreakIterator around as long as makes sense. For example, when word-wrapping a document, don't create and destroy a new BreakIterator for each line. Create one break iterator for the whole document (or whatever stretch of text you're wrapping) and use it to do the whole job of wrapping the text.
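The reuse advice above might look like the following sketch: one line-break iterator is created once and handed to a wrapping helper for every paragraph. It assumes java.text.BreakIterator as a stand-in for the Android ICU class; the class LineWrapper, its wrap method, and the greedy width rule (counting chars, including trailing spaces) are hypothetical simplifications, not how any particular text editor measures lines.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Sketch only: java.text.BreakIterator stands in for android.icu.text.BreakIterator.
final class LineWrapper {
    // Greedy wrap: breaks at the last legal line-break position that still fits.
    // The caller supplies the iterator so it can be reused across paragraphs.
    static List<String> wrap(BreakIterator lines, String text, int width) {
        lines.setText(text); // re-targeting an existing iterator is cheap compared to creating one
        List<String> out = new ArrayList<>();
        int lineStart = lines.first();
        int lastBreak = lineStart;
        for (int pos = lines.next(); pos != BreakIterator.DONE; pos = lines.next()) {
            if (pos - lineStart > width && lastBreak > lineStart) {
                out.add(text.substring(lineStart, lastBreak).trim());
                lineStart = lastBreak;
            }
            lastBreak = pos;
        }
        if (lineStart < text.length()) {
            out.add(text.substring(lineStart).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        BreakIterator lines = BreakIterator.getLineInstance(); // created once
        for (String paragraph : new String[] {"The quick brown fox", "jumps over the lazy dog"}) {
            System.out.println(wrap(lines, paragraph, 10));
        }
    }
}
```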
<strong>Examples</strong>:
Creating and using text boundaries <blockquote>
public static void main(String[] args) {
    if (args.length == 1) {
        String stringToExamine = args[0];
        // print each word in order
        BreakIterator boundary = BreakIterator.getWordInstance();
        boundary.setText(stringToExamine);
        printEachForward(boundary, stringToExamine);
        // print each sentence in reverse order
        boundary = BreakIterator.getSentenceInstance(Locale.US);
        boundary.setText(stringToExamine);
        printEachBackward(boundary, stringToExamine);
        printFirst(boundary, stringToExamine);
        printLast(boundary, stringToExamine);
    }
}
</blockquote>
Print each element in order <blockquote>
public static void printEachForward(BreakIterator boundary, String source) {
    int start = boundary.first();
    for (int end = boundary.next();
         end != BreakIterator.DONE;
         start = end, end = boundary.next()) {
        System.out.println(source.substring(start, end));
    }
}
</blockquote>
Print each element in reverse order <blockquote>
public static void printEachBackward(BreakIterator boundary, String source) {
    int end = boundary.last();
    for (int start = boundary.previous();
         start != BreakIterator.DONE;
         end = start, start = boundary.previous()) {
        System.out.println(source.substring(start, end));
    }
}
</blockquote>
Print first element <blockquote>
public static void printFirst(BreakIterator boundary, String source) {
    int start = boundary.first();
    int end = boundary.next();
    System.out.println(source.substring(start, end));
}
</blockquote>
Print last element <blockquote>
public static void printLast(BreakIterator boundary, String source) {
    int end = boundary.last();
    int start = boundary.previous();
    System.out.println(source.substring(start, end));
}
</blockquote>
Print the element at a specified position <blockquote>
public static void printAt(BreakIterator boundary, int pos, String source) {
    int end = boundary.following(pos);
    int start = boundary.previous();
    System.out.println(source.substring(start, end));
}
</blockquote>
Find the next word <blockquote>
public static int nextWordStartAfter(int pos, String text) {
    BreakIterator wb = BreakIterator.getWordInstance();
    wb.setText(text);
    int wordStart = wb.following(pos);
    for (;;) {
        int wordLimit = wb.next();
        if (wordLimit == BreakIterator.DONE) {
            return BreakIterator.DONE;
        }
        // skip segments tagged as whitespace or punctuation
        int wordStatus = wb.getRuleStatus();
        if (wordStatus != BreakIterator.WORD_NONE) {
            return wordStart;
        }
        wordStart = wordLimit;
    }
}
The iterator returned by getWordInstance() is unique in that the break positions it returns don't represent both the start and end of the thing being iterated over. That is, a sentence-break iterator returns breaks that each represent the end of one sentence and the beginning of the next. With the word-break iterator, the characters between two boundaries might be a word, or they might be the punctuation or whitespace between two words. The above code uses getRuleStatus() to identify and ignore boundaries associated with punctuation or other non-word characters. </blockquote>
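The same filtering idea can be sketched on a plain JDK, where java.text.BreakIterator has no getRuleStatus(). The sketch below substitutes a hypothetical check, Character.isLetterOrDigit on the first code point of each segment, for the rule-status test. This is an approximation chosen for illustration only; on Android, getRuleStatus() as shown above is the documented approach. The class name WordExtractor is hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Sketch only: java.text.BreakIterator stands in for android.icu.text.BreakIterator.
final class WordExtractor {
    // Returns the segments between word boundaries that look like real words.
    static List<String> words(String text) {
        BreakIterator wb = BreakIterator.getWordInstance();
        wb.setText(text);
        List<String> words = new ArrayList<>();
        int start = wb.first();
        for (int end = wb.next(); end != BreakIterator.DONE; start = end, end = wb.next()) {
            // Assumed stand-in for "getRuleStatus() != WORD_NONE":
            // keep segments that begin with a letter or digit.
            if (Character.isLetterOrDigit(text.codePointAt(start))) {
                words.add(text.substring(start, end));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world! 42")); // [Hello, world, 42]
    }
}
```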
Java documentation for android.icu.text.BreakIterator.
Portions of this page are modifications based on work created and shared by the Android Open Source Project and used according to terms described in the Creative Commons 2.5 Attribution License.
Constructors
BreakIterator() |
Default constructor. |
BreakIterator(IntPtr, JniHandleOwnership) |
Fields
Done |
DONE is returned by previous() and next() after all valid boundaries have been returned. |
KindCharacter |
Obsolete.
<strong>[icu]</strong> |
KindLine |
Obsolete.
<strong>[icu]</strong> |
KindSentence |
Obsolete.
<strong>[icu]</strong> |
KindTitle |
<strong>[icu]</strong> |
KindWord |
Obsolete.
<strong>[icu]</strong> |
WordIdeo |
Obsolete.
Tag value for words containing ideographic characters, lower limit |
WordIdeoLimit |
Obsolete.
Tag value for words containing ideographic characters, upper limit |
WordKana |
Obsolete.
Tag value for words containing kana characters, lower limit |
WordKanaLimit |
Obsolete.
Tag value for words containing kana characters, upper limit |
WordLetter |
Obsolete.
Tag value for words that contain letters, excluding hiragana, katakana or ideographic characters, lower limit. |
WordLetterLimit |
Obsolete.
Tag value for words containing letters, upper limit |
WordNone |
Obsolete.
Tag value for "words" that do not fit into any other category. |
WordNoneLimit |
Obsolete.
Upper bound for tags for uncategorized words. |
WordNumber |
Obsolete.
Tag value for words that appear to be numbers, lower limit. |
WordNumberLimit |
Obsolete.
Tag value for words that appear to be numbers, upper limit. |
Properties
CharacterInstance |
Returns a new instance of BreakIterator that locates logical-character boundaries. |
Class |
Returns the runtime class of this Object. |
Handle |
The handle to the underlying Android instance. (Inherited from Object) |
JniIdentityHashCode | (Inherited from Object) |
JniPeerMembers | |
LineInstance |
Returns a new instance of BreakIterator that locates legal line-wrapping positions. |
PeerReference | (Inherited from Object) |
RuleStatus |
For RuleBasedBreakIterators, return the status tag from the break rule that determined the boundary at the current iteration position. |
SentenceInstance |
Returns a new instance of BreakIterator that locates sentence boundaries. |
Text | |
ThresholdClass | |
ThresholdType | |
TitleInstance |
<strong>[icu]</strong> Returns a new instance of BreakIterator that locates title boundaries. |
WordInstance |
Returns a new instance of BreakIterator that locates word boundaries. |
Methods
Clone() |
Clone method. |
Current() |
Return the iterator's current position. |
Dispose() | (Inherited from Object) |
Dispose(Boolean) | (Inherited from Object) |
Equals(Object) |
Indicates whether some other object is "equal to" this one. (Inherited from Object) |
First() |
Set the iterator to the first boundary position. |
Following(Int32) |
Sets the iterator's current iteration position to be the first boundary position following the specified position. |
GetAvailableLocales() |
Returns a list of locales for which BreakIterators can be used. |
GetCharacterInstance(Locale) |
Returns a new instance of BreakIterator that locates logical-character boundaries. |
GetCharacterInstance(ULocale) |
<strong>[icu]</strong> Returns a new instance of BreakIterator that locates logical-character boundaries. |
GetHashCode() |
Returns a hash code value for the object. (Inherited from Object) |
GetLineInstance(Locale) |
Returns a new instance of BreakIterator that locates legal line-wrapping positions. |
GetLineInstance(ULocale) |
<strong>[icu]</strong> Returns a new instance of BreakIterator that locates legal line-wrapping positions. |
GetRuleStatusVec(Int32[]) |
For RuleBasedBreakIterators, get the status (tag) values from the break rule(s) that determined the boundary at the current iteration position. |
GetSentenceInstance(Locale) |
Returns a new instance of BreakIterator that locates sentence boundaries. |
GetSentenceInstance(ULocale) |
<strong>[icu]</strong> Returns a new instance of BreakIterator that locates sentence boundaries. |
GetTitleInstance(Locale) |
<strong>[icu]</strong> Returns a new instance of BreakIterator that locates title boundaries. |
GetTitleInstance(ULocale) |
<strong>[icu]</strong> Returns a new instance of BreakIterator that locates title boundaries. |
GetWordInstance(Locale) |
Returns a new instance of BreakIterator that locates word boundaries. |
GetWordInstance(ULocale) |
<strong>[icu]</strong> Returns a new instance of BreakIterator that locates word boundaries. |
IsBoundary(Int32) |
Return true if the specified position is a boundary position. |
JavaFinalize() |
Called by the garbage collector on an object when garbage collection determines that there are no more references to the object. (Inherited from Object) |
Last() |
Set the iterator to the last boundary position. |
Next() |
Advances the iterator forward one boundary. |
Next(Int32) |
Move the iterator by the specified number of steps in the text. |
Notify() |
Wakes up a single thread that is waiting on this object's monitor. (Inherited from Object) |
NotifyAll() |
Wakes up all threads that are waiting on this object's monitor. (Inherited from Object) |
Preceding(Int32) |
Sets the iterator's current iteration position to be the last boundary position preceding the specified position. |
Previous() |
Move the iterator backward one boundary. |
SetHandle(IntPtr, JniHandleOwnership) |
Sets the Handle property. (Inherited from Object) |
SetText(ICharSequence) |
Sets the iterator to analyze a new piece of text. |
SetText(String) |
Sets the iterator to analyze a new piece of text. |
ToArray<T>() | (Inherited from Object) |
ToString() |
Returns a string representation of the object. (Inherited from Object) |
UnregisterFromRuntime() | (Inherited from Object) |
Wait() |
Causes the current thread to wait until it is awakened, typically by being <em>notified</em> or <em>interrupted</em>. (Inherited from Object) |
Wait(Int64, Int32) |
Causes the current thread to wait until it is awakened, typically by being <em>notified</em> or <em>interrupted</em>, or until a certain amount of real time has elapsed. (Inherited from Object) |
Wait(Int64) |
Causes the current thread to wait until it is awakened, typically by being <em>notified</em> or <em>interrupted</em>, or until a certain amount of real time has elapsed. (Inherited from Object) |
Explicit Interface Implementations
IJavaPeerable.Disposed() | (Inherited from Object) |
IJavaPeerable.DisposeUnlessReferenced() | (Inherited from Object) |
IJavaPeerable.Finalized() | (Inherited from Object) |
IJavaPeerable.JniManagedPeerState | (Inherited from Object) |
IJavaPeerable.SetJniIdentityHashCode(Int32) | (Inherited from Object) |
IJavaPeerable.SetJniManagedPeerState(JniManagedPeerStates) | (Inherited from Object) |
IJavaPeerable.SetPeerReference(JniObjectReference) | (Inherited from Object) |
Extension Methods
JavaCast<TResult>(IJavaObject) |
Performs an Android runtime-checked type conversion. |
JavaCast<TResult>(IJavaObject) | |
GetJniTypeName(IJavaPeerable) |
Gets the JNI name of the type of the instance |
JavaAs<TResult>(IJavaPeerable) |
Try to coerce |
TryJavaCast<TResult>(IJavaPeerable, TResult) |
Try to coerce |