Overview
Search in Microsoft® SharePoint® Server 2010 is re-architected with new
components to create greater redundancy within a single farm and to allow
scalability in multiple directions. Each of the components that make up the
query architecture and the crawling architecture can be scaled out separately
based on the needs of an organization. More information:
Query architecture
The query architecture includes query components, index
partitions, and property databases.
About index partitions:
- An index partition is a logical portion of the entire index. The index is the aggregation of all index partitions.
- Index partitions are associated with query components. You deploy a query component that is associated with a particular index partition to a specific server. In this way, index partitions are spread across query servers. For example, in a farm with three index partitions and one query component per partition, each query component contains one-third of the total index.
- Deploying query components that are associated with index partitions across different servers creates a faster query architecture, because the processing power of multiple query servers is used to respond to queries.
- Index partitions can be associated with one or more query components. Multiple query components (mirrors) for a given index partition can be deployed across query servers to achieve redundancy. Typically, two query components are configured for each index partition, and these query components reside on different query servers so that the index partition stays available if one server fails.
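The relationship between index partitions, mirrored query components, and query servers can be illustrated with a small model. This is only a sketch, not SharePoint code: the names `QueryComponent`, `build_topology`, and `pick_components`, the server names, and the round-robin placement rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryComponent:
    name: str
    server: str
    partition: int


def build_topology(partitions, servers, mirrors=2):
    """Place `mirrors` query components per index partition on distinct servers.

    Round-robin placement is an assumption of this sketch, not a SharePoint rule.
    """
    if mirrors > len(servers):
        raise ValueError("need at least as many query servers as mirrors")
    components = []
    for p in range(partitions):
        for m in range(mirrors):
            server = servers[(p + m) % len(servers)]
            components.append(QueryComponent(f"qc-p{p}-m{m}", server, p))
    return components


def pick_components(components, failed_servers=frozenset()):
    """Choose one live query component per partition for a query."""
    chosen = {}
    for c in components:
        if c.server not in failed_servers and c.partition not in chosen:
            chosen[c.partition] = c
    missing = {c.partition for c in components} - set(chosen)
    if missing:
        # Every mirror of at least one partition is down, so part of the index is unreachable.
        raise RuntimeError(f"no live query component for partitions {sorted(missing)}")
    return chosen
```

With three partitions, three servers, and two mirrors each, losing any single server still leaves one query component per partition, which is the redundancy described above. Losing two servers can leave a partition with no live mirror.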
Crawl architecture
The crawl architecture includes several components that can
be scaled out based on crawl volume and performance requirements:
- Crawl component: Multiple crawl components can be deployed to crawl simultaneously. Each crawl component is associated with a crawl database. Crawl components reside on application servers. Crawl components produce portions of the index (per index partition) and propagate them to the servers that are running the query components associated with the given index partition.
- Crawl database: Manages crawl operations and stores crawl history. You can assign multiple crawl components to each crawl database for redundancy. In this case, each crawl component will crawl different content during a crawl.
- Property database: Also considered part of the query architecture; stores properties for crawled data. The number of required property databases depends on the volume of content that is crawled and the amount of metadata that is associated with the content.
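To make the idea that each crawl component crawls different content more concrete, here is a minimal sketch of splitting crawl work between the components that share one crawl database. The function `assign_hosts`, the component names, and the host-hash rule are hypothetical; SharePoint's own distribution logic is not described in this post and may differ.

```python
import zlib
from urllib.parse import urlparse


def assign_hosts(urls, crawl_components):
    """Give every URL of a host to exactly one crawl component.

    Hashing the host name is an assumption of this sketch.
    """
    if not crawl_components:
        raise ValueError("at least one crawl component is required")
    work = {name: [] for name in crawl_components}
    for url in urls:
        host = urlparse(url).netloc.lower()
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        slot = zlib.crc32(host.encode("utf-8")) % len(crawl_components)
        work[crawl_components[slot]].append(url)
    return work
```

Because the assignment depends only on the host name, no URL is handed to two components, and all URLs from one host stay together.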
Search Flow
The search steps:
1. Upload a document or create a new item.
2. The crawl process runs. When a full crawl starts, the start addresses of the content source are moved to the crawl queue. iFilters open the files, and the content index is created on the crawl server. The index is then moved in batches to the query servers, and the relevant data is written to the crawl database.
The crawler uses protocol handlers and iFilters as follows:
a. The crawler retrieves the start addresses of content sources and calls the protocol handler based on the URL's prefix.
b. The protocol handler connects to the content source and extracts system-level metadata and access control list information.
c. The protocol handler identifies the file type of each content item based on the file name extension and calls the appropriate iFilter associated with that file type.
d. The iFilter extracts content, removing any embedded formatting, and then retrieves content item metadata.
e. Content is parsed by one or more language-appropriate word breakers and is added to the content index, also called the full-text index. Metadata and access control lists are added to the Search database.
Additional Reading:
- Manage Crawl Rules: http://go.microsoft.com/fwlink/?LinkID=197051&clcid=0x409
- Best Practices for using crawls: http://go.microsoft.com/fwlink/?LinkID=197052&clcid=0x409
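Crawler steps a through e can be pictured as a small dispatch pipeline. The sketch below is a toy model in Python, not the real protocol handler or iFilter interfaces (those are COM components). The `CONTENT` data, the handler and filter functions, and the regular-expression word breaker are all invented for illustration.

```python
import posixpath
import re
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical in-memory content source standing in for a real site.
CONTENT = {
    "http://intranet/docs/plan.txt": {
        "body": "Budget plan for 2010", "acl": ["finance"], "author": "Ann"},
    "http://intranet/docs/notes.html": {
        "body": "<p>Search <b>budget</b> notes</p>", "acl": ["everyone"], "author": "Bob"},
}


def http_handler(url):
    """Step b: connect to the source, return raw content, system metadata, and ACL."""
    item = CONTENT[url]
    return item["body"], {"url": url, "author": item["author"]}, item["acl"]


def text_filter(raw):
    return raw


def html_filter(raw):
    """Step d: strip embedded formatting (here, HTML tags)."""
    return re.sub(r"<[^>]+>", " ", raw)


PROTOCOL_HANDLERS = {"http": http_handler}            # keyed by URL prefix (scheme)
IFILTERS = {".txt": text_filter, ".html": html_filter}  # keyed by file name extension


def word_break(text):
    """Step e: a naive word breaker; real ones are language-specific."""
    return re.findall(r"\w+", text.lower())


def crawl(start_addresses):
    full_text_index = defaultdict(set)  # word -> set of URLs
    search_db = {}                      # URL -> metadata and ACL
    for url in start_addresses:
        handler = PROTOCOL_HANDLERS[urlparse(url).scheme]        # step a
        raw, metadata, acl = handler(url)                        # step b
        ext = posixpath.splitext(urlparse(url).path)[1].lower()  # step c
        text = IFILTERS[ext](raw)                                # step d
        for word in word_break(text):                            # step e
            full_text_index[word].add(url)
        search_db[url] = {"metadata": metadata, "acl": acl}
    return full_text_index, search_db
```

Calling `crawl(list(CONTENT))` returns a word-to-URL index plus a separate store for metadata and access control lists, mirroring the split between the full-text index and the Search database described in step e.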
3. The properties of the crawled items are written to the property databases.
4. The user enters a keyword.
5. The query flow: the web front end (WFE) serving the request uses the associated search service application proxy to connect to a server running the Query and Site Settings Service, also known as the Query Processor. It uses WCF for this communication. The Query Processor connects to the following components to gather results, then merges and security-trims them and returns them to the WFE:
- Query component (holds the entire index or a partition of the index)
- Property store database (holds metadata and properties of indexed content)
- Search administration database (holds security descriptors and configuration data)
The WFE then displays the search results to the user.
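The gather, merge, and security-trim work of the Query Processor can be sketched as follows. This is an illustrative model only: the function name `query_processor`, the dictionary shapes, and the group-based trimming rule are assumptions, and ranking is left out entirely.

```python
def query_processor(keyword, user_groups, query_components,
                    property_store, security_descriptors):
    """Gather hits from every query component, merge, trim, and add properties.

    query_components: list of dicts (one per index partition), word -> set of doc ids
    property_store: doc id -> dict of properties
    security_descriptors: doc id -> set of groups allowed to read the item
    """
    keyword = keyword.lower()

    # Gather and merge: each query component holds only part of the index.
    hits = set()
    for component in query_components:
        hits |= component.get(keyword, set())

    results = []
    for doc_id in sorted(hits):
        # Security trimming: drop items the user is not allowed to see.
        allowed = security_descriptors.get(doc_id, set())
        if allowed & set(user_groups):
            results.append({"id": doc_id, **property_store.get(doc_id, {})})
    return results  # handed back to the WFE for display
```

A user in the `everyone` group searching for `budget` would get back only the items whose descriptor includes that group, even if other partitions also matched the keyword.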