Tuesday, April 9, 2013

SharePoint Server 2010 Search Architecture


Overview

Search in Microsoft® SharePoint® Server 2010 is re-architected with new components to create greater redundancy within a single farm and to allow scalability in multiple directions. Each of the components that make up the query architecture and the crawling architecture can be scaled out separately based on the needs of an organization. More information:

Query architecture

The query architecture includes query components, index partitions, and property databases.
About index partitions:
-         An index partition is a logical portion of the entire index. The index is the aggregation of all index partitions.
-         Index partitions are associated with query components. You deploy a query component that is associated with a particular index partition to a specific server. In this way, index partitions are spread across query servers. For example, in a farm with three index partitions and one query component per partition, each query component contains one-third of the total index.
-         Deploying query components that are associated with index partitions across different servers creates a faster query architecture, because the processing power of multiple query servers is used to respond to queries.
-         Index partitions can be associated with one or more query components. Multiple query components (mirrors) for a given index partition can be deployed across query servers to achieve redundancy. Typically, two query components are configured for each index partition, and these query components reside on different query servers to achieve redundancy of the index partition.
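The relationship between index partitions, query components, and mirrors can be sketched in a few lines of Python. This is only an illustrative model, not SharePoint code: the server names, the `TOPOLOGY` layout, and the hash-based `partition_for` rule are assumptions made for the example.

```python
from zlib import crc32


def partition_for(doc_id: str, partition_count: int) -> int:
    """Map a document to an index partition (hypothetical hash-based rule)."""
    return crc32(doc_id.encode("utf-8")) % partition_count


# Hypothetical farm: three index partitions, each with two query components
# (mirrors) placed on different query servers.
TOPOLOGY = {
    0: ["QUERY1", "QUERY2"],
    1: ["QUERY2", "QUERY3"],
    2: ["QUERY3", "QUERY1"],
}


def servers_for_query(topology, failed=frozenset()):
    """Pick one live query component per partition.

    A query has to reach every partition, because each partition holds
    only a portion of the whole index.
    """
    chosen = {}
    for partition, servers in topology.items():
        live = [s for s in servers if s not in failed]
        if not live:
            raise RuntimeError(
                f"index partition {partition} has no live query component"
            )
        chosen[partition] = live[0]
    return chosen


if __name__ == "__main__":
    print(servers_for_query(TOPOLOGY))
    # With one server down, the mirror on another server takes over.
    print(servers_for_query(TOPOLOGY, failed={"QUERY1"}))
```

In this model, losing a single server still leaves every partition reachable. That is the reason for placing the two mirrors of a partition on different query servers.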

Crawl architecture

The crawl architecture includes several components that can be scaled out based on crawl volume and performance requirements:
-         Crawl component — multiple crawl components can be deployed to crawl simultaneously. Each crawl component is associated with a crawl database. Crawl components reside on application servers. Crawl components produce portions of the index (per index partition) and propagate them to the servers that are running the query components associated with the given index partition.
-         Crawl database — Manages crawl operations and stores crawl history. You can assign multiple crawl components to each crawl database for redundancy. In this case, each crawl component will crawl different content during a crawl.
-         Property database — Also considered part of the query architecture; stores properties for crawled data. The number of required property databases depends on the volume of content that is crawled and the amount of metadata that is associated with the content.
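To illustrate how several crawl components attached to the same crawl database end up crawling different content, here is a minimal sketch. The round-robin split and the component names are assumptions made for the example; the actual distribution logic inside SharePoint is not described here.

```python
from itertools import cycle


def assign_urls(urls, components):
    """Split one crawl database's queue among its crawl components.

    Round-robin is an assumed policy, used only to show that each
    component receives a different share of the content.
    """
    if not components:
        raise ValueError("a crawl database needs at least one crawl component")
    plan = {component: [] for component in components}
    for url, component in zip(urls, cycle(components)):
        plan[component].append(url)
    return plan


if __name__ == "__main__":
    queue = ["http://intranet/a", "http://intranet/b", "http://intranet/c"]
    print(assign_urls(queue, ["CRAWL1", "CRAWL2"]))
    # If one component is unavailable, the remaining one takes the whole queue.
    print(assign_urls(queue, ["CRAWL2"]))
```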

Search Flow



The search steps:
1.         Upload a document or create a new item.
2.         The crawl process runs: when a full crawl starts, the start addresses of the content source are moved to the crawl queue. iFilters open the files, and the content index is created on the crawl server. The index is then moved in batches to the query servers, and the relevant data is written to the crawl database.
The crawler uses protocol handlers and iFilters as follows:
a.     The crawler retrieves the start addresses of content sources and calls the protocol handler based on the URL’s prefix.
b.    The protocol handler connects to the content source and extracts system-level metadata and access control list information.
c.     The protocol handler identifies the file type of each content item based on the file name extension and calls the appropriate iFilter associated with that file type.
d.    The iFilter extracts content, removing any embedded formatting, and then retrieves content item metadata.
e.     Content is parsed by one or more language-appropriate word breakers and is added to the content index, also called the full-text index. Metadata and access control lists are added to the Search database.
Additional Reading:
-                 Manage Crawl Rules:  (http://go.microsoft.com/fwlink/?LinkID=197051&clcid=0x409)
-                 Best Practices for using crawls: (http://go.microsoft.com/fwlink/?LinkID=197052&clcid=0x409)
3.         Properties of the crawled content are written to the property databases.
4.         The user enters a keyword query.
5.         The query flow: the web front end (WFE) serving the request uses the associated Search service application proxy to connect to a server running the Query and Site Settings Service, also known as the Query Processor. This communication uses WCF. The Query Processor connects to the following components to gather results, then merges and security-trims them before returning them to the WFE:
-         Query component (holds the entire index or one partition of the index)
-         Property store database (holds metadata and properties of indexed content)
-         Search administration database (holds security descriptors and configuration data)
The WFE then displays the search results to the user.
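The flow above, from crawl to security-trimmed results, can be mimicked with a toy in-memory model. This is a simplified sketch, not how SharePoint is implemented: the `SearchSketch` class, its method names, the regular-expression word breaker, and the sample URLs are all hypothetical.

```python
import re
from collections import defaultdict


def word_break(text):
    """Very rough stand-in for a language-specific word breaker."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchSketch:
    def __init__(self):
        self.index = defaultdict(set)  # term -> URLs (full-text index)
        self.properties = {}           # URL -> metadata (property database)
        self.acls = {}                 # URL -> users allowed to read the item

    def crawl(self, url, text, metadata, allowed_users):
        """Index the extracted text and store metadata and ACL information."""
        for term in word_break(text):
            self.index[term].add(url)
        self.properties[url] = dict(metadata)
        self.acls[url] = set(allowed_users)

    def query(self, keywords, user):
        """Return items matching every keyword that the user may read."""
        terms = word_break(keywords)
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        trimmed = sorted(u for u in hits if user in self.acls[u])  # security trimming
        return [{"url": u, **self.properties[u]} for u in trimmed]


if __name__ == "__main__":
    search = SearchSketch()
    search.crawl("http://intranet/a.docx", "Quarterly budget report",
                 {"title": "Budget"}, {"alice"})
    search.crawl("http://intranet/b.docx", "Budget draft",
                 {"title": "Draft"}, {"alice", "bob"})
    print(search.query("budget", "bob"))
```

Here `bob` sees only the draft, because the other matching item is removed by the access check before the results are returned.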
