The search engine in the company
First of all, there is content; everything begins with it. Like an infection in the information system, it multiplies, spreads and invades every part of the system. Then it settles in and moves around: just when you think it is gone, you realize it never dies.
Soon users suffer from a disease: acute "searchitis". They search tirelessly and continuously, but find little or nothing of what they need. The time spent searching and navigating the maze of directories is a huge loss for the company, a real waste. And once the precious document is finally found, the user cannot be certain of having its latest version. To overcome this aberration, there are tools: search engines.
This article aims to take stock of search engines in the enterprise, and also to share my vision of their future (although some aspects are not as futuristic as they sound).
We will first look at the history of information management in three major steps: before computers, the early days of shared directories, and today with GED (electronic document management). We will then look at what a search engine is useful for, and at its possible future.
The history of information management
30 years ago, before information technology became widespread, information management was already a headache for companies. Information lived in paper documents, and storing them meant filing them in cardboard folders. These folders were in turn filed in cabinets. When it was time to archive, the cabinets (or at least their contents) were moved to another room: the archive room.
Navigating and finding information was a job in itself. Communication and document exchange went through pneumatic tubes or internal couriers, to move information quickly within the company.
15 years ago, in the 90s, as computing became widespread, paper documents slowly gave way to electronic documents, and the company's information system was often based on a shared directory holding all the information. The volume of documents was, at the time, limited by technical capabilities: 15 years ago, the cost of storage was nothing like what it is today.
The electronic document, twin brother of the paper document, inherited the same genetic traits: it is unstructured, often poorly organized, and quickly takes up a lot of space. Filing it is the same puzzle as for its brother. It is placed in virtual folders (called directories), themselves stored in a cabinet (called a hard drive), which is itself kept in a building (called a server).
The major difference between the two twins is the space they occupy... Nevertheless, the way things are stored remains the same as with paper documents, in order to reproduce what users already know and keep them comfortable.
With the emergence of the Internet in the 90s and the growing use of email, information sharing took on another form.
Today, everything has accelerated; the amount of electronic documents stored is growing exponentially. With the laws on legal electronic archiving, the arrival of new media such as sound and video, and social networking, never before has so much information been stored.
Storage is so cheap that wasting disk space no longer makes anyone feel guilty, which paves the way for massive redundancy.
My first computer, in 1996, had 1.6 GB of disk space; every day was a battle not to fill up that disk. Today my computer has 465 GB, and the challenge is rather to manage to fill it.
In business we see the same scenario: the plethora of documents and their duplicates is huge and still growing. The problem is no longer storing information but finding it, and above all finding its latest valid version.
How many times has a user spent an enormous amount of time looking for a document, only to give up and ask the author (if he is lucky enough to know who that is) to send it again? The moral: we store this umpteenth copy in yet another location, making the information system even more nebulous.
For example, suppose I send a document to 5 people for validation. I send the document stored in my "My Documents" directory; it goes into my outbox and arrives in the 5 inboxes of my colleagues, who save it in their own "My Documents" directory and then send it back to me validated. It ends up again in 5 outboxes before arriving in my inbox as 5 copies. To merge them, I copy the 5 documents into my "My Documents" folder and produce a merged version. That gives:
- 1 original document in my "My Documents"
- 1 document in my outbox
- 5 documents in my colleagues' inboxes
- 5 documents in "My Documents" on my colleagues' computers
- 5 documents in their outboxes
- 5 documents in my inbox
- 5 documents in my "My Documents"
- 1 merged document
That is 28 copies of the same document. Three weeks from now, if you look for it, which one will you pick?
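The tally above is simple arithmetic. As a sanity check, here is a minimal Python sketch that reproduces it and generalizes it to any number of reviewers; the function names are ours, for illustration only.

```python
from typing import Dict


def copies_for_review(n_reviewers: int) -> Dict[str, int]:
    """Copies created when a document is emailed to n_reviewers people for
    validation and sent back, following the breakdown in the list above."""
    return {
        "original in my My Documents": 1,
        "my outbox": 1,
        "reviewers' inboxes": n_reviewers,
        "reviewers' My Documents": n_reviewers,
        "reviewers' outboxes": n_reviewers,
        "my inbox": n_reviewers,
        "returned copies in my My Documents": n_reviewers,
        "merged document": 1,
    }


def total_copies(n_reviewers: int) -> int:
    # 3 fixed copies + 5 copies per reviewer
    return sum(copies_for_review(n_reviewers).values())


print(total_copies(5))  # 28
```

Each additional reviewer adds five copies, so the total is 3 + 5n: already 53 with ten reviewers.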
So we come up with GED solutions that poorly mimic directory-based filing and merely shift the problem.
But they do offer some advantages:
- exchanging links rather than the document itself;
- taxonomy, which allows a document to be categorized in several "folders" called tags, provided the people using it respect it;
- folksonomy, which lets anyone create their own tags, no matter what, so you end up with tags like (SharePoint, CherePoint, SharePoint) and every variant that typos and phonetic spellings can produce.
And some tools are very poorly used. How many times have I seen a migration from a shared directory to SharePoint that was just one big cut and paste, with the user saying, with a touch of irony: "I don't understand, I'm still lost even though we paid dearly for a new tool."
In fact, in 30 years, little has really changed.
Of course the picture is deliberately exaggerated, and not everything is so black.
The scary part is that one cause of the explosion of internet content is social networking, which is about to arrive in business. The volume of content and information will explode in companies deploying a corporate social network.
The private / professional paradox
In the company it is becoming very difficult to find information, and especially to get the most up-to-date information.
User confidence is often sorely tested when users cannot find all the information they need.
Worse, most of the time the information is scattered across different applications (documents stored in a shared directory, customer information in the CRM, not to mention the GED, the ERP and email).
In the best case, these applications come from the same vendor and can easily integrate and communicate with each other. But in reality, applications are chosen based on decisions made at deployment time, and on affinities, if not interests, in working with a particular vendor. In the end the applications are not integrated at all, and consultants and developers cobble together interoperability patches with great ingenuity everywhere.
So in the company, as employees, we search: we spend our time looking for information and asking different colleagues for it.
Various studies show that an employee loses up to 8 hours per week searching for information within the company.
But online, how do I find information? How do you find an article about nuclear power plants, earthquakes or famine in Africa? You simply go to Google or Bing. Even more, a growing number of people type the address they are looking for (or want to visit) directly into the search engine. Bookmarks are now seldom used.
The diagram below shows the explosion in the number of
As we can see, if the user has more than one input field available, he feels drowned in information, or even assaulted by it (I am thinking of certain content or intranet managers who insist on putting everything on the home page, drowning the user in a flood of information).
We arrive at the following paradox: it is easier to find information on the internet, a generic system that by definition must suit everyone, and which is also far larger and much less structured than the company's information system.
The reason for this paradox is simple: on the internet there are search engines that find information. These search engines index, day and night, the contents of every site and, increasingly, every file format (HTML, PDF, Word, PowerPoint...).
They rely on relevance algorithms that put what the user wants, based on a keyword, at the top of the results. To improve the user experience, especially for filtering, search engines come with modern refinement panels.
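To make "index", "relevance" and "refiners" concrete, here is a toy Python sketch: an inverted index, a TF-IDF-style score (a classic textbook weighting, not the actual algorithm of Google, Bing or any product), and facet counts of the kind a refinement panel displays. The `format` field and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters/digits
    return re.findall(r"\w+", text.lower())


class TinyIndex:
    """Toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.fields = {}

    def add(self, doc_id, text, **fields):
        self.fields[doc_id] = fields
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query):
        """Doc ids sorted by a TF-IDF score, best first."""
        n_docs = len(self.fields)
        scores = Counter()
        for term in tokenize(query):
            docs = self.postings.get(term)
            if not docs:
                continue
            # Rarer terms weigh more; +1 keeps terms found everywhere non-zero
            idf = math.log(n_docs / len(docs)) + 1.0
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        return [doc_id for doc_id, _ in scores.most_common()]

    def refiners(self, hits, field):
        """Facet counts for a refinement panel, e.g. by file format."""
        return Counter(self.fields[d].get(field) for d in hits)


idx = TinyIndex()
idx.add("a", "Nuclear power plant safety report", format="PDF")
idx.add("b", "Earthquake report", format="Word")
idx.add("c", "Nuclear plant: nuclear waste", format="PDF")
hits = idx.search("nuclear")
print(hits)                          # ['c', 'a']
print(idx.refiners(hits, "format"))  # Counter({'PDF': 2})
```

Document "c" ranks first because it mentions the keyword twice; real engines add many more signals (links, freshness, field weights).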
In business, until recently, we were far from that. Organizations that deploy a search engine are rare. Moreover, most search engines are not really suited to the enterprise, or only to a specific use.
The search engine in the company
We have seen that real solutions exist on the Internet and can be adapted to the enterprise. Of course, many people retort, "I have a search engine in my GED." The answer is quick: "Yes, but if it only searches what is stored in your GED, then forget it!"
In the enterprise, the big advantage lies in a more or less advanced mastery of the information system, but above all in knowledge of the business domain.
Consider, for example, the French word "fonds"/"fond": in a bank it means a fund, a financial investment, whereas in a furniture factory it refers to a part of a piece of furniture. The search engine can be fully adapted to the company and customized to its business and the associated vocabulary. It can therefore apply different rules to the word "fond" for our bank and for our furniture manufacturer.
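A minimal Python sketch of such business-specific rules, assuming each deployment ships its own synonym dictionary that is applied to queries before they reach the index; the dictionaries and the `expand_query` name are hypothetical.

```python
from typing import Dict, List

# Hypothetical business vocabularies: each deployment gets its own rules
DOMAIN_RULES: Dict[str, Dict[str, List[str]]] = {
    "bank": {"fond": ["fond", "fonds", "investment fund"]},
    "furniture": {"fond": ["fond", "furniture part"]},
}


def expand_query(query: str, domain: str) -> List[str]:
    """Replace each query term by its business-specific variants, if any."""
    rules = DOMAIN_RULES.get(domain, {})
    expanded: List[str] = []
    for term in query.lower().split():
        for variant in rules.get(term, [term]):
            if variant not in expanded:  # avoid duplicate terms
                expanded.append(variant)
    return expanded


print(expand_query("Fond performance", "bank"))
# ['fond', 'fonds', 'investment fund', 'performance']
print(expand_query("Fond performance", "furniture"))
# ['fond', 'furniture part', 'performance']
```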
Given the heterogeneity of file types in the enterprise, the enterprise search engine must understand as many formats as possible, ideally all the formats used by the company.
At a time of globalization, the internationalization of the information system is becoming widespread; the search engine must also recognize the languages used within the company. Take the word "car", for example:
- in English it refers to an automobile, which can be very useful to index in a car-related business;
- in French it means "because", so it must be treated as a stop word and not be indexed.
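A minimal Python sketch of language-aware indexing, assuming the language of each document is already known (language detection is left out); the stop-word lists are tiny, hypothetical samples.

```python
import re
from typing import Dict, List, Set

# Tiny hypothetical stop-word lists; real engines ship much larger ones
STOP_WORDS: Dict[str, Set[str]] = {
    "en": {"the", "a", "is", "and", "of"},
    "fr": {"le", "la", "est", "et", "de", "car"},  # French "car" means "because"
}


def index_terms(text: str, lang: str) -> List[str]:
    """Terms worth indexing for a document whose language is already known."""
    stops = STOP_WORDS.get(lang, set())
    return [t for t in re.findall(r"\w+", text.lower()) if t not in stops]


print(index_terms("The car is red", "en"))         # ['car', 'red']
print(index_terms("Je reste car il pleut", "fr"))  # ['je', 'reste', 'il', 'pleut']
```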
The enterprise search engine must adapt to its user and to his function or role within the company. A salesperson does not need the same results for the search "SharePoint" as a technician or a developer; if the search adapts to the user, that is a plus.
Search results also need to be filterable and sortable to be accurate, and this must be customizable. Sorting by creation date instead of relevance, for example, brings the most recently created document to the top.
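Role-based adaptation and date sorting can both be sketched as a post-processing step on the engine's results. In this minimal Python sketch, the `audience` metadata field and the boost factor of 2.0 are assumptions for illustration, not settings of any product.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Hit:
    title: str
    relevance: float  # score returned by the engine
    audience: str     # hypothetical metadata: the role a document targets
    created: date


def personalize(hits: List[Hit], role: str, boost: float = 2.0,
                sort_by: str = "relevance") -> List[Hit]:
    """Sort hits by creation date, or by relevance boosted for the user's role."""
    if sort_by == "created":
        return sorted(hits, key=lambda h: h.created, reverse=True)

    def score(h: Hit) -> float:
        return h.relevance * (boost if h.audience == role else 1.0)

    return sorted(hits, key=score, reverse=True)


hits = [
    Hit("SharePoint licensing offer", 1.0, "sales", date(2010, 3, 1)),
    Hit("SharePoint API guide", 1.5, "developer", date(2009, 6, 1)),
    Hit("SharePoint install checklist", 1.2, "technician", date(2011, 1, 10)),
]
print([h.title for h in personalize(hits, "sales")])
print([h.title for h in personalize(hits, "developer", sort_by="created")])
```

The salesperson sees the licensing offer first, while sorting by date puts the 2011 checklist on top regardless of role.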
Search in the company must also cover ALL of the company's information system, and not only one software vendor's search engine. So if the company has FileNet, Documentum, a shared directory or SharePoint, a keyword search will return everything found in all these information containers; and if email is indexed as well, even better.
Federation is also very important in an era of interconnected social networks; it is inconceivable to have search engines that search without relying on others. If a search finds nothing in the company's information system, the search engine should automatically display the results of the same search in another engine such as Bing or Google:
SharePoint and Bing federation.
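Both ideas, querying every internal container and falling back to a web engine when nothing is found, can be sketched in a few lines of Python. The connectors below are hypothetical stand-ins; real ones would call each product's own API, which is not shown here.

```python
from typing import Callable, Dict, List

# A connector takes a query and returns matching titles
Connector = Callable[[str], List[str]]


def federated_search(query: str, internal: Dict[str, Connector],
                     external: Connector) -> List[str]:
    """Query every internal container, then fall back to an external
    engine only if none of them returned anything."""
    results: List[str] = []
    for source, search in internal.items():
        results.extend(f"[{source}] {hit}" for hit in search(query))
    if not results:
        results = [f"[web] {hit}" for hit in external(query)]
    return results


# Hypothetical stand-in connectors
internal = {
    "SharePoint": lambda q: ["Q3 memo"] if "memo" in q else [],
    "File share": lambda q: ["memo draft.doc"] if "memo" in q else [],
}


def web(query: str) -> List[str]:
    return [f"Result about {query}"]


print(federated_search("memo", internal, web))
print(federated_search("earthquake", internal, web))
```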
Search can also be merged directly into the search on the user's workstation. From his workstation, by searching for a specific term, the user can get all the results at once, bringing together his email, the company's information system and his local machine.
A thumbnail preview of the results is also a plus: without thumbnails, you download the document, open it, realize it is not the right one, close it and delete the local copy before downloading another one from the search engine. With thumbnails, you get a preview at a glance (http://thumbextsp.codeplex.com/ ); Google has understood this very well.
However, internet search engines as such are not suited to the enterprise at all, only to the internet. The life cycle and update rate of content on the internet are not the same as in business. It may be acceptable for a blog post published in January to show up in the search engine in June. Now imagine the same thing with a memo from your boss! You see what I mean...
Moreover, search engines rely on content, both on the Internet and in many of the search engines sold to businesses. Thus, a search for the keyword "Molière" returns 8,830,000 results in Google, while a search for "Jean Baptiste Poquelin" returns 495,000, even though they are the same person. This is because Jean-Baptiste Poquelin is mostly known by his alias, Molière, and since Google relies on content, there is no relationship between the two names.
The enterprise search engine should be smarter and be able to "learn" from the information in the company; this is where the notion of the semantic web comes in.
The future of search engine
Semantic search and Web 3.0
The Semantic Web is a set of technologies designed to make the content of World Wide Web resources accessible and usable by software agents and programs, through a system of formal metadata, using in particular the family of languages developed by the W3C.
Work is still in progress, but there has been obvious progress. Some search engines can already extract so-called metadata within "an enclosed space such as the company".
For example, Luxembourg banks increasingly anonymize records and documents: in many of them the customer name is not mentioned, only a customer reference.
A search engine like the internet search engines relies on document content. Client John Doe's documentation does not contain the words "John Doe", since it is anonymized, but contains the reference "1234"; it will therefore not be found by the search engine when searching for the keywords "John Doe". You would instead have to search by customer reference to get the desired result, as with Molière and Jean-Baptiste Poquelin.
An intelligent, semantic search engine will identify the reference "1234" and query the customer database for the mapping between the client and his reference. A search for John Doe will then bring up the documents containing "John Doe", but also those containing the reference "1234".
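A minimal Python sketch of this query expansion, assuming the customer database can be read as a simple name-to-reference table; the table contents and function names are hypothetical, and plain substring matching stands in for a real index.

```python
from typing import Dict, List

# Hypothetical extract of the customer database: name -> anonymized reference
CUSTOMER_REFS: Dict[str, str] = {"john doe": "1234", "jane roe": "5678"}


def expand_with_references(query: str) -> List[str]:
    """Return the query plus the reference of the customer it names, if any."""
    terms = [query.lower().strip()]
    ref = CUSTOMER_REFS.get(terms[0])
    if ref is not None:
        terms.append(ref)
    return terms


def search_documents(docs: Dict[str, str], query: str) -> List[str]:
    # Naive substring matching, enough to show the effect of the expansion
    terms = expand_with_references(query)
    return [doc_id for doc_id, text in docs.items()
            if any(term in text.lower() for term in terms)]


docs = {
    "letter.pdf": "Dear John Doe, please find enclosed...",
    "statement.pdf": "Client reference 1234, balance as of March",
    "other.pdf": "Client reference 5678, balance as of March",
}
print(search_documents(docs, "John Doe"))  # ['letter.pdf', 'statement.pdf']
```

The same lookup table could just as well hold aliases such as Molière / Jean-Baptiste Poquelin.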
Search and information extraction from a database
This full integration into the information system makes things transparent for the user. For him, it does not matter whether the information lives in the latest SQL Server, the best GED in the world or the latest collaboration portal. What he wants is access to information, as simply as on the internet.
Consider another example:
a set of documents about funds in a bank.
If you have a list of the names of funds and fund administrators, you can "teach" your search engine the names of these funds and administrators, and even the link between each of them. So if the name of an administrator of fund A appears in a document, you can automatically tag the document with the name of the fund AND the name of the administrator. Communication with other machines and software then becomes very simple.
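A minimal Python sketch of this dictionary-based tagging, assuming the fund/administrator list can be exported from the bank's systems; the names below are invented for illustration.

```python
from typing import Dict, List, Set

# Hypothetical reference data: fund -> administrators
FUNDS: Dict[str, List[str]] = {
    "Fund A": ["Alice Martin", "Bob Weber"],
    "Fund B": ["Claire Dubois"],
}


def tag_document(text: str) -> Set[str]:
    """Tags for every fund or administrator mentioned in the text; an
    administrator also brings in the fund he or she administers."""
    lowered = text.lower()
    tags: Set[str] = set()
    for fund, admins in FUNDS.items():
        if fund.lower() in lowered:
            tags.add(fund)
        for admin in admins:
            if admin.lower() in lowered:
                tags.update({admin, fund})
    return tags


print(sorted(tag_document("Board minutes signed by Bob Weber.")))
# ['Bob Weber', 'Fund A']
```

The document never mentions "Fund A", yet it is tagged with it through the administrator link.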
Semantic search also includes natural language search.
For example, you are standing in a square in a city and you type on your phone, itself connected to your search engine, "Where can I eat?". If you ask any passer-by, they will answer you, but how can a search engine link eating with restaurant, brasserie, French fries, cottage, etc.? This is very far from the content-based search engines we know on the Internet: here the search engine must know your profile, which dishes you like, whether you have food allergies or follow a special diet (e.g. Dukan), and finally it must know where you are.
But while natural language search is the keystone of Web 3.0, the stakes in the enterprise are different.
Imagine that you work in a law firm and the question is:
"Show the sections of law concerning the car accident in which my client, on foot in a parking lot, was hit by a 15-year-old girl with no licence who had got behind the wheel to learn to drive with her father" (a news story that happened to me).
Clearly, a content-based search engine is lost from the first words. A semantic search engine with natural language understanding will understand the request and retrieve the information.
Natural language search is still in its infancy; however, Bing has already made a start, and searches such as "Air jordan under $100" already work.
Web 3.0 also comes standard with:
Nevertheless, the idea that a search engine alone can claim to embody the semantic web is still a utopia today.
Searching without searching
We can add suggestive search. When do we run a search in a search engine? When we need information. And when do we need it? When carrying out a task as part of our work.
Imagine your manager assigns you the task "Gather all the laws on adoption in France." You read the task, and your first instinct is to search the company's information system and the Internet for the texts of these laws.
Now imagine that when you open the task, an area of the screen has already run the search for you, because this area is designed to display the search results related to the task title. The time saved is huge!
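A minimal Python sketch of this proactive search, assuming the task system can call a hook when a task is opened; the stop-word list, the hook and the `search` callable are hypothetical.

```python
import re
from typing import Callable, List

# Hypothetical words that carry no search value in a task title
TASK_STOP_WORDS = {"gather", "all", "the", "on", "in", "of", "a", "to"}


def query_from_task(title: str) -> str:
    """Turn a task title into a keyword query."""
    words = re.findall(r"\w+", title.lower())
    return " ".join(w for w in words if w not in TASK_STOP_WORDS)


def on_task_opened(title: str, search: Callable[[str], List[str]]) -> List[str]:
    """Called when the task is displayed: the results are ready before
    the user types anything."""
    return search(query_from_task(title))


print(query_from_task("Gather all the laws on adoption in France."))
# laws adoption france
```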
The search can go much further: as we saw above, the search engine can be linked to the information system. If it is linked to the project, employee and customer databases, it knows these three entities. So, while indexing, it will link the three entities together based on their appearance in the same documents.
When the search engine indexes:
• CVs with the skills and names of consultants
• Mission sheets containing the names of projects, customers and consultants
• Commercial offers
• Specifications
It will then be able to link the content and know which project is linked to which client and which consultant.
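A minimal Python sketch of this linking by co-occurrence, assuming the reference lists are loaded from the project, employee and customer bases; all names are invented, and simple substring matching stands in for real entity extraction.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Entity = Tuple[str, str]  # (kind, name)

# Hypothetical reference lists from the project, employee and customer bases
ENTITIES: Dict[str, Set[str]] = {
    "project": {"Apollo", "Hermes"},
    "customer": {"ACME", "Globex"},
    "consultant": {"Durand", "Nguyen"},
}


def extract(text: str) -> Set[Entity]:
    lowered = text.lower()
    return {(kind, name) for kind, names in ENTITIES.items()
            for name in names if name.lower() in lowered}


def build_links(documents: List[str]) -> Dict[Entity, Set[Entity]]:
    """Link every pair of known entities that appear in the same document."""
    links: Dict[Entity, Set[Entity]] = defaultdict(set)
    for doc in documents:
        for a, b in combinations(sorted(extract(doc)), 2):
            links[a].add(b)
            links[b].add(a)
    return links


docs = [
    "Mission sheet: project Apollo for ACME, consultant Durand",
    "CV of Nguyen, SharePoint expert, currently on project Hermes",
    "Commercial offer to Globex for project Hermes",
]
links = build_links(docs)
print(sorted(links[("project", "Hermes")]))
# [('consultant', 'Nguyen'), ('customer', 'Globex')]
```

Nguyen and Globex never appear in the same document, yet both end up linked to project Hermes.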
Now imagine many more related entities, such as commercial products, incidents, etc.
This opens up huge opportunities and usage scenarios that we would not even have imagined possible a few years ago.
Search on new media
Indexing text is what search engines have done for years, but now, with social networking, the new means of communication used in business, and new file types such as sound, video and images, indexing documents the usual way is not necessarily relevant.
Indeed, if you index a video and the only information you extract is its name, resolution, duration and number of frames per second, it does not necessarily help you. Take a picture: its EXIF data can be interesting but is not necessarily sufficient.
Voice, face and object recognition technologies are now mature and affordable (as Picasa proves). Integrated into a search engine, they would let you find all the pictures of a person (Megan Fox), whether or not the photo appears on a page containing the words "Megan Fox" or is named, for example, Megan_Fox.jpg.
Some sites like Facebook now include a face recognition engine, which can easily index photos (no more tagging pictures one by one), and others like Google have acquired face recognition technologies.
So if you search your company for the name of a person, you will find not only all the documents he wrote, but also all the images and videos in which he appears, and the recordings in which his name is spoken. This gives search a new dimension.
Search, the brain of your content (Content Intelligence)
The software that best knows your company's information system, or at least the information it contains, is the search engine. It can index this information every few minutes, so that results are relevant and up to date quickly.
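Indexing every few minutes is affordable only if each crawl handles what changed. Here is a minimal Python sketch of incremental indexing based on modification dates; it assumes every source exposes a reliable last-modified timestamp, and the classes are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class SourceDoc:
    doc_id: str
    modified: datetime
    text: str


class IncrementalIndexer:
    """Re-index only what changed since the previous crawl, which keeps a
    crawl every few minutes cheap."""

    def __init__(self) -> None:
        self.index: Dict[str, str] = {}
        self.last_crawl = datetime.min

    def crawl(self, docs: List[SourceDoc], started_at: datetime) -> List[str]:
        # started_at is when the crawl began, so documents edited after
        # that moment are picked up by the next crawl
        changed = [d for d in docs if d.modified > self.last_crawl]
        for d in changed:
            self.index[d.doc_id] = d.text
        self.last_crawl = started_at
        return [d.doc_id for d in changed]


indexer = IncrementalIndexer()
docs = [SourceDoc("boss-memo", datetime(2011, 1, 10, 8, 55), "v1"),
        SourceDoc("old-report", datetime(2010, 6, 1), "...")]
print(indexer.crawl(docs, datetime(2011, 1, 10, 9, 0)))  # ['boss-memo', 'old-report']
docs[0] = SourceDoc("boss-memo", datetime(2011, 1, 10, 9, 3), "v2")
print(indexer.crawl(docs, datetime(2011, 1, 10, 9, 5)))  # ['boss-memo']
```

The boss's memo is searchable minutes after it changes, instead of months.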
Knowing your information system, and therefore its content and documents, the search engine could display all documents of type "order" for a particular client. By comparing this with your business intelligence system, discrepancies can emerge (5 orders in the information system versus 6 actual orders from the customer) and highlight problems.
One could also imagine extracting order forms and comparing them with quotes and invoices, giving a snapshot of the client's follow-up at a given time T.
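A minimal Python sketch of such a comparison, assuming the order numbers found by the search engine in documents and those recorded in the business system can both be listed per client; the data only echoes the 5-versus-6 example and is hypothetical.

```python
from typing import Dict, Set


def reconcile(in_documents: Dict[str, Set[str]],
              in_records: Dict[str, Set[str]]) -> Dict[str, Dict[str, Set[str]]]:
    """Per client, compare order numbers found in documents with those
    recorded in the business system; only discrepancies are reported."""
    report: Dict[str, Dict[str, Set[str]]] = {}
    for client in sorted(set(in_documents) | set(in_records)):
        found = in_documents.get(client, set())
        recorded = in_records.get(client, set())
        if found != recorded:
            report[client] = {
                "missing_from_documents": recorded - found,
                "missing_from_records": found - recorded,
            }
    return report


# Hypothetical data: 5 order documents found, 6 orders recorded for ACME
in_documents = {"ACME": {"O1", "O2", "O3", "O4", "O5"}, "Globex": {"G1"}}
in_records = {"ACME": {"O1", "O2", "O3", "O4", "O5", "O6"}, "Globex": {"G1"}}
print(reconcile(in_documents, in_records))
```

Only ACME is reported, with order O6 missing from the documents; Globex is consistent.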
How does this differ from business intelligence?
I would say: there is no database. The search engine draws content directly from the documents; receiving an email is enough to update all the information once it has been indexed.
Of course, this is possible with search engines that can run queries against the entire index as simply as with SQL. Languages such as the Google Query Language or the FAST Query Language give an idea of the kind of application this makes possible.
Unlike with a database, configuring the search engine is the biggest part of the work in Content Intelligence; but, also unlike with a database, the queries are simpler, some would say simplistic.
With this kind of language, which queries the index content directly, we can do Content Intelligence. Unlike Business Intelligence, it is based on the content of the information system, and the search engine, being the brain of the information system, is best placed to meet this demand.
Search Driven Applications (Content Intelligence II)
This is probably the least widespread concept around search engines today, but paradoxically one that could emerge soon.
An example produced by Microsoft:
Today's applications are Data Driven Applications: they are based on a database, an XML file or web services; in short, on a structured data source.
Search Driven Applications are applications built on the search engine: the search engine is the data source, and requests go through full query languages such as the FAST Query Language or the Google Query Language.
From there, dashboards can be created and displayed.
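As a rough illustration of a search-driven dashboard, this Python sketch builds its figures from the facet counts of a search result rather than from a SQL query. The hit format and field names are hypothetical; a real application would get the hits and counts from the engine's query API.

```python
from collections import Counter
from typing import Dict, List


def dashboard(hits: List[Dict[str, str]], field: str) -> List[str]:
    """One text bar per value of `field`, largest first."""
    counts = Counter(hit[field] for hit in hits)
    return [f"{value:<10}{'#' * n} {n}" for value, n in counts.most_common()]


# Hypothetical hits for a query such as "all documents of type order"
hits = ([{"type": "order", "client": "ACME"}] * 3
        + [{"type": "order", "client": "Globex"}])
for line in dashboard(hits, "client"):
    print(line)
```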
Search engines have not really changed since Google appeared, and the adoption rate of enterprise search engines is not very high. At a time when companies are starting to adopt social networks, when media are becoming part of the content and the number of documents is increasing exponentially, search will become the keystone of the information system: a successful information system is not one that holds many different kinds of content, but one in which information can be found simply.
I am working on coding some examples of these concepts with the FAST Search for SharePoint engine (aka FS4SP). Wait and see.
Thanks to Nicolas Esprit and my wife for proofreading.