The search engine in the company
First there is content; everything begins with it. Like an infection in the information system, it multiplies, spreads, and invades every part of the system. Then it embeds itself and moves, and just when you think it is gone, you realize that it never dies. Soon users are suffering from a disease: acute “searchitis”. They search tirelessly, continuously, but cannot find anything, or at least not what they need. The time spent searching and navigating the maze of directories is a huge loss for the company, a real waste. And even when the treasure is finally located, the user cannot be certain of having the latest version of the content. To remedy this aberration there are tools: search engines.
This article aims to take stock of how search engines are used in the enterprise, and also to give my vision of their future (although some aspects are not as futuristic as they sound).
We will first look at the history of information management in three major stages: before the computer, the early days of shared directories, and today with electronic document management systems (GED in French). Then we will look at the usefulness of a search engine and its possible future.
The history of information management
Thirty years ago, before the democratization of information technology (the pre-IT era), information management was already a headache for companies. Information lived on paper documents, and storing them meant filing them in cardboard folders. These folders were in turn filed in cabinets. When it was time to archive, the cabinets (or at least their contents) were moved to another room: the archive room.
Navigating all this and finding information was a job in itself: that of the archivist.
Documents were communicated and exchanged through pneumatic tubes or by courier, so that information could circulate quickly within the company.
Fifteen years ago, in the 90s, with the democratization of computing, the paper document slowly gave way to the electronic document, and the company’s information system was often based on a shared directory holding all the information. The mass of documents was, at the time, limited by technical capabilities. Indeed, 15 years ago the cost of storage was out of all proportion to what it is today.
The electronic document, twin brother of the paper document, has inherited the same genetic traits: it is unstructured, often poorly organized, and quickly takes up a lot of space. Filing it is the same headache as for its brother. It is placed in virtual folders, themselves filed in a cabinet (called a hard disk), which is itself housed in a building (called a server). The major difference between the twins is the space they occupy… Nevertheless, the way documents are stored remains the same as with paper, in order to reproduce what the user already knows, for comfort.
With the emergence of the Internet in the 90s and the increasing use of email, information sharing took another form.
Today, everything has accelerated; the number of electronic documents stored is growing exponentially. With laws on legal electronic archiving, the advent of new media such as sound and video, and social networking, there has never been so much information stored.
The cost of storage is so low that wasting disk space is no longer a source of guilt, which paves the way for massive redundancy of information.
My first computer, in 1996, had 1.6 GB of disk space; every day was a battle not to fill the disk. Today my computer has 465 GB, and the battle is rather to manage to fill it.
In business we see the same scenario: the plethora of documents and their duplicates is huge and still growing. The problem is no longer storing information but finding it, and especially finding the latest valid version.
How many times has a user spent a monumental amount of time searching for a document, only to give up and ask the author (if lucky enough to know who that is) to send it again?
The moral of the story: we store this umpteenth copy in yet another location, making the information system even more nebulous.
As an example, suppose I send a document to 5 people for validation. I send the document that sits in my “My Documents” directory; a copy lands in my outbox and arrives in the 5 inboxes of my colleagues, who each save it in their own “My Documents” directory and then return it to me validated. It therefore sits in 5 more outboxes before arriving in my inbox in 5 copies. To merge them, I copy the 5 documents into my “My Documents” folder and publish a new version.
1 original document in my “My Documents”
1 document in my outbox
5 documents in the inboxes of my colleagues
5 documents in “My Documents” on my colleagues’ computers
5 documents in their outboxes
5 documents in my inbox
5 documents in my “My Documents”
1 merged document
That makes 28 copies of the same document. In three weeks, if you go looking, which one will you pick?
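The tally above is simple arithmetic, and a few lines of Python make it easy to see how it grows with the number of reviewers. This is only an illustrative sketch: the dictionary labels and the `total_copies` helper are my own names, not part of any tool.

```python
# Illustrative tally of the copies created by one validation round.
# The labels are hypothetical; the counts follow the list in the text.
REVIEWERS = 5

copies = {
    "original in my My Documents": 1,
    "my outbox": 1,
    "colleagues' inboxes": REVIEWERS,
    "colleagues' My Documents": REVIEWERS,
    "colleagues' outboxes": REVIEWERS,
    "my inbox": REVIEWERS,
    "returned copies in my My Documents": REVIEWERS,
    "merged document": 1,
}


def total_copies(n_reviewers: int) -> int:
    """Five places hold one copy per reviewer, plus three single copies."""
    return 5 * n_reviewers + 3


total = sum(copies.values())
print(total)  # 28
```

With 10 reviewers instead of 5, the same round would leave 53 copies behind.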
So we invent document management solutions (GED) that miserably mimic the directory filing system and only shift the problem.
But they do offer some advantages:
- Links can be exchanged rather than the document itself
- Taxonomy, which allows a document to be categorized in several “folders” called tags, provided the people using the document management system respect it
- Folksonomy, which allows anyone to create their own tags, never mind that you end up with tags like (SharePoint, CherePoint, SharePoint), that is, every variant that typos and phonetic spelling can produce.
Then again, some tools are very poorly used. How many times have I seen a migration from a shared directory to SharePoint that was one big cut and paste, with the user saying, with a touch of irony, “I don’t understand why I’m still lost when we paid so much for a new tool.”
In fact, in 30 years, little has really changed.
Of course, the picture is deliberately exaggerated and not everything is so bleak.
The scary part is that one cause of the explosion of Internet content is social networking, and social networking is about to arrive in business. The volume of content and information will grow exponentially in companies that deploy an enterprise social network.
The private / professional paradox
In the company it becomes very difficult to find information, and especially to obtain up-to-date information.
User confidence is often severely tested when users cannot get hold of all the information they need.
Worse, most of the time the information is scattered across different applications (documents stored in a shared directory, customer information in the CRM, not to mention the document management system, the ERP, and email).
In the best case, these applications come from the same publisher and can easily integrate and communicate with each other. In reality, applications are chosen according to decisions made at deployment time and to affinities with, if not an interest in working with, a particular vendor. In the end the applications are not integrated at all, and consultants and developers pile up interoperability patches and workarounds in all directions.
So in the company, as employees, we search: we spend our time looking for information and asking various people in the company for it. Various studies show that an employee loses up to 8 hours per week searching for information within the company.
But as an Internet user, how do I find information? How do you find an article about nuclear power plants, earthquakes, or famine in Africa? You simply go to Google or Bing. What is more, more and more people type the address of the site they want directly into the search engine. Bookmarks are now seldom used. The diagram below shows the explosion in the number of queries:
As we can see, if users are offered more than a single input field, they feel drowned in information or even assaulted (I am thinking of certain content or intranet managers who insist on putting everything on the home page, drowning the user in a flood of information).
We arrive at the following paradox: it is easier to find information on the Internet, a generic system that by definition must suit everyone and that is far larger and far less structured, than in the company’s own information system.
The reason for this paradox is simple: on the Internet there are search engines to find information. These search engines index the contents of every site day and night and, increasingly, every file format (HTML, PDF, Word, PowerPoint…). They rely on relevance algorithms that let the user get what he wants from a keyword, at the top of the search results. To improve the user experience, especially for filtering, modern search engines come with refinement panels.
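To make “index, then rank by relevance” concrete, here is a minimal sketch of an inverted index with a TF-IDF-style score. It is an assumption-laden toy: real engines use far richer ranking signals, and the `TinyIndex` class, its methods, and the sample file names are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class TinyIndex:
    """Toy inverted index: term -> {document id: term frequency}."""

    def __init__(self) -> None:
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id: str, text: str) -> None:
        self.doc_count += 1
        for term, count in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = count

    def search(self, query: str) -> list[tuple[str, float]]:
        scores: dict[str, float] = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Terms found in fewer documents weigh more (inverse document frequency).
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        # Best score first; ties are broken by document id.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = TinyIndex()
index.add("memo.docx", "SharePoint migration memo")
index.add("offer.pdf", "Commercial offer for the SharePoint project, SharePoint licences included")
index.add("cv.doc", "Consultant CV")
results = index.search("sharepoint")
```

Here `offer.pdf` ranks above `memo.docx` because the keyword appears twice in it, and `cv.doc` is not returned at all.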
In business, until recently, we were far from that. Organizations that deploy a search engine are rare. In addition, most search engines are not really suited to the company, or only to one specific line of work.
The search engine in the company
We have seen that real solutions exist on the Internet and can be tailored to the company. Of course, many people will retort, “I have a search engine in my document management system.” The answer is quick: “Yes, but if it only searches what is stored in your document management system, then throw it away!”
In business, the big advantage lies in having more or less full control of the information system, and above all of the business domain. Consider the French term “Fond”: in a bank it refers to a fund, a financial investment, whereas in a furniture factory it refers to a part of a piece of furniture. The search engine can be fully adapted to the business and customized for its trade and associated vocabulary. It can therefore apply different rules to the word “Fond” for our bank and for our furniture manufacturer.
Given the heterogeneity of file types in the enterprise, the enterprise search engine must recognize as many formats as possible, ideally all those used by the company.
In an age of globalization, information systems are increasingly international; the search engine must also recognize the languages used within the company. Take the word “car”:
- In English it means an automobile, which can be very useful at a car maker
- In French it means “because”, so it must be treated as a stop word and ignored.
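The “car” example can be sketched as a language-aware stop-word filter applied at indexing time. The stop-word lists below are tiny, hand-picked assumptions for illustration; a real engine would ship much longer lists and would detect the language automatically.

```python
import re

# Hypothetical, deliberately short stop-word lists per language.
STOP_WORDS = {
    "en": {"the", "a", "of", "is"},
    "fr": {"car", "le", "la", "de", "est"},
}


def index_terms(text: str, language: str) -> list[str]:
    """Return the words worth indexing for a text in the given language."""
    words = re.findall(r"\w+", text.lower())
    stop = STOP_WORDS.get(language, set())
    return [word for word in words if word not in stop]


english_terms = index_terms("The car is red", "en")
french_terms = index_terms("Il reste car la voiture est rouge", "fr")
```

In the English sentence “car” survives as a searchable term; in the French one it is dropped along with the other stop words.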
The enterprise search engine must adapt to its user and to the user’s function or role within the company. A salesperson does not need the same results as a technician or developer for the search “SharePoint”; if the search adapts to the user, that is a plus.
Searches also need to be filtered and sorted to be accurate, and this must be customizable. Ordering results by creation date instead of relevance, for example, brings the most recently created document to the top.
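As a rough sketch of customizable ordering, the same result list can be sorted either by relevance or by creation date. The `Hit` type, its fields, and the sample results are assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Hit:
    """Hypothetical search result with a relevance score and a creation date."""
    title: str
    relevance: float
    created: date


def sort_hits(hits: list[Hit], order: str = "relevance") -> list[Hit]:
    """Sort results by descending relevance or by most recent creation date."""
    if order == "relevance":
        return sorted(hits, key=lambda hit: hit.relevance, reverse=True)
    if order == "created":
        return sorted(hits, key=lambda hit: hit.created, reverse=True)
    raise ValueError(f"unknown sort order: {order}")


hits = [
    Hit("Old but precise spec", 0.9, date(2009, 3, 1)),
    Hit("Recent memo", 0.4, date(2011, 5, 20)),
    Hit("Mid-year report", 0.6, date(2010, 6, 15)),
]
by_relevance = [hit.title for hit in sort_hits(hits)]
by_date = [hit.title for hit in sort_hits(hits, order="created")]
```

Sorted by relevance the old specification comes first; sorted by creation date the recent memo does.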
Enterprise search must also cover the ENTIRE information system of the company, not only the repository of the search engine’s own vendor. So if the company has FileNet, Documentum, a shared directory, or SharePoint, a search on a keyword should return everything found in all of these information containers, and if email is indexed as well, we reach paradise.
Federation is also very important: in an era when social networks are interconnected, it is inconceivable for search engines to work without relying on one another. If a search is unsuccessful in the company’s information system, the search engine should automatically display the results of the same search from another engine such as Bing or Google:
SharePoint and Bing federation.
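The federation idea (query every internal container, and fall back to an external engine only when nothing is found) can be sketched as follows. The sources here are in-memory stand-ins built by a hypothetical `make_source` helper; no real SharePoint, Bing, or Google API is called.

```python
from typing import Callable

# A source is anything that takes a query and returns matching titles.
Source = Callable[[str], list[str]]


def make_source(documents: list[str]) -> Source:
    """Build a fake source that does a case-insensitive substring match."""
    def search(query: str) -> list[str]:
        return [doc for doc in documents if query.lower() in doc.lower()]
    return search


def federated_search(
    query: str,
    internal_sources: dict[str, Source],
    external_fallback: Source,
) -> dict[str, list[str]]:
    """Query all internal sources; use the external engine only if all are empty."""
    results: dict[str, list[str]] = {}
    for name, source in internal_sources.items():
        hits = source(query)
        if hits:
            results[name] = hits
    if not results:
        results["external"] = external_fallback(query)
    return results


internal = {
    "sharepoint": make_source(["SharePoint governance guide"]),
    "file_share": make_source(["Budget 2011.xls"]),
}
external = make_source(["Public article about earthquakes"])

found_internally = federated_search("governance", internal, external)
found_externally = federated_search("earthquakes", internal, external)
```

Grouping the results by source name keeps it visible to the user where each hit came from.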
Search can also be integrated directly into the search of the user’s workstation. From the workstation, by searching on a specific term, the user gets all the results together: email, the company’s information system, and the local machine:
A thumbnail preview of the result is also a plus. Without thumbnails, you download the document, open it, realize it is not the right one, close it, and delete the local copy before downloading another one from the search engine. With thumbnails you get a preview at a glance (http://thumbextsp.codeplex.com/), something Google has understood very well:
However, Internet search engines as they stand are not suited to the company at all, only to the Internet. The life cycle and update frequency of content on the Internet are not the same as in business. It may be acceptable for a blog post published in January to appear in the search engine in June. Now imagine the same thing with a memo from your boss! You see what I mean…
In addition, Internet search engines are content-based, as are many search engines sold to businesses. Thus, a search on the keyword “Molière” returns 8,830,000 results in Google, while a search for “Jean Baptiste Poquelin” returns 495,000, even though they are the same person. This is because Jean Baptiste Poquelin is mostly referred to by his pseudonym, Molière, and since Google is based on content, it sees no relationship between the two.
The enterprise search engine should be smarter and be able to “learn” from the information in the company; this is where the notion of the semantic web comes in.
The future of the search engine
Semantic search and Web 3.0
The Semantic Web is a set of technologies designed to make the content of World Wide Web resources accessible and usable by software agents and programs, through a system of formal metadata, using in particular the family of languages developed by the W3C.
There is still work to be done, but there has been clear progress. Some search engines can extract this metadata within an enclosed space such as the company.
For example, Luxembourg banks increasingly anonymize customer records and documents: the customer’s name is not mentioned, only a reference.
A search engine built like the Internet ones is based on document content. The documentation of client John Doe, being anonymized, does not contain the words “John Doe” but only the reference “1234”, so the engine will not return it for a search on the keywords “John Doe”. The user must instead search by customer reference to get the desired result, just as with Molière and Jean Baptiste Poquelin.
An intelligent, semantic search engine will identify the reference “1234” and query the customer database for the mapping between the client and the reference. A search for John Doe will then return the documents containing “John Doe” as well as those containing the reference “1234”.
This full integration into the information system makes things transparent for the user. For him, it does not matter whether the information is in the latest SQL Server, in the best document management system in the world, or in the latest collaboration portal. What he wants is to access information as simply as on the Internet.
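The John Doe example amounts to expanding the query with the customer reference before matching documents. Below is a minimal sketch of that idea; the `CUSTOMER_REFERENCES` dictionary stands in for the customer database, and the documents and naive substring matching are assumptions for illustration only.

```python
# Hypothetical stand-in for the customer database: name -> anonymized reference.
CUSTOMER_REFERENCES = {"john doe": "1234"}

# Hypothetical indexed documents: file name -> extracted text.
DOCUMENTS = {
    "contract.pdf": "Contract for customer 1234, signed in Luxembourg",
    "letter.doc": "Letter sent to John Doe",
    "statement.pdf": "Statement for customer 9999",
}


def expand_query(query: str) -> list[str]:
    """Add the customer reference to the search terms when the name is known."""
    terms = [query.lower()]
    reference = CUSTOMER_REFERENCES.get(query.lower())
    if reference is not None:
        terms.append(reference)
    return terms


def search(query: str) -> list[str]:
    """Return documents containing the name or its reference (naive substring match)."""
    terms = expand_query(query)
    return sorted(
        name
        for name, text in DOCUMENTS.items()
        if any(term in text.lower() for term in terms)
    )


matches = search("John Doe")
```

A purely content-based search would only find `letter.doc`; with the expansion, the anonymized `contract.pdf` is found too. The same mechanism would relate an alias such as Molière to Jean Baptiste Poquelin.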
Consider another example:
A list of documents about funds in a bank.
If you have a list of fund names and fund administrators, you can “teach” your search engine the names of these funds and administrators, and even the link between them. Then, if the name of an administrator of fund A appears in a document, you can automatically tag the document with the name of the fund AND the name of the administrator. After that, communication with other machines and software becomes very simple.
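The automatic tagging described above can be sketched as a dictionary lookup at indexing time. The fund and administrator names are invented, and the matching is a naive case-insensitive substring test rather than real entity extraction.

```python
# Hypothetical reference list "taught" to the engine: fund -> its administrators.
FUND_ADMINISTRATORS = {
    "Fund A": ["Alice Martin"],
    "Fund B": ["Bob Durand"],
}


def tag_document(text: str) -> set[str]:
    """Tag a document with every fund it mentions, directly or via an administrator."""
    lowered = text.lower()
    tags: set[str] = set()
    for fund, administrators in FUND_ADMINISTRATORS.items():
        if fund.lower() in lowered:
            tags.add(fund)
        for administrator in administrators:
            if administrator.lower() in lowered:
                # An administrator's name implies the fund he or she administers.
                tags.update({fund, administrator})
    return tags


tags = tag_document("Meeting notes with Alice Martin about management fees")
```

The document never mentions “Fund A”, yet it ends up tagged with both the administrator and the fund.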
Semantic search also includes natural-language search.
For example, you are standing in a city square and you type into your phone, itself connected to your search engine, “Where can I eat?”. If you ask that of any person, they will answer you, but how can a search engine link eating with restaurant, brasserie, chip stand, cottage, etc.? This is very far from the content-based search engines we know on the Internet: the search engine must know your profile, know which dishes you like, whether you have food allergies or follow a special diet (e.g. Dukan), and finally it must know where you are.
Natural-language search is the keystone of Web 3.0, but the questions asked in the enterprise would be different.
Imagine you are in a law firm and the question is:
“Show the articles of law concerning the car accident in which my client, driving in a parking lot, was hit on the right by a car driven by a 15-year-old girl who has no license and was learning to drive with her father” (a true story that happened to me)
Clearly, a content-based search engine is lost from the first words. A semantic search engine with natural-language support will understand the request and return the information.
Natural-language search is still in its infancy; however, Bing has made a start, and searches such as “Air jordan under $ 100” now work:
Web 3.0 also comes standard with:
Nevertheless, the idea that a simple search engine can, on its own, claim to deliver the semantic web is still a utopia today.
Searching without searching
We can add suggestive search. When do we run a search in a search engine? When we need information. And when is that? When carrying out a task as part of our work.
Imagine your manager assigns you the task “Pull out all the laws on adoption in France.” You read the task, and your first instinct is to search the company’s information system and the Internet for the legal texts.
Now imagine that when you open the task, an area of the screen has already done the search for you, because it is designed to display search results based on the task title. The time saving is huge!
Search can go much further. As we saw above, the search engine can be linked to the information system. If it is linked to the project, employee, and customer databases, it knows these three kinds of entities. During indexing, it can then link the three together based on where they appear in documents.
Suppose the search engine indexes:
• CVs with the skills and names of consultants
• Mission sheets containing the names of projects, customers, and consultants
• Commercial offers
• Specifications
It will be able to link the content and know which project is linked to which client and which consultant.
Now imagine many more linked entities, such as commercial products, incidents, etc.
This opens up huge opportunities and usage scenarios that we could not even have imagined a few years ago.
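Linking projects, customers, and consultants by their co-occurrence in documents could look roughly like this. It is a sketch under strong assumptions: the entity lists are invented, entities are recognized by plain substring match, and any two entities found in the same document are considered linked.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical entity lists, standing in for the project, customer and employee databases.
KNOWN_ENTITIES = {
    "project": ["Intranet Redesign"],
    "customer": ["Acme Bank"],
    "consultant": ["Jane Roe"],
}

Entity = tuple[str, str]  # (kind, name)


def find_entities(text: str) -> set[Entity]:
    """Return every known entity whose name appears in the text."""
    lowered = text.lower()
    return {
        (kind, name)
        for kind, names in KNOWN_ENTITIES.items()
        for name in names
        if name.lower() in lowered
    }


def build_links(documents: dict[str, str]) -> dict[Entity, set[Entity]]:
    """Link entities that appear together in at least one document."""
    links: dict[Entity, set[Entity]] = defaultdict(set)
    for text in documents.values():
        for first, second in combinations(sorted(find_entities(text)), 2):
            links[first].add(second)
            links[second].add(first)
    return links


documents = {
    "mission_sheet.doc": "Mission sheet: Intranet Redesign for Acme Bank, consultant Jane Roe",
    "cv.doc": "CV of Jane Roe, SharePoint developer",
}
links = build_links(documents)
```

After indexing the mission sheet, the engine “knows” that Jane Roe worked on Intranet Redesign for Acme Bank, even though her CV mentions neither.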
Search on new media
Indexing text is what search engines have done for years. But with social networking and the new means of communication used in business, new file types such as sound, video, and images are appearing, and indexing them as documents is not necessarily relevant.
Indeed, if you index a video and the only information you get out is its name, resolution, duration, and number of frames per second, that does not necessarily help you. Take a picture: its EXIF data can be interesting but is not necessarily relevant either.
Voice, face, and object recognition technologies are now mature and affordable (as Picasa proves). Built into a search engine, they would let you find all the pictures of a person (Megan Fox, say) whether or not the photo appears on a page containing the words “Megan Fox” and whether or not the file is named Megan_Fox.jpg.
Some sites, like Facebook, now include a facial recognition engine that can easily index photos (no more tagging pictures by hand), and others, like Google, have acquired face recognition technologies:
So if you search your company for a person’s name, you will find all the documents he wrote, but also all the images and videos in which he appears and the audio in which his name is spoken. This gives search a new dimension.
Search, the brain of your content (Content Intelligence)
The software that best knows your company’s information system, or at least the information that resides in it, is the search engine. It can index this information every few minutes, so that results become relevant quickly enough.
Knowing your information system, and therefore its content, the search engine could list all documents of type “order” for a particular client. By comparing this with your business intelligence system (5 orders found in the information system against 6 orders actually placed by the customer), it can highlight problems. One could also imagine pulling out the order forms and comparing them with quotes and invoices, giving a snapshot of the customer at a time T. How does this differ from business intelligence?
I would say: the absence of a database. The search engine draws content directly from the documents; receiving an email is enough to update all the information once it has been indexed.
Of course, this is possible only with search engines that allow queries against the entire index as simply as with SQL. Languages such as the FAST Query Language or the Google Query Language give an idea of this kind of application:
Compared with a database, configuring the search engine is the biggest part of the work in Content Intelligence, but the queries are simpler; some would say simplistic.
With that kind of language, which allows the indexed content to be queried directly, we can do Content Intelligence, which, unlike business intelligence, is based on the content of the information system. The search engine, being the brain of the information system, is best placed to meet this need.
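The order comparison mentioned earlier in this section (5 orders in the index against 6 in the business intelligence system) can be sketched in a few lines. The index is simulated by a list of dictionaries and the BI figure by a constant; both, along with the function names, are assumptions, and no real query language is used.

```python
# Hypothetical search index: one entry per indexed document with its metadata.
INDEXED_DOCUMENTS = [
    {"type": "order", "customer": "Acme", "number": f"O-{n}"} for n in range(1, 6)
] + [
    {"type": "invoice", "customer": "Acme", "number": "I-1"},
    {"type": "order", "customer": "Globex", "number": "O-9"},
]

# Hypothetical figures reported by the business intelligence system.
BI_ORDER_COUNTS = {"Acme": 6, "Globex": 1}


def count_indexed(doc_type: str, customer: str) -> int:
    """Count indexed documents of a given type for a given customer."""
    return sum(
        1
        for doc in INDEXED_DOCUMENTS
        if doc["type"] == doc_type and doc["customer"] == customer
    )


def missing_orders(customer: str) -> int:
    """Orders known to BI that have no matching document in the index."""
    return BI_ORDER_COUNTS.get(customer, 0) - count_indexed("order", customer)


acme_gap = missing_orders("Acme")
```

A non-zero gap flags a customer whose paperwork deserves a closer look; here one Acme order has no corresponding document.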
Search Driven Applications (Content Intelligence II)
This is probably the least developed concept around search engines today, but paradoxically one that could emerge soon.
An example produced by Microsoft:
Today’s applications are Data Driven Applications: they are based on a database, an XML file, or web services; in short, on a structured data source.
Search Driven Applications are applications based on the search engine: the data source is the search engine, and requests go through full query languages such as the FAST Query Language or the Google Query Language.
From there, dashboards can be created and displayed (as above).
The search engine has not really changed since Google came out, and the penetration rate of enterprise search engines is not very high. At a time when companies are starting to bring in social networks, when media is becoming part of the content, and when the number of documents is growing exponentially, search will become the keystone of the information system, because a successful information system is not one that holds many different kinds of content but one in which information can be found simply.
I am going to try to code some examples of these concepts using the FAST Search for SharePoint search engine (aka fs4sp). Wait and see!
Thanks to Nicolas Esprit and my wife for proofreading.