Datafari: an open source search engine for businesses


Datafari is open source enterprise search software that uses Apache Solr for the indexing and search phases. It combines Apache ManifoldCF, Apache Solr and Apache Cassandra, with a user interface based on HTML5, CSS3 and jQuery.

It is a packaged search engine in the sense that it offers connection to data sources, indexing, search and graphical system administration, and it is distributed using SolrCloud.

Datafari was created by France Labs, which was looking for open source search software to support its R&D on a new intranet relevance algorithm.

The team found nothing that was both well maintained and available under an Apache License, so it created Datafari.

The project then became independent of the research on the algorithm, since it was considered to have value of its own as a search engine.

About Datafari

This search engine enables employees to find data wherever it is stored, safely and securely.

More specifically, Datafari retrieves and indexes data and documents from different sources and in different file formats, and allows searching both the content of documents and their metadata.

It is available in an open source version, called Datafari Community Edition, and in a proprietary version, called Datafari Enterprise Edition.

As mentioned above, it is a search engine for business.

Its goals differ from those of a web search engine, and so do the technical challenges.

A business search engine must be multi-source, multi-format, and able to manage security.

It must also be possible to administer the tool. The free version offers the following features:

  • Textual search including Boolean operators
  • An Apache ManifoldCF based crawler that allows indexing of CMS, websites, shared files (NetApp, Samba, Windows), emails, databases and HDFS.
  • "Full text" analytics and a plug-in system for adding transform filters in the indexing and search phases
  • Graphical interface in HTML5 and JavaScript that uses HTML widgets, with responsive design
  • Use of Apache Tika to analyze and extract content and metadata from various types of documents (MSOffice, OpenOffice, HTML, XML, PDF, RTF, TXT, ZIP, EXIF, MP3 ...)
  • E-mail alert system to receive notifications of new results in push mode (the information is sent to you) instead of pull mode.
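
Because search is handled by Apache Solr, a textual search with Boolean operators can be pictured as a Solr query that combines terms with AND, OR and NOT. The following Python snippet is only a minimal sketch of that idea, assuming the query is handed to Solr's standard query parser through the `q` parameter of a `/select` handler. The host, port and core name (`example_core`) are hypothetical placeholders, not values taken from the Datafari documentation, and the helper functions are illustrative names of our own.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name are placeholders,
# not values taken from the Datafari documentation.
SOLR_SELECT_URL = "http://localhost:8983/solr/example_core/select"


def build_boolean_query(must, any_of=(), must_not=()):
    """Combine terms into a Solr-style Boolean query string."""
    def quote(term):
        # Quote multi-word terms so they are searched as phrases.
        return f'"{term}"' if " " in term else term

    # Required terms are joined with AND.
    parts = [quote(t) for t in must]
    # Alternative terms form one parenthesized OR group.
    if any_of:
        parts.append("(" + " OR ".join(quote(t) for t in any_of) + ")")
    query = " AND ".join(parts)
    # Excluded terms are appended with NOT.
    for term in must_not:
        query += f" NOT {quote(term)}"
    return query


def build_search_url(query, rows=10):
    """Return the request URL with the query parameters URL-encoded."""
    return SOLR_SELECT_URL + "?" + urlencode({"q": query, "rows": rows})


query = build_boolean_query(["invoice"], any_of=["2023", "2024"], must_not=["draft"])
print(query)
print(build_search_url(query))
```

The snippet only builds the query string and the URL; it does not contact any server. In practice, users type such queries in the Datafari search interface rather than calling Solr directly.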


For search engine administrators

  • User search query graphical analysis tool.
  • Solr administration tool used in Datafari.
  • Tool to analyze performance and calculate the relevance of queries.
  • Administration tool for security with connection to AD or LDAP.
  • Tool to manage synonyms.
  • Tool to manage promolinks, allowing content that is not in the index to be displayed for chosen keywords.
  • Tool to manage crawling connectors, with several enterprise data sources (SharePoint, shared files, emails, websites, CMIS ...) and the ability to create new ones.

How to get Datafari?

Those who want to try this search engine and find out whether it suits their business can follow the steps below.

Datafari is available prepackaged as a virtual machine or a Docker container, or as an installation package for Debian or Red Hat (the RHEL package is only available with Datafari Enterprise Edition).

Users of Debian, Ubuntu or derived systems can use the deb package provided by the developers on the project's official website.

They must open a terminal and execute the following command:


Once the download is done, install the package with:

sudo dpkg -i datafari.deb

Users of all other Linux distributions can install Datafari with a Docker container. With Docker support installed on the system, pull the image with the following command:

docker pull datafari/datafari

To get started right away, it's probably best to follow the quick start guide.
