Ingesting Documents

There are three ways to ingest documents into Amazon Kendra:

  • Connectors: S3, Salesforce, ServiceNow, RDS, SharePoint, and OneDrive. More connectors are coming.
  • FAQ documents that contain questions and answers, ingested by using either the console or the CreateFaq API.
  • The BatchPutDocument API, which accepts inline blobs and S3 locations for documents.

An index can include both unstructured text and frequently asked questions (FAQs):

Unstructured text: Documents containing unstructured text in the following formats can be ingested into Amazon Kendra via connectors or the BatchPutDocument interface.

  • HTML files

  • Microsoft PowerPoint presentations

  • Microsoft Word documents

  • Plain text documents including JSON

  • PDFs
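
When documents are submitted through BatchPutDocument, each one carries a ContentType value. The sketch below shows one way to derive that value from a file name for the formats listed above. The helper name content_type_for and the extension table are illustrative assumptions, not part of the Kendra API; JSON is mapped to plain text to follow the list above.

```python
from pathlib import Path

# Assumed mapping from file extension to a BatchPutDocument ContentType value.
# The table itself is a convenience for this sketch, not something Kendra provides.
CONTENT_TYPES = {
    ".html": "HTML",
    ".htm": "HTML",
    ".ppt": "PPT",
    ".pptx": "PPT",
    ".doc": "MS_WORD",
    ".docx": "MS_WORD",
    ".txt": "PLAIN_TEXT",
    ".json": "PLAIN_TEXT",  # treated as plain text, per the list above
    ".pdf": "PDF",
}


def content_type_for(filename: str) -> str:
    """Return the ContentType string for a file name, based on its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"Unsupported document format: {filename!r}")
    return CONTENT_TYPES[suffix]
```

Rejecting unknown extensions early keeps unsupported files out of the request instead of letting them fail at ingestion time.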

FAQs and answers: Amazon Kendra’s Add FAQ capability can ingest question-answer pairs.
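
As a rough illustration of the CreateFaq path, the sketch below writes question-answer pairs to a basic CSV file, uploads it to S3, and registers it with an index using boto3. The bucket, key, role ARN, and index ID are placeholders you would supply, and the helper names are hypothetical. It assumes the IAM role can read the S3 object and that the basic CSV layout (question, then answer) fits your data.

```python
import csv
import io


def build_faq_csv(pairs):
    """Return CSV text with one 'question,answer' row per pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for question, answer in pairs:
        writer.writerow([question, answer])
    return buffer.getvalue()


def build_create_faq_request(index_id, name, bucket, key, role_arn):
    """Assemble keyword arguments for the Kendra CreateFaq call."""
    return {
        "IndexId": index_id,
        "Name": name,
        "S3Path": {"Bucket": bucket, "Key": key},
        "RoleArn": role_arn,
        "FileFormat": "CSV",
    }


def create_faq(request, csv_text):
    """Upload the CSV to S3, then create the FAQ. Requires AWS credentials."""
    import boto3  # imported here so the builders above work without boto3

    s3_path = request["S3Path"]
    boto3.client("s3").put_object(
        Bucket=s3_path["Bucket"],
        Key=s3_path["Key"],
        Body=csv_text.encode("utf-8"),
    )
    response = boto3.client("kendra").create_faq(**request)
    return response["Id"]
```

Keeping the request construction separate from the AWS calls makes it possible to inspect or log the payload before anything is sent.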

You can use the built-in connectors to ingest documents through the Amazon Kendra console. For a proof of concept (POC), if there is no connector for your data source, you can mirror the data into an S3 bucket and use the S3 connector. To ingest documents directly, use the BatchPutDocument operation, which lets you add inline documents or documents stored in an Amazon S3 bucket, add custom attributes to those documents, and attach an access control list to them. The BatchPutDocument action is documented in the Amazon Kendra API reference.
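
The direct-ingestion path can be sketched with boto3 as follows. This is a minimal example under stated assumptions: the helper names are hypothetical, the attribute values are limited to strings, and the access control list only grants access to named users. The index ID, bucket, and role ARN are placeholders, and any custom attribute is assumed to be already defined on the index.

```python
def inline_document(doc_id, title, text, attributes=None, allowed_users=()):
    """Build a BatchPutDocument entry whose content is passed inline as a blob."""
    doc = {
        "Id": doc_id,
        "Title": title,
        "Blob": text.encode("utf-8"),
        "ContentType": "PLAIN_TEXT",
    }
    if attributes:
        # Custom attributes, restricted to string values in this sketch.
        doc["Attributes"] = [
            {"Key": key, "Value": {"StringValue": value}}
            for key, value in attributes.items()
        ]
    if allowed_users:
        # Access control list: allow only the named users to see this document.
        doc["AccessControlList"] = [
            {"Name": user, "Type": "USER", "Access": "ALLOW"}
            for user in allowed_users
        ]
    return doc


def s3_document(doc_id, bucket, key, content_type):
    """Build a BatchPutDocument entry that points at an object in S3."""
    return {
        "Id": doc_id,
        "S3Path": {"Bucket": bucket, "Key": key},
        "ContentType": content_type,
    }


def ingest(index_id, documents, role_arn=None):
    """Send documents to the index and return any failures. Requires AWS credentials."""
    import boto3  # imported here so the builders above work without boto3

    kwargs = {"IndexId": index_id, "Documents": documents}
    if role_arn:
        # A role is needed when documents are read from S3.
        kwargs["RoleArn"] = role_arn
    response = boto3.client("kendra").batch_put_document(**kwargs)
    return response.get("FailedDocuments", [])
```

A call might combine both kinds of entry, for example ingest(index_id, [inline_document(...), s3_document(...)], role_arn=...), and then inspect the returned list for documents that were rejected.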