How do we build a DocSearch index?
In this section you will learn how we build a DocSearch index from your page.
Everything starts from your page
data:image/s3,"s3://crabby-images/2fa8e/2fa8ef2a22a95e08d8571b7b4645ae75345a6819" alt="1st step"
We extract the payload according to your set of selectors
data:image/s3,"s3://crabby-images/26970/26970695a6acbab0dc75edcf7ab26460187e9129" alt="2nd step"
We only keep the highlighted information, which is determined by your selectors.
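As an illustration, a set of selectors could look like the sketch below. It is written as a Python dictionary that serializes to the JSON layout of a DocSearch configuration. The index name, URL and CSS selectors are hypothetical values chosen for the example, so adapt them to your own markup.

```python
import json

# Hypothetical configuration: the index name, URL and CSS selectors
# are example values, not ones taken from a real site.
config = {
    "index_name": "example",
    "start_urls": ["https://example.com/docs/"],
    "selectors": {
        "lvl0": "header h1",              # top of the hierarchy
        "lvl1": "article h2",
        "lvl2": "article h3",
        "text": "article p, article li",  # content attached to the hierarchy
    },
    "min_indexed_level": 0,
}

# Serialize to the JSON form a configuration file would contain.
config_json = json.dumps(config, indent=2)
print(config_json)
```

Each lvlX selector marks one level of the hierarchy, and text marks the content that gets attached to it.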
We iterate through the HTML flow and build the payload
data:image/s3,"s3://crabby-images/a2391/a2391edc3df60fc4c453027a9f788d56d3a36064" alt="3rd step"
This payload will be the only data extracted from your page.
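The extraction step can be sketched as follows. This is a simplified illustration and not the actual crawler code: it assumes selectors are reduced to bare tag names (the hypothetical `TAG_TO_LEVEL` mapping) instead of full CSS selectors, and it ignores matching tags nested inside one another. `PayloadBuilder` and `build_payload` are names invented for this example.

```python
from html.parser import HTMLParser

# Assumption: selectors are simplified to bare tag names.
TAG_TO_LEVEL = {"h1": "lvl0", "h2": "lvl1", "h3": "lvl2", "p": "text"}


class PayloadBuilder(HTMLParser):
    """Collects (level, text) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.payload = []
        self._level = None   # level of the matching element being read
        self._chunks = []    # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag in TAG_TO_LEVEL:
            self._level = TAG_TO_LEVEL[tag]
            self._chunks = []

    def handle_data(self, data):
        # Text outside a matching element is ignored.
        if self._level is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and TAG_TO_LEVEL.get(tag) == self._level:
            # Collapse whitespace and store the element if it has text.
            text = " ".join("".join(self._chunks).split())
            if text:
                self.payload.append((self._level, text))
            self._level = None


def build_payload(html):
    builder = PayloadBuilder()
    builder.feed(html)
    builder.close()
    return builder.payload
```

Anything that does not match a selector, such as a navigation bar, never reaches the payload.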
We iterate through the payload and start pushing records
data:image/s3,"s3://crabby-images/0b2e5/0b2e5bf6bf6953d21f1340dac2a13e07d9bdee49" alt="4th step"
We index the temporary record each time we add an element to it (if min_indexed_level equals 0)
We pile up the elements based on the current temporary record
data:image/s3,"s3://crabby-images/f1eef/f1eefbcf37d87ea33121df65d51e599cea4411b8" alt="5th step"
Based on the position within the flow, we nest elements as much as possible to keep the context and improve relevance.
We iterate until we match a text element
data:image/s3,"s3://crabby-images/67360/67360a4e554868decb7025177f20454a4e67e33d" alt="6th step"
We override the text element when we find a newer one
data:image/s3,"s3://crabby-images/6c4f5/6c4f51f7a575a06acd1e62b995d148f496203ffb" alt="7th step"
We remove the stashed, deeper elements when we add a higher level
data:image/s3,"s3://crabby-images/f8032/f8032801efbf58aaf8c7bce09cae00a78eda139a" alt="8th step"
Contextual information and hierarchy must be updated once we encounter a new level, because a new level marks a sub-section that is not related to the previous one.
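The record-building steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual indexing code: the payload is assumed to be a list of (level, text) pairs in page order, `build_records` and `deepest_level` are hypothetical names, and `min_indexed_level` is interpreted here as "only push a record whose deepest filled level is at least that value".

```python
LEVELS = ["lvl0", "lvl1", "lvl2", "lvl3", "lvl4", "lvl5", "lvl6"]


def deepest_level(record):
    """Index of the deepest filled lvlX in the record, or -1 if none."""
    filled = [i for i, level in enumerate(LEVELS) if record[level] is not None]
    return max(filled, default=-1)


def build_records(payload, min_indexed_level=0):
    # The temporary record holds the current hierarchy and text.
    current = {level: None for level in LEVELS}
    current["text"] = None
    records = []

    for kind, value in payload:
        if kind == "text":
            # A newer text element overrides the previous one.
            current["text"] = value
        else:
            depth = LEVELS.index(kind)
            current[kind] = value
            # A new level starts an unrelated sub-section: drop the
            # stashed deeper levels and the text that belonged to them.
            for deeper in LEVELS[depth + 1:]:
                current[deeper] = None
            current["text"] = None

        # Push a snapshot of the temporary record if it is deep enough.
        if deepest_level(current) >= min_indexed_level:
            records.append(dict(current))

    return records
```

With the payload `[("lvl0", "Guide"), ("lvl1", "Install"), ("text", "Run pip."), ("lvl1", "Usage")]`, the record pushed for "Usage" keeps "Guide" as its lvl0 but no longer carries the "Run pip." text, which belonged to the previous sub-section.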
If you need any further information, please connect with us on Discord or let our support team know.