Oracle Contextual Intelligence (Context) analyzes content at the massive scale and speeds required by automated advertising technology to determine the context and the central meaning of web content where ads typically appear. (Oracle Contextual Intelligence was previously called Grapeshot.)
You can use Context to have content included or excluded for consideration for your automated advertising buys based on the appropriateness of the content for the brand being advertised.
Context is part of Oracle Data Cloud, which also includes Oracle Measurement — providing real-time analytics, fraud detection, and accreditation of more than 50 metrics for ad-impression viewability — and a proprietary DMP (data management platform) that has unparalleled access to industry data.
In this topic:
- Contextual Intelligence Product Line
- Primary users
- Language support
- How Context works
- Signal nomenclature
- Brand safety options
- Video and audio
- Mobile and CTV apps
Contextual Intelligence Product Line
- Brand safety. You can use Context for brand safety purposes to help avoid having advertising messages placed on pages with content that is inappropriate or irrelevant to the advertiser. The decisioning can be based upon existing standards, such as advertising-industry groups' approved lists of topics to avoid, or customized to match the particular needs of specific brands and products.
- Contextual targeting finds pages whose content is relevant to advertising messages. The application can enhance other targeting applications, adding context to the mix to discover pages that might otherwise have been missed, such as in the hard-news sections of a website that might otherwise be avoided altogether.
- Custom brand safety and targeting. We offer powerful tools that give advertisers granular control over the brand suitability of their ad campaigns. Using the same tools we use to build our syndicated segments, you can confidently create bespoke segments that target or avoid content.
- Predicts is a contextual solution that increases campaign relevancy by capitalizing on trending content and its associated inventory in real time. Predicts is powered by dynamic technology that automatically updates context segments with terms from trending content. Adding these segments to your campaigns ensures that ads are optimized to appear alongside trending content and secure the associated inventory that goes with it. This allows you to stay one step ahead of online trends, drive scale with your campaigns, and navigate through the unpredictable nature of the open web.
- Data Driven Context is a privacy-compliant, de-identified way to reach buyers most likely to interact with your ad at scale. We use data about consumer content-consumption habits to model contextual segments that can be activated in any campaign.
Primary users
Context users include:
- Advertisers and their partners who use Context to find pages to target or to avoid.
- Publishers who use Context to help discover pages for their advertising partners to target or avoid.
- Technology partners who use the signals sent from Context to offer brand safety or contextual targeting to their partners and customers.
Language support
Context is available for 31 of the most-used languages in the world. It can distinguish among important variations of languages, such as between British and American English.
How Context works
Context has two main processes: crawling and contextually analyzing pages, then matching them against carefully curated sets of words and phrases. These sets are called contextual segments.
Note: Context works in the pre-bid environment: it notifies ad systems to include or exclude pages before an advertising bid has been placed. This contrasts with systems that block an ad from appearing only after a bid has been placed and won for a spot on a page. By using pre-bid technology, advertisers avoid being billed for placements they would not have bid on had they known the page's content.
Crawling and analyzing pages
Context crawls and contextually analyzes text in response to requests. The technology and architecture support loads exceeding 3 million queries per second (QPS), a level found in massive programmatic advertising implementations, and also offer sub-millisecond response times.
The request for a page at a given URL can originate from a number of sources but typically comes from a partner's technology platform or server that handles ad serving, measurement, and/or verification. Pages are re-crawled to check for changes at a rate that depends on how likely the page's core text (its epicenter) is to change.
Some web pages are identical but have subtly different URLs, for example because web-analytics software appends parameters toward the end of the URL. To treat such pages as one page, these parameters are stripped out.
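The normalization step can be pictured with a minimal sketch, assuming that the stripped parameters are common analytics ones (`utm_*`, `gclid`, `fbclid`, `msclkid`) and that the URL fragment is also dropped. Context's actual parameter list is not documented here. The sketch uses only Python's standard `urllib.parse` module.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list: query parameters assumed to be added by web-analytics tools.
ANALYTICS_PARAMS = {"gclid", "fbclid", "msclkid"}
ANALYTICS_PREFIXES = ("utm_",)


def is_analytics_param(name: str) -> bool:
    """Return True if a query parameter looks like analytics tracking."""
    lowered = name.lower()
    return lowered in ANALYTICS_PARAMS or lowered.startswith(ANALYTICS_PREFIXES)


def normalize_url(url: str) -> str:
    """Drop analytics parameters and the fragment so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not is_analytics_param(k)]
    # Host names are case-insensitive, so lowercase the netloc as well.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path,
                       urlencode(kept), ""))


a = normalize_url("https://News.example.com/story?id=42&utm_source=x&utm_medium=email")
b = normalize_url("https://news.example.com/story?id=42&gclid=abc#comments")
print(a)       # https://news.example.com/story?id=42
print(a == b)  # True
```

Both variants collapse to one key, so the page is crawled and stored once rather than once per tracking variant.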
Sites that our systems attempt to crawl can exclude or block the crawlers by various means, such as a robots.txt file or rules that specifically exclude Context's crawler. If we are unable to crawl a page, partners or customers are notified through a signal prefixed "gx_".
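As an illustration of how a robots.txt exclusion could turn into a "gx_" response, here is a sketch using Python's standard `urllib.robotparser`. The crawler user-agent `ExampleContextBot` and the signal name `gx_blocked_by_robots` are hypothetical placeholders, not Context's real values.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler user-agent and gx_ code; the real values are not given here.
CRAWLER_AGENT = "ExampleContextBot"
BLOCKED_SIGNAL = "gx_blocked_by_robots"


def crawl_decision(robots_txt: str, url: str) -> str:
    """Return 'crawl' if robots.txt allows our crawler, else an information-code signal."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules without fetching over the network
    if parser.can_fetch(CRAWLER_AGENT, url):
        return "crawl"
    return BLOCKED_SIGNAL


robots = """
User-agent: ExampleContextBot
Disallow: /private/

User-agent: *
Allow: /
"""
print(crawl_decision(robots, "https://example.com/news/story"))    # crawl
print(crawl_decision(robots, "https://example.com/private/page"))  # gx_blocked_by_robots
```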
Concentrating on the epicenter
After our crawler has received a request to scan a web page, it finds and crawls that page. It then downloads the page's core textual content (we call this the epicenter) from the page's HTML. We do not download or analyze the CSS, JavaScript, images, navigation, footer, sidebars, or other areas tangential to the main textual content on the page. For example, on a typical news web page, our technology analyzes the central text of that page. It does not analyze the embedded or side elements, which may include related stories, additional linked headlines, images, videos, and so on.
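A rough approximation of epicenter extraction can be built with Python's standard `html.parser`: keep text, but skip anything inside tags that usually hold tangential material. This is a minimal sketch under an assumed tag list (`SKIP_TAGS`); Context's actual epicenter detection is proprietary and not described in detail here.

```python
from html.parser import HTMLParser

# Assumed list of tags treated as tangential to the main text.
SKIP_TAGS = {"head", "script", "style", "nav", "header", "footer", "aside", "figure", "noscript"}


class EpicenterExtractor(HTMLParser):
    """Collect text that sits outside tangential page regions."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_epicenter(html: str) -> str:
    """Return the page's main text with tangential regions removed."""
    parser = EpicenterExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = """<html><head><title>T</title><script>var x = 1;</script></head>
<body><nav>Home | Sports</nav>
<article><h1>Local team wins final</h1><p>The match ended 2-1.</p></article>
<aside>Related stories</aside><footer>Copyright</footer></body></html>"""
print(extract_epicenter(page))  # Local team wins final The match ended 2-1.
```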
Re-crawling pages
Pages change, of course, and need to be re-crawled. The crawler maintains an estimate of how frequently each page changes. If a page has been modified since it was last crawled, the interval between crawls is halved, down to a minimum of four hours. If it hasn't been modified, the interval is doubled, up to a maximum of 30 days. In this way, the re-crawl rate soon matches the page's modification rate, so crawling resources are apportioned efficiently.
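The adaptive re-crawl rule translates directly into code. In this sketch the four-hour and 30-day bounds come from the description above; the one-day starting interval used in the examples is an assumption.

```python
from datetime import timedelta

MIN_INTERVAL = timedelta(hours=4)
MAX_INTERVAL = timedelta(days=30)


def next_interval(current: timedelta, modified: bool) -> timedelta:
    """Halve the re-crawl interval if the page changed, else double it, within bounds."""
    if modified:
        return max(current / 2, MIN_INTERVAL)
    return min(current * 2, MAX_INTERVAL)


# A page that never changes: 1 -> 2 -> 4 -> 8 -> 16 -> 30 days (capped).
slow = timedelta(days=1)  # assumed starting interval
for _ in range(5):
    slow = next_interval(slow, modified=False)
print(slow)  # 30 days, 0:00:00

# A page that changes on every visit: 24h -> 12h -> 6h -> 4h (floored).
fast = timedelta(days=1)
for _ in range(3):
    fast = next_interval(fast, modified=True)
print(fast)  # 4:00:00
```

Halving and doubling converge quickly, so a page's crawl interval settles near its real change rate after a few visits.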
Categorizing pages
Once a web page has been crawled, its information is kept in a document store — a centralized area for information from all crawled pages. Context's central data store holds information from more than 5 billion documents at a time and is an ever-growing, frequently updated record of all pages that have been crawled.
From this document store, a web page's record can be run against our proprietary algorithm to determine the weighted value of the language on the page and determine if there is a match to the contextual segments being used by a partner. If a web page is requested but not found in the document store, the URL is sent to the crawler layer to be crawled, processed and stored in the document store.
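The lookup-or-crawl flow can be sketched with in-memory stand-ins. The `document_store` dictionary, `crawl_queue`, and `on_crawled` callback are hypothetical simplifications of what is, in practice, a very large store and a separate crawler layer.

```python
from collections import deque
from typing import Optional

# Hypothetical in-memory stand-ins for the document store and the crawler layer's queue.
document_store = {
    "https://news.example.com/story?id=42": {"epicenter": "Local team wins final"},
}
crawl_queue = deque()


def lookup(url: str) -> Optional[dict]:
    """Return the stored page record, or queue the URL for crawling on a miss."""
    record = document_store.get(url)
    if record is None and url not in crawl_queue:
        crawl_queue.append(url)  # to be crawled, processed and stored
    return record


def on_crawled(url: str, record: dict) -> None:
    """Called by the (hypothetical) crawler layer once a page has been processed."""
    document_store[url] = record


first = lookup("https://news.example.com/new-story")  # miss: returns None, URL queued
url = crawl_queue.popleft()
on_crawled(url, {"epicenter": "A new story"})
second = lookup(url)  # hit once the page has been crawled
print(first, second)  # None {'epicenter': 'A new story'}
```

Until the crawl completes, the caller has no categorization for the page; per the "gx_" description later in this topic, that situation is reported through an information code.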
Contextual segments, matches, and signals
When a page is requested for analysis, its categorized data is then matched against the relevant contextual segments. Contextual segments are another elemental factor at the heart of Context's technological processes.
Segments and sub-segments build upon one another in a tree structure. For example, the "sports" segment may include "gs-sports-football" and "gs-sports-basketball," and each of those may include sub-segments relevant to its individual sport.
A segment concerning "sports," for instance, contains multiple words and phrases that, if they appear with enough weight on a page, indicate that the page is about sports.
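The tree structure, and the roll-up of sub-segments into an umbrella segment, can be modeled as a small recursive data structure. Segment names and terms in this sketch are hypothetical examples, not Context's actual segment definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A node in a segment tree: its own terms plus child sub-segments."""
    name: str
    terms: set = field(default_factory=set)
    children: list = field(default_factory=list)

    def all_terms(self) -> set:
        """Terms of this segment rolled up with those of every descendant."""
        collected = set(self.terms)
        for child in self.children:
            collected |= child.all_terms()
        return collected


# Hypothetical names and terms, for illustration only.
football = Segment("sports-football", {"touchdown", "quarterback"})
basketball = Segment("sports-basketball", {"dunk", "three-pointer"})
sports = Segment("sports", {"league", "championship"}, [football, basketball])

print(sorted(sports.all_terms()))
# ['championship', 'dunk', 'league', 'quarterback', 'three-pointer', 'touchdown']
```

A sub-segment can be matched on its own terms, while the umbrella segment matches on the union of everything beneath it.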
Each language we cover has hundreds of standard segments addressing an array of topics. The segments are constructed by our teams of editors trained in linguistics and in our processes. Partners can also create, or ask us to create, custom segments to cover topics or niches not sufficiently covered for their purposes by our standard segments or to avoid brand-unsuitable content, based on their own criteria.
Once a page has been crawled, indexed and categorized, that information is compared to the relevant keyword segments to determine if there is a contextual match. Matches are scored to indicate how strong the match is compared with other matches.
Once a match is found and scored, a response is sent via API to our partners so they can determine how the page should be treated for advertising purposes.
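Context's scoring algorithm is proprietary. The sketch below substitutes a deliberately simple stand-in (weighted term density with a fixed threshold) to show the shape of the process: weight the page's language, test each segment for a match, and score matches relative to the strongest one. The segments, weights, and `THRESHOLD` value are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical segments with per-term weights.
SEGMENTS = {
    "gs_sports": {"match": 1.0, "league": 1.0, "goal": 0.5},
    "gs_finance": {"market": 1.0, "shares": 1.0, "profit": 0.5},
}
THRESHOLD = 0.05  # assumed minimum weighted density for a match


def score_page(text: str) -> list:
    """Return (segment, score) pairs, scored relative to the strongest match."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    raw = {}
    for name, weights in SEGMENTS.items():
        # Weighted count of segment terms per word on the page.
        density = sum(w * counts[t] for t, w in weights.items()) / total
        if density >= THRESHOLD:
            raw[name] = density
    if not raw:
        return []
    top = max(raw.values())
    return sorted(((n, round(d / top, 2)) for n, d in raw.items()),
                  key=lambda item: item[1], reverse=True)


text = "The league match ended with a late goal as shares in the club rose."
print(score_page(text))  # [('gs_sports', 1.0), ('gs_finance', 0.4)]
```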
The cache
To minimize latency, customers can choose to place a cache close to the systems they use, so that information on categorized pages can be retrieved and matched as quickly as possible. The cache is updated every time a page is rescanned and categorized.
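A near-system cache of this kind could be as simple as a bounded least-recently-used map that is overwritten whenever a rescan result arrives. The `CategoryCache` class, its capacity, and the push-style `on_rescan` update are assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Optional


class CategoryCache:
    """Hypothetical local LRU cache of page signals, kept near the ad system."""

    def __init__(self, capacity: int = 100_000) -> None:
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, url: str) -> Optional[list]:
        """Return cached signals for a URL, or None if not cached."""
        signals = self._data.get(url)
        if signals is not None:
            self._data.move_to_end(url)  # mark as recently used
        return signals

    def on_rescan(self, url: str, signals: list) -> None:
        """Apply an update delivered after a page is rescanned and recategorized."""
        self._data[url] = signals
        self._data.move_to_end(url)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


cache = CategoryCache(capacity=2)
cache.on_rescan("a", ["gs_sports"])
cache.on_rescan("b", ["gv_safe"])
cache.get("a")                        # "a" becomes most recently used
cache.on_rescan("c", ["gs_finance"])  # over capacity: evicts "b"
print(cache.get("b"))  # None
print(cache.get("a"))  # ['gs_sports']
```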
Signal nomenclature
Context provides signals in standardized alphanumeric formats that include a set prefix, descriptive word(s), and the "_" or "-" symbols. The nomenclature is constructed to be easily understandable and differentiated. As one example, a page found to match our "sports" segment sends a response of "gs_sports" as well as a "score" to indicate how strong the match is for that page to the segment compared with other matching segments. The "gs" prefix indicates "Grapeshot Standard," as described below. There may be sub-segments as well, such as "sports-football" or "sports-tennis," which can be used separately for matching or rolled up into an umbrella segment.
We have a range of overarching signal response types:
- gs_: "Standard" is for segments designed to be positively targeted, for partners wishing to find advertising inventory on contextually relevant web pages, audio files, or videos. There are more than 150 signals of this type, and they are translated into all our supported languages.
- gv_: "Verified" indicates that a web page should be negatively targeted (that is, avoided). These responses apply to textual content on web pages, or to audio or video content, that contains known brand safety violations or language considered unsafe for most brands.
- gl_: "Language" is a signal that indicates the language of a page, for example, "gl_English." This is used to confirm the language is the one desired.
- gx_: "Information Codes." We give a "gx_" response when we have not been able to crawl a page and therefore cannot deliver a more meaningful signal. Examples include pages that have not been crawled and categorized; that have no editorial text; that block crawlers; that are behind logins; or that are temporarily unavailable. These responses can be logged by partners for further analysis. A list of "gx_" response types is available separately.
- gq_: "Page Quality." This signal is given based on page- or session-measurement data collected through Measurement analytics solutions implemented by Measurement partners. The signal is used to inform partners of pages that may yield higher levels of users' attention.
- gs_predicts_: "Predicts." This product helps partners find content about trending topics. Whereas standard segments consist of a fixed set of terms that change only if edited, Predicts segments contain a core set of "seed" terms plus additional terms that are dynamically updated.
- gi_: "Data Driven Context" segments are segments containing URLs that have been identified as pages that a particular type of buyer is likely to frequent. These are based on de-identified and privacy-compliant analysis of buyer behavior and content-consumption habits.
- Custom segments. This is the only segment type that is not part of Context's fixed taxonomy of uneditable segments; it is built either by partners and clients or by operators of their sub-accounts (also known as "zones"). By default, these are returned using the naming convention zonename_segmentname.
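Because each response type has a distinct prefix, a partner system can route signals by prefix alone. This is a hypothetical helper built from the prefixes listed above. Note that "gs_predicts_" must be checked before "gs_", and that treating every unrecognized signal as a custom zonename_segmentname signal (with no underscore in the zone name) is an assumption.

```python
# Prefixes from the list above. "gs_predicts_" comes before "gs_"
# because every Predicts signal also starts with "gs_".
PREFIX_TYPES = [
    ("gs_predicts_", "Predicts"),
    ("gs_", "Standard"),
    ("gv_", "Verified"),
    ("gl_", "Language"),
    ("gx_", "Information code"),
    ("gq_", "Page quality"),
    ("gi_", "Data Driven Context"),
]


def classify_signal(signal: str) -> str:
    """Return the response type for a signal, based on its prefix."""
    for prefix, kind in PREFIX_TYPES:
        if signal.startswith(prefix):
            return kind
    # Assumption: anything else is a custom segment named zonename_segmentname.
    return "Custom"


def split_custom(signal: str) -> tuple:
    """Split a custom signal into (zone, segment) at the first underscore."""
    zone, _, segment = signal.partition("_")
    return zone, segment


# "gs_predicts_example" and "acme_no_finance" are hypothetical signal names.
for s in ["gs_sports", "gs_predicts_example", "gv_crime", "gl_English", "acme_no_finance"]:
    print(s, "->", classify_signal(s))
print(split_custom("acme_no_finance"))  # ('acme', 'no_finance')
```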
Brand safety options
Context provides two standard levels for identifying risks associated with the textual content of web pages, audio and video:
- Maximum reach: For this level, content must be identified as unsafe to be excluded. Content not identified as potentially damaging is included for targeting. This allows clients to maximize advertising reach while maintaining a standard level of protection.
- Maximum protection: For this level, content must be proactively identified as safe to be included. If the content is not explicitly identified as safe, it is excluded from targeting (negatively targeted). This protects customers for whom safety is the paramount concern.
Content that has been successfully processed and does not match any of the standard unsafe (gv_) segments is identified as safe (gv_safe) and offered for targeting.
Custom "safe-from" segments can also be added: if a page matches one of these segments, it is negatively targeted (avoided).
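The two standard levels differ only in what happens to content that has not been positively identified as safe. The sketch below encodes the rules described above, assuming the page's signals arrive as a set of strings. The mode identifiers and the `gx_not_crawled` signal in the examples are hypothetical.

```python
def allow_ad(signals: set, mode: str, avoid: frozenset = frozenset()) -> bool:
    """Decide whether a page is eligible for targeting under a brand safety level.

    signals: signals returned for the page, e.g. {"gs_sports", "gv_safe"}
    mode: "max_reach" or "max_protection" (hypothetical identifiers)
    avoid: custom "safe-from" segments that force exclusion when matched
    """
    if signals & avoid:
        return False  # custom safe-from match: negatively target
    if mode == "max_protection":
        # Content must be proactively identified as safe.
        return "gv_safe" in signals
    if mode == "max_reach":
        # Content is excluded only if a gv_ segment other than gv_safe flags it.
        return not any(s.startswith("gv_") and s != "gv_safe" for s in signals)
    raise ValueError(f"unknown mode: {mode!r}")


unprocessed = {"gx_not_crawled"}  # hypothetical information code
print(allow_ad(unprocessed, "max_reach"))       # True
print(allow_ad(unprocessed, "max_protection"))  # False
print(allow_ad({"gs_sports", "gv_safe"}, "max_protection"))  # True
print(allow_ad({"gv_crime"}, "max_reach"))      # False
```

A page that could not be processed is the clearest case: maximum reach lets it through because nothing flagged it, while maximum protection blocks it because nothing cleared it.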
Standard unsafe segments are ones that map to brand safety parameters that have been agreed upon by industry trade groups such as the 4A's. In September of 2018, the American Association of Advertising Agencies (4A's) Advertiser Protection Bureau (APB) introduced its Brand Safety framework. The framework lists 13 content categories that, in the words of a 4A's news release, "pose risk to advertisers, whereby advertisers might choose to adopt a 'never appropriate' position for their ad buys." These 13 categories and the 4A's definitions of them are identified in the table below, along with Context's corresponding avoidance categories where available and the ways in which such content is otherwise addressed for programmatic content.
| # | 4A's category | 4A's definition | Context avoidance category | Context definition |
|---|---|---|---|---|
| 1 | Adult and Explicit Sexual Content | Illegal sale, distribution, and consumption of child pornography. Explicit or gratuitous depiction of sexual acts, and/or display of genitals, real or animated. | gv_adult | Avoids mature and sexual content. |
| 2 | Arms and Ammunition | Promotion and advocacy of sales of illegal arms, rifles, and handguns. Instructive content on how to obtain, make, distribute, or use illegal arms. Glamorization of illegal arms for the purpose of harm to others. Use of illegal arms in unregulated environments. | gv_arms | Avoids content around guns and weapons. |
| 3 | Crime and Harmful acts to individuals and Society and Human Right Violations | Graphic promotion, advocacy, and depiction of willful harm and actual unlawful criminal activity, including murder, manslaughter and harm to others. Explicit violations of human rights, such as trafficking and slavery. | gv_crime | Segments include serious, sex and violent crimes. |
| 4 | Death or Injury | Promotion or advocacy of death or injury. Murder or willful bodily harm to others. Graphic depictions of willful harm to others. | gv_death_injury | Segments include air, fire, rail, road and sea. |
| 5 | Online piracy | Pirating, copyright infringement, and counterfeiting. | gv_download | Topics related to online piracy and spam. |
| 6 | Hate Speech and Acts of Aggression | Unlawful acts of aggression based on race, nationality, ethnicity, religious affiliation, gender, or sexual preference. Behavior or commentary that incites such hateful acts, including bullying. | gv_hatespeech | Avoids derogatory terms, including content around racism, homophobia, and inflammatory political terms. |
| 7 | Military Conflict | Incendiary content provoking, enticing, or evoking military aggression. Live-action footage/photos of military actions and genocide or other war crimes. | gv_military | Avoids conflict, war and negative foreign policy content. |
| 8 | Obscenity and Profanity | Excessive use of profane language or gestures and other repulsive actions with the intent to shock, offend, or insult. | gv_obscenity | Avoids content that includes offensive terms. |
| 9 | Illegal Drugs | Promotion or sale of illegal drug use, including abuse of prescription drugs. Federal jurisdiction applies, but allowable where legal local jurisdiction can be effectively managed. | gv_drugs | Avoids content related to consumption of drugs, including recreational and performance-enhancing use. |
| 10 | Spam or Harmful Content | Malware/Phishing | (none) | Does not map directly to a specific Context avoidance category, although certain URLs related to this definition may be covered in some part by Context's gv_download category. Additionally, Context, as part of our monitoring of keyword segments, manually adds Spam or Harmful Content sites to our internal block list of pages not to be crawled. If requested by Context clients, these sites return a gv_spam_or_harmful_site categorization. |
| 11 | Terrorism | Promotion and advocacy of graphic terrorist activity involving defamation, physical and/or emotional harm of individuals, communities, and society. | gv_terrorism | Avoids content around terrorist attacks. |
| 12 | Tobacco/eCigarettes/Vaping | Promotion and advocacy of tobacco and e-cigarette (vaping) and alcohol use to minors. | gv_tobacco | Avoids smoking content, including vaping and e-cigarettes. |
| 13 | Sensitive Social Issue/Violations of Human Rights | Disrespectful and harmful treatment of sensitive social topics such as abortion, extreme political positions, and so on. Acts, language and gestures deemed illegal, not otherwise outlined in this framework. Examples include harm to self or others and animal cruelty. Targeted harassment of individuals and groups. | (none) | Does not map directly to a specific Context avoidance category, although some text related to this definition may be covered in some part by Context's gv_hatespeech and gv_obscenity categories. Sites deemed inappropriate as noted in category 10 above may be removed from our crawl list and be designated with a harmful_site categorization. |
Video and audio
For video and audio content, we ingest the audio (for a video, its soundtrack) and convert it to text. That text is then analyzed and matched in the same way as the epicenter text described above.
Mobile and CTV apps
Context can evaluate applications built for mobile and CTV environments by analyzing their descriptions in Google Play, Apple's App Store, and other app stores. For mobile, we categorize apps with a PEGI rating of 3 (on Google Play) or an age rating of 4+ (on iOS) as "safe for mobile."
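The mobile rating rule (PEGI 3 on Google Play, 4+ on iOS) can be expressed as a small predicate. The store identifiers and rating strings here are hypothetical representations chosen for the sketch.

```python
def safe_for_mobile(store: str, rating: str) -> bool:
    """Return True if an app's store rating meets the "safe for mobile" rule.

    store: hypothetical identifier, "google_play" or "app_store"
    rating: the rating as shown in that store, e.g. "PEGI 3" or "4+"
    """
    if store == "google_play":
        return rating == "PEGI 3"
    if store == "app_store":
        return rating == "4+"
    # The rule covers only these two stores.
    return False


print(safe_for_mobile("google_play", "PEGI 3"))  # True
print(safe_for_mobile("app_store", "12+"))       # False
```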