Data Science Toolkit - Logistic regression custom model service

The Logistic Regression Custom Model Service allows you to create logistic regression models (sometimes called "logit models") to predict the likelihood of clicks or conversions based on a combination of multiple signals. The logistic regression models can then be associated with a line item using the Line Item Service - ALI (Archived).

Note

This offering is currently in Closed Beta and available to a limited set of clients. For more information about the Advanced Ads Toolset and potential use cases that may apply to your business, reach out to your account representative.

The formula for logistic regression is:

Screenshot of the formula for logistic regression.

For online advertising, the event is a click, a pixel fire, or another online action. The probability is conditional on both the predictors x 1 through xn and on an implicit set of variables that represent the features in a bid request. The beta coefficients are the weights that the model assigns to different predictors.

We convert this probability of an event happening to an expected value by multiplying the probability by the event's value (the eCPC goal for a click prediction), adding an additive offset to the estimate, and then applying min/max expected value limits to reduce the impact of mispredictions.

The formula for deriving an expected value for an impression from the probability of an event happening is:

Screenshot of the formula for deriving an expected value for an impression from the probability of an event happening.

The offset will usually be 0. However, a negative value may be useful as a security factor to ensure performance at the expense of delivery on low-performing inventory. That will ensure that the advertiser does not bid, instead of bidding very little and potentially incurring fixed fees.

For more information about how logistic regression custom models work, see Logistic Regression Models.

REST API

Add a new logistic regression model

POST https://api.appnexus.com/custom-model-logit
(template JSON)

Modify a logistic regression model

PUT https://api.appnexus.com/custom-model-logit?id=CUSTOM_MODEL_LOGIT_ID
(template JSON)

Delete a logistic regression model

DELETE https://api.appnexus.com/custom-model-logit?id=CUSTOM_MODEL_LOGIT_ID

View all logistic regression models

GET https://api.appnexus.com/custom-model-logit 

View a specific logistic regression model

GET https://api.appnexus.com/custom-model-logit?id=CUSTOM_MODEL_LOGIT_ID

JSON fields

General

Field Type Description
id int The ID of the logistic regression custom model.
Default: Auto-generated number
Required on: PUT/ DELETE in query string
name string The name of the model.
beta0 float β 0 coefficient in the logistic regression equation.
max float Max in the expected value equation.
min float Min in the expected value equation.
offset float Offset in the expected value equation.
scale float Scale in the expected value equation.
predictors array Array of predictors. See Predictors below for more details.

Predictors

Field Type Description
predictor_type scalar_descriptor, custom_model_descriptor, freq_rec_descriptor, segment_descriptor, categorical_descriptor
keys
hash_seed
default_value
hash_table_size_log
coefficients

Hashed descriptor

This endpoint is to submit a pre-hashed table. bucket_index0 and bucket_index1, each 64 bits long, are there to support hashing algorithms that produce long values as keys. Currently, we only support one hashing algorithm: MurmurHash3_x64_128, which will create two 64 bit integers but we only use the lower 64 bits of the hash.

Values in bucket_index0 must always be smaller than (2 ^ hash_table_size_log) or they will get rejected.

Currently, the values in bucket_index1 are ignored as this is to be used for future expansion. If a value is sent for bucket_index1, it must be 0. The parameter is optional.

Hash Table keys

For each of your Hash Table keys, you will need a uint32 value. These values should be the ID of respective object that you are referencing from our system - domain_id, for instance, rather than the domain string value. These uint32 keys are then transformed into a byte array (little-endian), and hashed.

Python example

hash_bucket  = (mmh3.hash64(bytes, seed)[ 0 ])  % table_size

Field Type Description
type
keys Array of one to 5 descriptors in this list:
hashed_seeds Seeds used when passed to Murmurhash3_x64_128 function, only first one is used for now, array is for planned future hash functions that need more than one seed
hash_id Existing hash table ID
default_value Value returned by the descriptor if no match is found in your hash table
hash_table_size_log Log of maximum value for a key of your table. Values larger than 2^hash_table_size_log will be rejected. Max for hash_table_size_log is 64 (no bucketing)

Hashed descriptor example

{
    "type": "hashed",
    "keys": [ 
    ],
    "hash_seeds": [42, 42, 42, 42, 42, 42], 
    "hash_id": ,
    "default_value": 0,                     
    "hash_table_size_log": 20                
}

Lookup Table (LUT) descriptor

Field Type Description
type
feature_keyword
default_value
initial_range_log
bucket_count_log_per_range

LUT descriptor example

{
        "type": "lookup",
        "default_value": 0.1,
        "features": [
                {
                        "type": "categorical_descriptor",
                        "feature_keyword": "advertiser_id"
                },
        {
                        "type": "scalar_descriptor",
                        "feature_keyword": "user_age",
                        "default_value": 0
                }
        ],
        "coefficients": [
                {'weight': 1.1, 'key': [1, 1]},
                {'weight': 1.3, 'key': [2, 2]},
                {'weight': 1.2, 'key': [3, 3]},
        ]
}

Categorical descriptor

Since this is finding an exact match, the default_valueinitial_range_log and bucket_count_log_per_range params are not needed.

Missing values

The key value of -1 can be used as a placeholder for a missing feature; For instance, when a domain is not reported by the seller. Otherwise, the default value of the LUT or Hash Model will be used, since no match will be found for that feature value.

Field Type Description
type
feature_keyword - country
- region
- city
- dma
- postal_code
- user_day
- user_hour
- os_family
- os_extended
- browser
- language
- user_gender
- domain
- ip_address
- position
- placement
- placement_group
- publisher
- seller_member_id
- supply_type
- device_type
- device_model
- carrier
- mobile_app
- mobile_app_instance
- mobile_app_bundle
- appnexus_intended_audience
- seller_intended_audience
- spend_protection
- user_group_id
- advertiser_id
- brand_category
- creative
- inventory_url_id
- media_type
default_value
object_type Advertiser, li - ne_item, campaign (not split?)
object_id ID of the referenced advertiser,

Categorical descriptor example

{
    "type": "categorical_descriptor",
    "feature_keyword": "city"
}

Frequency and recency descriptor

Field Type Description
type
feature_keyword frequency_life
frequency_daily
recency
default_value
object_type Advertiser, line_item, campaign (not split?)
object_id ID of the referenced advertiser,
default_value Value returned by the descriptor if no match is found
initial_range_log Used for log bucketing, initial range
bucket_count_log_per_range used for log bucketing, # of buckets per range

Frequency and recency descriptor example

{
    "type": "frequency_recency_descriptor",
    "feature_keyword": 'frequency_life',
    "object_type": 'advertiser',
    "object_id": 1,                   
    "default_value": 0,              
    "initial_range_log": 4,           
    "bucket_count_log_per_range": 2   
}

Scalar descriptor

Field Type Description
type
feature_keyword - appnexus_audited
- cookie_age
- estimated_average_price
- estimated_clearing_price
- predicted_iab_view_rate
- predicted_video_completion_rate
- self_audited
- size
- creative_size
- spend_protection
- uniform
- user_age

Note: The size descriptor is represented as a string in your models ("300x250", for instance), though converted to a scalar in our bidder. Any size is technically valid in our system, hence this feature being treated as a scalar rather than a categorical feature.
default_value Value returned by the descriptor if no match is found
initial_range_log Used for log bucketing, initial range
bucket_count_log_per_range Used for log bucketing, # of buckets per range

Scalar descriptor example

{
    "type": "scalar_descriptor",
    "feature_keyword": "cookie_age",
    "default_value": 0,              
    "initial_range_log": 4,           
    "bucket_count_log_per_range": 2   
} 

Segment descriptor

Field Type Description
type
feature_keyword segment_value
segment_age
segment_presence
segment_id ID of referenced segment
default_value Value returned by the descriptor if no match is found
initial_range_log Used for log bucketing, initial range
bucket_count_log_per_range Used for log bucketing, # of buckets per range

Segment descriptor example

{
    "type": "segment_descriptor",
    "feature_keyword": "segment_age",
    "segment_id": 2,                  
    "default_value": 0,              
    "initial_range_log": 4,           
    "bucket_count_log_per_range": 2   
}