ScrapingBee (Independent Publisher) (Preview)

ScrapingBee is a web scraping service that handles headless browsers, proxies, and CAPTCHAs. It can extract complex structured information from any website with CSS selectors and run JavaScript scenarios (click, scroll, form filling, etc.).

This connector is available in the following products and regions:

Service Class Regions
Logic Apps Standard All Logic Apps regions except the following:
     -   Azure Government regions
     -   Azure China regions
     -   US Department of Defense (DoD)
Power Automate Premium All Power Automate regions except the following:
     -   US Government (GCC)
     -   US Government (GCC High)
     -   China Cloud operated by 21Vianet
     -   US Department of Defense (DoD)
Power Apps Premium All Power Apps regions except the following:
     -   US Government (GCC)
     -   US Government (GCC High)
     -   China Cloud operated by 21Vianet
     -   US Department of Defense (DoD)
Contact
Name Troy Taylor
URL https://www.hitachisolutions.com
Email ttaylor@hitachisolutions.com
Connector Metadata
Publisher Troy Taylor
Website https://www.scrapingbee.com/
Privacy policy https://www.scrapingbee.com/privacy-policy/
Categories Website

Creating a connection

The connector supports the following authentication types:

Default: Parameters for creating connection. Applicable to all regions. Not shareable.

Default

Applicable: All regions

Parameters for creating connection.

This is not a shareable connection. If the power app is shared with another user, that user will be prompted to create a new connection explicitly.

Name Type Description Required
API Key securestring The API key for this API. True

Throttling Limits

Name Calls Renewal Period
API calls per connection 100 60 seconds
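
As an illustration of staying under the limit of 100 calls per 60 seconds, the following is a minimal client-side sketch. The `SlidingWindowLimiter` class is a hypothetical helper, not part of the connector, and it assumes a sliding window, which is a conservative reading of the renewal period; the service may count calls differently.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side guard for an 'N calls per period' limit."""

    def __init__(self, max_calls=100, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock          # injectable so the logic can be tested
        self._calls = deque()       # timestamps of calls inside the window

    def wait_time(self):
        """Return the seconds to wait before the next call (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.period - (now - self._calls[0])

    def record(self):
        """Register that a call was just made."""
        self._calls.append(self.clock())


limiter = SlidingWindowLimiter()
delay = limiter.wait_time()
if delay > 0:
    time.sleep(delay)
limiter.record()  # call the connector action right after this line
```

A flow that calls an action in a loop could check `wait_time()` before each call and sleep for the returned duration.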

Actions

Get usage

Retrieve information about credit consumption and concurrency usage.

Perform Google search

Retrieves a scrape of Google Search results pages.

Scrap URL

Fetches the requested URL and renders JavaScript if requested.

Get usage

Retrieve information about credit consumption and concurrency usage.

Returns

Name Path Type Description
Max API Credit
max_api_credit integer

The max API credit.

Used API Credit
used_api_credit integer

The used API credit.

Max Concurrency
max_concurrency integer

The max concurrency.

Current Concurrency
current_concurrency integer

The current concurrency.

Renewal Subscription Date
renewal_subscription_date string

The renewal subscription date.
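
The returned fields can be combined to see how much quota is left. The sketch below assumes the response has already been parsed into a dictionary with the paths listed above; `summarize_usage` is a hypothetical helper and the sample values are made up for illustration.

```python
def summarize_usage(usage):
    """Derive remaining credit and free concurrency from a Get usage response."""
    remaining = usage["max_api_credit"] - usage["used_api_credit"]
    free_slots = usage["max_concurrency"] - usage["current_concurrency"]
    return {
        "remaining_credit": max(remaining, 0),
        "free_concurrency": max(free_slots, 0),
        "renews_on": usage["renewal_subscription_date"],
    }


# Illustrative values only, not real account data.
sample_usage = {
    "max_api_credit": 1000,
    "used_api_credit": 250,
    "max_concurrency": 5,
    "current_concurrency": 2,
    "renewal_subscription_date": "2030-01-01T00:00:00",
}

print(summarize_usage(sample_usage))
```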

Perform Google search

Retrieves a scrape of Google Search results pages.

Parameters

Name Key Required Type Description
Search
search True string

The text you would put in the Google search bar.

Country Code
country_code string

The country you would like the request to come from.

Results
nb_results integer

The number of results to return.

Page
page integer

The page number to extract results from.

Language
language string

The language to return the results in.

Extra Params
extra_params string

Any additional URL parameters to submit.
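
To show how the parameters above fit together, here is a minimal sketch that assembles them into a dictionary keyed by the Key column. `build_search_params` is a hypothetical helper; only `search` is treated as required, as in the table, and unset optional values are left out.

```python
def build_search_params(search, country_code=None, nb_results=None,
                        page=None, language=None, extra_params=None):
    """Assemble Perform Google search parameters, omitting unset options."""
    if not search:
        raise ValueError("search is required")
    optional = {
        "country_code": country_code,
        "nb_results": nb_results,
        "page": page,
        "language": language,
        "extra_params": extra_params,
    }
    params = {"search": search}
    # Keep only the optional parameters the caller actually supplied.
    params.update({k: v for k, v in optional.items() if v is not None})
    return params


print(build_search_params("web scraping", country_code="us", nb_results=10))
```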

Returns

Name Path Type Description
URL
meta_data.url string

The URL address.

Results
meta_data.number_of_results integer

The number of results.

Location
meta_data.location string

The location.

Organic Results
meta_data.number_of_organic_results integer

The number of organic results.

Ads
meta_data.number_of_ads integer

The number of ads.

Page
meta_data.number_of_page integer

The page number.

No Results Message
meta_data.no_results_message string

The no results message.

Organic Results
organic_results array of object
URL
organic_results.url string

The URL address.

Displayed URL
organic_results.displayed_url string

The displayed URL address.

Description
organic_results.description string

The description.

Extra Info
organic_results.extra_info string

The extra info.

Position
organic_results.position integer

The position.

Title
organic_results.title string

The title.

Local Results
local_results array of string

The local results.

Top Ads
top_ads string

The top ads.

Bottom Ads
bottom_ads string

The bottom ads.

Related Queries
related_queries array of object
Text
related_queries.text string

The text.

Position
related_queries.position integer

The position.

Questions
questions array of string

The questions.
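
A common use of this response is to list the organic results in ranking order. The sketch below assumes the response is available as a dictionary shaped like the paths above; `top_organic` is a hypothetical helper and the sample data is invented.

```python
def top_organic(response, limit=3):
    """Return (position, title, url) for the first `limit` organic results."""
    results = response.get("organic_results") or []
    ordered = sorted(results, key=lambda r: r.get("position", 0))
    return [(r.get("position"), r.get("title"), r.get("url"))
            for r in ordered[:limit]]


# Invented sample data following the documented paths.
sample_search = {
    "meta_data": {"number_of_organic_results": 3, "number_of_page": 1},
    "organic_results": [
        {"position": 2, "title": "Second", "url": "https://example.com/b"},
        {"position": 1, "title": "First", "url": "https://example.com/a"},
        {"position": 3, "title": "Third", "url": "https://example.com/c"},
    ],
}

for position, title, url in top_organic(sample_search, limit=2):
    print(position, title, url)
```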

Scrap URL

Fetches the requested URL and renders JavaScript if requested.

Parameters

Name Key Required Type Description
URL
url True string

The URL you want to scrape.

Render JS
render_js True boolean

Render the website in a headless browser.

JS Scenario
js_scenario string

Execute JavaScript before rendering.

Wait
wait integer

Time to wait before rendering.

Wait For
wait_for string

Wait for a particular element to appear in the DOM.

Block Ads
block_ads boolean

Whether to block ads.

Block Resources
block_resources boolean

Whether to block all images and CSS.

Window Width
window_width integer

The width of the window to use.

Window Height
window_height integer

The height of the window to use.

Premium Proxy
premium_proxy boolean

Whether to use a premium proxy to scrape the website.

Country Code
country_code string

The proxy country to use to scrape the website.

Stealth Proxy
stealth_proxy boolean

Whether to use a stealth proxy to scrape the website.

Own Proxy
own_proxy string

Your own proxy to use.

Extract Rules
extract_rules string

Extraction rules to parse the HTML before responding.

Screenshot
screenshot boolean

Take a screenshot of the requested website.

Screenshot Selector
screenshot_selector string

Take a screenshot of a particular CSS selector.

Screenshot Full Page
screenshot_full_page boolean

Take a screenshot of the entire website.

Return Page Source
return_page_source boolean

Return the page source as well.

Session ID
session_id integer

All API requests using the same session_id will be routed through the same IP address for a duration of 5 minutes.

Timeout
timeout integer

The timeout in milliseconds, between 1000 and 140000 (the default).

Cookies
cookies string

Custom cookie to pass to the website.

Device
device string

The kind of device sent to the server.

Custom Google
custom_google boolean

Set to true if scraping webpage on Google or a Google subdomain.
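
Since this action takes many optional parameters, it can help to validate them before the call. The following is a minimal sketch: `build_scrape_params` and `OPTIONAL_KEYS` are hypothetical names, the key list and the timeout range come from the table above, and rejecting unknown keys is a design choice of this sketch rather than documented connector behavior.

```python
# Optional keys taken from the parameter table above (timeout is handled
# separately because it has a documented range).
OPTIONAL_KEYS = {
    "js_scenario", "wait", "wait_for", "block_ads", "block_resources",
    "window_width", "window_height", "premium_proxy", "country_code",
    "stealth_proxy", "own_proxy", "extract_rules", "screenshot",
    "screenshot_selector", "screenshot_full_page", "return_page_source",
    "session_id", "cookies", "device", "custom_google",
}


def build_scrape_params(url, render_js, timeout=None, **optional):
    """Assemble Scrap URL parameters and check the documented constraints."""
    if not url:
        raise ValueError("url is required")
    if not isinstance(render_js, bool):
        raise TypeError("render_js is required and must be a boolean")
    unknown = set(optional) - OPTIONAL_KEYS
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    params = {"url": url, "render_js": render_js}
    if timeout is not None:
        if not 1000 <= timeout <= 140000:
            raise ValueError("timeout must be between 1000 and 140000 ms")
        params["timeout"] = timeout
    # Keep only the optional parameters the caller actually supplied.
    params.update({k: v for k, v in optional.items() if v is not None})
    return params


print(build_scrape_params("https://example.com", True, wait=2000, block_ads=True))
```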

Returns

Name Path Type Description
Body
body string

The body.

Cookies
cookies array of object
Name
cookies.name string

The name.

Value
cookies.value string

The value.

Domain
cookies.domain string

The domain.

Path
cookies.path string

The path.

Expires
cookies.expires float

When the cookie expires.

Size
cookies.size integer

The size.

HTTP Only
cookies.httpOnly boolean

Whether the cookie is HTTP only.

Secure
cookies.secure boolean

Whether the cookie is secure.

Session
cookies.session boolean

Whether the cookie is a session cookie.

Same Party
cookies.sameParty boolean

Whether the cookie is same-party.

Source Scheme
cookies.sourceScheme string

The source scheme.

Source Port
cookies.sourcePort integer

The source port.

Evaluated Results
evaluate_results array of string

The evaluated results.

Age
headers.age string

The age.

Cache Control
headers.cache-control string

The cache control.

Content Encoding
headers.content-encoding string

The content encoding.

Content Security Policy
headers.content-security-policy string

The content security policy.

Content Type
headers.content-type string

The content type.

Date
headers.date string

The date.

ETag
headers.etag string

The eTag.

Referrer Policy
headers.referrer-policy string

The referrer policy.

Server
headers.server string

The server.

Strict Transport Security
headers.strict-transport-security string

The strict transport security.

X Content Type Options
headers.x-content-type-options string

The X content type options.

X Frame Options
headers.x-frame-options string

The X frame options.

X Matched Path
headers.x-matched-path string

The X matched path.

X Powered By
headers.x-powered-by string

The X powered by.

X Vercel Cache
headers.x-vercel-cache string

The X Vercel cache.

X Vercel ID
headers.x-vercel-id string

The X Vercel identifier.

Type
type string

The type.

IFrames
iframes array of string

The iFrames.

XHR
xhr array of object
URL
xhr.url string

The URL address.

Status Code
xhr.status_code integer

The status code.

Method
xhr.method string

The method.

Age
xhr.headers.age string

The age.

Cache Control
xhr.headers.cache-control string

The cache control.

Content Length
xhr.headers.content-length string

The content length.

Content Security Policy
xhr.headers.content-security-policy string

The content security policy.

Content Type
xhr.headers.content-type string

The content type.

Date
xhr.headers.date string

The date.

ETag
xhr.headers.etag string

The eTag.

Referrer Policy
xhr.headers.referrer-policy string

The referrer policy.

Server
xhr.headers.server string

The server.

Strict Transport Security
xhr.headers.strict-transport-security string

The strict transport security.

X Content Type Options
xhr.headers.x-content-type-options string

The X content type options.

X Frame Options
xhr.headers.x-frame-options string

The X frame options.

X Matched Path
xhr.headers.x-matched-path string

The X matched path.

X Vercel Cache
xhr.headers.x-vercel-cache string

The X Vercel cache.

X Vercel ID
xhr.headers.x-vercel-id string

The X Vercel identifier.

Access Control Allow Origin
xhr.headers.access-control-allow-origin string

The access control allow origin.

Access Control Expose Headers
xhr.headers.access-control-expose-headers string

The access control expose headers.

Alt SVC
xhr.headers.alt-svc string

The alt SVC.

Vary
xhr.headers.vary string

The vary.

Via
xhr.headers.via string

The via.

X Envoy Upstream Service Time
xhr.headers.x-envoy-upstream-service-time string

The X envoy upstream service time.

X Amazon Request ID
xhr.headers.x-amzn-requestid string

The X Amazon request identifier.

X Amazon Trace ID
xhr.headers.x-amzn-trace-id string

The X Amazon trace identifier.

Body
xhr.body string

The body.

Cost
cost integer

The cost.

Initial Status Code
initial-status-code integer

The initial status code.

Resolved URL
resolved-url string

The resolved URL address.

Microdata
metadata.microdata array of string

The microdata.

JSON LD
metadata.json-ld array of object
Context
metadata.json-ld.@context string

The context.

Type
metadata.json-ld.@type string

The type.

Name
metadata.json-ld.name string

The name.

URL
metadata.json-ld.url string

The URL address.

Description
metadata.json-ld.description string

The description.

Type
metadata.json-ld.mainEntityOfPage.@type string

The type.

URL
metadata.json-ld.mainEntityOfPage.url string

The URL address.

Type
metadata.json-ld.image.@type string

The type.

URL
metadata.json-ld.image.url string

The URL address.

Type
metadata.json-ld.publisher.@type string

The type.

Name
metadata.json-ld.publisher.name string

The name.

URL
metadata.json-ld.publisher.url string

The URL address.

Same As
metadata.json-ld.sameAs string

The same as.

Open Graph
metadata.opengraph array of object
Open Graph Title
metadata.opengraph.og:title string

The Open Graph title.

Open Graph Description
metadata.opengraph.og:description string

The Open Graph description.

Open Graph Site Name
metadata.opengraph.og:site_name string

The Open Graph site name.

Open Graph URL
metadata.opengraph.og:url string

The Open Graph URL address.

Open Graph Image
metadata.opengraph.og:image string

The Open Graph image.

Type
metadata.opengraph.@type string

The type.

OG
metadata.opengraph.@context.og string

The Open Graph.

Dublincore
metadata.dublincore array of object
Elements
metadata.dublincore.elements array of object
Name
metadata.dublincore.elements.name string

The name.

Content
metadata.dublincore.elements.content string

The content.

URI
metadata.dublincore.elements.URI string

The URI.

Terms
metadata.dublincore.terms array of string

The terms.
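
The response above is large, so a flow usually reads only a few paths from it. The sketch below assumes the response is available as a dictionary shaped like the documented paths; `cookie_map` and `failed_xhr` are hypothetical helpers and the sample data is invented.

```python
def cookie_map(response):
    """Map cookie names to values from the `cookies` array."""
    return {c["name"]: c["value"] for c in response.get("cookies") or []}


def failed_xhr(response):
    """List (method, url, status_code) for XHR calls with status 400 or above."""
    return [
        (x.get("method"), x.get("url"), x.get("status_code"))
        for x in response.get("xhr") or []
        if (x.get("status_code") or 0) >= 400
    ]


# Invented sample data following the documented paths.
sample_scrape = {
    "body": "<html></html>",
    "cost": 5,
    "initial-status-code": 200,
    "resolved-url": "https://example.com/",
    "cookies": [
        {"name": "session", "value": "abc", "domain": "example.com"},
        {"name": "theme", "value": "dark", "domain": "example.com"},
    ],
    "xhr": [
        {"method": "GET", "url": "https://example.com/api/a", "status_code": 200},
        {"method": "POST", "url": "https://example.com/api/b", "status_code": 500},
    ],
}

print(cookie_map(sample_scrape))
print(failed_xhr(sample_scrape))
```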