This is a submission for the
What I Built
Most of us have been there: you're looking at a company - maybe for a job, maybe out of curiosity - and you wonder, "What's really going on behind their glossy careers page?" Is it a great place to work, or just a PR-fueled mirage?
So I built an OSINT-style AI agent that gathers public information about companies from multiple sources. It’s not a recruiter bot. It’s the one doing background checks before you even click Apply.
The tool collects data from:
- LinkedIn
- Crunchbase
- Glassdoor
- News search, to surface any recent scandals or milestones
Once all the data is collected, the tool generates a short summary of what it found - recent news, company reputation, signals from employee reviews and public profiles. Then it assigns a simple rating from 1 to 5 potatoes to reflect the overall picture.
Demo
The project is not fully deployed at the moment - I did try, honestly! But I ran into a bug where the `scraping_browser_*` tools block for 5 minutes when running in Docker/Render, which I documented.
For now, here's a demo video.
Screenshots of some summaries:
OpenAI
Intel
How I Used Bright Data's Infrastructure
I used Bright Data's MCP servers together with an AI agent framework.
Each data source is connected to a different Bright Data MCP server. Here's how:
- LinkedIn → via `web_data_linkedin_company_profile` (Bright Data Dataset)
- News / events / scandals → via `search_engine`
- Glassdoor → via `scraping_browser_navigate` + `scraping_browser_get_text`
- Crunchbase → via the same scraping browser tools
Each MCP server has its own `WEB_UNLOCKER_ZONE` and `BROWSER_AUTH`, and each agent logs all its requests and tool calls, so I can trace the exact sequence of scraping, parsing, and merging.
The frontend is a simple Streamlit dashboard where you enter a company name. It sends a request to a FastAPI backend, which dispatches all four agents in parallel to gather and analyze the data.
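The fan-out step can be sketched like this (a simplified, framework-agnostic sketch - the function names are my stand-ins, and the stubs replace the real MCP-backed agents):

```python
# Sketch of the backend's parallel dispatch: all four agents run
# concurrently and their results are collected into one dict.
# The agent coroutines below are stubs; in the real app each one
# drives a Bright Data MCP server.
import asyncio

async def linkedin_agent(company: str) -> dict:
    return {"source": "linkedin", "company": company}  # stub

async def glassdoor_agent(company: str) -> dict:
    return {"source": "glassdoor", "company": company}  # stub

async def crunchbase_agent(company: str) -> dict:
    return {"source": "crunchbase", "company": company}  # stub

async def news_agent(company: str) -> dict:
    return {"source": "news", "company": company}  # stub

async def investigate(company: str) -> dict:
    # Run all four lookups in parallel rather than one after another.
    results = await asyncio.gather(
        linkedin_agent(company),
        glassdoor_agent(company),
        crunchbase_agent(company),
        news_agent(company),
    )
    return {r["source"]: r for r in results}
```

Running the agents concurrently means the slowest source (usually a browser-automation scrape) sets the total latency, instead of the sum of all four.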
I used `openai:gpt-4.1-mini` as the model behind each agent, with the following system prompt to define their behavior:
You are a tool-using agent connected to Bright Data's MCP server.
You act as an OSINT investigator whose job is to evaluate companies based on public information.
Your goal is to help users understand whether a company is reputable or potentially suspicious.
You always use Bright Data real-time tools to search, navigate, and extract data from company profiles.
You never guess or assume anything.
Company name matching must be case-sensitive and exact. Do not return data for similarly named or uppercase-variant companies.
Only use the following tools during your investigation:
- `search_engine`
- `scrape_as_markdown`
- `scrape_as_html`
- `scraping_browser_navigate`
- `scraping_browser_get_text`
- `scraping_browser_click`
- `scraping_browser_links`
- `web_data_linkedin_company_profile`
Do not invoke any other tools even if they are available.
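Besides asking the model nicely, the tool restriction can also be enforced in code with a hard allowlist. This is a sketch of one way to do it - the post doesn't say whether enforcement happens outside the prompt:

```python
# Tool allowlist mirroring the system prompt above; any other tool
# call is rejected before it reaches the MCP server.
# (Sketch only - programmatic enforcement is my assumption.)
ALLOWED_TOOLS = frozenset({
    "search_engine",
    "scrape_as_markdown",
    "scrape_as_html",
    "scraping_browser_navigate",
    "scraping_browser_get_text",
    "scraping_browser_click",
    "scraping_browser_links",
    "web_data_linkedin_company_profile",
})

def check_tool_call(tool_name: str) -> bool:
    """Return True if the agent may invoke this tool."""
    return tool_name in ALLOWED_TOOLS
```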
LinkedIn
The LinkedIn agent received this prompt:
Your task is to find the LinkedIn profile for the company '{company_name}' and extract specific structured data.
Use the `web_data_linkedin_company_profile` tool if available to extract the following fields:
- Company name
- Company description (short summary of what the company does)
- Number of employees (as listed on the LinkedIn profile)
- LinkedIn company profile URL
- Headquarters address
- Year the company was founded (if available)
- Industry or sector (e.g., 'Software', 'Healthcare')
- Company website
If the structured LinkedIn tool is unavailable or insufficient, use the following tools in order:
1. `scraping_browser_navigate` - to visit the LinkedIn company page
2. `scraping_browser_get_text` - to extract visible page text
3. `scraping_browser_links` and `scraping_browser_click` - to navigate if needed
Return ONLY a JSON object with the following keys:
{
"company_name": str,
"description": str,
"number_of_employees": str,
"linkedin_url": str,
"headquarters": str,
"founded": str or null,
"industry": str,
"website": str
}
Do not include raw HTML, markdown, explanations, or other fields.
If a field is missing, use null for that field. If the company cannot be found at all, return null.
And here’s what I saw in the logs when running a query for Google:
As you can see, `web_data_linkedin_company_profile` was used.
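On the backend side, the agent's raw JSON reply still needs to be parsed and validated before merging. A minimal sketch (the function and constant names are my assumptions, not from the project):

```python
# Validate the LinkedIn agent's raw JSON reply against the keys the
# prompt demands, tolerating missing fields and the "null" case.
import json

EXPECTED_KEYS = {
    "company_name", "description", "number_of_employees", "linkedin_url",
    "headquarters", "founded", "industry", "website",
}

def parse_linkedin_reply(raw: str):
    """Return the normalized dict, or None if the company was not found."""
    data = json.loads(raw)
    if data is None:  # the prompt tells the agent to return null on no match
        return None
    # Fill any missing fields with None rather than failing hard.
    return {key: data.get(key) for key in EXPECTED_KEYS}
```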
Glassdoor
The Glassdoor agent uses the browser automation tools to navigate to the company’s profile and extract public employee reviews and ratings. Here’s the prompt it receives:
Your task is to find the Glassdoor profile for the company '{company_name}' and extract specific structured data.
Extract the following fields:
- Overall company rating (float, out of 5)
- Total number of employee reviews
- A short summary of the top 5 pros and cons from employee reviews posted in 2025 or 2024 only
Use the following tools in order:
1. `scraping_browser_navigate` - to go to the Glassdoor company page
2. `scraping_browser_get_text` - to extract visible content
3. `scraping_browser_links` and `scraping_browser_click` - to find and open the review section if necessary
Return ONLY a JSON object with the following keys:
{
"rating": float,
"num_reviews": int,
"review_summary": str
}
Only use reviews from 2025 or 2024. Do not include older reviews.
Do not include HTML, markdown, or explanations.
If a field is missing, use null for that field. If the company cannot be found at all, return null.
Crunchbase
The Crunchbase agent follows a similar pattern to Glassdoor - it navigates to the company profile and extracts public funding info, key people and sector tags.
Search for the Crunchbase profile of the company '{company_name}'.
Once you find the correct page, extract the following information:
- Year founded (as a string or null)
- Latest funding round name
- Funding round date
- Funding amount
- List of known investors (as strings)
- Key people (e.g., founders, CEOs, etc.)
Use the following tools in order:
1. `scraping_browser_navigate`
2. `scraping_browser_get_text`
3. `scraping_browser_links` and `scraping_browser_click`
Return ONLY a JSON object with the following keys:
{
"founded": str or null,
"funding_round": str or null,
"funding_date": str or null,
"funding_amount": str or null,
"investors": list[str] or null,
"key_people": list[str] or null
}
Do not include HTML, markdown, or explanations.
If a field is missing, use null for that field. If the company cannot be found at all, return null.
Even with Cloudflare's "Are you human?" check, `scraping_browser_get_text` was able to get through and extract the real page content.
News & Events
The final agent uses the `search_engine` tool to search for company-related news articles, events, and public mentions across Google and other engines. It extracts links and summaries from the search results and surfaces relevant headlines.
Search for news about the company '{company_name}' from 2023, 2024, and 2025.
Extract the following if available:
- Layoffs: Dates and brief summaries of any layoff announcements.
- Scandals: Brief, neutral headlines about controversies or investigations.
- Achievements: Public product launches, funding milestones, acquisitions, or major hires.
Return a structured JSON object with keys:
{
"layoffs": list[str],
"scandals": list[str],
"achievements": list[str]
}
If no news is found in a category, return an empty list.
Do not include HTML, explanations, or irrelevant information.
After collecting data from all four sources, the outputs are cleaned and normalized into a consistent format. This structured input is then passed to `openai:gpt-4o`, which generates a concise company summary.
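The merge step can be sketched as follows (the function and key names are my assumptions; the real normalization logic isn't shown in the post):

```python
# Combine the four agents' outputs into one structured dict, which is
# then serialized and handed to the summary model as context.
# Missing sources degrade to empty defaults instead of breaking the merge.
def build_summary_input(linkedin, glassdoor, crunchbase, news) -> dict:
    return {
        "profile": linkedin or {},
        "reviews": glassdoor or {},
        "funding": crunchbase or {},
        "news": news or {"layoffs": [], "scandals": [], "achievements": []},
    }
```

Defaulting each missing source to an empty structure means one blocked scrape (say, Glassdoor refusing the bot) still yields a partial report instead of a failed run.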
Performance Improvements
Real-time web access makes this tool actually useful. If you're relying on APIs or stale datasets, you’ll often miss recent news - like funding rounds, leadership changes, or layoffs that happened last week. With live scraping, you get a snapshot of how the company looks today, not how it looked last quarter. It helps cut through outdated signals and pick up on what’s actually happening - even if that means surfacing things the company would rather you didn’t see.