A batch analytics platform with a 3-layer data engineering pipeline (Raw → Staging → Analytics) that analyzes trending GitHub repositories across three programming languages: Python, TypeScript (Next.js), and Go. It leverages Render Workflows' distributed task execution to process data in parallel, storing results in a dimensional model for high-performance analytics.
- Multi-Language Analysis: Tracks Python, TypeScript/Next.js, and Go repositories
- 3-Layer Data Pipeline: Raw ingestion → Staging validation → Analytics dimensional model
- Parallel Processing: 4 concurrent workflow tasks using Render Workflows SDK
- Render Ecosystem Spotlight: Dedicated showcase for Render-deployed projects
- Real-time Dashboard: Next.js 14 dashboard with analytics visualizations
- Hourly Updates: Automated cron job triggers workflow execution
```mermaid
graph TD
    A[Cron Job Hourly] --> B[Workflow Orchestrator]
    B --> C[Python Analyzer]
    B --> D[TypeScript Analyzer]
    B --> E[Go Analyzer]
    B --> F[Render Ecosystem]
    C --> G[Raw Layer JSONB]
    D --> G
    E --> G
    F --> G
    G --> H[Staging Layer Validated]
    H --> I[Analytics Layer Fact/Dim]
    I --> J[Next.js Dashboard]
```
**Backend (Workflows)**

- Python 3.11+
- Render Workflows SDK with `@task` decorators
- asyncpg for PostgreSQL
- aiohttp for async API calls
- GitHub REST API
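The workflow's GitHub client (`workflows/github_api.py`) isn't reproduced in this README. As a rough illustration of the aiohttp + GitHub REST API combination, here is a hedged sketch — the function names and exact search parameters are assumptions, though the `search/repositories` endpoint and `language:`/`created:` qualifiers follow GitHub's public API:

```python
from datetime import date, timedelta


def search_query(language: str, days: int = 7) -> str:
    """Build a GitHub search qualifier for recently created repos in a language."""
    since = date.today() - timedelta(days=days)
    return f"language:{language} created:>={since.isoformat()}"


async def fetch_trending(token: str, language: str) -> list[dict]:
    """Fetch the most-starred recent repos for one language (illustrative client)."""
    import aiohttp  # deferred so search_query works without the dependency installed

    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    params = {
        "q": search_query(language),
        "sort": "stars",
        "order": "desc",
        "per_page": 100,
    }
    async with aiohttp.ClientSession(headers=headers) as session:
        url = "https://api.github.com/search/repositories"
        async with session.get(url, params=params) as resp:
            resp.raise_for_status()
            return (await resp.json())["items"]
```

Running one `fetch_trending` per language concurrently is what makes the parallel fan-out cheap: each call is I/O-bound, so asyncio overlaps the API latency.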
**Frontend (Dashboard)**
- Next.js 14 (App Router)
- TypeScript
- Tailwind CSS
- Recharts for visualizations
- PostgreSQL (pg)
**Infrastructure**
- Render Workflows (task execution)
- Render Cron Job (hourly trigger)
- Render Web Service (Next.js dashboard)
- Render PostgreSQL (data storage)
```
trender/
├── workflows/
│   ├── workflow.py          # Main workflow with @task decorators
│   ├── github_api.py        # Async GitHub API client
│   ├── connections.py       # Shared resource management
│   ├── metrics.py           # Momentum/activity calculations
│   ├── render_detection.py  # Render usage detection
│   ├── etl/
│   │   ├── extract.py       # Raw layer extraction
│   │   ├── transform.py     # Staging transformations
│   │   ├── load.py          # Analytics layer loading
│   │   └── data_quality.py  # Quality scoring
│   └── requirements.txt
├── trigger/
│   ├── trigger.py           # Cron trigger script
│   └── requirements.txt
├── dashboard/
│   ├── app/                 # Next.js App Router pages
│   ├── lib/
│   │   └── db.ts            # Database utilities
│   └── package.json
├── database/
│   ├── schema/
│   │   ├── 01_raw_layer.sql
│   │   ├── 02_staging_layer.sql
│   │   ├── 03_analytics_layer.sql
│   │   └── 04_views.sql
│   └── init.sql
├── render.yaml
├── .env.example
└── README.md
```
If you've already completed the setup and just want to trigger a workflow run:

```bash
# Navigate to trigger directory
cd trigger

# Set environment variables
export RENDER_API_KEY=your_api_key
export RENDER_WORKFLOW_SLUG=trender-wf

# Install dependencies and run
pip install -r requirements.txt
python trigger.py
```

Or use the Render Dashboard: Workflows → trender-wf → Tasks → main_analysis_task → Run Task
- GitHub authentication (Personal Access Token or OAuth App - covered in step 2)
- Render account
- Node.js 18+ (for dashboard)
- Python 3.11+ (for workflows)
```bash
git clone <your-repo-url>
cd trender
```

Trender needs a GitHub access token to fetch repository data. You can choose between two authentication methods:
Best for: Individual developers, quick setup, local development
This is the simplest method - just create a token from GitHub settings.
```bash
cd workflows
pip install -r requirements.txt
python auth_setup.py
```

1. Open https://github.com/settings/tokens/new in your browser
2. Configure the token:
   - Note: `Trender Analytics Access`
   - Expiration: `No expiration` (or your preference)
   - Scopes:
     - ✓ `repo` (Full control of private repositories)
     - ✓ `read:org` (Read org and team membership)
3. Click "Generate token"
4. Copy the token (starts with `ghp_` or `github_pat_`)
5. Paste it into the terminal when prompted
The script will verify your token and display:

```
✅ SUCCESS! Your GitHub access token (PAT):
============================================================
ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
============================================================

Add this to your .env file:
GITHUB_ACCESS_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

Add the token to your .env file:

```bash
GITHUB_ACCESS_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

✅ Done! Skip to Step 3.
Best for: Team setups, production deployments, and flows requiring user authorization
1. Go to https://github.com/settings/developers
2. Click "New OAuth App"
3. Fill in the details:
   - Application name: `Trender Analytics`
   - Homepage URL: `http://localhost:3000`
   - Authorization callback URL: `http://localhost:8000/callback`
4. Click "Register application"
5. Note your Client ID (starts with `Ov23` or `Iv1.`)
6. Click "Generate a new client secret" and save it
```bash
GITHUB_CLIENT_ID=Ov23xxxxx_or_Iv1.xxxxx
GITHUB_CLIENT_SECRET=your_secret_here
```

```bash
cd workflows
pip install -r requirements.txt
python auth_setup.py
```

Choose option [2] for OAuth, then:
1. The script starts a local server on port 8000
2. Your browser opens to GitHub's authorization page
3. Click "Authorize" to approve
4. The script exchanges the auth code for a token
5. Your `GITHUB_ACCESS_TOKEN` is displayed
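The code-for-token exchange in that flow is the standard GitHub OAuth web-flow step. A minimal sketch of it — what `auth_setup.py` actually does internally is an assumption, but the token endpoint and payload fields follow GitHub's documented OAuth flow:

```python
import json
import urllib.request

TOKEN_URL = "https://github.com/login/oauth/access_token"


def exchange_payload(client_id: str, client_secret: str, code: str) -> dict:
    """The fields GitHub expects when swapping an authorization code for a token."""
    return {"client_id": client_id, "client_secret": client_secret, "code": code}


def exchange_code(client_id: str, client_secret: str, code: str) -> str:
    """POST the auth code to GitHub's token endpoint and return the access token."""
    body = json.dumps(exchange_payload(client_id, client_secret, code)).encode()
    req = urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Accept": "application/json", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]
```

The `Accept: application/json` header is what makes GitHub return JSON instead of a URL-encoded body.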
Add the token to your .env file:

```bash
GITHUB_ACCESS_TOKEN=gho_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

- ✅ Tokens don't expire (unless you set an expiration on the PAT)
- ✅ Never commit tokens to version control (`.env` is in `.gitignore`)
- ✅ Token scopes: `repo` and `read:org` only
- ✅ Revoke access anytime at https://github.com/settings/tokens
- ⚠️ Treat tokens like passwords
**PAT Issues:**

- Token doesn't start with `ghp_`: Classic tokens start with `ghp_`, fine-grained tokens with `github_pat_`
- API returns 401: Token may be expired or revoked. Generate a new one.
- Rate limit errors: Ensure the token has the proper scopes selected

**OAuth Issues:**

- Port 8000 in use: Run `lsof -ti:8000 | xargs kill -9`, then try again
- "Redirect URI mismatch": Ensure the callback URL in the OAuth app is exactly `http://localhost:8000/callback`
- Browser doesn't open: Manually visit the URL shown in the terminal
- "Bad verification code": Codes expire quickly. Run `python auth_setup.py` again

**Both Methods:**

- Token verification fails: Check your internet connection
- Need to regenerate: Revoke the old token at https://github.com/settings/tokens and generate a new one
```bash
cp .env.example .env
# Edit .env with your credentials
```

Your .env file should now contain (from step 2):

If you used PAT (Option A):

```bash
GITHUB_ACCESS_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

If you used OAuth (Option B):

```bash
GITHUB_CLIENT_ID=Ov23xxxxx_or_Iv1.xxxxx
GITHUB_CLIENT_SECRET=your_secret_here
GITHUB_ACCESS_TOKEN=gho_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

Other required variables (add as you complete the setup):

- `DATABASE_URL`: PostgreSQL connection string (from step 4)
- `RENDER_API_KEY`: Render API key (from https://dashboard.render.com/u/settings#api-keys)
- `RENDER_WORKFLOW_SLUG`: `trender-wf` (or your workflow slug from step 6)
- Go to the Render Dashboard
- Create a new PostgreSQL database named `trender`
- Note the connection string for `DATABASE_URL`
```bash
# Connect to your Render PostgreSQL instance and run the initialization script
DATABASE_URL=YOUR_DATABASE_URL
psql $DATABASE_URL -f database/init.sql
```

If you prefer to run the schema files one at a time:

```bash
# Run each schema file in order
psql $DATABASE_URL -f database/schema/01_raw_layer.sql
psql $DATABASE_URL -f database/schema/02_staging_layer.sql
psql $DATABASE_URL -f database/schema/03_analytics_layer.sql
psql $DATABASE_URL -f database/schema/04_views.sql
```

**Raw Layer:**
- `raw_github_repos`: Stores complete GitHub API responses
- `raw_repo_metrics`: Stores repository metrics (stars, forks, issues)

**Staging Layer:**

- `stg_repos_validated`: Cleaned and validated repository data
- `stg_render_enrichment`: Render-specific metadata and detection

**Analytics Layer:**

- Dimension tables: `dim_repositories`, `dim_languages`, `dim_render_services`
- Fact tables: `fact_repo_snapshots`, `fact_render_usage`, `fact_workflow_executions`

**Views:**

- Pre-aggregated analytics views for dashboard queries
Check that all tables were created successfully:

```bash
psql $DATABASE_URL -c "\dt"
```

You should see 12+ tables across the `raw_`, `stg_`, `dim_`, and `fact_` prefixes.
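If you'd rather verify from Python than read `\dt` output, a small sketch using asyncpg is shown below — the helper names are illustrative assumptions; only the four prefixes mirror the layers described above:

```python
EXPECTED_PREFIXES = ("raw_", "stg_", "dim_", "fact_")


def missing_prefixes(table_names: list[str]) -> list[str]:
    """Return the layer prefixes that have no table yet."""
    return [
        p for p in EXPECTED_PREFIXES
        if not any(name.startswith(p) for name in table_names)
    ]


async def verify_schema(database_url: str) -> list[str]:
    """List any layers missing from the live database (empty list = all good)."""
    import asyncpg  # deferred: only needed for the live check

    conn = await asyncpg.connect(database_url)
    try:
        rows = await conn.fetch(
            "SELECT tablename FROM pg_tables WHERE schemaname = 'public'"
        )
        return missing_prefixes([r["tablename"] for r in rows])
    finally:
        await conn.close()
```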
- Connection refused: Ensure your `DATABASE_URL` is correct and the Render PostgreSQL instance is active
- Permission denied: Make sure you're using the connection string with full admin privileges
- Tables already exist: Drop the database and recreate it, or use `DROP TABLE IF EXISTS` statements
The `render.yaml` file defines:
- Web Service: Next.js dashboard
- Cron Job: Hourly workflow trigger
- Database: PostgreSQL instance
Deploy to Render:
```bash
# Push to GitHub and connect to Render
# Or use Render Blueprint button
```

After deploying via render.yaml, add your GitHub access token to the workflow service (trender-wf) in the Render Dashboard:

1. Go to your `trender-wf` workflow in the Render Dashboard
2. Navigate to the Environment tab
3. Add:
   - `GITHUB_ACCESS_TOKEN`: The token you generated in step 2 (starts with `ghp_`, `gho_`, or `github_pat_`)
   - `DATABASE_URL`: Automatically connected from the database (no action needed)

**Important:** After adding the token, trigger a manual deploy:

- Click "Manual Deploy" → "Clear build cache & deploy"
- This ensures the environment variables are available to your workflow tasks

**Note:** You only need `GITHUB_ACCESS_TOKEN` in Render. If you used OAuth, you don't need to add `GITHUB_CLIENT_ID` or `GITHUB_CLIENT_SECRET` to Render.
There are three ways to trigger a workflow run to populate data:
The `trigger/trigger.py` script uses the Render SDK to trigger workflows programmatically:

```bash
cd trigger

# Install dependencies
pip install -r requirements.txt

# Set required environment variables
export RENDER_API_KEY=your_render_api_key
export RENDER_WORKFLOW_SLUG=trender  # Your workflow slug from the Render dashboard

# Run the trigger script
python trigger.py
```

Expected output:

```
Triggering task: trender/main-analysis-task
✓ Workflow triggered successfully at 2026-01-20 12:00:00
Task Run ID: run_abc123xyz
Initial Status: running
```
- Go to the Render Dashboard
- Navigate to the Workflows section
- Select your `trender` workflow
- Click the "Trigger Workflow" button
- Select the `main-analysis-task` task
- Click "Run Task"
If you have the Render CLI installed:

```bash
# Install Render CLI (if not already installed)
npm install -g @render-inc/cli

# Login to Render
render login

# Trigger the workflow
render workflows trigger trender main-analysis-task
```

Check the workflow status:

- Via Dashboard: Go to Workflows → trender → View recent runs
- Via Script: The trigger script outputs the Task Run ID
- Via Database: Query the `fact_workflow_executions` table:

```bash
psql $DATABASE_URL -c "SELECT * FROM fact_workflow_executions ORDER BY execution_date DESC LIMIT 1;"
```

Expected workflow completion time: 8-15 seconds for ~300 repositories
- "RENDER_API_KEY not set": Export your API key from Render Settings
- "Task not found": Verify your workflow slug and that the workflow is deployed
- "Connection refused": Check that `DATABASE_URL` is correct and the database is running
- Workflow fails: Check the Render dashboard logs for detailed error messages
Once the workflow completes, access your dashboard at:
https://trender-dashboard.onrender.com
You should see:
- Top trending repositories across Python, TypeScript, and Go
- Render ecosystem projects
- Momentum scores and analytics
- Historical trends
- Stores complete GitHub API responses
- Tables: `raw_github_repos`, `raw_repo_metrics`
- Purpose: Audit trail and reprocessing capability
- Cleaned and validated data
- Tables: `stg_repos_validated`, `stg_render_enrichment`
- Data quality scoring (0.0 - 1.0)
- Business rules applied
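The README doesn't spell out the quality rules, so purely as an illustration, here is what a 0.0-1.0 completeness-style score could look like — the checked fields are assumptions, not the actual rules in `workflows/etl/data_quality.py`:

```python
def quality_score(repo: dict) -> float:
    """Illustrative 0.0-1.0 quality score: fraction of expected fields present."""
    checks = [
        bool(repo.get("name")),
        bool(repo.get("language")),
        bool(repo.get("description")),
        repo.get("stargazers_count") is not None,
        repo.get("pushed_at") is not None,
    ]
    return sum(checks) / len(checks)
```

A score like this makes the staging layer's quality threshold (e.g. "score >= 0.90") a simple numeric filter during the Transform step.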
- Dimensions: `dim_repositories`, `dim_languages`, `dim_render_services`
- Facts: `fact_repo_snapshots`, `fact_render_usage`, `fact_workflow_executions`
- Views: Pre-aggregated analytics for dashboard
The workflow consists of the following tasks decorated with `@task`:

- `main_analysis_task`: Orchestrator that spawns parallel tasks
- `fetch_language_repos`: Fetches repos for Python, TypeScript, Go
- `analyze_repo_batch`: Analyzes repos in batches of 10
- `fetch_render_ecosystem`: Fetches Render-related projects
- `analyze_render_projects`: Analyzes Render-specific features
- `aggregate_results`: ETL pipeline execution (Extract → Transform → Load)
- `store_execution_stats`: Records workflow performance metrics
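The fan-out pattern above can be sketched with plain `asyncio` — the Render Workflows SDK's `@task` decorator is not reproduced here, and the task bodies are placeholders; only the task names and the batch size of 10 come from the list above:

```python
import asyncio

LANGUAGES = ["python", "typescript", "go"]


def batches(items: list, size: int = 10) -> list[list]:
    """Split repos into the batch-of-10 units that analyze_repo_batch consumes."""
    return [items[i:i + size] for i in range(0, len(items), size)]


async def fetch_language_repos(language: str) -> list[dict]:
    """Placeholder for the real GitHub fetch task."""
    return [{"name": f"{language}-repo-{i}", "language": language} for i in range(25)]


async def analyze_repo_batch(batch: list[dict]) -> list[dict]:
    """Placeholder analysis; the real task computes metrics per repo."""
    return [{**repo, "analyzed": True} for repo in batch]


async def main_analysis_task() -> list[dict]:
    """Fan out: one fetch per language, then one analysis task per batch of 10."""
    per_language = await asyncio.gather(
        *(fetch_language_repos(lang) for lang in LANGUAGES)
    )
    all_repos = [repo for repos in per_language for repo in repos]
    analyzed = await asyncio.gather(
        *(analyze_repo_batch(b) for b in batches(all_repos))
    )
    return [repo for batch in analyzed for repo in batch]
```

Because every fetch and every batch analysis is awaited via `asyncio.gather`, the I/O-bound work overlaps — the same idea the Workflows SDK distributes across task instances.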
- Star Velocity: `(stars_last_7_days / total_stars) * 100`
- Activity Score: Weighted formula using commits, issues, contributors
- Momentum Score: `(star_velocity * 0.4) + (activity_score * 0.6)`
- Render Boost: 1.2x multiplier for projects using Render
- Freshness Penalty: 0.9x for repos older than 180 days
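A minimal sketch of these formulas in Python — parameter names like `stars_last_7_days` are illustrative, and the real implementation lives in `workflows/metrics.py`:

```python
def star_velocity(stars_last_7_days: int, total_stars: int) -> float:
    """Star Velocity: (stars_last_7_days / total_stars) * 100."""
    if total_stars == 0:
        return 0.0
    return (stars_last_7_days / total_stars) * 100


def momentum_score(
    star_velocity_pct: float,
    activity_score: float,
    uses_render: bool = False,
    repo_age_days: int = 0,
) -> float:
    """Momentum Score with the Render boost and freshness penalty applied."""
    score = (star_velocity_pct * 0.4) + (activity_score * 0.6)
    if uses_render:
        score *= 1.2  # Render Boost
    if repo_age_days > 180:
        score *= 0.9  # Freshness Penalty
    return score
```

For example, a repo with 5% star velocity and an activity score of 10 scores 8.0; on Render, that becomes 9.6.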
```bash
cd workflows
pip install -r requirements.txt
python workflow.py
```

```bash
cd dashboard
npm install
npm run dev
# Access at http://localhost:3000
```

```bash
psql $DATABASE_URL -f database/schema/01_raw_layer.sql
psql $DATABASE_URL -f database/schema/02_staging_layer.sql
psql $DATABASE_URL -f database/schema/03_analytics_layer.sql
psql $DATABASE_URL -f database/schema/04_views.sql
```

**Technical:**
- Process 300+ repos across 3 languages in under 10 seconds
- 3x speedup vs sequential processing
- 99%+ success rate on workflow runs
- Data quality score >= 0.90 for 95%+ repositories
**Marketing:**
- Showcase 50+ Render ecosystem projects
- Track Render adoption vs competitors
- Identify case study candidates
MIT
Contributions welcome! Please open an issue or submit a pull request.