Fake Firm Compensation Data Generator
By Legal InnovAI LLC · Legal Operations · Law Firm / Legal Business Management · Jurisdiction-neutral
About this skill
Fake Firm Compensation Data Generator produces realistic-but-fictional law-firm compensation panels covering every role: equity partners, non-equity partners, of counsel, associates (junior / mid / senior), paralegals, and professional staff. Output is a one-row-per-person-per-year panel, with a partner-only subset for analysis tools that scope to the partnership.
Its defining feature is that the causal structure is dialed in and known. The author chooses the firm structure (size; lockstep vs. modified-lockstep vs. eat-what-you-kill), the practice mix, the demographic composition, the size of the direct gender pay gap, the strength of the origination-credit channel, the selection pattern, the role-composition skew, and the noise level, and every choice is recorded in a ground-truth manifest. The generator then writes a panel that encodes that structure precisely, along with a CSV partner subset, a JSON config pre-pointed at the partner-pay-equity engine, a data dictionary, a README with the synthetic-data disclaimer, and the manifest itself as the answer key.
Use cases:
- Compensation-system modeling: change a knob, regenerate, and see the downstream pay distribution before any change touches real people.
- Prototyping comp dashboards and reporting tools that need shareable, fully synthetic data.
- Training and teaching: make pay-gap decomposition tractable by working from data whose right answer is documented.
- Testing and validating a pay-equity analysis end-to-end: pair with Partner Pay Equity Analysis, recover the dialed-in gap from the generator's output, and check it against the manifest.
What it is NOT: not billing or profitability test data (use Fake Billing Data Generator for that), not data about any real firm, not a benchmark of market compensation. Every pay gap in the output is designed and injected; it is fictional by construction.
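The dial-and-record idea described above can be illustrated with a minimal sketch. This is not the bundled script: the function name, the manifest keys, the 35% share of women, the baseline comp figure, and the absence of partner turnover are all assumptions made for illustration. It shows one plausible way to inject a known direct gap into log-compensation and write the same number into a manifest.

```python
import json
import math
import random


def generate_partner_panel(seed=42, n_partners=142, first_year=2021, n_years=5,
                           direct_gap=-0.065, noise_sigma=0.12,
                           share_female=0.35, base_comp=900_000.0):
    """Return (rows, manifest) for a toy partner panel with a known gap.

    share_female and base_comp are hypothetical values, not from the listing.
    """
    rng = random.Random(seed)  # private RNG: same seed, same panel
    rows = []
    for person_id in range(n_partners):
        female = rng.random() < share_female
        for year in range(first_year, first_year + n_years):
            # log(comp) = baseline + noise, plus the dialed-in gap for women
            log_comp = math.log(base_comp) + rng.gauss(0.0, noise_sigma)
            if female:
                log_comp += math.log1p(direct_gap)
            rows.append({
                "person_id": person_id,
                "year": year,
                "female": int(female),
                "total_comp": round(math.exp(log_comp), 2),
            })
    # The manifest records every knob, so it can serve as the answer key.
    manifest = {
        "seed": seed,
        "n_partners": n_partners,
        "years": [first_year, first_year + n_years - 1],
        "direct_gender_gap": direct_gap,
        "noise_sigma": noise_sigma,
        "share_female": share_female,
        "synthetic": True,
    }
    return rows, manifest


rows, manifest = generate_partner_panel()
print(len(rows), "rows")
print(json.dumps(manifest, indent=2))
```

Because the sketch has no hires or departures, it yields exactly one row per partner per year; a real panel with attrition would not.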
The bundled script is self-validating: on every run it verifies that the dataset encodes the structure the manifest claims, and it aborts rather than deliver a panel that fails a critical check. Runs are deterministic from a seed, so the same parameters always produce the same panel; scenarios compare cleanly and demos are repeatable. Outputs require professional review before use in any firm-facing decision. Synthetic data is for testing, demonstration, and training only; treat any modeling result from it as a methodological signal, not a finding about a real firm or any real person.
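A minimal sketch of what "self-validating" and "deterministic from a seed" could look like in practice follows. The function names, the tolerance, and the toy data model are assumptions, not the bundled script's actual checks: the sketch regenerates a panel from a seed, measures the realized gap, and raises an error instead of returning data when the realized gap drifts too far from the claimed one.

```python
import math
import random
from statistics import fmean


def make_rows(seed, n=2000, gap=-0.065, sigma=0.12):
    """Toy panel: (female, log_comp) pairs, alternating sex for equal groups."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        female = i % 2
        log_comp = 13.0 + female * math.log1p(gap) + rng.gauss(0.0, sigma)
        rows.append((female, log_comp))
    return rows


def realized_gap(rows):
    """Proportional gap implied by the difference in mean log-comp."""
    women = [lc for f, lc in rows if f == 1]
    men = [lc for f, lc in rows if f == 0]
    return math.expm1(fmean(women) - fmean(men))


def validate(rows, claimed_gap, tolerance=0.03):
    """Abort (raise) if the data does not encode the claimed gap."""
    gap = realized_gap(rows)
    if abs(gap - claimed_gap) > tolerance:
        raise ValueError(
            f"critical check failed: realized gap {gap:.4f}, "
            f"manifest claims {claimed_gap:.4f}"
        )
    return gap


panel = make_rows(seed=42)
assert panel == make_rows(seed=42)  # same seed, same panel
print(f"realized gap: {validate(panel, claimed_gap=-0.065):.4f}")
```

The tolerance here is deliberately loose relative to the sampling noise of the toy panel; a real check would set it from the panel size and the noise level.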
Preview before you buy:
Firm size: 142 partners (97 equity, 45 non-equity) plus 380 associates, 95 paralegals, 110 professional staff. Single firm; 5 offices.
Compensation system: modified-lockstep with an origination overlay.
Comp components: scale (40%), discretionary (25%), origination (30%), capital return (5% for equity partners).
Practice mix: Corporate/M&A, Litigation, IP, Tax, Real Estate, Employment.
Demographic composition: senior partner cohorts (pre-2010) ~70% male; junior partner cohorts (2018+) ~52% male. Associate gender mix balanced. Paralegal and professional staff skew female (~70%).
Dialed-in design (the answer key the analysis should recover):
- Direct gender gap on total comp: -6.5% (women earn 6.5% less, all else equal).
- Origination credit channel: women receive 18% less origination credit than men with equivalent client books.
- Selection: 15% higher attrition for women equity partners post-cohort year 8.
- Noise: σ = 0.12 on log-comp residual.
Output: 5-year panel (2021–2025), Excel optional, seed = 42.
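For readers who want to see the preview scenario as data, here is a sketch of how these inputs might be written down and sanity-checked. The key names and the `check_config` helper are hypothetical; the skill's real config schema is not shown in this listing. The values are taken from the preview above.

```python
# Hypothetical config layout; only the values come from the preview.
preview_config = {
    "seed": 42,
    "years": [2021, 2022, 2023, 2024, 2025],
    "comp_system": "modified_lockstep_with_origination_overlay",
    "headcount": {
        "equity_partner": 97,
        "non_equity_partner": 45,
        "associate": 380,
        "paralegal": 95,
        "professional_staff": 110,
    },
    # Integer percentages avoid floating-point surprises when summing.
    "comp_weights_pct": {
        "scale": 40,
        "discretionary": 25,
        "origination": 30,
        "capital_return": 5,  # equity partners only, per the preview
    },
    "direct_gender_gap": -0.065,
    "origination_credit_gap": -0.18,
    "excess_attrition_women_equity": 0.15,
    "noise_sigma": 0.12,
}


def check_config(cfg):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if sum(cfg["comp_weights_pct"].values()) != 100:
        problems.append("comp component weights do not sum to 100%")
    if not -1.0 < cfg["direct_gender_gap"] < 1.0:
        problems.append("direct gender gap is outside (-100%, +100%)")
    if cfg["noise_sigma"] < 0:
        problems.append("noise sigma must be non-negative")
    return problems


heads = preview_config["headcount"]
partners = heads["equity_partner"] + heads["non_equity_partner"]
non_partners = heads["associate"] + heads["paralegal"] + heads["professional_staff"]

print("problems:", check_config(preview_config))
print("partners:", partners)
print("non-partner person-years:", non_partners * len(preview_config["years"]))
```

The last line multiplies the non-partner headcount by the number of panel years, which assumes a constant headcount across the panel.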
The skill writes a synthetic 5-year compensation panel covering 727 person-years for the partnership (equity + non-equity) plus 2,925 person-years across associates, paralegals, and professional staff. Each row carries the person's demographic record (legal sex, gender identity, race, cohort, tier, equity points), their role and practice group, the comp components (scale, discretionary, origination, capital return), and the realized totals.
A second file holds the partner-only subset, formatted as the canonical input shape for the Partner Pay Equity Analysis skill, which may be purchased separately. A pre-filled JSON config points the pay-equity engine at that file so the round trip runs without additional setup.
The ground-truth manifest (JSON) records every design choice the author dialed in (the direct gender gap, the origination channel strength, the selection rate, the noise level, the demographic composition, the practice mix) alongside the firm-wide summary statistics the generator actually realized and the gap a correct analysis should recover. The manifest is the answer key: feed the partner subset into Partner Pay Equity Analysis and check whether the analysis recovers the dialed-in number within the documented confidence band.
A README accompanies the panel with the synthetic-data disclaimer, the scenario summary, and the round-trip steps. Two data dictionaries explain every column. With the Excel flag, the dataset, the dictionary, and the manifest also ship as a single .xlsx workbook with a synthetic-data banner.
The generator self-validates before writing: it confirms the panel encodes the structure the manifest claims, and aborts rather than ship data that fails a critical check. The same seed always produces the same panel, so scenarios compare cleanly and demos are repeatable. Outputs require professional review.
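The round-trip check described above can be sketched on a toy dataset. This is not the Partner Pay Equity Analysis method: the tier baselines, headcounts, manifest keys, and tolerance are assumptions, and the data has no noise so the arithmetic is easy to follow. It shows why a raw comparison overstates the gap when women are concentrated in the lower-paid tier, and how a within-tier comparison recovers the dialed-in number.

```python
import math
from collections import defaultdict
from statistics import fmean

# Hypothetical tier baselines and answer key, for illustration only.
TIER_BASE = {"equity": 1_000_000.0, "non_equity": 400_000.0}
manifest = {"direct_gender_gap": -0.065, "tolerance": 0.01}


def build_rows():
    """Noise-free toy partner subset with a role-composition skew."""
    counts = {("M", "equity"): 70, ("M", "non_equity"): 30,
              ("F", "equity"): 30, ("F", "non_equity"): 70}
    rows = []
    for (sex, tier), n in counts.items():
        factor = 1.0 + manifest["direct_gender_gap"] if sex == "F" else 1.0
        rows.extend({"sex": sex, "tier": tier, "comp": TIER_BASE[tier] * factor}
                    for _ in range(n))
    return rows


def log_gap(rows):
    """Mean log-comp of women minus mean log-comp of men."""
    women = [math.log(r["comp"]) for r in rows if r["sex"] == "F"]
    men = [math.log(r["comp"]) for r in rows if r["sex"] == "M"]
    return fmean(women) - fmean(men)


def adjusted_gap(rows):
    """Within-tier log gaps, averaged with weights = women per tier."""
    by_tier = defaultdict(list)
    for r in rows:
        by_tier[r["tier"]].append(r)
    total_women = sum(1 for r in rows if r["sex"] == "F")
    weighted = sum(
        log_gap(group) * sum(1 for r in group if r["sex"] == "F")
        for group in by_tier.values()
    ) / total_women
    return math.expm1(weighted)


rows = build_rows()
raw = math.expm1(log_gap(rows))   # mixes the direct gap with tier composition
adj = adjusted_gap(rows)          # holds tier fixed
recovered = abs(adj - manifest["direct_gender_gap"]) <= manifest["tolerance"]
print(f"raw gap {raw:.3f}, adjusted gap {adj:.3f}, recovered: {recovered}")
```

In this toy case the raw gap is several times larger than the dialed-in one, and the difference is entirely due to role composition.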
Sanitized example, not professional advice. All sales final — use the preview to confirm fit before purchase.
Compatible models
The author has tested this skill on the providers below. The specific model list updates automatically as providers ship new models or retire old ones. Compatibility with providers not listed below is not guaranteed — the skill may not produce equivalent results outside the tested set.
Data handling
Seller of record
- Business name: Legal InnovAI LLC
- Entity type: Verified business (Stripe-KYC'd)
- Location: Colorado
This is the party you have a software-license contract with. If you aren't satisfied with the skill, please contact this party directly to work it out.
Version history
- v1.0.0 (current), 2026-05-23
Existing buyers receive new versions free of charge. Pin to a specific version from your library if your workflow needs the exact bundle behavior of an earlier release.
Buyer reviews
No reviews yet — be the first after you buy.
- Tools are starting points, like templates. Read every file in the bundle before running, modify for your workflow, and assess safety and legal implications for your use case.
- Outputs vary run-to-run. Generative AI is non-deterministic by design — the same skill on the same input can produce different results, and outputs can vary across sessions, model versions, and provider load conditions. Your input will differ and your model may differ, so you should expect your output to vary from the example above. Variance is normal, not a defect.
- All sales final. Skills are immediately downloadable digital goods.


