Problem Statement
Explain test data management strategies including data generation, test fixtures, database seeding, and data privacy considerations in CI/CD.
Explanation
Test data management ensures that tests run against consistent, realistic data without exposing sensitive information. The main strategies are synthetic data generation, test fixtures, database seeding, and data masking.
Synthetic data generation creates realistic test data:
```javascript
// Faker.js for realistic data
const { faker } = require('@faker-js/faker');

function generateTestUser() {
  return {
    id: faker.string.uuid(),
    email: faker.internet.email(),
    firstName: faker.person.firstName(),
    lastName: faker.person.lastName(),
    phone: faker.phone.number(),
    address: {
      street: faker.location.streetAddress(),
      city: faker.location.city(),
      zipCode: faker.location.zipCode(),
    },
    createdAt: faker.date.past(),
  };
}

const testUsers = Array.from({ length: 100 }, generateTestUser);
```
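Randomly generated data can make CI failures hard to reproduce unless the generator is seeded (Faker exposes `faker.seed(...)` for this purpose). The idea can be sketched in Python with only the standard library. The name lists, the `generate_test_users` helper, and the default seed are illustrative assumptions, not part of the original example.

```python
import random
import uuid

# Small illustrative name pools (assumed values, not real people).
FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret"]
LAST_NAMES = ["Lovelace", "Hopper", "Torvalds", "Hamilton"]


def generate_test_user(rng: random.Random) -> dict:
    """Build one synthetic user from the supplied random source."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    return {
        # Derive the UUID from the seeded generator so it is reproducible too.
        "id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
        "email": f"{first}.{last}{rng.randint(1, 9999)}@example.com".lower(),
        "firstName": first,
        "lastName": last,
    }


def generate_test_users(count: int, seed: int = 1234) -> list:
    """The same seed and count always yield the same list of users."""
    rng = random.Random(seed)
    return [generate_test_user(rng) for _ in range(count)]
```

Logging the seed in the test output means a failing run can be replayed with identical data.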
Test fixtures provide consistent test data:
```python
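# pytest fixtures
import pytest

# `Database` is assumed to be the application's own database wrapper;
# import it from your code base.

@pytest.fixture
def test_user():
    return {
        'id': 1,
        'email': 'test@example.com',
        'name': 'Test User',
    }

@pytest.fixture
def database(tmp_path):
    # tmp_path is pytest's built-in per-test temporary directory
    db_path = tmp_path / "test.db"
    db = Database(db_path)
    db.create_tables()
    yield db  # the test body runs here
    db.close()  # teardown runs after the test finishes

def test_create_user(database, test_user):
    user_id = database.create_user(test_user)
    assert user_id == test_user['id']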
```
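The `database` fixture above relies on a `Database` class that is not shown. One possible shape for it, as a minimal sketch on top of the standard-library `sqlite3` module, is given below. The schema and method bodies are assumptions chosen to match the calls the fixture and test make (`create_tables`, `create_user`, `close`).

```python
import sqlite3


class Database:
    """Hypothetical stand-in for the Database class used by the fixture."""

    def __init__(self, path):
        # Accepts a pathlib.Path or a string such as ":memory:".
        self.conn = sqlite3.connect(str(path))

    def create_tables(self):
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users ("
            "id INTEGER PRIMARY KEY, "
            "email TEXT UNIQUE NOT NULL, "
            "name TEXT NOT NULL)"
        )
        self.conn.commit()

    def create_user(self, user):
        # Named placeholders are filled from the fixture's dict keys.
        cur = self.conn.execute(
            "INSERT INTO users (id, email, name) VALUES (:id, :email, :name)",
            user,
        )
        self.conn.commit()
        return cur.lastrowid  # equals the explicit id for an INTEGER PRIMARY KEY

    def close(self):
        self.conn.close()
```

Because each test receives its own file under `tmp_path`, tests using this class cannot see each other's rows.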
Database seeding for integration tests:
```javascript
// seeds/test-data.js
exports.seed = async function (knex) {
  // Clear existing rows so the seed always starts from a known state
  await knex('users').del();
  await knex('products').del();

  await knex('users').insert([
    { id: 1, email: 'user1@example.com', role: 'customer' },
    { id: 2, email: 'admin@example.com', role: 'admin' },
  ]);
  await knex('products').insert([
    { id: 1, name: 'Product 1', price: 99.99, stock: 100 },
    { id: 2, name: 'Product 2', price: 149.99, stock: 50 },
  ]);
};

// Run in tests (assumes a configured `knex` instance is in scope)
beforeEach(async () => {
  await knex.migrate.latest();
  await knex.seed.run();
});

afterEach(async () => {
  await knex.migrate.rollback();
});
```
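When seeded tables reference each other through foreign keys, the order of deletes and inserts matters: child rows must be removed before their parents and inserted after them. The sketch below shows an idempotent seed routine in Python with `sqlite3`. The `users`/`orders` schema and the helper names are assumptions for illustration.

```python
import sqlite3

SEED_USERS = [
    (1, "user1@example.com", "customer"),
    (2, "admin@example.com", "admin"),
]
SEED_ORDERS = [
    (1, 1, 99.99, "completed"),
    (2, 1, 149.99, "pending"),
]


def create_schema(conn):
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, email TEXT NOT NULL, role TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "user_id INTEGER NOT NULL REFERENCES users(id), "
        "total REAL, status TEXT)"
    )


def seed(conn):
    """Reset both tables to the known seed state; safe to call repeatedly."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM orders")  # child rows first
        conn.execute("DELETE FROM users")
        conn.executemany("INSERT INTO users VALUES (?, ?, ?)", SEED_USERS)
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", SEED_ORDERS)
```

Calling `seed(conn)` before every test restores the same rows regardless of what the previous test wrote.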
Testcontainers with seeded database:
```java
@Container
private static PostgreSQLContainer<?> postgres =
    new PostgreSQLContainer<>("postgres:13")
        .withDatabaseName("testdb")
        .withInitScript("test-data.sql");

@BeforeEach
void setUp() {
    DataSource dataSource = createDataSource(postgres.getJdbcUrl());
    userRepository = new UserRepository(dataSource);
}
```
test-data.sql:
```sql
INSERT INTO users (id, email, name) VALUES
(1, 'test1@example.com', 'Test User 1'),
(2, 'test2@example.com', 'Test User 2');
INSERT INTO orders (id, user_id, total, status) VALUES
(1, 1, 99.99, 'completed'),
(2, 1, 149.99, 'pending');
```
Data masking for production data:
```python
# Mask sensitive data from a production backup
import hashlib
import json

def mask_email(email):
    # Split on the last '@' so unusual local parts do not break unpacking
    username, domain = email.rsplit('@', 1)
    return f"{username[:2]}***@{domain}"

def mask_phone(phone):
    return f"***-***-{phone[-4:]}"

def anonymize_user(user):
    return {
        **user,
        'email': mask_email(user['email']),
        'phone': mask_phone(user['phone']),
        # Caution: an unsalted hash of a low-entropy value such as an SSN
        # can be brute-forced; prefer a keyed hash (HMAC) with a secret key.
        'ssn': hashlib.sha256(user['ssn'].encode()).hexdigest()[:10],
        'credit_card': None,
    }

# Apply to production data export
with open('production_users.json') as f:
    users = json.load(f)

masked_users = [anonymize_user(u) for u in users]

with open('test_users.json', 'w') as f:
    json.dump(masked_users, f)
```
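A plain SHA-256 of an SSN, as in the example above, offers limited protection because the space of possible SSNs is small enough to enumerate. A keyed hash is one common alternative. The sketch below uses the standard-library `hmac` module; the function name `pseudonymize` and the token length are illustrative assumptions, and the key is expected to come from a secret store rather than source control.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes, length: int = 16) -> str:
    """Return a stable token for `value` that cannot be recomputed without `key`."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]
```

Because the same input and key always produce the same token, a value that appears in several tables is replaced consistently, so joins across the masked data set still line up.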
CI/CD data management:
```yaml
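# GitLab CI
test:
  stage: test
  services:
    - postgres:13
  variables:
    # Read by the postgres service image; values must match DATABASE_URL
    POSTGRES_DB: testdb
    POSTGRES_USER: test
    POSTGRES_PASSWORD: test
    DATABASE_URL: "postgresql://test:test@postgres:5432/testdb"
  before_script:
    - npm run db:migrate
    - npm run db:seed
  script:
    - npm test
  after_script:
    - npm run db:reset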
```
Factory pattern for test objects:
```javascript
// factories/user.factory.js
const { faker } = require('@faker-js/faker');
// `database` and `deleteUser` are assumed to come from the application code.

class UserFactory {
  // Build an in-memory user; any field can be overridden per test
  static build(overrides = {}) {
    return {
      id: faker.string.uuid(),
      email: faker.internet.email(),
      name: faker.person.fullName(),
      role: 'customer',
      createdAt: new Date(),
      ...overrides,
    };
  }

  static buildAdmin() {
    return this.build({ role: 'admin' });
  }

  // Build a user and persist it
  static async create(overrides = {}) {
    const user = this.build(overrides);
    return await database.users.create(user);
  }
}

// Usage in tests
test('admin can delete users', async () => {
  const admin = await UserFactory.create({ role: 'admin' });
  const user = await UserFactory.create();
  const result = await deleteUser(admin, user.id);
  expect(result).toBe(true);
});
```
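The same factory idea carries over to Python test suites. The sketch below is one possible minimal version using plain functions and a sequence counter instead of random values, so every built user is unique yet predictable. The names `build_user` and `build_admin` are assumptions for illustration.

```python
import itertools
from datetime import datetime, timezone

# Monotonic counter so every built user gets a distinct id and email.
_sequence = itertools.count(1)


def build_user(**overrides):
    """Return a user dict with sensible defaults; keyword args override fields."""
    n = next(_sequence)
    user = {
        "id": n,
        "email": f"user{n}@example.com",
        "name": f"Test User {n}",
        "role": "customer",
        "created_at": datetime.now(timezone.utc),
    }
    # Reject misspelled field names instead of silently adding new keys.
    unknown = set(overrides) - set(user)
    if unknown:
        raise TypeError(f"unknown user fields: {sorted(unknown)}")
    user.update(overrides)
    return user


def build_admin(**overrides):
    """Same as build_user, but defaults the role to admin."""
    return build_user(**{"role": "admin", **overrides})
```

A test then states only what matters to it, for example `build_user(role="admin")`, and leaves every other field to the defaults.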
Data privacy considerations:
1. Never use real production data directly
2. Mask PII (personally identifiable information)
3. Use synthetic data for tests
4. Secure test data storage
5. Rotate test credentials regularly
6. Document data classification
7. Comply with GDPR and CCPA requirements
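Some of the points above can be checked automatically before a data set is committed or uploaded as a CI artifact. The following is a rough sketch of such a check, not a complete PII detector: it flags SSN-shaped strings and email addresses outside an allowlist of test domains. The function name `find_pii`, the regular expressions, and the allowlist are assumptions, and only flat records with string values are scanned.

```python
import re

# US SSN shape (assumed format: 3-2-4 digits separated by dashes).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Captures the domain part of anything that looks like an email address.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")


def find_pii(records, allowed_domains=frozenset({"example.com"})):
    """Return (record index, field, reason) for every suspicious value."""
    findings = []
    for index, record in enumerate(records):
        for field, value in record.items():
            if not isinstance(value, str):
                continue  # this sketch does not descend into nested values
            if SSN_PATTERN.search(value):
                findings.append((index, field, "ssn-like value"))
            for match in EMAIL_PATTERN.finditer(value):
                if match.group(1).lower() not in allowed_domains:
                    findings.append((index, field, "non-test email domain"))
    return findings
```

A CI step could fail the build whenever `find_pii` returns a non-empty list for a fixture or seed file.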
Best practices: use factories for object creation, seed only the minimal data each test requires, clean up after tests, isolate test data per test, keep seed files under version control, automate data generation, use realistic but fake data, and refresh test data on a regular schedule. Applied together, these practices keep tests reliable and privacy-compliant.
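One way to get both cleanup and per-test isolation is to wrap each test in a transaction that is always rolled back. The sketch below shows this in Python with `sqlite3`. The helper name `isolated` is an assumption, and the connection is expected to be opened with `isolation_level=None` so that transactions are controlled explicitly.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def isolated(conn):
    """Run a test body inside a transaction that is rolled back afterwards."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        # Guard against a test body that already ended the transaction.
        if conn.in_transaction:
            conn.execute("ROLLBACK")


# Usage: seed once, then let every test write freely.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'seed@example.com')")

with isolated(conn) as tx:
    tx.execute("INSERT INTO users VALUES (2, 'temp@example.com')")
# The temporary row is gone here; only the seeded row remains.
```

Rolling back is usually cheaper than re-running migrations and seeds for every test, although it does not cover code under test that commits on its own.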