Data science
from Medium
5 days ago
Data Quality on Spark, Part 4: Deequ
Deequ provides scalable, Spark-native tools for profiling data and for defining and analyzing data quality checks, with Scala APIs and an optional Python wrapper (PyDeequ).
Air is built on FastAPI, so we can use a modified version of [pyinstrument's instructions](https://pyinstrument.readthedocs.io/en/latest/guide.html#profile-a-web-request-in-fastapi). However, because profilers reveal a LOT of internal data, our example only enables the profiler when an environment variable is set. You will need both `air` and `pyinstrument` to get this working:

```sh
# preferred
uv add "air[standard]" pyinstrument

# old school
pip install "air[standard]" pyinstrument
```
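As a rough sketch of the environment-variable gate described above, the profiler middleware could be attached only when a variable is set. The variable name `PROFILING`, the `?profile=1` query parameter, and the helper names `profiling_enabled` and `install_profiler` are assumptions made for illustration, not taken from the Air or pyinstrument docs. The middleware body follows the general shape of pyinstrument's FastAPI guide, and it assumes the app object exposes FastAPI's `middleware("http")` decorator.

```python
import os


def profiling_enabled(environ=None) -> bool:
    """Return True only when the (assumed) PROFILING variable holds a truthy value."""
    env = os.environ if environ is None else environ
    return env.get("PROFILING", "").strip().lower() in {"1", "true", "yes", "on"}


def install_profiler(app) -> bool:
    """Attach a pyinstrument middleware to a FastAPI-compatible app, if enabled.

    Returns True when the middleware was registered, False otherwise.
    """
    if not profiling_enabled():
        return False

    # Imported lazily so processes with profiling disabled never load the profiler.
    from pyinstrument import Profiler
    from starlette.responses import HTMLResponse

    @app.middleware("http")
    async def profile_request(request, call_next):
        # Profile only requests that opt in with ?profile=1 (hypothetical parameter name).
        if not request.query_params.get("profile"):
            return await call_next(request)
        profiler = Profiler(async_mode="enabled")
        profiler.start()
        await call_next(request)
        profiler.stop()
        # Replace the normal response with the HTML profile report.
        return HTMLResponse(profiler.output_html())

    return True
```

With something like this in place, calling `install_profiler(app)` right after creating the app would be a no-op unless the process was started with `PROFILING=1`, so the profiler's internal details stay out of reach in environments where the variable is unset.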
Code Optimizations is an AI-based service running on Azure Application Insights that uses telemetry gathered by the Application Insights Profiler for .NET to analyse runtime behaviour, find performance bottlenecks down to individual methods, and provide actionable suggestions. Developers can view aggregated data over time (defaulting to a rolling 24‑hour window, with history up to 30 days) for their production and non-production environments.